* [PATCH v1 1/2] distributor lib performance enhancements
@ 2016-12-01  4:50 David Hunt
  2016-12-01  4:50 ` [PATCH v1 1/2] lib: distributor " David Hunt
  2016-12-01  4:50 ` [PATCH v1 2/2] example: distributor app modified to use burstAPI David Hunt
  0 siblings, 2 replies; 202+ messages in thread
From: David Hunt @ 2016-12-01  4:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a handshake mechanism similar to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. The main
difference is that instead of sending one packet per cache line, it makes
use of the 7 free slots in the same cache line in order to send up to
8 packets at a time to/from a worker.
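
As an illustration only (not part of the patch), a minimal sketch of how each
64-bit slot in the shared cache line encodes an mbuf pointer together with the
handshake flags; the shortened names stand in for the RTE_DISTRIB_* macros
added in rte_distributor_common.h below:

#include <stdint.h>

#define FLAG_BITS  4	/* stands in for RTE_DISTRIB_FLAG_BITS */
#define GET_BUF    1	/* worker requests a fresh burst */
#define RETURN_BUF 2	/* slot holds a packet returned by the worker */
#define VALID_BUF  4	/* slot holds a valid outgoing packet */

/* Pack an mbuf pointer plus flag bits into one 64-bit slot. */
static inline int64_t
slot_pack(void *mbuf, int64_t flags)
{
	return (((int64_t)(uintptr_t)mbuf) << FLAG_BITS) | flags;
}

/* The arithmetic right shift restores the sign-extended pointer bits. */
static inline void *
slot_unpack(int64_t slot)
{
	return (void *)(uintptr_t)(slot >> FLAG_BITS);
}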

The flow matching algorithm has been significantly reworked: it now keeps
an array of in-flight flows and an array of backlog flows per worker, and
matches incoming flows against the in-flight/backlog flows of all workers
so that flow pinning to workers is maintained.
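
For illustration only, a scalar equivalent of that matching step (the patch
itself does this 8 tags at a time with the SSE4.2 _mm_cmpestrm instruction in
find_match(), see below):

#include <stdint.h>

/*
 * For each incoming flow tag, report the worker (ID + 1) whose in-flight
 * or backlog tags already contain that tag, or 0 if the flow is not
 * currently pinned to any worker.
 */
static void
find_match_scalar(unsigned int num_workers,
		const uint16_t inflight[][8], const uint16_t backlog[][8],
		const uint16_t incoming[8], uint16_t matches[8])
{
	unsigned int j, w, k;

	for (j = 0; j < 8; j++) {
		matches[j] = 0;
		if (incoming[j] == 0)	/* empty slot, nothing to match */
			continue;
		for (w = 0; w < num_workers && matches[j] == 0; w++)
			for (k = 0; k < 8; k++)
				if (incoming[j] == inflight[w][k] ||
						incoming[j] == backlog[w][k]) {
					matches[j] = w + 1;
					break;
				}
	}
}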

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker
   at a time (see the worker-loop sketch below)
   For performance in matching, flow IDs are 15 bits
   The original API (and code) is kept for backward compatibility
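
Worker-loop sketch (assumptions: prototypes as declared in
rte_distributor_burst.h in patch 1/2; the quit flag and the packet processing
are placeholders, not code from this series):

#include <rte_mbuf.h>
#include <rte_distributor_burst.h>

static volatile int worker_quit;	/* placeholder shutdown flag */

static int
worker_loop(struct rte_distributor *d, unsigned int worker_id)
{
	struct rte_mbuf *pkts[8];
	unsigned int i, num = 0;

	while (!worker_quit) {
		/* hand back the previous burst, receive up to 8 new mbufs */
		num = rte_distributor_get_pkt_burst(d, worker_id,
				pkts, pkts, num);
		for (i = 0; i < num; i++) {
			/* ... process pkts[i] ... */
		}
	}
	/* return the final burst without requesting more packets */
	return rte_distributor_return_pkt_burst(d, worker_id, pkts, num);
}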

Performance Gains
   Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICs to 2 x 40Gbps traffic generator channels, 64-byte packets
   separate cores for rx, tx, and distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x


[PATCH v1 1/2] lib: distributor performance enhancements
[PATCH v1 2/2] example: distributor app modified to use burstAPI


* [PATCH v1 1/2] lib: distributor performance enhancements
  2016-12-01  4:50 [PATCH v1 1/2] distributor lib performance enhancements David Hunt
@ 2016-12-01  4:50 ` David Hunt
  2016-12-22  4:37   ` [PATCH v2 0/5] distributor library " David Hunt
  2016-12-01  4:50 ` [PATCH v1 2/2] example: distributor app modified to use burstAPI David Hunt
  1 sibling, 1 reply; 202+ messages in thread
From: David Hunt @ 2016-12-01  4:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now sends bursts of up to 8 mbufs to each worker, and tracks the
in-flight flow-ids (atomic scheduling).

This adds a new file with a new API, similar to the old API except that
the function names carry a _burst suffix.
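
For reference, a sketch of the distributor-core side of this API (not lifted
from the sample app; NIC, mempool and ring setup are omitted, and DIST_BURST
is an arbitrary size chosen for this sketch):

#include <rte_mbuf.h>
#include <rte_ethdev.h>
#include <rte_distributor_burst.h>

#define DIST_BURST 64	/* arbitrary burst size for this sketch */

static void
distribute_from_port(struct rte_distributor *d, uint8_t port)
{
	struct rte_mbuf *bufs[DIST_BURST * 2];

	/* pull a burst from the NIC and hand it to the distributor */
	const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs, DIST_BURST);
	if (nb_rx == 0)
		return;
	rte_distributor_process_burst(d, bufs, nb_rx);

	/* collect packets that the workers have finished with */
	const uint16_t nb_ret = rte_distributor_returned_pkts_burst(d,
			bufs, DIST_BURST * 2);

	/* ... forward the nb_ret returned packets, e.g. to a TX ring ... */
	(void)nb_ret;
}

The distributor instance itself would come from rte_distributor_create_burst(),
as in the sample app change in patch 2/2.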

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                 |   2 +
 lib/librte_distributor/rte_distributor.c        |  27 +-
 lib/librte_distributor/rte_distributor_burst.c  | 617 ++++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_burst.h  | 255 ++++++++++
 lib/librte_distributor/rte_distributor_common.h |  75 +++
 5 files changed, 950 insertions(+), 26 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_burst.c
 create mode 100644 lib/librte_distributor/rte_distributor_burst.h
 create mode 100644 lib/librte_distributor/rte_distributor_common.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..2acc54d 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,9 +43,11 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_burst.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index f3f778c..cfd187c 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -40,34 +40,9 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
+#include "rte_distributor_common.h"
 #include "rte_distributor.h"
 
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
 /**
  * Buffer structure used to pass the pointer data between cores. This is cache
  * line aligned, but to improve performance and prevent adjacent cache-line
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
new file mode 100644
index 0000000..1e32c0f
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -0,0 +1,617 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_common.h"
+#include "rte_distributor_burst.h"
+#include <smmintrin.h>
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(__m128i) / sizeof(uint16_t))
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns.
+ */
+struct rte_distributor_buffer {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	/**
+	 * The first cache line in this array holds the tags in flight
+	 * on the worker core. The second cache line holds the backlog
+	 * that is going to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+TAILQ_HEAD(rte_distributor_list, rte_distributor);
+
+static struct rte_tailq_elem rte_distributor_tailq = {
+	.name = "RTE_DISTRIBUTOR",
+};
+EAL_REGISTER_TAILQ(rte_distributor_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+/* This function should really be called return_pkt_burst() */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (the scheduler clears them) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = __rdtsc()+100;
+
+		while (__rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF bit to signal to the distributor that the
+	 * cache line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+
+	/* If bit is set, the distributor has not yet filled in any packets */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return 0;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (unsigned int i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * Now we've got the contents of the cache line into an array of
+	 * mbuf pointers, so toggle the bit so the scheduler can start working
+	 * on the next cache line while we're working on this one.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_burst(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	unsigned int count;
+	uint64_t retries = 0;
+
+	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	while (count == 0) {
+		rte_pause();
+		retries++;
+		if (retries > 1000) {
+			retries = 0;
+			return 0;
+		}
+		uint64_t t = __rdtsc()+100;
+
+		while (__rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_burst(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+
+static void
+find_match(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+
+	/*
+	 * Function overview:
+	 * 1. Load the incoming flow ids, then loop through all worker IDs:
+	 *  1a. Load the current inflights for that worker into an xmm reg
+	 *  1b. Load the current backlog for that worker into an xmm reg
+	 *  1c. Use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  1d. Add any matches to the output
+	 * 2. Write the output xmm (matching worker ids).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (uint16_t i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		 /*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the output xmm contains 8 16-bit values, with
+	 * each non-zero value holding the worker ID (+1) to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
+
+
+static unsigned int
+release(struct rte_distributor *d, unsigned int wkr)
+{
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	if (d->backlog[wkr].count == 0)
+		return 0;
+
+	/*
+	 * wait for the GET_BUF bit to go high, otherwise we can't send
+	 * the packets to the worker
+	 */
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_burst(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[8] __rte_cache_aligned;
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (unsigned int wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+				release(d, wid);
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[8];
+		int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		for (unsigned int i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+
+		find_match(d, &flows[0], &matches[0]);
+
+		/*
+		 * Matches array now contain the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (int j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * Users are advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process_burst.
+			 * User-defined tags are used to identify flows
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line causes the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count == RTE_DIST_BURST_SIZE))
+					release(d, matches[j]-1);
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count == RTE_DIST_BURST_SIZE))
+					release(d, wkr);
+
+				/* Add to current worker's backlog */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (int w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (unsigned int wid = 0 ; wid < d->num_workers; wid++) {
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+	}
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_burst(d, NULL, 0);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor *d)
+{
+	/* throw away returns, so workers can exit */
+	for (unsigned int wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor *
+rte_distributor_create_burst(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers)
+{
+	struct rte_distributor *d;
+	struct rte_distributor_list *distributor_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (unsigned int i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	distributor_list = RTE_TAILQ_CAST(rte_distributor_tailq.head,
+					  rte_distributor_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(distributor_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_burst.h b/lib/librte_distributor/rte_distributor_burst.h
new file mode 100644
index 0000000..1437657
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.h
@@ -0,0 +1,255 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTOR_BURST_H_
+#define _RTE_DISTRIBUTOR_BURST_H_
+
+/**
+ * @file
+ * RTE distributor (burst API)
+ *
+ * The distributor is a component which is designed to pass packets
+ * to workers in bursts of up to 8, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create_burst(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * Users are advised to set a tag for each mbuf before calling this function.
+ * If the tag is not set, its value will vary depending on the driver
+ * implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_burst(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_burst(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return completed packets without requesting
+ * new ones, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The array of num packets being returned by the worker
+ */
+int
+rte_distributor_return_pkt_burst(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request new packets to process.
+ * Any previous packets given to the worker are assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt_burst(). It does not
+ * wait for the new packets to be available, but returns zero if the request
+ * has not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_common.h b/lib/librte_distributor/rte_distributor_common.h
new file mode 100644
index 0000000..e2a9c3e
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_common.h
@@ -0,0 +1,75 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_COMMON_H_
+#define _RTE_DIST_COMMON_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be careful when increasing the limit, because it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4


* [PATCH v1 2/2] example: distributor app modified to use burstAPI
  2016-12-01  4:50 [PATCH v1 1/2] distributor lib performance enhancements David Hunt
  2016-12-01  4:50 ` [PATCH v1 1/2] lib: distributor " David Hunt
@ 2016-12-01  4:50 ` David Hunt
  1 sibling, 0 replies; 202+ messages in thread
From: David Hunt @ 2016-12-01  4:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

New stats show details of throughput per second. They run on a separate
core from the rx thread so as not to affect performance.
There is one thread each for stats, rx, tx, and the distributor; all
other cores in the coremask are used for workers.
Some "#if 0" blocks in the code allow the rx and distributor threads to
run on the same core, and a few more commented-out lines allow the flow
matching algorithm to be skipped, etc. The code is commented in the
appropriate locations.
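
For reference, a sketch of that once-per-second stats loop (the patch below
calls the __rdtsc() intrinsic directly; this sketch uses rte_rdtsc() from
rte_cycles.h instead, and print_stats() is the function added in main.c):

#include <stdint.h>
#include <rte_cycles.h>

static void print_stats(void);	/* defined in main.c by this patch */

static void
stats_loop(volatile uint8_t *quit)
{
	const uint64_t hz = rte_get_timer_hz();	/* TSC ticks per second */
	uint64_t next = rte_rdtsc() + hz;

	while (!*quit) {
		if (rte_rdtsc() > next) {
			print_stats();
			next = rte_rdtsc() + hz;
		}
	}
}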

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 489 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 380 insertions(+), 109 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 537cee1..fcac807 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,6 +30,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#define BURST_API 1
+
 #include <stdint.h>
 #include <inttypes.h>
 #include <unistd.h>
@@ -43,39 +44,87 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
+#if BURST_API
+#include <rte_distributor_burst.h>
+#else
 #include <rte_distributor.h>
+#endif
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
+#define MBUF_CACHE_SIZE 128
+#if BURST_API
+#define BURST_SIZE 64
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
+#else
 #define BURST_SIZE 32
-#define RTE_RING_SZ 1024
+#define SCHED_RX_RING_SZ 1024
+#define SCHED_TX_RING_SZ 1024
+#endif
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
 		.max_rx_pkt_len = ETHER_MAX_LEN,
+		.split_hdr_size = 0,
+		.header_split   = 0, /**< Header Split disabled */
+		.hw_ip_checksum = 1, /**< IP checksum offload enabled */
+		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
+		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
+		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
 	},
 	.txmode = {
 		.mq_mode = ETH_MQ_TX_NONE,
@@ -93,6 +142,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -101,9 +152,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -113,7 +168,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -121,7 +176,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
@@ -134,7 +189,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
@@ -161,40 +217,51 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 struct lcore_params {
 	unsigned worker_id;
 	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
-static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+static inline void
+flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
+	if (unlikely(nb_tx < outbuf->count)) {
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
+		do {
+			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
+		} while (++nb_tx < outbuf->count);
 	}
+	outbuf->count = 0;
+}
 
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+static inline void
+flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+{
+	uint8_t outp;
 
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+	for (outp = 0; outp < nb_ports; outp++) {
+		/* skip ports that are not enabled */
+		if ((enabled_port_mask & (1 << outp)) == 0)
+			continue;
 
-	return 0;
+		if (tx_buffers[outp].count == 0)
+			continue;
+
+		flush_one_port(&tx_buffers[outp], outp);
+	}
 }
 
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -210,6 +277,7 @@ lcore_rx(struct lcore_params *p)
 
 	printf("\nCore %u doing packet RX.\n", rte_lcore_id());
 	port = 0;
+
 	while (!quit_signal_rx) {
 
 		/* skip ports that are not enabled */
@@ -218,7 +286,7 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
+
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -228,19 +296,46 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+
+#if BURST_API
+		rte_distributor_process_burst(d, bufs, nb_rx);
+		const uint16_t nb_ret = rte_distributor_returned_pkts_burst(d,
+				bufs, BURST_SIZE*2);
+#else
 		rte_distributor_process(d, bufs, nb_rx);
 		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
 				bufs, BURST_SIZE*2);
+#endif
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
 				port = 0;
 			continue;
 		}
-
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		 * Swap the following two lines if you want the rx traffic
+		 * to go directly to tx, no distribution.
+		 */
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -249,54 +344,86 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
+static int
+lcore_distributor(struct lcore_params *p)
 {
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor *d = p->d;
 
-	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
-		do {
-			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
-		} while (++nb_tx < outbuf->count);
-	}
-	outbuf->count = 0;
-}
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
 
-static inline void
-flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
-{
-	uint8_t outp;
-	for (outp = 0; outp < nb_ports; outp++) {
-		/* skip ports that are not enabled */
-		if ((enabled_port_mask & (1 << outp)) == 0)
-			continue;
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
 
-		if (tx_buffers[outp].count == 0)
-			continue;
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+/*
+ * This #if allows you to bypass the distributor. Incoming packets may be
+ * sent straight to the tx ring.
+ */
+#if 1
+
+#if BURST_API
+			/* Distribute the packets */
+			rte_distributor_process_burst(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts_burst(d,
+					bufs, BURST_SIZE*2);
+#else
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+#endif
+
+#else
+			/* Bypass the distributor */
+			const unsigned int xor_val = (rte_eth_dev_count() > 1);
+			/* Touch the mbuf by xor'ing the port */
+			for (unsigned int i = 0; i < nb_rx; i++)
+				bufs[i]->port ^= xor_val;
+
+			const uint16_t nb_ret = nb_rx;
+#endif
+			if (unlikely(nb_ret == 0))
+				continue;
 
-		flush_one_port(&tx_buffers[outp], outp);
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
 	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+#if BURST_API
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns_burst(d);
+#endif
+	quit_signal_rx = 1;
+	return 0;
 }
 
 static int
@@ -327,9 +454,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -358,11 +485,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -371,7 +499,7 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
@@ -379,24 +507,88 @@ print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
 	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (int j = 0; j < 8; j++)
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
@@ -405,18 +597,48 @@ lcore_worker(struct lcore_params *p)
 {
 	struct rte_distributor *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (int i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+	while (!quit_signal_work) {
+
+#if BURST_API
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (unsigned int i = 0; i < num; i++) {
+			uint64_t t = __rdtsc()+100;
+
+			while (__rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
+#else
+		buf[0] = rte_distributor_get_pkt(d, id, buf[0]);
+		uint64_t t = __rdtsc() + 10;
+
+		while (__rdtsc() < t)
+			rte_pause();
+		buf[0]->port ^= xor_val;
+#endif
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
+	printf("\nCore %u exiting worker task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -497,11 +719,13 @@ main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
 	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -518,10 +742,12 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -560,41 +786,86 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
+#if BURST_API
+	d = rte_distributor_create_burst("PKT_DIST", rte_socket_id(),
+			rte_lcore_count() - 4);
+#else
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 4);
+#endif
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+					rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+					(lcore_function_t *)lcore_distributor,
+					p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
+		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
 
-	if (lcore_rx(&p) != 0)
-		return -1;
+	freq = rte_get_timer_hz();
+	t = __rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < __rdtsc()) {
+			print_stats();
+			t = __rdtsc() + freq;
+		}
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v2 0/5] distributor library performance enhancements
  2016-12-01  4:50 ` [PATCH v1 1/2] lib: distributor " David Hunt
@ 2016-12-22  4:37   ` David Hunt
  2016-12-22  4:37     ` [PATCH v2 1/5] lib: distributor " David Hunt
                       ` (4 more replies)
  0 siblings, 5 replies; 202+ messages in thread
From: David Hunt @ 2016-12-22  4:37 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It adds a series of API calls similar to the original API, but with
"_burst" in the function names. Usage is similar (but not identical), in that
there are now bursts of mbufs sent to each worker at a time instead of a
single mbuf pointer. See the header file rte_distributor_burst.h for more
details on API usage.
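
As a rough illustration (not part of the patch: d is the handle returned by
rte_distributor_create_burst(), and quit, worker_id, handle_packet(),
rx_burst(), tx_burst(), in_bufs/out_bufs, nb_rx/nb_done and BURST are
application-side placeholders), a worker loop and the corresponding
distributor-core calls could look like:

    /* worker core: hand back the previous burst, get up to 8 new mbufs */
    struct rte_mbuf *bufs[8];
    unsigned int i, num = 0;

    while (!quit) {
            num = rte_distributor_get_pkt_burst(d, worker_id,
                            bufs, bufs, num);
            for (i = 0; i < num; i++)
                    handle_packet(bufs[i]);
    }
    rte_distributor_return_pkt_burst(d, worker_id, bufs, num);

    /* distributor core: feed bursts in, collect completed packets */
    nb_rx = rx_burst(in_bufs, BURST);
    rte_distributor_process_burst(d, in_bufs, nb_rx);
    nb_done = rte_distributor_returned_pkts_burst(d, out_bufs, BURST);
    tx_burst(out_bufs, nb_done);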

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
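
Each of the eight slots in that cache line is a 64-bit value which packs the
mbuf pointer together with the handshake flags, roughly as follows (macro
names as defined in rte_distributor_priv.h below):

    /* pack: pointer in the upper bits, handshake flags in the low bits */
    buf->bufptr64[i] = (((int64_t)(uintptr_t)mbuf) << RTE_DISTRIB_FLAG_BITS)
                    | RTE_DISTRIB_VALID_BUF;

    /* unpack on the other side; the arithmetic shift restores the pointer */
    mbuf = (struct rte_mbuf *)(uintptr_t)
                    (buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS);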

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate version at run time,
depending on the presence of the SSE2 CPU flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest
  * Added doc updates

Notes:
   Apps using the burst API must now work in bursts, as up to 8 mbufs are
   given to a worker at a time
   For performance in matching, Flow IDs are 15-bit (non-zero); see the
   snippet after these notes
   Original API (and code) is kept for backward compatibility
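
   As a concrete example of the flow ID handling, the tag is taken from the
   low 16 bits of the mbuf's hash.usr field with bit 0 forced on, so it is
   never zero and 15 bits remain for the flow ID:

      new_tag = (uint16_t)(mbuf->hash.usr) | 1;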

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[PATCH v2 1/5] lib: distributor performance enhancements
[PATCH v2 2/5] test: unit tests for new distributor burst api
[PATCH v2 3/5] test: add distributor_perf autotest
[PATCH v2 4/5] example: distributor app showing burst api
[PATCH v2 5/5] doc: distributor library changes for new burst api

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v2 1/5] lib: distributor performance enhancements
  2016-12-22  4:37   ` [PATCH v2 0/5] distributor library " David Hunt
@ 2016-12-22  4:37     ` David Hunt
  2016-12-22 12:47       ` Jerin Jacob
                         ` (2 more replies)
  2016-12-22  4:37     ` [PATCH v2 2/5] test: unit tests for new distributor " David Hunt
                       ` (3 subsequent siblings)
  4 siblings, 3 replies; 202+ messages in thread
From: David Hunt @ 2016-12-22  4:37 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now sends bursts of up to 8 mbufs to each worker, and tracks
the in-flight flow-ids (atomic scheduling)

New file with a new API, similar to the old API except with _burst
at the end of the function names.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                |   2 +
 lib/librte_distributor/rte_distributor.c       |  72 +--
 lib/librte_distributor/rte_distributor_burst.c | 642 +++++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_burst.h | 255 ++++++++++
 lib/librte_distributor/rte_distributor_priv.h  | 190 ++++++++
 5 files changed, 1090 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_burst.c
 create mode 100644 lib/librte_distributor/rte_distributor_burst.h
 create mode 100644 lib/librte_distributor/rte_distributor_priv.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..2acc54d 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,9 +43,11 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_burst.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index f3f778c..c05f6e3 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -40,79 +40,9 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
 #include "rte_distributor.h"
 
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
new file mode 100644
index 0000000..9d9ae2d
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -0,0 +1,642 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+#include "smmintrin.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+/* This function should really be called return_pkt_burst() */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+
+	/* if we don't have any packets to return, return. */
+	if (count == 0)
+		return;
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = __rdtsc()+100;
+
+		while (__rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF bit to signal to the distributor that the
+	 * cache line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return 0;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (unsigned int i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * Now that we've got the contents of the cache line into an array of
+	 * mbuf pointers, toggle the bit so the scheduler can start working
+	 * on the next cache line while we're working on this one.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	unsigned int count;
+	uint64_t retries = 0;
+
+	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	while (count == 0) {
+		rte_pause();
+		retries++;
+		if (retries > 1000) {
+			retries = 0;
+			return 0;
+		}
+		uint64_t t = __rdtsc()+100;
+
+		while (__rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_burst *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+#if RTE_MACHINE_CPUFLAG_SSE2
+static inline void
+find_match_sse2(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+
+	/*
+	 * Function overview:
+	 * 1. Load the incoming flow IDs into an xmm register
+	 * 2. Loop through all worker IDs
+	 *  2a. Load the current inflights for that worker into an xmm reg
+	 *  2b. Load the current backlog for that worker into an xmm reg
+	 *  2c. Use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  2d. Add any matches to the output
+	 * 3. Write the output xmm (matching worker IDs).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (uint16_t i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the 128-bit output contains 8 16-bit values, with
+	 * each non-zero value holding the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
+#endif
+
+static inline void
+find_match_scalar(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < 8; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < 8; j++)
+			for (w = 0; w < 8; w++)
+				if (d->in_flight_tags[i][w] == data_ptr[j]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < 8; j++)
+			for (w = 0; w < 8; w++)
+				if (bl->tags[w] == data_ptr[j]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value holding the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+}
+
+
+
+static unsigned int
+handle_returns(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	/*
+	 * wait for the GET_BUF bit to go high, otherwise we can't send
+	 * the packets to the worker
+	 */
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (unsigned int i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+static unsigned int
+release(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	if (d->backlog[wkr].count == 0)
+		return 0;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return buf->count;
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[8] __rte_cache_aligned;
+	//static int iter=0;
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (unsigned int wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[8];
+		int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		for (unsigned int i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+
+		switch (d->dist_match_fn) {
+#ifdef RTE_MACHINE_CPUFLAG_SSE2
+		case RTE_DIST_MATCH_SSE:
+			find_match_sse2(d, &flows[0], &matches[0]);
+			break;
+#endif
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
+
+		/*
+		 * Matches array now contain the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (int j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process_burst().
+			 * User-defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			//matches[j] = 0;
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to the current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (int w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (unsigned int wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_burst *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_burst(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d)
+{
+	/* throw away returns, so workers can exit */
+	for (unsigned int wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers)
+{
+	struct rte_distributor_burst *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+		d->dist_match_fn = RTE_DIST_MATCH_SSE;
+	else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backog tags so they're pointing at the second cache
+	 * Set up the backlog tags so they're pointing at the second cache
+	 */
+	for (unsigned int i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_burst.h b/lib/librte_distributor/rte_distributor_burst.h
new file mode 100644
index 0000000..5096b13
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.h
@@ -0,0 +1,255 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_BURST_H_
+#define _RTE_DIST_BURST_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor_burst;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being returned by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being returned by the worker
+ * @param num
+ *   The number of packets being returned
+ */
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
+ * API should *not* be used to try and retrieve the new packet.
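+ *
+ * For example (an illustrative sketch only; do_other_work() is a
+ * placeholder for whatever the worker does between polls):
+ *
+ *   rte_distributor_request_pkt_burst(d, worker_id, oldpkt, count);
+ *   while ((num = rte_distributor_poll_pkt_burst(d, worker_id, pkts)) == 0)
+ *           do_other_work();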
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt_burst(). It does not
+ * wait for the new packets to be available, but returns zero if the request
+ * has not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_priv.h b/lib/librte_distributor/rte_distributor_priv.h
new file mode 100644
index 0000000..1b1295a
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_priv.h
@@ -0,0 +1,190 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, because it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(__m128i) / sizeof(uint16_t))
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_burst {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		  * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		  * the bitmask has to expand.
+		  */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_SSE,
+	RTE_DIST_MATCH_NUM
+};
+
+struct rte_distributor_burst {
+	TAILQ_ENTRY(rte_distributor_burst) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	/**
+	 * The first cache line in this array holds the tags inflight
+	 * on the worker core. The second cache line holds the backlog
+	 * that is going to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_burst bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v2 2/5] test: unit tests for new distributor burst api
  2016-12-22  4:37   ` [PATCH v2 0/5] distributor library " David Hunt
  2016-12-22  4:37     ` [PATCH v2 1/5] lib: distributor " David Hunt
@ 2016-12-22  4:37     ` David Hunt
  2016-12-22  4:37     ` [PATCH v2 3/5] test: add distributor_perf autotest David Hunt
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2016-12-22  4:37 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 500 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 391 insertions(+), 109 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..7738f04 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,11 +40,24 @@
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
 #define BIG_BATCH 1024
 
+#define DIST_SINGLE 0
+#define DIST_BURST  1
+#define DIST_NUM_TYPES 2
+
+struct worker_params {
+	struct rte_distributor *d;
+	struct rte_distributor_burst *db;
+	int dist_type;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,17 +94,35 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (int i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			num = rte_distributor_get_pkt_burst(db, id,
+					buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -107,12 +138,21 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i;
+	unsigned int retries;
+	unsigned int count = 0;
+
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Basic distributor sanity tests (single) ===\n");
+	else
+		printf("=== Basic distributor sanity tests (burst) ===\n");
 
-	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
@@ -124,8 +164,21 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
+
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -146,8 +199,18 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_process(d, bufs, BURST);
+			rte_distributor_flush(d);
+		} else {
+			rte_distributor_process_burst(db, bufs, BURST);
+			count = 0;
+			do {
+				rte_distributor_flush_burst(db);
+				count += rte_distributor_returned_pkts_burst(db,
+						returns, BURST*2);
+			} while (count < BURST);
+		}
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -155,24 +218,32 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 			return -1;
 		}
 
+
 		for (i = 0; i < rte_lcore_count() - 1; i++)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
+		bufs[i]->hash.usr = i+1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -194,8 +265,15 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_flush(d);
+		rte_distributor_clear_returns(d);
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -203,28 +281,59 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
-	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		printf("===testing single big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+			num_returned += rte_distributor_returned_pkts(d,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+		}
+		rte_distributor_flush(d);
 		num_returned += rte_distributor_returned_pkts(d,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+	} else {
+		printf("===testing burst big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process_burst(db,
+					&many_bufs[i*BURST], BURST);
+			count = rte_distributor_returned_pkts_burst(db,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+			num_returned += count;
+		}
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	retries = 0;
+	do {
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
+
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d, got %d\n",
+				__LINE__, BIG_BATCH, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -234,7 +343,6 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	printf("Sanity test of returned packets done\n");
 
 	rte_mempool_put_bulk(p, (void *)many_bufs, BIG_BATCH);
-
 	printf("\n");
 	return 0;
 }
@@ -249,18 +357,40 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int i = 0;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -270,26 +400,45 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Sanity test with mbuf alloc/free (single) ===\n");
+	else
+		printf("=== Sanity test with mbuf alloc/free (burst)  ===\n");
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
-		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
-			rte_distributor_process(d, NULL, 0);
+		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0) {
+			if (wp->dist_type == DIST_SINGLE)
+				rte_distributor_process(d, NULL, 0);
+			else
+				rte_distributor_process_burst(db, NULL, 0);
+		}
 		for (j = 0; j < BURST; j++) {
 			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
-		rte_distributor_process(d, bufs, BURST);
+		if (wp->dist_type == DIST_SINGLE)
+			rte_distributor_process(d, bufs, BURST);
+		else
+			rte_distributor_process_burst(db, bufs, BURST);
 	}
 
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -305,20 +454,48 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE)
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+	else
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit signal globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = 1;
+			total += num;
+		} else {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+			total += num;
+		}
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		returned = rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -326,13 +503,29 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+		} else {
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			if (wp->dist_type == DIST_SINGLE) {
+				pkt = rte_distributor_get_pkt(d, id, NULL);
+			} else {
+				num = rte_distributor_get_pkt_burst(db,
+						id, buf, buf, num);
+			}
+		}
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_return_pkt(d, id, pkt);
+		} else {
+			returned = rte_distributor_return_pkt_burst(db,
+					id, buf, num);
+			printf("Num returned = %d\n", returned);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
 	}
 	return 0;
 }
@@ -344,26 +537,37 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to the same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		rte_distributor_flush_burst(db);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -374,14 +578,25 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush_burst(db);
+	}
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
 
-	/* flush the distributor */
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -389,10 +604,6 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -401,13 +612,18 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Test flush fn with worker shutdown (single) ===\n");
+	else
+		printf("=== Test flush fn with worker shutdown (burst) ===\n");
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -420,7 +636,11 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, BURST);
+	else
+		rte_distributor_process_burst(db, bufs, BURST);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -429,9 +649,18 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	zero_quit = 1;
 
 	/* flush the distributor */
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
 
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -439,10 +668,6 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -451,6 +676,7 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
@@ -460,6 +686,13 @@ int test_error_distributor_create_name(void)
 		return -1;
 	}
 
+	db = rte_distributor_create_burst(name, rte_socket_id(),
+			rte_lcore_count() - 1);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -468,20 +701,32 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create_burst("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -491,12 +736,20 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, num_workers);
+	else
+		rte_distributor_process_burst(db, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process(d, NULL, 0);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, NULL, 0);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, NULL, 0);
+		rte_distributor_flush_burst(db);
+	}
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -506,7 +759,9 @@ static int
 test_distributor(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
@@ -525,6 +780,19 @@ test_distributor(void)
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_dist_burst",
+				rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -536,31 +804,45 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	worker_params.d = d;
+	worker_params.db = db;
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	for (i = 0; i < DIST_NUM_TYPES; i++) {
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
-			goto err;
-		quit_workers(d, p);
+		worker_params.dist_type = i;
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
+			goto err;
+		quit_workers(&worker_params, p);
+
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
@@ -572,7 +854,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v2 3/5] test: add distributor_perf autotest
  2016-12-22  4:37   ` [PATCH v2 0/5] distributor library " David Hunt
  2016-12-22  4:37     ` [PATCH v2 1/5] lib: distributor " David Hunt
  2016-12-22  4:37     ` [PATCH v2 2/5] test: unit tests for new distributor " David Hunt
@ 2016-12-22  4:37     ` David Hunt
  2016-12-22 12:19       ` Jerin Jacob
  2016-12-22  4:37     ` [PATCH v2 4/5] example: distributor app showing burst api David Hunt
  2016-12-22  4:37     ` [PATCH v2 5/5] doc: distributor library changes for new " David Hunt
  4 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2016-12-22  4:37 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor_perf.c | 133 +++++++++++++++++++++++++++++++++++++--
 1 file changed, 127 insertions(+), 6 deletions(-)

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..86285fd 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -40,9 +40,11 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -86,7 +88,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,10 +100,10 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
 /* returns the total count of the number of packets handled by the worker
@@ -144,6 +146,34 @@ handle_work(void *arg)
 	return 0;
 }
 
+/* this is the basic worker function for performance tests.
+ * it does nothing but return packets and count them.
+ */
+static int
+handle_work_burst(void *arg)
+{
+	//struct rte_mbuf *pkt = NULL;
+	struct rte_distributor_burst *d = arg;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (int i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	while (!quit) {
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt_burst(d, id, buf, num);
+	return 0;
+}
+
 /* this basic performance test just repeatedly sends in 32 packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
@@ -174,6 +204,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
+	rte_distributor_clear_returns(d);
+
 	printf("=== Performance test of distributor ===\n");
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
@@ -190,6 +222,54 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 	return 0;
 }
 
+/* this basic performance test just repeatedly sends in BURST packets at a time
+ * to the distributor and verifies at the end that we got them all in the worker
+ * threads and finally how long per packet the processing took.
+ */
+static inline int
+perf_test_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	unsigned int i;
+	uint64_t start, end;
+	struct rte_mbuf *bufs[BURST];
+
+	clear_packet_count();
+	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
+		printf("Error getting mbufs from pool\n");
+		return -1;
+	}
+	/* ensure we have different hash value for each pkt */
+	for (i = 0; i < BURST; i++)
+		bufs[i]->hash.usr = i;
+
+	start = rte_rdtsc();
+	for (i = 0; i < (1<<ITER_POWER); i++)
+		rte_distributor_process_burst(d, bufs, BURST);
+	end = rte_rdtsc();
+
+	do {
+		usleep(100);
+		rte_distributor_process_burst(d, NULL, 0);
+	} while (total_packet_count() < (BURST << ITER_POWER));
+
+	rte_distributor_clear_returns_burst(d);
+
+	printf("=== Performance test of burst distributor ===\n");
+	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
+	printf("Time per packet: %"PRIu64"\n\n",
+			((end - start) >> ITER_POWER)/BURST);
+	rte_mempool_put_bulk(p, (void *)bufs, BURST);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+	printf("Total packets: %u (%x)\n", total_packet_count(),
+			total_packet_count());
+	printf("=== Perf test done ===\n\n");
+
+	return 0;
+}
+
 /* Useful function which ensures that all worker functions terminate */
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
@@ -212,10 +292,34 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	worker_idx = 0;
 }
 
+/* Useful function which ensures that all worker functions terminate */
+static void
+quit_workers_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
+	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
+	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
+
+	quit = 1;
+	for (i = 0; i < num_workers; i++)
+		bufs[i]->hash.usr = i << 1;
+	rte_distributor_process_burst(d, bufs, num_workers);
+
+	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+
+	rte_distributor_process_burst(d, NULL, 0);
+	rte_eal_mp_wait_lcore();
+	quit = 0;
+	worker_idx = 0;
+}
+
 static int
 test_distributor_perf(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -234,10 +338,22 @@ test_distributor_perf(void)
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
+		//rte_distributor_flush_burst(d);
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		//rte_distributor_flush_burst(d);
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -254,6 +370,11 @@ test_distributor_perf(void)
 		return -1;
 	quit_workers(d, p);
 
+	rte_eal_mp_remote_launch(handle_work_burst, db, SKIP_MASTER);
+	if (perf_test_burst(db, p) < 0)
+		return -1;
+	quit_workers_burst(db, p);
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v2 4/5] example: distributor app showing burst api
  2016-12-22  4:37   ` [PATCH v2 0/5] distributor library " David Hunt
                       ` (2 preceding siblings ...)
  2016-12-22  4:37     ` [PATCH v2 3/5] test: add distributor_perf autotest David Hunt
@ 2016-12-22  4:37     ` David Hunt
  2016-12-22  4:37     ` [PATCH v2 5/5] doc: distributor library changes for new " David Hunt
  4 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2016-12-22  4:37 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 505 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 388 insertions(+), 117 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..451e253 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,6 +30,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#define BURST_API 1
+
 #include <stdint.h>
 #include <inttypes.h>
 #include <unistd.h>
@@ -43,39 +44,87 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
+#if BURST_API
+#include <rte_distributor_burst.h>
+#else
 #include <rte_distributor.h>
+#endif
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
+#define MBUF_CACHE_SIZE 128
+#if BURST_API
+#define BURST_SIZE 64
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
+#else
 #define BURST_SIZE 32
-#define RTE_RING_SZ 1024
+#define SCHED_RX_RING_SZ 1024
+#define SCHED_TX_RING_SZ 1024
+#endif
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
 		.max_rx_pkt_len = ETHER_MAX_LEN,
+		.split_hdr_size = 0,
+		.header_split   = 0, /**< Header Split disabled */
+		.hw_ip_checksum = 1, /**< IP checksum offload enabled */
+		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
+		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
+		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
 	},
 	.txmode = {
 		.mq_mode = ETH_MQ_TX_NONE,
@@ -93,6 +142,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -101,9 +152,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -113,7 +168,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -121,7 +176,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
@@ -134,7 +189,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
@@ -160,41 +216,52 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 struct lcore_params {
 	unsigned worker_id;
-	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_distributor_burst *d;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
-static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+static inline void
+flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
+	if (unlikely(nb_tx < outbuf->count)) {
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
+		do {
+			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
+		} while (++nb_tx < outbuf->count);
 	}
+	outbuf->count = 0;
+}
 
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+static inline void
+flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+{
+	uint8_t outp;
 
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+	for (outp = 0; outp < nb_ports; outp++) {
+		/* skip ports that are not enabled */
+		if ((enabled_port_mask & (1 << outp)) == 0)
+			continue;
 
-	return 0;
+		if (tx_buffers[outp].count == 0)
+			continue;
+
+		flush_one_port(&tx_buffers[outp], outp);
+	}
 }
 
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -210,6 +277,7 @@ lcore_rx(struct lcore_params *p)
 
 	printf("\nCore %u doing packet RX.\n", rte_lcore_id());
 	port = 0;
+
 	while (!quit_signal_rx) {
 
 		/* skip ports that are not enabled */
@@ -218,7 +286,7 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
+
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -228,19 +296,46 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+
+#if BURST_API
+	rte_distributor_process_burst(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts_burst(d,
+			bufs, BURST_SIZE*2);
+#else
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+#endif
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
 				port = 0;
 			continue;
 		}
-
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		* Swap the following two lines if you want the rx traffic
+		* to go directly to tx, no distribution.
+		*/
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		//struct rte_ring *out_ring = p->dist_tx_ring;
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -249,56 +344,88 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
 
-	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
-		do {
-			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
-		} while (++nb_tx < outbuf->count);
-	}
-	outbuf->count = 0;
-}
 
-static inline void
-flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+static int
+lcore_distributor(struct lcore_params *p)
 {
-	uint8_t outp;
-	for (outp = 0; outp < nb_ports; outp++) {
-		/* skip ports that are not enabled */
-		if ((enabled_port_mask & (1 << outp)) == 0)
-			continue;
-
-		if (tx_buffers[outp].count == 0)
-			continue;
-
-		flush_one_port(&tx_buffers[outp], outp);
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor_burst *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+/*
+ * This '#if' allows you to bypass the distributor. Incoming packets may be
+ * sent straight to the tx ring.
+ */
+#if 1
+
+#if BURST_API
+			/* Distribute the packets */
+			rte_distributor_process_burst(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts_burst(d,
+					bufs, BURST_SIZE*2);
+#else
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+#endif
+
+#else
+			/* Bypass the distributor */
+			const unsigned int xor_val = (rte_eth_dev_count() > 1);
+			/* Touch the mbuf by xor'ing the port */
+			for (unsigned int i = 0; i < nb_rx; i++)
+				bufs[i]->port ^= xor_val;
+
+			const uint16_t nb_ret = nb_rx;
+#endif
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
 	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+#if BURST_API
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns_burst(d);
+#endif
+	quit_signal_rx = 1;
+	return 0;
 }
 
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -327,9 +454,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -358,11 +485,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -371,7 +499,7 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
@@ -379,44 +507,138 @@ print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
 	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (int j = 0; j < 8; j++)
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
 static int
 lcore_worker(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
+	struct rte_distributor_burst *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (int i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+	while (!quit_signal_work) {
+
+#if BURST_API
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (unsigned int i = 0; i < num; i++) {
+			uint64_t t = __rdtsc()+100;
+
+			while (__rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
+#else
+		buf[0] = rte_distributor_get_pkt(d, id, buf[0]);
+		uint64_t t = __rdtsc() + 10;
+
+		while (__rdtsc() < t)
+			rte_pause();
+		buf[0]->port ^= xor_val;
+#endif
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
+	printf("\nCore %u exiting worker task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -496,12 +718,14 @@ int
 main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
-	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_distributor_burst *d;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -518,10 +742,12 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -560,41 +786,86 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
+#if BURST_API
+	d = rte_distributor_create_burst("PKT_DIST", rte_socket_id(),
+			rte_lcore_count() - 4);
+#else
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 4);
+#endif
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+					rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+					(lcore_function_t *)lcore_distributor,
+					p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
+		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
 
-	if (lcore_rx(&p) != 0)
-		return -1;
+	freq = rte_get_timer_hz();
+	t = __rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < __rdtsc()) {
+			print_stats();
+			t = __rdtsc() + freq;
+		}
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v2 5/5] doc: distributor library changes for new burst api
  2016-12-22  4:37   ` [PATCH v2 0/5] distributor library " David Hunt
                       ` (3 preceding siblings ...)
  2016-12-22  4:37     ` [PATCH v2 4/5] example: distributor app showing burst api David Hunt
@ 2016-12-22  4:37     ` David Hunt
  4 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2016-12-22  4:37 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..dffd4ad 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,10 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two versions of the API in the distributor library, one which sends one packet at a time to workers,
+and another which sends bursts of up to 8 packets at a time to workers. The function names of the second API
+are identified by "_burst", and must not be intermixed with the single packet API. The operations described below
+apply to both APIs; select which API you wish to use by including the relevant header file.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
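
For readers following the thread, here is a minimal sketch of a worker loop
using the burst API documented above. It is based only on the function
signatures appearing in this patch set; the WORKER_BURST constant, the quit
flag and the omitted per-packet work are placeholders, not patch code.

#include <rte_memory.h>
#include <rte_mbuf.h>
#include <rte_distributor_burst.h>

#define WORKER_BURST 8            /* bursts are up to 8 packets */

static volatile int quit;         /* application-defined exit flag */

static int
worker_loop(struct rte_distributor_burst *db, unsigned int worker_id)
{
	struct rte_mbuf *bufs[WORKER_BURST] __rte_cache_aligned;
	unsigned int num = 0;
	unsigned int i;

	/* first call hands back nothing (num == 0) and asks for work */
	num = rte_distributor_get_pkt_burst(db, worker_id, bufs, bufs, num);
	while (!quit) {
		for (i = 0; i < num; i++) {
			/* ... process bufs[i] ... */
		}
		/* return the processed burst and receive the next one */
		num = rte_distributor_get_pkt_burst(db, worker_id,
				bufs, bufs, num);
	}
	/* hand back whatever is still held before exiting */
	rte_distributor_return_pkt_burst(db, worker_id, bufs, num);
	return 0;
}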

* Re: [PATCH v2 3/5] test: add distributor_perf autotest
  2016-12-22  4:37     ` [PATCH v2 3/5] test: add distributor_perf autotest David Hunt
@ 2016-12-22 12:19       ` Jerin Jacob
  2017-01-02 16:24         ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Jerin Jacob @ 2016-12-22 12:19 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

On Thu, Dec 22, 2016 at 04:37:06AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
> + * it does nothing but return packets and count them.
> + */
> +static int
> +handle_work_burst(void *arg)
> +{
> +	//struct rte_mbuf *pkt = NULL;

Seems like there is a lot of test code with // in this file. Please remove it.

> +	struct rte_distributor_burst *d = arg;
> +	unsigned int count = 0;
> +	unsigned int num = 0;
> +	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);

Use rte_atomic equivalent
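
(For illustration only, the suggested replacement might look like the sketch
below, assuming worker_idx becomes an rte_atomic32_t; this is not the patch
code.)

#include <rte_atomic.h>

static rte_atomic32_t worker_idx;  /* zero-initialised, like the static int */

static unsigned int
next_worker_id(void)
{
	/* add_return() gives the new value, so subtract 1 to mimic
	 * __sync_fetch_and_add()'s fetch-then-add behaviour */
	return (unsigned int)(rte_atomic32_add_return(&worker_idx, 1) - 1);
}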

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v2 1/5] lib: distributor performance enhancements
  2016-12-22  4:37     ` [PATCH v2 1/5] lib: distributor " David Hunt
@ 2016-12-22 12:47       ` Jerin Jacob
  2016-12-22 16:14         ` Hunt, David
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
  2 siblings, 1 reply; 202+ messages in thread
From: Jerin Jacob @ 2016-12-22 12:47 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

On Thu, Dec 22, 2016 at 04:37:04AM +0000, David Hunt wrote:
> Now sends bursts of up to 8 mbufs to each worker, and tracks
> the in-flight flow-ids (atomic scheduling)
> 
> New file with a new api, similar to the old API except with _burst
> at the end of the function names
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> +
> +int
> +rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
> +		unsigned int worker_id, struct rte_mbuf **pkts,
> +		struct rte_mbuf **oldpkt, unsigned int return_count)
> +{
> +	unsigned int count;
> +	uint64_t retries = 0;
> +
> +	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
> +
> +	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
> +	while (count == 0) {
> +		rte_pause();
> +		retries++;
> +		if (retries > 1000) {
> +			retries = 0;

This retries write may not have any significance, as it is just before the
return.

> +			return 0;
> +		}
> +		uint64_t t = __rdtsc()+100;

Use rte_ version of __rdtsc.

> +
> +		while (__rdtsc() < t)
> +			rte_pause();
> +
> +		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
> +	}
> +	return count;
> +}
> +
> +int
> +rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
> +		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
> +{
> +	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
> +	unsigned int i;
> +
> +	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
> +		/* Switch off the return bit first */
> +		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
> +
> +	for (i = num; i-- > 0; )
> +		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
> +			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
> +
> +	/* set the GET_BUF bit even if we got no returns */
> +	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
> +
> +	return 0;
> +}
> +
> +#if RTE_MACHINE_CPUFLAG_SSE2
> +static inline void

Move the SSE version of the code to a separate file so that other SIMD
arch-specific versions, such as NEON, can be incorporated later.

> +find_match_sse2(struct rte_distributor_burst *d,
> +			uint16_t *data_ptr,
> +			uint16_t *output_ptr)
> +{
> +	/* Setup */
> +	__m128i incoming_fids;
> +	__m128i inflight_fids;
> +	__m128i preflight_fids;
> +	__m128i wkr;
> +	__m128i mask1;
> +	__m128i mask2;
> +	__m128i output;
> +	struct rte_distributor_backlog *bl;
> +
> +	/*
> +	 * Function overview:
> +	 * 2. Loop through all worker ID's
> +	 *  2a. Load the current inflights for that worker into an xmm reg
> +	 *  2b. Load the current backlog for that worker into an xmm reg
> +	 *  2c. use cmpestrm to intersect flow_ids with backlog and inflights
> +	 *  2d. Add any matches to the output
> +	 * 3. Write the output xmm (matching worker ids).
> +	 */
> +
> +
> +	output = _mm_set1_epi16(0);
> +	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
> +
> +	for (uint16_t i = 0; i < d->num_workers; i++) {
> +		bl = &d->backlog[i];
> +
> +		inflight_fids =
> +			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
> +		preflight_fids =
> +			_mm_load_si128((__m128i *)(bl->tags));
> +
> +		/*
> +		 * Any incoming_fid that exists anywhere in inflight_fids will
> +		 * have 0xffff in same position of the mask as the incoming fid
> +		 * Example (shortened to bytes for brevity):
> +		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
> +		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
> +		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
> +		 */
> +
> +		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
> +			_SIDD_UWORD_OPS |
> +			_SIDD_CMP_EQUAL_ANY |
> +			_SIDD_UNIT_MASK);
> +		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
> +			_SIDD_UWORD_OPS |
> +			_SIDD_CMP_EQUAL_ANY |
> +			_SIDD_UNIT_MASK);
> +
> +		mask1 = _mm_or_si128(mask1, mask2);
> +		/*
> +		 * Now mask contains 0xffff where there's a match.
> +		 * Next we need to store the worker_id in the relevant position
> +		 * in the output.
> +		 */
> +
> +		wkr = _mm_set1_epi16(i+1);
> +		mask1 = _mm_and_si128(mask1, wkr);
> +		output = _mm_or_si128(mask1, output);
> +	}
> +
> +/* process a set of packets to distribute them to workers */
> +int
> +rte_distributor_process_burst(struct rte_distributor_burst *d,
> +		struct rte_mbuf **mbufs, unsigned int num_mbufs)
> +{
> +	unsigned int next_idx = 0;
> +	static unsigned int wkr;
> +	struct rte_mbuf *next_mb = NULL;
> +	int64_t next_value = 0;
> +	uint16_t new_tag = 0;
> +	uint16_t flows[8] __rte_cache_aligned;

The constant 8 is also used further down in the function. Please replace it with a macro.

> +	//static int iter=0;

Please remove the test-code with // across the patch.

> +
> +	if (unlikely(num_mbufs == 0)) {
> +		/* Flush out all non-full cache-lines to workers. */
> +		for (unsigned int wid = 0 ; wid < d->num_workers; wid++) {
> +			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
> +				release(d, wid);
> +				handle_returns(d, wid);
> +			}
> +		}
> +		return 0;
> +	}
> +
> +	while (next_idx < num_mbufs) {
> +		uint16_t matches[8];
> +		int pkts;
> +
> +		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
> +			d->bufs[wkr].count = 0;
> +
> +		for (unsigned int i = 0; i < RTE_DIST_BURST_SIZE; i++) {
> +			if (mbufs[next_idx + i]) {
> +				/* flows have to be non-zero */
> +				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
> +			} else
> +				flows[i] = 0;
> +		}
> +
> +		switch (d->dist_match_fn) {
> +#ifdef RTE_MACHINE_CPUFLAG_SSE2

Is this conditional compilation flag really required? i.e.
RTE_DIST_MATCH_SSE will not be enabled in the non-SSE case.

> +		case RTE_DIST_MATCH_SSE:
> +			find_match_sse2(d, &flows[0], &matches[0]);
> +			break;
> +#endif
> +		default:
> +			find_match_scalar(d, &flows[0], &matches[0]);
> +		}
> +
> +		/*
> +		 * Matches array now contain the intended worker ID (+1) of
> +		 * the incoming packets. Any zeroes need to be assigned
> +		 * workers.
> +		 */
> +
> +		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
> +			pkts = num_mbufs - next_idx;
> +		else
> +			pkts = RTE_DIST_BURST_SIZE;
> +
> +		for (int j = 0; j < pkts; j++) {
> +
> +			next_mb = mbufs[next_idx++];
> +			next_value = (((int64_t)(uintptr_t)next_mb) <<
> +					RTE_DISTRIB_FLAG_BITS);
> +			/*
> +			 * User is advocated to set tag value for each
> +			 * mbuf before calling rte_distributor_process.
> +			 * User defined tags are used to identify flows,
> +			 * or sessions.
> +			 */
> +			/* flows MUST be non-zero */
> +			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
> +
> +			/*
> +			 * Using the next line will cause the find_match
> +			 * function to be optimised out, making this function
> +			 * do parallel (non-atomic) distribution
> +			 */
> +			//matches[j] = 0;

test code with //

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v2 1/5] lib: distributor performance enhancements
  2016-12-22 12:47       ` Jerin Jacob
@ 2016-12-22 16:14         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2016-12-22 16:14 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, bruce.richardson


Thanks for the review, Jerin, I very much appreciate it. I'll address 
all the minor comments, and I've a comment or two on the remaining 
changes below.


On 22/12/2016 12:47 PM, Jerin Jacob wrote:

> On Thu, Dec 22, 2016 at 04:37:04AM +0000, David Hunt wrote:
>

--snip--

>> +
>> +	/* set the GET_BUF but even if we got no returns */
>> +	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
>> +
>> +	return 0;
>> +}
>> +
>> +#if RTE_MACHINE_CPUFLAG_SSE2
>> +static inline void
> Move SSE version of the code to separate file so that later other SIMD arch
> specific version like NEON can be incorporated.
>

Sure. Will do. I'll model it on the i40e SIMD layout.


>> +		switch (d->dist_match_fn) {
>> +#ifdef RTE_MACHINE_CPUFLAG_SSE2
> Is this conditional compilation flag is really required ? i.e
> RTE_DIST_MATCH_SSE will not enabled in non SSE case

So I can always leave the call to find_match_sse2 in there, but the 
run-time cpu flags check will
take care of whether it's called or not? OK sure.


Thanks,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements
  2016-12-22  4:37     ` [PATCH v2 1/5] lib: distributor " David Hunt
  2016-12-22 12:47       ` Jerin Jacob
@ 2017-01-02 10:22       ` David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 1/6] lib: distributor performance enhancements David Hunt
                           ` (5 more replies)
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
  2 siblings, 6 replies; 202+ messages in thread
From: David Hunt @ 2017-01-02 10:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
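
As a standalone illustration of how each 64-bit slot in that shared cache
line is packed (a minimal sketch; FLAG_BITS and VALID_BUF here mirror
RTE_DISTRIB_FLAG_BITS and RTE_DISTRIB_VALID_BUF in the patches, and the
address is a made-up stand-in for an mbuf pointer):

	#include <stdint.h>
	#include <stdio.h>

	#define FLAG_BITS 4           /* low 4 bits of a slot hold handshake flags */
	#define VALID_BUF (1 << 2)    /* "this slot carries a real pointer" flag */

	int main(void)
	{
		uint64_t mbuf_addr = 0x7f12345678c0; /* stand-in mbuf address */

		/* pack: pointer shifted up, flags in the low bits; eight such
		 * 8-byte slots fill one 64-byte cache line */
		int64_t slot = ((int64_t)mbuf_addr << FLAG_BITS) | VALID_BUF;

		/* unpack: an arithmetic right shift restores the pointer value */
		uint64_t restored = (uint64_t)(slot >> FLAG_BITS);

		printf("%s\n", restored == mbuf_addr ? "ok" : "mismatch");
		return 0;
	}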

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.
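
As a simplified sketch of that matching step (not the patch's
find_match_scalar() itself, just the idea; tags are assumed non-zero, which
the patch guarantees by OR-ing in the low bit):

	#include <stdint.h>

	#define BURST 8	/* mirrors RTE_DIST_BURST_SIZE in the patch */

	/*
	 * For each incoming tag, report which worker (+1) already has that
	 * tag in flight or in its backlog; 0 means the flow is unpinned.
	 */
	void
	match_burst(uint16_t inflight[][BURST], uint16_t backlog[][BURST],
			unsigned int nb_workers, const uint16_t tags[BURST],
			uint16_t out[BURST])
	{
		unsigned int p, w, s;

		for (p = 0; p < BURST; p++) {
			out[p] = 0;
			for (w = 0; w < nb_workers && out[p] == 0; w++)
				for (s = 0; s < BURST; s++)
					if (inflight[w][s] == tags[p] ||
							backlog[w][s] == tags[p]) {
						out[p] = w + 1;
						break;
					}
		}
	}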

The flow match algorithm has both scalar and vector versions, and the most
appropriate version is selected at run time, depending on the presence of the
SSE2 cpu flag. On non-x86 platforms, the scalar match function is selected,
which should still give a good boost in performance over the non-burst API.
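
In outline, the run-time selection looks like this (condensed from patches
1/6 and 2/6 below, so slightly simplified):

	/* chosen once, when the distributor instance is created */
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
	else
		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;

	/* consulted for every burst inside rte_distributor_process_burst() */
	switch (d->dist_match_fn) {
	case RTE_DIST_MATCH_VECTOR:
		find_match_vec(d, &flows[0], &matches[0]);
		break;
	default:
		find_match_scalar(d, &flows[0], &matches[0]);
	}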

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   (see the worker loop sketch after these notes)
   For performance in matching, Flow ID's are 15-bits
   Original API (and code) is kept for backward compatibility
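
For example, a worker thread built on the new burst API looks roughly like
this (modelled on the unit-test code in patch 3/6; db, worker_id and the
quit flag are assumed to come from the application):

	struct rte_mbuf *buf[8] __rte_cache_aligned;
	unsigned int num = 0;
	unsigned int i;

	for (i = 0; i < 8; i++)
		buf[i] = NULL;

	/* return the previous burst (none on the first call) and receive
	 * up to 8 new packets in a single call */
	num = rte_distributor_get_pkt_burst(db, worker_id, buf, buf, num);
	while (!quit) {
		/* ... process buf[0] .. buf[num - 1] ... */
		num = rte_distributor_get_pkt_burst(db, worker_id, buf, buf, num);
	}

	/* hand back the final burst before the worker exits */
	rte_distributor_return_pkt_burst(db, worker_id, buf, num);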

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[PATCH v3 1/6] lib: distributor performance enhancements
[PATCH v3 2/6] lib: add distributor vector flow matching
[PATCH v3 3/6] test: unit tests for new distributor burst api
[PATCH v3 4/6] test: add distributor_perf autotest
[PATCH v3 5/6] example: distributor app showing burst api
[PATCH v3 6/6] doc: distributor library changes for new burst api

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [WARNING: A/V UNSCANNABLE][PATCH v3 1/6] lib: distributor performance enhancements
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
@ 2017-01-02 10:22         ` David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 2/6] lib: add distributor vector flow matching David Hunt
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-02 10:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now sends bursts of up to 8 mbufs to each worker, and tracks
the in-flight flow-ids (atomic scheduling)

New file with a new api, similar to the old API except with _burst
at the end of the function names

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                |   2 +
 lib/librte_distributor/rte_distributor.c       |  72 +---
 lib/librte_distributor/rte_distributor_burst.c | 558 +++++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_burst.h | 255 +++++++++++
 lib/librte_distributor/rte_distributor_priv.h  | 189 +++++++++
 5 files changed, 1005 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_burst.c
 create mode 100644 lib/librte_distributor/rte_distributor_burst.h
 create mode 100644 lib/librte_distributor/rte_distributor_priv.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..2acc54d 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,9 +43,11 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_burst.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index f3f778c..c05f6e3 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -40,79 +40,9 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
 #include "rte_distributor.h"
 
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
new file mode 100644
index 0000000..ae7cf9d
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -0,0 +1,558 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+/* This function should really be called return_pkt_burst() */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+
+	/* if we don't have any packets to return, return. */
+	if (count == 0)
+		return;
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF bit to signal to the distributor that the
+	 * cache line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return 0;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * So now we've got the contents of the cache line into an array of
+	 * mbuf pointers, so toggle the bit so the scheduler can start working
+	 * on the next cache line while we're working.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	unsigned int count;
+	uint64_t retries = 0;
+
+	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	while (count == 0) {
+		rte_pause();
+		retries++;
+		if (retries > 1000)
+			return 0;
+
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_burst *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+static inline void
+find_match_scalar(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID on which the
+	 * corresponding flow is pinned to.
+	 */
+}
+
+
+
+static unsigned int
+handle_returns(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+	/*
+	 * wait for the GET_BUF bit to go high, otherwise we can't send
+	 * the packets to the worker
+	 */
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+static unsigned int
+release(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	if (d->backlog[wkr].count == 0)
+		return 0;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, wid;
+	int j, w;
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+
+		switch (d->dist_match_fn) {
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
+
+		/*
+		 * Matches array now contain the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_burst *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_burst(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d)
+{
+	unsigned int wkr;
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers)
+{
+	struct rte_distributor_burst *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_burst.h b/lib/librte_distributor/rte_distributor_burst.h
new file mode 100644
index 0000000..5096b13
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.h
@@ -0,0 +1,255 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_BURST_H_
+#define _RTE_DIST_BURST_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * to workers, in bursts of up to 8, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor_burst;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the user does not set the tag, its value may vary depending
+ * on the driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets, if any, being returned by the worker
+ * @param num
+ *   The number of packets being returned
+ */
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_priv.h b/lib/librte_distributor/rte_distributor_priv.h
new file mode 100644
index 0000000..833855f
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_priv.h
@@ -0,0 +1,189 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be careful when increasing the limit, because it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE 8
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_burst {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_NUM
+};
+
+struct rte_distributor_burst {
+	TAILQ_ENTRY(rte_distributor_burst) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	/**
+	 * The first cache line in this array holds the tags in flight
+	 * on the worker core. The second cache line holds the backlog
+	 * that is going to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_burst bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [WARNING: A/V UNSCANNABLE][PATCH v3 2/6] lib: add distributor vector flow matching
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 1/6] lib: distributor performance enhancements David Hunt
@ 2017-01-02 10:22         ` David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 3/6] test: unit tests for new distributor burst api David Hunt
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-02 10:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   4 +
 lib/librte_distributor/rte_distributor_burst.c     |  11 +-
 lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_priv.h      |   6 ++
 4 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2acc54d..a725aaf 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,10 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
index ae7cf9d..35044c4 100644
--- a/lib/librte_distributor/rte_distributor_burst.c
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
 		}
 
 		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
 		default:
 			find_match_scalar(d, &flows[0], &matches[0]);
 		}
@@ -538,7 +541,13 @@ rte_distributor_create_burst(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+#if defined(RTE_ARCH_X86)
+	/* use the vector match function when SSE2 is available at run time */
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+#endif
 
 	/*
 	 * Set up the backlog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..78641f5
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+#include "smmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 2. Loop through all worker ID's
+	 *  2a. Load the current inflights for that worker into an xmm reg
+	 *  2b. Load the current backlog for that worker into an xmm reg
+	 *  2c. use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  2d. Add any matches to the output
+	 * 3. Write the output xmm (matching worker ids).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the output 128-bit contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID on which the
+	 * corresponding flow is pinned to.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_priv.h b/lib/librte_distributor/rte_distributor_priv.h
index 833855f..cc2c478 100644
--- a/lib/librte_distributor/rte_distributor_priv.h
+++ b/lib/librte_distributor/rte_distributor_priv.h
@@ -155,6 +155,7 @@ struct rte_distributor {
 /* All different signature compare functions */
 enum rte_distributor_match_function {
 	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
 	RTE_DIST_MATCH_NUM
 };
 
@@ -182,6 +183,11 @@ struct rte_distributor_burst {
 	enum rte_distributor_match_function dist_match_fn;
 };
 
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [WARNING: A/V UNSCANNABLE][PATCH v3 3/6] test: unit tests for new distributor burst api
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 1/6] lib: distributor performance enhancements David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 2/6] lib: add distributor vector flow matching David Hunt
@ 2017-01-02 10:22         ` David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 4/6] test: add distributor_perf autotest David Hunt
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-02 10:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 501 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 392 insertions(+), 109 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..3871f86 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,11 +40,24 @@
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
 #define BIG_BATCH 1024
 
+#define DIST_SINGLE 0
+#define DIST_BURST  1
+#define DIST_NUM_TYPES 2
+
+struct worker_params {
+	struct rte_distributor *d;
+	struct rte_distributor_burst *db;
+	int dist_type;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,17 +94,36 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			num = rte_distributor_get_pkt_burst(db, id,
+					buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -107,12 +139,21 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i;
+	unsigned int retries;
+	unsigned int count = 0;
+
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Basic distributor sanity tests (single) ===\n");
+	else
+		printf("=== Basic distributor sanity tests (burst) ===\n");
 
-	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
@@ -124,8 +165,21 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
+
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -146,8 +200,18 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_process(d, bufs, BURST);
+			rte_distributor_flush(d);
+		} else {
+			rte_distributor_process_burst(db, bufs, BURST);
+			count = 0;
+			do {
+				rte_distributor_flush_burst(db);
+				count += rte_distributor_returned_pkts_burst(db,
+						returns, BURST*2);
+			} while (count < BURST);
+		}
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -155,24 +219,32 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 			return -1;
 		}
 
+
 		for (i = 0; i < rte_lcore_count() - 1; i++)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
+		bufs[i]->hash.usr = i+1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -194,8 +266,15 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_flush(d);
+		rte_distributor_clear_returns(d);
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -203,28 +282,59 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
-	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		printf("===testing single big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+			num_returned += rte_distributor_returned_pkts(d,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+		}
+		rte_distributor_flush(d);
 		num_returned += rte_distributor_returned_pkts(d,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+	} else {
+		printf("===testing burst big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process_burst(db,
+					&many_bufs[i*BURST], BURST);
+			count = rte_distributor_returned_pkts_burst(db,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+			num_returned += count;
+		}
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	retries = 0;
+	do {
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
+
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d, got %d\n",
+				__LINE__, BIG_BATCH, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -234,7 +344,6 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	printf("Sanity test of returned packets done\n");
 
 	rte_mempool_put_bulk(p, (void *)many_bufs, BIG_BATCH);
-
 	printf("\n");
 	return 0;
 }
@@ -249,18 +358,40 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -270,26 +401,45 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Sanity test with mbuf alloc/free (single) ===\n");
+	else
+		printf("=== Sanity test with mbuf alloc/free (burst)  ===\n");
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
-		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
-			rte_distributor_process(d, NULL, 0);
+		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0) {
+			if (wp->dist_type == DIST_SINGLE)
+				rte_distributor_process(d, NULL, 0);
+			else
+				rte_distributor_process_burst(db, NULL, 0);
+		}
 		for (j = 0; j < BURST; j++) {
 			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
-		rte_distributor_process(d, bufs, BURST);
+		if (wp->dist_type == DIST_SINGLE)
+			rte_distributor_process(d, bufs, BURST);
+		else
+			rte_distributor_process_burst(db, bufs, BURST);
 	}
 
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -305,20 +455,48 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE)
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+	else
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = 1;
+			total += num;
+		} else {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+			total += num;
+		}
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		returned = rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -326,13 +504,29 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+		} else {
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			if (wp->dist_type == DIST_SINGLE) {
+				pkt = rte_distributor_get_pkt(d, id, NULL);
+			} else {
+				num = rte_distributor_get_pkt_burst(db,
+						id, buf, buf, num);
+			}
+		}
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_return_pkt(d, id, pkt);
+		} else {
+			returned = rte_distributor_return_pkt_burst(db,
+					id, buf, num);
+			printf("Num returned = %d\n", returned);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
 	}
 	return 0;
 }
@@ -344,26 +538,37 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		rte_distributor_flush_burst(db);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -374,14 +579,25 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush_burst(db);
+	}
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
 
-	/* flush the distributor */
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -389,10 +605,6 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -401,13 +613,18 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Test flush fn with worker shutdown (single) ===\n");
+	else
+		printf("=== Test flush fn with worker shutdown (burst) ===\n");
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -420,7 +637,11 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, BURST);
+	else
+		rte_distributor_process_burst(db, bufs, BURST);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -429,9 +650,18 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	zero_quit = 1;
 
 	/* flush the distributor */
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
 
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -439,10 +669,6 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -451,6 +677,7 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
@@ -460,6 +687,13 @@ int test_error_distributor_create_name(void)
 		return -1;
 	}
 
+	db = rte_distributor_create_burst(name, rte_socket_id(),
+			rte_lcore_count() - 1);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -468,20 +702,32 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create_burst("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -491,12 +737,20 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, num_workers);
+	else
+		rte_distributor_process_burst(db, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process(d, NULL, 0);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, NULL, 0);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, NULL, 0);
+		rte_distributor_flush_burst(db);
+	}
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -506,7 +760,9 @@ static int
 test_distributor(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
@@ -525,6 +781,19 @@ test_distributor(void)
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_dist_burst",
+				rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -536,31 +805,45 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	worker_params.d = d;
+	worker_params.db = db;
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	for (i = 0; i < DIST_NUM_TYPES; i++) {
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
-			goto err;
-		quit_workers(d, p);
+		worker_params.dist_type = i;
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
+			goto err;
+		quit_workers(&worker_params, p);
+
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
@@ -572,7 +855,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [WARNING: A/V UNSCANNABLE][PATCH v3 4/6] test: add distributor_perf autotest
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
                           ` (2 preceding siblings ...)
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 3/6] test: unit tests for new distributor burst api David Hunt
@ 2017-01-02 10:22         ` David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 5/6] example: distributor app showing burst api David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 6/6] doc: distributor library changes for new " David Hunt
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-02 10:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor_perf.c | 148 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 137 insertions(+), 11 deletions(-)

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..b273bf9 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -40,9 +40,11 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +56,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +72,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +90,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +102,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +128,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * this is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -144,7 +150,37 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * this is the basic worker function for performance tests.
+ * it does nothing but return packets and count them.
+ */
+static int
+handle_work_burst(void *arg)
+{
+	struct rte_distributor_burst *d = arg;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	while (!quit) {
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt_burst(d, id, buf, num);
+	return 0;
+}
+
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
@@ -174,6 +210,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
+	rte_distributor_clear_returns(d);
+
 	printf("=== Performance test of distributor ===\n");
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
@@ -190,6 +228,55 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 	return 0;
 }
 
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
+ * to the distributor and verifies at the end that we got them all in the worker
+ * threads and finally how long per packet the processing took.
+ */
+static inline int
+perf_test_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	unsigned int i;
+	uint64_t start, end;
+	struct rte_mbuf *bufs[BURST];
+
+	clear_packet_count();
+	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
+		printf("Error getting mbufs from pool\n");
+		return -1;
+	}
+	/* ensure we have different hash value for each pkt */
+	for (i = 0; i < BURST; i++)
+		bufs[i]->hash.usr = i;
+
+	start = rte_rdtsc();
+	for (i = 0; i < (1<<ITER_POWER); i++)
+		rte_distributor_process_burst(d, bufs, BURST);
+	end = rte_rdtsc();
+
+	do {
+		usleep(100);
+		rte_distributor_process_burst(d, NULL, 0);
+	} while (total_packet_count() < (BURST << ITER_POWER));
+
+	rte_distributor_clear_returns_burst(d);
+
+	printf("=== Performance test of burst distributor ===\n");
+	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
+	printf("Time per packet: %"PRIu64"\n\n",
+			((end - start) >> ITER_POWER)/BURST);
+	rte_mempool_put_bulk(p, (void *)bufs, BURST);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+	printf("Total packets: %u (%x)\n", total_packet_count(),
+			total_packet_count());
+	printf("=== Perf test done ===\n\n");
+
+	return 0;
+}
+
 /* Useful function which ensures that all worker functions terminate */
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
@@ -212,10 +299,34 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	worker_idx = 0;
 }
 
+/* Useful function which ensures that all worker functions terminate */
+static void
+quit_workers_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
+	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
+	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
+
+	quit = 1;
+	for (i = 0; i < num_workers; i++)
+		bufs[i]->hash.usr = i << 1;
+	rte_distributor_process_burst(d, bufs, num_workers);
+
+	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+
+	rte_distributor_process_burst(d, NULL, 0);
+	rte_eal_mp_wait_lcore();
+	quit = 0;
+	worker_idx = 0;
+}
+
 static int
 test_distributor_perf(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -234,10 +345,20 @@ test_distributor_perf(void)
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -254,6 +375,11 @@ test_distributor_perf(void)
 		return -1;
 	quit_workers(d, p);
 
+	rte_eal_mp_remote_launch(handle_work_burst, db, SKIP_MASTER);
+	if (perf_test_burst(db, p) < 0)
+		return -1;
+	quit_workers_burst(db, p);
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
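
For reference, the "Time per burst" and "Time per packet" figures printed by
the perf test above come straight from the rte_rdtsc() delta: 2^ITER_POWER
process calls each submit BURST packets, so the delta is divided by both. A
small sketch of the arithmetic, using the constants set in this patch (not
code from the test itself):

#include <stdint.h>

#define ITER_POWER 21	/* 2^21 bursts submitted while timing */
#define BURST      64	/* packets per burst in this test */

static inline uint64_t
ticks_per_packet(uint64_t start, uint64_t end)
{
	/* shifting right by ITER_POWER divides by the number of bursts */
	uint64_t per_burst = (end - start) >> ITER_POWER;

	/* then divide by the packets in each burst */
	return per_burst / BURST;
}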

* [WARNING: A/V UNSCANNABLE][PATCH v3 5/6] example: distributor app showing burst api
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
                           ` (3 preceding siblings ...)
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 4/6] test: add distributor_perf autotest David Hunt
@ 2017-01-02 10:22         ` David Hunt
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 6/6] doc: distributor library changes for new " David Hunt
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-02 10:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 508 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 390 insertions(+), 118 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..eebfb74 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,6 +30,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#define BURST_API 1
+
 #include <stdint.h>
 #include <inttypes.h>
 #include <unistd.h>
@@ -43,39 +44,87 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
+#if BURST_API
+#include <rte_distributor_burst.h>
+#else
 #include <rte_distributor.h>
+#endif
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
+#define MBUF_CACHE_SIZE 128
+#if BURST_API
+#define BURST_SIZE 64
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
+#else
 #define BURST_SIZE 32
-#define RTE_RING_SZ 1024
+#define SCHED_RX_RING_SZ 1024
+#define SCHED_TX_RING_SZ 1024
+#endif
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
 		.max_rx_pkt_len = ETHER_MAX_LEN,
+		.split_hdr_size = 0,
+		.header_split   = 0, /**< Header Split disabled */
+		.hw_ip_checksum = 1, /**< IP checksum offload enabled */
+		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
+		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
+		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
 	},
 	.txmode = {
 		.mq_mode = ETH_MQ_TX_NONE,
@@ -93,6 +142,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -101,9 +152,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -113,7 +168,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -121,7 +176,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
@@ -134,7 +189,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
@@ -160,41 +216,52 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 struct lcore_params {
 	unsigned worker_id;
-	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_distributor_burst *d;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
-static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+static inline void
+flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
+	if (unlikely(nb_tx < outbuf->count)) {
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
+		do {
+			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
+		} while (++nb_tx < outbuf->count);
 	}
+	outbuf->count = 0;
+}
+
+static inline void
+flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+{
+	uint8_t outp;
 
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+	for (outp = 0; outp < nb_ports; outp++) {
+		/* skip ports that are not enabled */
+		if ((enabled_port_mask & (1 << outp)) == 0)
+			continue;
 
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+		if (tx_buffers[outp].count == 0)
+			continue;
 
-	return 0;
+		flush_one_port(&tx_buffers[outp], outp);
+	}
 }
 
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -210,6 +277,7 @@ lcore_rx(struct lcore_params *p)
 
 	printf("\nCore %u doing packet RX.\n", rte_lcore_id());
 	port = 0;
+
 	while (!quit_signal_rx) {
 
 		/* skip ports that are not enabled */
@@ -218,7 +286,7 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
+
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -228,19 +296,46 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+
+#if BURST_API
+	rte_distributor_process_burst(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts_burst(d,
+			bufs, BURST_SIZE*2);
+#else
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+#endif
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
 				port = 0;
 			continue;
 		}
-
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		* Swap the following two lines if you want the rx traffic
+		* to go directly to tx, no distribution.
+		*/
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		//struct rte_ring *out_ring = p->dist_tx_ring;
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -249,56 +344,88 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
 
-	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
-		do {
-			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
-		} while (++nb_tx < outbuf->count);
-	}
-	outbuf->count = 0;
-}
 
-static inline void
-flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+static int
+lcore_distributor(struct lcore_params *p)
 {
-	uint8_t outp;
-	for (outp = 0; outp < nb_ports; outp++) {
-		/* skip ports that are not enabled */
-		if ((enabled_port_mask & (1 << outp)) == 0)
-			continue;
-
-		if (tx_buffers[outp].count == 0)
-			continue;
-
-		flush_one_port(&tx_buffers[outp], outp);
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor_burst *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+/*
+ * This '#if' allows you to bypass the distributor. Incoming packets may be
+ * sent straight to the tx ring.
+ */
+#if 1
+
+#if BURST_API
+			/* Distribute the packets */
+			rte_distributor_process_burst(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts_burst(d,
+					bufs, BURST_SIZE*2);
+#else
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+#endif
+
+#else
+			/* Bypass the distributor */
+			const unsigned int xor_val = (rte_eth_dev_count() > 1);
+			/* Touch the mbuf by xor'ing the port */
+			for (unsigned int i = 0; i < nb_rx; i++)
+				bufs[i]->port ^= xor_val;
+
+			const uint16_t nb_ret = nb_rx;
+#endif
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
 	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+#if BURST_API
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns_burst(d);
+#endif
+	quit_signal_rx = 1;
+	return 0;
 }
 
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -327,9 +454,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -358,11 +485,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -371,52 +499,147 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++)
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
 static int
 lcore_worker(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
+	struct rte_distributor_burst *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+	while (!quit_signal_work) {
+
+#if BURST_API
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = __rdtsc()+100;
+
+			while (__rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
+#else
+		buf[0] = rte_distributor_get_pkt(d, id, buf[0]);
+		uint64_t t = __rdtsc() + 10;
+
+		while (__rdtsc() < t)
+			rte_pause();
+		buf[0]->port ^= xor_val;
+#endif
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
+	printf("\nCore %u exiting worker task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -496,12 +719,14 @@ int
 main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
-	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_distributor_burst *d;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -518,10 +743,12 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -560,41 +787,86 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
+#if BURST_API
+	d = rte_distributor_create_burst("PKT_DIST", rte_socket_id(),
+			rte_lcore_count() - 4);
+#else
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 4);
+#endif
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+					rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+					(lcore_function_t *)lcore_distributor,
+					p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
+		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
 
-	if (lcore_rx(&p) != 0)
-		return -1;
+	freq = rte_get_timer_hz();
+	t = __rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < __rdtsc()) {
+			print_stats();
+			t = __rdtsc() + freq;
+		}
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [WARNING: A/V UNSCANNABLE][PATCH v3 6/6] doc: distributor library changes for new burst api
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
                           ` (4 preceding siblings ...)
  2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 5/6] example: distributor app showing burst api David Hunt
@ 2017-01-02 10:22         ` David Hunt
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-02 10:22 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..dffd4ad 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,10 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two versions of the API in the distributor library, one which sends one packet at a time to workers,
+and another which sends bursts of up to 8 packets at a time to workers. The function names of the second API
+are identified by "_burst", and must not be intermixed with the single packet API. The operations described below
+apply to both APIs; select which API you wish to use by including the relevant header file.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
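
To illustrate the doc text in the patch above, here is a minimal sketch of a
worker loop written against the burst API. It is not code from the patch set;
it simply mirrors the calls used in the unit tests and example app
(rte_distributor_get_pkt_burst() hands back the previous burst and fetches the
next one, rte_distributor_return_pkt_burst() gives packets back on shutdown),
and assumes the 8-mbuf burst size used throughout this series:

#include <rte_mbuf.h>
#include <rte_memory.h>
#include <rte_distributor_burst.h>

#define WORKER_BURST 8	/* burst API hands out up to 8 mbufs at a time */

static int
worker_main(struct rte_distributor_burst *db, unsigned int worker_id,
		const volatile int *quit)
{
	struct rte_mbuf *buf[WORKER_BURST] __rte_cache_aligned;
	unsigned int num = 0;
	unsigned int i;

	for (i = 0; i < WORKER_BURST; i++)
		buf[i] = NULL;

	/* no packets to hand back yet (num == 0), just fetch a burst */
	num = rte_distributor_get_pkt_burst(db, worker_id, buf, buf, num);
	while (!*quit) {
		for (i = 0; i < num; i++) {
			/* application-specific work on buf[i] goes here */
		}
		/* return the worked-on burst and receive the next one */
		num = rte_distributor_get_pkt_burst(db, worker_id, buf, buf, num);
	}
	/* give any packets still held back to the distributor on exit */
	rte_distributor_return_pkt_burst(db, worker_id, buf, num);
	return 0;
}

The single-packet API keeps the same shape with one mbuf per call
(rte_distributor_get_pkt()/rte_distributor_return_pkt()), as seen in the
single-path branches of the test code earlier in this thread.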

* Re: [PATCH v2 3/5] test: add distributor_perf autotest
  2016-12-22 12:19       ` Jerin Jacob
@ 2017-01-02 16:24         ` Hunt, David
  2017-01-04 13:09           ` Jerin Jacob
  0 siblings, 1 reply; 202+ messages in thread
From: Hunt, David @ 2017-01-02 16:24 UTC (permalink / raw)
  To: Jerin Jacob; +Cc: dev, bruce.richardson



On 22/12/2016 12:19 PM, Jerin Jacob wrote:
> On Thu, Dec 22, 2016 at 04:37:06AM +0000, David Hunt wrote:
>> +	struct rte_distributor_burst *d = arg;
>> +	unsigned int count = 0;
>> +	unsigned int num = 0;
>> +	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
> Use rte_atomic equivalent

Jerin,
     I'm looking for an equivalent, but I can't seem to find one. Here 
I'm assigning 'id' with the incremented value of worker_idx in one 
statement.
However, rte_atomic32_add just increments the variable and returns void, 
so I'd have to use two statements, losing the atomicity.

   static inline void
   rte_atomic32_add(rte_atomic32_t *v, int32_t inc)

There's a second reason why I can't use the rte_atomics, and that's 
because worker_idx is a volatile.

Maybe we could add new atomic functions in the future to address this?

Thanks,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v2 3/5] test: add distributor_perf autotest
  2017-01-02 16:24         ` Hunt, David
@ 2017-01-04 13:09           ` Jerin Jacob
  0 siblings, 0 replies; 202+ messages in thread
From: Jerin Jacob @ 2017-01-04 13:09 UTC (permalink / raw)
  To: Hunt, David; +Cc: dev, bruce.richardson

On Mon, Jan 02, 2017 at 04:24:01PM +0000, Hunt, David wrote:
> 
> 
> On 22/12/2016 12:19 PM, Jerin Jacob wrote:
> > On Thu, Dec 22, 2016 at 04:37:06AM +0000, David Hunt wrote:
> > > +	struct rte_distributor_burst *d = arg;
> > > +	unsigned int count = 0;
> > > +	unsigned int num = 0;
> > > +	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
> > Use rte_atomic equivalent
> 
> Jerin,
>     I'm looking for an equivalent, but I can't seem to find one. Here I'm
> assigning 'id' with the incremented value of worker_idx in one statement.
> However, rte_atomic32_add just increments the variable and returns void, so
> I'd have to use two statements, losing the atomicity.
> 
>   static inline void
>   rte_atomic32_add(rte_atomic32_t *v, int32_t inc)

It would have been better if rte_atomic32_add returned the old value.

> 
> There's a second reason why I can't use the rte_atomics, and that's because
> worker_idx is a volatile.

maybe you could change worker_idx to an rte_atomic32_t

> 
> Maybe we could add new atomic functions in the future to address this?

Yes. I guess fixing the return value of rte_atomic*_[add/sub] may
be enough

> 
> Thanks,
> Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread
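
To make the suggestion above concrete, one possible shape (an untested sketch,
not code from the patch set) is to declare worker_idx as an rte_atomic32_t and
use rte_atomic32_add_return(), which returns the value after the addition;
subtracting one recovers the pre-increment value that __sync_fetch_and_add()
used to give:

#include <rte_atomic.h>

/* zero-initialised at start-up; rte_atomic32_init() could also be used */
static rte_atomic32_t worker_idx;

static inline unsigned int
next_worker_id(void)
{
	/* add_return yields the post-increment value, so subtract one to
	 * match the old __sync_fetch_and_add(&worker_idx, 1) semantics */
	return (unsigned int)(rte_atomic32_add_return(&worker_idx, 1) - 1);
}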

* [PATCH v4 0/6] distributor library performance enhancements
  2016-12-22  4:37     ` [PATCH v2 1/5] lib: distributor " David Hunt
  2016-12-22 12:47       ` Jerin Jacob
  2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
@ 2017-01-09  7:50       ` David Hunt
  2017-01-09  7:50         ` [PATCH v4 1/6] lib: distributor " David Hunt
                           ` (5 more replies)
  2 siblings, 6 replies; 202+ messages in thread
From: David Hunt @ 2017-01-09  7:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 CPU flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v4 changes:
   * fixed issue building shared libraries

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   For performance in matching, Flow ID's are 15-bits
   Original API (and code) is kept for backward compatibility
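
A worker loop against the burst API then looks roughly like the sketch below;
'd', 'worker_id' and 'quit' are assumed to be provided by the application, and
error handling is omitted:

	struct rte_mbuf *bufs[8];
	unsigned int num = 0;

	while (!quit) {
		/* hand back the previous burst and collect up to 8 new packets */
		num = rte_distributor_get_pkt_burst(d, worker_id, bufs, bufs, num);
		/* process the 'num' packets in bufs[] here */
	}
	/* on shutdown, return any packets still held without asking for more */
	rte_distributor_return_pkt_burst(d, worker_id, bufs, num);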

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[PATCH v4 1/6] lib: distributor performance enhancements
[PATCH v4 2/6] lib: add distributor vector flow matching
[PATCH v4 3/6] test: unit tests for new distributor burst api
[PATCH v4 4/6] test: add distributor_perf autotest
[PATCH v4 5/6] example: distributor app showing burst api
[PATCH v4 6/6] doc: distributor library changes for new burst api

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v4 1/6] lib: distributor performance enhancements
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
@ 2017-01-09  7:50         ` David Hunt
  2017-01-13 15:19           ` Bruce Richardson
                             ` (2 more replies)
  2017-01-09  7:50         ` [PATCH v4 2/6] lib: add distributor vector flow matching David Hunt
                           ` (4 subsequent siblings)
  5 siblings, 3 replies; 202+ messages in thread
From: David Hunt @ 2017-01-09  7:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now sends bursts of up to 8 mbufs to each worker, and tracks
the in-flight flow-ids (atomic scheduling)

New file with a new api, similar to the old API except with _burst
at the end of the function names
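
For the distributor lcore, usage is expected to look something like the
following sketch; everything other than the rte_distributor_*_burst() calls
(port_id, num_workers, quit, the burst sizes) is illustrative only:

	struct rte_distributor_burst *d;
	struct rte_mbuf *bufs[64];
	struct rte_mbuf *done[64];
	unsigned int nb_rx, nb_done;

	d = rte_distributor_create_burst("pkt_dist", rte_socket_id(),
			num_workers);

	while (!quit) {
		nb_rx = rte_eth_rx_burst(port_id, 0, bufs, 64);
		/* hand the burst to the workers; flow id is taken from hash.usr */
		rte_distributor_process_burst(d, bufs, nb_rx);
		/* collect any packets the workers have finished with */
		nb_done = rte_distributor_returned_pkts_burst(d, done, 64);
		/* transmit or free the 'nb_done' packets in done[] here */
	}
	rte_distributor_flush_burst(d);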

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   2 +
 lib/librte_distributor/rte_distributor.c           |  72 +--
 lib/librte_distributor/rte_distributor_burst.c     | 558 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_burst.h     | 255 ++++++++++
 lib/librte_distributor/rte_distributor_priv.h      | 189 +++++++
 lib/librte_distributor/rte_distributor_version.map |   9 +
 6 files changed, 1014 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_burst.c
 create mode 100644 lib/librte_distributor/rte_distributor_burst.h
 create mode 100644 lib/librte_distributor/rte_distributor_priv.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..2acc54d 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,9 +43,11 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_burst.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index f3f778c..c05f6e3 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -40,79 +40,9 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
 #include "rte_distributor.h"
 
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
new file mode 100644
index 0000000..ae7cf9d
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -0,0 +1,558 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+/* This function should really be called return_pkt_burst() */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+
+	/* if we don't have any packets to return, return. */
+	if (count == 0)
+		return;
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF bit to signal to the distributor that
+	 * this cache line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return 0;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * Now we've got the contents of the cache line into an array of
+	 * mbuf pointers, so toggle the bit so the scheduler can start
+	 * working on the next cache line while we work on this one.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	unsigned int count;
+	uint64_t retries = 0;
+
+	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	while (count == 0) {
+		rte_pause();
+		retries++;
+		if (retries > 1000)
+			return 0;
+
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_burst *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+static inline void
+find_match_scalar(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID (+1) to which the
+	 * corresponding flow is pinned.
+	 */
+}
+
+
+
+static unsigned int
+handle_returns(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+	/*
+	 * If the GET_BUF bit is set on the return line, the worker has
+	 * filled it with returned packets for us to collect
+	 */
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+static unsigned int
+release(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	if (d->backlog[wkr].count == 0)
+		return 0;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, wid;
+	int j, w;
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+
+		switch (d->dist_match_fn) {
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
+
+		/*
+		 * Matches array now contains the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for
+			 * each mbuf before calling
+			 * rte_distributor_process_burst(). User-defined
+			 * tags are used to identify flows, or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to the current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_burst *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_burst(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d)
+{
+	unsigned int wkr;
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers)
+{
+	struct rte_distributor_burst *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_burst.h b/lib/librte_distributor/rte_distributor_burst.h
new file mode 100644
index 0000000..5096b13
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.h
@@ -0,0 +1,255 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_BURST_H_
+#define _RTE_DIST_BURST_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * to workers in bursts of up to 8, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor_burst;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the user does not set the tag, its value will depend on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts_burst() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being returned by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The packets being returned by the worker, having completed processing
+ */
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt_burst(). It does not
+ * wait for the new packets to be available, but returns zero if the request
+ * has not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_priv.h b/lib/librte_distributor/rte_distributor_priv.h
new file mode 100644
index 0000000..833855f
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_priv.h
@@ -0,0 +1,189 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Take care if increasing this limit, because it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE 8	/* one xmm register of 16-bit tags */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_burst {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_NUM
+};
+
+struct rte_distributor_burst {
+	TAILQ_ENTRY(rte_distributor_burst) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	/**
+	 * The first cache line in this array holds the tags currently
+	 * inflight on the worker core. The second cache line holds the
+	 * backlog of tags that will go to the worker core next.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_burst bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..39795a1 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -2,14 +2,23 @@ DPDK_2.0 {
 	global:
 
 	rte_distributor_clear_returns;
+	rte_distributor_clear_returns_burst;
 	rte_distributor_create;
+	rte_distributor_create_burst;
 	rte_distributor_flush;
+	rte_distributor_flush_burst;
 	rte_distributor_get_pkt;
+	rte_distributor_get_pkt_burst;
 	rte_distributor_poll_pkt;
+	rte_distributor_poll_pkt_burst;
 	rte_distributor_process;
+	rte_distributor_process_burst;
 	rte_distributor_request_pkt;
+	rte_distributor_request_pkt_burst;
 	rte_distributor_return_pkt;
+	rte_distributor_return_pkt_burst;
 	rte_distributor_returned_pkts;
+	rte_distributor_returned_pkts_burst;
 
 	local: *;
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v4 2/6] lib: add distributor vector flow matching
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
  2017-01-09  7:50         ` [PATCH v4 1/6] lib: distributor " David Hunt
@ 2017-01-09  7:50         ` David Hunt
  2017-01-13 15:26           ` Bruce Richardson
  2017-01-16 16:40           ` Bruce Richardson
  2017-01-09  7:50         ` [PATCH v4 3/6] test: unit tests for new distributor burst api David Hunt
                           ` (3 subsequent siblings)
  5 siblings, 2 replies; 202+ messages in thread
From: David Hunt @ 2017-01-09  7:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   4 +
 lib/librte_distributor/rte_distributor_burst.c     |  11 +-
 lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_priv.h      |   6 ++
 4 files changed, 133 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2acc54d..a725aaf 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,10 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
index ae7cf9d..35044c4 100644
--- a/lib/librte_distributor/rte_distributor_burst.c
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
 		}
 
 		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
 		default:
 			find_match_scalar(d, &flows[0], &matches[0]);
 		}
@@ -538,7 +541,13 @@ rte_distributor_create_burst(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+	/* use vector matching if SSE2 is available at run time */
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..78641f5
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2016 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+#include "smmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 *  1a. Load the current inflights for that worker into an xmm reg
+	 *  1b. Load the current backlog for that worker into an xmm reg
+	 *  1c. use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  1d. Add any matches to the output
+	 * 2. Write the output xmm (matching worker ids).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the output 128-bit contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID (+1) to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_priv.h b/lib/librte_distributor/rte_distributor_priv.h
index 833855f..cc2c478 100644
--- a/lib/librte_distributor/rte_distributor_priv.h
+++ b/lib/librte_distributor/rte_distributor_priv.h
@@ -155,6 +155,7 @@ struct rte_distributor {
 /* All different signature compare functions */
 enum rte_distributor_match_function {
 	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
 	RTE_DIST_MATCH_NUM
 };
 
@@ -182,6 +183,11 @@ struct rte_distributor_burst {
 	enum rte_distributor_match_function dist_match_fn;
 };
 
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v4 3/6] test: unit tests for new distributor burst api
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
  2017-01-09  7:50         ` [PATCH v4 1/6] lib: distributor " David Hunt
  2017-01-09  7:50         ` [PATCH v4 2/6] lib: add distributor vector flow matching David Hunt
@ 2017-01-09  7:50         ` David Hunt
  2017-01-13 15:33           ` Bruce Richardson
  2017-01-09  7:50         ` [PATCH v4 4/6] test: add distributor_perf autotest David Hunt
                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-01-09  7:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 501 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 392 insertions(+), 109 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..3871f86 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,11 +40,24 @@
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
 #define BIG_BATCH 1024
 
+#define DIST_SINGLE 0
+#define DIST_BURST  1
+#define DIST_NUM_TYPES 2
+
+struct worker_params {
+	struct rte_distributor *d;
+	struct rte_distributor_burst *db;
+	int dist_type;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,17 +94,36 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			num = rte_distributor_get_pkt_burst(db, id,
+					buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -107,12 +139,21 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i;
+	unsigned int retries;
+	unsigned int count = 0;
+
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Basic distributor sanity tests (single) ===\n");
+	else
+		printf("=== Basic distributor sanity tests (burst) ===\n");
 
-	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
@@ -124,8 +165,21 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
+
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -146,8 +200,18 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_process(d, bufs, BURST);
+			rte_distributor_flush(d);
+		} else {
+			rte_distributor_process_burst(db, bufs, BURST);
+			count = 0;
+			do {
+				rte_distributor_flush_burst(db);
+				count += rte_distributor_returned_pkts_burst(db,
+						returns, BURST*2);
+			} while (count < BURST);
+		}
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -155,24 +219,32 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 			return -1;
 		}
 
+
 		for (i = 0; i < rte_lcore_count() - 1; i++)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
+		bufs[i]->hash.usr = i+1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -194,8 +266,15 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_flush(d);
+		rte_distributor_clear_returns(d);
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -203,28 +282,59 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
-	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		printf("===testing single big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+			num_returned += rte_distributor_returned_pkts(d,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+		}
+		rte_distributor_flush(d);
 		num_returned += rte_distributor_returned_pkts(d,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+	} else {
+		printf("===testing burst big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process_burst(db,
+					&many_bufs[i*BURST], BURST);
+			count = rte_distributor_returned_pkts_burst(db,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+			num_returned += count;
+		}
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	retries = 0;
+	do {
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
+
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d\n",
+				__LINE__, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -234,7 +344,6 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	printf("Sanity test of returned packets done\n");
 
 	rte_mempool_put_bulk(p, (void *)many_bufs, BIG_BATCH);
-
 	printf("\n");
 	return 0;
 }
@@ -249,18 +358,40 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -270,26 +401,45 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Sanity test with mbuf alloc/free (single) ===\n");
+	else
+		printf("=== Sanity test with mbuf alloc/free (burst)  ===\n");
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
-		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
-			rte_distributor_process(d, NULL, 0);
+		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0) {
+			if (wp->dist_type == DIST_SINGLE)
+				rte_distributor_process(d, NULL, 0);
+			else
+				rte_distributor_process_burst(db, NULL, 0);
+		}
 		for (j = 0; j < BURST; j++) {
 			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
-		rte_distributor_process(d, bufs, BURST);
+		if (wp->dist_type == DIST_SINGLE)
+			rte_distributor_process(d, bufs, BURST);
+		else
+			rte_distributor_process_burst(db, bufs, BURST);
 	}
 
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -305,20 +455,48 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE)
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+	else
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = 1;
+			total += num;
+		} else {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+			total += num;
+		}
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		returned = rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -326,13 +504,29 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+		} else {
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			if (wp->dist_type == DIST_SINGLE) {
+				pkt = rte_distributor_get_pkt(d, id, NULL);
+			} else {
+				num = rte_distributor_get_pkt_burst(db,
+						id, buf, buf, num);
+			}
+		}
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_return_pkt(d, id, pkt);
+		} else {
+			returned = rte_distributor_return_pkt_burst(db,
+					id, buf, num);
+			printf("Num returned = %d\n", returned);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
 	}
 	return 0;
 }
@@ -344,26 +538,37 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		rte_distributor_flush_burst(db);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -374,14 +579,25 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush_burst(db);
+	}
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
 
-	/* flush the distributor */
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -389,10 +605,6 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -401,13 +613,18 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Test flush fn with worker shutdown (single) ===\n");
+	else
+		printf("=== Test flush fn with worker shutdown (burst) ===\n");
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -420,7 +637,11 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, BURST);
+	else
+		rte_distributor_process_burst(db, bufs, BURST);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -429,9 +650,18 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	zero_quit = 1;
 
 	/* flush the distributor */
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
 
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -439,10 +669,6 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -451,6 +677,7 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
@@ -460,6 +687,13 @@ int test_error_distributor_create_name(void)
 		return -1;
 	}
 
+	db = rte_distributor_create_burst(name, rte_socket_id(),
+			rte_lcore_count() - 1);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -468,20 +702,32 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create_burst("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -491,12 +737,20 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, num_workers);
+	else
+		rte_distributor_process_burst(db, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process(d, NULL, 0);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, NULL, 0);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, NULL, 0);
+		rte_distributor_flush_burst(db);
+	}
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -506,7 +760,9 @@ static int
 test_distributor(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
@@ -525,6 +781,19 @@ test_distributor(void)
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_dist_burst",
+				rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -536,31 +805,45 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	worker_params.d = d;
+	worker_params.db = db;
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	for (i = 0; i < DIST_NUM_TYPES; i++) {
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
-			goto err;
-		quit_workers(d, p);
+		worker_params.dist_type = i;
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
+			goto err;
+		quit_workers(&worker_params, p);
+
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
@@ -572,7 +855,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
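
For reference, the worker-side pattern that the unit tests above exercise reduces to a loop like the minimal sketch below. The handle_packet() helper and the quit/worker_idx globals are placeholders standing in for application code, and the get/return calls are used exactly as in the test code in this patch, with the same array passed for both returned and newly received mbufs.

	static int
	worker_thread(void *arg)
	{
		struct rte_distributor_burst *db = arg;
		struct rte_mbuf *buf[8] __rte_cache_aligned;	/* up to 8 mbufs per burst */
		unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
		unsigned int i, num = 0;

		for (i = 0; i < 8; i++)
			buf[i] = NULL;

		/* hand back the previous burst (empty on the first call) and fetch a new one */
		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
		while (!quit) {
			for (i = 0; i < num; i++)
				handle_packet(buf[i]);	/* application-specific work */
			num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
		}
		/* return whatever is still held before the worker exits */
		rte_distributor_return_pkt_burst(db, id, buf, num);
		return 0;
	}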

* [PATCH v4 4/6] test: add distributor_perf autotest
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
                           ` (2 preceding siblings ...)
  2017-01-09  7:50         ` [PATCH v4 3/6] test: unit tests for new distributor burst api David Hunt
@ 2017-01-09  7:50         ` David Hunt
  2017-01-09  7:50         ` [PATCH v4 5/6] example: distributor app showing burst api David Hunt
  2017-01-09  7:50         ` [PATCH v4 6/6] doc: distributor library changes for new " David Hunt
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-09  7:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor_perf.c | 148 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 137 insertions(+), 11 deletions(-)

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..b273bf9 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -40,9 +40,11 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +56,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +72,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +90,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +102,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +128,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * this is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -144,7 +150,37 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * this is the basic worker function for performance tests.
+ * it does nothing but return packets and count them.
+ */
+static int
+handle_work_burst(void *arg)
+{
+	struct rte_distributor_burst *d = arg;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	while (!quit) {
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt_burst(d, id, buf, num);
+	return 0;
+}
+
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
@@ -174,6 +210,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
+	rte_distributor_clear_returns(d);
+
 	printf("=== Performance test of distributor ===\n");
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
@@ -190,6 +228,55 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 	return 0;
 }
 
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
+ * to the distributor and verifies at the end that we got them all in the worker
+ * threads and finally how long per packet the processing took.
+ */
+static inline int
+perf_test_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	unsigned int i;
+	uint64_t start, end;
+	struct rte_mbuf *bufs[BURST];
+
+	clear_packet_count();
+	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
+		printf("Error getting mbufs from pool\n");
+		return -1;
+	}
+	/* ensure we have different hash value for each pkt */
+	for (i = 0; i < BURST; i++)
+		bufs[i]->hash.usr = i;
+
+	start = rte_rdtsc();
+	for (i = 0; i < (1<<ITER_POWER); i++)
+		rte_distributor_process_burst(d, bufs, BURST);
+	end = rte_rdtsc();
+
+	do {
+		usleep(100);
+		rte_distributor_process_burst(d, NULL, 0);
+	} while (total_packet_count() < (BURST << ITER_POWER));
+
+	rte_distributor_clear_returns_burst(d);
+
+	printf("=== Performance test of burst distributor ===\n");
+	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
+	printf("Time per packet: %"PRIu64"\n\n",
+			((end - start) >> ITER_POWER)/BURST);
+	rte_mempool_put_bulk(p, (void *)bufs, BURST);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+	printf("Total packets: %u (%x)\n", total_packet_count(),
+			total_packet_count());
+	printf("=== Perf test done ===\n\n");
+
+	return 0;
+}
+
 /* Useful function which ensures that all worker functions terminate */
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
@@ -212,10 +299,34 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	worker_idx = 0;
 }
 
+/* Useful function which ensures that all worker functions terminate */
+static void
+quit_workers_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
+	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
+	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
+
+	quit = 1;
+	for (i = 0; i < num_workers; i++)
+		bufs[i]->hash.usr = i << 1;
+	rte_distributor_process_burst(d, bufs, num_workers);
+
+	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+
+	rte_distributor_process_burst(d, NULL, 0);
+	rte_eal_mp_wait_lcore();
+	quit = 0;
+	worker_idx = 0;
+}
+
 static int
 test_distributor_perf(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -234,10 +345,20 @@ test_distributor_perf(void)
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -254,6 +375,11 @@ test_distributor_perf(void)
 		return -1;
 	quit_workers(d, p);
 
+	rte_eal_mp_remote_launch(handle_work_burst, db, SKIP_MASTER);
+	if (perf_test_burst(db, p) < 0)
+		return -1;
+	quit_workers_burst(db, p);
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
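
The per-burst and per-packet figures printed by the performance test above come from plain cycle counting; the following condensed restatement (not part of the patch, variable names as in the test) shows the arithmetic:

	unsigned int i;
	uint64_t start, end;

	start = rte_rdtsc();
	for (i = 0; i < (1 << ITER_POWER); i++)
		rte_distributor_process_burst(d, bufs, BURST);
	end = rte_rdtsc();

	/* 2^ITER_POWER bursts of BURST packets were submitted, so a right shift
	 * gives cycles per burst and a further divide gives cycles per packet */
	uint64_t cycles_per_burst = (end - start) >> ITER_POWER;
	uint64_t cycles_per_packet = cycles_per_burst / BURST;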

* [PATCH v4 5/6] example: distributor app showing burst api
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
                           ` (3 preceding siblings ...)
  2017-01-09  7:50         ` [PATCH v4 4/6] test: add distributor_perf autotest David Hunt
@ 2017-01-09  7:50         ` David Hunt
  2017-01-13 15:36           ` Bruce Richardson
  2017-01-13 15:38           ` Bruce Richardson
  2017-01-09  7:50         ` [PATCH v4 6/6] doc: distributor library changes for new " David Hunt
  5 siblings, 2 replies; 202+ messages in thread
From: David Hunt @ 2017-01-09  7:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 508 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 390 insertions(+), 118 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..eebfb74 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2016 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,6 +30,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#define BURST_API 1
+
 #include <stdint.h>
 #include <inttypes.h>
 #include <unistd.h>
@@ -43,39 +44,87 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
+#if BURST_API
+#include <rte_distributor_burst.h>
+#else
 #include <rte_distributor.h>
+#endif
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
+#define MBUF_CACHE_SIZE 128
+#if BURST_API
+#define BURST_SIZE 64
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
+#else
 #define BURST_SIZE 32
-#define RTE_RING_SZ 1024
+#define SCHED_RX_RING_SZ 1024
+#define SCHED_TX_RING_SZ 1024
+#endif
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
 		.max_rx_pkt_len = ETHER_MAX_LEN,
+		.split_hdr_size = 0,
+		.header_split   = 0, /**< Header Split disabled */
+		.hw_ip_checksum = 1, /**< IP checksum offload enabled */
+		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
+		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
+		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
 	},
 	.txmode = {
 		.mq_mode = ETH_MQ_TX_NONE,
@@ -93,6 +142,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -101,9 +152,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -113,7 +168,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -121,7 +176,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
@@ -134,7 +189,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
@@ -160,41 +216,52 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 struct lcore_params {
 	unsigned worker_id;
-	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_distributor_burst *d;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
-static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+static inline void
+flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
+	if (unlikely(nb_tx < outbuf->count)) {
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
+		do {
+			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
+		} while (++nb_tx < outbuf->count);
 	}
+	outbuf->count = 0;
+}
+
+static inline void
+flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+{
+	uint8_t outp;
 
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+	for (outp = 0; outp < nb_ports; outp++) {
+		/* skip ports that are not enabled */
+		if ((enabled_port_mask & (1 << outp)) == 0)
+			continue;
 
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+		if (tx_buffers[outp].count == 0)
+			continue;
 
-	return 0;
+		flush_one_port(&tx_buffers[outp], outp);
+	}
 }
 
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -210,6 +277,7 @@ lcore_rx(struct lcore_params *p)
 
 	printf("\nCore %u doing packet RX.\n", rte_lcore_id());
 	port = 0;
+
 	while (!quit_signal_rx) {
 
 		/* skip ports that are not enabled */
@@ -218,7 +286,7 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
+
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -228,19 +296,46 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+
+#if BURST_API
+	rte_distributor_process_burst(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts_burst(d,
+			bufs, BURST_SIZE*2);
+#else
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+#endif
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
 				port = 0;
 			continue;
 		}
-
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		* Swap the following two lines if you want the rx traffic
+		* to go directly to tx, no distribution.
+		*/
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		//struct rte_ring *out_ring = p->dist_tx_ring;
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -249,56 +344,88 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
 
-	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
-		do {
-			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
-		} while (++nb_tx < outbuf->count);
-	}
-	outbuf->count = 0;
-}
 
-static inline void
-flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+static int
+lcore_distributor(struct lcore_params *p)
 {
-	uint8_t outp;
-	for (outp = 0; outp < nb_ports; outp++) {
-		/* skip ports that are not enabled */
-		if ((enabled_port_mask & (1 << outp)) == 0)
-			continue;
-
-		if (tx_buffers[outp].count == 0)
-			continue;
-
-		flush_one_port(&tx_buffers[outp], outp);
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor_burst *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+/*
+ * This '#if' allows you to bypass the distributor. Incoming packets may be
+ * sent straight to the tx ring.
+ */
+#if 1
+
+#if BURST_API
+			/* Distribute the packets */
+			rte_distributor_process_burst(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts_burst(d,
+					bufs, BURST_SIZE*2);
+#else
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+#endif
+
+#else
+			/* Bypass the distributor */
+			const unsigned int xor_val = (rte_eth_dev_count() > 1);
+			/* Touch the mbuf by xor'ing the port */
+			for (unsigned int i = 0; i < nb_rx; i++)
+				bufs[i]->port ^= xor_val;
+
+			const uint16_t nb_ret = nb_rx;
+#endif
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
 	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+#if BURST_API
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns_burst(d);
+#endif
+	quit_signal_rx = 1;
+	return 0;
 }
 
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -327,9 +454,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -358,11 +485,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -371,52 +499,147 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++)
+			printf("%ld ", app_stats.worker_bursts[i][j]);
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
 static int
 lcore_worker(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
+	struct rte_distributor_burst *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+	while (!quit_signal_work) {
+
+#if BURST_API
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = __rdtsc()+100;
+
+			while (__rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
+#else
+		buf[0] = rte_distributor_get_pkt(d, id, buf[0]);
+		uint64_t t = __rdtsc() + 10;
+
+		while (__rdtsc() < t)
+			rte_pause();
+		buf[0]->port ^= xor_val;
+#endif
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
+	printf("\nCore %u exiting worker task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -496,12 +719,14 @@ int
 main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
-	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_distributor_burst *d;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -518,10 +743,12 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -560,41 +787,86 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
+#if BURST_API
+	d = rte_distributor_create_burst("PKT_DIST", rte_socket_id(),
+			rte_lcore_count() - 4);
+#else
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 4);
+#endif
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by the scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create input ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+					rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+					(lcore_function_t *)lcore_distributor,
+					p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
+		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
 
-	if (lcore_rx(&p) != 0)
-		return -1;
+	freq = rte_get_timer_hz();
+	t = __rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < __rdtsc()) {
+			print_stats();
+			t = __rdtsc() + freq;
+		}
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
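
In the example above the master core does nothing but print statistics once a second. Its timing loop is essentially the sketch below, written here with the portable rte_rdtsc()/rte_get_timer_hz() helpers rather than the compiler's __rdtsc() intrinsic that the patch uses:

	uint64_t freq = rte_get_timer_hz();	/* TSC ticks per second */
	uint64_t next = rte_rdtsc() + freq;

	while (!quit_signal_dist) {
		if (rte_rdtsc() > next) {	/* roughly once per second */
			print_stats();
			next = rte_rdtsc() + freq;
		}
	}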

* [PATCH v4 6/6] doc: distributor library changes for new burst api
  2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
                           ` (4 preceding siblings ...)
  2017-01-09  7:50         ` [PATCH v4 5/6] example: distributor app showing burst api David Hunt
@ 2017-01-09  7:50         ` David Hunt
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-09  7:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..dffd4ad 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,10 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two versions of the API in the distributor library, one which sends one packet at a time to workers,
+and another which sends bursts of up to 8 packets at a time to workers. The function names of the second API
+carry a "_burst" suffix, and the two APIs must not be intermixed. The operations described below apply to
+both APIs; select which API you wish to use by including the relevant header file.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
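
As a rough illustration of the split the added text describes, the two worker loops differ only in whether a single mbuf or an array of up to 8 mbufs is exchanged per call. In the sketch below quit, handle_packet() and the surrounding variables are placeholders, and the burst calls reuse one array for returned and received mbufs, as the unit tests in this series do:

	/* single-packet API (rte_distributor.h) */
	pkt = rte_distributor_get_pkt(d, id, NULL);
	while (!quit) {
		handle_packet(pkt);
		pkt = rte_distributor_get_pkt(d, id, pkt);	/* return old, fetch next */
	}
	rte_distributor_return_pkt(d, id, pkt);

	/* burst API (rte_distributor_burst.h) */
	num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
	while (!quit) {
		for (i = 0; i < num; i++)
			handle_packet(buf[i]);
		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
	}
	rte_distributor_return_pkt_burst(db, id, buf, num);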

* Re: [PATCH v4 1/6] lib: distributor performance enhancements
  2017-01-09  7:50         ` [PATCH v4 1/6] lib: distributor " David Hunt
@ 2017-01-13 15:19           ` Bruce Richardson
  2017-01-19 14:58             ` Hunt, David
  2017-01-16 16:36           ` Bruce Richardson
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
  2 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-01-13 15:19 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 09, 2017 at 07:50:43AM +0000, David Hunt wrote:
> Now sends bursts of up to 8 mbufs to each worker, and tracks
> the in-flight flow-ids (atomic scheduling)
> 
> New file with a new api, similar to the old API except with _burst
> at the end of the function names
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |   2 +
>  lib/librte_distributor/rte_distributor.c           |  72 +--
>  lib/librte_distributor/rte_distributor_burst.c     | 558 +++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_burst.h     | 255 ++++++++++
>  lib/librte_distributor/rte_distributor_priv.h      | 189 +++++++
>  lib/librte_distributor/rte_distributor_version.map |   9 +
>  6 files changed, 1014 insertions(+), 71 deletions(-)
>  create mode 100644 lib/librte_distributor/rte_distributor_burst.c
>  create mode 100644 lib/librte_distributor/rte_distributor_burst.h
>  create mode 100644 lib/librte_distributor/rte_distributor_priv.h
> 
Running a documentation sanity check after this patch throws up a few
warnings:

--- /dev/null   2017-01-10 10:26:01.206201474 +0000
+++ /tmp/doc-check/doc.txt      2017-01-13 15:19:50.717102848 +0000
@@ -0,0 +1,6 @@
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:187:
warning: argument 'mbuf' of command @param is not found in the argument
list of rte_distributor_return_pkt_burst(struct rte_distributor_burst
*d, unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:199:
warning: The following parameters of
rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
unsigned int worker_id, struct rte_mbuf **oldpkt, int num) are not
documented:
+  parameter 'oldpkt'
+  parameter 'num'
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73:
warning: Found unknown command `\in_flight_bitmask'
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73:
warning: Found unknown command `\rte_distributor_process'

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread
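
One way to silence the doxygen warnings above is to document the parameters that are actually in the prototype; a sketch of such a block follows (the wording is illustrative and the int return type is assumed, only the signature is taken from the warning text):

	/**
	 * API called by a worker to return a completed burst of packets.
	 *
	 * @param d
	 *   The distributor instance to be used
	 * @param worker_id
	 *   The worker instance number to use
	 * @param oldpkt
	 *   Array of mbufs being returned to the distributor
	 * @param num
	 *   Number of mbufs in the oldpkt array
	 */
	int
	rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
			unsigned int worker_id, struct rte_mbuf **oldpkt, int num);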

* Re: [PATCH v4 2/6] lib: add distributor vector flow matching
  2017-01-09  7:50         ` [PATCH v4 2/6] lib: add distributor vector flow matching David Hunt
@ 2017-01-13 15:26           ` Bruce Richardson
  2017-01-19 14:59             ` Hunt, David
  2017-01-16 16:40           ` Bruce Richardson
  1 sibling, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-01-13 15:26 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 09, 2017 at 07:50:44AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |   4 +
>  lib/librte_distributor/rte_distributor_burst.c     |  11 +-
>  lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_priv.h      |   6 ++
>  4 files changed, 133 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c
> 
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 2acc54d..a725aaf 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -44,6 +44,10 @@ LIBABIVER := 1
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
>  SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
> +ifeq ($(CONFIG_RTE_ARCH_X86),y)
> +SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
> +endif
> +
>  

I believe some of the intrinsics used in the vector code are SSE4.2
instructions, so you need to pass that flag when compiling, e.g. for
the "default" target used when packaging for distros.

>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
> diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
> index ae7cf9d..35044c4 100644
> --- a/lib/librte_distributor/rte_distributor_burst.c
> +++ b/lib/librte_distributor/rte_distributor_burst.c
> @@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
>  		}
>  
>  		switch (d->dist_match_fn) {
> +		case RTE_DIST_MATCH_VECTOR:
> +			find_match_vec(d, &flows[0], &matches[0]);
> +			break;
>  		default:
>  			find_match_scalar(d, &flows[0], &matches[0]);
>  		}
> @@ -538,7 +541,13 @@ rte_distributor_create_burst(const char *name,
>  	snprintf(d->name, sizeof(d->name), "%s", name);
>  	d->num_workers = num_workers;
>  
> -	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
> +#if defined(RTE_ARCH_X86)
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2)) {
> +		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
> +	} else {
> +#endif
> +		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
> +	}
>  

Two issues here:
1) the check needs to be for SSE4.2, not SSE2 [minimum for DPDK on x86
is SSE3 anyway, so no need for any checks for SSE2]
2) The closing brace should be ifdefed out to fix compilation on non-x86
platforms. A simpler/better solution might actually be to remove the
braces since only a single line is involved in each branch.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread
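
Taking both points together, the runtime selection might end up looking something like the sketch below: the flag check moves to RTE_CPUFLAG_SSE4_2 and the braces are dropped so the #endif no longer splits a block. This restates the review suggestion rather than the committed code:

#if defined(RTE_ARCH_X86)
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
	else
#endif
		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;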

* Re: [PATCH v4 3/6] test: unit tests for new distributor burst api
  2017-01-09  7:50         ` [PATCH v4 3/6] test: unit tests for new distributor burst api David Hunt
@ 2017-01-13 15:33           ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-01-13 15:33 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 09, 2017 at 07:50:45AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  app/test/test_distributor.c | 501 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 392 insertions(+), 109 deletions(-)
> 
check-git-log script complains about your capitalization of the commit
title, API vs api:

Wrong headline lowercase:
        test: unit tests for new distributor burst api

This also applies to a few more of the patches.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 5/6] example: distributor app showing burst api
  2017-01-09  7:50         ` [PATCH v4 5/6] example: distributor app showing burst api David Hunt
@ 2017-01-13 15:36           ` Bruce Richardson
  2017-01-13 15:38           ` Bruce Richardson
  1 sibling, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-01-13 15:36 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 09, 2017 at 07:50:47AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  examples/distributor/main.c | 508 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 390 insertions(+), 118 deletions(-)
> 
check-git-log complains a bit about the title of the patch, and it would
be good to have a description of the app changes as a commit message
body.

As well as this, checkpatch throws up the fact that there is a line
commented out with a C99 style comment. If the line is commented out, it
should just be deleted.

ERROR:C99_COMMENTS: do not use C99 // comments
#305: FILE: examples/distributor/main.c:331:
+               //struct rte_ring *out_ring = p->dist_tx_ring;

total: 1 errors, 0 warnings, 743 lines checked

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 5/6] example: distributor app showing burst api
  2017-01-09  7:50         ` [PATCH v4 5/6] example: distributor app showing burst api David Hunt
  2017-01-13 15:36           ` Bruce Richardson
@ 2017-01-13 15:38           ` Bruce Richardson
  1 sibling, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-01-13 15:38 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 09, 2017 at 07:50:47AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  examples/distributor/main.c | 508 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 390 insertions(+), 118 deletions(-)
> 
> diff --git a/examples/distributor/main.c b/examples/distributor/main.c
> index e7641d2..eebfb74 100644

Compile errors on 32-bit:

/home/bruce/dpdk-clean/examples/distributor/main.c: In function
‘print_stats’:
/home/bruce/dpdk-clean/examples/distributor/main.c:589:14: error: format
‘%ld’ expects argument of type ‘long int’, but argument 2 has type
‘uint64_t {aka volatile long long unsigned int}’ [-Werror=format=]
    printf("%ld ", app_stats.worker_bursts[i][j]);
                  ^
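
Using the PRIu64 format macro from <inttypes.h> should keep both the 32-bit
and 64-bit builds happy here, e.g.:

	printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);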

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 1/6] lib: distributor performance enhancements
  2017-01-09  7:50         ` [PATCH v4 1/6] lib: distributor " David Hunt
  2017-01-13 15:19           ` Bruce Richardson
@ 2017-01-16 16:36           ` Bruce Richardson
  2017-01-19 12:07             ` Hunt, David
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
  2 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-01-16 16:36 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 09, 2017 at 07:50:43AM +0000, David Hunt wrote:
> Now sends bursts of up to 8 mbufs to each worker, and tracks
> the in-flight flow-ids (atomic scheduling)
> 
> New file with a new api, similar to the old API except with _burst
> at the end of the function names
>

Can you explain why this is necessary, and also how the new version
works compared to the old. I know this is explained in the cover letter,
but the cover letter does not make the git commit log.

> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
<snip>
> diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
> new file mode 100644
> index 0000000..ae7cf9d
> --- /dev/null
> +++ b/lib/librte_distributor/rte_distributor_burst.c
> @@ -0,0 +1,558 @@
> +/*-
> + *   BSD LICENSE
> + *
> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.

Update year since we aren't in 2016 any more.

> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of Intel Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include <stdio.h>
> +#include <sys/queue.h>
> +#include <string.h>
> +#include <rte_mbuf.h>
> +#include <rte_memory.h>
> +#include <rte_cycles.h>
> +#include <rte_memzone.h>
> +#include <rte_errno.h>
> +#include <rte_string_fns.h>
> +#include <rte_eal_memconfig.h>
> +#include "rte_distributor_priv.h"
> +#include "rte_distributor_burst.h"
> +
> +TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst);
> +
> +static struct rte_tailq_elem rte_dist_burst_tailq = {
> +	.name = "RTE_DIST_BURST",
> +};
> +EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
> +
> +/**** APIs called by workers ****/
> +
> +/**** Burst Packet APIs called by workers ****/
> +
> +/* This function should really be called return_pkt_burst() */
1) Why should it be? 
2) Why isn't it called that? 
Please explain the naming. :-)

> +void
> +rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
> +		unsigned int worker_id, struct rte_mbuf **oldpkt,
> +		unsigned int count)
> +{
> +	struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]);
> +	unsigned int i;
> +
> +	volatile int64_t *retptr64;
> +
> +
> +	/* if we dont' have any packets to return, return. */
> +	if (count == 0)
> +		return;
> +
So if we don't return anything we don't get any more packets, right?
What happens if we return fewer packets than we were previously given?
If that is allowed, why the restriction on returning at least one?

> +	retptr64 = &(buf->retptr64[0]);
<snip>
> +
> +int
> +rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
> +		unsigned int worker_id, struct rte_mbuf **pkts,
> +		struct rte_mbuf **oldpkt, unsigned int return_count)
> +{
> +	unsigned int count;
> +	uint64_t retries = 0;
> +
> +	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
> +
> +	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
> +	while (count == 0) {
> +		rte_pause();
> +		retries++;
> +		if (retries > 1000)
> +			return 0;

This behaviour is different to the original get_pkt() behaviour in that
it has a timeout. Why the change to add the timeout, and should the
timeout not be user configurable in some way?

> +
> +		uint64_t t = rte_rdtsc()+100;

need spaces around the "+"

> +
> +		while (rte_rdtsc() < t)
> +			rte_pause();
> +
> +		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
> +	}
> +	return count;
> +}
> +
> +int
> +rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
> +		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
> +{
> +	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
> +	unsigned int i;
> +
> +	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
> +		/* Switch off the return bit first */
> +		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
> +
> +	for (i = num; i-- > 0; )
> +		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
> +			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
> +
> +	/* set the GET_BUF but even if we got no returns */
> +	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;

Does this mean we are requesting more packets here?

> +
> +	return 0;
> +}
> +
> +/**** APIs called on distributor core ***/
> +
<snip>
> +
> +static unsigned int
> +release(struct rte_distributor_burst *d, unsigned int wkr)

I think this function needs a comment describing what it is doing,
and where is it called from and why. Other functions on distributor side
probably need the same thing too.

> +{
> +	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
> +	unsigned int i;
> +
> +	if (d->backlog[wkr].count == 0)
> +		return 0;
> +
> +	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
> +		rte_pause();
> +
> +	handle_returns(d, wkr);
> +
> +	buf->count = 0;
> +
> +	for (i = 0; i < d->backlog[wkr].count; i++) {
> +		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
> +				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
> +		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
> +	}
> +	buf->count = i;
> +	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
> +		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
> +		d->in_flight_tags[wkr][i] = 0;
> +	}
> +
> +	d->backlog[wkr].count = 0;
> +
> +	/* Clear the GET bit */
> +	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
> +	return  buf->count;
> +
> +}
<snip>
> +/**
> + * API called by a worker to get new packets to process. Any previous packets
> + * given to the worker is assumed to have completed processing, and may be
> + * optionally returned to the distributor via the oldpkt parameter.
> + *
> + * @param d
> + *   The distributor instance to be used
> + * @param worker_id
> + *   The worker instance number to use - must be less that num_workers passed
> + *   at distributor creation time.
> + * @param pkts
> + *   The mbufs pointer array to be filled in (up to 8 packets)
> + * @param oldpkt
> + *   The previous packet, if any, being processed by the worker
> + * @param retcount
> + *   The number of packets being returneda

I think you need to document that it can't be zero, if I read the above
C implementation correctly.

> + *
> + * @return
> + *   The number of packets in the pkts array
> + */
> +int
> +rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
> +	unsigned int worker_id, struct rte_mbuf **pkts,
> +	struct rte_mbuf **oldpkt, unsigned int retcount);
> +
> +/**
<snip>
> +
> +/**
> + * Number of packets to deal with in bursts. Needs to be 8 so as to
> + * fit in one cache line.
> + */
> +#define RTE_DIST_BURST_SIZE (sizeof(__m128i) / sizeof(uint16_t))

Does this compile for non-x86 with the references to __m128i?

> +
<snip>
> +
> +	struct rte_distributor_returned_pkts returns;
> +};
> +
> +/* All different signature compare functions */
> +enum rte_distributor_match_function {
> +	RTE_DIST_MATCH_SCALAR = 0,
> +	RTE_DIST_MATCH_NUM

I think this last entry should be "RTE_DIST_NUM_MATCH_FNS", as
"NUM" is not a match function, and the define doesn't read right.

> +};
> +
> +struct rte_distributor_burst {
> +	TAILQ_ENTRY(rte_distributor_burst) next;    /**< Next in list. */
> +
> +	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
> +	unsigned int num_workers;             /**< Number of workers polling */
> +
> +	/**>
> +	 * First cache line in the this array are the tags inflight
> +	 * on the worker core. Second cache line are the backlog
> +	 * that are going to go to the worker core.
> +	 */
> +	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
> +			__rte_cache_aligned;
> +
> +	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
> +			__rte_cache_aligned;
> +
> +	struct rte_distributor_buffer_burst bufs[RTE_DISTRIB_MAX_WORKERS];
> +
> +	struct rte_distributor_returned_pkts returns;
> +
> +	enum rte_distributor_match_function dist_match_fn;
> +};
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
> index 73fdc43..39795a1 100644
> --- a/lib/librte_distributor/rte_distributor_version.map
> +++ b/lib/librte_distributor/rte_distributor_version.map
> @@ -2,14 +2,23 @@ DPDK_2.0 {
>  	global:
>  
>  	rte_distributor_clear_returns;
> +	rte_distributor_clear_returns_burst;
>  	rte_distributor_create;
> +	rte_distributor_create_burst;
>  	rte_distributor_flush;
> +	rte_distributor_flush_burst;
>  	rte_distributor_get_pkt;
> +	rte_distributor_get_pkt_burst;
>  	rte_distributor_poll_pkt;
> +	rte_distributor_poll_pkt_burst;
>  	rte_distributor_process;
> +	rte_distributor_process_burst;
>  	rte_distributor_request_pkt;
> +	rte_distributor_request_pkt_burst;
>  	rte_distributor_return_pkt;
> +	rte_distributor_return_pkt_burst;
>  	rte_distributor_returned_pkts;
> +	rte_distributor_returned_pkts_burst;
>  
>  	local: *;
>  };

The new functions are not present in DPDK 2.0, so you need a new node
for the 17.02 release.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 2/6] lib: add distributor vector flow matching
  2017-01-09  7:50         ` [PATCH v4 2/6] lib: add distributor vector flow matching David Hunt
  2017-01-13 15:26           ` Bruce Richardson
@ 2017-01-16 16:40           ` Bruce Richardson
  2017-01-19 12:11             ` Hunt, David
  1 sibling, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-01-16 16:40 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 09, 2017 at 07:50:44AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |   4 +
>  lib/librte_distributor/rte_distributor_burst.c     |  11 +-
>  lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_priv.h      |   6 ++
>  4 files changed, 133 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c
> 
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 2acc54d..a725aaf 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -44,6 +44,10 @@ LIBABIVER := 1
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
>  SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
> +ifeq ($(CONFIG_RTE_ARCH_X86),y)
> +SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
> +endif
> +
>  
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
> diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
> index ae7cf9d..35044c4 100644
> --- a/lib/librte_distributor/rte_distributor_burst.c
> +++ b/lib/librte_distributor/rte_distributor_burst.c
> @@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
>  		}
>  
>  		switch (d->dist_match_fn) {
> +		case RTE_DIST_MATCH_VECTOR:
> +			find_match_vec(d, &flows[0], &matches[0]);
> +			break;
>  		default:
>  			find_match_scalar(d, &flows[0], &matches[0]);
>  		}

Will link not fail on non-x86 platforms due to find_match_vec not having
any implementation on those platforms?

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 1/6] lib: distributor performance enhancements
  2017-01-16 16:36           ` Bruce Richardson
@ 2017-01-19 12:07             ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-01-19 12:07 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev

Thanks for the comments, Bruce. Addressed below.


On 16/1/2017 4:36 PM, Bruce Richardson wrote:
> On Mon, Jan 09, 2017 at 07:50:43AM +0000, David Hunt wrote:
>> Now sends bursts of up to 8 mbufs to each worker, and tracks
>> the in-flight flow-ids (atomic scheduling)
>>
>> New file with a new api, similar to the old API except with _burst
>> at the end of the function names
>>
> Can you explain why this is necessary, and also how the new version
> works compared to the old. I know this is explained in the cover letter,
> but the cover letter does not make the git commit log.

Sure. I'll add extra comments into the git commit message. The main reason is
to preserve the original API. This gives the user the choice to migrate to the
new API should they wish to.

>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
> <snip>
>> diff --git a/lib/librte_distributor/rte_distributor_burst.c 
>> b/lib/librte_distributor/rte_distributor_burst.c
>> new file mode 100644
>> index 0000000..ae7cf9d
>> --- /dev/null
>> +++ b/lib/librte_distributor/rte_distributor_burst.c
>> @@ -0,0 +1,558 @@
>> +/*-
>> + *   BSD LICENSE
>> + *
>> + *   Copyright(c) 2016 Intel Corporation. All rights reserved.
> Update year since we aren't in 2016 any more.
>
>> + *
>> + *   Redistribution and use in source and binary forms, with or without
>> + *   modification, are permitted provided that the following conditions
>> + *   are met:
>> + *
>> + *     * Redistributions of source code must retain the above copyright
>> + *       notice, this list of conditions and the following disclaimer.
>> + *     * Redistributions in binary form must reproduce the above 
>> copyright
>> + *       notice, this list of conditions and the following 
>> disclaimer in
>> + *       the documentation and/or other materials provided with the
>> + *       distribution.
>> + *     * Neither the name of Intel Corporation nor the names of its
>> + *       contributors may be used to endorse or promote products 
>> derived
>> + *       from this software without specific prior written permission.
>> + *
>> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND 
>> CONTRIBUTORS
>> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND 
>> FITNESS FOR
>> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE 
>> COPYRIGHT
>> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, 
>> INCIDENTAL,
>> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS 
>> OF USE,
>> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND 
>> ON ANY
>> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR 
>> TORT
>> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF 
>> THE USE
>> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH 
>> DAMAGE.
>> + */
>> +
>> +#include <stdio.h>
>> +#include <sys/queue.h>
>> +#include <string.h>
>> +#include <rte_mbuf.h>
>> +#include <rte_memory.h>
>> +#include <rte_cycles.h>
>> +#include <rte_memzone.h>
>> +#include <rte_errno.h>
>> +#include <rte_string_fns.h>
>> +#include <rte_eal_memconfig.h>
>> +#include "rte_distributor_priv.h"
>> +#include "rte_distributor_burst.h"
>> +
>> +TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst);
>> +
>> +static struct rte_tailq_elem rte_dist_burst_tailq = {
>> +    .name = "RTE_DIST_BURST",
>> +};
>> +EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
>> +
>> +/**** APIs called by workers ****/
>> +
>> +/**** Burst Packet APIs called by workers ****/
>> +
>> +/* This function should really be called return_pkt_burst() */
> 1) Why should it be?
> 2) Why isn't it called that?
> Please explain the naming.

It seemed to me that the main use of this function was to return the packets
from the worker rather than requesting new packets, whilst also toggling the
bit to tell the distributor to send more packets. So I guess it's OK as it is.
I've removed the comment to avoid this confusion.

>> +void
>> +rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
>> +        unsigned int worker_id, struct rte_mbuf **oldpkt,
>> +        unsigned int count)
>> +{
>> +    struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]);
>> +    unsigned int i;
>> +
>> +    volatile int64_t *retptr64;
>> +
>> +
>> +    /* if we dont' have any packets to return, return. */
>> +    if (count == 0)
>> +        return;
>> +
> So if we don't return anything we don't get any more packets, right?
> What happens if we return fewer packets than we were previously given?
> If that is allowed, why the restriction on returning at least one?

You are correct. We should be able to return 0, and still flip the handshake
bit to request more packets. This check will be removed.

>> +    retptr64 = &(buf->retptr64[0]);
> <snip>
>> +
>> +int
>> +rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
>> +        unsigned int worker_id, struct rte_mbuf **pkts,
>> +        struct rte_mbuf **oldpkt, unsigned int return_count)
>> +{
>> +    unsigned int count;
>> +    uint64_t retries = 0;
>> +
>> +    rte_distributor_request_pkt_burst(d, worker_id, oldpkt, 
>> return_count);
>> +
>> +    count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
>> +    while (count == 0) {
>> +        rte_pause();
>> +        retries++;
>> +        if (retries > 1000)
>> +            return 0;
> This behaviour is different to the original get_pkt() behaviour in that
> it has a timeout. Why the change to add the timeout, and should the
> timeout not be user configurable in some way?

I had another look at this, and managed to clean up this logic. There is no
longer a need for the retry.

In the old logic, the poll_pkt function returned a pointer, or NULL when the
handshake bit was not ready. In the new logic, up until now, I had similar
behaviour, but returned 0 both when the bit was not ready and when the bit was
ready but the number of valid pointers was 0. This meant that there was no way
for the loop to break out when the application was exiting or flushing. I've
now introduced a -1 return when the bit is not ready, so the worker will
continue looping. But when the distributor sets the bit with no packets, the
poll_pkt function will return 0, allowing the loop to exit and return to the
caller.

Thanks for that comment, Bruce, it's fixed a major shortcoming in the logic.
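
Roughly, from the worker's point of view it now looks like this (illustrative
fragment only, using the same names as the worker API):

	int ret = rte_distributor_poll_pkt_burst(d, worker_id, pkts);

	if (ret == -1) {
		/* distributor hasn't handed the cache line back yet: retry */
	} else if (ret == 0) {
		/* cache line handed back but with no packets (e.g. a flush),
		 * so a worker loop can wind down cleanly */
	} else {
		/* ret valid mbuf pointers are now in pkts[0..ret-1] */
	}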


>> +
>> +        uint64_t t = rte_rdtsc()+100;
> need spaces around the "+"

Done

>> +
>> +        while (rte_rdtsc() < t)
>> +            rte_pause();
>> +
>> +        count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
>> +    }
>> +    return count;
>> +}
>> +
>> +int
>> +rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
>> +        unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
>> +{
>> +    struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
>> +    unsigned int i;
>> +
>> +    for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
>> +        /* Switch off the return bit first */
>> +        buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
>> +
>> +    for (i = num; i-- > 0; )
>> +        buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
>> +            RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
>> +
>> +    /* set the GET_BUF but even if we got no returns */
>> +    buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
> Does this mean we are requesting more packets here?

No, we're setting retptr, which means that the distributor will start
processing the returns cacheline. The only way to request more packets is:

         buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;

This function is usually called when you are shutting down a thread and want
to return what you have, and not request any new packets from the distributor.
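
i.e. on shutdown a worker would typically just do something like this (the
"held"/"nb_held" names here are hypothetical):

	/* hand back the nb_held mbufs this worker still holds, without
	 * requesting any more work from the distributor */
	rte_distributor_return_pkt_burst(d, worker_id, held, nb_held);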

>
>> +
>> +    return 0;
>> +}
>> +
>> +/**** APIs called on distributor core ***/
>> +
> <snip>
>> +
>> +static unsigned int
>> +release(struct rte_distributor_burst *d, unsigned int wkr)
> I think this function needs a comment describing what it is doing,
> and where is it called from and why. Other functions on distributor side
> probably need the same thing too.

Done.

>> +{
>> +    struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
>> +    unsigned int i;
>> +
>> +    if (d->backlog[wkr].count == 0)
>> +        return 0;
>> +
>> +    while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
>> +        rte_pause();
>> +
>> +    handle_returns(d, wkr);
>> +
>> +    buf->count = 0;
>> +
>> +    for (i = 0; i < d->backlog[wkr].count; i++) {
>> +        d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
>> +                RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
>> +        d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
>> +    }
>> +    buf->count = i;
>> +    for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
>> +        buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
>> +        d->in_flight_tags[wkr][i] = 0;
>> +    }
>> +
>> +    d->backlog[wkr].count = 0;
>> +
>> +    /* Clear the GET bit */
>> +    buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
>> +    return  buf->count;
>> +
>> +}
> <snip>
>> +/**
>> + * API called by a worker to get new packets to process. Any 
>> previous packets
>> + * given to the worker is assumed to have completed processing, and 
>> may be
>> + * optionally returned to the distributor via the oldpkt parameter.
>> + *
>> + * @param d
>> + *   The distributor instance to be used
>> + * @param worker_id
>> + *   The worker instance number to use - must be less that 
>> num_workers passed
>> + *   at distributor creation time.
>> + * @param pkts
>> + *   The mbufs pointer array to be filled in (up to 8 packets)
>> + * @param oldpkt
>> + *   The previous packet, if any, being processed by the worker
>> + * @param retcount
>> + *   The number of packets being returneda
> I think you need to document that it can't be zero, if I read the above
> C implementation correctly.

Can be zero now, after resolving some issues indicated above. We should be
able to return zero to indicate that we've processed all of the burst but are
not returning any (i.e. drop).
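
So a worker that drops (or fully consumes) everything it was given can just
ask for the next burst with a return count of zero, something like:

	/* nothing to hand back, but the handshake still flips to get more */
	num = rte_distributor_get_pkt_burst(d, worker_id, pkts, pkts, 0);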

>> + *
>> + * @return
>> + *   The number of packets in the pkts array
>> + */
>> +int
>> +rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
>> +    unsigned int worker_id, struct rte_mbuf **pkts,
>> +    struct rte_mbuf **oldpkt, unsigned int retcount);
>> +
>> +/**
> <snip>
>> +
>> +/**
>> + * Number of packets to deal with in bursts. Needs to be 8 so as to
>> + * fit in one cache line.
>> + */
>> +#define RTE_DIST_BURST_SIZE (sizeof(__m128i) / sizeof(uint16_t))
> Does this compile for non-x86 with the references to __m128i?

Changed to rte_xmm_t
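
i.e. something along the lines of (with rte_vect.h included):

	#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))

which should still work out to 8 on all supported platforms.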


>> +
> <snip>
>> +
>> +    struct rte_distributor_returned_pkts returns;
>> +};
>> +
>> +/* All different signature compare functions */
>> +enum rte_distributor_match_function {
>> +    RTE_DIST_MATCH_SCALAR = 0,
>> +    RTE_DIST_MATCH_NUM
> I think this last entry should be "RTE_DIST_NUM_MATCH_FNS", as
> "NUM" is not a match function, and the define doesn't read right.

Done.

>> +};
>> +
>> +struct rte_distributor_burst {
>> +    TAILQ_ENTRY(rte_distributor_burst) next;    /**< Next in list. */
>> +
>> +    char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
>> +    unsigned int num_workers;             /**< Number of workers 
>> polling */
>> +
>> +    /**>
>> +     * First cache line in the this array are the tags inflight
>> +     * on the worker core. Second cache line are the backlog
>> +     * that are going to go to the worker core.
>> +     */
>> +    uint16_t 
>> in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
>> +            __rte_cache_aligned;
>> +
>> +    struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
>> +            __rte_cache_aligned;
>> +
>> +    struct rte_distributor_buffer_burst bufs[RTE_DISTRIB_MAX_WORKERS];
>> +
>> +    struct rte_distributor_returned_pkts returns;
>> +
>> +    enum rte_distributor_match_function dist_match_fn;
>> +};
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/lib/librte_distributor/rte_distributor_version.map 
>> b/lib/librte_distributor/rte_distributor_version.map
>> index 73fdc43..39795a1 100644
>> --- a/lib/librte_distributor/rte_distributor_version.map
>> +++ b/lib/librte_distributor/rte_distributor_version.map
>> @@ -2,14 +2,23 @@ DPDK_2.0 {
>>       global:
>>         rte_distributor_clear_returns;
>> +    rte_distributor_clear_returns_burst;
>>       rte_distributor_create;
>> +    rte_distributor_create_burst;
>>       rte_distributor_flush;
>> +    rte_distributor_flush_burst;
>>       rte_distributor_get_pkt;
>> +    rte_distributor_get_pkt_burst;
>>       rte_distributor_poll_pkt;
>> +    rte_distributor_poll_pkt_burst;
>>       rte_distributor_process;
>> +    rte_distributor_process_burst;
>>       rte_distributor_request_pkt;
>> +    rte_distributor_request_pkt_burst;
>>       rte_distributor_return_pkt;
>> +    rte_distributor_return_pkt_burst;
>>       rte_distributor_returned_pkts;
>> +    rte_distributor_returned_pkts_burst;
>>         local: *;
>>   };
> The new functions are not present in DPDK 2.0, so you need a new node
> for the 17.02 release.

Sure.

> Regards,
> /Bruce
>

Thanks Bruce. I'll get a new revision up later today.

Regards,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 2/6] lib: add distributor vector flow matching
  2017-01-16 16:40           ` Bruce Richardson
@ 2017-01-19 12:11             ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-01-19 12:11 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev


On 16/1/2017 4:40 PM, Bruce Richardson wrote:
> On Mon, Jan 09, 2017 at 07:50:44AM +0000, David Hunt wrote:
>> --- a/lib/librte_distributor/rte_distributor_burst.c
>> +++ b/lib/librte_distributor/rte_distributor_burst.c
>> @@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
>>   		}
>>   
>>   		switch (d->dist_match_fn) {
>> +		case RTE_DIST_MATCH_VECTOR:
>> +			find_match_vec(d, &flows[0], &matches[0]);
>> +			break;
>>   		default:
>>   			find_match_scalar(d, &flows[0], &matches[0]);
>>   		}
> Will link not fail on non-x86 platforms due to find_match_vec not having
> any implementation on those platforms?
>
> /Bruce

I've added a fallback find_match_vec in rte_distributor_match_generic.c 
that calls find_match_scalar.
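
Something like this (sketch, using the same prototype as find_match_scalar):

	void
	find_match_vec(struct rte_distributor_burst *d,
			uint16_t *data_ptr, uint16_t *output_ptr)
	{
		find_match_scalar(d, data_ptr, output_ptr);
	}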

Rgds,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 1/6] lib: distributor performance enhancements
  2017-01-13 15:19           ` Bruce Richardson
@ 2017-01-19 14:58             ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-01-19 14:58 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 13/1/2017 3:19 PM, Bruce Richardson wrote:
> On Mon, Jan 09, 2017 at 07:50:43AM +0000, David Hunt wrote:
>> Now sends bursts of up to 8 mbufs to each worker, and tracks
>> the in-flight flow-ids (atomic scheduling)
>>
>> New file with a new api, similar to the old API except with _burst
>> at the end of the function names
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/Makefile                    |   2 +
>>   lib/librte_distributor/rte_distributor.c           |  72 +--
>>   lib/librte_distributor/rte_distributor_burst.c     | 558 +++++++++++++++++++++
>>   lib/librte_distributor/rte_distributor_burst.h     | 255 ++++++++++
>>   lib/librte_distributor/rte_distributor_priv.h      | 189 +++++++
>>   lib/librte_distributor/rte_distributor_version.map |   9 +
>>   6 files changed, 1014 insertions(+), 71 deletions(-)
>>   create mode 100644 lib/librte_distributor/rte_distributor_burst.c
>>   create mode 100644 lib/librte_distributor/rte_distributor_burst.h
>>   create mode 100644 lib/librte_distributor/rte_distributor_priv.h
>>
> Running a documentation sanity check after this patch throws up a few
> warnings:
>
> --- /dev/null   2017-01-10 10:26:01.206201474 +0000
> +++ /tmp/doc-check/doc.txt      2017-01-13 15:19:50.717102848 +0000
> @@ -0,0 +1,6 @@
> +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:187:
> warning: argument 'mbuf' of command @param is not found in the argument
> list of rte_distributor_return_pkt_burst(struct rte_distributor_burst
> *d, unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
> +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:199:
> warning: The following parameters of
> rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
> unsigned int worker_id, struct rte_mbuf **oldpkt, int num) are not
> documented:
> +  parameter 'oldpkt'
> +  parameter 'num'
> +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73:
> warning: Found unknown command `\in_flight_bitmask'
> +/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73:
> warning: Found unknown command `\rte_distributor_process'
>
> Regards,
> /Bruce

Will be cleaned up in the next revision.

Thanks,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v4 2/6] lib: add distributor vector flow matching
  2017-01-13 15:26           ` Bruce Richardson
@ 2017-01-19 14:59             ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-01-19 14:59 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 13/1/2017 3:26 PM, Bruce Richardson wrote:
> On Mon, Jan 09, 2017 at 07:50:44AM +0000, David Hunt wrote:
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/Makefile                    |   4 +
>>   lib/librte_distributor/rte_distributor_burst.c     |  11 +-
>>   lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
>>   lib/librte_distributor/rte_distributor_priv.h      |   6 ++
>>   4 files changed, 133 insertions(+), 1 deletion(-)
>>   create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c
>>
>> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
>> index 2acc54d..a725aaf 100644
>> --- a/lib/librte_distributor/Makefile
>> +++ b/lib/librte_distributor/Makefile
>> @@ -44,6 +44,10 @@ LIBABIVER := 1
>>   # all source are stored in SRCS-y
>>   SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
>>   SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
>> +ifeq ($(CONFIG_RTE_ARCH_X86),y)
>> +SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
>> +endif
>> +
>>   
> I believe some of the intrinsics used in the vector code are SSE4.2
> instructions, so you need to pass that flag for the compilation for e.g.
> the "default" target for packaging into distros.
>
>>   # install this header file
>>   SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
>> diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
>> index ae7cf9d..35044c4 100644
>> --- a/lib/librte_distributor/rte_distributor_burst.c
>> +++ b/lib/librte_distributor/rte_distributor_burst.c
>> @@ -352,6 +352,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
>>   		}
>>   
>>   		switch (d->dist_match_fn) {
>> +		case RTE_DIST_MATCH_VECTOR:
>> +			find_match_vec(d, &flows[0], &matches[0]);
>> +			break;
>>   		default:
>>   			find_match_scalar(d, &flows[0], &matches[0]);
>>   		}
>> @@ -538,7 +541,13 @@ rte_distributor_create_burst(const char *name,
>>   	snprintf(d->name, sizeof(d->name), "%s", name);
>>   	d->num_workers = num_workers;
>>   
>> -	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
>> +#if defined(RTE_ARCH_X86)
>> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2)) {
>> +		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
>> +	} else {
>> +#endif
>> +		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
>> +	}
>>   
> Two issues here:
> 1) the check needs to be for SSE4.2, not SSE2 [minimum for DPDK on x86
> is SSE3 anyway, so no need for any checks for SSE2]
> 2) The closing brace should be ifdefed out to fix compilation on non-x86
> platforms. A simpler/better solution might actually be to remove the
> braces since only a single line is involved in each branch.
>
> Regards,
> /Bruce


Will be resolved in the next revision.

Thanks,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v5 0/6] distributor library performance enhancements
  2017-01-09  7:50         ` [PATCH v4 1/6] lib: distributor " David Hunt
  2017-01-13 15:19           ` Bruce Richardson
  2017-01-16 16:36           ` Bruce Richardson
@ 2017-01-20  9:18           ` David Hunt
  2017-01-20  9:18             ` [PATCH v5 1/6] lib: distributor " David Hunt
                               ` (5 more replies)
  2 siblings, 6 replies; 202+ messages in thread
From: David Hunt @ 2017-01-20  9:18 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v4 changes:
   * fixed issue building shared libraries

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   For performance in matching, Flow ID's are 15-bits
   Original API (and code) is kept for backward compatibility

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[PATCH v5 1/6] lib: distributor performance enhancements
[PATCH v5 2/6] lib: add distributor vector flow matching
[PATCH v5 3/6] test: unit tests for new distributor burst API
[PATCH v5 4/6] test: add distributor perf autotest
[PATCH v5 5/6] examples/distributor_app: showing burst API
[PATCH v5 6/6] doc: distributor library changes for new burst API

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v5 1/6] lib: distributor performance enhancements
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
@ 2017-01-20  9:18             ` David Hunt
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
  2017-01-23 12:26               ` [PATCH v5 1/6] lib: distributor " Bruce Richardson
  2017-01-20  9:18             ` [PATCH v5 2/6] lib: add distributor vector flow matching David Hunt
                               ` (4 subsequent siblings)
  5 siblings, 2 replies; 202+ messages in thread
From: David Hunt @ 2017-01-20  9:18 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now sends bursts of up to 8 mbufs to each worker, and tracks
the in-flight flow-ids (atomic scheduling)

New file with a new api, similar to the old API except with _burst
at the end of the function names. This is to preserve the original
API (and code) for backward compatibility.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it
makes use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps
an array of inflight flows and an array of backlog flows, and matches
incoming flows to the inflight/backlog flows of all workers so
that flow pinning to workers can be maintained.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   2 +
 lib/librte_distributor/rte_distributor.c           |  74 +--
 lib/librte_distributor/rte_distributor.h           |   2 +-
 lib/librte_distributor/rte_distributor_burst.c     | 563 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_burst.h     | 255 ++++++++++
 lib/librte_distributor/rte_distributor_priv.h      | 189 +++++++
 lib/librte_distributor/rte_distributor_version.map |  14 +
 7 files changed, 1026 insertions(+), 73 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_burst.c
 create mode 100644 lib/librte_distributor/rte_distributor_burst.h
 create mode 100644 lib/librte_distributor/rte_distributor_priv.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..2acc54d 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,9 +43,11 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_burst.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index f3f778c..ac566c4 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,79 +40,9 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
 #include "rte_distributor.h"
 
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..7281491 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
new file mode 100644
index 0000000..2cbf635
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -0,0 +1,563 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF  to signal to distributor that cache
+	 * line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return -1;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * so now we've got the contents of the cacheline into an  array of
+	 * mbuf pointers, so toggle the bit so scheduler can start working
+	 * on the next cacheline while we're working.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	int count;
+
+	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	while (count == -1) {
+		uint64_t t = rte_rdtsc() + 100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF but even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_burst *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+/*
+ * Match then flow_ids (tags) of the incoming packets to the flow_ids
+ * of the inflight packets (both inflight on the workers and in each worker
+ * backlog). This will then allow us to pin those packets to the relevant
+ * workers to give us our atomic flow pinning.
+ */
+static inline void
+find_match_scalar(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID on which the
+	 * corresponding flow is pinned to.
+	 */
+}
+
+
+/*
+ * When the handshake bits indicate that there are packets coming
+ * back from the worker, this function is called to copy and store
+ * the valid returned pointers (store_return).
+ */
+static unsigned int
+handle_returns(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+/*
+ * This function releases a burst (cache line) to a worker.
+ * It is called from the process function when a cacheline is
+ * full to make room for more packets for that worker, or when
+ * all packets have been assigned to bursts and need to be flushed
+ * to the workers.
+ * It also needs to wait for any outstanding packets from the worker
+ * before sending out new packets.
+ */
+static unsigned int
+release(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, wid;
+	int j, w;
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+
+		switch (d->dist_match_fn) {
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
+
+		/*
+		 * Matches array now contains the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User-defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_burst *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_burst(d, NULL, 0);
+
+	/*
+	 * Send empty burst to all workers to allow them to exit
+	 * gracefully, should they need to.
+	 */
+	rte_distributor_process_burst(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d)
+{
+	unsigned int wkr;
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers)
+{
+	struct rte_distributor_burst *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_burst.h b/lib/librte_distributor/rte_distributor_burst.h
new file mode 100644
index 0000000..0b65518
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.h
@@ -0,0 +1,255 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_BURST_H_
+#define _RTE_DIST_BURST_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor_burst;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
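
For example, creating an instance on the caller's NUMA node might look like the
following sketch (the name "pkt_dist" is arbitrary, and the worker count mirrors
the unit tests by leaving one lcore for the distributor itself):

	struct rte_distributor_burst *db;

	db = rte_distributor_create_burst("pkt_dist", rte_socket_id(),
			rte_lcore_count() - 1);
	if (db == NULL) {
		/* rte_errno is set to EINVAL or ENOMEM on failure */
		printf("cannot create distributor\n");
		return -1;
	}
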
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the tag is not set, its value will vary depending on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
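
As a rough usage sketch for the distributor lcore (rx_burst() is a hypothetical
packet source standing in for however mbufs arrive, and "quit" is an application
flag; neither is part of this API):

	struct rte_mbuf *bufs[64];
	struct rte_mbuf *done[64];
	unsigned int nb_rx, nb_done;

	while (!quit) {
		nb_rx = rx_burst(bufs, 64);	/* hypothetical */
		rte_distributor_process_burst(db, bufs, nb_rx);
		nb_done = rte_distributor_returned_pkts_burst(db, done, 64);
		/* transmit or free the nb_done completed packets here */
	}
	rte_distributor_flush_burst(db);
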
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as
+ * rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as
+ * rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts_burst() API call.
+ *
+ * This should only be called on the same lcore as
+ * rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
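
A worker loop built on this call could look roughly as follows, mirroring the
unit tests: the same array is passed as both pkts and oldpkt, so each call hands
back the previous burst while fetching the next one (do_work() and "quit" are
hypothetical application code):

	struct rte_mbuf *pkts[8];	/* up to 8 packets per burst */
	unsigned int i, num = 0;

	while (!quit) {
		num = rte_distributor_get_pkt_burst(db, worker_id,
				pkts, pkts, num);
		for (i = 0; i < num; i++)
			do_work(pkts[i]);	/* hypothetical */
	}
	rte_distributor_return_pkt_burst(db, worker_id, pkts, num);
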
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request new packets to process.
+ * Any previous packets given to the worker are assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_burst(), this function does not wait for
+ * new packets to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt_burst()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt_burst(). It does not
+ * wait for the new packets to be available, but returns zero if the request
+ * has not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
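
Taken together, the request/poll pair lets a worker overlap other work with
waiting. A sketch following the documented return convention above
(do_other_work() is hypothetical, and no packets are returned on the initial
request):

	struct rte_mbuf *pkts[8];
	unsigned int num;

	rte_distributor_request_pkt_burst(db, worker_id, NULL, 0);
	do {
		do_other_work();	/* hypothetical */
		num = rte_distributor_poll_pkt_burst(db, worker_id, pkts);
	} while (num == 0);
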
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_priv.h b/lib/librte_distributor/rte_distributor_priv.h
new file mode 100644
index 0000000..ae48d86
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_priv.h
@@ -0,0 +1,189 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
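
To illustrate the encoding (a sketch only; pack_bufptr()/unpack_bufptr() are
illustrative helpers, not part of the library):

	static inline int64_t
	pack_bufptr(struct rte_mbuf *mb)
	{
		/* pointer in the upper bits, flag bits in the bottom nibble */
		return (((int64_t)(uintptr_t)mb) << RTE_DISTRIB_FLAG_BITS)
				| RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
	}

	static inline struct rte_mbuf *
	unpack_bufptr(int64_t val)
	{
		/* arithmetic right shift restores the sign-extended pointer */
		return (struct rte_mbuf *)(uintptr_t)(val >> RTE_DISTRIB_FLAG_BITS);
	}
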
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, because it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64 bits of the memory are actually used, though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
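
Spelling out the arithmetic, assuming the usual 16-byte rte_xmm_t and a 64-byte
cache line:

	/*
	 * sizeof(rte_xmm_t) / sizeof(uint16_t) == 16 / 2 == 8 entries, and
	 * 8 * sizeof(int64_t) == 64 bytes, so one burst of bufptr64[] (or
	 * retptr64[]) entries fills exactly one cache line.
	 */
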
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_burst {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_NUM_MATCH_FNS
+};
+
+struct rte_distributor_burst {
+	TAILQ_ENTRY(rte_distributor_burst) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	/**
+	 * The first cache line in this array holds the tags inflight
+	 * on the worker core. The second cache line holds the backlog
+	 * tags that are going to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_burst bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..eabcaf5 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.02 {
+	global:
+
+	rte_distributor_clear_returns_burst;
+	rte_distributor_create_burst;
+	rte_distributor_flush_burst;
+	rte_distributor_get_pkt_burst;
+	rte_distributor_poll_pkt_burst;
+	rte_distributor_process_burst;
+	rte_distributor_request_pkt_burst;
+	rte_distributor_return_pkt_burst;
+	rte_distributor_returned_pkts_burst;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v5 2/6] lib: add distributor vector flow matching
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
  2017-01-20  9:18             ` [PATCH v5 1/6] lib: distributor " David Hunt
@ 2017-01-20  9:18             ` David Hunt
  2017-01-20  9:18             ` [PATCH v5 3/6] test: unit tests for new distributor burst API David Hunt
                               ` (3 subsequent siblings)
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-20  9:18 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   7 ++
 lib/librte_distributor/rte_distributor_burst.c     |  12 ++-
 lib/librte_distributor/rte_distributor_burst.h     |   6 +-
 .../rte_distributor_match_generic.c                |  43 ++++++++
 lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_priv.h      |  15 ++-
 6 files changed, 191 insertions(+), 5 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2acc54d..4baaa0c 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,13 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+CFLAGS_rte_distributor_match_sse.o += -msse4.2
+else
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
index 2cbf635..a629c73 100644
--- a/lib/librte_distributor/rte_distributor_burst.c
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -190,7 +190,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor_burst *d,
  * backlog). This will then allow us to pin those packets to the relevant
  * workers to give us our atomic flow pinning.
  */
-static inline void
+void
 find_match_scalar(struct rte_distributor_burst *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
@@ -351,6 +351,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
 		}
 
 		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
 		default:
 			find_match_scalar(d, &flows[0], &matches[0]);
 		}
@@ -543,7 +546,12 @@ rte_distributor_create_burst(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	} else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_burst.h b/lib/librte_distributor/rte_distributor_burst.h
index 0b65518..b0b41ec 100644
--- a/lib/librte_distributor/rte_distributor_burst.h
+++ b/lib/librte_distributor/rte_distributor_burst.h
@@ -192,8 +192,10 @@ rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
  * @param worker_id
  *   The worker instance number to use - must be less that num_workers passed
  *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
  */
 int
 rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
new file mode 100644
index 0000000..6a1ff7f
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	find_match_scalar(d, data_ptr, output_ptr);
+}
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..383f12e
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_priv.h"
+#include "rte_distributor_burst.h"
+#include "smmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 1. Load the incoming flow IDs into an xmm register
+	 * 2. Loop through all worker IDs
+	 *  2a. Load the current inflights for that worker into an xmm reg
+	 *  2b. Load the current backlog for that worker into an xmm reg
+	 *  2c. Use cmpestrm to intersect flow IDs with backlog and inflights
+	 *  2d. Add any matches to the output
+	 * 3. Write the output xmm (matching worker IDs).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the output 128-bit register contains 8 16-bit
+	 * values, with each non-zero value holding the ID (+1) of the
+	 * worker to which the corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_priv.h b/lib/librte_distributor/rte_distributor_priv.h
index ae48d86..1d73d92 100644
--- a/lib/librte_distributor/rte_distributor_priv.h
+++ b/lib/librte_distributor/rte_distributor_priv.h
@@ -33,6 +33,8 @@
 #ifndef _RTE_DIST_PRIV_H_
 #define _RTE_DIST_PRIV_H_
 
+#include <rte_vect.h>
+
 /**
  * @file
  * RTE distributor
@@ -70,7 +72,7 @@ extern "C" {
 /**
  * Maximum number of workers allowed.
  * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ * in-flight tags. See in_flight_bitmask and rte_distributor_process
  */
 #define RTE_DISTRIB_MAX_WORKERS 64
 
@@ -155,6 +157,7 @@ struct rte_distributor {
 /* All different signature compare functions */
 enum rte_distributor_match_function {
 	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
 	RTE_DIST_NUM_MATCH_FNS
 };
 
@@ -182,6 +185,16 @@ struct rte_distributor_burst {
 	enum rte_distributor_match_function dist_match_fn;
 };
 
+void
+find_match_scalar(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v5 3/6] test: unit tests for new distributor burst API
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
  2017-01-20  9:18             ` [PATCH v5 1/6] lib: distributor " David Hunt
  2017-01-20  9:18             ` [PATCH v5 2/6] lib: add distributor vector flow matching David Hunt
@ 2017-01-20  9:18             ` David Hunt
  2017-01-20  9:18             ` [PATCH v5 4/6] test: add distributor perf autotest David Hunt
                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-20  9:18 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 501 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 392 insertions(+), 109 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..3871f86 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,11 +40,24 @@
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
 #define BIG_BATCH 1024
 
+#define DIST_SINGLE 0
+#define DIST_BURST  1
+#define DIST_NUM_TYPES 2
+
+struct worker_params {
+	struct rte_distributor *d;
+	struct rte_distributor_burst *db;
+	int dist_type;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,17 +94,36 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			num = rte_distributor_get_pkt_burst(db, id,
+					buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -107,12 +139,21 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i;
+	unsigned int retries;
+	unsigned int count = 0;
+
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Basic distributor sanity tests (single) ===\n");
+	else
+		printf("=== Basic distributor sanity tests (burst) ===\n");
 
-	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
@@ -124,8 +165,21 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
+
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -146,8 +200,18 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_process(d, bufs, BURST);
+			rte_distributor_flush(d);
+		} else {
+			rte_distributor_process_burst(db, bufs, BURST);
+			count = 0;
+			do {
+				rte_distributor_flush_burst(db);
+				count += rte_distributor_returned_pkts_burst(db,
+						returns, BURST*2);
+			} while (count < BURST);
+		}
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -155,24 +219,32 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 			return -1;
 		}
 
+
 		for (i = 0; i < rte_lcore_count() - 1; i++)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
+		bufs[i]->hash.usr = i+1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -194,8 +266,15 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_flush(d);
+		rte_distributor_clear_returns(d);
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -203,28 +282,59 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
-	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		printf("===testing single big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+			num_returned += rte_distributor_returned_pkts(d,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+		}
+		rte_distributor_flush(d);
 		num_returned += rte_distributor_returned_pkts(d,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+	} else {
+		printf("===testing burst big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process_burst(db,
+					&many_bufs[i*BURST], BURST);
+			count = rte_distributor_returned_pkts_burst(db,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+			num_returned += count;
+		}
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	retries = 0;
+	do {
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
+
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d\n",
+				__LINE__, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -234,7 +344,6 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	printf("Sanity test of returned packets done\n");
 
 	rte_mempool_put_bulk(p, (void *)many_bufs, BIG_BATCH);
-
 	printf("\n");
 	return 0;
 }
@@ -249,18 +358,40 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -270,26 +401,45 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Sanity test with mbuf alloc/free (single) ===\n");
+	else
+		printf("=== Sanity test with mbuf alloc/free (burst)  ===\n");
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
-		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
-			rte_distributor_process(d, NULL, 0);
+		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0) {
+			if (wp->dist_type == DIST_SINGLE)
+				rte_distributor_process(d, NULL, 0);
+			else
+				rte_distributor_process_burst(db, NULL, 0);
+		}
 		for (j = 0; j < BURST; j++) {
 			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
-		rte_distributor_process(d, bufs, BURST);
+		if (wp->dist_type == DIST_SINGLE)
+			rte_distributor_process(d, bufs, BURST);
+		else
+			rte_distributor_process_burst(db, bufs, BURST);
 	}
 
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -305,20 +455,48 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE)
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+	else
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = 1;
+			total += num;
+		} else {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+			total += num;
+		}
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		returned = rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -326,13 +504,29 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+		} else {
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			if (wp->dist_type == DIST_SINGLE) {
+				pkt = rte_distributor_get_pkt(d, id, NULL);
+			} else {
+				num = rte_distributor_get_pkt_burst(db,
+						id, buf, buf, num);
+			}
+		}
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_return_pkt(d, id, pkt);
+		} else {
+			returned = rte_distributor_return_pkt_burst(db,
+					id, buf, num);
+			printf("Num returned = %d\n", returned);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
 	}
 	return 0;
 }
@@ -344,26 +538,37 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		rte_distributor_flush_burst(db);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -374,14 +579,25 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush_burst(db);
+	}
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
 
-	/* flush the distributor */
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -389,10 +605,6 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -401,13 +613,18 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Test flush fn with worker shutdown (single) ===\n");
+	else
+		printf("=== Test flush fn with worker shutdown (burst) ===\n");
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -420,7 +637,11 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, BURST);
+	else
+		rte_distributor_process_burst(db, bufs, BURST);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -429,9 +650,18 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	zero_quit = 1;
 
 	/* flush the distributor */
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
 
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -439,10 +669,6 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -451,6 +677,7 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
@@ -460,6 +687,13 @@ int test_error_distributor_create_name(void)
 		return -1;
 	}
 
+	db = rte_distributor_create_burst(name, rte_socket_id(),
+			rte_lcore_count() - 1);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -468,20 +702,32 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create_burst("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -491,12 +737,20 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, num_workers);
+	else
+		rte_distributor_process_burst(db, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process(d, NULL, 0);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, NULL, 0);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, NULL, 0);
+		rte_distributor_flush_burst(db);
+	}
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -506,7 +760,9 @@ static int
 test_distributor(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
@@ -525,6 +781,19 @@ test_distributor(void)
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_dist_burst",
+				rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -536,31 +805,45 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	worker_params.d = d;
+	worker_params.db = db;
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	for (i = 0; i < DIST_NUM_TYPES; i++) {
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
-			goto err;
-		quit_workers(d, p);
+		worker_params.dist_type = i;
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
+			goto err;
+		quit_workers(&worker_params, p);
+
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
@@ -572,7 +855,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v5 4/6] test: add distributor perf autotest
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
                               ` (2 preceding siblings ...)
  2017-01-20  9:18             ` [PATCH v5 3/6] test: unit tests for new distributor burst API David Hunt
@ 2017-01-20  9:18             ` David Hunt
  2017-01-20  9:18             ` [PATCH v5 5/6] examples/distributor_app: showing burst API David Hunt
  2017-01-20  9:18             ` [PATCH v5 6/6] doc: distributor library changes for new " David Hunt
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-20  9:18 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor_perf.c | 150 +++++++++++++++++++++++++++++++++++----
 1 file changed, 138 insertions(+), 12 deletions(-)

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..9132010 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,9 +40,11 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +56,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +72,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +90,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +102,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +128,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * this is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -144,7 +150,37 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * this is the basic worker function for performance tests.
+ * it does nothing but return packets and count them.
+ */
+static int
+handle_work_burst(void *arg)
+{
+	struct rte_distributor_burst *d = arg;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	while (!quit) {
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt_burst(d, id, buf, num);
+	return 0;
+}
+
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
@@ -174,6 +210,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
+	rte_distributor_clear_returns(d);
+
 	printf("=== Performance test of distributor ===\n");
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
@@ -190,6 +228,55 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 	return 0;
 }
 
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
+ * to the distributor and verifies at the end that we got them all in the worker
+ * threads and finally how long per packet the processing took.
+ */
+static inline int
+perf_test_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	unsigned int i;
+	uint64_t start, end;
+	struct rte_mbuf *bufs[BURST];
+
+	clear_packet_count();
+	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
+		printf("Error getting mbufs from pool\n");
+		return -1;
+	}
+	/* ensure we have different hash value for each pkt */
+	for (i = 0; i < BURST; i++)
+		bufs[i]->hash.usr = i;
+
+	start = rte_rdtsc();
+	for (i = 0; i < (1<<ITER_POWER); i++)
+		rte_distributor_process_burst(d, bufs, BURST);
+	end = rte_rdtsc();
+
+	do {
+		usleep(100);
+		rte_distributor_process_burst(d, NULL, 0);
+	} while (total_packet_count() < (BURST << ITER_POWER));
+
+	rte_distributor_clear_returns_burst(d);
+
+	printf("=== Performance test of burst distributor ===\n");
+	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
+	printf("Time per packet: %"PRIu64"\n\n",
+			((end - start) >> ITER_POWER)/BURST);
+	rte_mempool_put_bulk(p, (void *)bufs, BURST);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+	printf("Total packets: %u (%x)\n", total_packet_count(),
+			total_packet_count());
+	printf("=== Perf test done ===\n\n");
+
+	return 0;
+}
+
 /* Useful function which ensures that all worker functions terminate */
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
@@ -212,10 +299,34 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	worker_idx = 0;
 }
 
+/* Useful function which ensures that all worker functions terminate */
+static void
+quit_workers_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
+	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
+	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
+
+	quit = 1;
+	for (i = 0; i < num_workers; i++)
+		bufs[i]->hash.usr = i << 1;
+	rte_distributor_process_burst(d, bufs, num_workers);
+
+	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+
+	rte_distributor_process_burst(d, NULL, 0);
+	rte_eal_mp_wait_lcore();
+	quit = 0;
+	worker_idx = 0;
+}
+
 static int
 test_distributor_perf(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -234,10 +345,20 @@ test_distributor_perf(void)
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -254,6 +375,11 @@ test_distributor_perf(void)
 		return -1;
 	quit_workers(d, p);
 
+	rte_eal_mp_remote_launch(handle_work_burst, db, SKIP_MASTER);
+	if (perf_test_burst(db, p) < 0)
+		return -1;
+	quit_workers_burst(db, p);
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v5 5/6] examples/distributor_app: showing burst API
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
                               ` (3 preceding siblings ...)
  2017-01-20  9:18             ` [PATCH v5 4/6] test: add distributor perf autotest David Hunt
@ 2017-01-20  9:18             ` David Hunt
  2017-01-23 12:31               ` Bruce Richardson
  2017-01-20  9:18             ` [PATCH v5 6/6] doc: distributor library changes for new " David Hunt
  5 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-01-20  9:18 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 509 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 391 insertions(+), 118 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..4c134d5 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,6 +30,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#define BURST_API 1
+
 #include <stdint.h>
 #include <inttypes.h>
 #include <unistd.h>
@@ -43,39 +44,87 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
+#if BURST_API
+#include <rte_distributor_burst.h>
+#else
 #include <rte_distributor.h>
+#endif
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
+#define MBUF_CACHE_SIZE 128
+#if BURST_API
+#define BURST_SIZE 64
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
+#else
 #define BURST_SIZE 32
-#define RTE_RING_SZ 1024
+#define SCHED_RX_RING_SZ 1024
+#define SCHED_TX_RING_SZ 1024
+#endif
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
 		.max_rx_pkt_len = ETHER_MAX_LEN,
+		.split_hdr_size = 0,
+		.header_split   = 0, /**< Header Split disabled */
+		.hw_ip_checksum = 1, /**< IP checksum offload enabled */
+		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
+		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
+		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
 	},
 	.txmode = {
 		.mq_mode = ETH_MQ_TX_NONE,
@@ -93,6 +142,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -101,9 +152,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -113,7 +168,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -121,7 +176,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
@@ -134,7 +189,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
@@ -160,41 +216,52 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 struct lcore_params {
 	unsigned worker_id;
-	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_distributor_burst *d;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
-static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+static inline void
+flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
+	if (unlikely(nb_tx < outbuf->count)) {
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
+		do {
+			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
+		} while (++nb_tx < outbuf->count);
 	}
+	outbuf->count = 0;
+}
+
+static inline void
+flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+{
+	uint8_t outp;
 
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+	for (outp = 0; outp < nb_ports; outp++) {
+		/* skip ports that are not enabled */
+		if ((enabled_port_mask & (1 << outp)) == 0)
+			continue;
 
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+		if (tx_buffers[outp].count == 0)
+			continue;
 
-	return 0;
+		flush_one_port(&tx_buffers[outp], outp);
+	}
 }
 
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -210,6 +277,7 @@ lcore_rx(struct lcore_params *p)
 
 	printf("\nCore %u doing packet RX.\n", rte_lcore_id());
 	port = 0;
+
 	while (!quit_signal_rx) {
 
 		/* skip ports that are not enabled */
@@ -218,7 +286,7 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
+
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -228,19 +296,46 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+
+#if BURST_API
+	rte_distributor_process_burst(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts_burst(d,
+			bufs, BURST_SIZE*2);
+#else
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+#endif
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
 				port = 0;
 			continue;
 		}
-
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		* Swap the following two lines if you want the rx traffic
+		* to go directly to tx, no distribution.
+		*/
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -249,56 +344,89 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
 
-	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
-		do {
-			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
-		} while (++nb_tx < outbuf->count);
-	}
-	outbuf->count = 0;
-}
 
-static inline void
-flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+static int
+lcore_distributor(struct lcore_params *p)
 {
-	uint8_t outp;
-	for (outp = 0; outp < nb_ports; outp++) {
-		/* skip ports that are not enabled */
-		if ((enabled_port_mask & (1 << outp)) == 0)
-			continue;
-
-		if (tx_buffers[outp].count == 0)
-			continue;
-
-		flush_one_port(&tx_buffers[outp], outp);
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor_burst *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+/*
+ * This '#if' allows you to bypass the distributor. Incoming packets may be
+ * sent straight to the tx ring.
+ */
+#if 1
+
+#if BURST_API
+			/* Distribute the packets */
+			rte_distributor_process_burst(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts_burst(d,
+					bufs, BURST_SIZE*2);
+#else
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+#endif
+
+#else
+			/* Bypass the distributor */
+			const unsigned int xor_val = (rte_eth_dev_count() > 1);
+			/* Touch the mbuf by xor'ing the port */
+			for (unsigned int i = 0; i < nb_rx; i++)
+				bufs[i]->port ^= xor_val;
+
+			const uint16_t nb_ret = nb_rx;
+#endif
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
 	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+#if BURST_API
+	rte_distributor_flush_burst(d);
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns_burst(d);
+#endif
+	quit_signal_rx = 1;
+	return 0;
 }
 
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -327,9 +455,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -358,11 +486,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -371,52 +500,147 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++)
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
 static int
 lcore_worker(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
+	struct rte_distributor_burst *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+	while (!quit_signal_work) {
+
+#if BURST_API
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = __rdtsc()+100;
+
+			while (__rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
+#else
+		buf[0] = rte_distributor_get_pkt(d, id, buf[0]);
+		uint64_t t = __rdtsc() + 10;
+
+		while (__rdtsc() < t)
+			rte_pause();
+		buf[0]->port ^= xor_val;
+#endif
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
+	printf("\nCore %u exiting worker task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -496,12 +720,14 @@ int
 main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
-	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_distributor_burst *d;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -518,10 +744,12 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -560,41 +788,86 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
+#if BURST_API
+	d = rte_distributor_create_burst("PKT_DIST", rte_socket_id(),
+			rte_lcore_count() - 4);
+#else
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 4);
+#endif
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+					rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+					(lcore_function_t *)lcore_distributor,
+					p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
+		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
 
-	if (lcore_rx(&p) != 0)
-		return -1;
+	freq = rte_get_timer_hz();
+	t = __rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < __rdtsc()) {
+			print_stats();
+			t = __rdtsc() + freq;
+		}
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v5 6/6] doc: distributor library changes for new burst API
  2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
                               ` (4 preceding siblings ...)
  2017-01-20  9:18             ` [PATCH v5 5/6] examples/distributor_app: showing burst API David Hunt
@ 2017-01-20  9:18             ` David Hunt
  5 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-20  9:18 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..dffd4ad 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,10 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two versions of the API in the distributor library: one which sends one packet at a time to workers,
+and another which sends bursts of up to 8 packets at a time to workers. The function names of the second API
+are identified by a "_burst" suffix, and must not be intermixed with the single-packet API. The operations described
+below apply to both APIs; select which API you wish to use by including the relevant header file.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v6 0/6] distributor library performance enhancements
  2017-01-20  9:18             ` [PATCH v5 1/6] lib: distributor " David Hunt
@ 2017-01-23  9:24               ` David Hunt
  2017-01-23  9:24                 ` [PATCH v6 1/6] lib: distributor " David Hunt
                                   ` (7 more replies)
  2017-01-23 12:26               ` [PATCH v5 1/6] lib: distributor " Bruce Richardson
  1 sibling, 8 replies; 202+ messages in thread
From: David Hunt @ 2017-01-23  9:24 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
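
To picture that exchange, a rough sketch of the per-worker buffer is shown
below. This is an illustration only: bufptr64, retptr64 and
RTE_DIST_BURST_SIZE are the names used in patch 1/6 of this series, while
the struct name, the count field and the exact padding here are assumptions
rather than the library's private definition.

#include <stdint.h>
#include <rte_memory.h>   /* __rte_cache_aligned */

/*
 * Sketch only: approximates the per-worker exchange buffer. Each 64-bit
 * slot carries an mbuf pointer shifted up by RTE_DISTRIB_FLAG_BITS, with
 * the low bits used as handshake flags (GET_BUF / RETURN_BUF / VALID_BUF).
 */
#define SKETCH_BURST_SIZE 8   /* RTE_DIST_BURST_SIZE in the patch */

struct dist_buffer_sketch {
	/* distributor -> worker: one 64-byte cache line, up to 8 packets */
	volatile int64_t bufptr64[SKETCH_BURST_SIZE] __rte_cache_aligned;
	/* worker -> distributor: returned packets use a second cache line */
	volatile int64_t retptr64[SKETCH_BURST_SIZE] __rte_cache_aligned;
	uint16_t count;   /* number of packets currently outstanding */
} __rte_cache_aligned;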

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
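
A minimal sketch of that run-time selection is below; only d->dist_match_fn
and find_match_scalar() are real names from patch 1/6, the enum and helper
names here are purely illustrative.

#include <rte_cpuflags.h>

/* DIST_MATCH_* are illustrative names, not the library's private enum */
enum dist_match_sketch { DIST_MATCH_SCALAR = 0, DIST_MATCH_VECTOR };

static enum dist_match_sketch
choose_match_fn_sketch(void)
{
#ifdef RTE_ARCH_X86
	/* use the SSE2 flow match only where the cpu flag is present */
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		return DIST_MATCH_VECTOR;
#endif
	/* non-x86 platforms (and old x86) fall back to the scalar match */
	return DIST_MATCH_SCALAR;
}

At distributor creation the chosen value would be stored in d->dist_match_fn,
which rte_distributor_process_burst() later switches on when picking
find_match_scalar() or the vector equivalent.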

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v4 changes:
   * fixed issue building shared libraries

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   For performance in matching, Flow IDs are 15 bits
   Original API (and code) is kept for backward compatibility

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[PATCH v6 1/6] lib: distributor performance enhancements
[PATCH v6 2/6] lib: add distributor vector flow matching
[PATCH v6 3/6] test: unit tests for new distributor burst API
[PATCH v6 4/6] test: add distributor perf autotest
[PATCH v6 5/6] examples/distributor_app: showing burst API
[PATCH v6 6/6] doc: distributor library changes for new burst API

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v6 1/6] lib: distributor performance enhancements
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
@ 2017-01-23  9:24                 ` David Hunt
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
  2017-01-23  9:24                 ` [PATCH v6 2/6] lib: add distributor vector flow matching David Hunt
                                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-01-23  9:24 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now sends bursts of up to 8 mbufs to each worker, and tracks
the in-flight flow-ids (atomic scheduling)

New file with a new API, similar to the old API except with _burst
at the end of the function names. This is to preserve the original
API (and code) for backward compatibility.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it
makes use of the 7 free spaces in the same cache line in order to send
up to 8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps
an array of inflight flows and an array of backlog flows, and matches
incoming flows to the inflight/backlog flows of all workers so
that flow pinning to workers can be maintained.
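
For reference, a worker written against the new burst API follows the
pattern below. This is a sketch modelled on the unit and perf test workers
later in the series; the quit flag and the per-packet work are
application-defined placeholders.

#include <stdio.h>
#include <rte_mbuf.h>
#include <rte_distributor_burst.h>

static volatile int quit;	/* placeholder: application shutdown flag */

static int
worker_loop_sketch(struct rte_distributor_burst *d, unsigned int worker_id)
{
	struct rte_mbuf *buf[8] __rte_cache_aligned = { NULL };
	unsigned int num = 0, count = 0;

	while (!quit) {
		/* hand back the previous burst, receive up to 8 new mbufs;
		 * the same array can be used for both, as in the tests */
		num = rte_distributor_get_pkt_burst(d, worker_id, buf, buf, num);
		count += num;	/* real per-packet work would go here */
	}
	/* give back whatever is still held before exiting */
	rte_distributor_return_pkt_burst(d, worker_id, buf, num);
	printf("worker %u handled %u packets\n", worker_id, count);
	return 0;
}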

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   2 +
 lib/librte_distributor/rte_distributor.c           |  74 +--
 lib/librte_distributor/rte_distributor.h           |   2 +-
 lib/librte_distributor/rte_distributor_burst.c     | 564 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_burst.h     | 257 ++++++++++
 lib/librte_distributor/rte_distributor_private.h   | 189 +++++++
 lib/librte_distributor/rte_distributor_version.map |  14 +
 7 files changed, 1029 insertions(+), 73 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_burst.c
 create mode 100644 lib/librte_distributor/rte_distributor_burst.h
 create mode 100644 lib/librte_distributor/rte_distributor_private.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..2acc54d 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,9 +43,11 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_burst.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index f3f778c..1bcee4c 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,79 +40,9 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
+#include "rte_distributor_private.h"
 #include "rte_distributor.h"
 
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..7281491 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
new file mode 100644
index 0000000..9315c12
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -0,0 +1,564 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor_burst.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_burst);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF  to signal to distributor that cache
+	 * line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return -1;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * so now we've got the contents of the cacheline into an  array of
+	 * mbuf pointers, so toggle the bit so scheduler can start working
+	 * on the next cacheline while we're working.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	int count;
+
+	rte_distributor_request_pkt_burst(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	while (count == -1) {
+		uint64_t t = rte_rdtsc() + 100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_burst(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_burst *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_burst *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+/*
+ * Match the flow_ids (tags) of the incoming packets to the flow_ids
+ * of the inflight packets (both inflight on the workers and in each worker
+ * backlog). This will then allow us to pin those packets to the relevant
+ * workers to give us our atomic flow pinning.
+ */
+static inline void
+find_match_scalar(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID on which the
+	 * corresponding flow is pinned to.
+	 */
+}
+
+
+/*
+ * When the handshake bits indicate that there are packets coming
+ * back from the worker, this function is called to copy and store
+ * the valid returned pointers (store_return).
+ */
+static unsigned int
+handle_returns(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+/*
+ * This function releases a burst (cache line) to a worker.
+ * It is called from the process function when a cacheline is
+ * full to make room for more packets for that worker, or when
+ * all packets have been assigned to bursts and need to be flushed
+ * to the workers.
+ * It also needs to wait for any outstanding packets from the worker
+ * before sending out new packets.
+ */
+static unsigned int
+release(struct rte_distributor_burst *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_burst *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, j, w, wid;
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		unsigned int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (i = 0; i < pkts; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+		for (; i < RTE_DIST_BURST_SIZE; i++)
+			flows[i] = 0;
+
+		switch (d->dist_match_fn) {
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
+
+		/*
+		 * Matches array now contain the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process().
+			 * User-defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_burst *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_burst(d, NULL, 0);
+
+	/*
+	 * Send empty burst to all workers to allow them to exit
+	 * gracefully, should they need to.
+	 */
+	rte_distributor_process_burst(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d)
+{
+	unsigned int wkr;
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers)
+{
+	struct rte_distributor_burst *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
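
For reference, below is a minimal sketch of how a distributor lcore might drive this new burst API end to end: create the instance, feed it bursts of mbufs, drain the completed packets, and flush on exit. It is not part of the patch; get_rx_burst() and quit_signal are hypothetical application-side stand-ins, and error handling is omitted.

#include <rte_lcore.h>
#include <rte_mbuf.h>
#include "rte_distributor_burst.h"

#define RX_BURST 32

extern uint16_t get_rx_burst(struct rte_mbuf **pkts, uint16_t max); /* hypothetical */
extern volatile int quit_signal;                                    /* hypothetical */

static void
distributor_lcore(void)
{
	struct rte_mbuf *pkts[RX_BURST];
	struct rte_mbuf *done[RX_BURST];
	struct rte_distributor_burst *d;

	d = rte_distributor_create_burst("burst_dist", rte_socket_id(),
			rte_lcore_count() - 1);
	if (d == NULL)
		return;

	while (!quit_signal) {
		const uint16_t nb_rx = get_rx_burst(pkts, RX_BURST);

		/* hand the burst to workers; flows stay pinned per worker */
		rte_distributor_process_burst(d, pkts, nb_rx);

		/* collect packets the workers have finished with */
		const int nb_done = rte_distributor_returned_pkts_burst(d,
				done, RX_BURST);
		/* ... transmit or free the nb_done packets in done[] ... */
		(void)nb_done;
	}

	/* push out any partially filled bursts before exiting */
	rte_distributor_flush_burst(d);
}

Note that processing a zero-length burst, as rte_distributor_flush_burst() does internally, is what pushes partially filled cache lines out to idle workers.
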
diff --git a/lib/librte_distributor/rte_distributor_burst.h b/lib/librte_distributor/rte_distributor_burst.h
new file mode 100644
index 0000000..b0b41ec
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_burst.h
@@ -0,0 +1,257 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_BURST_H_
+#define _RTE_DIST_BURST_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * to workers in bursts of up to 8, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor_burst;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_burst *
+rte_distributor_create_burst(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this function.
+ * If the user does not set the tag, its value is undefined and depends on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_burst(struct rte_distributor_burst *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_burst(struct rte_distributor_burst *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts_burst() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process_burst()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_burst(struct rte_distributor_burst *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request new packets to process.
+ * Any previous packets given to the worker are assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt_burst(). It does not wait
+ * for the new packets to be available, but returns zero if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_burst(struct rte_distributor_burst *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
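
As a rough illustration of the worker-side calls documented above, here is a minimal sketch of a worker loop (not part of the patch; quit_signal is a hypothetical application flag and error handling is omitted). The worker hands back the packets of its previous iteration while asking for up to 8 new ones:

#include <rte_mbuf.h>
#include "rte_distributor_burst.h"

extern volatile int quit_signal;	/* hypothetical application flag */

static int
worker_lcore(struct rte_distributor_burst *d, unsigned int worker_id)
{
	struct rte_mbuf *pkts[8] __rte_cache_aligned;
	unsigned int num = 0;
	unsigned int i;

	while (!quit_signal) {
		/* return the previous burst, block for up to 8 new packets */
		num = rte_distributor_get_pkt_burst(d, worker_id, pkts, pkts, num);
		for (i = 0; i < num; i++) {
			/* ... process pkts[i] ... */
		}
	}

	/* on shutdown, return whatever is still held, requesting nothing new */
	return rte_distributor_return_pkt_burst(d, worker_id, pkts, num);
}

Where blocking is undesirable, rte_distributor_request_pkt_burst() followed by polling with rte_distributor_poll_pkt_burst() can be used in place of the combined get call.
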
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
new file mode 100644
index 0000000..ae70a98
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -0,0 +1,189 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
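
To make the packing concrete, the following is a small sketch (not part of the patch) of how one 64-bit slot carries both an mbuf pointer and the handshake flags defined above; pack_bufptr()/unpack_bufptr() are hypothetical helper names, but the shifts mirror what the burst code does when filling bufptr64[]/retptr64[] and when reading returns back out:

#include <stdint.h>

struct rte_mbuf;	/* forward declaration, for the sketch only */

static inline int64_t
pack_bufptr(struct rte_mbuf *mbuf, int64_t flags)
{
	/* shift the pointer up to leave the bottom RTE_DISTRIB_FLAG_BITS free */
	return (((int64_t)(uintptr_t)mbuf) << RTE_DISTRIB_FLAG_BITS) | flags;
}

static inline struct rte_mbuf *
unpack_bufptr(int64_t bufptr64)
{
	/* arithmetic right shift drops the flags and sign-extends the pointer */
	return (struct rte_mbuf *)(uintptr_t)(bufptr64 >> RTE_DISTRIB_FLAG_BITS);
}
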
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be careful when increasing this limit, because it is constrained by how we
+ * track in-flight tags. See in_flight_bitmask and rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64 bits of the memory are actually used, though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
+
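
As a quick sanity check on that sizing (a sketch only, assuming a 128-bit rte_xmm_t and 64-byte cache lines; dist_burst_size_check() is just an illustrative name):

static inline void
dist_burst_size_check(void)
{
	/* 16 bytes of rte_xmm_t / 2-byte tags = 8 packets per burst, and
	 * 8 x 8-byte slots then fill exactly one 64-byte cache line */
	RTE_BUILD_BUG_ON(RTE_DIST_BURST_SIZE != 8);
	RTE_BUILD_BUG_ON(RTE_DIST_BURST_SIZE * sizeof(int64_t) != 64);
}
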
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_burst {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_NUM_MATCH_FNS
+};
+
+struct rte_distributor_burst {
+	TAILQ_ENTRY(rte_distributor_burst) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	/**
+	 * The first cache line in this array holds the tags in flight
+	 * on the worker core. The second cache line holds the backlog
+	 * of tags that are about to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_burst bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..eabcaf5 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.02 {
+	global:
+
+	rte_distributor_clear_returns_burst;
+	rte_distributor_create_burst;
+	rte_distributor_flush_burst;
+	rte_distributor_get_pkt_burst;
+	rte_distributor_poll_pkt_burst;
+	rte_distributor_process_burst;
+	rte_distributor_request_pkt_burst;
+	rte_distributor_return_pkt_burst;
+	rte_distributor_returned_pkts_burst;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v6 2/6] lib: add distributor vector flow matching
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
  2017-01-23  9:24                 ` [PATCH v6 1/6] lib: distributor " David Hunt
@ 2017-01-23  9:24                 ` David Hunt
  2017-01-23  9:24                 ` [PATCH v6 3/6] test: unit tests for new distributor burst API David Hunt
                                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-23  9:24 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   7 ++
 lib/librte_distributor/rte_distributor_burst.c     |  12 ++-
 .../rte_distributor_match_generic.c                |  43 ++++++++
 lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h   |  13 +++
 5 files changed, 186 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2acc54d..4baaa0c 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,13 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_burst.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+CFLAGS_rte_distributor_match_sse.o += -msse4.2
+else
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor_burst.c b/lib/librte_distributor/rte_distributor_burst.c
index 9315c12..c000b8d 100644
--- a/lib/librte_distributor/rte_distributor_burst.c
+++ b/lib/librte_distributor/rte_distributor_burst.c
@@ -190,7 +190,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor_burst *d,
  * backlog). This will then allow us to pin those packets to the relevant
  * workers to give us our atomic flow pinning.
  */
-static inline void
+void
 find_match_scalar(struct rte_distributor_burst *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
@@ -357,6 +357,9 @@ rte_distributor_process_burst(struct rte_distributor_burst *d,
 			flows[i] = 0;
 
 		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
 		default:
 			find_match_scalar(d, &flows[0], &matches[0]);
 		}
@@ -544,7 +547,12 @@ rte_distributor_create_burst(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	} else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backlog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
new file mode 100644
index 0000000..a523f08
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor_burst.h"
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	find_match_scalar(d, data_ptr, output_ptr);
+}
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..1d97799
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor_burst.h"
+#include "smmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 1. Load the incoming flow ids into an xmm register
+	 * 2. Loop through all worker IDs
+	 *  2a. Load that worker's inflights and backlog into xmm regs
+	 *  2b. Use cmpestrm to intersect the flow ids with both
+	 *  2c. Add any matches to the output
+	 * 3. Write the output xmm (matching worker ids).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the output 128-bit register contains 8 16-bit values,
+	 * with each non-zero value containing the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
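
For anyone unfamiliar with the string-compare intrinsic, the following small standalone program (not part of the patch; build with e.g. gcc -msse4.2) reproduces the masking example from the comment above:

#include <stdio.h>
#include <stdint.h>
#include <smmintrin.h>

int main(void)
{
	uint16_t incoming[8] = {1, 2, 3, 4, 5, 6, 7, 8};
	uint16_t inflight[8] = {3, 5, 7, 0, 0, 0, 0, 0};
	uint16_t out[8];
	int i;
	__m128i in = _mm_loadu_si128((const __m128i *)incoming);
	__m128i fl = _mm_loadu_si128((const __m128i *)inflight);

	/* 0xffff in each word of incoming that appears anywhere in inflight */
	__m128i mask = _mm_cmpestrm(fl, 8, in, 8,
			_SIDD_UWORD_OPS |
			_SIDD_CMP_EQUAL_ANY |
			_SIDD_UNIT_MASK);

	_mm_storeu_si128((__m128i *)out, mask);
	for (i = 0; i < 8; i++)
		printf("incoming %u -> 0x%04x\n",
				(unsigned)incoming[i], (unsigned)out[i]);
	/* prints 0xffff for 3, 5 and 7, zero otherwise. Because all 8 lanes
	 * of inflight count as valid, a zero flow id would also match the
	 * empty slots - which is why the library forces tags non-zero (| 1). */
	return 0;
}
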
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index ae70a98..1d73d92 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -33,6 +33,8 @@
 #ifndef _RTE_DIST_PRIV_H_
 #define _RTE_DIST_PRIV_H_
 
+#include <rte_vect.h>
+
 /**
  * @file
  * RTE distributor
@@ -155,6 +157,7 @@ struct rte_distributor {
 /* All different signature compare functions */
 enum rte_distributor_match_function {
 	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
 	RTE_DIST_NUM_MATCH_FNS
 };
 
@@ -182,6 +185,16 @@ struct rte_distributor_burst {
 	enum rte_distributor_match_function dist_match_fn;
 };
 
+void
+find_match_scalar(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
+void
+find_match_vec(struct rte_distributor_burst *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v6 3/6] test: unit tests for new distributor burst API
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
  2017-01-23  9:24                 ` [PATCH v6 1/6] lib: distributor " David Hunt
  2017-01-23  9:24                 ` [PATCH v6 2/6] lib: add distributor vector flow matching David Hunt
@ 2017-01-23  9:24                 ` David Hunt
  2017-01-23  9:24                 ` [PATCH v6 4/6] test: add distributor perf autotest David Hunt
                                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-23  9:24 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 501 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 392 insertions(+), 109 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..3871f86 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,11 +40,24 @@
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
 #define BIG_BATCH 1024
 
+#define DIST_SINGLE 0
+#define DIST_BURST  1
+#define DIST_NUM_TYPES 2
+
+struct worker_params {
+	struct rte_distributor *d;
+	struct rte_distributor_burst *db;
+	int dist_type;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,17 +94,36 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			num = rte_distributor_get_pkt_burst(db, id,
+					buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -107,12 +139,21 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i;
+	unsigned int retries;
+	unsigned int count = 0;
+
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Basic distributor sanity tests (single) ===\n");
+	else
+		printf("=== Basic distributor sanity tests (burst) ===\n");
 
-	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
@@ -124,8 +165,21 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
+
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -146,8 +200,18 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_process(d, bufs, BURST);
+			rte_distributor_flush(d);
+		} else {
+			rte_distributor_process_burst(db, bufs, BURST);
+			count = 0;
+			do {
+				rte_distributor_flush_burst(db);
+				count += rte_distributor_returned_pkts_burst(db,
+						returns, BURST*2);
+			} while (count < BURST);
+		}
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -155,24 +219,32 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 			return -1;
 		}
 
+
 		for (i = 0; i < rte_lcore_count() - 1; i++)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
+		bufs[i]->hash.usr = i+1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush_burst(db);
+			count += rte_distributor_returned_pkts_burst(db,
+					returns, BURST*2);
+		} while (count < BURST);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -194,8 +266,15 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_flush(d);
+		rte_distributor_clear_returns(d);
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -203,28 +282,59 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
-	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		printf("===testing single big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process(d, &many_bufs[i*BURST], BURST);
+			num_returned += rte_distributor_returned_pkts(d,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+		}
+		rte_distributor_flush(d);
 		num_returned += rte_distributor_returned_pkts(d,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+	} else {
+		printf("===testing burst big burst===\n");
+		for (i = 0; i < BIG_BATCH/BURST; i++) {
+			rte_distributor_process_burst(db,
+					&many_bufs[i*BURST], BURST);
+			count = rte_distributor_returned_pkts_burst(db,
+					&return_bufs[num_returned],
+					BIG_BATCH - num_returned);
+			num_returned += count;
+		}
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	retries = 0;
+	do {
+		rte_distributor_flush_burst(db);
+		count = rte_distributor_returned_pkts_burst(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
+
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d\n",
+				__LINE__, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -234,7 +344,6 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	printf("Sanity test of returned packets done\n");
 
 	rte_mempool_put_bulk(p, (void *)many_bufs, BIG_BATCH);
-
 	printf("\n");
 	return 0;
 }
@@ -249,18 +358,40 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
-	while (!quit) {
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE) {
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+		while (!quit) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, pkt);
+		}
 		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		for (i = 0; i < 8; i++)
+			buf[i] = NULL;
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
+		while (!quit) {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
+		worker_stats[id].handled_packets += num;
+		count += num;
+		rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 	return 0;
 }
 
@@ -270,26 +401,45 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Sanity test with mbuf alloc/free (single) ===\n");
+	else
+		printf("=== Sanity test with mbuf alloc/free (burst)  ===\n");
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
-		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
-			rte_distributor_process(d, NULL, 0);
+		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0) {
+			if (wp->dist_type == DIST_SINGLE)
+				rte_distributor_process(d, NULL, 0);
+			else
+				rte_distributor_process_burst(db, NULL, 0);
+		}
 		for (j = 0; j < BURST; j++) {
 			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
-		rte_distributor_process(d, bufs, BURST);
+		if (wp->dist_type == DIST_SINGLE)
+			rte_distributor_process(d, bufs, BURST);
+		else
+			rte_distributor_process_burst(db, bufs, BURST);
 	}
 
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -305,20 +455,48 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	if (wp->dist_type == DIST_SINGLE)
+		pkt = rte_distributor_get_pkt(d, id, NULL);
+	else
+		num = rte_distributor_get_pkt_burst(db, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			worker_stats[id].handled_packets++, count++;
+			rte_pktmbuf_free(pkt);
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = 1;
+			total += num;
+		} else {
+			worker_stats[id].handled_packets += num;
+			count += num;
+			for (i = 0; i < num; i++)
+				rte_pktmbuf_free(buf[i]);
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+			total += num;
+		}
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_return_pkt(d, id, pkt);
+	} else {
+		returned = rte_distributor_return_pkt_burst(db, id, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -326,13 +504,29 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		if (wp->dist_type == DIST_SINGLE) {
+			pkt = rte_distributor_get_pkt(d, id, NULL);
+		} else {
+			num = rte_distributor_get_pkt_burst(db,
+					id, buf, buf, num);
+		}
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			if (wp->dist_type == DIST_SINGLE) {
+				pkt = rte_distributor_get_pkt(d, id, NULL);
+			} else {
+				num = rte_distributor_get_pkt_burst(db,
+						id, buf, buf, num);
+			}
+		}
+		if (wp->dist_type == DIST_SINGLE) {
+			rte_distributor_return_pkt(d, id, pkt);
+		} else {
+			returned = rte_distributor_return_pkt_burst(db,
+					id, buf, num);
+			printf("Num returned = %d\n", returned);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
 	}
 	return 0;
 }
@@ -344,26 +538,37 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to the same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
+
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		rte_distributor_flush_burst(db);
+	}
 
-	rte_distributor_process(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -374,14 +579,25 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, bufs, BURST);
+		/* flush the distributor */
+		rte_distributor_flush_burst(db);
+	}
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
 
-	/* flush the distributor */
-	rte_distributor_flush(d);
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -389,10 +605,6 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -401,13 +613,18 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	if (wp->dist_type == DIST_SINGLE)
+		printf("=== Test flush fn with worker shutdown (single) ===\n");
+	else
+		printf("=== Test flush fn with worker shutdown (burst) ===\n");
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -420,7 +637,11 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, BURST);
+	else
+		rte_distributor_process_burst(db, bufs, BURST);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -429,9 +650,18 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	zero_quit = 1;
 
 	/* flush the distributor */
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_flush(d);
+	else
+		rte_distributor_flush_burst(db);
+
+	rte_delay_us(10000);
 
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -439,10 +669,6 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -451,6 +677,7 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
@@ -460,6 +687,13 @@ int test_error_distributor_create_name(void)
 		return -1;
 	}
 
+	db = rte_distributor_create_burst(name, rte_socket_id(),
+			rte_lcore_count() - 1);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -468,20 +702,32 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor_burst *db = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create_burst("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create_burst() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->d;
+	struct rte_distributor_burst *db = wp->db;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -491,12 +737,20 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+	if (wp->dist_type == DIST_SINGLE)
+		rte_distributor_process(d, bufs, num_workers);
+	else
+		rte_distributor_process_burst(db, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process(d, NULL, 0);
-	rte_distributor_flush(d);
+	if (wp->dist_type == DIST_SINGLE) {
+		rte_distributor_process(d, NULL, 0);
+		rte_distributor_flush(d);
+	} else {
+		rte_distributor_process_burst(db, NULL, 0);
+		rte_distributor_flush_burst(db);
+	}
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -506,7 +760,9 @@ static int
 test_distributor(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
@@ -525,6 +781,19 @@ test_distributor(void)
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_dist_burst",
+				rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush_burst(db);
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -536,31 +805,45 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	worker_params.d = d;
+	worker_params.db = db;
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
-		goto err;
-	quit_workers(d, p);
+	for (i = 0; i < DIST_NUM_TYPES; i++) {
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
-			goto err;
-		quit_workers(d, p);
+		worker_params.dist_type = i;
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
+			goto err;
+		quit_workers(&worker_params, p);
+
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
@@ -572,7 +855,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
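
The test code above drives both APIs through a small worker_params
wrapper whose definition falls outside the quoted hunks. A
reconstruction consistent with how it is used here (member names taken
from the diff; the enum is an assumption) might look like this:

/* Reconstructed for illustration only; the exact definition is in a
 * hunk of the patch that is not quoted above.
 */
enum dist_api_type {
	DIST_SINGLE = 0,	/* legacy packet-at-a-time API */
	DIST_BURST,		/* new burst API */
	DIST_NUM_TYPES
};

struct worker_params {
	struct rte_distributor *d;		/* legacy distributor handle */
	struct rte_distributor_burst *db;	/* burst distributor handle */
	enum dist_api_type dist_type;		/* which API the workers use */
};

static struct worker_params worker_params;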

* [PATCH v6 4/6] test: add distributor perf autotest
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
                                   ` (2 preceding siblings ...)
  2017-01-23  9:24                 ` [PATCH v6 3/6] test: unit tests for new distributor burst API David Hunt
@ 2017-01-23  9:24                 ` David Hunt
  2017-01-23  9:24                 ` [PATCH v6 5/6] examples/distributor_app: showing burst API David Hunt
                                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-23  9:24 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor_perf.c | 150 +++++++++++++++++++++++++++++++++++----
 1 file changed, 138 insertions(+), 12 deletions(-)

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..9132010 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -40,9 +40,11 @@
 #include <rte_common.h>
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
+#include <rte_distributor_burst.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations for cache line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +56,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +72,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +90,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +102,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +128,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * this is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -144,7 +150,37 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * this is the basic worker function for performance tests.
+ * it does nothing but return packets and count them.
+ */
+static int
+handle_work_burst(void *arg)
+{
+	struct rte_distributor_burst *d = arg;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	while (!quit) {
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+	}
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt_burst(d, id, buf, num);
+	return 0;
+}
+
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
@@ -174,6 +210,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
+	rte_distributor_clear_returns(d);
+
 	printf("=== Performance test of distributor ===\n");
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
@@ -190,6 +228,55 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 	return 0;
 }
 
+/*
+ * this basic performance test just repeatedly sends in BURST packets at a time
+ * to the distributor and verifies at the end that we got them all in the worker
+ * threads and finally how long per packet the processing took.
+ */
+static inline int
+perf_test_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	unsigned int i;
+	uint64_t start, end;
+	struct rte_mbuf *bufs[BURST];
+
+	clear_packet_count();
+	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
+		printf("Error getting mbufs from pool\n");
+		return -1;
+	}
+	/* ensure we have different hash value for each pkt */
+	for (i = 0; i < BURST; i++)
+		bufs[i]->hash.usr = i;
+
+	start = rte_rdtsc();
+	for (i = 0; i < (1<<ITER_POWER); i++)
+		rte_distributor_process_burst(d, bufs, BURST);
+	end = rte_rdtsc();
+
+	do {
+		usleep(100);
+		rte_distributor_process_burst(d, NULL, 0);
+	} while (total_packet_count() < (BURST << ITER_POWER));
+
+	rte_distributor_clear_returns_burst(d);
+
+	printf("=== Performance test of burst distributor ===\n");
+	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
+	printf("Time per packet: %"PRIu64"\n\n",
+			((end - start) >> ITER_POWER)/BURST);
+	rte_mempool_put_bulk(p, (void *)bufs, BURST);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+	printf("Total packets: %u (%x)\n", total_packet_count(),
+			total_packet_count());
+	printf("=== Perf test done ===\n\n");
+
+	return 0;
+}
+
 /* Useful function which ensures that all worker functions terminate */
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
@@ -212,10 +299,34 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	worker_idx = 0;
 }
 
+/* Useful function which ensures that all worker functions terminate */
+static void
+quit_workers_burst(struct rte_distributor_burst *d, struct rte_mempool *p)
+{
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
+	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
+	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
+
+	quit = 1;
+	for (i = 0; i < num_workers; i++)
+		bufs[i]->hash.usr = i << 1;
+	rte_distributor_process_burst(d, bufs, num_workers);
+
+	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+
+	rte_distributor_process_burst(d, NULL, 0);
+	rte_eal_mp_wait_lcore();
+	quit = 0;
+	worker_idx = 0;
+}
+
 static int
 test_distributor_perf(void)
 {
 	static struct rte_distributor *d;
+	static struct rte_distributor_burst *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -234,10 +345,20 @@ test_distributor_perf(void)
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
 		rte_distributor_clear_returns(d);
 	}
 
+	if (db == NULL) {
+		db = rte_distributor_create_burst("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns_burst(db);
+	}
+
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
 			(BIG_BATCH * 2) - 1 : (511 * rte_lcore_count());
 	if (p == NULL) {
@@ -254,6 +375,11 @@ test_distributor_perf(void)
 		return -1;
 	quit_workers(d, p);
 
+	rte_eal_mp_remote_launch(handle_work_burst, db, SKIP_MASTER);
+	if (perf_test_burst(db, p) < 0)
+		return -1;
+	quit_workers_burst(db, p);
+
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
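
Alongside the worker loop in handle_work_burst() above, the
distributor-core side of the burst API uses the process/returned-pkts
calls exercised in this patch. A minimal sketch follows; the function
wrappers and buffer sizes are illustrative and not taken from the patch:

#include <rte_distributor_burst.h>
#include <rte_mbuf.h>

/* Feed one burst of mbufs to the workers and collect any returns. */
static unsigned int
distribute_burst(struct rte_distributor_burst *db,
		struct rte_mbuf **in, unsigned int n_in,
		struct rte_mbuf **ret, unsigned int ret_sz)
{
	/* hand the packets to the workers (up to 8 per worker at a time) */
	rte_distributor_process_burst(db, in, n_in);

	/* pick up packets the workers have finished with */
	return rte_distributor_returned_pkts_burst(db, ret, ret_sz);
}

/* At shutdown: push out any stragglers and unblock waiting workers. */
static void
distributor_shutdown(struct rte_distributor_burst *db)
{
	rte_distributor_flush_burst(db);
	rte_distributor_clear_returns_burst(db);
}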

* [PATCH v6 5/6] examples/distributor_app: showing burst API
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
                                   ` (3 preceding siblings ...)
  2017-01-23  9:24                 ` [PATCH v6 4/6] test: add distributor perf autotest David Hunt
@ 2017-01-23  9:24                 ` David Hunt
  2017-01-23  9:24                 ` [PATCH v6 6/6] doc: distributor library changes for new " David Hunt
                                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-23  9:24 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 511 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 393 insertions(+), 118 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..b0d8b31 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,6 +30,8 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
+#define BURST_API 1
+
 #include <stdint.h>
 #include <inttypes.h>
 #include <unistd.h>
@@ -43,39 +44,87 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
+#if BURST_API
+#include <rte_distributor_burst.h>
+#else
 #include <rte_distributor.h>
+#endif
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
+#define MBUF_CACHE_SIZE 128
+#if BURST_API
+#define BURST_SIZE 64
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
+#else
 #define BURST_SIZE 32
-#define RTE_RING_SZ 1024
+#define SCHED_RX_RING_SZ 1024
+#define SCHED_TX_RING_SZ 1024
+#endif
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
 		.max_rx_pkt_len = ETHER_MAX_LEN,
+		.split_hdr_size = 0,
+		.header_split   = 0, /**< Header Split disabled */
+		.hw_ip_checksum = 1, /**< IP checksum offload enabled */
+		.hw_vlan_filter = 0, /**< VLAN filtering disabled */
+		.jumbo_frame    = 0, /**< Jumbo Frame Support disabled */
+		.hw_strip_crc   = 0, /**< CRC stripped by hardware */
 	},
 	.txmode = {
 		.mq_mode = ETH_MQ_TX_NONE,
@@ -93,6 +142,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -101,9 +152,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -113,7 +168,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -121,7 +176,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
@@ -134,7 +189,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
@@ -160,41 +216,52 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 struct lcore_params {
 	unsigned worker_id;
-	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_distributor_burst *d;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
-static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+static inline void
+flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
+	if (unlikely(nb_tx < outbuf->count)) {
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
+		do {
+			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
+		} while (++nb_tx < outbuf->count);
 	}
+	outbuf->count = 0;
+}
+
+static inline void
+flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+{
+	uint8_t outp;
 
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
+	for (outp = 0; outp < nb_ports; outp++) {
+		/* skip ports that are not enabled */
+		if ((enabled_port_mask & (1 << outp)) == 0)
+			continue;
 
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
+		if (tx_buffers[outp].count == 0)
+			continue;
 
-	return 0;
+		flush_one_port(&tx_buffers[outp], outp);
+	}
 }
 
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -210,6 +277,7 @@ lcore_rx(struct lcore_params *p)
 
 	printf("\nCore %u doing packet RX.\n", rte_lcore_id());
 	port = 0;
+
 	while (!quit_signal_rx) {
 
 		/* skip ports that are not enabled */
@@ -218,7 +286,7 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
+
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -228,19 +296,46 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+
+#if BURST_API
+	rte_distributor_process_burst(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts_burst(d,
+			bufs, BURST_SIZE*2);
+#else
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+#endif
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
 				port = 0;
 			continue;
 		}
-
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		 * Swap the following two lines if you want the rx traffic
+		 * to go directly to tx, no distribution.
+		 */
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -249,56 +344,89 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
-static inline void
-flush_one_port(struct output_buffer *outbuf, uint8_t outp)
-{
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
 
-	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
-		do {
-			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
-		} while (++nb_tx < outbuf->count);
-	}
-	outbuf->count = 0;
-}
 
-static inline void
-flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
+static int
+lcore_distributor(struct lcore_params *p)
 {
-	uint8_t outp;
-	for (outp = 0; outp < nb_ports; outp++) {
-		/* skip ports that are not enabled */
-		if ((enabled_port_mask & (1 << outp)) == 0)
-			continue;
-
-		if (tx_buffers[outp].count == 0)
-			continue;
-
-		flush_one_port(&tx_buffers[outp], outp);
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor_burst *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+/*
+ * This '#if' allows you to bypass the distributor. Incoming packets may be
+ * sent straight to the tx ring.
+ */
+#if 1
+
+#if BURST_API
+			/* Distribute the packets */
+			rte_distributor_process_burst(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts_burst(d,
+					bufs, BURST_SIZE*2);
+#else
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+#endif
+
+#else
+			/* Bypass the distributor */
+			const unsigned int xor_val = (rte_eth_dev_count() > 1);
+			/* Touch the mbuf by xor'ing the port */
+			for (unsigned int i = 0; i < nb_rx; i++)
+				bufs[i]->port ^= xor_val;
+
+			const uint16_t nb_ret = nb_rx;
+#endif
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
 	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+#if BURST_API
+	rte_distributor_flush_burst(d);
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns_burst(d);
+#endif
+	quit_signal_rx = 1;
+	return 0;
 }
 
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -327,9 +455,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -358,11 +486,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -371,52 +500,149 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++) {
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+			app_stats.worker_bursts[i][j] = 0;
+		}
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
 static int
 lcore_worker(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
+	struct rte_distributor_burst *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+	while (!quit_signal_work) {
+
+#if BURST_API
+		num = rte_distributor_get_pkt_burst(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = __rdtsc()+100;
+
+			while (__rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
+#else
+		buf[0] = rte_distributor_get_pkt(d, id, buf[0]);
+		uint64_t t = __rdtsc() + 10;
+
+		while (__rdtsc() < t)
+			rte_pause();
+		buf[0]->port ^= xor_val;
+#endif
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
+	printf("\nCore %u exiting worker task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -496,12 +722,14 @@ int
 main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
-	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_distributor_burst *d;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -518,10 +746,12 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -560,41 +790,86 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
+#if BURST_API
+	d = rte_distributor_create_burst("PKT_DIST", rte_socket_id(),
+			rte_lcore_count() - 4);
+#else
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 4);
+#endif
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+					rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+					(lcore_function_t *)lcore_distributor,
+					p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
+		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
 
-	if (lcore_rx(&p) != 0)
-		return -1;
+	freq = rte_get_timer_hz();
+	t = __rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < __rdtsc()) {
+			print_stats();
+			t = __rdtsc() + freq;
+		}
+	}
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v6 6/6] doc: distributor library changes for new burst API
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
                                   ` (4 preceding siblings ...)
  2017-01-23  9:24                 ` [PATCH v6 5/6] examples/distributor_app: showing burst API David Hunt
@ 2017-01-23  9:24                 ` David Hunt
  2017-01-23 17:02                 ` [PATCH v6 0/6] distributor library performance enhancements Bruce Richardson
  2017-01-24  8:56                 ` Liu, Yong
  7 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-01-23  9:24 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..dffd4ad 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,10 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two versions of the API in the distributor library, one which sends one packet at a time to workers,
+and another which sends bursts of up to 8 packets at a time to workers. The function names of the second API
+are identified by a "_burst" suffix, and must not be intermixed with the single packet API. The operations described
+below apply to both APIs; select which API you wish to use by including the relevant header file.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
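
To go with the paragraph added to the programmer's guide above, here is
a minimal sketch of a burst-mode worker loop, modelled on the calls
used in the test and example patches of this series; the quit flag and
the per-packet work are placeholders:

#include <rte_distributor_burst.h>
#include <rte_mbuf.h>

static volatile int quit;	/* assumed application-level shutdown flag */

static int
worker_burst(struct rte_distributor_burst *db, unsigned int worker_id)
{
	struct rte_mbuf *buf[8] __rte_cache_aligned = { NULL };
	unsigned int num = 0;

	while (!quit) {
		/* return the previous burst, fetch up to 8 new packets */
		num = rte_distributor_get_pkt_burst(db, worker_id,
				buf, buf, num);
		/* ... process buf[0] .. buf[num - 1] here ... */
	}
	/* hand back whatever is still held before exiting */
	rte_distributor_return_pkt_burst(db, worker_id, buf, num);
	return 0;
}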

* Re: [PATCH v5 1/6] lib: distributor performance enhancements
  2017-01-20  9:18             ` [PATCH v5 1/6] lib: distributor " David Hunt
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
@ 2017-01-23 12:26               ` Bruce Richardson
  1 sibling, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-01-23 12:26 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Fri, Jan 20, 2017 at 09:18:48AM +0000, David Hunt wrote:
> Now sends bursts of up to 8 mbufs to each worker, and tracks
> the in-flight flow-ids (atomic scheduling)
> 
> New file with a new api, similar to the old API except with _burst
> at the end of the function names. This is to preserve the original
> API (and code) for backward compatibility.
> 
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it
> makes use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
> 
> The flow matching algorithm has had significant re-work, and now keeps
> an array of inflight flows and an array of backlog flows, and matches
> incoming flows to the inflight/backlog flows of all workers so
> that flow pinning to workers can be maintained.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---

I still see some issues reported here by sanity check scripts.

/Bruce

--- /dev/null   2017-01-10 10:26:01.206201474 +0000
+++ /tmp/doc-check/doc.txt      2017-01-23 12:23:01.748870247 +0000
@@ -0,0 +1,6 @@
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:187:
warning: argument 'mbuf' of command @param is not found in the argument
list of rte_distributor_return_pkt_burst(struct rte_distributor_burst
*d, unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_burst.h:199:
warning: The following parameters of
rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
unsigned int worker_id, struct rte_mbuf **oldpkt, int num) are not
documented:
+  parameter 'oldpkt'
+  parameter 'num'
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73:
warning: Found unknown command `\in_flight_bitmask'
+/home/bruce/dpdk-clean/lib/librte_distributor/rte_distributor_priv.h:73:
warning: Found unknown command `\rte_distributor_process'
==== Error with doc check ====
Line too long:
        makes use of the 7 free spaces in the same cache line in order
	to send up to

### lib: distributor performance enhancements

WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description
(prefer a maximum 75 chars per line)
#17:
makes use of the 7 free spaces in the same cache line in order to send
up to

total: 0 errors, 1 warnings, 1131 lines checked

0/1 valid patch

^ permalink raw reply	[flat|nested] 202+ messages in thread
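
The doxygen warnings above are the usual symptom of @param names not
matching the prototype (here, "mbuf" versus the actual "oldpkt" and
"num" parameters). One way the header comment could be brought into
line, with illustrative wording only and an assumed return-value
description:

/**
 * API called by a worker to return packets to the distributor.
 *
 * @param d
 *   The distributor instance to be used
 * @param worker_id
 *   The worker instance number to use
 * @param oldpkt
 *   Array of packets being returned to the distributor
 * @param num
 *   Number of packets in the oldpkt array
 * @return
 *   0 on success, negative value on error (assumed)
 */
int
rte_distributor_return_pkt_burst(struct rte_distributor_burst *d,
		unsigned int worker_id, struct rte_mbuf **oldpkt, int num);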

* Re: [PATCH v5 5/6] examples/distributor_app: showing burst API
  2017-01-20  9:18             ` [PATCH v5 5/6] examples/distributor_app: showing burst API David Hunt
@ 2017-01-23 12:31               ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-01-23 12:31 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Fri, Jan 20, 2017 at 09:18:52AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  examples/distributor/main.c | 509 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 391 insertions(+), 118 deletions(-)
> 

Another minor nit from checkpatch.

/Bruce

### examples/distributor_app: showing burst API

WARNING:BLOCK_COMMENT_STYLE: Block comments should align the * on each
line
#301: FILE: examples/distributor/main.c:327:
+               /*
+               * Swap the following two lines if you want the rx
traffic

total: 0 errors, 1 warnings, 744 lines checked

0/1 valid patch

^ permalink raw reply	[flat|nested] 202+ messages in thread
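
The BLOCK_COMMENT_STYLE warning above is about the continuation lines
of the comment not lining up their leading '*'. The conventional fix,
as applied in the v6 revision of this patch, is simply:

		/*
		 * Swap the following two lines if you want the rx traffic
		 * to go directly to tx, no distribution.
		 */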

* Re: [PATCH v6 0/6] distributor library performance enhancements
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
                                   ` (5 preceding siblings ...)
  2017-01-23  9:24                 ` [PATCH v6 6/6] doc: distributor library changes for new " David Hunt
@ 2017-01-23 17:02                 ` Bruce Richardson
  2017-01-24  8:56                 ` Liu, Yong
  7 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-01-23 17:02 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Jan 23, 2017 at 09:24:34AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
> 
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
> 
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
> 
> The Flow Match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 cpu flag. On non-x86 platforms,
> the scalar match function is selected, which should still give a good boost
> in performance over the non-burst API.
> 
> v2 changes:
>   * Created a common distributor_priv.h header file with common
>     definitions and structures.
>   * Added a scalar version so it can be built and used on machines without
>     sse2 instruction set
>   * Added unit autotests
>   * Added perf autotest
> 
> v3 changes:
>   * Addressed mailing list review comments
>   * Test code removal
>   * Split out SSE match into separate file to facilitate NEON addition
>   * Cleaned up conditional compilation flags for SSE2
>   * Addressed c99 style compilation errors
>   * rebased on latest head (Jan 2 2017, Happy New Year to all)
> 
> v4 changes:
>    * fixed issue building shared libraries
> 
> v5 changes:
>    * Removed some un-needed code around retries in worker API calls
>    * Cleanup due to review comments on mailing list
>    * Cleanup of non-x86 platform compilation, fallback to scalar match
> 
> v6 changes:
>    * Fixed intermittent segfault where num pkts not divisible
>      by BURST_SIZE
>    * Cleanup due to review comments on mailing list
>    * Renamed _priv.h to _private.h.
> 
> Notes:
>    Apps must now work in bursts, as up to 8 are given to a worker at a time
>    For performance in matching, Flow ID's are 15-bits
>    Original API (and code) is kept for backward compatibility
> 
> Performance Gains
>    2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
>    2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
>    separate cores for rx, tx, distributor
>     1 worker  - 4.8x
>     4 workers - 2.9x
>     8 workers - 1.8x
>    12 workers - 2.1x
>    16 workers - 1.8x
> 

Series Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v6 0/6] distributor library performance enhancements
  2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
                                   ` (6 preceding siblings ...)
  2017-01-23 17:02                 ` [PATCH v6 0/6] distributor library performance enhancements Bruce Richardson
@ 2017-01-24  8:56                 ` Liu, Yong
  7 siblings, 0 replies; 202+ messages in thread
From: Liu, Yong @ 2017-01-24  8:56 UTC (permalink / raw)
  To: Hunt, David, dev; +Cc: Richardson, Bruce

Tested-by: Yong Liu <yong.liu@intel.com>

- Tested Branch: master
- Tested Commit: 61207d014fc906302a184ae2f779b54ccfd0cd4c
- OS: Fedora20 4.9.0
- GCC: gcc version 4.8.3 20140911
- CPU: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
- NIC: Intel Corporation Device Fortville [8086:1584]
- Default x86_64-native-linuxapp-gcc configuration
- Prerequisites:
- Total 6 cases, 6 passed, 0 failed

- Prerequisites command / instruction:
  Intel(r) X710 (Fortville) NIC plugged in

- Case: Distributor unit test
  Description: check burst packet distributor API work fine
  Command / instruction:
    Start test application and run distributor unit test
       test -c f -n 4 -- -i
       RTE>>distributor_autotest
    Verify burst distributor API unit test passed

- Case: Distributor performance unit test
  Description: check burst packet distributor API performance
  Command / instruction:
    Start test application and run distributor unit test
       test -c f -n 4 -- -i
       RTE>>distributor_perf_autotest
    Compare CPU cycles for the normal distributor and the burst API
    Verify the burst distributor API costs far fewer cycles than the legacy library

- Case: Distributor library function check
  Description: check burst packet distributor API performance
  Command / instruction:
    Start distributor sample with one worker::
      distributor_app -c 0x7c  -n 4 -- -p 0x1
    Send a few packets (less than the burst size of 8) with sequence index
    Check forwarded packets are all in sequence and content not changed
    Send packets equal to burst size with sequence index
    Check forwarded packets are all in sequence and content not changed
    Send packets over burst size with sequence index
    Check forwarded packets are all in sequence and content not changed

- Case: Distributor between multiple workers
  Description: check burst packet distributor sample performance
  Command / instruction:
    Start distributor sample with multiple workers::
      distributor_app -c 0xfc  -n 4 -- -p 0x1
    Send several packets with IP address increasing
    Check packets distributed to all workers
    Repeat these steps for 4/8/16/32 workers

- Case: Distributor between maximum workers
  Description: check burst packet distributor can work with 63 workers
  Command / instruction:
    Start distributor sample with multiple workers::
      distributor_app -c 0xeffffffffffffffff0  -n 4 -- -p 0x1
    Send several packets with IP address increasing
    Check packets distributed to all workers
    
- Case: Distributor packets from multiple input ports
  Description: check burst packet distributor work with multiple inputs
  Command / instruction:
    Start distributor sample with multiple workers::
      distributor_app -c 0x7c -n 4 -- -p 0x3
    Send several packets from two tester ports with different IP
    Check packets forwarded back

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
> Sent: Monday, January 23, 2017 5:25 PM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>
> Subject: [dpdk-dev] [PATCH v6 0/6] distributor library performance
> enhancements

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v7 0/17] distributor library performance enhancements
  2017-01-23  9:24                 ` [PATCH v6 1/6] lib: distributor " David Hunt
@ 2017-02-21  3:17                   ` David Hunt
  2017-02-21  3:17                     ` [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
                                       ` (17 more replies)
  0 siblings, 18 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
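
A rough illustration of the idea (names and flag values here are
assumptions, not the library's internal definitions): each
worker/distributor channel is a single cache line of eight 64-bit
slots, and because mbuf addresses are aligned, the low-order bits of
each slot are free to carry the handshake flags.

#include <stdint.h>
#include <rte_memory.h>	/* for __rte_cache_aligned */

#define SKETCH_BURST_SIZE 8

#define SKETCH_FLAG_GET    0x1	/* distributor: packets ready for worker */
#define SKETCH_FLAG_RETURN 0x2	/* worker: packets being handed back */

struct sketch_dist_channel {
	/* one cache line: upper bits of each slot hold the mbuf pointer,
	 * low bits hold the handshake flags
	 */
	volatile int64_t bufptr64[SKETCH_BURST_SIZE];
} __rte_cache_aligned;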

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms,
the scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
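
A minimal sketch of that run-time dispatch (the match_flows_*()
helpers are hypothetical names standing in for the library's internal
scalar and SSE2 implementations):

#include <stdint.h>
#include <rte_cpuflags.h>

typedef void (*find_match_t)(void *d, uint16_t *flow_ids, uint16_t *matches);

void match_flows_scalar(void *d, uint16_t *flow_ids, uint16_t *matches);
void match_flows_sse2(void *d, uint16_t *flow_ids, uint16_t *matches);

static find_match_t
select_find_match(void)
{
#ifdef RTE_ARCH_X86
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		return match_flows_sse2;	/* vector path */
#endif
	return match_flows_scalar;		/* portable fallback */
}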

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v4 changes:
   * fixed issue building shared libraries

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   For performance in matching, flow IDs are 15 bits
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
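
A sketch of the selection described in the v7 changes above; the extra
algorithm argument to rte_distributor_create() is an assumption based
on this cover letter, not a confirmed signature:

/* Hypothetical fragment: request legacy packet-at-a-time behaviour. */
static struct rte_distributor *
create_single_mode_distributor(unsigned int num_workers)
{
	return rte_distributor_create("PKT_DIST", rte_socket_id(),
			num_workers, RTE_DISTRIBUTOR_SINGLE);
}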

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - 4.8x
    4 workers - 2.9x
    8 workers - 1.8x
   12 workers - 2.1x
   16 workers - 1.8x

[01/17] lib: rename legacy distributor lib files
[02/17] lib: symbol versioning of functions in distributor
[03/17] lib: create rte_distributor_private.h
[04/17] lib: add new burst oriented distributor structs
[05/17] lib: add new distributor code
[06/17] lib: add SIMD flow matching to distributor
[07/17] lib: apply symbol versioning to distibutor lib
[08/17] test: change params to distributor autotest
[09/17] test: switch distributor test over to burst API
[10/17] test: test single and burst distributor API
[11/17] test: add perf test for distributor burst mode
[12/17] example: add extra stats to distributor sample
[13/17] sample: distributor: wait for ports to come up
[14/17] sample: switch to new distributor API
[15/17] lib: make v20 header file private
[16/17] doc: distributor library changes for new burst api
[17/17] maintainers: add to distributor lib maintainers

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-21 10:27                       ` Hunt, David
                                         ` (2 more replies)
  2017-02-21  3:17                     ` [PATCH v7 02/17] lib: symbol versioning of functions in distributor David Hunt
                                       ` (16 subsequent siblings)
  17 siblings, 3 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c                  |   2 +-
 app/test/test_distributor_perf.c             |   2 +-
 examples/distributor/main.c                  |   2 +-
 lib/librte_distributor/Makefile              |   4 +-
 lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
 lib/librte_distributor/rte_distributor.h     | 247 --------------
 lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++
 8 files changed, 739 insertions(+), 739 deletions(-)
 delete mode 100644 lib/librte_distributor/rte_distributor.c
 delete mode 100644 lib/librte_distributor/rte_distributor.h
 create mode 100644 lib/librte_distributor/rte_distributor_v20.c
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 85cb8f3..ba402e2 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -39,7 +39,7 @@
 #include <rte_errno.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index 7947fe9..fe0c97d 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -39,7 +39,7 @@
 #include <rte_cycles.h>
 #include <rte_common.h>
 #include <rte_mbuf.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..fba5446 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -43,7 +43,7 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
-#include <rte_distributor.h>
+#include <rte_distributor_v20.h>
 
 #define RX_RING_SIZE 256
 #define TX_RING_SIZE 512
diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..60837ed 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,10 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
deleted file mode 100644
index f3f778c..0000000
--- a/lib/librte_distributor/rte_distributor.c
+++ /dev/null
@@ -1,487 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#include <stdio.h>
-#include <sys/queue.h>
-#include <string.h>
-#include <rte_mbuf.h>
-#include <rte_memory.h>
-#include <rte_memzone.h>
-#include <rte_errno.h>
-#include <rte_string_fns.h>
-#include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be careful when increasing the limit, because it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64 bits of the memory are actually used, though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
-
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
-
-static struct rte_tailq_elem rte_distributor_tailq = {
-	.name = "RTE_DISTRIBUTOR",
-};
-EAL_REGISTER_TAILQ(rte_distributor_tailq)
-
-/**** APIs called by workers ****/
-
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
-			| RTE_DISTRIB_GET_BUF;
-	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
-		rte_pause();
-	buf->bufptr64 = req;
-}
-
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
-		return NULL;
-
-	/* since bufptr64 is signed, this should be an arithmetic shift */
-	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
-	return (struct rte_mbuf *)((uintptr_t)ret);
-}
-
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
-		rte_pause();
-	return ret;
-}
-
-int
-rte_distributor_return_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt)
-{
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
-	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
-			| RTE_DISTRIB_RETURN_BUF;
-	buf->bufptr64 = req;
-	return 0;
-}
-
-/**** APIs called on distributor core ***/
-
-/* as name suggests, adds a packet to the backlog for a particular worker */
-static int
-add_to_backlog(struct rte_distributor_backlog *bl, int64_t item)
-{
-	if (bl->count == RTE_DISTRIB_BACKLOG_SIZE)
-		return -1;
-
-	bl->pkts[(bl->start + bl->count++) & (RTE_DISTRIB_BACKLOG_MASK)]
-			= item;
-	return 0;
-}
-
-/* takes the next packet for a worker off the backlog */
-static int64_t
-backlog_pop(struct rte_distributor_backlog *bl)
-{
-	bl->count--;
-	return bl->pkts[bl->start++ & RTE_DISTRIB_BACKLOG_MASK];
-}
-
-/* stores a packet returned from a worker inside the returns array */
-static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
-		unsigned *ret_start, unsigned *ret_count)
-{
-	/* store returns in a circular buffer - code is branch-free */
-	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
-			= (void *)oldbuf;
-	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
-	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
-}
-
-static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
-{
-	d->in_flight_tags[wkr] = 0;
-	d->in_flight_bitmask &= ~(1UL << wkr);
-	d->bufs[wkr].bufptr64 = 0;
-	if (unlikely(d->backlog[wkr].count != 0)) {
-		/* On return of a packet, we need to move the
-		 * queued packets for this core elsewhere.
-		 * Easiest solution is to set things up for
-		 * a recursive call. That will cause those
-		 * packets to be queued up for the next free
-		 * core, i.e. it will return as soon as a
-		 * core becomes free to accept the first
-		 * packet, as subsequent ones will be added to
-		 * the backlog for that core.
-		 */
-		struct rte_mbuf *pkts[RTE_DISTRIB_BACKLOG_SIZE];
-		unsigned i;
-		struct rte_distributor_backlog *bl = &d->backlog[wkr];
-
-		for (i = 0; i < bl->count; i++) {
-			unsigned idx = (bl->start + i) &
-					RTE_DISTRIB_BACKLOG_MASK;
-			pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >>
-					RTE_DISTRIB_FLAG_BITS));
-		}
-		/* recursive call.
-		 * Note that the tags were set before first level call
-		 * to rte_distributor_process.
-		 */
-		rte_distributor_process(d, pkts, i);
-		bl->count = bl->start = 0;
-	}
-}
-
-/* this function is called when process() fn is called without any new
- * packets. It goes through all the workers and clears any returned packets
- * to do a partial flush.
- */
-static int
-process_returns(struct rte_distributor *d)
-{
-	unsigned wkr;
-	unsigned flushed = 0;
-	unsigned ret_start = d->returns.start,
-			ret_count = d->returns.count;
-
-	for (wkr = 0; wkr < d->num_workers; wkr++) {
-
-		const int64_t data = d->bufs[wkr].bufptr64;
-		uintptr_t oldbuf = 0;
-
-		if (data & RTE_DISTRIB_GET_BUF) {
-			flushed++;
-			if (d->backlog[wkr].count)
-				d->bufs[wkr].bufptr64 =
-						backlog_pop(&d->backlog[wkr]);
-			else {
-				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
-				d->in_flight_tags[wkr] = 0;
-				d->in_flight_bitmask &= ~(1UL << wkr);
-			}
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		} else if (data & RTE_DISTRIB_RETURN_BUF) {
-			handle_worker_shutdown(d, wkr);
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		}
-
-		store_return(oldbuf, d, &ret_start, &ret_count);
-	}
-
-	d->returns.start = ret_start;
-	d->returns.count = ret_count;
-
-	return flushed;
-}
-
-/* process a set of packets to distribute them to workers */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs)
-{
-	unsigned next_idx = 0;
-	unsigned wkr = 0;
-	struct rte_mbuf *next_mb = NULL;
-	int64_t next_value = 0;
-	uint32_t new_tag = 0;
-	unsigned ret_start = d->returns.start,
-			ret_count = d->returns.count;
-
-	if (unlikely(num_mbufs == 0))
-		return process_returns(d);
-
-	while (next_idx < num_mbufs || next_mb != NULL) {
-
-		int64_t data = d->bufs[wkr].bufptr64;
-		uintptr_t oldbuf = 0;
-
-		if (!next_mb) {
-			next_mb = mbufs[next_idx++];
-			next_value = (((int64_t)(uintptr_t)next_mb)
-					<< RTE_DISTRIB_FLAG_BITS);
-			/*
-			 * The user is encouraged to set the tag value for each
-			 * mbuf before calling rte_distributor_process.
-			 * User defined tags are used to identify flows,
-			 * or sessions.
-			 */
-			new_tag = next_mb->hash.usr;
-
-			/*
-			 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64
-			 * then the size of match has to be expanded.
-			 */
-			uint64_t match = 0;
-			unsigned i;
-			/*
-			 * to scan for a match use "xor" and "not" to get a 0/1
-			 * value, then use shifting to merge to single "match"
-			 * variable, where a one-bit indicates a match for the
-			 * worker given by the bit-position
-			 */
-			for (i = 0; i < d->num_workers; i++)
-				match |= (!(d->in_flight_tags[i] ^ new_tag)
-					<< i);
-
-			/* Only turned-on bits are considered as match */
-			match &= d->in_flight_bitmask;
-
-			if (match) {
-				next_mb = NULL;
-				unsigned worker = __builtin_ctzl(match);
-				if (add_to_backlog(&d->backlog[worker],
-						next_value) < 0)
-					next_idx--;
-			}
-		}
-
-		if ((data & RTE_DISTRIB_GET_BUF) &&
-				(d->backlog[wkr].count || next_mb)) {
-
-			if (d->backlog[wkr].count)
-				d->bufs[wkr].bufptr64 =
-						backlog_pop(&d->backlog[wkr]);
-
-			else {
-				d->bufs[wkr].bufptr64 = next_value;
-				d->in_flight_tags[wkr] = new_tag;
-				d->in_flight_bitmask |= (1UL << wkr);
-				next_mb = NULL;
-			}
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		} else if (data & RTE_DISTRIB_RETURN_BUF) {
-			handle_worker_shutdown(d, wkr);
-			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
-		}
-
-		/* store returns in a circular buffer */
-		store_return(oldbuf, d, &ret_start, &ret_count);
-
-		if (++wkr == d->num_workers)
-			wkr = 0;
-	}
-	/* to finish, check all workers for backlog and schedule work for them
-	 * if they are ready */
-	for (wkr = 0; wkr < d->num_workers; wkr++)
-		if (d->backlog[wkr].count &&
-				(d->bufs[wkr].bufptr64 & RTE_DISTRIB_GET_BUF)) {
-
-			int64_t oldbuf = d->bufs[wkr].bufptr64 >>
-					RTE_DISTRIB_FLAG_BITS;
-			store_return(oldbuf, d, &ret_start, &ret_count);
-
-			d->bufs[wkr].bufptr64 = backlog_pop(&d->backlog[wkr]);
-		}
-
-	d->returns.start = ret_start;
-	d->returns.count = ret_count;
-	return num_mbufs;
-}
-
-/* return to the caller, packets returned from workers */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs)
-{
-	struct rte_distributor_returned_pkts *returns = &d->returns;
-	unsigned retval = (max_mbufs < returns->count) ?
-			max_mbufs : returns->count;
-	unsigned i;
-
-	for (i = 0; i < retval; i++) {
-		unsigned idx = (returns->start + i) & RTE_DISTRIB_RETURNS_MASK;
-		mbufs[i] = returns->mbufs[idx];
-	}
-	returns->start += i;
-	returns->count -= i;
-
-	return retval;
-}
-
-/* return the number of packets in-flight in a distributor, i.e. packets
- * being worked on or queued up in a backlog. */
-static inline unsigned
-total_outstanding(const struct rte_distributor *d)
-{
-	unsigned wkr, total_outstanding;
-
-	total_outstanding = __builtin_popcountl(d->in_flight_bitmask);
-
-	for (wkr = 0; wkr < d->num_workers; wkr++)
-		total_outstanding += d->backlog[wkr].count;
-
-	return total_outstanding;
-}
-
-/* flush the distributor, so that there are no outstanding packets in flight or
- * queued up. */
-int
-rte_distributor_flush(struct rte_distributor *d)
-{
-	const unsigned flushed = total_outstanding(d);
-
-	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
-
-	return flushed;
-}
-
-/* clears the internal returns array in the distributor */
-void
-rte_distributor_clear_returns(struct rte_distributor *d)
-{
-	d->returns.start = d->returns.count = 0;
-#ifndef __OPTIMIZE__
-	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
-#endif
-}
-
-/* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
-		unsigned socket_id,
-		unsigned num_workers)
-{
-	struct rte_distributor *d;
-	struct rte_distributor_list *distributor_list;
-	char mz_name[RTE_MEMZONE_NAMESIZE];
-	const struct rte_memzone *mz;
-
-	/* compilation-time checks */
-	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
-	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
-	RTE_BUILD_BUG_ON(RTE_DISTRIB_MAX_WORKERS >
-				sizeof(d->in_flight_bitmask) * CHAR_BIT);
-
-	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
-		rte_errno = EINVAL;
-		return NULL;
-	}
-
-	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
-	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
-	if (mz == NULL) {
-		rte_errno = ENOMEM;
-		return NULL;
-	}
-
-	d = mz->addr;
-	snprintf(d->name, sizeof(d->name), "%s", name);
-	d->num_workers = num_workers;
-
-	distributor_list = RTE_TAILQ_CAST(rte_distributor_tailq.head,
-					  rte_distributor_list);
-
-	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
-	TAILQ_INSERT_TAIL(distributor_list, d, next);
-	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
-
-	return d;
-}
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
deleted file mode 100644
index 7d36bc8..0000000
--- a/lib/librte_distributor/rte_distributor.h
+++ /dev/null
@@ -1,247 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
-
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be processed at the same time.
- *
- * The user is encouraged to set a tag for each mbuf before calling this function.
- * If the user does not set the tag, its value may vary depending on the
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less than num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
new file mode 100644
index 0000000..b890947
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -0,0 +1,487 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include "rte_distributor_v20.h"
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/* we will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits. */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be careful when increasing the limit, because it is limited by how we track
+ * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS	64
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64 bits of the memory are actually used, though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+struct rte_distributor_backlog {
+	unsigned start;
+	unsigned count;
+	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
+};
+
+struct rte_distributor_returned_pkts {
+	unsigned start;
+	unsigned count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned num_workers;                 /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+TAILQ_HEAD(rte_distributor_list, rte_distributor);
+
+static struct rte_tailq_elem rte_distributor_tailq = {
+	.name = "RTE_DISTRIBUTOR",
+};
+EAL_REGISTER_TAILQ(rte_distributor_tailq)
+
+/**** APIs called by workers ****/
+
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
+			| RTE_DISTRIB_GET_BUF;
+	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
+		rte_pause();
+	buf->bufptr64 = req;
+}
+
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned worker_id)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
+		return NULL;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
+	return (struct rte_mbuf *)((uintptr_t)ret);
+}
+
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	struct rte_mbuf *ret;
+	rte_distributor_request_pkt(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+		rte_pause();
+	return ret;
+}
+
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt)
+{
+	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
+			| RTE_DISTRIB_RETURN_BUF;
+	buf->bufptr64 = req;
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* as name suggests, adds a packet to the backlog for a particular worker */
+static int
+add_to_backlog(struct rte_distributor_backlog *bl, int64_t item)
+{
+	if (bl->count == RTE_DISTRIB_BACKLOG_SIZE)
+		return -1;
+
+	bl->pkts[(bl->start + bl->count++) & (RTE_DISTRIB_BACKLOG_MASK)]
+			= item;
+	return 0;
+}
+
+/* takes the next packet for a worker off the backlog */
+static int64_t
+backlog_pop(struct rte_distributor_backlog *bl)
+{
+	bl->count--;
+	return bl->pkts[bl->start++ & RTE_DISTRIB_BACKLOG_MASK];
+}
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
+		unsigned *ret_start, unsigned *ret_count)
+{
+	/* store returns in a circular buffer - code is branch-free */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK) & !!(oldbuf);
+}
+
+static inline void
+handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+{
+	d->in_flight_tags[wkr] = 0;
+	d->in_flight_bitmask &= ~(1UL << wkr);
+	d->bufs[wkr].bufptr64 = 0;
+	if (unlikely(d->backlog[wkr].count != 0)) {
+		/* On return of a packet, we need to move the
+		 * queued packets for this core elsewhere.
+		 * Easiest solution is to set things up for
+		 * a recursive call. That will cause those
+		 * packets to be queued up for the next free
+		 * core, i.e. it will return as soon as a
+		 * core becomes free to accept the first
+		 * packet, as subsequent ones will be added to
+		 * the backlog for that core.
+		 */
+		struct rte_mbuf *pkts[RTE_DISTRIB_BACKLOG_SIZE];
+		unsigned i;
+		struct rte_distributor_backlog *bl = &d->backlog[wkr];
+
+		for (i = 0; i < bl->count; i++) {
+			unsigned idx = (bl->start + i) &
+					RTE_DISTRIB_BACKLOG_MASK;
+			pkts[i] = (void *)((uintptr_t)(bl->pkts[idx] >>
+					RTE_DISTRIB_FLAG_BITS));
+		}
+		/* recursive call.
+		 * Note that the tags were set before first level call
+		 * to rte_distributor_process.
+		 */
+		rte_distributor_process(d, pkts, i);
+		bl->count = bl->start = 0;
+	}
+}
+
+/* this function is called when process() fn is called without any new
+ * packets. It goes through all the workers and clears any returned packets
+ * to do a partial flush.
+ */
+static int
+process_returns(struct rte_distributor *d)
+{
+	unsigned wkr;
+	unsigned flushed = 0;
+	unsigned ret_start = d->returns.start,
+			ret_count = d->returns.count;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++) {
+
+		const int64_t data = d->bufs[wkr].bufptr64;
+		uintptr_t oldbuf = 0;
+
+		if (data & RTE_DISTRIB_GET_BUF) {
+			flushed++;
+			if (d->backlog[wkr].count)
+				d->bufs[wkr].bufptr64 =
+						backlog_pop(&d->backlog[wkr]);
+			else {
+				d->bufs[wkr].bufptr64 = RTE_DISTRIB_GET_BUF;
+				d->in_flight_tags[wkr] = 0;
+				d->in_flight_bitmask &= ~(1UL << wkr);
+			}
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		} else if (data & RTE_DISTRIB_RETURN_BUF) {
+			handle_worker_shutdown(d, wkr);
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		}
+
+		store_return(oldbuf, d, &ret_start, &ret_count);
+	}
+
+	d->returns.start = ret_start;
+	d->returns.count = ret_count;
+
+	return flushed;
+}
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned num_mbufs)
+{
+	unsigned next_idx = 0;
+	unsigned wkr = 0;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint32_t new_tag = 0;
+	unsigned ret_start = d->returns.start,
+			ret_count = d->returns.count;
+
+	if (unlikely(num_mbufs == 0))
+		return process_returns(d);
+
+	while (next_idx < num_mbufs || next_mb != NULL) {
+
+		int64_t data = d->bufs[wkr].bufptr64;
+		uintptr_t oldbuf = 0;
+
+		if (!next_mb) {
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb)
+					<< RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is encouraged to set the tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			new_tag = next_mb->hash.usr;
+
+			/*
+			 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64
+			 * then the size of match has to be expanded.
+			 */
+			uint64_t match = 0;
+			unsigned i;
+			/*
+			 * to scan for a match use "xor" and "not" to get a 0/1
+			 * value, then use shifting to merge to single "match"
+			 * variable, where a one-bit indicates a match for the
+			 * worker given by the bit-position
+			 */
+			for (i = 0; i < d->num_workers; i++)
+				match |= (!(d->in_flight_tags[i] ^ new_tag)
+					<< i);
+
+			/* Only turned-on bits are considered as match */
+			match &= d->in_flight_bitmask;
+
+			if (match) {
+				next_mb = NULL;
+				unsigned worker = __builtin_ctzl(match);
+				if (add_to_backlog(&d->backlog[worker],
+						next_value) < 0)
+					next_idx--;
+			}
+		}
+
+		if ((data & RTE_DISTRIB_GET_BUF) &&
+				(d->backlog[wkr].count || next_mb)) {
+
+			if (d->backlog[wkr].count)
+				d->bufs[wkr].bufptr64 =
+						backlog_pop(&d->backlog[wkr]);
+
+			else {
+				d->bufs[wkr].bufptr64 = next_value;
+				d->in_flight_tags[wkr] = new_tag;
+				d->in_flight_bitmask |= (1UL << wkr);
+				next_mb = NULL;
+			}
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		} else if (data & RTE_DISTRIB_RETURN_BUF) {
+			handle_worker_shutdown(d, wkr);
+			oldbuf = data >> RTE_DISTRIB_FLAG_BITS;
+		}
+
+		/* store returns in a circular buffer */
+		store_return(oldbuf, d, &ret_start, &ret_count);
+
+		if (++wkr == d->num_workers)
+			wkr = 0;
+	}
+	/* to finish, check all workers for backlog and schedule work for them
+	 * if they are ready */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		if (d->backlog[wkr].count &&
+				(d->bufs[wkr].bufptr64 & RTE_DISTRIB_GET_BUF)) {
+
+			int64_t oldbuf = d->bufs[wkr].bufptr64 >>
+					RTE_DISTRIB_FLAG_BITS;
+			store_return(oldbuf, d, &ret_start, &ret_count);
+
+			d->bufs[wkr].bufptr64 = backlog_pop(&d->backlog[wkr]);
+		}
+
+	d->returns.start = ret_start;
+	d->returns.count = ret_count;
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned i;
+
+	for (i = 0; i < retval; i++) {
+		unsigned idx = (returns->start + i) & RTE_DISTRIB_RETURNS_MASK;
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/* return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog. */
+static inline unsigned
+total_outstanding(const struct rte_distributor *d)
+{
+	unsigned wkr, total_outstanding;
+
+	total_outstanding = __builtin_popcountl(d->in_flight_bitmask);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/* flush the distributor, so that there are no outstanding packets in flight or
+ * queued up. */
+int
+rte_distributor_flush(struct rte_distributor *d)
+{
+	const unsigned flushed = total_outstanding(d);
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process(d, NULL, 0);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns(struct rte_distributor *d)
+{
+	d->returns.start = d->returns.count = 0;
+#ifndef __OPTIMIZE__
+	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
+#endif
+}
+
+/* creates a distributor instance */
+struct rte_distributor *
+rte_distributor_create(const char *name,
+		unsigned socket_id,
+		unsigned num_workers)
+{
+	struct rte_distributor *d;
+	struct rte_distributor_list *distributor_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+	RTE_BUILD_BUG_ON(RTE_DISTRIB_MAX_WORKERS >
+				sizeof(d->in_flight_bitmask) * CHAR_BIT);
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+
+	distributor_list = RTE_TAILQ_CAST(rte_distributor_tailq.head,
+					  rte_distributor_list);
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(distributor_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..7d36bc8
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTE_H_
+#define _RTE_DISTRIBUTE_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned socket_id,
+		unsigned num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is encouraged to set a tag for each mbuf before calling this function.
+ * If the user does not set the tag, its value may vary depending on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
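
To make the calling convention in rte_distributor_v20.h above concrete, here
is a minimal editorial sketch of a worker loop and the matching
distributor-side burst handling (not part of the patch; app_quit,
app_handle_packet() and the burst size of 64 are hypothetical application
details, and error handling is omitted):

#include <rte_mbuf.h>
#include <rte_distributor_v20.h>

static volatile int app_quit;            /* hypothetical shutdown flag */

static void
app_handle_packet(struct rte_mbuf *pkt)  /* hypothetical per-packet work */
{
	(void)pkt;
}

/* Worker lcore: each get_pkt() call returns the previous mbuf to the
 * distributor and blocks until the next one is assigned. */
static int
worker_loop(struct rte_distributor *d, unsigned worker_id)
{
	struct rte_mbuf *pkt = rte_distributor_get_pkt(d, worker_id, NULL);

	while (!app_quit) {
		app_handle_packet(pkt);
		pkt = rte_distributor_get_pkt(d, worker_id, pkt);
	}
	/* On shutdown, hand back the last packet without requesting another. */
	rte_distributor_return_pkt(d, worker_id, pkt);
	return 0;
}

/* Distributor lcore: feed a received burst in, then drain completed mbufs. */
static void
distribute_burst(struct rte_distributor *d, struct rte_mbuf **rx, unsigned n)
{
	struct rte_mbuf *done[64];
	int nb_done;

	rte_distributor_process(d, rx, n);
	nb_done = rte_distributor_returned_pkts(d, done, 64);
	(void)nb_done;  /* transmit or free the completed packets here */
}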

* [PATCH v7 02/17] lib: symbol versioning of functions in distributor
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
  2017-02-21  3:17                     ` [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:05                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 03/17] lib: create rte_distributor_private.h David Hunt
                                       ` (15 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

We will start the symbol versioning by renaming all legacy functions.

Signed-off-by: David Hunt <david.hunt@intel.com>
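
For context (editorial note, not part of the patch): the _v20 renames prepare
for versioned symbols, so that binaries built against the old ABI keep
resolving the original names while newly built applications pick up the
forthcoming burst implementation. A rough sketch of where this is heading,
assuming the VERSION_SYMBOL/BIND_DEFAULT_SYMBOL helpers from rte_compat.h;
the _v1705 suffix and the DPDK_2.0/DPDK_17.05 version nodes are placeholders,
and the actual wiring is applied in a later patch of this series:

#include <rte_compat.h>

struct rte_distributor;
struct rte_distributor_v20;

/* Legacy implementation stays reachable under the original symbol name,
 * bound to the old ABI version node. */
struct rte_distributor_v20 *
rte_distributor_create_v20(const char *name, unsigned socket_id,
		unsigned num_workers);
VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);

/* New implementation becomes the default symbol that newly linked
 * applications resolve to. */
struct rte_distributor *
rte_distributor_create_v1705(const char *name, unsigned socket_id,
		unsigned num_workers);
BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);

The rte_distributor_version.map would then gain a new version node along
these lines, with the remaining public functions listed the same way:

DPDK_17.05 {
	global:
	rte_distributor_create;
} DPDK_2.0;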
---
 app/test/test_distributor.c                        | 104 +++++++++++----------
 app/test/test_distributor_perf.c                   |  28 +++---
 examples/distributor/main.c                        |  24 ++---
 lib/librte_distributor/rte_distributor_v20.c       |  54 +++++------
 lib/librte_distributor/rte_distributor_v20.h       |  33 +++----
 lib/librte_distributor/rte_distributor_version.map |  18 ++--
 6 files changed, 132 insertions(+), 129 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index ba402e2..6a4e20b 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -81,17 +81,17 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct rte_distributor_v20 *d = arg;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 	while (!quit) {
 		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		pkt = rte_distributor_get_pkt_v20(d, id, pkt);
 	}
 	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	rte_distributor_return_pkt_v20(d, id, pkt);
 	return 0;
 }
 
@@ -107,7 +107,7 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct rte_distributor_v20 *d, struct rte_mempool *p)
 {
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
@@ -124,8 +124,8 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	rte_distributor_process_v20(d, bufs, BURST);
+	rte_distributor_flush_v20(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -146,8 +146,8 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		rte_distributor_process_v20(d, bufs, BURST);
+		rte_distributor_flush_v20(d);
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -171,8 +171,8 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = i;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	rte_distributor_process_v20(d, bufs, BURST);
+	rte_distributor_flush_v20(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -194,8 +194,8 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	rte_distributor_flush_v20(d);
+	rte_distributor_clear_returns_v20(d);
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -204,13 +204,13 @@ sanity_test(struct rte_distributor *d, struct rte_mempool *p)
 		many_bufs[i]->hash.usr = i << 2;
 
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
-		num_returned += rte_distributor_returned_pkts(d,
+		rte_distributor_process_v20(d, &many_bufs[i*BURST], BURST);
+		num_returned += rte_distributor_returned_pkts_v20(d,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
+	rte_distributor_flush_v20(d);
+	num_returned += rte_distributor_returned_pkts_v20(d,
 			&return_bufs[num_returned], BIG_BATCH - num_returned);
 
 	if (num_returned != BIG_BATCH) {
@@ -249,18 +249,18 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct rte_distributor_v20 *d = arg;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 	while (!quit) {
 		worker_stats[id].handled_packets++, count++;
 		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		pkt = rte_distributor_get_pkt_v20(d, id, pkt);
 	}
 	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	rte_distributor_return_pkt_v20(d, id, pkt);
 	return 0;
 }
 
@@ -270,7 +270,8 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct rte_distributor_v20 *d,
+		struct rte_mempool *p)
 {
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
@@ -280,16 +281,16 @@ sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
 		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
-			rte_distributor_process(d, NULL, 0);
+			rte_distributor_process_v20(d, NULL, 0);
 		for (j = 0; j < BURST; j++) {
 			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
-		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_process_v20(d, bufs, BURST);
 	}
 
-	rte_distributor_flush(d);
+	rte_distributor_flush_v20(d);
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -305,20 +306,20 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct rte_distributor_v20 *d = arg;
 	unsigned count = 0;
 	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
 		worker_stats[id].handled_packets++, count++;
 		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 	}
 	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	rte_distributor_return_pkt_v20(d, id, pkt);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -326,13 +327,13 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
+		rte_distributor_return_pkt_v20(d, id, pkt);
 	}
 	return 0;
 }
@@ -344,7 +345,7 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct rte_distributor_v20 *d,
 		struct rte_mempool *p)
 {
 	struct rte_mbuf *bufs[BURST];
@@ -363,7 +364,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_process_v20(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -378,10 +379,10 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
 
 	/* get worker zero to quit */
 	zero_quit = 1;
-	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_process_v20(d, bufs, BURST);
 
 	/* flush the distributor */
-	rte_distributor_flush(d);
+	rte_distributor_flush_v20(d);
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -401,7 +402,7 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct rte_distributor_v20 *d,
 		struct rte_mempool *p)
 {
 	struct rte_mbuf *bufs[BURST];
@@ -420,7 +421,7 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_process_v20(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -429,7 +430,7 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 	zero_quit = 1;
 
 	/* flush the distributor */
-	rte_distributor_flush(d);
+	rte_distributor_flush_v20(d);
 
 	zero_quit = 0;
 	if (total_packet_count() != BURST) {
@@ -450,10 +451,10 @@ test_flush_with_worker_shutdown(struct rte_distributor *d,
 static
 int test_error_distributor_create_name(void)
 {
-	struct rte_distributor *d = NULL;
+	struct rte_distributor_v20 *d = NULL;
 	char *name = NULL;
 
-	d = rte_distributor_create(name, rte_socket_id(),
+	d = rte_distributor_create_v20(name, rte_socket_id(),
 			rte_lcore_count() - 1);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
@@ -467,8 +468,8 @@ int test_error_distributor_create_name(void)
 static
 int test_error_distributor_create_numworkers(void)
 {
-	struct rte_distributor *d = NULL;
-	d = rte_distributor_create("test_numworkers", rte_socket_id(),
+	struct rte_distributor_v20 *d = NULL;
+	d = rte_distributor_create_v20("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
@@ -480,7 +481,7 @@ int test_error_distributor_create_numworkers(void)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct rte_distributor_v20 *d, struct rte_mempool *p)
 {
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
@@ -491,12 +492,12 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+	rte_distributor_process_v20(d, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process(d, NULL, 0);
-	rte_distributor_flush(d);
+	rte_distributor_process_v20(d, NULL, 0);
+	rte_distributor_flush_v20(d);
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -505,7 +506,7 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 static int
 test_distributor(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor_v20 *d;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -514,15 +515,16 @@ test_distributor(void)
 	}
 
 	if (d == NULL) {
-		d = rte_distributor_create("Test_distributor", rte_socket_id(),
+		d = rte_distributor_create_v20("Test_distributor",
+				rte_socket_id(),
 				rte_lcore_count() - 1);
 		if (d == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_flush_v20(d);
+		rte_distributor_clear_returns_v20(d);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index fe0c97d..a7e4823 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -130,17 +130,17 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct rte_distributor_v20 *d = arg;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 	while (!quit) {
 		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		pkt = rte_distributor_get_pkt_v20(d, id, pkt);
 	}
 	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	rte_distributor_return_pkt_v20(d, id, pkt);
 	return 0;
 }
 
@@ -149,7 +149,7 @@ handle_work(void *arg)
  * threads and finally how long per packet the processing took.
  */
 static inline int
-perf_test(struct rte_distributor *d, struct rte_mempool *p)
+perf_test(struct rte_distributor_v20 *d, struct rte_mempool *p)
 {
 	unsigned i;
 	uint64_t start, end;
@@ -166,12 +166,12 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 
 	start = rte_rdtsc();
 	for (i = 0; i < (1<<ITER_POWER); i++)
-		rte_distributor_process(d, bufs, BURST);
+		rte_distributor_process_v20(d, bufs, BURST);
 	end = rte_rdtsc();
 
 	do {
 		usleep(100);
-		rte_distributor_process(d, NULL, 0);
+		rte_distributor_process_v20(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
 	printf("=== Performance test of distributor ===\n");
@@ -192,7 +192,7 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct rte_distributor_v20 *d, struct rte_mempool *p)
 {
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
@@ -202,11 +202,11 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process(d, bufs, num_workers);
+	rte_distributor_process_v20(d, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process(d, NULL, 0);
+	rte_distributor_process_v20(d, NULL, 0);
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -215,7 +215,7 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 static int
 test_distributor_perf(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor_v20 *d;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -227,15 +227,15 @@ test_distributor_perf(void)
 	time_cache_line_switch();
 
 	if (d == NULL) {
-		d = rte_distributor_create("Test_perf", rte_socket_id(),
+		d = rte_distributor_create_v20("Test_perf", rte_socket_id(),
 				rte_lcore_count() - 1);
 		if (d == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_flush_v20(d);
+		rte_distributor_clear_returns_v20(d);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index fba5446..350d6f6 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -160,13 +160,13 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 struct lcore_params {
 	unsigned worker_id;
-	struct rte_distributor *d;
+	struct rte_distributor_v20 *d;
 	struct rte_ring *r;
 	struct rte_mempool *mem_pool;
 };
 
 static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct rte_distributor_v20 *d, struct rte_mempool *p)
 {
 	const unsigned num_workers = rte_lcore_count() - 2;
 	unsigned i;
@@ -180,7 +180,7 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.rss = i << 1;
 
-	rte_distributor_process(d, bufs, num_workers);
+	rte_distributor_process_v20(d, bufs, num_workers);
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
 	return 0;
@@ -189,7 +189,7 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
+	struct rte_distributor_v20 *d = p->d;
 	struct rte_mempool *mem_pool = p->mem_pool;
 	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
@@ -228,8 +228,8 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+		rte_distributor_process_v20(d, bufs, nb_rx);
+		const uint16_t nb_ret = rte_distributor_returned_pkts_v20(d,
 				bufs, BURST_SIZE*2);
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
@@ -249,9 +249,9 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
+	rte_distributor_process_v20(d, NULL, 0);
 	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
+	rte_distributor_flush_v20(d);
 	/* set worker & tx threads quit flag */
 	quit_signal = 1;
 	/*
@@ -403,7 +403,7 @@ print_stats(void)
 static int
 lcore_worker(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
+	struct rte_distributor_v20 *d = p->d;
 	const unsigned id = p->worker_id;
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
@@ -414,7 +414,7 @@ lcore_worker(struct lcore_params *p)
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
+		buf = rte_distributor_get_pkt_v20(d, id, buf);
 		buf->port ^= xor_val;
 	}
 	return 0;
@@ -496,7 +496,7 @@ int
 main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
-	struct rte_distributor *d;
+	struct rte_distributor_v20 *d;
 	struct rte_ring *output_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
@@ -560,7 +560,7 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
-	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
+	d = rte_distributor_create_v20("PKT_DIST", rte_socket_id(),
 			rte_lcore_count() - 2);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index b890947..48a8794 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -75,7 +75,7 @@
  * the next cache line to worker 0, we pad this out to three cache lines.
  * Only 64-bits of the memory is actually used though.
  */
-union rte_distributor_buffer {
+union rte_distributor_buffer_v20 {
 	volatile int64_t bufptr64;
 	char pad[RTE_CACHE_LINE_SIZE*3];
 } __rte_cache_aligned;
@@ -92,8 +92,8 @@ struct rte_distributor_returned_pkts {
 	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
 };
 
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+struct rte_distributor_v20 {
+	TAILQ_ENTRY(rte_distributor_v20) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned num_workers;                 /**< Number of workers polling */
@@ -108,12 +108,12 @@ struct rte_distributor {
 
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
 
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+	union rte_distributor_buffer_v20 bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 };
 
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
+TAILQ_HEAD(rte_distributor_list, rte_distributor_v20);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
 	.name = "RTE_DISTRIBUTOR",
@@ -123,10 +123,10 @@ EAL_REGISTER_TAILQ(rte_distributor_tailq)
 /**** APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_GET_BUF;
 	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
@@ -135,10 +135,10 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
 		return NULL;
 
@@ -148,21 +148,21 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
 	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+	rte_distributor_request_pkt_v20(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt_v20(d, worker_id)) == NULL)
 		rte_pause();
 	return ret;
 }
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_RETURN_BUF;
 	buf->bufptr64 = req;
@@ -193,7 +193,7 @@ backlog_pop(struct rte_distributor_backlog *bl)
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
+store_return(uintptr_t oldbuf, struct rte_distributor_v20 *d,
 		unsigned *ret_start, unsigned *ret_count)
 {
 	/* store returns in a circular buffer - code is branch-free */
@@ -204,7 +204,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor *d,
 }
 
 static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+handle_worker_shutdown(struct rte_distributor_v20 *d, unsigned int wkr)
 {
 	d->in_flight_tags[wkr] = 0;
 	d->in_flight_bitmask &= ~(1UL << wkr);
@@ -234,7 +234,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 		 * Note that the tags were set before first level call
 		 * to rte_distributor_process.
 		 */
-		rte_distributor_process(d, pkts, i);
+		rte_distributor_process_v20(d, pkts, i);
 		bl->count = bl->start = 0;
 	}
 }
@@ -244,7 +244,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
  * to do a partial flush.
  */
 static int
-process_returns(struct rte_distributor *d)
+process_returns(struct rte_distributor_v20 *d)
 {
 	unsigned wkr;
 	unsigned flushed = 0;
@@ -283,7 +283,7 @@ process_returns(struct rte_distributor *d)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned num_mbufs)
 {
 	unsigned next_idx = 0;
@@ -387,7 +387,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -408,7 +408,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
 static inline unsigned
-total_outstanding(const struct rte_distributor *d)
+total_outstanding(const struct rte_distributor_v20 *d)
 {
 	unsigned wkr, total_outstanding;
 
@@ -423,19 +423,19 @@ total_outstanding(const struct rte_distributor *d)
 /* flush the distributor, so that there are no outstanding packets in flight or
  * queued up. */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 {
 	const unsigned flushed = total_outstanding(d);
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
+		rte_distributor_process_v20(d, NULL, 0);
 
 	return flushed;
 }
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 {
 	d->returns.start = d->returns.count = 0;
 #ifndef __OPTIMIZE__
@@ -444,12 +444,12 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name,
 		unsigned socket_id,
 		unsigned num_workers)
 {
-	struct rte_distributor *d;
+	struct rte_distributor_v20 *d;
 	struct rte_distributor_list *distributor_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
index 7d36bc8..6da2ae3 100644
--- a/lib/librte_distributor/rte_distributor_v20.h
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -31,15 +31,15 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
+#ifndef _RTE_DISTRIBUTE_V20_H_
+#define _RTE_DISTRIBUTE_V20_H_
 
 /**
  * @file
  * RTE distributor
  *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
+ * This file contains the legacy single-packet-at-a-time API and is
+ * here to allow the latest API to provide backward compatibility.
  */
 
 #ifdef __cplusplus
@@ -48,7 +48,7 @@ extern "C" {
 
 #define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
 
-struct rte_distributor;
+struct rte_distributor_v20;
 struct rte_mbuf;
 
 /**
@@ -67,8 +67,8 @@ struct rte_mbuf;
  * @return
  *   The newly created distributor instance
  */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name, unsigned int socket_id,
 		unsigned num_workers);
 
 /*  *** APIS to be called on the distributor lcore ***  */
@@ -103,7 +103,7 @@ rte_distributor_create(const char *name, unsigned socket_id,
  *   The number of mbufs processed.
  */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned num_mbufs);
 
 /**
@@ -121,7 +121,7 @@ rte_distributor_process(struct rte_distributor *d,
  *   The number of mbufs returned in the mbufs array.
  */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned max_mbufs);
 
 /**
@@ -136,7 +136,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
  *   The number of queued/in-flight packets that were completed by this call.
  */
 int
-rte_distributor_flush(struct rte_distributor *d);
+rte_distributor_flush_v20(struct rte_distributor_v20 *d);
 
 /**
  * Clears the array of returned packets used as the source for the
@@ -148,7 +148,7 @@ rte_distributor_flush(struct rte_distributor *d);
  *   The distributor instance to be used
  */
 void
-rte_distributor_clear_returns(struct rte_distributor *d);
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d);
 
 /*  *** APIS to be called on the worker lcores ***  */
 /*
@@ -177,7 +177,7 @@ rte_distributor_clear_returns(struct rte_distributor *d);
  *   A new packet to be processed by the worker thread.
  */
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -193,7 +193,8 @@ rte_distributor_get_pkt(struct rte_distributor *d,
  *   The previous packet being processed by the worker
  */
 int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
+		unsigned int worker_id,
 		struct rte_mbuf *mbuf);
 
 /**
@@ -217,7 +218,7 @@ rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
  *   The previous packet, if any, being processed by the worker
  */
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -237,7 +238,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
  *   packet is yet available.
  */
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id);
 
 #ifdef __cplusplus
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..414fdc3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -1,15 +1,15 @@
 DPDK_2.0 {
 	global:
 
-	rte_distributor_clear_returns;
-	rte_distributor_create;
-	rte_distributor_flush;
-	rte_distributor_get_pkt;
-	rte_distributor_poll_pkt;
-	rte_distributor_process;
-	rte_distributor_request_pkt;
-	rte_distributor_return_pkt;
-	rte_distributor_returned_pkts;
+	rte_distributor_clear_returns_v20;
+	rte_distributor_create_v20;
+	rte_distributor_flush_v20;
+	rte_distributor_get_pkt_v20;
+	rte_distributor_poll_pkt_v20;
+	rte_distributor_process_v20;
+	rte_distributor_request_pkt_v20;
+	rte_distributor_return_pkt_v20;
+	rte_distributor_returned_pkts_v20;
 
 	local: *;
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 03/17] lib: create rte_distributor_private.h
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
  2017-02-21  3:17                     ` [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-21  3:17                     ` [PATCH v7 02/17] lib: symbol versioning of functions in distributor David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:07                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 04/17] lib: add new burst oriented distributor structs David Hunt
                                       ` (14 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

We'll be adding content here that is common to both the burst
and legacy APIs.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
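As a quick illustration of the pointer/flag packing scheme documented in
the new private header below, here is a minimal standalone sketch. It is
not part of the diff; the SKETCH_* names are invented for the example and
simply mirror the RTE_DISTRIB_FLAG_BITS / RTE_DISTRIB_FLAGS_MASK values:

#include <stdint.h>

#define SKETCH_FLAG_BITS  4	/* mirrors RTE_DISTRIB_FLAG_BITS */
#define SKETCH_FLAGS_MASK 0x0F	/* mirrors RTE_DISTRIB_FLAGS_MASK */

/* Pack: shift the mbuf pointer up four bits and OR in a flag value,
 * using the same casts as the library. */
static inline int64_t
sketch_pack(void *mbuf, int64_t flag)
{
	return (((int64_t)(uintptr_t)mbuf) << SKETCH_FLAG_BITS) | flag;
}

/* Unpack: an arithmetic right shift drops the flag bits and restores
 * the original pointer, relying on sign extension to rebuild the top
 * bits of a canonical 48-bit address. */
static inline void *
sketch_unpack(int64_t slot)
{
	return (void *)(uintptr_t)(slot >> SKETCH_FLAG_BITS);
}

/* The handshake flags stay testable in the bottom four bits. */
static inline int64_t
sketch_flags(int64_t slot)
{
	return slot & SKETCH_FLAGS_MASK;
}

The same packing is used by both the legacy and the burst buffers, which
is why these definitions live in the shared private header.
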
 lib/librte_distributor/rte_distributor_private.h | 136 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c     |  72 +-----------
 2 files changed, 137 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_private.h

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
new file mode 100644
index 0000000..2d85b9b
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -0,0 +1,136 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of the pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, because it is limited by how we track
+ * in-flight tags. See in_flight_bitmask and rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer_v20 {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor_v20 {
+	TAILQ_ENTRY(rte_distributor_v20) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer_v20 bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 48a8794..1f406c5 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -41,77 +41,7 @@
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer_v20 {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor_v20 {
-	TAILQ_ENTRY(rte_distributor_v20) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer_v20 bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
+#include "rte_distributor_private.h"
 
 TAILQ_HEAD(rte_distributor_list, rte_distributor_v20);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 04/17] lib: add new burst oriented distributor structs
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (2 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 03/17] lib: create rte_distributor_private.h David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:08                       ` Bruce Richardson
  2017-02-24 14:09                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 05/17] lib: add new distributor code David Hunt
                                       ` (13 subsequent siblings)
  17 siblings, 2 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
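For context on the sizes used by the new structs, a short standalone
sketch (the SKETCH_* names are assumptions for the example, standing in
for RTE_CACHE_LINE_SIZE, sizeof(rte_xmm_t) and RTE_DIST_BURST_SIZE) of
why the burst size works out to 8 and why one burst fills exactly one
cache line of rte_distributor_buffer:

#include <assert.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE_SIZE 64	/* typical x86 cache line */
#define SKETCH_XMM_BYTES       16	/* sizeof(rte_xmm_t) */
#define SKETCH_BURST_SIZE (SKETCH_XMM_BYTES / sizeof(uint16_t))

int
main(void)
{
	/* Eight 16-bit flow tags fit in one 128-bit register, which is
	 * what allows the vectorised tag matching added later in this
	 * series. */
	assert(SKETCH_BURST_SIZE == 8);

	/* Eight 64-bit bufptr64/retptr64 slots fill exactly one cache
	 * line, so a full burst moves to or from a worker in a single
	 * cache-line transfer. */
	assert(SKETCH_BURST_SIZE * sizeof(int64_t) == SKETCH_CACHE_LINE_SIZE);

	return 0;
}
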
 lib/librte_distributor/rte_distributor_private.h | 61 ++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 2d85b9b..c8e0f98 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -129,6 +129,67 @@ struct rte_distributor_v20 {
 	struct rte_distributor_returned_pkts returns;
 };
 
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
+	RTE_DIST_NUM_MATCH_FNS
+};
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+			__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+	unsigned int alg_type;                /**< Number of alg types */
+
+	/**
+	 * The first cache line in this array holds the tags inflight
+	 * on the worker core. The second cache line holds the backlog
+	 * that is going to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+
+	struct rte_distributor_v20 *d_v20;
+};
+
+void
+find_match_scalar(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 05/17] lib: add new distributor code
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (3 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 04/17] lib: add new burst oriented distributor structs David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:11                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 06/17] lib: add SIMD flow matching to distributor David Hunt
                                       ` (12 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This patch includes the public header file which will be used once
we add in the symbol versioning for the v20 and v1705 APIs.

It also includes the v1702 private header file and the code for the
new burst-capable distributor library.

The new distributor code provides an API very similar to the legacy
code, but now sends bursts of up to 8 mbufs to each worker. Flow IDs
are reduced to 15 bits to allow an optimal flow-matching algorithm.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
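To show how the new burst API is intended to be driven from a worker
lcore, here is a minimal sketch of a worker loop. The BURST define, the
quit flag and app_handle_packet() are application-side assumptions for
illustration only and are not part of the patch:

#include <rte_mbuf.h>
#include <rte_distributor.h>

#define BURST 8			/* distributor hands out up to 8 mbufs */

static volatile int quit;	/* assumed to be set by the app on shutdown */

/* Placeholder for real per-packet work done by the application. */
static void
app_handle_packet(struct rte_mbuf *m)
{
	m->port ^= 1;
}

static int
burst_worker(struct rte_distributor *d, unsigned int worker_id)
{
	struct rte_mbuf *bufs[BURST];
	unsigned int i;
	int count;

	/* First request: nothing to hand back yet, so return_count is 0. */
	count = rte_distributor_get_pkt(d, worker_id, bufs, bufs, 0);

	while (!quit) {
		for (i = 0; i < (unsigned int)count; i++)
			app_handle_packet(bufs[i]);
		/* Hand back the processed burst and block for the next one. */
		count = rte_distributor_get_pkt(d, worker_id, bufs,
				bufs, count);
	}

	/* On exit, return any mbufs still held so their flows are released. */
	rte_distributor_return_pkt(d, worker_id, bufs, count);
	return 0;
}

The same array can be reused for pkts and oldpkt because the previous
burst is copied into the return slots before the next burst is read out.
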
 lib/librte_distributor/Makefile                |   2 +
 lib/librte_distributor/rte_distributor.c       | 629 +++++++++++++++++++++++++
 lib/librte_distributor/rte_distributor.h       | 269 +++++++++++
 lib/librte_distributor/rte_distributor_v1705.h |  84 ++++
 4 files changed, 984 insertions(+)
 create mode 100644 lib/librte_distributor/rte_distributor.c
 create mode 100644 lib/librte_distributor/rte_distributor.h
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 60837ed..276695a 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,9 +43,11 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
new file mode 100644
index 0000000..ae8d508
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor.c
@@ -0,0 +1,629 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include <rte_compat.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+#include "rte_distributor_v1705.h"
+#include "rte_distributor_v20.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		rte_distributor_request_pkt_v20(d->d_v20,
+			worker_id, oldpkt[0]);
+		return;
+	}
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears them) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF to signal to the distributor that cache
+	 * line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		pkts[0] = rte_distributor_poll_pkt_v20(d->d_v20, worker_id);
+		return (pkts[0]) ? 1 : 0;
+	}
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return -1;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * so now we've got the contents of the cacheline into an array of
+	 * mbuf pointers, so toggle the bit so scheduler can start working
+	 * on the next cacheline while we're working.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	int count;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (return_count <= 1) {
+			pkts[0] = rte_distributor_get_pkt_v20(d->d_v20,
+				worker_id, oldpkt[0]);
+			return (pkts[0]) ? 1 : 0;
+		} else
+			return -EINVAL;
+	}
+
+	rte_distributor_request_pkt(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt(d, worker_id, pkts);
+	while (count == -1) {
+		uint64_t t = rte_rdtsc() + 100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (num == 1)
+			return rte_distributor_return_pkt_v20(d->d_v20,
+				worker_id, oldpkt[0]);
+		else
+			return -EINVAL;
+	}
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+/*
+ * Match the flow_ids (tags) of the incoming packets to the flow_ids
+ * of the inflight packets (both inflight on the workers and in each worker
+ * backlog). This will then allow us to pin those packets to the relevant
+ * workers to give us our atomic flow pinning.
+ */
+void
+find_match_scalar(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker IDs
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+}
+
+
+/*
+ * When the handshake bits indicate that there are packets coming
+ * back from the worker, this function is called to copy and store
+ * the valid returned pointers (store_return).
+ */
+static unsigned int
+handle_returns(struct rte_distributor *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+/*
+ * This function releases a burst (cache line) to a worker.
+ * It is called from the process function when a cacheline is
+ * full to make room for more packets for that worker, or when
+ * all packets have been assigned to bursts and need to be flushed
+ * to the workers.
+ * It also needs to wait for any outstanding packets from the worker
+ * before sending out new packets.
+ */
+static unsigned int
+release(struct rte_distributor *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, j, w, wid;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_process_v20(d->d_v20, mbufs, num_mbufs);
+	}
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		unsigned int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (i = 0; i < pkts; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+		for (; i < RTE_DIST_BURST_SIZE; i++)
+			flows[i] = 0;
+
+		find_match_scalar(d, &flows[0], &matches[0]);
+
+		/*
+		 * The matches array now contains the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User-defined tags are used to identify flows
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to the current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_returned_pkts_v20(d->d_v20,
+				mbufs, max_mbufs);
+	}
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being workered on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_flush_v20(d->d_v20);
+	}
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process(d, NULL, 0);
+
+	/*
+	 * Send empty burst to all workers to allow them to exit
+	 * gracefully, should they need to.
+	 */
+	rte_distributor_process(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns(struct rte_distributor *d)
+{
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		rte_distributor_clear_returns_v20(d->d_v20);
+	}
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor *
+rte_distributor_create(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type)
+{
+	struct rte_distributor *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* TODO Reorganise function properly around RTE_DIST_ALG_SINGLE/BURST */
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (alg_type == RTE_DIST_ALG_SINGLE) {
+		d = malloc(sizeof(struct rte_distributor));
+		if (d == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		d->d_v20 = rte_distributor_create_v20(name,
+				socket_id, num_workers);
+		if (d->d_v20 == NULL) {
+			free(d);
+			/* rte_errno will have been set */
+			return NULL;
+		}
+		d->alg_type = alg_type;
+		return d;
+	}
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+	d->alg_type = alg_type;
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
new file mode 100644
index 0000000..9b9efdb
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor.h
@@ -0,0 +1,269 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and works on up to 8 packets at a time to workers.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the user does not set the tag, its value may vary depending
+ * on the driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may
+ * optionally be returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return completed packets without requesting
+ * new packets, for example, because a worker thread is shutting down.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request new packets to process.
+ * Any previous packets given to the worker are assumed to have completed
+ * processing, and may optionally be returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for
+ * new packets to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packets.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packets to be available, but returns zero if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..e18914b
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,84 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_V1705_H_
+#define _RTE_DIST_V1705_H_
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor;
+struct rte_mbuf;
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4
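
To make the worker-side usage of the new header above concrete, a minimal
sketch of a worker loop built on the burst calls follows. It is not part of
the patch: the distributor pointer, worker_id, the quit flag and the
app_process() helper are assumed to be provided by the application, and
error handling is omitted.

#include <rte_distributor.h>
#include <rte_mbuf.h>

/* Hypothetical application processing step, stands in for real work */
static void app_process(struct rte_mbuf *m) { (void)m; }

static int
worker_loop(struct rte_distributor *d, unsigned int worker_id,
		volatile int *quit)
{
	struct rte_mbuf *bufs[8];
	unsigned int i, num;

	/* first call returns nothing to the distributor (retcount == 0) */
	num = rte_distributor_get_pkt(d, worker_id, bufs, bufs, 0);
	while (!*quit) {
		for (i = 0; i < num; i++)
			app_process(bufs[i]);
		/* hand back the processed burst, receive up to 8 more */
		num = rte_distributor_get_pkt(d, worker_id, bufs, bufs, num);
	}
	/* return whatever is still held before exiting */
	rte_distributor_return_pkt(d, worker_id, bufs, num);
	return 0;
}

The same structure, with per-worker statistics added, appears in the
reworked autotest later in this thread.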

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 06/17] lib: add SIMD flow matching to distributor
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (4 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 05/17] lib: add new distributor code David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:11                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 07/17] lib: apply symbol versioning to distributor lib David Hunt
                                       ` (11 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Add an optimised version of the in-flight flow matching algorithm
using SIMD instructions. This should give up to a 1.5x performance
improvement over the scalar version.

Falls back to the scalar version if SSE4.2 is not available.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   7 ++
 lib/librte_distributor/rte_distributor.c           |  16 ++-
 .../rte_distributor_match_generic.c                |  43 ++++++++
 lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h   |   5 +
 5 files changed, 182 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 276695a..5b599c6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,13 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+CFLAGS_rte_distributor_match_sse.o += -msse4.2
+else
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index ae8d508..b8e171c 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -392,7 +392,13 @@ rte_distributor_process(struct rte_distributor *d,
 		for (; i < RTE_DIST_BURST_SIZE; i++)
 			flows[i] = 0;
 
-		find_match_scalar(d, &flows[0], &matches[0]);
+		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
 
 		/*
 		 * Matches array now contain the intended worker ID (+1) of
@@ -608,7 +614,13 @@ rte_distributor_create(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 	d->alg_type = alg_type;
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	} else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
new file mode 100644
index 0000000..4925a78
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+
+void
+find_match_vec(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	find_match_scalar(d, data_ptr, output_ptr);
+}
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..a84097f
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,113 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+#include "smmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 1. Load the incoming flow IDs into an xmm register
+	 * 2. Loop through all worker IDs
+	 *  2a. Load the current inflights for that worker into an xmm reg
+	 *  2b. Load the current backlog for that worker into an xmm reg
+	 *  2c. Use cmpestrm to intersect flow IDs with backlog and inflights
+	 *  2d. Add any matches to the output
+	 * 3. Write the output xmm (matching worker IDs).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the output 128-bit register contains 8 16-bit values,
+	 * with each non-zero value holding the worker ID (+1) to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index c8e0f98..c1d852b 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -190,6 +190,11 @@ find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
+void
+find_match_vec(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4
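
For readers comparing the two code paths, what the cmpestrm-based loop above
computes can be expressed in plain C roughly as below. This is an
illustrative sketch only, not the library's find_match_scalar()
implementation: it mirrors the private fields the SSE version touches
(num_workers, in_flight_tags and the backlog tags) and assumes
RTE_DIST_BURST_SIZE of 8, i.e. 8 incoming flow IDs and 8 tags per array.

/* Illustrative scalar equivalent of find_match_vec() above */
static void
find_match_plain(struct rte_distributor *d,
		const uint16_t *incoming_fids,	/* 8 incoming flow IDs */
		uint16_t *matches)		/* 8 outputs: worker ID + 1 */
{
	unsigned int w, i, k;

	for (i = 0; i < 8; i++)
		matches[i] = 0;

	for (w = 0; w < d->num_workers; w++) {
		const struct rte_distributor_backlog *bl = &d->backlog[w];

		for (i = 0; i < 8; i++)
			for (k = 0; k < 8; k++)
				/* compare against in-flight and backlog tags;
				 * a flow is only ever pinned to one worker */
				if (incoming_fids[i] == d->in_flight_tags[w][k] ||
						incoming_fids[i] == bl->tags[k])
					matches[i] = w + 1;
	}
}

The vector version does the same eight-way comparison per worker in two
_mm_cmpestrm instructions, which is where the quoted 1.5x gain comes from.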

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 07/17] lib: apply symbol versioning to distributor lib
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (5 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 06/17] lib: add SIMD flow matching to distributor David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-21 11:50                       ` Hunt, David
  2017-02-24 14:12                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 08/17] test: change params to distributor autotest David Hunt
                                       ` (10 subsequent siblings)
  17 siblings, 2 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Note: LIBABIVER is also bumped up in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/rte_distributor.c           | 10 +++++++++-
 lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
 3 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index b8e171c..2dc7738 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -43,7 +43,6 @@
 #include <rte_compat.h>
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
-#include "rte_distributor_v1705.h"
 #include "rte_distributor_v20.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
@@ -103,6 +102,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt,, 17.05);
 
 int
 rte_distributor_poll_pkt(struct rte_distributor *d,
@@ -139,6 +139,7 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt,, 17.05);
 
 int
 rte_distributor_get_pkt(struct rte_distributor *d,
@@ -169,6 +170,7 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt,, 17.05);
 
 int
 rte_distributor_return_pkt(struct rte_distributor *d,
@@ -198,6 +200,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt,, 17.05);
 
 /**** APIs called on distributor core ***/
 
@@ -477,6 +480,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process,, 17.05);
 
 /* return to the caller, packets returned from workers */
 int
@@ -505,6 +509,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts,, 17.05);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -550,6 +555,7 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush,, 17.05);
 
 /* clears the internal returns array in the distributor */
 void
@@ -566,6 +572,7 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns,, 17.05);
 
 /* creates a distributor instance */
 struct rte_distributor *
@@ -639,3 +646,4 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create,, 17.05);
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..d74a789 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -42,6 +42,7 @@
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
 #include "rte_distributor_private.h"
+#include "rte_compat.h"
 
 TAILQ_HEAD(rte_distributor_list, rte_distributor_v20);
 
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 414fdc3..7531cbe 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4
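
For anyone unfamiliar with the rte_compat.h macros used above, the pattern
reduces to the toy example below (the names my_api, my_api_v20 and
my_api_v1705 are made up for this sketch and are not part of the patch).
The suffixed implementation is exported at the old version node with
VERSION_SYMBOL(), while BIND_DEFAULT_SYMBOL() makes the other implementation
the default that newly built applications link against; both must also be
listed in the library's version map, as in the map file change above.

#include <rte_compat.h>

int my_api_v20(int x);
int my_api_v1705(int x);

/* old behaviour, kept for binaries linked against DPDK_2.0 */
int my_api_v20(int x) { return x; }
VERSION_SYMBOL(my_api, _v20, 2.0);

/* new behaviour, the default from DPDK_17.05 onwards */
int my_api_v1705(int x) { return x * 2; }
BIND_DEFAULT_SYMBOL(my_api, _v1705, 17.05);

In the patch itself the new implementations keep the plain function names,
so BIND_DEFAULT_SYMBOL() is invoked with an empty suffix, but the effect on
the exported symbol table is the same.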

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 08/17] test: change params to distributor autotest
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (6 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 07/17] lib: apply symbol versioning to distributor lib David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:14                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 09/17] test: switch distributor test over to burst API David Hunt
                                       ` (9 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

In the next few patches, we'll want to test both the old and the new API,
so here we allow different parameters to be passed to the tests,
instead of just a distributor struct.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 64 +++++++++++++++++++++++++++++----------------
 1 file changed, 42 insertions(+), 22 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 6a4e20b..fdfa793 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -45,6 +45,13 @@
 #define BURST 32
 #define BIG_BATCH 1024
 
+struct worker_params {
+	char name[64];
+	struct rte_distributor_v20 *dist;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,7 +88,8 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor_v20 *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor_v20 *d = wp->dist;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -107,8 +115,9 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor_v20 *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor_v20 *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -249,7 +258,8 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor_v20 *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor_v20 *d = wp->dist;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -270,9 +280,9 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor_v20 *d,
-		struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor_v20 *d = wp->dist;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
@@ -306,7 +316,8 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor_v20 *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor_v20 *d = wp->dist;
 	unsigned count = 0;
 	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -345,9 +356,10 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor_v20 *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor_v20 *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -402,9 +414,10 @@ sanity_test_with_worker_shutdown(struct rte_distributor_v20 *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor_v20 *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor_v20 *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -481,8 +494,9 @@ int test_error_distributor_create_numworkers(void)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor_v20 *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor_v20 *d = wp->dist;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -538,28 +552,34 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
+	worker_params.dist = d;
+	sprintf(worker_params.name, "single");
+
+	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
+	if (sanity_test(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
+	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
+				SKIP_MASTER);
+	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
 	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
+		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
 	} else {
 		printf("Not enough cores to run tests for worker shutdown\n");
@@ -574,7 +594,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 09/17] test: switch distributor test over to burst API
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (7 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 08/17] test: change params to distributor autotest David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-21  3:17                     ` [PATCH v7 10/17] test: test single and burst distributor API David Hunt
                                       ` (8 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 292 ++++++++++++++++++++++++++++----------------
 1 file changed, 187 insertions(+), 105 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index fdfa793..8866e31 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -39,7 +39,7 @@
 #include <rte_errno.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
-#include <rte_distributor_v20.h>
+#include <rte_distributor.h>
 
 #define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
 #define BURST 32
@@ -47,7 +47,7 @@
 
 struct worker_params {
 	char name[64];
-	struct rte_distributor_v20 *dist;
+	struct rte_distributor *dist;
 };
 
 struct worker_params worker_params;
@@ -87,19 +87,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
-	struct rte_distributor_v20 *d = wp->dist;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
+	struct rte_distributor *db = wp->dist;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(db, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt_v20(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(db, id,
+				buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt_v20(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(db, id, buf, num);
 	return 0;
 }
 
@@ -117,11 +123,15 @@ handle_work(void *arg)
 static int
 sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
-	struct rte_distributor_v20 *d = wp->dist;
+	struct rte_distributor *db = wp->dist;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i;
+	unsigned int retries;
+	unsigned int count = 0;
+
+	printf("=== Basic distributor sanity tests (%s) ===\n", wp->name);
 
-	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
@@ -133,8 +143,16 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process_v20(d, bufs, BURST);
-	rte_distributor_flush_v20(d);
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
+
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -146,8 +164,6 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		printf("Worker %u handled %u packets\n", i,
 				worker_stats[i].handled_packets);
 	printf("Sanity test with all zero hashes done.\n");
-	if (worker_stats[0].handled_packets != BURST)
-		return -1;
 
 	/* pick two flows and check they go correctly */
 	if (rte_lcore_count() >= 3) {
@@ -155,8 +171,13 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process_v20(d, bufs, BURST);
-		rte_distributor_flush_v20(d);
+		rte_distributor_process(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush(db);
+			count += rte_distributor_returned_pkts(db,
+					returns, BURST*2);
+		} while (count < BURST);
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -168,20 +189,22 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
+		bufs[i]->hash.usr = i+1;
+
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
 
-	rte_distributor_process_v20(d, bufs, BURST);
-	rte_distributor_flush_v20(d);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -203,8 +226,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush_v20(d);
-	rte_distributor_clear_returns_v20(d);
+	rte_distributor_flush(db);
+	rte_distributor_clear_returns(db);
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -212,28 +236,45 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
+	printf("=== testing big burst (%s) ===\n", wp->name);
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process_v20(d, &many_bufs[i*BURST], BURST);
-		num_returned += rte_distributor_returned_pkts_v20(d,
+		rte_distributor_process(db,
+				&many_bufs[i*BURST], BURST);
+		count = rte_distributor_returned_pkts(db,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush_v20(d);
-	num_returned += rte_distributor_returned_pkts_v20(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	rte_distributor_flush(db);
+	count = rte_distributor_returned_pkts(db,
+		&return_bufs[num_returned],
+			BIG_BATCH - num_returned);
+	num_returned += count;
+	retries = 0;
+	do {
+		rte_distributor_flush(db);
+		count = rte_distributor_returned_pkts(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
+
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d, got %u\n",
+				__LINE__, BIG_BATCH, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -257,20 +298,28 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 static int
 handle_work_with_free_mbufs(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
-	struct rte_distributor_v20 *d = wp->dist;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
+	struct rte_distributor *d = wp->dist;
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt_v20(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt_v20(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -282,25 +331,29 @@ handle_work_with_free_mbufs(void *arg)
 static int
 sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
-	struct rte_distributor_v20 *d = wp->dist;
+	struct rte_distributor *d = wp->dist;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	printf("=== Sanity test with mbuf alloc/free (%s) ===\n", wp->name);
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
 		while (rte_mempool_get_bulk(p, (void *)bufs, BURST) < 0)
-			rte_distributor_process_v20(d, NULL, 0);
+			rte_distributor_process(d, NULL, 0);
 		for (j = 0; j < BURST; j++) {
 			bufs[j]->hash.usr = (i+j) << 1;
 			rte_mbuf_refcnt_set(bufs[j], 1);
 		}
 
-		rte_distributor_process_v20(d, bufs, BURST);
+		rte_distributor_process(d, bufs, BURST);
 	}
 
-	rte_distributor_flush_v20(d);
+	rte_distributor_flush(d);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -316,21 +369,32 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
-	struct rte_distributor_v20 *d = wp->dist;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_distributor *d = wp->dist;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt_v20(d, id, NULL);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+		total += num;
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt_v20(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	returned = rte_distributor_return_pkt(d, id, buf, num);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -338,13 +402,18 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt_v20(d, id, NULL);
+
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt_v20(d, id, NULL);
+			num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		}
-		rte_distributor_return_pkt_v20(d, id, pkt);
+		returned = rte_distributor_return_pkt(d,
+				id, buf, num);
+		printf("Num returned = %d\n", returned);
 	}
 	return 0;
 }
@@ -359,24 +428,29 @@ static int
 sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
-	struct rte_distributor_v20 *d = wp->dist;
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
+
+	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_flush(d);
 
-	rte_distributor_process_v20(d, bufs, BURST);
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -387,14 +461,19 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
-	rte_distributor_process_v20(d, bufs, BURST);
-
+	rte_distributor_process(d, bufs, BURST);
 	/* flush the distributor */
-	rte_distributor_flush_v20(d);
+	rte_distributor_flush(d);
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -402,10 +481,6 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -417,11 +492,11 @@ static int
 test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
-	struct rte_distributor_v20 *d = wp->dist;
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	printf("=== Test flush fn with worker shutdown (%s) ===\n", wp->name);
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -434,7 +509,8 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process_v20(d, bufs, BURST);
+	rte_distributor_process(d, bufs, BURST);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -443,9 +519,15 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	zero_quit = 1;
 
 	/* flush the distributor */
-	rte_distributor_flush_v20(d);
+	rte_distributor_flush(d);
+
+	rte_delay_us(10000);
 
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -453,10 +535,6 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -464,11 +542,12 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 static
 int test_error_distributor_create_name(void)
 {
-	struct rte_distributor_v20 *d = NULL;
+	struct rte_distributor *d = NULL;
 	char *name = NULL;
 
-	d = rte_distributor_create_v20(name, rte_socket_id(),
-			rte_lcore_count() - 1);
+	d = rte_distributor_create(name, rte_socket_id(),
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
@@ -481,9 +560,11 @@ int test_error_distributor_create_name(void)
 static
 int test_error_distributor_create_numworkers(void)
 {
-	struct rte_distributor_v20 *d = NULL;
-	d = rte_distributor_create_v20("test_numworkers", rte_socket_id(),
-			RTE_MAX_LCORE + 10);
+	struct rte_distributor *d = NULL;
+
+	d = rte_distributor_create("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
@@ -496,7 +577,7 @@ int test_error_distributor_create_numworkers(void)
 static void
 quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
-	struct rte_distributor_v20 *d = wp->dist;
+	struct rte_distributor *d = wp->dist;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -506,12 +587,12 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process_v20(d, bufs, num_workers);
+	rte_distributor_process(d, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process_v20(d, NULL, 0);
-	rte_distributor_flush_v20(d);
+	rte_distributor_process(d, NULL, 0);
+	rte_distributor_flush(d);
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -520,7 +601,7 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
 static int
 test_distributor(void)
 {
-	static struct rte_distributor_v20 *d;
+	static struct rte_distributor *d;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -529,16 +610,17 @@ test_distributor(void)
 	}
 
 	if (d == NULL) {
-		d = rte_distributor_create_v20("Test_distributor",
+		d = rte_distributor_create("Test_dist_burst",
 				rte_socket_id(),
-				rte_lcore_count() - 1);
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
 		if (d == NULL) {
-			printf("Error creating distributor\n");
+			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush_v20(d);
-		rte_distributor_clear_returns_v20(d);
+		rte_distributor_flush(d);
+		rte_distributor_clear_returns(d);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -553,7 +635,7 @@ test_distributor(void)
 	}
 
 	worker_params.dist = d;
-	sprintf(worker_params.name, "single");
+	sprintf(worker_params.name, "burst");
 
 	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
 	if (sanity_test(&worker_params, p) < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 10/17] test: test single and burst distributor API
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (8 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 09/17] test: switch distributor test over to burst API David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-21  3:17                     ` [PATCH v7 11/17] test: add perf test for distributor burst mode David Hunt
                                       ` (7 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor.c | 115 +++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 33 deletions(-)

diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
index 8866e31..c345382 100644
--- a/app/test/test_distributor.c
+++ b/app/test/test_distributor.c
@@ -543,16 +543,25 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
 			rte_lcore_count() - 1,
-			RTE_DIST_ALG_BURST);
+			RTE_DIST_ALG_SINGLE);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
 	}
 
+	db = rte_distributor_create(name, rte_socket_id(),
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -560,15 +569,25 @@ int test_error_distributor_create_name(void)
 static
 int test_error_distributor_create_numworkers(void)
 {
-	struct rte_distributor *d = NULL;
+	struct rte_distributor *ds = NULL;
+	struct rte_distributor *db = NULL;
 
-	d = rte_distributor_create("test_numworkers", rte_socket_id(),
+	ds = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10,
-			RTE_DIST_ALG_BURST);
-	if (d != NULL || rte_errno != EINVAL) {
+			RTE_DIST_ALG_SINGLE);
+	if (ds != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -601,26 +620,43 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
 static int
 test_distributor(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
+	static struct rte_distributor *dist[2];
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
 		return -1;
 	}
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_dist_burst",
+	if (db == NULL) {
+		db = rte_distributor_create("Test_dist_burst",
 				rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_BURST);
-		if (d == NULL) {
+		if (db == NULL) {
 			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_flush(db);
+		rte_distributor_clear_returns(db);
+	}
+
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_dist_single",
+				rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_SINGLE);
+		if (ds == NULL) {
+			printf("Error creating single distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush(ds);
+		rte_distributor_clear_returns(ds);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -634,37 +670,50 @@ test_distributor(void)
 		}
 	}
 
-	worker_params.dist = d;
-	sprintf(worker_params.name, "burst");
+	dist[0] = ds;
+	dist[1] = db;
 
-	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
-	if (sanity_test(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+	for (i = 0; i < 2; i++) {
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
-				SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+		worker_params.dist = dist[i];
+		if (i)
+			sprintf(worker_params.name, "burst");
+		else
+			sprintf(worker_params.name, "single");
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
+
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 11/17] test: add perf test for distributor burst mode
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (9 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 10/17] test: test single and burst distributor API David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-21  3:17                     ` [PATCH v7 12/17] example: add extra stats to distributor sample David Hunt
                                       ` (6 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 app/test/test_distributor_perf.c | 117 +++++++++++++++++++++++++--------------
 1 file changed, 76 insertions(+), 41 deletions(-)

diff --git a/app/test/test_distributor_perf.c b/app/test/test_distributor_perf.c
index a7e4823..30ab1a5 100644
--- a/app/test/test_distributor_perf.c
+++ b/app/test/test_distributor_perf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -39,10 +39,11 @@
 #include <rte_cycles.h>
 #include <rte_common.h>
 #include <rte_mbuf.h>
-#include <rte_distributor_v20.h>
+#include <rte_distributor.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +55,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +71,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +89,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +101,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,35 +127,44 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * this is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor_v20 *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_distributor *d = arg;
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 
-	pkt = rte_distributor_get_pkt_v20(d, id, NULL);
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt_v20(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt_v20(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * this basic performance test just repeatedly sends in 32 packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
 static inline int
-perf_test(struct rte_distributor_v20 *d, struct rte_mempool *p)
+perf_test(struct rte_distributor *d, struct rte_mempool *p)
 {
-	unsigned i;
+	unsigned int i;
 	uint64_t start, end;
 	struct rte_mbuf *bufs[BURST];
 
@@ -166,15 +179,16 @@ perf_test(struct rte_distributor_v20 *d, struct rte_mempool *p)
 
 	start = rte_rdtsc();
 	for (i = 0; i < (1<<ITER_POWER); i++)
-		rte_distributor_process_v20(d, bufs, BURST);
+		rte_distributor_process(d, bufs, BURST);
 	end = rte_rdtsc();
 
 	do {
 		usleep(100);
-		rte_distributor_process_v20(d, NULL, 0);
+		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
-	printf("=== Performance test of distributor ===\n");
+	rte_distributor_clear_returns(d);
+
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
 			((end - start) >> ITER_POWER)/BURST);
@@ -192,21 +206,22 @@ perf_test(struct rte_distributor_v20 *d, struct rte_mempool *p)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor_v20 *d, struct rte_mempool *p)
+quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 {
-	const unsigned num_workers = rte_lcore_count() - 1;
-	unsigned i;
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
 	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
 
 	quit = 1;
 	for (i = 0; i < num_workers; i++)
 		bufs[i]->hash.usr = i << 1;
-	rte_distributor_process_v20(d, bufs, num_workers);
+	rte_distributor_process(d, bufs, num_workers);
 
 	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
 
-	rte_distributor_process_v20(d, NULL, 0);
+	rte_distributor_process(d, NULL, 0);
 	rte_eal_mp_wait_lcore();
 	quit = 0;
 	worker_idx = 0;
@@ -215,7 +230,8 @@ quit_workers(struct rte_distributor_v20 *d, struct rte_mempool *p)
 static int
 test_distributor_perf(void)
 {
-	static struct rte_distributor_v20 *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -226,16 +242,28 @@ test_distributor_perf(void)
 	/* first time how long it takes to round-trip a cache line */
 	time_cache_line_switch();
 
-	if (d == NULL) {
-		d = rte_distributor_create_v20("Test_perf", rte_socket_id(),
-				rte_lcore_count() - 1);
-		if (d == NULL) {
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_perf", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_SINGLE);
+		if (ds == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush_v20(d);
-		rte_distributor_clear_returns_v20(d);
+		rte_distributor_clear_returns(ds);
+	}
+
+	if (db == NULL) {
+		db = rte_distributor_create("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns(db);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -249,10 +277,17 @@ test_distributor_perf(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (perf_test(d, p) < 0)
+	printf("=== Performance test of distributor (single mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, ds, SKIP_MASTER);
+	if (perf_test(ds, p) < 0)
+		return -1;
+	quit_workers(ds, p);
+
+	printf("=== Performance test of distributor (burst mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, db, SKIP_MASTER);
+	if (perf_test(db, p) < 0)
 		return -1;
-	quit_workers(d, p);
+	quit_workers(db, p);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 12/17] example: add extra stats to distributor sample
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (10 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 11/17] test: add perf test for distributor burst mode David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:16                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 13/17] sample: distributor: wait for ports to come up David Hunt
                                       ` (5 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This will allow us to see what's going on at various stages
throughout the sample app, with per-second visibility
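
A minimal sketch of the once-per-second trigger this relies on, assuming only
rte_cycles.h; the stats_loop() name and the callback parameter are illustrative,
since the diff below does the same thing inline in main() and calls print_stats()
directly:

    #include <stdint.h>
    #include <rte_cycles.h>

    static void
    stats_loop(volatile const uint8_t *quit, void (*print_stats_cb)(void))
    {
        const uint64_t hz = rte_get_timer_hz();  /* timer ticks per second */
        uint64_t next = rte_rdtsc() + hz;

        while (!*quit) {
            if (rte_rdtsc() > next) {     /* roughly once per second */
                print_stats_cb();         /* print deltas vs. previous second */
                next = rte_rdtsc() + hz;
            }
        }
    }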

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 139 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 123 insertions(+), 16 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 350d6f6..6634e2f 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -54,24 +54,53 @@
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
@@ -93,6 +122,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -378,25 +409,91 @@ static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++) {
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+			app_stats.worker_bursts[i][j] = 0;
+		}
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
@@ -502,6 +599,7 @@ main(int argc, char *argv[])
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -596,6 +694,15 @@ main(int argc, char *argv[])
 	if (lcore_rx(&p) != 0)
 		return -1;
 
+	freq = rte_get_timer_hz();
+	t = __rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < __rdtsc()) {
+			print_stats();
+			t = __rdtsc() + freq;
+		}
+	}
+
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 13/17] sample: distributor: wait for ports to come up
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (11 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 12/17] example: add extra stats to distributor sample David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-21  3:17                     ` [PATCH v7 14/17] sample: switch to new distributor API David Hunt
                                       ` (4 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 6634e2f..7621ff9 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -62,6 +61,7 @@ static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
 volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
@@ -165,7 +165,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 14/17] sample: switch to new distributor API
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (12 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 13/17] sample: distributor: wait for ports to come up David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:16                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 15/17] lib: make v20 header file private David Hunt
                                       ` (3 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

and give the distributor its own thread
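
In outline, the distributor thread added below reduces to the following loop.
This is a rough sketch only, with an illustrative burst size of 64, though the
API calls mirror the lcore_distributor() function in the diff:

    #include <rte_ring.h>
    #include <rte_mbuf.h>
    #include <rte_distributor.h>

    static int
    dist_core(struct rte_distributor *d, struct rte_ring *in_r,
            struct rte_ring *out_r, volatile const uint8_t *quit)
    {
        struct rte_mbuf *bufs[64];

        while (!*quit) {
            /* pull a burst of packets from the rx core's ring */
            unsigned int nb_rx = rte_ring_dequeue_burst(in_r, (void *)bufs, 64);
            if (nb_rx == 0)
                continue;
            /* distribute to workers, then collect whatever has been returned */
            rte_distributor_process(d, bufs, nb_rx);
            unsigned int nb_ret = rte_distributor_returned_pkts(d, bufs, 64);
            /* hand returned packets to the tx core, dropping on ring overflow */
            unsigned int sent = rte_ring_enqueue_burst(out_r, (void *)bufs, nb_ret);
            while (sent < nb_ret)
                rte_pktmbuf_free(bufs[sent++]);
        }
        return 0;
    }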

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 280 ++++++++++++++++++++++++++++++--------------
 1 file changed, 192 insertions(+), 88 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 7621ff9..0856b57 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -42,14 +42,16 @@
 #include <rte_malloc.h>
 #include <rte_debug.h>
 #include <rte_prefetch.h>
-#include <rte_distributor_v20.h>
+#include <rte_distributor.h>
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
-#define RTE_RING_SZ 1024
+#define MBUF_CACHE_SIZE 128
+#define BURST_SIZE 64
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
@@ -132,9 +134,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -144,7 +150,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -152,7 +158,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
@@ -192,41 +198,20 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 struct lcore_params {
 	unsigned worker_id;
-	struct rte_distributor_v20 *d;
-	struct rte_ring *r;
+	struct rte_distributor *d;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
-static int
-quit_workers(struct rte_distributor_v20 *d, struct rte_mempool *p)
-{
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
-
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
-	}
-
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
-
-	rte_distributor_process_v20(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
-
-	return 0;
-}
 
 static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor_v20 *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -242,6 +227,7 @@ lcore_rx(struct lcore_params *p)
 
 	printf("\nCore %u doing packet RX.\n", rte_lcore_id());
 	port = 0;
+
 	while (!quit_signal_rx) {
 
 		/* skip ports that are not enabled */
@@ -250,7 +236,7 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
+
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -260,19 +246,39 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process_v20(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts_v20(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
 				port = 0;
 			continue;
 		}
-
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		 * Swap the following two lines if you want the rx traffic
+		 * to go directly to tx, no distribution.
+		 */
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -281,33 +287,21 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process_v20(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush_v20(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
 static inline void
 flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
 	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
 		do {
 			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
 		} while (++nb_tx < outbuf->count);
@@ -319,6 +313,7 @@ static inline void
 flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 {
 	uint8_t outp;
+
 	for (outp = 0; outp < nb_ports; outp++) {
 		/* skip ports that are not enabled */
 		if ((enabled_port_mask & (1 << outp)) == 0)
@@ -331,6 +326,58 @@ flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 	}
 }
 
+
+
+static int
+lcore_distributor(struct lcore_params *p)
+{
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
+	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+	rte_distributor_flush(d);
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns(d);
+	quit_signal_rx = 1;
+	return 0;
+}
+
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -359,9 +406,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -390,11 +437,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -403,7 +451,7 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
@@ -501,19 +549,40 @@ print_stats(void)
 static int
 lcore_worker(struct lcore_params *p)
 {
-	struct rte_distributor_v20 *d = p->d;
+	struct rte_distributor *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
-		buf = rte_distributor_get_pkt_v20(d, id, buf);
-		buf->port ^= xor_val;
+	while (!quit_signal_work) {
+
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = __rdtsc()+100;
+
+			while (__rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
 	return 0;
 }
@@ -594,8 +663,9 @@ int
 main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
-	struct rte_distributor_v20 *d;
-	struct rte_ring *output_ring;
+	struct rte_distributor *d;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
@@ -617,10 +687,12 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -659,41 +731,73 @@ main(int argc, char *argv[])
 				"All available ports are disabled. Please set portmask.\n");
 	}
 
-	d = rte_distributor_create_v20("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
+			rte_lcore_count() - 4,
+			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+					rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+					(lcore_function_t *)lcore_distributor,
+					p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
+		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
-
-	if (lcore_rx(&p) != 0)
-		return -1;
 
 	freq = rte_get_timer_hz();
 	t = __rdtsc() + freq;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 15/17] lib: make v20 header file private
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (13 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 14/17] sample: switch to new distributor API David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:18                       ` Bruce Richardson
  2017-02-21  3:17                     ` [PATCH v7 16/17] doc: distributor library changes for new burst api David Hunt
                                       ` (2 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 5b599c6..3017398 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -53,8 +53,7 @@ endif
 
 
 # install this header file
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 16/17] doc: distributor library changes for new burst api
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (14 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 15/17] lib: make v20 header file private David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-21 16:18                       ` Mcnamara, John
  2017-02-21  3:17                     ` [PATCH v7 17/17] maintainers: add to distributor lib maintainers David Hunt
  2017-02-24 14:01                     ` [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..c070eaa 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,9 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two modes of operation of the API in the distributor Library, one which sends one packet at a time
+to workers using 32-bits for flow_id, and an optimised mode which sends bursts of up to 8 packets at a time
+to workers, using 15 bits of flow_id. The mode is selected by the type field in the rte_distributor_create function.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4
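
As a concrete illustration of the two modes described in the paragraph this
patch adds, creating one instance of each might look like the sketch below;
the distributor names and the make_dist() helper are illustrative, while the
call signature matches the tests earlier in the series:

    #include <rte_distributor.h>
    #include <rte_lcore.h>

    static struct rte_distributor *
    make_dist(int burst)
    {
        /* one worker per remaining lcore, as the unit tests in this series do */
        return rte_distributor_create(burst ? "dist_burst" : "dist_single",
                rte_socket_id(), rte_lcore_count() - 1,
                burst ? RTE_DIST_ALG_BURST : RTE_DIST_ALG_SINGLE);
    }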

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v7 17/17] maintainers: add to distributor lib maintainers
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (15 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 16/17] doc: distributor library changes for new burst api David Hunt
@ 2017-02-21  3:17                     ` David Hunt
  2017-02-24 14:01                     ` [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-02-21  3:17 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 8305237..e9033ec 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -501,6 +501,7 @@ F: doc/guides/sample_app_ug/ip_reassembly.rst
 
 Distributor
 M: Bruce Richardson <bruce.richardson@intel.com>
+M: David Hunt <david.hunt@intel.com>
 F: lib/librte_distributor/
 F: doc/guides/prog_guide/packet_distrib_lib.rst
 F: app/test/test_distributor*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17                     ` [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
@ 2017-02-21 10:27                       ` Hunt, David
  2017-02-24 14:03                       ` Bruce Richardson
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-02-21 10:27 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson


On 21/2/2017 3:17 AM, David Hunt wrote:
> Move files out of the way so that we can replace with new
> versions of the distributor library. Files are named in
> such a way as to match the symbol versioning that we will
> apply for backward ABI compatibility.
>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>
---snip--

Apologies, this patch should have been sent with '--find-renames', thus
reducing the size of this patch significantly, and eliminating checkpatch
warnings/errors.

Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 07/17] lib: apply symbol versioning to distibutor lib
  2017-02-21  3:17                     ` [PATCH v7 07/17] lib: apply symbol versioning to distibutor lib David Hunt
@ 2017-02-21 11:50                       ` Hunt, David
  2017-02-24 14:12                       ` Bruce Richardson
  1 sibling, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-02-21 11:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson


On 21/2/2017 3:17 AM, David Hunt wrote:
> Note: LIBABIVER is also bumped up in the Makefile
>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>   lib/librte_distributor/rte_distributor.c           | 10 +++++++++-
>   lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
>   lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
>   3 files changed, 33 insertions(+), 1 deletion(-)
>
--snip--

The following is generated by checkpatch for this patch:

ERROR:SPACING: space required after that ',' (ctx:VxO)
#70: FILE: lib/librte_distributor/rte_distributor.c:105:
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt,, 17.05);
                                                 ^

However, I also tried with a space:

ERROR:SPACING: space prohibited before that ',' (ctx:WxW)
#26: FILE: lib/librte_distributor/rte_distributor.c:105:
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, , 17.05);
                                                   ^
So in either case it seems it's not possible to make checkpatch happy.


Rgds,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 16/17] doc: distributor library changes for new burst api
  2017-02-21  3:17                     ` [PATCH v7 16/17] doc: distributor library changes for new burst api David Hunt
@ 2017-02-21 16:18                       ` Mcnamara, John
  0 siblings, 0 replies; 202+ messages in thread
From: Mcnamara, John @ 2017-02-21 16:18 UTC (permalink / raw)
  To: Hunt, David, dev; +Cc: Richardson, Bruce, Hunt, David

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
> Sent: Tuesday, February 21, 2017 3:18 AM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David
> <david.hunt@intel.com>
> Subject: [dpdk-dev] [PATCH v7 16/17] doc: distributor library changes for
> new burst api
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  doc/guides/prog_guide/packet_distrib_lib.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst
> b/doc/guides/prog_guide/packet_distrib_lib.rst
> index b5bdabb..c070eaa 100644
> --- a/doc/guides/prog_guide/packet_distrib_lib.rst
> +++ b/doc/guides/prog_guide/packet_distrib_lib.rst
> @@ -42,6 +42,9 @@ The model of operation is shown in the diagram below.
> 
>     Packet Distributor mode of operation
> 
> +There are two modes of operation of the API in the distributor Library,
> +one which sends one packet at a time to workers using 32-bits for
> +flow_id, and an optimised mode which sends bursts of up to 8 packets at
> a time to workers, using 15 bits of flow_id. The mode is selected by the
> type field in the rte_distributor_create function.

It is better to use fixed width backquotes for function names like:

    type field in the ``rte_distributor_create()`` function.

Apart from that.

Acked-by: John McNamara <john.mcnamara@intel.com>

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 0/17] distributor library performance enhancements
  2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
                                       ` (16 preceding siblings ...)
  2017-02-21  3:17                     ` [PATCH v7 17/17] maintainers: add to distributor lib maintainers David Hunt
@ 2017-02-24 14:01                     ` Bruce Richardson
  17 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:01 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:36AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
> 
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
> 
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
> 
> The Flow Match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
> scalar match function is selected, which should still give a good boost
> in performance over the non-burst API.
> 
> v2 changes:
>   * Created a common distributor_priv.h header file with common
>     definitions and structures.
>   * Added a scalar version so it can be built and used on machines without
>     sse2 instruction set
>   * Added unit autotests
>   * Added perf autotest

For future reference, I think it's better to put the list of deltas from
each version in reverse order, so that the latest changes are on top,
and save scrolling for those of us who have been tracking the set.

> 
> v3 changes:
>   * Addressed mailing list review comments
>   * Test code removal
>   * Split out SSE match into separate file to facilitate NEON addition
>   * Cleaned up conditional compilation flags for SSE2
>   * Addressed c99 style compilation errors
>   * rebased on latest head (Jan 2 2017, Happy New Year to all)
> 
> v4 changes:
>    * fixed issue building shared libraries
> 
> v5 changes:
>    * Removed some un-needed code around retries in worker API calls
>    * Cleanup due to review comments on mailing list
>    * Cleanup of non-x86 platform compilation, fallback to scalar match
> 
> v6 changes:
>    * Fixed intermittent segfault where num pkts not divisible
>      by BURST_SIZE
>    * Cleanup due to review comments on mailing list
>    * Renamed _priv.h to _private.h.
> 
> v7 changes:
>    * Reorganised patch so there's a more natural progression in the
>      changes, and divided them down into easier to review chunks.
>    * Previous versions of this patch set were effectively two APIs.
>      We now have a single API. Legacy functionality can
>      be used by by using the rte_distributor_create API call with the
>      RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
>    * Added symbol versioning for old API so that ABI is preserved.
> 
The merging to a single API is great to see, making it so much easier
for app developers. Thanks for that.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-21  3:17                     ` [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-21 10:27                       ` Hunt, David
@ 2017-02-24 14:03                       ` Bruce Richardson
  2017-03-01  9:55                         ` Hunt, David
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:03 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:37AM +0000, David Hunt wrote:
> Move files out of the way so that we can replace with new
> versions of the distributor library. Files are named in
> such a way as to match the symbol versioning that we will
> apply for backward ABI compatibility.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  app/test/test_distributor.c                  |   2 +-
>  app/test/test_distributor_perf.c             |   2 +-
>  examples/distributor/main.c                  |   2 +-
>  lib/librte_distributor/Makefile              |   4 +-
>  lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
>  lib/librte_distributor/rte_distributor.h     | 247 --------------
>  lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++

Rather than changing the unit tests and example applications, I think
this patch would be better with a new rte_distributor.h file which
simply does "#include  <rte_distributor_v20.h>". Alternatively, I
recently upstreamed a patch, which went into 17.02, to allow symlinks in
the folder so you could create a symlink to the renamed file.

/Bruce
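
For reference, the wrapper header suggested here could be as small as the
sketch below (the include guard name is illustrative):

    /* rte_distributor.h - thin wrapper so existing includes keep working */
    #ifndef _RTE_DISTRIBUTOR_WRAPPER_H_
    #define _RTE_DISTRIBUTOR_WRAPPER_H_

    #include <rte_distributor_v20.h>

    #endif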

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 02/17] lib: symbol versioning of functions in distributor
  2017-02-21  3:17                     ` [PATCH v7 02/17] lib: symbol versioning of functions in distributor David Hunt
@ 2017-02-24 14:05                       ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:05 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:38AM +0000, David Hunt wrote:
> we will start the symbol versioning by renaming all legacy functions
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  app/test/test_distributor.c                        | 104 +++++++++++----------
>  app/test/test_distributor_perf.c                   |  28 +++---
>  examples/distributor/main.c                        |  24 ++---
>  lib/librte_distributor/rte_distributor_v20.c       |  54 +++++------
>  lib/librte_distributor/rte_distributor_v20.h       |  33 +++----
>  lib/librte_distributor/rte_distributor_version.map |  18 ++--
>  6 files changed, 132 insertions(+), 129 deletions(-)
> 
<snip>
> diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
> index 73fdc43..414fdc3 100644
> --- a/lib/librte_distributor/rte_distributor_version.map
> +++ b/lib/librte_distributor/rte_distributor_version.map
> @@ -1,15 +1,15 @@
>  DPDK_2.0 {
>  	global:
>  
> -	rte_distributor_clear_returns;
> -	rte_distributor_create;
> -	rte_distributor_flush;
> -	rte_distributor_get_pkt;
> -	rte_distributor_poll_pkt;
> -	rte_distributor_process;
> -	rte_distributor_request_pkt;
> -	rte_distributor_return_pkt;
> -	rte_distributor_returned_pkts;
> +	rte_distributor_clear_returns_v20;
> +	rte_distributor_create_v20;
> +	rte_distributor_flush_v20;
> +	rte_distributor_get_pkt_v20;
> +	rte_distributor_poll_pkt_v20;
> +	rte_distributor_process_v20;
> +	rte_distributor_request_pkt_v20;
> +	rte_distributor_return_pkt_v20;
> +	rte_distributor_returned_pkts_v20;
>  
>  	local: *;
>  };
> -- 
This looks the wrong thing to do - renaming the files in the history.
Instead, I think you need to add in aliases for the renamed versions,
thereby avoiding the need for apps, including tests and examples to
update their code to use the _v20 functions. Those _v20 suffixes should
never be externally visible.

/Bruce
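
A rough sketch of the aliasing being suggested, modelled on the rte_compat.h
macros that appear later in this thread; the _v20/_v1705 suffixes and version
numbers here are assumptions for illustration, and the real patches also need
matching entries in the version map:

    #include <rte_compat.h>

    struct rte_distributor;        /* new burst-capable instance type */
    struct rte_distributor_v20;    /* legacy instance type */

    /* legacy implementation stays reachable through the old versioned symbol */
    void rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d);
    VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);

    /* new implementation is bound as the default symbol for the 17.05 ABI */
    void rte_distributor_clear_returns_v1705(struct rte_distributor *d);
    BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);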

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 03/17] lib: create rte_distributor_private.h
  2017-02-21  3:17                     ` [PATCH v7 03/17] lib: create rte_distributor_private.h David Hunt
@ 2017-02-24 14:07                       ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:07 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:39AM +0000, David Hunt wrote:
> We'll be adding content in here common to both burst and
> legacy APIs.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
Couple of minor nits on the commit text here:
* check-git-log.sh doesn't like the title as it's too technical. Suggest
  using "create private header file" or similar
* I think you could qualify the "content" as being "internal
  implementation definitions" 

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 04/17] lib: add new burst oriented distributor structs
  2017-02-21  3:17                     ` [PATCH v7 04/17] lib: add new burst oriented distributor structs David Hunt
@ 2017-02-24 14:08                       ` Bruce Richardson
  2017-03-01  9:57                         ` Hunt, David
  2017-02-24 14:09                       ` Bruce Richardson
  1 sibling, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:08 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:40AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/rte_distributor_private.h | 61 ++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
> index 2d85b9b..c8e0f98 100644
> --- a/lib/librte_distributor/rte_distributor_private.h
> +++ b/lib/librte_distributor/rte_distributor_private.h
> @@ -129,6 +129,67 @@ struct rte_distributor_v20 {
>  	struct rte_distributor_returned_pkts returns;
>  };
>  
> +/* All different signature compare functions */
> +enum rte_distributor_match_function {
> +	RTE_DIST_MATCH_SCALAR = 0,
> +	RTE_DIST_MATCH_VECTOR,
> +	RTE_DIST_NUM_MATCH_FNS
> +};
> +
> +/**
> + * Buffer structure used to pass the pointer data between cores. This is cache
> + * line aligned, but to improve performance and prevent adjacent cache-line
> + * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
> + * the next cache line to worker 0, we pad this out to two cache lines.
> + * We can pass up to 8 mbufs at a time in one cacheline.
> + * There is a separate cacheline for returns in the burst API.
> + */
> +struct rte_distributor_buffer {
> +	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
> +			__rte_cache_aligned; /* <= outgoing to worker */
> +
> +	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
> +
> +	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
> +			__rte_cache_aligned; /* <= incoming from worker */
> +
> +	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
> +
> +	int count __rte_cache_aligned;       /* <= number of current mbufs */
> +};
> +
> +struct rte_distributor {
> +	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
> +
> +	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
> +	unsigned int num_workers;             /**< Number of workers polling */
> +	unsigned int alg_type;                /**< Number of alg types */
> +
> +	/**>
> +	 * First cache line in this array are the tags inflight
> +	 * on the worker core. Second cache line are the backlog
> +	 * that are going to go to the worker core.
> +	 */
> +	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
> +			__rte_cache_aligned;
> +
> +	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
> +			__rte_cache_aligned;
> +
> +	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
> +
> +	struct rte_distributor_returned_pkts returns;
> +
> +	enum rte_distributor_match_function dist_match_fn;
> +
> +	struct rte_distributor_v20 *d_v20;
> +};
> +
> +void
> +find_match_scalar(struct rte_distributor *d,
> +			uint16_t *data_ptr,
> +			uint16_t *output_ptr);
> +
>  #ifdef __cplusplus
>  }
>  #endif
The last patch claimed that this header file is for structs/definitions
common between the old and new implementations. These definitions look
to apply only to the new one, so do they belong in the .c file instead?

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 04/17] lib: add new burst oriented distributor structs
  2017-02-21  3:17                     ` [PATCH v7 04/17] lib: add new burst oriented distributor structs David Hunt
  2017-02-24 14:08                       ` Bruce Richardson
@ 2017-02-24 14:09                       ` Bruce Richardson
  2017-03-01  9:58                         ` Hunt, David
  1 sibling, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:09 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:40AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/rte_distributor_private.h | 61 ++++++++++++++++++++++++
>  1 file changed, 61 insertions(+)
> 
> diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
> index 2d85b9b..c8e0f98 100644
> --- a/lib/librte_distributor/rte_distributor_private.h
> +++ b/lib/librte_distributor/rte_distributor_private.h
> @@ -129,6 +129,67 @@ struct rte_distributor_v20 {
>  	struct rte_distributor_returned_pkts returns;
>  };
>  
> +/* All different signature compare functions */
> +enum rte_distributor_match_function {
> +	RTE_DIST_MATCH_SCALAR = 0,
> +	RTE_DIST_MATCH_VECTOR,
> +	RTE_DIST_NUM_MATCH_FNS
> +};
> +
> +/**
> + * Buffer structure used to pass the pointer data between cores. This is cache
> + * line aligned, but to improve performance and prevent adjacent cache-line
> + * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
> + * the next cache line to worker 0, we pad this out to two cache lines.
> + * We can pass up to 8 mbufs at a time in one cacheline.
> + * There is a separate cacheline for returns in the burst API.
> + */
> +struct rte_distributor_buffer {
> +	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
> +			__rte_cache_aligned; /* <= outgoing to worker */
> +
> +	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
> +
> +	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
> +			__rte_cache_aligned; /* <= incoming from worker */
> +
> +	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
> +
> +	int count __rte_cache_aligned;       /* <= number of current mbufs */
> +};

Rather than adding padding elements here, would it be better and clearer
just to align the values to 128B (or more strictly CACHE_LINE_SZ * 2)?
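
As an illustration, that suggestion might look something like the sketch
below (using the real macro names, __rte_aligned and RTE_CACHE_LINE_SIZE;
this is not the code from the patch):

    struct rte_distributor_buffer {
        /* outgoing to worker; starts on its own pair of cache lines */
        volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
                __rte_aligned(RTE_CACHE_LINE_SIZE * 2);

        /* incoming from worker; the padding between the two arrays is
         * left implicit, inserted by the compiler to satisfy alignment
         */
        volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
                __rte_aligned(RTE_CACHE_LINE_SIZE * 2);

        /* number of current mbufs, again on its own cache-line pair */
        int count __rte_aligned(RTE_CACHE_LINE_SIZE * 2);
    };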

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 05/17] lib: add new distributor code
  2017-02-21  3:17                     ` [PATCH v7 05/17] lib: add new distributor code David Hunt
@ 2017-02-24 14:11                       ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:11 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:41AM +0000, David Hunt wrote:
> This patch includes public header file which will be used once
> we add in the symbol versioning for v20 and v1705 APIs.
> 
> Also includes v1702 private header file, and code for new

Now v1705.
Looking at the code, the header includes definitions for functions which
don't actually exist. Therefore I don't think the header belongs in this
patch.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 06/17] lib: add SIMD flow matching to distributor
  2017-02-21  3:17                     ` [PATCH v7 06/17] lib: add SIMD flow matching to distributor David Hunt
@ 2017-02-24 14:11                       ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:11 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:42AM +0000, David Hunt wrote:
> Add an optimised version of the in-flight flow matching algorithm
> using SIMD instructions. This should give up to 1.5x the performance of
> the scalar version.
> 
> Falls back to scalar version if SSE4.2 not available
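
For readers following along, the core of the SIMD match is comparing one
incoming tag against a whole cache line of in-flight 16-bit tags at once.
A simplified sketch of the idea (not the code in this patch, and using only
SSE2 intrinsics):

    #include <stdint.h>
    #include <emmintrin.h>

    /* Return non-zero if new_tag matches any of the eight in-flight tags. */
    static inline int
    tag_in_flight(uint16_t new_tag, const uint16_t inflight[8])
    {
        __m128i tags  = _mm_loadu_si128((const __m128i *)inflight);
        __m128i match = _mm_cmpeq_epi16(tags, _mm_set1_epi16((short)new_tag));

        return _mm_movemask_epi8(match) != 0;
    }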
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |   7 ++
>  lib/librte_distributor/rte_distributor.c           |  16 ++-
>  .../rte_distributor_match_generic.c                |  43 ++++++++
>  lib/librte_distributor/rte_distributor_match_sse.c | 113 +++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_private.h   |   5 +
>  5 files changed, 182 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
>  create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c
> 
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 276695a..5b599c6 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -44,6 +44,13 @@ LIBABIVER := 1
>  # all source are stored in SRCS-y
>  SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
>  SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
> +ifeq ($(CONFIG_RTE_ARCH_X86),y)
> +SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
> +CFLAGS_rte_distributor_match_sse.o += -msse4.2
> +else
> +SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
> +endif
> +
>  
>  # install this header file
>  SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
> diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
> index ae8d508..b8e171c 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -392,7 +392,13 @@ rte_distributor_process(struct rte_distributor *d,
>  		for (; i < RTE_DIST_BURST_SIZE; i++)
>  			flows[i] = 0;
>  
> -		find_match_scalar(d, &flows[0], &matches[0]);
> +		switch (d->dist_match_fn) {
> +		case RTE_DIST_MATCH_VECTOR:
> +			find_match_vec(d, &flows[0], &matches[0]);
> +			break;
> +		default:
> +			find_match_scalar(d, &flows[0], &matches[0]);
> +		}
>  
>  		/*
>  		 * Matches array now contain the intended worker ID (+1) of
> @@ -608,7 +614,13 @@ rte_distributor_create(const char *name,
>  	snprintf(d->name, sizeof(d->name), "%s", name);
>  	d->num_workers = num_workers;
>  	d->alg_type = alg_type;
> -	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
> +
> +#if defined(RTE_ARCH_X86)
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2)) {
> +		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
> +	} else

Minor nit: you can remove the braces here.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 07/17] lib: apply symbol versioning to distributor lib
  2017-02-21  3:17                     ` [PATCH v7 07/17] lib: apply symbol versioning to distributor lib David Hunt
  2017-02-21 11:50                       ` Hunt, David
@ 2017-02-24 14:12                       ` Bruce Richardson
  1 sibling, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:12 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:43AM +0000, David Hunt wrote:
> Note: LIBABIVER is also bumped up in the Makefile
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/rte_distributor.c           | 10 +++++++++-
>  lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
>  lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
In my sanity checks this breaks the build with shared libs. Please
investigate.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 08/17] test: change params to distributor autotest
  2017-02-21  3:17                     ` [PATCH v7 08/17] test: change params to distributor autotest David Hunt
@ 2017-02-24 14:14                       ` Bruce Richardson
  2017-03-01 10:06                         ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:14 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:44AM +0000, David Hunt wrote:
> In the next few patches, we'll want to test old and new API,
> so here we're allowing different parameters to be passed to
> the tests, instead of just a distributor struct.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  app/test/test_distributor.c | 64 +++++++++++++++++++++++++++++----------------
>  1 file changed, 42 insertions(+), 22 deletions(-)
> 
> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
> index 6a4e20b..fdfa793 100644
> --- a/app/test/test_distributor.c
> +++ b/app/test/test_distributor.c
> @@ -45,6 +45,13 @@
>  #define BURST 32
>  #define BIG_BATCH 1024
>  
> +struct worker_params {
> +	char name[64];
> +	struct rte_distributor_v20 *dist;
> +};
> +
> +struct worker_params worker_params;
> +
>  /* statics - all zero-initialized by default */
>  static volatile int quit;      /**< general quit variable for all threads */
>  static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
> @@ -81,7 +88,8 @@ static int
>  handle_work(void *arg)
>  {
>  	struct rte_mbuf *pkt = NULL;
> -	struct rte_distributor_v20 *d = arg;
> +	struct worker_params *wp = arg;
> +	struct rte_distributor_v20 *d = wp->dist;

The cover letter indicated that using new vs old API was just a matter
of passing a different parameter to create. I therefore would not expect
to see references to v20 APIs or structures in any code outside the lib
itself. Am I missing something?
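
For reference, the usage the cover letter describes keeps a single create
call and varies only the algorithm-type argument, roughly as below (the
RTE_DIST_ALG_SINGLE name appears later in this series; the burst constant
and the surrounding variables are assumptions):

    /* legacy, packet-at-a-time behaviour */
    struct rte_distributor *ds = rte_distributor_create("dist_single",
            rte_socket_id(), num_workers, RTE_DIST_ALG_SINGLE);

    /* new burst behaviour */
    struct rte_distributor *db = rte_distributor_create("dist_burst",
            rte_socket_id(), num_workers, RTE_DIST_ALG_BURST);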

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 12/17] example: add extra stats to distributor sample
  2017-02-21  3:17                     ` [PATCH v7 12/17] example: add extra stats to distributor sample David Hunt
@ 2017-02-24 14:16                       ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:16 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:48AM +0000, David Hunt wrote:
> This will allow us to see what's going on at various stages
> throughout the sample app, with per-second visibility
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
For example apps, the patch prefix should be "examples/<example_name>:"
check-git-log.sh should check this for you.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 14/17] sample: switch to new distributor API
  2017-02-21  3:17                     ` [PATCH v7 14/17] sample: switch to new distributor API David Hunt
@ 2017-02-24 14:16                       ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:16 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:50AM +0000, David Hunt wrote:
> and give the distributor its own thread

This change probably deserves a separate patch from switching the API.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 15/17] lib: make v20 header file private
  2017-02-21  3:17                     ` [PATCH v7 15/17] lib: make v20 header file private David Hunt
@ 2017-02-24 14:18                       ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-02-24 14:18 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Tue, Feb 21, 2017 at 03:17:51AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 5b599c6..3017398 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -53,8 +53,7 @@ endif
>  
>  
>  # install this header file
> -SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor_v20.h
> -SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor.h
> +SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
>  

Minor nits:
1/ I think this patch should go earlier in the set.
2/ you can keep the += in the assignment. It actually makes it less
error prone for anyone changing/adding things later as they can
copy-paste or reorder the lines without causing themselves problems.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v8 0/18] distributor library performance enhancements
  2017-02-21  3:17                     ` [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
  2017-02-21 10:27                       ` Hunt, David
  2017-02-24 14:03                       ` Bruce Richardson
@ 2017-03-01  7:47                       ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
                                           ` (17 more replies)
  2 siblings, 18 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
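
As a rough sketch, the run-time selection mirrors the checks visible in the
v7 diffs earlier in this thread (which test for SSE4.2): create() records
the choice and process() dispatches on it.

    /* at create time: pick the match function based on CPU support */
    #if defined(RTE_ARCH_X86)
        if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
            d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
        else
    #endif
            d->dist_match_fn = RTE_DIST_MATCH_SCALAR;

    /* in rte_distributor_process(): dispatch on the stored choice */
        switch (d->dist_match_fn) {
        case RTE_DIST_MATCH_VECTOR:
            find_match_vec(d, &flows[0], &matches[0]);
            break;
        default:
            find_match_scalar(d, &flows[0], &matches[0]);
        }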

v8 changes:
   * Changed the patch set to have a more logical order of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split the updates to the example app into smaller patches
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised patch so there's a more natural progression in the
     changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   (see the worker-loop sketch just after these notes)
   For performance in matching, Flow ID's are 15-bits
   If 32 bits Flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
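
A hedged sketch of the resulting worker loop, built from the burst-style
request/poll pair added in this series (public names shown as they appear
once symbol versioning is applied; the negative "not ready" return value,
and the quit/d/worker_id/handle_packet names, are assumptions):

    struct rte_mbuf *pkts[8];    /* RTE_DIST_BURST_SIZE is 8 */
    int i, nb = 0;

    while (!quit) {
        /* hand back the previous nb packets and request a new burst */
        rte_distributor_request_pkt(d, worker_id, pkts, nb);

        /* spin until the distributor has filled in a new burst */
        while ((nb = rte_distributor_poll_pkt(d, worker_id, pkts)) < 0)
            rte_pause();

        for (i = 0; i < nb; i++)
            handle_packet(pkts[i]);    /* application work, placeholder */
    }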

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new burst oriented distributor structs
[04/18] lib: add new distributor code
[05/18] lib: add SIMD flow matching to distributor
[06/18] test/distributor: extra params for autotests
[07/18] lib: switch distributor over to new API
[08/18] lib: make v20 header file private
[09/18] lib: add symbol versioning to distributor
[10/18] test: test single and burst distributor API
[11/18] test: add perf test for distributor burst mode
[12/18] examples/distributor: allow for extra stats
[13/18] sample: distributor: wait for ports to come up
[14/18] examples/distributor: give distributor a core
[15/18] examples/distributor: limit number of Tx rings
[16/18] examples/distributor: give Rx thread a core
[17/18] doc: distributor library changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v8 01/18] lib: rename legacy distributor lib files
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
  2017-03-01  7:47                         ` [PATCH v8 02/18] lib: create private header file David Hunt
                                           ` (16 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 02/18] lib: create private header file
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2017-03-01  7:47                         ` [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 03/18] lib: add new burst oriented distributor structs David Hunt
                                           ` (15 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

We'll be adding internal implementation definitions in here
that are common to both burst and legacy APIs.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/rte_distributor_private.h | 136 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c     |  72 +-----------
 2 files changed, 137 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_private.h

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
new file mode 100644
index 0000000..6d72f1c
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -0,0 +1,136 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, because it is limited by how we track
+ * in-flight tags. See in_flight_bitmask and rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index b890947..be297ec 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -41,77 +41,7 @@
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
+#include "rte_distributor_private.h"
 
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 03/18] lib: add new burst oriented distributor structs
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
  2017-03-01  7:47                         ` [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-01  7:47                         ` [PATCH v8 02/18] lib: create private header file David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 04/18] lib: add new distributor code David Hunt
                                           ` (14 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/rte_distributor_private.h | 56 ++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 6d72f1c..d3a470e 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -129,6 +129,62 @@ struct rte_distributor {
 	struct rte_distributor_returned_pkts returns;
 };
 
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
+	RTE_DIST_NUM_MATCH_FNS
+};
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_v1705 {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+struct rte_distributor_v1705 {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+	unsigned int alg_type;                /**< Number of alg types */
+
+	/**>
+	 * First cache line in this array are the tags inflight
+	 * on the worker core. Second cache line are the backlog
+	 * that are going to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+
+	struct rte_distributor *d_v20;
+};
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 04/18] lib: add new distributor code
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (2 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 03/18] lib: add new burst oriented distributor structs David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 05/18] lib: add SIMD flow matching to distributor David Hunt
                                           ` (13 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This patch includes the public header file which will be used once
we add in the symbol versioning for the v20 and v1705 APIs.

Also includes the v1705 header file, and code for the new
burst-capable distributor library. This will be renamed to
rte_distributor.h later in the patch set.

The new distributor code contains a very similar API to the legacy code,
but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
reduced to 15 bits for an optimal flow matching algorithm.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                  |   1 +
 lib/librte_distributor/rte_distributor.c         | 628 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h |   7 +-
 lib/librte_distributor/rte_distributor_v1705.h   | 269 ++++++++++
 4 files changed, 904 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_distributor/rte_distributor.c
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index b314ca6..74256ff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
new file mode 100644
index 0000000..0d5e833
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor.c
@@ -0,0 +1,628 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include <rte_compat.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor_v1705.h"
+#include "rte_distributor_v20.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		rte_distributor_request_pkt(d->d_v20,
+			worker_id, oldpkt[0]);
+		return;
+	}
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF bit to signal to the distributor that
+	 * the cache line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		return (pkts[0]) ? 1 : 0;
+	}
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return -1;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * Now that we have the contents of the cache line in an array of
+	 * mbuf pointers, toggle the bit so the scheduler can start working
+	 * on the next cache line while we process these packets.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	int count;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (return_count <= 1) {
+			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+			return (pkts[0]) ? 1 : 0;
+		} else
+			return -EINVAL;
+	}
+
+	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	while (count == -1) {
+		uint64_t t = rte_rdtsc() + 100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (num == 1)
+			return rte_distributor_return_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+		else
+			return -EINVAL;
+	}
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* Set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+/*
+ * Match the flow_ids (tags) of the incoming packets to the flow_ids
+ * of the inflight packets (both inflight on the workers and in each
+ * worker's backlog). This then allows us to pin those packets to the
+ * relevant workers, giving us our atomic flow pinning.
+ */
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker IDs
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value holding the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+}
+
+
+/*
+ * When the handshake bits indicate that there are packets coming
+ * back from the worker, this function is called to copy and store
+ * the valid returned pointers (store_return).
+ */
+static unsigned int
+handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+/*
+ * This function releases a burst (cache line) to a worker.
+ * It is called from the process function when a cacheline is
+ * full to make room for more packets for that worker, or when
+ * all packets have been assigned to bursts and need to be flushed
+ * to the workers.
+ * It also needs to wait for any outstanding packets from the worker
+ * before sending out new packets.
+ */
+static unsigned int
+release(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, j, w, wid;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+	}
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		unsigned int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (i = 0; i < pkts; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+		for (; i < RTE_DIST_BURST_SIZE; i++)
+			flows[i] = 0;
+
+		find_match_scalar(d, &flows[0], &matches[0]);
+
+		/*
+		 * Matches array now contain the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for
+			 * each mbuf before calling rte_distributor_process.
+			 * User-defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to the current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_returned_pkts(d->d_v20,
+				mbufs, max_mbufs);
+	}
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_flush(d->d_v20);
+	}
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_v1705(d, NULL, 0);
+
+	/*
+	 * Send empty burst to all workers to allow them to exit
+	 * gracefully, should they need to.
+	 */
+	rte_distributor_process_v1705(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		rte_distributor_clear_returns(d->d_v20);
+	}
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type)
+{
+	struct rte_distributor_v1705 *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* TODO Reorganise function properly around RTE_DIST_ALG_SINGLE/BURST */
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (alg_type == RTE_DIST_ALG_SINGLE) {
+		d = malloc(sizeof(struct rte_distributor_v1705));
+		d->d_v20 = rte_distributor_create(name,
+				socket_id, num_workers);
+		if (d->d_v20 == NULL) {
+			/* rte_errno will have been set */
+			return NULL;
+		}
+		d->alg_type = alg_type;
+		return d;
+	}
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+	d->alg_type = alg_type;
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index d3a470e..f0042a8 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -159,7 +159,7 @@ struct rte_distributor_buffer_v1705 {
 };
 
 struct rte_distributor_v1705 {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -185,6 +185,11 @@ struct rte_distributor_v1705 {
 	struct rte_distributor *d_v20;
 };
 
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..0034020
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,269 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor_v1705;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and sends up to 8 packets at a time to workers.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this function.
+ * If the user does not set the tag, the tag value can vary depending on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for
+ * new packets to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packets.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packets to be available, but returns -1 if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 05/18] lib: add SIMD flow matching to distributor
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (3 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 04/18] lib: add new distributor code David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 06/18] test/distributor: extra params for autotests David Hunt
                                           ` (12 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Add an optimised version of the in-flight flow matching algorithm
using SIMD (SSE4.2) instructions. This should give up to 1.5x the
performance of the scalar version.

It falls back to the scalar version if SSE4.2 is not available.
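
As an aside (not part of the patch), the toy program below illustrates the
core primitive used here: _mm_cmpestrm with _SIDD_UWORD_OPS |
_SIDD_CMP_EQUAL_ANY | _SIDD_UNIT_MASK puts 0xffff in every lane of the
incoming flow IDs that also appears in the in-flight set, mirroring the
worked example in the code comments. The values are arbitrary; compile
with -msse4.2.

#include <stdio.h>
#include <stdint.h>
#include <nmmintrin.h>

int main(void)
{
	uint16_t out[8];
	__m128i incoming = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
	__m128i inflight = _mm_setr_epi16(3, 5, 7, 0, 0, 0, 0, 0);

	/* 0xffff in each lane of 'incoming' that matches any lane of 'inflight' */
	__m128i mask = _mm_cmpestrm(inflight, 8, incoming, 8,
			_SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_UNIT_MASK);

	_mm_storeu_si128((__m128i *)out, mask);
	for (int i = 0; i < 8; i++)
		printf("%#x ", out[i]);  /* prints: 0 0 0xffff 0 0xffff 0 0xffff 0 */
	printf("\n");
	return 0;
}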

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |  10 ++
 lib/librte_distributor/rte_distributor.c           |  16 ++-
 .../rte_distributor_match_generic.c                |  43 ++++++++
 lib/librte_distributor/rte_distributor_match_sse.c | 114 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h   |   5 +
 5 files changed, 186 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 74256ff..a812fe4 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,16 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+# distributor SIMD algo needs SSE4.2 support
+ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_2,$(CFLAGS)),)
+CFLAGS_rte_distributor_match_sse.o += -msse4.2
+endif
+else
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 0d5e833..51c9ad9 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -391,7 +391,13 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 		for (; i < RTE_DIST_BURST_SIZE; i++)
 			flows[i] = 0;
 
-		find_match_scalar(d, &flows[0], &matches[0]);
+		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
 
 		/*
 		 * Matches array now contain the intended worker ID (+1) of
@@ -607,7 +613,13 @@ rte_distributor_create_v1705(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 	d->alg_type = alg_type;
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backlog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
new file mode 100644
index 0000000..4925a78
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+
+void
+find_match_vec(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	find_match_scalar(d, data_ptr, output_ptr);
+}
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..b9f9bb0
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,114 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+#include "smmintrin.h"
+#include "nmmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker IDs
+	 *  1a. Load the current inflights for that worker into an xmm reg
+	 *  1b. Load the current backlog for that worker into an xmm reg
+	 *  1c. Use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  1d. Add any matches to the output
+	 * 2. Write the output xmm (matching worker ids).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in the same position of the mask as the incoming fid.
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the 128-bit output contains 8 16-bit values, with
+	 * each non-zero value holding the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index f0042a8..92052b1 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -190,6 +190,11 @@ find_match_scalar(struct rte_distributor_v1705 *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 06/18] test/distributor: extra params for autotests
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (4 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 05/18] lib: add SIMD flow matching to distributor David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 07/18] lib: switch distributor over to new API David Hunt
                                           ` (11 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

In the next few patches, we'll want to test both the old and the new API,
so here we allow different parameters to be passed to the tests,
instead of just a distributor struct.
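
Condensed from the diff below into a free-standing sketch, the pattern is
simply to wrap the distributor in a small params struct (plus a name used
to label the API variant under test) and pass that to the launch call; the
handler body is elided here.

#include <stdio.h>
#include <rte_launch.h>
#include <rte_distributor.h>

struct worker_params {
	char name[64];                 /* which API variant is under test */
	struct rte_distributor *dist;  /* instance the workers should poll */
};

static struct worker_params worker_params;

static int
handle_work(void *arg)
{
	struct worker_params *wp = arg;
	struct rte_distributor *d = wp->dist;  /* unwrap, then proceed as before */

	(void)d;  /* ... worker body elided ... */
	return 0;
}

/* every test now passes &worker_params instead of the bare distributor */
static void
launch_workers(struct rte_distributor *d)
{
	worker_params.dist = d;
	snprintf(worker_params.name, sizeof(worker_params.name), "single");
	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
}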

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 test/test/test_distributor.c | 64 +++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 85cb8f3..6059a0c 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -45,6 +45,13 @@
 #define BURST 32
 #define BIG_BATCH 1024
 
+struct worker_params {
+	char name[64];
+	struct rte_distributor *dist;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,7 +88,9 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
+
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -107,8 +116,9 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -249,7 +259,8 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -270,8 +281,9 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
@@ -305,7 +317,8 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -344,9 +357,10 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -401,9 +415,10 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -480,8 +495,9 @@ int test_error_distributor_create_numworkers(void)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -536,28 +552,34 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
+	worker_params.dist = d;
+	sprintf(worker_params.name, "single");
+
+	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
+	if (sanity_test(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
+	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
+				SKIP_MASTER);
+	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
 	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
+		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
 	} else {
 		printf("Not enough cores to run tests for worker shutdown\n");
@@ -572,7 +594,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 07/18] lib: switch distributor over to new API
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (5 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 06/18] test/distributor: extra params for autotests David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 08/18] lib: make v20 header file private David Hunt
                                           ` (10 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This is the main switch-over between the legacy API and the new
burst API. We rename all the functions in rte_distributor.c to remove
the _v1705 suffix, and we add the _v20 suffix in rte_distributor_v20.c.

At the same time, we need the autotests and sample app to compile
properly, hence those changes are in there as well.
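
For orientation (not part of the patch), a minimal sketch of the
distributor-core side after this switch-over, using the renamed API and
the new alg_type argument. quit_signal and app_rx_burst() are hypothetical
application-side pieces; error handling is omitted.

#include <rte_distributor.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define BURST 32

extern volatile int quit_signal;  /* hypothetical app-level flag */
/* hypothetical packet source, e.g. a wrapper around rte_eth_rx_burst() */
extern unsigned int app_rx_burst(struct rte_mbuf **bufs, unsigned int n);

/* creation now takes an algorithm type; RTE_DIST_ALG_BURST selects the new code */
static struct rte_distributor *
make_dist(unsigned int nb_workers)
{
	return rte_distributor_create("PKT_DIST", rte_socket_id(),
			nb_workers, RTE_DIST_ALG_BURST);
}

static int
lcore_distributor(void *arg)
{
	struct rte_distributor *d = arg;
	struct rte_mbuf *bufs[BURST];
	struct rte_mbuf *done[BURST];

	while (!quit_signal) {
		unsigned int nb_rx = app_rx_burst(bufs, BURST);

		/* hand packets to the workers; flow-to-worker pinning is preserved */
		rte_distributor_process(d, bufs, nb_rx);

		/* collect packets the workers have finished with */
		int nb_ret = rte_distributor_returned_pkts(d, done, BURST);
		(void)nb_ret;  /* forward or free them here */
	}
	rte_distributor_flush(d);
	return 0;
}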

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c                        |  22 +-
 lib/librte_distributor/rte_distributor.c           |  76 +++---
 lib/librte_distributor/rte_distributor.h           | 240 +++++++++++++++++-
 lib/librte_distributor/rte_distributor_match_sse.c |   2 +-
 lib/librte_distributor/rte_distributor_private.h   |  22 +-
 lib/librte_distributor/rte_distributor_v1705.h     | 269 ---------------------
 lib/librte_distributor/rte_distributor_v20.c       |  46 ++--
 lib/librte_distributor/rte_distributor_v20.h       |  24 +-
 test/test/test_distributor.c                       | 235 ++++++++++++------
 test/test/test_distributor_perf.c                  |  26 +-
 10 files changed, 511 insertions(+), 451 deletions(-)
 delete mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..cc3bdb0 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -405,17 +405,30 @@ lcore_worker(struct lcore_params *p)
 {
 	struct rte_distributor *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = rte_rdtsc()+100;
+
+			while (rte_rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
 	}
 	return 0;
 }
@@ -561,7 +574,8 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 2,
+			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 51c9ad9..6e1debf 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -42,10 +42,10 @@
 #include <rte_eal_memconfig.h>
 #include <rte_compat.h>
 #include "rte_distributor_private.h"
-#include "rte_distributor_v1705.h"
+#include "rte_distributor.h"
 #include "rte_distributor_v20.h"
 
-TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
 static struct rte_tailq_elem rte_dist_burst_tailq = {
 	.name = "RTE_DIST_BURST",
@@ -57,17 +57,17 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_request_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
 	unsigned int i;
 
 	volatile int64_t *retptr64;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		rte_distributor_request_pkt(d->d_v20,
+		rte_distributor_request_pkt_v20(d->d_v20,
 			worker_id, oldpkt[0]);
 		return;
 	}
@@ -104,16 +104,16 @@ rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_poll_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	uint64_t ret;
 	int count = 0;
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		pkts[0] = rte_distributor_poll_pkt_v20(d->d_v20, worker_id);
 		return (pkts[0]) ? 1 : 0;
 	}
 
@@ -140,7 +140,7 @@ rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_get_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -148,37 +148,37 @@ rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (return_count <= 1) {
-			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+			pkts[0] = rte_distributor_get_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 			return (pkts[0]) ? 1 : 0;
 		} else
 			return -EINVAL;
 	}
 
-	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+	rte_distributor_request_pkt(d, worker_id, oldpkt, return_count);
 
-	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	while (count == -1) {
 		uint64_t t = rte_rdtsc() + 100;
 
 		while (rte_rdtsc() < t)
 			rte_pause();
 
-		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+		count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	}
 	return count;
 }
 
 int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_return_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (num == 1)
-			return rte_distributor_return_pkt(d->d_v20,
+			return rte_distributor_return_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 		else
 			return -EINVAL;
@@ -202,7 +202,7 @@ rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
 		unsigned int *ret_start, unsigned int *ret_count)
 {
 	if (!oldbuf)
@@ -221,7 +221,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
  * workers to give us our atomic flow pinning.
  */
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
@@ -270,9 +270,9 @@ find_match_scalar(struct rte_distributor_v1705 *d,
  * the valid returned pointers (store_return).
  */
 static unsigned int
-handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+handle_returns(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	uintptr_t oldbuf;
 	unsigned int ret_start = d->returns.start,
 			ret_count = d->returns.count;
@@ -308,9 +308,9 @@ handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
  * before sending out new packets.
  */
 static unsigned int
-release(struct rte_distributor_v1705 *d, unsigned int wkr)
+release(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	unsigned int i;
 
 	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
@@ -342,7 +342,7 @@ release(struct rte_distributor_v1705 *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_process(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -355,7 +355,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+		return rte_distributor_process_v20(d->d_v20, mbufs, num_mbufs);
 	}
 
 	if (unlikely(num_mbufs == 0)) {
@@ -479,7 +479,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_returned_pkts(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -489,7 +489,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_returned_pkts(d->d_v20,
+		return rte_distributor_returned_pkts_v20(d->d_v20,
 				mbufs, max_mbufs);
 	}
 
@@ -510,7 +510,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
  * being workered on or queued up in a backlog.
  */
 static inline unsigned int
-total_outstanding(const struct rte_distributor_v1705 *d)
+total_outstanding(const struct rte_distributor *d)
 {
 	unsigned int wkr, total_outstanding = 0;
 
@@ -525,24 +525,24 @@ total_outstanding(const struct rte_distributor_v1705 *d)
  * queued up.
  */
 int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_flush(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_flush(d->d_v20);
+		return rte_distributor_flush_v20(d->d_v20);
 	}
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process_v1705(d, NULL, 0);
+		rte_distributor_process(d, NULL, 0);
 
 	/*
 	 * Send empty burst to all workers to allow them to exit
 	 * gracefully, should they need to.
 	 */
-	rte_distributor_process_v1705(d, NULL, 0);
+	rte_distributor_process(d, NULL, 0);
 
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		handle_returns(d, wkr);
@@ -552,13 +552,13 @@ rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_clear_returns(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		rte_distributor_clear_returns(d->d_v20);
+		rte_distributor_clear_returns_v20(d->d_v20);
 	}
 
 	/* throw away returns, so workers can exit */
@@ -567,13 +567,13 @@ rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name,
+struct rte_distributor *
+rte_distributor_create(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
 {
-	struct rte_distributor_v1705 *d;
+	struct rte_distributor *d;
 	struct rte_dist_burst_list *dist_burst_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
@@ -586,8 +586,8 @@ rte_distributor_create_v1705(const char *name,
 	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
 
 	if (alg_type == RTE_DIST_ALG_SINGLE) {
-		d = malloc(sizeof(struct rte_distributor_v1705));
-		d->d_v20 = rte_distributor_create(name,
+		d = malloc(sizeof(struct rte_distributor));
+		d->d_v20 = rte_distributor_create_v20(name,
 				socket_id, num_workers);
 		if (d->d_v20 == NULL) {
 			/* rte_errno will have been set */
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index e41d522..9b9efdb 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,9 +30,240 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
 
-#include <rte_distributor_v20.h>
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and works on up to 8 packets at a time to workers.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the tag is not set, its value may vary depending on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being returned by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for new
+ * packets to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try to retrieve the new packets.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packets to be available, and returns zero if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
index b9f9bb0..44935a6 100644
--- a/lib/librte_distributor/rte_distributor_match_sse.c
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -38,7 +38,7 @@
 
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 92052b1..fb5a43a 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -83,7 +83,7 @@ extern "C" {
  * the next cache line to worker 0, we pad this out to three cache lines.
  * Only 64-bits of the memory is actually used though.
  */
-union rte_distributor_buffer {
+union rte_distributor_buffer_v20 {
 	volatile int64_t bufptr64;
 	char pad[RTE_CACHE_LINE_SIZE*3];
 } __rte_cache_aligned;
@@ -108,8 +108,8 @@ struct rte_distributor_returned_pkts {
 	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
 };
 
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+struct rte_distributor_v20 {
+	TAILQ_ENTRY(rte_distributor_v20) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -124,7 +124,7 @@ struct rte_distributor {
 
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
 
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+	union rte_distributor_buffer_v20 bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 };
@@ -144,7 +144,7 @@ enum rte_distributor_match_function {
  * We can pass up to 8 mbufs at a time in one cacheline.
  * There is a separate cacheline for returns in the burst API.
  */
-struct rte_distributor_buffer_v1705 {
+struct rte_distributor_buffer {
 	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= outgoing to worker */
 
@@ -158,8 +158,8 @@ struct rte_distributor_buffer_v1705 {
 	int count __rte_cache_aligned;       /* <= number of current mbufs */
 };
 
-struct rte_distributor_v1705 {
-	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -176,22 +176,22 @@ struct rte_distributor_v1705 {
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
 			__rte_cache_aligned;
 
-	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 
 	enum rte_distributor_match_function dist_match_fn;
 
-	struct rte_distributor *d_v20;
+	struct rte_distributor_v20 *d_v20;
 };
 
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
deleted file mode 100644
index 0034020..0000000
--- a/lib/librte_distributor/rte_distributor_v1705.h
+++ /dev/null
@@ -1,269 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2017 Intel Corporation. All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _RTE_DISTRIBUTOR_H_
-#define _RTE_DISTRIBUTOR_H_
-
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* Type of distribution (burst/single) */
-enum rte_distributor_alg_type {
-	RTE_DIST_ALG_BURST = 0,
-	RTE_DIST_ALG_SINGLE,
-	RTE_DIST_NUM_ALG_TYPES
-};
-
-struct rte_distributor_v1705;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @param alg_type
- *   Call the legacy API, or use the new burst API. legacy uses 32-bit
- *   flow ID, and works on a single packet at a time. Latest uses 15-
- *   bit flow ID and works on up to 8 packets at a time to worers.
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name, unsigned int socket_id,
-		unsigned int num_workers,
-		unsigned int alg_type);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be processed on different cores at
- * the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get new packets to process. Any previous packets
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param pkts
- *   The mbufs pointer array to be filled in (up to 8 packets)
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- * @param retcount
- *   The number of packets being returned
- *
- * @return
- *   The number of packets in the pkts array
- */
-int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **pkts,
-	struct rte_mbuf **oldpkt, unsigned int retcount);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packets being processed by the worker
- * @param num
- *   The number of packets in the oldpkt array
- */
-int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
- * new packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The returning packets, if any, processed by the worker
- * @param count
- *   The number of returning packets
- */
-void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **oldpkt,
-		unsigned int count);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbufs
- *   The array of mbufs being given to the worker
- *
- * @return
- *   The number of packets being given to the worker thread, zero if no
- *   packet is yet available.
- */
-int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **mbufs);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index be297ec..1f406c5 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -43,7 +43,7 @@
 #include "rte_distributor_v20.h"
 #include "rte_distributor_private.h"
 
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
+TAILQ_HEAD(rte_distributor_list, rte_distributor_v20);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
 	.name = "RTE_DISTRIBUTOR",
@@ -53,10 +53,10 @@ EAL_REGISTER_TAILQ(rte_distributor_tailq)
 /**** APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_GET_BUF;
 	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
@@ -65,10 +65,10 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
 		return NULL;
 
@@ -78,21 +78,21 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
 	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+	rte_distributor_request_pkt_v20(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt_v20(d, worker_id)) == NULL)
 		rte_pause();
 	return ret;
 }
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_RETURN_BUF;
 	buf->bufptr64 = req;
@@ -123,7 +123,7 @@ backlog_pop(struct rte_distributor_backlog *bl)
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
+store_return(uintptr_t oldbuf, struct rte_distributor_v20 *d,
 		unsigned *ret_start, unsigned *ret_count)
 {
 	/* store returns in a circular buffer - code is branch-free */
@@ -134,7 +134,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor *d,
 }
 
 static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+handle_worker_shutdown(struct rte_distributor_v20 *d, unsigned int wkr)
 {
 	d->in_flight_tags[wkr] = 0;
 	d->in_flight_bitmask &= ~(1UL << wkr);
@@ -164,7 +164,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 		 * Note that the tags were set before first level call
 		 * to rte_distributor_process.
 		 */
-		rte_distributor_process(d, pkts, i);
+		rte_distributor_process_v20(d, pkts, i);
 		bl->count = bl->start = 0;
 	}
 }
@@ -174,7 +174,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
  * to do a partial flush.
  */
 static int
-process_returns(struct rte_distributor *d)
+process_returns(struct rte_distributor_v20 *d)
 {
 	unsigned wkr;
 	unsigned flushed = 0;
@@ -213,7 +213,7 @@ process_returns(struct rte_distributor *d)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned num_mbufs)
 {
 	unsigned next_idx = 0;
@@ -317,7 +317,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -338,7 +338,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
 static inline unsigned
-total_outstanding(const struct rte_distributor *d)
+total_outstanding(const struct rte_distributor_v20 *d)
 {
 	unsigned wkr, total_outstanding;
 
@@ -353,19 +353,19 @@ total_outstanding(const struct rte_distributor *d)
 /* flush the distributor, so that there are no outstanding packets in flight or
  * queued up. */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 {
 	const unsigned flushed = total_outstanding(d);
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
+		rte_distributor_process_v20(d, NULL, 0);
 
 	return flushed;
 }
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 {
 	d->returns.start = d->returns.count = 0;
 #ifndef __OPTIMIZE__
@@ -374,12 +374,12 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name,
 		unsigned socket_id,
 		unsigned num_workers)
 {
-	struct rte_distributor *d;
+	struct rte_distributor_v20 *d;
 	struct rte_distributor_list *distributor_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
index b69aa27..f02e6aa 100644
--- a/lib/librte_distributor/rte_distributor_v20.h
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -48,7 +48,7 @@ extern "C" {
 
 #define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
 
-struct rte_distributor;
+struct rte_distributor_v20;
 struct rte_mbuf;
 
 /**
@@ -67,8 +67,8 @@ struct rte_mbuf;
  * @return
  *   The newly created distributor instance
  */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned int socket_id,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name, unsigned int socket_id,
 		unsigned int num_workers);
 
 /*  *** APIS to be called on the distributor lcore ***  */
@@ -103,7 +103,7 @@ rte_distributor_create(const char *name, unsigned int socket_id,
  *   The number of mbufs processed.
  */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs);
 
 /**
@@ -121,7 +121,7 @@ rte_distributor_process(struct rte_distributor *d,
  *   The number of mbufs returned in the mbufs array.
  */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs);
 
 /**
@@ -136,7 +136,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
  *   The number of queued/in-flight packets that were completed by this call.
  */
 int
-rte_distributor_flush(struct rte_distributor *d);
+rte_distributor_flush_v20(struct rte_distributor_v20 *d);
 
 /**
  * Clears the array of returned packets used as the source for the
@@ -148,7 +148,7 @@ rte_distributor_flush(struct rte_distributor *d);
  *   The distributor instance to be used
  */
 void
-rte_distributor_clear_returns(struct rte_distributor *d);
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d);
 
 /*  *** APIS to be called on the worker lcores ***  */
 /*
@@ -177,7 +177,7 @@ rte_distributor_clear_returns(struct rte_distributor *d);
  *   A new packet to be processed by the worker thread.
  */
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -193,8 +193,8 @@ rte_distributor_get_pkt(struct rte_distributor *d,
  *   The previous packet being processed by the worker
  */
 int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
-		struct rte_mbuf *mbuf);
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
+		unsigned int worker_id, struct rte_mbuf *mbuf);
 
 /**
  * API called by a worker to request a new packet to process.
@@ -217,7 +217,7 @@ rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
  *   The previous packet, if any, being processed by the worker
  */
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -237,7 +237,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
  *   packet is yet available.
  */
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id);
 
 #ifdef __cplusplus
diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 6059a0c..7a30513 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -87,20 +87,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
-	struct rte_distributor *d = wp->dist;
-
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	struct rte_distributor *db = wp->dist;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(db, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(db, id,
+				buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(db, id, buf, num);
 	return 0;
 }
 
@@ -118,9 +123,11 @@ handle_work(void *arg)
 static int
 sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
-	struct rte_distributor *d = wp->dist;
+	struct rte_distributor *db = wp->dist;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i, count;
+	unsigned int retries;
 
 	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
@@ -134,8 +141,15 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -147,8 +161,6 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		printf("Worker %u handled %u packets\n", i,
 				worker_stats[i].handled_packets);
 	printf("Sanity test with all zero hashes done.\n");
-	if (worker_stats[0].handled_packets != BURST)
-		return -1;
 
 	/* pick two flows and check they go correctly */
 	if (rte_lcore_count() >= 3) {
@@ -156,8 +168,13 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		rte_distributor_process(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush(db);
+			count += rte_distributor_returned_pkts(db,
+					returns, BURST*2);
+		} while (count < BURST);
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -169,20 +186,21 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
-
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+		bufs[i]->hash.usr = i+1;
+
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -204,8 +222,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	rte_distributor_flush(db);
+	rte_distributor_clear_returns(db);
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -213,28 +232,44 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
+	printf("=== testing big burst (%s) ===\n", wp->name);
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
-		num_returned += rte_distributor_returned_pkts(d,
+		rte_distributor_process(db,
+				&many_bufs[i*BURST], BURST);
+		count = rte_distributor_returned_pkts(db,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	rte_distributor_flush(db);
+	count = rte_distributor_returned_pkts(db,
+		&return_bufs[num_returned],
+			BIG_BATCH - num_returned);
+	num_returned += count;
+	retries = 0;
+	do {
+		rte_distributor_flush(db);
+		count = rte_distributor_returned_pkts(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d, got %d\n",
+				__LINE__, BIG_BATCH, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -258,20 +293,28 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 static int
 handle_work_with_free_mbufs(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -287,7 +330,8 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	printf("=== Sanity test with mbuf alloc/free (%s) ===\n", wp->name);
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
@@ -302,6 +346,9 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	}
 
 	rte_distributor_flush(d);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -317,21 +364,32 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+		total += num;
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	returned = rte_distributor_return_pkt(d, id, buf, num);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -339,13 +397,18 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
+		returned = rte_distributor_return_pkt(d,
+				id, buf, num);
+		printf("Num returned = %d\n", returned);
 	}
 	return 0;
 }
@@ -367,17 +430,22 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_flush(d);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -388,7 +456,7 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
@@ -396,6 +464,12 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 
 	/* flush the distributor */
 	rte_distributor_flush(d);
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -403,10 +477,6 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -422,7 +492,7 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	printf("=== Test flush fn with worker shutdown (%s) ===\n", wp->name);
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -446,7 +516,13 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	/* flush the distributor */
 	rte_distributor_flush(d);
 
+	rte_delay_us(10000);
+
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -454,10 +530,6 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -469,7 +541,9 @@ int test_error_distributor_create_name(void)
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
-			rte_lcore_count() - 1);
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
@@ -483,8 +557,10 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
-			RTE_MAX_LCORE + 10);
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
@@ -530,10 +606,11 @@ test_distributor(void)
 	}
 
 	if (d == NULL) {
-		d = rte_distributor_create("Test_distributor", rte_socket_id(),
-				rte_lcore_count() - 1);
+		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
 		if (d == NULL) {
-			printf("Error creating distributor\n");
+			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
@@ -553,7 +630,7 @@ test_distributor(void)
 	}
 
 	worker_params.dist = d;
-	sprintf(worker_params.name, "single");
+	sprintf(worker_params.name, "burst");
 
 	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
 	if (sanity_test(&worker_params, p) < 0)
diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 7947fe9..1dd326b 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -129,18 +129,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
 	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -228,7 +235,8 @@ test_distributor_perf(void)
 
 	if (d == NULL) {
 		d = rte_distributor_create("Test_perf", rte_socket_id(),
-				rte_lcore_count() - 1);
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_SINGLE);
 		if (d == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
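
For reference, here is a minimal sketch of how the burst API documented in
the new rte_distributor.h above is intended to be driven. It assumes an
initialised EAL and a valid mbuf pool; get_rx_burst() is a placeholder
packet source (not a DPDK API), lcore launching and error handling are
omitted, and names such as SKETCH_BURST are purely illustrative:

#include <rte_distributor.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>

#define SKETCH_BURST 32

static volatile int quit;

/* Placeholder packet source standing in for an rx core or ring. */
static unsigned int
get_rx_burst(struct rte_mbuf **bufs, unsigned int n)
{
	(void)bufs;
	(void)n;
	return 0;
}

/* Distributor lcore: hand bursts to workers, drain completed packets. */
static void
distributor_loop(struct rte_distributor *d)
{
	struct rte_mbuf *bufs[SKETCH_BURST];
	struct rte_mbuf *done[SKETCH_BURST * 2];

	while (!quit) {
		unsigned int n = get_rx_burst(bufs, SKETCH_BURST);

		rte_distributor_process(d, bufs, n);
		rte_distributor_returned_pkts(d, done, SKETCH_BURST * 2);
		/* ... transmit or free the mbufs in done[] ... */
	}
	rte_distributor_flush(d);
	rte_distributor_clear_returns(d);
}

/* Worker lcore: take up to 8 packets, hand them back on the next call. */
static int
worker_loop(void *arg)
{
	struct rte_distributor *d = arg;
	struct rte_mbuf *pkts[8];
	unsigned int id = 0;	/* each worker needs a unique id < num_workers,
				 * e.g. from an atomic counter as in the tests */
	unsigned int num = 0;

	while (!quit) {
		num = rte_distributor_get_pkt(d, id, pkts, pkts, num);
		/* ... process pkts[0..num-1]; a given tag stays on one worker ... */
	}
	return rte_distributor_return_pkt(d, id, pkts, num);
}

The instance itself is created as in the reworked test_distributor.c:

	struct rte_distributor *d = rte_distributor_create("sketch_dist",
			rte_socket_id(), rte_lcore_count() - 1,
			RTE_DIST_ALG_BURST);

Workers that cannot block can use rte_distributor_request_pkt() followed by
rte_distributor_poll_pkt() instead of the blocking rte_distributor_get_pkt()
call shown above, as the header notes.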

* [PATCH v8 08/18] lib: make v20 header file private
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (6 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 07/18] lib: switch distributor over to new API David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 09/18] lib: add symbol versioning to distributor David Hunt
                                           ` (9 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index a812fe4..2b28eff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -57,7 +57,6 @@ endif
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 09/18] lib: add symbol versioning to distributor
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (7 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 08/18] lib: make v20 header file private David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01 14:50                           ` Hunt, David
  2017-03-01  7:47                         ` [PATCH v8 10/18] test: test single and burst distributor API David Hunt
                                           ` (8 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           |  8 ++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
 4 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..2c5511d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -168,6 +169,7 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
 
 int
 rte_distributor_return_pkt(struct rte_distributor *d,
@@ -197,6 +199,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, , 17.05);
 
 /**** APIs called on distributor core ***/
 
@@ -476,6 +479,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, , 17.05);
 
 /* return to the caller, packets returned from workers */
 int
@@ -504,6 +508,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, , 17.05);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -549,6 +554,7 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, , 17.05);
 
 /* clears the internal returns array in the distributor */
 void
@@ -565,6 +571,7 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, , 17.05);
 
 /* creates a distributor instance */
 struct rte_distributor *
@@ -638,3 +645,4 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, , 17.05);
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
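
As background on the macros used above: VERSION_SYMBOL() ties an older
implementation to an existing version node in the map file, while
BIND_DEFAULT_SYMBOL() makes the new implementation the default that newly
linked applications resolve to. A sketch of the pattern for a hypothetical
rte_distributor_example() symbol (illustrative only, not part of the
library), following the same layout as the patch:

#include <rte_compat.h>

/* Old behaviour, kept for binaries linked against the DPDK_2.0 node. */
int
rte_distributor_example_v20(int x)
{
	return x;
}
VERSION_SYMBOL(rte_distributor_example, _v20, 2.0);

/* New behaviour, exported as the default from DPDK_17.05 onwards. */
int
rte_distributor_example(int x)
{
	return x + 1;
}
BIND_DEFAULT_SYMBOL(rte_distributor_example, , 17.05);

with matching entries for the base name in both the DPDK_2.0 and DPDK_17.05
blocks of the version map, as added above. In shared-library builds, existing
binaries keep resolving to the _v20 code while new builds pick up the 17.05
default, which is why LIBABIVER is bumped to 2 in the same patch.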

* [PATCH v8 10/18] test: test single and burst distributor API
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (8 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 09/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 11/18] test: add perf test for distributor burst mode David Hunt
                                           ` (7 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 test/test/test_distributor.c | 116 ++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 34 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 7a30513..890a852 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -538,17 +538,25 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
 			rte_lcore_count() - 1,
-			RTE_DIST_ALG_BURST);
-
+			RTE_DIST_ALG_SINGLE);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
 	}
 
+	db = rte_distributor_create(name, rte_socket_id(),
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -556,15 +564,25 @@ int test_error_distributor_create_name(void)
 static
 int test_error_distributor_create_numworkers(void)
 {
-	struct rte_distributor *d = NULL;
+	struct rte_distributor *ds = NULL;
+	struct rte_distributor *db = NULL;
 
-	d = rte_distributor_create("test_numworkers", rte_socket_id(),
+	ds = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10,
-			RTE_DIST_ALG_BURST);
-	if (d != NULL || rte_errno != EINVAL) {
+			RTE_DIST_ALG_SINGLE);
+	if (ds != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -597,25 +615,42 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
 static int
 test_distributor(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
+	static struct rte_distributor *dist[2];
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
 		return -1;
 	}
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+	if (db == NULL) {
+		db = rte_distributor_create("Test_dist_burst", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_BURST);
-		if (d == NULL) {
+		if (db == NULL) {
 			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_flush(db);
+		rte_distributor_clear_returns(db);
+	}
+
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_dist_single",
+				rte_socket_id(),
+				rte_lcore_count() - 1,
+			RTE_DIST_ALG_SINGLE);
+		if (ds == NULL) {
+			printf("Error creating single distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush(ds);
+		rte_distributor_clear_returns(ds);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -629,37 +664,50 @@ test_distributor(void)
 		}
 	}
 
-	worker_params.dist = d;
-	sprintf(worker_params.name, "burst");
+	dist[0] = ds;
+	dist[1] = db;
 
-	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
-	if (sanity_test(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+	for (i = 0; i < 2; i++) {
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
-				SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+		worker_params.dist = dist[i];
+		if (i)
+			sprintf(worker_params.name, "burst");
+		else
+			sprintf(worker_params.name, "single");
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
+
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 11/18] test: add perf test for distributor burst mode
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (9 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 10/18] test: test single and burst distributor API David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 12/18] examples/distributor: allow for extra stats David Hunt
                                           ` (6 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 test/test/test_distributor_perf.c | 75 ++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 1dd326b..732d86d 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,8 +41,9 @@
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +55,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +71,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +89,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +101,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +127,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * This is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -151,14 +156,15 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * This basic performance test just repeatedly sends in 32 packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
 static inline int
 perf_test(struct rte_distributor *d, struct rte_mempool *p)
 {
-	unsigned i;
+	unsigned int i;
 	uint64_t start, end;
 	struct rte_mbuf *bufs[BURST];
 
@@ -181,7 +187,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
-	printf("=== Performance test of distributor ===\n");
+	rte_distributor_clear_returns(d);
+
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
 			((end - start) >> ITER_POWER)/BURST);
@@ -201,9 +208,10 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 {
-	const unsigned num_workers = rte_lcore_count() - 1;
-	unsigned i;
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
 	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
 
 	quit = 1;
@@ -222,7 +230,8 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 static int
 test_distributor_perf(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -233,17 +242,28 @@ test_distributor_perf(void)
 	/* first time how long it takes to round-trip a cache line */
 	time_cache_line_switch();
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_perf", rte_socket_id(),
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_perf", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_SINGLE);
-		if (d == NULL) {
+		if (ds == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_clear_returns(ds);
+	}
+
+	if (db == NULL) {
+		db = rte_distributor_create("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns(db);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -257,10 +277,17 @@ test_distributor_perf(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (perf_test(d, p) < 0)
+	printf("=== Performance test of distributor (single mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, ds, SKIP_MASTER);
+	if (perf_test(ds, p) < 0)
+		return -1;
+	quit_workers(ds, p);
+
+	printf("=== Performance test of distributor (burst mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, db, SKIP_MASTER);
+	if (perf_test(db, p) < 0)
 		return -1;
-	quit_workers(d, p);
+	quit_workers(db, p);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 12/18] examples/distributor: allow for extra stats
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (10 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 11/18] test: add perf test for distributor burst mode David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 13/18] sample: distributor: wait for ports to come up David Hunt
                                           ` (5 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This will allow us to see what's going on at various stages
throughout the sample app, with per-second visibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 139 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 123 insertions(+), 16 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cc3bdb0..3657e5d 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -54,24 +54,53 @@
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
@@ -93,6 +122,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -378,25 +409,91 @@ static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++) {
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+			app_stats.worker_bursts[i][j] = 0;
+		}
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
@@ -515,6 +612,7 @@ main(int argc, char *argv[])
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -610,6 +708,15 @@ main(int argc, char *argv[])
 	if (lcore_rx(&p) != 0)
 		return -1;
 
+	freq = rte_get_timer_hz();
+	t = rte_rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < rte_rdtsc()) {
+			print_stats();
+			t = rte_rdtsc() + freq;
+		}
+	}
+
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 13/18] sample: distributor: wait for ports to come up
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (11 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 12/18] examples/distributor: allow for extra stats David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 14/18] examples/distributor: give distributor a core David Hunt
                                           ` (4 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

On some machines, ports take several seconds to come up. This
patch causes the app to wait.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 3657e5d..aeb75a8 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -62,6 +61,7 @@ static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
 volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
@@ -165,7 +165,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 14/18] examples/distributor: give distributor a core
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (12 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 13/18] sample: distributor: wait for ports to come up David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 15/18] examples/distributor: limit number of Tx rings David Hunt
                                           ` (3 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 181 ++++++++++++++++++++++++++++++--------------
 1 file changed, 123 insertions(+), 58 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index aeb75a8..e9ebe5e 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -49,6 +49,8 @@
 #define NUM_MBUFS ((64*1024)-1)
 #define MBUF_CACHE_SIZE 250
 #define BURST_SIZE 32
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
 #define RTE_RING_SZ 1024
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
@@ -193,37 +195,14 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 struct lcore_params {
 	unsigned worker_id;
 	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
 static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
-{
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
-
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
-	}
-
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
-
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
-
-	return 0;
-}
-
-static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
@@ -260,9 +239,15 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
@@ -270,7 +255,22 @@ lcore_rx(struct lcore_params *p)
 			continue;
 		}
 
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		 * Swap the following two lines if you want the rx traffic
+		 * to go directly to tx, no distribution.
+		 */
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
+
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
 			RTE_LOG_DP(DEBUG, DISTRAPP,
@@ -281,20 +281,9 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
@@ -331,6 +320,58 @@ flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 	}
 }
 
+
+
+static int
+lcore_distributor(struct lcore_params *p)
+{
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
+	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+	rte_distributor_flush(d);
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns(d);
+	quit_signal_rx = 1;
+	return 0;
+}
+
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -403,7 +444,7 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
@@ -517,7 +558,7 @@ lcore_worker(struct lcore_params *p)
 		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
+	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		/* Do a little bit of work for each packet */
 		for (i = 0; i < num; i++) {
@@ -608,7 +649,8 @@ main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
 	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
@@ -630,10 +672,11 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 4)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 4 logical cores to run:\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -673,30 +716,52 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2,
+			rte_lcore_count() - 3,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+				rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+				(lcore_function_t *)lcore_distributor,
+				p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else {
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
@@ -704,7 +769,7 @@ main(int argc, char *argv[])
 		worker_id++;
 	}
 	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
+	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
 
 	if (lcore_rx(&p) != 0)
 		return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 15/18] examples/distributor: limit number of Tx rings
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (13 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 14/18] examples/distributor: give distributor a core David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 16/18] examples/distributor: give Rx thread a core David Hunt
                                           ` (2 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e9ebe5e..cf2e826 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -44,14 +44,15 @@
 #include <rte_prefetch.h>
 #include <rte_distributor.h>
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
+
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define MBUF_CACHE_SIZE 128
+#define BURST_SIZE 64
 #define SCHED_RX_RING_SZ 8192
 #define SCHED_TX_RING_SZ 65536
-#define RTE_RING_SZ 1024
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
@@ -134,9 +135,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -146,7 +151,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -154,7 +159,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 16/18] examples/distributor: give Rx thread a core
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (14 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 15/18] examples/distributor: limit number of Tx rings David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 17/18] doc: distributor library changes for new burst API David Hunt
  2017-03-01  7:47                         ` [PATCH v8 18/18] maintainers: add to distributor lib maintainers David Hunt
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This is so that, with the increased number of stats we are counting,
we don't interfere with the rx core.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 50 ++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cf2e826..8daf43d 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -278,6 +278,7 @@ lcore_rx(struct lcore_params *p)
 
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -295,13 +296,12 @@ lcore_rx(struct lcore_params *p)
 static inline void
 flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
 	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
 		do {
 			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
 		} while (++nb_tx < outbuf->count);
@@ -313,6 +313,7 @@ static inline void
 flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 {
 	uint8_t outp;
+
 	for (outp = 0; outp < nb_ports; outp++) {
 		/* skip ports that are not enabled */
 		if ((enabled_port_mask & (1 << outp)) == 0)
@@ -405,9 +406,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -436,11 +437,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -562,6 +564,8 @@ lcore_worker(struct lcore_params *p)
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
 
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
@@ -573,6 +577,10 @@ lcore_worker(struct lcore_params *p)
 				rte_pause();
 			buf[i]->port ^= xor_val;
 		}
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
 	return 0;
 }
@@ -677,9 +685,10 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 4)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 4 logical cores to run:\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
 				"1 lcore for packet RX\n"
 				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
@@ -721,7 +730,7 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 3,
+			rte_lcore_count() - 4,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
@@ -760,7 +769,21 @@ main(int argc, char *argv[])
 			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
 					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
 		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
@@ -773,11 +796,6 @@ main(int argc, char *argv[])
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
-
-	if (lcore_rx(&p) != 0)
-		return -1;
 
 	freq = rte_get_timer_hz();
 	t = rte_rdtsc() + freq;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 17/18] doc: distributor library changes for new burst API
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (15 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 16/18] examples/distributor: give Rx thread a core David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  2017-03-01  7:47                         ` [PATCH v8 18/18] maintainers: add to distributor lib maintainers David Hunt
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..e0adcaa 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,9 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two modes of operation of the API in the distributor library, one which sends one packet at a time
+to workers using 32 bits for flow_id, and an optimised mode which sends bursts of up to 8 packets at a time
+to workers, using 15 bits of flow_id. The mode is selected by the type field in the ``rte_distributor_create()`` function.
 
 Distributor Core Operation
 --------------------------
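
To make the mode selection concrete, here is a minimal usage sketch. It is
not part of the patch itself; the RTE_DIST_ALG_* values and the
rte_distributor_create() signature are those used in the test patches
earlier in this series, and the headers/names are assumed for illustration:

    #include <rte_distributor.h>
    #include <rte_lcore.h>

    /* Sketch: create a distributor of either type; the name strings are
     * arbitrary. Burst mode passes up to 8 mbufs per handshake with 15-bit
     * flow ids; single mode passes one mbuf at a time with 32-bit flow ids.
     */
    static struct rte_distributor *
    create_dist(int burst)
    {
            return rte_distributor_create(
                            burst ? "dist_burst" : "dist_single",
                            rte_socket_id(), rte_lcore_count() - 1,
                            burst ? RTE_DIST_ALG_BURST : RTE_DIST_ALG_SINGLE);
    }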
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v8 18/18] maintainers: add to distributor lib maintainers
  2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
                                           ` (16 preceding siblings ...)
  2017-03-01  7:47                         ` [PATCH v8 17/18] doc: distributor library changes for new burst API David Hunt
@ 2017-03-01  7:47                         ` David Hunt
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-01  7:47 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5030c1c..42eece0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -501,6 +501,7 @@ F: doc/guides/sample_app_ug/ip_reassembly.rst
 
 Distributor
 M: Bruce Richardson <bruce.richardson@intel.com>
+M: David Hunt <david.hunt@intel.com>
 F: lib/librte_distributor/
 F: doc/guides/prog_guide/packet_distrib_lib.rst
 F: test/test/test_distributor*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 01/17] lib: rename legacy distributor lib files
  2017-02-24 14:03                       ` Bruce Richardson
@ 2017-03-01  9:55                         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-01  9:55 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 24/2/2017 2:03 PM, Bruce Richardson wrote:
> On Tue, Feb 21, 2017 at 03:17:37AM +0000, David Hunt wrote:
>> Move files out of the way so that we can replace them with new
>> versions of the distributor library. Files are named in
>> such a way as to match the symbol versioning that we will
>> apply for backward ABI compatibility.
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   app/test/test_distributor.c                  |   2 +-
>>   app/test/test_distributor_perf.c             |   2 +-
>>   examples/distributor/main.c                  |   2 +-
>>   lib/librte_distributor/Makefile              |   4 +-
>>   lib/librte_distributor/rte_distributor.c     | 487 ---------------------------
>>   lib/librte_distributor/rte_distributor.h     | 247 --------------
>>   lib/librte_distributor/rte_distributor_v20.c | 487 +++++++++++++++++++++++++++
>>   lib/librte_distributor/rte_distributor_v20.h | 247 ++++++++++++++
> Rather than changing the unit tests and example applications, I think
> this patch would be better with a new rte_distributor.h file which
> simply does "#include  <rte_distributor_v20.h>". Alternatively, I
> recently upstreamed a patch, which went into 17.02, to allow symlinks in
> the folder so you could create a symlink to the renamed file.
>
> /Bruce

Thanks for the review, Bruce. I've just finished reworking the patchset
based on your review comments (including later emails) and will post soon.

Regards,
Dave.
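
For reference, the first suggestion above amounts to a one-line
compatibility header along these lines (a hypothetical sketch, not a file
from the patchset; the guard name is illustrative only):

    /* rte_distributor.h -- forward existing includers to the renamed header */
    #ifndef _RTE_DISTRIBUTOR_H_
    #define _RTE_DISTRIBUTOR_H_
    #include <rte_distributor_v20.h>
    #endif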

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 04/17] lib: add new burst oriented distributor structs
  2017-02-24 14:08                       ` Bruce Richardson
@ 2017-03-01  9:57                         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-01  9:57 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 24/2/2017 2:08 PM, Bruce Richardson wrote:
> On Tue, Feb 21, 2017 at 03:17:40AM +0000, David Hunt wrote:
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/rte_distributor_private.h | 61 ++++++++++++++++++++++++
>>   1 file changed, 61 insertions(+)
>>
>> diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
>> index 2d85b9b..c8e0f98 100644
>> --- a/lib/librte_distributor/rte_distributor_private.h
>> +++ b/lib/librte_distributor/rte_distributor_private.h
>> @@ -129,6 +129,67 @@ struct rte_distributor_v20 {
>>   	struct rte_distributor_returned_pkts returns;
>>   };
>>   
>> +/* All different signature compare functions */
>> +enum rte_distributor_match_function {
>> +	RTE_DIST_MATCH_SCALAR = 0,
>> +	RTE_DIST_MATCH_VECTOR,
>> +	RTE_DIST_NUM_MATCH_FNS
>> +};
>> +
>> +/**
>> + * Buffer structure used to pass the pointer data between cores. This is cache
>> + * line aligned, but to improve performance and prevent adjacent cache-line
>> + * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
>> + * the next cache line to worker 0, we pad this out to two cache lines.
>> + * We can pass up to 8 mbufs at a time in one cacheline.
>> + * There is a separate cacheline for returns in the burst API.
>> + */
>> +struct rte_distributor_buffer {
>> +	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
>> +			__rte_cache_aligned; /* <= outgoing to worker */
>> +
>> +	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
>> +
>> +	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
>> +			__rte_cache_aligned; /* <= incoming from worker */
>> +
>> +	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
>> +
>> +	int count __rte_cache_aligned;       /* <= number of current mbufs */
>> +};
>> +
>> +struct rte_distributor {
>> +	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
>> +
>> +	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
>> +	unsigned int num_workers;             /**< Number of workers polling */
>> +	unsigned int alg_type;                /**< Number of alg types */
>> +
>> +	/**>
>> +	 * First cache line in the this array are the tags inflight
>> +	 * on the worker core. Second cache line are the backlog
>> +	 * that are going to go to the worker core.
>> +	 */
>> +	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
>> +			__rte_cache_aligned;
>> +
>> +	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
>> +			__rte_cache_aligned;
>> +
>> +	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
>> +
>> +	struct rte_distributor_returned_pkts returns;
>> +
>> +	enum rte_distributor_match_function dist_match_fn;
>> +
>> +	struct rte_distributor_v20 *d_v20;
>> +};
>> +
>> +void
>> +find_match_scalar(struct rte_distributor *d,
>> +			uint16_t *data_ptr,
>> +			uint16_t *output_ptr);
>> +
>>   #ifdef __cplusplus
>>   }
>>   #endif
> The last patch claimed that this header file is for structs/definitions
> common between the old and new implementations. These definitions look
> to apply only to the new one, so do they belong in the .c file instead?

The _v20 structs are used as a fallback in the new struct, so it's
probably best to keep them in a common private file.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 04/17] lib: add new burst oriented distributor structs
  2017-02-24 14:09                       ` Bruce Richardson
@ 2017-03-01  9:58                         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-01  9:58 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 24/2/2017 2:09 PM, Bruce Richardson wrote:
> On Tue, Feb 21, 2017 at 03:17:40AM +0000, David Hunt wrote:
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/rte_distributor_private.h | 61 ++++++++++++++++++++++++
>>   1 file changed, 61 insertions(+)
>>
>> diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
>> index 2d85b9b..c8e0f98 100644
>> --- a/lib/librte_distributor/rte_distributor_private.h
>> +++ b/lib/librte_distributor/rte_distributor_private.h
>> @@ -129,6 +129,67 @@ struct rte_distributor_v20 {
>>   	struct rte_distributor_returned_pkts returns;
>>   };
>>   
>> +/* All different signature compare functions */
>> +enum rte_distributor_match_function {
>> +	RTE_DIST_MATCH_SCALAR = 0,
>> +	RTE_DIST_MATCH_VECTOR,
>> +	RTE_DIST_NUM_MATCH_FNS
>> +};
>> +
>> +/**
>> + * Buffer structure used to pass the pointer data between cores. This is cache
>> + * line aligned, but to improve performance and prevent adjacent cache-line
>> + * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
>> + * the next cache line to worker 0, we pad this out to two cache lines.
>> + * We can pass up to 8 mbufs at a time in one cacheline.
>> + * There is a separate cacheline for returns in the burst API.
>> + */
>> +struct rte_distributor_buffer {
>> +	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
>> +			__rte_cache_aligned; /* <= outgoing to worker */
>> +
>> +	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
>> +
>> +	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
>> +			__rte_cache_aligned; /* <= incoming from worker */
>> +
>> +	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
>> +
>> +	int count __rte_cache_aligned;       /* <= number of current mbufs */
>> +};
> Rather than adding padding elements here, would it be better and clearer
> just to align the values to 128B (or more strictly CACHE_LINE_SZ * 2)?
>
> /Bruce

I tried various combinations of __rte_aligned(128) and taking out the
pads, but the performance regressed by 10-15%. For the moment, I suggest
leaving it as is.

Dave.
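
For readers following the discussion, the two layouts being compared can be
sketched as follows. The padded form matches the quoted patch; the aligned
form is only an illustrative rendering of the suggested alternative, and it
assumes the standard __rte_aligned()/__rte_cache_aligned macros and
RTE_CACHE_LINE_SIZE, plus the series' RTE_DIST_BURST_SIZE of 8:

    #include <rte_common.h>   /* assumed to provide the alignment macros */
    #include <rte_memory.h>   /* assumed to provide RTE_CACHE_LINE_SIZE */

    #define RTE_DIST_BURST_SIZE 8 /* assumed, matching the 8-mbuf burst */

    /* As in the patch: explicit one-cache-line pads, so each array sits on
     * its own pair of cache lines. */
    struct rte_distributor_buffer {
            volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
                            __rte_cache_aligned; /* outgoing to worker */
            int64_t pad1 __rte_cache_aligned;
            volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
                            __rte_cache_aligned; /* incoming from worker */
            int64_t pad2 __rte_cache_aligned;
            int count __rte_cache_aligned;       /* number of current mbufs */
    };

    /* Suggested alternative (illustrative only): drop the pads and align
     * each member to two cache lines instead. */
    struct rte_distributor_buffer_alt {
            volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
                            __rte_aligned(2 * RTE_CACHE_LINE_SIZE);
            volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
                            __rte_aligned(2 * RTE_CACHE_LINE_SIZE);
            int count __rte_aligned(2 * RTE_CACHE_LINE_SIZE);
    };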

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v7 08/17] test: change params to distributor autotest
  2017-02-24 14:14                       ` Bruce Richardson
@ 2017-03-01 10:06                         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-01 10:06 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev


On 24/2/2017 2:14 PM, Bruce Richardson wrote:
> On Tue, Feb 21, 2017 at 03:17:44AM +0000, David Hunt wrote:
>> In the next few patches, we'll want to test old and new API,
>> so here we're allowing different parameters to be passed to
>> the tests, instead of just a distributor struct.
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   app/test/test_distributor.c | 64 +++++++++++++++++++++++++++++----------------
>>   1 file changed, 42 insertions(+), 22 deletions(-)
>>
>> diff --git a/app/test/test_distributor.c b/app/test/test_distributor.c
>> index 6a4e20b..fdfa793 100644
>> --- a/app/test/test_distributor.c
>> +++ b/app/test/test_distributor.c
>> @@ -45,6 +45,13 @@
>>   #define BURST 32
>>   #define BIG_BATCH 1024
>>   
>> +struct worker_params {
>> +	char name[64];
>> +	struct rte_distributor_v20 *dist;
>> +};
>> +
>> +struct worker_params worker_params;
>> +
>>   /* statics - all zero-initialized by default */
>>   static volatile int quit;      /**< general quit variable for all threads */
>>   static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
>> @@ -81,7 +88,8 @@ static int
>>   handle_work(void *arg)
>>   {
>>   	struct rte_mbuf *pkt = NULL;
>> -	struct rte_distributor_v20 *d = arg;
>> +	struct worker_params *wp = arg;
>> +	struct rte_distributor_v20 *d = wp->dist;
> The cover letter indicated that using new vs old API was just a matter
> of passing a different parameter to create. I therefore would not expect
> to see references to v20 APIs or structures in any code outside the lib
> itself. Am I missing something?
>
> /Bruce
>

The patchset has now been reworked so that the API switches over and the
apps are migrated to the new API in one step, so _v20 is never exposed.

Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v8 09/18] lib: add symbol versioning to distributor
  2017-03-01  7:47                         ` [PATCH v8 09/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-01 14:50                           ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-01 14:50 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

ERROR:SPACING: space prohibited before that ',' (ctx:WxW)
#84: FILE: lib/librte_distributor/rte_distributor.c:172:
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
                                               ^

FYI, checkpatch does not like this regardless of whether there's
a space there or not. It complains either way. :)

Regards,
Dave.



On 1/3/2017 7:47 AM, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
>
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>   lib/librte_distributor/Makefile                    |  2 +-
>   lib/librte_distributor/rte_distributor.c           |  8 ++++++++
>   lib/librte_distributor/rte_distributor_v20.c       | 10 ++++++++++
>   lib/librte_distributor/rte_distributor_version.map | 14 ++++++++++++++
>   4 files changed, 33 insertions(+), 1 deletion(-)
>
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 2b28eff..2f05cf3 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
>   
>   EXPORT_MAP := rte_distributor_version.map
>   
> -LIBABIVER := 1
> +LIBABIVER := 2
>   
>   # all source are stored in SRCS-y
>   SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
> diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
> index 6e1debf..2c5511d 100644
> --- a/lib/librte_distributor/rte_distributor.c
> +++ b/lib/librte_distributor/rte_distributor.c
> @@ -36,6 +36,7 @@
>   #include <rte_mbuf.h>
>   #include <rte_memory.h>
>   #include <rte_cycles.h>
> +#include <rte_compat.h>
>   #include <rte_memzone.h>
>   #include <rte_errno.h>
>   #include <rte_string_fns.h>
> @@ -168,6 +169,7 @@ rte_distributor_get_pkt(struct rte_distributor *d,
>   	}
>   	return count;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, , 17.05);
>   
>   int
>   rte_distributor_return_pkt(struct rte_distributor *d,
> @@ -197,6 +199,7 @@ rte_distributor_return_pkt(struct rte_distributor *d,
>   
>   	return 0;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, , 17.05);
>   
>   /**** APIs called on distributor core ***/
>   
> @@ -476,6 +479,7 @@ rte_distributor_process(struct rte_distributor *d,
>   
>   	return num_mbufs;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_process, , 17.05);
>   
>   /* return to the caller, packets returned from workers */
>   int
> @@ -504,6 +508,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
>   
>   	return retval;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, , 17.05);
>   
>   /*
>    * Return the number of packets in-flight in a distributor, i.e. packets
> @@ -549,6 +554,7 @@ rte_distributor_flush(struct rte_distributor *d)
>   
>   	return flushed;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_flush, , 17.05);
>   
>   /* clears the internal returns array in the distributor */
>   void
> @@ -565,6 +571,7 @@ rte_distributor_clear_returns(struct rte_distributor *d)
>   	for (wkr = 0; wkr < d->num_workers; wkr++)
>   		d->bufs[wkr].retptr64[0] = 0;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, , 17.05);
>   
>   /* creates a distributor instance */
>   struct rte_distributor *
> @@ -638,3 +645,4 @@ rte_distributor_create(const char *name,
>   
>   	return d;
>   }
> +BIND_DEFAULT_SYMBOL(rte_distributor_create, , 17.05);
> diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
> index 1f406c5..bb6c5d7 100644
> --- a/lib/librte_distributor/rte_distributor_v20.c
> +++ b/lib/librte_distributor/rte_distributor_v20.c
> @@ -38,6 +38,7 @@
>   #include <rte_memory.h>
>   #include <rte_memzone.h>
>   #include <rte_errno.h>
> +#include <rte_compat.h>
>   #include <rte_string_fns.h>
>   #include <rte_eal_memconfig.h>
>   #include "rte_distributor_v20.h"
> @@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
>   		rte_pause();
>   	buf->bufptr64 = req;
>   }
> +VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
>   
>   struct rte_mbuf *
>   rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
> @@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
>   	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
>   	return (struct rte_mbuf *)((uintptr_t)ret);
>   }
> +VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
>   
>   struct rte_mbuf *
>   rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
> @@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
>   		rte_pause();
>   	return ret;
>   }
> +VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
>   
>   int
>   rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
> @@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
>   	buf->bufptr64 = req;
>   	return 0;
>   }
> +VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
>   
>   /**** APIs called on distributor core ***/
>   
> @@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
>   	d->returns.count = ret_count;
>   	return num_mbufs;
>   }
> +VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
>   
>   /* return to the caller, packets returned from workers */
>   int
> @@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
>   
>   	return retval;
>   }
> +VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
>   
>   /* return the number of packets in-flight in a distributor, i.e. packets
>    * being workered on or queued up in a backlog. */
> @@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
>   
>   	return flushed;
>   }
> +VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
>   
>   /* clears the internal returns array in the distributor */
>   void
> @@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
>   	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
>   #endif
>   }
> +VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
>   
>   /* creates a distributor instance */
>   struct rte_distributor_v20 *
> @@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
>   
>   	return d;
>   }
> +VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
> diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
> index 73fdc43..3a285b3 100644
> --- a/lib/librte_distributor/rte_distributor_version.map
> +++ b/lib/librte_distributor/rte_distributor_version.map
> @@ -13,3 +13,17 @@ DPDK_2.0 {
>   
>   	local: *;
>   };
> +
> +DPDK_17.05 {
> +	global:
> +
> +	rte_distributor_clear_returns;
> +	rte_distributor_create;
> +	rte_distributor_flush;
> +	rte_distributor_get_pkt;
> +	rte_distributor_poll_pkt;
> +	rte_distributor_process;
> +	rte_distributor_request_pkt;
> +	rte_distributor_return_pkt;
> +	rte_distributor_returned_pkts;
> +} DPDK_2.0;

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v9 00/18] distributor lib performance enhancements
  2017-03-01  7:47                         ` [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-06  9:10                           ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
                                               ` (18 more replies)
  0 siblings, 19 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
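
As a sketch of what that means at the data level (the constants are the
ones defined in the library's private header later in this series;
"mb" is a hypothetical mbuf pointer), each 64-bit slot in the cache
line carries a pointer plus the handshake flags:

	/* pack: shift the pointer up to leave room for the 4 flag bits */
	int64_t slot = (((int64_t)(uintptr_t)mb) << RTE_DISTRIB_FLAG_BITS)
			| RTE_DISTRIB_VALID_BUF;

	/* unpack: arithmetic right shift restores the original pointer */
	struct rte_mbuf *back =
		(struct rte_mbuf *)(uintptr_t)(slot >> RTE_DISTRIB_FLAG_BITS);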

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The flow match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate one at run time,
depending on the presence of the SSE2 CPU flag. On non-x86 platforms,
the scalar match function is selected, which should still give a good
boost in performance over the non-burst API.
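
As a rough sketch of that selection (struct rte_distributor_v1705 and
the RTE_DIST_MATCH_* values are from this patch set; the helper name
dist_select_match_fn and the exact placement inside
rte_distributor_create() are assumptions for illustration):

#include <rte_cpuflags.h>
#include "rte_distributor_private.h"

/* hypothetical helper: run once at create time */
static void
dist_select_match_fn(struct rte_distributor_v1705 *d)
{
#ifdef RTE_ARCH_X86
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2)) {
		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
		return;
	}
#endif
	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
}

rte_distributor_process() then dispatches on d->dist_match_fn, calling
either the SSE2 matcher or find_match_scalar().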

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical ordering of the
     changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split the updates to the example app down into smaller patches
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised the patch set so there's a more natural progression in
     the changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed an intermittent segfault when the number of packets was not
     divisible by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some unneeded code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 packets are given to a worker
   at a time (see the worker loop sketch after these notes)
   For performance in matching, flow IDs are limited to 15 bits
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
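
A minimal sketch of a burst-mode worker loop under the new API. The
worker_fn name, the fixed worker id and the infinite loop are
illustrative only; the rte_distributor_get_pkt() signature is the
burst one introduced by this series.

#include <rte_mbuf.h>
#include <rte_distributor.h>

static int
worker_fn(void *arg)
{
	struct rte_distributor *d = arg;   /* worker id would normally come via arg too */
	const unsigned int id = 0;         /* this worker's id, 0 .. num_workers-1 */
	struct rte_mbuf *bufs[8];          /* RTE_DIST_BURST_SIZE is 8 */
	int i, num = 0;

	for (;;) {
		/* hand back the previous burst and receive up to 8 new mbufs */
		num = rte_distributor_get_pkt(d, id, bufs, bufs, num);
		for (i = 0; i < num; i++) {
			/* ... process bufs[i]; flows with the same tag stay
			 * pinned to this worker ... */
		}
	}
	return 0;
}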

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new burst oriented distributor structs
[04/18] lib: add new distributor code
[05/18] lib: add SIMD flow matching to distributor
[06/18] test/distributor: extra params for autotests
[07/18] lib: switch distributor over to new API
[08/18] lib: make v20 header file private
[09/18] lib: add symbol versioning to distributor
[10/18] test: test single and burst distributor API
[11/18] test: add perf test for distributor burst mode
[12/18] examples/distributor: allow for extra stats
[13/18] sample: distributor: wait for ports to come up
[14/18] examples/distributor: give distributor a core
[15/18] examples/distributor: limit number of Tx rings
[16/18] examples/distributor: give Rx thread a core
[17/18] doc: distributor library changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v9 01/18] lib: rename legacy distributor lib files
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
  2017-03-06  9:10                             ` [PATCH v9 02/18] lib: create private header file David Hunt
                                               ` (17 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move the files out of the way so that we can replace them with new
versions of the distributor library. The files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 02/18] lib: create private header file
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
  2017-03-06  9:10                             ` [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 03/18] lib: add new burst oriented distributor structs David Hunt
                                               ` (16 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

We'll be adding internal implementation definitions in here
that are common to both burst and legacy APIs.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/rte_distributor_private.h | 136 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c     |  72 +-----------
 2 files changed, 137 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_private.h

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
new file mode 100644
index 0000000..6d72f1c
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -0,0 +1,136 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, becaus it is limited by how we track
+ * in-flight tags. See in_flight_bitmask and rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index b890947..be297ec 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -41,77 +41,7 @@
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
+#include "rte_distributor_private.h"
 
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 03/18] lib: add new burst oriented distributor structs
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
  2017-03-06  9:10                             ` [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-06  9:10                             ` [PATCH v9 02/18] lib: create private header file David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 04/18] lib: add new distributor code David Hunt
                                               ` (15 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/rte_distributor_private.h | 56 ++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 6d72f1c..d3a470e 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -129,6 +129,62 @@ struct rte_distributor {
 	struct rte_distributor_returned_pkts returns;
 };
 
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
+	RTE_DIST_NUM_MATCH_FNS
+};
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_v1705 {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+struct rte_distributor_v1705 {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+	unsigned int alg_type;                /**< Number of alg types */
+
+	/**>
+	 * First cache line in the this array are the tags inflight
+	 * on the worker core. Second cache line are the backlog
+	 * that are going to go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+
+	struct rte_distributor *d_v20;
+};
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 04/18] lib: add new distributor code
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (2 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 03/18] lib: add new burst oriented distributor structs David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:03                               ` Bruce Richardson
  2017-03-06  9:10                             ` [PATCH v9 05/18] lib: add SIMD flow matching to distributor David Hunt
                                               ` (14 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This patch includes the public header file which will be used once
we add in the symbol versioning for the v20 and v1705 APIs.

Also includes the v1705 header file, and code for the new
burst-capable distributor library. This will be renamed to
rte_distributor.h later in the patch set.

The new distributor code contains a very similar API to the legacy code,
but now sends bursts of up to 8 mbufs to each worker. Flow IDs are
reduced to 15 bits for an optimal flow matching algorithm.
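
For illustration only, feeding the distributor from the rx core then
looks roughly like this ("port", queue 0 and using the low 15 bits of
the RSS hash as the flow tag are assumptions, not part of this patch):

#include <rte_ethdev.h>
#include <rte_mbuf.h>
#include <rte_distributor.h>

static void
distribute_rx_burst(struct rte_distributor *d, uint8_t port)
{
	struct rte_mbuf *pkts[64];
	uint16_t i, nb = rte_eth_rx_burst(port, 0, pkts, 64);

	for (i = 0; i < nb; i++)
		/* the tag must be set before process(); only 15 bits
		 * are used for flow matching */
		pkts[i]->hash.usr = pkts[i]->hash.rss & 0x7fff;

	rte_distributor_process(d, pkts, nb);
}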

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                  |   1 +
 lib/librte_distributor/rte_distributor.c         | 628 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h |   7 +-
 lib/librte_distributor/rte_distributor_v1705.h   | 269 ++++++++++
 4 files changed, 904 insertions(+), 1 deletion(-)
 create mode 100644 lib/librte_distributor/rte_distributor.c
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index b314ca6..74256ff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
new file mode 100644
index 0000000..0d5e833
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor.c
@@ -0,0 +1,628 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include <rte_compat.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor_v1705.h"
+#include "rte_distributor_v20.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		rte_distributor_request_pkt(d->d_v20,
+			worker_id, oldpkt[0]);
+		return;
+	}
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF  to signal to distributor that cache
+	 * line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		return (pkts[0]) ? 1 : 0;
+	}
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return -1;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * so now we've got the contents of the cacheline into an  array of
+	 * mbuf pointers, so toggle the bit so scheduler can start working
+	 * on the next cacheline while we're working.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	int count;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (return_count <= 1) {
+			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+			return (pkts[0]) ? 1 : 0;
+		} else
+			return -EINVAL;
+	}
+
+	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	while (count == -1) {
+		uint64_t t = rte_rdtsc() + 100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (num == 1)
+			return rte_distributor_return_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+		else
+			return -EINVAL;
+	}
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF but even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+/*
+ * Match then flow_ids (tags) of the incoming packets to the flow_ids
+ * of the inflight packets (both inflight on the workers and in each worker
+ * backlog). This will then allow us to pin those packets to the relevant
+ * workers to give us our atomic flow pinning.
+ */
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID on which the
+	 * corresponding flow is pinned to.
+	 */
+}
+
+
+/*
+ * When the handshake bits indicate that there are packets coming
+ * back from the worker, this function is called to copy and store
+ * the valid returned pointers (store_return).
+ */
+static unsigned int
+handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+/*
+ * This function releases a burst (cache line) to a worker.
+ * It is called from the process function when a cacheline is
+ * full to make room for more packets for that worker, or when
+ * all packets have been assigned to bursts and need to be flushed
+ * to the workers.
+ * It also needs to wait for any outstanding packets from the worker
+ * before sending out new packets.
+ */
+static unsigned int
+release(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, j, w, wid;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+	}
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		unsigned int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (i = 0; i < pkts; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+		for (; i < RTE_DIST_BURST_SIZE; i++)
+			flows[i] = 0;
+
+		find_match_scalar(d, &flows[0], &matches[0]);
+
+		/*
+		 * Matches array now contain the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * User is advocated to set tag vaue for each
+			 * mbuf before calling rte_distributor_process.
+			 * User defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to the current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_returned_pkts(d->d_v20,
+				mbufs, max_mbufs);
+	}
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_flush(d->d_v20);
+	}
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_v1705(d, NULL, 0);
+
+	/*
+	 * Send empty burst to all workers to allow them to exit
+	 * gracefully, should they need to.
+	 */
+	rte_distributor_process_v1705(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API and skip the burst-mode buffers below */
+		rte_distributor_clear_returns(d->d_v20);
+		return;
+	}
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type)
+{
+	struct rte_distributor_v1705 *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* TODO Reorganise function properly around RTE_DIST_ALG_SINGLE/BURST */
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (alg_type == RTE_DIST_ALG_SINGLE) {
+		d = malloc(sizeof(struct rte_distributor_v1705));
+		if (d == NULL) {
+			rte_errno = ENOMEM;
+			return NULL;
+		}
+		d->d_v20 = rte_distributor_create(name,
+				socket_id, num_workers);
+		if (d->d_v20 == NULL) {
+			free(d);
+			/* rte_errno will have been set */
+			return NULL;
+		}
+		d->alg_type = alg_type;
+		return d;
+	}
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+	d->alg_type = alg_type;
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index d3a470e..f0042a8 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -159,7 +159,7 @@ struct rte_distributor_buffer_v1705 {
 };
 
 struct rte_distributor_v1705 {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -185,6 +185,11 @@ struct rte_distributor_v1705 {
 	struct rte_distributor *d_v20;
 };
 
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..0034020
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,269 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor_v1705;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and works on up to 8 packets at a time per worker.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set a tag for each mbuf before calling this function.
+ * If the user does not set a tag, its value may vary depending on the driver
+ * implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_v1705(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_v1705() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt_v1705()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
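
As a usage reference for the API added above: the distributor lcore feeds
bursts through rte_distributor_process_v1705() and drains completed packets
with rte_distributor_returned_pkts_v1705(), while each worker lcore loops on
rte_distributor_get_pkt_v1705(). The sketch below is illustrative
application-side code only, not part of the patch; quit_signal, the worker id
and the rx/tx plumbing are assumed to be provided by the application.

    #include <rte_mbuf.h>
    #include "rte_distributor_v1705.h"

    #define APP_BURST 8                  /* matches RTE_DIST_BURST_SIZE */

    static volatile int quit_signal;     /* set elsewhere to stop workers */

    /* worker lcore: return the previous burst, receive up to 8 new mbufs */
    static int
    worker_loop(struct rte_distributor_v1705 *d, unsigned int worker_id)
    {
        struct rte_mbuf *pkts[APP_BURST];
        int i, n = 0;

        while (!quit_signal) {
            n = rte_distributor_get_pkt_v1705(d, worker_id, pkts, pkts, n);
            for (i = 0; i < n; i++) {
                /* ... per-packet work on pkts[i] ... */
            }
        }
        /* hand back anything still held before exiting */
        return rte_distributor_return_pkt_v1705(d, worker_id, pkts, n);
    }

    /* distributor lcore: push one rx burst in, pull completed packets out */
    static int
    distribute_burst(struct rte_distributor_v1705 *d,
            struct rte_mbuf **rx_pkts, unsigned int nb_rx,
            struct rte_mbuf **done, unsigned int done_sz)
    {
        rte_distributor_process_v1705(d, rx_pkts, nb_rx);
        return rte_distributor_returned_pkts_v1705(d, done, done_sz);
    }
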

* [PATCH v9 05/18] lib: add SIMD flow matching to distributor
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (3 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 04/18] lib: add new distributor code David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 06/18] test/distributor: extra params for autotests David Hunt
                                               ` (13 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Add an optimised version of the in-flight flow matching algorithm
using SIMD instructions. This should give up to 1.5x the performance
of the scalar version.

Falls back to the scalar version if SSE4.2 is not available.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
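As a reference for what both matchers compute (this is a rough sketch, not the
library's actual find_match_scalar(); it just uses the private structures from
earlier in this series): for each of the 8 incoming flow IDs, emit the
(worker_id + 1) of a worker that already has that flow in-flight or
backlogged, or 0 if no worker does. The SSE4.2 version below computes the same
result eight flows at a time with _mm_cmpestrm().

    #include <stdint.h>
    #include "rte_distributor_private.h"

    static void
    match_sketch(struct rte_distributor_v1705 *d,
            const uint16_t *flows, uint16_t *matches)
    {
        unsigned int w, i, j;

        for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
            matches[j] = 0;

        for (w = 0; w < d->num_workers; w++) {
            const struct rte_distributor_backlog *bl = &d->backlog[w];

            for (j = 0; j < RTE_DIST_BURST_SIZE; j++) {
                if (matches[j] != 0 || flows[j] == 0)
                    continue;
                /* check both the in-flight and the backlog tags */
                for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
                    if (d->in_flight_tags[w][i] == flows[j] ||
                            bl->tags[i] == flows[j]) {
                        matches[j] = w + 1;
                        break;
                    }
                }
            }
        }
    }
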
 lib/librte_distributor/Makefile                    |  10 ++
 lib/librte_distributor/rte_distributor.c           |  16 ++-
 .../rte_distributor_match_generic.c                |  43 ++++++++
 lib/librte_distributor/rte_distributor_match_sse.c | 114 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h   |   5 +
 5 files changed, 186 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 74256ff..a812fe4 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,16 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+# distributor SIMD algo needs SSE4.2 support
+ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_2,$(CFLAGS)),)
+CFLAGS_rte_distributor_match_sse.o += -msse4.2
+endif
+else
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 0d5e833..51c9ad9 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -391,7 +391,13 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 		for (; i < RTE_DIST_BURST_SIZE; i++)
 			flows[i] = 0;
 
-		find_match_scalar(d, &flows[0], &matches[0]);
+		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
 
 		/*
 		 * Matches array now contain the intended worker ID (+1) of
@@ -607,7 +613,13 @@ rte_distributor_create_v1705(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 	d->alg_type = alg_type;
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backlog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
new file mode 100644
index 0000000..4925a78
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+
+void
+find_match_vec(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	find_match_scalar(d, data_ptr, output_ptr);
+}
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..b9f9bb0
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,114 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+#include "smmintrin.h"
+#include "nmmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker IDs
+	 *  1a. Load the current inflights for that worker into an xmm reg
+	 *  1b. Load the current backlog for that worker into an xmm reg
+	 *  1c. Use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  1d. Add any matches to the output
+	 * 2. Write the output xmm (matching worker IDs).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the 128-bit output contains 8 16-bit values, with
+	 * each non-zero value holding the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index f0042a8..92052b1 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -190,6 +190,11 @@ find_match_scalar(struct rte_distributor_v1705 *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 06/18] test/distributor: extra params for autotests
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (4 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 05/18] lib: add SIMD flow matching to distributor David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 07/18] lib: switch distributor over to new API David Hunt
                                               ` (12 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

In the next few patches we'll want to test both the old and the new API,
so here we allow different parameters to be passed to the tests,
instead of just a distributor struct.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
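The point of the wrapper struct is that later patches can run the same test
suite against different distributors (and different APIs) just by changing
what worker_params carries. A hypothetical sketch of that pattern (d_single
and d_burst are illustrative names; the burst distributor only exists after
the switch-over patch):

    static int
    run_suite(struct rte_distributor *dist, const char *label,
            struct rte_mempool *p)
    {
        struct worker_params wp;

        wp.dist = dist;
        snprintf(wp.name, sizeof(wp.name), "%s", label);

        rte_eal_mp_remote_launch(handle_work, &wp, SKIP_MASTER);
        if (sanity_test(&wp, p) < 0)
            return -1;
        quit_workers(&wp, p);
        return 0;
    }

    /* run_suite(d_single, "single", p); run_suite(d_burst, "burst", p); */
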
 test/test/test_distributor.c | 64 +++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 85cb8f3..6059a0c 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -45,6 +45,13 @@
 #define BURST 32
 #define BIG_BATCH 1024
 
+struct worker_params {
+	char name[64];
+	struct rte_distributor *dist;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,7 +88,9 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
+
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -107,8 +116,9 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -249,7 +259,8 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -270,8 +281,9 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
@@ -305,7 +317,8 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -344,9 +357,10 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -401,9 +415,10 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -480,8 +495,9 @@ int test_error_distributor_create_numworkers(void)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -536,28 +552,34 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
+	worker_params.dist = d;
+	sprintf(worker_params.name, "single");
+
+	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
+	if (sanity_test(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
+	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
+				SKIP_MASTER);
+	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
 	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
+		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
 	} else {
 		printf("Not enough cores to run tests for worker shutdown\n");
@@ -572,7 +594,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 07/18] lib: switch distributor over to new API
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (5 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 06/18] test/distributor: extra params for autotests David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 08/18] lib: make v20 header file private David Hunt
                                               ` (11 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This is the main switch-over between the legacy API and the new burst
API. We rename all the functions in rte_distributor.c to drop the _v1705
suffix, and add a _v20 suffix to the functions in rte_distributor_v20.c.

At the same time, the autotests and sample app need to keep compiling,
so those changes are included here as well.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
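After this patch, applications select the behaviour purely through the
alg_type argument of rte_distributor_create(); for instance (illustrative
names and worker counts, not part of the patch):

    /* legacy single-packet behaviour, kept for backward compatibility */
    struct rte_distributor *d_single = rte_distributor_create("PKT_DIST_S",
            rte_socket_id(), rte_lcore_count() - 2, RTE_DIST_ALG_SINGLE);

    /* new burst behaviour, as used by the sample app change below */
    struct rte_distributor *d_burst = rte_distributor_create("PKT_DIST_B",
            rte_socket_id(), rte_lcore_count() - 2, RTE_DIST_ALG_BURST);
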
 examples/distributor/main.c                        |  22 +-
 lib/librte_distributor/rte_distributor.c           |  76 +++---
 lib/librte_distributor/rte_distributor.h           | 240 +++++++++++++++++-
 lib/librte_distributor/rte_distributor_match_sse.c |   2 +-
 lib/librte_distributor/rte_distributor_private.h   |  22 +-
 lib/librte_distributor/rte_distributor_v1705.h     | 269 ---------------------
 lib/librte_distributor/rte_distributor_v20.c       |  46 ++--
 lib/librte_distributor/rte_distributor_v20.h       |  24 +-
 test/test/test_distributor.c                       | 235 ++++++++++++------
 test/test/test_distributor_perf.c                  |  26 +-
 10 files changed, 511 insertions(+), 451 deletions(-)
 delete mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e7641d2..cc3bdb0 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -405,17 +405,30 @@ lcore_worker(struct lcore_params *p)
 {
 	struct rte_distributor *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = rte_rdtsc()+100;
+
+			while (rte_rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
 	}
 	return 0;
 }
@@ -561,7 +574,8 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 2,
+			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 51c9ad9..6e1debf 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -42,10 +42,10 @@
 #include <rte_eal_memconfig.h>
 #include <rte_compat.h>
 #include "rte_distributor_private.h"
-#include "rte_distributor_v1705.h"
+#include "rte_distributor.h"
 #include "rte_distributor_v20.h"
 
-TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
 static struct rte_tailq_elem rte_dist_burst_tailq = {
 	.name = "RTE_DIST_BURST",
@@ -57,17 +57,17 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_request_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
 	unsigned int i;
 
 	volatile int64_t *retptr64;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		rte_distributor_request_pkt(d->d_v20,
+		rte_distributor_request_pkt_v20(d->d_v20,
 			worker_id, oldpkt[0]);
 		return;
 	}
@@ -104,16 +104,16 @@ rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_poll_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	uint64_t ret;
 	int count = 0;
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		pkts[0] = rte_distributor_poll_pkt_v20(d->d_v20, worker_id);
 		return (pkts[0]) ? 1 : 0;
 	}
 
@@ -140,7 +140,7 @@ rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_get_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -148,37 +148,37 @@ rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (return_count <= 1) {
-			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+			pkts[0] = rte_distributor_get_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 			return (pkts[0]) ? 1 : 0;
 		} else
 			return -EINVAL;
 	}
 
-	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+	rte_distributor_request_pkt(d, worker_id, oldpkt, return_count);
 
-	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	while (count == -1) {
 		uint64_t t = rte_rdtsc() + 100;
 
 		while (rte_rdtsc() < t)
 			rte_pause();
 
-		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+		count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	}
 	return count;
 }
 
 int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_return_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (num == 1)
-			return rte_distributor_return_pkt(d->d_v20,
+			return rte_distributor_return_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 		else
 			return -EINVAL;
@@ -202,7 +202,7 @@ rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
 		unsigned int *ret_start, unsigned int *ret_count)
 {
 	if (!oldbuf)
@@ -221,7 +221,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
  * workers to give us our atomic flow pinning.
  */
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
@@ -270,9 +270,9 @@ find_match_scalar(struct rte_distributor_v1705 *d,
  * the valid returned pointers (store_return).
  */
 static unsigned int
-handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+handle_returns(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	uintptr_t oldbuf;
 	unsigned int ret_start = d->returns.start,
 			ret_count = d->returns.count;
@@ -308,9 +308,9 @@ handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
  * before sending out new packets.
  */
 static unsigned int
-release(struct rte_distributor_v1705 *d, unsigned int wkr)
+release(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	unsigned int i;
 
 	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
@@ -342,7 +342,7 @@ release(struct rte_distributor_v1705 *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_process(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -355,7 +355,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+		return rte_distributor_process_v20(d->d_v20, mbufs, num_mbufs);
 	}
 
 	if (unlikely(num_mbufs == 0)) {
@@ -479,7 +479,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_returned_pkts(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -489,7 +489,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_returned_pkts(d->d_v20,
+		return rte_distributor_returned_pkts_v20(d->d_v20,
 				mbufs, max_mbufs);
 	}
 
@@ -510,7 +510,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
  * being worked on or queued up in a backlog.
  */
 static inline unsigned int
-total_outstanding(const struct rte_distributor_v1705 *d)
+total_outstanding(const struct rte_distributor *d)
 {
 	unsigned int wkr, total_outstanding = 0;
 
@@ -525,24 +525,24 @@ total_outstanding(const struct rte_distributor_v1705 *d)
  * queued up.
  */
 int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_flush(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_flush(d->d_v20);
+		return rte_distributor_flush_v20(d->d_v20);
 	}
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process_v1705(d, NULL, 0);
+		rte_distributor_process(d, NULL, 0);
 
 	/*
 	 * Send empty burst to all workers to allow them to exit
 	 * gracefully, should they need to.
 	 */
-	rte_distributor_process_v1705(d, NULL, 0);
+	rte_distributor_process(d, NULL, 0);
 
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		handle_returns(d, wkr);
@@ -552,13 +552,13 @@ rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_clear_returns(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		rte_distributor_clear_returns(d->d_v20);
+		rte_distributor_clear_returns_v20(d->d_v20);
 	}
 
 	/* throw away returns, so workers can exit */
@@ -567,13 +567,13 @@ rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name,
+struct rte_distributor *
+rte_distributor_create(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
 {
-	struct rte_distributor_v1705 *d;
+	struct rte_distributor *d;
 	struct rte_dist_burst_list *dist_burst_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
@@ -586,8 +586,8 @@ rte_distributor_create_v1705(const char *name,
 	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
 
 	if (alg_type == RTE_DIST_ALG_SINGLE) {
-		d = malloc(sizeof(struct rte_distributor_v1705));
-		d->d_v20 = rte_distributor_create(name,
+		d = malloc(sizeof(struct rte_distributor));
+		d->d_v20 = rte_distributor_create_v20(name,
 				socket_id, num_workers);
 		if (d->d_v20 == NULL) {
 			/* rte_errno will have been set */
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index e41d522..9b9efdb 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,9 +30,240 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
 
-#include <rte_distributor_v20.h>
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and works on up to 8 packets at a time per worker.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set a tag for each mbuf before calling this function.
+ * If the user does not set a tag, its value may vary depending on the driver
+ * implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
index b9f9bb0..44935a6 100644
--- a/lib/librte_distributor/rte_distributor_match_sse.c
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -38,7 +38,7 @@
 
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 92052b1..fb5a43a 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -83,7 +83,7 @@ extern "C" {
  * the next cache line to worker 0, we pad this out to three cache lines.
  * Only 64-bits of the memory is actually used though.
  */
-union rte_distributor_buffer {
+union rte_distributor_buffer_v20 {
 	volatile int64_t bufptr64;
 	char pad[RTE_CACHE_LINE_SIZE*3];
 } __rte_cache_aligned;
@@ -108,8 +108,8 @@ struct rte_distributor_returned_pkts {
 	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
 };
 
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+struct rte_distributor_v20 {
+	TAILQ_ENTRY(rte_distributor_v20) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -124,7 +124,7 @@ struct rte_distributor {
 
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
 
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+	union rte_distributor_buffer_v20 bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 };
@@ -144,7 +144,7 @@ enum rte_distributor_match_function {
  * We can pass up to 8 mbufs at a time in one cacheline.
  * There is a separate cacheline for returns in the burst API.
  */
-struct rte_distributor_buffer_v1705 {
+struct rte_distributor_buffer {
 	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= outgoing to worker */
 
@@ -158,8 +158,8 @@ struct rte_distributor_buffer_v1705 {
 	int count __rte_cache_aligned;       /* <= number of current mbufs */
 };
 
-struct rte_distributor_v1705 {
-	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -176,22 +176,22 @@ struct rte_distributor_v1705 {
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
 			__rte_cache_aligned;
 
-	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 
 	enum rte_distributor_match_function dist_match_fn;
 
-	struct rte_distributor *d_v20;
+	struct rte_distributor_v20 *d_v20;
 };
 
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
deleted file mode 100644
index 0034020..0000000
--- a/lib/librte_distributor/rte_distributor_v1705.h
+++ /dev/null
@@ -1,269 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2017 Intel Corporation. All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _RTE_DISTRIBUTOR_H_
-#define _RTE_DISTRIBUTOR_H_
-
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* Type of distribution (burst/single) */
-enum rte_distributor_alg_type {
-	RTE_DIST_ALG_BURST = 0,
-	RTE_DIST_ALG_SINGLE,
-	RTE_DIST_NUM_ALG_TYPES
-};
-
-struct rte_distributor_v1705;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @param alg_type
- *   Call the legacy API, or use the new burst API. legacy uses 32-bit
- *   flow ID, and works on a single packet at a time. Latest uses 15-
- *   bit flow ID and works on up to 8 packets at a time to worers.
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name, unsigned int socket_id,
-		unsigned int num_workers,
-		unsigned int alg_type);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be processed on different cores at
- * the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get new packets to process. Any previous packets
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param pkts
- *   The mbufs pointer array to be filled in (up to 8 packets)
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- * @param retcount
- *   The number of packets being returned
- *
- * @return
- *   The number of packets in the pkts array
- */
-int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **pkts,
-	struct rte_mbuf **oldpkt, unsigned int retcount);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packets being processed by the worker
- * @param num
- *   The number of packets in the oldpkt array
- */
-int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
- * new packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The returning packets, if any, processed by the worker
- * @param count
- *   The number of returning packets
- */
-void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **oldpkt,
-		unsigned int count);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbufs
- *   The array of mbufs being given to the worker
- *
- * @return
- *   The number of packets being given to the worker thread, zero if no
- *   packet is yet available.
- */
-int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **mbufs);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif
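
The distributor-lcore half of the API documented in the header removed above
(process / returned_pkts / flush / clear_returns) keeps the same semantics
under the un-suffixed public names. A minimal sketch of that loop follows; it
is not part of the patch, and get_rx_burst() and "run" are hypothetical
application helpers standing in for however packets reach the distributor
core.

#include <rte_mbuf.h>
#include <rte_distributor.h>

#define RX_BURST 64

extern volatile int run;	/* hypothetical stop flag */
unsigned int get_rx_burst(struct rte_mbuf **bufs, unsigned int n);	/* hypothetical */

static void
distributor_loop(struct rte_distributor *d)
{
	struct rte_mbuf *bufs[RX_BURST], *done[RX_BURST];
	unsigned int nb_rx, nb_done;

	while (run) {
		nb_rx = get_rx_burst(bufs, RX_BURST);
		/* hand out by flow tag; one tag never runs on two workers */
		rte_distributor_process(d, bufs, nb_rx);
		/* harvest completed packets; a real app would typically TX them */
		nb_done = rte_distributor_returned_pkts(d, done, RX_BURST);
		while (nb_done-- > 0)
			rte_pktmbuf_free(done[nb_done]);
	}
	rte_distributor_flush(d);	/* drain in-flight and backlogged packets */
	rte_distributor_clear_returns(d);
}
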
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index be297ec..1f406c5 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -43,7 +43,7 @@
 #include "rte_distributor_v20.h"
 #include "rte_distributor_private.h"
 
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
+TAILQ_HEAD(rte_distributor_list, rte_distributor_v20);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
 	.name = "RTE_DISTRIBUTOR",
@@ -53,10 +53,10 @@ EAL_REGISTER_TAILQ(rte_distributor_tailq)
 /**** APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_GET_BUF;
 	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
@@ -65,10 +65,10 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
 		return NULL;
 
@@ -78,21 +78,21 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
 	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+	rte_distributor_request_pkt_v20(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt_v20(d, worker_id)) == NULL)
 		rte_pause();
 	return ret;
 }
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_RETURN_BUF;
 	buf->bufptr64 = req;
@@ -123,7 +123,7 @@ backlog_pop(struct rte_distributor_backlog *bl)
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
+store_return(uintptr_t oldbuf, struct rte_distributor_v20 *d,
 		unsigned *ret_start, unsigned *ret_count)
 {
 	/* store returns in a circular buffer - code is branch-free */
@@ -134,7 +134,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor *d,
 }
 
 static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+handle_worker_shutdown(struct rte_distributor_v20 *d, unsigned int wkr)
 {
 	d->in_flight_tags[wkr] = 0;
 	d->in_flight_bitmask &= ~(1UL << wkr);
@@ -164,7 +164,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 		 * Note that the tags were set before first level call
 		 * to rte_distributor_process.
 		 */
-		rte_distributor_process(d, pkts, i);
+		rte_distributor_process_v20(d, pkts, i);
 		bl->count = bl->start = 0;
 	}
 }
@@ -174,7 +174,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
  * to do a partial flush.
  */
 static int
-process_returns(struct rte_distributor *d)
+process_returns(struct rte_distributor_v20 *d)
 {
 	unsigned wkr;
 	unsigned flushed = 0;
@@ -213,7 +213,7 @@ process_returns(struct rte_distributor *d)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned num_mbufs)
 {
 	unsigned next_idx = 0;
@@ -317,7 +317,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -338,7 +338,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
 static inline unsigned
-total_outstanding(const struct rte_distributor *d)
+total_outstanding(const struct rte_distributor_v20 *d)
 {
 	unsigned wkr, total_outstanding;
 
@@ -353,19 +353,19 @@ total_outstanding(const struct rte_distributor *d)
 /* flush the distributor, so that there are no outstanding packets in flight or
  * queued up. */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 {
 	const unsigned flushed = total_outstanding(d);
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
+		rte_distributor_process_v20(d, NULL, 0);
 
 	return flushed;
 }
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 {
 	d->returns.start = d->returns.count = 0;
 #ifndef __OPTIMIZE__
@@ -374,12 +374,12 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name,
 		unsigned socket_id,
 		unsigned num_workers)
 {
-	struct rte_distributor *d;
+	struct rte_distributor_v20 *d;
 	struct rte_distributor_list *distributor_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
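
The legacy handshake above keeps the whole worker/distributor exchange in the
single 64-bit bufptr64 word per worker: the mbuf pointer lives in the upper
bits and the request/return flags in the low bits. A condensed sketch of the
encoding (not part of the patch; the constants are those from
rte_distributor_private.h used in the code above):

/* worker -> distributor: return oldpkt (may be NULL) and ask for work */
static inline int64_t
v20_request_word(struct rte_mbuf *oldpkt)
{
	return (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
			| RTE_DISTRIB_GET_BUF;
}

/* distributor -> worker: GET_BUF cleared and the new mbuf in the upper
 * bits; NULL means the request has not been answered yet.
 */
static inline struct rte_mbuf *
v20_poll_word(volatile int64_t *bufptr64)
{
	if (*bufptr64 & RTE_DISTRIB_GET_BUF)
		return NULL;
	return (struct rte_mbuf *)(uintptr_t)
			(*bufptr64 >> RTE_DISTRIB_FLAG_BITS);
}
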
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
index b69aa27..f02e6aa 100644
--- a/lib/librte_distributor/rte_distributor_v20.h
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -48,7 +48,7 @@ extern "C" {
 
 #define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
 
-struct rte_distributor;
+struct rte_distributor_v20;
 struct rte_mbuf;
 
 /**
@@ -67,8 +67,8 @@ struct rte_mbuf;
  * @return
  *   The newly created distributor instance
  */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned int socket_id,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name, unsigned int socket_id,
 		unsigned int num_workers);
 
 /*  *** APIS to be called on the distributor lcore ***  */
@@ -103,7 +103,7 @@ rte_distributor_create(const char *name, unsigned int socket_id,
  *   The number of mbufs processed.
  */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs);
 
 /**
@@ -121,7 +121,7 @@ rte_distributor_process(struct rte_distributor *d,
  *   The number of mbufs returned in the mbufs array.
  */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs);
 
 /**
@@ -136,7 +136,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
  *   The number of queued/in-flight packets that were completed by this call.
  */
 int
-rte_distributor_flush(struct rte_distributor *d);
+rte_distributor_flush_v20(struct rte_distributor_v20 *d);
 
 /**
  * Clears the array of returned packets used as the source for the
@@ -148,7 +148,7 @@ rte_distributor_flush(struct rte_distributor *d);
  *   The distributor instance to be used
  */
 void
-rte_distributor_clear_returns(struct rte_distributor *d);
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d);
 
 /*  *** APIS to be called on the worker lcores ***  */
 /*
@@ -177,7 +177,7 @@ rte_distributor_clear_returns(struct rte_distributor *d);
  *   A new packet to be processed by the worker thread.
  */
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -193,8 +193,8 @@ rte_distributor_get_pkt(struct rte_distributor *d,
  *   The previous packet being processed by the worker
  */
 int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
-		struct rte_mbuf *mbuf);
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
+		unsigned int worker_id, struct rte_mbuf *mbuf);
 
 /**
  * API called by a worker to request a new packet to process.
@@ -217,7 +217,7 @@ rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
  *   The previous packet, if any, being processed by the worker
  */
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -237,7 +237,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
  *   packet is yet available.
  */
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id);
 
 #ifdef __cplusplus
diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 6059a0c..7a30513 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -87,20 +87,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
-	struct rte_distributor *d = wp->dist;
-
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	struct rte_distributor *db = wp->dist;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(db, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(db, id,
+				buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(db, id, buf, num);
 	return 0;
 }
 
@@ -118,9 +123,11 @@ handle_work(void *arg)
 static int
 sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
-	struct rte_distributor *d = wp->dist;
+	struct rte_distributor *db = wp->dist;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i, count;
+	unsigned int retries;
 
 	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
@@ -134,8 +141,15 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -147,8 +161,6 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		printf("Worker %u handled %u packets\n", i,
 				worker_stats[i].handled_packets);
 	printf("Sanity test with all zero hashes done.\n");
-	if (worker_stats[0].handled_packets != BURST)
-		return -1;
 
 	/* pick two flows and check they go correctly */
 	if (rte_lcore_count() >= 3) {
@@ -156,8 +168,13 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		rte_distributor_process(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush(db);
+			count += rte_distributor_returned_pkts(db,
+					returns, BURST*2);
+		} while (count < BURST);
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -169,20 +186,21 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
-
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+		bufs[i]->hash.usr = i+1;
+
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -204,8 +222,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	rte_distributor_flush(db);
+	rte_distributor_clear_returns(db);
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -213,28 +232,44 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
+	printf("=== testing big burst (%s) ===\n", wp->name);
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
-		num_returned += rte_distributor_returned_pkts(d,
+		rte_distributor_process(db,
+				&many_bufs[i*BURST], BURST);
+		count = rte_distributor_returned_pkts(db,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	rte_distributor_flush(db);
+	count = rte_distributor_returned_pkts(db,
+		&return_bufs[num_returned],
+			BIG_BATCH - num_returned);
+	num_returned += count;
+	retries = 0;
+	do {
+		rte_distributor_flush(db);
+		count = rte_distributor_returned_pkts(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d, got %d\n",
+				__LINE__, BIG_BATCH, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -258,20 +293,28 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 static int
 handle_work_with_free_mbufs(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -287,7 +330,8 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	printf("=== Sanity test with mbuf alloc/free (%s) ===\n", wp->name);
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
@@ -302,6 +346,9 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	}
 
 	rte_distributor_flush(d);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -317,21 +364,32 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+		total += num;
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	returned = rte_distributor_return_pkt(d, id, buf, num);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -339,13 +397,18 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
+		returned = rte_distributor_return_pkt(d,
+				id, buf, num);
+		printf("Num returned = %d\n", returned);
 	}
 	return 0;
 }
@@ -367,17 +430,22 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_flush(d);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -388,7 +456,7 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
@@ -396,6 +464,12 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 
 	/* flush the distributor */
 	rte_distributor_flush(d);
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -403,10 +477,6 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -422,7 +492,7 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	printf("=== Test flush fn with worker shutdown (%s) ===\n", wp->name);
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -446,7 +516,13 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	/* flush the distributor */
 	rte_distributor_flush(d);
 
+	rte_delay_us(10000);
+
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -454,10 +530,6 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -469,7 +541,9 @@ int test_error_distributor_create_name(void)
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
-			rte_lcore_count() - 1);
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
@@ -483,8 +557,10 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
-			RTE_MAX_LCORE + 10);
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
@@ -530,10 +606,11 @@ test_distributor(void)
 	}
 
 	if (d == NULL) {
-		d = rte_distributor_create("Test_distributor", rte_socket_id(),
-				rte_lcore_count() - 1);
+		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
 		if (d == NULL) {
-			printf("Error creating distributor\n");
+			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
@@ -553,7 +630,7 @@ test_distributor(void)
 	}
 
 	worker_params.dist = d;
-	sprintf(worker_params.name, "single");
+	sprintf(worker_params.name, "burst");
 
 	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
 	if (sanity_test(&worker_params, p) < 0)
diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 7947fe9..1dd326b 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -129,18 +129,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
 	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -228,7 +235,8 @@ test_distributor_perf(void)
 
 	if (d == NULL) {
 		d = rte_distributor_create("Test_perf", rte_socket_id(),
-				rte_lcore_count() - 1);
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_SINGLE);
 		if (d == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 08/18] lib: make v20 header file private
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (6 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 07/18] lib: switch distributor over to new API David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 09/18] lib: add symbol versioning to distributor David Hunt
                                               ` (10 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index a812fe4..2b28eff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -57,7 +57,6 @@ endif
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (7 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 08/18] lib: make v20 header file private David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:22                               ` Bruce Richardson
  2017-03-06  9:10                             ` [PATCH v9 10/18] test: test single and burst distributor API David Hunt
                                               ` (9 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..c4128a0 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread
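
The versioning pattern applied above is the same for every function; with a
hypothetical foo() (not part of the patch) it looks like this, using the
VERSION_SYMBOL / BIND_DEFAULT_SYMBOL / MAP_STATIC_SYMBOL macros from
rte_compat.h exactly as this patch uses them:

#include <rte_compat.h>

int foo_v20(int x) { return x; }		/* old behaviour */
int foo_v1705(int x) { return x + 1; }		/* new behaviour */

VERSION_SYMBOL(foo, _v20, 2.0);			/* foo@DPDK_2.0 -> foo_v20 */
BIND_DEFAULT_SYMBOL(foo, _v1705, 17.05);	/* foo@@DPDK_17.05 -> foo_v1705 (default) */
MAP_STATIC_SYMBOL(int foo(int x), foo_v1705);	/* static builds call foo_v1705 */

/* "foo" must also be listed under both the DPDK_2.0 and DPDK_17.05 nodes
 * of the library's version map, as done for the distributor symbols above.
 */
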

* [PATCH v9 10/18] test: test single and burst distributor API
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (8 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 09/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 11/18] test: add perf test for distributor burst mode David Hunt
                                               ` (8 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 test/test/test_distributor.c | 116 ++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 34 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 7a30513..890a852 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -538,17 +538,25 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
 			rte_lcore_count() - 1,
-			RTE_DIST_ALG_BURST);
-
+			RTE_DIST_ALG_SINGLE);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
 	}
 
+	db = rte_distributor_create(name, rte_socket_id(),
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -556,15 +564,25 @@ int test_error_distributor_create_name(void)
 static
 int test_error_distributor_create_numworkers(void)
 {
-	struct rte_distributor *d = NULL;
+	struct rte_distributor *ds = NULL;
+	struct rte_distributor *db = NULL;
 
-	d = rte_distributor_create("test_numworkers", rte_socket_id(),
+	ds = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10,
-			RTE_DIST_ALG_BURST);
-	if (d != NULL || rte_errno != EINVAL) {
+			RTE_DIST_ALG_SINGLE);
+	if (ds != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -597,25 +615,42 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
 static int
 test_distributor(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
+	static struct rte_distributor *dist[2];
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
 		return -1;
 	}
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+	if (db == NULL) {
+		db = rte_distributor_create("Test_dist_burst", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_BURST);
-		if (d == NULL) {
+		if (db == NULL) {
 			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_flush(db);
+		rte_distributor_clear_returns(db);
+	}
+
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_dist_single",
+				rte_socket_id(),
+				rte_lcore_count() - 1,
+			RTE_DIST_ALG_SINGLE);
+		if (ds == NULL) {
+			printf("Error creating single distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush(ds);
+		rte_distributor_clear_returns(ds);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -629,37 +664,50 @@ test_distributor(void)
 		}
 	}
 
-	worker_params.dist = d;
-	sprintf(worker_params.name, "burst");
+	dist[0] = ds;
+	dist[1] = db;
 
-	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
-	if (sanity_test(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+	for (i = 0; i < 2; i++) {
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
-				SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+		worker_params.dist = dist[i];
+		if (i)
+			sprintf(worker_params.name, "burst");
+		else
+			sprintf(worker_params.name, "single");
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
+
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 11/18] test: add perf test for distributor burst mode
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (9 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 10/18] test: test single and burst distributor API David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-06  9:10                             ` [PATCH v9 12/18] examples/distributor: allow for extra stats David Hunt
                                               ` (7 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 test/test/test_distributor_perf.c | 75 ++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 1dd326b..732d86d 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,8 +41,9 @@
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +55,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +71,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +89,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +101,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +127,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * This is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -151,14 +156,15 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * This basic performance test just repeatedly sends in 32 packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
 static inline int
 perf_test(struct rte_distributor *d, struct rte_mempool *p)
 {
-	unsigned i;
+	unsigned int i;
 	uint64_t start, end;
 	struct rte_mbuf *bufs[BURST];
 
@@ -181,7 +187,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
-	printf("=== Performance test of distributor ===\n");
+	rte_distributor_clear_returns(d);
+
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
 			((end - start) >> ITER_POWER)/BURST);
@@ -201,9 +208,10 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 {
-	const unsigned num_workers = rte_lcore_count() - 1;
-	unsigned i;
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
 	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
 
 	quit = 1;
@@ -222,7 +230,8 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 static int
 test_distributor_perf(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -233,17 +242,28 @@ test_distributor_perf(void)
 	/* first time how long it takes to round-trip a cache line */
 	time_cache_line_switch();
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_perf", rte_socket_id(),
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_perf", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_SINGLE);
-		if (d == NULL) {
+		if (ds == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_clear_returns(ds);
+	}
+
+	if (db == NULL) {
+		db = rte_distributor_create("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns(db);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -257,10 +277,17 @@ test_distributor_perf(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (perf_test(d, p) < 0)
+	printf("=== Performance test of distributor (single mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, ds, SKIP_MASTER);
+	if (perf_test(ds, p) < 0)
+		return -1;
+	quit_workers(ds, p);
+
+	printf("=== Performance test of distributor (burst mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, db, SKIP_MASTER);
+	if (perf_test(db, p) < 0)
 		return -1;
-	quit_workers(d, p);
+	quit_workers(db, p);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 12/18] examples/distributor: allow for extra stats
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (10 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 11/18] test: add perf test for distributor burst mode David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:46                               ` Bruce Richardson
  2017-03-06  9:10                             ` [PATCH v9 13/18] sample: distributor: wait for ports to come up David Hunt
                                               ` (6 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This will allow us to see what's going on at various stages
throughout the sample app, with per-second visibility

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 139 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 123 insertions(+), 16 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cc3bdb0..3657e5d 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -54,24 +54,53 @@
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
@@ -93,6 +122,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -378,25 +409,91 @@ static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++) {
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+			app_stats.worker_bursts[i][j] = 0;
+		}
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
@@ -515,6 +612,7 @@ main(int argc, char *argv[])
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -610,6 +708,15 @@ main(int argc, char *argv[])
 	if (lcore_rx(&p) != 0)
 		return -1;
 
+	freq = rte_get_timer_hz();
+	t = rte_rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < rte_rdtsc()) {
+			print_stats();
+			t = rte_rdtsc() + freq;
+		}
+	}
+
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 13/18] sample: distributor: wait for ports to come up
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (11 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 12/18] examples/distributor: allow for extra stats David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:48                               ` Bruce Richardson
  2017-03-06  9:10                             ` [PATCH v9 14/18] examples/distributor: give distributor a core David Hunt
                                               ` (5 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

On some machines, ports take several seconds to come up. This
patch causes the app to wait.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 3657e5d..aeb75a8 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -62,6 +61,7 @@ static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
 volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
@@ -165,7 +165,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 14/18] examples/distributor: give distributor a core
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (12 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 13/18] sample: distributor: wait for ports to come up David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:49                               ` Bruce Richardson
  2017-03-06  9:10                             ` [PATCH v9 15/18] examples/distributor: limit number of Tx rings David Hunt
                                               ` (4 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 181 ++++++++++++++++++++++++++++++--------------
 1 file changed, 123 insertions(+), 58 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index aeb75a8..e9ebe5e 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -49,6 +49,8 @@
 #define NUM_MBUFS ((64*1024)-1)
 #define MBUF_CACHE_SIZE 250
 #define BURST_SIZE 32
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
 #define RTE_RING_SZ 1024
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
@@ -193,37 +195,14 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 struct lcore_params {
 	unsigned worker_id;
 	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
 static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
-{
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
-
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
-	}
-
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
-
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
-
-	return 0;
-}
-
-static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
@@ -260,9 +239,15 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
@@ -270,7 +255,22 @@ lcore_rx(struct lcore_params *p)
 			continue;
 		}
 
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		 * Swap the following two lines if you want the rx traffic
+		 * to go directly to tx, no distribution.
+		 */
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
+
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
 			RTE_LOG_DP(DEBUG, DISTRAPP,
@@ -281,20 +281,9 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
@@ -331,6 +320,58 @@ flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 	}
 }
 
+
+
+static int
+lcore_distributor(struct lcore_params *p)
+{
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
+	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+	rte_distributor_flush(d);
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns(d);
+	quit_signal_rx = 1;
+	return 0;
+}
+
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -403,7 +444,7 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
@@ -517,7 +558,7 @@ lcore_worker(struct lcore_params *p)
 		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
+	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		/* Do a little bit of work for each packet */
 		for (i = 0; i < num; i++) {
@@ -608,7 +649,8 @@ main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
 	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
@@ -630,10 +672,11 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 4)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 4 logical cores to run:\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -673,30 +716,52 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2,
+			rte_lcore_count() - 3,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+				rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+				(lcore_function_t *)lcore_distributor,
+				p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else {
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
@@ -704,7 +769,7 @@ main(int argc, char *argv[])
 		worker_id++;
 	}
 	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
+	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
 
 	if (lcore_rx(&p) != 0)
 		return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 15/18] examples/distributor: limit number of Tx rings
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (13 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 14/18] examples/distributor: give distributor a core David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:50                               ` Bruce Richardson
  2017-03-06  9:10                             ` [PATCH v9 16/18] examples/distributor: give Rx thread a core David Hunt
                                               ` (3 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index e9ebe5e..cf2e826 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -44,14 +44,15 @@
 #include <rte_prefetch.h>
 #include <rte_distributor.h>
 
-#define RX_RING_SIZE 256
-#define TX_RING_SIZE 512
+#define RX_QUEUE_SIZE 512
+#define TX_QUEUE_SIZE 512
+
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define MBUF_CACHE_SIZE 128
+#define BURST_SIZE 64
 #define SCHED_RX_RING_SZ 8192
 #define SCHED_TX_RING_SZ 65536
-#define RTE_RING_SZ 1024
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
@@ -134,9 +135,13 @@ static inline int
 port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 {
 	struct rte_eth_conf port_conf = port_conf_default;
-	const uint16_t rxRings = 1, txRings = rte_lcore_count() - 1;
-	int retval;
+	const uint16_t rxRings = 1;
+	uint16_t txRings = rte_lcore_count() - 1;
 	uint16_t q;
+	int retval;
+
+	if (txRings > RTE_MAX_ETHPORTS)
+		txRings = RTE_MAX_ETHPORTS;
 
 	if (port >= rte_eth_dev_count())
 		return -1;
@@ -146,7 +151,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 		return retval;
 
 	for (q = 0; q < rxRings; q++) {
-		retval = rte_eth_rx_queue_setup(port, q, RX_RING_SIZE,
+		retval = rte_eth_rx_queue_setup(port, q, RX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL, mbuf_pool);
 		if (retval < 0)
@@ -154,7 +159,7 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 	}
 
 	for (q = 0; q < txRings; q++) {
-		retval = rte_eth_tx_queue_setup(port, q, TX_RING_SIZE,
+		retval = rte_eth_tx_queue_setup(port, q, TX_QUEUE_SIZE,
 						rte_eth_dev_socket_id(port),
 						NULL);
 		if (retval < 0)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 16/18] examples/distributor: give Rx thread a core
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (14 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 15/18] examples/distributor: limit number of Tx rings David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:51                               ` Bruce Richardson
  2017-03-06  9:10                             ` [PATCH v9 17/18] doc: distributor library changes for new burst API David Hunt
                                               ` (2 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This is so that, with the increased number of stats we are now counting,
we don't interfere with the rx core.

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 examples/distributor/main.c | 50 ++++++++++++++++++++++++++++++---------------
 1 file changed, 34 insertions(+), 16 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index cf2e826..8daf43d 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -278,6 +278,7 @@ lcore_rx(struct lcore_params *p)
 
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -295,13 +296,12 @@ lcore_rx(struct lcore_params *p)
 static inline void
 flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
 	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
 		do {
 			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
 		} while (++nb_tx < outbuf->count);
@@ -313,6 +313,7 @@ static inline void
 flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 {
 	uint8_t outp;
+
 	for (outp = 0; outp < nb_ports; outp++) {
 		/* skip ports that are not enabled */
 		if ((enabled_port_mask & (1 << outp)) == 0)
@@ -405,9 +406,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -436,11 +437,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -562,6 +564,8 @@ lcore_worker(struct lcore_params *p)
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
 
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
@@ -573,6 +577,10 @@ lcore_worker(struct lcore_params *p)
 				rte_pause();
 			buf[i]->port ^= xor_val;
 		}
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
 	return 0;
 }
@@ -677,9 +685,10 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 4)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 4 logical cores to run:\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
 				"1 lcore for packet RX\n"
 				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
@@ -721,7 +730,7 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 3,
+			rte_lcore_count() - 4,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
@@ -760,7 +769,21 @@ main(int argc, char *argv[])
 			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
 					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
 		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
@@ -773,11 +796,6 @@ main(int argc, char *argv[])
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
-
-	if (lcore_rx(&p) != 0)
-		return -1;
 
 	freq = rte_get_timer_hz();
 	t = rte_rdtsc() + freq;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 17/18] doc: distributor library changes for new burst API
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (15 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 16/18] examples/distributor: give Rx thread a core David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-07 17:25                               ` Mcnamara, John
  2017-03-06  9:10                             ` [PATCH v9 18/18] maintainers: add to distributor lib maintainers David Hunt
  2017-03-10 16:54                             ` [PATCH v9 00/18] distributor lib performance enhancements Bruce Richardson
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..e0adcaa 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,9 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two modes of operation of the API in the distributor library: one which sends one packet at a time
+to workers using 32 bits for flow_id, and an optimised mode which sends bursts of up to 8 packets at a time
+to workers, using 15 bits of flow_id. The mode is selected by the type field in the ``rte_distributor_create()`` function.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4
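
As a usage sketch of the two modes described above (illustrative only; the
helper names create_dist, dist_one_burst and worker_loop are not part of the
library, and error handling is omitted):

	#include <rte_distributor.h>
	#include <rte_lcore.h>
	#include <rte_mbuf.h>

	#define BURST_SIZE 32

	/* The algorithm is chosen once, at create time. */
	static struct rte_distributor *
	create_dist(unsigned int num_workers)
	{
		return rte_distributor_create("PKT_DIST", rte_socket_id(),
				num_workers,
				RTE_DIST_ALG_BURST);	/* or RTE_DIST_ALG_SINGLE */
	}

	/* Distributor core: hand a burst of mbufs in, collect any returns. */
	static uint16_t
	dist_one_burst(struct rte_distributor *d, struct rte_mbuf **bufs,
			uint16_t nb_rx)
	{
		rte_distributor_process(d, bufs, nb_rx);
		return rte_distributor_returned_pkts(d, bufs, BURST_SIZE * 2);
	}

	/* Worker core: return up to 8 processed mbufs, receive up to 8 new ones. */
	static void
	worker_loop(struct rte_distributor *d, unsigned int worker_id,
			volatile int *quit)
	{
		struct rte_mbuf *buf[8] __rte_cache_aligned;
		unsigned int i, num = 0;

		for (i = 0; i < 8; i++)
			buf[i] = NULL;

		while (!*quit)
			num = rte_distributor_get_pkt(d, worker_id, buf, buf, num);
	}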

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v9 18/18] maintainers: add to distributor lib maintainers
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (16 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 17/18] doc: distributor library changes for new burst API David Hunt
@ 2017-03-06  9:10                             ` David Hunt
  2017-03-10 16:54                             ` [PATCH v9 00/18] distributor lib performance enhancements Bruce Richardson
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-06  9:10 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 5030c1c..42eece0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -501,6 +501,7 @@ F: doc/guides/sample_app_ug/ip_reassembly.rst
 
 Distributor
 M: Bruce Richardson <bruce.richardson@intel.com>
+M: David Hunt <david.hunt@intel.com>
 F: lib/librte_distributor/
 F: doc/guides/prog_guide/packet_distrib_lib.rst
 F: test/test/test_distributor*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 17/18] doc: distributor library changes for new burst API
  2017-03-06  9:10                             ` [PATCH v9 17/18] doc: distributor library changes for new burst API David Hunt
@ 2017-03-07 17:25                               ` Mcnamara, John
  0 siblings, 0 replies; 202+ messages in thread
From: Mcnamara, John @ 2017-03-07 17:25 UTC (permalink / raw)
  To: Hunt, David, dev; +Cc: Richardson, Bruce, Hunt, David



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
> Sent: Monday, March 6, 2017 9:11 AM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David
> <david.hunt@intel.com>
> Subject: [dpdk-dev] [PATCH v9 17/18] doc: distributor library changes for
> new burst API
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>

Acked-by: John McNamara <john.mcnamara@intel.com>

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 04/18] lib: add new distributor code
  2017-03-06  9:10                             ` [PATCH v9 04/18] lib: add new distributor code David Hunt
@ 2017-03-10 16:03                               ` Bruce Richardson
  2017-03-14 10:43                                 ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:03 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:19AM +0000, David Hunt wrote:
> This patch includes public header file which will be used once
> we add in the symbol versioning for v20 and v1705 APIs.
> 
> Also includes v1702 header file, and code for new

Now 1705.

> burst-capable distributor library. This will be re-named as
> rte_distributor.h later in the patch-set
> 
> The new distributor code contains a very similar API to the legacy code,
> but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
> reduced to 15 bits for an optimal flow matching algorithm.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                  |   1 +
>  lib/librte_distributor/rte_distributor.c         | 628 +++++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_private.h |   7 +-
>  lib/librte_distributor/rte_distributor_v1705.h   | 269 ++++++++++
>  4 files changed, 904 insertions(+), 1 deletion(-)
>  create mode 100644 lib/librte_distributor/rte_distributor.c
>  create mode 100644 lib/librte_distributor/rte_distributor_v1705.h
> 

Minor nit, I think this patch might be squashed into the previous one,
to have new structures and code together.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-06  9:10                             ` [PATCH v9 09/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-10 16:22                               ` Bruce Richardson
  2017-03-13 10:17                                 ` Hunt, David
  2017-03-13 10:28                                 ` Hunt, David
  0 siblings, 2 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:22 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++

A file named rte_distributor_v1705.h was added in patch 4, then deleted
in patch 7, and now added again here. Seems a lot of churn.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 12/18] examples/distributor: allow for extra stats
  2017-03-06  9:10                             ` [PATCH v9 12/18] examples/distributor: allow for extra stats David Hunt
@ 2017-03-10 16:46                               ` Bruce Richardson
  2017-03-14 10:44                                 ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:46 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:27AM +0000, David Hunt wrote:
> This will allow us to see what's going on at various stages
> throughout the sample app, with per-second visibility
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  examples/distributor/main.c | 139 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 123 insertions(+), 16 deletions(-)
> 
> diff --git a/examples/distributor/main.c b/examples/distributor/main.c
> index cc3bdb0..3657e5d 100644
> --- a/examples/distributor/main.c
> +++ b/examples/distributor/main.c
> @@ -54,24 +54,53 @@
>  
>  #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
>  
> +#define ANSI_COLOR_RED     "\x1b[31m"
> +#define ANSI_COLOR_RESET   "\x1b[0m"
> +
>  /* mask of enabled ports */
>  static uint32_t enabled_port_mask;
>  volatile uint8_t quit_signal;
>  volatile uint8_t quit_signal_rx;
> +volatile uint8_t quit_signal_dist;
>  
>  static volatile struct app_stats {
>  	struct {
>  		uint64_t rx_pkts;
>  		uint64_t returned_pkts;
>  		uint64_t enqueued_pkts;
> +		uint64_t enqdrop_pkts;
>  	} rx __rte_cache_aligned;
> +	int pad1 __rte_cache_aligned;
> +
> +	struct {
> +		uint64_t in_pkts;
> +		uint64_t ret_pkts;
> +		uint64_t sent_pkts;
> +		uint64_t enqdrop_pkts;
> +	} dist __rte_cache_aligned;
> +	int pad2 __rte_cache_aligned;
>  
>  	struct {
>  		uint64_t dequeue_pkts;
>  		uint64_t tx_pkts;
> +		uint64_t enqdrop_pkts;
>  	} tx __rte_cache_aligned;
> +	int pad3 __rte_cache_aligned;
> +
> +	uint64_t worker_pkts[64] __rte_cache_aligned;
> +
> +	int pad4 __rte_cache_aligned;
> +
> +	uint64_t worker_bursts[64][8] __rte_cache_aligned;
> +
> +	int pad5 __rte_cache_aligned;
> +
> +	uint64_t port_rx_pkts[64] __rte_cache_aligned;
> +	uint64_t port_tx_pkts[64] __rte_cache_aligned;
>  } app_stats;
>  
> +struct app_stats prev_app_stats;
> +
>  static const struct rte_eth_conf port_conf_default = {
>  	.rxmode = {
>  		.mq_mode = ETH_MQ_RX_RSS,
> @@ -93,6 +122,8 @@ struct output_buffer {
>  	struct rte_mbuf *mbufs[BURST_SIZE];
>  };
>  
> +static void print_stats(void);
> +
>  /*
>   * Initialises a given port using global settings and with the rx buffers
>   * coming from the mbuf_pool passed as parameter
> @@ -378,25 +409,91 @@ static void
>  print_stats(void)
>  {
>  	struct rte_eth_stats eth_stats;
> -	unsigned i;
> -
> -	printf("\nRX thread stats:\n");
> -	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
> -	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
> -	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
> -
> -	printf("\nTX thread stats:\n");
> -	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
> -	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
> +	unsigned int i, j;
> +	const unsigned int num_workers = rte_lcore_count() - 4;
>  
>  	for (i = 0; i < rte_eth_dev_count(); i++) {
>  		rte_eth_stats_get(i, &eth_stats);
> -		printf("\nPort %u stats:\n", i);
> -		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
> -		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
> -		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
> -		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
> -		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
> +		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
> +		app_stats.port_tx_pkts[i] = eth_stats.opackets;
> +	}
> +
> +	printf("\n\nRX Thread:\n");
> +	for (i = 0; i < rte_eth_dev_count(); i++) {
> +		printf("Port %u Pktsin : %5.2f\n", i,
> +				(app_stats.port_rx_pkts[i] -
> +				prev_app_stats.port_rx_pkts[i])/1000000.0);
> +		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
> +	}
> +	printf(" - Received:    %5.2f\n",
> +			(app_stats.rx.rx_pkts -
> +			prev_app_stats.rx.rx_pkts)/1000000.0);
> +	printf(" - Returned:    %5.2f\n",
> +			(app_stats.rx.returned_pkts -
> +			prev_app_stats.rx.returned_pkts)/1000000.0);
> +	printf(" - Enqueued:    %5.2f\n",
> +			(app_stats.rx.enqueued_pkts -
> +			prev_app_stats.rx.enqueued_pkts)/1000000.0);
> +	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
> +			(app_stats.rx.enqdrop_pkts -
> +			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
> +			ANSI_COLOR_RESET);
> +
> +	printf("Distributor thread:\n");
> +	printf(" - In:          %5.2f\n",
> +			(app_stats.dist.in_pkts -
> +			prev_app_stats.dist.in_pkts)/1000000.0);
> +	printf(" - Returned:    %5.2f\n",
> +			(app_stats.dist.ret_pkts -
> +			prev_app_stats.dist.ret_pkts)/1000000.0);
> +	printf(" - Sent:        %5.2f\n",
> +			(app_stats.dist.sent_pkts -
> +			prev_app_stats.dist.sent_pkts)/1000000.0);
> +	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
> +			(app_stats.dist.enqdrop_pkts -
> +			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
> +			ANSI_COLOR_RESET);
> +
> +	printf("TX thread:\n");
> +	printf(" - Dequeued:    %5.2f\n",
> +			(app_stats.tx.dequeue_pkts -
> +			prev_app_stats.tx.dequeue_pkts)/1000000.0);
> +	for (i = 0; i < rte_eth_dev_count(); i++) {
> +		printf("Port %u Pktsout: %5.2f\n",
> +				i, (app_stats.port_tx_pkts[i] -
> +				prev_app_stats.port_tx_pkts[i])/1000000.0);
> +		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
> +	}
> +	printf(" - Transmitted: %5.2f\n",
> +			(app_stats.tx.tx_pkts -
> +			prev_app_stats.tx.tx_pkts)/1000000.0);
> +	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
> +			(app_stats.tx.enqdrop_pkts -
> +			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
> +			ANSI_COLOR_RESET);
> +
> +	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
> +	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
> +	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
> +	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
> +	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
> +	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
> +	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
> +	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
> +	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
> +	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
> +	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
> +
> +	for (i = 0; i < num_workers; i++) {
> +		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
> +				(app_stats.worker_pkts[i] -
> +				prev_app_stats.worker_pkts[i])/1000000.0);
> +		for (j = 0; j < 8; j++) {
> +			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
> +			app_stats.worker_bursts[i][j] = 0;
> +		}
> +		printf("\n");
> +		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
>  	}
>  }
>  
> @@ -515,6 +612,7 @@ main(int argc, char *argv[])
>  	unsigned nb_ports;
>  	uint8_t portid;
>  	uint8_t nb_ports_available;
> +	uint64_t t, freq;
>  
>  	/* catch ctrl-c so we can print on exit */
>  	signal(SIGINT, int_handler);
> @@ -610,6 +708,15 @@ main(int argc, char *argv[])
>  	if (lcore_rx(&p) != 0)
>  		return -1;
>  
> +	freq = rte_get_timer_hz();
> +	t = rte_rdtsc() + freq;
> +	while (!quit_signal_dist) {
> +		if (t < rte_rdtsc()) {
> +			print_stats();
> +			t = rte_rdtsc() + freq;
> +		}
> +	}
> +

You can probably put in a usleep or nanosleep into the while loop above.
No need to burn an entire core by polling the tsc.
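
A minimal sketch of that change (keeping the once-per-second TSC check and
just yielding the core in between; the 1ms interval is an arbitrary choice):

	freq = rte_get_timer_hz();
	t = rte_rdtsc() + freq;
	while (!quit_signal_dist) {
		if (t < rte_rdtsc()) {
			print_stats();
			t = rte_rdtsc() + freq;
		}
		/* sleep rather than spin; usleep() needs <unistd.h> */
		usleep(1000);
	}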

/Bruce

> 2.7.4
> 

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 13/18] sample: distributor: wait for ports to come up
  2017-03-06  9:10                             ` [PATCH v9 13/18] sample: distributor: wait for ports to come up David Hunt
@ 2017-03-10 16:48                               ` Bruce Richardson
  0 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:48 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:28AM +0000, David Hunt wrote:
> On some machines, ports take several seconds to come up. This
> patch causes the app to wait.
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>

Title prefix should match other patches.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 14/18] examples/distributor: give distributor a core
  2017-03-06  9:10                             ` [PATCH v9 14/18] examples/distributor: give distributor a core David Hunt
@ 2017-03-10 16:49                               ` Bruce Richardson
  2017-03-14 10:48                                 ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:49 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:29AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---

Title could do with some rewording - e.g. "make distributor API calls on
dedicated core"

This also requires an explanation as to why the change is being made.
Does it not also need an update to the sample app guide about how the
app works?

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 15/18] examples/distributor: limit number of Tx rings
  2017-03-06  9:10                             ` [PATCH v9 15/18] examples/distributor: limit number of Tx rings David Hunt
@ 2017-03-10 16:50                               ` Bruce Richardson
  2017-03-14 10:50                                 ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:50 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:30AM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---

Please explain reason for change.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 16/18] examples/distributor: give Rx thread a core
  2017-03-06  9:10                             ` [PATCH v9 16/18] examples/distributor: give Rx thread a core David Hunt
@ 2017-03-10 16:51                               ` Bruce Richardson
  2017-03-14  9:34                                 ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:51 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:31AM +0000, David Hunt wrote:
> This is so that, with the increased number of stats we are now counting,
> we don't interfere with the rx core.
> 
Where are the stats being counted in the current code and how would they
interfere?

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 00/18] distributor lib performance enhancements
  2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
                                               ` (17 preceding siblings ...)
  2017-03-06  9:10                             ` [PATCH v9 18/18] maintainers: add to distributor lib maintainers David Hunt
@ 2017-03-10 16:54                             ` Bruce Richardson
  18 siblings, 0 replies; 202+ messages in thread
From: Bruce Richardson @ 2017-03-10 16:54 UTC (permalink / raw)
  To: David Hunt; +Cc: dev

On Mon, Mar 06, 2017 at 09:10:15AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
> 
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
> 
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
> 
> The Flow Match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 cpu flag. On non-x86 platforms,
> the scalar match function is selected, which should still give a good boost
> in performance over the non-burst API.
> 
> v9 changes:
>    * fixed symbol versioning so it will compile on CentOS and RedHat
> 

I've flagged a number of things that could do with being cleaned up in
the patchset. However, the idea itself of adding a new burst-mode to
improve distributor performance - and using vector matching to further
boost it - is a good improvement. Therefore

Series-Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-10 16:22                               ` Bruce Richardson
@ 2017-03-13 10:17                                 ` Hunt, David
  2017-03-13 10:28                                 ` Hunt, David
  1 sibling, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-13 10:17 UTC (permalink / raw)
  To: Richardson, Bruce; +Cc: dev



-----Original Message-----
From: Richardson, Bruce 
Sent: Friday, 10 March, 2017 4:22 PM
To: Hunt, David <david.hunt@intel.com>
Cc: dev@dpdk.org
Subject: Re: [PATCH v9 09/18] lib: add symbol versioning to distributor

On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> Also bumped up the ABI version number in the Makefile
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++

A file named rte_distributor_v1705.h was added in patch 4, then deleted in patch 7, and now added again here. Seems a lot of churn.

/Bruce

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-10 16:22                               ` Bruce Richardson
  2017-03-13 10:17                                 ` Hunt, David
@ 2017-03-13 10:28                                 ` Hunt, David
  2017-03-13 11:01                                   ` Van Haaren, Harry
  1 sibling, 1 reply; 202+ messages in thread
From: Hunt, David @ 2017-03-13 10:28 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev


On 10/3/2017 4:22 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
>> Also bumped up the ABI version number in the Makefile
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/Makefile                    |  2 +-
>>   lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>>   lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
> A file named rte_distributor_v1705.h was added in patch 4, then deleted
> in patch 7, and now added again here. Seems a lot of churn.
>
> /Bruce
>

The first introduction of this file is what will become the public header.
For successful compilation, this cannot be called rte_distributor.h until
the symbol versioning patch, at which stage I will rename the file and
introduce the symbol-versioned header at the same time. In the next
revision of the patchset I'll rename this version of the file to
rte_distributor_public.h to make this clearer.

Regards,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-13 10:28                                 ` Hunt, David
@ 2017-03-13 11:01                                   ` Van Haaren, Harry
  2017-03-13 11:02                                     ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Van Haaren, Harry @ 2017-03-13 11:01 UTC (permalink / raw)
  To: Hunt, David, Richardson, Bruce; +Cc: dev

> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hunt, David
> Subject: Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
> 
> On 10/3/2017 4:22 PM, Bruce Richardson wrote:
> > On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
> >> Also bumped up the ABI version number in the Makefile
> >>
> >> Signed-off-by: David Hunt <david.hunt@intel.com>
> >> ---
> >>   lib/librte_distributor/Makefile                    |  2 +-
> >>   lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
> >>   lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
> > A file named rte_distributor_v1705.h was added in patch 4, then deleted
> > in patch 7, and now added again here. Seems a lot of churn.
> >
> > /Bruce
> >
> 
> The first introduction of this file is what will become the public
> header. For successful compilation,
> this cannot be called rte_distributor.h until the symbol versioning
> patch, at which stage I will
> rename the file, and introduce the symbol versioned header at the same
> time. In the next patch
> I'll rename this version of the files as rte_distributor_public.h to
> make this clearer.


Suggestion to use rte_distributor_next.h instead of public?
Public doesn't indicate if it's old or new, while next would make that clearer IMO :)

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 09/18] lib: add symbol versioning to distributor
  2017-03-13 11:01                                   ` Van Haaren, Harry
@ 2017-03-13 11:02                                     ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-13 11:02 UTC (permalink / raw)
  To: Van Haaren, Harry, Richardson, Bruce; +Cc: dev


On 13/3/2017 11:01 AM, Van Haaren, Harry wrote:
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Hunt, David
>> Subject: Re: [dpdk-dev] [PATCH v9 09/18] lib: add symbol versioning to distributor
>>
>> On 10/3/2017 4:22 PM, Bruce Richardson wrote:
>>> On Mon, Mar 06, 2017 at 09:10:24AM +0000, David Hunt wrote:
>>>> Also bumped up the ABI version number in the Makefile
>>>>
>>>> Signed-off-by: David Hunt <david.hunt@intel.com>
>>>> ---
>>>>    lib/librte_distributor/Makefile                    |  2 +-
>>>>    lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>>>>    lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
>>> A file named rte_distributor_v1705.h was added in patch 4, then deleted
>>> in patch 7, and now added again here. Seems a lot of churn.
>>>
>>> /Bruce
>>>
>> The first introduction of this file is what will become the public
>> header. For successful compilation,
>> this cannot be called rte_distributor.h until the symbol versioning
>> patch, at which stage I will
>> rename the file, and introduce the symbol versioned header at the same
>> time. In the next patch
>> I'll rename this version of the files as rte_distributor_public.h to
>> make this clearer.
>
> Suggestion to use rte_distributor_next.h instead of public?
> Public doesn't indicate if it's old or new, while next would make that clearer IMO :)

Good call, will use "_next". It's clearer.
Thanks,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 16/18] examples/distributor: give Rx thread a core
  2017-03-10 16:51                               ` Bruce Richardson
@ 2017-03-14  9:34                                 ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-14  9:34 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 10/3/2017 4:51 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:31AM +0000, David Hunt wrote:
>> This so that with the increased amount of stats we are counting,
>> we don't interfere with the rx core.
>>
> Where are the stats being counted in the current code and how would they
> interfere?
>
> /Bruce

The previous version of the distributor example did not print out several
lines of statistics every second, which the new version does. I felt that
it would be better to separate this off to its own core, so that the rx
core performance would not be impacted.

I'll add an extra comment in the commit message.

Regards,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 04/18] lib: add new distributor code
  2017-03-10 16:03                               ` Bruce Richardson
@ 2017-03-14 10:43                                 ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-14 10:43 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 10/3/2017 4:03 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:19AM +0000, David Hunt wrote:
>> This patch includes public header file which will be used once
>> we add in the symbol versioning for v20 and v1705 APIs.
>>
>> Also includes v1702 header file, and code for new
> Now 1705.
>
>> burst-capable distributor library. This will be re-named as
>> rte_distributor.h later in the patch-set
>>
>> The new distributor code contains a very similar API to the legacy code,
>> but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
>> reduced to 15 bits for an optimal flow matching algorithm.
>>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
>>   lib/librte_distributor/Makefile                  |   1 +
>>   lib/librte_distributor/rte_distributor.c         | 628 +++++++++++++++++++++++
>>   lib/librte_distributor/rte_distributor_private.h |   7 +-
>>   lib/librte_distributor/rte_distributor_v1705.h   | 269 ++++++++++
>>   4 files changed, 904 insertions(+), 1 deletion(-)
>>   create mode 100644 lib/librte_distributor/rte_distributor.c
>>   create mode 100644 lib/librte_distributor/rte_distributor_v1705.h
>>
> Minor nit, I think this patch might be squashed into the previous one,
> to have new structures and code together.
>
> /Bruce

Done in the next version.
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 12/18] examples/distributor: allow for extra stats
  2017-03-10 16:46                               ` Bruce Richardson
@ 2017-03-14 10:44                                 ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-14 10:44 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 10/3/2017 4:46 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:27AM +0000, David Hunt wrote:
>> +	freq = rte_get_timer_hz();
>> +	t = rte_rdtsc() + freq;
>> +	while (!quit_signal_dist) {
>> +		if (t < rte_rdtsc()) {
>> +			print_stats();
>> +			t = rte_rdtsc() + freq;
>> +		}
>> +	}
>> +
> You can probably put in a usleep or nanosleep into the while loop above.
> No need to burn an entire core by polling the tsc.
>
> /Bruce


Done in the next version.
Dave.
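
A minimal sketch of the kind of loop being agreed on here, reusing the
quit_signal_dist flag and print_stats() routine from the snippet quoted
above (their real definitions live in the example app and are only
stubbed here):

#include <stdint.h>
#include <unistd.h>
#include <rte_cycles.h>

static volatile int quit_signal_dist;	/* set by the app on shutdown */

static void print_stats(void) { /* stand-in for the app's stats output */ }

static void
stats_loop(void)
{
	uint64_t freq = rte_get_timer_hz();
	uint64_t t = rte_rdtsc() + freq;

	while (!quit_signal_dist) {
		if (t < rte_rdtsc()) {
			print_stats();
			t = rte_rdtsc() + freq;
		}
		usleep(1000);	/* sleep ~1ms instead of spinning on the TSC */
	}
}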

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 14/18] examples/distributor: give distributor a core
  2017-03-10 16:49                               ` Bruce Richardson
@ 2017-03-14 10:48                                 ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-14 10:48 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev



On 10/3/2017 4:49 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:29AM +0000, David Hunt wrote:
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
> Title could do with some rewording - e.g. "make distributor API calls on
> dedicated core"
>
> This also requires an explanation as to why the change is being made.
> Does it not also need an update to the sample app guide about how the
> app works?
>
> /Bruce

Yes, sure. And I'll add some more info into the dist_app.rst file.
Rgds,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v9 15/18] examples/distributor: limit number of Tx rings
  2017-03-10 16:50                               ` Bruce Richardson
@ 2017-03-14 10:50                                 ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-14 10:50 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: dev


On 10/3/2017 4:50 PM, Bruce Richardson wrote:
> On Mon, Mar 06, 2017 at 09:10:30AM +0000, David Hunt wrote:
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> ---
> Please explain reason for change.
>
> /Bruce

I've re-worked this change, as it's mostly to do with performance,
resulting in about a 10% gain.
The code limiting the number of Tx rings has been removed, as it was part
of an investigation into high core count operation, and is no longer
relevant.
Rgds,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v10 0/18] distributor library performance enhancements
  2017-03-06  9:10                             ` [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-15  6:19                               ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
                                                   ` (17 more replies)
  0 siblings, 18 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
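
As a rough illustration of that run-time selection (a hedged sketch, not
the series code: match_scalar()/match_vec() stand in for the real
find_match implementations, and an x86 build is assumed so that
RTE_CPUFLAG_SSE2 is available):

#include <stdint.h>
#include <rte_cpuflags.h>

typedef void (*match_fn_t)(const uint16_t *new_tags, uint16_t *matches);

static void match_scalar(const uint16_t *t, uint16_t *m) { (void)t; (void)m; /* scalar compare loop */ }
static void match_vec(const uint16_t *t, uint16_t *m)    { (void)t; (void)m; /* SSE2 compare loop */ }

static match_fn_t match_fn = match_scalar;	/* scalar fallback by default */

static void
select_match_fn(void)
{
	/* pick the SIMD matcher only when the CPU reports SSE2 support */
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		match_fn = match_vec;
}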

v10 changes:
   * Addressed all review comments from v9 (thanks, Bruce)
   * Squashed the two patches containing distributor structs and code
   * Renamed confusing rte_distributor_v1705.h to rte_distributor_next.h
   * Added usleep in main so as to be a little more gentle with that core
   * Fixed some patch titles and improved some descriptions
   * Updated sample app guide documentation
   * Removed un-needed code limiting Tx rings and cleaned up patch
   * Inherited v9 series Ack by Bruce, except new suggested addition
     for example app documentation (17/18)

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical order of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split down the updates to example app more
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised the patch set so there's a more natural progression in the
     changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance
     (see the sketch below).
   * Added symbol versioning for old API so that ABI is preserved.
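
A hedged sketch of what that looks like at create time; in the code as
merged the mode value is RTE_DIST_ALG_SINGLE (the RTE_DISTRIBUTOR_SINGLE
name above refers to the same switch), and the worker count below is
only an example:

#include <rte_distributor.h>
#include <rte_lcore.h>

static struct rte_distributor *
create_single_mode_dist(void)
{
	/* example: everything except rx/tx/distributor/stats cores is a worker */
	unsigned int nb_workers = rte_lcore_count() - 4;

	return rte_distributor_create("dist_single", rte_socket_id(),
			nb_workers, RTE_DIST_ALG_SINGLE);
}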

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   (see the worker loop sketch after these notes)
   For performance in matching, Flow ID's are 15-bits
   If 32-bit Flow IDs are required, use the packet-at-a-time (SINGLE)
   mode.
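
For reference, a hedged sketch of a burst-mode worker loop (prototypes as
in the new API of this series, header name as after the final rename;
app_work() and the quit flag below are placeholders, not app symbols):

#include <rte_distributor.h>
#include <rte_mbuf.h>

#define WORKER_BURST 8	/* up to 8 mbufs exchanged per call */

static volatile int quit;	/* set elsewhere to stop the worker */

static void app_work(struct rte_mbuf *m) { (void)m; /* real packet work here */ }

static int
lcore_worker(struct rte_distributor *d, unsigned int worker_id)
{
	struct rte_mbuf *bufs[WORKER_BURST];
	unsigned int i, nb = 0;

	while (!quit) {
		/* hand back the previous burst, receive the next one */
		nb = rte_distributor_get_pkt(d, worker_id, bufs, bufs, nb);
		for (i = 0; i < nb; i++)
			app_work(bufs[i]);
	}
	/* on shutdown, return whatever is still held */
	rte_distributor_return_pkt(d, worker_id, bufs, nb);
	return 0;
}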

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new distributor code
[04/18] lib: add SIMD flow matching to distributor
[05/18] test/distributor: extra params for autotests
[06/18] lib: switch distributor over to new API
[07/18] lib: make v20 header file private
[08/18] lib: add symbol versioning to distributor
[09/18] test: test single and burst distributor API
[10/18] test: add perf test for distributor burst mode
[11/18] examples/distributor: allow for extra stats
[12/18] examples/distributor: wait for ports to come up
[13/18] examples/distributor: add dedicated core for dist
[14/18] examples/distributor: tweaks for performance
[15/18] examples/distributor: give Rx thread a core
[16/18] doc: distributor library changes for new burst API
[17/18] doc: distributor app changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v10 01/18] lib: rename legacy distributor lib files
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 02/18] lib: create private header file David Hunt
                                                   ` (16 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move files out of the way so that we can replace them with new
versions of the distributor library. Files are named in
such a way as to match the symbol versioning that we will
apply for backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 02/18] lib: create private header file
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15 17:18                                   ` Thomas Monjalon
  2017-03-15  6:19                                 ` [PATCH v10 03/18] lib: add new distributor code David Hunt
                                                   ` (15 subsequent siblings)
  17 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

We'll be adding internal implementation definitions in here
that are common to both burst and legacy APIs.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/rte_distributor_private.h | 136 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c     |  72 +-----------
 2 files changed, 137 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_private.h

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
new file mode 100644
index 0000000..6d72f1c
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -0,0 +1,136 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, becaus it is limited by how we track
+ * in-flight tags. See in_flight_bitmask and rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/**
+ * Number of packets to deal with in bursts. Needs to be 8 so as to
+ * fit in one cache line.
+ */
+#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index b890947..be297ec 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -41,77 +41,7 @@
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
+#include "rte_distributor_private.h"
 
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
-- 
2.7.4
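
The pointer/flag packing described in the rte_distributor_private.h
comment above can be illustrated with a small standalone sketch (the
values and macro names here are illustrative only, not the library's):

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

#define FLAG_BITS	4	/* low 4 bits of the stored value carry flags */
#define FLAGS_MASK	0x0F
#define GET_BUF		(1)

int main(void)
{
	uint64_t mbuf_addr = 0x7f12345678c0;	/* example 48-bit pointer value */

	/* shift the pointer up to make room, then OR in a flag bit */
	int64_t slot = ((int64_t)mbuf_addr << FLAG_BITS) | GET_BUF;

	uint64_t flags = slot & FLAGS_MASK;
	/* an arithmetic right shift restores the original pointer value */
	uint64_t addr = (uint64_t)(slot >> FLAG_BITS);

	printf("flags=%#" PRIx64 " addr=%#" PRIx64 "\n", flags, addr);
	return (addr == mbuf_addr && flags == GET_BUF) ? 0 : 1;
}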

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 03/18] lib: add new distributor code
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 02/18] lib: create private header file David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 04/18] lib: add SIMD flow matching to distributor David Hunt
                                                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This patch includes the code for the new burst-capable distributor library.

It also includes the rte_distributor_next.h file which will
be used as the public header once we add in the symbol versioning
for v20 and v1705 APIs, at which stage we will rename it to
rte_distributor.h.

The new distributor code contains a very similar API to the legacy code,
but now sends bursts of up to 8 mbufs to each worker. Flow ID's are
reduced to 15 bits for an optimal flow matching algorithm.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                  |   1 +
 lib/librte_distributor/rte_distributor.c         | 628 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_next.h    | 269 ++++++++++
 lib/librte_distributor/rte_distributor_private.h |  61 +++
 4 files changed, 959 insertions(+)
 create mode 100644 lib/librte_distributor/rte_distributor.c
 create mode 100644 lib/librte_distributor/rte_distributor_next.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index b314ca6..74256ff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
new file mode 100644
index 0000000..75b0d47
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor.c
@@ -0,0 +1,628 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include <rte_compat.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor_next.h"
+#include "rte_distributor_v20.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		rte_distributor_request_pkt(d->d_v20,
+			worker_id, oldpkt[0]);
+		return;
+	}
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF  to signal to distributor that cache
+	 * line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		return (pkts[0]) ? 1 : 0;
+	}
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return -1;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * so now we've got the contents of the cacheline into an  array of
+	 * mbuf pointers, so toggle the bit so scheduler can start working
+	 * on the next cacheline while we're working.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	int count;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (return_count <= 1) {
+			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+			return (pkts[0]) ? 1 : 0;
+		} else
+			return -EINVAL;
+	}
+
+	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	while (count == -1) {
+		uint64_t t = rte_rdtsc() + 100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (num == 1)
+			return rte_distributor_return_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+		else
+			return -EINVAL;
+	}
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+/*
+ * Match the flow_ids (tags) of the incoming packets to the flow_ids
+ * of the inflight packets (both inflight on the workers and in each worker
+ * backlog). This will then allow us to pin those packets to the relevant
+ * workers to give us our atomic flow pinning.
+ */
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+}
+
+
+/*
+ * When the handshake bits indicate that there are packets coming
+ * back from the worker, this function is called to copy and store
+ * the valid returned pointers (store_return).
+ */
+static unsigned int
+handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+/*
+ * This function releases a burst (cache line) to a worker.
+ * It is called from the process function when a cacheline is
+ * full to make room for more packets for that worker, or when
+ * all packets have been assigned to bursts and need to be flushed
+ * to the workers.
+ * It also needs to wait for any outstanding packets from the worker
+ * before sending out new packets.
+ */
+static unsigned int
+release(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, j, w, wid;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+	}
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		unsigned int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (i = 0; i < pkts; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+		for (; i < RTE_DIST_BURST_SIZE; i++)
+			flows[i] = 0;
+
+		find_match_scalar(d, &flows[0], &matches[0]);
+
+		/*
+		 * Matches array now contains the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_returned_pkts(d->d_v20,
+				mbufs, max_mbufs);
+	}
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_flush(d->d_v20);
+	}
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_v1705(d, NULL, 0);
+
+	/*
+	 * Send empty burst to all workers to allow them to exit
+	 * gracefully, should they need to.
+	 */
+	rte_distributor_process_v1705(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		rte_distributor_clear_returns(d->d_v20);
+	}
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type)
+{
+	struct rte_distributor_v1705 *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* TODO Reorganise function properly around RTE_DIST_ALG_SINGLE/BURST */
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (alg_type == RTE_DIST_ALG_SINGLE) {
+		d = malloc(sizeof(struct rte_distributor_v1705));
+		d->d_v20 = rte_distributor_create(name,
+				socket_id, num_workers);
+		if (d->d_v20 == NULL) {
+			/* rte_errno will have been set */
+			return NULL;
+		}
+		d->alg_type = alg_type;
+		return d;
+	}
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+	d->alg_type = alg_type;
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_next.h b/lib/librte_distributor/rte_distributor_next.h
new file mode 100644
index 0000000..0034020
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_next.h
@@ -0,0 +1,269 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor_v1705;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and works on up to 8 packets at a time to workers.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the user does not set the tag, its value is undefined and
+ * depends on the driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
+ * new packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns without packets if the
+ * request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 6d72f1c..f0042a8 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -129,6 +129,67 @@ struct rte_distributor {
 	struct rte_distributor_returned_pkts returns;
 };
 
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
+	RTE_DIST_NUM_MATCH_FNS
+};
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_v1705 {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+struct rte_distributor_v1705 {
+	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the distributor. */
+	unsigned int num_workers;             /**< Number of workers polling */
+	unsigned int alg_type;                /**< Distribution algorithm type */
+
+	/**
+	 * First cache line in this array holds the tags currently inflight
+	 * on the worker core. The second cache line holds the backlog of
+	 * tags that will go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+
+	struct rte_distributor *d_v20;
+};
+
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 04/18] lib: add SIMD flow matching to distributor
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (2 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 03/18] lib: add new distributor code David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 05/18] test/distributor: extra params for autotests David Hunt
                                                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Add an optimised version of the in-flight flow matching algorithm
using SIMD instructions. This should give up to 1.5x the performance
of the scalar version.

Falls back to the scalar version if SSE4.2 is not available.
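
For illustration, below is a minimal standalone sketch (not part of the
patch itself) of what the _mm_cmpestrm comparison in find_match_vec()
computes. It assumes SSE4.2 is enabled at compile time (e.g. gcc
-msse4.2) and uses made-up flow IDs:

#include <stdio.h>
#include <stdint.h>
#include <nmmintrin.h>	/* SSE4.2 intrinsics */

int main(void)
{
	/* made-up 16-bit flow IDs, 8 per register, for illustration only */
	uint16_t incoming[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	uint16_t inflight[8] = { 3, 5, 7, 0, 0, 0, 0, 0 };
	uint16_t out[8];
	int i;

	__m128i in  = _mm_loadu_si128((const __m128i *)incoming);
	__m128i inf = _mm_loadu_si128((const __m128i *)inflight);

	/*
	 * For each 16-bit lane of 'in', write 0xffff to the result lane if
	 * it equals any lane of 'inf' - this is the per-worker intersection
	 * performed against the inflight and backlog tags.
	 */
	__m128i mask = _mm_cmpestrm(inf, 8, in, 8,
			_SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_UNIT_MASK);

	_mm_storeu_si128((__m128i *)out, mask);
	for (i = 0; i < 8; i++)
		printf("%04x ", out[i]);
	printf("\n");	/* prints: 0000 0000 ffff 0000 ffff 0000 ffff 0000 */
	return 0;
}

The full function below additionally ORs in the backlog mask and folds
the worker ID into the matching lanes before storing the result.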

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |  10 ++
 lib/librte_distributor/rte_distributor.c           |  16 ++-
 .../rte_distributor_match_generic.c                |  43 ++++++++
 lib/librte_distributor/rte_distributor_match_sse.c | 114 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h   |   5 +
 5 files changed, 186 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 74256ff..a812fe4 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,16 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+# distributor SIMD algo needs SSE4.2 support
+ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_2,$(CFLAGS)),)
+CFLAGS_rte_distributor_match_sse.o += -msse4.2
+endif
+else
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 75b0d47..6158fa6 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -391,7 +391,13 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 		for (; i < RTE_DIST_BURST_SIZE; i++)
 			flows[i] = 0;
 
-		find_match_scalar(d, &flows[0], &matches[0]);
+		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
 
 		/*
 		 * Matches array now contains the intended worker ID (+1) of
@@ -607,7 +613,13 @@ rte_distributor_create_v1705(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 	d->alg_type = alg_type;
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backlog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
new file mode 100644
index 0000000..4925a78
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+
+void
+find_match_vec(struct rte_distributor *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	find_match_scalar(d, data_ptr, output_ptr);
+}
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..b9f9bb0
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,114 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+#include "smmintrin.h"
+#include "nmmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 2. Loop through all worker ID's
+	 *  2a. Load the current inflights for that worker into an xmm reg
+	 *  2b. Load the current backlog for that worker into an xmm reg
+	 *  2c. use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  2d. Add any matches to the output
+	 * 3. Write the output xmm (matching worker ids).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the output 128-bit register contains 8 16-bit values,
+	 * with each non-zero value containing the worker ID to which the
+	 * corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index f0042a8..92052b1 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -190,6 +190,11 @@ find_match_scalar(struct rte_distributor_v1705 *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 05/18] test/distributor: extra params for autotests
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (3 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 04/18] lib: add SIMD flow matching to distributor David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 06/18] lib: switch distributor over to new API David Hunt
                                                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

In the next few patches, we'll want to test both the old and the new
API, so here we allow different parameters to be passed to the tests,
instead of just a distributor struct.
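
As a condensed illustration of the pattern (taken from the diff below,
with the worker body trimmed), each test entry point now receives a
small wrapper struct rather than the bare distributor pointer:

#include <rte_distributor.h>

struct worker_params {
	char name[64];                 /* which API variant is under test */
	struct rte_distributor *dist;  /* the distributor being exercised */
};

static int
handle_work(void *arg)
{
	struct worker_params *wp = arg;

	/* request/return packets from wp->dist exactly as before,
	 * reporting results against wp->name */
	return 0;
}

/* launched as:
 * rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
 */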

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_distributor.c | 64 +++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 85cb8f3..6059a0c 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -45,6 +45,13 @@
 #define BURST 32
 #define BIG_BATCH 1024
 
+struct worker_params {
+	char name[64];
+	struct rte_distributor *dist;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,7 +88,9 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
+
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -107,8 +116,9 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -249,7 +259,8 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -270,8 +281,9 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
@@ -305,7 +317,8 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -344,9 +357,10 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -401,9 +415,10 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -480,8 +495,9 @@ int test_error_distributor_create_numworkers(void)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -536,28 +552,34 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
+	worker_params.dist = d;
+	sprintf(worker_params.name, "single");
+
+	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
+	if (sanity_test(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
+	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
+				SKIP_MASTER);
+	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
 	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
+		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
 	} else {
 		printf("Not enough cores to run tests for worker shutdown\n");
@@ -572,7 +594,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 06/18] lib: switch distributor over to new API
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (4 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 05/18] test/distributor: extra params for autotests David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 07/18] lib: make v20 header file private David Hunt
                                                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This is the main switch-over between the legacy API and the new
burst API. We rename all the functions in rte_distributor.c to remove
the _v1705 suffix, and add the _v20 suffix in rte_distributor_v20.c.

We also rename rte_distributor_next.h to rte_distributor.h, as
this is now the public header.

At the same time, we need the autotests and sample app to compile
properly, hence those changes are in this patch also.
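
For reference, a condensed sketch of a worker loop on the renamed burst
API, based on the examples/distributor change below; the quit flag and
the per-packet work are trimmed:

#include <rte_mbuf.h>
#include <rte_distributor.h>

#define BURST_SIZE 8	/* the burst API hands out up to 8 mbufs at a time */

static int
burst_worker(struct rte_distributor *d, unsigned int worker_id)
{
	struct rte_mbuf *buf[BURST_SIZE] = { NULL };
	unsigned int num = 0;
	unsigned int i;

	for (;;) {
		/* hand back the previous burst and receive up to 8 new mbufs */
		num = rte_distributor_get_pkt(d, worker_id, buf, buf, num);

		for (i = 0; i < num; i++) {
			/* ... process buf[i] ... */
		}
	}
	return 0;	/* unreachable in this sketch */
}

Passing the same array as both pkts and oldpkt, as the sample app does,
returns the previous burst and receives the next one in a single call.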

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c                        |  22 +-
 lib/librte_distributor/rte_distributor.c           |  76 +++---
 lib/librte_distributor/rte_distributor.h           | 240 +++++++++++++++++-
 lib/librte_distributor/rte_distributor_match_sse.c |   2 +-
 lib/librte_distributor/rte_distributor_next.h      | 269 ---------------------
 lib/librte_distributor/rte_distributor_private.h   |  22 +-
 lib/librte_distributor/rte_distributor_v20.c       |  46 ++--
 lib/librte_distributor/rte_distributor_v20.h       |  24 +-
 test/test/test_distributor.c                       | 235 ++++++++++++------
 test/test/test_distributor_perf.c                  |  26 +-
 10 files changed, 511 insertions(+), 451 deletions(-)
 delete mode 100644 lib/librte_distributor/rte_distributor_next.h

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 7b8a759..a748985 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -405,17 +405,30 @@ lcore_worker(struct lcore_params *p)
 {
 	struct rte_distributor *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = rte_rdtsc()+100;
+
+			while (rte_rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
 	}
 	return 0;
 }
@@ -561,7 +574,8 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 2,
+			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6158fa6..6e1debf 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -42,10 +42,10 @@
 #include <rte_eal_memconfig.h>
 #include <rte_compat.h>
 #include "rte_distributor_private.h"
-#include "rte_distributor_next.h"
+#include "rte_distributor.h"
 #include "rte_distributor_v20.h"
 
-TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
 static struct rte_tailq_elem rte_dist_burst_tailq = {
 	.name = "RTE_DIST_BURST",
@@ -57,17 +57,17 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_request_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
 	unsigned int i;
 
 	volatile int64_t *retptr64;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		rte_distributor_request_pkt(d->d_v20,
+		rte_distributor_request_pkt_v20(d->d_v20,
 			worker_id, oldpkt[0]);
 		return;
 	}
@@ -104,16 +104,16 @@ rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_poll_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	uint64_t ret;
 	int count = 0;
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		pkts[0] = rte_distributor_poll_pkt_v20(d->d_v20, worker_id);
 		return (pkts[0]) ? 1 : 0;
 	}
 
@@ -140,7 +140,7 @@ rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_get_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -148,37 +148,37 @@ rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (return_count <= 1) {
-			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+			pkts[0] = rte_distributor_get_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 			return (pkts[0]) ? 1 : 0;
 		} else
 			return -EINVAL;
 	}
 
-	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+	rte_distributor_request_pkt(d, worker_id, oldpkt, return_count);
 
-	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	while (count == -1) {
 		uint64_t t = rte_rdtsc() + 100;
 
 		while (rte_rdtsc() < t)
 			rte_pause();
 
-		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+		count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	}
 	return count;
 }
 
 int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_return_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (num == 1)
-			return rte_distributor_return_pkt(d->d_v20,
+			return rte_distributor_return_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 		else
 			return -EINVAL;
@@ -202,7 +202,7 @@ rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
 		unsigned int *ret_start, unsigned int *ret_count)
 {
 	if (!oldbuf)
@@ -221,7 +221,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
  * workers to give us our atomic flow pinning.
  */
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
@@ -270,9 +270,9 @@ find_match_scalar(struct rte_distributor_v1705 *d,
  * the valid returned pointers (store_return).
  */
 static unsigned int
-handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+handle_returns(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	uintptr_t oldbuf;
 	unsigned int ret_start = d->returns.start,
 			ret_count = d->returns.count;
@@ -308,9 +308,9 @@ handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
  * before sending out new packets.
  */
 static unsigned int
-release(struct rte_distributor_v1705 *d, unsigned int wkr)
+release(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	unsigned int i;
 
 	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
@@ -342,7 +342,7 @@ release(struct rte_distributor_v1705 *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_process(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -355,7 +355,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+		return rte_distributor_process_v20(d->d_v20, mbufs, num_mbufs);
 	}
 
 	if (unlikely(num_mbufs == 0)) {
@@ -479,7 +479,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_returned_pkts(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -489,7 +489,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_returned_pkts(d->d_v20,
+		return rte_distributor_returned_pkts_v20(d->d_v20,
 				mbufs, max_mbufs);
 	}
 
@@ -510,7 +510,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
  * being workered on or queued up in a backlog.
  */
 static inline unsigned int
-total_outstanding(const struct rte_distributor_v1705 *d)
+total_outstanding(const struct rte_distributor *d)
 {
 	unsigned int wkr, total_outstanding = 0;
 
@@ -525,24 +525,24 @@ total_outstanding(const struct rte_distributor_v1705 *d)
  * queued up.
  */
 int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_flush(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_flush(d->d_v20);
+		return rte_distributor_flush_v20(d->d_v20);
 	}
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process_v1705(d, NULL, 0);
+		rte_distributor_process(d, NULL, 0);
 
 	/*
 	 * Send empty burst to all workers to allow them to exit
 	 * gracefully, should they need to.
 	 */
-	rte_distributor_process_v1705(d, NULL, 0);
+	rte_distributor_process(d, NULL, 0);
 
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		handle_returns(d, wkr);
@@ -552,13 +552,13 @@ rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_clear_returns(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		rte_distributor_clear_returns(d->d_v20);
+		rte_distributor_clear_returns_v20(d->d_v20);
 	}
 
 	/* throw away returns, so workers can exit */
@@ -567,13 +567,13 @@ rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name,
+struct rte_distributor *
+rte_distributor_create(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
 {
-	struct rte_distributor_v1705 *d;
+	struct rte_distributor *d;
 	struct rte_dist_burst_list *dist_burst_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
@@ -586,8 +586,8 @@ rte_distributor_create_v1705(const char *name,
 	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
 
 	if (alg_type == RTE_DIST_ALG_SINGLE) {
-		d = malloc(sizeof(struct rte_distributor_v1705));
-		d->d_v20 = rte_distributor_create(name,
+		d = malloc(sizeof(struct rte_distributor));
+		d->d_v20 = rte_distributor_create_v20(name,
 				socket_id, num_workers);
 		if (d->d_v20 == NULL) {
 			/* rte_errno will have been set */
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index e41d522..9b9efdb 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,9 +30,240 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
 
-#include <rte_distributor_v20.h>
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and works on up to 8 packets at a time to workers.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the user does not set the tag, its value is undefined and
+ * depends on the driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may
+ * optionally be returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, that the worker has finished processing
+ *   and is now returning to the distributor
+ * @param retcount
+ *   The number of packets being returned in the oldpkt array
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return completed packets without requesting
+ * new ones, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request new packets to process.
+ * Any previous packets given to the worker are assumed to have completed
+ * processing, and may optionally be returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for new
+ * packets to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try to retrieve the new packets.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packets to be available, but returns zero if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
 
 #endif
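
As a rough illustration of how the burst API declared in the header above is intended to be used, the sketch below pairs a distributor lcore with a worker lcore. It is an illustration only, not part of the patch: the "example_" names, EX_BURST, the quit flag and the worker_id derivation are assumptions made for the sketch, and error handling plus EAL/mempool setup are omitted.

/* Usage sketch for the burst API (illustrative only, not part of the patch) */
#include <rte_distributor.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_memory.h>

#define EX_BURST 32			/* illustrative application burst size */

static volatile int quit;		/* assumption: set by the app at shutdown */

static struct rte_distributor *
example_create(void)
{
	/* one distributor lcore, all remaining lcores act as workers */
	return rte_distributor_create("example_dist", rte_socket_id(),
			rte_lcore_count() - 1, RTE_DIST_ALG_BURST);
}

/* distributor lcore: hand bursts to workers, collect finished packets */
static void
example_distribute(struct rte_distributor *d,
		struct rte_mbuf **pkts, unsigned int nb_pkts)
{
	struct rte_mbuf *done[EX_BURST * 2];
	unsigned int nb_done;

	/* flow-to-worker pinning is keyed on each mbuf's tag (hash.usr) */
	rte_distributor_process(d, pkts, nb_pkts);

	/* pick up whatever the workers have handed back so far */
	nb_done = rte_distributor_returned_pkts(d, done, EX_BURST * 2);
	/* ... transmit or free done[0..nb_done-1] here ... */
	/* at shutdown, rte_distributor_flush(d) pushes out anything in flight */
	(void)nb_done;
}

/* worker lcore: each call returns the previous burst via 'bufs' and
 * blocks until up to 8 new packets are handed over in the same array */
static int
example_worker(void *arg)
{
	struct rte_distributor *d = arg;
	struct rte_mbuf *bufs[8] __rte_cache_aligned;
	unsigned int worker_id = rte_lcore_id() - 1;	/* assumption only */
	unsigned int num = 0;

	while (!quit) {
		num = rte_distributor_get_pkt(d, worker_id, bufs, bufs, num);
		/* ... process bufs[0..num-1] ... */
	}
	/* return the final burst without requesting more packets */
	return rte_distributor_return_pkt(d, worker_id, bufs, num);
}

Because the previous burst is handed back through the same rte_distributor_get_pkt() call, a worker needs no separate bookkeeping of in-flight packets; the test code later in this patch follows the same pattern.
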
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
index b9f9bb0..44935a6 100644
--- a/lib/librte_distributor/rte_distributor_match_sse.c
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -38,7 +38,7 @@
 
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
diff --git a/lib/librte_distributor/rte_distributor_next.h b/lib/librte_distributor/rte_distributor_next.h
deleted file mode 100644
index 0034020..0000000
--- a/lib/librte_distributor/rte_distributor_next.h
+++ /dev/null
@@ -1,269 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2017 Intel Corporation. All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _RTE_DISTRIBUTOR_H_
-#define _RTE_DISTRIBUTOR_H_
-
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* Type of distribution (burst/single) */
-enum rte_distributor_alg_type {
-	RTE_DIST_ALG_BURST = 0,
-	RTE_DIST_ALG_SINGLE,
-	RTE_DIST_NUM_ALG_TYPES
-};
-
-struct rte_distributor_v1705;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @param alg_type
- *   Call the legacy API, or use the new burst API. legacy uses 32-bit
- *   flow ID, and works on a single packet at a time. Latest uses 15-
- *   bit flow ID and works on up to 8 packets at a time to worers.
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name, unsigned int socket_id,
-		unsigned int num_workers,
-		unsigned int alg_type);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be processed on different cores at
- * the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get new packets to process. Any previous packets
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param pkts
- *   The mbufs pointer array to be filled in (up to 8 packets)
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- * @param retcount
- *   The number of packets being returned
- *
- * @return
- *   The number of packets in the pkts array
- */
-int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **pkts,
-	struct rte_mbuf **oldpkt, unsigned int retcount);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packets being processed by the worker
- * @param num
- *   The number of packets in the oldpkt array
- */
-int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
- * new packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The returning packets, if any, processed by the worker
- * @param count
- *   The number of returning packets
- */
-void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **oldpkt,
-		unsigned int count);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbufs
- *   The array of mbufs being given to the worker
- *
- * @return
- *   The number of packets being given to the worker thread, zero if no
- *   packet is yet available.
- */
-int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **mbufs);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 92052b1..fb5a43a 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -83,7 +83,7 @@ extern "C" {
  * the next cache line to worker 0, we pad this out to three cache lines.
  * Only 64-bits of the memory is actually used though.
  */
-union rte_distributor_buffer {
+union rte_distributor_buffer_v20 {
 	volatile int64_t bufptr64;
 	char pad[RTE_CACHE_LINE_SIZE*3];
 } __rte_cache_aligned;
@@ -108,8 +108,8 @@ struct rte_distributor_returned_pkts {
 	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
 };
 
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+struct rte_distributor_v20 {
+	TAILQ_ENTRY(rte_distributor_v20) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -124,7 +124,7 @@ struct rte_distributor {
 
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
 
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+	union rte_distributor_buffer_v20 bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 };
@@ -144,7 +144,7 @@ enum rte_distributor_match_function {
  * We can pass up to 8 mbufs at a time in one cacheline.
  * There is a separate cacheline for returns in the burst API.
  */
-struct rte_distributor_buffer_v1705 {
+struct rte_distributor_buffer {
 	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= outgoing to worker */
 
@@ -158,8 +158,8 @@ struct rte_distributor_buffer_v1705 {
 	int count __rte_cache_aligned;       /* <= number of current mbufs */
 };
 
-struct rte_distributor_v1705 {
-	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -176,22 +176,22 @@ struct rte_distributor_v1705 {
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
 			__rte_cache_aligned;
 
-	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 
 	enum rte_distributor_match_function dist_match_fn;
 
-	struct rte_distributor *d_v20;
+	struct rte_distributor_v20 *d_v20;
 };
 
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index be297ec..1f406c5 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -43,7 +43,7 @@
 #include "rte_distributor_v20.h"
 #include "rte_distributor_private.h"
 
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
+TAILQ_HEAD(rte_distributor_list, rte_distributor_v20);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
 	.name = "RTE_DISTRIBUTOR",
@@ -53,10 +53,10 @@ EAL_REGISTER_TAILQ(rte_distributor_tailq)
 /**** APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_GET_BUF;
 	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
@@ -65,10 +65,10 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
 		return NULL;
 
@@ -78,21 +78,21 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
 	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+	rte_distributor_request_pkt_v20(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt_v20(d, worker_id)) == NULL)
 		rte_pause();
 	return ret;
 }
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_RETURN_BUF;
 	buf->bufptr64 = req;
@@ -123,7 +123,7 @@ backlog_pop(struct rte_distributor_backlog *bl)
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
+store_return(uintptr_t oldbuf, struct rte_distributor_v20 *d,
 		unsigned *ret_start, unsigned *ret_count)
 {
 	/* store returns in a circular buffer - code is branch-free */
@@ -134,7 +134,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor *d,
 }
 
 static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+handle_worker_shutdown(struct rte_distributor_v20 *d, unsigned int wkr)
 {
 	d->in_flight_tags[wkr] = 0;
 	d->in_flight_bitmask &= ~(1UL << wkr);
@@ -164,7 +164,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 		 * Note that the tags were set before first level call
 		 * to rte_distributor_process.
 		 */
-		rte_distributor_process(d, pkts, i);
+		rte_distributor_process_v20(d, pkts, i);
 		bl->count = bl->start = 0;
 	}
 }
@@ -174,7 +174,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
  * to do a partial flush.
  */
 static int
-process_returns(struct rte_distributor *d)
+process_returns(struct rte_distributor_v20 *d)
 {
 	unsigned wkr;
 	unsigned flushed = 0;
@@ -213,7 +213,7 @@ process_returns(struct rte_distributor *d)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned num_mbufs)
 {
 	unsigned next_idx = 0;
@@ -317,7 +317,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -338,7 +338,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
 static inline unsigned
-total_outstanding(const struct rte_distributor *d)
+total_outstanding(const struct rte_distributor_v20 *d)
 {
 	unsigned wkr, total_outstanding;
 
@@ -353,19 +353,19 @@ total_outstanding(const struct rte_distributor *d)
 /* flush the distributor, so that there are no outstanding packets in flight or
  * queued up. */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 {
 	const unsigned flushed = total_outstanding(d);
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
+		rte_distributor_process_v20(d, NULL, 0);
 
 	return flushed;
 }
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 {
 	d->returns.start = d->returns.count = 0;
 #ifndef __OPTIMIZE__
@@ -374,12 +374,12 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name,
 		unsigned socket_id,
 		unsigned num_workers)
 {
-	struct rte_distributor *d;
+	struct rte_distributor_v20 *d;
 	struct rte_distributor_list *distributor_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
index b69aa27..f02e6aa 100644
--- a/lib/librte_distributor/rte_distributor_v20.h
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -48,7 +48,7 @@ extern "C" {
 
 #define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
 
-struct rte_distributor;
+struct rte_distributor_v20;
 struct rte_mbuf;
 
 /**
@@ -67,8 +67,8 @@ struct rte_mbuf;
  * @return
  *   The newly created distributor instance
  */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned int socket_id,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name, unsigned int socket_id,
 		unsigned int num_workers);
 
 /*  *** APIS to be called on the distributor lcore ***  */
@@ -103,7 +103,7 @@ rte_distributor_create(const char *name, unsigned int socket_id,
  *   The number of mbufs processed.
  */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs);
 
 /**
@@ -121,7 +121,7 @@ rte_distributor_process(struct rte_distributor *d,
  *   The number of mbufs returned in the mbufs array.
  */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs);
 
 /**
@@ -136,7 +136,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
  *   The number of queued/in-flight packets that were completed by this call.
  */
 int
-rte_distributor_flush(struct rte_distributor *d);
+rte_distributor_flush_v20(struct rte_distributor_v20 *d);
 
 /**
  * Clears the array of returned packets used as the source for the
@@ -148,7 +148,7 @@ rte_distributor_flush(struct rte_distributor *d);
  *   The distributor instance to be used
  */
 void
-rte_distributor_clear_returns(struct rte_distributor *d);
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d);
 
 /*  *** APIS to be called on the worker lcores ***  */
 /*
@@ -177,7 +177,7 @@ rte_distributor_clear_returns(struct rte_distributor *d);
  *   A new packet to be processed by the worker thread.
  */
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -193,8 +193,8 @@ rte_distributor_get_pkt(struct rte_distributor *d,
  *   The previous packet being processed by the worker
  */
 int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
-		struct rte_mbuf *mbuf);
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
+		unsigned int worker_id, struct rte_mbuf *mbuf);
 
 /**
  * API called by a worker to request a new packet to process.
@@ -217,7 +217,7 @@ rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
  *   The previous packet, if any, being processed by the worker
  */
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -237,7 +237,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
  *   packet is yet available.
  */
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id);
 
 #ifdef __cplusplus
diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 6059a0c..7a30513 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -87,20 +87,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
-	struct rte_distributor *d = wp->dist;
-
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	struct rte_distributor *db = wp->dist;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(db, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(db, id,
+				buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(db, id, buf, num);
 	return 0;
 }
 
@@ -118,9 +123,11 @@ handle_work(void *arg)
 static int
 sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
-	struct rte_distributor *d = wp->dist;
+	struct rte_distributor *db = wp->dist;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i, count;
+	unsigned int retries;
 
 	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
@@ -134,8 +141,15 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -147,8 +161,6 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		printf("Worker %u handled %u packets\n", i,
 				worker_stats[i].handled_packets);
 	printf("Sanity test with all zero hashes done.\n");
-	if (worker_stats[0].handled_packets != BURST)
-		return -1;
 
 	/* pick two flows and check they go correctly */
 	if (rte_lcore_count() >= 3) {
@@ -156,8 +168,13 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		rte_distributor_process(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush(db);
+			count += rte_distributor_returned_pkts(db,
+					returns, BURST*2);
+		} while (count < BURST);
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -169,20 +186,21 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
-
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+		bufs[i]->hash.usr = i+1;
+
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -204,8 +222,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	rte_distributor_flush(db);
+	rte_distributor_clear_returns(db);
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -213,28 +232,44 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
+	printf("=== testing big burst (%s) ===\n", wp->name);
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
-		num_returned += rte_distributor_returned_pkts(d,
+		rte_distributor_process(db,
+				&many_bufs[i*BURST], BURST);
+		count = rte_distributor_returned_pkts(db,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	rte_distributor_flush(db);
+	count = rte_distributor_returned_pkts(db,
+		&return_bufs[num_returned],
+			BIG_BATCH - num_returned);
+	num_returned += count;
+	retries = 0;
+	do {
+		rte_distributor_flush(db);
+		count = rte_distributor_returned_pkts(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, got %d of %d\n",
+				__LINE__, num_returned, BIG_BATCH);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -258,20 +293,28 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 static int
 handle_work_with_free_mbufs(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -287,7 +330,8 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	printf("=== Sanity test with mbuf alloc/free (%s) ===\n", wp->name);
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
@@ -302,6 +346,9 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	}
 
 	rte_distributor_flush(d);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -317,21 +364,32 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+		total += num;
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	returned = rte_distributor_return_pkt(d, id, buf, num);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -339,13 +397,18 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
+		returned = rte_distributor_return_pkt(d,
+				id, buf, num);
+		printf("Num returned = %d\n", returned);
 	}
 	return 0;
 }
@@ -367,17 +430,22 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_flush(d);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -388,7 +456,7 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
@@ -396,6 +464,12 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 
 	/* flush the distributor */
 	rte_distributor_flush(d);
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -403,10 +477,6 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -422,7 +492,7 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	printf("=== Test flush fn with worker shutdown (%s) ===\n", wp->name);
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -446,7 +516,13 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	/* flush the distributor */
 	rte_distributor_flush(d);
 
+	rte_delay_us(10000);
+
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -454,10 +530,6 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -469,7 +541,9 @@ int test_error_distributor_create_name(void)
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
-			rte_lcore_count() - 1);
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
@@ -483,8 +557,10 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
-			RTE_MAX_LCORE + 10);
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
@@ -530,10 +606,11 @@ test_distributor(void)
 	}
 
 	if (d == NULL) {
-		d = rte_distributor_create("Test_distributor", rte_socket_id(),
-				rte_lcore_count() - 1);
+		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
 		if (d == NULL) {
-			printf("Error creating distributor\n");
+			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
@@ -553,7 +630,7 @@ test_distributor(void)
 	}
 
 	worker_params.dist = d;
-	sprintf(worker_params.name, "single");
+	sprintf(worker_params.name, "burst");
 
 	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
 	if (sanity_test(&worker_params, p) < 0)
diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 7947fe9..1dd326b 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -129,18 +129,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
 	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -228,7 +235,8 @@ test_distributor_perf(void)
 
 	if (d == NULL) {
 		d = rte_distributor_create("Test_perf", rte_socket_id(),
-				rte_lcore_count() - 1);
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_SINGLE);
 		if (d == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 07/18] lib: make v20 header file private
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (5 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 06/18] lib: switch distributor over to new API David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 08/18] lib: add symbol versioning to distributor David Hunt
                                                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index a812fe4..2b28eff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -57,7 +57,6 @@ endif
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 08/18] lib: add symbol versioning to distributor
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (6 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 07/18] lib: make v20 header file private David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 09/18] test: test single and burst distributor API David Hunt
                                                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..06df13d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4
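
The symbol versioning pattern used here, in brief: the VERSION_SYMBOL()
macros above tag each _v20 definition as the implementation behind the
DPDK_2.0 node of the map file, while the names listed under the new
DPDK_17.05 node resolve to the new implementations. A minimal sketch of
the rte_compat.h pattern (the BIND_DEFAULT_SYMBOL() line is an assumption
here; it accompanies the _v1705 definitions, which are not shown in the
hunks above):

    /* rte_distributor_v20.c: placed after the old definition, keeps it
     * reachable for binaries linked against DPDK 2.0 */
    VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);

    /* new rte_distributor.c (assumed): placed after the new definition,
     * makes the _v1705 implementation the default for new binaries */
    BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);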

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 09/18] test: test single and burst distributor API
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (7 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 08/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 10/18] test: add perf test for distributor burst mode David Hunt
                                                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_distributor.c | 116 ++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 34 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 7a30513..890a852 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -538,17 +538,25 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
 			rte_lcore_count() - 1,
-			RTE_DIST_ALG_BURST);
-
+			RTE_DIST_ALG_SINGLE);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
 	}
 
+	db = rte_distributor_create(name, rte_socket_id(),
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -556,15 +564,25 @@ int test_error_distributor_create_name(void)
 static
 int test_error_distributor_create_numworkers(void)
 {
-	struct rte_distributor *d = NULL;
+	struct rte_distributor *ds = NULL;
+	struct rte_distributor *db = NULL;
 
-	d = rte_distributor_create("test_numworkers", rte_socket_id(),
+	ds = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10,
-			RTE_DIST_ALG_BURST);
-	if (d != NULL || rte_errno != EINVAL) {
+			RTE_DIST_ALG_SINGLE);
+	if (ds != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -597,25 +615,42 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
 static int
 test_distributor(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
+	static struct rte_distributor *dist[2];
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
 		return -1;
 	}
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+	if (db == NULL) {
+		db = rte_distributor_create("Test_dist_burst", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_BURST);
-		if (d == NULL) {
+		if (db == NULL) {
 			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_flush(db);
+		rte_distributor_clear_returns(db);
+	}
+
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_dist_single",
+				rte_socket_id(),
+				rte_lcore_count() - 1,
+			RTE_DIST_ALG_SINGLE);
+		if (ds == NULL) {
+			printf("Error creating single distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush(ds);
+		rte_distributor_clear_returns(ds);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -629,37 +664,50 @@ test_distributor(void)
 		}
 	}
 
-	worker_params.dist = d;
-	sprintf(worker_params.name, "burst");
+	dist[0] = ds;
+	dist[1] = db;
 
-	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
-	if (sanity_test(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+	for (i = 0; i < 2; i++) {
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
-				SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+		worker_params.dist = dist[i];
+		if (i)
+			sprintf(worker_params.name, "burst");
+		else
+			sprintf(worker_params.name, "single");
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
+
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 10/18] test: add perf test for distributor burst mode
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (8 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 09/18] test: test single and burst distributor API David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 11/18] examples/distributor: allow for extra stats David Hunt
                                                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_distributor_perf.c | 75 ++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 1dd326b..732d86d 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,8 +41,9 @@
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +55,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +71,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +89,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +101,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +127,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * This is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -151,14 +156,15 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * This basic performance test just repeatedly sends in 32 packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
 static inline int
 perf_test(struct rte_distributor *d, struct rte_mempool *p)
 {
-	unsigned i;
+	unsigned int i;
 	uint64_t start, end;
 	struct rte_mbuf *bufs[BURST];
 
@@ -181,7 +187,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
-	printf("=== Performance test of distributor ===\n");
+	rte_distributor_clear_returns(d);
+
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
 			((end - start) >> ITER_POWER)/BURST);
@@ -201,9 +208,10 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 {
-	const unsigned num_workers = rte_lcore_count() - 1;
-	unsigned i;
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
 	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
 
 	quit = 1;
@@ -222,7 +230,8 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 static int
 test_distributor_perf(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -233,17 +242,28 @@ test_distributor_perf(void)
 	/* first time how long it takes to round-trip a cache line */
 	time_cache_line_switch();
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_perf", rte_socket_id(),
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_perf", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_SINGLE);
-		if (d == NULL) {
+		if (ds == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_clear_returns(ds);
+	}
+
+	if (db == NULL) {
+		db = rte_distributor_create("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns(db);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -257,10 +277,17 @@ test_distributor_perf(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (perf_test(d, p) < 0)
+	printf("=== Performance test of distributor (single mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, ds, SKIP_MASTER);
+	if (perf_test(ds, p) < 0)
+		return -1;
+	quit_workers(ds, p);
+
+	printf("=== Performance test of distributor (burst mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, db, SKIP_MASTER);
+	if (perf_test(db, p) < 0)
 		return -1;
-	quit_workers(d, p);
+	quit_workers(db, p);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 11/18] examples/distributor: allow for extra stats
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (9 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 10/18] test: add perf test for distributor burst mode David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 12/18] examples/distributor: wait for ports to come up David Hunt
                                                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This will allow us to see what's going on at various stages
throughout the sample app, with per-second visibility.
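
The display is driven by a simple TSC-based timer on the main loop; a
condensed sketch of the loop added to main() in the hunk below:

    uint64_t freq = rte_get_timer_hz();   /* timer ticks per second */
    uint64_t t = rte_rdtsc() + freq;      /* next print deadline */

    while (!quit_signal_dist) {
            if (t < rte_rdtsc()) {
                    print_stats();        /* per-second deltas, per stage */
                    t = rte_rdtsc() + freq;
            }
            usleep(1000);                 /* don't busy-spin the core */
    }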

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 140 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 124 insertions(+), 16 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index a748985..a8a5e80 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -54,24 +54,53 @@
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
@@ -93,6 +122,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -378,25 +409,91 @@ static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++) {
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+			app_stats.worker_bursts[i][j] = 0;
+		}
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
@@ -515,6 +612,7 @@ main(int argc, char *argv[])
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -610,6 +708,16 @@ main(int argc, char *argv[])
 	if (lcore_rx(&p) != 0)
 		return -1;
 
+	freq = rte_get_timer_hz();
+	t = rte_rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < rte_rdtsc()) {
+			print_stats();
+			t = rte_rdtsc() + freq;
+		}
+		usleep(1000);
+	}
+
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 12/18] examples/distributor: wait for ports to come up
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (10 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 11/18] examples/distributor: allow for extra stats David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 13/18] examples/distributor: add dedicated core for dist David Hunt
                                                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

On some machines, ports take several seconds to come up. This
patch causes the app to wait.
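
For reference, the wait in port_init() boils down to polling the link
status until it reports up (it condenses the hunk below):

    struct rte_eth_link link;

    rte_eth_link_get_nowait(port, &link);
    while (!link.link_status) {
            printf("Waiting for Link up on port %"PRIu8"\n", port);
            sleep(1);     /* some NICs take several seconds to train */
            rte_eth_link_get_nowait(port, &link);
    }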

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index a8a5e80..75c001d 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -62,6 +61,7 @@ static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
 volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
@@ -165,7 +165,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 13/18] examples/distributor: add dedicated core for dist
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (11 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 12/18] examples/distributor: wait for ports to come up David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 14/18] examples/distributor: tweaks for performance David Hunt
                                                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Give the distribution functionality its own core for performance,
otherwise it's limited by the Rx core.
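
The new distributor core sits between two rings: it pulls bursts from the
rx core, runs them through the distributor, gathers completed packets and
pushes them towards tx. A condensed sketch of the lcore_distributor()
added below (stats and the drop/free handling on a full tx ring trimmed):

    struct rte_mbuf *bufs[BURST_SIZE * 4];

    while (!quit_signal_dist) {
            /* pull a burst handed over by the rx core */
            const uint16_t nb_rx = rte_ring_dequeue_burst(p->rx_dist_ring,
                            (void *)bufs, BURST_SIZE);
            if (nb_rx == 0)
                    continue;

            /* flow-pinned scheduling of the burst across the workers */
            rte_distributor_process(p->d, bufs, nb_rx);

            /* collect whatever the workers have finished with */
            const uint16_t nb_ret = rte_distributor_returned_pkts(p->d,
                            bufs, BURST_SIZE * 2);

            /* hand the completed packets on to the tx core */
            if (nb_ret)
                    rte_ring_enqueue_burst(p->dist_tx_ring,
                                    (void *)bufs, nb_ret);
    }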

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 181 ++++++++++++++++++++++++++++++--------------
 1 file changed, 123 insertions(+), 58 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 75c001d..96d6454 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -49,6 +49,8 @@
 #define NUM_MBUFS ((64*1024)-1)
 #define MBUF_CACHE_SIZE 250
 #define BURST_SIZE 32
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
 #define RTE_RING_SZ 1024
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
@@ -193,37 +195,14 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 struct lcore_params {
 	unsigned worker_id;
 	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
 static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
-{
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
-
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
-	}
-
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
-
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
-
-	return 0;
-}
-
-static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
@@ -260,9 +239,15 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
@@ -270,7 +255,22 @@ lcore_rx(struct lcore_params *p)
 			continue;
 		}
 
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		 * Swap the following two lines if you want the rx traffic
+		 * to go directly to tx, no distribution.
+		 */
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
+
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
 			RTE_LOG_DP(DEBUG, DISTRAPP,
@@ -281,20 +281,9 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
@@ -331,6 +320,58 @@ flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 	}
 }
 
+
+
+static int
+lcore_distributor(struct lcore_params *p)
+{
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
+	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+	rte_distributor_flush(d);
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns(d);
+	quit_signal_rx = 1;
+	return 0;
+}
+
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -403,7 +444,7 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
@@ -517,7 +558,7 @@ lcore_worker(struct lcore_params *p)
 		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
+	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		/* Do a little bit of work for each packet */
 		for (i = 0; i < num; i++) {
@@ -608,7 +649,8 @@ main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
 	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
@@ -630,10 +672,11 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 4)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 4 logical cores to run:\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -673,30 +716,52 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2,
+			rte_lcore_count() - 3,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by the scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+				rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+				(lcore_function_t *)lcore_distributor,
+				p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else {
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
@@ -704,7 +769,7 @@ main(int argc, char *argv[])
 		worker_id++;
 	}
 	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
+	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
 
 	if (lcore_rx(&p) != 0)
 		return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 14/18] examples/distributor: tweaks for performance
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (12 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 13/18] examples/distributor: add dedicated core for dist David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 15/18] examples/distributor: give Rx thread a core David Hunt
                                                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Approximately 10% performance increase due to these changes.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 96d6454..53c7b38 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -44,14 +44,14 @@
 #include <rte_prefetch.h>
 #include <rte_distributor.h>
 
-#define RX_RING_SIZE 256
+#define RX_RING_SIZE 512
 #define TX_RING_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define MBUF_CACHE_SIZE 128
+#define BURST_SIZE 64
 #define SCHED_RX_RING_SZ 8192
 #define SCHED_TX_RING_SZ 65536
-#define RTE_RING_SZ 1024
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
@@ -206,6 +206,7 @@ lcore_rx(struct lcore_params *p)
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -229,7 +230,6 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -273,6 +273,7 @@ lcore_rx(struct lcore_params *p)
 
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -290,13 +291,12 @@ lcore_rx(struct lcore_params *p)
 static inline void
 flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
 	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
 		do {
 			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
 		} while (++nb_tx < outbuf->count);
@@ -308,6 +308,7 @@ static inline void
 flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 {
 	uint8_t outp;
+
 	for (outp = 0; outp < nb_ports; outp++) {
 		/* skip ports that are not enabled */
 		if ((enabled_port_mask & (1 << outp)) == 0)
@@ -400,9 +401,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -431,11 +432,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -557,6 +559,8 @@ lcore_worker(struct lcore_params *p)
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
 
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
@@ -568,6 +572,10 @@ lcore_worker(struct lcore_params *p)
 				rte_pause();
 			buf[i]->port ^= xor_val;
 		}
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
 	return 0;
 }
@@ -756,6 +764,8 @@ main(int argc, char *argv[])
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
 					dist_tx_ring, lcore_id);
 		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 15/18] examples/distributor: give Rx thread a core
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (13 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 14/18] examples/distributor: tweaks for performance David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 16/18] doc: distributor library changes for new burst API David Hunt
                                                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now that we're printing out a page of stats every second to the console,
we should give the stats its own core so that we don't interfere with
the performance of the Rx core.
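
With this change the master lcore keeps only the stats loop, and the
remaining lcores are handed out by the launch loop in main(). Condensed,
with the hypothetical launch_*() helpers standing in for the
rte_eal_remote_launch() calls:

    unsigned int worker_id = 0, lcore_id;

    RTE_LCORE_FOREACH_SLAVE(lcore_id) {
            if (worker_id == rte_lcore_count() - 3)
                    launch_distributor(lcore_id);   /* distributor core */
            else if (worker_id == rte_lcore_count() - 4)
                    launch_tx(lcore_id);            /* tx core */
            else if (worker_id == rte_lcore_count() - 2)
                    launch_rx(lcore_id);            /* rx core (new here) */
            else
                    launch_worker(lcore_id);        /* worker cores */
            worker_id++;
    }
    /* master lcore: periodic print_stats() until quit_signal_dist */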

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 53c7b38..6aa8755 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -680,9 +680,10 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 4)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 4 logical cores to run:\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
 				"1 lcore for packet RX\n"
 				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
@@ -724,7 +725,7 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 3,
+			rte_lcore_count() - 4,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
@@ -763,6 +764,18 @@ main(int argc, char *argv[])
 			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
 					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
 		} else {
 			printf("Starting worker on worker_id %d, lcore_id %d\n",
 					worker_id, lcore_id);
@@ -778,11 +791,6 @@ main(int argc, char *argv[])
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
-
-	if (lcore_rx(&p) != 0)
-		return -1;
 
 	freq = rte_get_timer_hz();
 	t = rte_rdtsc() + freq;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 16/18] doc: distributor library changes for new burst API
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (14 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 15/18] examples/distributor: give Rx thread a core David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 17/18] doc: distributor app " David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 18/18] maintainers: add to distributor lib maintainers David Hunt
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..e0adcaa 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,9 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two modes of operation of the API in the distributor library: one which sends one packet at a time
+to workers using 32 bits for flow_id, and an optimised mode which sends bursts of up to 8 packets at a time
+to workers, using 15 bits of flow_id. The mode is selected by the type field in the ``rte_distributor_create()`` function.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4
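
A minimal sketch of selecting between the two modes described in the text
above (num_workers is a placeholder for the application's worker count):

    /* legacy mode: one packet per exchange, 32-bit flow ids */
    struct rte_distributor *ds = rte_distributor_create("dist_single",
                    rte_socket_id(), num_workers, RTE_DIST_ALG_SINGLE);

    /* burst mode: up to 8 packets per exchange, 15-bit flow ids */
    struct rte_distributor *db = rte_distributor_create("dist_burst",
                    rte_socket_id(), num_workers, RTE_DIST_ALG_BURST);

Both handles are then driven through the same rte_distributor_process()
and rte_distributor_get_pkt() calls; only the internal algorithm differs.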

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 17/18] doc: distributor app changes for new burst API
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (15 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 16/18] doc: distributor library changes for new burst API David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  2017-03-15  6:19                                 ` [PATCH v10 18/18] maintainers: add to distributor lib maintainers David Hunt
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

The changes in the thread layout are described, along with an updated diagram.
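
A condensed sketch of the worker side of that layout, based on
lcore_worker() in the example app (xor_val is the port-flip value the
example computes from the port count):

    struct rte_mbuf *buf[8] = { NULL };
    unsigned int i, num = 0;
    const unsigned int id = p->worker_id;

    while (!quit_signal_work) {
            /* return the previous burst, receive a new one (up to 8 mbufs) */
            num = rte_distributor_get_pkt(p->d, id, buf, buf, num);

            /* "work": flip the port so the tx thread knows where to send */
            for (i = 0; i < num; i++)
                    buf[i]->port ^= xor_val;
    }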

Signed-off-by: David Hunt <david.hunt@intel.com>
---
 doc/guides/sample_app_ug/dist_app.rst     |  49 +++---
 doc/guides/sample_app_ug/img/dist_app.svg | 276 +++++++++++++++++-------------
 2 files changed, 179 insertions(+), 146 deletions(-)

diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst
index ec07b84..b073ce8 100644
--- a/doc/guides/sample_app_ug/dist_app.rst
+++ b/doc/guides/sample_app_ug/dist_app.rst
@@ -104,33 +104,34 @@ Running the Application
 Explanation
 -----------
 
-The distributor application consists of three types of threads: a receive
-thread (lcore_rx()), a set of worker threads(lcore_worker())
-and a transmit thread(lcore_tx()). How these threads work together is shown
-in :numref:`figure_dist_app` below. The main() function launches  threads of these three types.
-Each thread has a while loop which will be doing processing and which is
-terminated only upon SIGINT or ctrl+C. The receive and transmit threads
-communicate using a software ring (rte_ring structure).
-
-The receive thread receives the packets using rte_eth_rx_burst() and gives
-them to  the distributor (using rte_distributor_process() API) which will
-be called in context of the receive thread itself. The distributor distributes
-the packets to workers threads based on the tagging of the packet -
-indicated by the hash field in the mbuf. For IP traffic, this field is
-automatically filled by the NIC with the "usr" hash value for the packet,
-which works as a per-flow tag.
+The distributor application consists of four types of threads: a receive thread
+(lcore_rx()), a distributor thread (lcore_dist()), a set of worker threads
+(lcore_worker()), and a transmit thread (lcore_tx()). How these threads work
+together is shown in :numref:`figure_dist_app` below. The main() function
+launches threads of these four types.  Each thread has a while loop which will
+be doing processing and which is terminated only upon SIGINT or ctrl+C.
+
+The receive thread receives the packets using rte_eth_rx_burst() and will
+enqueue them to an rte_ring. The distributor thread will dequeue the packets
+from the ring and assign them to workers (using rte_distributor_process() API).
+This assignment is based on the tag (or flow ID) of the packet - indicated by
+the hash field in the mbuf. For IP traffic, this field is automatically filled
+by the NIC with the "usr" hash value for the packet, which works as a per-flow
+tag.  The distributor thread communicates with the worker threads using a
+cache-line swapping mechanism, passing up to 8 mbuf pointers at a time
+(one cache line) to each worker.
 
 More than one worker thread can exist as part of the application, and these
 worker threads do simple packet processing by requesting packets from
 the distributor, doing a simple XOR operation on the input port mbuf field
 (to indicate the output port which will be used later for packet transmission)
-and then finally returning the packets back to the distributor in the RX thread.
+and then finally returning the packets back to the distributor thread.
 
-Meanwhile, the receive thread will call the distributor api
-rte_distributor_returned_pkts() to get the packets processed, and will enqueue
-them to a ring for transfer to the TX thread for transmission on the output port.
-The transmit thread will dequeue the packets from the ring and transmit them on
-the output port specified in packet mbuf.
+The distributor thread will then call the distributor api
+rte_distributor_returned_pkts() to get the processed packets, and will enqueue
+them to another rte_ring for transfer to the TX thread for transmission on the
+output port. The transmit thread will dequeue the packets from the ring and
+transmit them on the output port specified in packet mbuf.
 
 Users who wish to terminate the running of the application have to press ctrl+C
 (or send SIGINT to the app). Upon this signal, a signal handler provided
@@ -153,8 +154,10 @@ the line "#define DEBUG" defined in start of the application in main.c to enable
 Statistics
 ----------
 
-Upon SIGINT (or) ctrl+C, the print_stats() function displays the count of packets
-processed at the different stages in the application.
+The main function will print statistics on the console every second. These
+statistics include the number of packets enqueued and dequeued at each stage
+in the application, and also key statistics per worker, including how many
+packets of each burst size (1-8) were sent to each worker thread.
 
 Application Initialization
 --------------------------
diff --git a/doc/guides/sample_app_ug/img/dist_app.svg b/doc/guides/sample_app_ug/img/dist_app.svg
index 4714c7d..944f437 100644
--- a/doc/guides/sample_app_ug/img/dist_app.svg
+++ b/doc/guides/sample_app_ug/img/dist_app.svg
@@ -1,8 +1,7 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-
 <!--
 # BSD LICENSE
-# Copyright (c) <2014>, Intel Corporation
+# Copyright (c) <2014-2017>, Intel Corporation
 # All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
@@ -47,8 +46,8 @@
    height="379.53668"
    id="svg4090"
    version="1.1"
-   inkscape:version="0.48.5 r10040"
-   sodipodi:docname="New document 2">
+   inkscape:version="0.92.1 r15371"
+   sodipodi:docname="dist_app.svg">
   <defs
      id="defs4092">
     <marker
@@ -200,8 +199,8 @@
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
      inkscape:zoom="1"
-     inkscape:cx="339.92174"
-     inkscape:cy="120.32038"
+     inkscape:cx="401.32873"
+     inkscape:cy="130.13572"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
@@ -210,8 +209,8 @@
      fit-margin-right="0"
      fit-margin-bottom="0"
      inkscape:window-width="1920"
-     inkscape:window-height="1017"
-     inkscape:window-x="-8"
+     inkscape:window-height="1137"
+     inkscape:window-x="1912"
      inkscape:window-y="-8"
      inkscape:window-maximized="1" />
   <metadata
@@ -222,7 +221,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -232,40 +231,33 @@
      id="layer1"
      transform="translate(-35.078263,-28.308125)">
     <rect
-       style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
+       style="fill:none;stroke:#000000;stroke-width:0.81890059;stroke-opacity:0.98412697"
        id="rect10443"
-       width="152.9641"
-       height="266.92566"
-       x="122.95611"
-       y="34.642567" />
-    <rect
-       style="fill:none;stroke:#000000;stroke-width:1;stroke-opacity:0.98412697"
-       id="rect10445"
-       width="124.71397"
-       height="46.675529"
-       x="435.7746"
-       y="28.808125" />
+       width="152.96732"
+       height="178.99617"
+       x="124.50176"
+       y="128.95552" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-2"
        width="124.71397"
        height="46.675529"
-       x="435.42999"
-       y="103.92654" />
+       x="437.00507"
+       y="133.06113" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-0"
        width="124.71397"
        height="46.675529"
        x="436.80811"
-       y="178.31572" />
+       y="193.87207" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-9"
        width="124.71397"
        height="46.675529"
        x="436.80811"
-       y="246.87038" />
+       y="256.06277" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-7"
@@ -274,203 +266,241 @@
        x="135.7057"
        y="360.66928" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="M 277.293,44.129101 433.02373,43.388655"
-       id="path10486"
-       inkscape:connector-type="polyline"
-       inkscape:connector-curvature="3" />
-    <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="m 277.83855,110.78109 155.73073,-0.74044"
+       style="fill:none;stroke:#000000;stroke-width:0.99566948;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart)"
+       d="M 278.89497,147.51907 436.5713,146.78234"
        id="path10486-2"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="m 278.48623,189.32721 155.73073,-0.74042"
+       style="fill:none;stroke:#000000;stroke-width:0.99290925;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart)"
+       d="m 279.37092,206.8834 156.80331,-0.73671"
        id="path10486-1"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="m 278.48623,255.19448 155.73073,-0.74043"
+       style="fill:none;stroke:#000000;stroke-width:0.99379504;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart)"
+       d="m 279.19738,270.88669 157.15478,-0.73638"
        id="path10486-4"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="M 277.11852,66.041829 432.84924,65.301384"
-       id="path10486-0"
-       inkscape:connector-type="polyline"
-       inkscape:connector-curvature="3" />
-    <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="M 277.46746,136.71727 433.1982,135.97682"
+       style="fill:none;stroke:#000000;stroke-width:0.99820405;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="m 277.17846,166.20347 158.11878,-0.73842"
        id="path10486-0-4"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="m 276.77843,210.37709 155.73073,-0.74044"
+       style="fill:none;stroke:#000000;stroke-width:0.99410033;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="m 277.47049,225.92925 157.32298,-0.73606"
        id="path10486-0-7"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="M 277.46746,282.5783 433.1982,281.83785"
+       style="fill:none;stroke:#000000;stroke-width:0.99566948;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="M 277.70474,289.26714 435.38107,288.5304"
        id="path10486-0-77"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="348.03241"
-       y="34.792767"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="345.02322"
+       y="134.82103"
        id="text11995"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997"
-         x="348.03241"
-         y="34.792767">Request packet</tspan></text>
+         x="345.02322"
+         y="134.82103"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Request burst</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="349.51935"
-       y="74.044792"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="346.38663"
+       y="164.76628"
        id="text11995-7"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3"
-         x="349.51935"
-         y="74.044792">Mbuf pointer</tspan></text>
+         x="346.38663"
+         y="164.76628"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Mbuf Pointers</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="504.26611"
-       y="52.165989"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="502.36844"
+       y="151.66222"
        id="text11995-7-3"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5"
-         x="504.26611"
-         y="52.165989">WorkerThread1</tspan></text>
+         x="502.36844"
+         y="151.66222"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">WorkerThread1</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="501.65793"
-       y="121.54361"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="499.40103"
+       y="207.94502"
        id="text11995-7-3-9"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-9"
-         x="501.65793"
-         y="121.54361">WorkerThread2</tspan></text>
+         x="499.40103"
+         y="207.94502"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">WorkerThread2</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="499.45868"
-       y="191.46367"
-       id="text11995-7-3-8"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
-         sodipodi:role="line"
-         id="tspan11997-3-5-1"
-         x="499.45868"
-         y="191.46367">WorkerThread3</tspan></text>
-    <text
-       xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="500.1918"
-       y="257.9563"
+       y="266.59644"
        id="text11995-7-3-82"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.9399234,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-6"
          x="500.1918"
-         y="257.9563">WorkerThreadN</tspan></text>
+         y="266.59644"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">WorkerThreadN</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="193.79703"
        y="362.85193"
        id="text11995-7-3-6"
-       sodipodi:linespacing="125%"
        transform="scale(0.93992342,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-0"
          x="193.79703"
-         y="362.85193">TX thread</tspan></text>
+         y="362.85193"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">TX thread</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="162.2476"
-       y="142.79382"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="175.78905"
+       y="207.26257"
        id="text11995-7-3-3"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.9399234,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-8"
-         x="162.2476"
-         y="142.79382">RX thread &amp; Distributor</tspan></text>
+         x="175.78905"
+         y="207.26257"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Distributor Thread</tspan></text>
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="m 35.457991,109.77995 85.546359,-0.79004"
+       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="m 49.600127,54.625621 85.546363,-0.79004"
        id="path10486-0-4-5"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
        d="m 135.70569,384.00706 -85.546361,0.79003"
        id="path10486-0-4-5-7"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="58.296661"
-       y="96.037407"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="73.342712"
+       y="44.196564"
        id="text11995-7-8"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.9399234,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-3"
-         x="58.296661"
-         y="96.037407">Mbufs In</tspan></text>
+         x="73.342712"
+         y="44.196564"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Mbufs In</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="83.4814"
        y="352.62543"
        id="text11995-7-8-5"
-       sodipodi:linespacing="125%"
        transform="scale(0.93992342,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-3-1"
          x="83.4814"
-         y="352.62543">Mbufs Out</tspan></text>
+         y="352.62543"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Mbufs Out</tspan></text>
     <path
-       style="fill:none;stroke:#000000;stroke-width:1.05720723;stroke-miterlimit:3;stroke-opacity:0.98412697;stroke-dasharray:none"
-       d="m 171.68192,303.16236 0.21464,30.4719 -8.6322,0.40574 -11.33877,0.1956 25.75778,14.79103 23.25799,11.11792 18.87014,-7.32926 31.83305,-17.26495 -10.75831,-0.32986 -10.37586,-0.44324 -0.22443,-31.54093 z"
+       style="fill:none;stroke:#000000;stroke-width:1.01068497;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:0.98412697"
+       d="m 171.68192,308.06701 0.21464,27.84908 -8.6322,0.37082 -11.33877,0.17876 25.75778,13.51792 23.25799,10.16096 18.87014,-6.69841 31.83305,-15.77889 -10.75831,-0.30147 -10.37586,-0.40509 -0.22443,-28.8261 z"
        id="path12188"
        inkscape:connector-curvature="0"
-       inkscape:transform-center-y="7.6863474"
+       inkscape:transform-center-y="7.0247597"
        sodipodi:nodetypes="cccccccccccc" />
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="193.68871"
        y="309.26349"
        id="text11995-7-3-6-2"
-       sodipodi:linespacing="125%"
        transform="scale(0.93992342,1.0639165)"><tspan
          sodipodi:role="line"
          x="193.68871"
          y="309.26349"
-         id="tspan12214">SW Ring</tspan></text>
+         id="tspan12214"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">SW Ring</tspan></text>
+    <path
+       style="fill:none;stroke:#000000;stroke-width:1.02106845;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:0.98412697"
+       d="m 173.27214,75.568236 0.21464,28.424254 -8.6322,0.37848 -11.33877,0.18245 25.75778,13.79709 23.25799,10.37083 18.87013,-6.83675 31.83305,-16.10478 -10.75831,-0.30769 -10.37586,-0.41345 -0.22443,-29.421453 z"
+       id="path12188-5"
+       inkscape:connector-curvature="0"
+       inkscape:transform-center-y="7.1698404"
+       sodipodi:nodetypes="cccccccccccc" />
+    <rect
+       style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
+       id="rect10445-7-7"
+       width="124.71397"
+       height="46.675529"
+       x="138.18427"
+       y="28.832333" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="190.80019"
+       y="51.17778"
+       id="text11995-7-3-6-6"
+       transform="scale(0.93992339,1.0639165)"><tspan
+         sodipodi:role="line"
+         id="tspan11997-3-5-0-4"
+         x="190.80019"
+         y="51.17778"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">RX thread</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="196.38097"
+       y="90.224785"
+       id="text11995-7-3-6-2-9"
+       transform="scale(0.93992339,1.0639165)"><tspan
+         sodipodi:role="line"
+         x="196.38097"
+         y="90.224785"
+         id="tspan12214-8"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">SW Ring</tspan></text>
+    <rect
+       style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
+       id="rect10445-7-7-5"
+       width="124.71397"
+       height="46.675529"
+       x="327.86566"
+       y="29.009106" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="387.27209"
+       y="45.36227"
+       id="text11995-7-3-6-6-3"
+       transform="scale(0.93992339,1.0639165)"><tspan
+         sodipodi:role="line"
+         id="tspan11997-3-5-0-4-4"
+         x="387.27209"
+         y="45.36227"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Stats thread</tspan><tspan
+         sodipodi:role="line"
+         x="387.27209"
+         y="57.016178"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif"
+         id="tspan165">(to console)</tspan></text>
   </g>
 </svg>
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v10 18/18] maintainers: add to distributor lib maintainers
  2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
                                                   ` (16 preceding siblings ...)
  2017-03-15  6:19                                 ` [PATCH v10 17/18] doc: distributor app " David Hunt
@ 2017-03-15  6:19                                 ` David Hunt
  17 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-15  6:19 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 39bc78e..0545911 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -494,6 +494,7 @@ F: doc/guides/sample_app_ug/ip_reassembly.rst
 
 Distributor
 M: Bruce Richardson <bruce.richardson@intel.com>
+M: David Hunt <david.hunt@intel.com>
 F: lib/librte_distributor/
 F: doc/guides/prog_guide/packet_distrib_lib.rst
 F: test/test/test_distributor*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* Re: [PATCH v10 02/18] lib: create private header file
  2017-03-15  6:19                                 ` [PATCH v10 02/18] lib: create private header file David Hunt
@ 2017-03-15 17:18                                   ` Thomas Monjalon
  2017-03-16 10:43                                     ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Thomas Monjalon @ 2017-03-15 17:18 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson, nelio.laranjeiro

2017-03-15 06:19, David Hunt:
> +/**
> + * Number of packets to deal with in bursts. Needs to be 8 so as to
> + * fit in one cache line.
> + */
> +#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))

error: 'rte_xmm_t' undeclared here (arm compilation)

Can it be fixed by including rte_vect.h?

Ideally I would prefer we stop using XMM types in a generic code.
XMM are x86-only registers. It has been translated for other arches
but we should use a more generic name.

What was the intention here? SSE-optimized code or 128-bit size?
Please check lib/librte_eal/common/include/generic/rte_vect.h
for a generic type.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v10 02/18] lib: create private header file
  2017-03-15 17:18                                   ` Thomas Monjalon
@ 2017-03-16 10:43                                     ` Hunt, David
  2017-03-16 15:40                                       ` Thomas Monjalon
  0 siblings, 1 reply; 202+ messages in thread
From: Hunt, David @ 2017-03-16 10:43 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, bruce.richardson, nelio.laranjeiro

On 15/3/2017 5:18 PM, Thomas Monjalon wrote:
> 2017-03-15 06:19, David Hunt:
>> +/**
>> + * Number of packets to deal with in bursts. Needs to be 8 so as to
>> + * fit in one cache line.
>> + */
>> +#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
> error: 'rte_xmm_t' undeclared here (arm compilation)
>
> Can it be fixed by including rte_vect.h?
>
> Ideally I would prefer we stop using XMM types in a generic code.
> XMM are x86-only registers. It has been translated for other arches
> but we should use a more generic name.
>
> What was the intention here? SSE-optimized code or 128-bit size?
> Please check lib/librte_eal/common/include/generic/rte_vect.h
> for a generic type.

Thomas,

Including rte_vect.h does indeed resolve the issue.

I originally had "#define RTE_DIST_BURST_SIZE 8" but thought that the latest
definition would give further clarity as to why it's set to 8.

There are 2 reasons, both of which work out to 8 (see the arithmetic
sketched below):
1. The vector instruction I use for the matching works on 8 uint16s at a time.
2. The (x86) cache lines communicating with the worker cores fit 8 mbuf
   pointers at a time.
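
Both give 8 under the usual assumptions (64-bit build, 64-byte cache line,
128-bit XMM register); a throwaway, purely illustrative C11 check of that
arithmetic:

#include <assert.h>
#include <stdint.h>

/* 64-byte cache line / 8-byte mbuf pointer = 8 pointers per cache line */
static_assert(64 / sizeof(uintptr_t) == 8, "8 mbuf pointers per cache line");

/* 128-bit XMM register / 16-bit flow ID = 8 flow IDs per match operation */
static_assert((2 * sizeof(uint64_t)) / sizeof(uint16_t) == 8,
	"8 flow IDs per XMM register");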

So there are 2 options to resolve:
1. #include <rte_vect.h> at the top of rte_distributor_private.h
2. revert back to "#define RTE_DIST_BURST_SIZE 8"

Personally, I'd probably lean towards option 2 (with an additional
comment), as it removes the mention of xmm from the generic header file,
and it remains valid for both reasons, whereas the xmm #define really
only helps to explain one reason.

Do you have any preference? Let me know and I can push up a v11.

Regards,
Dave.


P.S. Suggested change:

/*
  * Transfer up to 8 mbufs at a time to/from workers, and
  * flow matching algorithm optimised for 8 flow IDs at a time
  */
#define RTE_DIST_BURST_SIZE 8
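
For comparison, option 1 would look roughly like this (a sketch, assuming
the generic rte_vect.h header provides rte_xmm_t on all arches):

#include <rte_vect.h>

/**
 * Number of packets to deal with in bursts. Needs to be 8 so as to
 * fit in one cache line.
 */
#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))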

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v10 02/18] lib: create private header file
  2017-03-16 10:43                                     ` Hunt, David
@ 2017-03-16 15:40                                       ` Thomas Monjalon
  0 siblings, 0 replies; 202+ messages in thread
From: Thomas Monjalon @ 2017-03-16 15:40 UTC (permalink / raw)
  To: Hunt, David; +Cc: dev, bruce.richardson, nelio.laranjeiro

2017-03-16 10:43, Hunt, David:
> On 15/3/2017 5:18 PM, Thomas Monjalon wrote:
> > 2017-03-15 06:19, David Hunt:
> >> +/**
> >> + * Number of packets to deal with in bursts. Needs to be 8 so as to
> >> + * fit in one cache line.
> >> + */
> >> +#define RTE_DIST_BURST_SIZE (sizeof(rte_xmm_t) / sizeof(uint16_t))
> > error: 'rte_xmm_t' undeclared here (arm compilation)
> >
> > Can it be fixed by including rte_vect.h?
> >
> > Ideally I would prefer we stop using XMM types in a generic code.
> > XMM are x86-only registers. It has been translated for other arches
> > but we should use a more generic name.
> >
> > What was the intention here? SSE-optimized code or 128-bit size?
> > Please check lib/librte_eal/common/include/generic/rte_vect.h
> > for a generic type.
> 
> Thomas,
> 
> Including rte_vect.h does indeed resolve the issue.
> 
> I had originally had "#define RTE_DIST_BURST_SIZE 8" but thought that latest
> definition would give further clarity as to why it's set to 8.
> 
> There are 2 reasons.
> 1. The vector instruction I use for the matching works on 8 uint16s at a 
> time
> 2. The (x86) cache lines communicating with the worker cores fit 8 mbuf 
> pointers at a time.
> 
> So there are 2 options to resolve:
> 1. #include <rte_vect.h> at the top of rte_distributor_private.h
> 2. revert back to "#define RTE_DIST_BURST_SIZE 8"
> 
> Personally, I'd probably lean towards option 2 (with additional comment) 
> , as it removes
> the mention of xmm from the generic header file, as well as being valid 
> for both
> reasons, whereas the xmm #define really only helps to explain one reason.
> 
> Do you have any preference? Let me know and I can push up a v11.
> 
> Regards,
> Dave.
> 
> 
> P.S. Suggested change:
> 
> /*
>   * Transfer up to 8 mbufs at a time to/from workers, and
>   * flow matching algorithm optimised for 8 flow IDs at a time
>   */
> #define RTE_DIST_BURST_SIZE 8

OK for the option 2

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v11 0/18] distributor lib performance enhancements
  2017-03-15  6:19                                 ` [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-20 10:08                                   ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 01/18] lib: rename legacy distributor lib files David Hunt
                                                       ` (18 more replies)
  0 siblings, 19 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson

This patch aims to improve the throughput of the distributor library.

It uses a similar handshake mechanism to the previous version of
the library, in that bits are used to indicate when packets are ready
to be sent to a worker and ready to be returned from a worker. One main
difference is that instead of sending one packet in a cache line, it makes
use of the 7 free spaces in the same cache line in order to send up to
8 packets at a time to/from a worker.
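
As a rough illustration (a sketch only, not the library's actual
declaration), the per-worker exchange buffer described above amounts to one
64-byte cache line of eight 64-bit slots, each slot carrying an mbuf pointer
with the handshake bits packed into its low bits:

#include <stdint.h>

#define DIST_BURST_SIZE 8	/* 8 x 8-byte slots == one 64-byte cache line */

struct dist_burst_buf {
	/* each slot: mbuf pointer shifted up, handshake flags in the low bits */
	volatile int64_t bufptr64[DIST_BURST_SIZE];
} __attribute__((aligned(64)));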

The flow matching algorithm has had significant re-work, and now keeps an
array of inflight flows and an array of backlog flows, and matches incoming
flows to the inflight/backlog flows of all workers so that flow pinning to
workers can be maintained.

The Flow Match algorithm has both scalar and vector versions, and a
function pointer is used to select the most appropriate function at run time,
depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
scalar match function is selected, which should still give a good boost
in performance over the non-burst API.
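
A minimal sketch of that run-time selection, assuming only the EAL CPU flag
API; the two match functions here are illustrative stand-ins, not the
library's actual symbols:

#include <stdint.h>
#include <rte_cpuflags.h>

typedef void (*match_fn_t)(const uint16_t *flow_ids, uint16_t *matches,
		unsigned int n);

/* plain C fallback, valid on every architecture */
static void
match_scalar(const uint16_t *flow_ids, uint16_t *matches, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		matches[i] = 0;	/* placeholder: real code searches worker tags */
	(void)flow_ids;
}

#ifdef RTE_ARCH_X86
/* SSE2 path would compare 8 x 16-bit flow IDs per instruction */
static void
match_vec(const uint16_t *flow_ids, uint16_t *matches, unsigned int n)
{
	match_scalar(flow_ids, matches, n);
}
#endif

static match_fn_t
select_match_fn(void)
{
#ifdef RTE_ARCH_X86
	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE2))
		return match_vec;
#endif
	return match_scalar;
}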

v11 changes:
   * Fixed RTE_DIST_BURST_SIZE so it compiles on Arm platforms
   * Fixed compile issue in rte_distributor_match_generic.c on Arm platforms
   * Tweaked distributor_app docs based on review and added John's Ack

v10 changes:
   * Addressed all review comments from v9 (thanks, Bruce)
   * Inherited the v9 series Ack from Bruce, except for the newly suggested
     addition for the example app documentation (17/18)
   * Squashed the two patches containing distributor structs and code
   * Renamed confusing rte_distributor_v1705.h to rte_distributor_next.h
   * Added usleep in main so as to be a little more gentle with that core
   * Fixed some patch titles and improved some descriptions
   * Updated sample app guide documentation
   * Removed un-needed code limiting Tx rings and cleaned up patch

v9 changes:
   * fixed symbol versioning so it will compile on CentOS and RedHat

v8 changes:
   * Changed the patch set to have a more logical order of
     the changes, but the end result is basically the same.
   * Fixed broken shared library build.
   * Split down the updates to example app more
   * No longer changes the test app and sample app to use a temporary
     API.
   * No longer temporarily re-names the functions in the
     version.map file.

v7 changes:
   * Reorganised the patch set so there's a more natural progression in the
     changes, and divided them into easier-to-review chunks.
   * Previous versions of this patch set were effectively two APIs.
     We now have a single API. Legacy functionality can
     be used by using the rte_distributor_create API call with the
     RTE_DISTRIBUTOR_SINGLE flag when creating a distributor instance.
   * Added symbol versioning for old API so that ABI is preserved.

v6 changes:
   * Fixed intermittent segfault where num pkts not divisible
     by BURST_SIZE
   * Cleanup due to review comments on mailing list
   * Renamed _priv.h to _private.h.

v5 changes:
   * Removed some un-needed code around retries in worker API calls
   * Cleanup due to review comments on mailing list
   * Cleanup of non-x86 platform compilation, fallback to scalar match

v4 changes:
   * fixed issue building shared libraries

v3 changes:
  * Addressed mailing list review comments
  * Test code removal
  * Split out SSE match into separate file to facilitate NEON addition
  * Cleaned up conditional compilation flags for SSE2
  * Addressed c99 style compilation errors
  * rebased on latest head (Jan 2 2017, Happy New Year to all)

v2 changes:
  * Created a common distributor_priv.h header file with common
    definitions and structures.
  * Added a scalar version so it can be built and used on machines without
    sse2 instruction set
  * Added unit autotests
  * Added perf autotest

Notes:
   Apps must now work in bursts, as up to 8 are given to a worker at a time
   (see the worker-loop sketch after these notes)
   For performance in matching, flow IDs are 15 bits
   If 32-bit flow IDs are required, use the packet-at-a-time (SINGLE) mode.
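
A minimal sketch of a burst-mode worker loop; the array-based prototype used
here is an assumption drawn from this description (the authoritative
declarations are in the header added by this series):

#include <rte_distributor.h>
#include <rte_mbuf.h>

#define WORKER_BURST 8	/* up to 8 mbufs exchanged per call */

/* Hypothetical worker loop: hand back the previous burst, get the next. */
static void
burst_worker(struct rte_distributor *d, unsigned int worker_id)
{
	struct rte_mbuf *bufs[WORKER_BURST];
	unsigned int nb = 0;
	unsigned int i;

	for (;;) {
		nb = rte_distributor_get_pkt(d, worker_id, bufs, bufs, nb);

		for (i = 0; i < nb; i++) {
			/* ... process bufs[i]; packets sharing a flow ID stay
			 * pinned to the same worker ...
			 */
		}
	}
}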

Performance Gains
   2.2GHz Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
   2 x XL710 40GbE NICS to 2 x 40Gbps traffic generator channels 64b packets
   separate cores for rx, tx, distributor
    1 worker  - up to 4.8x
    4 workers - up to 2.9x
    8 workers - up to 1.8x
   12 workers - up to 2.1x
   16 workers - up to 1.8x

[01/18] lib: rename legacy distributor lib files
[02/18] lib: create private header file
[03/18] lib: add new distributor code
[04/18] lib: add SIMD flow matching to distributor
[05/18] test/distributor: extra params for autotests
[06/18] lib: switch distributor over to new API
[07/18] lib: make v20 header file private
[08/18] lib: add symbol versioning to distributor
[09/18] test: test single and burst distributor API
[10/18] test: add perf test for distributor burst mode
[11/18] examples/distributor: allow for extra stats
[12/18] examples/distributor: wait for ports to come up
[13/18] examples/distributor: add dedicated core for dist
[14/18] examples/distributor: tweaks for performance
[15/18] examples/distributor: give Rx thread a core
[16/18] doc: distributor library changes for new burst API
[17/18] doc: distributor app changes for new burst API
[18/18] maintainers: add to distributor lib maintainers

^ permalink raw reply	[flat|nested] 202+ messages in thread

* [PATCH v11 01/18] lib: rename legacy distributor lib files
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 02/18] lib: create private header file David Hunt
                                                       ` (17 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Move the files out of the way so that we can replace them with new
versions of the distributor library. The files are named in such a
way as to match the symbol versioning that we will apply for
backward ABI compatibility.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |   3 +-
 lib/librte_distributor/rte_distributor.h           | 210 +-----------------
 .../{rte_distributor.c => rte_distributor_v20.c}   |   2 +-
 lib/librte_distributor/rte_distributor_v20.h       | 247 +++++++++++++++++++++
 4 files changed, 251 insertions(+), 211 deletions(-)
 rename lib/librte_distributor/{rte_distributor.c => rte_distributor_v20.c} (99%)
 create mode 100644 lib/librte_distributor/rte_distributor_v20.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 4c9af17..b314ca6 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -42,10 +42,11 @@ EXPORT_MAP := rte_distributor_version.map
 LIBABIVER := 1
 
 # all source are stored in SRCS-y
-SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
+SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index 7d36bc8..e41d522 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -34,214 +34,6 @@
 #ifndef _RTE_DISTRIBUTE_H_
 #define _RTE_DISTRIBUTE_H_
 
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
-
-struct rte_distributor;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned socket_id,
-		unsigned num_workers);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be procesed at the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts(struct rte_distributor *d,
-		struct rte_mbuf **mbufs, unsigned max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush(struct rte_distributor *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns(struct rte_distributor *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get a new packet to process. Any previous packet
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- *
- * @return
- *   A new packet to be processed by the worker thread.
- */
-struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbuf
- *   The previous packet being processed by the worker
- */
-int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned worker_id,
-		struct rte_mbuf *mbuf);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt(), this function does not wait for a new
- * packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- */
-void
-rte_distributor_request_pkt(struct rte_distributor *d,
-		unsigned worker_id, struct rte_mbuf *oldpkt);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- *
- * @return
- *   A new packet to be processed by the worker thread, or NULL if no
- *   packet is yet available.
- */
-struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
-		unsigned worker_id);
-
-#ifdef __cplusplus
-}
-#endif
+#include <rte_distributor_v20.h>
 
 #endif
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor_v20.c
similarity index 99%
rename from lib/librte_distributor/rte_distributor.c
rename to lib/librte_distributor/rte_distributor_v20.c
index f3f778c..b890947 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -40,7 +40,7 @@
 #include <rte_errno.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
-#include "rte_distributor.h"
+#include "rte_distributor_v20.h"
 
 #define NO_FLAGS 0
 #define RTE_DISTRIB_PREFIX "DT_"
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
new file mode 100644
index 0000000..b69aa27
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -0,0 +1,247 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V20_H_
+#define _RTE_DISTRIB_V20_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed at the same time.
+ *
+ * The user is advocated to set tag for each mbuf before calling this function.
+ * If user doesn't set the tag, the tag value can be various values depending on
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get a new packet to process. Any previous packet
+ * given to the worker is assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ *
+ * @return
+ *   A new packet to be processed by the worker thread.
+ */
+struct rte_mbuf *
+rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param mbuf
+ *   The previous packet being processed by the worker
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
+		struct rte_mbuf *mbuf);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for a new
+ * packet to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packet requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packet.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packet, if any, being processed by the worker
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf *oldpkt);
+
+/**
+ * API called by a worker to check for a new packet that was previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packet to be available, but returns NULL if the request has
+ * not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less that num_workers passed
+ *   at distributor creation time.
+ *
+ * @return
+ *   A new packet to be processed by the worker thread, or NULL if no
+ *   packet is yet available.
+ */
+struct rte_mbuf *
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
-- 
2.7.4
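
As a usage illustration of the legacy single-packet API documented in the
header above (a sketch only; a real worker would run on a dedicated lcore):

#include <rte_distributor.h>
#include <rte_mbuf.h>

/* Minimal legacy worker loop built only from the calls documented above. */
static void
legacy_worker(struct rte_distributor *d, unsigned int worker_id)
{
	struct rte_mbuf *pkt = NULL;

	for (;;) {
		/* hand back the previous packet (if any) and wait for the next */
		pkt = rte_distributor_get_pkt(d, worker_id, pkt);

		/* ... process pkt ... */
	}
}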

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 02/18] lib: create private header file
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 01/18] lib: rename legacy distributor lib files David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 03/18] lib: add new distributor code David Hunt
                                                       ` (16 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

We'll be adding internal implementation definitions in here
that are common to both burst and legacy APIs.
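
The header added below packs handshake flags into the low bits of each
64-bit buffer slot; a small illustrative sketch of that scheme (the helper
names are made up, the constants mirror the header):

#include <stdint.h>
#include <rte_mbuf.h>

#define FLAG_BITS	4	/* low bits of each slot carry flags     */
#define FLAGS_MASK	0x0F
#define GET_BUF		1	/* worker requests a buffer, returns old */

static inline int64_t
slot_encode(struct rte_mbuf *pkt, int flags)
{
	/* shift the 48-bit pointer up to make room for the flag bits */
	return (((int64_t)(uintptr_t)pkt) << FLAG_BITS) | (flags & FLAGS_MASK);
}

static inline struct rte_mbuf *
slot_decode(int64_t bufptr64)
{
	/* arithmetic right shift restores the pointer with sign extension */
	return (struct rte_mbuf *)(uintptr_t)(bufptr64 >> FLAG_BITS);
}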

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/rte_distributor_private.h | 136 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c     |  72 +-----------
 2 files changed, 137 insertions(+), 71 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_private.h

diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
new file mode 100644
index 0000000..b1c0f66
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -0,0 +1,136 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DIST_PRIV_H_
+#define _RTE_DIST_PRIV_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#define NO_FLAGS 0
+#define RTE_DISTRIB_PREFIX "DT_"
+
+/*
+ * We will use the bottom four bits of pointer for flags, shifting out
+ * the top four bits to make room (since a 64-bit pointer actually only uses
+ * 48 bits). An arithmetic-right-shift will then appropriately restore the
+ * original pointer value with proper sign extension into the top bits.
+ */
+#define RTE_DISTRIB_FLAG_BITS 4
+#define RTE_DISTRIB_FLAGS_MASK (0x0F)
+#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
+#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
+#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
+#define RTE_DISTRIB_VALID_BUF (4)  /**< set if bufptr contains ptr */
+
+#define RTE_DISTRIB_BACKLOG_SIZE 8
+#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
+
+#define RTE_DISTRIB_MAX_RETURNS 128
+#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
+
+/**
+ * Maximum number of workers allowed.
+ * Be aware of increasing the limit, becaus it is limited by how we track
+ * in-flight tags. See in_flight_bitmask and rte_distributor_process
+ */
+#define RTE_DISTRIB_MAX_WORKERS 64
+
+#define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to three cache lines.
+ * Only 64-bits of the memory is actually used though.
+ */
+union rte_distributor_buffer {
+	volatile int64_t bufptr64;
+	char pad[RTE_CACHE_LINE_SIZE*3];
+} __rte_cache_aligned;
+
+/*
+ * Transfer up to 8 mbufs at a time to/from workers, and
+ * flow matching algorithm optimised for 8 flow IDs at a time
+ */
+#define RTE_DIST_BURST_SIZE 8
+
+struct rte_distributor_backlog {
+	unsigned int start;
+	unsigned int count;
+	int64_t pkts[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	uint16_t *tags; /* will point to second cacheline of inflights */
+} __rte_cache_aligned;
+
+
+struct rte_distributor_returned_pkts {
+	unsigned int start;
+	unsigned int count;
+	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
+};
+
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+
+	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
+		/**< Tracks the tag being processed per core */
+	uint64_t in_flight_bitmask;
+		/**< on/off bits for in-flight tags.
+		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
+		 * the bitmask has to expand.
+		 */
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
+
+	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+};
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index b890947..be297ec 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -41,77 +41,7 @@
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
-
-#define NO_FLAGS 0
-#define RTE_DISTRIB_PREFIX "DT_"
-
-/* we will use the bottom four bits of pointer for flags, shifting out
- * the top four bits to make room (since a 64-bit pointer actually only uses
- * 48 bits). An arithmetic-right-shift will then appropriately restore the
- * original pointer value with proper sign extension into the top bits. */
-#define RTE_DISTRIB_FLAG_BITS 4
-#define RTE_DISTRIB_FLAGS_MASK (0x0F)
-#define RTE_DISTRIB_NO_BUF 0       /**< empty flags: no buffer requested */
-#define RTE_DISTRIB_GET_BUF (1)    /**< worker requests a buffer, returns old */
-#define RTE_DISTRIB_RETURN_BUF (2) /**< worker returns a buffer, no request */
-
-#define RTE_DISTRIB_BACKLOG_SIZE 8
-#define RTE_DISTRIB_BACKLOG_MASK (RTE_DISTRIB_BACKLOG_SIZE - 1)
-
-#define RTE_DISTRIB_MAX_RETURNS 128
-#define RTE_DISTRIB_RETURNS_MASK (RTE_DISTRIB_MAX_RETURNS - 1)
-
-/**
- * Maximum number of workers allowed.
- * Be aware of increasing the limit, becaus it is limited by how we track
- * in-flight tags. See @in_flight_bitmask and @rte_distributor_process
- */
-#define RTE_DISTRIB_MAX_WORKERS	64
-
-/**
- * Buffer structure used to pass the pointer data between cores. This is cache
- * line aligned, but to improve performance and prevent adjacent cache-line
- * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
- * the next cache line to worker 0, we pad this out to three cache lines.
- * Only 64-bits of the memory is actually used though.
- */
-union rte_distributor_buffer {
-	volatile int64_t bufptr64;
-	char pad[RTE_CACHE_LINE_SIZE*3];
-} __rte_cache_aligned;
-
-struct rte_distributor_backlog {
-	unsigned start;
-	unsigned count;
-	int64_t pkts[RTE_DISTRIB_BACKLOG_SIZE];
-};
-
-struct rte_distributor_returned_pkts {
-	unsigned start;
-	unsigned count;
-	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
-};
-
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
-
-	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
-	unsigned num_workers;                 /**< Number of workers polling */
-
-	uint32_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS];
-		/**< Tracks the tag being processed per core */
-	uint64_t in_flight_bitmask;
-		/**< on/off bits for in-flight tags.
-		 * Note that if RTE_DISTRIB_MAX_WORKERS is larger than 64 then
-		 * the bitmask has to expand.
-		 */
-
-	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
-
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
-
-	struct rte_distributor_returned_pkts returns;
-};
+#include "rte_distributor_private.h"
 
 TAILQ_HEAD(rte_distributor_list, rte_distributor);
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 03/18] lib: add new distributor code
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 01/18] lib: rename legacy distributor lib files David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 02/18] lib: create private header file David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 04/18] lib: add SIMD flow matching to distributor David Hunt
                                                       ` (15 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This patch includes the code for the new burst-capable distributor library.

It also includes the rte_distributor_next.h file which will
be used as the public header once we add in the symbol versioning
for v20 and v1705 APIs, at which stage we will rename it to
rte_distributor.h.

The new distributor code contains a very similar API to the legacy code,
but now sends bursts of up to 8 mbufs to each worker. Flow IDs are
reduced to 15 bits for an optimal flow matching algorithm.
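
As a rough illustration of the worker-side usage described above (a sketch
only, not part of this patch): the names quit_signal and handle_packet() below
are application-side placeholders, and the burst size of 8 matches
RTE_DIST_BURST_SIZE from the private header.

#include <rte_mbuf.h>
#include "rte_distributor_next.h"

/* Stand-in for the application's per-packet work */
static void handle_packet(struct rte_mbuf *m)
{
	m->port ^= 1;
}

static int
worker_loop(struct rte_distributor_v1705 *d, unsigned int worker_id,
		volatile int *quit_signal)
{
	struct rte_mbuf *pkts[8];	/* up to 8 mbufs arrive per burst */
	unsigned int num = 0;		/* nothing to hand back on first call */
	unsigned int i;

	while (!*quit_signal) {
		/* Return the previous burst and block until a new one arrives */
		num = rte_distributor_get_pkt_v1705(d, worker_id, pkts, pkts, num);

		for (i = 0; i < num; i++)
			handle_packet(pkts[i]);
	}

	/* Hand back any packets still held before exiting */
	rte_distributor_return_pkt_v1705(d, worker_id, pkts, num);
	return 0;
}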

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                  |   1 +
 lib/librte_distributor/rte_distributor.c         | 628 +++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_next.h    | 269 ++++++++++
 lib/librte_distributor/rte_distributor_private.h |  61 +++
 4 files changed, 959 insertions(+)
 create mode 100644 lib/librte_distributor/rte_distributor.c
 create mode 100644 lib/librte_distributor/rte_distributor_next.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index b314ca6..74256ff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -43,6 +43,7 @@ LIBABIVER := 1
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
new file mode 100644
index 0000000..75b0d47
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor.c
@@ -0,0 +1,628 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <stdio.h>
+#include <sys/queue.h>
+#include <string.h>
+#include <rte_mbuf.h>
+#include <rte_memory.h>
+#include <rte_cycles.h>
+#include <rte_memzone.h>
+#include <rte_errno.h>
+#include <rte_string_fns.h>
+#include <rte_eal_memconfig.h>
+#include <rte_compat.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor_next.h"
+#include "rte_distributor_v20.h"
+
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+
+static struct rte_tailq_elem rte_dist_burst_tailq = {
+	.name = "RTE_DIST_BURST",
+};
+EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
+
+/**** APIs called by workers ****/
+
+/**** Burst Packet APIs called by workers ****/
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	unsigned int i;
+
+	volatile int64_t *retptr64;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		rte_distributor_request_pkt(d->d_v20,
+			worker_id, oldpkt[0]);
+		return;
+	}
+
+	retptr64 = &(buf->retptr64[0]);
+	/* Spin while handshake bits are set (scheduler clears it) */
+	while (unlikely(*retptr64 & RTE_DISTRIB_GET_BUF)) {
+		rte_pause();
+		uint64_t t = rte_rdtsc()+100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+	}
+
+	/*
+	 * OK, if we've got here, then the scheduler has just cleared the
+	 * handshake bits. Populate the retptrs with returning packets.
+	 */
+
+	for (i = count; i < RTE_DIST_BURST_SIZE; i++)
+		buf->retptr64[i] = 0;
+
+	/* Set Return bit for each packet returned */
+	for (i = count; i-- > 0; )
+		buf->retptr64[i] =
+			(((int64_t)(uintptr_t)(oldpkt[i])) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/*
+	 * Finally, set the GET_BUF flag to signal to the distributor that
+	 * the cache line is ready for processing
+	 */
+	*retptr64 |= RTE_DISTRIB_GET_BUF;
+}
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	uint64_t ret;
+	int count = 0;
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		return (pkts[0]) ? 1 : 0;
+	}
+
+	/* If bit is set, return */
+	if (buf->bufptr64[0] & RTE_DISTRIB_GET_BUF)
+		return -1;
+
+	/* since bufptr64 is signed, this should be an arithmetic shift */
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+		if (likely(buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)) {
+			ret = buf->bufptr64[i] >> RTE_DISTRIB_FLAG_BITS;
+			pkts[count++] = (struct rte_mbuf *)((uintptr_t)(ret));
+		}
+	}
+
+	/*
+	 * Now that we've got the contents of the cache line in an array of
+	 * mbuf pointers, toggle the bit so the scheduler can start working
+	 * on the next cache line while we're working.
+	 */
+	buf->bufptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return count;
+}
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count)
+{
+	int count;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (return_count <= 1) {
+			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+			return (pkts[0]) ? 1 : 0;
+		} else
+			return -EINVAL;
+	}
+
+	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+
+	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	while (count == -1) {
+		uint64_t t = rte_rdtsc() + 100;
+
+		while (rte_rdtsc() < t)
+			rte_pause();
+
+		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	}
+	return count;
+}
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
+{
+	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	unsigned int i;
+
+	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
+		if (num == 1)
+			return rte_distributor_return_pkt(d->d_v20,
+				worker_id, oldpkt[0]);
+		else
+			return -EINVAL;
+	}
+
+	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
+		/* Switch off the return bit first */
+		buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+
+	for (i = num; i-- > 0; )
+		buf->retptr64[i] = (((int64_t)(uintptr_t)oldpkt[i]) <<
+			RTE_DISTRIB_FLAG_BITS) | RTE_DISTRIB_RETURN_BUF;
+
+	/* Set the GET_BUF bit even if we got no returns */
+	buf->retptr64[0] |= RTE_DISTRIB_GET_BUF;
+
+	return 0;
+}
+
+/**** APIs called on distributor core ***/
+
+/* stores a packet returned from a worker inside the returns array */
+static inline void
+store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+		unsigned int *ret_start, unsigned int *ret_count)
+{
+	if (!oldbuf)
+		return;
+	/* store returns in a circular buffer */
+	d->returns.mbufs[(*ret_start + *ret_count) & RTE_DISTRIB_RETURNS_MASK]
+			= (void *)oldbuf;
+	*ret_start += (*ret_count == RTE_DISTRIB_RETURNS_MASK);
+	*ret_count += (*ret_count != RTE_DISTRIB_RETURNS_MASK);
+}
+
+/*
+ * Match the flow_ids (tags) of the incoming packets to the flow_ids
+ * of the inflight packets (both inflight on the workers and in each worker
+ * backlog). This will then allow us to pin those packets to the relevant
+ * workers to give us our atomic flow pinning.
+ */
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	struct rte_distributor_backlog *bl;
+	uint16_t i, j, w;
+
+	/*
+	 * Function overview:
+	 * 1. Loop through all worker ID's
+	 * 2. Compare the current inflights to the incoming tags
+	 * 3. Compare the current backlog to the incoming tags
+	 * 4. Add any matches to the output
+	 */
+
+	for (j = 0 ; j < RTE_DIST_BURST_SIZE; j++)
+		output_ptr[j] = 0;
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		for (j = 0; j < RTE_DIST_BURST_SIZE ; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (d->in_flight_tags[i][j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+		for (j = 0; j < RTE_DIST_BURST_SIZE; j++)
+			for (w = 0; w < RTE_DIST_BURST_SIZE; w++)
+				if (bl->tags[j] == data_ptr[w]) {
+					output_ptr[j] = i+1;
+					break;
+				}
+	}
+
+	/*
+	 * At this stage, the output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID (+1) to which
+	 * the corresponding flow is pinned.
+	 */
+}
+
+
+/*
+ * When the handshake bits indicate that there are packets coming
+ * back from the worker, this function is called to copy and store
+ * the valid returned pointers (store_return).
+ */
+static unsigned int
+handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	uintptr_t oldbuf;
+	unsigned int ret_start = d->returns.start,
+			ret_count = d->returns.count;
+	unsigned int count = 0;
+	unsigned int i;
+
+	if (buf->retptr64[0] & RTE_DISTRIB_GET_BUF) {
+		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
+			if (buf->retptr64[i] & RTE_DISTRIB_RETURN_BUF) {
+				oldbuf = ((uintptr_t)(buf->retptr64[i] >>
+					RTE_DISTRIB_FLAG_BITS));
+				/* store returns in a circular buffer */
+				store_return(oldbuf, d, &ret_start, &ret_count);
+				count++;
+				buf->retptr64[i] &= ~RTE_DISTRIB_RETURN_BUF;
+			}
+		}
+		d->returns.start = ret_start;
+		d->returns.count = ret_count;
+		/* Clear for the worker to populate with more returns */
+		buf->retptr64[0] = 0;
+	}
+	return count;
+}
+
+/*
+ * This function releases a burst (cache line) to a worker.
+ * It is called from the process function when a cacheline is
+ * full to make room for more packets for that worker, or when
+ * all packets have been assigned to bursts and need to be flushed
+ * to the workers.
+ * It also needs to wait for any outstanding packets from the worker
+ * before sending out new packets.
+ */
+static unsigned int
+release(struct rte_distributor_v1705 *d, unsigned int wkr)
+{
+	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	unsigned int i;
+
+	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+		rte_pause();
+
+	handle_returns(d, wkr);
+
+	buf->count = 0;
+
+	for (i = 0; i < d->backlog[wkr].count; i++) {
+		d->bufs[wkr].bufptr64[i] = d->backlog[wkr].pkts[i] |
+				RTE_DISTRIB_GET_BUF | RTE_DISTRIB_VALID_BUF;
+		d->in_flight_tags[wkr][i] = d->backlog[wkr].tags[i];
+	}
+	buf->count = i;
+	for ( ; i < RTE_DIST_BURST_SIZE ; i++) {
+		buf->bufptr64[i] = RTE_DISTRIB_GET_BUF;
+		d->in_flight_tags[wkr][i] = 0;
+	}
+
+	d->backlog[wkr].count = 0;
+
+	/* Clear the GET bit */
+	buf->bufptr64[0] &= ~RTE_DISTRIB_GET_BUF;
+	return  buf->count;
+
+}
+
+
+/* process a set of packets to distribute them to workers */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs)
+{
+	unsigned int next_idx = 0;
+	static unsigned int wkr;
+	struct rte_mbuf *next_mb = NULL;
+	int64_t next_value = 0;
+	uint16_t new_tag = 0;
+	uint16_t flows[RTE_DIST_BURST_SIZE] __rte_cache_aligned;
+	unsigned int i, j, w, wid;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+	}
+
+	if (unlikely(num_mbufs == 0)) {
+		/* Flush out all non-full cache-lines to workers. */
+		for (wid = 0 ; wid < d->num_workers; wid++) {
+			if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF)) {
+				release(d, wid);
+				handle_returns(d, wid);
+			}
+		}
+		return 0;
+	}
+
+	while (next_idx < num_mbufs) {
+		uint16_t matches[RTE_DIST_BURST_SIZE];
+		unsigned int pkts;
+
+		if (d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF)
+			d->bufs[wkr].count = 0;
+
+		if ((num_mbufs - next_idx) < RTE_DIST_BURST_SIZE)
+			pkts = num_mbufs - next_idx;
+		else
+			pkts = RTE_DIST_BURST_SIZE;
+
+		for (i = 0; i < pkts; i++) {
+			if (mbufs[next_idx + i]) {
+				/* flows have to be non-zero */
+				flows[i] = mbufs[next_idx + i]->hash.usr | 1;
+			} else
+				flows[i] = 0;
+		}
+		for (; i < RTE_DIST_BURST_SIZE; i++)
+			flows[i] = 0;
+
+		find_match_scalar(d, &flows[0], &matches[0]);
+
+		/*
+		 * The matches array now contains the intended worker ID (+1) of
+		 * the incoming packets. Any zeroes need to be assigned
+		 * workers.
+		 */
+
+		for (j = 0; j < pkts; j++) {
+
+			next_mb = mbufs[next_idx++];
+			next_value = (((int64_t)(uintptr_t)next_mb) <<
+					RTE_DISTRIB_FLAG_BITS);
+			/*
+			 * The user is advised to set the tag value for each
+			 * mbuf before calling rte_distributor_process.
+			 * User defined tags are used to identify flows,
+			 * or sessions.
+			 */
+			/* flows MUST be non-zero */
+			new_tag = (uint16_t)(next_mb->hash.usr) | 1;
+
+			/*
+			 * Uncommenting the next line will cause the find_match
+			 * function to be optimised out, making this function
+			 * do parallel (non-atomic) distribution
+			 */
+			/* matches[j] = 0; */
+
+			if (matches[j]) {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[matches[j]-1];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, matches[j]-1);
+				}
+
+				/* Add to worker that already has flow */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+
+			} else {
+				struct rte_distributor_backlog *bl =
+						&d->backlog[wkr];
+				if (unlikely(bl->count ==
+						RTE_DIST_BURST_SIZE)) {
+					release(d, wkr);
+				}
+
+				/* Add to the current worker */
+				unsigned int idx = bl->count++;
+
+				bl->tags[idx] = new_tag;
+				bl->pkts[idx] = next_value;
+				/*
+				 * Now that we've just added an unpinned flow
+				 * to a worker, we need to ensure that all
+				 * other packets with that same flow will go
+				 * to the same worker in this burst.
+				 */
+				for (w = j; w < pkts; w++)
+					if (flows[w] == new_tag)
+						matches[w] = wkr+1;
+			}
+		}
+		wkr++;
+		if (wkr >= d->num_workers)
+			wkr = 0;
+	}
+
+	/* Flush out all non-full cache-lines to workers. */
+	for (wid = 0 ; wid < d->num_workers; wid++)
+		if ((d->bufs[wid].bufptr64[0] & RTE_DISTRIB_GET_BUF))
+			release(d, wid);
+
+	return num_mbufs;
+}
+
+/* return to the caller, packets returned from workers */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs)
+{
+	struct rte_distributor_returned_pkts *returns = &d->returns;
+	unsigned int retval = (max_mbufs < returns->count) ?
+			max_mbufs : returns->count;
+	unsigned int i;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_returned_pkts(d->d_v20,
+				mbufs, max_mbufs);
+	}
+
+	for (i = 0; i < retval; i++) {
+		unsigned int idx = (returns->start + i) &
+				RTE_DISTRIB_RETURNS_MASK;
+
+		mbufs[i] = returns->mbufs[idx];
+	}
+	returns->start += i;
+	returns->count -= i;
+
+	return retval;
+}
+
+/*
+ * Return the number of packets in-flight in a distributor, i.e. packets
+ * being worked on or queued up in a backlog.
+ */
+static inline unsigned int
+total_outstanding(const struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr, total_outstanding = 0;
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		total_outstanding += d->backlog[wkr].count;
+
+	return total_outstanding;
+}
+
+/*
+ * Flush the distributor, so that there are no outstanding packets in flight or
+ * queued up.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+{
+	const unsigned int flushed = total_outstanding(d);
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		return rte_distributor_flush(d->d_v20);
+	}
+
+	while (total_outstanding(d) > 0)
+		rte_distributor_process_v1705(d, NULL, 0);
+
+	/*
+	 * Send empty burst to all workers to allow them to exit
+	 * gracefully, should they need to.
+	 */
+	rte_distributor_process_v1705(d, NULL, 0);
+
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		handle_returns(d, wkr);
+
+	return flushed;
+}
+
+/* clears the internal returns array in the distributor */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+{
+	unsigned int wkr;
+
+	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
+		/* Call the old API */
+		rte_distributor_clear_returns(d->d_v20);
+	}
+
+	/* throw away returns, so workers can exit */
+	for (wkr = 0; wkr < d->num_workers; wkr++)
+		d->bufs[wkr].retptr64[0] = 0;
+}
+
+/* creates a distributor instance */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name,
+		unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type)
+{
+	struct rte_distributor_v1705 *d;
+	struct rte_dist_burst_list *dist_burst_list;
+	char mz_name[RTE_MEMZONE_NAMESIZE];
+	const struct rte_memzone *mz;
+	unsigned int i;
+
+	/* TODO Reorganise function properly around RTE_DIST_ALG_SINGLE/BURST */
+
+	/* compilation-time checks */
+	RTE_BUILD_BUG_ON((sizeof(*d) & RTE_CACHE_LINE_MASK) != 0);
+	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
+
+	if (alg_type == RTE_DIST_ALG_SINGLE) {
+		d = malloc(sizeof(struct rte_distributor_v1705));
+		d->d_v20 = rte_distributor_create(name,
+				socket_id, num_workers);
+		if (d->d_v20 == NULL) {
+			/* rte_errno will have been set */
+			return NULL;
+		}
+		d->alg_type = alg_type;
+		return d;
+	}
+
+	if (name == NULL || num_workers >= RTE_DISTRIB_MAX_WORKERS) {
+		rte_errno = EINVAL;
+		return NULL;
+	}
+
+	snprintf(mz_name, sizeof(mz_name), RTE_DISTRIB_PREFIX"%s", name);
+	mz = rte_memzone_reserve(mz_name, sizeof(*d), socket_id, NO_FLAGS);
+	if (mz == NULL) {
+		rte_errno = ENOMEM;
+		return NULL;
+	}
+
+	d = mz->addr;
+	snprintf(d->name, sizeof(d->name), "%s", name);
+	d->num_workers = num_workers;
+	d->alg_type = alg_type;
+	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+	/*
+	 * Set up the backlog tags so they're pointing at the second cache
+	 * line for performance during flow matching
+	 */
+	for (i = 0 ; i < num_workers ; i++)
+		d->backlog[i].tags = &d->in_flight_tags[i][RTE_DIST_BURST_SIZE];
+
+	dist_burst_list = RTE_TAILQ_CAST(rte_dist_burst_tailq.head,
+					  rte_dist_burst_list);
+
+
+	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
+	TAILQ_INSERT_TAIL(dist_burst_list, d, next);
+	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
+
+	return d;
+}
diff --git a/lib/librte_distributor/rte_distributor_next.h b/lib/librte_distributor/rte_distributor_next.h
new file mode 100644
index 0000000..0034020
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_next.h
@@ -0,0 +1,269 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets,
+ * either one at a time or in bursts, to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor_v1705;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Use the legacy single-packet API, or the new burst API. The legacy
+ *   mode uses a 32-bit flow ID and works on a single packet at a time;
+ *   the burst mode uses a 15-bit flow ID and up to 8 packets at a time.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor_v1705 *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this function.
+ * If the user does not set the tag, its value may vary depending on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return a completed packet without requesting a
+ * new packet, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request a new packet to process.
+ * Any previous packet given to the worker is assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt_v1705(), this function does not wait for
+ * new packets to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt_v1705() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt_v1705()
+ * API should *not* be used to try to retrieve the new packets.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt_v1705(). It does not
+ * wait for the new packets to be available, but returns immediately if the
+ * request has not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index b1c0f66..82b0daf 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -129,6 +129,67 @@ struct rte_distributor {
 	struct rte_distributor_returned_pkts returns;
 };
 
+/* All different signature compare functions */
+enum rte_distributor_match_function {
+	RTE_DIST_MATCH_SCALAR = 0,
+	RTE_DIST_MATCH_VECTOR,
+	RTE_DIST_NUM_MATCH_FNS
+};
+
+/**
+ * Buffer structure used to pass the pointer data between cores. This is cache
+ * line aligned, but to improve performance and prevent adjacent cache-line
+ * prefetches of buffers for other workers, e.g. when worker 1's buffer is on
+ * the next cache line to worker 0, we pad this out to two cache lines.
+ * We can pass up to 8 mbufs at a time in one cacheline.
+ * There is a separate cacheline for returns in the burst API.
+ */
+struct rte_distributor_buffer_v1705 {
+	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= outgoing to worker */
+
+	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
+
+	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+		__rte_cache_aligned; /* <= incoming from worker */
+
+	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
+
+	int count __rte_cache_aligned;       /* <= number of current mbufs */
+};
+
+struct rte_distributor_v1705 {
+	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
+
+	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
+	unsigned int num_workers;             /**< Number of workers polling */
+	unsigned int alg_type;                /**< Number of alg types */
+
+	/**
+	 * The first cache line in this array holds the tags currently
+	 * in flight on the worker core. The second cache line holds the
+	 * backlog of tags that will go to the worker core.
+	 */
+	uint16_t in_flight_tags[RTE_DISTRIB_MAX_WORKERS][RTE_DIST_BURST_SIZE*2]
+			__rte_cache_aligned;
+
+	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
+			__rte_cache_aligned;
+
+	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+
+	struct rte_distributor_returned_pkts returns;
+
+	enum rte_distributor_match_function dist_match_fn;
+
+	struct rte_distributor *d_v20;
+};
+
+void
+find_match_scalar(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 04/18] lib: add SIMD flow matching to distributor
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (2 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 03/18] lib: add new distributor code David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 05/18] test/distributor: extra params for autotests David Hunt
                                                       ` (14 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Add an optimised version of the in-flight flow matching algorithm
using SIMD instructions. This should give up to a 1.5x performance
improvement over the scalar version.

Falls back to the scalar version if SSE4.2 is not available.
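
For reference, the core of the match is the SSE4.2 _mm_cmpestrm intrinsic
with _SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_UNIT_MASK, which compares
eight 16-bit incoming flow IDs against eight in-flight IDs in one instruction.
The small standalone program below is not part of the patch (build with
-msse4.2); it just reproduces the mask example given in the code comments.

#include <nmmintrin.h>	/* SSE4.2 string/text intrinsics */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* 8 incoming 16-bit flow IDs and 8 flow IDs in flight on a worker */
	uint16_t incoming[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	uint16_t inflight[8] = { 3, 5, 7, 0, 0, 0, 0, 0 };
	uint16_t mask[8];
	int i;

	__m128i in = _mm_loadu_si128((const __m128i *)incoming);
	__m128i fly = _mm_loadu_si128((const __m128i *)inflight);

	/* 0xffff in every incoming lane that matches any in-flight ID */
	__m128i m = _mm_cmpestrm(fly, 8, in, 8,
			_SIDD_UWORD_OPS | _SIDD_CMP_EQUAL_ANY | _SIDD_UNIT_MASK);

	_mm_storeu_si128((__m128i *)mask, m);
	for (i = 0; i < 8; i++)
		printf("%04x ", mask[i]);
	printf("\n");	/* prints: 0000 0000 ffff 0000 ffff 0000 ffff 0000 */
	return 0;
}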

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |  10 ++
 lib/librte_distributor/rte_distributor.c           |  16 ++-
 .../rte_distributor_match_generic.c                |  43 ++++++++
 lib/librte_distributor/rte_distributor_match_sse.c | 114 +++++++++++++++++++++
 lib/librte_distributor/rte_distributor_private.h   |   5 +
 5 files changed, 186 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_match_generic.c
 create mode 100644 lib/librte_distributor/rte_distributor_match_sse.c

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 74256ff..a812fe4 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -44,6 +44,16 @@ LIBABIVER := 1
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor.c
+ifeq ($(CONFIG_RTE_ARCH_X86),y)
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_sse.c
+# distributor SIMD algo needs SSE4.2 support
+ifeq ($(findstring RTE_MACHINE_CPUFLAG_SSE4_2,$(CFLAGS)),)
+CFLAGS_rte_distributor_match_sse.o += -msse4.2
+endif
+else
+SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += rte_distributor_match_generic.c
+endif
+
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 75b0d47..6158fa6 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -391,7 +391,13 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 		for (; i < RTE_DIST_BURST_SIZE; i++)
 			flows[i] = 0;
 
-		find_match_scalar(d, &flows[0], &matches[0]);
+		switch (d->dist_match_fn) {
+		case RTE_DIST_MATCH_VECTOR:
+			find_match_vec(d, &flows[0], &matches[0]);
+			break;
+		default:
+			find_match_scalar(d, &flows[0], &matches[0]);
+		}
 
 		/*
 		 * Matches array now contain the intended worker ID (+1) of
@@ -607,7 +613,13 @@ rte_distributor_create_v1705(const char *name,
 	snprintf(d->name, sizeof(d->name), "%s", name);
 	d->num_workers = num_workers;
 	d->alg_type = alg_type;
-	d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
+
+#if defined(RTE_ARCH_X86)
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SSE4_2))
+		d->dist_match_fn = RTE_DIST_MATCH_VECTOR;
+	else
+#endif
+		d->dist_match_fn = RTE_DIST_MATCH_SCALAR;
 
 	/*
 	 * Set up the backog tags so they're pointing at the second cache
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
new file mode 100644
index 0000000..7c2f9f5
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -0,0 +1,43 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	find_match_scalar(d, data_ptr, output_ptr);
+}
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
new file mode 100644
index 0000000..b9f9bb0
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -0,0 +1,114 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <rte_mbuf.h>
+#include "rte_distributor_private.h"
+#include "rte_distributor.h"
+#include "smmintrin.h"
+#include "nmmintrin.h"
+
+
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr)
+{
+	/* Setup */
+	__m128i incoming_fids;
+	__m128i inflight_fids;
+	__m128i preflight_fids;
+	__m128i wkr;
+	__m128i mask1;
+	__m128i mask2;
+	__m128i output;
+	struct rte_distributor_backlog *bl;
+	uint16_t i;
+
+	/*
+	 * Function overview:
+	 * 2. Loop through all worker ID's
+	 *  2a. Load the current inflights for that worker into an xmm reg
+	 *  2b. Load the current backlog for that worker into an xmm reg
+	 *  2c. use cmpestrm to intersect flow_ids with backlog and inflights
+	 *  2d. Add any matches to the output
+	 * 3. Write the output xmm (matching worker ids).
+	 */
+
+
+	output = _mm_set1_epi16(0);
+	incoming_fids = _mm_load_si128((__m128i *)data_ptr);
+
+	for (i = 0; i < d->num_workers; i++) {
+		bl = &d->backlog[i];
+
+		inflight_fids =
+			_mm_load_si128((__m128i *)&(d->in_flight_tags[i]));
+		preflight_fids =
+			_mm_load_si128((__m128i *)(bl->tags));
+
+		/*
+		 * Any incoming_fid that exists anywhere in inflight_fids will
+		 * have 0xffff in same position of the mask as the incoming fid
+		 * Example (shortened to bytes for brevity):
+		 * incoming_fids   0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08
+		 * inflight_fids   0x03 0x05 0x07 0x00 0x00 0x00 0x00 0x00
+		 * mask            0x00 0x00 0xff 0x00 0xff 0x00 0xff 0x00
+		 */
+
+		mask1 = _mm_cmpestrm(inflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+		mask2 = _mm_cmpestrm(preflight_fids, 8, incoming_fids, 8,
+			_SIDD_UWORD_OPS |
+			_SIDD_CMP_EQUAL_ANY |
+			_SIDD_UNIT_MASK);
+
+		mask1 = _mm_or_si128(mask1, mask2);
+		/*
+		 * Now mask contains 0xffff where there's a match.
+		 * Next we need to store the worker_id in the relevant position
+		 * in the output.
+		 */
+
+		wkr = _mm_set1_epi16(i+1);
+		mask1 = _mm_and_si128(mask1, wkr);
+		output = _mm_or_si128(mask1, output);
+	}
+
+	/*
+	 * At this stage, the 128-bit output contains 8 16-bit values, with
+	 * each non-zero value containing the worker ID (+1) to which
+	 * the corresponding flow is pinned.
+	 */
+	_mm_store_si128((__m128i *)output_ptr, output);
+}
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 82b0daf..04c9cac 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -190,6 +190,11 @@ find_match_scalar(struct rte_distributor_v1705 *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
+void
+find_match_vec(struct rte_distributor_v1705 *d,
+			uint16_t *data_ptr,
+			uint16_t *output_ptr);
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 05/18] test/distributor: extra params for autotests
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (3 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 04/18] lib: add SIMD flow matching to distributor David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 06/18] lib: switch distributor over to new API David Hunt
                                                       ` (13 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

In the next few patches, we'll want to test both the old and the new API,
so here we allow different parameters to be passed to the tests,
instead of just a distributor struct.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_distributor.c | 64 +++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 21 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 85cb8f3..6059a0c 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -45,6 +45,13 @@
 #define BURST 32
 #define BIG_BATCH 1024
 
+struct worker_params {
+	char name[64];
+	struct rte_distributor *dist;
+};
+
+struct worker_params worker_params;
+
 /* statics - all zero-initialized by default */
 static volatile int quit;      /**< general quit variable for all threads */
 static volatile int zero_quit; /**< var for when we just want thr0 to quit*/
@@ -81,7 +88,9 @@ static int
 handle_work(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
+
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -107,8 +116,9 @@ handle_work(void *arg)
  *   not necessarily in the same order (as different flows).
  */
 static int
-sanity_test(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -249,7 +259,8 @@ static int
 handle_work_with_free_mbufs(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -270,8 +281,9 @@ handle_work_with_free_mbufs(void *arg)
  * library.
  */
 static int
-sanity_test_with_mbuf_alloc(struct rte_distributor *d, struct rte_mempool *p)
+sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
@@ -305,7 +317,8 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
-	struct rte_distributor *d = arg;
+	struct worker_params *wp = arg;
+	struct rte_distributor *d = wp->dist;
 	unsigned count = 0;
 	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
 
@@ -344,9 +357,10 @@ handle_work_for_shutdown_test(void *arg)
  * library.
  */
 static int
-sanity_test_with_worker_shutdown(struct rte_distributor *d,
+sanity_test_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -401,9 +415,10 @@ sanity_test_with_worker_shutdown(struct rte_distributor *d,
  * one worker shuts down..
  */
 static int
-test_flush_with_worker_shutdown(struct rte_distributor *d,
+test_flush_with_worker_shutdown(struct worker_params *wp,
 		struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
@@ -480,8 +495,9 @@ int test_error_distributor_create_numworkers(void)
 
 /* Useful function which ensures that all worker functions terminate */
 static void
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
+quit_workers(struct worker_params *wp, struct rte_mempool *p)
 {
+	struct rte_distributor *d = wp->dist;
 	const unsigned num_workers = rte_lcore_count() - 1;
 	unsigned i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
@@ -536,28 +552,34 @@ test_distributor(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (sanity_test(d, p) < 0)
+	worker_params.dist = d;
+	sprintf(worker_params.name, "single");
+
+	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
+	if (sanity_test(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, d, SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(d, p) < 0)
+	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
+				SKIP_MASTER);
+	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 		goto err;
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 
 	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(d, p) < 0)
+		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test, d,
+		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+				&worker_params,
 				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(d, p) < 0)
+		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
 			goto err;
-		quit_workers(d, p);
+		quit_workers(&worker_params, p);
 
 	} else {
 		printf("Not enough cores to run tests for worker shutdown\n");
@@ -572,7 +594,7 @@ test_distributor(void)
 	return 0;
 
 err:
-	quit_workers(d, p);
+	quit_workers(&worker_params, p);
 	return -1;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 06/18] lib: switch distributor over to new API
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (4 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 05/18] test/distributor: extra params for autotests David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 07/18] lib: make v20 header file private David Hunt
                                                       ` (12 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This is the main switch-over between the legacy API and the new
burst API. We rename all the functions in rte_distributor.c to remove
the _v1705 suffix, and add a _v20 suffix to the functions in
rte_distributor_v20.c.

We also rename rte_distributor_next.h to rte_distributor.h, as
this is now the public header.

At the same time, we need the autotests and sample app to compile
properly, hence those changes are in this patch also.
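
For completeness, a sketch of the distributor-core side of the renamed API
(not part of this patch): rx_burst() and tx_burst() are placeholders for the
application's receive and transmit paths, quit_signal is an application flag,
and the distributor d is assumed to have been created with RTE_DIST_ALG_BURST
as in the main.c hunk below.

#include <rte_distributor.h>
#include <rte_mbuf.h>

#define BURST 32

/* Application-side placeholders for packet I/O */
unsigned int rx_burst(struct rte_mbuf **bufs, unsigned int n);
void tx_burst(struct rte_mbuf **bufs, unsigned int n);

static int
lcore_distributor(struct rte_distributor *d, volatile int *quit_signal)
{
	struct rte_mbuf *bufs[BURST];
	unsigned int nb_rx, nb_ret;

	while (!*quit_signal) {
		nb_rx = rx_burst(bufs, BURST);

		/* Hand packets to the workers, keeping flows pinned */
		rte_distributor_process(d, bufs, nb_rx);

		/* Collect anything the workers have finished with */
		nb_ret = rte_distributor_returned_pkts(d, bufs, BURST);
		tx_burst(bufs, nb_ret);
	}

	/* Push out any in-flight or backlogged packets before exiting */
	rte_distributor_flush(d);
	return 0;
}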

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c                        |  22 +-
 lib/librte_distributor/rte_distributor.c           |  76 +++---
 lib/librte_distributor/rte_distributor.h           | 240 +++++++++++++++++-
 .../rte_distributor_match_generic.c                |   2 +-
 lib/librte_distributor/rte_distributor_match_sse.c |   2 +-
 lib/librte_distributor/rte_distributor_next.h      | 269 ---------------------
 lib/librte_distributor/rte_distributor_private.h   |  22 +-
 lib/librte_distributor/rte_distributor_v20.c       |  46 ++--
 lib/librte_distributor/rte_distributor_v20.h       |  24 +-
 test/test/test_distributor.c                       | 235 ++++++++++++------
 test/test/test_distributor_perf.c                  |  26 +-
 11 files changed, 512 insertions(+), 452 deletions(-)
 delete mode 100644 lib/librte_distributor/rte_distributor_next.h

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 7b8a759..a748985 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -405,17 +405,30 @@ lcore_worker(struct lcore_params *p)
 {
 	struct rte_distributor *d = p->d;
 	const unsigned id = p->worker_id;
+	unsigned int num = 0;
+	unsigned int i;
+
 	/*
 	 * for single port, xor_val will be zero so we won't modify the output
 	 * port, otherwise we send traffic from 0 to 1, 2 to 3, and vice versa
 	 */
 	const unsigned xor_val = (rte_eth_dev_count() > 1);
-	struct rte_mbuf *buf = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal) {
-		buf = rte_distributor_get_pkt(d, id, buf);
-		buf->port ^= xor_val;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
+		/* Do a little bit of work for each packet */
+		for (i = 0; i < num; i++) {
+			uint64_t t = rte_rdtsc()+100;
+
+			while (rte_rdtsc() < t)
+				rte_pause();
+			buf[i]->port ^= xor_val;
+		}
 	}
 	return 0;
 }
@@ -561,7 +574,8 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2);
+			rte_lcore_count() - 2,
+			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6158fa6..6e1debf 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -42,10 +42,10 @@
 #include <rte_eal_memconfig.h>
 #include <rte_compat.h>
 #include "rte_distributor_private.h"
-#include "rte_distributor_next.h"
+#include "rte_distributor.h"
 #include "rte_distributor_v20.h"
 
-TAILQ_HEAD(rte_dist_burst_list, rte_distributor_v1705);
+TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
 static struct rte_tailq_elem rte_dist_burst_tailq = {
 	.name = "RTE_DIST_BURST",
@@ -57,17 +57,17 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_request_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[worker_id]);
+	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
 	unsigned int i;
 
 	volatile int64_t *retptr64;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		rte_distributor_request_pkt(d->d_v20,
+		rte_distributor_request_pkt_v20(d->d_v20,
 			worker_id, oldpkt[0]);
 		return;
 	}
@@ -104,16 +104,16 @@ rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_poll_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	uint64_t ret;
 	int count = 0;
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
-		pkts[0] = rte_distributor_poll_pkt(d->d_v20, worker_id);
+		pkts[0] = rte_distributor_poll_pkt_v20(d->d_v20, worker_id);
 		return (pkts[0]) ? 1 : 0;
 	}
 
@@ -140,7 +140,7 @@ rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
 }
 
 int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_get_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -148,37 +148,37 @@ rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (return_count <= 1) {
-			pkts[0] = rte_distributor_get_pkt(d->d_v20,
+			pkts[0] = rte_distributor_get_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 			return (pkts[0]) ? 1 : 0;
 		} else
 			return -EINVAL;
 	}
 
-	rte_distributor_request_pkt_v1705(d, worker_id, oldpkt, return_count);
+	rte_distributor_request_pkt(d, worker_id, oldpkt, return_count);
 
-	count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+	count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	while (count == -1) {
 		uint64_t t = rte_rdtsc() + 100;
 
 		while (rte_rdtsc() < t)
 			rte_pause();
 
-		count = rte_distributor_poll_pkt_v1705(d, worker_id, pkts);
+		count = rte_distributor_poll_pkt(d, worker_id, pkts);
 	}
 	return count;
 }
 
 int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_return_pkt(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
-	struct rte_distributor_buffer_v1705 *buf = &d->bufs[worker_id];
+	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
 	unsigned int i;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		if (num == 1)
-			return rte_distributor_return_pkt(d->d_v20,
+			return rte_distributor_return_pkt_v20(d->d_v20,
 				worker_id, oldpkt[0]);
 		else
 			return -EINVAL;
@@ -202,7 +202,7 @@ rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
+store_return(uintptr_t oldbuf, struct rte_distributor *d,
 		unsigned int *ret_start, unsigned int *ret_count)
 {
 	if (!oldbuf)
@@ -221,7 +221,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor_v1705 *d,
  * workers to give us our atomic flow pinning.
  */
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
@@ -270,9 +270,9 @@ find_match_scalar(struct rte_distributor_v1705 *d,
  * the valid returned pointers (store_return).
  */
 static unsigned int
-handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
+handle_returns(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	uintptr_t oldbuf;
 	unsigned int ret_start = d->returns.start,
 			ret_count = d->returns.count;
@@ -308,9 +308,9 @@ handle_returns(struct rte_distributor_v1705 *d, unsigned int wkr)
  * before sending out new packets.
  */
 static unsigned int
-release(struct rte_distributor_v1705 *d, unsigned int wkr)
+release(struct rte_distributor *d, unsigned int wkr)
 {
-	struct rte_distributor_buffer_v1705 *buf = &(d->bufs[wkr]);
+	struct rte_distributor_buffer *buf = &(d->bufs[wkr]);
 	unsigned int i;
 
 	while (!(d->bufs[wkr].bufptr64[0] & RTE_DISTRIB_GET_BUF))
@@ -342,7 +342,7 @@ release(struct rte_distributor_v1705 *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_process(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -355,7 +355,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_process(d->d_v20, mbufs, num_mbufs);
+		return rte_distributor_process_v20(d->d_v20, mbufs, num_mbufs);
 	}
 
 	if (unlikely(num_mbufs == 0)) {
@@ -479,7 +479,7 @@ rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
+rte_distributor_returned_pkts(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -489,7 +489,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_returned_pkts(d->d_v20,
+		return rte_distributor_returned_pkts_v20(d->d_v20,
 				mbufs, max_mbufs);
 	}
 
@@ -510,7 +510,7 @@ rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
  * being workered on or queued up in a backlog.
  */
 static inline unsigned int
-total_outstanding(const struct rte_distributor_v1705 *d)
+total_outstanding(const struct rte_distributor *d)
 {
 	unsigned int wkr, total_outstanding = 0;
 
@@ -525,24 +525,24 @@ total_outstanding(const struct rte_distributor_v1705 *d)
  * queued up.
  */
 int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_flush(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		return rte_distributor_flush(d->d_v20);
+		return rte_distributor_flush_v20(d->d_v20);
 	}
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process_v1705(d, NULL, 0);
+		rte_distributor_process(d, NULL, 0);
 
 	/*
 	 * Send empty burst to all workers to allow them to exit
 	 * gracefully, should they need to.
 	 */
-	rte_distributor_process_v1705(d, NULL, 0);
+	rte_distributor_process(d, NULL, 0);
 
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		handle_returns(d, wkr);
@@ -552,13 +552,13 @@ rte_distributor_flush_v1705(struct rte_distributor_v1705 *d)
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
+rte_distributor_clear_returns(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
 	if (d->alg_type == RTE_DIST_ALG_SINGLE) {
 		/* Call the old API */
-		rte_distributor_clear_returns(d->d_v20);
+		rte_distributor_clear_returns_v20(d->d_v20);
 	}
 
 	/* throw away returns, so workers can exit */
@@ -567,13 +567,13 @@ rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name,
+struct rte_distributor *
+rte_distributor_create(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
 {
-	struct rte_distributor_v1705 *d;
+	struct rte_distributor *d;
 	struct rte_dist_burst_list *dist_burst_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
@@ -586,8 +586,8 @@ rte_distributor_create_v1705(const char *name,
 	RTE_BUILD_BUG_ON((RTE_DISTRIB_MAX_WORKERS & 7) != 0);
 
 	if (alg_type == RTE_DIST_ALG_SINGLE) {
-		d = malloc(sizeof(struct rte_distributor_v1705));
-		d->d_v20 = rte_distributor_create(name,
+		d = malloc(sizeof(struct rte_distributor));
+		d->d_v20 = rte_distributor_create_v20(name,
 				socket_id, num_workers);
 		if (d->d_v20 == NULL) {
 			/* rte_errno will have been set */
diff --git a/lib/librte_distributor/rte_distributor.h b/lib/librte_distributor/rte_distributor.h
index e41d522..9b9efdb 100644
--- a/lib/librte_distributor/rte_distributor.h
+++ b/lib/librte_distributor/rte_distributor.h
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -31,9 +30,240 @@
  *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  */
 
-#ifndef _RTE_DISTRIBUTE_H_
-#define _RTE_DISTRIBUTE_H_
+#ifndef _RTE_DISTRIBUTOR_H_
+#define _RTE_DISTRIBUTOR_H_
 
-#include <rte_distributor_v20.h>
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Type of distribution (burst/single) */
+enum rte_distributor_alg_type {
+	RTE_DIST_ALG_BURST = 0,
+	RTE_DIST_ALG_SINGLE,
+	RTE_DIST_NUM_ALG_TYPES
+};
+
+struct rte_distributor;
+struct rte_mbuf;
+
+/**
+ * Function to create a new distributor instance
+ *
+ * Reserves the memory needed for the distributor operation and
+ * initializes the distributor to work with the configured number of workers.
+ *
+ * @param name
+ *   The name to be given to the distributor instance.
+ * @param socket_id
+ *   The NUMA node on which the memory is to be allocated
+ * @param num_workers
+ *   The maximum number of workers that will request packets from this
+ *   distributor
+ * @param alg_type
+ *   Call the legacy API, or use the new burst API. The legacy API uses a
+ *   32-bit flow ID and works on a single packet at a time. The burst API
+ *   uses a 15-bit flow ID and works on up to 8 packets at a time to workers.
+ * @return
+ *   The newly created distributor instance
+ */
+struct rte_distributor *
+rte_distributor_create(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+/*  *** APIS to be called on the distributor lcore ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on a
+ * single lcore which acts as the distributor lcore for a given distributor
+ * instance. These functions cannot be called on multiple cores simultaneously
+ * without using locking to protect access to the internals of the distributor.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * Process a set of packets by distributing them among workers that request
+ * packets. The distributor will ensure that no two packets that have the
+ * same flow id, or tag, in the mbuf will be processed on different cores at
+ * the same time.
+ *
+ * The user is advised to set the tag for each mbuf before calling this
+ * function. If the tag is not set, its value is undefined and depends on the
+ * driver implementation and configuration.
+ *
+ * This is not multi-thread safe and should only be called on a single lcore.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs to be distributed
+ * @param num_mbufs
+ *   The number of mbufs in the mbufs array
+ * @return
+ *   The number of mbufs processed.
+ */
+int
+rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+/**
+ * Get a set of mbufs that have been returned to the distributor by workers
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param mbufs
+ *   The mbufs pointer array to be filled in
+ * @param max_mbufs
+ *   The size of the mbufs array
+ * @return
+ *   The number of mbufs returned in the mbufs array.
+ */
+int
+rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+/**
+ * Flush the distributor component, so that there are no in-flight or
+ * backlogged packets awaiting processing
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @return
+ *   The number of queued/in-flight packets that were completed by this call.
+ */
+int
+rte_distributor_flush(struct rte_distributor *d);
+
+/**
+ * Clears the array of returned packets used as the source for the
+ * rte_distributor_returned_pkts() API call.
+ *
+ * This should only be called on the same lcore as rte_distributor_process()
+ *
+ * @param d
+ *   The distributor instance to be used
+ */
+void
+rte_distributor_clear_returns(struct rte_distributor *d);
+
+/*  *** APIS to be called on the worker lcores ***  */
+/*
+ * The following APIs are the public APIs which are designed for use on
+ * multiple lcores which act as workers for a distributor. Each lcore should use
+ * a unique worker id when requesting packets.
+ *
+ * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
+ * for the same distributor instance, otherwise deadlock will result.
+ */
+
+/**
+ * API called by a worker to get new packets to process. Any previous packets
+ * given to the worker are assumed to have completed processing, and may be
+ * optionally returned to the distributor via the oldpkt parameter.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param pkts
+ *   The mbufs pointer array to be filled in (up to 8 packets)
+ * @param oldpkt
+ *   The previous packets, if any, being processed by the worker
+ * @param retcount
+ *   The number of packets being returned
+ *
+ * @return
+ *   The number of packets in the pkts array
+ */
+int
+rte_distributor_get_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+/**
+ * API called by a worker to return completed packets without requesting
+ * new packets, for example, because a worker thread is shutting down
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The previous packets being processed by the worker
+ * @param num
+ *   The number of packets in the oldpkt array
+ */
+int
+rte_distributor_return_pkt(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+/**
+ * API called by a worker to request new packets to process.
+ * Any previous packets given to the worker are assumed to have completed
+ * processing, and may be optionally returned to the distributor via
+ * the oldpkt parameter.
+ * Unlike rte_distributor_get_pkt(), this function does not wait for new
+ * packets to be provided by the distributor.
+ *
+ * NOTE: after calling this function, rte_distributor_poll_pkt() should
+ * be used to poll for the packets requested. The rte_distributor_get_pkt()
+ * API should *not* be used to try and retrieve the new packets.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param oldpkt
+ *   The returning packets, if any, processed by the worker
+ * @param count
+ *   The number of returning packets
+ */
+void
+rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+/**
+ * API called by a worker to check for new packets that were previously
+ * requested by a call to rte_distributor_request_pkt(). It does not wait
+ * for the new packets to be available, but returns immediately if the
+ * request has not yet been fulfilled by the distributor.
+ *
+ * @param d
+ *   The distributor instance to be used
+ * @param worker_id
+ *   The worker instance number to use - must be less than num_workers passed
+ *   at distributor creation time.
+ * @param mbufs
+ *   The array of mbufs being given to the worker
+ *
+ * @return
+ *   The number of packets being given to the worker thread, zero if no
+ *   packet is yet available.
+ */
+int
+rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
 
 #endif
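
The request/poll pair documented above splits rte_distributor_get_pkt() into a
non-blocking handshake. A minimal sketch of that pattern on a worker lcore,
assuming d, worker_id, buf[] and num are set up as in the example application;
do_other_work() is a placeholder, and the -1 sentinel follows the wait loop in
rte_distributor_get_pkt() earlier in this patch:

	/* hand back the previous burst and ask for a new one, without blocking */
	rte_distributor_request_pkt(d, worker_id, buf, num);

	/* poll until the distributor fulfils the request */
	int n = rte_distributor_poll_pkt(d, worker_id, buf);
	while (n < 0) {
		do_other_work();	/* placeholder for useful work */
		n = rte_distributor_poll_pkt(d, worker_id, buf);
	}
	num = n;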
diff --git a/lib/librte_distributor/rte_distributor_match_generic.c b/lib/librte_distributor/rte_distributor_match_generic.c
index 7c2f9f5..4925a78 100644
--- a/lib/librte_distributor/rte_distributor_match_generic.c
+++ b/lib/librte_distributor/rte_distributor_match_generic.c
@@ -35,7 +35,7 @@
 #include "rte_distributor.h"
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
diff --git a/lib/librte_distributor/rte_distributor_match_sse.c b/lib/librte_distributor/rte_distributor_match_sse.c
index b9f9bb0..44935a6 100644
--- a/lib/librte_distributor/rte_distributor_match_sse.c
+++ b/lib/librte_distributor/rte_distributor_match_sse.c
@@ -38,7 +38,7 @@
 
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr)
 {
diff --git a/lib/librte_distributor/rte_distributor_next.h b/lib/librte_distributor/rte_distributor_next.h
deleted file mode 100644
index 0034020..0000000
--- a/lib/librte_distributor/rte_distributor_next.h
+++ /dev/null
@@ -1,269 +0,0 @@
-/*-
- *   BSD LICENSE
- *
- *   Copyright(c) 2017 Intel Corporation. All rights reserved.
- *
- *   Redistribution and use in source and binary forms, with or without
- *   modification, are permitted provided that the following conditions
- *   are met:
- *
- *     * Redistributions of source code must retain the above copyright
- *       notice, this list of conditions and the following disclaimer.
- *     * Redistributions in binary form must reproduce the above copyright
- *       notice, this list of conditions and the following disclaimer in
- *       the documentation and/or other materials provided with the
- *       distribution.
- *     * Neither the name of Intel Corporation nor the names of its
- *       contributors may be used to endorse or promote products derived
- *       from this software without specific prior written permission.
- *
- *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- */
-
-#ifndef _RTE_DISTRIBUTOR_H_
-#define _RTE_DISTRIBUTOR_H_
-
-/**
- * @file
- * RTE distributor
- *
- * The distributor is a component which is designed to pass packets
- * one-at-a-time to workers, with dynamic load balancing.
- */
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-/* Type of distribution (burst/single) */
-enum rte_distributor_alg_type {
-	RTE_DIST_ALG_BURST = 0,
-	RTE_DIST_ALG_SINGLE,
-	RTE_DIST_NUM_ALG_TYPES
-};
-
-struct rte_distributor_v1705;
-struct rte_mbuf;
-
-/**
- * Function to create a new distributor instance
- *
- * Reserves the memory needed for the distributor operation and
- * initializes the distributor to work with the configured number of workers.
- *
- * @param name
- *   The name to be given to the distributor instance.
- * @param socket_id
- *   The NUMA node on which the memory is to be allocated
- * @param num_workers
- *   The maximum number of workers that will request packets from this
- *   distributor
- * @param alg_type
- *   Call the legacy API, or use the new burst API. legacy uses 32-bit
- *   flow ID, and works on a single packet at a time. Latest uses 15-
- *   bit flow ID and works on up to 8 packets at a time to worers.
- * @return
- *   The newly created distributor instance
- */
-struct rte_distributor_v1705 *
-rte_distributor_create_v1705(const char *name, unsigned int socket_id,
-		unsigned int num_workers,
-		unsigned int alg_type);
-
-/*  *** APIS to be called on the distributor lcore ***  */
-/*
- * The following APIs are the public APIs which are designed for use on a
- * single lcore which acts as the distributor lcore for a given distributor
- * instance. These functions cannot be called on multiple cores simultaneously
- * without using locking to protect access to the internals of the distributor.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * Process a set of packets by distributing them among workers that request
- * packets. The distributor will ensure that no two packets that have the
- * same flow id, or tag, in the mbuf will be processed on different cores at
- * the same time.
- *
- * The user is advocated to set tag for each mbuf before calling this function.
- * If user doesn't set the tag, the tag value can be various values depending on
- * driver implementation and configuration.
- *
- * This is not multi-thread safe and should only be called on a single lcore.
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs to be distributed
- * @param num_mbufs
- *   The number of mbufs in the mbufs array
- * @return
- *   The number of mbufs processed.
- */
-int
-rte_distributor_process_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int num_mbufs);
-
-/**
- * Get a set of mbufs that have been returned to the distributor by workers
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @param mbufs
- *   The mbufs pointer array to be filled in
- * @param max_mbufs
- *   The size of the mbufs array
- * @return
- *   The number of mbufs returned in the mbufs array.
- */
-int
-rte_distributor_returned_pkts_v1705(struct rte_distributor_v1705 *d,
-		struct rte_mbuf **mbufs, unsigned int max_mbufs);
-
-/**
- * Flush the distributor component, so that there are no in-flight or
- * backlogged packets awaiting processing
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- * @return
- *   The number of queued/in-flight packets that were completed by this call.
- */
-int
-rte_distributor_flush_v1705(struct rte_distributor_v1705 *d);
-
-/**
- * Clears the array of returned packets used as the source for the
- * rte_distributor_returned_pkts() API call.
- *
- * This should only be called on the same lcore as rte_distributor_process()
- *
- * @param d
- *   The distributor instance to be used
- */
-void
-rte_distributor_clear_returns_v1705(struct rte_distributor_v1705 *d);
-
-/*  *** APIS to be called on the worker lcores ***  */
-/*
- * The following APIs are the public APIs which are designed for use on
- * multiple lcores which act as workers for a distributor. Each lcore should use
- * a unique worker id when requesting packets.
- *
- * NOTE: a given lcore cannot act as both a distributor lcore and a worker lcore
- * for the same distributor instance, otherwise deadlock will result.
- */
-
-/**
- * API called by a worker to get new packets to process. Any previous packets
- * given to the worker is assumed to have completed processing, and may be
- * optionally returned to the distributor via the oldpkt parameter.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param pkts
- *   The mbufs pointer array to be filled in (up to 8 packets)
- * @param oldpkt
- *   The previous packet, if any, being processed by the worker
- * @param retcount
- *   The number of packets being returned
- *
- * @return
- *   The number of packets in the pkts array
- */
-int
-rte_distributor_get_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **pkts,
-	struct rte_mbuf **oldpkt, unsigned int retcount);
-
-/**
- * API called by a worker to return a completed packet without requesting a
- * new packet, for example, because a worker thread is shutting down
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The previous packets being processed by the worker
- * @param num
- *   The number of packets in the oldpkt array
- */
-int
-rte_distributor_return_pkt_v1705(struct rte_distributor_v1705 *d,
-	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
-
-/**
- * API called by a worker to request a new packet to process.
- * Any previous packet given to the worker is assumed to have completed
- * processing, and may be optionally returned to the distributor via
- * the oldpkt parameter.
- * Unlike rte_distributor_get_pkt_burst(), this function does not wait for a
- * new packet to be provided by the distributor.
- *
- * NOTE: after calling this function, rte_distributor_poll_pkt_burst() should
- * be used to poll for the packet requested. The rte_distributor_get_pkt_burst()
- * API should *not* be used to try and retrieve the new packet.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param oldpkt
- *   The returning packets, if any, processed by the worker
- * @param count
- *   The number of returning packets
- */
-void
-rte_distributor_request_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **oldpkt,
-		unsigned int count);
-
-/**
- * API called by a worker to check for a new packet that was previously
- * requested by a call to rte_distributor_request_pkt(). It does not wait
- * for the new packet to be available, but returns NULL if the request has
- * not yet been fulfilled by the distributor.
- *
- * @param d
- *   The distributor instance to be used
- * @param worker_id
- *   The worker instance number to use - must be less that num_workers passed
- *   at distributor creation time.
- * @param mbufs
- *   The array of mbufs being given to the worker
- *
- * @return
- *   The number of packets being given to the worker thread, zero if no
- *   packet is yet available.
- */
-int
-rte_distributor_poll_pkt_v1705(struct rte_distributor_v1705 *d,
-		unsigned int worker_id, struct rte_mbuf **mbufs);
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif
diff --git a/lib/librte_distributor/rte_distributor_private.h b/lib/librte_distributor/rte_distributor_private.h
index 04c9cac..250b23e 100644
--- a/lib/librte_distributor/rte_distributor_private.h
+++ b/lib/librte_distributor/rte_distributor_private.h
@@ -83,7 +83,7 @@ extern "C" {
  * the next cache line to worker 0, we pad this out to three cache lines.
  * Only 64-bits of the memory is actually used though.
  */
-union rte_distributor_buffer {
+union rte_distributor_buffer_v20 {
 	volatile int64_t bufptr64;
 	char pad[RTE_CACHE_LINE_SIZE*3];
 } __rte_cache_aligned;
@@ -108,8 +108,8 @@ struct rte_distributor_returned_pkts {
 	struct rte_mbuf *mbufs[RTE_DISTRIB_MAX_RETURNS];
 };
 
-struct rte_distributor {
-	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
+struct rte_distributor_v20 {
+	TAILQ_ENTRY(rte_distributor_v20) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -124,7 +124,7 @@ struct rte_distributor {
 
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS];
 
-	union rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
+	union rte_distributor_buffer_v20 bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 };
@@ -144,7 +144,7 @@ enum rte_distributor_match_function {
  * We can pass up to 8 mbufs at a time in one cacheline.
  * There is a separate cacheline for returns in the burst API.
  */
-struct rte_distributor_buffer_v1705 {
+struct rte_distributor_buffer {
 	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= outgoing to worker */
 
@@ -158,8 +158,8 @@ struct rte_distributor_buffer_v1705 {
 	int count __rte_cache_aligned;       /* <= number of current mbufs */
 };
 
-struct rte_distributor_v1705 {
-	TAILQ_ENTRY(rte_distributor_v1705) next;    /**< Next in list. */
+struct rte_distributor {
+	TAILQ_ENTRY(rte_distributor) next;    /**< Next in list. */
 
 	char name[RTE_DISTRIBUTOR_NAMESIZE];  /**< Name of the ring. */
 	unsigned int num_workers;             /**< Number of workers polling */
@@ -176,22 +176,22 @@ struct rte_distributor_v1705 {
 	struct rte_distributor_backlog backlog[RTE_DISTRIB_MAX_WORKERS]
 			__rte_cache_aligned;
 
-	struct rte_distributor_buffer_v1705 bufs[RTE_DISTRIB_MAX_WORKERS];
+	struct rte_distributor_buffer bufs[RTE_DISTRIB_MAX_WORKERS];
 
 	struct rte_distributor_returned_pkts returns;
 
 	enum rte_distributor_match_function dist_match_fn;
 
-	struct rte_distributor *d_v20;
+	struct rte_distributor_v20 *d_v20;
 };
 
 void
-find_match_scalar(struct rte_distributor_v1705 *d,
+find_match_scalar(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
 
 void
-find_match_vec(struct rte_distributor_v1705 *d,
+find_match_vec(struct rte_distributor *d,
 			uint16_t *data_ptr,
 			uint16_t *output_ptr);
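
As the comments in this header note, the renamed burst buffer carries up to 8
mbufs per cache line. Each 64-bit slot packs an mbuf pointer together with the
handshake flags in its low RTE_DISTRIB_FLAG_BITS bits, the same encoding used
in the request/return paths elsewhere in this patch; a small sketch, with mbuf
standing in for the packet being passed:

	/* pointer in the upper bits, handshake flag in the low bits */
	int64_t slot = (((int64_t)(uintptr_t)mbuf) << RTE_DISTRIB_FLAG_BITS)
			| RTE_DISTRIB_GET_BUF;

	/* the other side spins until the flag changes hands, as in release() */
	while (!(buf->bufptr64[0] & RTE_DISTRIB_GET_BUF))
		rte_pause();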
 
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index be297ec..1f406c5 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -43,7 +43,7 @@
 #include "rte_distributor_v20.h"
 #include "rte_distributor_private.h"
 
-TAILQ_HEAD(rte_distributor_list, rte_distributor);
+TAILQ_HEAD(rte_distributor_list, rte_distributor_v20);
 
 static struct rte_tailq_elem rte_distributor_tailq = {
 	.name = "RTE_DISTRIBUTOR",
@@ -53,10 +53,10 @@ EAL_REGISTER_TAILQ(rte_distributor_tailq)
 /**** APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	int64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_GET_BUF;
 	while (unlikely(buf->bufptr64 & RTE_DISTRIB_FLAGS_MASK))
@@ -65,10 +65,10 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	if (buf->bufptr64 & RTE_DISTRIB_GET_BUF)
 		return NULL;
 
@@ -78,21 +78,21 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 }
 
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
 	struct rte_mbuf *ret;
-	rte_distributor_request_pkt(d, worker_id, oldpkt);
-	while ((ret = rte_distributor_poll_pkt(d, worker_id)) == NULL)
+	rte_distributor_request_pkt_v20(d, worker_id, oldpkt);
+	while ((ret = rte_distributor_poll_pkt_v20(d, worker_id)) == NULL)
 		rte_pause();
 	return ret;
 }
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned worker_id, struct rte_mbuf *oldpkt)
 {
-	union rte_distributor_buffer *buf = &d->bufs[worker_id];
+	union rte_distributor_buffer_v20 *buf = &d->bufs[worker_id];
 	uint64_t req = (((int64_t)(uintptr_t)oldpkt) << RTE_DISTRIB_FLAG_BITS)
 			| RTE_DISTRIB_RETURN_BUF;
 	buf->bufptr64 = req;
@@ -123,7 +123,7 @@ backlog_pop(struct rte_distributor_backlog *bl)
 
 /* stores a packet returned from a worker inside the returns array */
 static inline void
-store_return(uintptr_t oldbuf, struct rte_distributor *d,
+store_return(uintptr_t oldbuf, struct rte_distributor_v20 *d,
 		unsigned *ret_start, unsigned *ret_count)
 {
 	/* store returns in a circular buffer - code is branch-free */
@@ -134,7 +134,7 @@ store_return(uintptr_t oldbuf, struct rte_distributor *d,
 }
 
 static inline void
-handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
+handle_worker_shutdown(struct rte_distributor_v20 *d, unsigned int wkr)
 {
 	d->in_flight_tags[wkr] = 0;
 	d->in_flight_bitmask &= ~(1UL << wkr);
@@ -164,7 +164,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
 		 * Note that the tags were set before first level call
 		 * to rte_distributor_process.
 		 */
-		rte_distributor_process(d, pkts, i);
+		rte_distributor_process_v20(d, pkts, i);
 		bl->count = bl->start = 0;
 	}
 }
@@ -174,7 +174,7 @@ handle_worker_shutdown(struct rte_distributor *d, unsigned wkr)
  * to do a partial flush.
  */
 static int
-process_returns(struct rte_distributor *d)
+process_returns(struct rte_distributor_v20 *d)
 {
 	unsigned wkr;
 	unsigned flushed = 0;
@@ -213,7 +213,7 @@ process_returns(struct rte_distributor *d)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned num_mbufs)
 {
 	unsigned next_idx = 0;
@@ -317,7 +317,7 @@ rte_distributor_process(struct rte_distributor *d,
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -338,7 +338,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
 static inline unsigned
-total_outstanding(const struct rte_distributor *d)
+total_outstanding(const struct rte_distributor_v20 *d)
 {
 	unsigned wkr, total_outstanding;
 
@@ -353,19 +353,19 @@ total_outstanding(const struct rte_distributor *d)
 /* flush the distributor, so that there are no outstanding packets in flight or
  * queued up. */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 {
 	const unsigned flushed = total_outstanding(d);
 
 	while (total_outstanding(d) > 0)
-		rte_distributor_process(d, NULL, 0);
+		rte_distributor_process_v20(d, NULL, 0);
 
 	return flushed;
 }
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 {
 	d->returns.start = d->returns.count = 0;
 #ifndef __OPTIMIZE__
@@ -374,12 +374,12 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 }
 
 /* creates a distributor instance */
-struct rte_distributor *
-rte_distributor_create(const char *name,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name,
 		unsigned socket_id,
 		unsigned num_workers)
 {
-	struct rte_distributor *d;
+	struct rte_distributor_v20 *d;
 	struct rte_distributor_list *distributor_list;
 	char mz_name[RTE_MEMZONE_NAMESIZE];
 	const struct rte_memzone *mz;
diff --git a/lib/librte_distributor/rte_distributor_v20.h b/lib/librte_distributor/rte_distributor_v20.h
index b69aa27..f02e6aa 100644
--- a/lib/librte_distributor/rte_distributor_v20.h
+++ b/lib/librte_distributor/rte_distributor_v20.h
@@ -48,7 +48,7 @@ extern "C" {
 
 #define RTE_DISTRIBUTOR_NAMESIZE 32 /**< Length of name for instance */
 
-struct rte_distributor;
+struct rte_distributor_v20;
 struct rte_mbuf;
 
 /**
@@ -67,8 +67,8 @@ struct rte_mbuf;
  * @return
  *   The newly created distributor instance
  */
-struct rte_distributor *
-rte_distributor_create(const char *name, unsigned int socket_id,
+struct rte_distributor_v20 *
+rte_distributor_create_v20(const char *name, unsigned int socket_id,
 		unsigned int num_workers);
 
 /*  *** APIS to be called on the distributor lcore ***  */
@@ -103,7 +103,7 @@ rte_distributor_create(const char *name, unsigned int socket_id,
  *   The number of mbufs processed.
  */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs);
 
 /**
@@ -121,7 +121,7 @@ rte_distributor_process(struct rte_distributor *d,
  *   The number of mbufs returned in the mbufs array.
  */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs);
 
 /**
@@ -136,7 +136,7 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
  *   The number of queued/in-flight packets that were completed by this call.
  */
 int
-rte_distributor_flush(struct rte_distributor *d);
+rte_distributor_flush_v20(struct rte_distributor_v20 *d);
 
 /**
  * Clears the array of returned packets used as the source for the
@@ -148,7 +148,7 @@ rte_distributor_flush(struct rte_distributor *d);
  *   The distributor instance to be used
  */
 void
-rte_distributor_clear_returns(struct rte_distributor *d);
+rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d);
 
 /*  *** APIS to be called on the worker lcores ***  */
 /*
@@ -177,7 +177,7 @@ rte_distributor_clear_returns(struct rte_distributor *d);
  *   A new packet to be processed by the worker thread.
  */
 struct rte_mbuf *
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -193,8 +193,8 @@ rte_distributor_get_pkt(struct rte_distributor *d,
  *   The previous packet being processed by the worker
  */
 int
-rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
-		struct rte_mbuf *mbuf);
+rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
+		unsigned int worker_id, struct rte_mbuf *mbuf);
 
 /**
  * API called by a worker to request a new packet to process.
@@ -217,7 +217,7 @@ rte_distributor_return_pkt(struct rte_distributor *d, unsigned int worker_id,
  *   The previous packet, if any, being processed by the worker
  */
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id, struct rte_mbuf *oldpkt);
 
 /**
@@ -237,7 +237,7 @@ rte_distributor_request_pkt(struct rte_distributor *d,
  *   packet is yet available.
  */
 struct rte_mbuf *
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 		unsigned int worker_id);
 
 #ifdef __cplusplus
diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 6059a0c..7a30513 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -87,20 +87,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
-	struct rte_distributor *d = wp->dist;
-
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	struct rte_distributor *db = wp->dist;
+	unsigned int count = 0, num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	int i;
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(db, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(db, id,
+				buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(db, id, buf, num);
 	return 0;
 }
 
@@ -118,9 +123,11 @@ handle_work(void *arg)
 static int
 sanity_test(struct worker_params *wp, struct rte_mempool *p)
 {
-	struct rte_distributor *d = wp->dist;
+	struct rte_distributor *db = wp->dist;
 	struct rte_mbuf *bufs[BURST];
-	unsigned i;
+	struct rte_mbuf *returns[BURST*2];
+	unsigned int i, count;
+	unsigned int retries;
 
 	printf("=== Basic distributor sanity tests ===\n");
 	clear_packet_count();
@@ -134,8 +141,15 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BURST; i++)
 		bufs[i]->hash.usr = 0;
 
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -147,8 +161,6 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		printf("Worker %u handled %u packets\n", i,
 				worker_stats[i].handled_packets);
 	printf("Sanity test with all zero hashes done.\n");
-	if (worker_stats[0].handled_packets != BURST)
-		return -1;
 
 	/* pick two flows and check they go correctly */
 	if (rte_lcore_count() >= 3) {
@@ -156,8 +168,13 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 		for (i = 0; i < BURST; i++)
 			bufs[i]->hash.usr = (i & 1) << 8;
 
-		rte_distributor_process(d, bufs, BURST);
-		rte_distributor_flush(d);
+		rte_distributor_process(db, bufs, BURST);
+		count = 0;
+		do {
+			rte_distributor_flush(db);
+			count += rte_distributor_returned_pkts(db,
+					returns, BURST*2);
+		} while (count < BURST);
 		if (total_packet_count() != BURST) {
 			printf("Line %d: Error, not all packets flushed. "
 					"Expected %u, got %u\n",
@@ -169,20 +186,21 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 			printf("Worker %u handled %u packets\n", i,
 					worker_stats[i].handled_packets);
 		printf("Sanity test with two hash values done\n");
-
-		if (worker_stats[0].handled_packets != 16 ||
-				worker_stats[1].handled_packets != 16)
-			return -1;
 	}
 
 	/* give a different hash value to each packet,
 	 * so load gets distributed */
 	clear_packet_count();
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = i;
-
-	rte_distributor_process(d, bufs, BURST);
-	rte_distributor_flush(d);
+		bufs[i]->hash.usr = i+1;
+
+	rte_distributor_process(db, bufs, BURST);
+	count = 0;
+	do {
+		rte_distributor_flush(db);
+		count += rte_distributor_returned_pkts(db,
+				returns, BURST*2);
+	} while (count < BURST);
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -204,8 +222,9 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	unsigned num_returned = 0;
 
 	/* flush out any remaining packets */
-	rte_distributor_flush(d);
-	rte_distributor_clear_returns(d);
+	rte_distributor_flush(db);
+	rte_distributor_clear_returns(db);
+
 	if (rte_mempool_get_bulk(p, (void *)many_bufs, BIG_BATCH) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
@@ -213,28 +232,44 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 	for (i = 0; i < BIG_BATCH; i++)
 		many_bufs[i]->hash.usr = i << 2;
 
+	printf("=== testing big burst (%s) ===\n", wp->name);
 	for (i = 0; i < BIG_BATCH/BURST; i++) {
-		rte_distributor_process(d, &many_bufs[i*BURST], BURST);
-		num_returned += rte_distributor_returned_pkts(d,
+		rte_distributor_process(db,
+				&many_bufs[i*BURST], BURST);
+		count = rte_distributor_returned_pkts(db,
 				&return_bufs[num_returned],
 				BIG_BATCH - num_returned);
+		num_returned += count;
 	}
-	rte_distributor_flush(d);
-	num_returned += rte_distributor_returned_pkts(d,
-			&return_bufs[num_returned], BIG_BATCH - num_returned);
+	rte_distributor_flush(db);
+	count = rte_distributor_returned_pkts(db,
+		&return_bufs[num_returned],
+			BIG_BATCH - num_returned);
+	num_returned += count;
+	retries = 0;
+	do {
+		rte_distributor_flush(db);
+		count = rte_distributor_returned_pkts(db,
+				&return_bufs[num_returned],
+				BIG_BATCH - num_returned);
+		num_returned += count;
+		retries++;
+	} while ((num_returned < BIG_BATCH) && (retries < 100));
 
 	if (num_returned != BIG_BATCH) {
-		printf("line %d: Number returned is not the same as "
-				"number sent\n", __LINE__);
+		printf("line %d: Missing packets, expected %d\n",
+				__LINE__, num_returned);
 		return -1;
 	}
+
 	/* big check -  make sure all packets made it back!! */
 	for (i = 0; i < BIG_BATCH; i++) {
 		unsigned j;
 		struct rte_mbuf *src = many_bufs[i];
-		for (j = 0; j < BIG_BATCH; j++)
+		for (j = 0; j < BIG_BATCH; j++) {
 			if (return_bufs[j] == src)
 				break;
+		}
 
 		if (j == BIG_BATCH) {
 			printf("Error: could not find source packet #%u\n", i);
@@ -258,20 +293,28 @@ sanity_test(struct worker_params *wp, struct rte_mempool *p)
 static int
 handle_work_with_free_mbufs(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
-
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	unsigned int count = 0;
+	unsigned int i;
+	unsigned int num = 0;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -287,7 +330,8 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	unsigned i;
 	struct rte_mbuf *bufs[BURST];
 
-	printf("=== Sanity test with mbuf alloc/free  ===\n");
+	printf("=== Sanity test with mbuf alloc/free (%s) ===\n", wp->name);
+
 	clear_packet_count();
 	for (i = 0; i < ((1<<ITER_POWER)); i += BURST) {
 		unsigned j;
@@ -302,6 +346,9 @@ sanity_test_with_mbuf_alloc(struct worker_params *wp, struct rte_mempool *p)
 	}
 
 	rte_distributor_flush(d);
+
+	rte_delay_us(10000);
+
 	if (total_packet_count() < (1<<ITER_POWER)) {
 		printf("Line %u: Packet count is incorrect, %u, expected %u\n",
 				__LINE__, total_packet_count(),
@@ -317,21 +364,32 @@ static int
 handle_work_for_shutdown_test(void *arg)
 {
 	struct rte_mbuf *pkt = NULL;
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 	struct worker_params *wp = arg;
 	struct rte_distributor *d = wp->dist;
-	unsigned count = 0;
-	const unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	unsigned int total = 0;
+	unsigned int i;
+	unsigned int returned = 0;
+	const unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
 	/* wait for quit single globally, or for worker zero, wait
 	 * for zero_quit */
 	while (!quit && !(id == 0 && zero_quit)) {
-		worker_stats[id].handled_packets++, count++;
-		rte_pktmbuf_free(pkt);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		for (i = 0; i < num; i++)
+			rte_pktmbuf_free(buf[i]);
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+		total += num;
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	returned = rte_distributor_return_pkt(d, id, buf, num);
 
 	if (id == 0) {
 		/* for worker zero, allow it to restart to pick up last packet
@@ -339,13 +397,18 @@ handle_work_for_shutdown_test(void *arg)
 		 */
 		while (zero_quit)
 			usleep(100);
-		pkt = rte_distributor_get_pkt(d, id, NULL);
+
+		num = rte_distributor_get_pkt(d,
+				id, buf, buf, num);
+
 		while (!quit) {
 			worker_stats[id].handled_packets++, count++;
 			rte_pktmbuf_free(pkt);
-			pkt = rte_distributor_get_pkt(d, id, NULL);
+			num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		}
-		rte_distributor_return_pkt(d, id, pkt);
+		returned = rte_distributor_return_pkt(d,
+				id, buf, num);
+		printf("Num returned = %d\n", returned);
 	}
 	return 0;
 }
@@ -367,17 +430,22 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 	printf("=== Sanity test of worker shutdown ===\n");
 
 	clear_packet_count();
+
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
 		printf("line %d: Error getting mbufs from pool\n", __LINE__);
 		return -1;
 	}
 
-	/* now set all hash values in all buffers to zero, so all pkts go to the
-	 * one worker thread */
+	/*
+	 * Now set all hash values in all buffers to the same value so all
+	 * pkts go to the one worker thread
+	 */
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	rte_distributor_process(d, bufs, BURST);
+	rte_distributor_flush(d);
+
 	/* at this point, we will have processed some packets and have a full
 	 * backlog for the other ones at worker 0.
 	 */
@@ -388,7 +456,7 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 	for (i = 0; i < BURST; i++)
-		bufs[i]->hash.usr = 0;
+		bufs[i]->hash.usr = 1;
 
 	/* get worker zero to quit */
 	zero_quit = 1;
@@ -396,6 +464,12 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 
 	/* flush the distributor */
 	rte_distributor_flush(d);
+	rte_delay_us(10000);
+
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST * 2) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -403,10 +477,6 @@ sanity_test_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Sanity test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -422,7 +492,7 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	struct rte_mbuf *bufs[BURST];
 	unsigned i;
 
-	printf("=== Test flush fn with worker shutdown ===\n");
+	printf("=== Test flush fn with worker shutdown (%s) ===\n", wp->name);
 
 	clear_packet_count();
 	if (rte_mempool_get_bulk(p, (void *)bufs, BURST) != 0) {
@@ -446,7 +516,13 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 	/* flush the distributor */
 	rte_distributor_flush(d);
 
+	rte_delay_us(10000);
+
 	zero_quit = 0;
+	for (i = 0; i < rte_lcore_count() - 1; i++)
+		printf("Worker %u handled %u packets\n", i,
+				worker_stats[i].handled_packets);
+
 	if (total_packet_count() != BURST) {
 		printf("Line %d: Error, not all packets flushed. "
 				"Expected %u, got %u\n",
@@ -454,10 +530,6 @@ test_flush_with_worker_shutdown(struct worker_params *wp,
 		return -1;
 	}
 
-	for (i = 0; i < rte_lcore_count() - 1; i++)
-		printf("Worker %u handled %u packets\n", i,
-				worker_stats[i].handled_packets);
-
 	printf("Flush test with worker shutdown passed\n\n");
 	return 0;
 }
@@ -469,7 +541,9 @@ int test_error_distributor_create_name(void)
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
-			rte_lcore_count() - 1);
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
@@ -483,8 +557,10 @@ static
 int test_error_distributor_create_numworkers(void)
 {
 	struct rte_distributor *d = NULL;
+
 	d = rte_distributor_create("test_numworkers", rte_socket_id(),
-			RTE_MAX_LCORE + 10);
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
@@ -530,10 +606,11 @@ test_distributor(void)
 	}
 
 	if (d == NULL) {
-		d = rte_distributor_create("Test_distributor", rte_socket_id(),
-				rte_lcore_count() - 1);
+		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
 		if (d == NULL) {
-			printf("Error creating distributor\n");
+			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
@@ -553,7 +630,7 @@ test_distributor(void)
 	}
 
 	worker_params.dist = d;
-	sprintf(worker_params.name, "single");
+	sprintf(worker_params.name, "burst");
 
 	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
 	if (sanity_test(&worker_params, p) < 0)
diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 7947fe9..1dd326b 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -129,18 +129,25 @@ clear_packet_count(void)
 static int
 handle_work(void *arg)
 {
-	struct rte_mbuf *pkt = NULL;
 	struct rte_distributor *d = arg;
-	unsigned count = 0;
-	unsigned id = __sync_fetch_and_add(&worker_idx, 1);
+	unsigned int count = 0;
+	unsigned int num = 0;
+	int i;
+	unsigned int id = __sync_fetch_and_add(&worker_idx, 1);
+	struct rte_mbuf *buf[8] __rte_cache_aligned;
 
-	pkt = rte_distributor_get_pkt(d, id, NULL);
+	for (i = 0; i < 8; i++)
+		buf[i] = NULL;
+
+	num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	while (!quit) {
-		worker_stats[id].handled_packets++, count++;
-		pkt = rte_distributor_get_pkt(d, id, pkt);
+		worker_stats[id].handled_packets += num;
+		count += num;
+		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 	}
-	worker_stats[id].handled_packets++, count++;
-	rte_distributor_return_pkt(d, id, pkt);
+	worker_stats[id].handled_packets += num;
+	count += num;
+	rte_distributor_return_pkt(d, id, buf, num);
 	return 0;
 }
 
@@ -228,7 +235,8 @@ test_distributor_perf(void)
 
 	if (d == NULL) {
 		d = rte_distributor_create("Test_perf", rte_socket_id(),
-				rte_lcore_count() - 1);
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_SINGLE);
 		if (d == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 07/18] lib: make v20 header file private
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (5 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 06/18] lib: switch distributor over to new API David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-27 13:10                                       ` Thomas Monjalon
  2017-03-20 10:08                                     ` [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
                                                       ` (11 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index a812fe4..2b28eff 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -57,7 +57,6 @@ endif
 
 # install this header file
 SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
-SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
 
 # this lib needs eal
 DEPDIRS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) += lib/librte_eal
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 08/18] lib: add symbol versioning to distributor
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (6 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 07/18] lib: make v20 header file private David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-27 13:02                                       ` Thomas Monjalon
  2017-03-20 10:08                                     ` [PATCH v11 09/18] test: test single and burst distributor API David Hunt
                                                       ` (10 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Also bumped up the ABI version number in the Makefile
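For readers unfamiliar with the rte_compat.h macros used below, a generic sketch of the pattern; "foo" is an illustrative name and not part of the distributor API:

#include <rte_compat.h>

/* old behaviour, kept for binaries linked against DPDK_2.0 */
int foo_v20(int x);
int foo_v20(int x) { return x; }
VERSION_SYMBOL(foo, _v20, 2.0);

/* new behaviour, the default symbol from DPDK_17.05 onwards */
int foo_v1705(int x);
int foo_v1705(int x) { return 2 * x; }
BIND_DEFAULT_SYMBOL(foo, _v1705, 17.05);
MAP_STATIC_SYMBOL(int foo(int x), foo_v1705);

Old binaries linked against DPDK_2.0 keep resolving to the _v20 implementation, while newly built shared-library users and static links get the _v1705 one; the bare names must also be listed in the library's version map, as this patch does for the distributor functions.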

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/librte_distributor/Makefile                    |  2 +-
 lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
 lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
 lib/librte_distributor/rte_distributor_v20.c       | 10 +++
 lib/librte_distributor/rte_distributor_version.map | 14 ++++
 5 files changed, 162 insertions(+), 10 deletions(-)
 create mode 100644 lib/librte_distributor/rte_distributor_v1705.h

diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
index 2b28eff..2f05cf3 100644
--- a/lib/librte_distributor/Makefile
+++ b/lib/librte_distributor/Makefile
@@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
 
 EXPORT_MAP := rte_distributor_version.map
 
-LIBABIVER := 1
+LIBABIVER := 2
 
 # all source are stored in SRCS-y
 SRCS-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR) := rte_distributor_v20.c
diff --git a/lib/librte_distributor/rte_distributor.c b/lib/librte_distributor/rte_distributor.c
index 6e1debf..06df13d 100644
--- a/lib/librte_distributor/rte_distributor.c
+++ b/lib/librte_distributor/rte_distributor.c
@@ -36,6 +36,7 @@
 #include <rte_mbuf.h>
 #include <rte_memory.h>
 #include <rte_cycles.h>
+#include <rte_compat.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
 #include <rte_string_fns.h>
@@ -44,6 +45,7 @@
 #include "rte_distributor_private.h"
 #include "rte_distributor.h"
 #include "rte_distributor_v20.h"
+#include "rte_distributor_v1705.h"
 
 TAILQ_HEAD(rte_dist_burst_list, rte_distributor);
 
@@ -57,7 +59,7 @@ EAL_REGISTER_TAILQ(rte_dist_burst_tailq)
 /**** Burst Packet APIs called by workers ****/
 
 void
-rte_distributor_request_pkt(struct rte_distributor *d,
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt,
 		unsigned int count)
 {
@@ -102,9 +104,14 @@ rte_distributor_request_pkt(struct rte_distributor *d,
 	 */
 	*retptr64 |= RTE_DISTRIB_GET_BUF;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_request_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_request_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count),
+		rte_distributor_request_pkt_v1705);
 
 int
-rte_distributor_poll_pkt(struct rte_distributor *d,
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -138,9 +145,13 @@ rte_distributor_poll_pkt(struct rte_distributor *d,
 
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_poll_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_poll_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts),
+		rte_distributor_poll_pkt_v1705);
 
 int
-rte_distributor_get_pkt(struct rte_distributor *d,
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **pkts,
 		struct rte_mbuf **oldpkt, unsigned int return_count)
 {
@@ -168,9 +179,14 @@ rte_distributor_get_pkt(struct rte_distributor *d,
 	}
 	return count;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_get_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_get_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **pkts,
+		struct rte_mbuf **oldpkt, unsigned int return_count),
+		rte_distributor_get_pkt_v1705);
 
 int
-rte_distributor_return_pkt(struct rte_distributor *d,
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
 		unsigned int worker_id, struct rte_mbuf **oldpkt, int num)
 {
 	struct rte_distributor_buffer *buf = &d->bufs[worker_id];
@@ -197,6 +213,10 @@ rte_distributor_return_pkt(struct rte_distributor *d,
 
 	return 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_return_pkt, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_return_pkt(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt, int num),
+		rte_distributor_return_pkt_v1705);
 
 /**** APIs called on distributor core ***/
 
@@ -342,7 +362,7 @@ release(struct rte_distributor *d, unsigned int wkr)
 
 /* process a set of packets to distribute them to workers */
 int
-rte_distributor_process(struct rte_distributor *d,
+rte_distributor_process_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int num_mbufs)
 {
 	unsigned int next_idx = 0;
@@ -476,10 +496,14 @@ rte_distributor_process(struct rte_distributor *d,
 
 	return num_mbufs;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_process, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_process(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs),
+		rte_distributor_process_v1705);
 
 /* return to the caller, packets returned from workers */
 int
-rte_distributor_returned_pkts(struct rte_distributor *d,
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
 		struct rte_mbuf **mbufs, unsigned int max_mbufs)
 {
 	struct rte_distributor_returned_pkts *returns = &d->returns;
@@ -504,6 +528,10 @@ rte_distributor_returned_pkts(struct rte_distributor *d,
 
 	return retval;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_returned_pkts, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_returned_pkts(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs),
+		rte_distributor_returned_pkts_v1705);
 
 /*
  * Return the number of packets in-flight in a distributor, i.e. packets
@@ -525,7 +553,7 @@ total_outstanding(const struct rte_distributor *d)
  * queued up.
  */
 int
-rte_distributor_flush(struct rte_distributor *d)
+rte_distributor_flush_v1705(struct rte_distributor *d)
 {
 	const unsigned int flushed = total_outstanding(d);
 	unsigned int wkr;
@@ -549,10 +577,13 @@ rte_distributor_flush(struct rte_distributor *d)
 
 	return flushed;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_flush, _v1705, 17.05);
+MAP_STATIC_SYMBOL(int rte_distributor_flush(struct rte_distributor *d),
+		rte_distributor_flush_v1705);
 
 /* clears the internal returns array in the distributor */
 void
-rte_distributor_clear_returns(struct rte_distributor *d)
+rte_distributor_clear_returns_v1705(struct rte_distributor *d)
 {
 	unsigned int wkr;
 
@@ -565,10 +596,13 @@ rte_distributor_clear_returns(struct rte_distributor *d)
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		d->bufs[wkr].retptr64[0] = 0;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_clear_returns, _v1705, 17.05);
+MAP_STATIC_SYMBOL(void rte_distributor_clear_returns(struct rte_distributor *d),
+		rte_distributor_clear_returns_v1705);
 
 /* creates a distributor instance */
 struct rte_distributor *
-rte_distributor_create(const char *name,
+rte_distributor_create_v1705(const char *name,
 		unsigned int socket_id,
 		unsigned int num_workers,
 		unsigned int alg_type)
@@ -638,3 +672,8 @@ rte_distributor_create(const char *name,
 
 	return d;
 }
+BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);
+MAP_STATIC_SYMBOL(struct rte_distributor *rte_distributor_create(
+		const char *name, unsigned int socket_id,
+		unsigned int num_workers, unsigned int alg_type),
+		rte_distributor_create_v1705);
diff --git a/lib/librte_distributor/rte_distributor_v1705.h b/lib/librte_distributor/rte_distributor_v1705.h
new file mode 100644
index 0000000..81b2691
--- /dev/null
+++ b/lib/librte_distributor/rte_distributor_v1705.h
@@ -0,0 +1,89 @@
+/*-
+ *   BSD LICENSE
+ *
+ *   Copyright(c) 2017 Intel Corporation. All rights reserved.
+ *
+ *   Redistribution and use in source and binary forms, with or without
+ *   modification, are permitted provided that the following conditions
+ *   are met:
+ *
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in
+ *       the documentation and/or other materials provided with the
+ *       distribution.
+ *     * Neither the name of Intel Corporation nor the names of its
+ *       contributors may be used to endorse or promote products derived
+ *       from this software without specific prior written permission.
+ *
+ *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _RTE_DISTRIB_V1705_H_
+#define _RTE_DISTRIB_V1705_H_
+
+/**
+ * @file
+ * RTE distributor
+ *
+ * The distributor is a component which is designed to pass packets
+ * one-at-a-time to workers, with dynamic load balancing.
+ */
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+struct rte_distributor *
+rte_distributor_create_v1705(const char *name, unsigned int socket_id,
+		unsigned int num_workers,
+		unsigned int alg_type);
+
+int
+rte_distributor_process_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int num_mbufs);
+
+int
+rte_distributor_returned_pkts_v1705(struct rte_distributor *d,
+		struct rte_mbuf **mbufs, unsigned int max_mbufs);
+
+int
+rte_distributor_flush_v1705(struct rte_distributor *d);
+
+void
+rte_distributor_clear_returns_v1705(struct rte_distributor *d);
+
+int
+rte_distributor_get_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **pkts,
+	struct rte_mbuf **oldpkt, unsigned int retcount);
+
+int
+rte_distributor_return_pkt_v1705(struct rte_distributor *d,
+	unsigned int worker_id, struct rte_mbuf **oldpkt, int num);
+
+void
+rte_distributor_request_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **oldpkt,
+		unsigned int count);
+
+int
+rte_distributor_poll_pkt_v1705(struct rte_distributor *d,
+		unsigned int worker_id, struct rte_mbuf **mbufs);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/librte_distributor/rte_distributor_v20.c b/lib/librte_distributor/rte_distributor_v20.c
index 1f406c5..bb6c5d7 100644
--- a/lib/librte_distributor/rte_distributor_v20.c
+++ b/lib/librte_distributor/rte_distributor_v20.c
@@ -38,6 +38,7 @@
 #include <rte_memory.h>
 #include <rte_memzone.h>
 #include <rte_errno.h>
+#include <rte_compat.h>
 #include <rte_string_fns.h>
 #include <rte_eal_memconfig.h>
 #include "rte_distributor_v20.h"
@@ -63,6 +64,7 @@ rte_distributor_request_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	buf->bufptr64 = req;
 }
+VERSION_SYMBOL(rte_distributor_request_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
@@ -76,6 +78,7 @@ rte_distributor_poll_pkt_v20(struct rte_distributor_v20 *d,
 	int64_t ret = buf->bufptr64 >> RTE_DISTRIB_FLAG_BITS;
 	return (struct rte_mbuf *)((uintptr_t)ret);
 }
+VERSION_SYMBOL(rte_distributor_poll_pkt, _v20, 2.0);
 
 struct rte_mbuf *
 rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
@@ -87,6 +90,7 @@ rte_distributor_get_pkt_v20(struct rte_distributor_v20 *d,
 		rte_pause();
 	return ret;
 }
+VERSION_SYMBOL(rte_distributor_get_pkt, _v20, 2.0);
 
 int
 rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
@@ -98,6 +102,7 @@ rte_distributor_return_pkt_v20(struct rte_distributor_v20 *d,
 	buf->bufptr64 = req;
 	return 0;
 }
+VERSION_SYMBOL(rte_distributor_return_pkt, _v20, 2.0);
 
 /**** APIs called on distributor core ***/
 
@@ -314,6 +319,7 @@ rte_distributor_process_v20(struct rte_distributor_v20 *d,
 	d->returns.count = ret_count;
 	return num_mbufs;
 }
+VERSION_SYMBOL(rte_distributor_process, _v20, 2.0);
 
 /* return to the caller, packets returned from workers */
 int
@@ -334,6 +340,7 @@ rte_distributor_returned_pkts_v20(struct rte_distributor_v20 *d,
 
 	return retval;
 }
+VERSION_SYMBOL(rte_distributor_returned_pkts, _v20, 2.0);
 
 /* return the number of packets in-flight in a distributor, i.e. packets
  * being workered on or queued up in a backlog. */
@@ -362,6 +369,7 @@ rte_distributor_flush_v20(struct rte_distributor_v20 *d)
 
 	return flushed;
 }
+VERSION_SYMBOL(rte_distributor_flush, _v20, 2.0);
 
 /* clears the internal returns array in the distributor */
 void
@@ -372,6 +380,7 @@ rte_distributor_clear_returns_v20(struct rte_distributor_v20 *d)
 	memset(d->returns.mbufs, 0, sizeof(d->returns.mbufs));
 #endif
 }
+VERSION_SYMBOL(rte_distributor_clear_returns, _v20, 2.0);
 
 /* creates a distributor instance */
 struct rte_distributor_v20 *
@@ -415,3 +424,4 @@ rte_distributor_create_v20(const char *name,
 
 	return d;
 }
+VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);
diff --git a/lib/librte_distributor/rte_distributor_version.map b/lib/librte_distributor/rte_distributor_version.map
index 73fdc43..3a285b3 100644
--- a/lib/librte_distributor/rte_distributor_version.map
+++ b/lib/librte_distributor/rte_distributor_version.map
@@ -13,3 +13,17 @@ DPDK_2.0 {
 
 	local: *;
 };
+
+DPDK_17.05 {
+	global:
+
+	rte_distributor_clear_returns;
+	rte_distributor_create;
+	rte_distributor_flush;
+	rte_distributor_get_pkt;
+	rte_distributor_poll_pkt;
+	rte_distributor_process;
+	rte_distributor_request_pkt;
+	rte_distributor_return_pkt;
+	rte_distributor_returned_pkts;
+} DPDK_2.0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 09/18] test: test single and burst distributor API
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (7 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 10/18] test: add perf test for distributor burst mode David Hunt
                                                       ` (9 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_distributor.c | 116 ++++++++++++++++++++++++++++++-------------
 1 file changed, 82 insertions(+), 34 deletions(-)

diff --git a/test/test/test_distributor.c b/test/test/test_distributor.c
index 7a30513..890a852 100644
--- a/test/test/test_distributor.c
+++ b/test/test/test_distributor.c
@@ -538,17 +538,25 @@ static
 int test_error_distributor_create_name(void)
 {
 	struct rte_distributor *d = NULL;
+	struct rte_distributor *db = NULL;
 	char *name = NULL;
 
 	d = rte_distributor_create(name, rte_socket_id(),
 			rte_lcore_count() - 1,
-			RTE_DIST_ALG_BURST);
-
+			RTE_DIST_ALG_SINGLE);
 	if (d != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with NULL name param\n");
 		return -1;
 	}
 
+	db = rte_distributor_create(name, rte_socket_id(),
+			rte_lcore_count() - 1,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() with NULL param\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -556,15 +564,25 @@ int test_error_distributor_create_name(void)
 static
 int test_error_distributor_create_numworkers(void)
 {
-	struct rte_distributor *d = NULL;
+	struct rte_distributor *ds = NULL;
+	struct rte_distributor *db = NULL;
 
-	d = rte_distributor_create("test_numworkers", rte_socket_id(),
+	ds = rte_distributor_create("test_numworkers", rte_socket_id(),
 			RTE_MAX_LCORE + 10,
-			RTE_DIST_ALG_BURST);
-	if (d != NULL || rte_errno != EINVAL) {
+			RTE_DIST_ALG_SINGLE);
+	if (ds != NULL || rte_errno != EINVAL) {
 		printf("ERROR: No error on create() with num_workers > MAX\n");
 		return -1;
 	}
+
+	db = rte_distributor_create("test_numworkers", rte_socket_id(),
+			RTE_MAX_LCORE + 10,
+			RTE_DIST_ALG_BURST);
+	if (db != NULL || rte_errno != EINVAL) {
+		printf("ERROR: No error on create() num_workers > MAX\n");
+		return -1;
+	}
+
 	return 0;
 }
 
@@ -597,25 +615,42 @@ quit_workers(struct worker_params *wp, struct rte_mempool *p)
 static int
 test_distributor(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
+	static struct rte_distributor *dist[2];
 	static struct rte_mempool *p;
+	int i;
 
 	if (rte_lcore_count() < 2) {
 		printf("ERROR: not enough cores to test distributor\n");
 		return -1;
 	}
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_dist_burst", rte_socket_id(),
+	if (db == NULL) {
+		db = rte_distributor_create("Test_dist_burst", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_BURST);
-		if (d == NULL) {
+		if (db == NULL) {
 			printf("Error creating burst distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_flush(db);
+		rte_distributor_clear_returns(db);
+	}
+
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_dist_single",
+				rte_socket_id(),
+				rte_lcore_count() - 1,
+			RTE_DIST_ALG_SINGLE);
+		if (ds == NULL) {
+			printf("Error creating single distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_flush(ds);
+		rte_distributor_clear_returns(ds);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -629,37 +664,50 @@ test_distributor(void)
 		}
 	}
 
-	worker_params.dist = d;
-	sprintf(worker_params.name, "burst");
+	dist[0] = ds;
+	dist[1] = db;
 
-	rte_eal_mp_remote_launch(handle_work, &worker_params, SKIP_MASTER);
-	if (sanity_test(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+	for (i = 0; i < 2; i++) {
 
-	rte_eal_mp_remote_launch(handle_work_with_free_mbufs, &worker_params,
-				SKIP_MASTER);
-	if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
-		goto err;
-	quit_workers(&worker_params, p);
+		worker_params.dist = dist[i];
+		if (i)
+			sprintf(worker_params.name, "burst");
+		else
+			sprintf(worker_params.name, "single");
 
-	if (rte_lcore_count() > 2) {
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (sanity_test_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-		rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
-				&worker_params,
-				SKIP_MASTER);
-		if (test_flush_with_worker_shutdown(&worker_params, p) < 0)
+		rte_eal_mp_remote_launch(handle_work_with_free_mbufs,
+				&worker_params, SKIP_MASTER);
+		if (sanity_test_with_mbuf_alloc(&worker_params, p) < 0)
 			goto err;
 		quit_workers(&worker_params, p);
 
-	} else {
-		printf("Not enough cores to run tests for worker shutdown\n");
+		if (rte_lcore_count() > 2) {
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (sanity_test_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+			rte_eal_mp_remote_launch(handle_work_for_shutdown_test,
+					&worker_params,
+					SKIP_MASTER);
+			if (test_flush_with_worker_shutdown(&worker_params,
+					p) < 0)
+				goto err;
+			quit_workers(&worker_params, p);
+
+		} else {
+			printf("Too few cores to run worker shutdown test\n");
+		}
+
 	}
 
 	if (test_error_distributor_create_numworkers() == -1 ||
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 10/18] test: add perf test for distributor burst mode
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (8 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 09/18] test: test single and burst distributor API David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 11/18] examples/distributor: allow for extra stats David Hunt
                                                       ` (8 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 test/test/test_distributor_perf.c | 75 ++++++++++++++++++++++++++-------------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/test/test/test_distributor_perf.c b/test/test/test_distributor_perf.c
index 1dd326b..732d86d 100644
--- a/test/test/test_distributor_perf.c
+++ b/test/test/test_distributor_perf.c
@@ -1,7 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *   All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
@@ -41,8 +41,9 @@
 #include <rte_mbuf.h>
 #include <rte_distributor.h>
 
-#define ITER_POWER 20 /* log 2 of how many iterations we do when timing. */
-#define BURST 32
+#define ITER_POWER_CL 25 /* log 2 of how many iterations  for Cache Line test */
+#define ITER_POWER 21 /* log 2 of how many iterations we do when timing. */
+#define BURST 64
 #define BIG_BATCH 1024
 
 /* static vars - zero initialized by default */
@@ -54,7 +55,8 @@ struct worker_stats {
 } __rte_cache_aligned;
 struct worker_stats worker_stats[RTE_MAX_LCORE];
 
-/* worker thread used for testing the time to do a round-trip of a cache
+/*
+ * worker thread used for testing the time to do a round-trip of a cache
  * line between two cores and back again
  */
 static void
@@ -69,7 +71,8 @@ flip_bit(volatile uint64_t *arg)
 	}
 }
 
-/* test case to time the number of cycles to round-trip a cache line between
+/*
+ * test case to time the number of cycles to round-trip a cache line between
  * two cores and back again.
  */
 static void
@@ -86,7 +89,7 @@ time_cache_line_switch(void)
 		rte_pause();
 
 	const uint64_t start_time = rte_rdtsc();
-	for (i = 0; i < (1 << ITER_POWER); i++) {
+	for (i = 0; i < (1 << ITER_POWER_CL); i++) {
 		while (*pdata)
 			rte_pause();
 		*pdata = 1;
@@ -98,13 +101,14 @@ time_cache_line_switch(void)
 	*pdata = 2;
 	rte_eal_wait_lcore(slaveid);
 	printf("==== Cache line switch test ===\n");
-	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER),
+	printf("Time for %u iterations = %"PRIu64" ticks\n", (1<<ITER_POWER_CL),
 			end_time-start_time);
 	printf("Ticks per iteration = %"PRIu64"\n\n",
-			(end_time-start_time) >> ITER_POWER);
+			(end_time-start_time) >> ITER_POWER_CL);
 }
 
-/* returns the total count of the number of packets handled by the worker
+/*
+ * returns the total count of the number of packets handled by the worker
  * functions given below.
  */
 static unsigned
@@ -123,7 +127,8 @@ clear_packet_count(void)
 	memset(&worker_stats, 0, sizeof(worker_stats));
 }
 
-/* this is the basic worker function for performance tests.
+/*
+ * This is the basic worker function for performance tests.
  * it does nothing but return packets and count them.
  */
 static int
@@ -151,14 +156,15 @@ handle_work(void *arg)
 	return 0;
 }
 
-/* this basic performance test just repeatedly sends in 32 packets at a time
+/*
+ * This basic performance test just repeatedly sends in BURST packets at a time
  * to the distributor and verifies at the end that we got them all in the worker
  * threads and finally how long per packet the processing took.
  */
 static inline int
 perf_test(struct rte_distributor *d, struct rte_mempool *p)
 {
-	unsigned i;
+	unsigned int i;
 	uint64_t start, end;
 	struct rte_mbuf *bufs[BURST];
 
@@ -181,7 +187,8 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 		rte_distributor_process(d, NULL, 0);
 	} while (total_packet_count() < (BURST << ITER_POWER));
 
-	printf("=== Performance test of distributor ===\n");
+	rte_distributor_clear_returns(d);
+
 	printf("Time per burst:  %"PRIu64"\n", (end - start) >> ITER_POWER);
 	printf("Time per packet: %"PRIu64"\n\n",
 			((end - start) >> ITER_POWER)/BURST);
@@ -201,9 +208,10 @@ perf_test(struct rte_distributor *d, struct rte_mempool *p)
 static void
 quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 {
-	const unsigned num_workers = rte_lcore_count() - 1;
-	unsigned i;
+	const unsigned int num_workers = rte_lcore_count() - 1;
+	unsigned int i;
 	struct rte_mbuf *bufs[RTE_MAX_LCORE];
+
 	rte_mempool_get_bulk(p, (void *)bufs, num_workers);
 
 	quit = 1;
@@ -222,7 +230,8 @@ quit_workers(struct rte_distributor *d, struct rte_mempool *p)
 static int
 test_distributor_perf(void)
 {
-	static struct rte_distributor *d;
+	static struct rte_distributor *ds;
+	static struct rte_distributor *db;
 	static struct rte_mempool *p;
 
 	if (rte_lcore_count() < 2) {
@@ -233,17 +242,28 @@ test_distributor_perf(void)
 	/* first time how long it takes to round-trip a cache line */
 	time_cache_line_switch();
 
-	if (d == NULL) {
-		d = rte_distributor_create("Test_perf", rte_socket_id(),
+	if (ds == NULL) {
+		ds = rte_distributor_create("Test_perf", rte_socket_id(),
 				rte_lcore_count() - 1,
 				RTE_DIST_ALG_SINGLE);
-		if (d == NULL) {
+		if (ds == NULL) {
 			printf("Error creating distributor\n");
 			return -1;
 		}
 	} else {
-		rte_distributor_flush(d);
-		rte_distributor_clear_returns(d);
+		rte_distributor_clear_returns(ds);
+	}
+
+	if (db == NULL) {
+		db = rte_distributor_create("Test_burst", rte_socket_id(),
+				rte_lcore_count() - 1,
+				RTE_DIST_ALG_BURST);
+		if (db == NULL) {
+			printf("Error creating burst distributor\n");
+			return -1;
+		}
+	} else {
+		rte_distributor_clear_returns(db);
 	}
 
 	const unsigned nb_bufs = (511 * rte_lcore_count()) < BIG_BATCH ?
@@ -257,10 +277,17 @@ test_distributor_perf(void)
 		}
 	}
 
-	rte_eal_mp_remote_launch(handle_work, d, SKIP_MASTER);
-	if (perf_test(d, p) < 0)
+	printf("=== Performance test of distributor (single mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, ds, SKIP_MASTER);
+	if (perf_test(ds, p) < 0)
+		return -1;
+	quit_workers(ds, p);
+
+	printf("=== Performance test of distributor (burst mode) ===\n");
+	rte_eal_mp_remote_launch(handle_work, db, SKIP_MASTER);
+	if (perf_test(db, p) < 0)
 		return -1;
-	quit_workers(d, p);
+	quit_workers(db, p);
 
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 11/18] examples/distributor: allow for extra stats
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (9 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 10/18] test: add perf test for distributor burst mode David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 12/18] examples/distributor: wait for ports to come up David Hunt
                                                       ` (7 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

This allows us to see what is going on at each stage of the sample app,
with per-second visibility.
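The per-second figures printed by the new print_stats() are deltas against the previous snapshot divided by one million, i.e. millions of packets per second when printed once a second as the timer loop in this patch does. A minimal sketch of that calculation, with an illustrative helper name:

#include <stdio.h>
#include <stdint.h>

/* Print a one-second counter delta in millions of packets (sketch). */
static void
print_rate(const char *label, uint64_t cur, uint64_t prev)
{
        printf(" - %s: %5.2f Mpps\n", label, (cur - prev) / 1000000.0);
}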

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 140 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 124 insertions(+), 16 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index a748985..a8a5e80 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -54,24 +54,53 @@
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
+#define ANSI_COLOR_RED     "\x1b[31m"
+#define ANSI_COLOR_RESET   "\x1b[0m"
+
 /* mask of enabled ports */
 static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
+volatile uint8_t quit_signal_dist;
 
 static volatile struct app_stats {
 	struct {
 		uint64_t rx_pkts;
 		uint64_t returned_pkts;
 		uint64_t enqueued_pkts;
+		uint64_t enqdrop_pkts;
 	} rx __rte_cache_aligned;
+	int pad1 __rte_cache_aligned;
+
+	struct {
+		uint64_t in_pkts;
+		uint64_t ret_pkts;
+		uint64_t sent_pkts;
+		uint64_t enqdrop_pkts;
+	} dist __rte_cache_aligned;
+	int pad2 __rte_cache_aligned;
 
 	struct {
 		uint64_t dequeue_pkts;
 		uint64_t tx_pkts;
+		uint64_t enqdrop_pkts;
 	} tx __rte_cache_aligned;
+	int pad3 __rte_cache_aligned;
+
+	uint64_t worker_pkts[64] __rte_cache_aligned;
+
+	int pad4 __rte_cache_aligned;
+
+	uint64_t worker_bursts[64][8] __rte_cache_aligned;
+
+	int pad5 __rte_cache_aligned;
+
+	uint64_t port_rx_pkts[64] __rte_cache_aligned;
+	uint64_t port_tx_pkts[64] __rte_cache_aligned;
 } app_stats;
 
+struct app_stats prev_app_stats;
+
 static const struct rte_eth_conf port_conf_default = {
 	.rxmode = {
 		.mq_mode = ETH_MQ_RX_RSS,
@@ -93,6 +122,8 @@ struct output_buffer {
 	struct rte_mbuf *mbufs[BURST_SIZE];
 };
 
+static void print_stats(void);
+
 /*
  * Initialises a given port using global settings and with the rx buffers
  * coming from the mbuf_pool passed as parameter
@@ -378,25 +409,91 @@ static void
 print_stats(void)
 {
 	struct rte_eth_stats eth_stats;
-	unsigned i;
-
-	printf("\nRX thread stats:\n");
-	printf(" - Received:    %"PRIu64"\n", app_stats.rx.rx_pkts);
-	printf(" - Processed:   %"PRIu64"\n", app_stats.rx.returned_pkts);
-	printf(" - Enqueued:    %"PRIu64"\n", app_stats.rx.enqueued_pkts);
-
-	printf("\nTX thread stats:\n");
-	printf(" - Dequeued:    %"PRIu64"\n", app_stats.tx.dequeue_pkts);
-	printf(" - Transmitted: %"PRIu64"\n", app_stats.tx.tx_pkts);
+	unsigned int i, j;
+	const unsigned int num_workers = rte_lcore_count() - 4;
 
 	for (i = 0; i < rte_eth_dev_count(); i++) {
 		rte_eth_stats_get(i, &eth_stats);
-		printf("\nPort %u stats:\n", i);
-		printf(" - Pkts in:   %"PRIu64"\n", eth_stats.ipackets);
-		printf(" - Pkts out:  %"PRIu64"\n", eth_stats.opackets);
-		printf(" - In Errs:   %"PRIu64"\n", eth_stats.ierrors);
-		printf(" - Out Errs:  %"PRIu64"\n", eth_stats.oerrors);
-		printf(" - Mbuf Errs: %"PRIu64"\n", eth_stats.rx_nombuf);
+		app_stats.port_rx_pkts[i] = eth_stats.ipackets;
+		app_stats.port_tx_pkts[i] = eth_stats.opackets;
+	}
+
+	printf("\n\nRX Thread:\n");
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsin : %5.2f\n", i,
+				(app_stats.port_rx_pkts[i] -
+				prev_app_stats.port_rx_pkts[i])/1000000.0);
+		prev_app_stats.port_rx_pkts[i] = app_stats.port_rx_pkts[i];
+	}
+	printf(" - Received:    %5.2f\n",
+			(app_stats.rx.rx_pkts -
+			prev_app_stats.rx.rx_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.rx.returned_pkts -
+			prev_app_stats.rx.returned_pkts)/1000000.0);
+	printf(" - Enqueued:    %5.2f\n",
+			(app_stats.rx.enqueued_pkts -
+			prev_app_stats.rx.enqueued_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.rx.enqdrop_pkts -
+			prev_app_stats.rx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("Distributor thread:\n");
+	printf(" - In:          %5.2f\n",
+			(app_stats.dist.in_pkts -
+			prev_app_stats.dist.in_pkts)/1000000.0);
+	printf(" - Returned:    %5.2f\n",
+			(app_stats.dist.ret_pkts -
+			prev_app_stats.dist.ret_pkts)/1000000.0);
+	printf(" - Sent:        %5.2f\n",
+			(app_stats.dist.sent_pkts -
+			prev_app_stats.dist.sent_pkts)/1000000.0);
+	printf(" - Dropped      %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.dist.enqdrop_pkts -
+			prev_app_stats.dist.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	printf("TX thread:\n");
+	printf(" - Dequeued:    %5.2f\n",
+			(app_stats.tx.dequeue_pkts -
+			prev_app_stats.tx.dequeue_pkts)/1000000.0);
+	for (i = 0; i < rte_eth_dev_count(); i++) {
+		printf("Port %u Pktsout: %5.2f\n",
+				i, (app_stats.port_tx_pkts[i] -
+				prev_app_stats.port_tx_pkts[i])/1000000.0);
+		prev_app_stats.port_tx_pkts[i] = app_stats.port_tx_pkts[i];
+	}
+	printf(" - Transmitted: %5.2f\n",
+			(app_stats.tx.tx_pkts -
+			prev_app_stats.tx.tx_pkts)/1000000.0);
+	printf(" - Dropped:     %s%5.2f%s\n", ANSI_COLOR_RED,
+			(app_stats.tx.enqdrop_pkts -
+			prev_app_stats.tx.enqdrop_pkts)/1000000.0,
+			ANSI_COLOR_RESET);
+
+	prev_app_stats.rx.rx_pkts = app_stats.rx.rx_pkts;
+	prev_app_stats.rx.returned_pkts = app_stats.rx.returned_pkts;
+	prev_app_stats.rx.enqueued_pkts = app_stats.rx.enqueued_pkts;
+	prev_app_stats.rx.enqdrop_pkts = app_stats.rx.enqdrop_pkts;
+	prev_app_stats.dist.in_pkts = app_stats.dist.in_pkts;
+	prev_app_stats.dist.ret_pkts = app_stats.dist.ret_pkts;
+	prev_app_stats.dist.sent_pkts = app_stats.dist.sent_pkts;
+	prev_app_stats.dist.enqdrop_pkts = app_stats.dist.enqdrop_pkts;
+	prev_app_stats.tx.dequeue_pkts = app_stats.tx.dequeue_pkts;
+	prev_app_stats.tx.tx_pkts = app_stats.tx.tx_pkts;
+	prev_app_stats.tx.enqdrop_pkts = app_stats.tx.enqdrop_pkts;
+
+	for (i = 0; i < num_workers; i++) {
+		printf("Worker %02u Pkts: %5.2f. Bursts(1-8): ", i,
+				(app_stats.worker_pkts[i] -
+				prev_app_stats.worker_pkts[i])/1000000.0);
+		for (j = 0; j < 8; j++) {
+			printf("%"PRIu64" ", app_stats.worker_bursts[i][j]);
+			app_stats.worker_bursts[i][j] = 0;
+		}
+		printf("\n");
+		prev_app_stats.worker_pkts[i] = app_stats.worker_pkts[i];
 	}
 }
 
@@ -515,6 +612,7 @@ main(int argc, char *argv[])
 	unsigned nb_ports;
 	uint8_t portid;
 	uint8_t nb_ports_available;
+	uint64_t t, freq;
 
 	/* catch ctrl-c so we can print on exit */
 	signal(SIGINT, int_handler);
@@ -610,6 +708,16 @@ main(int argc, char *argv[])
 	if (lcore_rx(&p) != 0)
 		return -1;
 
+	freq = rte_get_timer_hz();
+	t = rte_rdtsc() + freq;
+	while (!quit_signal_dist) {
+		if (t < rte_rdtsc()) {
+			print_stats();
+			t = rte_rdtsc() + freq;
+		}
+		usleep(1000);
+	}
+
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
 		if (rte_eal_wait_lcore(lcore_id) < 0)
 			return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 12/18] examples/distributor: wait for ports to come up
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (10 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 11/18] examples/distributor: allow for extra stats David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 13/18] examples/distributor: add dedicated core for dist David Hunt
                                                       ` (6 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

On some machines, ports take several seconds to come up. This
patch makes the app wait until the link reports up before continuing.
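A bounded variant of the same wait may be preferable on setups where a port can stay down indefinitely; a sketch is below, where the timeout parameter is an assumption and not part of this patch:

#include <stdio.h>
#include <unistd.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Wait for the link on "port" to come up, giving up after "timeout_s"
 * seconds (sketch; the timeout is an assumption).
 */
static int
wait_for_link(uint8_t port, unsigned int timeout_s)
{
        struct rte_eth_link link;
        unsigned int waited = 0;

        rte_eth_link_get_nowait(port, &link);
        while (!link.link_status && waited < timeout_s) {
                printf("Waiting for Link up on port %"PRIu8"\n", port);
                sleep(1);
                waited++;
                rte_eth_link_get_nowait(port, &link);
        }
        return link.link_status ? 0 : -1;
}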

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index a8a5e80..75c001d 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -1,8 +1,7 @@
 /*-
  *   BSD LICENSE
  *
- *   Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
- *   All rights reserved.
+ *   Copyright(c) 2010-2017 Intel Corporation. All rights reserved.
  *
  *   Redistribution and use in source and binary forms, with or without
  *   modification, are permitted provided that the following conditions
@@ -62,6 +61,7 @@ static uint32_t enabled_port_mask;
 volatile uint8_t quit_signal;
 volatile uint8_t quit_signal_rx;
 volatile uint8_t quit_signal_dist;
+volatile uint8_t quit_signal_work;
 
 static volatile struct app_stats {
 	struct {
@@ -165,7 +165,8 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 
 	struct rte_eth_link link;
 	rte_eth_link_get_nowait(port, &link);
-	if (!link.link_status) {
+	while (!link.link_status) {
+		printf("Waiting for Link up on port %"PRIu8"\n", port);
 		sleep(1);
 		rte_eth_link_get_nowait(port, &link);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 13/18] examples/distributor: add dedicated core for dist
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (11 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 12/18] examples/distributor: wait for ports to come up David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 14/18] examples/distributor: tweaks for performance David Hunt
                                                       ` (5 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Give the distribution functionality its own core for performance,
otherwise it is limited by the Rx core.
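The resulting pipeline is rx core -> rx_dist_ring -> distributor core -> dist_tx_ring -> tx core. A stripped-down sketch of the distributor core's loop, using the same ring and distributor calls as the patch, with stats and drop handling omitted:

#include <stdint.h>
#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_distributor.h>

#define BURST_SIZE 64

/* Distributor core loop (sketch): pull bursts from the rx ring, run them
 * through the distributor, push returned packets to the tx ring.
 */
static void
dist_loop(struct rte_distributor *d, struct rte_ring *in_r,
                struct rte_ring *out_r, volatile int *quit)
{
        struct rte_mbuf *bufs[BURST_SIZE * 4];

        while (!*quit) {
                uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
                                (void *)bufs, BURST_SIZE);
                if (nb_rx == 0)
                        continue;
                rte_distributor_process(d, bufs, nb_rx);
                uint16_t nb_ret = rte_distributor_returned_pkts(d,
                                bufs, BURST_SIZE * 2);
                if (nb_ret == 0)
                        continue;
                rte_ring_enqueue_burst(out_r, (void *)bufs, nb_ret);
        }
}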

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 181 ++++++++++++++++++++++++++++++--------------
 1 file changed, 123 insertions(+), 58 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 75c001d..96d6454 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -49,6 +49,8 @@
 #define NUM_MBUFS ((64*1024)-1)
 #define MBUF_CACHE_SIZE 250
 #define BURST_SIZE 32
+#define SCHED_RX_RING_SZ 8192
+#define SCHED_TX_RING_SZ 65536
 #define RTE_RING_SZ 1024
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
@@ -193,37 +195,14 @@ port_init(uint8_t port, struct rte_mempool *mbuf_pool)
 struct lcore_params {
 	unsigned worker_id;
 	struct rte_distributor *d;
-	struct rte_ring *r;
+	struct rte_ring *rx_dist_ring;
+	struct rte_ring *dist_tx_ring;
 	struct rte_mempool *mem_pool;
 };
 
 static int
-quit_workers(struct rte_distributor *d, struct rte_mempool *p)
-{
-	const unsigned num_workers = rte_lcore_count() - 2;
-	unsigned i;
-	struct rte_mbuf *bufs[num_workers];
-
-	if (rte_mempool_get_bulk(p, (void *)bufs, num_workers) != 0) {
-		printf("line %d: Error getting mbufs from pool\n", __LINE__);
-		return -1;
-	}
-
-	for (i = 0; i < num_workers; i++)
-		bufs[i]->hash.rss = i << 1;
-
-	rte_distributor_process(d, bufs, num_workers);
-	rte_mempool_put_bulk(p, (void *)bufs, num_workers);
-
-	return 0;
-}
-
-static int
 lcore_rx(struct lcore_params *p)
 {
-	struct rte_distributor *d = p->d;
-	struct rte_mempool *mem_pool = p->mem_pool;
-	struct rte_ring *r = p->r;
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
@@ -260,9 +239,15 @@ lcore_rx(struct lcore_params *p)
 		}
 		app_stats.rx.rx_pkts += nb_rx;
 
-		rte_distributor_process(d, bufs, nb_rx);
-		const uint16_t nb_ret = rte_distributor_returned_pkts(d,
-				bufs, BURST_SIZE*2);
+/*
+ * You can run the distributor on the rx core with this code. Returned
+ * packets are then sent straight to the tx core.
+ */
+#if 0
+	rte_distributor_process(d, bufs, nb_rx);
+	const uint16_t nb_ret = rte_distributor_returned_pkts(d,
+			bufs, BURST_SIZE*2);
+
 		app_stats.rx.returned_pkts += nb_ret;
 		if (unlikely(nb_ret == 0)) {
 			if (++port == nb_ports)
@@ -270,7 +255,22 @@ lcore_rx(struct lcore_params *p)
 			continue;
 		}
 
-		uint16_t sent = rte_ring_enqueue_burst(r, (void *)bufs, nb_ret);
+		struct rte_ring *tx_ring = p->dist_tx_ring;
+		uint16_t sent = rte_ring_enqueue_burst(tx_ring,
+				(void *)bufs, nb_ret);
+#else
+		uint16_t nb_ret = nb_rx;
+		/*
+		 * Swap the following two lines if you want the rx traffic
+		 * to go directly to tx, no distribution.
+		 */
+		struct rte_ring *out_ring = p->rx_dist_ring;
+		/* struct rte_ring *out_ring = p->dist_tx_ring; */
+
+		uint16_t sent = rte_ring_enqueue_burst(out_ring,
+				(void *)bufs, nb_ret);
+#endif
+
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
 			RTE_LOG_DP(DEBUG, DISTRAPP,
@@ -281,20 +281,9 @@ lcore_rx(struct lcore_params *p)
 		if (++port == nb_ports)
 			port = 0;
 	}
-	rte_distributor_process(d, NULL, 0);
-	/* flush distributor to bring to known state */
-	rte_distributor_flush(d);
 	/* set worker & tx threads quit flag */
+	printf("\nCore %u exiting rx task.\n", rte_lcore_id());
 	quit_signal = 1;
-	/*
-	 * worker threads may hang in get packet as
-	 * distributor process is not running, just make sure workers
-	 * get packets till quit_signal is actually been
-	 * received and they gracefully shutdown
-	 */
-	if (quit_workers(d, mem_pool) != 0)
-		return -1;
-	/* rx thread should quit at last */
 	return 0;
 }
 
@@ -331,6 +320,58 @@ flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 	}
 }
 
+
+
+static int
+lcore_distributor(struct lcore_params *p)
+{
+	struct rte_ring *in_r = p->rx_dist_ring;
+	struct rte_ring *out_r = p->dist_tx_ring;
+	struct rte_mbuf *bufs[BURST_SIZE * 4];
+	struct rte_distributor *d = p->d;
+
+	printf("\nCore %u acting as distributor core.\n", rte_lcore_id());
+	while (!quit_signal_dist) {
+		const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
+				(void *)bufs, BURST_SIZE*1);
+		if (nb_rx) {
+			app_stats.dist.in_pkts += nb_rx;
+
+			/* Distribute the packets */
+			rte_distributor_process(d, bufs, nb_rx);
+			/* Handle Returns */
+			const uint16_t nb_ret =
+				rte_distributor_returned_pkts(d,
+					bufs, BURST_SIZE*2);
+
+			if (unlikely(nb_ret == 0))
+				continue;
+			app_stats.dist.ret_pkts += nb_ret;
+
+			uint16_t sent = rte_ring_enqueue_burst(out_r,
+					(void *)bufs, nb_ret);
+			app_stats.dist.sent_pkts += sent;
+			if (unlikely(sent < nb_ret)) {
+				app_stats.dist.enqdrop_pkts += nb_ret - sent;
+				RTE_LOG(DEBUG, DISTRAPP,
+					"%s:Packet loss due to full out ring\n",
+					__func__);
+				while (sent < nb_ret)
+					rte_pktmbuf_free(bufs[sent++]);
+			}
+		}
+	}
+	printf("\nCore %u exiting distributor task.\n", rte_lcore_id());
+	quit_signal_work = 1;
+
+	rte_distributor_flush(d);
+	/* Unblock any returns so workers can exit */
+	rte_distributor_clear_returns(d);
+	quit_signal_rx = 1;
+	return 0;
+}
+
+
 static int
 lcore_tx(struct rte_ring *in_r)
 {
@@ -403,7 +444,7 @@ int_handler(int sig_num)
 {
 	printf("Exiting on signal %d\n", sig_num);
 	/* set quit flag for rx thread to exit */
-	quit_signal_rx = 1;
+	quit_signal_dist = 1;
 }
 
 static void
@@ -517,7 +558,7 @@ lcore_worker(struct lcore_params *p)
 		buf[i] = NULL;
 
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
-	while (!quit_signal) {
+	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
 		/* Do a little bit of work for each packet */
 		for (i = 0; i < num; i++) {
@@ -608,7 +649,8 @@ main(int argc, char *argv[])
 {
 	struct rte_mempool *mbuf_pool;
 	struct rte_distributor *d;
-	struct rte_ring *output_ring;
+	struct rte_ring *dist_tx_ring;
+	struct rte_ring *rx_dist_ring;
 	unsigned lcore_id, worker_id = 0;
 	unsigned nb_ports;
 	uint8_t portid;
@@ -630,10 +672,11 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 3)
+	if (rte_lcore_count() < 4)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 3 logical cores to run:\n"
-				"1 lcore for packet RX and distribution\n"
+				"least 4 logical cores to run:\n"
+				"1 lcore for packet RX\n"
+				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
 				"and at least 1 lcore for worker threads\n");
 
@@ -673,30 +716,52 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 2,
+			rte_lcore_count() - 3,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
 
 	/*
-	 * scheduler ring is read only by the transmitter core, but written to
-	 * by multiple threads
+	 * scheduler ring is read by the transmitter core, and written to
+	 * by scheduler core
 	 */
-	output_ring = rte_ring_create("Output_ring", RTE_RING_SZ,
-			rte_socket_id(), RING_F_SC_DEQ);
-	if (output_ring == NULL)
+	dist_tx_ring = rte_ring_create("Output_ring", SCHED_TX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (dist_tx_ring == NULL)
+		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
+
+	rx_dist_ring = rte_ring_create("Input_ring", SCHED_RX_RING_SZ,
+			rte_socket_id(), RING_F_SC_DEQ | RING_F_SP_ENQ);
+	if (rx_dist_ring == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create output ring\n");
 
 	RTE_LCORE_FOREACH_SLAVE(lcore_id) {
-		if (worker_id == rte_lcore_count() - 2)
+		if (worker_id == rte_lcore_count() - 3) {
+			printf("Starting distributor on lcore_id %d\n",
+					lcore_id);
+			/* distributor core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d,
+				rx_dist_ring, dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch(
+				(lcore_function_t *)lcore_distributor,
+				p, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 4) {
+			printf("Starting tx  on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
-					output_ring, lcore_id);
-		else {
+					dist_tx_ring, lcore_id);
+		} else {
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
 				rte_panic("malloc failure\n");
-			*p = (struct lcore_params){worker_id, d, output_ring, mbuf_pool};
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
 
 			rte_eal_remote_launch((lcore_function_t *)lcore_worker,
 					p, lcore_id);
@@ -704,7 +769,7 @@ main(int argc, char *argv[])
 		worker_id++;
 	}
 	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, output_ring, mbuf_pool};
+	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
 
 	if (lcore_rx(&p) != 0)
 		return -1;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 14/18] examples/distributor: tweaks for performance
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (12 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 13/18] examples/distributor: add dedicated core for dist David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-27 13:04                                       ` Thomas Monjalon
  2017-03-20 10:08                                     ` [PATCH v11 15/18] examples/distributor: give Rx thread a core David Hunt
                                                       ` (4 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

These tweaks give approximately a 10% performance increase.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 96d6454..53c7b38 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -44,14 +44,14 @@
 #include <rte_prefetch.h>
 #include <rte_distributor.h>
 
-#define RX_RING_SIZE 256
+#define RX_RING_SIZE 512
 #define TX_RING_SIZE 512
 #define NUM_MBUFS ((64*1024)-1)
-#define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define MBUF_CACHE_SIZE 128
+#define BURST_SIZE 64
 #define SCHED_RX_RING_SZ 8192
 #define SCHED_TX_RING_SZ 65536
-#define RTE_RING_SZ 1024
+#define BURST_SIZE_TX 32
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
@@ -206,6 +206,7 @@ lcore_rx(struct lcore_params *p)
 	const uint8_t nb_ports = rte_eth_dev_count();
 	const int socket_id = rte_socket_id();
 	uint8_t port;
+	struct rte_mbuf *bufs[BURST_SIZE*2];
 
 	for (port = 0; port < nb_ports; port++) {
 		/* skip ports that are not enabled */
@@ -229,7 +230,6 @@ lcore_rx(struct lcore_params *p)
 				port = 0;
 			continue;
 		}
-		struct rte_mbuf *bufs[BURST_SIZE*2];
 		const uint16_t nb_rx = rte_eth_rx_burst(port, 0, bufs,
 				BURST_SIZE);
 		if (unlikely(nb_rx == 0)) {
@@ -273,6 +273,7 @@ lcore_rx(struct lcore_params *p)
 
 		app_stats.rx.enqueued_pkts += sent;
 		if (unlikely(sent < nb_ret)) {
+			app_stats.rx.enqdrop_pkts +=  nb_ret - sent;
 			RTE_LOG_DP(DEBUG, DISTRAPP,
 				"%s:Packet loss due to full ring\n", __func__);
 			while (sent < nb_ret)
@@ -290,13 +291,12 @@ lcore_rx(struct lcore_params *p)
 static inline void
 flush_one_port(struct output_buffer *outbuf, uint8_t outp)
 {
-	unsigned nb_tx = rte_eth_tx_burst(outp, 0, outbuf->mbufs,
-			outbuf->count);
-	app_stats.tx.tx_pkts += nb_tx;
+	unsigned int nb_tx = rte_eth_tx_burst(outp, 0,
+			outbuf->mbufs, outbuf->count);
+	app_stats.tx.tx_pkts += outbuf->count;
 
 	if (unlikely(nb_tx < outbuf->count)) {
-		RTE_LOG_DP(DEBUG, DISTRAPP,
-			"%s:Packet loss with tx_burst\n", __func__);
+		app_stats.tx.enqdrop_pkts +=  outbuf->count - nb_tx;
 		do {
 			rte_pktmbuf_free(outbuf->mbufs[nb_tx]);
 		} while (++nb_tx < outbuf->count);
@@ -308,6 +308,7 @@ static inline void
 flush_all_ports(struct output_buffer *tx_buffers, uint8_t nb_ports)
 {
 	uint8_t outp;
+
 	for (outp = 0; outp < nb_ports; outp++) {
 		/* skip ports that are not enabled */
 		if ((enabled_port_mask & (1 << outp)) == 0)
@@ -400,9 +401,9 @@ lcore_tx(struct rte_ring *in_r)
 			if ((enabled_port_mask & (1 << port)) == 0)
 				continue;
 
-			struct rte_mbuf *bufs[BURST_SIZE];
+			struct rte_mbuf *bufs[BURST_SIZE_TX];
 			const uint16_t nb_rx = rte_ring_dequeue_burst(in_r,
-					(void *)bufs, BURST_SIZE);
+					(void *)bufs, BURST_SIZE_TX);
 			app_stats.tx.dequeue_pkts += nb_rx;
 
 			/* if we get no traffic, flush anything we have */
@@ -431,11 +432,12 @@ lcore_tx(struct rte_ring *in_r)
 
 				outbuf = &tx_buffers[outp];
 				outbuf->mbufs[outbuf->count++] = bufs[i];
-				if (outbuf->count == BURST_SIZE)
+				if (outbuf->count == BURST_SIZE_TX)
 					flush_one_port(outbuf, outp);
 			}
 		}
 	}
+	printf("\nCore %u exiting tx task.\n", rte_lcore_id());
 	return 0;
 }
 
@@ -557,6 +559,8 @@ lcore_worker(struct lcore_params *p)
 	for (i = 0; i < 8; i++)
 		buf[i] = NULL;
 
+	app_stats.worker_pkts[p->worker_id] = 1;
+
 	printf("\nCore %u acting as worker core.\n", rte_lcore_id());
 	while (!quit_signal_work) {
 		num = rte_distributor_get_pkt(d, id, buf, buf, num);
@@ -568,6 +572,10 @@ lcore_worker(struct lcore_params *p)
 				rte_pause();
 			buf[i]->port ^= xor_val;
 		}
+
+		app_stats.worker_pkts[p->worker_id] += num;
+		if (num > 0)
+			app_stats.worker_bursts[p->worker_id][num-1]++;
 	}
 	return 0;
 }
@@ -756,6 +764,8 @@ main(int argc, char *argv[])
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
 					dist_tx_ring, lcore_id);
 		} else {
+			printf("Starting worker on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
 			struct lcore_params *p =
 					rte_malloc(NULL, sizeof(*p), 0);
 			if (!p)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 15/18] examples/distributor: give Rx thread a core
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (13 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 14/18] examples/distributor: tweaks for performance David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 16/18] doc: distributor library changes for new burst API David Hunt
                                                       ` (3 subsequent siblings)
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Now that we're printing out a page of stats every second to the console,
we should give the stats display its own core so that we don't interfere
with the performance of the Rx core.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 examples/distributor/main.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index 53c7b38..6aa8755 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -680,9 +680,10 @@ main(int argc, char *argv[])
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid distributor parameters\n");
 
-	if (rte_lcore_count() < 4)
+	if (rte_lcore_count() < 5)
 		rte_exit(EXIT_FAILURE, "Error, This application needs at "
-				"least 4 logical cores to run:\n"
+				"least 5 logical cores to run:\n"
+				"1 lcore for stats (can be core 0)\n"
 				"1 lcore for packet RX\n"
 				"1 lcore for distribution\n"
 				"1 lcore for packet TX\n"
@@ -724,7 +725,7 @@ main(int argc, char *argv[])
 	}
 
 	d = rte_distributor_create("PKT_DIST", rte_socket_id(),
-			rte_lcore_count() - 3,
+			rte_lcore_count() - 4,
 			RTE_DIST_ALG_BURST);
 	if (d == NULL)
 		rte_exit(EXIT_FAILURE, "Cannot create distributor\n");
@@ -763,6 +764,18 @@ main(int argc, char *argv[])
 			/* tx core */
 			rte_eal_remote_launch((lcore_function_t *)lcore_tx,
 					dist_tx_ring, lcore_id);
+		} else if (worker_id == rte_lcore_count() - 2) {
+			printf("Starting rx on worker_id %d, lcore_id %d\n",
+					worker_id, lcore_id);
+			/* rx core */
+			struct lcore_params *p =
+					rte_malloc(NULL, sizeof(*p), 0);
+			if (!p)
+				rte_panic("malloc failure\n");
+			*p = (struct lcore_params){worker_id, d, rx_dist_ring,
+					dist_tx_ring, mbuf_pool};
+			rte_eal_remote_launch((lcore_function_t *)lcore_rx,
+					p, lcore_id);
 		} else {
 			printf("Starting worker on worker_id %d, lcore_id %d\n",
 					worker_id, lcore_id);
@@ -778,11 +791,6 @@ main(int argc, char *argv[])
 		}
 		worker_id++;
 	}
-	/* call lcore_main on master core only */
-	struct lcore_params p = { 0, d, rx_dist_ring, dist_tx_ring, mbuf_pool};
-
-	if (lcore_rx(&p) != 0)
-		return -1;
 
 	freq = rte_get_timer_hz();
 	t = rte_rdtsc() + freq;
-- 
2.7.4
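
As a small worked example of the new core budget (the 5-lcore minimum above plus
the rte_lcore_count() - 4 passed to rte_distributor_create()), the arithmetic is
just the following sketch; the helper name is illustrative and not part of the patch.

#include <rte_lcore.h>

/*
 * Core budget after this patch:
 *   1 lcore - stats (main lcore, can be core 0)
 *   1 lcore - packet RX
 *   1 lcore - distribution
 *   1 lcore - packet TX
 *   rest    - workers
 * e.g. starting EAL with 8 lcores leaves 4 worker cores.
 */
static inline unsigned int
num_worker_cores(void)
{
	return rte_lcore_count() - 4; /* matches the rte_distributor_create() call above */
}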

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 16/18] doc: distributor library changes for new burst API
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (14 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 15/18] examples/distributor: give Rx thread a core David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-24 14:49                                       ` Mcnamara, John
  2017-03-20 10:08                                     ` [PATCH v11 17/18] doc: distributor app " David Hunt
                                                       ` (2 subsequent siblings)
  18 siblings, 1 reply; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 doc/guides/prog_guide/packet_distrib_lib.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst b/doc/guides/prog_guide/packet_distrib_lib.rst
index b5bdabb..e0adcaa 100644
--- a/doc/guides/prog_guide/packet_distrib_lib.rst
+++ b/doc/guides/prog_guide/packet_distrib_lib.rst
@@ -42,6 +42,9 @@ The model of operation is shown in the diagram below.
 
    Packet Distributor mode of operation
 
+There are two modes of operation of the API in the distributor Library, one which sends one packet at a time
+to workers using 32-bits for flow_id, and an optiomised mode which sends bursts of up to 8 packets at a time
+to workers, using 15 bits of flow_id. The mode is selected by the type field in the ``rte_distributor_create function``.
 
 Distributor Core Operation
 --------------------------
-- 
2.7.4
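
As a minimal usage sketch of the two modes described in the added paragraph above:
the mode is chosen by the last argument to rte_distributor_create(), as already
used by the example app in this series. RTE_DIST_ALG_SINGLE is assumed here as the
selector for the legacy mode; only RTE_DIST_ALG_BURST appears explicitly in the
patches quoted in this thread.

#include <rte_distributor.h>
#include <rte_lcore.h>

/* Sketch only: the alg_type argument picks the distributor mode. */
static struct rte_distributor *
create_dist(unsigned int num_workers, int use_burst)
{
	return rte_distributor_create("PKT_DIST", rte_socket_id(),
			num_workers,
			use_burst ? RTE_DIST_ALG_BURST :  /* bursts of up to 8 mbufs, 15-bit flow ids */
			RTE_DIST_ALG_SINGLE);             /* original one-packet-at-a-time API, 32-bit flow ids */
}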

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 17/18] doc: distributor app changes for new burst API
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (15 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 16/18] doc: distributor library changes for new burst API David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-20 10:08                                     ` [PATCH v11 18/18] maintainers: add to distributor lib maintainers David Hunt
  2017-03-27 13:06                                     ` [PATCH v11 0/18] distributor lib performance enhancements Thomas Monjalon
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Describe the changes in the thread layout, with an updated diagram.

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: John McNamara <john.mcnamara@intel.com>
---
 doc/guides/sample_app_ug/dist_app.rst     |  50 +++---
 doc/guides/sample_app_ug/img/dist_app.svg | 276 +++++++++++++++++-------------
 2 files changed, 180 insertions(+), 146 deletions(-)

diff --git a/doc/guides/sample_app_ug/dist_app.rst b/doc/guides/sample_app_ug/dist_app.rst
index ec07b84..1cae473 100644
--- a/doc/guides/sample_app_ug/dist_app.rst
+++ b/doc/guides/sample_app_ug/dist_app.rst
@@ -104,33 +104,35 @@ Running the Application
 Explanation
 -----------
 
-The distributor application consists of three types of threads: a receive
-thread (lcore_rx()), a set of worker threads(lcore_worker())
-and a transmit thread(lcore_tx()). How these threads work together is shown
-in :numref:`figure_dist_app` below. The main() function launches  threads of these three types.
-Each thread has a while loop which will be doing processing and which is
-terminated only upon SIGINT or ctrl+C. The receive and transmit threads
-communicate using a software ring (rte_ring structure).
-
-The receive thread receives the packets using rte_eth_rx_burst() and gives
-them to  the distributor (using rte_distributor_process() API) which will
-be called in context of the receive thread itself. The distributor distributes
-the packets to workers threads based on the tagging of the packet -
-indicated by the hash field in the mbuf. For IP traffic, this field is
-automatically filled by the NIC with the "usr" hash value for the packet,
-which works as a per-flow tag.
+The distributor application consists of four types of threads: a receive
+thread (``lcore_rx()``), a distributor thread (``lcore_dist()``), a set of
+worker threads (``lcore_worker()``), and a transmit thread (``lcore_tx()``).
+How these threads work together is shown in :numref:`figure_dist_app` below.
+The ``main()`` function launches threads of these four types.  Each thread
+has a while loop which will be doing processing and which is terminated
+only upon SIGINT or ctrl+C.
+
+The receive thread receives the packets using ``rte_eth_rx_burst()`` and will
+enqueue them to an rte_ring. The distributor thread will dequeue the packets
+from the ring and assign them to workers (using ``rte_distributor_process()`` API).
+This assignment is based on the tag (or flow ID) of the packet - indicated by
+the hash field in the mbuf. For IP traffic, this field is automatically filled
+by the NIC with the "usr" hash value for the packet, which works as a per-flow
+tag.  The distributor thread communicates with the worker threads using a
+cache-line swapping mechanism, passing up to 8 mbuf pointers at a time
+(one cache line) to each worker.
 
 More than one worker thread can exist as part of the application, and these
 worker threads do simple packet processing by requesting packets from
 the distributor, doing a simple XOR operation on the input port mbuf field
 (to indicate the output port which will be used later for packet transmission)
-and then finally returning the packets back to the distributor in the RX thread.
+and then finally returning the packets back to the distributor thread.
 
-Meanwhile, the receive thread will call the distributor api
-rte_distributor_returned_pkts() to get the packets processed, and will enqueue
-them to a ring for transfer to the TX thread for transmission on the output port.
-The transmit thread will dequeue the packets from the ring and transmit them on
-the output port specified in packet mbuf.
+The distributor thread will then call the distributor api
+``rte_distributor_returned_pkts()`` to get the processed packets, and will enqueue
+them to another rte_ring for transfer to the TX thread for transmission on the
+output port. The transmit thread will dequeue the packets from the ring and
+transmit them on the output port specified in packet mbuf.
 
 Users who wish to terminate the running of the application have to press ctrl+C
 (or send SIGINT to the app). Upon this signal, a signal handler provided
@@ -153,8 +155,10 @@ the line "#define DEBUG" defined in start of the application in main.c to enable
 Statistics
 ----------
 
-Upon SIGINT (or) ctrl+C, the print_stats() function displays the count of packets
-processed at the different stages in the application.
+The main function will print statistics on the console every second. These
+statistics include the number of packets enqueued and dequeued at each stage
+in the application, and also key statistics per worker, including how many
+packets of each burst size (1-8) were sent to each worker thread.
 
 Application Initialization
 --------------------------
diff --git a/doc/guides/sample_app_ug/img/dist_app.svg b/doc/guides/sample_app_ug/img/dist_app.svg
index 4714c7d..944f437 100644
--- a/doc/guides/sample_app_ug/img/dist_app.svg
+++ b/doc/guides/sample_app_ug/img/dist_app.svg
@@ -1,8 +1,7 @@
 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
-
 <!--
 # BSD LICENSE
-# Copyright (c) <2014>, Intel Corporation
+# Copyright (c) <2014-2017>, Intel Corporation
 # All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
@@ -47,8 +46,8 @@
    height="379.53668"
    id="svg4090"
    version="1.1"
-   inkscape:version="0.48.5 r10040"
-   sodipodi:docname="New document 2">
+   inkscape:version="0.92.1 r15371"
+   sodipodi:docname="dist_app.svg">
   <defs
      id="defs4092">
     <marker
@@ -200,8 +199,8 @@
      inkscape:pageopacity="0.0"
      inkscape:pageshadow="2"
      inkscape:zoom="1"
-     inkscape:cx="339.92174"
-     inkscape:cy="120.32038"
+     inkscape:cx="401.32873"
+     inkscape:cy="130.13572"
      inkscape:document-units="px"
      inkscape:current-layer="layer1"
      showgrid="false"
@@ -210,8 +209,8 @@
      fit-margin-right="0"
      fit-margin-bottom="0"
      inkscape:window-width="1920"
-     inkscape:window-height="1017"
-     inkscape:window-x="-8"
+     inkscape:window-height="1137"
+     inkscape:window-x="1912"
      inkscape:window-y="-8"
      inkscape:window-maximized="1" />
   <metadata
@@ -222,7 +221,7 @@
         <dc:format>image/svg+xml</dc:format>
         <dc:type
            rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
-        <dc:title></dc:title>
+        <dc:title />
       </cc:Work>
     </rdf:RDF>
   </metadata>
@@ -232,40 +231,33 @@
      id="layer1"
      transform="translate(-35.078263,-28.308125)">
     <rect
-       style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
+       style="fill:none;stroke:#000000;stroke-width:0.81890059;stroke-opacity:0.98412697"
        id="rect10443"
-       width="152.9641"
-       height="266.92566"
-       x="122.95611"
-       y="34.642567" />
-    <rect
-       style="fill:none;stroke:#000000;stroke-width:1;stroke-opacity:0.98412697"
-       id="rect10445"
-       width="124.71397"
-       height="46.675529"
-       x="435.7746"
-       y="28.808125" />
+       width="152.96732"
+       height="178.99617"
+       x="124.50176"
+       y="128.95552" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-2"
        width="124.71397"
        height="46.675529"
-       x="435.42999"
-       y="103.92654" />
+       x="437.00507"
+       y="133.06113" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-0"
        width="124.71397"
        height="46.675529"
        x="436.80811"
-       y="178.31572" />
+       y="193.87207" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-9"
        width="124.71397"
        height="46.675529"
        x="436.80811"
-       y="246.87038" />
+       y="256.06277" />
     <rect
        style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
        id="rect10445-7"
@@ -274,203 +266,241 @@
        x="135.7057"
        y="360.66928" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="M 277.293,44.129101 433.02373,43.388655"
-       id="path10486"
-       inkscape:connector-type="polyline"
-       inkscape:connector-curvature="3" />
-    <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="m 277.83855,110.78109 155.73073,-0.74044"
+       style="fill:none;stroke:#000000;stroke-width:0.99566948;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart)"
+       d="M 278.89497,147.51907 436.5713,146.78234"
        id="path10486-2"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="m 278.48623,189.32721 155.73073,-0.74042"
+       style="fill:none;stroke:#000000;stroke-width:0.99290925;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart)"
+       d="m 279.37092,206.8834 156.80331,-0.73671"
        id="path10486-1"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:url(#Arrow1Mstart)"
-       d="m 278.48623,255.19448 155.73073,-0.74043"
+       style="fill:none;stroke:#000000;stroke-width:0.99379504;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:url(#Arrow1Mstart)"
+       d="m 279.19738,270.88669 157.15478,-0.73638"
        id="path10486-4"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="M 277.11852,66.041829 432.84924,65.301384"
-       id="path10486-0"
-       inkscape:connector-type="polyline"
-       inkscape:connector-curvature="3" />
-    <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="M 277.46746,136.71727 433.1982,135.97682"
+       style="fill:none;stroke:#000000;stroke-width:0.99820405;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="m 277.17846,166.20347 158.11878,-0.73842"
        id="path10486-0-4"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="m 276.77843,210.37709 155.73073,-0.74044"
+       style="fill:none;stroke:#000000;stroke-width:0.99410033;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="m 277.47049,225.92925 157.32298,-0.73606"
        id="path10486-0-7"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.99200004;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="M 277.46746,282.5783 433.1982,281.83785"
+       style="fill:none;stroke:#000000;stroke-width:0.99566948;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="M 277.70474,289.26714 435.38107,288.5304"
        id="path10486-0-77"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="348.03241"
-       y="34.792767"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="345.02322"
+       y="134.82103"
        id="text11995"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997"
-         x="348.03241"
-         y="34.792767">Request packet</tspan></text>
+         x="345.02322"
+         y="134.82103"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Request burst</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="349.51935"
-       y="74.044792"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="346.38663"
+       y="164.76628"
        id="text11995-7"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3"
-         x="349.51935"
-         y="74.044792">Mbuf pointer</tspan></text>
+         x="346.38663"
+         y="164.76628"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Mbuf Pointers</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="504.26611"
-       y="52.165989"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="502.36844"
+       y="151.66222"
        id="text11995-7-3"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5"
-         x="504.26611"
-         y="52.165989">WorkerThread1</tspan></text>
+         x="502.36844"
+         y="151.66222"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">WorkerThread1</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="501.65793"
-       y="121.54361"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="499.40103"
+       y="207.94502"
        id="text11995-7-3-9"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.93992339,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-9"
-         x="501.65793"
-         y="121.54361">WorkerThread2</tspan></text>
+         x="499.40103"
+         y="207.94502"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">WorkerThread2</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="499.45868"
-       y="191.46367"
-       id="text11995-7-3-8"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
-         sodipodi:role="line"
-         id="tspan11997-3-5-1"
-         x="499.45868"
-         y="191.46367">WorkerThread3</tspan></text>
-    <text
-       xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="500.1918"
-       y="257.9563"
+       y="266.59644"
        id="text11995-7-3-82"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.9399234,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-6"
          x="500.1918"
-         y="257.9563">WorkerThreadN</tspan></text>
+         y="266.59644"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">WorkerThreadN</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="193.79703"
        y="362.85193"
        id="text11995-7-3-6"
-       sodipodi:linespacing="125%"
        transform="scale(0.93992342,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-0"
          x="193.79703"
-         y="362.85193">TX thread</tspan></text>
+         y="362.85193"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">TX thread</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="162.2476"
-       y="142.79382"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="175.78905"
+       y="207.26257"
        id="text11995-7-3-3"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.9399234,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-5-8"
-         x="162.2476"
-         y="142.79382">RX thread &amp; Distributor</tspan></text>
+         x="175.78905"
+         y="207.26257"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Distributor Thread</tspan></text>
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
-       d="m 35.457991,109.77995 85.546359,-0.79004"
+       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       d="m 49.600127,54.625621 85.546363,-0.79004"
        id="path10486-0-4-5"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <path
-       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-opacity:1;stroke-dasharray:none;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
+       style="fill:none;stroke:#000000;stroke-width:0.75945646;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:1;marker-start:none;marker-mid:none;marker-end:url(#Arrow1Mend)"
        d="m 135.70569,384.00706 -85.546361,0.79003"
        id="path10486-0-4-5-7"
        inkscape:connector-type="polyline"
        inkscape:connector-curvature="3" />
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
-       x="58.296661"
-       y="96.037407"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="73.342712"
+       y="44.196564"
        id="text11995-7-8"
-       sodipodi:linespacing="125%"
-       transform="scale(0.93992342,1.0639165)"><tspan
+       transform="scale(0.9399234,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-3"
-         x="58.296661"
-         y="96.037407">Mbufs In</tspan></text>
+         x="73.342712"
+         y="44.196564"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Mbufs In</tspan></text>
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="83.4814"
        y="352.62543"
        id="text11995-7-8-5"
-       sodipodi:linespacing="125%"
        transform="scale(0.93992342,1.0639165)"><tspan
          sodipodi:role="line"
          id="tspan11997-3-3-1"
          x="83.4814"
-         y="352.62543">Mbufs Out</tspan></text>
+         y="352.62543"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Mbufs Out</tspan></text>
     <path
-       style="fill:none;stroke:#000000;stroke-width:1.05720723;stroke-miterlimit:3;stroke-opacity:0.98412697;stroke-dasharray:none"
-       d="m 171.68192,303.16236 0.21464,30.4719 -8.6322,0.40574 -11.33877,0.1956 25.75778,14.79103 23.25799,11.11792 18.87014,-7.32926 31.83305,-17.26495 -10.75831,-0.32986 -10.37586,-0.44324 -0.22443,-31.54093 z"
+       style="fill:none;stroke:#000000;stroke-width:1.01068497;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:0.98412697"
+       d="m 171.68192,308.06701 0.21464,27.84908 -8.6322,0.37082 -11.33877,0.17876 25.75778,13.51792 23.25799,10.16096 18.87014,-6.69841 31.83305,-15.77889 -10.75831,-0.30147 -10.37586,-0.40509 -0.22443,-28.8261 z"
        id="path12188"
        inkscape:connector-curvature="0"
-       inkscape:transform-center-y="7.6863474"
+       inkscape:transform-center-y="7.0247597"
        sodipodi:nodetypes="cccccccccccc" />
     <text
        xml:space="preserve"
-       style="font-size:9.32312489px;font-style:normal;font-weight:normal;line-height:125%;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;font-family:Sans"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
        x="193.68871"
        y="309.26349"
        id="text11995-7-3-6-2"
-       sodipodi:linespacing="125%"
        transform="scale(0.93992342,1.0639165)"><tspan
          sodipodi:role="line"
          x="193.68871"
          y="309.26349"
-         id="tspan12214">SW Ring</tspan></text>
+         id="tspan12214"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">SW Ring</tspan></text>
+    <path
+       style="fill:none;stroke:#000000;stroke-width:1.02106845;stroke-miterlimit:3;stroke-dasharray:none;stroke-opacity:0.98412697"
+       d="m 173.27214,75.568236 0.21464,28.424254 -8.6322,0.37848 -11.33877,0.18245 25.75778,13.79709 23.25799,10.37083 18.87013,-6.83675 31.83305,-16.10478 -10.75831,-0.30769 -10.37586,-0.41345 -0.22443,-29.421453 z"
+       id="path12188-5"
+       inkscape:connector-curvature="0"
+       inkscape:transform-center-y="7.1698404"
+       sodipodi:nodetypes="cccccccccccc" />
+    <rect
+       style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
+       id="rect10445-7-7"
+       width="124.71397"
+       height="46.675529"
+       x="138.18427"
+       y="28.832333" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="190.80019"
+       y="51.17778"
+       id="text11995-7-3-6-6"
+       transform="scale(0.93992339,1.0639165)"><tspan
+         sodipodi:role="line"
+         id="tspan11997-3-5-0-4"
+         x="190.80019"
+         y="51.17778"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">RX thread</tspan></text>
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="196.38097"
+       y="90.224785"
+       id="text11995-7-3-6-2-9"
+       transform="scale(0.93992339,1.0639165)"><tspan
+         sodipodi:role="line"
+         x="196.38097"
+         y="90.224785"
+         id="tspan12214-8"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">SW Ring</tspan></text>
+    <rect
+       style="fill:none;stroke:#000000;stroke-width:0.99999988;stroke-opacity:0.98412697"
+       id="rect10445-7-7-5"
+       width="124.71397"
+       height="46.675529"
+       x="327.86566"
+       y="29.009106" />
+    <text
+       xml:space="preserve"
+       style="font-style:normal;font-weight:normal;line-height:0%;font-family:sans-serif;letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none"
+       x="387.27209"
+       y="45.36227"
+       id="text11995-7-3-6-6-3"
+       transform="scale(0.93992339,1.0639165)"><tspan
+         sodipodi:role="line"
+         id="tspan11997-3-5-0-4-4"
+         x="387.27209"
+         y="45.36227"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif">Stats thread</tspan><tspan
+         sodipodi:role="line"
+         x="387.27209"
+         y="57.016178"
+         style="font-size:9.32312489px;line-height:1.25;font-family:sans-serif"
+         id="tspan165">(to console)</tspan></text>
   </g>
 </svg>
-- 
2.7.4
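
To complement the updated pipeline description above, here is a minimal worker-loop
sketch in the spirit of lcore_worker() from the example app. The
rte_distributor_get_pkt() call is used exactly as shown earlier in this series; the
quit handling, stats updates and the final return of the last burst done by the real
application are omitted, and the port XOR is only a toy stand-in for packet processing.

#include <rte_distributor.h>
#include <rte_mbuf.h>

/* Sketch of a burst-mode worker: fetch up to 8 mbufs, process them,
 * and hand the same burst back on the next get_pkt() call. */
static void
worker_loop(struct rte_distributor *d, unsigned int id)
{
	struct rte_mbuf *buf[8] = { NULL };
	unsigned int num = 0;
	unsigned int i;

	for (;;) {
		/* returns the new burst size, taking back the previous burst */
		num = rte_distributor_get_pkt(d, id, buf, buf, num);
		for (i = 0; i < num; i++)
			buf[i]->port ^= 1;  /* toy processing step */
	}
}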

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* [PATCH v11 18/18] maintainers: add to distributor lib maintainers
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (16 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 17/18] doc: distributor app " David Hunt
@ 2017-03-20 10:08                                     ` David Hunt
  2017-03-27 13:06                                     ` [PATCH v11 0/18] distributor lib performance enhancements Thomas Monjalon
  18 siblings, 0 replies; 202+ messages in thread
From: David Hunt @ 2017-03-20 10:08 UTC (permalink / raw)
  To: dev; +Cc: bruce.richardson, David Hunt

Signed-off-by: David Hunt <david.hunt@intel.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0c78b58..0dee268 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -492,6 +492,7 @@ F: doc/guides/sample_app_ug/ip_reassembly.rst
 
 Distributor
 M: Bruce Richardson <bruce.richardson@intel.com>
+M: David Hunt <david.hunt@intel.com>
 F: lib/librte_distributor/
 F: doc/guides/prog_guide/packet_distrib_lib.rst
 F: test/test/test_distributor*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 16/18] doc: distributor library changes for new burst API
  2017-03-20 10:08                                     ` [PATCH v11 16/18] doc: distributor library changes for new burst API David Hunt
@ 2017-03-24 14:49                                       ` Mcnamara, John
  0 siblings, 0 replies; 202+ messages in thread
From: Mcnamara, John @ 2017-03-24 14:49 UTC (permalink / raw)
  To: Hunt, David, dev; +Cc: Richardson, Bruce, Hunt, David



> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of David Hunt
> Sent: Monday, March 20, 2017 10:09 AM
> To: dev@dpdk.org
> Cc: Richardson, Bruce <bruce.richardson@intel.com>; Hunt, David
> <david.hunt@intel.com>
> Subject: [dpdk-dev] [PATCH v11 16/18] doc: distributor library changes for
> new burst API
> 
> Signed-off-by: David Hunt <david.hunt@intel.com>
> Acked-by: John McNamara <john.mcnamara@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  doc/guides/prog_guide/packet_distrib_lib.rst | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/doc/guides/prog_guide/packet_distrib_lib.rst
> b/doc/guides/prog_guide/packet_distrib_lib.rst
> index b5bdabb..e0adcaa 100644
> --- a/doc/guides/prog_guide/packet_distrib_lib.rst
> +++ b/doc/guides/prog_guide/packet_distrib_lib.rst
> @@ -42,6 +42,9 @@ The model of operation is shown in the diagram below.
> 
>     Packet Distributor mode of operation
> 
> +There are two modes of operation of the API in the distributor Library,
> +one which sends one packet at a time to workers using 32-bits for
> +flow_id, and an optiomised mode which sends bursts of up to 8 packets at
> a time to workers, using 15 bits of flow_id. The mode is selected by the
> type field in the ``rte_distributor_create function``.


s/Library/library
s/optiomised/optimized/
s/rte_distributor_create function``/rte_distributor_create()`` function/

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 08/18] lib: add symbol versioning to distributor
  2017-03-20 10:08                                     ` [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
@ 2017-03-27 13:02                                       ` Thomas Monjalon
  2017-03-28  8:25                                         ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Thomas Monjalon @ 2017-03-27 13:02 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

2017-03-20 10:08, David Hunt:
> Also bumped up the ABI version number in the Makefile

It would be good to explain the intent of versioning here.

> Signed-off-by: David Hunt <david.hunt@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  lib/librte_distributor/Makefile                    |  2 +-
>  lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>  lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
>  lib/librte_distributor/rte_distributor_v20.c       | 10 +++
>  lib/librte_distributor/rte_distributor_version.map | 14 ++++
>  5 files changed, 162 insertions(+), 10 deletions(-)
>  create mode 100644 lib/librte_distributor/rte_distributor_v1705.h
> 
> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
> index 2b28eff..2f05cf3 100644
> --- a/lib/librte_distributor/Makefile
> +++ b/lib/librte_distributor/Makefile
> @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
>  
>  EXPORT_MAP := rte_distributor_version.map
>  
> -LIBABIVER := 1
> +LIBABIVER := 2

Why keeping ABI compat if you bump ABIVER?

I guess you do not really want to bump now.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 14/18] examples/distributor: tweaks for performance
  2017-03-20 10:08                                     ` [PATCH v11 14/18] examples/distributor: tweaks for performance David Hunt
@ 2017-03-27 13:04                                       ` Thomas Monjalon
  2017-03-28  8:45                                         ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Thomas Monjalon @ 2017-03-27 13:04 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

2017-03-20 10:08, David Hunt:
> Approximately 10% performance increase due to these changes.

It would have been better to explain what the changes are.

> Signed-off-by: David Hunt <david.hunt@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  examples/distributor/main.c | 36 +++++++++++++++++++++++-------------
>  1 file changed, 23 insertions(+), 13 deletions(-)

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 0/18] distributor lib performance enhancements
  2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
                                                       ` (17 preceding siblings ...)
  2017-03-20 10:08                                     ` [PATCH v11 18/18] maintainers: add to distributor lib maintainers David Hunt
@ 2017-03-27 13:06                                     ` Thomas Monjalon
  2017-03-29 14:48                                       ` Thomas Monjalon
  18 siblings, 1 reply; 202+ messages in thread
From: Thomas Monjalon @ 2017-03-27 13:06 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

2017-03-20 10:08, David Hunt:
> This patch aims to improve the throughput of the distributor library.

The patchset compiles, and has been reviewed and acked.

I have only 2 comments that can be addressed without sending a new version.
Let's conclude in the commenting mails.

Thanks

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 07/18] lib: make v20 header file private
  2017-03-20 10:08                                     ` [PATCH v11 07/18] lib: make v20 header file private David Hunt
@ 2017-03-27 13:10                                       ` Thomas Monjalon
  2017-03-28  8:47                                         ` Hunt, David
  0 siblings, 1 reply; 202+ messages in thread
From: Thomas Monjalon @ 2017-03-27 13:10 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

2017-03-20 10:08, David Hunt:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
[...]
>  SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
> -SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h

There is no explanation for this change.
I think it would be clearer if squashed with the previous patch,
switching to the new API.

Let me know, I can squash it myself :)

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 08/18] lib: add symbol versioning to distributor
  2017-03-27 13:02                                       ` Thomas Monjalon
@ 2017-03-28  8:25                                         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-28  8:25 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, bruce.richardson

Hi Thomas,

On 27/3/2017 2:02 PM, Thomas Monjalon wrote:
> 2017-03-20 10:08, David Hunt:
>> Also bumped up the ABI version number in the Makefile
> It would be good to explain the intent of versioning here.
>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>> ---
>>   lib/librte_distributor/Makefile                    |  2 +-
>>   lib/librte_distributor/rte_distributor.c           | 57 +++++++++++---
>>   lib/librte_distributor/rte_distributor_v1705.h     | 89 ++++++++++++++++++++++
>>   lib/librte_distributor/rte_distributor_v20.c       | 10 +++
>>   lib/librte_distributor/rte_distributor_version.map | 14 ++++
>>   5 files changed, 162 insertions(+), 10 deletions(-)
>>   create mode 100644 lib/librte_distributor/rte_distributor_v1705.h
>>
>> diff --git a/lib/librte_distributor/Makefile b/lib/librte_distributor/Makefile
>> index 2b28eff..2f05cf3 100644
>> --- a/lib/librte_distributor/Makefile
>> +++ b/lib/librte_distributor/Makefile
>> @@ -39,7 +39,7 @@ CFLAGS += $(WERROR_FLAGS) -I$(SRCDIR)
>>   
>>   EXPORT_MAP := rte_distributor_version.map
>>   
>> -LIBABIVER := 1
>> +LIBABIVER := 2
> Why keeping ABI compat if you bump ABIVER?
>
> I guess you do not really want to bump now.

You are correct. The symbol versioning will ensure old binaries will 
work without the bump in LIBABIVER.
Please do not apply this line.

Thanks,
Dave.
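
For anyone following this exchange, the reason no LIBABIVER bump is needed is that
both implementations stay exported: old binaries keep binding to the v20 symbols
while new builds bind to the v1705 ones through the version map. A minimal sketch of
the usual DPDK pattern follows, assuming the rte_compat.h macros; the exact
prototypes, stub bodies and version nodes here may differ from the patch.

#include <rte_compat.h>
#include <rte_distributor.h>

/* Sketch only (stub bodies): the old implementation stays exported at
 * version DPDK_2.0, the new one becomes the default at DPDK_17.05. */
struct rte_distributor *
rte_distributor_create_v20(const char *name, unsigned int socket_id,
		unsigned int num_workers)
{
	/* ... original single-packet implementation ... */
	(void)name; (void)socket_id; (void)num_workers;
	return NULL;
}
VERSION_SYMBOL(rte_distributor_create, _v20, 2.0);

struct rte_distributor *
rte_distributor_create_v1705(const char *name, unsigned int socket_id,
		unsigned int num_workers, unsigned int alg_type)
{
	/* ... new burst implementation ... */
	(void)name; (void)socket_id; (void)num_workers; (void)alg_type;
	return NULL;
}
BIND_DEFAULT_SYMBOL(rte_distributor_create, _v1705, 17.05);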

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 14/18] examples/distributor: tweaks for performance
  2017-03-27 13:04                                       ` Thomas Monjalon
@ 2017-03-28  8:45                                         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-28  8:45 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, bruce.richardson


On 27/3/2017 2:04 PM, Thomas Monjalon wrote:
> 2017-03-20 10:08, David Hunt:
>> Approximately 10% performance increase due to these changes.
> It would have been better to explain what the changes are.
>
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>> ---
>>   examples/distributor/main.c | 36 +++++++++++++++++++++++-------------
>>   1 file changed, 23 insertions(+), 13 deletions(-)

Hi Thomas,
    Sure, how about this:

This patch tunes Rx, Tx, and rte_distributor_process() burst sizes to 
maximize performance.
It also addresses some checkpatch issues.
The result is an approximately 10% performance increase.

Regards,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 07/18] lib: make v20 header file private
  2017-03-27 13:10                                       ` Thomas Monjalon
@ 2017-03-28  8:47                                         ` Hunt, David
  0 siblings, 0 replies; 202+ messages in thread
From: Hunt, David @ 2017-03-28  8:47 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, bruce.richardson


On 27/3/2017 2:10 PM, Thomas Monjalon wrote:
> 2017-03-20 10:08, David Hunt:
>> Signed-off-by: David Hunt <david.hunt@intel.com>
>> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> [...]
>>   SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include := rte_distributor.h
>> -SYMLINK-$(CONFIG_RTE_LIBRTE_DISTRIBUTOR)-include += rte_distributor_v20.h
> There is no explanation for this change.
> I think it would be clearer if squashed with the previous patch,
> switching to the new API.
>
> Let me know, I can squash it myself :)

Sure, makes sense to squash this one. Please do. :)

Thanks,
Dave.

^ permalink raw reply	[flat|nested] 202+ messages in thread

* Re: [PATCH v11 0/18] distributor lib performance enhancements
  2017-03-27 13:06                                     ` [PATCH v11 0/18] distributor lib performance enhancements Thomas Monjalon
@ 2017-03-29 14:48                                       ` Thomas Monjalon
  0 siblings, 0 replies; 202+ messages in thread
From: Thomas Monjalon @ 2017-03-29 14:48 UTC (permalink / raw)
  To: David Hunt; +Cc: dev, bruce.richardson

2017-03-27 15:06, Thomas Monjalon:
> 2017-03-20 10:08, David Hunt:
> > This patch aims to improve the throughput of the distributor library.
> 
> The patchset compiles, and has been reviewed and acked.
> 
> I have only 2 comments that can be addressed without sending a new version.
> Let's conclude in the commenting mails.

Applied with suggested changes, thanks

^ permalink raw reply	[flat|nested] 202+ messages in thread

end of thread, other threads:[~2017-03-29 14:48 UTC | newest]

Thread overview: 202+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-01  4:50 [PATCH v1 1/2] distributor lib performance enhancements David Hunt
2016-12-01  4:50 ` [PATCH v1 1/2] lib: distributor " David Hunt
2016-12-22  4:37   ` [PATCH v2 0/5] distributor library " David Hunt
2016-12-22  4:37     ` [PATCH v2 1/5] lib: distributor " David Hunt
2016-12-22 12:47       ` Jerin Jacob
2016-12-22 16:14         ` Hunt, David
2017-01-02 10:22       ` [WARNING: A/V UNSCANNABLE][PATCH v3 0/6] distributor-performance-improvements David Hunt
2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 1/6] lib: distributor performance enhancements David Hunt
2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 2/6] lib: add distributor vector flow matching David Hunt
2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 3/6] test: unit tests for new distributor burst api David Hunt
2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 4/6] test: add distributor_perf autotest David Hunt
2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 5/6] example: distributor app showing burst api David Hunt
2017-01-02 10:22         ` [WARNING: A/V UNSCANNABLE][PATCH v3 6/6] doc: distributor library changes for new " David Hunt
2017-01-09  7:50       ` [PATCH v4 0/6] distributor library performance enhancements David Hunt
2017-01-09  7:50         ` [PATCH v4 1/6] lib: distributor " David Hunt
2017-01-13 15:19           ` Bruce Richardson
2017-01-19 14:58             ` Hunt, David
2017-01-16 16:36           ` Bruce Richardson
2017-01-19 12:07             ` Hunt, David
2017-01-20  9:18           ` [PATCH v5 0/6] distributor library " David Hunt
2017-01-20  9:18             ` [PATCH v5 1/6] lib: distributor " David Hunt
2017-01-23  9:24               ` [PATCH v6 0/6] distributor library " David Hunt
2017-01-23  9:24                 ` [PATCH v6 1/6] lib: distributor " David Hunt
2017-02-21  3:17                   ` [PATCH v7 0/17] distributor library " David Hunt
2017-02-21  3:17                     ` [PATCH v7 01/17] lib: rename legacy distributor lib files David Hunt
2017-02-21 10:27                       ` Hunt, David
2017-02-24 14:03                       ` Bruce Richardson
2017-03-01  9:55                         ` Hunt, David
2017-03-01  7:47                       ` [PATCH v8 0/18] distributor library performance enhancements David Hunt
2017-03-01  7:47                         ` [PATCH v8 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-06  9:10                           ` [PATCH v9 00/18] distributor lib performance enhancements David Hunt
2017-03-06  9:10                             ` [PATCH v9 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-15  6:19                               ` [PATCH v10 0/18] distributor library performance enhancements David Hunt
2017-03-15  6:19                                 ` [PATCH v10 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-20 10:08                                   ` [PATCH v11 0/18] distributor lib performance enhancements David Hunt
2017-03-20 10:08                                     ` [PATCH v11 01/18] lib: rename legacy distributor lib files David Hunt
2017-03-20 10:08                                     ` [PATCH v11 02/18] lib: create private header file David Hunt
2017-03-20 10:08                                     ` [PATCH v11 03/18] lib: add new distributor code David Hunt
2017-03-20 10:08                                     ` [PATCH v11 04/18] lib: add SIMD flow matching to distributor David Hunt
2017-03-20 10:08                                     ` [PATCH v11 05/18] test/distributor: extra params for autotests David Hunt
2017-03-20 10:08                                     ` [PATCH v11 06/18] lib: switch distributor over to new API David Hunt
2017-03-20 10:08                                     ` [PATCH v11 07/18] lib: make v20 header file private David Hunt
2017-03-27 13:10                                       ` Thomas Monjalon
2017-03-28  8:47                                         ` Hunt, David
2017-03-20 10:08                                     ` [PATCH v11 08/18] lib: add symbol versioning to distributor David Hunt
2017-03-27 13:02                                       ` Thomas Monjalon
2017-03-28  8:25                                         ` Hunt, David
2017-03-20 10:08                                     ` [PATCH v11 09/18] test: test single and burst distributor API David Hunt
2017-03-20 10:08                                     ` [PATCH v11 10/18] test: add perf test for distributor burst mode David Hunt
2017-03-20 10:08                                     ` [PATCH v11 11/18] examples/distributor: allow for extra stats David Hunt
2017-03-20 10:08                                     ` [PATCH v11 12/18] examples/distributor: wait for ports to come up David Hunt
2017-03-20 10:08                                     ` [PATCH v11 13/18] examples/distributor: add dedicated core for dist David Hunt
2017-03-20 10:08                                     ` [PATCH v11 14/18] examples/distributor: tweaks for performance David Hunt
2017-03-27 13:04                                       ` Thomas Monjalon
2017-03-28  8:45                                         ` Hunt, David
2017-03-20 10:08                                     ` [PATCH v11 15/18] examples/distributor: give Rx thread a core David Hunt
2017-03-20 10:08                                     ` [PATCH v11 16/18] doc: distributor library changes for new burst API David Hunt
2017-03-24 14:49                                       ` Mcnamara, John
2017-03-20 10:08                                     ` [PATCH v11 17/18] doc: distributor app " David Hunt
2017-03-20 10:08                                     ` [PATCH v11 18/18] maintainers: add to distributor lib maintainers David Hunt
2017-03-27 13:06                                     ` [PATCH v11 0/18] distributor lib performance enhancements Thomas Monjalon
2017-03-29 14:48                                       ` Thomas Monjalon
2017-03-15  6:19                                 ` [PATCH v10 02/18] lib: create private header file David Hunt
2017-03-15 17:18                                   ` Thomas Monjalon
2017-03-16 10:43                                     ` Hunt, David
2017-03-16 15:40                                       ` Thomas Monjalon
2017-03-15  6:19                                 ` [PATCH v10 03/18] lib: add new distributor code David Hunt
2017-03-15  6:19                                 ` [PATCH v10 04/18] lib: add SIMD flow matching to distributor David Hunt
2017-03-15  6:19                                 ` [PATCH v10 05/18] test/distributor: extra params for autotests David Hunt
2017-03-15  6:19                                 ` [PATCH v10 06/18] lib: switch distributor over to new API David Hunt
2017-03-15  6:19                                 ` [PATCH v10 07/18] lib: make v20 header file private David Hunt
2017-03-15  6:19                                 ` [PATCH v10 08/18] lib: add symbol versioning to distributor David Hunt
2017-03-15  6:19                                 ` [PATCH v10 09/18] test: test single and burst distributor API David Hunt
2017-03-15  6:19                                 ` [PATCH v10 10/18] test: add perf test for distributor burst mode David Hunt
2017-03-15  6:19                                 ` [PATCH v10 11/18] examples/distributor: allow for extra stats David Hunt
2017-03-15  6:19                                 ` [PATCH v10 12/18] examples/distributor: wait for ports to come up David Hunt
2017-03-15  6:19                                 ` [PATCH v10 13/18] examples/distributor: add dedicated core for dist David Hunt
2017-03-15  6:19                                 ` [PATCH v10 14/18] examples/distributor: tweaks for performance David Hunt
2017-03-15  6:19                                 ` [PATCH v10 15/18] examples/distributor: give Rx thread a core David Hunt
2017-03-15  6:19                                 ` [PATCH v10 16/18] doc: distributor library changes for new burst API David Hunt
2017-03-15  6:19                                 ` [PATCH v10 17/18] doc: distributor app " David Hunt
2017-03-15  6:19                                 ` [PATCH v10 18/18] maintainers: add to distributor lib maintainers David Hunt
2017-03-06  9:10                             ` [PATCH v9 02/18] lib: create private header file David Hunt
2017-03-06  9:10                             ` [PATCH v9 03/18] lib: add new burst oriented distributor structs David Hunt
2017-03-06  9:10                             ` [PATCH v9 04/18] lib: add new distributor code David Hunt
2017-03-10 16:03                               ` Bruce Richardson
2017-03-14 10:43                                 ` Hunt, David
2017-03-06  9:10                             ` [PATCH v9 05/18] lib: add SIMD flow matching to distributor David Hunt
2017-03-06  9:10                             ` [PATCH v9 06/18] test/distributor: extra params for autotests David Hunt
2017-03-06  9:10                             ` [PATCH v9 07/18] lib: switch distributor over to new API David Hunt
2017-03-06  9:10                             ` [PATCH v9 08/18] lib: make v20 header file private David Hunt
2017-03-06  9:10                             ` [PATCH v9 09/18] lib: add symbol versioning to distributor David Hunt
2017-03-10 16:22                               ` Bruce Richardson
2017-03-13 10:17                                 ` Hunt, David
2017-03-13 10:28                                 ` Hunt, David
2017-03-13 11:01                                   ` Van Haaren, Harry
2017-03-13 11:02                                     ` Hunt, David
2017-03-06  9:10                             ` [PATCH v9 10/18] test: test single and burst distributor API David Hunt
2017-03-06  9:10                             ` [PATCH v9 11/18] test: add perf test for distributor burst mode David Hunt
2017-03-06  9:10                             ` [PATCH v9 12/18] examples/distributor: allow for extra stats David Hunt
2017-03-10 16:46                               ` Bruce Richardson
2017-03-14 10:44                                 ` Hunt, David
2017-03-06  9:10                             ` [PATCH v9 13/18] sample: distributor: wait for ports to come up David Hunt
2017-03-10 16:48                               ` Bruce Richardson
2017-03-06  9:10                             ` [PATCH v9 14/18] examples/distributor: give distributor a core David Hunt
2017-03-10 16:49                               ` Bruce Richardson
2017-03-14 10:48                                 ` Hunt, David
2017-03-06  9:10                             ` [PATCH v9 15/18] examples/distributor: limit number of Tx rings David Hunt
2017-03-10 16:50                               ` Bruce Richardson
2017-03-14 10:50                                 ` Hunt, David
2017-03-06  9:10                             ` [PATCH v9 16/18] examples/distributor: give Rx thread a core David Hunt
2017-03-10 16:51                               ` Bruce Richardson
2017-03-14  9:34                                 ` Hunt, David
2017-03-06  9:10                             ` [PATCH v9 17/18] doc: distributor library changes for new burst API David Hunt
2017-03-07 17:25                               ` Mcnamara, John
2017-03-06  9:10                             ` [PATCH v9 18/18] maintainers: add to distributor lib maintainers David Hunt
2017-03-10 16:54                             ` [PATCH v9 00/18] distributor lib performance enhancements Bruce Richardson
2017-03-01  7:47                         ` [PATCH v8 02/18] lib: create private header file David Hunt
2017-03-01  7:47                         ` [PATCH v8 03/18] lib: add new burst oriented distributor structs David Hunt
2017-03-01  7:47                         ` [PATCH v8 04/18] lib: add new distributor code David Hunt
2017-03-01  7:47                         ` [PATCH v8 05/18] lib: add SIMD flow matching to distributor David Hunt
2017-03-01  7:47                         ` [PATCH v8 06/18] test/distributor: extra params for autotests David Hunt
2017-03-01  7:47                         ` [PATCH v8 07/18] lib: switch distributor over to new API David Hunt
2017-03-01  7:47                         ` [PATCH v8 08/18] lib: make v20 header file private David Hunt
2017-03-01  7:47                         ` [PATCH v8 09/18] lib: add symbol versioning to distributor David Hunt
2017-03-01 14:50                           ` Hunt, David
2017-03-01  7:47                         ` [PATCH v8 10/18] test: test single and burst distributor API David Hunt
2017-03-01  7:47                         ` [PATCH v8 11/18] test: add perf test for distributor burst mode David Hunt
2017-03-01  7:47                         ` [PATCH v8 12/18] examples/distributor: allow for extra stats David Hunt
2017-03-01  7:47                         ` [PATCH v8 13/18] sample: distributor: wait for ports to come up David Hunt
2017-03-01  7:47                         ` [PATCH v8 14/18] examples/distributor: give distributor a core David Hunt
2017-03-01  7:47                         ` [PATCH v8 15/18] examples/distributor: limit number of Tx rings David Hunt
2017-03-01  7:47                         ` [PATCH v8 16/18] examples/distributor: give Rx thread a core David Hunt
2017-03-01  7:47                         ` [PATCH v8 17/18] doc: distributor library changes for new burst API David Hunt
2017-03-01  7:47                         ` [PATCH v8 18/18] maintainers: add to distributor lib maintainers David Hunt
2017-02-21  3:17                     ` [PATCH v7 02/17] lib: symbol versioning of functions in distributor David Hunt
2017-02-24 14:05                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 03/17] lib: create rte_distributor_private.h David Hunt
2017-02-24 14:07                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 04/17] lib: add new burst oriented distributor structs David Hunt
2017-02-24 14:08                       ` Bruce Richardson
2017-03-01  9:57                         ` Hunt, David
2017-02-24 14:09                       ` Bruce Richardson
2017-03-01  9:58                         ` Hunt, David
2017-02-21  3:17                     ` [PATCH v7 05/17] lib: add new distributor code David Hunt
2017-02-24 14:11                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 06/17] lib: add SIMD flow matching to distributor David Hunt
2017-02-24 14:11                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 07/17] lib: apply symbol versioning to distributor lib David Hunt
2017-02-21 11:50                       ` Hunt, David
2017-02-24 14:12                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 08/17] test: change params to distributor autotest David Hunt
2017-02-24 14:14                       ` Bruce Richardson
2017-03-01 10:06                         ` Hunt, David
2017-02-21  3:17                     ` [PATCH v7 09/17] test: switch distributor test over to burst API David Hunt
2017-02-21  3:17                     ` [PATCH v7 10/17] test: test single and burst distributor API David Hunt
2017-02-21  3:17                     ` [PATCH v7 11/17] test: add perf test for distributor burst mode David Hunt
2017-02-21  3:17                     ` [PATCH v7 12/17] example: add extra stats to distributor sample David Hunt
2017-02-24 14:16                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 13/17] sample: distributor: wait for ports to come up David Hunt
2017-02-21  3:17                     ` [PATCH v7 14/17] sample: switch to new distributor API David Hunt
2017-02-24 14:16                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 15/17] lib: make v20 header file private David Hunt
2017-02-24 14:18                       ` Bruce Richardson
2017-02-21  3:17                     ` [PATCH v7 16/17] doc: distributor library changes for new burst api David Hunt
2017-02-21 16:18                       ` Mcnamara, John
2017-02-21  3:17                     ` [PATCH v7 17/17] maintainers: add to distributor lib maintainers David Hunt
2017-02-24 14:01                     ` [PATCH v7 0/17] distributor library performance enhancements Bruce Richardson
2017-01-23  9:24                 ` [PATCH v6 2/6] lib: add distributor vector flow matching David Hunt
2017-01-23  9:24                 ` [PATCH v6 3/6] test: unit tests for new distributor burst API David Hunt
2017-01-23  9:24                 ` [PATCH v6 4/6] test: add distributor perf autotest David Hunt
2017-01-23  9:24                 ` [PATCH v6 5/6] examples/distributor_app: showing burst API David Hunt
2017-01-23  9:24                 ` [PATCH v6 6/6] doc: distributor library changes for new " David Hunt
2017-01-23 17:02                 ` [PATCH v6 0/6] distributor library performance enhancements Bruce Richardson
2017-01-24  8:56                 ` Liu, Yong
2017-01-23 12:26               ` [PATCH v5 1/6] lib: distributor " Bruce Richardson
2017-01-20  9:18             ` [PATCH v5 2/6] lib: add distributor vector flow matching David Hunt
2017-01-20  9:18             ` [PATCH v5 3/6] test: unit tests for new distributor burst API David Hunt
2017-01-20  9:18             ` [PATCH v5 4/6] test: add distributor perf autotest David Hunt
2017-01-20  9:18             ` [PATCH v5 5/6] examples/distributor_app: showing burst API David Hunt
2017-01-23 12:31               ` Bruce Richardson
2017-01-20  9:18             ` [PATCH v5 6/6] doc: distributor library changes for new " David Hunt
2017-01-09  7:50         ` [PATCH v4 2/6] lib: add distributor vector flow matching David Hunt
2017-01-13 15:26           ` Bruce Richardson
2017-01-19 14:59             ` Hunt, David
2017-01-16 16:40           ` Bruce Richardson
2017-01-19 12:11             ` Hunt, David
2017-01-09  7:50         ` [PATCH v4 3/6] test: unit tests for new distributor burst api David Hunt
2017-01-13 15:33           ` Bruce Richardson
2017-01-09  7:50         ` [PATCH v4 4/6] test: add distributor_perf autotest David Hunt
2017-01-09  7:50         ` [PATCH v4 5/6] example: distributor app showing burst api David Hunt
2017-01-13 15:36           ` Bruce Richardson
2017-01-13 15:38           ` Bruce Richardson
2017-01-09  7:50         ` [PATCH v4 6/6] doc: distributor library changes for new " David Hunt
2016-12-22  4:37     ` [PATCH v2 2/5] test: unit tests for new distributor " David Hunt
2016-12-22  4:37     ` [PATCH v2 3/5] test: add distributor_perf autotest David Hunt
2016-12-22 12:19       ` Jerin Jacob
2017-01-02 16:24         ` Hunt, David
2017-01-04 13:09           ` Jerin Jacob
2016-12-22  4:37     ` [PATCH v2 4/5] example: distributor app showing burst api David Hunt
2016-12-22  4:37     ` [PATCH v2 5/5] doc: distributor library changes for new " David Hunt
2016-12-01  4:50 ` [PATCH v1 2/2] example: distributor app modified to use burstAPI David Hunt