From: Bruce Richardson
Subject: Re: [PATCH v7 0/17] distributor library performance enhancements
Date: Fri, 24 Feb 2017 14:01:53 +0000
Message-ID: <20170224140153.GA106392@bricha3-MOBL3.ger.corp.intel.com>
In-Reply-To: <1487647073-129064-1-git-send-email-david.hunt@intel.com>
To: David Hunt
Cc: dev@dpdk.org

On Tue, Feb 21, 2017 at 03:17:36AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
>
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
>
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
>
> The Flow Match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 cpu flag.
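A minimal sketch of the scalar/vector flow-match selection described above, assuming 16-bit flow tags and 8 in-flight plus 8 backlog slots per worker. Everything here is hypothetical and not the patch's code: `struct worker_flows`, `find_match_scalar`, `find_match_sse2` and `select_match_fn` are illustrative names, and GCC/Clang's `__builtin_cpu_supports("sse2")` stands in for whatever CPU-flag check the library really uses. Each incoming tag is compared against every worker's in-flight and backlog tags; a hit records that worker so the flow stays pinned to it.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_WORKERS 4 /* hypothetical worker count */
#define BURST_SIZE 8  /* up to 8 packets (and flows) per cache line */

/* Hypothetical per-worker flow state. A tag of 0 marks an empty slot. */
struct worker_flows {
	uint16_t inflight[BURST_SIZE]; /* flows currently being processed */
	uint16_t backlog[BURST_SIZE];  /* flows queued for this worker */
};

/* Signature shared by all match variants: for each incoming tag,
 * matches[j] = (worker index + 1) if some worker already holds the flow,
 * or 0 if the flow is new and may go to any worker. */
typedef void (*match_fn)(const struct worker_flows *w, unsigned int nw,
			 const uint16_t *tags, unsigned int n,
			 uint16_t *matches);

static void
find_match_scalar(const struct worker_flows *w, unsigned int nw,
		  const uint16_t *tags, unsigned int n, uint16_t *matches)
{
	for (unsigned int j = 0; j < n; j++) {
		assert(tags[j] != 0); /* 0 is reserved for empty slots */
		matches[j] = 0;
		for (unsigned int i = 0; i < nw && matches[j] == 0; i++)
			for (unsigned int k = 0; k < BURST_SIZE; k++)
				if (w[i].inflight[k] == tags[j] ||
				    w[i].backlog[k] == tags[j]) {
					matches[j] = (uint16_t)(i + 1);
					break;
				}
	}
}

#if defined(__SSE2__) && defined(__GNUC__)
#include <emmintrin.h>

/* SSE2 variant: compare one tag against all 8 in-flight and all 8
 * backlog tags of a worker using two 128-bit compares. */
static void
find_match_sse2(const struct worker_flows *w, unsigned int nw,
		const uint16_t *tags, unsigned int n, uint16_t *matches)
{
	for (unsigned int j = 0; j < n; j++) {
		const __m128i t = _mm_set1_epi16((short)tags[j]);

		assert(tags[j] != 0);
		matches[j] = 0;
		for (unsigned int i = 0; i < nw; i++) {
			__m128i in = _mm_loadu_si128((const __m128i *)w[i].inflight);
			__m128i bl = _mm_loadu_si128((const __m128i *)w[i].backlog);
			__m128i eq = _mm_or_si128(_mm_cmpeq_epi16(in, t),
						  _mm_cmpeq_epi16(bl, t));

			if (_mm_movemask_epi8(eq) != 0) {
				matches[j] = (uint16_t)(i + 1);
				break;
			}
		}
	}
}
#endif

/* Pick the best available implementation once, at run time. */
static match_fn
select_match_fn(void)
{
#if defined(__SSE2__) && defined(__GNUC__)
	__builtin_cpu_init();
	if (__builtin_cpu_supports("sse2"))
		return find_match_sse2;
#endif
	return find_match_scalar; /* non-x86 (or no SSE2) fallback */
}
```

The caller stores the returned pointer once and calls through it for every burst. In this sketch tag 0 is reserved for empty slots, so callers are assumed to make tags nonzero (for example by OR-ing in 1).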
> On non-x86 platforms, the scalar match function is selected, which should
> still give a good boost in performance over the non-burst API.
>
> v2 changes:
> * Created a common distributor_priv.h header file with common
>   definitions and structures.
> * Added a scalar version so it can be built and used on machines without
>   sse2 instruction set
> * Added unit autotests
> * Added perf autotest

For future reference, I think it's better to put the list of deltas from
each version in reverse order, so that the latest changes are on top. That
saves scrolling for those of us who have been tracking the set.

>
> v3 changes:
> * Addressed mailing list review comments
> * Test code removal
> * Split out SSE match into separate file to facilitate NEON addition
> * Cleaned up conditional compilation flags for SSE2
> * Addressed c99 style compilation errors
> * rebased on latest head (Jan 2 2017, Happy New Year to all)
>
> v4 changes:
> * fixed issue building shared libraries
>
> v5 changes:
> * Removed some un-needed code around retries in worker API calls
> * Cleanup due to review comments on mailing list
> * Cleanup of non-x86 platform compilation, fallback to scalar match
>
> v6 changes:
> * Fixed intermittent segfault where num pkts not divisible
>   by BURST_SIZE
> * Cleanup due to review comments on mailing list
> * Renamed _priv.h to _private.h.
>
> v7 changes:
> * Reorganised patch so there's a more natural progression in the
>   changes, and divided them down into easier to review chunks.
> * Previous versions of this patch set were effectively two APIs.
>   We now have a single API. Legacy functionality can be used by
>   calling rte_distributor_create with the RTE_DISTRIBUTOR_SINGLE
>   flag when creating a distributor instance.
> * Added symbol versioning for old API so that ABI is preserved.
>

The merging to a single API is great to see, making it so much easier for
app developers. Thanks for that.

/Bruce
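To illustrate the single-API point from the v7 notes, here is a minimal, hypothetical sketch (not the distributor library's code) of one creation call and one processing call whose behaviour is chosen by a mode fixed at creation time. `struct dist`, `dist_init`, `dist_process` and `DIST_MODE_SINGLE` are illustrative names; `DIST_MODE_SINGLE` only plays the role of the RTE_DISTRIBUTOR_SINGLE flag. Only the number of cache-line handoffs is modelled, using the one-packet versus up-to-8-packets per cache line described in the cover letter.

```c
#include <assert.h>
#include <stddef.h>

#define BURST_SIZE 8 /* packets per cache-line handoff in burst mode */

/* Hypothetical creation-time modes; DIST_MODE_SINGLE stands in for the
 * RTE_DISTRIBUTOR_SINGLE flag mentioned in the v7 notes. */
enum dist_mode { DIST_MODE_BURST, DIST_MODE_SINGLE };

struct dist {
	unsigned int num_workers;
	enum dist_mode mode;    /* fixed when the instance is created */
	unsigned long handoffs; /* cache-line exchanges performed so far */
};

/* Single creation call; the mode argument selects legacy or burst behaviour. */
static int
dist_init(struct dist *d, unsigned int num_workers, enum dist_mode mode)
{
	if (d == NULL || num_workers == 0)
		return -1;
	d->num_workers = num_workers;
	d->mode = mode;
	d->handoffs = 0;
	return 0;
}

/* Single processing entry point for both modes. Legacy mode moves one
 * packet per exchange; burst mode packs up to BURST_SIZE per exchange.
 * Only the handoff count is modelled; packets are not actually delivered. */
static unsigned int
dist_process(struct dist *d, void *const *pkts, unsigned int n)
{
	assert(d != NULL && (pkts != NULL || n == 0));
	(void)pkts; /* a real distributor would write these into worker lines */

	const unsigned int per_handoff =
		(d->mode == DIST_MODE_SINGLE) ? 1 : BURST_SIZE;
	unsigned int sent = 0;

	while (sent < n) {
		unsigned int chunk = n - sent;

		if (chunk > per_handoff)
			chunk = per_handoff;
		sent += chunk;
		d->handoffs++;
	}
	return sent;
}
```

With 20 packets, single mode performs 20 handoffs while burst mode performs 3 (8 + 8 + 4). Because the mode lives in the instance, applications call the same functions in either case.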