From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gage Eads
Subject: [PATCH v4 0/5] Add non-blocking ring
Date: Mon, 28 Jan 2019 12:14:02 -0600
Message-ID: <20190128181407.32739-1-gage.eads@intel.com>
References: <20190118152326.22686-1-gage.eads@intel.com>
Cc: olivier.matz@6wind.com, arybchenko@solarflare.com, bruce.richardson@intel.com, konstantin.ananyev@intel.com, stephen@networkplumber.org, jerinj@marvell.com, mczekaj@marvell.com, nd@arm.com, Ola.Liljedahl@arm.com
To: dev@dpdk.org
Return-path:
Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id A2A646904 for ; Mon, 28 Jan 2019 19:15:06 +0100 (CET)
In-Reply-To: <20190118152326.22686-1-gage.eads@intel.com>
List-Id: DPDK patches and discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

For some users, the rte ring's "non-preemptive" constraint is not acceptable;
for example, if the application uses a mixture of pinned high-priority threads
and multiplexed low-priority threads that share a mempool.

This patchset introduces a non-blocking ring, on top of which a mempool can
run. Crucially, the non-blocking algorithm relies on a 128-bit
compare-and-swap, so it is currently limited to x86_64 machines. This is also
an experimental API, so RING_F_NB users must build with the
ALLOW_EXPERIMENTAL_API flag.

The ring uses more compare-and-swap atomic operations than the regular rte
ring: With no contention, an enqueue of n pointers uses (1 + 2n) CAS
operations and a dequeue of n pointers uses 2. This algorithm has worse
average-case performance than the regular rte ring (particularly a
highly-contended ring with large bulk accesses), however:
- For applications with preemptible pthreads, the regular rte ring's
  worst-case performance (i.e. one thread being preempted in the
  update_tail() critical section) is much worse than the non-blocking ring's.
- Software caching can mitigate the average-case performance penalty of
  ring-based algorithms, for example a non-blocking ring-based mempool (a
  likely use case for this ring) with per-thread caching.

The non-blocking ring is enabled via a new flag, RING_F_NB. For ease of use,
existing ring enqueue/dequeue functions work with both "regular" and
non-blocking rings.

This patchset also adds non-blocking versions of ring_autotest and
ring_perf_autotest, and a non-blocking ring based mempool.

This patchset makes one API change; a deprecation notice will be posted in a
separate commit.

This patchset depends on the 128-bit compare-and-set patch[1].

[1] http://mails.dpdk.org/archives/dev/2019-January/124159.html

v4:
 - Split out nb_enqueue and nb_dequeue functions in generic and C11
   versions, with the necessary memory ordering behavior for weakly
   consistent machines.
 - Convert size_t variables (from v2) to uint64_t and remove the
   no-longer-applicable comment about variably-sized ring indexes.
 - Fix bug in nb_enqueue_mp that breaks the non-blocking guarantee.
 - Split the ring_ptr cast into two lines.
 - Change the dependent patchset from the non-blocking stack patch series
   to one only containing the 128b CAS commit

v3:
 - Avoid the ABI break by putting 64-bit head and tail values in the same
   cacheline as struct rte_ring's prod and cons members.
 - Don't attempt to compile rte_atomic128_cmpset without
   ALLOW_EXPERIMENTAL_API, as this would break a large number of libraries.
 - Add a helpful warning to __rte_ring_do_nb_enqueue_mp() in case someone
   tries to use RING_F_NB without the ALLOW_EXPERIMENTAL_API flag.
 - Update the ring mempool to use experimental APIs
 - Clarify that RING_F_NB is only limited to x86_64 currently; e.g. ARMv8
   has the ISA support for 128-bit CAS to eventually support it.

v2:
 - Merge separate docs commit into patch #5
 - Convert uintptr_t to size_t
 - Add a compile-time check for the size of size_t
 - Fix a space-after-typecast issue
 - Fix an unnecessary-parentheses checkpatch warning
 - Bump librte_ring's library version

Gage Eads (5):
  ring: add 64-bit headtail structure
  ring: add a non-blocking implementation
  test_ring: add non-blocking ring autotest
  test_ring_perf: add non-blocking ring perf test
  mempool/ring: add non-blocking ring handlers

 doc/guides/prog_guide/env_abstraction_layer.rst |   5 +
 drivers/mempool/ring/Makefile                   |   1 +
 drivers/mempool/ring/meson.build                |   2 +
 drivers/mempool/ring/rte_mempool_ring.c         |  58 +++-
 lib/librte_ring/rte_ring.c                      |  72 +++-
 lib/librte_ring/rte_ring.h                      | 336 +++++++++++++++++--
 lib/librte_ring/rte_ring_c11_mem.h              | 427 ++++++++++++++++++++++++
 lib/librte_ring/rte_ring_generic.h              | 408 ++++++++++++++++++++++
 lib/librte_ring/rte_ring_version.map            |   7 +
 test/test/test_ring.c                           |  57 ++--
 test/test/test_ring_perf.c                      |  19 +-
 11 files changed, 1319 insertions(+), 73 deletions(-)

-- 
2.13.6
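
As a rough illustration of the uncontended CAS counts quoted in the cover
letter (1 + 2n for an enqueue of n pointers, 2 for a dequeue of n pointers),
the arithmetic can be sketched as below. The helper names are hypothetical and
are not part of the patchset or of librte_ring; the sketch only restates the
formulas and says nothing about behavior under contention.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers (not part of the patchset): they restate the
 * uncontended CAS counts from the cover letter and nothing more.
 */

/* An enqueue of n pointers uses (1 + 2n) CAS operations. */
static inline uint64_t nb_enqueue_cas_ops(uint64_t n)
{
	return 1 + 2 * n;
}

/* A dequeue of n pointers uses 2 CAS operations, independent of n. */
static inline uint64_t nb_dequeue_cas_ops(uint64_t n)
{
	(void)n;
	return 2;
}

/* Worked example: a burst of 32 pointers. */
static inline void nb_cas_ops_example(void)
{
	assert(nb_enqueue_cas_ops(32) == 65);
	assert(nb_dequeue_cas_ops(32) == 2);
}
```

For a 32-pointer burst this gives 65 CAS operations on enqueue against 2 on
dequeue, which is one way to see why per-thread caching in front of the ring
helps the average case: fewer, larger ring accesses amortize the fixed cost.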