dev.dpdk.org archive mirror
* [RFC 0/7] Improve EAL bit operations API
@ 2024-03-02 13:53 Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
                   ` (6 more replies)
  0 siblings, 7 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular the functions that operate on individual
bits.

RFCv1 is submitted primarily to 1) receive general feedback on
whether improvements in this area are worth working on, and 2)
receive feedback on the API.

No test cases are included in v1, and the various functions may well
not do what they are intended to do.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign][32|64](), which provides no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur, and in
the number specified) and atomics (e.g., LOCK-prefixed instructions
on x86) may be significant.

rte_bit_once_*(), which guarantees that program-level loads and
stores actually occur (i.e., prevents certain compiler
optimizations). The primary use of these functions is in the context
of memory-mapped I/O. Feedback on the details (semantics, naming)
here would be greatly appreciated, since the author is not much of a
driver developer.

rte_bit_atomic_*(), which provides atomic bit-level operations,
including the possibility to specify memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and the default <rte_stdatomic.h>. The issue
with an _Atomic API is that it may well be the case that the user
wants to perform both non-atomic and atomic operations on the same
word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the proposed bitset API (and potentially in
other APIs depending on RTE bitops for atomic bit-level ops). Either
one needs two bitset variants, one _Atomic bitset and one non-atomic
one, or the bitset code needs to cast the non-_Atomic pointer to an
_Atomic one. Having a separate _Atomic bitset would be bloat, and
would also prevent the user from, in some situations, doing atomic
operations against a bit set, while in other situations (e.g., at
times when MT safety is not a concern) operating on the same words in
a non-atomic manner. That said, all this is still unclear to the
author and much depends on the future path of DPDK atomics.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign*() is added, to assign a
particular boolean value to a particular bit.

All functions have properly documented semantics.

All functions are available in uint32_t and uint64_t variants.

In addition, for every function there is a generic selection variant
which operates on both 32-bit and 64-bit words (depending on the
pointer type). This is the first use of C11 generic selection in the
DPDK code base.

_Generic allows the user code to be a little more compact. Having a
generic atomic test/set/clear/assign bit API also seems consistent
with the "core" (word-size) atomics API, which is generic (both the
GCC builtins and <rte_stdatomic.h> are).

The _Generic versions may also avoid the need for explicit unsigned
long versions of all functions. If you have an unsigned long, it's
safe to use the generic version (e.g., rte_bit_set()) and _Generic
will pick the right function, provided long is either 32 or 64 bits
on your platform (which it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but has
nevertheless been given a lower-case name. That's how C11 does it
(for atomics and other _Generic-based interfaces), and how
<rte_stdatomic.h> does it. Its address can't be taken, but it does
not evaluate its parameters more than once.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

ABI-breaking patches should probably go into a separate patch set (?).

Mattias Rönnblom (7):
  eal: extend bit manipulation functions
  eal: add generic bit manipulation macros
  eal: add bit manipulation functions which read or write once
  eal: add generic once-type bit operations macros
  eal: add atomic bit operations
  eal: add generic atomic bit operations
  eal: deprecate relaxed family of bit operations

 lib/eal/include/rte_bitops.h | 1115 +++++++++++++++++++++++++++++++++-
 1 file changed, 1113 insertions(+), 2 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC 1/7] eal: extend bit manipulation functions
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 17:05   ` Stephen Hemminger
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add functionality to test, set, clear, and assign the value to
individual bits in 32-bit or 64-bit words.

These functions give no guarantees regarding memory ordering or
atomicity, and since they do not use volatile, they do not prevent
any compiler optimizations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 194 ++++++++++++++++++++++++++++++++++-
 1 file changed, 192 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..9a368724d5 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,8 +12,9 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
@@ -105,6 +107,194 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test if a particular bit in a 32-bit word is set.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to query.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_test32(const uint32_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 32-bit word.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_set32(uint32_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 32-bit word.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '0'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_clear32(uint32_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 32-bit word.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_set32(addr, nr);
+	else
+		rte_bit_clear32(addr, nr);
+}
+
+/**
+ * Test if a particular bit in a 64-bit word is set.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to query.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_test64(const uint64_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 64-bit word.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_set64(uint64_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 64-bit word.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to '0'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_clear64(uint64_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 64-bit word.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_set64(addr, nr);
+	else
+		rte_bit_clear64(addr, nr);
+}
+
+#define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
+	static inline bool						\
+	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		*addr &= mask;						\
+	}								\
+
+__RTE_GEN_BIT_TEST(rte_bit_test32, 32,)
+__RTE_GEN_BIT_SET(rte_bit_set32, 32,)
+__RTE_GEN_BIT_CLEAR(rte_bit_clear32, 32,)
+
+__RTE_GEN_BIT_TEST(rte_bit_test64, 64,)
+__RTE_GEN_BIT_SET(rte_bit_set64, 64,)
+__RTE_GEN_BIT_CLEAR(rte_bit_clear64, 64,)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-04  8:16   ` Heng Wang
  2024-03-04 16:42   ` Tyler Retzlaff
  2024-03-02 13:53 ` [RFC 3/7] eal: add bit manipulation functions which read or write once Mattias Rönnblom
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add bit-level test/set/clear/assign macros operating on both 32-bit
and 64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9a368724d5..afd0f11033 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -107,6 +107,87 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_test32,		\
+		 uint64_t *: rte_bit_test64)(addr, nr)
+
+/**
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_set32,		\
+		 uint64_t *: rte_bit_set64)(addr, nr)
+
+/**
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_clear32,		\
+		 uint64_t *: rte_bit_clear64)(addr, nr)
+
+/**
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_assign32,			\
+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
-- 
2.34.1



* [RFC 3/7] eal: add bit manipulation functions which read or write once
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 4/7] eal: add generic once-type bit operations macros Mattias Rönnblom
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 229 +++++++++++++++++++++++++++++++++++
 1 file changed, 229 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index afd0f11033..3118c51748 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -338,6 +338,227 @@ rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
 		rte_bit_clear64(addr, nr);
 }
 
+/**
+ * Test exactly once if a particular bit in a 32-bit word is set.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (i.e., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set32(addr, 17);
+ * if (rte_bit_once_test32(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set32() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set32() and
+ * rte_bit_test32() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test32(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test32(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test32(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * The regular bit set operations (e.g., rte_bit_test32()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test32()), since the latter may prevent optimizations
+ * crucial for run-time performance.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering (except ordering from the same thread to the same memory
+ * location) or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to query.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+static inline bool
+rte_bit_once_test32(const volatile uint32_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 32-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_once_set32(volatile uint32_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 32-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by @c addr
+ * to '0'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_once_clear32(volatile uint32_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 32-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_once_assign32(volatile uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_once_set32(addr, nr);
+	else
+		rte_bit_once_clear32(addr, nr);
+}
+
+/**
+ * Test exactly once if a particular bit in a 64-bit word is set.
+ *
+ * This function is guaranteed to result in exactly one memory load.
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * rte_bit_once_test64() does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to query.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+static inline bool
+rte_bit_once_test64(const volatile uint64_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 64-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_once_set64(volatile uint64_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 64-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by @c addr
+ * to '0'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_once_clear64(volatile uint64_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 64-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_once_set64(addr, nr);
+	else
+		rte_bit_once_clear64(addr, nr);
+}
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -376,6 +597,14 @@ __RTE_GEN_BIT_TEST(rte_bit_test64, 64,)
 __RTE_GEN_BIT_SET(rte_bit_set64, 64,)
 __RTE_GEN_BIT_CLEAR(rte_bit_clear64, 64,)
 
+__RTE_GEN_BIT_TEST(rte_bit_once_test32, 32, volatile)
+__RTE_GEN_BIT_SET(rte_bit_once_set32, 32, volatile)
+__RTE_GEN_BIT_CLEAR(rte_bit_once_clear32, 32, volatile)
+
+__RTE_GEN_BIT_TEST(rte_bit_once_test64, 64, volatile)
+__RTE_GEN_BIT_SET(rte_bit_once_set64, 64, volatile)
+__RTE_GEN_BIT_CLEAR(rte_bit_once_clear64, 64, volatile)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC 4/7] eal: add generic once-type bit operations macros
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (2 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 3/7] eal: add bit manipulation functions which read or write once Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 5/7] eal: add atomic bit operations Mattias Rönnblom
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add macros for once-type bit operations operating on both 32-bit and
64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 101 +++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3118c51748..450334c751 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -188,6 +188,107 @@ extern "C" {
 		 uint32_t *: rte_bit_assign32,			\
 		 uint64_t *: rte_bit_assign64)(addr, nr, value)
 
+/**
+ * Test exactly once if a particular bit in a word is set.
+ *
+ * Generic selection macro to exactly once test the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This macro is guaranteed to result in exactly one memory load. See
+ * rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: rte_bit_once_test32,		\
+		 uint64_t *: rte_bit_once_test64)(addr, nr)
+
+/**
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This macro is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: rte_bit_once_set32,		\
+		 uint64_t *: rte_bit_once_set64)(addr, nr)
+
+/**
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This macro is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: rte_bit_once_clear32,		\
+		 uint64_t *: rte_bit_once_clear64)(addr, nr)
+
+/**
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This macro is guaranteed to result in exactly one memory load and
+ * one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_once_assign32,			\
+		 uint64_t *: rte_bit_once_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
-- 
2.34.1



* [RFC 5/7] eal: add atomic bit operations
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (3 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 4/7] eal: add generic once-type bit operations macros Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 6/7] eal: add generic " Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
  6 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 337 +++++++++++++++++++++++++++++++++++
 1 file changed, 337 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 450334c751..7eb08bc768 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -20,6 +20,7 @@
 #include <stdint.h>
 
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -706,6 +707,342 @@ __RTE_GEN_BIT_TEST(rte_bit_once_test64, 64, volatile)
 __RTE_GEN_BIT_SET(rte_bit_once_set64, 64, volatile)
 __RTE_GEN_BIT_CLEAR(rte_bit_once_clear64, 64, volatile)
 
+/**
+ * Test if a particular bit in a 32-bit word is set with a particular
+ * memory order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to query.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test32(const uint32_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically set bit in 32-bit word.
+ *
+ * Atomically set the bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_set32(uint32_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically clear bit in 32-bit word.
+ *
+ * Atomically set the bit specified by @c nr in the 32-bit word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_clear32(uint32_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically assign a value to a bit in a 32-bit word.
+ *
+ * Atomically set the bit specified by @c nr in the 32-bit word
+ * pointed to by @c addr to the value indicated by @c value, with the
+ * memory ordering as specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_assign32(uint32_t *addr, unsigned int nr, bool value,
+			int memory_order);
+
+/*
+ * Atomic test-and-assign is not considered useful enough to document
+ * and expose in the public API.
+ */
+static inline bool
+__rte_bit_atomic_test_and_assign32(uint32_t *addr, unsigned int nr, bool value,
+				   int memory_order);
+
+/**
+ * Atomically test and set a bit in a 32-bit word.
+ *
+ * Atomically test and set the bit specified by @c nr in the 32-bit
+ * word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true, memory_order);
+}
+
+/**
+ * Atomically test and clear a bit in a 32-bit word.
+ *
+ * Atomically test and clear the bit specified by @c nr in the 32-bit
+ * word pointed to by @c addr, setting it to '0', with the memory
+ * ordering as specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false, memory_order);
+}
+
+/**
+ * Test if a particular bit in a 64-bit word is set with a particular
+ * memory order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to query.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test64(const uint64_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically set bit in 64-bit word.
+ *
+ * Atomically set the bit specified by @c nr in the 64-bit word
+ * pointed to by @c addr to '1', with the memory ordering as specified
+ * by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_set64(uint64_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically clear bit in 64-bit word.
+ *
+ * Atomically set the bit specified by @c nr in the 64-bit word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_clear64(uint64_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically assign a value to a bit in a 64-bit word.
+ *
+ * Atomically set the bit specified by @c nr in the 64-bit word
+ * pointed to by @c addr to the value indicated by @c value, with the
+ * memory ordering as specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_assign64(uint64_t *addr, unsigned int nr, bool value,
+			int memory_order);
+
+/*
+ * Atomic test-and-assign is not considered useful enough to document
+ * and expose in the public API.
+ */
+static inline bool
+__rte_bit_atomic_test_and_assign64(uint64_t *addr, unsigned int nr, bool value,
+				   int memory_order);
+
+/**
+ * Atomically test and set a bit in a 64-bit word.
+ *
+ * Atomically test and set the bit specified by @c nr in the 64-bit
+ * word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true, memory_order);
+}
+
+/**
+ * Atomically test and clear a bit in a 64-bit word.
+ *
+ * Atomically test and clear the bit specified by @c nr in the 64-bit
+ * word pointed to by @c addr, setting it to '0', with the memory
+ * ordering as specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false, memory_order);
+}
+
+#ifndef RTE_ENABLE_STDATOMIC
+
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	static inline bool						\
+	rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				    unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return __atomic_load_n(addr, memory_order) & mask;	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	static inline void						\
+	rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				   unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		__atomic_fetch_or(addr, mask, memory_order);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	static inline void						\
+	rte_bit_atomic_clear ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		__atomic_fetch_and(addr, ~mask, memory_order);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	static inline void						\
+	rte_bit_atomic_assign ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, bool value,	\
+				      int memory_order)			\
+	{								\
+		if (value)						\
+			rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			rte_bit_atomic_clear ## size(addr, nr, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t before;				\
+		uint ## size ## _t after;				\
+									\
+		before = __atomic_load_n(addr, __ATOMIC_RELAXED);	\
+									\
+		do {							\
+			after = before;					\
+			rte_bit_assign ## size(&after, nr, value);	\
+		} while (!__atomic_compare_exchange_n(addr, &before,	\
+						      after, true,	\
+						      memory_order,	\
+						      __ATOMIC_RELAXED)); \
+		return rte_bit_test ## size(&before, nr);		\
+	}
+
+#else
+#error "C11 atomics (MSVC) not supported in this RFC version"
+#endif
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 6/7] eal: add generic atomic bit operations
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (4 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 5/7] eal: add atomic bit operations Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
  6 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add atomic bit-level test/set/clear/assign macros operating on both
32-bit and 64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 125 +++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 7eb08bc768..b5a9df5930 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -290,6 +290,131 @@ extern "C" {
 		 uint32_t *: rte_bit_once_assign32,			\
 		 uint64_t *: rte_bit_once_assign64)(addr, nr, value)
 
+/**
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_test32,			\
+		 uint64_t *: rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_set32,			\
+		 uint64_t *: rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_clear32,			\
+		 uint64_t *: rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_assign32,			\
+		 uint64_t *: rte_bit_atomic_assign64)(addr, nr, value,	\
+						      memory_order)
+
+/**
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: rte_bit_atomic_test_and_set64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: rte_bit_atomic_test_and_clear64)(addr, nr, \
+							      memory_order)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC 7/7] eal: deprecate relaxed family of bit operations
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (5 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 6/7] eal: add generic " Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 17:07   ` Stephen Hemminger
  6 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Informally (by means of documentation) deprecate the
rte_bit_relaxed_*() family of bit-level operations.

rte_bit_relaxed_*() has been replaced by three new families of
bit-level query and manipulation functions.

rte_bit_relaxed_*() failed to deliver the atomicity guarantees their
names suggested. Deprecation will encourage users to consider whether
the actual, implemented behavior (e.g., non-atomic test-and-set with
read/write-once semantics), the semantics implied by the names (i.e.,
atomic), or something else is what's actually needed.

Bugzilla ID: 1385

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 48 ++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index b5a9df5930..783dd0e1ee 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -1179,6 +1179,10 @@ __RTE_GEN_BIT_ATOMIC_OPS(64)
  *   The address holding the bit.
  * @return
  *   The target bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_test32(),
+ *   rte_bit_once_test32(), or rte_bit_atomic_test32() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline uint32_t
 rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)
@@ -1196,6 +1200,10 @@ rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)
  *   The target bit to set.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_set32(),
+ *   rte_bit_once_set32(), or rte_bit_atomic_set32() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_set32(unsigned int nr, volatile uint32_t *addr)
@@ -1213,6 +1221,10 @@ rte_bit_relaxed_set32(unsigned int nr, volatile uint32_t *addr)
  *   The target bit to clear.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_clear32(),
+ *   rte_bit_once_clear32(), or rte_bit_atomic_clear32() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_clear32(unsigned int nr, volatile uint32_t *addr)
@@ -1233,6 +1245,12 @@ rte_bit_relaxed_clear32(unsigned int nr, volatile uint32_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_set32(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test32() + rte_bit_set32() or
+ *   rte_bit_once_test32() + rte_bit_once_set32().
  */
 static inline uint32_t
 rte_bit_relaxed_test_and_set32(unsigned int nr, volatile uint32_t *addr)
@@ -1255,6 +1273,12 @@ rte_bit_relaxed_test_and_set32(unsigned int nr, volatile uint32_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_clear32(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test32() + rte_bit_clear32() or
+ *   rte_bit_once_test32() + rte_bit_once_clear32().
  */
 static inline uint32_t
 rte_bit_relaxed_test_and_clear32(unsigned int nr, volatile uint32_t *addr)
@@ -1278,6 +1302,10 @@ rte_bit_relaxed_test_and_clear32(unsigned int nr, volatile uint32_t *addr)
  *   The address holding the bit.
  * @return
  *   The target bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_test64(),
+ *   rte_bit_once_test64(), or rte_bit_atomic_test64() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline uint64_t
 rte_bit_relaxed_get64(unsigned int nr, volatile uint64_t *addr)
@@ -1295,6 +1323,10 @@ rte_bit_relaxed_get64(unsigned int nr, volatile uint64_t *addr)
  *   The target bit to set.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_set64(),
+ *   rte_bit_once_set64(), or rte_bit_atomic_set64() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_set64(unsigned int nr, volatile uint64_t *addr)
@@ -1312,6 +1344,10 @@ rte_bit_relaxed_set64(unsigned int nr, volatile uint64_t *addr)
  *   The target bit to clear.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_clear64(),
+ *   rte_bit_once_clear64(), or rte_bit_atomic_clear64() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_clear64(unsigned int nr, volatile uint64_t *addr)
@@ -1332,6 +1368,12 @@ rte_bit_relaxed_clear64(unsigned int nr, volatile uint64_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_set64(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test64() + rte_bit_set64() or
+ *   rte_bit_once_test64() + rte_bit_once_set64().
  */
 static inline uint64_t
 rte_bit_relaxed_test_and_set64(unsigned int nr, volatile uint64_t *addr)
@@ -1354,6 +1396,12 @@ rte_bit_relaxed_test_and_set64(unsigned int nr, volatile uint64_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_clear64(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test64() + rte_bit_clear64() or
+ *   rte_bit_once_test64() + rte_bit_once_clear64().
  */
 static inline uint64_t
 rte_bit_relaxed_test_and_clear64(unsigned int nr, volatile uint64_t *addr)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
@ 2024-03-02 17:05   ` Stephen Hemminger
  2024-03-03  6:26     ` Mattias Rönnblom
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  1 sibling, 1 reply; 90+ messages in thread
From: Stephen Hemminger @ 2024-03-02 17:05 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang

On Sat, 2 Mar 2024 14:53:22 +0100
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index 449565eeae..9a368724d5 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -2,6 +2,7 @@
>   * Copyright(c) 2020 Arm Limited
>   * Copyright(c) 2010-2019 Intel Corporation
>   * Copyright(c) 2023 Microsoft Corporation
> + * Copyright(c) 2024 Ericsson AB
>   */
>  

Unless this is coming from another project code base, the common
practice is not to add copyright for each contributor in later versions.

> +/**
> + * Test if a particular bit in a 32-bit word is set.
> + *
> + * This function does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the 32-bit word to query.
> + * @param nr
> + *   The index of the bit (0-31).
> + * @return
> + *   Returns true if the bit is set, and false otherwise.
> + */
> +static inline bool
> +rte_bit_test32(const uint32_t *addr, unsigned int nr);

Is it possible to reorder these inlines to avoid having
forward declarations?

Also, new functions should be marked __rte_experimental
for a release or two.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 7/7] eal: deprecate relaxed family of bit operations
  2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
@ 2024-03-02 17:07   ` Stephen Hemminger
  2024-03-03  6:30     ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Stephen Hemminger @ 2024-03-02 17:07 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang

On Sat, 2 Mar 2024 14:53:28 +0100
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index b5a9df5930..783dd0e1ee 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -1179,6 +1179,10 @@ __RTE_GEN_BIT_ATOMIC_OPS(64)
>   *   The address holding the bit.
>   * @return
>   *   The target bit.
> + * @note
> + *   This function is deprecated. Use rte_bit_test32(),
> + *   rte_bit_once_test32(), or rte_bit_atomic_test32() instead,
> + *   depending on exactly what guarantees are required.
>   */
>  static inline uint32_t
>  rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)

The DPDK process is:
	- mark these as deprecated in release notes of release N.
	- mark these as deprecated using __rte_deprecated in next LTS
	- drop these in LTS release after that.

Don't use notes for this.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-02 17:05   ` Stephen Hemminger
@ 2024-03-03  6:26     ` Mattias Rönnblom
  2024-03-04 16:34       ` Tyler Retzlaff
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-03  6:26 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom; +Cc: dev, Heng Wang

On 2024-03-02 18:05, Stephen Hemminger wrote:
> On Sat, 2 Mar 2024 14:53:22 +0100
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>> index 449565eeae..9a368724d5 100644
>> --- a/lib/eal/include/rte_bitops.h
>> +++ b/lib/eal/include/rte_bitops.h
>> @@ -2,6 +2,7 @@
>>    * Copyright(c) 2020 Arm Limited
>>    * Copyright(c) 2010-2019 Intel Corporation
>>    * Copyright(c) 2023 Microsoft Corporation
>> + * Copyright(c) 2024 Ericsson AB
>>    */
>>   
> 
> Unless this is coming from another project code base, the common
> practice is not to add copyright for each contributor in later versions.
> 

Unless it's a large contribution (compared to the rest of the file)?

I guess that's why the 916c50d commit adds the Microsoft copyright notice.

>> +/**
>> + * Test if a particular bit in a 32-bit word is set.
>> + *
>> + * This function does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the 32-bit word to query.
>> + * @param nr
>> + *   The index of the bit (0-31).
>> + * @return
>> + *   Returns true if the bit is set, and false otherwise.
>> + */
>> +static inline bool
>> +rte_bit_test32(const uint32_t *addr, unsigned int nr);
> 
> Is it possible to reorder these inlines to avoid having
> forward declarations?
> 

Yes, but I'm not sure it's a net gain.

A statement expression macro seems like a perfect tool for the job, but 
then MSVC doesn't support statement expressions. You could also have a 
macro that just generates the function body, as opposed to the whole function.

I'll consider if I should just bite the bullet and expand all the 
macros. 4x duplication.

> Also, new functions should be marked __rte_experimental
> for a release or two.

Yes, thanks.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 7/7] eal: deprecate relaxed family of bit operations
  2024-03-02 17:07   ` Stephen Hemminger
@ 2024-03-03  6:30     ` Mattias Rönnblom
  0 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-03  6:30 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom; +Cc: dev, Heng Wang

On 2024-03-02 18:07, Stephen Hemminger wrote:
> On Sat, 2 Mar 2024 14:53:28 +0100
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>> index b5a9df5930..783dd0e1ee 100644
>> --- a/lib/eal/include/rte_bitops.h
>> +++ b/lib/eal/include/rte_bitops.h
>> @@ -1179,6 +1179,10 @@ __RTE_GEN_BIT_ATOMIC_OPS(64)
>>    *   The address holding the bit.
>>    * @return
>>    *   The target bit.
>> + * @note
>> + *   This function is deprecated. Use rte_bit_test32(),
>> + *   rte_bit_once_test32(), or rte_bit_atomic_test32() instead,
>> + *   depending on exactly what guarantees are required.
>>    */
>>   static inline uint32_t
>>   rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)
> 
> The DPDK process is:
> 	- mark these as deprecated in release notes of release N.
> 	- mark these as deprecated using __rte_deprecated in next LTS
> 	- drop these in LTS release after that.
> 
> Don't use notes for this.

Don't use notes to replace the above process, or don't use notes at all?

A note seems useful to me, especially considering there is a choice to 
be made (not just mindlessly replacing one call with another).

Anyway, release notes updates have to wait so I'll just drop this patch 
for now.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* RE: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
@ 2024-03-04  8:16   ` Heng Wang
  2024-03-04 15:41     ` Mattias Rönnblom
  2024-03-04 16:42   ` Tyler Retzlaff
  1 sibling, 1 reply; 90+ messages in thread
From: Heng Wang @ 2024-03-04  8:16 UTC (permalink / raw)
  To: Mattias Rönnblom, dev; +Cc: hofors

Hi Mattias,
  I have a comment about the _Generic. What if the user gives a uint8_t * or uint16_t * as the address? One improvement is that we could add a default branch in _Generic to throw a compiler error or assert false.
  Another question is what if nr >= the word's bit width? What if you do, for example, (uint32_t)1 << 35? Maybe we could add an assert in the implementation?

Regards,
Heng

-----Original Message-----
From: Mattias Rönnblom <mattias.ronnblom@ericsson.com> 
Sent: Saturday, March 2, 2024 2:53 PM
To: dev@dpdk.org
Cc: hofors@lysator.liu.se; Heng Wang <heng.wang@ericsson.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Subject: [RFC 2/7] eal: add generic bit manipulation macros

Add bit-level test/set/clear/assign macros operating on both 32-bit and 64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9a368724d5..afd0f11033 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -107,6 +107,87 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_test32,		\
+		 uint64_t *: rte_bit_test64)(addr, nr)
+
+/**
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_set32,		\
+		 uint64_t *: rte_bit_set64)(addr, nr)
+
+/**
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_clear32,		\
+		 uint64_t *: rte_bit_clear64)(addr, nr)
+
+/**
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_assign32,			\
+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
--
2.34.1


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-04  8:16   ` Heng Wang
@ 2024-03-04 15:41     ` Mattias Rönnblom
  0 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-04 15:41 UTC (permalink / raw)
  To: Heng Wang, Mattias Rönnblom, dev


On 2024-03-04 09:16, Heng Wang wrote:
> Hi Mattias,
>    I have a comment about the _Generic. What if the user gives uint8_t * or uint16_t * as the address. One improvement is that we could add a default branch in _Generic to throw a compiler error or assert false.

If the user passes an incompatible pointer, the compiler will generate an 
error.

>    Another question is what if nr >= sizeof(type) ? What if you do, for example, (uint32_t)1 << 35? Maybe we could add an assert in the implementation?
> 

There are already such asserts in the functions the macro delegates to.

That said, DPDK RTE_ASSERT()s are disabled even in debug builds, so I'm 
not sure it's going to help anyone.

> Regards,
> Heng
> 
> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Saturday, March 2, 2024 2:53 PM
> To: dev@dpdk.org
> Cc: hofors@lysator.liu.se; Heng Wang <heng.wang@ericsson.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Subject: [RFC 2/7] eal: add generic bit manipulation macros
> 
> Add bit-level test/set/clear/assign macros operating on both 32-bit and 64-bit words by means of C11 generic selection.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>   lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 81 insertions(+)
> 
> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h index 9a368724d5..afd0f11033 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -107,6 +107,87 @@ extern "C" {
>   #define RTE_FIELD_GET64(mask, reg) \
>   		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>   
> +/**
> + * Test bit in word.
> + *
> + * Generic selection macro to test the value of a bit in a 32-bit or
> + * 64-bit word. The type of operation depends on the type of the @c
> + * addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_test(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_test32,		\
> +		 uint64_t *: rte_bit_test64)(addr, nr)
> +
> +/**
> + * Set bit in word.
> + *
> + * Generic selection macro to set a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_set(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_set32,		\
> +		 uint64_t *: rte_bit_set64)(addr, nr)
> +
> +/**
> + * Clear bit in word.
> + *
> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_clear(addr, nr)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_clear32,		\
> +		 uint64_t *: rte_bit_clear64)(addr, nr)
> +
> +/**
> + * Assign a value to a bit in word.
> + *
> + * Generic selection macro to assign a value to a bit in a 32-bit or
> + * 64-bit
> + * word. The type of operation depends on the type of the @c addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + * @param value
> + *   The new value of the bit - true for '1', or false for '0'.
> + */
> +#define rte_bit_assign(addr, nr, value)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_assign32,			\
> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> +
>   /**
>    * Test if a particular bit in a 32-bit word is set.
>    *
> --
> 2.34.1
> 


* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-03  6:26     ` Mattias Rönnblom
@ 2024-03-04 16:34       ` Tyler Retzlaff
  2024-03-05 18:01         ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Tyler Retzlaff @ 2024-03-04 16:34 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Stephen Hemminger, Mattias Rönnblom, dev, Heng Wang

On Sun, Mar 03, 2024 at 07:26:36AM +0100, Mattias Rönnblom wrote:
> On 2024-03-02 18:05, Stephen Hemminger wrote:
> >On Sat, 2 Mar 2024 14:53:22 +0100
> >Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> >
> >>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>index 449565eeae..9a368724d5 100644
> >>--- a/lib/eal/include/rte_bitops.h
> >>+++ b/lib/eal/include/rte_bitops.h
> >>@@ -2,6 +2,7 @@
> >>   * Copyright(c) 2020 Arm Limited
> >>   * Copyright(c) 2010-2019 Intel Corporation
> >>   * Copyright(c) 2023 Microsoft Corporation
> >>+ * Copyright(c) 2024 Ericsson AB
> >>   */
> >
> >Unless this is coming from another project code base, the common
> >practice is not to add copyright for each contributor in later versions.
> >
> 
> Unless it's a large contribution (compared to the rest of the file)?
> 
> I guess that's why the 916c50d commit adds the Microsoft copyright notice.
> 
> >>+/**
> >>+ * Test if a particular bit in a 32-bit word is set.
> >>+ *
> >>+ * This function does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the 32-bit word to query.
> >>+ * @param nr
> >>+ *   The index of the bit (0-31).
> >>+ * @return
> >>+ *   Returns true if the bit is set, and false otherwise.
> >>+ */
> >>+static inline bool
> >>+rte_bit_test32(const uint32_t *addr, unsigned int nr);
> >
> >Is it possible to reorder these inlines to avoid having
> >forward declarations?
> >
> 
> Yes, but I'm not sure it's a net gain.
> 
> A statement expression macro seems like a perfect tool for the job,
> but then MSVC doesn't support statement expressions. You could also
> have a macro that just generates the function body, as opposed to the
> whole function.

statement expressions can be used even with MSVC when using C. but GCC
documentation discourages their use for C++. since the header is
consumed by C++ in addition to C it's preferable to avoid them.

> 
> I'll consider if I should just bite the bullet and expand all the
> macros. 4x duplication.
> 
> >Also, new functions should be marked __rte_experimental
> >for a release or two.
> 
> Yes, thanks.


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
  2024-03-04  8:16   ` Heng Wang
@ 2024-03-04 16:42   ` Tyler Retzlaff
  2024-03-05 18:08     ` Mattias Rönnblom
  1 sibling, 1 reply; 90+ messages in thread
From: Tyler Retzlaff @ 2024-03-04 16:42 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang

On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
> Add bit-level test/set/clear/assign macros operating on both 32-bit
> and 64-bit words by means of C11 generic selection.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---

_Generic is nice here. should we discourage direct use of the inline
functions in preference of using the macro always? either way lgtm.

Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

>  lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index 9a368724d5..afd0f11033 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -107,6 +107,87 @@ extern "C" {
>  #define RTE_FIELD_GET64(mask, reg) \
>  		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>  
> +/**
> + * Test bit in word.
> + *
> + * Generic selection macro to test the value of a bit in a 32-bit or
> + * 64-bit word. The type of operation depends on the type of the @c
> + * addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_test(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_test32,		\
> +		 uint64_t *: rte_bit_test64)(addr, nr)
> +
> +/**
> + * Set bit in word.
> + *
> + * Generic selection macro to set a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_set(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_set32,		\
> +		 uint64_t *: rte_bit_set64)(addr, nr)
> +
> +/**
> + * Clear bit in word.
> + *
> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_clear(addr, nr)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_clear32,		\
> +		 uint64_t *: rte_bit_clear64)(addr, nr)
> +
> +/**
> + * Assign a value to a bit in word.
> + *
> + * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + * @param value
> + *   The new value of the bit - true for '1', or false for '0'.
> + */
> +#define rte_bit_assign(addr, nr, value)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_assign32,			\
> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> +
>  /**
>   * Test if a particular bit in a 32-bit word is set.
>   *
> -- 
> 2.34.1


* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-04 16:34       ` Tyler Retzlaff
@ 2024-03-05 18:01         ` Mattias Rönnblom
  2024-03-05 18:06           ` Tyler Retzlaff
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-05 18:01 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: Stephen Hemminger, Mattias Rönnblom, dev, Heng Wang

On 2024-03-04 17:34, Tyler Retzlaff wrote:
> On Sun, Mar 03, 2024 at 07:26:36AM +0100, Mattias Rönnblom wrote:
>> On 2024-03-02 18:05, Stephen Hemminger wrote:
>>> On Sat, 2 Mar 2024 14:53:22 +0100
>>> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>>>
>>>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>>>> index 449565eeae..9a368724d5 100644
>>>> --- a/lib/eal/include/rte_bitops.h
>>>> +++ b/lib/eal/include/rte_bitops.h
>>>> @@ -2,6 +2,7 @@
>>>>    * Copyright(c) 2020 Arm Limited
>>>>    * Copyright(c) 2010-2019 Intel Corporation
>>>>    * Copyright(c) 2023 Microsoft Corporation
>>>> + * Copyright(c) 2024 Ericsson AB
>>>>    */
>>>
>>> Unless this is coming from another project code base, the common
>>> practice is not to add copyright for each contributor in later versions.
>>>
>>
>> Unless it's a large contribution (compared to the rest of the file)?
>>
>> I guess that's why the 916c50d commit adds the Microsoft copyright notice.
>>
>>>> +/**
>>>> + * Test if a particular bit in a 32-bit word is set.
>>>> + *
>>>> + * This function does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the 32-bit word to query.
>>>> + * @param nr
>>>> + *   The index of the bit (0-31).
>>>> + * @return
>>>> + *   Returns true if the bit is set, and false otherwise.
>>>> + */
>>>> +static inline bool
>>>> +rte_bit_test32(const uint32_t *addr, unsigned int nr);
>>>
>>> Is it possible to reorder these inlines to avoid having
>>> forward declarations?
>>>
>>
>> Yes, but I'm not sure it's a net gain.
>>
>> A statement expression macro seems like a perfect tool for the job,
>> but then MSVC doesn't support statement expressions. You could also
>> have a macro that just generates the function body, as opposed to the
>> whole function.
> 
> statement expressions can be used even with MSVC when using C. but GCC
> documentation discourages their use for C++. since the header is

GCC documentation discourages statement expressions *of a particular 
form* from being included in headers to be consumed by C++.

They would be fine to use here, especially considering they wouldn't be 
a part of the public API (i.e., only invoked from the static inline 
functions in the API).

> consumed by C++ in addition to C it's preferable to avoid them.
> 
>>
>> I'll consider if I should just bite the bullet and expand all the
>> macros. 4x duplication.
>>
>>> Also, new functions should be marked __rte_experimental
>>> for a release or two.
>>
>> Yes, thanks.


* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-05 18:01         ` Mattias Rönnblom
@ 2024-03-05 18:06           ` Tyler Retzlaff
  0 siblings, 0 replies; 90+ messages in thread
From: Tyler Retzlaff @ 2024-03-05 18:06 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Stephen Hemminger, Mattias Rönnblom, dev, Heng Wang

On Tue, Mar 05, 2024 at 07:01:50PM +0100, Mattias Rönnblom wrote:
> On 2024-03-04 17:34, Tyler Retzlaff wrote:
> >On Sun, Mar 03, 2024 at 07:26:36AM +0100, Mattias Rönnblom wrote:
> >>On 2024-03-02 18:05, Stephen Hemminger wrote:
> >>>On Sat, 2 Mar 2024 14:53:22 +0100
> >>>Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> >>>
> >>>>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>>>index 449565eeae..9a368724d5 100644
> >>>>--- a/lib/eal/include/rte_bitops.h
> >>>>+++ b/lib/eal/include/rte_bitops.h
> >>>>@@ -2,6 +2,7 @@
> >>>>   * Copyright(c) 2020 Arm Limited
> >>>>   * Copyright(c) 2010-2019 Intel Corporation
> >>>>   * Copyright(c) 2023 Microsoft Corporation
> >>>>+ * Copyright(c) 2024 Ericsson AB
> >>>>   */
> >>>
> >>>Unless this is coming from another project code base, the common
> >>>practice is not to add copyright for each contributor in later versions.
> >>>
> >>
> >>Unless it's a large contribution (compared to the rest of the file)?
> >>
> >>I guess that's why the 916c50d commit adds the Microsoft copyright notice.
> >>
> >>>>+/**
> >>>>+ * Test if a particular bit in a 32-bit word is set.
> >>>>+ *
> >>>>+ * This function does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the 32-bit word to query.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit (0-31).
> >>>>+ * @return
> >>>>+ *   Returns true if the bit is set, and false otherwise.
> >>>>+ */
> >>>>+static inline bool
> >>>>+rte_bit_test32(const uint32_t *addr, unsigned int nr);
> >>>
> >>>Is it possible to reorder these inlines to avoid having
> >>>forward declarations?
> >>>
> >>
> >>Yes, but I'm not sure it's a net gain.
> >>
> >>A statement expression macro seems like a perfect tool for the job,
> >>but then MSVC doesn't support statement expressions. You could also
> >>have a macro that just generates the function body, as opposed to the
> >>whole function.
> >
> >statement expressions can be used even with MSVC when using C. but GCC
> >documentation discourages their use for C++. since the header is
> 
> GCC documentation discourages statement expressions *of a particular
> form* from being included in headers to be consumed by C++.
> 
> They would be fine to use here, especially considering they wouldn't
> be a part of the public API (i.e., only invoked from the static
> inline functions in the API).

agreed, there should be no problem.

> 
> >consumed by C++ in addition to C it's preferable to avoid them.
> >
> >>
> >>I'll consider if I should just bite the bullet and expand all the
> >>macros. 4x duplication.
> >>
> >>>Also, new functions should be marked __rte_experimental
> >>>for a release or two.
> >>
> >>Yes, thanks.


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-04 16:42   ` Tyler Retzlaff
@ 2024-03-05 18:08     ` Mattias Rönnblom
  2024-03-05 18:22       ` Tyler Retzlaff
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-05 18:08 UTC (permalink / raw)
  To: Tyler Retzlaff, Mattias Rönnblom; +Cc: dev, Heng Wang

On 2024-03-04 17:42, Tyler Retzlaff wrote:
> On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
>> Add bit-level test/set/clear/assign macros operating on both 32-bit
>> and 64-bit words by means of C11 generic selection.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
> 
> _Generic is nice here. should we discourage direct use of the inline
> functions in preference of using the macro always? either way lgtm.
> 

That was something I considered, but decided against it for RFC v1. I 
wasn't even sure people would like _Generic.

The big upside of having only the _Generic macros would be a much 
smaller API, but maybe a tiny bit less (type-)safe to use.

Also, _Generic is new for DPDK, so who knows what issues it might cause 
with old compilers.

Thanks.

> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
>>   lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 81 insertions(+)
>>
>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>> index 9a368724d5..afd0f11033 100644
>> --- a/lib/eal/include/rte_bitops.h
>> +++ b/lib/eal/include/rte_bitops.h
>> @@ -107,6 +107,87 @@ extern "C" {
>>   #define RTE_FIELD_GET64(mask, reg) \
>>   		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>>   
>> +/**
>> + * Test bit in word.
>> + *
>> + * Generic selection macro to test the value of a bit in a 32-bit or
>> + * 64-bit word. The type of operation depends on the type of the @c
>> + * addr parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + */
>> +#define rte_bit_test(addr, nr)				\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_test32,		\
>> +		 uint64_t *: rte_bit_test64)(addr, nr)
>> +
>> +/**
>> + * Set bit in word.
>> + *
>> + * Generic selection macro to set a bit in a 32-bit or 64-bit
>> + * word. The type of operation depends on the type of the @c addr
>> + * parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + */
>> +#define rte_bit_set(addr, nr)				\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_set32,		\
>> +		 uint64_t *: rte_bit_set64)(addr, nr)
>> +
>> +/**
>> + * Clear bit in word.
>> + *
>> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
>> + * word. The type of operation depends on the type of the @c addr
>> + * parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + */
>> +#define rte_bit_clear(addr, nr)			\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_clear32,		\
>> +		 uint64_t *: rte_bit_clear64)(addr, nr)
>> +
>> +/**
>> + * Assign a value to a bit in word.
>> + *
>> + * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
>> + * word. The type of operation depends on the type of the @c addr parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + * @param value
>> + *   The new value of the bit - true for '1', or false for '0'.
>> + */
>> +#define rte_bit_assign(addr, nr, value)			\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_assign32,			\
>> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
>> +
>>   /**
>>    * Test if a particular bit in a 32-bit word is set.
>>    *
>> -- 
>> 2.34.1


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-05 18:08     ` Mattias Rönnblom
@ 2024-03-05 18:22       ` Tyler Retzlaff
  2024-03-05 20:02         ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Tyler Retzlaff @ 2024-03-05 18:22 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: Mattias Rönnblom, dev, Heng Wang

On Tue, Mar 05, 2024 at 07:08:36PM +0100, Mattias Rönnblom wrote:
> On 2024-03-04 17:42, Tyler Retzlaff wrote:
> >On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
> >>Add bit-level test/set/clear/assign macros operating on both 32-bit
> >>and 64-bit words by means of C11 generic selection.
> >>
> >>Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>---
> >
> >_Generic is nice here. should we discourage direct use of the inline
> >functions in preference of using the macro always? either way lgtm.
> >
> 
> That was something I considered, but decided against it for RFC v1.
> I wasn't even sure people would like _Generic.
> 
> The big upside of having only the _Generic macros would be a much
> smaller API, but maybe a tiny bit less (type-)safe to use.

i'm curious what misuse pattern you anticipate or have seen that may be
less type-safe? just so i can look out for them.

i (perhaps naively) have liked generic functions for their selection of
the "correct" type and for the compile error _Generic produces when no
leg/case exists (as opposed to e.g. silent truncation).

> 
> Also, _Generic is new for DPDK, so who knows what issues it might
> cause with old compilers.

i was thinking about this overnight, it's supposed to be standard C11
and my use on various compilers showed no problem but I can't recall if
i did any evaluation when consuming as a part of a C++ translation unit
so there could be problems.

> 
> Thanks.
> 
> >Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >
> >>  lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 81 insertions(+)
> >>
> >>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>index 9a368724d5..afd0f11033 100644
> >>--- a/lib/eal/include/rte_bitops.h
> >>+++ b/lib/eal/include/rte_bitops.h
> >>@@ -107,6 +107,87 @@ extern "C" {
> >>  #define RTE_FIELD_GET64(mask, reg) \
> >>  		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
> >>+/**
> >>+ * Test bit in word.
> >>+ *
> >>+ * Generic selection macro to test the value of a bit in a 32-bit or
> >>+ * 64-bit word. The type of operation depends on the type of the @c
> >>+ * addr parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ */
> >>+#define rte_bit_test(addr, nr)				\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_test32,		\
> >>+		 uint64_t *: rte_bit_test64)(addr, nr)
> >>+
> >>+/**
> >>+ * Set bit in word.
> >>+ *
> >>+ * Generic selection macro to set a bit in a 32-bit or 64-bit
> >>+ * word. The type of operation depends on the type of the @c addr
> >>+ * parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ */
> >>+#define rte_bit_set(addr, nr)				\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_set32,		\
> >>+		 uint64_t *: rte_bit_set64)(addr, nr)
> >>+
> >>+/**
> >>+ * Clear bit in word.
> >>+ *
> >>+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
> >>+ * word. The type of operation depends on the type of the @c addr
> >>+ * parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ */
> >>+#define rte_bit_clear(addr, nr)			\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_clear32,		\
> >>+		 uint64_t *: rte_bit_clear64)(addr, nr)
> >>+
> >>+/**
> >>+ * Assign a value to a bit in word.
> >>+ *
> >>+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
> >>+ * word. The type of operation depends on the type of the @c addr parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ * @param value
> >>+ *   The new value of the bit - true for '1', or false for '0'.
> >>+ */
> >>+#define rte_bit_assign(addr, nr, value)			\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_assign32,			\
> >>+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> >>+
> >>  /**
> >>   * Test if a particular bit in a 32-bit word is set.
> >>   *
> >>-- 
> >>2.34.1


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-05 18:22       ` Tyler Retzlaff
@ 2024-03-05 20:02         ` Mattias Rönnblom
  2024-03-05 20:53           ` Tyler Retzlaff
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-03-05 20:02 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: Mattias Rönnblom, dev, Heng Wang

On 2024-03-05 19:22, Tyler Retzlaff wrote:
> On Tue, Mar 05, 2024 at 07:08:36PM +0100, Mattias Rönnblom wrote:
>> On 2024-03-04 17:42, Tyler Retzlaff wrote:
>>> On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
>>>> Add bit-level test/set/clear/assign macros operating on both 32-bit
>>>> and 64-bit words by means of C11 generic selection.
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> ---
>>>
>>> _Generic is nice here. should we discourage direct use of the inline
>>> functions in preference of using the macro always? either way lgtm.
>>>
>>
>> That was something I considered, but decided against it for RFC v1.
>> I wasn't even sure people would like _Generic.
>>
>> The big upside of having only the _Generic macros would be a much
>> smaller API, but maybe a tiny bit less (type-)safe to use.
> 
> i'm curious what misuse pattern you anticipate or have seen that may be
> less type-safe? just so i can look out for them.
> 

That was just a gut feeling, not to be taken too seriously.

uint32_t *p = some_void_pointer;
/../
rte_bit_set32(p, 17);

A code section like this is redundant in the way the type (or at least 
the type size) is coded both into the function name and the pointer type. 
The use of rte_bit_set() will eliminate this, which is good (DRY), and 
bad, because now the type isn't "double-checked".

As you can see, it's a pretty weak argument.

> i (perhaps naively) have liked generic functions for their selection of
> the "correct" type and for the compile error _Generic produces when no
> leg/case exists (as opposed to e.g. silent truncation).
> 
>>
>> Also, _Generic is new for DPDK, so who knows what issues it might
>> cause with old compilers.
> 
> i was thinking about this overnight, it's supposed to be standard C11
> and my use on various compilers showed no problem but I can't recall if
> i did any evaluation when consuming as a part of a C++ translation unit
> so there could be problems.
> 

It would be unfortunate if DPDK were prohibited from using _Generic.

>>
>> Thanks.
>>
>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>>
>>>>   lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 81 insertions(+)
>>>>
>>>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>>>> index 9a368724d5..afd0f11033 100644
>>>> --- a/lib/eal/include/rte_bitops.h
>>>> +++ b/lib/eal/include/rte_bitops.h
>>>> @@ -107,6 +107,87 @@ extern "C" {
>>>>   #define RTE_FIELD_GET64(mask, reg) \
>>>>   		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>>>> +/**
>>>> + * Test bit in word.
>>>> + *
>>>> + * Generic selection macro to test the value of a bit in a 32-bit or
>>>> + * 64-bit word. The type of operation depends on the type of the @c
>>>> + * addr parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + */
>>>> +#define rte_bit_test(addr, nr)				\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_test32,		\
>>>> +		 uint64_t *: rte_bit_test64)(addr, nr)
>>>> +
>>>> +/**
>>>> + * Set bit in word.
>>>> + *
>>>> + * Generic selection macro to set a bit in a 32-bit or 64-bit
>>>> + * word. The type of operation depends on the type of the @c addr
>>>> + * parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + */
>>>> +#define rte_bit_set(addr, nr)				\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_set32,		\
>>>> +		 uint64_t *: rte_bit_set64)(addr, nr)
>>>> +
>>>> +/**
>>>> + * Clear bit in word.
>>>> + *
>>>> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
>>>> + * word. The type of operation depends on the type of the @c addr
>>>> + * parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + */
>>>> +#define rte_bit_clear(addr, nr)			\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_clear32,		\
>>>> +		 uint64_t *: rte_bit_clear64)(addr, nr)
>>>> +
>>>> +/**
>>>> + * Assign a value to a bit in word.
>>>> + *
>>>> + * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
>>>> + * word. The type of operation depends on the type of the @c addr parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + * @param value
>>>> + *   The new value of the bit - true for '1', or false for '0'.
>>>> + */
>>>> +#define rte_bit_assign(addr, nr, value)			\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_assign32,			\
>>>> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
>>>> +
>>>>   /**
>>>>    * Test if a particular bit in a 32-bit word is set.
>>>>    *
>>>> -- 
>>>> 2.34.1

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-05 20:02         ` Mattias Rönnblom
@ 2024-03-05 20:53           ` Tyler Retzlaff
  0 siblings, 0 replies; 90+ messages in thread
From: Tyler Retzlaff @ 2024-03-05 20:53 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: Mattias Rönnblom, dev, Heng Wang

On Tue, Mar 05, 2024 at 09:02:34PM +0100, Mattias Rönnblom wrote:
> On 2024-03-05 19:22, Tyler Retzlaff wrote:
> >On Tue, Mar 05, 2024 at 07:08:36PM +0100, Mattias Rönnblom wrote:
> >>On 2024-03-04 17:42, Tyler Retzlaff wrote:
> >>>On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
> >>>>Add bit-level test/set/clear/assign macros operating on both 32-bit
> >>>>and 64-bit words by means of C11 generic selection.
> >>>>
> >>>>Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>---
> >>>
> >>>_Generic is nice here. should we discourage direct use of the inline
> >>>functions in preference of using the macro always? either way lgtm.
> >>>
> >>
> >>That was something I considered, but decided against it for RFC v1.
> >>I wasn't even sure people would like _Generic.
> >>
> >>The big upside of having only the _Generic macros would be a much
> >>smaller API, but maybe a tiny bit less (type-)safe to use.
> >
> >i'm curious what misuse pattern you anticipate or have seen that may be
> >less type-safe? just so i can look out for them.
> >
> 
> That was just a gut feeling, not to be taken too seriously.
> 
> uint32_t *p = some_void_pointer;
> /../
> rte_bit_set32(p, 17);
> 
> A code section like this is redundant in the way the type (or at
> least type size) is coded both into the function name, and the
> pointer type. The use of rte_set_bit() will eliminate this, which is
> good (DRY), and bad, because now the type isn't "double-checked".
> 
> As you can see, it's a pretty weak argument.
> 
> >i (perhaps naively) have liked generic functions for their selection of
> >the "correct" type and for _Generic if no leg/case exists compiler
> >error (as opposed to e.g. silent truncation).
> >
> >>
> >>Also, _Generic is new for DPDK, so who knows what issues it might
> >>cause with old compilers.
> >
> >i was thinking about this overnight, it's supposed to be standard C11
> >and my use on various compilers showed no problem but I can't recall if
> >i did any evaluation when consuming as a part of a C++ translation unit
> >so there could be problems.
> >
> 
> It would be unfortunate if DPDK was prohibited from using _Generic.

I agree, I don't think it should be prohibited. If C++ poses a problem
we can work to find solutions.

> 
> >>
> >>Thanks.
> >>
> >>>Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >>>
> >>>>  lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 81 insertions(+)
> >>>>
> >>>>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>>>index 9a368724d5..afd0f11033 100644
> >>>>--- a/lib/eal/include/rte_bitops.h
> >>>>+++ b/lib/eal/include/rte_bitops.h
> >>>>@@ -107,6 +107,87 @@ extern "C" {
> >>>>  #define RTE_FIELD_GET64(mask, reg) \
> >>>>  		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
> >>>>+/**
> >>>>+ * Test bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to test the value of a bit in a 32-bit or
> >>>>+ * 64-bit word. The type of operation depends on the type of the @c
> >>>>+ * addr parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ */
> >>>>+#define rte_bit_test(addr, nr)				\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_test32,		\
> >>>>+		 uint64_t *: rte_bit_test64)(addr, nr)
> >>>>+
> >>>>+/**
> >>>>+ * Set bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to set a bit in a 32-bit or 64-bit
> >>>>+ * word. The type of operation depends on the type of the @c addr
> >>>>+ * parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ */
> >>>>+#define rte_bit_set(addr, nr)				\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_set32,		\
> >>>>+		 uint64_t *: rte_bit_set64)(addr, nr)
> >>>>+
> >>>>+/**
> >>>>+ * Clear bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
> >>>>+ * word. The type of operation depends on the type of the @c addr
> >>>>+ * parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ */
> >>>>+#define rte_bit_clear(addr, nr)			\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_clear32,		\
> >>>>+		 uint64_t *: rte_bit_clear64)(addr, nr)
> >>>>+
> >>>>+/**
> >>>>+ * Assign a value to a bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
> >>>>+ * word. The type of operation depends on the type of the @c addr parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ * @param value
> >>>>+ *   The new value of the bit - true for '1', or false for '0'.
> >>>>+ */
> >>>>+#define rte_bit_assign(addr, nr, value)			\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_assign32,			\
> >>>>+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> >>>>+
> >>>>  /**
> >>>>   * Test if a particular bit in a 32-bit word is set.
> >>>>   *
> >>>>-- 
> >>>>2.34.1

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v2 0/6] Improve EAL bit operations API
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
  2024-03-02 17:05   ` Stephen Hemminger
@ 2024-04-25  8:58   ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                       ` (7 more replies)
  1 sibling, 8 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign]() which provides no memory ordering or
atomicity guarantees and no read-once or write-once semantics (e.g.,
no use of volatile), but does provide the best performance. The
performance degradation resulting from the use of volatile (e.g.,
forcing loads and stores to actually occur and in the number
specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain compiler optimizations). The
primary use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from
doing atomic operations against a bit set in some situations, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions (or more correctly, generic selection macros)
operate on both 32 and 64-bit words, with type checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both the GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bits on your platform
(which it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but has
nevertheless been given a lower-case name. That's how C11 does it (for
atomics and other _Generic-based interfaces), as does
<rte_stdatomic.h>. Its address can't be taken, but it does not
evaluate its parameters more than once.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 319 +++++++++++++++++-
 lib/eal/include/rte_bitops.h | 624 ++++++++++++++++++++++++++++++++++-
 2 files changed, 925 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v2 1/6] eal: extend bit manipulation functionality
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Add functionality to test, set, clear, and assign the value to
individual bits in 32-bit or 64-bit words.

These functions carry no implications in regards to memory ordering
or atomicity, and do not use volatile, and thus do not prevent any
compiler optimizations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 158 ++++++++++++++++++++++++++++++++++-
 1 file changed, 156 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..75a29fdfe0 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,157 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+#define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
+	static inline bool						\
+	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+__RTE_GEN_BIT_TEST(__rte_bit_test32, 32,)
+__RTE_GEN_BIT_SET(__rte_bit_set32, 32,)
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear32, 32,)
+
+__RTE_GEN_BIT_TEST(__rte_bit_test64, 64,)
+__RTE_GEN_BIT_SET(__rte_bit_set64, 64,)
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64,)
+
+__rte_experimental
+static inline void
+__rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set32(addr, nr);
+	else
+		__rte_bit_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set64(addr, nr);
+	else
+		__rte_bit_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v2 2/6] eal: add unit tests for bit operations
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[set|clear|assign|test]()
family of functions.

The tests are converted to use the test suite runner framework.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 76 +++++++++++++++++++++++++++++++++---------
 1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..f788b561a0 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,59 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    test_fun, size)				\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +163,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access_32),
+		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v2 3/6] eal: add exactly-once bit access functions
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 4/6] eal: add unit tests for " Mattias Rönnblom
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 170 +++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 75a29fdfe0..a2746e657f 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -201,6 +201,147 @@ extern "C" {
 		 uint32_t *: __rte_bit_assign32,			\
 		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word exactly once.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -239,6 +380,14 @@ __RTE_GEN_BIT_TEST(__rte_bit_test64, 64,)
 __RTE_GEN_BIT_SET(__rte_bit_set64, 64,)
 __RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64,)
 
+__RTE_GEN_BIT_TEST(__rte_bit_once_test32, 32, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set32, 32, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear32, 32, volatile)
+
+__RTE_GEN_BIT_TEST(__rte_bit_once_test64, 64, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set64, 64, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear64, 64, volatile)
+
 __rte_experimental
 static inline void
 __rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
@@ -259,6 +408,27 @@ __rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_clear64(addr, nr);
 }
 
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign32(volatile uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set32(addr, nr);
+	else
+		__rte_bit_once_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set64(addr, nr);
+	else
+		__rte_bit_once_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v2 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (2 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[set|clear|assign|test]() family of functions.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index f788b561a0..12c1027e36 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -46,12 +46,20 @@
 		return TEST_SUCCESS;					\
 	}
 
-GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 32)
 
-GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access_32, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -168,6 +176,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access_32),
 		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_once_access_32),
+		TEST_CASE(test_bit_once_access_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (3 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25 10:25       ` Morten Brørup
  2024-04-25  8:58     ` [RFC v2 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 297 +++++++++++++++++++++++++++++++++++
 1 file changed, 297 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index a2746e657f..8c38a1ac03 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -342,6 +343,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_assign32,			\
 		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -429,6 +601,131 @@ __rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_once_clear64(addr, nr);
 }
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC v2 6/6] eal: add unit tests for atomic bit access functions
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (4 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
  2024-04-26 21:35     ` Patrick Robb
  7 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|test|test_and_[set|clear|assign]]()
family of functions.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c       | 233 ++++++++++++++++++++++++++++++++++-
 lib/eal/include/rte_bitops.h |   1 -
 2 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 12c1027e36..a0967260aa 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -60,6 +63,228 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
 		    rte_bit_once_clear, rte_bit_once_assign,		\
 		    rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_32, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_64, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore_ ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign_ ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore_ ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign_ ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore_ ## size main = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore_ ## size worker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		main.bit = rte_rand_max(size);				\
+		do {							\
+			worker.bit = rte_rand_max(size);		\
+		} while (worker.bit == main.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign_ ## size, \
+					       &worker,	worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign_ ## size(&main);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!main.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!worker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore_ ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify_ ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore_ ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify_ ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore_ ## size main = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore_ ## size worker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify_ ## size, \
+					       &worker,	worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify_ ## size(&main);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = main.flips + worker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRId64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -178,6 +403,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access_64),
 		TEST_CASE(test_bit_once_access_32),
 		TEST_CASE(test_bit_once_access_64),
+		TEST_CASE(test_bit_atomic_access_32),
+		TEST_CASE(test_bit_atomic_access_64),
+		TEST_CASE(test_bit_atomic_parallel_assign_32),
+		TEST_CASE(test_bit_atomic_parallel_assign_64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 8c38a1ac03..bc6d79086b 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -485,7 +485,6 @@ extern "C" {
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
 		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
 								memory_order)
-
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
-- 
2.34.1



* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-25 10:25       ` Morten Brørup
  2024-04-25 14:36         ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Morten Brørup @ 2024-04-25 10:25 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> +	_Generic((addr),						\
> +		 uint32_t *: __rte_bit_atomic_test32,			\
> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)

I wonder if these should have RTE_ATOMIC qualifier:

+		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,			\
+		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)


> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
> +	static inline bool						\
> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\

I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:

+	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\

instead of casting into a_addr.

> +				      unsigned int nr, int memory_order) \
> +	{								\
> +		RTE_ASSERT(nr < size);					\
> +									\
> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> +	}


Similar considerations regarding volatile qualifier for the "once" operations.



* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25 10:25       ` Morten Brørup
@ 2024-04-25 14:36         ` Mattias Rönnblom
  2024-04-25 16:18           ` Morten Brørup
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-25 14:36 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-04-25 12:25, Morten Brørup wrote:
>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>> +	_Generic((addr),						\
>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> 
> I wonder if these should have RTE_ATOMIC qualifier:
> 
> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,			\
> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> 
> 
>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
>> +	static inline bool						\
>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> 
> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> 
> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> 
> instead of casting into a_addr.
> 

Check the cover letter for the rationale for the cast.

Where I'm at now is that I think C11 _Atomic is rather poor design. The 
assumption that an object which allows for atomic access should always 
require all operations upon it to be atomic, regardless of where it is 
in its lifetime, and which thread is accessing it, does not hold in the 
general case.

The only reason for _Atomic being as it is, as far as I can see, is to 
accommodate ISAs which do not have the appropriate atomic machine 
instructions, and thus require a lock or some other data associated with 
the actual user-data-carrying bits. Neither GCC nor DPDK supports any 
such ISAs, to my knowledge, and I suspect neither ever will. So the cast 
will continue to work.

>> +				      unsigned int nr, int memory_order) \
>> +	{								\
>> +		RTE_ASSERT(nr < size);					\
>> +									\
>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
>> +	}
> 
> 
> Similar considerations regarding volatile qualifier for the "once" operations.
> 


* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25 14:36         ` Mattias Rönnblom
@ 2024-04-25 16:18           ` Morten Brørup
  2024-04-26  9:39             ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Morten Brørup @ 2024-04-25 16:18 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Thursday, 25 April 2024 16.36
> 
> On 2024-04-25 12:25, Morten Brørup wrote:
> >> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >> +	_Generic((addr),						\
> >> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >
> > I wonder if these should have RTE_ATOMIC qualifier:
> >
> > +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,	\
> > +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >
> >
> >> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)				\
> >> +	static inline bool						\
> >> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> >
> > I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >
> > +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> >
> > instead of casting into a_addr.
> >
> 
> Check the cover letter for the rationale for the cast.

Thanks, that clarifies it. Then...
For the series:
Acked-by: Morten Brørup <mb@smartsharesystems.com>

> 
> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> assumption that an object which allows for atomic access always should
> require all operations upon it to be atomic, regardless of where it is
> in its lifetime, and which thread is accessing it, does not hold, in the
> general case.

It might be slow, but I suppose the C11 standard prioritizes correctness over performance.

It seems locks are automatically added to _Atomic types larger than what is natively supported by the architecture.
E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]

[1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/

> 
> The only reason for _Atomic being as it is, as far as I can see, is to
> accommodate for ISAs which does not have the appropriate atomic machine
> instructions, and thus require a lock or some other data associated with
> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> such ISAs, to my knowledge. I suspect neither never will. So the cast
> will continue to work.

I tend to agree with you on this.

We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.

> 
> >> +				      unsigned int nr, int memory_order) \
> >> +	{								\
> >> +		RTE_ASSERT(nr < size);					\
> >> +									\
> >> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> >> +	}
> >
> >
> > Similar considerations regarding volatile qualifier for the "once"
> > operations.
> >


* Re: [RFC v2 0/6] Improve EAL bit operations API
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (5 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-04-25 18:05     ` Tyler Retzlaff
  2024-04-26 11:17       ` Mattias Rönnblom
  2024-04-26 21:35     ` Patrick Robb
  7 siblings, 1 reply; 90+ messages in thread
From: Tyler Retzlaff @ 2024-04-25 18:05 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang, Stephen Hemminger

On Thu, Apr 25, 2024 at 10:58:47AM +0200, Mattias Rönnblom wrote:
> This patch set represent an attempt to improve and extend the RTE
> bitops API, in particular for functions that operate on individual
> bits.
> 
> All new functionality is exposed to the user as generic selection
> macros, delegating the actual work to private (__-marked) static
> inline functions. Public functions (e.g., rte_bit_set32()) would just
> be bloating the API. Such generic selection macros will here be
> referred to as "functions", although technically they are not.


> 
> The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
> replaced with three families:
> 
> rte_bit_[test|set|clear|assign]() which provides no memory ordering or
> atomicity guarantees and no read-once or write-once semantics (e.g.,
> no use of volatile), but does provide the best performance. The
> performance degradation resulting from the use of volatile (e.g.,
> forcing loads and stores to actually occur and in the number
> specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
> significant.
> 
> rte_bit_once_*() which guarantees program-level load and stores
> actually occurring (i.e., prevents certain optimizations). The primary
> use of these functions are in the context of memory mapped
> I/O. Feedback on the details (semantics, naming) here would be greatly
> appreciated, since the author is not much of a driver developer.
> 
> rte_bit_atomic_*() which provides atomic bit-level operations,
> including the possibility to specifying memory ordering constraints
> (or the lack thereof).
> 
> The atomic functions take non-_Atomic pointers, to be flexible, just
> like the GCC builtins and default <rte_stdatomic.h>. The issue with
> _Atomic APIs is that it may well be the case that the user wants to
> perform both non-atomic and atomic operations on the same word.
> 
> Having _Atomic-marked addresses would complicate supporting atomic
> bit-level operations in the bitset API (proposed in a different RFC
> patchset), and potentially other APIs depending on RTE bitops for
> atomic bit-level ops). Either one needs two bitset variants, one
> _Atomic bitset and one non-atomic one, or the bitset code needs to
> cast the non-_Atomic pointer to an _Atomic one. Having a separate
> _Atomic bitset would be bloat and also prevent the user from both, in
> some situations, doing atomic operations against a bit set, while in
> other situations (e.g., at times when MT safety is not a concern)
> operating on the same objects in a non-atomic manner.

understood. i think the only downside is that if you do have an
_Atomic-specified type you'll have to cast the qualification away
to use the function-like macro.

as a suggestion the _Generic legs could include both _Atomic-specified
and non-_Atomic-specified types where an intermediate inline function
could strip the qualification to use your core inline implementations.

_Generic((v), int *: __foo32, RTE_ATOMIC(int) *: __foo32_unqual)(v)

static inline void
__foo32(int *a) { ... }

static inline void
__foo32_unqual(RTE_ATOMIC(int) *a) { __foo32((int *)(uintptr_t)(a)); }

there is some similar prior art in newer ISO C23 with typeof_unqual.

https://en.cppreference.com/w/c/language/typeof

> 
> Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
> not uint32_t or uint64_t. The author found the use of such large types
> confusing, and also failed to see any performance benefits.
> 
> A set of functions rte_bit_*_assign() are added, to assign a
> particular boolean value to a particular bit.
> 
> All new functions have properly documented semantics.
> 
> All new functions (or more correctly, generic selection macros)
> operate on both 32 and 64-bit words, with type checking.
> 
> _Generic allows the user code to be a little more compact. Having a
> type-generic atomic test/set/clear/assign bit API also seems
> consistent with the "core" (word-size) atomics API, which is generic
> (both GCC builtins and <rte_stdatomic.h> are).

ack, can you confirm _Generic is usable from a C++ TU? i may be making a
mistake locally but using g++ version 11.4.0 -std=c++20 it wasn't
accepting it.

i think using _Generic is ideal, it eliminates mistakes when handling
the different integer sizes. so if it turns out C++ doesn't want to
cooperate, the function-like macro can conditionally expand to a C++
template. this will need to be done for MSVC since i can confirm
_Generic does not work with MSVC C++.

> 
> The _Generic versions avoid having explicit unsigned long versions of
> all functions. If you have an unsigned long, it's safe to use the
> generic version (e.g., rte_set_bit()) and _Generic will pick the right
> function, provided long is either 32 or 64 bit on your platform (which
> it is on all DPDK-supported ABIs).
> 
> The generic rte_bit_set() is a macro, and not a function, but
> nevertheless has been given a lower-case name. That's how C11 does it
> (for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
> can't be taken, but it does not evaluate its parameters more than
> once.
> 
> Things that are left out of this patch set, that may be included
> in future versions:
> 
>  * Have all functions returning a bit number have the same return type
>    (i.e., unsigned int).
>  * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
>  * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
>    for useful/used bit-level GCC builtins.
>  * Eliminate the MSVC #ifdef-induced documentation duplication.
>  * _Generic versions of things like rte_popcount32(). (?)

it would be nice to see them all converted. at the time i added them we
still hadn't adopted C11 so i was limited. but certainly not asking for it
as a part of this series.

> 
> Mattias Rönnblom (6):
>   eal: extend bit manipulation functionality
>   eal: add unit tests for bit operations
>   eal: add exactly-once bit access functions
>   eal: add unit tests for exactly-once bit access functions
>   eal: add atomic bit operations
>   eal: add unit tests for atomic bit access functions
> 
>  app/test/test_bitops.c       | 319 +++++++++++++++++-
>  lib/eal/include/rte_bitops.h | 624 ++++++++++++++++++++++++++++++++++-
>  2 files changed, 925 insertions(+), 18 deletions(-)
> 
> -- 

Series-acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

> 2.34.1

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25 16:18           ` Morten Brørup
@ 2024-04-26  9:39             ` Mattias Rönnblom
  2024-04-26 12:00               ` Morten Brørup
  2024-04-30 16:52               ` Tyler Retzlaff
  0 siblings, 2 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-26  9:39 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

On 2024-04-25 18:18, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Thursday, 25 April 2024 16.36
>>
>> On 2024-04-25 12:25, Morten Brørup wrote:
>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
>> 	\
>>>> +	_Generic((addr),						\
>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
>> memory_order)
>>>
>>> I wonder if these should have RTE_ATOMIC qualifier:
>>>
>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
>> 		\
>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr,
>> memory_order)
>>>
>>>
>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
>> 	\
>>>> +	static inline bool						\
>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
>> 	\
>>>
>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
>>>
>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t)
>> *addr,	\
>>>
>>> instead of casting into a_addr.
>>>
>>
>> Check the cover letter for the rationale for the cast.
> 
> Thanks, that clarifies it. Then...
> For the series:
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
>>
>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
>> assumption that an object which allows for atomic access always should
>> require all operations upon it to be atomic, regardless of where it is
>> in its lifetime, and which thread is accessing it, does not hold, in the
>> general case.
> 
> It might be slow, but I suppose the C11 standard prioritizes correctness over performance.
> 

That's a false dichotomy, in this case. You can have both.

> It seems locks are automatically added to _Atomic types larger than what is natively supported by the architecture.
> E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
> 
> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
> 
>>
>> The only reason for _Atomic being as it is, as far as I can see, is to
>> accommodate for ISAs which does not have the appropriate atomic machine
>> instructions, and thus require a lock or some other data associated with
>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >> such ISAs, to my knowledge. I suspect neither ever will. So the cast
>> will continue to work.
> 
> I tend to agree with you on this.
> 
> We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.
> 

I think this is a subject which needs to be further explored.

Objects that can be accessed both atomically and non-atomically should 
be without _Atomic. With my current understanding of this issue, that 
seems like the best option.

You could turn it around as well, and have such marked _Atomic and have 
explicit casts to their non-_Atomic cousins when operated upon by 
non-atomic functions. Not sure how realistic that is, since 
non-atomicity is the norm. All generic selection-based "functions" must 
take this into account.

>>
>>>> +				      unsigned int nr, int memory_order) \
>>>> +	{								\
>>>> +		RTE_ASSERT(nr < size);					\
>>>> +									\
>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) &
>> mask; \
>>>> +	}
>>>
>>>
>>> Similar considerations regarding volatile qualifier for the "once"
>> operations.
>>>


* Re: [RFC v2 0/6] Improve EAL bit operations API
  2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
@ 2024-04-26 11:17       ` Mattias Rönnblom
  0 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-26 11:17 UTC (permalink / raw)
  To: Tyler Retzlaff, Mattias Rönnblom; +Cc: dev, Heng Wang, Stephen Hemminger

On 2024-04-25 20:05, Tyler Retzlaff wrote:
> On Thu, Apr 25, 2024 at 10:58:47AM +0200, Mattias Rönnblom wrote:
>> This patch set represent an attempt to improve and extend the RTE
>> bitops API, in particular for functions that operate on individual
>> bits.
>>
>> All new functionality is exposed to the user as generic selection
>> macros, delegating the actual work to private (__-marked) static
>> inline functions. Public functions (e.g., rte_bit_set32()) would just
>> be bloating the API. Such generic selection macros will here be
>> referred to as "functions", although technically they are not.
> 
> 
>>
>> The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
>> replaced with three families:
>>
>> rte_bit_[test|set|clear|assign]() which provides no memory ordering or
>> atomicity guarantees and no read-once or write-once semantics (e.g.,
>> no use of volatile), but does provide the best performance. The
>> performance degradation resulting from the use of volatile (e.g.,
>> forcing loads and stores to actually occur and in the number
>> specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
>> significant.
>>
>> rte_bit_once_*() which guarantees program-level load and stores
>> actually occurring (i.e., prevents certain optimizations). The primary
> >> use of these functions is in the context of memory mapped
>> I/O. Feedback on the details (semantics, naming) here would be greatly
>> appreciated, since the author is not much of a driver developer.
>>
>> rte_bit_atomic_*() which provides atomic bit-level operations,
> >> including the possibility of specifying memory ordering constraints
>> (or the lack thereof).
>>
>> The atomic functions take non-_Atomic pointers, to be flexible, just
>> like the GCC builtins and default <rte_stdatomic.h>. The issue with
>> _Atomic APIs is that it may well be the case that the user wants to
>> perform both non-atomic and atomic operations on the same word.
>>
>> Having _Atomic-marked addresses would complicate supporting atomic
>> bit-level operations in the bitset API (proposed in a different RFC
>> patchset), and potentially other APIs depending on RTE bitops for
> >> atomic bit-level ops. Either one needs two bitset variants, one
>> _Atomic bitset and one non-atomic one, or the bitset code needs to
>> cast the non-_Atomic pointer to an _Atomic one. Having a separate
>> _Atomic bitset would be bloat and also prevent the user from both, in
>> some situations, doing atomic operations against a bit set, while in
>> other situations (e.g., at times when MT safety is not a concern)
>> operating on the same objects in a non-atomic manner.
> 
> understood. i think the only downside is that if you do have an
> _Atomic-specified type you'll have to cast the qualification away
> to use the function-like macro.
> 

This is tricky, and I can't say I've really converged on an opinion, but 
it seems to me at this point you shouldn't mark anything _Atomic.

> as a suggestion the _Generic legs could include both _Atomic-specified
> and non-_Atomic-specified types where an intermediate inline function
> could strip the qualification to use your core inline implementations.
> 
> _Generic((v), int *: __foo32, RTE_ATOMIC(int) *: __foo32_unqual)(v)
> 
> static inline void
> __foo32(int *a) { ... }
> 
> static inline void
> __foo32_unqual(RTE_ATOMIC(int) *a) { __foo32((int *)(uintptr_t)(a)); }
> 
> there is some similar prior art in newer ISO C23 with typeof_unqual.
> 
> https://en.cppreference.com/w/c/language/typeof
> 

This is an interesting solution, but I'm not sure it's a problem that 
needs to be solved.

>>
>> Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
>> not uint32_t or uint64_t. The author found the use of such large types
>> confusing, and also failed to see any performance benefits.
>>
>> A set of functions rte_bit_*_assign() are added, to assign a
>> particular boolean value to a particular bit.
>>
>> All new functions have properly documented semantics.
>>
>> All new functions (or more correctly, generic selection macros)
>> operate on both 32 and 64-bit words, with type checking.
>>
> >> _Generic allows the user code to be a little more compact. Having a
>> type-generic atomic test/set/clear/assign bit API also seems
>> consistent with the "core" (word-size) atomics API, which is generic
>> (both GCC builtins and <rte_stdatomic.h> are).
> 
> ack, can you confirm _Generic is usable from a C++ TU? i may be making a
> mistake locally but using g++ version 11.4.0 -std=c++20 it wasn't
> accepting it.
> 
> i think using _Generic is ideal, it eliminates mistakes when handling
> the different integer sizes. so if it turns out C++ doesn't want to
> cooperate, the function-like macro can conditionally expand to a C++
> template. this will need to be done for MSVC since i can confirm
> _Generic does not work with MSVC C++.
> 

That's unfortunate.

No, I didn't try it with C++. I just assumed _Generic was valid C++ as well.

The naive solution would be to include two overloaded functions per 
function-like macro.

#ifdef __cplusplus

#undef rte_bit_set

static inline void
rte_bit_set(uint32_t *addr, unsigned int nr)
{
     __rte_bit_set32(addr, nr);
}

static inline void
rte_bit_set(uint64_t *addr, unsigned int nr)
{
     __rte_bit_set64(addr, nr);
}
#endif

Did you have something more clever/less verbose in mind? The best would be 
if one could have a completely generic C++-compatible replacement of 
_Generic, but it's not obvious how that would work.

What's the minimum C++ version required by DPDK? C++11?

>>
> >> The _Generic versions avoid having explicit unsigned long versions of
>> all functions. If you have an unsigned long, it's safe to use the
>> generic version (e.g., rte_set_bit()) and _Generic will pick the right
>> function, provided long is either 32 or 64 bit on your platform (which
>> it is on all DPDK-supported ABIs).
>>
>> The generic rte_bit_set() is a macro, and not a function, but
>> nevertheless has been given a lower-case name. That's how C11 does it
>> (for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
>> can't be taken, but it does not evaluate its parameters more than
>> once.
>>
>> Things that are left out of this patch set, that may be included
>> in future versions:
>>
>>   * Have all functions returning a bit number have the same return type
>>     (i.e., unsigned int).
>>   * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
>>   * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
>>     for useful/used bit-level GCC builtins.
>>   * Eliminate the MSVC #ifdef-induced documentation duplication.
>>   * _Generic versions of things like rte_popcount32(). (?)
> 
> it would be nice to see them all converted. at the time i added them we
> still hadn't adopted C11 so i was limited. but certainly not asking for it
> as a part of this series.
> 
>>
>> Mattias Rönnblom (6):
>>    eal: extend bit manipulation functionality
>>    eal: add unit tests for bit operations
>>    eal: add exactly-once bit access functions
>>    eal: add unit tests for exactly-once bit access functions
>>    eal: add atomic bit operations
>>    eal: add unit tests for atomic bit access functions
>>
>>   app/test/test_bitops.c       | 319 +++++++++++++++++-
>>   lib/eal/include/rte_bitops.h | 624 ++++++++++++++++++++++++++++++++++-
>>   2 files changed, 925 insertions(+), 18 deletions(-)
>>
>> -- 
> 
> Series-acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
>> 2.34.1


* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-26  9:39             ` Mattias Rönnblom
@ 2024-04-26 12:00               ` Morten Brørup
  2024-04-28 15:37                 ` Mattias Rönnblom
  2024-04-30 16:52               ` Tyler Retzlaff
  1 sibling, 1 reply; 90+ messages in thread
From: Morten Brørup @ 2024-04-26 12:00 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Friday, 26 April 2024 11.39
> 
> On 2024-04-25 18:18, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Thursday, 25 April 2024 16.36
> >>
> >> On 2024-04-25 12:25, Morten Brørup wrote:
> >>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
> >> 	\
> >>>> +	_Generic((addr),						\
> >>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
> >> memory_order)
> >>>
> >>> I wonder if these should have RTE_ATOMIC qualifier:
> >>>
> >>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
> >> 		\
> >>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr,
> >> memory_order)
> >>>
> >>>
> >>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
> >> 	\
> >>>> +	static inline bool						\
> >>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
> >> 	\
> >>>
> >>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >>>
> >>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t)
> >> *addr,	\
> >>>
> >>> instead of casting into a_addr.
> >>>
> >>
> >> Check the cover letter for the rationale for the cast.
> >
> > Thanks, that clarifies it. Then...
> > For the series:
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >
> >>
> >> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> >> assumption that an object which allows for atomic access always should
> >> require all operations upon it to be atomic, regardless of where it is
> >> in its lifetime, and which thread is accessing it, does not hold, in the
> >> general case.
> >
> > It might be slow, but I suppose the C11 standard prioritizes correctness
> over performance.
> >
> 
> That's a false dichotomy, in this case. You can have both.

In theory you shouldn't need non-atomic access to atomic variables.
In reality, we want it anyway, because real CPUs are faster at non-atomic operations.

> 
> > It seems locks are automatically added to _Atomic types larger than what is
> natively supported by the architecture.
> > E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
> >
> > [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-
> 2022-version-17-5-preview-2/
> >
> >>
> >> The only reason for _Atomic being as it is, as far as I can see, is to
> >> accommodate for ISAs which does not have the appropriate atomic machine
> >> instructions, and thus require a lock or some other data associated with
> >> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>>> such ISAs, to my knowledge. I suspect neither ever will. So the cast
> >> will continue to work.
> >
> > I tend to agree with you on this.
> >
> > We should officially decide that DPDK treats RTE_ATOMIC types as a union of
> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic
> and non-atomic.
> >
> 
> I think this is a subject which needs to be further explored.

Yes. It's easier exploring and deciding now, when our options are open, than after we have locked down the affected APIs.

> 
> Objects that can be accessed both atomically and non-atomically should
> be without _Atomic. With my current understanding of this issue, that
> seems like the best option.

Agree.

The alternative described below is certainly no good!

It would be nice if they were marked as sometimes-atomic by a qualifier or special type, like rte_be32_t marks the network byte order variant of an uint32_t.

Furthermore, large atomic objects need the _Atomic qualifier for the compiler to add (and use) the associated lock.
Alternatively, we could specify that sometimes-atomic objects cannot be larger than 8 byte, which is what MSVC can handle without locking.

> 
> You could turn it around as well, and have such marked _Atomic and have
> explicit casts to their non-_Atomic cousins when operated upon by
> non-atomic functions. Not sure how realistic that is, since
> non-atomicity is the norm. All generic selection-based "functions" must
> take this into account.
> 
> >>
> >>>> +				      unsigned int nr, int memory_order) \
> >>>> +	{								\
> >>>> +		RTE_ASSERT(nr < size);					\
> >>>> +									\
> >>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>> +		return rte_atomic_load_explicit(a_addr, memory_order) &
> >> mask; \
> >>>> +	}
> >>>
> >>>
> >>> Similar considerations regarding volatile qualifier for the "once"
> >> operations.
> >>>


* Re: [RFC v2 0/6] Improve EAL bit operations API
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (6 preceding siblings ...)
  2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
@ 2024-04-26 21:35     ` Patrick Robb
  7 siblings, 0 replies; 90+ messages in thread
From: Patrick Robb @ 2024-04-26 21:35 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff


Recheck-request: iol-compile-amd64-testing

The DPDK Community Lab updated to the latest Alpine image yesterday, which
resulted in all Alpine builds failing. The failure is unrelated to your
patch, and this recheck should remove the fail on Patchwork, as we have
disabled Alpine testing for now.



* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-26 12:00               ` Morten Brørup
@ 2024-04-28 15:37                 ` Mattias Rönnblom
  2024-04-29  7:24                   ` Morten Brørup
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-28 15:37 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

On 2024-04-26 14:00, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Friday, 26 April 2024 11.39
>>
>> On 2024-04-25 18:18, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>>>> Sent: Thursday, 25 April 2024 16.36
>>>>
>>>> On 2024-04-25 12:25, Morten Brørup wrote:
>>>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
>>>> 	\
>>>>>> +	_Generic((addr),						\
>>>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
>>>> memory_order)
>>>>>
>>>>> I wonder if these should have RTE_ATOMIC qualifier:
>>>>>
>>>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
>>>> 		\
>>>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr,
>>>> memory_order)
>>>>>
>>>>>
>>>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
>>>> 	\
>>>>>> +	static inline bool						\
>>>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
>>>> 	\
>>>>>
>>>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
>>>>>
>>>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t)
>>>> *addr,	\
>>>>>
>>>>> instead of casting into a_addr.
>>>>>
>>>>
>>>> Check the cover letter for the rationale for the cast.
>>>
>>> Thanks, that clarifies it. Then...
>>> For the series:
>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>
>>>>
>>>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
>>>> assumption that an object which allows for atomic access always should
>>>> require all operations upon it to be atomic, regardless of where it is
>>>> in its lifetime, and which thread is accessing it, does not hold, in the
>>>> general case.
>>>
>>> It might be slow, but I suppose the C11 standard prioritizes correctness
>> over performance.
>>>
>>
>> That's a false dichotomy, in this case. You can have both.
> 
> In theory you shouldn't need non-atomic access to atomic variables.
> In reality, we want it anyway, because real CPUs are faster at non-atomic operations.
> 
>>
>>> It seems locks are automatically added to _Atomic types larger than what is
>> natively supported by the architecture.
>>> E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
>>>
>>> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-
>> 2022-version-17-5-preview-2/
>>>
>>>>
>>>> The only reason for _Atomic being as it is, as far as I can see, is to
>>>> accommodate for ISAs which does not have the appropriate atomic machine
>>>> instructions, and thus require a lock or some other data associated with
>>>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>>> such ISAs, to my knowledge. I suspect neither ever will. So the cast
>>>> will continue to work.
>>>
>>> I tend to agree with you on this.
>>>
>>> We should officially decide that DPDK treats RTE_ATOMIC types as a union of
>> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic
>> and non-atomic.
>>>
>>
>> I think this is a subject which needs to be further explored.
> 
> Yes. It's easier exploring and deciding now, when our options are open, than after we have locked down the affected APIs.
> 
>>
>> Objects that can be accessed both atomically and non-atomically should
>> be without _Atomic. With my current understanding of this issue, that
>> seems like the best option.
> 
> Agree.
> 
> The alterative described below is certainly no good!
> 
> It would be nice if they were marked as sometimes-atomic by a qualifier or special type, like rte_be32_t marks the network byte order variant of an uint32_t.
> 
> Furthermore, large atomic objects need the _Atomic qualifier for the compiler to add (and use) the associated lock.

If you have larger objects than the ISA can handle, you wouldn't want to 
leave the choice of synchronization primitive to the 
compiler. I don't see how it could possibly know which one is the most 
appropriate, especially in a DPDK context. It would for example need to 
know if the contending threads are non-preemptable or not.

In some situations a sequence lock may well be your best option. Will 
the compiler generate one for you?

If "lock" means std::mutex, it would be a disaster, performance-wise.

> Alternatively, we could specify that sometimes-atomic objects cannot be larger than 8 byte, which is what MSVC can handle without locking.
> 
>>
>> You could turn it around as well, and have such marked _Atomic and have
>> explicit casts to their non-_Atomic cousins when operated upon by
>> non-atomic functions. Not sure how realistic that is, since
>> non-atomicity is the norm. All generic selection-based "functions" must
>> take this into account.
>>
>>>>
>>>>>> +				      unsigned int nr, int memory_order) \
>>>>>> +	{								\
>>>>>> +		RTE_ASSERT(nr < size);					\
>>>>>> +									\
>>>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>>>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>>>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>>>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) &
>>>> mask; \
>>>>>> +	}
>>>>>
>>>>>
>>>>> Similar considerations regarding volatile qualifier for the "once"
>>>> operations.
>>>>>


* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-28 15:37                 ` Mattias Rönnblom
@ 2024-04-29  7:24                   ` Morten Brørup
  0 siblings, 0 replies; 90+ messages in thread
From: Morten Brørup @ 2024-04-29  7:24 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Sunday, 28 April 2024 17.38
> 
> On 2024-04-26 14:00, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Friday, 26 April 2024 11.39
> >>
> >> On 2024-04-25 18:18, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >>>> Sent: Thursday, 25 April 2024 16.36
> >>>>
> >>>> On 2024-04-25 12:25, Morten Brørup wrote:
> >>>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
> >>>> 	\
> >>>>>> +	_Generic((addr),						\
> >>>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >>>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
> >>>> memory_order)
> >>>>>
> >>>>> I wonder if these should have RTE_ATOMIC qualifier:
> >>>>>
> >>>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
> >>>> 		\
> >>>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr,
> nr,
> >>>> memory_order)
> >>>>>
> >>>>>
> >>>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
> >>>> 	\
> >>>>>> +	static inline bool						\
> >>>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
> >>>> 	\
> >>>>>
> >>>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >>>>>
> >>>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ##
> _t)
> >>>> *addr,	\
> >>>>>
> >>>>> instead of casting into a_addr.
> >>>>>
> >>>>
> >>>> Check the cover letter for the rationale for the cast.
> >>>
> >>> Thanks, that clarifies it. Then...
> >>> For the series:
> >>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>>
> >>>>
> >>>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> >>>> assumption that an object which allows for atomic access always should
> >>>> require all operations upon it to be atomic, regardless of where it is
> >>>> in its lifetime, and which thread is accessing it, does not hold, in the
> >>>> general case.
> >>>
> >>> It might be slow, but I suppose the C11 standard prioritizes correctness
> >> over performance.
> >>>
> >>
> >> That's a false dichotomy, in this case. You can have both.
> >
> > In theory you shouldn't need non-atomic access to atomic variables.
> > In reality, we want it anyway, because real CPUs are faster at non-atomic
> operations.
> >
> >>
> >>> It seems locks are automatically added to _Atomic types larger than what
> is
> >> natively supported by the architecture.
> >>> E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
> >>>
> >>> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-
> >> 2022-version-17-5-preview-2/
> >>>
> >>>>
> >>>> The only reason for _Atomic being as it is, as far as I can see, is to
> >>>> accommodate for ISAs which does not have the appropriate atomic machine
> >>>> instructions, and thus require a lock or some other data associated with
> >>>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>>> such ISAs, to my knowledge. I suspect neither ever will. So the cast
> >>>> will continue to work.
> >>>
> >>> I tend to agree with you on this.
> >>>
> >>> We should officially decide that DPDK treats RTE_ATOMIC types as a union
> of
> >> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both
> atomic
> >> and non-atomic.
> >>>
> >>
> >> I think this is a subject which needs to be further explored.
> >
> > Yes. It's easier exploring and deciding now, when our options are open, than
> after we have locked down the affected APIs.
> >
> >>
> >> Objects that can be accessed both atomically and non-atomically should
> >> be without _Atomic. With my current understanding of this issue, that
> >> seems like the best option.
> >
> > Agree.
> >
> > The alternative described below is certainly no good!
> >
> > It would be nice if they were marked as sometimes-atomic by a qualifier or
> special type, like rte_be32_t marks the network byte order variant of an
> uint32_t.
> >
> > Furthermore, large atomic objects need the _Atomic qualifier for the
> compiler to add (and use) the associated lock.
> 
> If you have larger objects than the ISA can handle, you wouldn't want to
> leave the choice of synchronization primitive to the
> compiler. I don't see how it could possibly know which one is the most
> appropriate, especially in a DPDK context. It would for example need to
> know if the contending threads are non-preemptable or not.
> 
> In some situations a sequence lock may well be your best option. Will
> the compiler generate one for you?
> 
> If "lock" means std::mutex, it would be a disaster, performance-wise.

Considering that the atomic functions, e.g. atomic_fetch_add(), without the _explicit(..., memory_order) suffix imply memory_order_seq_cst, I think it does. This makes it relatively straightforward to use atomic types, at the cost of performance.

There's a good description here:
https://en.cppreference.com/w/c/language/atomic

Note that accessing members of an _Atomic struct/union is undefined behavior.
For those, you need a non-atomic type of the same shape, passed by value to void atomic_store( volatile _Atomic struct mytype * obj, const struct mytype value ), and returned by value from atomic_load( const volatile _Atomic struct mytype * obj ).

In other words, for structs/unions, _Atomic variables are only accessed through accessor functions taking pointers to them, and are thereby transformed from/to values of the corresponding non-atomic type.
I think that this concept also supports your suggestion above: Objects that can be accessed both atomically and non-atomically should be without _Atomic.

But I still think it would be a good idea to mark them as sometimes-atomic, for source code readability/review purposes.

E.g. the mbuf's refcnt field is of the type RTE_ATOMIC(uint16_t). If it is not only accessed through atomic_ accessor functions, should it still be marked RTE_ATOMIC()?

In the future, compilers might warn or error when an _Atomic variable (of any type) is being accessed directly.
The extreme solution would be not to mix atomic and non-atomic access to variables. But that seems unrealistic (at this time).

If we truly want to support C11 atomics, we need to understand and follow the concepts in the standard.

> 
> > Alternatively, we could specify that sometimes-atomic objects cannot be
> larger than 8 byte, which is what MSVC can handle without locking.
> >
> >>
> >> You could turn it around as well, and have such marked _Atomic and have
> >> explicit casts to their non-_Atomic cousins when operated upon by
> >> non-atomic functions. Not sure how realistic that is, since
> >> non-atomicity is the norm. All generic selection-based "functions" must
> >> take this into account.
> >>
> >>>>
> >>>>>> +				      unsigned int nr, int memory_order) \
> >>>>>> +	{								\
> >>>>>> +		RTE_ASSERT(nr < size);					\
> >>>>>> +									\
> >>>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;
> 	\
> >>>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) &
> >>>> mask; \
> >>>>>> +	}
> >>>>>
> >>>>>
> >>>>> Similar considerations regarding volatile qualifier for the "once"
> >>>> operations.
> >>>>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v3 0/6] Improve EAL bit operations API
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-29  9:51       ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                           ` (5 more replies)
  0 siblings, 6 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represent an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign]() which provides no memory ordering or
atomicity guarantees and no read-once or write-once semantics (e.g.,
no use of volatile), but does provide the best performance. The
performance degradation resulting from the use of volatile (e.g.,
forcing loads and stores to actually occur and in the number
specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain compiler optimizations). The
primary use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from
doing atomic operations against a bit set in some situations, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() is added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both the GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but has
nevertheless been given a lower-case name. That's how C11 does it (for
atomics and other _Generic-based macros), and so does
<rte_stdatomic.h>. Its address can't be taken, but it does not
evaluate its parameters more than once.

C++ doesn't support generic selection, and in C++ translation units
the _Generic macros are replaced with overloaded functions.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 319 ++++++++++++++-
 lib/eal/include/rte_bitops.h | 768 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1069 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v3 1/6] eal: extend bit manipulation functionality
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29 11:12           ` Morten Brørup
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                           ` (4 subsequent siblings)
  5 siblings, 2 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test, set, clear, and assign the value of
individual bits in 32-bit or 64-bit words.

These functions give no guarantees in regards to memory ordering or
atomicity, and do not use volatile, and thus do not prevent any
compiler optimizations.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 218 ++++++++++++++++++++++++++++++++++-
 1 file changed, 216 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..fb2e3dae7b 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,157 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+#define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
+	static inline bool						\
+	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+__RTE_GEN_BIT_TEST(__rte_bit_test32, 32, )
+__RTE_GEN_BIT_SET(__rte_bit_set32, 32, )
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear32, 32, )
+
+__RTE_GEN_BIT_TEST(__rte_bit_test64, 64, )
+__RTE_GEN_BIT_SET(__rte_bit_set64, 64, )
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64, )
+
+__rte_experimental
+static inline void
+__rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set32(addr, nr);
+	else
+		__rte_bit_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set64(addr, nr);
+	else
+		__rte_bit_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +941,66 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set, , unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear, , unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign, , unsigned int, nr, bool, value)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v3 2/6] eal: add unit tests for bit operations
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[set|clear|assign|test]()
family of functions.

The tests are converted to use the test suite runner framework.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 76 +++++++++++++++++++++++++++++++++---------
 1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..f788b561a0 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,59 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    test_fun, size)				\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +163,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access_32),
+		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v3 3/6] eal: add exactly-once bit access functions
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 4/6] eal: add unit tests for " Mattias Rönnblom
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

RFC v3:
    * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 180 +++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index fb2e3dae7b..eac3f8b86a 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -201,6 +201,147 @@ extern "C" {
 		 uint32_t *: __rte_bit_assign32,			\
 		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -239,6 +380,14 @@ __RTE_GEN_BIT_TEST(__rte_bit_test64, 64, )
 __RTE_GEN_BIT_SET(__rte_bit_set64, 64, )
 __RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64, )
 
+__RTE_GEN_BIT_TEST(__rte_bit_once_test32, 32, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set32, 32, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear32, 32, volatile)
+
+__RTE_GEN_BIT_TEST(__rte_bit_once_test64, 64, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set64, 64, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear64, 64, volatile)
+
 __rte_experimental
 static inline void
 __rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
@@ -259,6 +408,27 @@ __rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_clear64(addr, nr);
 }
 
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign32(volatile uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set32(addr, nr);
+	else
+		__rte_bit_once_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set64(addr, nr);
+	else
+		__rte_bit_once_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -953,6 +1123,11 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_clear
 #undef rte_bit_assign
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1001,6 +1176,11 @@ __RTE_BIT_OVERLOAD_2(set, , unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear, , unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign, , unsigned int, nr, bool, value)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v3 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
                           ` (2 preceding siblings ...)
  2024-04-29  9:51         ` [RFC v3 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[set|clear|assign|test]() family of functions.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index f788b561a0..12c1027e36 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -46,12 +46,20 @@
 		return TEST_SUCCESS;					\
 	}
 
-GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 32)
 
-GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access_32, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -168,6 +176,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access_32),
 		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_once_access_32),
+		TEST_CASE(test_bit_once_access_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v3 5/6] eal: add atomic bit operations
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
                           ` (3 preceding siblings ...)
  2024-04-29  9:51         ` [RFC v3 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomics.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 371 +++++++++++++++++++++++++++++++++++
 1 file changed, 371 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index eac3f8b86a..2af5355a8a 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -342,6 +343,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_assign32,			\
 		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the

+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -429,6 +601,131 @@ __rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_once_clear64(addr, nr);
 }
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1128,6 +1425,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_clear
 #undef rte_bit_once_assign
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1171,6 +1476,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set, , unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear, , unsigned int, nr)
@@ -1181,6 +1539,19 @@ __RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,	\
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v3 6/6] eal: add unit tests for atomic bit access functions
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
                           ` (4 preceding siblings ...)
  2024-04-29  9:51         ` [RFC v3 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|test|test_and_[set|clear|assign]]()
family of functions.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c       | 233 ++++++++++++++++++++++++++++++++++-
 lib/eal/include/rte_bitops.h |   1 -
 2 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 12c1027e36..d77793dfe8 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -60,6 +63,228 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
 		    rte_bit_once_clear, rte_bit_once_assign,		\
 		    rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_32, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_64, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore_ ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign_ ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore_ ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign_ ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore_ ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore_ ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign_ ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign_ ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore_ ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify_ ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore_ ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify_ ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore_ ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore_ ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify_ ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify_ ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRId64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -178,6 +403,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access_64),
 		TEST_CASE(test_bit_once_access_32),
 		TEST_CASE(test_bit_once_access_64),
+		TEST_CASE(test_bit_atomic_access_32),
+		TEST_CASE(test_bit_atomic_access_64),
+		TEST_CASE(test_bit_atomic_parallel_assign_32),
+		TEST_CASE(test_bit_atomic_parallel_assign_64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 2af5355a8a..5717691e7c 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -485,7 +485,6 @@ extern "C" {
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
 		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
 								memory_order)
-
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
-- 
2.34.1



* RE: [RFC v3 1/6] eal: extend bit manipulation functionality
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-29 11:12           ` Morten Brørup
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  1 sibling, 0 replies; 90+ messages in thread
From: Morten Brørup @ 2024-04-29 11:12 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Monday, 29 April 2024 11.52
> 
> Add functionality to test, set, clear, and assign the value to
> individual bits in 32-bit or 64-bit words.
> 
> These functions have no implications on memory ordering, atomicity and
> does not use volatile and thus does not prevent any compiler
> optimizations.
> 
> RFC v3:
>  * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>  * Fix ','-related checkpatch warnings.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---

For the series,
Acked-by: Morten Brørup <mb@smartsharesystems.com>



* [RFC v4 0/6] Improve EAL bit operations API
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-29 11:12           ` Morten Brørup
@ 2024-04-30  9:55           ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                               ` (5 more replies)
  1 sibling, 6 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
bloat the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provides no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain compiler optimizations). The
primary use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, an
_Atomic bitset and a non-atomic one, or the bitset code needs to cast
the non-_Atomic pointer to an _Atomic one. Having a separate _Atomic
bitset would be bloat, and would also prevent the user from performing
atomic operations on a bit set in some situations while, in other
situations (e.g., at times when MT safety is not a concern), operating
on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both the GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bits wide on your platform
(which it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics and other _Generic-based interfaces), and so does
<rte_stdatomic.h>. Its address can't be taken, but it does not
evaluate its parameters more than once.

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       |  405 ++++++++++++-
 lib/eal/include/rte_bitops.h | 1070 +++++++++++++++++++++++++++++++++-
 2 files changed, 1457 insertions(+), 18 deletions(-)

-- 
2.34.1



* [RFC v4 1/6] eal: extend bit manipulation functionality
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                               ` (4 subsequent siblings)
  5 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test, set, clear, and assign the value to
individual bits in 32-bit or 64-bit words.

These functions have no implications for memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 257 ++++++++++++++++++++++++++++++++++-
 1 file changed, 255 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..9d426f1602 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,194 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +978,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v4 2/6] eal: add unit tests for bit operations
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                               ` (3 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[set|clear|assign|test]()
family of functions.

The tests are converted to use the test suite runner framework.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 80 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..111f9b328e 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,63 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +167,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v4 3/6] eal: add exactly-once bit access functions
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

RFC v3:
    * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 195 +++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9d426f1602..f77bd83e97 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -224,6 +224,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would have been allowed had rte_bit_set() and
+ * rte_bit_test() been used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * Had rte_bit_test(addr, 17) been used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() gives no guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * The regular bit operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Change the value of a bit to '0' if '1' or '1' if '0' in a 32-bit
+ * or 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit flip operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro gives no guarantees with regard to memory ordering
+ * or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -296,6 +467,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -991,6 +1174,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1040,6 +1229,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v4 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                               ` (2 preceding siblings ...)
  2024-04-30  9:55             ` [RFC v4 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30 10:37               ` Morten Brørup
  2024-04-30  9:55             ` [RFC v4 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[set|clear|assign|test]() family of functions.

RFC v4:
 * Remove redundant continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c       |  10 +
 lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
 2 files changed, 435 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 111f9b328e..615ec6e563 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -56,6 +56,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -172,6 +180,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index f77bd83e97..abfe96d531 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -395,6 +396,199 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -479,6 +673,162 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_flip ## size(&target, nr);		\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1180,6 +1530,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1223,6 +1581,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1235,6 +1646,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v4 5/6] eal: add atomic bit operations
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                               ` (3 preceding siblings ...)
  2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomics.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 194 +++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index abfe96d531..f014bd913e 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -589,6 +589,199 @@ extern "C" {
 								 value, \
 								 memory_order)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -1534,6 +1727,7 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_set
 #undef rte_bit_atomic_clear
 #undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
 #undef rte_bit_atomic_test_and_set
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v4 6/6] eal: add unit tests for atomic bit access functions
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                               ` (4 preceding siblings ...)
  2024-04-30  9:55             ` [RFC v4 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|test|test_and_[set|clear|assign]]()
family of functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c       | 315 ++++++++++++++++++++++++++++++++++-
 lib/eal/include/rte_bitops.h |   1 -
 2 files changed, 314 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 615ec6e563..abc07e8caf 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -64,6 +67,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+    rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRId64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRId64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -182,6 +483,18 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index f014bd913e..fb771c6dfc 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -560,7 +560,6 @@ extern "C" {
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
 		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
 								memory_order)
-
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* RE: [RFC v4 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-30 10:37               ` Morten Brørup
  2024-04-30 11:58                 ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Morten Brørup @ 2024-04-30 10:37 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Tuesday, 30 April 2024 11.55
> 
> Extend bitops tests to cover the
> rte_bit_once_[set|clear|assign|test]() family of functions.
> 
> RFC v4:
>  * Remove redundant continuations.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  app/test/test_bitops.c       |  10 +
>  lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
>  2 files changed, 435 insertions(+)

The rte_bitops.h changes belong in another patch in the series.



^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC v4 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30 10:37               ` Morten Brørup
@ 2024-04-30 11:58                 ` Mattias Rönnblom
  0 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 11:58 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-04-30 12:37, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Tuesday, 30 April 2024 11.55
>>
>> Extend bitops tests to cover the
>> rte_bit_once_[set|clear|assign|test]() family of functions.
>>
>> RFC v4:
>>   * Remove redundant continuations.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>> ---
>>   app/test/test_bitops.c       |  10 +
>>   lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 435 insertions(+)
> 
> The rte_bitops.h changes belong in another patch in the series.
> 
> 

Thanks. Will send a v5.

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v5 0/6] Improve EAL bit operations API
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-30 12:08               ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                   ` (5 more replies)
  0 siblings, 6 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
bloat the API. Such generic selection macros will here be referred to
as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provides no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain compiler optimizations). The
primary use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from,
in some situations, doing atomic operations against a bit set, while
in other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32-bit and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both the GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bits on your platform
(which it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but has
nevertheless been given a lower-case name. That's how C11 does it
(for atomics and other _Generic-based APIs), and how
<rte_stdatomic.h> does it. Its address can't be taken, but it does
not evaluate its parameters more than once.

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 405 +++++++++++++++-
 lib/eal/include/rte_bitops.h | 877 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1264 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v5 1/6] eal: extend bit manipulation functionality
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test, set, clear, and assign the value of
individual bits in 32-bit or 64-bit words.

These functions make no guarantees in regard to memory ordering or
atomicity, and do not use volatile, and thus do not prevent any
compiler optimizations.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 257 ++++++++++++++++++++++++++++++++++-
 1 file changed, 255 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..9d426f1602 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,194 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +978,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v5 2/6] eal: add unit tests for bit operations
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[set|clear|assign|test]()
family of functions.

The tests are converted to use the test suite runner framework.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 80 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..111f9b328e 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,63 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +167,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v5 3/6] eal: add exactly-once bit access functions
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 4/6] eal: add unit tests for " Mattias Rönnblom
                                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

RFC v3:
    * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 195 +++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9d426f1602..f77bd83e97 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -224,6 +224,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (i.e., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Change the value of a bit to '0' if '1' or '1' if '0' in a 32-bit
+ * or 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit flip operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -296,6 +467,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -991,6 +1174,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1040,6 +1229,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v5 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                   ` (2 preceding siblings ...)
  2024-04-30 12:08                 ` [RFC v5 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[set|clear|assign|test]() family of functions.

RFC v5:
 * Atomic bit op implementation moved from this patch to the proper
   patch in the series. (Morten Brørup)

RFC v4:
 * Remove redundant continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 111f9b328e..615ec6e563 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -56,6 +56,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -172,6 +180,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v5 5/6] eal: add atomic bit operations
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                   ` (3 preceding siblings ...)
  2024-04-30 12:08                 ` [RFC v5 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index f77bd83e97..abfe96d531 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -395,6 +396,199 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -479,6 +673,162 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_flip ## size(&target, nr);		\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				rte_memory_order_relaxed,		\
+				memory_order));				\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1180,6 +1530,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1223,6 +1581,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1235,6 +1646,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v5 6/6] eal: add unit tests for atomic bit access functions
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                   ` (4 preceding siblings ...)
  2024-04-30 12:08                 ` [RFC v5 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|test|test_and_[set|clear|assign]]()
family of functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 315 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 314 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 615ec6e563..abc07e8caf 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -64,6 +67,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -182,6 +483,14 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-26  9:39             ` Mattias Rönnblom
  2024-04-26 12:00               ` Morten Brørup
@ 2024-04-30 16:52               ` Tyler Retzlaff
  1 sibling, 0 replies; 90+ messages in thread
From: Tyler Retzlaff @ 2024-04-30 16:52 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Morten Brørup, Mattias Rönnblom, dev, Heng Wang,
	Stephen Hemminger, techboard

On Fri, Apr 26, 2024 at 11:39:17AM +0200, Mattias Rönnblom wrote:

[ ... ]

> >
> >>
> >>The only reason for _Atomic being as it is, as far as I can see, is to
> >>accommodate for ISAs which does not have the appropriate atomic machine
> >>instructions, and thus require a lock or some other data associated with
> >>the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>such ISAs, to my knowledge. I suspect neither never will. So the cast
> >>will continue to work.
> >
> >I tend to agree with you on this.
> >
> >We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.
> >
> 
> I think this is a subject which needs to be further explored.
> 
> Objects that can be accessed both atomically and non-atomically
> should be without _Atomic. With my current understanding of this
> issue, that seems like the best option.

i've been distracted by other work, and while it's not in the scope of this
series, i want to say +1 to having this discussion. utilizing a union for
the atomic vs non-atomic access that appears in practice is a good idea.

> 
> You could turn it around as well, and have such marked _Atomic and
> have explicit casts to their non-_Atomic cousins when operated upon
> by non-atomic functions. Not sure how realistic that is, since
> non-atomicity is the norm. All generic selection-based "functions"
> must take this into account.

the problem with casts is they are actually different types and may have
different size and/or alignment relative to their non-atomic types.
for current non-locking atomics the implementations happen to be the
same (presumably because it was practical) but the union is definitely a
cleaner approach.

> 
> >>
> >>>>+				      unsigned int nr, int memory_order) \
> >>>>+	{								\
> >>>>+		RTE_ASSERT(nr < size);					\
> >>>>+									\
> >>>>+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>>+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>>+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>>+		return rte_atomic_load_explicit(a_addr, memory_order) &
> >>mask; \
> >>>>+	}
> >>>
> >>>
> >>>Similar considerations regarding volatile qualifier for the "once"
> >>operations.
> >>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v6 0/6] Improve EAL bit operations API
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-02  5:57                   ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                       ` (5 more replies)
  0 siblings, 6 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
bloat the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provides no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain optimizations). The primary
use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from,
in some situations, doing atomic operations against a bitset, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() is added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both the GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bits on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but has
nevertheless been given a lower-case name. That's how C11 does it
(for atomics and other _Generic-based macros), as does
<rte_stdatomic.h>. Its address can't be taken, but it does not
evaluate its parameters more than once.

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 410 +++++++++++++++-
 lib/eal/include/rte_bitops.h | 884 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1276 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 90+ messages in thread

* [RFC v6 1/6] eal: extend bit manipulation functionality
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications for memory ordering or atomicity,
and do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 259 ++++++++++++++++++++++++++++++++++-
 1 file changed, 257 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..3297133e22 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,196 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +980,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v6 2/6] eal: add unit tests for bit operations
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v6 3/6] eal: add exactly-once bit access functions
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 4/6] eal: add unit tests for " Mattias Rönnblom
                                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add test/set/clear/assign/flip functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

RFC v6:
 * Have rte_bit_once_test() accept const-marked bitsets.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 197 +++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3297133e22..caec4f36bb 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -226,6 +226,179 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)					\
+	_Generic((addr),						\
+		uint32_t *: __rte_bit_once_test32,			\
+		const uint32_t *: __rte_bit_once_test32,		\
+		uint64_t *: __rte_bit_once_test64,			\
+		const uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_test_once32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_test_once32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Change the value of a bit to '0' if '1' or '1' if '0' in a 32-bit
+ * or 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit flip operation.
+ *
+ * See rte_bit_test_once32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -298,6 +471,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -993,6 +1178,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1042,6 +1233,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v6 4/6] eal: add unit tests for exactly-once bit access functions
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                       ` (2 preceding siblings ...)
  2024-05-02  5:57                     ` [RFC v6 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_once_*() family of functions.

RFC v5:
 * Atomic bit op implementation moved from this patch to the proper
   patch in the series. (Morten Brørup)

RFC v4:
 * Remove redundant continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..9bffc4da14 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -61,6 +61,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +185,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v6 5/6] eal: add atomic bit operations
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                       ` (3 preceding siblings ...)
  2024-05-02  5:57                     ` [RFC v6 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-03  6:41                       ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign/flip functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomics.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
 1 file changed, 428 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index caec4f36bb..9cde982113 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -399,6 +400,202 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomics.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -483,6 +680,162 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_flip ## size(&target, nr);		\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1184,6 +1537,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1227,6 +1588,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1239,6 +1653,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v6 6/6] eal: add unit tests for atomic bit access functions
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                       ` (4 preceding siblings ...)
  2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 315 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 314 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 9bffc4da14..c86d7e1f77 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -69,6 +72,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -187,6 +488,14 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* Re: [RFC v6 5/6] eal: add atomic bit operations
  2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-05-03  6:41                       ` Mattias Rönnblom
  2024-05-03 23:30                         ` Tyler Retzlaff
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-03  6:41 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup

On 2024-05-02 07:57, Mattias Rönnblom wrote:
> Add atomic bit test/set/clear/assign/flip and
> test-and-set/clear/assign/flip functions.
> 
> All atomic bit functions allow (and indeed, require) the caller to
> specify a memory order.
> 
> RFC v6:
>   * Have rte_bit_atomic_test() accept const-marked bitsets.
> 
> RFC v4:
>   * Add atomic bit flip.
>   * Mark macro-generated private functions experimental.
> 
> RFC v3:
>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> 
> RFC v2:
>   o Add rte_bit_atomic_test_and_assign() (for consistency).
>   o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
>   o Use <rte_stdatomics.h> to support MSVC.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
>   1 file changed, 428 insertions(+)
> 
> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index caec4f36bb..9cde982113 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -21,6 +21,7 @@
>   
>   #include <rte_compat.h>
>   #include <rte_debug.h>
> +#include <rte_stdatomic.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -399,6 +400,202 @@ extern "C" {
>   		 uint32_t *: __rte_bit_once_flip32,		\
>   		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Test if a particular bit in a word is set with a particular memory
> + * order.
> + *
> + * Test a bit with the resulting memory load ordered as per the
> + * specified memory order.
> + *
> + * @param addr
> + *   A pointer to the word to query.
> + * @param nr
> + *   The index of the bit.
> + * @param memory_order
> + *   The memory order to use. See <rte_stdatomics.h> for details.
> + * @return
> + *   Returns true if the bit is set, and false otherwise.
> + */
> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> +	_Generic((addr),						\
> +		 uint32_t *: __rte_bit_atomic_test32,			\
> +		 const uint32_t *: __rte_bit_atomic_test32,		\
> +		 uint64_t *: __rte_bit_atomic_test64,			\
> +		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
> +							    memory_order)

Should __rte_bit_atomic_test32()'s addr parameter be marked volatile, 
and two volatile-marked branches added to the above list? Both the 
C11-style GCC built-ins and the C11-proper atomic functions have 
addresses marked volatile. The Linux kernel and the old __sync GCC 
built-ins, on the other hand, don't (although I think you still get 
volatile semantics). The only point of "volatile", as far as I can see, 
is to avoid warnings in case the user passes a volatile-marked pointer. 
The drawback is that *you're asking for volatile semantics*, although 
with current compilers, it seems like that is what you get anyway, 
regardless of whether you asked for it or not.

Just to be clear: even if these functions were to accept 
volatile-marked pointers, non-volatile pointers should be accepted as 
well (and should generally be preferred).

Isn't parallel programming in C lovely.

<snip>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC v6 5/6] eal: add atomic bit operations
  2024-05-03  6:41                       ` Mattias Rönnblom
@ 2024-05-03 23:30                         ` Tyler Retzlaff
  2024-05-04 15:36                           ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Tyler Retzlaff @ 2024-05-03 23:30 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup

On Fri, May 03, 2024 at 08:41:09AM +0200, Mattias Rönnblom wrote:
> On 2024-05-02 07:57, Mattias Rönnblom wrote:
> >Add atomic bit test/set/clear/assign/flip and
> >test-and-set/clear/assign/flip functions.
> >
> >All atomic bit functions allow (and indeed, require) the caller to
> >specify a memory order.
> >
> >RFC v6:
> >  * Have rte_bit_atomic_test() accept const-marked bitsets.
> >
> >RFC v4:
> >  * Add atomic bit flip.
> >  * Mark macro-generated private functions experimental.
> >
> >RFC v3:
> >  * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >
> >RFC v2:
> >  o Add rte_bit_atomic_test_and_assign() (for consistency).
> >  o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
> >  o Use <rte_stdatomics.h> to support MSVC.
> >
> >Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >---
> >  lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 428 insertions(+)
> >
> >diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >index caec4f36bb..9cde982113 100644
> >--- a/lib/eal/include/rte_bitops.h
> >+++ b/lib/eal/include/rte_bitops.h
> >@@ -21,6 +21,7 @@
> >  #include <rte_compat.h>
> >  #include <rte_debug.h>
> >+#include <rte_stdatomic.h>
> >  #ifdef __cplusplus
> >  extern "C" {
> >@@ -399,6 +400,202 @@ extern "C" {
> >  		 uint32_t *: __rte_bit_once_flip32,		\
> >  		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
> >+/**
> >+ * @warning
> >+ * @b EXPERIMENTAL: this API may change without prior notice.
> >+ *
> >+ * Test if a particular bit in a word is set with a particular memory
> >+ * order.
> >+ *
> >+ * Test a bit with the resulting memory load ordered as per the
> >+ * specified memory order.
> >+ *
> >+ * @param addr
> >+ *   A pointer to the word to query.
> >+ * @param nr
> >+ *   The index of the bit.
> >+ * @param memory_order
> >+ *   The memory order to use. See <rte_stdatomics.h> for details.
> >+ * @return
> >+ *   Returns true if the bit is set, and false otherwise.
> >+ */
> >+#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >+	_Generic((addr),						\
> >+		 uint32_t *: __rte_bit_atomic_test32,			\
> >+		 const uint32_t *: __rte_bit_atomic_test32,		\
> >+		 uint64_t *: __rte_bit_atomic_test64,			\
> >+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
> >+							    memory_order)
> 
> Should __rte_bit_atomic_test32()'s addr parameter be marked
> volatile, and two volatile-marked branches added to the above list?

off-topic comment relating to the generic type selection list above, i was
reading C17 DR481 recently and i think we may want to avoid providing
qualified and unqualified types in the list.

DR481: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2396.htm#dr_481

"the controlling expression of a generic selection shall have type
compatibile with at most one of the types named in its generic
association list."

"the type of the controlling expression is the type of the expression as
if it had undergone an lvalue conversion"

"lvalue conversion drops type qualifiers"

so only the unqualified type of the controlling expression is matched
against the selection list, which i guess means the qualified entries
in the list are never selected.

i suppose the implication here is we couldn't then provide 2 inline
functions, one volatile-qualified and one not.

as for a single function where the parameter is or isn't volatile
qualified. if we're always forwarding to an intrinsic i've always
assumed (perhaps incorrectly) that the intrinsic itself did what was
semantically correct even without qualification.

as you note i believe there is a convenience element in providing the
volatile qualified version since it means the function like macro /
inline function will accept both volatile qualified and unqualified
whereas if we did not qualify the parameter it would require the
caller/user to strip the volatile qualification if present with casts.

i imagine in most cases we are just forwarding, in which case it seems
not horrible to provide the qualified version.

> Both the C11-style GCC built-ins and the C11-proper atomic functions
> have addresses marked volatile. The Linux kernel and the old __sync
> GCC built-ins, on the other hand, don't (although I think you still
> get volatile semantics). The only point of "volatile", as far as I
> can see, is to avoid warnings in case the user passed a
> volatile-marked pointer. The drawback is that *you're asking for
> volatile semantics*, although with current compilers, it seems like
> that is what you get regardless of whether you asked for it or not.
> 
> Just to be clear: even though these functions would accept volatile-marked
> pointers, non-volatile pointers should be accepted as well (and
> should generally be preferred).
> 
> Isn't parallel programming in C lovely.

it's super!

> 
> <snip>


* Re: [RFC v6 5/6] eal: add atomic bit operations
  2024-05-03 23:30                         ` Tyler Retzlaff
@ 2024-05-04 15:36                           ` Mattias Rönnblom
  0 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-04 15:36 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup

On 2024-05-04 01:30, Tyler Retzlaff wrote:
> On Fri, May 03, 2024 at 08:41:09AM +0200, Mattias Rönnblom wrote:
>> On 2024-05-02 07:57, Mattias Rönnblom wrote:
>>> Add atomic bit test/set/clear/assign/flip and
>>> test-and-set/clear/assign/flip functions.
>>>
>>> All atomic bit functions allow (and indeed, require) the caller to
>>> specify a memory order.
>>>
>>> RFC v6:
>>>   * Have rte_bit_atomic_test() accept const-marked bitsets.
>>>
>>> RFC v4:
>>>   * Add atomic bit flip.
>>>   * Mark macro-generated private functions experimental.
>>>
>>> RFC v3:
>>>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>>
>>> RFC v2:
>>>   o Add rte_bit_atomic_test_and_assign() (for consistency).
>>>   o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
>>>   o Use <rte_stdatomics.h> to support MSVC.
>>>
>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>> ---
>>>   lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
>>>   1 file changed, 428 insertions(+)
>>>
>>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>>> index caec4f36bb..9cde982113 100644
>>> --- a/lib/eal/include/rte_bitops.h
>>> +++ b/lib/eal/include/rte_bitops.h
>>> @@ -21,6 +21,7 @@
>>>   #include <rte_compat.h>
>>>   #include <rte_debug.h>
>>> +#include <rte_stdatomic.h>
>>>   #ifdef __cplusplus
>>>   extern "C" {
>>> @@ -399,6 +400,202 @@ extern "C" {
>>>   		 uint32_t *: __rte_bit_once_flip32,		\
>>>   		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * Test if a particular bit in a word is set with a particular memory
>>> + * order.
>>> + *
>>> + * Test a bit with the resulting memory load ordered as per the
>>> + * specified memory order.
>>> + *
>>> + * @param addr
>>> + *   A pointer to the word to query.
>>> + * @param nr
>>> + *   The index of the bit.
>>> + * @param memory_order
>>> + *   The memory order to use. See <rte_stdatomics.h> for details.
>>> + * @return
>>> + *   Returns true if the bit is set, and false otherwise.
>>> + */
>>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>>> +	_Generic((addr),						\
>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>> +		 const uint32_t *: __rte_bit_atomic_test32,		\
>>> +		 uint64_t *: __rte_bit_atomic_test64,			\
>>> +		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
>>> +							    memory_order)
>>
>> Should __rte_bit_atomic_test32()'s addr parameter be marked
>> volatile, and two volatile-marked branches added to the above list?
> 
> off-topic comment relating to the generic type selection list above, i was
> reading C17 DR481 recently and i think we may want to avoid providing
> qualified and unqualified types in the list.
> 
> DR481: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2396.htm#dr_481
> 
> "the controlling expression of a generic selection shall have type
> compatibile with at most one of the types named in its generic
> association list."
> 

Const and unqualified pointers are not compatible. Without the "const 
uint32_t *" element in the association list, passing const-qualified 
pointers to rte_bit_test() will cause a compiler error.

So, if you want to support both passing const-qualified and unqualified 
pointers (which you do, obviously), then you have no other option than 
to treat them separately.

GCC, clang and ICC all seem to agree on this. The standard also is 
pretty clear on this, from what I can tell. "No two generic associations 
in the same generic selection shall specify compatible types." (6.5.1.1, 
note *compatible*). "For two pointer types to be compatible, both shall 
be identically qualified and both shall be pointers to compatible 
types." (6.7.6.1)

> "the type of the controlling expression is the type of the expression as
> if it had undergone an lvalue conversion"
> 
> "lvalue conversion drops type qualifiers"
> 
> so only the unqualified type of the controlling expression is matched
> against the selection list, which i guess means the qualified entries
> in the list are never selected.
> 
> i suppose the implication here is we couldn't then provide 2 inline
> functions, one volatile-qualified and one not.
> 
> as for a single function where the parameter is or isn't volatile
> qualified. if we're always forwarding to an intrinsic i've always
> assumed (perhaps incorrectly) that the intrinsic itself did what was
> semantically correct even without qualification.
> 
> as you note i believe there is a convenience element in providing the
> volatile qualified version since it means the function like macro /
> inline function will accept both volatile qualified and unqualified
> whereas if we did not qualify the parameter it would require the
> caller/user to strip the volatile qualification if present with casts.
> 
> i imagine in most cases we are just forwarding, in which case it seems
> not horrible to provide the qualified version.
> 
>> Both the C11-style GCC built-ins and the C11-proper atomic functions
>> have addresses marked volatile. The Linux kernel and the old __sync
>> GCC built-ins, on the other hand, don't (although I think you still
>> get volatile semantics). The only point of "volatile", as far as I
>> can see, is to avoid warnings in case the user passed a
>> volatile-marked pointer. The drawback is that *you're asking for
>> volatile semantics*, although with current compilers, it seems like
>> that is what you get regardless of whether you asked for it or not.
>>
>> Just to be clear: even though these functions would accept volatile-marked
>> pointers, non-volatile pointers should be accepted as well (and
>> should generally be preferred).
>>
>> Isn't parallel programming in C lovely.
> 
> it's super!
> 
>>
>> <snip>


* [RFC v7 0/6] Improve EAL bit operations API
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-05  8:37                       ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                           ` (5 more replies)
  0 siblings, 6 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provides no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain optimizations). The primary
use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from,
in some situations, doing atomic operations against a bit set, while
in other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 410 +++++++++++++++-
 lib/eal/include/rte_bitops.h | 873 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1265 insertions(+), 18 deletions(-)

-- 
2.34.1



* [RFC v7 1/6] eal: extend bit manipulation functionality
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications on memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 259 ++++++++++++++++++++++++++++++++++-
 1 file changed, 257 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..3297133e22 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,196 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +980,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v7 2/6] eal: add unit tests for bit operations
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1



* [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-07 19:17                           ` Morten Brørup
  2024-05-05  8:37                         ` [RFC v7 4/6] eal: add unit tests for " Mattias Rönnblom
                                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add test/set/clear/assign/flip functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.
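
As a rough sketch of what these guarantees amount to (hypothetical
helper names; the actual functions are generated by macros in
<rte_bitops.h>), a "once" access simply goes through a
volatile-qualified pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the "once" access semantics: performing the
 * access through a volatile-qualified pointer forces the compiler to
 * emit exactly one load (and, for modifying operations, one store)
 * per call; the access may not be cached, merged, or eliminated. */
static inline bool
bit_once_test32(const uint32_t *addr, unsigned int nr)
{
	const volatile uint32_t *v_addr = (const volatile uint32_t *)addr;

	return (*v_addr >> nr) & 1;
}

static inline void
bit_once_set32(uint32_t *addr, unsigned int nr)
{
	volatile uint32_t *v_addr = (volatile uint32_t *)addr;

	*v_addr |= (uint32_t)1 << nr; /* one load, one store */
}
```

Note that volatile only constrains the compiler; it provides neither
atomicity nor inter-thread ordering, which is why the "once" family
stays within the relaxed memory model.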

RFC v7:
 * Fix various minor issues in documentation.

RFC v6:
 * Have rte_bit_once_test() accept const-marked bitsets.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 201 +++++++++++++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3297133e22..3644aa115c 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -226,6 +226,183 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * rte_bit_once_test() is guaranteed to result in exactly one memory
+ * load (i.e., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed had rte_bit_set() and
+ * rte_bit_test() been used instead.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * Had rte_bit_test(addr, 17) been used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees with regard to
+ * memory ordering or atomicity.
+ *
+ * The regular bit operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)					\
+	_Generic((addr),						\
+		uint32_t *: __rte_bit_once_test32,			\
+		const uint32_t *: __rte_bit_once_test32,		\
+		uint64_t *: __rte_bit_once_test64,			\
+		const uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Generic selection macro to set bit specified by @c nr in the word
+ * pointed to by @c addr to '1' exactly once.
+ *
+ * rte_bit_once_set() is guaranteed to result in exactly one memory
+ * load and exactly one memory store, *or* an atomic bit set
+ * operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Generic selection macro to set bit specified by @c nr in the word
+ * pointed to by @c addr to '0' exactly once.
+ *
+ * rte_bit_once_clear() is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Generic selection macro to set bit specified by @c nr in the word
+ * pointed to by @c addr to the value indicated by @c value exactly
+ * once.
+ *
+ * rte_bit_once_assign() is guaranteed to result in exactly one memory
+ * load and exactly one memory store, *or* an atomic bit assign
+ * operation.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * rte_bit_once_flip() is guaranteed to result in exactly one memory
+ * load and exactly one memory store, *or* an atomic bit flip
+ * operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -298,6 +475,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -993,6 +1182,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1042,6 +1237,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v7 4/6] eal: add unit tests for exactly-once bit access functions
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
                                           ` (2 preceding siblings ...)
  2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_once_*() family of functions.

RFC v5:
 * Atomic bit op implementation moved from this patch to the proper
   patch in the series. (Morten Brørup)

RFC v4:
 * Remove redundant continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..9bffc4da14 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -61,6 +61,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +185,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v7 5/6] eal: add atomic bit operations
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
                                           ` (3 preceding siblings ...)
  2024-05-05  8:37                         ` [RFC v7 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign/flip functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
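
In terms of plain C11 atomics, the test-and-modify semantics can be
sketched roughly as follows (hypothetical helper name; the actual
rte_bit_atomic_test_and_set() is generated through the
<rte_stdatomic.h> wrappers):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Rough sketch of atomic test-and-set semantics using C11 atomics.
 * The previous word value returned by the fetch-or already tells
 * whether the bit was set, so no compare-exchange loop is needed. */
static inline bool
bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
			  memory_order order)
{
	_Atomic uint32_t *a_addr = (_Atomic uint32_t *)addr;
	uint32_t mask = (uint32_t)1 << nr;
	uint32_t prev = atomic_fetch_or_explicit(a_addr, mask, order);

	return (prev & mask) != 0;
}
```

The same pattern with fetch-and (for clearing) and fetch-xor (for
flipping) covers the other test-and-modify variants.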

RFC v7:
 * Replace compare-exchange-based rte_bitset_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomics.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 413 +++++++++++++++++++++++++++++++++++
 1 file changed, 413 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3644aa115c..673b888c1a 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -403,6 +404,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate the value of the bit
+ * specified by @c nr in the word pointed to by @c addr, with the
+ * memory ordering as specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was previously set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was previously set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test and assign bit specified
+ * by @c nr in the word pointed to by @c addr the value specified by
+ * @c value, with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was previously set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -487,6 +686,145 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,	\
+					      unsigned int nr,		\
+					      int memory_order)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr,	\
+						unsigned int nr,	\
+						int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_test_and_set ## size(addr, nr, \
+								     memory_order); \
+		else							\
+			return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
+								       memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1188,6 +1526,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1231,6 +1577,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);			\
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1243,6 +1642,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* [RFC v7 6/6] eal: add unit tests for atomic bit access functions
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
                                           ` (4 preceding siblings ...)
  2024-05-05  8:37                         ` [RFC v7 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  5 siblings, 0 replies; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 315 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 314 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 9bffc4da14..c86d7e1f77 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -69,6 +72,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -187,6 +488,14 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 90+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-05-07 19:17                           ` Morten Brørup
  2024-05-08  6:47                             ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Morten Brørup @ 2024-05-07 19:17 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Sunday, 5 May 2024 10.38
> 
> Add test/set/clear/assign/flip functions which prevents certain
> compiler optimizations and guarantees that program-level memory loads
> and/or stores will actually occur.
> 
> These functions are useful when interacting with memory-mapped
> hardware devices.
> 
> The "once" family of functions does not promise atomicity and provides
> no memory ordering guarantees beyond the C11 relaxed memory model.

In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html

Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantees memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.

It makes me think that the DPDK "once" family of functions should behave similarly.
Alternatively, if the "once" family of functions cannot be generically implemented with a memory ordering that is optimal for all use cases, drop this family of functions, and instead rely on the "atomic" family of functions for interacting with memory-mapped hardware devices.

> 
> RFC v7:
>  * Fix various minor issues in documentation.
> 
> RFC v6:
>  * Have rte_bit_once_test() accept const-marked bitsets.
> 
> RFC v3:
>  * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---


^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-07 19:17                           ` Morten Brørup
@ 2024-05-08  6:47                             ` Mattias Rönnblom
  2024-05-08  7:33                               ` Morten Brørup
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-08  6:47 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-05-07 21:17, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Sunday, 5 May 2024 10.38
>>
>> Add test/set/clear/assign/flip functions which prevents certain
>> compiler optimizations and guarantees that program-level memory loads
>> and/or stores will actually occur.
>>
>> These functions are useful when interacting with memory-mapped
>> hardware devices.
>>
>> The "once" family of functions does not promise atomicity and provides
>> no memory ordering guarantees beyond the C11 relaxed memory model.
> 
> In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html
> 
> Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantee memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.
> 

What more specifically did you have in mind? READ_ONCE() and 
WRITE_ONCE()? They give almost no guarantees. Very much relaxed.

I've read that document.

What you should keep in mind when reading that document is that DPDK 
doesn't use the kernel's memory model, and doesn't have the kernel's 
barrier and atomics APIs. What we have is an obsolete, miniature 
look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.

My general impression is that DPDK was moving in the C11 direction 
memory model-wise, which is not the model the kernel uses.

> It makes me think that DPDK "once" family of functions should behave similarly.

I think they do already.

Also, rte_bit_once_set() works like the kernel's __set_bit().

> Alternatively, if the "once" family of functions cannot be generically implemented with a memory ordering that is optimal for all use cases, drop this family of functions, and instead rely on the "atomic" family of functions for interacting with memory-mapped hardware devices.
> 
>>
>> RFC v7:
>>   * Fix various minor issues in documentation.
>>
>> RFC v6:
>>   * Have rte_bit_once_test() accept const-marked bitsets.
>>
>> RFC v3:
>>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>> ---
> 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  6:47                             ` Mattias Rönnblom
@ 2024-05-08  7:33                               ` Morten Brørup
  2024-05-08  8:00                                 ` Mattias Rönnblom
  2024-05-08 15:15                                 ` Stephen Hemminger
  0 siblings, 2 replies; 90+ messages in thread
From: Morten Brørup @ 2024-05-08  7:33 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Wednesday, 8 May 2024 08.47
> 
> On 2024-05-07 21:17, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Sunday, 5 May 2024 10.38
> >>
> >> Add test/set/clear/assign/flip functions which prevents certain
> >> compiler optimizations and guarantees that program-level memory loads
> >> and/or stores will actually occur.
> >>
> >> These functions are useful when interacting with memory-mapped
> >> hardware devices.
> >>
> >> The "once" family of functions does not promise atomicity and provides
> >> no memory ordering guarantees beyond the C11 relaxed memory model.
> >
> > In another thread, Stephen referred to the extended discussion on memory
> models in Linux kernel documentation:
> > https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-
> barriers.html
> >
> > Unlike the "once" family of functions in this RFC, the "once" family of
> functions in the kernel also guarantee memory ordering, specifically for
> memory-mapped hardware devices. The document describes the rationale with
> examples.
> >
> 
> What more specifically did you have in mind? READ_ONCE() and
> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.

The way I read it, they do provide memory ordering guarantees.

Ignore that the kernel's "once" functions operate on words and this RFC operates on bits; the behavior is the same. Either there are memory ordering guarantees, or there are not.

> 
> I've read that document.
> 
> What you should keep in mind if you read that document, is that DPDK
> doesn't use the kernel's memory model, and doesn't have the kernel's
> barrier and atomics APIs. What we have are an obsolete, miniature
> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
> 
> My general impression is that DPDK was moving in the C11 direction
> memory model-wise, which is not the model the kernel uses.

I think you and I agree that using legacy methods only because "the kernel does it that way" would not be the optimal roadmap for DPDK.

We should keep moving in the C11 direction memory model-wise.
I consider it more descriptive, and thus expect compilers to eventually produce better optimized code.

> 
> > It makes me think that DPDK "once" family of functions should behave
> similarly.
> 
> I think they do already.

I haven't looked deep into it, but the RFC's documentation says otherwise:
The "once" family of functions does not promise atomicity and provides *no memory ordering* guarantees beyond the C11 relaxed memory model.

> 
> Also, rte_bit_once_set() works as the kernel's __set_bit().
> 
> > Alternatively, if the "once" family of functions cannot be generically
> implemented with a memory ordering that is optimal for all use cases, drop
> this family of functions, and instead rely on the "atomic" family of functions
> for interacting with memory-mapped hardware devices.
> >
> >>
> >> RFC v7:
> >>   * Fix various minor issues in documentation.
> >>
> >> RFC v6:
> >>   * Have rte_bit_once_test() accept const-marked bitsets.
> >>
> >> RFC v3:
> >>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >> ---
> >

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  7:33                               ` Morten Brørup
@ 2024-05-08  8:00                                 ` Mattias Rönnblom
  2024-05-08  8:11                                   ` Morten Brørup
  2024-05-08 15:15                                 ` Stephen Hemminger
  1 sibling, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-08  8:00 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-05-08 09:33, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Wednesday, 8 May 2024 08.47
>>
>> On 2024-05-07 21:17, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>> Sent: Sunday, 5 May 2024 10.38
>>>>
>>>> Add test/set/clear/assign/flip functions which prevents certain
>>>> compiler optimizations and guarantees that program-level memory loads
>>>> and/or stores will actually occur.
>>>>
>>>> These functions are useful when interacting with memory-mapped
>>>> hardware devices.
>>>>
>>>> The "once" family of functions does not promise atomicity and provides
>>>> no memory ordering guarantees beyond the C11 relaxed memory model.
>>>
>>> In another thread, Stephen referred to the extended discussion on memory
>> models in Linux kernel documentation:
>>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-
>> barriers.html
>>>
>>> Unlike the "once" family of functions in this RFC, the "once" family of
>> functions in the kernel also guarantee memory ordering, specifically for
>> memory-mapped hardware devices. The document describes the rationale with
>> examples.
>>>
>>
>> What more specifically did you have in mind? READ_ONCE() and
>> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> 
> The way I read it, they do provide memory ordering guarantees.
> 

Sure. All types of memory operations come with some kind of guarantee. A 
series of non-atomic, non-volatile stores issued by a particular thread 
is guaranteed to happen in program order, from the point of view of 
that thread, for example. It would be hard to write a program if that 
wasn't true.

"This macro does not give any guarantees in regards to memory ordering /../"

This is not true. I will rephrase to "any *additional* guarantees" for 
both plain and "once" family documentation.

> Ignore that the kernel's "once" functions operates on words and this RFC operates on bits, the behavior is the same. Either there are memory ordering guarantees, or there are not.
> 
>>
>> I've read that document.
>>
>> What you should keep in mind if you read that document, is that DPDK
>> doesn't use the kernel's memory model, and doesn't have the kernel's
>> barrier and atomics APIs. What we have are an obsolete, miniature
>> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
>>
>> My general impression is that DPDK was moving in the C11 direction
>> memory model-wise, which is not the model the kernel uses.
> 
> I think you and I agree that using legacy methods only because "the kernel does it that way" would not be the optimal roadmap for DPDK.
> 
> We should keep moving in the C11 direction memory model-wise.
> I consider it more descriptive, and thus expect compilers to eventually produce better optimized code.
> 
>>
>>> It makes me think that DPDK "once" family of functions should behave
>> similarly.
>>
>> I think they do already.
> 
> I haven't looked deep into it, but the RFC's documentation says otherwise:
> The "once" family of functions does not promise atomicity and provides *no memory ordering* guarantees beyond the C11 relaxed memory model.
> 
>>
>> Also, rte_bit_once_set() works as the kernel's __set_bit().
>>
>>> Alternatively, if the "once" family of functions cannot be generically
>> implemented with a memory ordering that is optimal for all use cases, drop
>> this family of functions, and instead rely on the "atomic" family of functions
>> for interacting with memory-mapped hardware devices.
>>>
>>>>
>>>> RFC v7:
>>>>    * Fix various minor issues in documentation.
>>>>
>>>> RFC v6:
>>>>    * Have rte_bit_once_test() accept const-marked bitsets.
>>>>
>>>> RFC v3:
>>>>    * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>>> ---
>>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  8:00                                 ` Mattias Rönnblom
@ 2024-05-08  8:11                                   ` Morten Brørup
  2024-05-08  9:27                                     ` Mattias Rönnblom
  0 siblings, 1 reply; 90+ messages in thread
From: Morten Brørup @ 2024-05-08  8:11 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Wednesday, 8 May 2024 10.00
> 
> On 2024-05-08 09:33, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Wednesday, 8 May 2024 08.47
> >>
> >> On 2024-05-07 21:17, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >>>> Sent: Sunday, 5 May 2024 10.38
> >>>>
> >>>> Add test/set/clear/assign/flip functions which prevents certain
> >>>> compiler optimizations and guarantees that program-level memory loads
> >>>> and/or stores will actually occur.
> >>>>
> >>>> These functions are useful when interacting with memory-mapped
> >>>> hardware devices.
> >>>>
> >>>> The "once" family of functions does not promise atomicity and provides
> >>>> no memory ordering guarantees beyond the C11 relaxed memory model.
> >>>
> >>> In another thread, Stephen referred to the extended discussion on memory
> >> models in Linux kernel documentation:
> >>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-
> >> barriers.html
> >>>
> >>> Unlike the "once" family of functions in this RFC, the "once" family of
> >> functions in the kernel also guarantee memory ordering, specifically for
> >> memory-mapped hardware devices. The document describes the rationale with
> >> examples.
> >>>
> >>
> >> What more specifically did you have in mind? READ_ONCE() and
> >> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> >
> > The way I read it, they do provide memory ordering guarantees.
> >
> 
> Sure. All types memory operations comes with some kind guarantees. A
> series of non-atomic, non-volatile stores issued by a particular thread
> are guaranteed to happen in program order, from the point of view of
> that thread, for example. Would be hard to write a program if that
> wasn't true.
> 
> "This macro does not give any guarantees in regards to memory ordering /../"
> 
> This is not true. I will rephrase to "any *additional* guarantees" for
> both plain and "once" family documentation.

Consider code like this:
set_once(HW_START_BIT);
while (!get_once(HW_DONE_BIT)) /*busy wait*/;

If the "once" functions are used for hardware access, they must guarantee that HW_START_BIT has been written before HW_DONE_BIT is read.

The documentation must reflect this ordering guarantee.

> 
> > Ignore that the kernel's "once" functions operates on words and this RFC
> operates on bits, the behavior is the same. Either there are memory ordering
> guarantees, or there are not.
> >
> >>
> >> I've read that document.
> >>
> >> What you should keep in mind if you read that document, is that DPDK
> >> doesn't use the kernel's memory model, and doesn't have the kernel's
> >> barrier and atomics APIs. What we have are an obsolete, miniature
> >> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
> >>
> >> My general impression is that DPDK was moving in the C11 direction
> >> memory model-wise, which is not the model the kernel uses.
> >
> > I think you and I agree that using legacy methods only because "the kernel
> does it that way" would not be the optimal roadmap for DPDK.
> >
> > We should keep moving in the C11 direction memory model-wise.
> > I consider it more descriptive, and thus expect compilers to eventually
> produce better optimized code.
> >
> >>
> >>> It makes me think that DPDK "once" family of functions should behave
> >> similarly.
> >>
> >> I think they do already.
> >
> > I haven't looked deep into it, but the RFC's documentation says otherwise:
> > The "once" family of functions does not promise atomicity and provides *no
> memory ordering* guarantees beyond the C11 relaxed memory model.
> >
> >>
> >> Also, rte_bit_once_set() works as the kernel's __set_bit().
> >>
> >>> Alternatively, if the "once" family of functions cannot be generically
> >> implemented with a memory ordering that is optimal for all use cases, drop
> >> this family of functions, and instead rely on the "atomic" family of
> functions
> >> for interacting with memory-mapped hardware devices.
> >>>
> >>>>
> >>>> RFC v7:
> >>>>    * Fix various minor issues in documentation.
> >>>>
> >>>> RFC v6:
> >>>>    * Have rte_bit_once_test() accept const-marked bitsets.
> >>>>
> >>>> RFC v3:
> >>>>    * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >>>>
> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >>>> ---
> >>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  8:11                                   ` Morten Brørup
@ 2024-05-08  9:27                                     ` Mattias Rönnblom
  2024-05-08 10:08                                       ` Morten Brørup
  0 siblings, 1 reply; 90+ messages in thread
From: Mattias Rönnblom @ 2024-05-08  9:27 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-05-08 10:11, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Wednesday, 8 May 2024 10.00
>>
>> On 2024-05-08 09:33, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>>>> Sent: Wednesday, 8 May 2024 08.47
>>>>
>>>> On 2024-05-07 21:17, Morten Brørup wrote:
>>>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>>>> Sent: Sunday, 5 May 2024 10.38
>>>>>>
>>>>>> Add test/set/clear/assign/flip functions which prevents certain
>>>>>> compiler optimizations and guarantees that program-level memory loads
>>>>>> and/or stores will actually occur.
>>>>>>
>>>>>> These functions are useful when interacting with memory-mapped
>>>>>> hardware devices.
>>>>>>
>>>>>> The "once" family of functions does not promise atomicity and provides
>>>>>> no memory ordering guarantees beyond the C11 relaxed memory model.
>>>>>
>>>>> In another thread, Stephen referred to the extended discussion on memory
>>>> models in Linux kernel documentation:
>>>>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-
>>>> barriers.html
>>>>>
>>>>> Unlike the "once" family of functions in this RFC, the "once" family of
>>>> functions in the kernel also guarantee memory ordering, specifically for
>>>> memory-mapped hardware devices. The document describes the rationale with
>>>> examples.
>>>>>
>>>>
>>>> What more specifically did you have in mind? READ_ONCE() and
>>>> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
>>>
>>> The way I read it, they do provide memory ordering guarantees.
>>>
>>
>> Sure. All types memory operations comes with some kind guarantees. A
>> series of non-atomic, non-volatile stores issued by a particular thread
>> are guaranteed to happen in program order, from the point of view of
>> that thread, for example. Would be hard to write a program if that
>> wasn't true.
>>
>> "This macro does not give any guarantees in regards to memory ordering /../"
>>
>> This is not true. I will rephrase to "any *additional* guarantees" for
>> both plain and "once" family documentation.
> 
> Consider code like this:
> set_once(HW_START_BIT);
> while (!get_once(HW_DONE_BIT)) /*busy wait*/;
> 
> If the "once" functions are used for hardware access, they must guarantee that HW_START_BIT has been written before HW_DONE_BIT is read.
> 

Provided the bits reside in the same word, there is (or at least, should 
be) such a guarantee, and otherwise, you'll need a barrier.

I'm guessing in most cases the requirements are actually not as strict 
as you pose them: DONE starts as 0, so it may actually be read before 
START is written to, but not all DONE reads can be reordered ahead of 
the single START write. In that case, a compiler barrier between the set 
and the get loop should suffice. Otherwise, you need a full barrier, or 
an I/O barrier.

Anyway, since the exact purpose of the "once" type bit operations is 
unclear, maybe I should drop them from the patch set.

Now, they are much like the Linux kernel's __set_bit(), but for hardware 
access, maybe they should be more like writel().

> The documentation must reflect this ordering guarantee.
> 
>>
>>> Ignore that the kernel's "once" functions operate on words and this RFC
>>> operates on bits; the behavior is the same. Either there are memory
>>> ordering guarantees, or there are not.
>>>
>>>>
>>>> I've read that document.
>>>>
>>>> What you should keep in mind if you read that document, is that DPDK
>>>> doesn't use the kernel's memory model, and doesn't have the kernel's
>>>> barrier and atomics APIs. What we have are an obsolete, miniature
>>>> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
>>>>
>>>> My general impression is that DPDK was moving in the C11 direction
>>>> memory model-wise, which is not the model the kernel uses.
>>>
>>> I think you and I agree that using legacy methods only because "the kernel
>> does it that way" would not be the optimal roadmap for DPDK.
>>>
>>> We should keep moving in the C11 direction memory model-wise.
>>> I consider it more descriptive, and thus expect compilers to eventually
>> produce better optimized code.
>>>
>>>>
>>>>> It makes me think that DPDK "once" family of functions should behave
>>>> similarly.
>>>>
>>>> I think they do already.
>>>
>>> I haven't looked deep into it, but the RFC's documentation says otherwise:
>>> The "once" family of functions does not promise atomicity and provides *no
>> memory ordering* guarantees beyond the C11 relaxed memory model.
>>>
>>>>
>>>> Also, rte_bit_once_set() works as the kernel's __set_bit().
>>>>
>>>>> Alternatively, if the "once" family of functions cannot be generically
>>>> implemented with a memory ordering that is optimal for all use cases, drop
>>>> this family of functions, and instead rely on the "atomic" family of
>> functions
>>>> for interacting with memory-mapped hardware devices.
>>>>>
>>>>>>
>>>>>> RFC v7:
>>>>>>     * Fix various minor issues in documentation.
>>>>>>
>>>>>> RFC v6:
>>>>>>     * Have rte_bit_once_test() accept const-marked bitsets.
>>>>>>
>>>>>> RFC v3:
>>>>>>     * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>>>>>
>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>>>>> ---
>>>>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  9:27                                     ` Mattias Rönnblom
@ 2024-05-08 10:08                                       ` Morten Brørup
  0 siblings, 0 replies; 90+ messages in thread
From: Morten Brørup @ 2024-05-08 10:08 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Wednesday, 8 May 2024 11.27
> 
> On 2024-05-08 10:11, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Wednesday, 8 May 2024 10.00
> >>
> >> On 2024-05-08 09:33, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >>>> Sent: Wednesday, 8 May 2024 08.47
> >>>>
> >>>> On 2024-05-07 21:17, Morten Brørup wrote:
> >>>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >>>>>> Sent: Sunday, 5 May 2024 10.38
> >>>>>>
> >>>>>> Add test/set/clear/assign/flip functions which prevents certain
> >>>>>> compiler optimizations and guarantees that program-level memory loads
> >>>>>> and/or stores will actually occur.
> >>>>>>
> >>>>>> These functions are useful when interacting with memory-mapped
> >>>>>> hardware devices.
> >>>>>>
> >>>>>> The "once" family of functions does not promise atomicity and provides
> >>>>>> no memory ordering guarantees beyond the C11 relaxed memory model.
> >>>>>
> >>>>> In another thread, Stephen referred to the extended discussion on memory
> >>>> models in Linux kernel documentation:
> >>>>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-
> >>>> barriers.html
> >>>>>
> >>>>> Unlike the "once" family of functions in this RFC, the "once" family of
> >>>> functions in the kernel also guarantee memory ordering, specifically for
> >>>> memory-mapped hardware devices. The document describes the rationale with
> >>>> examples.
> >>>>>
> >>>>
> >>>> What more specifically did you have in mind? READ_ONCE() and
> >>>> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> >>>
> >>> The way I read it, they do provide memory ordering guarantees.
> >>>
> >>
> >> Sure. All types of memory operations come with some kind of guarantee. A
> >> series of non-atomic, non-volatile stores issued by a particular thread
> >> are guaranteed to happen in program order, from the point of view of
> >> that thread, for example. Would be hard to write a program if that
> >> wasn't true.
> >>
> >> "This macro does not give any guarantees in regards to memory ordering
> /../"
> >>
> >> This is not true. I will rephrase to "any *additional* guarantees" for
> >> both plain and "once" family documentation.
> >
> > Consider code like this:
> > set_once(HW_START_BIT);
> > while (!get_once(HW_DONE_BIT)) /*busy wait*/;
> >
> > If the "once" functions are used for hardware access, they must guarantee
> that HW_START_BIT has been written before HW_DONE_BIT is read.
> >
> 
> Provided bits reside in the same word, there is (or at least, should be)
> such a guarantee; otherwise, you'll need a barrier.
> 
> I'm guessing in most cases the requirements are actually not as strict
> as you pose them: DONE starts as 0, so it may actually be read before
> START is written to, but not all DONE reads can be reordered ahead of
> the single START write. In that case, a compiler barrier between set and
> the get loop should suffice. Otherwise, you need a full barrier, or an
> I/O barrier.
> 
> Anyway, since the exact purpose of the "once" type bit operations is
> unclear, maybe I should drop them from the patch set.

I agree.

Unless designed for accessing hardware registers, the "once" family seems like little more than a subset of the "atomic" family.

Looking at DPDK drivers, they access hardware registers using e.g. rte_read32(), which looks like this:

static __rte_always_inline uint32_t
rte_read32(const volatile void *addr)
{
	uint32_t val;
	val = rte_read32_relaxed(addr);
	rte_io_rmb();
	return val;
}

If the "once" family of functions is for hardware access, they should do something similar regarding ordering and barriers.
And even if they do, I'm not sure the hardware driver developers are going to use them, unless other environments (e.g. Linux, Windows, BSD) supported by the hardware driver's common low-level code provide similar functions.
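
If the "once" bit functions were aimed at hardware access, a bit-level analogue of that relaxed-access-plus-barrier pattern might look like the sketch below. All toy_* names are hypothetical, and the compiler-barrier stand-in for rte_io_rmb() is only adequate on strongly ordered CPUs such as x86; weakly ordered CPUs would need a real I/O read barrier:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an I/O read barrier; modeled here as a
 * compiler barrier, which suffices on x86 but not on weakly ordered
 * architectures. */
#define toy_io_rmb() __asm__ volatile("" : : : "memory")

/* Sketch of a hardware-oriented bit test mirroring rte_read32()'s
 * relaxed-load-then-barrier structure. */
static inline bool
toy_bit_test_io32(const volatile uint32_t *addr, unsigned int nr)
{
	bool set = (*addr >> nr) & 1; /* relaxed volatile load */
	toy_io_rmb();                 /* order it before subsequent loads */
	return set;
}
```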

> 
> Now, they are much like the Linux kernel's __set_bit(), but for hardware
> access, maybe they should be more like writel().
> 
> > The documentation must reflect this ordering guarantee.
> >
> >>
> >>> Ignore that the kernel's "once" functions operate on words and this RFC
> >>> operates on bits; the behavior is the same. Either there are memory
> >>> ordering guarantees, or there are not.
> >>>
> >>>>
> >>>> I've read that document.
> >>>>
> >>>> What you should keep in mind if you read that document, is that DPDK
> >>>> doesn't use the kernel's memory model, and doesn't have the kernel's
> >>>> barrier and atomics APIs. What we have are an obsolete, miniature
> >>>> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
> >>>>
> >>>> My general impression is that DPDK was moving in the C11 direction
> >>>> memory model-wise, which is not the model the kernel uses.
> >>>
> >>> I think you and I agree that using legacy methods only because "the kernel
> >> does it that way" would not be the optimal roadmap for DPDK.
> >>>
> >>> We should keep moving in the C11 direction memory model-wise.
> >>> I consider it more descriptive, and thus expect compilers to eventually
> >> produce better optimized code.
> >>>
> >>>>
> >>>>> It makes me think that DPDK "once" family of functions should behave
> >>>> similarly.
> >>>>
> >>>> I think they do already.
> >>>
> >>> I haven't looked deeply into it, but the RFC's documentation says otherwise:
> >>> The "once" family of functions does not promise atomicity and provides *no
> >> memory ordering* guarantees beyond the C11 relaxed memory model.
> >>>
> >>>>
> >>>> Also, rte_bit_once_set() works as the kernel's __set_bit().
> >>>>
> >>>>> Alternatively, if the "once" family of functions cannot be generically
> >>>> implemented with a memory ordering that is optimal for all use cases,
> drop
> >>>> this family of functions, and instead rely on the "atomic" family of
> >> functions
> >>>> for interacting with memory-mapped hardware devices.
> >>>>>
> >>>>>>
> >>>>>> RFC v7:
> >>>>>>     * Fix various minor issues in documentation.
> >>>>>>
> >>>>>> RFC v6:
> >>>>>>     * Have rte_bit_once_test() accept const-marked bitsets.
> >>>>>>
> >>>>>> RFC v3:
> >>>>>>     * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >>>>>>
> >>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>>>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >>>>>> ---
> >>>>>


* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  7:33                               ` Morten Brørup
  2024-05-08  8:00                                 ` Mattias Rönnblom
@ 2024-05-08 15:15                                 ` Stephen Hemminger
  2024-05-08 16:16                                   ` Morten Brørup
  1 sibling, 1 reply; 90+ messages in thread
From: Stephen Hemminger @ 2024-05-08 15:15 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Mattias Rönnblom, Mattias Rönnblom, dev, Heng Wang,
	Tyler Retzlaff

On Wed, 8 May 2024 09:33:43 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > What more specifically did you have in mind? READ_ONCE() and
> > WRITE_ONCE()? They give almost no guarantees. Very much relaxed.  
> 
> The way I read it, they do provide memory ordering guarantees.
> 
> Ignore that the kernel's "once" functions operate on words and this RFC operates on bits; the behavior is the same. Either there are memory ordering guarantees, or there are not.

The kernel's READ_ONCE/WRITE_ONCE provide compiler-only ordering, i.e. they only apply to a single CPU.
RTFM memory-barriers.txt:

GUARANTEES
----------

There are some minimal guarantees that may be expected of a CPU:

 (*) On any given CPU, dependent memory accesses will be issued in order, with
     respect to itself.  This means that for:

	Q = READ_ONCE(P); D = READ_ONCE(*Q);

     the CPU will issue the following memory operations:

	Q = LOAD P, D = LOAD *Q

     and always in that order.  However, on DEC Alpha, READ_ONCE() also
     emits a memory-barrier instruction, so that a DEC Alpha CPU will
     instead issue the following memory operations:

	Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER

     Whether on DEC Alpha or not, the READ_ONCE() also prevents compiler
     mischief.

 (*) Overlapping loads and stores within a particular CPU will appear to be
     ordered within that CPU.  This means that for:

	a = READ_ONCE(*X); WRITE_ONCE(*X, b);

     the CPU will only issue the following sequence of memory operations:

	a = LOAD *X, STORE *X = b

     And for:

	WRITE_ONCE(*X, c); d = READ_ONCE(*X);

     the CPU will only issue:

	STORE *X = c, d = LOAD *X

     (Loads and stores overlap if they are targeted at overlapping pieces of
     memory).
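
For scalar types, READ_ONCE()/WRITE_ONCE() essentially reduce to volatile accesses. A simplified user-space approximation (the TOY_* names are made up; the real kernel macros also handle non-scalar sizes and, historically, added a read barrier on DEC Alpha):

```c
#include <stdint.h>

/* Compiler-only guarantees: the access happens exactly once, cannot be
 * torn or fused by the compiler, and is ordered against other volatile
 * accesses at compile time. The CPU may still reorder it against other
 * memory operations. */
#define TOY_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
```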


* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08 15:15                                 ` Stephen Hemminger
@ 2024-05-08 16:16                                   ` Morten Brørup
  0 siblings, 0 replies; 90+ messages in thread
From: Morten Brørup @ 2024-05-08 16:16 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Mattias Rönnblom, Mattias Rönnblom, dev, Heng Wang,
	Tyler Retzlaff

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 8 May 2024 17.16
> 
> On Wed, 8 May 2024 09:33:43 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > > What more specifically did you have in mind? READ_ONCE() and
> > > WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> >
> > The way I read it, they do provide memory ordering guarantees.
> >
> > Ignore that the kernel's "once" functions operate on words and this RFC
> > operates on bits; the behavior is the same. Either there are memory
> > ordering guarantees, or there are not.
> 
> The kernel's READ_ONCE/WRITE_ONCE provide compiler-only ordering, i.e. they
> only apply to a single CPU.
> RTFM memory-barriers.txt:
> 
> GUARANTEES
> ----------
> 
> There are some minimal guarantees that may be expected of a CPU:
> 
>  (*) On any given CPU, dependent memory accesses will be issued in order, with
>      respect to itself.  This means that for:
> 
> 	Q = READ_ONCE(P); D = READ_ONCE(*Q);
> 
>      the CPU will issue the following memory operations:
> 
> 	Q = LOAD P, D = LOAD *Q
> 
>      and always in that order.
>      However, on DEC Alpha, READ_ONCE() also
>      emits a memory-barrier instruction, so that a DEC Alpha CPU will
>      instead issue the following memory operations:
> 
> 	Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER
> 
>      Whether on DEC Alpha or not, the READ_ONCE() also prevents compiler
>      mischief.
> 
>  (*) Overlapping loads and stores within a particular CPU will appear to be
>      ordered within that CPU.  This means that for:
> 
> 	a = READ_ONCE(*X); WRITE_ONCE(*X, b);
> 
>      the CPU will only issue the following sequence of memory operations:
> 
> 	a = LOAD *X, STORE *X = b
> 
>      And for:
> 
> 	WRITE_ONCE(*X, c); d = READ_ONCE(*X);
> 
>      the CPU will only issue:
> 
> 	STORE *X = c, d = LOAD *X
> 
>      (Loads and stores overlap if they are targeted at overlapping pieces of
>      memory).

It says "*the CPU* will issue the following [sequence of] *memory operations*",
not "*the compiler* will generate the following *CPU instructions*".

To me, that reads like a memory ordering guarantee.



end of thread, other threads:[~2024-05-08 16:17 UTC | newest]

Thread overview: 90+ messages
-- links below jump to the message on this page --
2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
2024-03-02 17:05   ` Stephen Hemminger
2024-03-03  6:26     ` Mattias Rönnblom
2024-03-04 16:34       ` Tyler Retzlaff
2024-03-05 18:01         ` Mattias Rönnblom
2024-03-05 18:06           ` Tyler Retzlaff
2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-04-29 11:12           ` Morten Brørup
2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-05-07 19:17                           ` Morten Brørup
2024-05-08  6:47                             ` Mattias Rönnblom
2024-05-08  7:33                               ` Morten Brørup
2024-05-08  8:00                                 ` Mattias Rönnblom
2024-05-08  8:11                                   ` Morten Brørup
2024-05-08  9:27                                     ` Mattias Rönnblom
2024-05-08 10:08                                       ` Morten Brørup
2024-05-08 15:15                                 ` Stephen Hemminger
2024-05-08 16:16                                   ` Morten Brørup
2024-05-05  8:37                         ` [RFC v7 4/6] eal: add unit tests for " Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 4/6] eal: add unit tests for " Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-05-03  6:41                       ` Mattias Rönnblom
2024-05-03 23:30                         ` Tyler Retzlaff
2024-05-04 15:36                           ` Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-30 10:37               ` Morten Brørup
2024-04-30 11:58                 ` Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-25 10:25       ` Morten Brørup
2024-04-25 14:36         ` Mattias Rönnblom
2024-04-25 16:18           ` Morten Brørup
2024-04-26  9:39             ` Mattias Rönnblom
2024-04-26 12:00               ` Morten Brørup
2024-04-28 15:37                 ` Mattias Rönnblom
2024-04-29  7:24                   ` Morten Brørup
2024-04-30 16:52               ` Tyler Retzlaff
2024-04-25  8:58     ` [RFC v2 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
2024-04-26 11:17       ` Mattias Rönnblom
2024-04-26 21:35     ` Patrick Robb
2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
2024-03-04  8:16   ` Heng Wang
2024-03-04 15:41     ` Mattias Rönnblom
2024-03-04 16:42   ` Tyler Retzlaff
2024-03-05 18:08     ` Mattias Rönnblom
2024-03-05 18:22       ` Tyler Retzlaff
2024-03-05 20:02         ` Mattias Rönnblom
2024-03-05 20:53           ` Tyler Retzlaff
2024-03-02 13:53 ` [RFC 3/7] eal: add bit manipulation functions which read or write once Mattias Rönnblom
2024-03-02 13:53 ` [RFC 4/7] eal: add generic once-type bit operations macros Mattias Rönnblom
2024-03-02 13:53 ` [RFC 5/7] eal: add atomic bit operations Mattias Rönnblom
2024-03-02 13:53 ` [RFC 6/7] eal: add generic " Mattias Rönnblom
2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
2024-03-02 17:07   ` Stephen Hemminger
2024-03-03  6:30     ` Mattias Rönnblom
