Re: [PATCH 1/1] eal: add 128-bit cmpset (x86-64 only)

From: Ola Liljedahl <Ola.Liljedahl@arm.com>
To: "gage.eads@intel.com" <gage.eads@intel.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: "arybchenko@solarflare.com" <arybchenko@solarflare.com>,
	"jerinj@marvell.com" <jerinj@marvell.com>,
	"chaozhu@linux.vnet.ibm.com" <chaozhu@linux.vnet.ibm.com>,
	nd <nd@arm.com>,
	"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
	"konstantin.ananyev@intel.com" <konstantin.ananyev@intel.com>,
	"hemant.agrawal@nxp.com" <hemant.agrawal@nxp.com>,
	"olivier.matz@6wind.com" <olivier.matz@6wind.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>
Subject: Re: [PATCH 1/1] eal: add 128-bit cmpset (x86-64 only)
Date: Mon, 28 Jan 2019 23:01:37 +0000	[thread overview]
Message-ID: <1548716507.11472.96.camel@arm.com> (raw)
In-Reply-To: <20190128172945.27251-2-gage.eads@intel.com>

On Mon, 2019-01-28 at 11:29 -0600, Gage Eads wrote:
> This operation can be used for non-blocking algorithms, such as a
> non-blocking stack or ring.
> 
> Signed-off-by: Gage Eads <gage.eads@intel.com>
> ---
>  .../common/include/arch/x86/rte_atomic_64.h        | 31 +++++++++++
>  lib/librte_eal/common/include/generic/rte_atomic.h | 65
> ++++++++++++++++++++++
>  2 files changed, 96 insertions(+)
> 
> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> index fd2ec9c53..b7b90b83e 100644
> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
> @@ -34,6 +34,7 @@
>  /*
>   * Inspired from FreeBSD src/sys/amd64/include/atomic.h
>   * Copyright (c) 1998 Doug Rabson
> + * Copyright (c) 2019 Intel Corporation
>   * All rights reserved.
>   */
>  
> @@ -46,6 +47,7 @@
>  
>  #include <stdint.h>
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  #include <rte_atomic.h>
>  
>  /*------------------------- 64 bit atomic operations ------------------------
> -*/
> @@ -208,4 +210,33 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
>  }
>  #endif
>  
> +static inline int __rte_experimental
__rte_always_inline?

> +rte_atomic128_cmpset(volatile rte_int128_t *dst,
No need to declare the location volatile. Volatile doesn't do what you think it
does.
https://youtu.be/lkgszkPnV8g?t=1027

> +		     rte_int128_t *exp,
I would declare 'exp' const as well and document that 'exp' is not updated (with
the old value) for a failure. The reason being that ARMv8.0/AArch64 cannot
atomically read the old value without also writing the location and that is bad
for performance (unnecessary writes leads to unnecessary contention and worse
scalability). And the user must anyway read the location (in the start of the
critical section) using e.g. non-atomic 64-bit reads so there isn't actually any
requirement for an atomic 128-bit read of the location.

>  rte_int128_t *src,
const rte_int128_t *src?

But why are we not passing 'exp' and 'src' by value? That works great, even with
structs. Passing by value simplifies the compiler's life, especially if the call
is inlined. Ask a compiler developer.

> +		     unsigned int weak,
> +		     enum rte_atomic_memmodel_t success,
> +		     enum rte_atomic_memmodel_t failure)
> +{
> +	RTE_SET_USED(weak);
> +	RTE_SET_USED(success);
> +	RTE_SET_USED(failure);
> +	uint8_t res;
> +
> +	asm volatile (
> +		      MPLOCKED
> +		      "cmpxchg16b %[dst];"
> +		      " sete %[res]"
> +		      : [dst] "=m" (dst->val[0]),
> +			"=A" (exp->val[0]),
> +			[res] "=r" (res)
> +		      : "c" (src->val[1]),
> +			"b" (src->val[0]),
> +			"m" (dst->val[0]),
> +			"d" (exp->val[1]),
> +			"a" (exp->val[0])
> +		      : "memory");
> +
> +	return res;
> +}
> +
>  #endif /* _RTE_ATOMIC_X86_64_H_ */
> diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h
> b/lib/librte_eal/common/include/generic/rte_atomic.h
> index b99ba4688..8d612d566 100644
> --- a/lib/librte_eal/common/include/generic/rte_atomic.h
> +++ b/lib/librte_eal/common/include/generic/rte_atomic.h
> @@ -14,6 +14,7 @@
>  
>  #include <stdint.h>
>  #include <rte_common.h>
> +#include <rte_compat.h>
>  
>  #ifdef __DOXYGEN__
>  
> @@ -1082,4 +1083,68 @@ static inline void rte_atomic64_clear(rte_atomic64_t
> *v)
>  }
>  #endif
>  
> +/*------------------------ 128 bit atomic operations ------------------------
> -*/
> +
> +/**
> + * 128-bit integer structure.
> + */
> +typedef struct {
> +	uint64_t val[2];
> +} __rte_aligned(16) rte_int128_t;
So we can't use __int128?

> +
> +/**
> + * Memory consistency models used in atomic operations. These control the
> + * behavior of the operation with respect to memory barriers and
> + * thread synchronization.
> + *
> + * These directly match those in the C++11 standard; for details on their
> + * behavior, refer to the standard.
> + */
> +enum rte_atomic_memmodel_t {
> +	RTE_ATOMIC_RELAXED,
> +	RTE_ATOMIC_CONSUME,
> +	RTE_ATOMIC_ACQUIRE,
> +	RTE_ATOMIC_RELEASE,
> +	RTE_ATOMIC_ACQ_REL,
> +	RTE_ATOMIC_SEQ_CST,
> +};
> +
> +/* Only implemented on x86-64 currently. The ifdef prevents compilation from
> + * failing for architectures without a definition of this function.
> + */
> +#if defined(RTE_ARCH_X86_64)
> +/**
> + * An atomic compare and set function used by the mutex functions.
> + * (atomic) equivalent to:
> + *   if (*dst == exp)
> + *     *dst = src (all 128-bit words)
> + *
> + * @param dst
> + *   The destination into which the value will be written.
> + * @param exp
> + *   Pointer to the expected value. If the operation fails, this memory is
> + *   updated with the actual value.
> + * @param src
> + *   Pointer to the new value.
> + * @param weak
> + *   A value of true allows the comparison to spuriously fail.
> Implementations
> + *   may ignore this argument and only implement the strong variant.
> + * @param success
> + *   If successful, the operation's memory behavior conforms to this (or a
> + *   stronger) model.
> + * @param failure
> + *   If unsuccessful, the operation's memory behavior conforms to this (or a
> + *   stronger) model. This argument cannot be RTE_ATOMIC_RELEASE,
> + *   RTE_ATOMIC_ACQ_REL, or a stronger model than success.
> + * @return
> + *   Non-zero on success; 0 on failure.
> + */
> +static inline int __rte_experimental
> +rte_atomic128_cmpset(volatile rte_int128_t *dst,
> +		     rte_int128_t *exp, rte_int128_t *src,
> +		     unsigned int weak,
> +		     enum rte_atomic_memmodel_t success,
> +		     enum rte_atomic_memmodel_t failure);
> +#endif
> +
>  #endif /* _RTE_ATOMIC_H_ */
-- 
Ola Liljedahl, Networking System Architect, Arm
Phone +46706866373, Skype ola.liljedahl