From mboxrd@z Thu Jan 1 00:00:00 1970 From: Honnappa Nagarahalli Subject: Re: [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only) Date: Tue, 22 Jan 2019 20:30:34 +0000 Message-ID: References: <20190115223232.31866-1-gage.eads@intel.com> <20190116151835.22424-1-gage.eads@intel.com> <20190116151835.22424-2-gage.eads@intel.com> <9184057F7FC11744A2107296B6B8EB1E541C8BCA@FMSMSX108.amr.corp.intel.com> <9184057F7FC11744A2107296B6B8EB1E541C94D9@FMSMSX108.amr.corp.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Cc: "olivier.matz@6wind.com" , "arybchenko@solarflare.com" , "Richardson, Bruce" , "Ananyev, Konstantin" , nd , "chaozhu@linux.vnet.ibm.com" , "jerinj@marvell.com" , "hemant.agrawal@nxp.com" , nd To: "Eads, Gage" , "dev@dpdk.org" Return-path: Received: from EUR01-HE1-obe.outbound.protection.outlook.com (mail-eopbgr130057.outbound.protection.outlook.com [40.107.13.57]) by dpdk.org (Postfix) with ESMTP id 24D26DE3 for ; Tue, 22 Jan 2019 21:30:35 +0100 (CET) In-Reply-To: <9184057F7FC11744A2107296B6B8EB1E541C94D9@FMSMSX108.amr.corp.intel.com> Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Added other platform owners. > > > > > @@ -208,4 +209,25 @@ static inline void > > > > > rte_atomic64_clear(rte_atomic64_t > > > > > *v) } #endif > > > > > > > > > > +static inline int > > > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp, > > > > > +uint64_t > > > > > +*src) { > > > > The API name suggests it is a 128b operation. 'dst', 'exp' and 'src= ' > > > > should be pointers to 128b (__int128)? Or we could define our own > > > > data > > > type. > > > > > > I agree, I'm not a big fan of the 64b pointers here. I avoided > > > __int128 originally because it fails to compile with -pedantic, but > > > on second thought (and with your suggestion of a separate data > > > type), we can resolve that with this typedef: > > > > > > typedef struct { > > > RTE_STD_C11 __int128 val; > > > } rte_int128_t; > > ok > > > > > > > > > Since, it is a new API, can we define it with memory orderings > > > > which will be more conducive to relaxed memory ordering based > architectures? > > > > You can refer to [1] and [2] for guidance. > > > > > > I certainly see the value in controlling the operation's memory > > > ordering, like in the __atomic intrinsics, but I'm not sure this > > > patchset is the right place to address that. I see that work going a > > > couple > > ways: > > > 1. Expand the existing rte_atomicN_* interfaces with additional > > > arguments. In that case, I'd prefer this be done in a separate > > > patchset that addresses all the atomic operations, not just cmpset, > > > so the interface changes are chosen according to the needs of the > > > full set of atomic operations. If this approach is taken then > > > there's no need to solve this while rte_atomic128_cmpset is > > > experimental, since all the > > other functions are non-experimental anyway. > > > > > > - Or - > > > > > > 2. Don't modify the existing rte_atomicN_* interfaces (or their > > > strongly ordered behavior), and instead create new versions of them > > > that take additional arguments. In this case, we can implement > > > rte_atomic128_cmpset() as is and create a more flexible version in a > > > later > > patchset. > > > > > > Either way, I think the current interface (w.r.t. memory ordering > > > options) can work and still leaves us in a good position for future > > changes/improvements. > > > > > I do not see the need to modify/extend the existing rte_atomicN_* APIs > > as the corresponding __atomic intrinsics serve as replacements. I > > expect that at some point, DPDK code base will not be using > rte_atomicN_* APIs. > > However, __atomic intrinsics do not support 128b wide parameters. > > Hence >=20 > I don't think that's correct. From the GCC docs: >=20 > "16-byte integral types are also allowed if `__int128' (see __int128) is > supported by the architecture." >=20 > This works with x86 -64 -- I assume aarch64 also, but haven't confirmed. >=20 > Source: https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/_005f_005fatomic- > Builtins.html (following is based on my reading, not based on experiments) My understanding is that the __atomic_load_n/store_n were implemented using= a compare-and-swap HW instruction (due to lack of HW 128b atomic load and = store instructions). This introduced the side effect or store/load respecti= vely. Where as the user does not expect such side effects. As suggested in = [1], it looks like GCC delegated the implementation to libatomic which 'it = seems' uses locks to implement 128b __atomic intrinsics (needs to be verifi= ed) If __atomic intrinsics, for any of the supported platforms, do not have an = optimal implementation, I prefer to add a DPDK API as an abstraction. A giv= en platform can choose to implement such an API using __atomic intrinsics i= f it wants. The DPDK API can be similar to __atomic_compare_exchange_n. [1] https://patchwork.ozlabs.org/patch/721686/ [2] https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html >=20 > > DPDK needs to write its own. Since this is the first API in that > > regard, I prefer that we start with a signature that resembles > > __atomic intrinsics which have been proven to provide best flexibility = for > all the platforms supported by DPDK.