From: Honnappa Nagarahalli
Subject: Re: [PATCH v3 6/8] stack: add C11 atomic implementation
Date: Mon, 1 Apr 2019 19:06:38 +0000
To: "Eads, Gage", "'dev@dpdk.org'"
Cc: "'olivier.matz@6wind.com'", "'arybchenko@solarflare.com'", "Richardson, Bruce", "Ananyev, Konstantin", "Gavin Hu (Arm Technology China)", "thomas@monjalon.net"
List-Id: DPDK patches and discussions
In-Reply-To: <9184057F7FC11744A2107296B6B8EB1E5420DDF2@FMSMSX108.amr.corp.intel.com>
References: <20190305164256.2367-1-gage.eads@intel.com> <20190306144559.391-1-gage.eads@intel.com> <20190306144559.391-7-gage.eads@intel.com> <9184057F7FC11744A2107296B6B8EB1E5420D940@FMSMSX108.amr.corp.intel.com> <9184057F7FC11744A2107296B6B8EB1E5420DDF2@FMSMSX108.amr.corp.intel.com>

> > Subject: RE: [PATCH v3 6/8] stack: add C11 atomic implementation
> >
> > [snip]
> >
> > > > +static __rte_always_inline void
> > > > +__rte_stack_lf_push(struct rte_stack_lf_list *list,
> > > > +		struct rte_stack_lf_elem *first,
> > > > +		struct rte_stack_lf_elem *last,
> > > > +		unsigned int num)
> > > > +{
> > > > +#ifndef RTE_ARCH_X86_64
> > > > +	RTE_SET_USED(first);
> > > > +	RTE_SET_USED(last);
> > > > +	RTE_SET_USED(list);
> > > > +	RTE_SET_USED(num);
> > > > +#else
> > > > +	struct rte_stack_lf_head old_head;
> > > > +	int success;
> > > > +
> > > > +	old_head = list->head;
> > > This can be a torn read (same as you have mentioned in
> > > __rte_stack_lf_pop).
> > > I suggest we use an acquire thread fence here as
> > > well (please see the comments in __rte_stack_lf_pop).
> >
> > Agreed. I'll add the acquire fence.
> >
>
> On second thought, an acquire fence isn't necessary. The acquire fence in
> __rte_stack_lf_pop() ensures the list->head read is ordered before the list
> element reads. That isn't necessary here; we need to ensure that the
> last->next write occurs (and is observed) before the list->head write, which
> the CAS's RELEASE success memorder accomplishes.
>
> If a torn read occurs, the CAS will fail and will atomically re-load &old_head.

Following is my understanding:

The general guideline is that there should be a load-acquire for every
store-release. In both xxx_lf_pop and xxx_lf_push, the head is store-released,
hence the load of the head should be a load-acquire.

From the code (for example, in the function _xxx_lf_push), you can notice that
there is a dependency chain from 'old_head to new_head to list->head (in the
compare_exchange)'. When such a dependency exists, if the memory orderings are
to be avoided, one needs to use __ATOMIC_CONSUME. Currently, compilers
substitute the stronger __ATOMIC_ACQUIRE, as __ATOMIC_CONSUME is not well
defined. Please refer to [1] and [2] for more info.

IMO, since, for 128b, we do not have a pure load-acquire, I suggest we use a
thread fence with acquire semantics. It is a heavier barrier, but I think it is
safer code that adheres to the C11 memory model.

[1] https://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
[2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html
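To make the two positions in this thread concrete, below is a minimal sketch of
the push loop using C11 <stdatomic.h>. It is illustrative only: the names
(lf_list, lf_push, elem) are invented for the sketch, and it uses a plain
single-word pointer head, whereas the real __rte_stack_lf_push CASes a 16-byte
{top, cnt} pair on x86-64, for which no native load-acquire exists (the point
behind the thread-fence suggestion above).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Sketch only: single-word head instead of DPDK's 128-bit {top, cnt} pair. */
struct elem {
	struct elem *next;
};

struct lf_list {
	_Atomic(struct elem *) head;
};

/* Push the chain first..last onto the front of the list. */
static void
lf_push(struct lf_list *list, struct elem *first, struct elem *last)
{
	/* Relaxed load: if we read a stale head, the CAS below fails and
	 * atomically reloads the current head into old_head.
	 */
	struct elem *old_head =
		atomic_load_explicit(&list->head, memory_order_relaxed);

	do {
		/* Link the new chain in front of the observed head. */
		last->next = old_head;

		/* RELEASE on success orders the last->next store before the
		 * head update, so a popper that reads the new head with
		 * acquire semantics also observes the element writes.
		 */
	} while (!atomic_compare_exchange_weak_explicit(
			&list->head, &old_head, first,
			memory_order_release, memory_order_relaxed));
}
```

With a single-word head, the relaxed initial load is safe for exactly the
reason given above: a stale value only makes the CAS fail and reload old_head.
The open question in the thread concerns the 128-bit head, where the
conservative alternative would be an atomic_thread_fence with acquire
semantics after the load.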