From: Honnappa Nagarahalli
Subject: Re: [PATCH v3 6/8] stack: add C11 atomic implementation
Date: Mon, 1 Apr 2019 19:06:38 +0000
To: "Eads, Gage", "'dev@dpdk.org'"
Cc: "'olivier.matz@6wind.com'", "'arybchenko@solarflare.com'", "Richardson, Bruce", "Ananyev, Konstantin", "Gavin Hu (Arm Technology China)", "thomas@monjalon.net"
List-Id: DPDK patches and discussions
In-Reply-To: <9184057F7FC11744A2107296B6B8EB1E5420DDF2@FMSMSX108.amr.corp.intel.com>
References: <20190305164256.2367-1-gage.eads@intel.com> <20190306144559.391-1-gage.eads@intel.com> <20190306144559.391-7-gage.eads@intel.com> <9184057F7FC11744A2107296B6B8EB1E5420D940@FMSMSX108.amr.corp.intel.com> <9184057F7FC11744A2107296B6B8EB1E5420DDF2@FMSMSX108.amr.corp.intel.com>

> > Subject: RE: [PATCH v3 6/8] stack: add C11 atomic implementation
> >
> > [snip]
> >
> > > > +static __rte_always_inline void
> > > > +__rte_stack_lf_push(struct rte_stack_lf_list *list,
> > > > +		struct rte_stack_lf_elem *first,
> > > > +		struct rte_stack_lf_elem *last,
> > > > +		unsigned int num)
> > > > +{
> > > > +#ifndef RTE_ARCH_X86_64
> > > > +	RTE_SET_USED(first);
> > > > +	RTE_SET_USED(last);
> > > > +	RTE_SET_USED(list);
> > > > +	RTE_SET_USED(num);
> > > > +#else
> > > > +	struct rte_stack_lf_head old_head;
> > > > +	int success;
> > > > +
> > > > +	old_head = list->head;
> > > This can be a torn read (same as you have mentioned in
> > > __rte_stack_lf_pop).
> > > I suggest we use an acquire thread fence here as
> > > well (please see the comments in __rte_stack_lf_pop).
> >
> > Agreed. I'll add the acquire fence.
> >
>
> On second thought, an acquire fence isn't necessary. The acquire fence in
> __rte_stack_lf_pop() ensures the list->head read is ordered before the list
> element reads. That isn't necessary here; we need to ensure that the
> last->next write occurs (and is observed) before the list->head write, which
> the CAS's RELEASE success memorder accomplishes.
>
> If a torn read occurs, the CAS will fail and will atomically re-load &old_head.

Following is my understanding:

The general guideline is that there should be a load-acquire for every
store-release. In both xxx_lf_pop and xxx_lf_push, the head is store-released,
hence the load of the head should be a load-acquire.

From the code (for example, in the function _xxx_lf_push), you can notice that
there is a dependency chain from 'old_head to new_head to list->head (in the
compare_exchange)'. When such a dependency exists, if the memory orderings are
to be avoided, one needs to use __ATOMIC_CONSUME. Currently, compilers
substitute the stronger __ATOMIC_ACQUIRE, as __ATOMIC_CONSUME is not well
defined. Please refer to [1] and [2] for more info.

IMO, since, for 128b, we do not have a pure load-acquire, I suggest we use a
thread fence with acquire semantics. It is a heavier barrier, but I think it is
safer code that adheres to the C11 memory model.

[1] https://preshing.com/20140709/the-purpose-of-memory_order_consume-in-cpp11/
[2] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0750r1.html
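To make the two positions in this thread concrete, below is a minimal sketch of
the push loop using C11 <stdatomic.h>. It is illustrative only: the names
(lf_list, lf_push, elem) are invented for the sketch, and it uses a plain
single-word pointer head, whereas the real __rte_stack_lf_push CASes a 16-byte
{top, cnt} pair on x86-64, for which no native load-acquire exists (the point
behind the thread-fence suggestion above).

```c
#include <stdatomic.h>
#include <stddef.h>

/* Sketch only: single-word head instead of DPDK's 128-bit {top, cnt} pair. */
struct elem {
	struct elem *next;
};

struct lf_list {
	_Atomic(struct elem *) head;
};

/* Push the chain first..last onto the front of the list. */
static void
lf_push(struct lf_list *list, struct elem *first, struct elem *last)
{
	/* Relaxed load: if we read a stale head, the CAS below fails and
	 * atomically reloads the current head into old_head.
	 */
	struct elem *old_head =
		atomic_load_explicit(&list->head, memory_order_relaxed);

	do {
		/* Link the new chain in front of the observed head. */
		last->next = old_head;

		/* RELEASE on success orders the last->next store before the
		 * head update, so a popper that reads the new head with
		 * acquire semantics also observes the element writes.
		 */
	} while (!atomic_compare_exchange_weak_explicit(
			&list->head, &old_head, first,
			memory_order_release, memory_order_relaxed));
}
```

With a single-word head, the relaxed initial load is safe for exactly the
reason given above: a stale value only makes the CAS fail and reload old_head.
The open question in the thread concerns the 128-bit head, where the
conservative alternative would be an atomic_thread_fence with acquire
semantics after the load.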