From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>
Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
Date: Fri, 5 Oct 2018 00:47:28 +0000
Message-ID: <VI1PR08MB316716BE3A6CE22E70F044838FEB0@VI1PR08MB3167.eurprd08.prod.outlook.com>
References: <20180807031943.5331-1-gavin.hu@arm.com>
 <1537172244-64874-1-git-send-email-gavin.hu@arm.com>
 <20180929104857.GA30457@jerin>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Cc: "dev@dpdk.org" <dev@dpdk.org>, Honnappa Nagarahalli
 <Honnappa.Nagarahalli@arm.com>, Steve Capper <Steve.Capper@arm.com>, Ola
 Liljedahl <Ola.Liljedahl@arm.com>, nd <nd@arm.com>, "stable@dpdk.org"
 <stable@dpdk.org>
To: Jerin Jacob <jerin.jacob@caviumnetworks.com>
Return-path: <dev-bounces@dpdk.org>
In-Reply-To: <20180929104857.GA30457@jerin>
Content-Language: en-US
List-Id: DPDK patches and discussions <dev.dpdk.org>
Sender: "dev" <dev-bounces@dpdk.org>

Hi Jerin,

Thanks for your review. The inline comments below come from our internal
discussions.

BR. Gavin

> -----Original Message-----
> From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
> Sent: Saturday, September 29, 2018 6:49 PM
> To: Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Steve Capper
> <Steve.Capper@arm.com>; Ola Liljedahl <Ola.Liljedahl@arm.com>; nd
> <nd@arm.com>; stable@dpdk.org
> Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
>
> -----Original Message-----
> > Date: Mon, 17 Sep 2018 16:17:22 +0800
> > From: Gavin Hu <gavin.hu@arm.com>
> > To: dev@dpdk.org
> > CC: gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com,
> > steve.capper@arm.com,  Ola.Liljedahl@arm.com,
> > jerin.jacob@caviumnetworks.com, nd@arm.com,  stable@dpdk.org
> > Subject: [PATCH v3 1/3] ring: read tail using atomic load
> > X-Mailer: git-send-email 2.7.4
> >
> > External Email
> >
> > In update_tail, read ht->tail using __atomic_load.Although the
> > compiler currently seems to be doing the right thing even without
> > _atomic_load, we don't want to give the compiler freedom to optimise
> > what should be an atomic load, it should not be arbitarily moved
> > around.
> >
> > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Gavin Hu <gavin.hu@arm.com>
> > Reviewed-by: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> > Reviewed-by: Steve Capper <steve.capper@arm.com>
> > Reviewed-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
> > ---
> >  lib/librte_ring/rte_ring_c11_mem.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_c11_mem.h
> > b/lib/librte_ring/rte_ring_c11_mem.h
> > index 94df3c4..234fea0 100644
> > --- a/lib/librte_ring/rte_ring_c11_mem.h
> > +++ b/lib/librte_ring/rte_ring_c11_mem.h
> > @@ -21,7 +21,8 @@ update_tail(struct rte_ring_headtail *ht, uint32_t
> old_val, uint32_t new_val,
> >          * we need to wait for them to complete
> >          */
> >         if (!single)
> > -               while (unlikely(ht->tail != old_val))
> > +               while (unlikely(old_val != __atomic_load_n(&ht->tail,
> > +                                               __ATOMIC_RELAXED)))
> >                         rte_pause();
>
> Since it is a while loop with rte_pause(), IMO, There is no scope of false
> compiler optimization.
> IMO, this change may not required though I don't see any performance
> difference with two core ring_perf_autotest test. May be more core case it
> may have effect. IMO, If it not absolutely required, we can avoid this change.
>

Using __atomic_load_n() has two purposes:
1) The old code only works because ht->tail is declared volatile, which is
   not a requirement for C11 or for the use of the __atomic builtins. If
   ht->tail were not declared volatile and __atomic_load_n() were not used,
   the compiler would likely hoist the load above the loop.
2) I think all memory locations used for synchronization should be accessed
   with __atomic operations, to clearly indicate that these locations (and
   these accesses) are used for synchronization.

The read of ht->tail needs to be atomic; a non-atomic read would not be
correct. But there are no memory ordering requirements (with regard to other
loads and/or stores by this thread), so relaxed memory order is sufficient.
Another aspect of using __atomic_load_n() is that the compiler cannot
"optimise" this load (e.g. combine or hoist it); it has to be performed as
specified in the source code, which is also what we need here.
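
To illustrate the point, here is a minimal sketch of the wait loop from the
patch. It assumes GCC or Clang (for the __atomic builtins). The names
headtail_sketch, pause_hint() and update_tail_sketch() are hypothetical
stand-ins for struct rte_ring_headtail, rte_pause() and update_tail(); they
are not the DPDK definitions. The tail field is deliberately not volatile,
so only the atomic load makes the compiler re-read it on each iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_ring_headtail; tail is NOT volatile. */
struct headtail_sketch {
	uint32_t head;
	uint32_t tail;
};

/* Hypothetical stand-in for rte_pause(); a real build would emit a CPU hint. */
static inline void
pause_hint(void)
{
}

/*
 * Wait until preceding operations have moved the tail to old_val, then
 * publish new_val. The relaxed atomic load is performed on every loop
 * iteration; no ordering against other accesses is requested here.
 */
static inline void
update_tail_sketch(struct headtail_sketch *ht, uint32_t old_val,
		   uint32_t new_val, int single)
{
	if (!single)
		while (old_val != __atomic_load_n(&ht->tail, __ATOMIC_RELAXED))
			pause_hint();

	/* The write to tail is atomic as well, with release ordering. */
	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
}

/* Single-threaded self-check: tail already equals old_val, so no spinning. */
static inline uint32_t
update_tail_selfcheck(void)
{
	struct headtail_sketch ht = { .head = 8, .tail = 5 };

	update_tail_sketch(&ht, 5, 8, 0);
	assert(ht.tail == 8);
	return __atomic_load_n(&ht.tail, __ATOMIC_RELAXED);
}
```

The self-check only exercises the single-threaded path; it does not
demonstrate the multi-threaded behaviour discussed above.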

One point worth mentioning, though, is that this change is for the
rte_ring_c11_mem.h file, not the legacy ring. It may be worth persisting with
getting the C11 code right when people are less excited about sending a
release out?

We can explain that for C11 we would prefer to do loads and stores as per the
C11 memory model. In the case of rte_ring, the code is separated cleanly into
C11-specific files anyway.

I think reading ht->tail using __atomic_load_n() is the most appropriate way.
It shows that ht->tail is used for synchronization, it acknowledges that
ht->tail may be written by other threads without any other kind of
synchronization (e.g. no lock involved), and it requires an atomic load (any
write to ht->tail must also be atomic).

Using volatile and explicit compiler (or processor) memory barriers (fences)
is the legacy pre-C11 way of accomplishing these things. There's a reason why
C11/C++11 moved away from the old ways.
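
As a rough side-by-side of the two styles, here is a sketch assuming GCC or
Clang builtins. All function names are hypothetical, and __sync_synchronize()
is only a full-barrier stand-in for whatever barrier macro legacy code would
use; this is not taken from the DPDK sources.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy style: the variable is volatile and ordering comes from a fence. */
static inline void
publish_tail_legacy(volatile uint32_t *tail, uint32_t new_val)
{
	__sync_synchronize();	/* full barrier, stand-in for a barrier macro */
	*tail = new_val;
}

static inline uint32_t
read_tail_legacy(volatile uint32_t *tail)
{
	return *tail;		/* re-read only because of volatile */
}

/* C11 style: atomicity and ordering are stated on each access. */
static inline void
publish_tail_c11(uint32_t *tail, uint32_t new_val)
{
	__atomic_store_n(tail, new_val, __ATOMIC_RELEASE);
}

static inline uint32_t
read_tail_c11(uint32_t *tail)
{
	return __atomic_load_n(tail, __ATOMIC_RELAXED);
}

/* Single-threaded self-check: both styles store and read back the value. */
static inline int
tail_styles_selfcheck(void)
{
	volatile uint32_t legacy_tail = 0;
	uint32_t c11_tail = 0;

	publish_tail_legacy(&legacy_tail, 3);
	publish_tail_c11(&c11_tail, 3);
	assert(read_tail_legacy(&legacy_tail) == 3);
	return read_tail_c11(&c11_tail) == 3;
}
```

In the C11 variants the intent (atomic access, and which ordering) is visible
at each access rather than implied by the declaration of the variable.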
> >
> >         __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > --
> > 2.7.4
> >