From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Ananyev, Konstantin" Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load Date: Fri, 5 Oct 2018 08:21:59 +0000 Message-ID: <2601191342CEEE43887BDE71AB9772580102FE261A@IRSMSX106.ger.corp.intel.com> References: <20180807031943.5331-1-gavin.hu@arm.com> <1537172244-64874-1-git-send-email-gavin.hu@arm.com> <20180929104857.GA30457@jerin> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" , Honnappa Nagarahalli , Steve Capper , "Ola Liljedahl" , nd , "stable@dpdk.org" To: "Gavin Hu (Arm Technology China)" , Jerin Jacob Return-path: In-Reply-To: Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Gavin Hu (Arm Techno= logy China) > Sent: Friday, October 5, 2018 1:47 AM > To: Jerin Jacob > Cc: dev@dpdk.org; Honnappa Nagarahalli ; St= eve Capper ; Ola Liljedahl > ; nd ; stable@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3 1/3] ring: read tail using atomic load >=20 > Hi Jerin, >=20 > Thanks for your review, inline comments from our internal discussions. >=20 > BR. Gavin >=20 > > -----Original Message----- > > From: Jerin Jacob > > Sent: Saturday, September 29, 2018 6:49 PM > > To: Gavin Hu (Arm Technology China) > > Cc: dev@dpdk.org; Honnappa Nagarahalli > > ; Steve Capper > > ; Ola Liljedahl ; nd > > ; stable@dpdk.org > > Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load > > > > -----Original Message----- > > > Date: Mon, 17 Sep 2018 16:17:22 +0800 > > > From: Gavin Hu > > > To: dev@dpdk.org > > > CC: gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com, > > > steve.capper@arm.com, Ola.Liljedahl@arm.com, > > > jerin.jacob@caviumnetworks.com, nd@arm.com, stable@dpdk.org > > > Subject: [PATCH v3 1/3] ring: read tail using atomic load > > > X-Mailer: git-send-email 2.7.4 > > > > > > External Email > > > > > > In update_tail, read ht->tail using __atomic_load.Although the > > > compiler currently seems to be doing the right thing even without > > > _atomic_load, we don't want to give the compiler freedom to optimise > > > what should be an atomic load, it should not be arbitarily moved > > > around. > > > > > > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Gavin Hu > > > Reviewed-by: Honnappa Nagarahalli > > > Reviewed-by: Steve Capper > > > Reviewed-by: Ola Liljedahl > > > --- > > > lib/librte_ring/rte_ring_c11_mem.h | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/lib/librte_ring/rte_ring_c11_mem.h > > > b/lib/librte_ring/rte_ring_c11_mem.h > > > index 94df3c4..234fea0 100644 > > > --- a/lib/librte_ring/rte_ring_c11_mem.h > > > +++ b/lib/librte_ring/rte_ring_c11_mem.h > > > @@ -21,7 +21,8 @@ update_tail(struct rte_ring_headtail *ht, uint32_t > > old_val, uint32_t new_val, > > > * we need to wait for them to complete > > > */ > > > if (!single) > > > - while (unlikely(ht->tail !=3D old_val)) > > > + while (unlikely(old_val !=3D __atomic_load_n(&ht->tai= l, > > > + __ATOMIC_RELAXED))) > > > rte_pause(); > > > > Since it is a while loop with rte_pause(), IMO, There is no scope of fa= lse > > compiler optimization. > > IMO, this change may not required though I don't see any performance > > difference with two core ring_perf_autotest test. May be more core case= it > > may have effect. IMO, If it not absolutely required, we can avoid this = change. > > >=20 > Using __atomic_load_n() has two purposes: > 1) the old code only works because ht->tail is declared volatile which is= not a requirement for C11 or for the use of __atomic builtins. If ht- > >tail was not declared volatile and __atomic_load_n() not used, the compi= ler would likely hoist the load above the loop. > 2) I think all memory locations used for synchronization should use __ato= mic operations for access in order to clearly indicate that these > locations (and these accesses) are used for synchronization. >=20 > The read of ht->tail needs to be atomic, a non-atomic read would not be c= orrect. That's a 32bit value load. AFAIK on all CPUs that we support it is an atomic operation. > But there are no memory ordering requirements (with > regards to other loads and/or stores by this thread) so relaxed memory or= der is sufficient. > Another aspect of using __atomic_load_n() is that the compiler cannot "op= timise" this load (e.g. combine, hoist etc), it has to be done as > specified in the source code which is also what we need here. I think Jerin points that rte_pause() acts here as compiler barrier too, so no need to worry that compiler would optimize out the loop. Konstantin >=20 > One point worth mentioning though is that this change is for the rte_ring= _c11_mem.h file, not the legacy ring. It may be worth persisting > with getting the C11 code right when people are less excited about sendin= g a release out? >=20 > We can explain that for C11 we would prefer to do loads and stores as per= the C11 memory model. In the case of rte_ring, the code is > separated cleanly into C11 specific files anyway. >=20 > I think reading ht->tail using __atomic_load_n() is the most appropriate = way. We show that ht->tail is used for synchronization, we > acknowledge that ht->tail may be written by other threads without any oth= er kind of synchronization (e.g. no lock involved) and we require > an atomic load (any write to ht->tail must also be atomic). >=20 > Using volatile and explicit compiler (or processor) memory barriers (fenc= es) is the legacy pre-C11 way of accomplishing these things. There's > a reason why C11/C++11 moved away from the old ways. > > > > > > __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE); > > > -- > > > 2.7.4 > > >