From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Gavin Hu (Arm Technology China)"
Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
Date: Fri, 5 Oct 2018 00:47:28 +0000
References: <20180807031943.5331-1-gavin.hu@arm.com> <1537172244-64874-1-git-send-email-gavin.hu@arm.com> <20180929104857.GA30457@jerin>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Cc: "dev@dpdk.org", Honnappa Nagarahalli, Steve Capper, Ola Liljedahl, nd, "stable@dpdk.org"
To: Jerin Jacob
In-Reply-To: <20180929104857.GA30457@jerin>
Content-Language: en-US
List-Id: DPDK patches and discussions
Errors-To: dev-bounces@dpdk.org
Sender: "dev"

Hi Jerin,

Thanks for your review, inline comments from our internal discussions.

BR. Gavin

> -----Original Message-----
> From: Jerin Jacob
> Sent: Saturday, September 29, 2018 6:49 PM
> To: Gavin Hu (Arm Technology China)
> Cc: dev@dpdk.org; Honnappa Nagarahalli; Steve Capper; Ola Liljedahl; nd;
> stable@dpdk.org
> Subject: Re: [PATCH v3 1/3] ring: read tail using atomic load
>
> -----Original Message-----
> > Date: Mon, 17 Sep 2018 16:17:22 +0800
> > From: Gavin Hu
> > To: dev@dpdk.org
> > CC: gavin.hu@arm.com, Honnappa.Nagarahalli@arm.com,
> > steve.capper@arm.com, Ola.Liljedahl@arm.com,
> > jerin.jacob@caviumnetworks.com, nd@arm.com, stable@dpdk.org
> > Subject: [PATCH v3 1/3] ring: read tail using atomic load
> > X-Mailer: git-send-email 2.7.4
> >
> > External Email
> >
> > In update_tail, read ht->tail using __atomic_load. Although the
> > compiler currently seems to be doing the right thing even without
> > __atomic_load, we don't want to give the compiler freedom to optimise
> > what should be an atomic load, it should not be arbitrarily moved
> > around.
> >
> > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Gavin Hu
> > Reviewed-by: Honnappa Nagarahalli
> > Reviewed-by: Steve Capper
> > Reviewed-by: Ola Liljedahl
> > ---
> >  lib/librte_ring/rte_ring_c11_mem.h | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_ring/rte_ring_c11_mem.h b/lib/librte_ring/rte_ring_c11_mem.h
> > index 94df3c4..234fea0 100644
> > --- a/lib/librte_ring/rte_ring_c11_mem.h
> > +++ b/lib/librte_ring/rte_ring_c11_mem.h
> > @@ -21,7 +21,8 @@ update_tail(struct rte_ring_headtail *ht, uint32_t old_val, uint32_t new_val,
> >          * we need to wait for them to complete
> >          */
> >         if (!single)
> > -               while (unlikely(ht->tail != old_val))
> > +               while (unlikely(old_val != __atomic_load_n(&ht->tail,
> > +                                               __ATOMIC_RELAXED)))
> >                         rte_pause();
>
> Since it is a while loop with rte_pause(), IMO, there is no scope for false
> compiler optimization.
> IMO, this change may not be required, though I don't see any performance
> difference with the two-core ring_perf_autotest test. Maybe in a case with
> more cores it may have an effect. IMO, if it is not absolutely required, we
> can avoid this change.
>

Using __atomic_load_n() has two purposes:
1) The old code only works because ht->tail is declared volatile, which is not a requirement for C11 or for the use of __atomic builtins. If ht->tail were not declared volatile and __atomic_load_n() were not used, the compiler would likely hoist the load above the loop.
2) I think all memory locations used for synchronization should use __atomic operations for access in order to clearly indicate that these locations (and these accesses) are used for synchronization.

The read of ht->tail needs to be atomic; a non-atomic read would not be correct. But there are no memory ordering requirements (with regard to other loads and/or stores by this thread), so relaxed memory order is sufficient.
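
To make the two points above concrete, here is a minimal sketch of the wait-then-publish pattern. It is not the DPDK source: headtail_sketch, pause_sketch() and update_tail_sketch() are hypothetical names made up for illustration, the tail field is deliberately not volatile, and the pause placeholder deliberately contains no compiler barrier, so the only thing forcing a fresh read on each iteration is the atomic load itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rte_ring_headtail; tail is NOT volatile. */
struct headtail_sketch {
	uint32_t head;
	uint32_t tail;
};

/* Placeholder for rte_pause(); deliberately contains no compiler barrier. */
static inline void pause_sketch(void)
{
}

/* Wait for earlier enqueues/dequeues to finish, then publish new_val as tail. */
static inline void
update_tail_sketch(struct headtail_sketch *ht, uint32_t old_val,
		   uint32_t new_val, int single)
{
	assert(ht != NULL);

	if (!single) {
		/*
		 * Atomic load with relaxed order: the load is performed on
		 * every iteration, but it imposes no ordering on this
		 * thread's other loads and stores.
		 */
		while (old_val != __atomic_load_n(&ht->tail, __ATOMIC_RELAXED))
			pause_sketch();
	}

	/*
	 * Release store: this thread's earlier memory accesses are visible
	 * to any thread that observes the new tail with an acquire load.
	 */
	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
}
```

With a plain `ht->tail != old_val` comparison in the same loop, and no volatile qualifier, the compiler would be free to read ht->tail once before the loop and spin on the stale value.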

Another aspect of using __atomic_load_n() is that the compiler cannot "optimise" this load (e.g. combine, hoist etc.); it has to be done as specified in the source code, which is also what we need here.

One point worth mentioning though is that this change is for the rte_ring_c11_mem.h file, not the legacy ring. It may be worth persisting with getting the C11 code right when people are less excited about sending a release out?

We can explain that for C11 we would prefer to do loads and stores as per the C11 memory model. In the case of rte_ring, the code is separated cleanly into C11 specific files anyway.

I think reading ht->tail using __atomic_load_n() is the most appropriate way. We show that ht->tail is used for synchronization, we acknowledge that ht->tail may be written by other threads without any other kind of synchronization (e.g. no lock involved) and we require an atomic load (any write to ht->tail must also be atomic).

Using volatile and explicit compiler (or processor) memory barriers (fences) is the legacy pre-C11 way of accomplishing these things. There's a reason why C11/C++11 moved away from the old ways.

> >
> >         __atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > --
> > 2.7.4
> >