From mboxrd@z Thu Jan 1 00:00:00 1970 From: Justin He Subject: Re: [PATCH v3 2/3] ring: synchronize the load and store of the tail Date: Wed, 26 Sep 2018 09:59:02 +0000 Message-ID: References: <20180807031943.5331-1-gavin.hu@arm.com> <1537172244-64874-1-git-send-email-gavin.hu@arm.com> <1537172244-64874-2-git-send-email-gavin.hu@arm.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-2022-jp" Content-Transfer-Encoding: quoted-printable Cc: Honnappa Nagarahalli , Steve Capper , Ola Liljedahl , "jerin.jacob@caviumnetworks.com" , nd , "stable@dpdk.org" To: "Gavin Hu (Arm Technology China)" , "dev@dpdk.org" Return-path: In-Reply-To: Content-Language: en-US List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Hi Gavin > -----Original Message----- > From: Gavin Hu (Arm Technology China) > Sent: 2018=1B$BG/=1B(B9=1B$B7n=1B(B26=1B$BF|=1B(B 17:30 > To: Gavin Hu (Arm Technology China) ; dev@dpdk.org > Cc: Honnappa Nagarahalli ; Steve Capper > ; Ola Liljedahl ; > jerin.jacob@caviumnetworks.com; nd ; stable@dpdk.org; Justin > He > Subject: RE: [PATCH v3 2/3] ring: synchronize the load and store of the t= ail > > +Justin He for review. > > > -----Original Message----- > > From: Gavin Hu > > Sent: Monday, September 17, 2018 4:17 PM > > To: dev@dpdk.org > > Cc: Gavin Hu (Arm Technology China) ; Honnappa > > Nagarahalli ; Steve Capper > > ; Ola Liljedahl ; > > jerin.jacob@caviumnetworks.com; nd ; stable@dpdk.org > > Subject: [PATCH v3 2/3] ring: synchronize the load and store of the > > tail > > > > Synchronize the load-acquire of the tail and the store-release within > > update_tail, the store release ensures all the ring operations, > > enqueue or dequeue, are seen by the observers on the other side as > > soon as they see the updated tail. The load-acquire is needed here as > > the data dependency is not a reliable way for ordering as the compiler > > might break it by saving to temporary values to boost performance. > > When computing the free_entries and avail_entries, use atomic > > semantics to load the heads and tails instead. > > > > The patch was benchmarked with test/ring_perf_autotest and it > > decreases the enqueue/dequeue latency by 5% ~ 27.6% with two lcores, > > the real gains are dependent on the number of lcores, depth of the ring= , SPSC > or MPMC. > > For 1 lcore, it also improves a little, about 3 ~ 4%. > > It is a big improvement, in case of MPMC, with two lcores and ring > > size of 32, it saves latency up to (3.26-2.36)/3.26 =3D 27.6%. > > > > This patch is a bug fix, while the improvement is a bonus. In our > > analysis the improvement comes from the cacheline pre-filling after > > hoisting load- acquire from _atomic_compare_exchange_n up above. > > > > The test command: > > $sudo ./test/test/test -l 16-19,44-47,72-75,100-103 -n 4 > > --socket-mem=3D\ > > 1024 -- -i > > > > Test result with this patch(two cores): > > SP/SC bulk enq/dequeue (size: 8): 5.86 MP/MC bulk enq/dequeue (size: > > 8): 10.15 SP/SC bulk enq/dequeue (size: > > 32): 1.94 MP/MC bulk enq/dequeue (size: 32): 2.36 > > > > In comparison of the test result without this patch: > > SP/SC bulk enq/dequeue (size: 8): 6.67 MP/MC bulk enq/dequeue (size: > > 8): 13.12 SP/SC bulk enq/dequeue (size: > > 32): 2.04 MP/MC bulk enq/dequeue (size: 32): 3.26 > > > > Fixes: 39368ebfc6 ("ring: introduce C11 memory model barrier option") > > Cc: stable@dpdk.org > > > > Signed-off-by: Gavin Hu > > Reviewed-by: Honnappa Nagarahalli > > Reviewed-by: Steve Capper > > Reviewed-by: Ola Liljedahl > > --- > > lib/librte_ring/rte_ring_c11_mem.h | 20 ++++++++++++++++---- > > 1 file changed, 16 insertions(+), 4 deletions(-) > > > > diff --git a/lib/librte_ring/rte_ring_c11_mem.h > > b/lib/librte_ring/rte_ring_c11_mem.h > > index 234fea0..0eae3b3 100644 > > --- a/lib/librte_ring/rte_ring_c11_mem.h > > +++ b/lib/librte_ring/rte_ring_c11_mem.h > > @@ -68,13 +68,18 @@ __rte_ring_move_prod_head(struct rte_ring *r, > > unsigned int is_sp, > > *old_head =3D __atomic_load_n(&r->prod.head, > > __ATOMIC_ACQUIRE); > > > > -/* > > - * The subtraction is done between two unsigned 32bits > > value > > +/* load-acquire synchronize with store-release of ht->tail > > + * in update_tail. > > + */ > > +const uint32_t cons_tail =3D __atomic_load_n(&r->cons.tail, > > + > > __ATOMIC_ACQUIRE); I ever noticed that freebsd also used the double __atomic_load_n. And as we discussed it several months ago [1], seems the second load_acquire is not necessary. But as you verified, it looks good to me [1] http://mails.dpdk.org/archives/dev/2017-November/080983.html So, Reviewed-by: Jia He Cheers, Justin (Jia He) > > + > > +/* The subtraction is done between two unsigned 32bits > > value > > * (the result is always modulo 32 bits even if we have > > * *old_head > cons_tail). So 'free_entries' is always between 0 > > * and capacity (which is < size). > > */ > > -*free_entries =3D (capacity + r->cons.tail - *old_head); > > +*free_entries =3D (capacity + cons_tail - *old_head); > > > > /* check that we have enough room in ring */ > > if (unlikely(n > *free_entries)) > > @@ -132,15 +137,22 @@ __rte_ring_move_cons_head(struct rte_ring *r, > > int is_sc, > > do { > > /* Restore n as it may change every loop */ > > n =3D max; > > + > > *old_head =3D __atomic_load_n(&r->cons.head, > > __ATOMIC_ACQUIRE); > > > > +/* this load-acquire synchronize with store-release of ht->tail > > + * in update_tail. > > + */ > > +const uint32_t prod_tail =3D __atomic_load_n(&r->prod.tail, > > +__ATOMIC_ACQUIRE); > > + > > /* The subtraction is done between two unsigned 32bits value > > * (the result is always modulo 32 bits even if we have > > * cons_head > prod_tail). So 'entries' is always between 0 > > * and size(ring)-1. > > */ > > -*entries =3D (r->prod.tail - *old_head); > > +*entries =3D (prod_tail - *old_head); > > > > /* Set the actual entries for dequeue */ > > if (n > *entries) > > -- > > 2.7.4 IMPORTANT NOTICE: The contents of this email and any attachments are confid= ential and may also be privileged. If you are not the intended recipient, p= lease notify the sender immediately and do not disclose the contents to any= other person, use it for any purpose, or store or copy the information in = any medium. Thank you.