From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jia He
Subject: Re: [PATCH] ring: guarantee ordering of cons/prod loading when doing enqueue/dequeue
Date: Fri, 20 Oct 2017 09:57:58 +0800
Message-ID: <3e580cd7-2854-d855-be9c-7c4ce06e3ed5@gmail.com>
References: <20171010095636.4507-1-hejianet@gmail.com>
 <20171012155350.j34ddtivxzd27pag@platinum>
 <2601191342CEEE43887BDE71AB9772585FAA859F@IRSMSX103.ger.corp.intel.com>
 <20171012172311.GA8524@jerin>
 <2601191342CEEE43887BDE71AB9772585FAAB171@IRSMSX103.ger.corp.intel.com>
 <8806e2bd-c57b-03ff-a315-0a311690f1d9@163.com>
 <2601191342CEEE43887BDE71AB9772585FAAB404@IRSMSX103.ger.corp.intel.com>
 <2601191342CEEE43887BDE71AB9772585FAAB570@IRSMSX103.ger.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
Cc: Olivier MATZ, "dev@dpdk.org", "jia.he@hxt-semitech.com",
 "jie2.liu@hxt-semitech.com", "bing.zhao@hxt-semitech.com",
 "Richardson, Bruce"
To: "Ananyev, Konstantin", "Zhao, Bing", Jerin Jacob
In-Reply-To: <2601191342CEEE43887BDE71AB9772585FAAB570@IRSMSX103.ger.corp.intel.com>
List-Id: DPDK patches and discussions

On 10/20/2017 4:02 AM, Ananyev, Konstantin wrote:
>
>> -----Original Message-----
>> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev, Konstantin
>> Sent: Thursday, October 19, 2017 3:15 PM
>> To: Zhao, Bing; Jia He; Jerin Jacob
>> Cc: Olivier MATZ; dev@dpdk.org; jia.he@hxt-semitech.com; jie2.liu@hxt-semitech.com; bing.zhao@hxt-semitech.com
>> Subject: Re: [dpdk-dev] [PATCH] ring: guarantee ordering of cons/prod loading
>> when doing enqueue/dequeue
>>
>>> Hi,
>>>
>>> On 2017/10/19 18:02, Ananyev, Konstantin wrote:
>>>> Hi Jia,
>>>>
>>>>> Hi
>>>>>
>>>>> On 10/13/2017 9:02 AM, Jia He wrote:
>>>>>> Hi Jerin
>>>>>>
>>>>>> On 10/13/2017 1:23 AM, Jerin Jacob wrote:
>>>>>>> -----Original Message-----
>>>>>>>> Date: Thu, 12 Oct 2017 17:05:50 +0000
>>>>>>>>
>>>>> [...]
>>>>>>> On the same lines,
>>>>>>>
>>>>>>> Jia He, jie2.liu, bing.zhao,
>>>>>>>
>>>>>>> Is this patch based on code review, or did you see this issue on any of
>>>>>>> the arm/ppc targets? arm64 will take a performance hit with this change.
>>>>> Sorry, I missed one important piece of information:
>>>>> our platform is an aarch64 server with 46 CPUs.
>>>>> If we reduce the number of CPUs involved, the bug occurs less frequently.
>>>>>
>>>>> Yes, the mb barrier impacts performance, but correctness is more
>>>>> important, isn't it ;-)
>>>>> Maybe we can find some other lightweight barrier here?
>>>>>
>>>>> Cheers,
>>>>> Jia
>>>>>> Based on mbuf_autotest, the rte_panic will be invoked in seconds.
>>>>>>
>>>>>> PANIC in test_refcnt_iter():
>>>>>> (lcore=0, iter=0): after 10s only 61 of 64 mbufs left free
>>>>>> 1: [./test(rte_dump_stack+0x38) [0x58d868]]
>>>>>> Aborted (core dumped)
>>>>>>
>>>> So is it only reproducible with the mbuf refcnt test?
>>>> Could it be reproduced with some 'pure' ring test
>>>> (no mempools/mbuf refcnt, etc.)?
>>>> The reason I am asking: in that test we also have mbuf refcnt updates
>>>> (that is what the test was created for), and we do some optimizations
>>>> there too to avoid excessive atomic updates.
>>>> BTW, if the problem is not reproducible without mbuf refcnt,
>>>> can I suggest extending the test to:
>>>> - add a check that the enqueue() operation was successful
>>>> - walk through the pool and check/printf the refcnt of each mbuf.
>>>> Hopefully that would give us some extra information about what is going wrong here.
>>>> Konstantin
>>>>
>>>>
>>> Currently, the issue is only found in this case here on the ARM
>>> platform; not sure how it goes on the x86_64 platform.
>> I understand that it is only reproducible on arm so far.
>> What I am asking: with dpdk, is there any other way to reproduce it (on arm)
>> other than running mbuf_autotest?
>> Something really simple that does not use mbuf/mempool etc.?
>> Just do dequeue/enqueue from multiple threads and check data integrity at the end?
>> If not, what makes you think that the problem is precisely in the rte_ring code?
>> Why not in rte_mbuf, say?
> Actually I reread your original mail and finally got your point.
> If I understand you correctly, the problem with read reordering here is that
> after we read prod.tail but before we read cons.head,
> both cons.head and prod.tail might be updated,
> but the prod.tail change might go unnoticed by us.
> As an example:
> time 0 (cons.head == 0, prod.tail == 0):
> prod_tail = r->prod.tail; /* due to read reordering */
> /* prod_tail == 0 */
>
> time 1 (cons.head == 5, prod.tail == 5):
> *old_head = r->cons.head;
> /* cons.head == 5 */
> *entries = (prod_tail - *old_head);
> /* *entries == (0 - 5) == 0xFFFFFFFB */
>
> And we will move our cons.head forward, even though there are no filled entries in the ring.
> Is that right?

Yes.

> As a side note, I wonder why we have here:
> *entries = (prod_tail - *old_head);
> instead of:
> *entries = r->size + prod_tail - *old_head;
> ?

Yes, I agree with you about this code line, but reordering will still mess
things up even after this change (I have tested it; the result is the same
as before).
I thought *entries was a door to prevent the consumer from moving forward
faster than the producer. In some cases it is correct for prod_tail to be
smaller than *old_head, because of the circular queue; in other cases it is
incorrect, because of read/read reordering.
AFAICT, the root cause here is that the producer tail and consumer head
depend on each other.
Thus a memory barrier is necessary.

Cheers,
Jia
>
> Konstantin
>
>>> . In another
>>> mail of this thread, we made a simple test based on this, captured
>>> some information, and pasted it there. (I pasted the patch there :-))
>> Are you talking about this one:
>> http://dpdk.org/dev/patchwork/patch/30405/
>> ?
>> It still uses test/test/test_mbuf.c...,
>> but anyway I don't really understand how mbuf_autotest is supposed
>> to work with these changes:
>> @@ -730,7 +739,7 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
>>                          rte_ring_enqueue(refcnt_mbuf_ring, m);
>>                  }
>>          }
>> -        rte_pktmbuf_free(m);
>> +        // rte_pktmbuf_free(m);
>>      }
>> @@ -741,6 +750,12 @@ test_refcnt_iter(unsigned int lcore, unsigned int iter,
>>      while (!rte_ring_empty(refcnt_mbuf_ring))
>>          ;
>>
>> +    if (NULL != m) {
>> +        if (1 != rte_mbuf_refcnt_read(m))
>> +            printf("m ref is %u\n", rte_mbuf_refcnt_read(m));
>> +        rte_pktmbuf_free(m);
>> +    }
>> +
>>      /* check that all mbufs are back into mempool by now */
>>      for (wn = 0; wn != REFCNT_MAX_TIMEOUT; wn++) {
>>          if ((i = rte_mempool_avail_count(refcnt_pool)) == n) {
>>
>> That means all your mbufs (except the last one) will still be allocated.
>> So the test would fail - as it should, I think.
>>
>>> And
>>> it seems that Juhamatti & Jacob found some reverting action several
>>> months ago.
>> Didn't get that one either.
>> Konstantin