From: "Ruifeng Wang (Arm Technology China)" <Ruifeng.Wang@arm.com>
To: David Marchand <david.marchand@redhat.com>,
	Aaron Conole <aconole@redhat.com>
Cc: David Hunt <david.hunt@intel.com>, dev <dev@dpdk.org>,
	"hkalra@marvell.com" <hkalra@marvell.com>,
	"Gavin Hu (Arm Technology China)" <Gavin.Hu@arm.com>,
	Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
	nd <nd@arm.com>, dpdk stable <stable@dpdk.org>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH] lib/distributor: fix deadlock issue for aarch64
Date: Wed, 9 Oct 2019 05:52:03 +0000
Message-ID: <AM0PR08MB3986AC7C021B31BA4C2C53759E950@AM0PR08MB3986.eurprd08.prod.outlook.com>
In-Reply-To: <CAJFAV8yyt7EyPV_SKLmdFrXgqwJh=J-cLZ0GoRnoEBzh_Nnj7A@mail.gmail.com>


> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Wednesday, October 9, 2019 03:47
> To: Aaron Conole <aconole@redhat.com>
> Cc: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; David
> Hunt <david.hunt@intel.com>; dev <dev@dpdk.org>; hkalra@marvell.com;
> Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; dpdk
> stable <stable@dpdk.org>
> Subject: Re: [dpdk-stable] [dpdk-dev] [PATCH] lib/distributor: fix deadlock
> issue for aarch64
> 
> On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole <aconole@redhat.com> wrote:
> >
> > Ruifeng Wang <ruifeng.wang@arm.com> writes:
> >
> > > Distributor and worker threads rely on data structs in cache line
> > > for synchronization. The shared data structs were not protected.
> > > This caused deadlock issue on weaker memory ordering platforms as
> > > aarch64.
> > > Fix this issue by adding memory barriers to ensure synchronization
> > > among cores.
> > >
> > > Bugzilla ID: 342
> > > Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > > ---
> >
> > I see a failure in the distributor_autotest (on one of the builds):
> >
> > 64/82 DPDK:fast-tests / distributor_autotest  FAIL     0.37 s (exit status 255 or signal 127 SIGinvalid)
> >
> > --- command ---
> >
> > DPDK_TEST='distributor_autotest'
> > /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1
> > --file-prefix=distributor_autotest
> >
> > --- stdout ---
> >
> > EAL: Probing VFIO support...
> >
> > APP: HPET is not enabled, using TSC as default timer
> >
> > RTE>>distributor_autotest
> >
> > === Basic distributor sanity tests ===
> >
> > Worker 0 handled 32 packets
> >
> > Sanity test with all zero hashes done.
> >
> > Worker 0 handled 32 packets
> >
> > Sanity test with non-zero hashes done
> >
> > === testing big burst (single) ===
> >
> > Sanity test of returned packets done
> >
> > === Sanity test with mbuf alloc/free (single) ===
> >
> > Sanity test with mbuf alloc/free passed
> >
> > Too few cores to run worker shutdown test
> >
> > === Basic distributor sanity tests ===
> >
> > Worker 0 handled 32 packets
> >
> > Sanity test with all zero hashes done.
> >
> > Worker 0 handled 32 packets
> >
> > Sanity test with non-zero hashes done
> >
> > === testing big burst (burst) ===
> >
> > Sanity test of returned packets done
> >
> > === Sanity test with mbuf alloc/free (burst) ===
> >
> > Line 326: Packet count is incorrect, 1048568, expected 1048576
> >
> > Test Failed
> >
> > RTE>>
> >
> > --- stderr ---
> >
> > EAL: Detected 2 lcore(s)
> >
> > EAL: Detected 1 NUMA nodes
> >
> > EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> >
> > EAL: Selected IOVA mode 'PA'
> >
> > EAL: No available hugepages reported in hugepages-1048576kB
> >
> > -------
> >
> > Not sure how to help debug further.  I'll re-start the job to see if
> > it 'clears' up - but I guess there may be a delicate synchronization
> > somewhere that needs to be accounted.
> 
> Idem, and with the same loop I used before, it can be caught quickly.
> 
> # time (log=/tmp/$$.log; while true; do echo distributor_autotest | \
> taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test --log-level *:8 \
> -l 0-1 >$log 2>&1; grep -q 'Test OK' $log || break; done; cat $log; rm -f $log)
> 
Thanks, Aaron and David, for the report. I can reproduce the issue with the script
and will fix it in the next version.
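
For readers following the thread, the kind of cross-core synchronization the patch description refers to can be sketched with C11 atomics. This is only an illustration of release/acquire ordering on a shared flag, not the distributor code or the actual patch: the `handoff` struct, `worker`, and `run_handoff` are hypothetical names, and the real library may place its barriers differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for data shared between a distributor and a worker. */
struct handoff {
    int64_t payload;   /* plain data, written before the flag is set */
    atomic_int ready;  /* flag the worker polls */
};

static void *worker(void *arg)
{
    struct handoff *h = arg;

    /* Acquire load: once ready is seen as non-zero, the earlier
     * payload write by the other thread is guaranteed to be visible. */
    while (atomic_load_explicit(&h->ready, memory_order_acquire) == 0)
        ;
    return (void *)(intptr_t)h->payload;
}

/* Hands one value to a worker thread and returns what the worker observed. */
int64_t run_handoff(int64_t value)
{
    struct handoff h = { .payload = 0 };
    pthread_t tid;
    void *seen = NULL;
    int rc;

    atomic_init(&h.ready, 0);
    rc = pthread_create(&tid, NULL, worker, &h);
    assert(rc == 0);
    (void)rc;

    h.payload = value;
    /* Release store: orders the payload write before the flag update. */
    atomic_store_explicit(&h.ready, 1, memory_order_release);

    rc = pthread_join(tid, &seen);
    assert(rc == 0);
    (void)rc;
    return (int64_t)(intptr_t)seen;
}
```

Calling `run_handoff(42)` from a `main` function (built with `-pthread`) should return the value passed in. With relaxed ordering on both sides, a weakly ordered CPU such as aarch64 would be allowed to let the worker see the flag before the payload.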

> [snip]
> 
> RTE>>distributor_autotest
> EAL: Trying to obtain current memory policy.
> EAL: Setting policy MPOL_PREFERRED for socket 0
> EAL: Restoring previous memory policy: 0
> EAL: request: mp_malloc_sync
> EAL: Heap on socket 0 was expanded by 2MB
> EAL: Trying to obtain current memory policy.
> EAL: Setting policy MPOL_PREFERRED for socket 0
> EAL: Restoring previous memory policy: 0
> EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
> EAL: Trying to obtain current memory policy.
> EAL: Setting policy MPOL_PREFERRED for socket 0
> EAL: Restoring previous memory policy: 0
> EAL: request: mp_malloc_sync
> EAL: Heap on socket 0 was expanded by 8MB
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (single) ===
> Sanity test of returned packets done
> 
> === Sanity test with mbuf alloc/free (single) ===
> Sanity test with mbuf alloc/free passed
> 
> Too few cores to run worker shutdown test
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (burst) ===
> Sanity test of returned packets done
> 
> === Sanity test with mbuf alloc/free (burst) ===
> Line 326: Packet count is incorrect, 1048568, expected 1048576
> Test Failed
> RTE>>
> real    0m36.668s
> user    1m7.293s
> sys    0m1.560s
> 
> Could be worth running this loop on all tests? (not talking about the CI, it
> would be a manual effort to catch lurking issues).
> 
> 
> --
> David Marchand

Thread overview: 23+ messages
2019-10-08  9:55 [dpdk-dev] [PATCH] lib/distributor: fix deadlock issue for aarch64 Ruifeng Wang
2019-10-08 12:53 ` Hunt, David
2019-10-08 17:05 ` Aaron Conole
2019-10-08 19:46   ` [dpdk-dev] [dpdk-stable] " David Marchand
2019-10-08 20:08     ` Aaron Conole
2019-10-09  5:52     ` Ruifeng Wang (Arm Technology China) [this message]
2019-10-17 11:42       ` [dpdk-dev] [EXT] " Harman Kalra
2019-10-17 13:48         ` Ruifeng Wang (Arm Technology China)
2019-10-12  2:43 ` [dpdk-dev] [PATCH v2 0/2] fix distributor unit test Ruifeng Wang
2019-10-12  2:43   ` [dpdk-dev] [PATCH v2 1/2] lib/distributor: fix deadlock issue for aarch64 Ruifeng Wang
2019-10-13  2:31     ` Honnappa Nagarahalli
2019-10-14 10:00       ` Ruifeng Wang (Arm Technology China)
2019-10-12  2:43   ` [dpdk-dev] [PATCH v2 2/2] test/distributor: fix false unit test failure Ruifeng Wang
2019-10-15  9:28 ` [dpdk-dev] [PATCH v3 0/2] fix distributor unit test Ruifeng Wang
2019-10-15  9:28   ` [dpdk-dev] [PATCH v3 1/2] lib/distributor: fix deadlock issue for aarch64 Ruifeng Wang
2019-10-25  8:13     ` Hunt, David
2019-10-15  9:28   ` [dpdk-dev] [PATCH v3 2/2] test/distributor: fix false unit test failure Ruifeng Wang
2019-10-25  8:13     ` Hunt, David
2019-10-24 19:31   ` [dpdk-dev] [PATCH v3 0/2] fix distributor unit test David Marchand
2019-10-25  8:11     ` Hunt, David
2019-10-25  8:18       ` David Marchand
2019-10-25  8:20         ` Hunt, David
2019-10-25  8:33   ` David Marchand
