From: David Marchand <david.marchand@redhat.com>
To: Aaron Conole <aconole@redhat.com>
Cc: Ruifeng Wang <ruifeng.wang@arm.com>,
	David Hunt <david.hunt@intel.com>, dev <dev@dpdk.org>,
	hkalra@marvell.com, Gavin Hu <gavin.hu@arm.com>,
	 Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
	nd <nd@arm.com>, dpdk stable <stable@dpdk.org>
Subject: Re: [dpdk-dev] [dpdk-stable] [PATCH] lib/distributor: fix deadlock issue for aarch64
Date: Tue, 8 Oct 2019 21:46:37 +0200
Message-ID: <CAJFAV8yyt7EyPV_SKLmdFrXgqwJh=J-cLZ0GoRnoEBzh_Nnj7A@mail.gmail.com>
In-Reply-To: <f7tk19f182h.fsf@dhcp-25.97.bos.redhat.com>

On Tue, Oct 8, 2019 at 7:06 PM Aaron Conole <aconole@redhat.com> wrote:
>
> Ruifeng Wang <ruifeng.wang@arm.com> writes:
>
> > Distributor and worker threads rely on data structures in a cache line
> > for synchronization. The shared data structures were not protected.
> > This caused a deadlock on platforms with weaker memory ordering, such
> > as aarch64.
> > Fix this issue by adding memory barriers to ensure synchronization
> > among cores.
> >
> > Bugzilla ID: 342
> > Fixes: 775003ad2f96 ("distributor: add new burst-capable library")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Reviewed-by: Gavin Hu <gavin.hu@arm.com>
> > ---
>
> I see a failure in the distributor_autotest (on one of the builds):
>
> 64/82 DPDK:fast-tests / distributor_autotest  FAIL     0.37 s (exit status 255 or signal 127 SIGinvalid)
>
> --- command ---
> DPDK_TEST='distributor_autotest' /home/travis/build/ovsrobot/dpdk/build/app/test/dpdk-test -l 0-1 --file-prefix=distributor_autotest
>
> --- stdout ---
> EAL: Probing VFIO support...
> APP: HPET is not enabled, using TSC as default timer
> RTE>>distributor_autotest
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (single) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (single) ===
> Sanity test with mbuf alloc/free passed
> Too few cores to run worker shutdown test
> === Basic distributor sanity tests ===
> Worker 0 handled 32 packets
> Sanity test with all zero hashes done.
> Worker 0 handled 32 packets
> Sanity test with non-zero hashes done
> === testing big burst (burst) ===
> Sanity test of returned packets done
> === Sanity test with mbuf alloc/free (burst) ===
> Line 326: Packet count is incorrect, 1048568, expected 1048576
> Test Failed
> RTE>>
>
> --- stderr ---
> EAL: Detected 2 lcore(s)
> EAL: Detected 1 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/distributor_autotest/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: No available hugepages reported in hugepages-1048576kB
>
> -------
>
> I'm not sure how to help debug further.  I'll restart the job to see if
> it clears up, but I guess there may be a delicate synchronization issue
> somewhere that needs to be accounted for.

Same here: with the same loop I used before, the failure can be caught
quickly.

# time (log=/tmp/$$.log; while true; do
    echo distributor_autotest |
      taskset -c 0-1 ./build-gcc-static/app/test/dpdk-test \
        --log-level '*:8' -l 0-1 >"$log" 2>&1
    grep -q 'Test OK' "$log" || break
  done; cat "$log"; rm -f "$log")

[snip]

RTE>>distributor_autotest
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 2MB
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: alloc_pages_on_heap(): couldn't allocate physically contiguous space
EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
EAL: Restoring previous memory policy: 0
EAL: request: mp_malloc_sync
EAL: Heap on socket 0 was expanded by 8MB
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (single) ===
Sanity test of returned packets done

=== Sanity test with mbuf alloc/free (single) ===
Sanity test with mbuf alloc/free passed

Too few cores to run worker shutdown test
=== Basic distributor sanity tests ===
Worker 0 handled 32 packets
Sanity test with all zero hashes done.
Worker 0 handled 32 packets
Sanity test with non-zero hashes done
=== testing big burst (burst) ===
Sanity test of returned packets done

=== Sanity test with mbuf alloc/free (burst) ===
Line 326: Packet count is incorrect, 1048568, expected 1048576
Test Failed
RTE>>
real    0m36.668s
user    1m7.293s
sys    0m1.560s

It could be worth running this loop on all tests. I am not talking about
the CI: it would be a manual effort to catch lurking issues.
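
A minimal sketch of what such a manual loop could look like when
generalised to a list of test names. It is not part of DPDK: the helper
names run_until_fail and stress_tests, the DPDK_TEST_BIN variable, the
log path under /tmp and the upper bound on the number of runs are all
assumptions made for illustration. The test name is passed through the
DPDK_TEST environment variable, as in the CI command quoted above, and a
run counts as passed when its log contains 'Test OK', as in the loop
above.

```shell
# run_until_fail RUNS LOGFILE CMD [ARGS...]
# Repeat CMD up to RUNS times. Stop at the first run whose output does
# not contain 'Test OK' and keep that log in LOGFILE for inspection.
# Returns 0 if every run passed, 1 otherwise.
run_until_fail() {
    max=$1 log=$2
    shift 2
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        # The exit status is ignored on purpose: only the log decides.
        "$@" >"$log" 2>&1 || true
        if ! grep -q 'Test OK' "$log"; then
            echo "run $i failed, log kept in $log"
            return 1
        fi
    done
    echo "all $max runs passed"
    return 0
}

# stress_tests RUNS TEST_NAME...
# Apply run_until_fail to each named test on cores 0-1.
# DPDK_TEST_BIN (hypothetical variable) overrides the binary path.
stress_tests() {
    runs=$1
    shift
    failed=0
    for name in "$@"; do
        echo "== $name =="
        run_until_fail "$runs" "/tmp/stress-$name.log" \
            env DPDK_TEST="$name" taskset -c 0-1 \
            "${DPDK_TEST_BIN:-./build-gcc-static/app/test/dpdk-test}" \
            -l 0-1 || failed=1
    done
    return "$failed"
}
```

For example, "stress_tests 1000 distributor_autotest" would repeat the
distributor test up to 1000 times and keep the log of the first failing
run. The bound replaces the endless "while true" so that a test which
never fails does not block the rest of the list.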


-- 
David Marchand

