From: David Marchand <david.marchand@redhat.com>
To: Michael Santana Francisco <msantana@redhat.com>
Cc: Aaron Conole <aconole@redhat.com>,
	Thomas Monjalon <thomas@monjalon.net>, dev <dev@dpdk.org>,
	 JananeeX M Parthasarathy <jananeex.m.parthasarathy@intel.com>,
	David Hunt <david.hunt@intel.com>
Subject: Re: [dpdk-dev] [PATCH v2 00/15] Unit tests fixes for CI
Date: Wed, 10 Jul 2019 10:18:46 +0200
Message-ID: <CAJFAV8wzfH2wWxFaMwfJmhYc4zfJaP+ukV_tg9PPGL2jPqU-Og@mail.gmail.com>
In-Reply-To: <139fd420-dbee-0a33-1885-00c9593fe201@redhat.com>

On Tue, Jul 9, 2019 at 5:50 PM Michael Santana Francisco <msantana@redhat.com> wrote:

> On 7/1/19 2:07 PM, Michael Santana Francisco wrote:
> >>
> >>
> >> On Mon, Jul 1, 2019 at 6:04 PM Aaron Conole <aconole@redhat.com> wrote:
> >>>>> - rwlock_autotest and hash_readwrite_lf_autotest are taking a little
> >>>>>    more than 10s,
> >>> Occasionally the distributor test times out as well.  I've moved them
> >>> as part of a separate patch, that I'll post along with a bigger series
> >>> to enable the unit tests under travis.  Michael and I are leaning toward
> >>> introducing a new variable called RUN_TESTS which will do the docs and
> >>> unit testing since those combined would add quite a bit to the
> >>> execution time of each job (and feel free to bike shed the name, since
> >>> the patches aren't final).
> >>
> >> Seeing how the distributor autotest usually takes less than a second to
> >> complete, this sounds like a bug.
> >> I don't think I caught this so far.
> > So I actually ran into the distributor test timing out. I agree with
> > David that it is a bug in the test. Looking at the logs, that test
> > normally finishes in less than half a second, so running to 10 seconds
> > and timing out is a big jump in run time. When I hit the timeout, I
> > restarted the job and it finished with no problem.
> > The test fails every so often for no apparent reason, and the logs[1]
> > don't really say much. I speculate that it is waiting for a resource to
> > become available or, in the worst case, that it is deadlocked. Since it
> > only fails every so often and passes when restarted, I don't think it's
> > a big deal; nevertheless, it's worth investing time in figuring out
> > what's wrong.
> >
> > [1] https://api.travis-ci.com/v3/job/212335916/log.txt
>
> I investigated this test a little. CC'ing David Hunt.
>
> I was able to reproduce the problem on v19.08-rc1 with:
>
> `while sudo sh -c "echo 'distributor_autotest' | ./build/app/test/dpdk-test"; do :; done`
>
> It runs fine a couple of times, showing output and progress, but then
> after a few seconds it stops producing any output. I let it sit there
> for a whole minute and nothing happened, so I attached gdb to figure
> out what was going on. One thread seems to be stuck in a while loop, see
> lib/librte_distributor/rte_distributor.c:310.
>
> I looked at the assembly code (layout asm, ni) and I saw these four
> lines below (which correspond to the while loop) being executed
> repeatedly and indefinitely. It looks like this thread is waiting for
> the variable bufptr64[0] to change state.
>
> 0xa064d0 <release+32>   pause
> 0xa064d2 <release+34>   mov    0x3840(%rdx),%rax
> 0xa064d9 <release+41>   test   $0x1,%al
> 0xa064db <release+43>   je     0xa064d0 <release+32>
>
>
> While the first thread is waiting on bufptr64[0] to change state,
> another thread is stuck in a different while loop at
> lib/librte_distributor/rte_distributor.c:53. This thread seems to be
> waiting for retptr64 to change state. The corresponding assembly,
> executed indefinitely:
>
> 0xa06de0 <rte_distributor_request_pkt_v1705+592> mov    0x38c0(%r8),%rax
> 0xa06de7 <rte_distributor_request_pkt_v1705+599> test   $0x1,%al
> 0xa06de9 <rte_distributor_request_pkt_v1705+601> je     0xa06bbd <rte_distributor_request_pkt_v1705+45>
> 0xa06def <rte_distributor_request_pkt_v1705+607> nop
> 0xa06df0 <rte_distributor_request_pkt_v1705+608> pause
> 0xa06df2 <rte_distributor_request_pkt_v1705+610> rdtsc
> 0xa06df4 <rte_distributor_request_pkt_v1705+612> mov    %rdx,%r10
> 0xa06df7 <rte_distributor_request_pkt_v1705+615> shl    $0x20,%r10
> 0xa06dfb <rte_distributor_request_pkt_v1705+619> mov    %eax,%eax
> 0xa06dfd <rte_distributor_request_pkt_v1705+621> or     %r10,%rax
> 0xa06e00 <rte_distributor_request_pkt_v1705+624> lea    0x64(%rax),%r10
> 0xa06e04 <rte_distributor_request_pkt_v1705+628> jmp    0xa06e12 <rte_distributor_request_pkt_v1705+642>
> 0xa06e06 <rte_distributor_request_pkt_v1705+630> nopw   %cs:0x0(%rax,%rax,1)
> 0xa06e10 <rte_distributor_request_pkt_v1705+640> pause
> 0xa06e12 <rte_distributor_request_pkt_v1705+642> rdtsc
> 0xa06e14 <rte_distributor_request_pkt_v1705+644> shl    $0x20,%rdx
> 0xa06e18 <rte_distributor_request_pkt_v1705+648> mov    %eax,%eax
> 0xa06e1a <rte_distributor_request_pkt_v1705+650> or     %rdx,%rax
> 0xa06e1d <rte_distributor_request_pkt_v1705+653> cmp    %rax,%r10
> 0xa06e20 <rte_distributor_request_pkt_v1705+656> ja     0xa06e10 <rte_distributor_request_pkt_v1705+640>
> 0xa06e22 <rte_distributor_request_pkt_v1705+658> jmp    0xa06de0 <rte_distributor_request_pkt_v1705+592>
>
>
> My guess is that these threads are interdependent, so one thread is
> waiting for the other thread to change the state of the control
> variable. I can't say for sure whether this is what is happening, or why
> these variables don't change state, so I would like to ask someone who is
> more familiar with this particular code to take a look.
>

Ah cool, thanks for the analysis.
Can you create a bz with this description and assign it to the
librte_distributor maintainer?
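
For whoever picks up the bz: going only by the disassembly quoted above,
the first loop keeps reloading a word and loops back while bit 0 is clear
(test $0x1 / je), and the second loop keeps re-checking while bit 0 is set,
with a short rdtsc-based delay between checks. Below is a minimal sketch of
those two polling shapes, with a poll limit added so that a hang turns into
a return value. It is not the rte_distributor.c code; the function names,
the flag name and the max_polls parameter are all made up for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 0 is the only bit the quoted disassembly tests; the name is made up. */
#define HYPOTHETICAL_FLAG_BIT0 UINT64_C(1)

/*
 * Poll *word until bit 0 is set (shape of the pause/mov/test/je loop),
 * giving up after max_polls loads instead of spinning forever.
 * Returns true if the bit was observed set, false if we gave up.
 */
bool wait_bit0_set(_Atomic uint64_t *word, uint64_t max_polls)
{
	for (uint64_t i = 0; i < max_polls; i++) {
		uint64_t v = atomic_load_explicit(word, memory_order_acquire);

		if (v & HYPOTHETICAL_FLAG_BIT0)
			return true;
	}
	return false;
}

/*
 * Mirror image: poll until bit 0 is clear (shape of the second loop,
 * with its rdtsc-based delay left out for portability).
 * Returns true if the bit was observed clear, false if we gave up.
 */
bool wait_bit0_clear(_Atomic uint64_t *word, uint64_t max_polls)
{
	for (uint64_t i = 0; i < max_polls; i++) {
		uint64_t v = atomic_load_explicit(word, memory_order_acquire);

		if (!(v & HYPOTHETICAL_FLAG_BIT0))
			return true;
	}
	return false;
}
```

A bounded wait like this would only make the failure visible (a false
return that the test can log) rather than fix it; why neither word changes
state still needs someone who knows the distributor handshake.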


-- 
David Marchand
