From: Cleber Rosa <crosa@redhat.com>
To: "Philippe Mathieu-Daudé" <philmd@redhat.com>
Cc: qemu-devel@nongnu.org, ehabkost@redhat.com,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: Problems with MIPS Malta SSH tests in make check-acceptance
Date: Thu, 19 Sep 2019 13:14:44 -0400
Message-ID: <20190919171444.GA6495@localhost.localdomain>
In-Reply-To: <8bc26c85-253e-ff20-c3e0-1ecdb56d60c0@redhat.com>

On Thu, Sep 19, 2019 at 07:00:49PM +0200, Philippe Mathieu-Daudé wrote:
> On 9/19/19 6:56 PM, Cleber Rosa wrote:
> > On Wed, Sep 18, 2019 at 09:14:58PM -0400, Cleber Rosa wrote:
> >> On Wed, Sep 18, 2019 at 05:16:54PM +1000, David Gibson wrote:
> >>> Hi,
> >>>
> >>> I'm finding make check-acceptance is currently useless for me as a
> >>> pre-pull test, because a bunch of the tests are not at all reliable.
> >>> There are a bunch which I'm still investigating, but for now I'm
> >>> looking at the MIPS Malta SSH tests.
> >>>
> >>> There seem to be at least two problems here.  First, the test includes
> >>> a download of a pretty big guest disk image.  This can easily exhaust
> >>> the 2m30 timeout on its own.
> >>>
> >>
> > >> You're correct that successes and failures on those tests depend
> > >> largely on bandwidth.  On a shared environment I used for tests,
> > >> the download of those images takes roughly 400 seconds, resulting
> > >> in failures.  On my own machine it takes around 60, and the tests pass.
> >>
> > >> There's a conceptual problem in that the environment for tests to
> > >> run should be prepared beforehand, and the possible solutions
> > >> conflict with each other:
> >>
> >>  * extensive bootstrapping of the test execution environment, such
> >>    as the installation of guests from ISOs or installation trees, or
> > >>    the download of "default" images whether the tests will use them or
> >>    not (this is what Avocado-VT does/requires)
> >>
> >>  * keeping test assets in the tree (Avocado allows this if you have
> >>    a your_test.py.data/ directory), but it's not practical for large
> >>    files or files that can't or shouldn't be redistributed
> >>
> >>> Even without the timeout, it makes the test really slow, even on
> >>> repeated runs.  Is there some way we can make the image download part
> >>> of "building" the tests rather than actually running the testsuite, so
> > >>> that a) the tests themselves go faster and b) we don't include the
> >>> download in the test timeout - obviously the download speed is hugely
> >>> dependent on factors that aren't really related to what we're testing
> >>> here.
> >>>
> >>
> > >> In Avocado version 72.0 we attempted to minimize the issue by
> >> implementing a "vmimage" command.  So, if you expect to use Fedora 30
> >> aarch64 images, you could run before your tests:
> >>
> >>  $ avocado vmimage get --distro fedora --distro-version 30 --arch aarch64
> >>
> >> And to list the images on your cache:
> >>
> >>  $ avocado vmimage list
> >>
> > >> Unfortunately, this test doesn't use the vmimage API.  Actually that
> > >> is fine because not all test assets map nicely to the vmimage goal,
> > >> and those that don't should keep using the more generic (and lower
> > >> level) fetch_asset().
> >>
> >> We're now working on various "asset fetcher" improvements that should
> >> allow us to check/cache all assets before a test is executed.  Also,
> >> we're adding a mode in which the "fetch_asset()" API will default to
> >> cancel (aka SKIP) a test if the asset could not be downloaded.
> >>
> >> If you're interested in the card we're using to track that new feature:
> >>
> >>   https://trello.com/c/T3SC1sZs/1521-implement-fetch-assets-command-line-parameter
> >>
> >> Another possibility that we've prototyped, and we'll be working on
> >> further, is to make a specific part of the "test" code execution
> >> (really a pre-test phase) to be executed without a timeout and even be
> >> tried a number of times before bailing out and skipping the test.
> >>
> >>> In the meantime, I tried hacking it by just increasing the timeout to
> >>> 10m.  That got several of the tests working for me, but one still
> >>> failed.  Specifically 'LinuxSSH.test_mips_malta32eb_kernel3_2_0' still
> >>> timed out for me, but now after booting the guest, rather than during
> >>> the image download.  Looking at the avocado log file I'm seeing a
> >>> bunch of soft lockup messages from the guest console, AFAICT.  So it
> >>> looks like we have a real bug here, which I suspect has been
> >>> overlooked precisely because the download problems mean this test
> >>> isn't reliable.
> >>>
> >>
> > >> I've scheduled 100 executions of `make check-acceptance` builds, with
> >> the linux_ssh_mips_malta.py tests having a 1500 seconds timeout.  The
> >> very first execution already brought interesting results:
> >>
> >>  ...
> >>  (15/39) /home/cleber/src/qemu/tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta32eb_kernel3_2_0: PASS (198.38 s)
> >>  (16/39) /home/cleber/src/qemu/tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta64el_kernel3_2_0: FAIL: Failure message found in console: Oops (22.83 s)
> >>
> >> I'll let you know about my full results.  This should also serve as a
> >> starting point to a discussion about the reliability of other tests,
> >> as you mentioned before.
> > 
> > Out of the 100 executions on a ppc64le host, these are the results
> > that contain failures and errors:
> > 
> > 15-/home/cleber/src/qemu/tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta32eb_kernel3_2_0
> >   - PASS: 92
> >   - INTERRUPTED: 4
> >   - FAIL: 4
> > 16-/home/cleber/src/qemu/tests/acceptance/linux_ssh_mips_malta.py:LinuxSSH.test_mips_malta64el_kernel3_2_0
> >   - PASS: 95
> >   - FAIL: 5
> > 
> > FAIL means that self.fail() was called, which means 'Oops' was found
> > in the console.  INTERRUPTED here means that the test timeout kicked
> > in, and I can back David's statements about soft lockups.
> > 
> > Let me know if anyone wants access to the full logs/results.
> 
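For anyone who wants the numbers above as rates, a small sketch that tallies them.  The counts are copied from the results listed above; the `results` dictionary layout and the `pass_rate` helper are just illustrative choices, not part of any tool.

```python
from collections import Counter

# Per-test status counts over the 100 executions reported above.
results = {
    "test_mips_malta32eb_kernel3_2_0": Counter(PASS=92, INTERRUPTED=4, FAIL=4),
    "test_mips_malta64el_kernel3_2_0": Counter(PASS=95, FAIL=5),
}


def pass_rate(counts):
    """Fraction of runs that ended in PASS."""
    total = sum(counts.values())
    return counts["PASS"] / total


for name, counts in results.items():
    total = sum(counts.values())
    print(f"{name}: {pass_rate(counts):.0%} pass over {total} runs")
```

So roughly one run in twelve of the 32-bit big-endian test, and one in twenty of the 64-bit little-endian test, did not pass.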
> Can you check whether the FAIL cases are this bug, please?
> 
> https://bugs.launchpad.net/qemu/+bug/1833661
>

Yes, the errors do match.  I posted an update there:

  https://bugs.launchpad.net/qemu/+bug/1833661/comments/3

Cheers,
- Cleber.

> Thanks,
> 
> Phil.
> 


Thread overview: 8+ messages
2019-09-18  7:16 [Qemu-devel] Problems with MIPS Malta SSH tests in make check-acceptance David Gibson
2019-09-18 11:13 ` Philippe Mathieu-Daudé
2019-09-18 11:37   ` David Gibson
2019-09-19  1:14 ` Cleber Rosa
2019-09-19 16:56   ` Cleber Rosa
2019-09-19 17:00     ` Philippe Mathieu-Daudé
2019-09-19 17:14       ` Cleber Rosa [this message]
2019-09-19 18:54         ` Cleber Rosa
