From: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
To: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Anthony PERARD <anthony.perard@citrix.com>,
	xen-devel@lists.xen.org, Wei Liu <wei.liu2@citrix.com>,
	Ian Campbell <ian.campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Subject: Re: [PATCH] libxl: Increase device model startup timeout to 1min.
Date: Wed, 1 Jul 2015 16:03:55 +0100
Message-ID: <alpine.DEB.2.02.1507011525420.17378@kaball.uk.xensource.com>
In-Reply-To: <21906.50807.907706.819950@mariner.uk.xensource.com>

On Tue, 30 Jun 2015, Ian Jackson wrote:
> > >   * The number and nature of parallel operations done in the stress
> > >     test is unreasonable for the provided hardware:
> > >       => the timeout is fine
> > 
> > I don't know if it is our place to make this call.  Should we really be
> > deciding what is considered "reasonable"? I think not. Defining what is
> > reasonable and policies that match it is not a route I think we should
> > take in libxl.
> 
> Nevertheless if we are defining timeouts we are implicitly setting
> some parameters which imply that certain configurations are
> unreasonable.  Hopefully all such configurations are absurd.
> 
> If what you mean is that our bounds of `reasonable' should be very
> wide, then I agree.  If anyone could reasonably expect it to work,
> then that is fine.  Certainly we should refrain from subjective
> judgements.

OK.  How do you measure reasonable for this case?

What I actually mean to ask is how do you suggest we proceed on this
problem?

Of course it would be nice if we knew exactly why this is happening, but
the issue only happens once every 2-3 Tempest runs, and each run takes
about 1 hour.  Tempest executes about 1300 tests per run, some of them
in parallel. We haven't taken the time to read all the tests run by
Tempest, so we don't know exactly what they do.

We don't really know the environment that causes the failure. Reading
all the tests is not an option. We could try adding more tracing to the
system, but given the type of error, the tracing itself would probably
change the timing: we would be unlikely to reproduce the error at all,
or we might reproduce something different.


Given the state of things, I suggest we make sure that increasing the
timeout actually fixes or works around the problem. I would also like to
see some empirical measurements that tell us by how much we should
increase the timeout. Is 1 minute actually enough?
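
A minimal sketch of how such measurements could be summarised, assuming
device model startup durations (in seconds) have already been collected
by some means, for example from timestamps in logs. The function names
are hypothetical and are not part of libxl or of the patch; only the
60-second default comes from the proposal under discussion.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of durations."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Nearest-rank method: smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def fraction_over(samples, timeout_s):
    """Fraction of startups that would have exceeded timeout_s."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(1 for s in samples if s > timeout_s) / len(samples)


def summarise(samples, timeouts_s=(60,)):
    """Summarise measured startup durations against candidate timeouts.

    timeouts_s holds candidate timeout values in seconds; 60 mirrors the
    proposed 1 minute, any other value is the caller's assumption.
    """
    return {
        "count": len(samples),
        "max": max(samples),
        "p99": percentile(samples, 99),
        "over": {t: fraction_over(samples, t) for t in timeouts_s},
    }
```

Calling summarise(durations) on the durations gathered over several
Tempest runs would show the worst observed startup time and the share of
startups that a 1 minute timeout would still have failed.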

I would not go as far as asking to figure out what the real cause of the
problem is, because there is no way to estimate how long it is going to
take or even how to do that. And in the meantime we still have spurious
failures in the OpenStack CI-loop.

Alternatively, we could carry the workaround in the Xen package we build
for the OpenStack CI-loop, leaving xen-unstable "unfixed", but I think
that would be less desirable.

