From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Campbell
Subject: Re: [PATCH V3] libxl: Increase device model startup timeout to 1min.
Date: Tue, 14 Jul 2015 10:37:34 +0100
Message-ID: <1436866654.25044.45.camel@citrix.com>
References: <21915.58620.948343.728555@mariner.uk.xensource.com>
 <1436281753-19534-1-git-send-email-anthony.perard@citrix.com>
 <21915.60619.555732.214104@mariner.uk.xensource.com>
 <1436283671.25646.254.camel@citrix.com>
 <55A4C593020000780009077E@mail.emea.novell.com>
 <1436860520.7019.140.camel@citrix.com>
 <1436865941.13522.68.camel@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <1436865941.13522.68.camel@citrix.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Dario Faggioli
Cc: Wei Liu, Stefano Stabellini, Ian Jackson, xen-devel@lists.xen.org,
 Jan Beulich, Anthony PERARD
List-Id: xen-devel@lists.xenproject.org

On Tue, 2015-07-14 at 11:25 +0200, Dario Faggioli wrote:
> On Tue, 2015-07-14 at 08:55 +0100, Ian Campbell wrote:
> > On Tue, 2015-07-14 at 07:17 +0100, Jan Beulich wrote:
> > > >>> On 07.07.15 at 17:41, wrote:
> > > > On Tue, 2015-07-07 at 16:14 +0100, Ian Jackson wrote:
> > > >> Anthony PERARD writes ("[PATCH V3] libxl: Increase device model startup
> > > > timeout to 1min."):
> > > >> > On a busy host, QEMU may take more than 10s to load and start.
> > > >> >
> > > >> > This is likely due to a bug in Linux where the I/O subsystem sometime
> > > >> > produce high latency under load and result in QEMU taking a long time to
> > > >> > load every single dynamic libraries.
> > > >>
> > > >> Acked-by: Ian Jackson
> > > >
> > > > Applied.
> > >
> > > So is this the "answer" to "Problems with merlot* AMD Opteron 6376
> > > systems"?
> >
> > It'll be hard to say until this change gets through the Xen push gate
> > and that version gets used for other branches (linux testing, libvirt,
> > ovmf, osstest's own gate etc).
> >
> Indeed. My opinion is that no, it is not.
>
> My understanding of the data Anthony provided is that, under some
> (difficult to track/analyze/reproduce/etc) load conditions, the Linux IO
> and VM subsystem suffer from high latency, delaying QEMU startup.
>
> In the merlot* cases, the system is completely idle, apart from the
> failing creation/migration operation.
>
> So, no, I don't think that would not be the fix we need for that
> situation.

Even if it is not the correct fix, it seems that in some situations the
increase in timeout has improved things, hence it is an "answer" as Jan
asked (his quote marks).

> > At the moment it looks like it has helped with some but not all of the
> > issues.
> >
> > These:
> >
> > http://logs.test-lab.xenproject.org/osstest/results/host/merlot0.html
> > http://logs.test-lab.xenproject.org/osstest/results/host/merlot1.html
> >
> Can I ask why (I mean, e.g., comparing what with what) you're saying it
> seems to have helped?

There seemed (unscientifically) to be fewer of the libvirt-related
guest-start failures.

Ian.