From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: [xen-unstable test] 58821: tolerable FAIL
Date: Mon, 22 Jun 2015 17:00:43 +0100
Message-ID: <5588312B.6010009@citrix.com>
References: <1434986228.28264.172.camel@citrix.com> <5588479B0200007800087ADE@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5588479B0200007800087ADE@mail.emea.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich , Ian Campbell
Cc: ian.jackson@eu.citrix.com, xen-devel@lists.xensource.com, Aravind Gopalakrishnan , suravee.suthikulpanit@amd.com
List-Id: xen-devel@lists.xenproject.org

On 22/06/15 16:36, Jan Beulich wrote:
>>>> On 22.06.15 at 17:17, wrote:
>> On Mon, 2015-06-22 at 14:09 +0000, osstest service user wrote:
>>> flight 58821 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/58821/
>>>
>> [...]
>>> test-amd64-amd64-libvirt 11 guest-start fail like 58789
>>
>> http://logs.test-lab.xenproject.org/osstest/logs/58821/test-amd64-amd64-libvirt/info.html
>>
>> While investigating why libvirt hasn't been succeeding very well on
>> merlot* I came across some things in the serial log which initially
>> struck me as odd, but which I suspect are nothing (or at least not
>> terribly relevant), if someone could confirm that would be great.
>>
>> Firstly is:
>>
>> Jun 22 12:41:09.633294 (XEN) microcode: CPU2 updated from revision 0x6000822 to 0x6000832
>> Jun 22 12:41:09.665099 (XEN) microcode: CPU4 updated from revision 0x6000822 to 0x6000832
>> Jun 22 12:41:09.729089 (XEN) microcode: CPU6 updated from revision 0x6000822 to 0x6000832
>> [...]
>>
>> i.e. only even numbered cpus are updated. (0 was done earlier in boot).
>> I suspect that the answer here is "hyperthreading", and the cpuinfo
>> shows all cpus have in fact been updated.
> Yes (albeit hyperthreading is an Intel term, but iirc the same applies
> to the two cores per compute unit).

Indeed. The "microcode: patch is already at required level or
greater.\n" message is helpfully unconditionally compiled out.

>
>> The second thing is:
>> Jun 22 12:41:10.601103 (XEN) Brought up 32 CPUs
>> Jun 22 12:41:10.625270 (XEN) Testing NMI watchdog on all CPUs: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 stuck
>>
>> i.e. at least one CPU has issues with NMI watchdog (looking at other
>> runs it seems to vary between 29-31). Is this just that the NMI watchdog
>> doesn't deal well with so many pCPUs? Or is it a real issue?
> Very few CPUs properly responding is certainly quite odd; one
> would expect all or none of them to work. Perhaps our AMD
> maintainers (now Cc-ed) could take a look...

There are several things wrong with the NMI testing in Xen atm,
following some recent investigation in XenServer.

Time isn't accounted properly for cores under bios/hardware power
control, and Xen doesn't wait for the requisite time even if the core
were running at its expected frequency.

I should see about making those patches appear, but for now, ignore this
line. It is more than likely wrong.

~Andrew