From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [xen-unstable test] 58821: tolerable FAIL
Date: Mon, 22 Jun 2015 17:00:43 +0100
Message-ID: <5588312B.6010009@citrix.com>
References: <osstest-58821-mainreport@xen.org>
	<1434986228.28264.172.camel@citrix.com>
	<5588479B0200007800087ADE@mail.emea.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <5588479B0200007800087ADE@mail.emea.novell.com>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich <JBeulich@suse.com>, Ian Campbell <ian.campbell@citrix.com>
Cc: ian.jackson@eu.citrix.com, xen-devel@lists.xensource.com, Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>, suravee.suthikulpanit@amd.com
List-Id: xen-devel@lists.xenproject.org

On 22/06/15 16:36, Jan Beulich wrote:
>>>> On 22.06.15 at 17:17, <ian.campbell@citrix.com> wrote:
>> On Mon, 2015-06-22 at 14:09 +0000, osstest service user wrote:
>>> flight 58821 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/58821/ 
>>>
>> [...]
>>>  test-amd64-amd64-libvirt     11 guest-start                  fail   like 58789
>>
>> http://logs.test-lab.xenproject.org/osstest/logs/58821/test-amd64-amd64-libvirt/info.html
>>
>> While investigating why libvirt hasn't been succeeding very well on
>> merlot* I came across some things in the serial log which initially
>> struck me as odd, but which I suspect are nothing (or at least not
>> terribly relevant), if someone could confirm that would be great.
>>
>> Firstly is:
>>
>> Jun 22 12:41:09.633294 (XEN) microcode: CPU2 updated from revision 0x6000822 to 0x6000832
>> Jun 22 12:41:09.665099 (XEN) microcode: CPU4 updated from revision 0x6000822 to 0x6000832
>> Jun 22 12:41:09.729089 (XEN) microcode: CPU6 updated from revision 0x6000822 to 0x6000832
>> [...]
>>
>> i.e. only even numbered cpus are updated. (0 was done earlier in boot).
>> I suspect that the answer here is "hyperthreading", and the cpuinfo
>> shows all cpus have in fact been updated.
> Yes (albeit hyperthreading is an Intel term, but iirc the same applies
> to the two cores per compute unit).

Indeed.  The "microcode: patch is already at required level or
greater.\n" message is helpfully unconditionally compiled out.
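
For anyone wanting to check this from a serial log, here is a minimal
sketch that lists which CPUs logged an update.  It is not part of Xen or
osstest; the helper name updated_cpus is made up for illustration, and
the sample lines are the ones quoted above.

```python
import re

# Sample lines copied from the serial log quoted above.
LOG = """\
Jun 22 12:41:09.633294 (XEN) microcode: CPU2 updated from revision 0x6000822 to 0x6000832
Jun 22 12:41:09.665099 (XEN) microcode: CPU4 updated from revision 0x6000822 to 0x6000832
Jun 22 12:41:09.729089 (XEN) microcode: CPU6 updated from revision 0x6000822 to 0x6000832
"""

# Matches the "CPUn updated from revision X to Y" message format seen above.
UPDATE_RE = re.compile(
    r"microcode: CPU(\d+) updated from revision (0x[0-9a-fA-F]+) to (0x[0-9a-fA-F]+)"
)


def updated_cpus(log_text):
    """Map CPU number -> (old revision, new revision) for each update line."""
    result = {}
    for m in UPDATE_RE.finditer(log_text):
        result[int(m.group(1))] = (int(m.group(2), 16), int(m.group(3), 16))
    return result


cpus = updated_cpus(LOG)
print(sorted(cpus))
```

CPUs missing from that list would be the ones that hit the compiled-out
message, assuming the explanation above is the right one.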

>
>> The second thing is:
>> Jun 22 12:41:10.601103 (XEN) Brought up 32 CPUs
>> Jun 22 12:41:10.625270 (XEN) Testing NMI watchdog on all CPUs: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 stuck
>>
>> i.e. at least one CPU has issues with NMI watchdog (looking at other
>> runs it seems to vary between 29-31). Is this just that the NMI watchdog
>> doesn't deal well with so many pCPUs? Or is it a real issue?
> Very few CPUs properly responding is certainly quite odd; one
> would expect all or none of them to work. Perhaps our AMD
> maintainers (now Cc-ed) could take a look...

Recent investigation in XenServer turned up several things wrong with
the NMI watchdog testing in Xen at the moment.  Time isn't accounted
properly for cores under BIOS/hardware power control, and Xen doesn't
wait for the requisite time even if the core were running at its
expected frequency.

I should see about making those patches appear, but for now, ignore this
line.  It is more than likely wrong.

~Andrew