From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: [xen-unstable test] APIC error on CPU[x] in xcp dom0
Date: Tue, 25 Jan 2011 11:53:49 +0000
Message-ID: <4D3EC7DD020000780002E551@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: George Dunlap, "xen.org"
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

>>> On 25.01.11 at 11:52, George Dunlap wrote:
> Something strange about this error. Comparing the serial logs of the
> failure on "leaf-beetle":
> http://www.chiark.greenend.org.uk/~xensrcts/logs/5231/test-i386-xcpkern-i386-xl/
> serial-leaf-beetle.log
>
> to a successful boot on the same machine:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/5161/test-i386-xcpkern-i386-xl/
> serial-leaf-beetle.log
>
> the thing that stands out are some scary messages from Xen during the
> failed boot:
> ---
> (XEN) CPU counter reports 4094 correctable hardware errors that were
> Jan 24 17:37:24 not reported by the status MSRs
> ...
> Jan 24 17:37:27 (XEN) APIC error on CPU3: 00(08)
> Jan 24 17:37:27 (XEN) APIC error on CPU2: 00(08)
> Jan 24 17:37:27 (XEN) APIC error on CPU0: 00(08)
> Jan 24 17:37:27 (XEN) APIC error on CPU1: 00(08)
> ...
> Jan 24 17:37:30 (XEN) CPU counter reports 4094 correctable hardware
> errors that were not reporte
> Jan 24 17:37:30 d by the status MSRs
> ---
> Immediately after which, the sata driver complains that the "identify"
> command failed:
> ---
> Jan 24 17:37:33 ata1.00: qc timeout (cmd 0xec)
> Jan 24 17:37:33 ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> Jan 24 17:37:33 ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> Jan 24 17:37:43 ata1.00: qc timeout (cmd 0xec)
> Jan 24 17:37:43 ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> Jan 24 17:37:43 ata1: limiting SATA link speed to 1.5 Gbps
> Jan 24 17:37:43 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> Jan 24 17:38:13 ata1.00: qc timeout (cmd 0xec)
> Jan 24 17:38:13 ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
> Jan 24 17:38:14 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
> ---
>
> The "correctable hardware errors" messages are present in the
> successful log as well, but not the APIC error messages.
>
> Who does development / maintenance on the xcp kernel? Is that a Novell
> thing?

It's derived from our kernel. Why do you ask, given that the initial
set of frightening messages come from the hypervisor?

Jan