From mboxrd@z Thu Jan 1 00:00:00 1970
From: Trenta sis
Subject: Re: IBM HS20 Xen 4.1 and 4.2 Critical Interrupt - Front panel NMI crash
Date: Sun, 29 Sep 2013 12:47:58 +0200
Message-ID:
References: <20130909191524.GA4215@phenom.dumpdata.com> <20130923140251.GH3175@phenom.dumpdata.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============6798237078016503175=="
Return-path:
In-Reply-To: <20130923140251.GH3175@phenom.dumpdata.com>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk
Cc: arrfab@centos.org, agya.naila@gmail.com, JBeulich@suse.com, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

--===============6798237078016503175==
Content-Type: multipart/alternative; boundary=047d7bd76aeae7b9ef04e7837372

--047d7bd76aeae7b9ef04e7837372
Content-Type: text/plain; charset=ISO-8859-1

Hello,

The BladeCenter web frontend shows the following:

27 I Blade_09 09/08/13 13:25:17 0x806f0013 Chassis, (NMI State) diagnostic interrupt
28 E Blade_09 09/08/13 13:25:12 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
29 I Blade_09 09/08/13 13:09:14 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
30 I Blade_09 09/08/13 13:09:03 0x806f0013 Chassis, (NMI State) diagnostic interrupt
31 E Blade_09 09/08/13 13:08:58 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
32 I Blade_09 09/08/13 12:46:26 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
33 I Blade_09 09/08/13 12:46:15 0x806f0013 Chassis, (NMI State) diagnostic interrupt
34 E Blade_09 09/08/13 12:46:11 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
35 I Blade_09 09/08/13 12:34:13 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
36 I Blade_09 09/08/13 12:34:03 0x806f0013 Chassis, (NMI State) diagnostic interrupt
37 E Blade_09 09/08/13 12:33:58 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
38 I Blade_09 09/08/13 12:27:25 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
39 I Blade_09 09/08/13 12:27:14 0x806f0013 Chassis, (NMI State) diagnostic interrupt
40 E Blade_09 09/08/13 12:27:10 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
41 I Blade_09 09/08/13 12:20:45 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
42 I Blade_09 09/08/13 12:20:34 0x806f0013 Chassis, (NMI State) diagnostic interrupt
43 E Blade_09 09/08/13 12:20:30 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
44 I Blade_09 09/08/13 12:18:20 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
45 I Blade_09 09/08/13 12:18:10 0x806f0013 Chassis, (NMI State) diagnostic interrupt
46 E Blade_09 09/08/13 12:18:05 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020
47 I Blade_09 09/08/13 12:15:47 0x806f0013 Recovery Chassis, (NMI State) diagnostic interrupt
48 I Blade_09 09/08/13 12:15:37 0x806f0013 Chassis, (NMI State) diagnostic interrupt
49 E Blade_09 09/08/13 12:15:32 0x10000002 SMI Hdlr: 00151743 HI Fatal Error, HI_FERR/NERR Value= 0020

Thanks

2013/9/23 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

> On Thu, Sep 12, 2013 at 02:47:39PM +0200, Trenta sis wrote:
> > Hello,
> >
> > We need this server, so we have downgraded it to Debian Squeeze.
> > I hope to have another HS20 in a few days to run some additional tests.
> > I'll try to get all the information you asked for and send it.
> > Sorry, one question: what is PCI SERR? Where do I find it?
>
> If you log in to the BladeCenter webfrontend you should see logs of
> each blade. Some of them are 'User XYZ logged in'. But in some cases
> there are more serious ones - such as an NMI or PCI SERR. If you could
> copy-n-paste them it could help in figuring out which PCI device is
> responsible for causing the NMI.
>
> >
> > Thanks for all
> >
> > 2013/9/9 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> >
> > > On Sun, Sep 08, 2013 at 04:41:02PM +0200, Trenta sis wrote:
> > > > Hello,
> > > >
> > > > I have the same error: the server reboots automatically on every boot
> > > > with the Xen kernel. The HS20 with Debian Wheezy and Xen hangs, and AMM
> > > > management shows the same errors described in previous mails. With
> > > > Debian Wheezy and a non-Xen kernel it boots correctly, so the problem
> > > > seems to be with the Xen kernel.
> > > > The same HS20 server with Debian Lenny + Xen 3.2 or Debian Squeeze +
> > > > Xen 4.0 works perfectly.
> > > >
> > > > I upgraded to Debian testing and unstable, with the same results on
> > > > Xen 4.1 and 4.2.
> > > >
> > > > If you need more information, just ask.
> > > > How can this bug be solved?
> > >
> > > Did the workaround help?
> > >
> > > And in regards to finding out exactly what causes it - well, there are
> > > logs in the BMC that can point to the PCI device. Did you check those?
> > > Do they say if there is any device that has PCI SERR on it?
> > >
> > > Thanks.
> >

--047d7bd76aeae7b9ef04e7837372
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

--047d7bd76aeae7b9ef04e7837372--

--===============6798237078016503175==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

--===============6798237078016503175==--