* Xen 4 occasionally hangs during boot
@ 2011-10-12 19:34 Christopher S. Aker
  2011-10-13  7:04 ` Jan Beulich
  2011-10-13 10:46 ` Tim Deegan
  0 siblings, 2 replies; 23+ messages in thread
From: Christopher S. Aker @ 2011-10-12 19:34 UTC (permalink / raw)
  To: xen devel

Since I started playing with Xen 4 (vs 3.x), machines often hang during 
reboot at exactly the same place:

	(XEN) HVM: Hardware Assisted Paging detected.
	(

... and then nothing.  I have to RPC bounce them.  On some occasions it 
takes four or five attempts to get beyond this point.  A normal boot 
looks like this:

	(XEN) HVM: Hardware Assisted Paging detected.
	(XEN) Brought up 16 CPUs

4.1.2-rc @ 23159.  All of the Xen 4.x I've tried have done this, but I'd 
need to dig up which ones those are.

-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 occasionally hangs during boot
  2011-10-12 19:34 Xen 4 occasionally hangs during boot Christopher S. Aker
@ 2011-10-13  7:04 ` Jan Beulich
  2011-10-13 10:46 ` Tim Deegan
  1 sibling, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2011-10-13  7:04 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: xen devel

>>> On 12.10.11 at 21:34, "Christopher S. Aker" <caker@theshore.net> wrote:
> Since I started playing with Xen 4 (vs 3.x), machines often hang during 
> reboot at exactly the same place:

Do those machines have something in common hardware-wise? As you
would certainly assume, this isn't a general problem, so telling
us what hardware you observe this on might help us guess... Also, any
chance you could try recent -unstable?

Jan

> 
> 	(XEN) HVM: Hardware Assisted Paging detected.
> 	(
> 
> ... and then nothing.  I have to RPC bounce them.  On some occasions it 
> takes four or five attempts to get beyond this point.  A normal boot 
> looks like this:
> 
> 	(XEN) HVM: Hardware Assisted Paging detected.
> 	(XEN) Brought up 16 CPUs
> 
> 4.1.2-rc @ 23159.  All of the Xen 4.x I've tried have done this, but I'd 
> need to dig up which ones those are.
> 
> -Chris
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com 
> http://lists.xensource.com/xen-devel 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 occasionally hangs during boot
  2011-10-12 19:34 Xen 4 occasionally hangs during boot Christopher S. Aker
  2011-10-13  7:04 ` Jan Beulich
@ 2011-10-13 10:46 ` Tim Deegan
  2012-07-20 17:48   ` Xen 4 serial " Christopher S. Aker
  1 sibling, 1 reply; 23+ messages in thread
From: Tim Deegan @ 2011-10-13 10:46 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: xen devel

At 15:34 -0400 on 12 Oct (1318433686), Christopher S. Aker wrote:
> Since I started playing with Xen 4 (vs 3.x), machines often hang during 
> reboot at exactly the same place:
> 
> 	(XEN) HVM: Hardware Assisted Paging detected.
> 	(
> 
> ... and then nothing.  I have to RPC bounce them.  On some occasions it 
> takes four or five attempts to get beyond this point.  A normal boot 
> looks like this:
> 
> 	(XEN) HVM: Hardware Assisted Paging detected.
> 	(XEN) Brought up 16 CPUs

:(  If you add "cpuinfo" to the xen command-line arguments does it print
anything more useful?

Tim.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2011-10-13 10:46 ` Tim Deegan
@ 2012-07-20 17:48   ` Christopher S. Aker
  2012-07-20 17:49     ` Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Christopher S. Aker @ 2012-07-20 17:48 UTC (permalink / raw)
  To: xen devel

On 10/13/11 6:46 AM, Tim Deegan wrote:
> At 15:34 -0400 on 12 Oct (1318433686), Christopher S. Aker wrote:
>> Since I started playing with Xen 4 (vs 3.x), machines often hang during
>> reboot at exactly the same place:

We're still seeing this occasionally, even with 'cpuinfo' added to Xen 
args.  Serial console stops responding during Xen booting - every time 
in the exact same place:

	(XEN) CPU9: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz stepping 05
	(XEN) CPU 10

The machine continues to boot and becomes available via network, however 
nothing I do from then on can get serial to start working again. 
Control-AAA, sending massive amounts to /dev/console, etc.  If I issue 
the reboot command via dom0 something will tickle Xen and a page or two 
of buffered OLD data will flush out the serial before the machine 
reboots, which is interesting.

In this state hvc_console receives no interrupts.  When not in this 
state hvc_console seems to get interrupts occasionally.  Not sure of its 
significance.

I still have a box in this state if anyone has ideas to try.

-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 17:48   ` Xen 4 serial " Christopher S. Aker
@ 2012-07-20 17:49     ` Andrew Cooper
  2012-07-20 17:58       ` Christopher S. Aker
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2012-07-20 17:49 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: xen devel

On 20/07/12 18:48, Christopher S. Aker wrote:
> On 10/13/11 6:46 AM, Tim Deegan wrote:
>> At 15:34 -0400 on 12 Oct (1318433686), Christopher S. Aker wrote:
>>> Since I started playing with Xen 4 (vs 3.x), machines often hang during
>>> reboot at exactly the same place:
> We're still seeing this occasionally, even with 'cpuinfo' added to Xen 
> args.  Serial console stops responding during Xen booting - every time 
> in the exact same place:
>
> 	(XEN) CPU9: Intel(R) Xeon(R) CPU L5520 @ 2.27GHz stepping 05
> 	(XEN) CPU 10
>
> The machine continues to boot and becomes available via network, however 
> nothing I do from then on can get serial to start working again. 
> Control-AAA, sending massive amounts to /dev/console, etc.  If I issue 
> the reboot command via dom0 something will tickle Xen and a page or two 
> of buffered OLD data will flush out the serial before the machine 
> reboots, which is interesting.
>
> In this state hvc_console receives no interrupts.  When not in this 
> state hvc_console seems to get interrupts occasionally.  Not sure of its 
> significance.
>
> I still have a box in this state if anyone has ideas to try.

Is this an HP box by any chance, and are you accessing serial over iLO?

~Andrew

>
> -Chris
>

-- 
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 17:49     ` Andrew Cooper
@ 2012-07-20 17:58       ` Christopher S. Aker
  2012-07-20 18:05         ` Andrew Cooper
  0 siblings, 1 reply; 23+ messages in thread
From: Christopher S. Aker @ 2012-07-20 17:58 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen devel

On 7/20/12 1:49 PM, Andrew Cooper wrote:
> Is this an HP box by any chance, and are you accessing serial over iLO?

It is not.  It's a SM motherboard with an on board UART 16550, which 
we're booting with xen args "com1=115200,8n1 console=com1".

-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 17:58       ` Christopher S. Aker
@ 2012-07-20 18:05         ` Andrew Cooper
  2012-07-20 19:10           ` Christopher S. Aker
  0 siblings, 1 reply; 23+ messages in thread
From: Andrew Cooper @ 2012-07-20 18:05 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: xen devel


On 20/07/12 18:58, Christopher S. Aker wrote:
> On 7/20/12 1:49 PM, Andrew Cooper wrote:
>> Is this an HP box by any chance, and are you accessing serial over iLO?
> It is not.  It's a SM motherboard with an on board UART 16550, which 
> we're booting with xen args "com1=115200,8n1 console=com1".

Oh interesting.  We periodically see this with HP kit, but nothing else
which is why we assumed it was iLO specific.  Perhaps it is not after all.

As for suggestions, try manually prodding port 3f8 to see whether the
UART is actually working?

Alternatively, use `xl debug-keys` and `xl dmesg` to see whether you can
provoke it back to life.

>
> -Chris
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 18:05         ` Andrew Cooper
@ 2012-07-20 19:10           ` Christopher S. Aker
  2012-07-20 19:25             ` Andrew Cooper
  2012-07-20 19:31             ` Keir Fraser
  0 siblings, 2 replies; 23+ messages in thread
From: Christopher S. Aker @ 2012-07-20 19:10 UTC (permalink / raw)
  To: Andrew Cooper; +Cc: xen devel

On 7/20/12 2:05 PM, Andrew Cooper wrote:
> Alternatively, use `xl debug-keys`

Interesting.  'xl debug-keys h' got me ~1448 bytes onto the serial 
console, except it is old buffered output from exactly where it left off 
during boot. 1448 bytes is about the size of 'h' output from another 
working box.  The same thing happened with 'm'.  Xen is flushing as many 
characters from the buffer as the debug-key command outputs.

I can continue to poke the buffer until I see output from things I've 
issued; however, it still won't respond to serial input 
(control-aaa), nor can I get dom0 to echo chars.  It's currently running 
a * dump which has been going now for over an hour, currently on vcpu 
435628.  I won't be doing that again.

Any lightbulbs going off?

-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 19:10           ` Christopher S. Aker
@ 2012-07-20 19:25             ` Andrew Cooper
  2012-07-20 19:31             ` Keir Fraser
  1 sibling, 0 replies; 23+ messages in thread
From: Andrew Cooper @ 2012-07-20 19:25 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: xen devel


On 20/07/12 20:10, Christopher S. Aker wrote:
> On 7/20/12 2:05 PM, Andrew Cooper wrote:
>> Alternatively, use `xl debug-keys`
> Interesting.  'xl debug-keys h' got me ~1448 bytes onto the serial 
> console, except it is old buffered output from exactly where it left off 
> during boot. 1448 bytes is about the size of 'h' output from another 
> working box.  The same thing happened with 'm'.  Xen is flushing an 
> equal amount of characters from the buffer as generated by debug-key 
> command output.
>
> I can continue to poke the buffer until I see output from things I've 
> issued, however it still refuses to respond to serial input 
> (control-aaa) nor can I get dom0 to echo chars.  It's currently running 
> a * dump which has been going now for over an hour, currently on vcpu 
> 435628.  I won't be doing that again.
>
> Any lightbulbs going off?

Not especially.  It sounds like the serial ring buffer filled up and
never got drained.

How easy is this to reproduce for you? In the past, I have had success
debugging Xen like this with an outb(<ascii char>, 0x3f8) in certain
locations.  Perhaps the serial_rx interrupt handler, or failing that,
do_irq checking for vector 0xf0.  That should allow you to see whether
Xen is actually receiving interrupts when you try to send characters.

>
> -Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 19:10           ` Christopher S. Aker
  2012-07-20 19:25             ` Andrew Cooper
@ 2012-07-20 19:31             ` Keir Fraser
  2012-07-20 19:44               ` Christopher S. Aker
  1 sibling, 1 reply; 23+ messages in thread
From: Keir Fraser @ 2012-07-20 19:31 UTC (permalink / raw)
  To: Christopher S. Aker, Andrew Cooper; +Cc: xen devel

On 20/07/2012 20:10, "Christopher S. Aker" <caker@theshore.net> wrote:

> On 7/20/12 2:05 PM, Andrew Cooper wrote:
>> Alternatively, use `xl debug-keys`
> 
> Interesting.  'xl debug-keys h' got me ~1448 bytes onto the serial
> console, except it is old buffered output from exactly where it left off
> during boot. 1448 bytes is about the size of 'h' output from another
> working box.  The same thing happened with 'm'.  Xen is flushing an
> equal amount of characters from the buffer as generated by debug-key
> command output.
> 
> I can continue to poke the buffer until I see output from things I've
> issued, however it still refuses to respond to serial input
> (control-aaa) nor can I get dom0 to echo chars.  It's currently running
> a * dump which has been going now for over an hour, currently on vcpu
> 435628.  I won't be doing that again.
> 
> Any lightbulbs going off?

Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
appeared as a PnP device in some BIOS table and dom0 decided to disable it
because it doesn't think it is being used. Xen would usually stop this
happening via programming of the IO-APIC/XT-PIC but perhaps there is some
other method of disabling it on this mainboard, which Xen doesn't catch.

 -- Keir

> -Chris
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 19:31             ` Keir Fraser
@ 2012-07-20 19:44               ` Christopher S. Aker
  2012-07-20 19:59                 ` Keir Fraser
  0 siblings, 1 reply; 23+ messages in thread
From: Christopher S. Aker @ 2012-07-20 19:44 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andrew Cooper, xen devel

On 7/20/12 3:31 PM, Keir Fraser wrote:
> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
> appeared as a PnP device in some BIOS table and dom0 decided to disable it
> because it doesn't think it is being used. Xen would usually stop this
> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
> other method of disabling it on this mainboard, which Xen doesn't catch.

Hmm -- except dom0 hasn't even booted yet at the time the serial stops 
working.  Xen is 30-60 seconds away from booting dom0 given the RAM 
scrub still has to happen.

-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 19:44               ` Christopher S. Aker
@ 2012-07-20 19:59                 ` Keir Fraser
  2012-07-23 14:13                   ` Konrad Rzeszutek Wilk
  2012-07-23 20:53                   ` Christopher S. Aker
  0 siblings, 2 replies; 23+ messages in thread
From: Keir Fraser @ 2012-07-20 19:59 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: Andrew Cooper, xen devel

On 20/07/2012 20:44, "Christopher S. Aker" <caker@theshore.net> wrote:

> On 7/20/12 3:31 PM, Keir Fraser wrote:
>> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
>> appeared as a PnP device in some BIOS table and dom0 decided to disable it
>> because it doesn't think it is being used. Xen would usually stop this
>> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
>> other method of disabling it on this mainboard, which Xen doesn't catch.
> 
> Hmm -- except dom0 hasn't even booted yet at the time the serial stops
> working.  Xen is 30-60 seconds away from booting dom0 given the RAM
> scrub still has to happen.

Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
seen anything like this reported before. Not sure what to suggest really...
Gather debug output from interrupt-related debug keys (via the xl debug-keys
interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
and dom0 boot logs... something might become apparent.

 -- Keir

> -Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 19:59                 ` Keir Fraser
@ 2012-07-23 14:13                   ` Konrad Rzeszutek Wilk
  2012-07-23 15:26                     ` Keir Fraser
  2012-07-23 20:53                   ` Christopher S. Aker
  1 sibling, 1 reply; 23+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-23 14:13 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andrew Cooper, xen devel

On Fri, Jul 20, 2012 at 08:59:29PM +0100, Keir Fraser wrote:
> On 20/07/2012 20:44, "Christopher S. Aker" <caker@theshore.net> wrote:
> 
> > On 7/20/12 3:31 PM, Keir Fraser wrote:
> >> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
> >> appeared as a PnP device in some BIOS table and dom0 decided to disable it
> >> because it doesn't think it is being used. Xen would usually stop this
> >> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
> >> other method of disabling it on this mainboard, which Xen doesn't catch.
> > 
> > Hmm -- except dom0 hasn't even booted yet at the time the serial stops
> > working.  Xen is 30-60 seconds away from booting dom0 given the RAM
> > scrub still has to happen.
> 
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

What about using the serial line without the interrupt?
Meaning com1=115200,8n1,0x3f8,0

That ought to make the code go into polling and ignore the interrupt line right?

> 
>  -- Keir
> 
> > -Chris
> 
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-23 14:13                   ` Konrad Rzeszutek Wilk
@ 2012-07-23 15:26                     ` Keir Fraser
  0 siblings, 0 replies; 23+ messages in thread
From: Keir Fraser @ 2012-07-23 15:26 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, xen devel

On 23/07/2012 15:13, "Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com> wrote:

> On Fri, Jul 20, 2012 at 08:59:29PM +0100, Keir Fraser wrote:
>> On 20/07/2012 20:44, "Christopher S. Aker" <caker@theshore.net> wrote:
>> 
>>> On 7/20/12 3:31 PM, Keir Fraser wrote:
>>>> Somehow dom0 disabled the serial-line interrupt during boot. Possibly it
>>>> appeared as a PnP device in some BIOS table and dom0 decided to disable it
>>>> because it doesn't think it is being used. Xen would usually stop this
>>>> happening via programming of the IO-APIC/XT-PIC but perhaps there is some
>>>> other method of disabling it on this mainboard, which Xen doesn't catch.
>>> 
>>> Hmm -- except dom0 hasn't even booted yet at the time the serial stops
>>> working.  Xen is 30-60 seconds away from booting dom0 given the RAM
>>> scrub still has to happen.
>> 
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
> 
> What about using the serial line without the interrupt?
> Meaning com1=115200,8n1,0x3f8,0
> 
> That ought to make the code go into polling and ignore the interrupt line
> right?

Yes, that should work. It does waste some CPU time running the poll handler
continually, even when the serial line is idle. And of course serial debug
key inputs will still not work.

 -- Keir

>> 
>>  -- Keir
>> 
>>> -Chris
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-20 19:59                 ` Keir Fraser
  2012-07-23 14:13                   ` Konrad Rzeszutek Wilk
@ 2012-07-23 20:53                   ` Christopher S. Aker
  2012-07-23 22:03                     ` Malcolm Crossley
                                       ` (4 more replies)
  1 sibling, 5 replies; 23+ messages in thread
From: Christopher S. Aker @ 2012-07-23 20:53 UTC (permalink / raw)
  To: Keir Fraser; +Cc: Andrew Cooper, xen devel

On 7/20/12 3:59 PM, Keir Fraser wrote:
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

We hit this again today, and I grabbed boot and debug-keys output:

http://theshore.net/~caker/xen/BUGS/serial/log.txt

Thanks,
-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-23 20:53                   ` Christopher S. Aker
@ 2012-07-23 22:03                     ` Malcolm Crossley
  2012-07-23 22:45                     ` Malcolm Crossley
                                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Malcolm Crossley @ 2012-07-23 22:03 UTC (permalink / raw)
  To: 'Christopher S. Aker', Keir (Xen.org); +Cc: Andrew Cooper, xen devel

Try enabling x2apic mode in the bios if there's an option for it, or remove any CPU masking that may be applied.

Sorry about top posting (remote login forces me to use Outlook).

Malcolm

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Christopher S. Aker
Sent: 23 July 2012 21:54
To: Keir (Xen.org)
Cc: Andrew Cooper; xen devel
Subject: Re: [Xen-devel] Xen 4 serial hangs during boot

On 7/20/12 3:59 PM, Keir Fraser wrote:
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

We hit this again today, and I grabbed boot and debug-keys output:

http://theshore.net/~caker/xen/BUGS/serial/log.txt

Thanks,
-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-23 20:53                   ` Christopher S. Aker
  2012-07-23 22:03                     ` Malcolm Crossley
@ 2012-07-23 22:45                     ` Malcolm Crossley
  2012-07-24  9:40                     ` Andrew Cooper
                                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Malcolm Crossley @ 2012-07-23 22:45 UTC (permalink / raw)
  To: 'Christopher S. Aker', Keir (Xen.org); +Cc: Andrew Cooper, xen devel

Sorry for the top post again,

You can also try adding "apic=bigsmp" to the xen command line. 

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Christopher S. Aker
Sent: 23 July 2012 21:54
To: Keir (Xen.org)
Cc: Andrew Cooper; xen devel
Subject: Re: [Xen-devel] Xen 4 serial hangs during boot

On 7/20/12 3:59 PM, Keir Fraser wrote:
> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> seen anything like this reported before. Not sure what to suggest really...
> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> and dom0 boot logs... something might become apparent.

We hit this again today, and I grabbed boot and debug-keys output:

http://theshore.net/~caker/xen/BUGS/serial/log.txt

Thanks,
-Chris

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-23 20:53                   ` Christopher S. Aker
  2012-07-23 22:03                     ` Malcolm Crossley
  2012-07-23 22:45                     ` Malcolm Crossley
@ 2012-07-24  9:40                     ` Andrew Cooper
  2012-07-24 10:32                     ` Jan Beulich
  2012-07-24 10:46                     ` Jan Beulich
  4 siblings, 0 replies; 23+ messages in thread
From: Andrew Cooper @ 2012-07-24  9:40 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: xen devel, Keir (Xen.org)

On 23/07/12 21:53, Christopher S. Aker wrote:
> On 7/20/12 3:59 PM, Keir Fraser wrote:
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
> We hit this again today, and I grabbed boot and debug-keys output:
>
> http://theshore.net/~caker/xen/BUGS/serial/log.txt
>
> Thanks,
> -Chris

The serial interrupt will be IO-APIC #9 pin 4 which is set with its
vector as 0xf1.

I can't immediately see any other issue with that log, unfortunately.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-23 20:53                   ` Christopher S. Aker
                                       ` (2 preceding siblings ...)
  2012-07-24  9:40                     ` Andrew Cooper
@ 2012-07-24 10:32                     ` Jan Beulich
  2012-07-26 13:50                       ` Konrad Rzeszutek Wilk
  2012-07-24 10:46                     ` Jan Beulich
  4 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2012-07-24 10:32 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: Andrew Cooper, xen devel, Keir Fraser

>>> On 23.07.12 at 22:53, "Christopher S. Aker" <caker@theshore.net> wrote:
> On 7/20/12 3:59 PM, Keir Fraser wrote:
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
> 
> We hit this again today, and I grabbed boot and debug-keys output:
> 
> http://theshore.net/~caker/xen/BUGS/serial/log.txt 

Not even 8k makes it over, whereas the transmit buffer
is 16k, and dropping of characters would only start once it
got full.

The part of the data that didn't make it out isn't big enough to
overflow the buffer - to check whether that would actually
happen, could you increase the log level of both hypervisor and
Dom0 kernel? To me this all (particularly the fact that you can
make the data appear combined with the amount of data not
being big enough to fill the buffer) looks as if there was some
buffering happening outside of the control of Xen. Did you check
whether this is possibly a problem with the remote end?

Does this also happen with "sync_console"? Did you check
whether disabling the use of the associated IRQ makes any
difference, as suggested by Konrad (I think)?

Does the port work flawlessly on native Linux?

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-23 20:53                   ` Christopher S. Aker
                                       ` (3 preceding siblings ...)
  2012-07-24 10:32                     ` Jan Beulich
@ 2012-07-24 10:46                     ` Jan Beulich
  4 siblings, 0 replies; 23+ messages in thread
From: Jan Beulich @ 2012-07-24 10:46 UTC (permalink / raw)
  To: Christopher S. Aker; +Cc: Andrew Cooper, xen devel, Keir Fraser

>>> On 23.07.12 at 22:53, "Christopher S. Aker" <caker@theshore.net> wrote:
> On 7/20/12 3:59 PM, Keir Fraser wrote:
>> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
>> seen anything like this reported before. Not sure what to suggest really...
>> Gather debug output from interrupt-related debug keys (via the xl debug-keys
>> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
>> and dom0 boot logs... something might become apparent.
> 
> We hit this again today, and I grabbed boot and debug-keys output:
> 
> http://theshore.net/~caker/xen/BUGS/serial/log.txt 

One more thing - having seen various interesting (mis)behavior
(on the part of the chipset) with the IOMMU turned on, could you
also check whether you still get the problem with it disabled?

Jan

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Xen 4 serial hangs during boot
  2012-07-24 10:32                     ` Jan Beulich
@ 2012-07-26 13:50                       ` Konrad Rzeszutek Wilk
  2012-07-26 14:10                         ` Jan Beulich
  0 siblings, 1 reply; 23+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-26 13:50 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen devel, Keir Fraser

On Tue, Jul 24, 2012 at 11:32:19AM +0100, Jan Beulich wrote:
> >>> On 23.07.12 at 22:53, "Christopher S. Aker" <caker@theshore.net> wrote:
> > On 7/20/12 3:59 PM, Keir Fraser wrote:
> >> Then it is Xen doing something to kill the serial interrupt. ;-) I haven't
> >> seen anything like this reported before. Not sure what to suggest really...
> >> Gather debug output from interrupt-related debug keys (via the xl debug-keys
> >> interface) I suppose. I think that would be 'i' and 'z' keys. That plus Xen
> >> and dom0 boot logs... something might become apparent.
> > 
> > We hit this again today, and I grabbed boot and debug-keys output:
> > 
> > http://theshore.net/~caker/xen/BUGS/serial/log.txt 
> 
> Not even 8k makes it over, whereas the transmit buffer
> is 16k, and dropping of characters would only start once it
> got full.
> 
> The part of the data that didn't make it out isn't big enough to
> overflow the buffer - to check whether that would actually
> happen, could you increase the log level of both hypervisor and
> Dom0 kernel? To me this all (particularly the fact that you can
> make the data appear combined with the amount of data not
> being big enough to fill the buffer) looks as if there was some
> buffering happening outside of the control of Xen. Did you check
> whether this is possibly a problem with the remote end?

This got me thinking - I've one particular AMD machine (prototype) that
seems to hang often - but if I use 'sync_console' it works fine.

This issue started oooh, I can't remember when, but I do have some logs
that could shed some light on the approximate date. I guess I was
too quick to blame the prototype for being at fault here :-(

Then recently (yesterday?) the upstream kernel started doing something
wonky on this card:

01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)
Under Xen, when it boots it hits right here:
[    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
and then stops [note: I hadn't really done any investigation to see
if the machine is dead or if it continues on, but with the serial port just
wedged hard].

On baremetal it can actually read the IO bars:
[    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
[    1.247075] pci 0000:01:05.0: reg 10: [io  0xe050-0xe057]
[    1.252734] pci 0000:01:05.0: reg 14: [io  0xe040-0xe047]
[    1.258394] pci 0000:01:05.0: reg 18: [io  0xe030-0xe037]
[    1.264054] pci 0000:01:05.0: reg 1c: [io  0xe020-0xe027]
[    1.269713] pci 0000:01:05.0: reg 20: [io  0xe010-0xe017]
[    1.275372] pci 0000:01:05.0: reg 24: [io  0xe000-0xe00f]

so I am wondering if the back-ports in Xen 4.1 for dealing with
PCI have something to do with this? 

> 
> Does this also happen with "sync_console"? Did you check
> whether disabling the use of the associated IRQ makes any
> difference, as suggested by Konrad (I think)?
> 
> Does the port work flawlessly on native Linux?
> 
> Jan
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


* Re: Xen 4 serial hangs during boot
  2012-07-26 13:50                       ` Konrad Rzeszutek Wilk
@ 2012-07-26 14:10                         ` Jan Beulich
  2012-07-26 14:36                           ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 23+ messages in thread
From: Jan Beulich @ 2012-07-26 14:10 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Andrew Cooper, xen devel, Keir Fraser

>>> On 26.07.12 at 15:50, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> Then recently (yesterday?) the upstream kernel started doing something
> wonky on this card:
> 
> 01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)
> Under Xen, when it boots it hits right here:
> [    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> and then stops [note: I haven't yet investigated whether the machine is
> dead or whether it keeps running with just the serial port wedged hard].

The machine state here, if accessible at all, would of course be
very interesting.

> On baremetal it can actually read the IO bars:
> [    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> [    1.247075] pci 0000:01:05.0: reg 10: [io  0xe050-0xe057]
> [    1.252734] pci 0000:01:05.0: reg 14: [io  0xe040-0xe047]
> [    1.258394] pci 0000:01:05.0: reg 18: [io  0xe030-0xe037]
> [    1.264054] pci 0000:01:05.0: reg 1c: [io  0xe020-0xe027]
> [    1.269713] pci 0000:01:05.0: reg 20: [io  0xe010-0xe017]
> [    1.275372] pci 0000:01:05.0: reg 24: [io  0xe000-0xe00f]
> 
> so I am wondering if the back-ports in Xen 4.1 for dealing with
> PCI have something to do with this? 

What backports are you thinking of? I just went through the titles
of everything since 4.1.2, and nothing that has "PCI" in it looks
in any way dangerous.

Jan


* Re: Xen 4 serial hangs during boot
  2012-07-26 14:10                         ` Jan Beulich
@ 2012-07-26 14:36                           ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 23+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-07-26 14:36 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, xen devel, Keir Fraser

On Thu, Jul 26, 2012 at 03:10:25PM +0100, Jan Beulich wrote:
> >>> On 26.07.12 at 15:50, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> > Then recently (yesterday?) the upstream kernel started doing something
> > wonky on this card:
> > 
> > 01:05.0 Serial controller: NetMos Technology PCI 9835 Multi-I/O Controller (rev 01)
> > Under Xen, when it boots it hits right here:
> > [    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> > and then stops [note: I haven't yet investigated whether the machine is
> > dead or whether it keeps running with just the serial port wedged hard].
> 
> The machine state here, if accessible at all, would of course be
> very interesting.

<nods> Hope to get to that today.
> 
> > On baremetal it can actually read the IO bars:
> > [    1.240774] pci 0000:01:05.0: [9710:9835] type 00 class 0x070002
> > [    1.247075] pci 0000:01:05.0: reg 10: [io  0xe050-0xe057]
> > [    1.252734] pci 0000:01:05.0: reg 14: [io  0xe040-0xe047]
> > [    1.258394] pci 0000:01:05.0: reg 18: [io  0xe030-0xe037]
> > [    1.264054] pci 0000:01:05.0: reg 1c: [io  0xe020-0xe027]
> > [    1.269713] pci 0000:01:05.0: reg 20: [io  0xe010-0xe017]
> > [    1.275372] pci 0000:01:05.0: reg 24: [io  0xe000-0xe00f]
> > 
> > so I am wondering if the back-ports in Xen 4.1 for dealing with
> > PCI have something to do with this? 
> 
> What backports are you thinking of? I just went through the titles
> of everything since 4.1.2, and nothing that has "PCI" in it looks
> in any way dangerous.

I know :-( That is why I am thinking it might be the kernel, but when
I did a git bisection it landed on an innocuous Documentation patch. Then
again, recent upstream kernels (say 3.5) have been doing some weird stuff
in the PCI space (for example, MSIs and BARs seem to be disabled, at least
on devices hidden with xen-pciback).
> 
> Jan


end of thread, other threads:[~2012-07-26 14:36 UTC | newest]

Thread overview: 23+ messages
2011-10-12 19:34 Xen 4 occasionally hangs during boot Christopher S. Aker
2011-10-13  7:04 ` Jan Beulich
2011-10-13 10:46 ` Tim Deegan
2012-07-20 17:48   ` Xen 4 serial " Christopher S. Aker
2012-07-20 17:49     ` Andrew Cooper
2012-07-20 17:58       ` Christopher S. Aker
2012-07-20 18:05         ` Andrew Cooper
2012-07-20 19:10           ` Christopher S. Aker
2012-07-20 19:25             ` Andrew Cooper
2012-07-20 19:31             ` Keir Fraser
2012-07-20 19:44               ` Christopher S. Aker
2012-07-20 19:59                 ` Keir Fraser
2012-07-23 14:13                   ` Konrad Rzeszutek Wilk
2012-07-23 15:26                     ` Keir Fraser
2012-07-23 20:53                   ` Christopher S. Aker
2012-07-23 22:03                     ` Malcolm Crossley
2012-07-23 22:45                     ` Malcolm Crossley
2012-07-24  9:40                     ` Andrew Cooper
2012-07-24 10:32                     ` Jan Beulich
2012-07-26 13:50                       ` Konrad Rzeszutek Wilk
2012-07-26 14:10                         ` Jan Beulich
2012-07-26 14:36                           ` Konrad Rzeszutek Wilk
2012-07-24 10:46                     ` Jan Beulich
