* Question about Xen reboot on panic
From: Meng Xu @ 2015-11-11 22:49 UTC
  To: xen-devel; +Cc: Wei Liu

Hi all,

I'm working with my machine remotely, so I want Xen to reboot
automatically if it panics.

When Xen panics, it prints "Reboot in five seconds...". However, it
does not reboot. :-(

*** The xen boot command line on my machine is ***
multiboot   /boot/xen-4.6-unstable.gz placeholder
dom0_mem=8192M,max:8192M dom0_max_vcpus=8 sched=credit
com1=115200,8n1,pci console=com1,vga reboot=k panic=2 panic_on_oops=1

*** The dom0's boot command line on my machine is ***
 module  /boot/vmlinuz-3.8.0-44-generic placeholder root=/dev/sda5 ro
panic=10 panic_on_oops=1 console=hvc0,115200n8 earlyprintk=xen quiet
splash

I'm actually not sure whether I should add "reboot=k panic=2
panic_on_oops=1" to both command lines. But even if I remove them, Xen
still won't reboot on some panics.

For example, if I boot Xen 4.6-unstable with a modified RTDS
scheduler, it panics as follows:

(XEN) Xen call trace:
(XEN)    [<ffff82d080128479>] rt_vcpu_remove+0x87/0xf9
(XEN)    [<ffff82d08012ab06>] sched_destroy_vcpu+0x98/0xed
(XEN)    [<ffff82d08010571f>] complete_domain_destroy+0x52/0x17f
(XEN)    [<ffff82d0801360d8>] rcu_process_callbacks+0x149/0x1a6
(XEN)    [<ffff82d08012d243>] __do_softirq+0x133/0x147
(XEN)    [<ffff82d08012d28f>] process_pending_softirqs+0x38/0x3a
(XEN)    [<ffff82d0801afbff>] mwait_idle+0x10a/0x2fb
(XEN)    [<ffff82d0801640c2>] idle_loop+0x51/0x6b
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) GENERAL PROTECTION FAULT
(XEN) [error_code=0000]
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...

*** It says "Reboot in five seconds...", but I waited for 1 minute and
nothing happened. :-(


Does anyone know why Xen does not reboot in this situation?
(Xen does reboot if it fails on an ASSERT().)

Thank you very much for your time and help!

Best regards,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/

* Re: Question about Xen reboot on panic
From: Andrew Cooper @ 2015-11-11 22:54 UTC
  To: Meng Xu, xen-devel; +Cc: Wei Liu

On 11/11/2015 22:49, Meng Xu wrote:
> Hi all,
>
> I'm trying to work remotely with the machine I have. So I want Xen
> automatically reboot if it experiences a panic.
>
> When Xen panic, it says " Reboot in five seconds...". However, it does
> not reboot. :-(
>
> *** The xen boot command line on my machine is ***
> multiboot   /boot/xen-4.6-unstable.gz placeholder
> dom0_mem=8192M,max:8192M dom0_max_vcpus=8 sched=credit
> com1=115200,8n1,pci console=com1,vga reboot=k panic=2 panic_on_oops=1

Xen has no panic= or panic_on_oops= command line options.

Please read the command line documentation
(http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html).
It is written and kept up to date for a reason.

Furthermore, you are force-overriding the normal reboot path with the
keyboard controller.  You shouldn't need to do this unless you have some
very wonky firmware.
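The mix-up above (panic= and panic_on_oops= passed to Xen) is easy to catch mechanically. The C sketch below is only an illustration: it checks each option name against a hand-picked subset of Xen options that appear in this thread, which is an assumption and not the complete list in the command-line documentation, and is_known_xen_opt() and count_unknown_opts() are hypothetical helpers, not part of Xen.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed subset of Xen option names taken from this thread; the
 * complete list is in the Xen command-line documentation. */
static const char *const known_xen_opts[] = {
    "dom0_mem", "dom0_max_vcpus", "sched", "com1", "console",
    "reboot", "noreboot", "conswitch", NULL
};

/* True if the option name (the text before any '=') is in the subset. */
static bool is_known_xen_opt(const char *tok)
{
    size_t len = strcspn(tok, "=");

    for (size_t i = 0; known_xen_opts[i] != NULL; i++) {
        if (strlen(known_xen_opts[i]) == len &&
            strncmp(known_xen_opts[i], tok, len) == 0)
            return true;
    }
    return false;
}

/* Count space-separated options whose names are not in the subset.
 * Command lines longer than the local buffer are truncated. */
static size_t count_unknown_opts(const char *cmdline)
{
    char buf[512];
    size_t unknown = 0;

    strncpy(buf, cmdline, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
        if (!is_known_xen_opt(tok))
            unknown++;
    return unknown;
}
```

Fed the Xen options from the original post, this flags exactly two entries, panic=2 and panic_on_oops=1, which is the same pair Andrew points out.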

Finally, I can't tell from your paste below, but ensure that you are
always using a debug hypervisor.

~Andrew

* Re: Question about Xen reboot on panic
From: Meng Xu @ 2015-11-11 23:21 UTC
  To: Andrew Cooper; +Cc: Wei Liu, xen-devel

Hi Andrew,

First, thank you so much for your super fast reply! I really appreciate it. :-)

2015-11-11 17:54 GMT-05:00 Andrew Cooper <andrew.cooper3@citrix.com>:
> On 11/11/2015 22:49, Meng Xu wrote:
>> Hi all,
>>
>> I'm trying to work remotely with the machine I have. So I want Xen
>> automatically reboot if it experiences a panic.
>>
>> When Xen panic, it says " Reboot in five seconds...". However, it does
>> not reboot. :-(
>>
>> *** The xen boot command line on my machine is ***
>> multiboot   /boot/xen-4.6-unstable.gz placeholder
>> dom0_mem=8192M,max:8192M dom0_max_vcpus=8 sched=credit
>> com1=115200,8n1,pci console=com1,vga reboot=k panic=2 panic_on_oops=1
>
> Xen has no panic= or panic_on_oops= command line options.
>
> Please read the command line
> documentation(http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html).
> It is written and kept up to date for a reason.

Yes, I actually read it before I added those options. I wasn't sure
whether it was fully up to date, and I was hoping they might work anyway.
Now I know for sure they won't.

>
> Furthermore, you are force-overriding the normal reboot path with the
> keyboard controller.  You shouldn't need to do this unless you have some
> very wonky firmware.

I see. I removed it from the boot command line.

>
> Finally, I can't tell from your paste below, but ensure that you are
> always using a debug hypervisor.

The source file Config.mk under the xen folder has
debug ?= y

In addition,  "xl dmesg |grep debug" gives me:

(XEN) Xen version 4.6-unstable (root@) (gcc (Ubuntu/Linaro
4.6.3-1ubuntu5) 4.6.3) debug=y Wed Nov 11 17:06:30 EST 2015

So I guess I'm using the debug hypervisor.

I rebooted the system after removing all of those useless options (that
is, there is no more "reboot=k panic=2 panic_on_oops=1" on the Xen boot
command line).

Is there anything else I can do to force Xen to always reboot on a panic or oops?

Thank you very much for your help!

Best,

Meng

* Re: Question about Xen reboot on panic
From: Andrew Cooper @ 2015-11-11 23:34 UTC
  To: Meng Xu; +Cc: Wei Liu, xen-devel

On 11/11/2015 23:21, Meng Xu wrote:
>
>> Finally, I can't tell from your paste below, but ensure that you are
>> always using a debug hypervisor.
> The source file Config.mk under the xen folder has
> debug ?= y
>
> In addition,  "xl dmesg |grep debug" gives me:
>
> (XEN) Xen version 4.6-unstable (root@) (gcc (Ubuntu/Linaro
> 4.6.3-1ubuntu5) 4.6.3) debug=y Wed Nov 11 17:06:30 EST 2015
>
> So I guess I'm using the debug hypervisor.

You are

>
> I reboot the system after removing all of those useless options (that
> is, no more "reboot=k panic=2 panic_on_oops=1" in the Xen boot command
> line.)
>
> Is there anything else I can do to force Xen always reboot at panic or oops?

Unless you specify noreboot, Xen will try its hardest to reboot the
system.  It is possible that you have a dodgy firmware which interacts
poorly with the default methods.

Does normal reboot from dom0 work as intended?

If not, debug in the following order:

* `reboot` from the dom0 shell
* `echo b > /proc/sysrq-trigger` from the dom0 shell
* `xl debug-keys R` from the dom0 shell
* CTRL-A x3, R from the serial console

Those are the reboot options.  It is also possible that a kexec kernel
is being loaded and that is getting stuck.

The crash options are:
* `kexec -p`
* `echo c > /proc/sysrq-trigger`
* `xl debug-keys C`
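To make the reboot/crash distinction concrete, here is a deliberately simplified model of how a panic ends, written as a pure function so it can be tested. It is not Xen's panic() code: the ordering (attempt kexec_crash() first, then machine_restart() with a 5000 ms delay) is inferred from the instrumented log posted later in this thread, the noreboot branch follows Andrew's description, and the enum, struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical outcomes of the tail end of a hypervisor panic. */
enum panic_outcome {
    PANIC_ENTERS_CRASH_KERNEL, /* kexec_crash() jumped into a crash image */
    PANIC_NO_RESTART,          /* "noreboot": wait for a manual reset */
    PANIC_RESTARTS             /* machine_restart(5000) is reached */
};

struct panic_inputs {
    bool noreboot;            /* "noreboot" given on the Xen command line */
    bool crash_kernel_loaded; /* dom0 has loaded a crash image */
};

/* Simplified model: kexec_crash() only returns when no crash image is
 * loaded, so a loaded (or stuck) crash kernel pre-empts the restart. */
static enum panic_outcome model_panic_tail(const struct panic_inputs *in)
{
    if (in->crash_kernel_loaded)
        return PANIC_ENTERS_CRASH_KERNEL;
    if (in->noreboot)
        return PANIC_NO_RESTART;
    return PANIC_RESTARTS;
}
```

The practical point of the model is that "Reboot in five seconds..." is printed before either step runs, so seeing that message does not by itself show that the restart path was reached.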

If those don't work then you will need to start instrumenting Xen to
work out where stuff is going wrong.

~Andrew

* Re: Question about Xen reboot on panic
From: Meng Xu @ 2015-11-12  2:10 UTC
  To: Andrew Cooper; +Cc: Wei Liu, xen-devel

Hi Andrew,

2015-11-11 18:34 GMT-05:00 Andrew Cooper <andrew.cooper3@citrix.com>:
> On 11/11/2015 23:21, Meng Xu wrote:
>> Is there anything else I can do to force Xen always reboot at panic or oops?
>
> Unless you specify noreboot, Xen will try its hardest to reboot the
> system.  It is possible that you have a dodgy firmware which interacts
> poorly with the default methods.
>
> Does normal reboot from dom0 work as intended?

Yes. Before Xen crashes, I can reboot the machine from dom0 or from the serial port.

>
> If not, debug in the following order:
>
> * `reboot` from the dom0 shell
> * `echo b > /proc/sysrq-trigger` from the dom0 shell
> * `xl debug-keys R` from the dom0 shell

All three of these approaches reboot the machine successfully.


> * CTRL-A x3, R from the serial console
I think "Ctrl-A" means that I should press "Ctrl + A" three times. Am I correct?
When I press Ctrl-A twice, it shows "No other window"; after I press
"Ctrl-A" for three consecutive times, and press R (or r), it shows
"+wrap" in the serial port.

From the serial console, I can press "Ctrl+o, b" to reboot the machine
when Xen hasn't crashed. But once Xen has crashed, the serial port doesn't work.

BTW, the serial port is a PCI serial port instead of the legacy
serial port on the motherboard. Is the PCI serial port a problem? On
another machine with a legacy serial port, I can use "Ctrl-o, b"
to reboot even when the system crashes. :-(

>
> Those are the reboot options.  It is also possible that a kexec kernel
> is being loaded and that is getting stuck.
>
> The crash options are:
> * `kexec -p`
> * `echo c > /proc/sysrq-trigger`
> * `xl debug-keys C`

The "xl debug-keys c" will not reboot the system, but it will print
out the crash message in the serial console.

>
> If those don't work then you will need to start instrumenting Xen to
> work out where stuff is going wrong.

It seems all of the above commands work on my machine (except for the
Ctrl-A x3, R). Is there anything else I can do to force the system
reboot at panic?

Thank you very much for your time and help in this problem!

Best regards,

Meng

* Re: Question about Xen reboot on panic
From: Andrew Cooper @ 2015-11-12 12:52 UTC
  To: Meng Xu; +Cc: Wei Liu, xen-devel

On 12/11/15 02:10, Meng Xu wrote:
>> * CTRL-A x3, R from the serial console
> I think "Ctrl-A" means that I should press "Ctrl + A" three times. Am I correct?
> When I press Ctrl-A twice, it shows "No other window"; after I press
> "Ctrl-A" for three consecutive times, and press R (or r), it shows
> "+wrap" in the serial port.

In which case the program you are using locally to connect to the serial
console (Minicom / screen/ putty?) is intercepting CTRL-a for its own
purposes.

In screen for example, you need to send CTRL-a a to send a "CTRL-a" on
the serial.

>
> From the serial console, I can press "Ctrl+o, b" to reboot the machine
> when Xen hasn't crashed. But when Xen crashed, serial port won't work.

Ctrl-o b is exactly the same as `echo b > /proc/sysrq-trigger`.

>
> BTW, the serial port is an PCI serial port instead of the legacy
> serial port on the motherboard. Is the PCI serial port a problem? On
> another machine with the legacy serial port, I can use "Ctrl - o, b"
> to reboot even when system crashes. :-(

Once dom0 stops responding to its console, CTRL-o won't help you at all.

>
>> Those are the reboot options.  It is also possible that a kexec kernel
>> is being loaded and that is getting stuck.
>>
>> The crash options are:
>> * `kexec -p`
>> * `echo c > /proc/sysrq-trigger`
>> * `xl debug-keys C`
> The "xl debug-keys c" will not reboot the system, but it will print
> out the crash message in the serial console.

Right - in which case there is a problem on the crash path, rather than
the reboot path.

Are you (or rather, your dom0) loading a crash kernel?

>
>> If those don't work then you will need to start instrumenting Xen to
>> work out where stuff is going wrong.
> It seems all of the above commands work on my machine (except for the
> Ctrl-A x3, R). Is there anything else I can do to force the system
> reboot at panic?

Get CTRL-A working first.  That is simply a configuration interaction
with the software you are using to connect to the serial console.

Once you get that working, you will be able to use debug keys from the
serial console itself, rather than via `xl debug-keys`.

After that, you should start putting printk()s in machine_restart() to
see where execution is actually getting to.
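The printk() technique amounts to leaving breadcrumbs, so that the last message printed before the hang shows how far execution got. The sketch below shows the pattern in ordinary user-space C so it compiles on its own; TRACE_POINT(), toy_restart() and the last_trace/trace_count globals are hypothetical, and inside Xen the fprintf() call would be a printk().

```c
#include <stdio.h>
#include <string.h>

static char last_trace[64]; /* most recent breadcrumb, "function:line" */
static int trace_count;     /* number of breadcrumbs emitted so far */

/* Record and print where execution has got to. */
#define TRACE_POINT()                                                   \
    do {                                                                \
        snprintf(last_trace, sizeof(last_trace), "%s:%d",               \
                 __func__, __LINE__);                                   \
        trace_count++;                                                  \
        fprintf(stderr, "(TRACE) %s\n", last_trace);                    \
    } while (0)

/* Stand-in for a function under investigation; stop_early mimics
 * execution never getting past the first step. */
static int toy_restart(int stop_early)
{
    TRACE_POINT();
    if (stop_early)
        return -1;
    TRACE_POINT();
    return 0;
}
```

Using __func__ and __LINE__ means each breadcrumb identifies itself without hand-numbering, which helps when several are added to the same function.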

~Andrew

* Re: Question about Xen reboot on panic
From: Wei Liu @ 2015-11-12 12:57 UTC
  To: Andrew Cooper; +Cc: Wei Liu, Meng Xu, xen-devel

On Thu, Nov 12, 2015 at 12:52:56PM +0000, Andrew Cooper wrote:
> On 12/11/15 02:10, Meng Xu wrote:
> >> * CTRL-A x3, R from the serial console
> > I think "Ctrl-A" means that I should press "Ctrl + A" three times. Am I correct?
> > When I press Ctrl-A twice, it shows "No other window"; after I press
> > "Ctrl-A" for three consecutive times, and press R (or r), it shows
> > "+wrap" in the serial port.
> 
> In which case the program you are using locally to connect to the serial
> console (Minicom / screen/ putty?) is intercepting CTRL-a for its own
> purposes.
> 
> In screen for example, you need to send CTRL-a a to send a "CTRL-a" on
> the serial.
> 

And for minicom it is "Ctrl-a Ctrl-a" to send "Ctrl-a".
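The two escape conventions can be compared with a tiny, simplified model of a terminal program that reserves Ctrl-a as its command prefix. to_serial() is a hypothetical function, not how screen or minicom are implemented; it only captures the behaviour described above: Ctrl-a a (screen) or Ctrl-a Ctrl-a (minicom) sends one literal Ctrl-a, and any other key after the prefix is treated as a local command. That is consistent with Meng's screen session showing local messages instead of reaching Xen.

```c
#include <stddef.h>

#define CTRL_A 0x01

/* Simplified model of a terminal program that uses Ctrl-a as its
 * command prefix. After the prefix, `literal_key` makes the program
 * send a real Ctrl-a down the serial line ('a' for screen, CTRL_A for
 * minicom); any other key is consumed as a local command.
 * `out` must have room for at least n bytes. Returns bytes sent. */
static size_t to_serial(const char *keys, size_t n, char literal_key,
                        char *out)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        if (keys[i] == CTRL_A) {
            if (i + 1 < n && keys[i + 1] == literal_key)
                out[sent++] = CTRL_A;
            i++; /* the key after the prefix is never sent as-is */
        } else {
            out[sent++] = keys[i];
        }
    }
    return sent;
}
```

In this model, typing Ctrl-a three times and then R in screen sends nothing at all to the serial line, while Ctrl-a a typed three times followed by R sends three Ctrl-a bytes and the R.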

* Re: Question about Xen reboot on panic
From: Ian Campbell @ 2015-11-12 13:16 UTC
  To: Wei Liu, Andrew Cooper; +Cc: Meng Xu, xen-devel

On Thu, 2015-11-12 at 12:57 +0000, Wei Liu wrote:
> On Thu, Nov 12, 2015 at 12:52:56PM +0000, Andrew Cooper wrote:
> > 
> > In screen for example, you need to send CTRL-a a to send a "CTRL-a" on
> > the serial.
> > 
> 
> And for minicom it is "Ctrl-a Ctrl-a" to send "Ctrl-a".

Now run minicom in screen ;-) (Ctrl-a a ?)

I recommend using conswitch=x (for any letter x you like, although x is
often good) on your hypervisor command line to change the Xen debug key to
something which doesn't clash with things.
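The switch is keyed on a control character: with the default you type Ctrl-a three times (the "CTRL-A x3" above), and conswitch=x makes it Ctrl-x (0x18), which neither screen nor minicom uses as its command prefix by default. The sketch below models just the three-in-a-row counter; conswitch_state, ctrl_of() and conswitch_feed() are hypothetical names, and the model simplifies what Xen actually does with the swallowed switch characters.

```c
#include <stdbool.h>

/* Simplified model (not Xen's code) of switching serial input between
 * dom0 and Xen after three consecutive switch characters. */
struct conswitch_state {
    char switch_char;  /* control character, e.g. Ctrl-a (0x01) */
    int run;           /* consecutive switch characters seen */
    bool input_to_xen; /* false: input goes to dom0 */
};

/* Ctrl-<letter> keeps only the low 5 bits of the letter's code. */
static char ctrl_of(char letter)
{
    return (char)(letter & 0x1f);
}

/* Feed one received byte. Returns true if it is delivered to the current
 * input target; switch characters are swallowed in this model. */
static bool conswitch_feed(struct conswitch_state *s, char c)
{
    if (c == s->switch_char) {
        if (++s->run == 3) {
            s->input_to_xen = !s->input_to_xen;
            s->run = 0;
        }
        return false;
    }
    s->run = 0;
    return true;
}
```

Any other byte resets the counter, so an interleaved key (for example one injected by a local terminal program) is enough to stop the switch from happening.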

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* Re: Question about Xen reboot on panic
From: Meng Xu @ 2015-11-12 15:07 UTC
  To: Andrew Cooper; +Cc: Wei Liu, xen-devel

Hi Andrew,

2015-11-12 7:52 GMT-05:00 Andrew Cooper <andrew.cooper3@citrix.com>:
> On 12/11/15 02:10, Meng Xu wrote:
>>> * CTRL-A x3, R from the serial console
>> I think "Ctrl-A" means that I should press "Ctrl + A" three times. Am I correct?
>> When I press Ctrl-A twice, it shows "No other window"; after I press
>> "Ctrl-A" for three consecutive times, and press R (or r), it shows
>> "+wrap" in the serial port.
>
> In which case the program you are using locally to connect to the serial
> console (Minicom / screen/ putty?) is intercepting CTRL-a for its own
> purposes.

I'm using screen.

>
> In screen for example, you need to send CTRL-a a to send a "CTRL-a" on
> the serial.
>

I see. After using Ctrl-a a to send Ctrl-a to the serial line, I can
reboot the machine while the Xen kernel is working well.
However, when the Xen kernel crashes, I cannot switch to Xen's (debug)
console to reboot it.

>>
>> From the serial console, I can press "Ctrl+o, b" to reboot the machine
>> when Xen hasn't crashed. But when Xen crashed, serial port won't work.
>
> Ctrl-o b is exactly the same as `echo b > /proc/sysrq-trigger`.
>
>>
>> BTW, the serial port is an PCI serial port instead of the legacy
>> serial port on the motherboard. Is the PCI serial port a problem? On
>> another machine with the legacy serial port, I can use "Ctrl - o, b"
>> to reboot even when system crashes. :-(
>
> Once dom0 stops responding to its console, CTRL-o won't help you at all.

>
>>
>>> Those are the reboot options.  It is also possible that a kexec kernel
>>> is being loaded and that is getting stuck.
>>>
>>> The crash options are:
>>> * `kexec -p`
>>> * `echo c > /proc/sysrq-trigger`
>>> * `xl debug-keys C`
>> The "xl debug-keys c" will not reboot the system, but it will print
>> out the crash message in the serial console.
>
> Right - in which case there is a problem on the crash path, rather than
> the reboot path.

Yes. I had a look at xl debug-keys before. ;-)

>
> Are you (or rather, your dom0) loading a crash kernel?

Sorry, I don't quite understand this sentence.
Do you mean: did I load a crash Xen kernel when I booted the machine?
What I did is:
I booted a buggy Xen kernel (I know how to make its scheduler crash)
and caused Xen to crash by creating and destroying a VM. Xen crashes
when the VM is destroyed, and then I try to reboot the machine.

dom0's kernel is unmodified.

>
>>
>>> If those don't work then you will need to start instrumenting Xen to
>>> work out where stuff is going wrong.
>> It seems all of the above commands work on my machine (except for the
>> Ctrl-A x3, R). Is there anything else I can do to force the system
>> reboot at panic?
>
> Get CTRL-A working first.  That is simply a configuration interaction
> with the software you are using to connect to the serial console.

Yes, it works when the Xen kernel is working fine.

>
> Once you get that working, you will be able to use debug keys from the
> serial console itself, rather than via `xl debug-keys`.
>
> After that, you should start putting printk()s in machine_restart() to
> see where execution is actually getting to.

so that I can try to debug and fix the issue (if there is an issue
in that code path). Am I right?

Thank you very much for your help! :-)

Best regards,

Meng

* Re: Question about Xen reboot on panic
From: Meng Xu @ 2015-11-12 15:09 UTC
  To: Ian Campbell; +Cc: Andrew Cooper, Wei Liu, xen-devel

Hi Ian and Wei,

2015-11-12 8:16 GMT-05:00 Ian Campbell <ian.campbell@citrix.com>:
> On Thu, 2015-11-12 at 12:57 +0000, Wei Liu wrote:
>> On Thu, Nov 12, 2015 at 12:52:56PM +0000, Andrew Cooper wrote:
>> >
>> > In screen for example, you need to send CTRL-a a to send a "CTRL-a" on
>> > the serial.
>> >
>>
>> And for minicom it is "Ctrl-a Ctrl-a" to send "Ctrl-a".
>
> Now run minicom in screen ;-) (Ctrl-a a ?)
>
> I recommend using conswitch=x (for any letter x you like, although x is
> often good) on your hypervisor command line to change the Xen debug key to
> something which doesn't clash with things.

Thank you very much for your suggestions! :-) Pressing "Ctrl-a a" three
times works well in screen while the Xen kernel hasn't crashed, but it
doesn't work once the Xen kernel has crashed. :-(

I will put some printk()s in machine_restart() as Andrew suggested, to
see if I can find more information. If you know of any other approaches
I could try, could you let me know?

Thank you very much!

Best,

Meng

* Re: Question about Xen reboot on panic
From: Meng Xu @ 2015-11-12 16:13 UTC
  To: Andrew Cooper; +Cc: Wei Liu, xen-devel

Hi Andrew,

I think I may have found where the system gets stuck.

As you suggested, I added several printk()s inside machine_restart().
If the machine restarts when the Xen kernel crashes, I see the following output:

        umount: /run/lock: not mounted
        umount: /run/shm: not mounted
         * Will now restart
        [  122.261583] Restarting system.
        (XEN) Domain 0 shutdown: rebooting machine.
        (XEN) machine_restart start running
(This is what I added at the first line of the machine_restart())
        (XEN) machine_restart start running
        (XEN) reboot_type=97
        (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

So when the machine reboots correctly on a Xen kernel crash,
machine_restart() is called twice.

After looking into the code, I found the following code in the
machine_restart(), which is quite suspicious.

    if ( system_state >= SYS_STATE_smp_boot )
    {
        local_irq_enable();

        /* Ensure we are the boot CPU. */
        if ( get_apic_id() != boot_cpu_physical_apicid )
        {
            /* Send IPI to the boot CPU (logical cpu 0). */
            on_selected_cpus(cpumask_of(0), __machine_restart,
                             &delay_millisecs, 0);
            for ( ; ; )
                halt();
        }

        smp_send_stop();
    }

This code basically sends an IPI from the current CPU to ask the boot
CPU to run machine_restart(), and then the current CPU halts.

If the boot CPU misses the IPI, machine_restart() will never be
called on it and the system hangs. Am I correct?

If I'm correct, how should I fix this? Should I just let the current
CPU keep sending the IPI to the boot CPU until it runs machine_restart()?
This seems too hacky to me, but I'm not quite sure why we have to use
the boot CPU to restart. If we could let any CPU reset the CPU state
and reboot, we could avoid this.

Or is it because system_state is not set correctly? If we could
avoid entering the if statement, we would also avoid this problem.
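Whether this forwarding logic can get stuck can be explored with a single-threaded toy model. It is a sketch under stated assumptions, not Xen's implementation: per-CPU get_apic_id() values and boot_cpu_physical_apicid are plain fields, the system_state check is reduced to a boolean, the IPI is always delivered, and a CPU that forwards the restart never runs again. Under those assumptions a restart that starts on a non-boot CPU runs machine_restart() twice, consistent with the two "machine_restart start running" lines in the log above, and it can only fail if logical CPU 0 itself does not match the recorded boot APIC ID.

```c
#include <stdbool.h>

#define MAX_CPUS 8

/* Toy model of the CPU-selection logic quoted above; not Xen code. */
struct restart_model {
    unsigned int apic_id[MAX_CPUS]; /* get_apic_id() on each logical CPU */
    unsigned int boot_apic_id;      /* boot_cpu_physical_apicid */
    bool smp_active;                /* system_state >= SYS_STATE_smp_boot */
};

/* Starting on logical CPU `cpu` (< MAX_CPUS), return how many times
 * machine_restart() runs before a CPU passes the boot-CPU check and goes
 * on towards the reset, or -1 if no CPU ever passes it. */
static int model_machine_restart(const struct restart_model *m,
                                 unsigned int cpu)
{
    int calls = 0;

    for (;;) {
        calls++;
        if (!m->smp_active || m->apic_id[cpu] == m->boot_apic_id)
            return calls; /* this CPU continues towards the reset */
        if (cpu == 0)
            return -1;    /* CPU 0 would only forward the restart to itself */
        cpu = 0;          /* IPI logical CPU 0, then halt() forever */
    }
}
```

The failing case is a CPU 0 whose APIC ID differs from the recorded boot APIC ID, which is worth comparing against the "On P0" output later in the thread.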

Do you have any suggestions?

Thank you very much for your help!

Best,

Meng

* Re: Question about Xen reboot on panic
From: Meng Xu @ 2015-11-12 16:57 UTC
  To: Andrew Cooper; +Cc: Wei Liu, xen-devel

I think machine_restart() may have a bug. :-(

2015-11-12 11:13 GMT-05:00 Meng Xu <xumengpanda@gmail.com>:
> After looking into the code, I found the following code in the
> machine_restart(), which is quite suspicious.
>
>     if ( system_state >= SYS_STATE_smp_boot )
>     {
>         local_irq_enable();
>
>         /* Ensure we are the boot CPU. */
>         if ( get_apic_id() != boot_cpu_physical_apicid )

If we are on the boot CPU and the if statement returns true

>
>         {
>             /* Send IPI to the boot CPU (logical cpu 0). */
>             on_selected_cpus(cpumask_of(0), __machine_restart,
>                              &delay_millisecs, 0);

we will send an IPI from CPU 0 to CPU to run machine_restart.

>
>             for ( ; ; )
>                 halt();

and CPU 0 will halt immediately.

If the IPI arrives later on CPU 0, CPU 0 won't be able to handle it,
since it has been halted.

*** I have one solution in mind ***
Maybe we should check whether the current CPU is CPU 0 using
smp_processor_id(). My only concern is that I'm not sure whether
machine_restart() could be rescheduled by the Xen scheduler onto another
CPU after we call smp_processor_id().
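A toy comparison of the two checks, assuming the situation in the log below, where the restart runs on P0 yet still takes the forwarding branch. This is illustrative only: cpu_view and cpu0_would_reset() are hypothetical, the model says nothing about whether the change would be safe in Xen, and it does not explain why the APIC IDs would disagree in the first place.

```c
#include <stdbool.h>

/* Toy per-CPU view; names are hypothetical. */
struct cpu_view {
    unsigned int logical_id; /* what smp_processor_id() would return */
    unsigned int apic_id;    /* what get_apic_id() would return */
};

/* Would CPU 0 conclude that it is the boot CPU (and so do the reset)?
 * use_logical_id = false: compare APIC IDs, as the quoted code does.
 * use_logical_id = true:  compare the logical ID with 0, as proposed,
 * since the quoted comment calls the boot CPU "logical cpu 0". */
static bool cpu0_would_reset(const struct cpu_view *cpu0,
                             unsigned int boot_apic_id, bool use_logical_id)
{
    if (use_logical_id)
        return cpu0->logical_id == 0;
    return cpu0->apic_id == boot_apic_id;
}
```

The two checks only disagree when the stored boot APIC ID does not match what CPU 0 reports, so the logical-ID check would mask that mismatch rather than explain it.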

*** The result below confirms my guess ***
I printed out the current CPU that sends the IPI, and the following
result confirms my speculation:

(XEN) Reboot in five seconds...
(XEN) now we should see: before kexec_crash
(XEN) before kexec_crash
(XEN) after kexec_crash
(XEN) machine_restart start running, delay_millisecs=5000
(XEN) machine_restart: finished console_start_sync, system_state is 3
(XEN) On P0
As this line suggests, P0 sends P0 an IPI and P0 goes to halt immediately...

Thanks,

Meng

* Re: Question about Xen reboot on panic
From: Jan Beulich @ 2015-11-12 17:08 UTC
  To: Meng Xu; +Cc: Andrew Cooper, Wei Liu, xen-devel

>>> On 12.11.15 at 17:57, <xumengpanda@gmail.com> wrote:
>> After looking into the code, I found the following code in the
>> machine_restart(), which is quite suspicious.
>>
>>     if ( system_state >= SYS_STATE_smp_boot )
>>     {
>>         local_irq_enable();
>>
>>         /* Ensure we are the boot CPU. */
>>         if ( get_apic_id() != boot_cpu_physical_apicid )
> 
> If we are at the boot CPU and the if statement return true
> 
>>
>>         {
>>
>>             /* Send IPI to the boot CPU (logical cpu 0). */
>>
>>             on_selected_cpus(cpumask_of(0), __machine_restart,
>>
>>                              &delay_millisecs, 0);
> 
> we will send an IPI from CPU 0 to CPU to run machine_restart.

The other way around you mean.

>>
>>             for ( ; ; )
>>
>>                 halt();
> 
> and CPU 0 will halt immediately.
> 
> If the IPI arrives later on CPU 0, CPU 0 won't be able to handle it,
> since it has been halted.

It's CPUn that gets halted, not CPU0. This ...

> (XEN) On P0
> As this line suggests, P0 sends P0 an IPI and P0 goes to halt immediately...

... is suspicious: Is boot_cpu_physical_apicid not set correctly?
Or is get_apic_id() returning rubbish?

Jan
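To make the control flow in Jan's reply concrete, here is a small hosted C model of the branch quoted from machine_restart(). It is a sketch under stated assumptions, not Xen code. `model_restart_branch()`, `boot_cpu_would_halt()` and the `restart_action` enum are hypothetical. The ordering of the `sys_state` values other than `SYS_STATE_smp_boot` is also assumed. The model shows that the CPU which halts is the non-boot CPU n, after it hands `__machine_restart` to CPU 0. The boot CPU can reach the halt loop only if the APIC ID it reads disagrees with `boot_cpu_physical_apicid`, which is exactly what Jan's two questions probe.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed ordering of system_state values; only SYS_STATE_smp_boot is
 * named in the thread, the others are illustrative.
 */
enum sys_state {
    SYS_STATE_early_boot,
    SYS_STATE_boot,
    SYS_STATE_smp_boot,
    SYS_STATE_active,
};

/* What the CPU that entered the restart path ends up doing (hypothetical). */
enum restart_action {
    RESTART_ON_THIS_CPU,  /* falls through to smp_send_stop() and the reset */
    IPI_CPU0_THEN_HALT,   /* sends __machine_restart to CPU 0, then halts   */
};

/*
 * Hypothetical model of the quoted branch: 'apic_id' stands in for
 * get_apic_id() on the calling CPU, 'boot_apicid' for
 * boot_cpu_physical_apicid.
 */
static enum restart_action model_restart_branch(enum sys_state state,
                                                unsigned int apic_id,
                                                unsigned int boot_apicid)
{
    assert(state <= SYS_STATE_active);  /* only the modelled states */

    if ( state >= SYS_STATE_smp_boot && apic_id != boot_apicid )
        return IPI_CPU0_THEN_HALT;      /* CPU n halts; CPU 0 keeps running */
    return RESTART_ON_THIS_CPU;
}

/*
 * The boot CPU only takes the halt branch if the ID it reads does not
 * match the recorded boot APIC ID, i.e. one of the two values is wrong.
 */
static bool boot_cpu_would_halt(unsigned int apic_id_read_on_cpu0,
                                unsigned int boot_apicid)
{
    return model_restart_branch(SYS_STATE_active, apic_id_read_on_cpu0,
                                boot_apicid) == IPI_CPU0_THEN_HALT;
}
```

For example, `model_restart_branch(SYS_STATE_active, 5, 0)` yields `IPI_CPU0_THEN_HALT`. The same call with APIC ID 0 yields `RESTART_ON_THIS_CPU`.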


* Re: Question about Xen reboot on panic
  2015-11-12 17:08                 ` Jan Beulich
@ 2015-11-12 19:54                   ` Meng Xu
  2015-11-13  7:39                     ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Meng Xu @ 2015-11-12 19:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

Hi Jan,

2015-11-12 12:08 GMT-05:00 Jan Beulich <JBeulich@suse.com>:
>>>> On 12.11.15 at 17:57, <xumengpanda@gmail.com> wrote:
>>> After looking into the code, I found the following code in the
>>> machine_restart(), which is quite suspicious.
>>>
>>>     if ( system_state >= SYS_STATE_smp_boot )
>>>
>>>     {
>>>
>>>         local_irq_enable();
>>>
>>>
>>>         /* Ensure we are the boot CPU. */
>>>
>>>         if ( get_apic_id() != boot_cpu_physical_apicid )
>>
>> If we are at the boot CPU and the if statement return true
>>
>>>
>>>         {
>>>
>>>             /* Send IPI to the boot CPU (logical cpu 0). */
>>>
>>>             on_selected_cpus(cpumask_of(0), __machine_restart,
>>>
>>>                              &delay_millisecs, 0);
>>
>> we will send an IPI from CPU 0 to CPU to run machine_restart.
>
> The other way around you mean.
>
>>>
>>>             for ( ; ; )
>>>
>>>                 halt();
>>
>> and CPU 0 will halt immediately.
>>
>> If the IPI arrives later on CPU 0, CPU 0 won't be able to handle it,
>> since it has been halted.
>
> It's CPUn that gets halted, not CPU0. This ...

You are right. When system_state >= SYS_STATE_smp_boot and the current
CPU i is not the boot CPU (i != 0), CPU i sends an IPI to CPU 0 and then
halts itself.


>
>> (XEN) On P0
>> As this line suggests, P0 sends P0 an IPI and P0 goes to halt immediately...
>
> ... is suspicious: Is boot_cpu_physical_apicid not set correctly?
> Or is get_apic_id() returning rubbish?

After printing out boot_cpu_physical_apicid and get_apic_id(), I
found that both are correct!

However, the line after that if statement is:
smp_send_stop();

which is not inside the if ( get_apic_id() != boot_cpu_physical_apicid ) block.

So P0 may run this code. From what I read, smp_send_stop() contains
the following code:

    local_irq_disable();
    __stop_this_cpu();
    disable_IO_APIC();
    hpet_disable();
    local_irq_enable();

My guess is that when __stop_this_cpu() runs on P0, P0 is stopped.
That's why P0 never gets the chance to proceed to the rest of the logic
in machine_restart(), and therefore the machine won't restart.

If I move this smp_send_stop() call into the if statement, Xen reboots.

Do you think this could be a fix?
If I misunderstood anything, please let me know...

Thanks,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


* Re: Question about Xen reboot on panic
  2015-11-12 19:54                   ` Meng Xu
@ 2015-11-13  7:39                     ` Jan Beulich
  2015-11-19  3:58                       ` Meng Xu
  0 siblings, 1 reply; 17+ messages in thread
From: Jan Beulich @ 2015-11-13  7:39 UTC (permalink / raw)
  To: Meng Xu; +Cc: Andrew Cooper, Wei Liu, xen-devel

>>> On 12.11.15 at 20:54, <xumengpanda@gmail.com> wrote:
> However, the line after that if statement is:
> smp_send_stop();
> 
> which is not in the if ( get_apic_id() != boot_cpu_physical_apicid ) 
> statement.
> 
> So P0 may run this code, and from what I read from this
> smp_send_stop(), it has the following code:
> 
>     local_irq_disable();
> 
>     __stop_this_cpu();
> 
>     disable_IO_APIC();
> 
>     hpet_disable();
> 
>     local_irq_enable();
> 
> I'm guessing at __stop_this_cpu() when it is on P0, P0 will be
> stopped. That's why P0 will never have the chance to proceed to the
> rest of logic in the machine_restart(). Therefore, the machine won't
> restart.

The code is quite clear in this regard - smp_send_stop() stops all other
CPUs, but calls only __stop_this_cpu() (not stop_this_cpu()) for itself.
I.e. execution is at least supposed to make it back to the caller. Also
please don't forget that this is working for most everyone else, so
what you're looking for is more likely some oddity on your system, not
some general issue.

(Btw - are you doing this on master, which is what I'd expect you to?
I ask because the sequence of calls you quote above doesn't match
with what I see on there. I'd really like to avoid hunting a problem
long fixed.)

> If I move this  smp_send_stop(void) into the if statement, Xen will reboot.
> 
> Do you think this could be a fix?

Definitely not.

Jan
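Jan's point about smp_send_stop() can be illustrated with a minimal hosted C model. This is a sketch that rests on assumptions, not Xen's implementation. The `model_cpu` struct, `MODEL_NR_CPUS`, and all `model_*` functions are hypothetical. The halt loop is replaced by a flag so that the model terminates. The model captures only the distinction Jan draws: the other CPUs run a stop routine that never returns (stop_this_cpu()). The calling CPU runs only the local teardown (__stop_this_cpu()) and then returns to machine_restart().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU count, for the model only. */
#define MODEL_NR_CPUS 8

/* Hypothetical per-CPU state tracked by this model. */
struct model_cpu {
    bool torn_down;  /* local teardown (__stop_this_cpu()-like) has run */
    bool halted;     /* parked forever; never returns to its caller     */
};

static struct model_cpu cpus[MODEL_NR_CPUS];

/* Put every modelled CPU back into the running state. */
static void model_reset(void)
{
    for ( int cpu = 0; cpu < MODEL_NR_CPUS; cpu++ )
    {
        cpus[cpu].torn_down = false;
        cpus[cpu].halted = false;
    }
}

/* Like __stop_this_cpu(): local teardown only, then it returns. */
static void model_stop_this_cpu_local(int cpu)
{
    cpus[cpu].torn_down = true;
}

/* Like stop_this_cpu(): teardown, then park forever (modelled by a flag). */
static void model_stop_this_cpu(int cpu)
{
    model_stop_this_cpu_local(cpu);
    cpus[cpu].halted = true;  /* real code loops on halt() and never returns */
}

/* The shape of smp_send_stop() as Jan describes it. */
static void model_smp_send_stop(int self)
{
    assert(self >= 0 && self < MODEL_NR_CPUS);

    /* Every other CPU gets the non-returning stop (in Xen, via an IPI). */
    for ( int cpu = 0; cpu < MODEL_NR_CPUS; cpu++ )
        if ( cpu != self )
            model_stop_this_cpu(cpu);

    /* The caller only does the local teardown ... */
    model_stop_this_cpu_local(self);
    /* ... and returns, so machine_restart() can go on to reset the machine. */
}
```

After `model_reset(); model_smp_send_stop(0);`, CPU 0 is torn down but not halted, while every other modelled CPU is halted. This is why P0 is expected to continue to the reset logic.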


* Re: Question about Xen reboot on panic
  2015-11-13  7:39                     ` Jan Beulich
@ 2015-11-19  3:58                       ` Meng Xu
  2015-11-19  7:26                         ` Jan Beulich
  0 siblings, 1 reply; 17+ messages in thread
From: Meng Xu @ 2015-11-19  3:58 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Cooper, Wei Liu, xen-devel

2015-11-13 2:39 GMT-05:00 Jan Beulich <JBeulich@suse.com>:
>>>> On 12.11.15 at 20:54, <xumengpanda@gmail.com> wrote:
>> However, the line after that if statement is:
>> smp_send_stop();
>>
>> which is not in the if ( get_apic_id() != boot_cpu_physical_apicid )
>> statement.
>>
>> So P0 may run this code, and from what I read from this
>> smp_send_stop(), it has the following code:
>>
>>     local_irq_disable();
>>
>>     __stop_this_cpu();
>>
>>     disable_IO_APIC();
>>
>>     hpet_disable();
>>
>>     local_irq_enable();
>>
>> I'm guessing at __stop_this_cpu() when it is on P0, P0 will be
>> stopped. That's why P0 will never have the chance to proceed to the
>> rest of logic in the machine_restart(). Therefore, the machine won't
>> restart.
>
> The code is quite clear in this regard - smp_send_stop() stops all other
> CPUs, but calls only __stop_this_cpu() (not stop_this_cpu()) for itself.
> I.e. execution is at least supposed to make it back to the caller. Also
> please don't forget that this is working for most everyone else, so
> what you're looking for is more likely some oddity on your system, not
> some general issue.

I see. Hmm, maybe it is because of some oddity on my machine, which is
not a commodity machine but assembled from components. :-(

>
> (Btw - are you doing this on master, which is what I'd expect you to?
> I ask because the sequence of calls you quote above doesn't match
> with what I see on there. I'd really like to avoid hunting a problem
> long fixed.)

Not really. I added several commits on top of master and then deliberately
"buried" a bug in the scheduler to make the system crash when I destroy a
VM. Because the reboot issue only appears when Xen crashes in some cases,
I used that bug to test whether Xen can successfully reboot after a
crash. I haven't experienced a crash on master.

I will come back when I experience the reboot issue on Xen master.
But I still think that Xen should reboot even when other parts of
Xen (not the reboot logic) have a bug. Maybe I'm wrong?

>
>> If I move this  smp_send_stop(void) into the if statement, Xen will reboot.
>>
>> Do you think this could be a fix?
>
> Definitely not.

I see the reason now...
This issue does not happen on another machine of mine. Probably it is
because of the oddity on the assembled machine, as Jan said. :-(

Thank you very much for your help and advice!

Best,

Meng



-- 


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/


* Re: Question about Xen reboot on panic
  2015-11-19  3:58                       ` Meng Xu
@ 2015-11-19  7:26                         ` Jan Beulich
  0 siblings, 0 replies; 17+ messages in thread
From: Jan Beulich @ 2015-11-19  7:26 UTC (permalink / raw)
  To: Meng Xu; +Cc: Andrew Cooper, Wei Liu, xen-devel

>>> On 19.11.15 at 04:58, <xumengpanda@gmail.com> wrote:
> I will come back when I experience the reboot issue on Xen master.
> But I still think that Xen should reboot even when the other parts of
> Xen (not the reboot logic) has a bug. Maybe I'm wrong?

No, that's a perfectly valid expectation. Just that what you proposed
as a fix so far was based on specific observations on that one (non-
standard, as you now say) machine, but any adjustment we make
needs to be universal. To help with evaluating possibilities we depend
- even more so with the machine specifics in mind - on you providing
us with enough data.

Jan


Thread overview: 17+ messages
2015-11-11 22:49 Question about Xen reboot on panic Meng Xu
2015-11-11 22:54 ` Andrew Cooper
2015-11-11 23:21   ` Meng Xu
2015-11-11 23:34     ` Andrew Cooper
2015-11-12  2:10       ` Meng Xu
2015-11-12 12:52         ` Andrew Cooper
2015-11-12 12:57           ` Wei Liu
2015-11-12 13:16             ` Ian Campbell
2015-11-12 15:09               ` Meng Xu
2015-11-12 15:07           ` Meng Xu
2015-11-12 16:13             ` Meng Xu
2015-11-12 16:57               ` Meng Xu
2015-11-12 17:08                 ` Jan Beulich
2015-11-12 19:54                   ` Meng Xu
2015-11-13  7:39                     ` Jan Beulich
2015-11-19  3:58                       ` Meng Xu
2015-11-19  7:26                         ` Jan Beulich
