From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andrew Cooper
Subject: Re: Question about Xen reboot on panic
Date: Thu, 12 Nov 2015 12:52:56 +0000
Message-ID: <56448BA8.6080705@citrix.com>
References: <5643C716.1050102@citrix.com> <5643D091.7090503@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Meng Xu
Cc: Wei Liu , "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

On 12/11/15 02:10, Meng Xu wrote:
> Hi Andrew,
>
> 2015-11-11 18:34 GMT-05:00 Andrew Cooper :
>> On 11/11/2015 23:21, Meng Xu wrote:
>>>> Finally, I can't tell from your paste below, but ensure that you are
>>>> always using a debug hypervisor.
>>> The source file Config.mk under the xen folder has
>>> debug ?= y
>>>
>>> In addition, "xl dmesg |grep debug" gives me:
>>>
>>> (XEN) Xen version 4.6-unstable (root@) (gcc (Ubuntu/Linaro
>>> 4.6.3-1ubuntu5) 4.6.3) debug=y Wed Nov 11 17:06:30 EST 2015
>>>
>>> So I guess I'm using the debug hypervisor.
>> You are
>>
>>> I reboot the system after removing all of those useless options (that
>>> is, no more "reboot=k panic=2 panic_on_oops=1" in the Xen boot command
>>> line.)
>>>
>>> Is there anything else I can do to force Xen always reboot at panic or oops?
>> Unless you specify noreboot, Xen will try its hardest to reboot the
>> system. It is possible that you have a dodgy firmware which interacts
>> poorly with the default methods.
>>
>> Does normal reboot from dom0 work as intended?
> Yes. Before Xen crashes, I can reboot the machine dom0 or from a serial port.
>
>> If not, debug in the following order:
>>
>> * `reboot` from the dom0 shell
>> * `echo b > /proc/sysrq-trigger` from the dom0 shell
>> * `xl debug-keys R` from the dom0 shell
> All of these three approaches can reboot the machine successfully.
>
>
>> * CTRL-A x3, R from the serial console
> I think "Ctrl-A" means that I should press "Ctrl + A" three times. Am I correct?
> When I press Ctrl-A twice, it shows "No other window"; after I press
> "Ctrl-A" for three consecutive times, and press R (or r), it shows
> "+wrap" in the serial port.

In which case the program you are using locally to connect to the serial
console (Minicom / screen / putty?) is intercepting CTRL-a for its own
purposes. In screen, for example, you need to type CTRL-a a to send a
literal CTRL-a down the serial line.

>
> From the serial console, I can press "Ctrl+o, b" to reboot the machine
> when Xen hasn't crashed. But when Xen crashed, serial port won't work.

Ctrl-o b is exactly the same as `echo b > /proc/sysrq-trigger`.

>
> BTW, the serial port is an PCI serial port instead of the legacy
> serial port on the motherboard. Is the PCI serial port a problem? On
> another machine with the legacy serial port, I can use "Ctrl - o, b"
> to reboot even when system crashes. :-(

Once dom0 stops responding to its console, CTRL-o won't help you at all.

>
>> Those are the reboot options. It is also possible that a kexec kernel
>> is being loaded and that is getting stuck.
>>
>> The crash options are:
>> * `kexec -p`
>> * `echo c > /proc/sysrq-trigger`
>> * `xl debug-keys C`
> The "xl debug-keys c" will not reboot the system, but it will print
> out the crash message in the serial console.

Right - in which case there is a problem on the crash path, rather than
the reboot path. Are you (or rather, your dom0) loading a crash kernel?

>
>> If those don't work then you will need to start instrumenting Xen to
>> work out where stuff is going wrong.
> It seems all of the above commands work on my machine (except for the
> Ctrl-A x3, R). Is there anything else I can do to force the system
> reboot at panic?

Get CTRL-A working first. That is simply a configuration interaction
with the software you are using to connect to the serial console.
Once you get that working, you will be able to use debug keys from the
serial console itself, rather than via `xl debug-keys`.

After that, you should start putting printk()s in machine_restart() to
see where execution is actually getting to.

~Andrew
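
For reference, the reboot-path and crash-path checks discussed in this thread can be collected into a small dry-run helper. This is only a sketch: the function names (`reboot_steps`, `crash_steps`, `step_cmd`, `run_step`) and the `RUN` variable are made up for illustration, and only the commands themselves come from the thread. `kexec -p` is left out because it needs a kernel image argument that the thread does not give. Since a working step takes the machine down, the helper handles one step at a time and only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Hypothetical helper; only the listed commands are taken from the thread.

# Reboot-path checks, in the order suggested in the thread.
reboot_steps() {
    printf '%s\n' \
        'reboot' \
        'echo b > /proc/sysrq-trigger' \
        'xl debug-keys R'
}

# Crash-path checks from the thread (kexec -p omitted: it needs a kernel image).
crash_steps() {
    printf '%s\n' \
        'echo c > /proc/sysrq-trigger' \
        'xl debug-keys C'
}

# step_cmd PATH N: print the Nth command of the "reboot" or "crash" path.
step_cmd() {
    case "$1" in
        reboot) steps=$(reboot_steps) ;;
        crash)  steps=$(crash_steps) ;;
        *) echo "unknown path: $1" >&2; return 1 ;;
    esac
    printf '%s\n' "$steps" | sed -n "${2}p"
}

# run_step PATH N: show the command; execute it only when RUN=1.
run_step() {
    cmd=$(step_cmd "$1" "$2") || return 1
    if [ -z "$cmd" ]; then
        echo "no step $2 on the $1 path" >&2
        return 1
    fi
    echo "step $2 ($1 path): $cmd"
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$cmd"
    fi
}
```

For example, `run_step reboot 2` prints the second reboot check without running it. With `RUN=1` set, from a root shell in dom0, the same call would actually run the command, so expect the machine to reboot (or crash, for the crash path) if that path is healthy.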