From: Meng Xu
Subject: Re: Question about Xen reboot on panic
Date: Thu, 12 Nov 2015 11:13:26 -0500
References: <5643C716.1050102@citrix.com> <5643D091.7090503@citrix.com> <56448BA8.6080705@citrix.com>
To: Andrew Cooper
Cc: Wei Liu, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

Hi Andrew,

I think I may have found where the system gets stuck.

As you suggested, I added several printks inside machine_restart().

When the machine does restart after a Xen kernel crash, I see the following output:

    umount: /run/lock: not mounted
    umount: /run/shm: not mounted
     * Will now restart
    [ 122.261583] Restarting system.
    (XEN) Domain 0 shutdown: rebooting machine.
    (XEN) machine_restart start running
    (XEN) machine_restart start running
    (XEN) reboot_type=97
    (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.

("machine_restart start running" is the printk I added as the first line of machine_restart().)

So when the machine reboots correctly after a Xen kernel crash, machine_restart() is called twice.

After looking into the code, I found the following in machine_restart(), which looks quite suspicious:

    if ( system_state >= SYS_STATE_smp_boot )
    {
        local_irq_enable();

        /* Ensure we are the boot CPU. */
        if ( get_apic_id() != boot_cpu_physical_apicid )
        {
            /* Send IPI to the boot CPU (logical cpu 0). */
            on_selected_cpus(cpumask_of(0), __machine_restart,
                             &delay_millisecs, 0);
            for ( ; ; )
                halt();
        }

        smp_send_stop();
    }

This code basically tries to send an IPI from the current CPU to tell the boot CPU to run machine_restart(), and then the current CPU goes into halt(). If the boot CPU misses the IPI, machine_restart() will never run on the boot CPU and the system hangs. Am I correct?
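
To make the scenario I am asking about concrete, here is a small user-space model of it. This is only a sketch and not Xen code: struct model, model_restart(), model_send_ipi_to_boot_cpu(), simulate() and model_machine_reboots() are hypothetical names, and the ipi_lost flag simply assumes that the IPI can be missed, which is exactly the part I have not confirmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not Xen code: two CPUs, cpu[0] is the boot CPU. */
enum cpu_state { CPU_RUNNING, CPU_HALTED, CPU_RESTARTING };

struct model {
    enum cpu_state cpu[2];
    bool ipi_lost;      /* ASSUMPTION under test: the IPI can be missed */
    int restart_calls;  /* times the restart path was entered (the printk) */
};

static void model_restart(struct model *m, int this_cpu);

/*
 * Stand-in for the on_selected_cpus(cpumask_of(0), ...) call with wait == 0:
 * fire-and-forget, so the sender never learns whether it was delivered.
 */
static void model_send_ipi_to_boot_cpu(struct model *m)
{
    if (!m->ipi_lost)
        model_restart(m, 0);
}

/* Models only the quoted branch of machine_restart(). */
static void model_restart(struct model *m, int this_cpu)
{
    m->restart_calls++;

    if (this_cpu != 0) {
        model_send_ipi_to_boot_cpu(m);
        m->cpu[this_cpu] = CPU_HALTED;  /* models: for ( ; ; ) halt(); */
        return;
    }

    m->cpu[0] = CPU_RESTARTING;         /* boot CPU carries on to the reset */
}

/* Run the model with the restart path first entered on entry_cpu. */
static struct model simulate(int entry_cpu, bool ipi_lost)
{
    struct model m = { { CPU_RUNNING, CPU_RUNNING }, ipi_lost, 0 };

    assert(entry_cpu == 0 || entry_cpu == 1);
    model_restart(&m, entry_cpu);
    return m;
}

/* The machine only reboots if the boot CPU reached the restart path. */
static bool model_machine_reboots(const struct model *m)
{
    return m->cpu[0] == CPU_RESTARTING;
}
```

In this model, simulate(1, false) enters the restart path twice (once on each CPU), which matches the two "machine_restart start running" lines above, while simulate(1, true) enters it once, leaves CPU 1 halted, and the boot CPU never restarts.
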

If I'm correct, how should I fix this? Should I just have the current CPU keep sending the IPI to the boot CPU to run machine_restart()? That seems too hacky to me, but I'm not quite sure why we have to use the boot CPU to restart. If we could let any CPU reset the CPU state and reboot, we could avoid this.

Or is it because system_state is not set correctly? If we can avoid entering the if statement, we can also avoid this problem.

Do you have any suggestions?

Thank you very much for your help!

Best,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/