From mboxrd@z Thu Jan 1 00:00:00 1970
From: Meng Xu
Subject: Re: Question about Xen reboot on panic
Date: Thu, 12 Nov 2015 11:57:58 -0500
References: <5643C716.1050102@citrix.com> <5643D091.7090503@citrix.com> <56448BA8.6080705@citrix.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Andrew Cooper
Cc: Wei Liu, "xen-devel@lists.xen.org"
List-Id: xen-devel@lists.xenproject.org

I think machine_restart() may have a bug. :-(

2015-11-12 11:13 GMT-05:00 Meng Xu :
> Hi Andrew,
>
> I thought I might find where the system got stuck.
>
> As you suggested, I add several printks inside machine_restart();
> If the machine restart when Xen kernel crashes, I can see the following output:
>
> umount: /run/lock: not mounted
> umount: /run/shm: not mounted
> * Will now restart
> [ 122.261583] Restarting system.
> (XEN) Domain 0 shutdown: rebooting machine.
> (XEN) machine_restart start running
> (This is what I added at the first line of the machine_restart())
> (XEN) machine_restart start running
> (XEN) reboot_type=97
> (XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
>
> So when the machine reboots correctly at Xen kernel crash, the
> machine_restart will be called twice.
>
> After looking into the code, I found the following code in the
> machine_restart(), which is quite suspicious.
>
>     if ( system_state >= SYS_STATE_smp_boot )
>     {
>         local_irq_enable();
>
>         /* Ensure we are the boot CPU. */
>         if ( get_apic_id() != boot_cpu_physical_apicid )

If we are on the boot CPU and this if condition evaluates to true,

>         {
>             /* Send IPI to the boot CPU (logical cpu 0). */
>             on_selected_cpus(cpumask_of(0), __machine_restart,
>                              &delay_millisecs, 0);

we will send an IPI from CPU 0 to CPU 0 to run machine_restart(),

>             for ( ; ; )
>                 halt();

and CPU 0 will halt immediately. If the IPI arrives on CPU 0 later,
CPU 0 won't be able to handle it, since it has already halted.

*** I have one solution in mind ***
Maybe we should check whether the current CPU is CPU 0 by using
smp_processor_id().
The only concern I have is that I'm not sure whether machine_restart()
could be rescheduled by the Xen scheduler onto another CPU after we
call smp_processor_id().

*** The result below confirms my guess ***
If I print out the CPU that sends the IPI, the following output
confirms my speculation:

(XEN) Reboot in five seconds...
(XEN) now we should see: before kexec_crash
(XEN) before kexec_crash
(XEN) after kexec_crash
(XEN) machine_restart start running, delay_millisecs=5000
(XEN) machine_restart: finished console_start_sync, system_state is 3
(XEN) On P0

As the last line suggests, P0 sends P0 an IPI and P0 goes to halt
immediately...

Thanks,

Meng

-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
http://www.cis.upenn.edu/~mengxu/
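
P.S. To make the proposed check concrete, here is a minimal standalone
sketch in plain C. It is a toy model, not Xen code: struct cpu_model and
both handoff_* helpers are hypothetical names invented for illustration,
and the fields only stand in for what smp_processor_id() and
get_apic_id() would return. It assumes, as the log above suggests, that
the APIC ID comparison can report a mismatch while running on logical
CPU 0.

```c
#include <stdbool.h>

/* Toy stand-in for per-CPU state; not a Xen data structure. */
struct cpu_model {
    unsigned int logical_id; /* stands in for smp_processor_id() */
    unsigned int apic_id;    /* stands in for get_apic_id() */
};

/*
 * Models the existing test: hand the restart off to the boot CPU
 * whenever our APIC ID differs from the recorded boot APIC ID.
 */
static bool handoff_by_apic_id(const struct cpu_model *cpu,
                               unsigned int recorded_boot_apic_id)
{
    return cpu->apic_id != recorded_boot_apic_id;
}

/*
 * Models the proposed test: hand the restart off only when we are
 * not logical CPU 0, which is the CPU the IPI would target anyway.
 */
static bool handoff_by_logical_id(const struct cpu_model *cpu)
{
    return cpu->logical_id != 0;
}
```

With a made-up CPU 0 whose APIC ID (0) differs from the recorded boot
APIC ID (1), handoff_by_apic_id() returns true, so CPU 0 would try to
send the IPI to itself, while handoff_by_logical_id() returns false and
the restart would proceed locally. The sketch does not address the
rescheduling concern mentioned above.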