On 01/23/2017 at 10:50 PM, Borislav Petkov wrote:
> On Mon, Jan 23, 2017 at 09:35:53PM +0800, Xunlei Pang wrote:
>> One possible timing sequence would be:
>> 1st kernel running on multiple cpus panicked
>> then the crash dump code starts
>> the crash dump code stops the others cpus except the crashing one
>> 2nd kernel boots up on the crash cpu with "nr_cpus=1"
>> some broadcasted mce comes on some cpu amongst the other cpus(not the crashing cpu)
> Where does this broadcasted MCE come from?
>
> The crash dump code triggered it? Or it happened before the panic()?
>
> Are you talking about an *actual* sequence which you're experiencing on
> real hw or is this something hypothetical?
>

It occurred on real hardware when testing crash dump.

1) SysRq-c was injected for the test in 1st kernel
[ 49.897279] SysRq : Trigger a crash

2) The 2nd kernel started for kdump
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-229.el7.x86_64 root=UUID=976a15c8-8cbe-44ad-bb91-23f9b18e8789 ro console=ttyS1,115200 nmi_watchdog=0 irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug disable_cpu_apicid=0 elfcorehdr=869772K

3) An MCE came to the 1st kernel, timeout panic occurred, and rebooted the machine
[ 6.095706] Dazed and confused, but trying to continue // message of the 1st kernel
[ 81.655507] Kernel panic - not syncing: Timeout synchronizing machine check over CPUs
[ 82.729324] Shutting down cpus with NMI
[ 82.774539] drm_kms_helper: panic occurred, switching back to text console
[ 82.782257] Rebooting in 10 seconds..

Please see the attached for the full log.

Regards,
Xunlei