* [Bug] Kdump does not work when panic triggered due to MCE
@ 2011-05-06 16:54 ` K.Prasad
  0 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2011-05-06 16:54 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Andi Kleen, Luck, Tony, Vivek Goyal, kexec

Hi All,
	I wanted to test the behaviour of kdump when panic is triggered
due to MCE on x86 and found that kdump is not captured.

While the kdump service is configured and running, and non-MCE panics
(such as those triggered through /proc/sysrq-trigger) successfully
capture a kdump, any fatal MCE injected through the mce-inject
tool causes a reboot of the machine.

The code has been traced (using early_serial_putc()) to enter the kexec
path i.e. panic()->crash_kexec()->machine_kexec()->relocate_kernel()
but is untraceable further.

Kdump works fine when a similar test is carried out inside a
KVM guest.

Has anybody tested this before, or found kdump working when fatal
MCEs have actually occurred?

Thanks,
K.Prasad

Relevant Screen logs
---------------------

login: root
Password: 
Last login: Fri May  6 11:16:52 from 9.77.122.190

# uname -a
Linux elm3a97.beaverton.ibm.com 2.6.39-rc6.prasad_kdump+ #1 SMP Fri May 6 07:47:31 EDT 2011 i686 i686 i386 GNU/Linux
# lsmod | grep mce
mce_inject              2355  0 [permanent]
# service kdump status
Kdump is operational
# mce-inject /home/prasadkr/mce/mce-test/cases/soft-inj/panic_ucr/data/srar_over
Triggering MCE exception on CPU 0
Disabling lock debugging due to kernel taint
[Hardware Error]: CPU 0: Machine Check Exception: 6 Bank 2: f580000000000000
[Hardware Error]: RIP 73:<000000001eadbabe> 
[Hardware Error]: TSC 21dde8717030 ADDR 1234 
[Hardware Error]: PROCESSOR 0:106a5 TIME 1304696989 SOCKET 0 APIC 0
[Hardware Error]: No human readable MCE decoding support on this CPU type.
[Hardware Error]: Run the message through 'mcelog --ascii' to decode.
[Hardware Error]: Machine check: Overflowed uncorrected
Kernel panic - not syncing: Fatal Machine check
Pid: 0, comm: kworker/0:0 Tainted: G   M    W   2.6.39-rc6.prasad_kdump+ #1
------------[ cut here ]------------
kernel BUG at arch/x86/kernel/traps.c:436!
invalid opcode: 0000 [#1] SMP 
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
Modules linked in: mce_inject autofs4 cpufreq_ondemand acpi_cpufreq mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod cdc_ether usbnet mii microcode sg i2c_i801 serio_raw pcspkr iTCO_wdt iTCO_vendor_support bnx2x libcrc32c mdio ioatdma dca i7core_edac edac_core bnx2 ext4 jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas [last unloaded: scsi_wait_scan]

Pid: 0, comm: kworker/0:1 Tainted: G   M    W   2.6.39-rc6.prasad_kdump+ #1 IBM IBM System x -[7839AC1]-/46C7890     
EIP: 0060:[<c0860da9>] EFLAGS: 00010006 CPU: 12
EIP is at do_nmi+0x89/0xa0
EAX: e9ba9c9c EBX: e9ba9c9c ECX: 04010000 EDX: e9ba8000
ESI: 0000000c EDI: 00000af0 EBP: e9ba9c94 ESP: e9ba9c90
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process kworker/0:1 (pid: 0, ti=e9ba8000 task=e9ba70d0 task.ti=e9ba8000)
Stack:
 0000000c e9ba9ce8 c0860678 0000000c 00000010 000021dd 0000000c 00000af0
 e9ba9ce8 f0eca855 c0aa007b 0000007b e9ba00d8 c08600e0 f0eca855 c06048fd
 00000060 00000246 f0eca5dd 004b9910 0000000c 0000000c e9ba9cf0 c06048be
Call Trace:
 [<c0860678>] nmi_stack_correct+0x2f/0x34
 [<c08600e0>] ? invalidate_interrupt23+0x3c/0x3c
 [<c06048fd>] ? delay_tsc+0x3d/0x70
 [<c06048be>] __const_udelay+0x1e/0x20
 [<c041c9b5>] wait_for_panic+0x25/0x50
 [<c041cd78>] mce_timed_out+0x48/0x90
 [<c041cf39>] mce_end+0x59/0x100
 [<c041db2b>] do_machine_check+0x3db/0x6a0
 [<c0476600>] ? __hrtimer_start_range_ns+0xa0/0x470
 [<f7e15064>] raise_exception+0x34/0xa0 [mce_inject]
 [<c040f978>] ? sched_clock+0x8/0x10
 [<c0478735>] ? sched_clock_cpu+0x145/0x190
 [<c0489cc0>] ? __lock_acquire+0x2c0/0x490
 [<f7e151a1>] mce_raise_notify+0x61/0x70 [mce_inject]
 [<c0863393>] notifier_call_chain+0x43/0x60
 [<c086340b>] __atomic_notifier_call_chain+0x5b/0x80
 [<c08633b0>] ? notifier_call_chain+0x60/0x60
 [<c086344a>] atomic_notifier_call_chain+0x1a/0x20
 [<c086347d>] notify_die+0x2d/0x30
 [<c0860ac2>] default_do_nmi+0x32/0x290
 [<c048a312>] ? __lock_release+0x72/0x180
 [<c04809ea>] ? clockevents_notify+0x3a/0xf0
 [<c0860da7>] do_nmi+0x87/0xa0
 [<c0860678>] nmi_stack_correct+0x2f/0x34
 [<c048007b>] ? leaps_between+0x3b/0x90
 [<c065521c>] ? intel_idle+0x8c/0x100
 [<c077a32d>] cpuidle_idle_call+0x8d/0x210
 [<c0409b0b>] cpu_idle+0x9b/0xd0
 [<c0858714>] start_secondary+0xdd/0xe3
Code: 5e 2f c2 ff 89 e0 25 00 e0 ff ff 8b 50 14 f7 c2 00 00 00 04 74 1e 81 ea 00 00 01 04 89 50 14 5b 5d c3 89 d8 e8 e9 fc ff ff eb c2 <0f> 0b 90 8d 74 26 00 eb f9 0f 0b eb fe 8d 76 00 8d bc 27 00 00 
EIP: [<c0860da9>] do_nmi+0x89/0xa0 SS:ESP 0068:e9ba9c90
Call Trace:
 [<c085c638>] panic+0x57/0x165
 [<c041cd10>] mce_panic+0x1c0/0x1e0
 [<c041ced0>] mce_reign+0x110/0x120
 [<c041cfca>] mce_end+0xea/0x100
 [<c041db2b>] do_machine_check+0x3db/0x6a0
 [<c0476600>] ? __hrtimer_start_range_ns+0xa0/0x470
 [<f7e15064>] raise_exception+0x34/0xa0 [mce_inject]
 [<c040f978>] ? sched_clock+0x8/0x10
 [<c0478735>] ? sched_clock_cpu+0x145/0x190
 [<c0489cc0>] ? __lock_acquire+0x2c0/0x490
 [<f7e151a1>] mce_raise_notify+0x61/0x70 [mce_inject]
 [<c0863393>] nx30
 [<c0860ac2>] default_do_nmi+0x32/0x290
 [<c048a312>] ? __lock_release+0x72/0x180
 [<c04809ea>] ? clockevents_notify+0x3a/0xf0
 [<c0860da7>] do_nmi+0x87/0xa0
 [<c0860678>] nmi_stack_correct+0x2f/0x34
 [<c048007b>] ? leaps_between+0x3b/0x90
 [<c065521c>] ? intel_idle+0x8c/0x100
 [<c077a32d>] cpuidle_idle_call+0x8d/0x210
 [<c0409b0b>] cpu_idle+0x9b/0xd0
 [<c0858714>] start_secondary+0xdd/0xe3
Rebooting in 1 seconds..



* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-06 16:54 ` K.Prasad
@ 2011-05-06 17:38   ` Andi Kleen
  -1 siblings, 0 replies; 20+ messages in thread
From: Andi Kleen @ 2011-05-06 17:38 UTC (permalink / raw)
  To: K.Prasad
  Cc: Linux Kernel Mailing List, Andi Kleen, Luck, Tony, Vivek Goyal,
	kexec, ying.huang

> Has anybody tested this before? Or have found kdump working when fatal
> MCEs have actually occurred?

Ying did some testing. mce-test has test cases for kdump.

My guess is you injected the error into some area used by the kexec
code or boot up path of the kexec kernel.

-Andi


* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-06 16:54 ` K.Prasad
@ 2011-05-09 12:39   ` Vivek Goyal
  -1 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2011-05-09 12:39 UTC (permalink / raw)
  To: K.Prasad; +Cc: Linux Kernel Mailing List, Andi Kleen, Luck, Tony, kexec

On Fri, May 06, 2011 at 10:24:12PM +0530, K.Prasad wrote:
> Hi All,
> 	I wanted to test the behaviour of kdump when panic is triggered
> due to MCE on x86 and found that kdump is not captured.
> 
> While the kdump service is configured and running and non-MCE panics
> (such as those triggered through to /proc/sysrq-trigger) successfully
> capture a kdump, any fatal MCE error injected through the mce-inject
> tool causes a reboot of the machine.
> 
> The code has been traced (using early_serial_putc()) to enter the kexec
> path i.e. panic()->crash_kexec()->machine_kexec()->relocate_kernel()
> but is untraceable further.
> 
> Kdump works fine when the same the similar test is carried out inside a
> KVM guest.
> 
> Has anybody tested this before? Or have found kdump working when fatal
> MCEs have actually occurred?

Prasad,

I have never tried taking a dump in an MCE situation. Does kdump work on this
machine with a normal panic()?

Use the --debug and --serial options in kexec-tools to print some debug messages
and look for "I am in purgatory". This will tell you whether you hung
in the first kernel or the second kernel.

Then put "outb()" messages in the kernel to trace what happened.

Thanks
Vivek


* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 12:39   ` Vivek Goyal
  (?)
@ 2011-05-09 15:21   ` Bouchard Louis
  2011-05-09 15:46     ` Vivek Goyal
  2011-05-09 17:03     ` K.Prasad
  -1 siblings, 2 replies; 20+ messages in thread
From: Bouchard Louis @ 2011-05-09 15:21 UTC (permalink / raw)
  To: kexec; +Cc: prasad, Vivek Goyal

Hello,

Le 09/05/2011 14:39, Vivek Goyal a écrit :
>
> Prasad,
>
> I have never tried taking dump in MCE situation. Does kdump work on this
> machine with normal panic()?
>
> Use --debug and --serial option in kexec-tools to print some debug message
> and look for "I am in purgatory". This will tell you whether you hanged
> in first kernel or second kernel.
>
> Then put "outb()" messages in the kernel to trace what happened. 
>
> Thanks
> Vivek
>

I have seen numerous occurrences of MCE-triggered kernel panics in both
RHEL and SLES environments on the IA32 architecture, in both cases in
contexts where kexec/kdump was being used.

As a matter of fact, MCE-triggered panics are part of the reason that pushed
me to work on crashdc: only one crash command is required to get the
MCE trace out of the kernel ring buffer. This avoids transferring a massive
vmcore file over the net.

crashdc does well on those, and mcelog can be applied to the data gathered.

HTH,

...Louis

-- 
Louis Bouchard
Server Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com


_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 15:21   ` Bouchard Louis
@ 2011-05-09 15:46     ` Vivek Goyal
  2011-05-10  7:31       ` Bouchard Louis
  2011-05-09 17:03     ` K.Prasad
  1 sibling, 1 reply; 20+ messages in thread
From: Vivek Goyal @ 2011-05-09 15:46 UTC (permalink / raw)
  To: Bouchard Louis; +Cc: prasad, kexec

On Mon, May 09, 2011 at 05:21:06PM +0200, Bouchard Louis wrote:
> Hello,
> 
> Le 09/05/2011 14:39, Vivek Goyal a écrit :
> >
> > Prasad,
> >
> > I have never tried taking dump in MCE situation. Does kdump work on this
> > machine with normal panic()?
> >
> > Use --debug and --serial option in kexec-tools to print some debug message
> > and look for "I am in purgatory". This will tell you whether you hanged
> > in first kernel or second kernel.
> >
> > Then put "outb()" messages in the kernel to trace what happened. 
> >
> > Thanks
> > Vivek
> >
> I have seen numerous occurrences of MCE triggered kernel panics on both
> RHEL & SLES environment used on IA32 architecture. Both in contexts
> where kexec/kdump was being used.
> 
>  Matter of fact, MCE triggered panic are part of the reason that pushed
> me to work on crashdc : only one crash command is required to get the
> MCE trace out of the kernel ring buffer. This avoids transfering massive
> amount of vmcore file over the net.
> 
> crashdc does well on those, mcelog can be applied on the data gathered.
> 

Louis, is "crashdc" an independent executable? Where is it packaged?
What is the significance of the name "crashdc" (crash data collector?)? Would it
make sense to merge it with makedumpfile, which already does kernel filtering,
so that this could be one of its additional features?

Thanks
Vivek

> HTH,
> 
> ...Louis
> 
> -- 
> Louis Bouchard
> Server Support Analyst
> Canonical Ltd
> Ubuntu support: http://landscape.canonical.com


* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-06 17:38   ` Andi Kleen
@ 2011-05-09 16:35     ` K.Prasad
  -1 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2011-05-09 16:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linux Kernel Mailing List, Luck, Tony, Vivek Goyal, kexec, ying.huang

On Fri, May 06, 2011 at 07:38:25PM +0200, Andi Kleen wrote:
> > Has anybody tested this before? Or have found kdump working when fatal
> > MCEs have actually occurred?
> 
> Ying did some testing. mce-test has test cases for kdump.
> 

We'd be glad to hear about any successful testcases with recent kernels.
My manual testing was quite similar to what the LTP kdump testcase would
do, i.e. configure the kdump service, trigger a crash through
/proc/sysrq-trigger and watch out for a kdump... but as you could see in
the logs, that did not happen.

> My guess is you injected the error into some area used by the kexec
> code or boot up path of the kexec kernel.
> 
> -Andi

The logs did not suggest that the second kernel was booted into. The
"Rebooting in ... seconds" message appeared from the first kernel. I
tried the kdump testcase on at least two dissimilar machines with
the same results, so it is not clear whether the kexec code was affected by
the MCE injection in both cases.

Thanks,
K.Prasad



* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 12:39   ` Vivek Goyal
@ 2011-05-09 16:53     ` K.Prasad
  -1 siblings, 0 replies; 20+ messages in thread
From: K.Prasad @ 2011-05-09 16:53 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Linux Kernel Mailing List, Andi Kleen, Luck, Tony, kexec

On Mon, May 09, 2011 at 08:39:02AM -0400, Vivek Goyal wrote:
> On Fri, May 06, 2011 at 10:24:12PM +0530, K.Prasad wrote:
> > Hi All,
> > 	I wanted to test the behaviour of kdump when panic is triggered
> > due to MCE on x86 and found that kdump is not captured.
> > 
> > While the kdump service is configured and running and non-MCE panics
> > (such as those triggered through to /proc/sysrq-trigger) successfully
> > capture a kdump, any fatal MCE error injected through the mce-inject
> > tool causes a reboot of the machine.
> > 
> > The code has been traced (using early_serial_putc()) to enter the kexec
> > path i.e. panic()->crash_kexec()->machine_kexec()->relocate_kernel()
> > but is untraceable further.
> > 
> > Kdump works fine when the same the similar test is carried out inside a
> > KVM guest.
> > 
> > Has anybody tested this before? Or have found kdump working when fatal
> > MCEs have actually occurred?
> 
> Prasad,
> 
> I have never tried taking dump in MCE situation. Does kdump work on this
> machine with normal panic()?
> 

Hi Vivek,
	kdump worked fine on this machine for non-MCE triggered panic
calls (the /proc/sysrq-trigger initiated crashes got the kdump fine).

> Use --debug and --serial option in kexec-tools to print some debug message
> and look for "I am in purgatory". This will tell you whether you hanged
> in first kernel or second kernel.
> 

There were no boot logs from the second kernel, and the "Rebooting in X
seconds..." message appeared before the system rebooted, suggesting
that the second kernel did not boot at all.

> Then put "outb()" messages in the kernel to trace what happened. 
> 

The outb logs showed that the system entered the machine_kexec function (traceable
up to relocate_kernel) but then rebooted from inside the panic() function.

Thanks,
K.Prasad



* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 15:21   ` Bouchard Louis
  2011-05-09 15:46     ` Vivek Goyal
@ 2011-05-09 17:03     ` K.Prasad
  2011-05-10  7:19       ` Bouchard Louis
  2011-05-10 10:21       ` WANG Cong
  1 sibling, 2 replies; 20+ messages in thread
From: K.Prasad @ 2011-05-09 17:03 UTC (permalink / raw)
  To: Bouchard Louis; +Cc: Andi Kleen, kexec, Vivek Goyal

On Mon, May 09, 2011 at 05:21:06PM +0200, Bouchard Louis wrote:
> Hello,
> 
> Le 09/05/2011 14:39, Vivek Goyal a écrit :
> >
> > Prasad,
> >
> > I have never tried taking dump in MCE situation. Does kdump work on this
> > machine with normal panic()?
> >
> > Use --debug and --serial option in kexec-tools to print some debug message
> > and look for "I am in purgatory". This will tell you whether you hanged
> > in first kernel or second kernel.
> >
> > Then put "outb()" messages in the kernel to trace what happened. 
> >
> > Thanks
> > Vivek
> >
> I have seen numerous occurrences of MCE triggered kernel panics on both
> RHEL & SLES environment used on IA32 architecture. Both in contexts
> where kexec/kdump was being used.
>

That's interesting! Assuming that these are not software induced MCEs
but panic() calls invoked due to unrecoverable memory errors in a
physical machine, did you experience any situation where the kdump
kernel hung/rebooted due to a second MCE (triggered while reading the
faulty memory location belonging to the first kernel)?
 
>  Matter of fact, MCE triggered panic are part of the reason that pushed
> me to work on crashdc : only one crash command is required to get the
> MCE trace out of the kernel ring buffer. This avoids transfering massive
> amount of vmcore file over the net.
> 

What is the data that is contained in the faulty memory location (whose
I/O triggered an MCE in the first place)? Basically we'd like to
understand what a 'read' operation on the corrupted memory location
would result in.

> crashdc does well on those, mcelog can be applied on the data gathered.
>

We're contemplating a solution along similar lines (refer to the
description of 'slim' kdump at https://lkml.org/lkml/2011/5/4/396) to
create a coredump, readable by the crash tool, containing a message that
indicates the cause of the crash as MCE (and not any data from the old
memory).

I'll take a look at the crashdc code and see if there are ideas that we
can borrow from there.

Thanks,
K.Prasad
 


* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 16:53     ` K.Prasad
@ 2011-05-09 17:05       ` Vivek Goyal
  -1 siblings, 0 replies; 20+ messages in thread
From: Vivek Goyal @ 2011-05-09 17:05 UTC (permalink / raw)
  To: K.Prasad; +Cc: Linux Kernel Mailing List, Andi Kleen, Luck, Tony, kexec

On Mon, May 09, 2011 at 10:23:57PM +0530, K.Prasad wrote:
> On Mon, May 09, 2011 at 08:39:02AM -0400, Vivek Goyal wrote:
> > On Fri, May 06, 2011 at 10:24:12PM +0530, K.Prasad wrote:
> > > Hi All,
> > > 	I wanted to test the behaviour of kdump when panic is triggered
> > > due to MCE on x86 and found that kdump is not captured.
> > > 
> > > While the kdump service is configured and running and non-MCE panics
> > > (such as those triggered through to /proc/sysrq-trigger) successfully
> > > capture a kdump, any fatal MCE error injected through the mce-inject
> > > tool causes a reboot of the machine.
> > > 
> > > The code has been traced (using early_serial_putc()) to enter the kexec
> > > path i.e. panic()->crash_kexec()->machine_kexec()->relocate_kernel()
> > > but is untraceable further.
> > > 
> > > Kdump works fine when the same the similar test is carried out inside a
> > > KVM guest.
> > > 
> > > Has anybody tested this before? Or have found kdump working when fatal
> > > MCEs have actually occurred?
> > 
> > Prasad,
> > 
> > I have never tried taking dump in MCE situation. Does kdump work on this
> > machine with normal panic()?
> > 
> 
> Hi Vivek,
> 	kdump worked fine on this machine for non-MCE triggered panic
> calls (the /proc/sysrq-trigger initiated crashes got the kdump fine).
> 
> > Use --debug and --serial option in kexec-tools to print some debug message
> > and look for "I am in purgatory". This will tell you whether you hanged
> > in first kernel or second kernel.
> > 
> 
> There were no boot logs from the second kernel while the "Rebooting in X
> seconds..." message had appeared before the system rebooted, suggesting
> that the second kernel did not boot at all.
> 
> > Then put "outb()" messages in the kernel to trace what happened. 
> > 
> 
> The outb logs showed that the system entered machine_kexec function (traceable
> upto relocate_kernel) but then rebooted from inside the panic() function.

Ok, that means we returned from the crash_kexec() function instead of
transitioning into the second kernel. This is strange. machine_kexec() is
not supposed to return unless it finds that there is no crash kernel
loaded. As per your mail, you can trace it as far as relocate_kernel()
being entered. So the only thing I can suggest is to debug the
relocate_kernel() code now to see why it is returning.
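
To make the expected control flow concrete, here is a small user-space
model in Python (it is not kernel code). It assumes, as described above,
that control comes back to panic() only when no crash kernel is loaded.
The function names mirror the kernel's, but the bodies, arguments and
return strings are illustrative assumptions.

```python
def crash_kexec(crash_image_loaded):
    """Model: return True if control is handed to the capture kernel."""
    if not crash_image_loaded:
        return False  # nothing loaded: fall back to the rest of panic()
    # In the real kernel, machine_kexec() is not expected to return here.
    return True


def panic(crash_image_loaded, panic_timeout=10):
    """Model of the tail of panic(): try kdump first, otherwise reboot."""
    if crash_kexec(crash_image_loaded):
        return "booted capture kernel"
    return f"Rebooting in {panic_timeout} seconds.."


print(panic(True))
print(panic(False))
```

In this model, seeing the "Rebooting in ..." message means crash_kexec()
came back to its caller, which is the symptom reported above.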

Thanks
Vivek


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 16:35     ` K.Prasad
@ 2011-05-10  1:28       ` Huang Ying
  -1 siblings, 0 replies; 20+ messages in thread
From: Huang Ying @ 2011-05-10  1:28 UTC (permalink / raw)
  To: prasad
  Cc: Andi Kleen, Linux Kernel Mailing List, Luck, Tony, Vivek Goyal, kexec

Hi, Prasad,

On 05/10/2011 12:35 AM, K.Prasad wrote:
> On Fri, May 06, 2011 at 07:38:25PM +0200, Andi Kleen wrote:
>>> Has anybody tested this before? Or have found kdump working when fatal
>>> MCEs have actually occurred?
>>
>> Ying did some testing. mce-test has test cases for kdump.
>>
> 
> We'd be glad to hear about any successful testcases with recent kernels.
> My manual testing was quite similar to what the LTP kdump testcase would
> do i.e. configure kdump service, trigger crash through
> /proc/sysrq-trigger and watchout for kdump....but as you could see in
> the logs, that did not happen.
> 
>> My guess is you injected the error into some area used by the kexec
>> code or boot up path of the kexec kernel.
>>
>> -Andi
> 
> The logs did not suggest that the second kernel was booted into. The
> "Rebooting in ... seconds" message appeared from the first kernel. I
> tried the kdump testcase in atleast two dissimilar machines but with
> the same results, so it is not clear if the kexec code was affected by
> the MCE injection in both the cases.

From your panic logs, it seems that a panic is triggered for the MCE on
one CPU and, while crash_kexec is executing, another panic is triggered
on another CPU by the timeout mechanism in MCE. We have seen something
like that while developing mce-test. Please try the following command
line for MCE injection.

mce-inject --no-random /home/prasadkr/mce/mce-test/cases/soft-inj/panic_ucr/data/srar_over

This is also what the kdump test driver of mce-test uses.
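
The race described above can be pictured with a small user-space model
in Python (not kernel code). It assumes that crash_kexec() guards the
kexec path with a try-lock, so that a second caller returns to panic()
immediately; that is an assumption about kernels of this era and is not
stated in this thread. The CPU numbers and message strings are
hypothetical.

```python
import threading

kexec_mutex = threading.Lock()  # stand-in for the kernel's kexec lock


def crash_kexec(cpu, events):
    """Model: only the first CPU to take the lock starts the kexec."""
    if kexec_mutex.acquire(blocking=False):
        events.append(f"cpu{cpu}: entering machine_kexec()")
        # The real path never returns, so the lock is left held on purpose.
        return True
    events.append(f"cpu{cpu}: kexec busy, returning to panic()")
    return False


def panic(cpu, events):
    """Model: fall through to the reboot path if kexec was not started."""
    if not crash_kexec(cpu, events):
        events.append(f"cpu{cpu}: Rebooting in X seconds..")


events = []
panic(0, events)  # panic for the injected MCE
panic(1, events)  # second panic, e.g. from the MCE timeout on another CPU
print("\n".join(events))
```

If this model matches the real behaviour, the second CPU would reach the
reboot path while the first is still inside the kexec path, which would
be consistent with the reboot seen in the logs.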

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 17:03     ` K.Prasad
@ 2011-05-10  7:19       ` Bouchard Louis
  2011-05-10 10:21       ` WANG Cong
  1 sibling, 0 replies; 20+ messages in thread
From: Bouchard Louis @ 2011-05-10  7:19 UTC (permalink / raw)
  To: prasad; +Cc: Bouchard Louis, kexec

Hello,

Le 09/05/2011 19:03, K.Prasad a écrit :
> On Mon, May 09, 2011 at 05:21:06PM +0200, Bouchard Louis wrote:
>> I have seen numerous occurrences of MCE triggered kernel panics on both
>> RHEL & SLES environment used on IA32 architecture. Both in contexts
>> where kexec/kdump was being used.
>>
> That's interesting! Assuming that these are not software induced MCEs
> but panic() calls invoked due to unrecoverable memory errors in a
> physical machine, did you experience any situation where the kdump
> kernel hung/rebooted due to a second MCE (triggered while reading the
> faulty memory location belonging to the first kernel)? 
What I'm relating are issues I have seen in the past, while I was
employed somewhere else. Unfortunately, I no longer have access to any
of those dumps.

Kind regards,

...Louis

-- 
Louis Bouchard
Server Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 15:46     ` Vivek Goyal
@ 2011-05-10  7:31       ` Bouchard Louis
  0 siblings, 0 replies; 20+ messages in thread
From: Bouchard Louis @ 2011-05-10  7:31 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: Bouchard Louis, kexec

Hello,

Le 09/05/2011 17:46, Vivek Goyal a écrit :
> Louis, is this "crashdc" an independent executable. Where is it packaged?
> What is the significance of name "crashdc" (Crash data collecor ?) Will it
> make sense to merge it with makedumpfile which does kernel filtering
> already and this could be one of the additional features.
>
> Thanks
> Vivek

crashdc is hosted on SF.  You can have a look at it here:

http://crashdc.svn.sourceforge.net/viewvc/crashdc/trunk/and

Most of the work happens in /usr/bin/crashdc and the
run-crashdc-{distro}.sh scripts.

It does indeed mean crash data collector.  It is basically a set of
shell scripts that send a set of predefined commands to the crash utility
and gather the results in a text file that can easily be sent over email.
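
As a rough sketch of that idea (this is not the actual crashdc code), the
Python snippet below builds a command file and a non-interactive crash
invocation. It assumes the crash utility's -s (silent) and -i (input
file) options and the sys, bt and log commands; the file names and paths
are hypothetical placeholders.

```python
import shlex


def build_crash_invocation(vmlinux, vmcore, commands, cmd_file="crash-cmds.txt"):
    """Return (command-file text, argv) for a non-interactive crash run."""
    # One crash command per line, ending with quit so crash exits by itself.
    script = "\n".join(list(commands) + ["quit"]) + "\n"
    argv = ["crash", "-s", "-i", cmd_file, vmlinux, vmcore]
    return script, argv


script, argv = build_crash_invocation(
    "/path/to/vmlinux", "/path/to/vmcore", ["sys", "bt", "log"])
print(script, end="")
print(shlex.join(argv))
```

Writing the script text to the command file and redirecting the output
of the printed command to a text file would give a small report instead
of a full vmcore.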

In one implementation, it makes use of the KDUMP_POST mechanism to run.
Another way to use it is through an init script upon reboot.

As for merging it with makedumpfile, it could be an interesting choice. 
But that might add requirements to makedumpfile that you might not want
to have (dependencies on crash & the kernel-debuginfo packages which are
needed).

I have recently changed jobs, so I had to suspend my work on crashdc for
a bit. I'm hoping to find time to work on it and adapt it to Debian-based
distros in the near future.

Kind regards,

...Louis

-- 
Louis Bouchard
Server Support Analyst
Canonical Ltd
Ubuntu support: http://landscape.canonical.com


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Bug] Kdump does not work when panic triggered due to MCE
  2011-05-09 17:03     ` K.Prasad
  2011-05-10  7:19       ` Bouchard Louis
@ 2011-05-10 10:21       ` WANG Cong
  1 sibling, 0 replies; 20+ messages in thread
From: WANG Cong @ 2011-05-10 10:21 UTC (permalink / raw)
  To: kexec

On Mon, 09 May 2011 22:33:36 +0530, K.Prasad wrote:

> That's interesting! Assuming that these are not software induced MCEs
> but panic() calls invoked due to unrecoverable memory errors in a
> physical machine, did you experience any situation where the kdump
> kernel hung/rebooted due to a second MCE (triggered while reading the
> faulty memory location belonging to the first kernel)?


Someone discussed this with me inside RH: is disabling MCE
checking in the second kernel an acceptable solution?
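
One way this might be tried is to add a boot parameter to the capture
kernel's command line. The Python sketch below only shows the string
handling; it assumes the x86 "mce=off" boot parameter is honoured by the
capture kernel, and the starting command line is a hypothetical example.

```python
def append_kernel_param(cmdline, param):
    """Append a boot parameter unless it is already present."""
    params = cmdline.split()
    if param not in params:
        params.append(param)
    return " ".join(params)


# Hypothetical capture-kernel command line; real ones vary by distro.
capture_cmdline = append_kernel_param(
    "irqpoll maxcpus=1 reset_devices", "mce=off")
print(capture_cmdline)
```

Whether running the second kernel without machine checks is acceptable
is exactly the open question above; the sketch only shows where such a
switch would go.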

> We're contemplating a solution on the similar lines (refer the
> description of 'slim' kdump at https://lkml.org/lkml/2011/5/4/396) to
> create a 'crash tool readable coredump containing a message that
> indicates the cause of the crash as MCE (and not any data from the old
> memory).
> 

You might want to try using something other than memory to store
the log of the first kernel, for example, mtdoops.


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-05-10 10:21 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-06 16:54 [Bug] Kdump does not work when panic triggered due to MCE K.Prasad
2011-05-06 16:54 ` K.Prasad
2011-05-06 17:38 ` Andi Kleen
2011-05-06 17:38   ` Andi Kleen
2011-05-09 16:35   ` K.Prasad
2011-05-09 16:35     ` K.Prasad
2011-05-10  1:28     ` Huang Ying
2011-05-10  1:28       ` Huang Ying
2011-05-09 12:39 ` Vivek Goyal
2011-05-09 12:39   ` Vivek Goyal
2011-05-09 15:21   ` Bouchard Louis
2011-05-09 15:46     ` Vivek Goyal
2011-05-10  7:31       ` Bouchard Louis
2011-05-09 17:03     ` K.Prasad
2011-05-10  7:19       ` Bouchard Louis
2011-05-10 10:21       ` WANG Cong
2011-05-09 16:53   ` K.Prasad
2011-05-09 16:53     ` K.Prasad
2011-05-09 17:05     ` Vivek Goyal
2011-05-09 17:05       ` Vivek Goyal
