* [PATCH -rt 3.10.x] mce: don't try to wake thread before it exists.
From: Paul Gortmaker @ 2014-08-26 22:10 UTC
  To: Steven Rostedt (Red Hat); +Cc: linux-rt-users, Paul Gortmaker

If a machine with faulty hardware raises an MCE irq event very
early in the boot, it can try to wake the -rt specific handler
thread (mce_notify_helper) before it exists.  (It is created
through a device_initcall that happens later in the boot.)  When
this happens, the irq handler calls wake_up_process() with a NULL
pointer, which panics the machine at boot.

The race between the irq event and thread init is as follows:

mce_notify_irq();
  --> mce_notify_work();
        --> wake_up_process(mce_notify_helper);

device_initcall_sync(mcheck_init_device);
  --> mce_notify_work_init();
        --> mce_notify_helper = kthread_run(mce_notify_helper_thread, ...);

So, clearly if the IRQ event happens before the device_initcall,
the mce_notify_helper pointer (at global file scope and hence BSS)
will still be NULL, resulting in the following panic at boot:

  CPU: Physical Processor ID: 0
  CPU: Processor Core ID: 0
  ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
  ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
  mce: CPU supports 22 MCE banks
  CPU0: Thermal monitoring enabled (TM1)
  Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0
  tlb_flushall_shift: 6
  Freeing SMP alternatives: 36k freed
  ACPI: Core revision 20130328
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff8107730d>] wake_up_process+0xd/0x40
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.40-rt40_preempt-rt #1
  Hardware name: Insyde Grantley/Type2 - Board Product Name1, BIOS 05.04.07 04/21/2014
  task: ffffffff81e14440 ti: ffffffff81e00000 task.ti: ffffffff81e00000
  RIP: 0010:[<ffffffff8107730d>]  [<ffffffff8107730d>] wake_up_process+0xd/0x40
  RSP: 0000:ffff88107fc03f68  EFLAGS: 00010086
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000007ffefbff
  RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88107fc03f70 R08: 0000000000000002 R09: 0000000000000003
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff88103f03d100
  R13: ffff880ff4e0c000 R14: ffff88107fc16f00 R15: ffff880ff4e0c000
  FS:  0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000001e0f000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffff88107fc0ccf0 ffff88107fc03f80 ffffffff8101f900 ffff88107fc03f98
   ffffffff8102169d ffff88107fc0fab0 ffff88107fc03fa8 ffffffff81022051
   ffffffff81e01d48 ffffffff819a8a9a ffffffff81e01bf8 <EOI>  ffffffff81e01d48
  Call Trace:
   <IRQ>
   [<ffffffff8101f900>] mce_notify_irq+0x30/0x40
   [<ffffffff8102169d>] intel_threshold_interrupt+0xbd/0xe0
   [<ffffffff81022051>] smp_threshold_interrupt+0x21/0x40
   [<ffffffff819a8a9a>] threshold_interrupt+0x6a/0x70
   <EOI>
   [<ffffffff8199c57c>] ? __slab_alloc.isra.48+0x39e/0x60c
   [<ffffffff814369d5>] ? acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff811534a8>] ? kmem_cache_free+0xb8/0x2b0
   [<ffffffff81152be4>] kmem_cache_alloc+0x234/0x2e0
   [<ffffffff814369d5>] ? acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff814369d5>] acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff8143523f>] acpi_ps_get_next_arg+0xfe/0x3d3
   [<ffffffff814357a4>] acpi_ps_parse_loop+0x290/0x560
   [<ffffffff814364bc>] acpi_ps_parse_aml+0x98/0x28c
   [<ffffffff8143242c>] acpi_ns_one_complete_parse+0x104/0x124
   [<ffffffff8143247f>] acpi_ns_parse_table+0x33/0x38
   [<ffffffff81431e56>] acpi_ns_load_table+0x4a/0x8c
   [<ffffffff81439d6e>] acpi_load_tables+0xa2/0x176
   [<ffffffff81f4dbf3>] acpi_early_init+0x70/0x100
   [<ffffffff81f1c4e9>] ? check_bugs+0xe/0x2d
   [<ffffffff81f14df2>] start_kernel+0x387/0x3b5
   [<ffffffff81f14874>] ? repair_env_string+0x5c/0x5c
   [<ffffffff81f145ad>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff81f1467b>] x86_64_start_kernel+0xcc/0xcf
  Code: 8b 52 18 e9 9e fc ff ff 48 89 45 c0 e8 cd df 92 00 48 8b 45 c0 eb e5 0f 1f 80 00 00 00 00 e8 fb 04 93 00 55 48 89 e5 53 48 89 fb <48> 8b 07 a8 0c 75 12 48 89 df 31 d2 be 03 00 00 00 e8 ad fb ff
  RIP  [<ffffffff8107730d>] wake_up_process+0xd/0x40
   RSP <ffff88107fc03f68>
  CR2: 0000000000000000
  ---[ end trace 0000000000000001 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

Evidently the hardware has issues, but we can handle this more
gracefully by ignoring the events that happen before the
device_initcall has registered the mce handler thread.
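
The ordering problem and the guard can be sketched in plain userspace
C. This is only an illustration, not the kernel code: helper_task,
notify_helper, wake_helper(), notify_work_init() and notify_work() are
hypothetical stand-ins for the task struct, mce_notify_helper,
wake_up_process(), mce_notify_work_init() and mce_notify_work().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the helper thread; not a kernel type. */
struct helper_task {
	int wakeups;
};

/* File-scope pointer: zero-initialised, so NULL until init has run. */
static struct helper_task *notify_helper;
static struct helper_task helper_storage;

/* Stand-in for wake_up_process(): it dereferences its argument. */
static void wake_helper(struct helper_task *t)
{
	t->wakeups++;
}

/* Stand-in for the late init step that creates the helper. */
static void notify_work_init(void)
{
	notify_helper = &helper_storage;
}

/*
 * Guarded notify path: returns true if the helper was woken,
 * false if the event arrived before init and was ignored.
 */
static bool notify_work(void)
{
	if (notify_helper == NULL) {
		fprintf(stderr, "event before init; ignored\n");
		return false;
	}

	wake_helper(notify_helper);
	return true;
}
```

Calling notify_work() before notify_work_init() returns false instead
of dereferencing NULL; after init the same call wakes the helper.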

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b94f38..94860c521fb8 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1391,6 +1391,11 @@ static int mce_notify_work_init(void)
 
 static void mce_notify_work(void)
 {
+	if (unlikely(!mce_notify_helper)) {
+		pr_info(HW_ERR "Machine check event before MCE init; ignored\n");
+		return;
+	}
+
 	wake_up_process(mce_notify_helper);
 }
 #else
-- 
2.0.1



* Re: [PATCH -rt 3.10.x] mce: don't try to wake thread before it exists.
From: Steven Rostedt @ 2014-08-26 23:07 UTC
  To: Paul Gortmaker; +Cc: linux-rt-users

On Tue, 26 Aug 2014 18:10:53 -0400
Paul Gortmaker <paul.gortmaker@windriver.com> wrote:

> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index aaf4b9b94f38..94860c521fb8 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1391,6 +1391,11 @@ static int mce_notify_work_init(void)
>  
>  static void mce_notify_work(void)
>  {
> +	if (unlikely(!mce_notify_helper)) {
> +		pr_info(HW_ERR "Machine check event before MCE init; ignored\n");

Hmm, maybe we should make this a bit more noticeable?  Not just an
"ignored" event with pr_info(). Maybe a:

	if (WARN_ON_ONCE(!mce_notify_helper)) {

-- Steve

> +		return;
> +	}
> +
>  	wake_up_process(mce_notify_helper);
>  }
>  #else



* [PATCHv2 -rt 3.10.x] mce: don't try to wake thread before it exists.
From: Paul Gortmaker @ 2014-09-04 15:29 UTC
  To: Steven Rostedt; +Cc: linux-rt-users, Paul Gortmaker

If a machine with faulty hardware raises an MCE irq event very
early in the boot, it can try to wake the -rt specific handler
thread (mce_notify_helper) before it exists.  (It is created
through a device_initcall that happens later in the boot.)  When
this happens, the irq handler calls wake_up_process() with a NULL
pointer, which panics the machine at boot.

The race between the irq event and thread init is as follows:

mce_notify_irq();
  --> mce_notify_work();
        --> wake_up_process(mce_notify_helper);

device_initcall_sync(mcheck_init_device);
  --> mce_notify_work_init();
        --> mce_notify_helper = kthread_run(mce_notify_helper_thread, ...);

So, clearly if the IRQ event happens before the device_initcall,
the mce_notify_helper pointer (at global file scope and hence BSS)
will still be NULL, resulting in the following panic at boot:

  CPU: Physical Processor ID: 0
  CPU: Processor Core ID: 0
  ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
  ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
  mce: CPU supports 22 MCE banks
  CPU0: Thermal monitoring enabled (TM1)
  Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
  Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0
  tlb_flushall_shift: 6
  Freeing SMP alternatives: 36k freed
  ACPI: Core revision 20130328
  BUG: unable to handle kernel NULL pointer dereference at           (null)
  IP: [<ffffffff8107730d>] wake_up_process+0xd/0x40
  PGD 0
  Oops: 0000 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.10.40-rt40_preempt-rt #1
  Hardware name: Insyde Grantley/Type2 - Board Product Name1, BIOS 05.04.07 04/21/2014
  task: ffffffff81e14440 ti: ffffffff81e00000 task.ti: ffffffff81e00000
  RIP: 0010:[<ffffffff8107730d>]  [<ffffffff8107730d>] wake_up_process+0xd/0x40
  RSP: 0000:ffff88107fc03f68  EFLAGS: 00010086
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000007ffefbff
  RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88107fc03f70 R08: 0000000000000002 R09: 0000000000000003
  R10: 0000000000000000 R11: 0000000000000001 R12: ffff88103f03d100
  R13: ffff880ff4e0c000 R14: ffff88107fc16f00 R15: ffff880ff4e0c000
  FS:  0000000000000000(0000) GS:ffff88107fc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 0000000000000000 CR3: 0000000001e0f000 CR4: 00000000001406f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Stack:
   ffff88107fc0ccf0 ffff88107fc03f80 ffffffff8101f900 ffff88107fc03f98
   ffffffff8102169d ffff88107fc0fab0 ffff88107fc03fa8 ffffffff81022051
   ffffffff81e01d48 ffffffff819a8a9a ffffffff81e01bf8 <EOI>  ffffffff81e01d48
  Call Trace:
   <IRQ>
   [<ffffffff8101f900>] mce_notify_irq+0x30/0x40
   [<ffffffff8102169d>] intel_threshold_interrupt+0xbd/0xe0
   [<ffffffff81022051>] smp_threshold_interrupt+0x21/0x40
   [<ffffffff819a8a9a>] threshold_interrupt+0x6a/0x70
   <EOI>
   [<ffffffff8199c57c>] ? __slab_alloc.isra.48+0x39e/0x60c
   [<ffffffff814369d5>] ? acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff811534a8>] ? kmem_cache_free+0xb8/0x2b0
   [<ffffffff81152be4>] kmem_cache_alloc+0x234/0x2e0
   [<ffffffff814369d5>] ? acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff814369d5>] acpi_ps_alloc_op+0x9a/0xa1
   [<ffffffff8143523f>] acpi_ps_get_next_arg+0xfe/0x3d3
   [<ffffffff814357a4>] acpi_ps_parse_loop+0x290/0x560
   [<ffffffff814364bc>] acpi_ps_parse_aml+0x98/0x28c
   [<ffffffff8143242c>] acpi_ns_one_complete_parse+0x104/0x124
   [<ffffffff8143247f>] acpi_ns_parse_table+0x33/0x38
   [<ffffffff81431e56>] acpi_ns_load_table+0x4a/0x8c
   [<ffffffff81439d6e>] acpi_load_tables+0xa2/0x176
   [<ffffffff81f4dbf3>] acpi_early_init+0x70/0x100
   [<ffffffff81f1c4e9>] ? check_bugs+0xe/0x2d
   [<ffffffff81f14df2>] start_kernel+0x387/0x3b5
   [<ffffffff81f14874>] ? repair_env_string+0x5c/0x5c
   [<ffffffff81f145ad>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff81f1467b>] x86_64_start_kernel+0xcc/0xcf
  Code: 8b 52 18 e9 9e fc ff ff 48 89 45 c0 e8 cd df 92 00 48 8b 45 c0 eb e5 0f 1f 80 00 00 00 00 e8 fb 04 93 00 55 48 89 e5 53 48 89 fb <48> 8b 07 a8 0c 75 12 48 89 df 31 d2 be 03 00 00 00 e8 ad fb ff
  RIP  [<ffffffff8107730d>] wake_up_process+0xd/0x40
   RSP <ffff88107fc03f68>
  CR2: 0000000000000000
  ---[ end trace 0000000000000001 ]---
  Kernel panic - not syncing: Fatal exception in interrupt

Evidently the hardware has issues, but we can handle this more
gracefully by ignoring the events that happen before the
device_initcall has registered the mce handler thread.

We use WARN_ON_ONCE to ensure it is still noticed, and also to
implicitly ratelimit it, in case the race window is wide enough
to spam the console with too many instances of the warning.
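
A warn-once check can be sketched in userspace C as below. This is an
assumption-laden illustration, not the kernel macro:
warn_on_once_sketch(), early_event() and warnings_printed are
hypothetical names, and the per-call-site flag is passed explicitly
rather than hidden inside a macro.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Counts how many warnings were actually emitted (for illustration). */
static int warnings_printed;

/*
 * Emit the warning only the first time cond is true for this flag,
 * but always return cond so the caller can still bail out.
 */
static bool warn_on_once_sketch(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical call site: true means the event must be ignored. */
static bool early_event(const void *helper)
{
	static bool warned;	/* one flag per call site */

	return warn_on_once_sketch(helper == NULL, &warned,
				   "event before init");
}
```

In this sketch, as with WARN_ON_ONCE, only the warning itself is
limited to one instance; the return value reflects the condition on
every call, so any statement inside the guarded branch still runs for
each early event.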

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---

[v2: wrap pr_info(..) in WARN_ON_ONCE as suggested by Steve.]

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b94f38..294138c52bce 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1391,6 +1391,11 @@ static int mce_notify_work_init(void)
 
 static void mce_notify_work(void)
 {
+	if (WARN_ON_ONCE(!mce_notify_helper)) {
+		pr_info(HW_ERR "Machine check event before MCE init; ignored\n");
+		return;
+	}
+
 	wake_up_process(mce_notify_helper);
 }
 #else
-- 
2.0.1



* Re: [PATCHv2 -rt 3.10.x] mce: don't try to wake thread before it exists.
From: Sebastian Andrzej Siewior @ 2015-02-17 10:00 UTC
  To: Paul Gortmaker; +Cc: Steven Rostedt, linux-rt-users

* Paul Gortmaker | 2014-09-04 11:29:27 [-0400]:

>If a machine with faulty hardware raises an MCE irq event very
>early in the boot, it can try to wake the -rt specific handler
>thread (mce_notify_helper) before it exists.  (It is created
>through a device_initcall that happens later in the boot.)  When
>this happens, the irq handler calls wake_up_process() with a NULL
>pointer, which panics the machine at boot.

applied.

Sebastian

