From: Borislav Petkov <bp@alien8.de>
To: Corey Minyard <minyard@acm.org>, Steven Rostedt <rostedt@goodmis.org>
Cc: Corey Minyard <cminyard@mvista.com>,
	"Luck, Tony" <tony.luck@intel.com>,
	"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>
Subject: Re: [PATCH][RT] x86: Fix an RT MCE crash
Date: Wed, 6 Jul 2016 10:37:04 +0200
Message-ID: <20160706083704.GA7300@pd.tnic>
In-Reply-To: <577C580F.8010004@acm.org>

On Tue, Jul 05, 2016 at 07:59:59PM -0500, Corey Minyard wrote:
> I'm having our hardware people keep the system as-is until we can
> track this down.
> 
> I applied the above four patches and a few more support patches that
> were needed, but no love.  Exact same issue.  Well, almost the same, here's
> the traceback:
> 
> [    0.455575]  [<ffffffff810733c4>] try_to_wake_up+0x34/0x300
> [    0.455590]  [<ffffffff81067d76>] ? __hrtimer_start_range_ns+0x226/0x3a0
> [    0.455593]  [<ffffffff810736e0>] wake_up_process+0x10/0x20
> [    0.455615]  [<ffffffff8101c7a8>] mce_notify_irq+0x28/0x30
> [    0.455621]  [<ffffffff8101cbd9>] mce_irq_work_cb+0x9/0x10
> [    0.455646]  [<ffffffff810cbb0c>] irq_work_run_list+0x3c/0x60
> [    0.455649]  [<ffffffff810cbe97>] irq_work_tick_soft+0x27/0x30
> [    0.455673]  [<ffffffff8104dbe4>] run_timer_softirq+0x24/0x250
> [    0.455681]  [<ffffffff81045bce>] do_current_softirqs+0x1ae/0x250
> [    0.455684]  [<ffffffff81045c9e>] run_ksoftirqd+0x2e/0x50
> [    0.455697]  [<ffffffff8106c7f6>] smpboot_thread_fn+0x206/0x320
> [    0.455700]  [<ffffffff8106c5f0>] ? lg_global_unlock+0x60/0x60
> [    0.455720]  [<ffffffff81063cad>] kthread+0xad/0xc0
> [    0.455740]  [<ffffffff81730303>] ? _dbgp_external_startup+0x236/0x392
> [    0.455744]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130
> [    0.455752]  [<ffffffff8173a4be>] ret_from_fork+0x4e/0x80
> [    0.455756]  [<ffffffff81063c00>] ? kthread_create_on_node+0x130/0x130
> 
> 
> So it crashed in the kthread instead of the irq, but exactly the same issue,
> that particular field is not initialized.  Not that these aren't patches
> that look like good ideas.

Hmm, so this looks like RT-specific now AFAICT.

mce_notify_irq() calls mce_notify_work() and on RT_FULL that's
trying to wake up mce_notify_helper which is not initialized yet -
mce_notify_work_init() happens later in a device_initcall_sync.

Would something as trivial as this work in your case?

---
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index aaf4b9b94f38..cc70d98a30f6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1391,7 +1391,8 @@ static int mce_notify_work_init(void)
 
 static void mce_notify_work(void)
 {
-	wake_up_process(mce_notify_helper);
+	if (mce_notify_helper)
+		wake_up_process(mce_notify_helper);
 }
 #else
 static void mce_notify_work(void)
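
To spell out the ordering problem, here is a small userspace model of it.
This is only a sketch, not kernel code: struct task, wake_up(),
notify_work(), notify_work_init() and helper_task are hypothetical
stand-ins for the task struct, wake_up_process(), mce_notify_work(),
mce_notify_work_init() and the helper kthread. It assumes the helper
pointer stays NULL until the init step has run.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel task; only counts wakeups. */
struct task {
	const char *name;
	int wakeups;
};

/* Models mce_notify_helper: NULL until the init step has run. */
static struct task *notify_helper;

/* Models the helper kthread that the init step would create. */
static struct task helper_task = { "mce-notify", 0 };

/* Models wake_up_process(): dereferences its argument unconditionally. */
static void wake_up(struct task *t)
{
	t->wakeups++;
}

/*
 * Models the guarded mce_notify_work(): skip the wakeup while the
 * helper does not exist yet. Returns 1 if the helper was woken and
 * 0 if the notification was skipped.
 */
static int notify_work(void)
{
	if (!notify_helper)
		return 0;

	wake_up(notify_helper);
	return 1;
}

/* Models the late initcall that makes the helper available. */
static void notify_work_init(void)
{
	notify_helper = &helper_task;
}
```

Calling notify_work() before notify_work_init() returns 0 and touches
nothing; without the NULL check it would hand a NULL pointer to
wake_up(). Note that in this model a notification arriving before init
is dropped, not deferred.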


-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
