From: Don Zickus <dzickus@redhat.com>
To: hpa@zytor.com
Cc: LKML <linux-kernel@vger.kernel.org>,
x86@kernel.org, vgoyal@redhat.com, ebiederm@xmission.com,
Don Zickus <dzickus@redhat.com>
Subject: [PATCH] x86: Skip latched NMIs on early boot in kdump
Date: Fri, 7 Mar 2014 14:39:03 -0500
Message-ID: <1394221143-29713-1-git-send-email-dzickus@redhat.com>

A customer generated an external NMI using their iLO to test that kdump worked.
Unfortunately, the machine hung. Disabling the nmi_watchdog made things work.
I speculated the external NMI fired, caused the machine to panic (as expected)
and the perf NMI from the watchdog came in and was latched. My guess was this
somehow caused the hang.
Debugging this with outb's and debug_putstr, I learned the following:
- the machine hung during the first memcpy in copy_bootdata (in
  arch/x86/kernel/head64.c)
- early_make_pgtable was called during this memcpy
- after early_make_pgtable, an exception vector 2 (NMI) came in
- the IP of this vector was in copy_bootdata's range
- because there was no fixup associated with this IP, the machine
  is sitting in a 'hlt' instruction (in arch/x86/kernel/head_64.S)

(copy and paste from arch/x86/kernel/head_64.S)
/* This is global to keep gas from relaxing the jumps */
ENTRY(early_idt_handler)
<snip>
	cmpl $14,72(%rsp)	# Page fault?
	jnz 10f
	GET_CR2_INTO(%rdi)	# can clobber any volatile register if pv
	call early_make_pgtable
	andl %eax,%eax
	jz 20f			# All good
10:
	leaq 88(%rsp),%rdi	# Pointer to %rip
	call early_fixup_exception
	andl %eax,%eax
	jnz 20f			# Found an exception entry
11:
<snip>
1:	hlt
	^^^^^^^^^^^^ sitting here
	jmp 1b
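
To make the two stack offsets in the excerpt easier to follow, here is a
rough C sketch of the frame the handler appears to be reading. Only the
offsets 72 (vector number) and 88 (%rip) come from the excerpt above. The
struct name, the nine 8-byte slots below offset 72, and the slot at offset
80 are assumptions about what the snipped prologue pushes, not something
shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical view of the stack at the "cmpl $14,72(%rsp)" line.
 * Only the 72 and 88 offsets are taken from the excerpt; the rest
 * is assumed for illustration.
 */
struct early_frame {
	uint64_t saved_regs[9];	/* 0..71: assumed registers saved by the snipped prologue */
	uint64_t vector;	/* 72(%rsp): exception vector number (2 = NMI, 14 = page fault) */
	uint64_t error_code;	/* 80(%rsp): assumed error code or padding slot */
	uint64_t rip;		/* 88(%rsp): instruction pointer passed to early_fixup_exception */
};

/* Compile-time checks that the model matches the offsets used in the excerpt. */
static_assert(offsetof(struct early_frame, vector) == 72, "vector must sit at 72(%rsp)");
static_assert(offsetof(struct early_frame, rip) == 88, "rip must sit at 88(%rsp)");
```

If the real prologue saves a different number of registers, the first array
would change size, but the handler's own offsets (72 and 88) are what the
comparison and the leaq rely on.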

I added the hack below, which simply returns if the exception is an NMI, and
things seem to work.
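
The effect of the hack on the handler's decision flow can be modelled in a
few lines of C. This is only a sketch of the lines quoted above, not kernel
code: early_dispatch, the enum, and the boolean parameters are hypothetical
stand-ins for the return values of early_make_pgtable and
early_fixup_exception, and any checks in the snipped parts are ignored.

```c
#include <assert.h>
#include <stdbool.h>

#define VEC_NMI		2
#define VEC_PAGE_FAULT	14

enum early_outcome {
	EARLY_RESUME,	/* reached label 20: return from the exception */
	EARLY_HALT	/* reached "1: hlt; jmp 1b" */
};

/*
 * pgtable_ok:  models early_make_pgtable returning 0 (page fault handled)
 * fixup_found: models early_fixup_exception returning nonzero
 * skip_nmi:    true models the handler with the two added lines
 */
static enum early_outcome early_dispatch(int vector, bool pgtable_ok,
					 bool fixup_found, bool skip_nmi)
{
	if (vector == VEC_PAGE_FAULT && pgtable_ok)
		return EARLY_RESUME;	/* jz 20f  # All good */

	/* label 10: reached for non-page-faults and unhandled page faults */
	if (skip_nmi && vector == VEC_NMI)
		return EARLY_RESUME;	/* the added cmpl $2 / jz 20f */

	if (fixup_found)
		return EARLY_RESUME;	/* jnz 20f  # Found an exception entry */

	return EARLY_HALT;		/* no fixup entry: spin in hlt */
}
```

With skip_nmi false, an NMI whose IP has no fixup entry ends in EARLY_HALT,
which matches the hang described above. With skip_nmi true, the same NMI
resumes, while an unhandled page fault still halts.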
Now, I don't expect this to be the correct solution. Nor do I fully understand
what this early boot code is doing, so hopefully folks wiser than me can
provide me a better patch to test. :-)
I also do not fully understand why the latched NMI is not delivered immediately
after the load idt call, or why it comes after a page fault (the
early_make_pgtable). Further adding to my confusion is why the early printk
magic didn't dump a stack, as I believe I had that set up on my command line.
But I figured I would just report what I have observed.
My testing and debugging were based on a 3.10 kernel (RHEL-7) that also included
Seiji's tracepoint cleanups to arch/x86/kernel/head_64.S|head64.c. Not much
has changed upstream here. Also, 3.14-rc4 still has the same hang.
Signed-off-by: Don Zickus <dzickus@redhat.com>
---
arch/x86/kernel/head_64.S | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 77e6d3e..05306c8 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -368,6 +368,8 @@ ENTRY(early_idt_handler)
 	jz 20f			# All good
 10:
+	cmpl $2,72(%rsp)	# NMI?
+	jz 20f			# skip NMIs
 	leaq 88(%rsp),%rdi	# Pointer to %rip
 	call early_fixup_exception
 	andl %eax,%eax
--
1.7.1
Thread overview: 4+ messages
2014-03-07 19:39 Don Zickus [this message]
2014-03-07 21:15 ` H. Peter Anvin
2014-03-07 22:54 ` Don Zickus
2014-03-07 23:15 ` [tip:x86/urgent] x86: Ignore NMIs that come in during early boot tip-bot for H. Peter Anvin