linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Steven Rostedt <rostedt@goodmis.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	Lai Jiangshan <laijs@linux.alibaba.com>,
	Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>
Subject: [PATCH] x86_64: Update the NMI handler nesting logic comment
Date: Mon, 25 Jan 2021 13:36:22 -0500	[thread overview]
Message-ID: <20210125133622.7aea0401@gandalf.local.home> (raw)
In-Reply-To: <4415FFC4-995A-4C77-9A96-744ED697AF05@amacapital.net>

From: "Steven Rostedt (VMware)" <rostedt@goodmis.org>

The NMI handler on x86 needs to deal with nested NMIs becaues if an NMI
takes an exception (page fault or breakpoint), the exception handle triggers
a iretq which unlatches NMIs in the hardware allowing for another NMI to
take place.

The current code does a bit of work arounds to handle this case, and over
time it has changed to become safer and more efficient. But the comment has
not kept up as much to the current logic. That needs to be fixed.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---

This is added on top of Lai's patch.

 arch/x86/entry/entry_64.S | 68 +++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 25 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 21f67ea62341..962e0bfb69dd 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1117,33 +1117,46 @@ SYM_CODE_START(asm_exc_nmi)
 	 * stack of the previous NMI. NMI handlers are not re-entrant
 	 * anyway.
 	 *
-	 * To handle this case we do the following:
-	 *  Check the a special location on the stack that contains
-	 *  a variable that is set when NMIs are executing.
-	 *  The interrupted task's stack is also checked to see if it
-	 *  is an NMI stack.
-	 *  If the variable is not set and the stack is not the NMI
-	 *  stack then:
-	 *    o Set the special variable on the stack
-	 *    o Copy the interrupt frame into an "outermost" location on the
-	 *      stack
-	 *    o Copy the interrupt frame into an "iret" location on the stack
-	 *    o Continue processing the NMI
-	 *  If the variable is set or the previous stack is the NMI stack:
-	 *    o Modify the "iret" location to jump to the repeat_nmi
-	 *    o return back to the first NMI
+	 * To handle this case, the following is performed:
 	 *
-	 * Now on exit of the first NMI, we first clear the stack variable
-	 * The NMI stack will tell any nested NMIs at that point that it is
-	 * nested. Then we pop the stack normally with iret, and if there was
-	 * a nested NMI that updated the copy interrupt stack frame, a
-	 * jump will be made to the repeat_nmi code that will handle the second
-	 * NMI.
+	 *  If the NMI came in from user space, then the stack used is the
+	 *   kernel thread stack, and the C code of the nmi handler handles
+	 *   NMI nesting just like it is done for x86_32. This also allows
+	 *   NMIs from user space to handle the issues that ISTs have
+	 *   going back to user space.
 	 *
-	 * However, espfix prevents us from directly returning to userspace
-	 * with a single IRET instruction.  Similarly, IRET to user mode
-	 * can fault.  We therefore handle NMIs from user space like
-	 * other IST entries.
+	 *  If the NMI came in from kernel space, then perform the following:
+	 *   Add a variable at a specific location on the stack which is
+	 *    called "NMI executing". This variable will be set as early
+	 *    in the process as possible, and cleared just before the iretq.
+	 *   Copy the hardware stack frame twice onto the stack.
+	 *    The original hardware stack will be replaced by the hardware if
+	 *	any nested NMIs occur, so it must not be used.
+	 *    The first copy will be used by the iretq of the NMI handler.
+	 *      If a nested NMI arrives, it will update this copy to have
+	 *      the interrupted NMI return to the "repeat_nmi" location and
+	 *      that will execute the next NMI at the finish of the first NMI.
+	 *    The second copy is used to reset the first copy of the NMI stack
+	 *	frame to return to the original location of the first NMI.
+	 *
+	 *  On entering the NMI handler, if the "NMI executing" bit is set
+	 *   or the RIP of the interrupted location is located between
+	 *   the labels of "repeat_nmi" and "end_repeat_nmi" or is located
+	 *   at the iretq (.Lnmi_iret) then the current NMI had interrupted
+	 *   another NMI handler. It will update the first copy of the
+	 *   stack frame to return to "repeat_nmi" and exit.
+	 *
+	 *  When the first NMI exits, it clears the "NMI executing" variable
+	 *  and immediately calls iretq. If this NMI had triggered an exception
+	 *  that executed an iretq, allowing a nested NMI to come in, then
+	 *  that nested NMI would have updated the stack frame that the iretq
+	 *  is executing to jump to the "repeat_nmi" label. The repeat_nmi
+	 *  code is reponsible for setting the "NMI executing" variable and
+	 *  copying the second copy of the stack frame to the first copy.
+	 *
+	 *  NB the repeated NMI still has NMIs enabled from the start, and
+	 *   an NMI can come in even after the first iretq jumps to the
+	 *   repeat_nmi code.
 	 */
 
 	ASM_CLAC
@@ -1193,6 +1206,11 @@ SYM_CODE_START(asm_exc_nmi)
 	call	exc_nmi
 
 	/*
+	 * However, espfix prevents us from directly returning to userspace
+	 * with a single IRET instruction.  Similarly, IRET to user mode
+	 * can fault.  We therefore handle NMIs from user space like
+	 * other IST entries.
+	 *
 	 * Return back to user mode.  We must *not* do the normal exit
 	 * work, because we don't want to enable interrupts.
 	 */
-- 
2.25.4


  parent reply	other threads:[~2021-01-25 18:38 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-25  2:14 [PATCH] x86/entry/64: De-Xen-ify our NMI code further Lai Jiangshan
2021-01-25  3:00 ` Andy Lutomirski
2021-01-25  7:45   ` [PATCH V2] " Lai Jiangshan
2021-01-25 17:38     ` Steven Rostedt
2021-01-25 17:51       ` Andy Lutomirski
2021-01-25 18:16         ` Steven Rostedt
2021-01-25 18:36         ` Steven Rostedt [this message]
2021-01-25 17:52       ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210125133622.7aea0401@gandalf.local.home \
    --to=rostedt@goodmis.org \
    --cc=bp@alien8.de \
    --cc=hpa@zytor.com \
    --cc=jiangshanlai@gmail.com \
    --cc=laijs@linux.alibaba.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).