linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC x86/nmi] Fix out-of-order nesting checks
@ 2023-10-11 18:40 Paul E. McKenney
  2023-10-12  6:37 ` Ingo Molnar
  2023-10-12  6:41 ` [tip: x86/irq] x86/nmi: Fix out-of-order NMI nesting checks & false positive warning tip-bot2 for Paul E. McKenney
  0 siblings, 2 replies; 4+ messages in thread
From: Paul E. McKenney @ 2023-10-11 18:40 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: clm, mingo, tglx, mingo, bp, dave.hansen, hpa

The ->idt_seq and ->recv_jiffies variables added by commit 1a3ea611fc10
("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()") place
the exit-time check of the bottom bit of ->idt_seq after the
this_cpu_dec_return() that re-enables NMI nesting.  This can result in
the following sequence of events on a given CPU in kernels built with
CONFIG_NMI_CHECK_CPU=y:

o       An NMI arrives, and ->idt_seq is incremented to an odd number.
        In addition, nmi_state is set to NMI_EXECUTING==1.

o       The NMI is processed.

o       The this_cpu_dec_return(nmi_state) zeroes nmi_state and returns
        NMI_EXECUTING==1, thus opting out of the "goto nmi_restart".

o       Another NMI arrives and ->idt_seq is incremented to an even
        number, triggering the warning.  But all is just fine, at least
        assuming we don't get so many closely spaced NMIs that the stack
        overflows or some such.

Experience on the fleet indicates that the MTBF of this false positive
is about 70 years.  Or, for those who are not quite that patient, the
MTBF appears to be about one per week per 4,000 systems.

Fix this false-positive warning by moving the "nmi_restart" label before
the initial ->idt_seq increment/check and moving the this_cpu_dec_return()
to follow the final ->idt_seq increment/check.  This way, all nested NMIs
that get past the NMI_NOT_RUNNING check get a clean ->idt_seq slate.
And if they don't get past that check, they will set nmi_state to
NMI_LATCHED, which will cause the this_cpu_dec_return(nmi_state)
to restart.

Reported-by: Chris Mason <clm@fb.com>
Fixes: 1a3ea611fc10 ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <x86@kernel.org>

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0c551846b35..4766b6bed443 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -507,12 +507,13 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 	}
 	this_cpu_write(nmi_state, NMI_EXECUTING);
 	this_cpu_write(nmi_cr2, read_cr2());
+
+nmi_restart:
 	if (IS_ENABLED(CONFIG_NMI_CHECK_CPU)) {
 		WRITE_ONCE(nsp->idt_seq, nsp->idt_seq + 1);
 		WARN_ON_ONCE(!(nsp->idt_seq & 0x1));
 		WRITE_ONCE(nsp->recv_jiffies, jiffies);
 	}
-nmi_restart:
 
 	/*
 	 * Needs to happen before DR7 is accessed, because the hypervisor can
@@ -548,16 +549,16 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 
 	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
 		write_cr2(this_cpu_read(nmi_cr2));
-	if (this_cpu_dec_return(nmi_state))
-		goto nmi_restart;
-
-	if (user_mode(regs))
-		mds_user_clear_cpu_buffers();
 	if (IS_ENABLED(CONFIG_NMI_CHECK_CPU)) {
 		WRITE_ONCE(nsp->idt_seq, nsp->idt_seq + 1);
 		WARN_ON_ONCE(nsp->idt_seq & 0x1);
 		WRITE_ONCE(nsp->recv_jiffies, jiffies);
 	}
+	if (this_cpu_dec_return(nmi_state))
+		goto nmi_restart;
+
+	if (user_mode(regs))
+		mds_user_clear_cpu_buffers();
 }
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC x86/nmi] Fix out-of-order nesting checks
  2023-10-11 18:40 [PATCH RFC x86/nmi] Fix out-of-order nesting checks Paul E. McKenney
@ 2023-10-12  6:37 ` Ingo Molnar
  2023-10-12 10:45   ` Paul E. McKenney
  2023-10-12  6:41 ` [tip: x86/irq] x86/nmi: Fix out-of-order NMI nesting checks & false positive warning tip-bot2 for Paul E. McKenney
  1 sibling, 1 reply; 4+ messages in thread
From: Ingo Molnar @ 2023-10-12  6:37 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: linux-kernel, x86, clm, tglx, mingo, bp, dave.hansen, hpa


* Paul E. McKenney <paulmck@kernel.org> wrote:

> The ->idt_seq and ->recv_jiffies variables added by commit 1a3ea611fc10
> ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()") place
> the exit-time check of the bottom bit of ->idt_seq after the
> this_cpu_dec_return() that re-enables NMI nesting.  This can result in
> the following sequence of events on a given CPU in kernels built with
> CONFIG_NMI_CHECK_CPU=y:
> 
> o       An NMI arrives, and ->idt_seq is incremented to an odd number.
>         In addition, nmi_state is set to NMI_EXECUTING==1.
> 
> o       The NMI is processed.
> 
> o       The this_cpu_dec_return(nmi_state) zeroes nmi_state and returns
>         NMI_EXECUTING==1, thus opting out of the "goto nmi_restart".
> 
> o       Another NMI arrives and ->idt_seq is incremented to an even
>         number, triggering the warning.  But all is just fine, at least
>         assuming we don't get so many closely spaced NMIs that the stack
>         overflows or some such.
> 
> Experience on the fleet indicates that the MTBF of this false positive
> is about 70 years.  Or, for those who are not quite that patient, the
> MTBF appears to be about one per week per 4,000 systems.
> 
> Fix this false-positive warning by moving the "nmi_restart" label before
> the initial ->idt_seq increment/check and moving the this_cpu_dec_return()
> to follow the final ->idt_seq increment/check.  This way, all nested NMIs
> that get past the NMI_NOT_RUNNING check get a clean ->idt_seq slate.
> And if they don't get past that check, they will set nmi_state to
> NMI_LATCHED, which will cause the this_cpu_dec_return(nmi_state)
> to restart.

This looks like a sensible fix: the warning should obviously be atomic wrt. 
the no-nesting region. I've applied your fix to tip:x86/irq, as it doesn't 
seem urgent enough with a MTBF of 70 years to warrant tip:x86/urgent handling. ;-)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip: x86/irq] x86/nmi: Fix out-of-order NMI nesting checks & false positive warning
  2023-10-11 18:40 [PATCH RFC x86/nmi] Fix out-of-order nesting checks Paul E. McKenney
  2023-10-12  6:37 ` Ingo Molnar
@ 2023-10-12  6:41 ` tip-bot2 for Paul E. McKenney
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot2 for Paul E. McKenney @ 2023-10-12  6:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Chris Mason, Paul E. McKenney, Ingo Molnar, Linus Torvalds,
	Andy Lutomirski, H. Peter Anvin, x86, linux-kernel

The following commit has been merged into the x86/irq branch of tip:

Commit-ID:     f44075ecafb726830e63d33fbca29413149eeeb8
Gitweb:        https://git.kernel.org/tip/f44075ecafb726830e63d33fbca29413149eeeb8
Author:        Paul E. McKenney <paulmck@kernel.org>
AuthorDate:    Wed, 11 Oct 2023 11:40:16 -07:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Thu, 12 Oct 2023 08:35:15 +02:00

x86/nmi: Fix out-of-order NMI nesting checks & false positive warning

The ->idt_seq and ->recv_jiffies variables added by:

  1a3ea611fc10 ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()")

... place the exit-time check of the bottom bit of ->idt_seq after the
this_cpu_dec_return() that re-enables NMI nesting.  This can result in
the following sequence of events on a given CPU in kernels built with
CONFIG_NMI_CHECK_CPU=y:

  o   An NMI arrives, and ->idt_seq is incremented to an odd number.
      In addition, nmi_state is set to NMI_EXECUTING==1.

  o   The NMI is processed.

  o   The this_cpu_dec_return(nmi_state) zeroes nmi_state and returns
      NMI_EXECUTING==1, thus opting out of the "goto nmi_restart".

  o   Another NMI arrives and ->idt_seq is incremented to an even
      number, triggering the warning.  But all is just fine, at least
      assuming we don't get so many closely spaced NMIs that the stack
      overflows or some such.

Experience on the fleet indicates that the MTBF of this false positive
is about 70 years.  Or, for those who are not quite that patient, the
MTBF appears to be about one per week per 4,000 systems.

Fix this false-positive warning by moving the "nmi_restart" label before
the initial ->idt_seq increment/check and moving the this_cpu_dec_return()
to follow the final ->idt_seq increment/check.  This way, all nested NMIs
that get past the NMI_NOT_RUNNING check get a clean ->idt_seq slate.
And if they don't get past that check, they will set nmi_state to
NMI_LATCHED, which will cause the this_cpu_dec_return(nmi_state)
to restart.

Fixes: 1a3ea611fc10 ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()")
Reported-by: Chris Mason <clm@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lore.kernel.org/r/0cbff831-6e3d-431c-9830-ee65ee7787ff@paulmck-laptop
---
 arch/x86/kernel/nmi.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0c5518..4766b6b 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -507,12 +507,13 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 	}
 	this_cpu_write(nmi_state, NMI_EXECUTING);
 	this_cpu_write(nmi_cr2, read_cr2());
+
+nmi_restart:
 	if (IS_ENABLED(CONFIG_NMI_CHECK_CPU)) {
 		WRITE_ONCE(nsp->idt_seq, nsp->idt_seq + 1);
 		WARN_ON_ONCE(!(nsp->idt_seq & 0x1));
 		WRITE_ONCE(nsp->recv_jiffies, jiffies);
 	}
-nmi_restart:
 
 	/*
 	 * Needs to happen before DR7 is accessed, because the hypervisor can
@@ -548,16 +549,16 @@ nmi_restart:
 
 	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
 		write_cr2(this_cpu_read(nmi_cr2));
-	if (this_cpu_dec_return(nmi_state))
-		goto nmi_restart;
-
-	if (user_mode(regs))
-		mds_user_clear_cpu_buffers();
 	if (IS_ENABLED(CONFIG_NMI_CHECK_CPU)) {
 		WRITE_ONCE(nsp->idt_seq, nsp->idt_seq + 1);
 		WARN_ON_ONCE(nsp->idt_seq & 0x1);
 		WRITE_ONCE(nsp->recv_jiffies, jiffies);
 	}
+	if (this_cpu_dec_return(nmi_state))
+		goto nmi_restart;
+
+	if (user_mode(regs))
+		mds_user_clear_cpu_buffers();
 }
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC x86/nmi] Fix out-of-order nesting checks
  2023-10-12  6:37 ` Ingo Molnar
@ 2023-10-12 10:45   ` Paul E. McKenney
  0 siblings, 0 replies; 4+ messages in thread
From: Paul E. McKenney @ 2023-10-12 10:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, x86, clm, tglx, mingo, bp, dave.hansen, hpa

On Thu, Oct 12, 2023 at 08:37:25AM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> > The ->idt_seq and ->recv_jiffies variables added by commit 1a3ea611fc10
> > ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()") place
> > the exit-time check of the bottom bit of ->idt_seq after the
> > this_cpu_dec_return() that re-enables NMI nesting.  This can result in
> > the following sequence of events on a given CPU in kernels built with
> > CONFIG_NMI_CHECK_CPU=y:
> > 
> > o       An NMI arrives, and ->idt_seq is incremented to an odd number.
> >         In addition, nmi_state is set to NMI_EXECUTING==1.
> > 
> > o       The NMI is processed.
> > 
> > o       The this_cpu_dec_return(nmi_state) zeroes nmi_state and returns
> >         NMI_EXECUTING==1, thus opting out of the "goto nmi_restart".
> > 
> > o       Another NMI arrives and ->idt_seq is incremented to an even
> >         number, triggering the warning.  But all is just fine, at least
> >         assuming we don't get so many closely spaced NMIs that the stack
> >         overflows or some such.
> > 
> > Experience on the fleet indicates that the MTBF of this false positive
> > is about 70 years.  Or, for those who are not quite that patient, the
> > MTBF appears to be about one per week per 4,000 systems.
> > 
> > Fix this false-positive warning by moving the "nmi_restart" label before
> > the initial ->idt_seq increment/check and moving the this_cpu_dec_return()
> > to follow the final ->idt_seq increment/check.  This way, all nested NMIs
> > that get past the NMI_NOT_RUNNING check get a clean ->idt_seq slate.
> > And if they don't get past that check, they will set nmi_state to
> > NMI_LATCHED, which will cause the this_cpu_dec_return(nmi_state)
> > to restart.
> 
> This looks like a sensible fix: the warning should obviously be atomic wrt. 
> the no-nesting region. I've applied your fix to tip:x86/irq, as it doesn't 
> seem urgent enough with a MTBF of 70 years to warrant tip:x86/urgent handling. ;-)

Works for me!  ;-)

And thank you!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-10-12 10:45 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-10-11 18:40 [PATCH RFC x86/nmi] Fix out-of-order nesting checks Paul E. McKenney
2023-10-12  6:37 ` Ingo Molnar
2023-10-12 10:45   ` Paul E. McKenney
2023-10-12  6:41 ` [tip: x86/irq] x86/nmi: Fix out-of-order NMI nesting checks & false positive warning tip-bot2 for Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).