linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI
@ 2012-06-08 13:52 Steven Rostedt
  2012-06-08 13:52 ` [PATCH 1/3] x86: Save cr2 in NMI in case NMIs take a page fault Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Steven Rostedt @ 2012-06-08 13:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 1164 bytes --]


Ingo,

As Avi brought to attention, there may be issues with NMIs and page faults
as if a NMI takes a page fault it can corrupt the cr2 for a page fault
handler that it preempted. The first and third patch fixes this for x86_64
and i386 respectively.

The second patch is to fix the cmpxchg issue brought up by someone on LWN.
The NMI handler for i386 uses cmpxchg to handle nested NMIs. But some
older i386 boxes do not have a true cmpxchg and this will fail for them.
Luckily, there is a simple fix that also makes the code cleaner.

Please pull the latest tip/x86/core tree, which can be found at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
tip/x86/core

Head SHA1: 7fefda3109021f98926d76a13f90d0b116ce3db5


Steven Rostedt (3):
      x86: Save cr2 in NMI in case NMIs take a page fault
      x86: Remove cmpxchg from i386 NMI nesting code
      x86: Save cr2 in NMI in case NMIs take a page fault (for i386)

----
 arch/x86/kernel/entry_64.S |   20 +++++++++++++++++
 arch/x86/kernel/nmi.c      |   51 +++++++++++++++++++++++++++++++-------------
 2 files changed, 56 insertions(+), 15 deletions(-)

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] x86: Save cr2 in NMI in case NMIs take a page fault
  2012-06-08 13:52 [PATCH 0/3] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
@ 2012-06-08 13:52 ` Steven Rostedt
  2012-06-08 13:52 ` [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code Steven Rostedt
  2012-06-08 13:52 ` [PATCH 3/3] x86: Save cr2 in NMI in case NMIs take a page fault (for i386) Steven Rostedt
  2 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2012-06-08 13:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Avi Kivity, Linus Torvalds, Brian Gerst

[-- Attachment #1: Type: text/plain, Size: 2531 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Avi Kivity reported that page faults in NMIs could cause havic if
the NMI preempted another page fault handler:

   The recent changes to NMI allow exceptions to take place in NMI
   handlers, but I think that a #PF (say, due to access to vmalloc space)
   is still problematic.  Consider the sequence

    #PF  (cr2 set by processor)
      NMI
        ...
        #PF (cr2 clobbered)
          do_page_fault()
          IRET
        ...
        IRET
      do_page_fault()
        address = read_cr2()

   The last line reads the overwritten cr2 value.

Originally I wrote a patch to solve this by saving the cr2 on the stack.
Brian Gerst suggested to save it in the r12 register as both r12 and rbx
are saved by the do_nmi handler as required by the C standard. But rbx
is already used for saving if swapgs needs to be run on exit of the NMI
handler.

Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com
Link: http://lkml.kernel.org/r/1337763411.13348.140.camel@gandalf.stny.rr.com

Reported-by: Avi Kivity <avi@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/entry_64.S |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 7d65133..111f6bb 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1758,10 +1758,30 @@ end_repeat_nmi:
 	 */
 	call save_paranoid
 	DEFAULT_FRAME 0
+
+	/*
+	 * Save off the CR2 register. If we take a page fault in the NMI then
+	 * it could corrupt the CR2 value. If the NMI preempts a page fault
+	 * handler before it was able to read the CR2 register, and then the
+	 * NMI itself takes a page fault, the page fault that was preempted
+	 * will read the information from the NMI page fault and not the
+	 * origin fault. Save it off and restore it if it changes.
+	 * Use the r12 callee-saved register.
+	 */
+	movq %cr2, %r12
+
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq %rsp,%rdi
 	movq $-1,%rsi
 	call do_nmi
+
+	/* Did the NMI take a page fault? Restore cr2 if it did */
+	movq %cr2, %rcx
+	cmpq %rcx, %r12
+	je 1f
+	movq %r12, %cr2
+1:
+	
 	testl %ebx,%ebx				/* swapgs needed? */
 	jnz nmi_restore
 nmi_swapgs:
-- 
1.7.10



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 13:52 [PATCH 0/3] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
  2012-06-08 13:52 ` [PATCH 1/3] x86: Save cr2 in NMI in case NMIs take a page fault Steven Rostedt
@ 2012-06-08 13:52 ` Steven Rostedt
  2012-06-08 16:10   ` H. Peter Anvin
  2012-06-08 13:52 ` [PATCH 3/3] x86: Save cr2 in NMI in case NMIs take a page fault (for i386) Steven Rostedt
  2 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2012-06-08 13:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin

[-- Attachment #1: Type: text/plain, Size: 4342 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

I've been informed by someone on LWN called 'slashdot' that
some i386 machines do not support a true cmpxchg. The cmpxchg
used by the i386 NMI nesting code must be a true cmpxchg as
disabling interrupts will not work for NMIs (which is the work
around for i386s that do not have a true cmpxchg).

This 'slashdot' character also suggested a fix to the issue.
As the state of the nesting NMIs goes as follows:

  NOT_RUNNING -> EXECUTING
  EXECUTING   -> NOT_RUNNING
  EXECUTING   -> LATCHED
  LATCHED     -> EXECUTING

Having these states as enum values of:

  NOT_RUNNING = 0
  EXECUTING   = 1
  LATCHED     = 2

Instead of a cmpxchg to make EXECUTING -> NOT_RUNNING a
dec_and_test() would work as well. If the dec_and_test brings
the state to NOT_RUNNING, that is the same as a cmpxchg
succeeding to change EXECUTING to NOT_RUNNING. If a nested NMI
were to come in and change it to LATCHED, the dec_and_test() would
convert the state to EXECUTING (what we want it to be in such a
case anyway).

I asked 'slashdot' to post this as a patch, but it never came to
be. I decided to do the work instead.

Link: http://lwn.net/Articles/484932/

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/nmi.c |   39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0b2f84..43cce77 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -28,6 +28,7 @@
 #include <asm/mach_traps.h>
 #include <asm/nmi.h>
 #include <asm/x86_init.h>
+#include <asm/local.h>
 
 struct nmi_desc {
 	spinlock_t lock;
@@ -365,8 +366,9 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
 #ifdef CONFIG_X86_32
 /*
  * For i386, NMIs use the same stack as the kernel, and we can
- * add a workaround to the iret problem in C. Simply have 3 states
- * the NMI can be in.
+ * add a workaround to the iret problem in C (preventing nested
+ * NMIs if an NMI takes a trap). Simply have 3 states the NMI
+ * can be in:
  *
  *  1) not running
  *  2) executing
@@ -383,32 +385,39 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
  * If an NMI hits a breakpoint that executes an iret, another
  * NMI can preempt it. We do not want to allow this new NMI
  * to run, but we want to execute it when the first one finishes.
- * We set the state to "latched", and the first NMI will perform
- * an cmpxchg on the state, and if it doesn't successfully
- * reset the state to "not running" it will restart the next
- * NMI.
+ * We set the state to "latched", and the exit of the first NMI will
+ * perform a dec_and_test, if the result is zero (NOT_RUNNING), then
+ * it will simply exit the NMI handler. If not, the dec_and_test
+ * would have set the state to NMI_EXECUTING (what we want it to
+ * be when we are running). In this case, we simply jump back
+ * to rerun the NMI handler again, and restart the 'latched' NMI.
+ *
+ * No trap (breakpoint or page fault) should be hit before nmi_restart,
+ * thus there is no race between the first check of state for NOT_RUNNING
+ * and setting it to NMI_EXECUTING. The HW will prevent nested NMIs
+ * at this point.
  */
 enum nmi_states {
-	NMI_NOT_RUNNING,
+	NMI_NOT_RUNNING = 0,
 	NMI_EXECUTING,
 	NMI_LATCHED,
 };
-static DEFINE_PER_CPU(enum nmi_states, nmi_state);
+static DEFINE_PER_CPU(local_t, nmi_state);
 
 #define nmi_nesting_preprocess(regs)					\
 	do {								\
-		if (__get_cpu_var(nmi_state) != NMI_NOT_RUNNING) {	\
-			__get_cpu_var(nmi_state) = NMI_LATCHED;		\
+		local_t *__state = &__get_cpu_var(nmi_state);		\
+		if (local_read(__state) != NMI_NOT_RUNNING) {		\
+			local_set(__state, NMI_LATCHED);		\
 			return;						\
 		}							\
-	nmi_restart:							\
-		__get_cpu_var(nmi_state) = NMI_EXECUTING;		\
-	} while (0)
+		local_set(__state, NMI_EXECUTING);			\
+	} while (0);							\
+	nmi_restart:
 
 #define nmi_nesting_postprocess()					\
 	do {								\
-		if (cmpxchg(&__get_cpu_var(nmi_state),			\
-		    NMI_EXECUTING, NMI_NOT_RUNNING) != NMI_EXECUTING)	\
+		if (!local_dec_and_test(&__get_cpu_var(nmi_state)))	\
 			goto nmi_restart;				\
 	} while (0)
 #else /* x86_64 */
-- 
1.7.10



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] x86: Save cr2 in NMI in case NMIs take a page fault (for i386)
  2012-06-08 13:52 [PATCH 0/3] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
  2012-06-08 13:52 ` [PATCH 1/3] x86: Save cr2 in NMI in case NMIs take a page fault Steven Rostedt
  2012-06-08 13:52 ` [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code Steven Rostedt
@ 2012-06-08 13:52 ` Steven Rostedt
  2 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2012-06-08 13:52 UTC (permalink / raw)
  To: linux-kernel
  Cc: Ingo Molnar, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Avi Kivity, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 2757 bytes --]

From: Steven Rostedt <srostedt@redhat.com>

Avi Kivity reported that page faults in NMIs could cause havic if
the NMI preempted another page fault handler:

   The recent changes to NMI allow exceptions to take place in NMI
   handlers, but I think that a #PF (say, due to access to vmalloc space)
   is still problematic.  Consider the sequence

    #PF  (cr2 set by processor)
      NMI
        ...
        #PF (cr2 clobbered)
          do_page_fault()
          IRET
        ...
        IRET
      do_page_fault()
        address = read_cr2()

   The last line reads the overwritten cr2 value.

This is the i386 version, which has the luxury of doing the work
in C code.

Link: http://lkml.kernel.org/r/4FBB8C40.6080304@redhat.com

Reported-by: Avi Kivity <avi@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
---
 arch/x86/kernel/nmi.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 43cce77..26ae7e5 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -396,6 +396,14 @@ static __kprobes void default_do_nmi(struct pt_regs *regs)
  * thus there is no race between the first check of state for NOT_RUNNING
  * and setting it to NMI_EXECUTING. The HW will prevent nested NMIs
  * at this point.
+ *
+ * In case the NMI takes a page fault, we need to save off the CR2
+ * because the NMI could have preempted another page fault and corrupt
+ * the CR2 that is about to be read. As nested NMIs must be restarted
+ * and they can not take breakpoints or page faults, the update of the
+ * CR2 must be done before converting the nmi state back to NOT_RUNNING.
+ * Otherwise, there would be a race of another nested NMI coming in
+ * after setting state to NOT_RUNNING but before updating the nmi_cr2.
  */
 enum nmi_states {
 	NMI_NOT_RUNNING = 0,
@@ -403,6 +411,7 @@ enum nmi_states {
 	NMI_LATCHED,
 };
 static DEFINE_PER_CPU(local_t, nmi_state);
+static DEFINE_PER_CPU(unsigned long, nmi_cr2);
 
 #define nmi_nesting_preprocess(regs)					\
 	do {								\
@@ -412,11 +421,14 @@ static DEFINE_PER_CPU(local_t, nmi_state);
 			return;						\
 		}							\
 		local_set(__state, NMI_EXECUTING);			\
+		this_cpu_write(nmi_cr2, read_cr2());			\
 	} while (0);							\
 	nmi_restart:
 
 #define nmi_nesting_postprocess()					\
 	do {								\
+		if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))	\
+			write_cr2(this_cpu_read(nmi_cr2));		\
 		if (!local_dec_and_test(&__get_cpu_var(nmi_state)))	\
 			goto nmi_restart;				\
 	} while (0)
-- 
1.7.10



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 13:52 ` [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code Steven Rostedt
@ 2012-06-08 16:10   ` H. Peter Anvin
  2012-06-08 16:41     ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2012-06-08 16:10 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

On 06/08/2012 06:52 AM, Steven Rostedt wrote:
> From: Steven Rostedt <srostedt@redhat.com>
> 
> I've been informed by someone on LWN called 'slashdot' that
> some i386 machines do not support a true cmpxchg. The cmpxchg
> used by the i386 NMI nesting code must be a true cmpxchg as
> disabling interrupts will not work for NMIs (which is the work
> around for i386s that do not have a true cmpxchg).
> 
> This 'slashdot' character also suggested a fix to the issue.
> As the state of the nesting NMIs goes as follows:
> 
>   NOT_RUNNING -> EXECUTING
>   EXECUTING   -> NOT_RUNNING
>   EXECUTING   -> LATCHED
>   LATCHED     -> EXECUTING
> 
> Having these states as enum values of:
> 
>   NOT_RUNNING = 0
>   EXECUTING   = 1
>   LATCHED     = 2
> 
> Instead of a cmpxchg to make EXECUTING -> NOT_RUNNING a
> dec_and_test() would work as well. If the dec_and_test brings
> the state to NOT_RUNNING, that is the same as a cmpxchg
> succeeding to change EXECUTING to NOT_RUNNING. If a nested NMI
> were to come in and change it to LATCHED, the dec_and_test() would
> convert the state to EXECUTING (what we want it to be in such a
> case anyway).
> 
> I asked 'slashdot' to post this as a patch, but it never came to
> be. I decided to do the work instead.
> 
> Link: http://lwn.net/Articles/484932/

Okay, slight interrupt here.

The cost of this on real hardware better be zero (which I cannot
immediately judge.)

Why?  Because cmpxchg has been in every CPU since the i486, the i386 is
royally crippled on Linux anyway (due to minor architectural defects,
the main one being the write protect issue) and NMI is almost never used
on i386 as anything other than a fatal error indication.

Most "real" NMI users generate the NMI from the local APIC, but the i386
has no local APIC, and unlike the i486 cannot even have an external
local APIC to the best of my knowledge.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 16:10   ` H. Peter Anvin
@ 2012-06-08 16:41     ` Steven Rostedt
  2012-06-08 17:28       ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2012-06-08 16:41 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

On Fri, 2012-06-08 at 09:10 -0700, H. Peter Anvin wrote:

> Okay, slight interrupt here.

Pun intended?

> 
> The cost of this on real hardware better be zero (which I cannot
> immediately judge.)

Is dec_and_test cheaper than cmpxchg? I would think so.

> 
> Why?  Because cmpxchg has been in every CPU since the i486, the i386 is
> royally crippled on Linux anyway (due to minor architectural defects,
> the main one being the write protect issue) and NMI is almost never used
> on i386 as anything other than a fatal error indication.
> 
> Most "real" NMI users generate the NMI from the local APIC, but the i386
> has no local APIC, and unlike the i486 cannot even have an external
> local APIC to the best of my knowledge.

Yeah, this is why I didn't rush to do this change. But it does seem to
make the code simpler and it may actually speed things up. It replaces a
cmpxchg with a local_dec_and_test, which, I believe, doesn't even lock
the cachelines.

So lets look at the patch in detail, shall we?


>  enum nmi_states {
> -       NMI_NOT_RUNNING,
> +       NMI_NOT_RUNNING = 0,

This change was done more for documenting that the first element must be
zero. Even though C guarantees this. I wanted to point out that we
expect it to be zero and that it being zero really does matter. No
functionality change whats-so-ever.

>         NMI_EXECUTING,
>         NMI_LATCHED,
>  };
> -static DEFINE_PER_CPU(enum nmi_states, nmi_state);
> +static DEFINE_PER_CPU(local_t, nmi_state);

local_t is is just an atomic_long_t, which on i386 is nothing different
than what an enum would be.

>  
>  #define nmi_nesting_preprocess(regs)                                   \
>         do {                                                            \
> -               if (__get_cpu_var(nmi_state) != NMI_NOT_RUNNING) {      \
> -                       __get_cpu_var(nmi_state) = NMI_LATCHED;         \
> +               local_t *__state = &__get_cpu_var(nmi_state);           \
> +               if (local_read(__state) != NMI_NOT_RUNNING) {           \
> +                       local_set(__state, NMI_LATCHED);                \

The above change is probably a little bit of a speed up because we
remove the double '__get_cpu_var()' and replace it with a pointer that
is reused. I haven't looked at the assembly for this, but it is either
the same or better with the patch.

Sure, we could improve this by using this_cpu_var() which may make it
better than the patch. But the patch is currently the same or better
than what is there now.

>                         return;                                         \
>                 }                                                       \
> -       nmi_restart:                                                    \
> -               __get_cpu_var(nmi_state) = NMI_EXECUTING;               \
> -       } while (0)
> +               local_set(__state, NMI_EXECUTING);                      \
> +       } while (0);                                                    \
> +       nmi_restart:

Here it's better or the same than what is there now as we again remove
the reference to getting the pointer. In case gcc doesn't optimize it
nicely. But again we could have switched to this_cpu_write() which could
be better.

The movement of nmi_restart does help too. I'll explain that below.

>  
>  #define nmi_nesting_postprocess()                                      \
>         do {                                                            \
> -               if (cmpxchg(&__get_cpu_var(nmi_state),                  \
> -                   NMI_EXECUTING, NMI_NOT_RUNNING) != NMI_EXECUTING)   \
> +               if (!local_dec_and_test(&__get_cpu_var(nmi_state)))     \

Now this is where I think the patch helps. I'm almost certain that
local_dec_and_test is faster than a cmpxchg by many cycles. Especially
on i386.

>                         goto nmi_restart;                               \

And now that the local_dec_and_test on failure set the state to
NMI_EXECUTING, we can simply jump back to the new location of
nmi_restart instead of having to reset it above.

This also removes the race in case another NMI comes in at this moment
and we lose the latch. But this is a small problem, as we are about to
repeat the NMI anyway, and it would be the same as HW if the NMI came in
just before we finished the last NMI. We only repeat once. This is just
changing where we consider the start of the repeated NMI is (where a new
latch can happen). But functionality wise, it's no different.

-- Steve

>         } while (0)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 16:41     ` Steven Rostedt
@ 2012-06-08 17:28       ` H. Peter Anvin
  2012-06-08 17:36         ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2012-06-08 17:28 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

On 06/08/2012 09:41 AM, Steven Rostedt wrote:
>>
>> The cost of this on real hardware better be zero (which I cannot
>> immediately judge.)
> 
> Is dec_and_test cheaper than cmpxchg? I would think so.
> 

Should be more or less the same (but see below w.r.t. _local).

>>
>> Why?  Because cmpxchg has been in every CPU since the i486, the i386 is
>> royally crippled on Linux anyway (due to minor architectural defects,
>> the main one being the write protect issue) and NMI is almost never used
>> on i386 as anything other than a fatal error indication.
>>
>> Most "real" NMI users generate the NMI from the local APIC, but the i386
>> has no local APIC, and unlike the i486 cannot even have an external
>> local APIC to the best of my knowledge.
> 
> Yeah, this is why I didn't rush to do this change. But it does seem to
> make the code simpler and it may actually speed things up. It replaces a
> cmpxchg with a local_dec_and_test, which, I believe, doesn't even lock
> the cachelines.
> 

Yeah, the cmpxchg here rather than cmpxchg_local seems like it just was
a plain bug, no?

> So lets look at the patch in detail, shall we?
> 
> 
>>  enum nmi_states {
>> -       NMI_NOT_RUNNING,
>> +       NMI_NOT_RUNNING = 0,
> 
> This change was done more for documenting that the first element must be
> zero. Even though C guarantees this. I wanted to point out that we
> expect it to be zero and that it being zero really does matter. No
> functionality change whats-so-ever.
> 

Yes, that makes sense.

>>         NMI_EXECUTING,
>>         NMI_LATCHED,
>>  };
>> -static DEFINE_PER_CPU(enum nmi_states, nmi_state);
>> +static DEFINE_PER_CPU(local_t, nmi_state);
> 
> local_t is is just an atomic_long_t, which on i386 is nothing different
> than what an enum would be.
> 
>>  
>>  #define nmi_nesting_preprocess(regs)                                   \
>>         do {                                                            \
>> -               if (__get_cpu_var(nmi_state) != NMI_NOT_RUNNING) {      \
>> -                       __get_cpu_var(nmi_state) = NMI_LATCHED;         \
>> +               local_t *__state = &__get_cpu_var(nmi_state);           \
>> +               if (local_read(__state) != NMI_NOT_RUNNING) {           \
>> +                       local_set(__state, NMI_LATCHED);                \
> 
> The above change is probably a little bit of a speed up because we
> remove the double '__get_cpu_var()' and replace it with a pointer that
> is reused. I haven't looked at the assembly for this, but it is either
> the same or better with the patch.
> 
> Sure, we could improve this by using this_cpu_var() which may make it
> better than the patch. But the patch is currently the same or better
> than what is there now.

But yes, if you're going to modify this use this_cpu_read() and
this_cpu_write() and avoid the pointer completely.

>>                         return;                                         \
>>                 }                                                       \
>> -       nmi_restart:                                                    \
>> -               __get_cpu_var(nmi_state) = NMI_EXECUTING;               \
>> -       } while (0)
>> +               local_set(__state, NMI_EXECUTING);                      \
>> +       } while (0);                                                    \
>> +       nmi_restart:
> 
> Here it's better or the same than what is there now as we again remove
> the reference to getting the pointer. In case gcc doesn't optimize it
> nicely. But again we could have switched to this_cpu_write() which could
> be better.
> 
> The movement of nmi_restart does help too. I'll explain that below.
> 
>>  
>>  #define nmi_nesting_postprocess()                                      \
>>         do {                                                            \
>> -               if (cmpxchg(&__get_cpu_var(nmi_state),                  \
>> -                   NMI_EXECUTING, NMI_NOT_RUNNING) != NMI_EXECUTING)   \
>> +               if (!local_dec_and_test(&__get_cpu_var(nmi_state)))     \
> 
> Now this is where I think the patch helps. I'm almost certain that
> local_dec_and_test is faster than a cmpxchg by many cycles. Especially
> on i386.
> 

On i386 it's infinite, but again, I don't think the code will ever be
exercised on i386.  I'm much more concerned about performance on current
processors.

But yes, local_dec_and_test should at least not be more expensive.  Even
better, use this_cpu_dec_return().

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 17:28       ` H. Peter Anvin
@ 2012-06-08 17:36         ` Steven Rostedt
  2012-06-08 17:39           ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Steven Rostedt @ 2012-06-08 17:36 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

On Fri, 2012-06-08 at 10:28 -0700, H. Peter Anvin wrote:
> On 06/08/2012 09:41 AM, Steven Rostedt wrote:
>  
> >>  
> >>  #define nmi_nesting_postprocess()                                      \
> >>         do {                                                            \
> >> -               if (cmpxchg(&__get_cpu_var(nmi_state),                  \
> >> -                   NMI_EXECUTING, NMI_NOT_RUNNING) != NMI_EXECUTING)   \
> >> +               if (!local_dec_and_test(&__get_cpu_var(nmi_state)))     \
> > 
> > Now this is where I think the patch helps. I'm almost certain that
> > local_dec_and_test is faster than a cmpxchg by many cycles. Especially
> > on i386.
> > 
> 
> On i386 it's infinite, but again, I don't think the code will ever be
> exercised on i386.  I'm much more concerned about performance on current
> processors.

I'm not worried about non i386 here. Some context missing from the patch
is that this code is surrounded by:

#ifdef CONFIG_X86_32

[...]

#else

The '#else' part does something completely different, as this work is
done in the assembly handler on x86_64.

And yes, this code is executed on i386 (well when I boot my x86_64 into
an i386 kernel it does).


> 
> But yes, local_dec_and_test should at least not be more expensive.  Even
> better, use this_cpu_dec_return().

Oh! That would make this much better! I'll update the patch set.

Thanks!

-- Steve





^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 17:36         ` Steven Rostedt
@ 2012-06-08 17:39           ` H. Peter Anvin
  2012-06-08 17:52             ` Steven Rostedt
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2012-06-08 17:39 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

On 06/08/2012 10:36 AM, Steven Rostedt wrote:
>>
>> On i386 it's infinite, but again, I don't think the code will ever be
>> exercised on i386.  I'm much more concerned about performance on current
>> processors.
> 
> I'm not worried about non i386 here. Some context missing from the patch
> is that this code is surrounded by:
> 

Sorry, I meant i386 as in actual i386 processors, as opposed to modern
processors in 32-bit mode.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 17:39           ` H. Peter Anvin
@ 2012-06-08 17:52             ` Steven Rostedt
  2012-06-08 18:04               ` H. Peter Anvin
                                 ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Steven Rostedt @ 2012-06-08 17:52 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

On Fri, 2012-06-08 at 10:39 -0700, H. Peter Anvin wrote:
> On 06/08/2012 10:36 AM, Steven Rostedt wrote:
> >>
> >> On i386 it's infinite, but again, I don't think the code will ever be
> >> exercised on i386.  I'm much more concerned about performance on current
> >> processors.
> > 
> > I'm not worried about non i386 here. Some context missing from the patch
> > is that this code is surrounded by:
> > 
> 
> Sorry, I meant i386 as in actual i386 processors, as opposed to modern
> processors in 32-bit mode.
> 

Fair enough. But still, I think a dec_and_test() is nicer (even if we
use this_cpu_dec_and_return()). That's the rational for this change in
the first place, with an added benefit that it would work with the
fictional i386 old processor that does NMIs but no cmpxchg.

Not to mention that I would love to make all users that have x86_64
machines running i386 kernels, for something other than testing i386,
thrown into a large boiling pot of clue stew. It really pisses me off
when people run these i386 machines with 16 gigs of memory and complain
about performance. highmem should be disabled for any x86_64 box with
more than 1G of RAM running an i386 kernel with a nasty message on boot
up:

"Idiot! Why the f*ck are you running i386 on your nice shiny new x86_64
machine, with Umpteen Gigs of RAM. Boot a x86_64 kernel for Christ sake
and get your full potential. Your i386 userspace will still work just
fine on it. You're like one of those 75 year old retirees that can
finally buy a Porsche just to drive it 10 miles per hour below the speed
limit with a line of cars behind them!"

-- Steve



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 17:52             ` Steven Rostedt
@ 2012-06-08 18:04               ` H. Peter Anvin
  2012-06-08 18:09               ` Borislav Petkov
  2012-06-11  0:25               ` Maciej W. Rozycki
  2 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2012-06-08 18:04 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

On 06/08/2012 10:52 AM, Steven Rostedt wrote:
> 
> "Idiot! Why the f*ck are you running i386 on your nice shiny new x86_64
> machine, with Umpteen Gigs of RAM. Boot a x86_64 kernel for Christ sake
> and get your full potential. Your i386 userspace will still work just
> fine on it. You're like one of those 75 year old retirees that can
> finally buy a Porsche just to drive it 10 miles per hour below the speed
> limit with a line of cars behind them!"
> 

LOL!

	-hpa


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 17:52             ` Steven Rostedt
  2012-06-08 18:04               ` H. Peter Anvin
@ 2012-06-08 18:09               ` Borislav Petkov
  2012-06-11  0:25               ` Maciej W. Rozycki
  2 siblings, 0 replies; 16+ messages in thread
From: Borislav Petkov @ 2012-06-08 18:09 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner

On Fri, Jun 08, 2012 at 01:52:09PM -0400, Steven Rostedt wrote:
> "Idiot! Why the f*ck are you running i386 on your nice shiny new x86_64
> machine, with Umpteen Gigs of RAM. Boot a x86_64 kernel for Christ sake
> and get your full potential. Your i386 userspace will still work just
> fine on it. You're like one of those 75 year old retirees that can
> finally buy a Porsche just to drive it 10 miles per hour below the speed
> limit with a line of cars behind them!"

If you make this into a patch, you can have my vehement ACK.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-08 17:52             ` Steven Rostedt
  2012-06-08 18:04               ` H. Peter Anvin
  2012-06-08 18:09               ` Borislav Petkov
@ 2012-06-11  0:25               ` Maciej W. Rozycki
  2012-06-11  2:23                 ` H. Peter Anvin
  2 siblings, 1 reply; 16+ messages in thread
From: Maciej W. Rozycki @ 2012-06-11  0:25 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: H. Peter Anvin, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner

On Fri, 8 Jun 2012, Steven Rostedt wrote:

> "Idiot! Why the f*ck are you running i386 on your nice shiny new x86_64
> machine, with Umpteen Gigs of RAM. Boot a x86_64 kernel for Christ sake
> and get your full potential. Your i386 userspace will still work just
> fine on it. You're like one of those 75 year old retirees that can
> finally buy a Porsche just to drive it 10 miles per hour below the speed
> limit with a line of cars behind them!"

 :)  Yeah, if only native x86_64 could run vm86 so as to ask the VGA BIOS 
to unbotch the graphics card that shot itself into the foot for whatever 
reason, sigh...

  Maciej

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-11  0:25               ` Maciej W. Rozycki
@ 2012-06-11  2:23                 ` H. Peter Anvin
  2012-06-11  3:14                   ` Maciej W. Rozycki
  0 siblings, 1 reply; 16+ messages in thread
From: H. Peter Anvin @ 2012-06-11  2:23 UTC (permalink / raw)
  To: Maciej W. Rozycki, Steven Rostedt
  Cc: linux-kernel, Ingo Molnar, Andrew Morton, Thomas Gleixner

Just use a VM or a simulator.


"Maciej W. Rozycki" <macro@linux-mips.org> wrote:

>On Fri, 8 Jun 2012, Steven Rostedt wrote:
>
>> "Idiot! Why the f*ck are you running i386 on your nice shiny new
>x86_64
>> machine, with Umpteen Gigs of RAM. Boot a x86_64 kernel for Christ
>sake
>> and get your full potential. Your i386 userspace will still work just
>> fine on it. You're like one of those 75 year old retirees that can
>> finally buy a Porsche just to drive it 10 miles per hour below the
>speed
>> limit with a line of cars behind them!"
>
>:)  Yeah, if only native x86_64 could run vm86 so as to ask the VGA
>BIOS 
>to unbotch the graphics card that shot itself into the foot for
>whatever 
>reason, sigh...
>
>  Maciej

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-11  2:23                 ` H. Peter Anvin
@ 2012-06-11  3:14                   ` Maciej W. Rozycki
  2012-06-11  3:17                     ` H. Peter Anvin
  0 siblings, 1 reply; 16+ messages in thread
From: Maciej W. Rozycki @ 2012-06-11  3:14 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner

On Sun, 10 Jun 2012, H. Peter Anvin wrote:

> Just use a VM or a simulator.

 That wasn't completely serious ;) -- however thanks for the hint.  I 
wonder actually if any piece of software would let the code run in a 
"pass-through" mode and record what I/O accesses went past.  I reckon 
DOSEMU used to support that (and running the VGA BIOS) -- but that piece 
did indeed require vm86.  Then the sequence recorded could be replayed 
natively.

 Thanks for the food for thought! :)

  Maciej

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code
  2012-06-11  3:14                   ` Maciej W. Rozycki
@ 2012-06-11  3:17                     ` H. Peter Anvin
  0 siblings, 0 replies; 16+ messages in thread
From: H. Peter Anvin @ 2012-06-11  3:17 UTC (permalink / raw)
  To: Maciej W. Rozycki
  Cc: Steven Rostedt, linux-kernel, Ingo Molnar, Andrew Morton,
	Thomas Gleixner

libx86 is used for this.  Of course, when you interpret you can also record...

"Maciej W. Rozycki" <macro@linux-mips.org> wrote:

>On Sun, 10 Jun 2012, H. Peter Anvin wrote:
>
>> Just use a VM or a simulator.
>
> That wasn't completely serious ;) -- however thanks for the hint.  I 
>wonder actually if any piece of software would let the code run in a 
>"pass-through" mode and record what I/O accesses went past.  I reckon 
>DOSEMU used to support that (and running the VGA BIOS) -- but that
>piece 
>did indeed require vm86.  Then the sequence recorded could be replayed 
>natively.
>
> Thanks for the food for thought! :)
>
>  Maciej

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2012-06-11  3:31 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-08 13:52 [PATCH 0/3] [GIT PULL][3.6] x86: cr2 and cmpxchg issues of NMI Steven Rostedt
2012-06-08 13:52 ` [PATCH 1/3] x86: Save cr2 in NMI in case NMIs take a page fault Steven Rostedt
2012-06-08 13:52 ` [PATCH 2/3] x86: Remove cmpxchg from i386 NMI nesting code Steven Rostedt
2012-06-08 16:10   ` H. Peter Anvin
2012-06-08 16:41     ` Steven Rostedt
2012-06-08 17:28       ` H. Peter Anvin
2012-06-08 17:36         ` Steven Rostedt
2012-06-08 17:39           ` H. Peter Anvin
2012-06-08 17:52             ` Steven Rostedt
2012-06-08 18:04               ` H. Peter Anvin
2012-06-08 18:09               ` Borislav Petkov
2012-06-11  0:25               ` Maciej W. Rozycki
2012-06-11  2:23                 ` H. Peter Anvin
2012-06-11  3:14                   ` Maciej W. Rozycki
2012-06-11  3:17                     ` H. Peter Anvin
2012-06-08 13:52 ` [PATCH 3/3] x86: Save cr2 in NMI in case NMIs take a page fault (for i386) Steven Rostedt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).