linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm
@ 2017-03-09 22:42 Steven Rostedt
  2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
  2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
  0 siblings, 2 replies; 10+ messages in thread
From: Steven Rostedt @ 2017-03-09 22:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Andrew Morton


I started working on the x86-32 fentry code and I noticed that I had some
old patches in my local tip/x86/core branch. Looking at them, they still
seem relevant.

They are from 2014 when I was working at Red Hat. I kept the author having
(Red Hat) and since I forward ported them now that I work for VMware,
I used (VMware) in my Signed-off-by.

Both these patches are basically doing the same thing but to different
parts. One is making a better test of the RIP the other is a better
test of the RSP. Each patch describe what they are doing.

Steven Rostedt (Red Hat) (2):
      x86/nmi: Optimize the check for being in the repeat_nmi code
      x86/nmi: Fix and optimize the NMI stack check code

----
 arch/x86/entry/entry_64.S | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-09 22:42 [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm Steven Rostedt
@ 2017-03-09 22:42 ` Steven Rostedt
  2017-03-10  2:42   ` Andy Lutomirski
  2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2017-03-09 22:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Andrew Morton

[-- Attachment #1: 0001-x86-nmi-Optimize-the-check-for-being-in-the-repeat_n.patch --]
[-- Type: text/plain, Size: 1202 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Linus mentioned that doing two compares can be replaced by a single
compare. That is, instead of:

   movq $repeat_nmi, %rdx
   cmpq 8(%rsp), %rdx
   ja not_in_region
   movq $end_repeat_nmi, %rdx
   cmpq 8(%rsp), %rdx
   ja in_region

we can replace that with:

   movq 8(%rsp), %rdx
   subq $repeat_nmi, %rdx
   cmpq $end_repeat_nmi-repeat_nmi, %rdx
   jb in_region

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/x86/entry/entry_64.S | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 044d18ebc43c..3aad759aace2 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1330,13 +1330,10 @@ ENTRY(nmi)
 	 * resume the outer NMI.
 	 */
 
-	movq	$repeat_nmi, %rdx
-	cmpq	8(%rsp), %rdx
-	ja	1f
-	movq	$end_repeat_nmi, %rdx
-	cmpq	8(%rsp), %rdx
-	ja	nested_nmi_out
-1:
+	movq	8(%rsp), %rdx
+	subq	$repeat_nmi, %rdx
+	cmpq	$end_repeat_nmi-repeat_nmi, %rdx
+	jb	nested_nmi_out
 
 	/*
 	 * Now check "NMI executing".  If it's set, then we're nested.
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code
  2017-03-09 22:42 [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm Steven Rostedt
  2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
@ 2017-03-09 22:42 ` Steven Rostedt
  2017-03-10  2:43   ` Andy Lutomirski
  1 sibling, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2017-03-09 22:42 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	Andy Lutomirski, Andrew Morton, Andy Lutomirski

[-- Attachment #1: 0002-x86-nmi-Fix-and-optimize-the-NMI-stack-check-code.patch --]
[-- Type: text/plain, Size: 2282 bytes --]

From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>

Andy Lutomirski reported an off by one in the NMI stack check
for the nested NMI code, where if the stack pointer was one above
the actual stack (stack start + STACK_SIZE) it would trigger a false
positive. This is not that big of a deal because the stack pointer
should never be that. Even if a stack was using the pages just
above the NMI stack, it would require the stack about to overflow
for this to trigger, which is a much bigger bug than this is fixing.

Also, Linus Torvalds pointed out that doing two compares can be
accomplish with a single compare. That is:

("reg" is top of stack we are comparing "stack" to)

  cmpq reg, stack
  jae label  // note, code had one off "ja" instead of "jae"
  subq size, reg
  cmpq reg, stack
  jb label

Is the same as:

  subq $1, reg
  subq stack, reg
  cmpq size, reg
  jae label

The subq $1 was added into the leaq by doing:

   leaq 5*8+7(%rsp), %rdx

Added more comments as well.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 arch/x86/entry/entry_64.S | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3aad759aace2..1e6ca3740762 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1355,16 +1355,17 @@ ENTRY(nmi)
 	 * if it controls the kernel's RSP.  We set DF before we clear
 	 * "NMI executing".
 	 */
-	lea	6*8(%rsp), %rdx
-	/* Compare the NMI stack (rdx) with the stack we came from (4*8(%rsp)) */
-	cmpq	%rdx, 4*8(%rsp)
-	/* If the stack pointer is above the NMI stack, this is a normal NMI */
-	ja	first_nmi
-
-	subq	$EXCEPTION_STKSZ, %rdx
-	cmpq	%rdx, 4*8(%rsp)
-	/* If it is below the NMI stack, it is a normal NMI */
-	jb	first_nmi
+
+	/* Load address of the top of this stack into rdx */
+	lea 5*8+7(%rsp), %rdx
+	/* Subtract the return stack pointer from it */
+	subq 4*8(%rsp), %rdx
+	/*
+	 * If the result is greater or equal to the stack size,
+	 * then the return stack was not on this stack.
+	 */
+	cmpq $EXCEPTION_STKSZ, %rdx
+	jae first_nmi
 
 	/* Ah, it is within the NMI stack. */
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
@ 2017-03-10  2:42   ` Andy Lutomirski
  2017-03-10  3:49     ` Steven Rostedt
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10  2:42 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>
> Linus mentioned that doing two compares can be replaced by a single
> compare. That is, instead of:
>
>    movq $repeat_nmi, %rdx
>    cmpq 8(%rsp), %rdx
>    ja not_in_region
>    movq $end_repeat_nmi, %rdx
>    cmpq 8(%rsp), %rdx
>    ja in_region
>
> we can replace that with:
>
>    movq 8(%rsp), %rdx
>    subq $repeat_nmi, %rdx
>    cmpq $end_repeat_nmi-repeat_nmi, %rdx
>    jb in_region

Seems reasonable to me.  Good luck ever noticing the speedup :)

--Andy

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code
  2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
@ 2017-03-10  2:43   ` Andy Lutomirski
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10  2:43 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>
> Andy Lutomirski reported an off by one in the NMI stack check
> for the nested NMI code, where if the stack pointer was one above
> the actual stack (stack start + STACK_SIZE) it would trigger a false
> positive. This is not that big of a deal because the stack pointer
> should never be that. Even if a stack was using the pages just
> above the NMI stack, it would require the stack about to overflow
> for this to trigger, which is a much bigger bug than this is fixing.
>
> Also, Linus Torvalds pointed out that doing two compares can be
> accomplish with a single compare. That is:
>
> ("reg" is top of stack we are comparing "stack" to)
>
>   cmpq reg, stack
>   jae label  // note, code had one off "ja" instead of "jae"
>   subq size, reg
>   cmpq reg, stack
>   jb label
>
> Is the same as:
>
>   subq $1, reg
>   subq stack, reg
>   cmpq size, reg
>   jae label
>
> The subq $1 was added into the leaq by doing:
>
>    leaq 5*8+7(%rsp), %rdx
>
> Added more comments as well.

Nice.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  2:42   ` Andy Lutomirski
@ 2017-03-10  3:49     ` Steven Rostedt
  2017-03-10  3:50       ` Andy Lutomirski
  0 siblings, 1 reply; 10+ messages in thread
From: Steven Rostedt @ 2017-03-10  3:49 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Andrew Morton



On March 9, 2017 9:42:57 PM EST, Andy Lutomirski <luto@amacapital.net> wrote:
>On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org>
>wrote:
>> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>>
>> Linus mentioned that doing two compares can be replaced by a single
>> compare. That is, instead of:
>>
>>    movq $repeat_nmi, %rdx
>>    cmpq 8(%rsp), %rdx
>>    ja not_in_region
>>    movq $end_repeat_nmi, %rdx
>>    cmpq 8(%rsp), %rdx
>>    ja in_region
>>
>> we can replace that with:
>>
>>    movq 8(%rsp), %rdx
>>    subq $repeat_nmi, %rdx
>>    cmpq $end_repeat_nmi-repeat_nmi, %rdx
>>    jb in_region
>
>Seems reasonable to me.  Good luck ever noticing the speedup :)
>

It had nothing to do with speedup. Linus said that the current code makes the assembly programmer in him die a little. I want to cure that.

-- Steve 

>--Andy

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  3:49     ` Steven Rostedt
@ 2017-03-10  3:50       ` Andy Lutomirski
  2017-03-10  7:20         ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10  3:50 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 7:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
>
> On March 9, 2017 9:42:57 PM EST, Andy Lutomirski <luto@amacapital.net> wrote:
>>On Thu, Mar 9, 2017 at 2:42 PM, Steven Rostedt <rostedt@goodmis.org>
>>wrote:
>>> From: "Steven Rostedt (Red Hat)" <rostedt@goodmis.org>
>>>
>>> Linus mentioned that doing two compares can be replaced by a single
>>> compare. That is, instead of:
>>>
>>>    movq $repeat_nmi, %rdx
>>>    cmpq 8(%rsp), %rdx
>>>    ja not_in_region
>>>    movq $end_repeat_nmi, %rdx
>>>    cmpq 8(%rsp), %rdx
>>>    ja in_region
>>>
>>> we can replace that with:
>>>
>>>    movq 8(%rsp), %rdx
>>>    subq $repeat_nmi, %rdx
>>>    cmpq $end_repeat_nmi-repeat_nmi, %rdx
>>>    jb in_region
>>
>>Seems reasonable to me.  Good luck ever noticing the speedup :)
>>
>
> It had nothing to do with speedup. Linus said that the current code makes the assembly programmer in him die a little. I want to cure that.
>

One might argue that the world would be a better place if the assembly
programmer in some people died a little.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  3:50       ` Andy Lutomirski
@ 2017-03-10  7:20         ` Ingo Molnar
  2017-03-10 19:00           ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2017-03-10  7:20 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Steven Rostedt, linux-kernel, Linus Torvalds, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Andrew Morton


* Andy Lutomirski <luto@amacapital.net> wrote:

> > It had nothing to do with speedup. Linus said that the current code makes the 
> > assembly programmer in him die a little. I want to cure that.
> 
> One might argue that the world would be a better place if the assembly 
> programmer in some people died a little.

Joking aside, I'll bite: while in the kernel we try to avoid ever actually 
_writing_ new assembly code, assembly programming is still an invaluable skill, 
because it indirectly improves all the other 99% of non-assembly .c code:

 - Looking at the C compiler's assembly output tells us how close the code is to
   optimal.

 - Being able to tell whether our C abstractions are too far removed from how the
   compiler will map it to machine instructions is invaluable.

 - Being able to shape data structures and code in a machine-friendly way.

Much would be lost if the assembly programmer went extinct and it's no 
accident that annotated assembly output is just two <Enter> keys away
after launching 'perf top' or 'perf report'. The more developers know
assembly the better, IMHO.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10  7:20         ` Ingo Molnar
@ 2017-03-10 19:00           ` Linus Torvalds
  2017-03-10 19:03             ` Andy Lutomirski
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2017-03-10 19:00 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andy Lutomirski, Steven Rostedt, linux-kernel, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Thu, Mar 9, 2017 at 11:20 PM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Joking aside, I'll bite: while in the kernel we try to avoid ever actually
> _writing_ new assembly code

.. also, when we do, I think we should care about it.

If you write asm, and the end result is noticeably worse than what
your average compiler would generate, exactly why are you writing it
in asm in the first place?

So I think people should aim to avoid asm. Andy certainly knows that,
and I loved his "rewrite a lot of the low-level system call code"
patches.

But the corollary to that is that if you _do_ write assembler, please
have some pride in the code, and don't half-arse it.

                 Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code
  2017-03-10 19:00           ` Linus Torvalds
@ 2017-03-10 19:03             ` Andy Lutomirski
  0 siblings, 0 replies; 10+ messages in thread
From: Andy Lutomirski @ 2017-03-10 19:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Steven Rostedt, linux-kernel, Thomas Gleixner,
	H. Peter Anvin, Andy Lutomirski, Andrew Morton

On Fri, Mar 10, 2017 at 11:00 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Mar 9, 2017 at 11:20 PM, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> Joking aside, I'll bite: while in the kernel we try to avoid ever actually
>> _writing_ new assembly code
>
> .. also, when we do, I think we should care about it.
>
> If you write asm, and the end result is noticeably worse than what
> your average compiler would generate, exactly why are you writing it
> in asm in the first place?
>
> So I think people should aim to avoid asm. Andy certainly knows that,
> and I loved his "rewrite a lot of the low-level system call code"
> patches.
>
> But the corollary to that is that if you _do_ write assembler, please
> have some pride in the code, and don't half-arse it.
>

Geez, I didn't expect anyone to take my silly comment remotely
seriously :)  And I do like Steven's patches.

--Andy, who just looked at binutils source to figure out WTF "nobits"
meant.  Take that, asm!

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-03-10 19:04 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-09 22:42 [PATCH 0/2] x86/nmi: Optimize address compares with better jump algorithm Steven Rostedt
2017-03-09 22:42 ` [PATCH 1/2] x86/nmi: Optimize the check for being in the repeat_nmi code Steven Rostedt
2017-03-10  2:42   ` Andy Lutomirski
2017-03-10  3:49     ` Steven Rostedt
2017-03-10  3:50       ` Andy Lutomirski
2017-03-10  7:20         ` Ingo Molnar
2017-03-10 19:00           ` Linus Torvalds
2017-03-10 19:03             ` Andy Lutomirski
2017-03-09 22:42 ` [PATCH 2/2] x86/nmi: Fix and optimize the NMI stack check code Steven Rostedt
2017-03-10  2:43   ` Andy Lutomirski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).