linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
@ 2016-03-11 16:53 Denys Vlasenko
  2016-03-12 15:38 ` Ingo Molnar
  2016-03-12 15:45 ` Ingo Molnar
  0 siblings, 2 replies; 6+ messages in thread
From: Denys Vlasenko @ 2016-03-11 16:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denys Vlasenko, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
	Andy Lutomirski, Frederic Weisbecker, Will Drewry, Kees Cook,
	x86, linux-kernel

Use of a temporary R8 register here seems to be unnecessary.

"push %r8" is a two-byte insn (it needs REX prefix to specify R8),
"push $0" is two-byte too. It seems just using the latter would be
no worse.

Thus, code had an unnecessary "xorq %r8,%r8" insn.
It probably costs nothing in execution time here since we are probably
limited by store bandwidth at this point, but still.

Run-tested under QEMU: 32-bit calls still work:

/ # ./test_syscall_vdso32
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[NOTE]	R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data
[RUN]	Running tests under ptrace
[RUN]	Executing 6-argument 32-bit syscall via VDSO
[OK]	Arguments are preserved across syscall
[NOTE]	R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK]	R8..R15 did not leak kernel data
[RUN]	Executing 6-argument 32-bit syscall via INT 80
[OK]	Arguments are preserved across syscall
[OK]	R8..R15 did not leak kernel data

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
 arch/x86/entry/entry_64_compat.S | 45 +++++++++++++++++++---------------------
 1 file changed, 21 insertions(+), 24 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 847f2f0..e1721da 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
 	pushfq				/* pt_regs->flags (except IF = 0) */
 	orl	$X86_EFLAGS_IF, (%rsp)	/* Fix saved flags */
 	pushq	$__USER32_CS		/* pt_regs->cs */
-	xorq    %r8,%r8
-	pushq	%r8			/* pt_regs->ip = 0 (placeholder) */
+	pushq	$0			/* pt_regs->ip = 0 (placeholder) */
 	pushq	%rax			/* pt_regs->orig_ax */
 	pushq	%rdi			/* pt_regs->di */
 	pushq	%rsi			/* pt_regs->si */
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq   %r8                     /* pt_regs->r8  = 0 */
-	pushq   %r8                     /* pt_regs->r9  = 0 */
-	pushq   %r8                     /* pt_regs->r10 = 0 */
-	pushq   %r8                     /* pt_regs->r11 = 0 */
+	pushq   $0			/* pt_regs->r8  = 0 */
+	pushq   $0			/* pt_regs->r9  = 0 */
+	pushq   $0			/* pt_regs->r10 = 0 */
+	pushq   $0			/* pt_regs->r11 = 0 */
 	pushq   %rbx                    /* pt_regs->rbx */
 	pushq   %rbp                    /* pt_regs->rbp (will be overwritten) */
-	pushq   %r8                     /* pt_regs->r12 = 0 */
-	pushq   %r8                     /* pt_regs->r13 = 0 */
-	pushq   %r8                     /* pt_regs->r14 = 0 */
-	pushq   %r8                     /* pt_regs->r15 = 0 */
+	pushq   $0			/* pt_regs->r12 = 0 */
+	pushq   $0			/* pt_regs->r13 = 0 */
+	pushq   $0			/* pt_regs->r14 = 0 */
+	pushq   $0			/* pt_regs->r15 = 0 */
 	cld
 
 	/*
@@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rbp			/* pt_regs->cx (stashed in bp) */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	xorq    %r8,%r8
-	pushq   %r8                     /* pt_regs->r8  = 0 */
-	pushq   %r8                     /* pt_regs->r9  = 0 */
-	pushq   %r8                     /* pt_regs->r10 = 0 */
-	pushq   %r8                     /* pt_regs->r11 = 0 */
+	pushq   $0			/* pt_regs->r8  = 0 */
+	pushq   $0			/* pt_regs->r9  = 0 */
+	pushq   $0			/* pt_regs->r10 = 0 */
+	pushq   $0			/* pt_regs->r11 = 0 */
 	pushq   %rbx                    /* pt_regs->rbx */
 	pushq   %rbp                    /* pt_regs->rbp (will be overwritten) */
-	pushq   %r8                     /* pt_regs->r12 = 0 */
-	pushq   %r8                     /* pt_regs->r13 = 0 */
-	pushq   %r8                     /* pt_regs->r14 = 0 */
-	pushq   %r8                     /* pt_regs->r15 = 0 */
+	pushq   $0			/* pt_regs->r12 = 0 */
+	pushq   $0			/* pt_regs->r13 = 0 */
+	pushq   $0			/* pt_regs->r14 = 0 */
+	pushq   $0			/* pt_regs->r15 = 0 */
 
 	/*
 	 * User mode is traced as though IRQs are on, and SYSENTER
@@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	xorq    %r8,%r8
-	pushq   %r8                     /* pt_regs->r8  = 0 */
-	pushq   %r8                     /* pt_regs->r9  = 0 */
-	pushq   %r8                     /* pt_regs->r10 = 0 */
-	pushq   %r8                     /* pt_regs->r11 = 0 */
+	pushq   $0			/* pt_regs->r8  = 0 */
+	pushq   $0			/* pt_regs->r9  = 0 */
+	pushq   $0			/* pt_regs->r10 = 0 */
+	pushq   $0			/* pt_regs->r11 = 0 */
 	pushq   %rbx                    /* pt_regs->rbx */
 	pushq   %rbp                    /* pt_regs->rbp */
 	pushq   %r12                    /* pt_regs->r12 */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
  2016-03-11 16:53 [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs Denys Vlasenko
@ 2016-03-12 15:38 ` Ingo Molnar
  2016-03-12 17:53   ` Denys Vlasenko
  2016-03-12 15:45 ` Ingo Molnar
  1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2016-03-12 15:38 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Frederic Weisbecker, Will Drewry, Kees Cook, x86, linux-kernel


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Use of a temporary R8 register here seems to be unnecessary.
> 
> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
> "push $0" is two-byte too. It seems just using the latter would be
> no worse.
> 
> Thus, code had an unnecessary "xorq %r8,%r8" insn.

Neat!

> It probably costs nothing in execution time here since we are probably
> limited by store bandwidth at this point, but still.
> 
> Run-tested under QEMU: 32-bit calls still work:
> 
> / # ./test_syscall_vdso32

Did you manage to test all 3 compat variants:

> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)

?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
  2016-03-11 16:53 [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs Denys Vlasenko
  2016-03-12 15:38 ` Ingo Molnar
@ 2016-03-12 15:45 ` Ingo Molnar
  1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2016-03-12 15:45 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Frederic Weisbecker, Will Drewry, Kees Cook, x86, linux-kernel


* Denys Vlasenko <dvlasenk@redhat.com> wrote:

> Use of a temporary R8 register here seems to be unnecessary.
> 
> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
> "push $0" is two-byte too. It seems just using the latter would be
> no worse.
> 
> Thus, code had an unnecessary "xorq %r8,%r8" insn.
> It probably costs nothing in execution time here since we are probably
> limited by store bandwidth at this point, but still.

Note that the 3 fewer instruction in the image also shrink the code by 16 bytes:

arch/x86/entry/entry_64_compat.o:

   text    data     bss     dec     hex filename
    380       0       0     380     17c entry_64_compat.o.before
    364       0       0     364     16c entry_64_compat.o.after

because (at least in this defconfig build) one of these functions shrunk below a 
16-byte boundary. So cache footprint got denser.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
  2016-03-12 15:38 ` Ingo Molnar
@ 2016-03-12 17:53   ` Denys Vlasenko
  2016-03-12 18:05     ` Andy Lutomirski
  0 siblings, 1 reply; 6+ messages in thread
From: Denys Vlasenko @ 2016-03-12 17:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
	Frederic Weisbecker, Will Drewry, Kees Cook, x86, linux-kernel

On 03/12/2016 04:38 PM, Ingo Molnar wrote:
> 
> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
> 
>> Use of a temporary R8 register here seems to be unnecessary.
>>
>> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
>> "push $0" is two-byte too. It seems just using the latter would be
>> no worse.
>>
>> Thus, code had an unnecessary "xorq %r8,%r8" insn.
> 
> Neat!
> 
>> It probably costs nothing in execution time here since we are probably
>> limited by store bandwidth at this point, but still.
>>
>> Run-tested under QEMU: 32-bit calls still work:
>>
>> / # ./test_syscall_vdso32
> 
> Did you manage to test all 3 compat variants:
> 
>> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
>> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
>> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)

Yes.

test_syscall_vdso32 checks vdso syscall (if available)
and direct int80 syscall.
Booting two times, with different qemu flags:

	qemu-system-x86_64 -cpu Opteron_G4
	qemu-system-x86_64 -cpu SandyBridge

makes kernel choose either SYSCALL or SYSENTER vdso.
So it's all covered.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
  2016-03-12 17:53   ` Denys Vlasenko
@ 2016-03-12 18:05     ` Andy Lutomirski
  2016-03-12 21:41       ` Denys Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2016-03-12 18:05 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Ingo Molnar, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
	Frederic Weisbecker, Will Drewry, Kees Cook, X86 ML,
	linux-kernel

On Sat, Mar 12, 2016 at 9:53 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 03/12/2016 04:38 PM, Ingo Molnar wrote:
>>
>> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>
>>> Use of a temporary R8 register here seems to be unnecessary.
>>>
>>> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
>>> "push $0" is two-byte too. It seems just using the latter would be
>>> no worse.
>>>
>>> Thus, code had an unnecessary "xorq %r8,%r8" insn.
>>
>> Neat!
>>
>>> It probably costs nothing in execution time here since we are probably
>>> limited by store bandwidth at this point, but still.
>>>
>>> Run-tested under QEMU: 32-bit calls still work:
>>>
>>> / # ./test_syscall_vdso32
>>
>> Did you manage to test all 3 compat variants:
>>
>>> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
>>> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
>>> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
>
> Yes.
>
> test_syscall_vdso32 checks vdso syscall (if available)
> and direct int80 syscall.
> Booting two times, with different qemu flags:
>
>         qemu-system-x86_64 -cpu Opteron_G4
>         qemu-system-x86_64 -cpu SandyBridge
>
> makes kernel choose either SYSCALL or SYSENTER vdso.
> So it's all covered.

How carefully did you check the latter bit?  In my experience, if KVM
is used, your cpu will report as your native CPU's manufacturer
regardless of who actually makes the emulated CPU.  -machine accel=tcg
turns that off.

-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
  2016-03-12 18:05     ` Andy Lutomirski
@ 2016-03-12 21:41       ` Denys Vlasenko
  0 siblings, 0 replies; 6+ messages in thread
From: Denys Vlasenko @ 2016-03-12 21:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
	Frederic Weisbecker, Will Drewry, Kees Cook, X86 ML,
	linux-kernel

On 03/12/2016 07:05 PM, Andy Lutomirski wrote:
> On Sat, Mar 12, 2016 at 9:53 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> On 03/12/2016 04:38 PM, Ingo Molnar wrote:
>>>
>>> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>
>>>> Use of a temporary R8 register here seems to be unnecessary.
>>>>
>>>> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
>>>> "push $0" is two-byte too. It seems just using the latter would be
>>>> no worse.
>>>>
>>>> Thus, code had an unnecessary "xorq %r8,%r8" insn.
>>>
>>> Neat!
>>>
>>>> It probably costs nothing in execution time here since we are probably
>>>> limited by store bandwidth at this point, but still.
>>>>
>>>> Run-tested under QEMU: 32-bit calls still work:
>>>>
>>>> / # ./test_syscall_vdso32
>>>
>>> Did you manage to test all 3 compat variants:
>>>
>>>> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
>>>> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
>>>> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
>>
>> Yes.
>>
>> test_syscall_vdso32 checks vdso syscall (if available)
>> and direct int80 syscall.
>> Booting two times, with different qemu flags:
>>
>>         qemu-system-x86_64 -cpu Opteron_G4
>>         qemu-system-x86_64 -cpu SandyBridge
>>
>> makes kernel choose either SYSCALL or SYSENTER vdso.
>> So it's all covered.
> 
> How carefully did you check the latter bit?

To double-check, I built a kernel with intentionally crippled
SYSENTER handling (infinite loop).

qemu-system-x86_64 -cpu Opteron_G4 - works
qemu-system-x86_64 -cpu SandyBridge - ./test_syscall_vdso_32 hung

This proves that -cpu SandyBridge does cause SYSENTER path to be used.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-03-12 21:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-11 16:53 [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs Denys Vlasenko
2016-03-12 15:38 ` Ingo Molnar
2016-03-12 17:53   ` Denys Vlasenko
2016-03-12 18:05     ` Andy Lutomirski
2016-03-12 21:41       ` Denys Vlasenko
2016-03-12 15:45 ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).