* [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
@ 2016-03-11 16:53 Denys Vlasenko
2016-03-12 15:38 ` Ingo Molnar
2016-03-12 15:45 ` Ingo Molnar
0 siblings, 2 replies; 6+ messages in thread
From: Denys Vlasenko @ 2016-03-11 16:53 UTC (permalink / raw)
To: Ingo Molnar
Cc: Denys Vlasenko, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
Andy Lutomirski, Frederic Weisbecker, Will Drewry, Kees Cook,
x86, linux-kernel
Use of a temporary R8 register here seems to be unnecessary.
"push %r8" is a two-byte insn (it needs REX prefix to specify R8),
"push $0" is two-byte too. It seems just using the latter would be
no worse.
Thus, code had an unnecessary "xorq %r8,%r8" insn.
It probably costs nothing in execution time here since we are probably
limited by store bandwidth at this point, but still.
Run-tested under QEMU: 32-bit calls still work:
/ # ./test_syscall_vdso32
[RUN] Executing 6-argument 32-bit syscall via VDSO
[OK] Arguments are preserved across syscall
[NOTE] R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK] R8..R15 did not leak kernel data
[RUN] Executing 6-argument 32-bit syscall via INT 80
[OK] Arguments are preserved across syscall
[OK] R8..R15 did not leak kernel data
[RUN] Running tests under ptrace
[RUN] Executing 6-argument 32-bit syscall via VDSO
[OK] Arguments are preserved across syscall
[NOTE] R11 has changed:0000000000200ed7 - assuming clobbered by SYSRET insn
[OK] R8..R15 did not leak kernel data
[RUN] Executing 6-argument 32-bit syscall via INT 80
[OK] Arguments are preserved across syscall
[OK] R8..R15 did not leak kernel data
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: Ingo Molnar <mingo@kernel.org>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Borislav Petkov <bp@alien8.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andy Lutomirski <luto@amacapital.net>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Will Drewry <wad@chromium.org>
CC: Kees Cook <keescook@chromium.org>
CC: x86@kernel.org
CC: linux-kernel@vger.kernel.org
---
arch/x86/entry/entry_64_compat.S | 45 +++++++++++++++++++---------------------
1 file changed, 21 insertions(+), 24 deletions(-)
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 847f2f0..e1721da 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
pushfq /* pt_regs->flags (except IF = 0) */
orl $X86_EFLAGS_IF, (%rsp) /* Fix saved flags */
pushq $__USER32_CS /* pt_regs->cs */
- xorq %r8,%r8
- pushq %r8 /* pt_regs->ip = 0 (placeholder) */
+ pushq $0 /* pt_regs->ip = 0 (placeholder) */
pushq %rax /* pt_regs->orig_ax */
pushq %rdi /* pt_regs->di */
pushq %rsi /* pt_regs->si */
pushq %rdx /* pt_regs->dx */
pushq %rcx /* pt_regs->cx */
pushq $-ENOSYS /* pt_regs->ax */
- pushq %r8 /* pt_regs->r8 = 0 */
- pushq %r8 /* pt_regs->r9 = 0 */
- pushq %r8 /* pt_regs->r10 = 0 */
- pushq %r8 /* pt_regs->r11 = 0 */
+ pushq $0 /* pt_regs->r8 = 0 */
+ pushq $0 /* pt_regs->r9 = 0 */
+ pushq $0 /* pt_regs->r10 = 0 */
+ pushq $0 /* pt_regs->r11 = 0 */
pushq %rbx /* pt_regs->rbx */
pushq %rbp /* pt_regs->rbp (will be overwritten) */
- pushq %r8 /* pt_regs->r12 = 0 */
- pushq %r8 /* pt_regs->r13 = 0 */
- pushq %r8 /* pt_regs->r14 = 0 */
- pushq %r8 /* pt_regs->r15 = 0 */
+ pushq $0 /* pt_regs->r12 = 0 */
+ pushq $0 /* pt_regs->r13 = 0 */
+ pushq $0 /* pt_regs->r14 = 0 */
+ pushq $0 /* pt_regs->r15 = 0 */
cld
/*
@@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
pushq %rdx /* pt_regs->dx */
pushq %rbp /* pt_regs->cx (stashed in bp) */
pushq $-ENOSYS /* pt_regs->ax */
- xorq %r8,%r8
- pushq %r8 /* pt_regs->r8 = 0 */
- pushq %r8 /* pt_regs->r9 = 0 */
- pushq %r8 /* pt_regs->r10 = 0 */
- pushq %r8 /* pt_regs->r11 = 0 */
+ pushq $0 /* pt_regs->r8 = 0 */
+ pushq $0 /* pt_regs->r9 = 0 */
+ pushq $0 /* pt_regs->r10 = 0 */
+ pushq $0 /* pt_regs->r11 = 0 */
pushq %rbx /* pt_regs->rbx */
pushq %rbp /* pt_regs->rbp (will be overwritten) */
- pushq %r8 /* pt_regs->r12 = 0 */
- pushq %r8 /* pt_regs->r13 = 0 */
- pushq %r8 /* pt_regs->r14 = 0 */
- pushq %r8 /* pt_regs->r15 = 0 */
+ pushq $0 /* pt_regs->r12 = 0 */
+ pushq $0 /* pt_regs->r13 = 0 */
+ pushq $0 /* pt_regs->r14 = 0 */
+ pushq $0 /* pt_regs->r15 = 0 */
/*
* User mode is traced as though IRQs are on, and SYSENTER
@@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
pushq %rdx /* pt_regs->dx */
pushq %rcx /* pt_regs->cx */
pushq $-ENOSYS /* pt_regs->ax */
- xorq %r8,%r8
- pushq %r8 /* pt_regs->r8 = 0 */
- pushq %r8 /* pt_regs->r9 = 0 */
- pushq %r8 /* pt_regs->r10 = 0 */
- pushq %r8 /* pt_regs->r11 = 0 */
+ pushq $0 /* pt_regs->r8 = 0 */
+ pushq $0 /* pt_regs->r9 = 0 */
+ pushq $0 /* pt_regs->r10 = 0 */
+ pushq $0 /* pt_regs->r11 = 0 */
pushq %rbx /* pt_regs->rbx */
pushq %rbp /* pt_regs->rbp */
pushq %r12 /* pt_regs->r12 */
--
1.8.1.4
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
2016-03-11 16:53 [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs Denys Vlasenko
@ 2016-03-12 15:38 ` Ingo Molnar
2016-03-12 17:53 ` Denys Vlasenko
2016-03-12 15:45 ` Ingo Molnar
1 sibling, 1 reply; 6+ messages in thread
From: Ingo Molnar @ 2016-03-12 15:38 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
Frederic Weisbecker, Will Drewry, Kees Cook, x86, linux-kernel
* Denys Vlasenko <dvlasenk@redhat.com> wrote:
> Use of a temporary R8 register here seems to be unnecessary.
>
> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
> "push $0" is two-byte too. It seems just using the latter would be
> no worse.
>
> Thus, code had an unnecessary "xorq %r8,%r8" insn.
Neat!
> It probably costs nothing in execution time here since we are probably
> limited by store bandwidth at this point, but still.
>
> Run-tested under QEMU: 32-bit calls still work:
>
> / # ./test_syscall_vdso32
Did you manage to test all 3 compat variants:
> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
2016-03-11 16:53 [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs Denys Vlasenko
2016-03-12 15:38 ` Ingo Molnar
@ 2016-03-12 15:45 ` Ingo Molnar
1 sibling, 0 replies; 6+ messages in thread
From: Ingo Molnar @ 2016-03-12 15:45 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
Frederic Weisbecker, Will Drewry, Kees Cook, x86, linux-kernel
* Denys Vlasenko <dvlasenk@redhat.com> wrote:
> Use of a temporary R8 register here seems to be unnecessary.
>
> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
> "push $0" is two-byte too. It seems just using the latter would be
> no worse.
>
> Thus, code had an unnecessary "xorq %r8,%r8" insn.
> It probably costs nothing in execution time here since we are probably
> limited by store bandwidth at this point, but still.
Note that the 3 fewer instruction in the image also shrink the code by 16 bytes:
arch/x86/entry/entry_64_compat.o:
text data bss dec hex filename
380 0 0 380 17c entry_64_compat.o.before
364 0 0 364 16c entry_64_compat.o.after
because (at least in this defconfig build) one of these functions shrunk below a
16-byte boundary. So cache footprint got denser.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
2016-03-12 15:38 ` Ingo Molnar
@ 2016-03-12 17:53 ` Denys Vlasenko
2016-03-12 18:05 ` Andy Lutomirski
0 siblings, 1 reply; 6+ messages in thread
From: Denys Vlasenko @ 2016-03-12 17:53 UTC (permalink / raw)
To: Ingo Molnar
Cc: Steven Rostedt, Borislav Petkov, H. Peter Anvin, Andy Lutomirski,
Frederic Weisbecker, Will Drewry, Kees Cook, x86, linux-kernel
On 03/12/2016 04:38 PM, Ingo Molnar wrote:
>
> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>
>> Use of a temporary R8 register here seems to be unnecessary.
>>
>> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
>> "push $0" is two-byte too. It seems just using the latter would be
>> no worse.
>>
>> Thus, code had an unnecessary "xorq %r8,%r8" insn.
>
> Neat!
>
>> It probably costs nothing in execution time here since we are probably
>> limited by store bandwidth at this point, but still.
>>
>> Run-tested under QEMU: 32-bit calls still work:
>>
>> / # ./test_syscall_vdso32
>
> Did you manage to test all 3 compat variants:
>
>> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
>> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
>> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
Yes.
test_syscall_vdso32 checks vdso syscall (if available)
and direct int80 syscall.
Booting two times, with different qemu flags:
qemu-system-x86_64 -cpu Opteron_G4
qemu-system-x86_64 -cpu SandyBridge
makes kernel choose either SYSCALL or SYSENTER vdso.
So it's all covered.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
2016-03-12 17:53 ` Denys Vlasenko
@ 2016-03-12 18:05 ` Andy Lutomirski
2016-03-12 21:41 ` Denys Vlasenko
0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2016-03-12 18:05 UTC (permalink / raw)
To: Denys Vlasenko
Cc: Ingo Molnar, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
Frederic Weisbecker, Will Drewry, Kees Cook, X86 ML,
linux-kernel
On Sat, Mar 12, 2016 at 9:53 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> On 03/12/2016 04:38 PM, Ingo Molnar wrote:
>>
>> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>
>>> Use of a temporary R8 register here seems to be unnecessary.
>>>
>>> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
>>> "push $0" is two-byte too. It seems just using the latter would be
>>> no worse.
>>>
>>> Thus, code had an unnecessary "xorq %r8,%r8" insn.
>>
>> Neat!
>>
>>> It probably costs nothing in execution time here since we are probably
>>> limited by store bandwidth at this point, but still.
>>>
>>> Run-tested under QEMU: 32-bit calls still work:
>>>
>>> / # ./test_syscall_vdso32
>>
>> Did you manage to test all 3 compat variants:
>>
>>> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
>>> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
>>> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
>
> Yes.
>
> test_syscall_vdso32 checks vdso syscall (if available)
> and direct int80 syscall.
> Booting two times, with different qemu flags:
>
> qemu-system-x86_64 -cpu Opteron_G4
> qemu-system-x86_64 -cpu SandyBridge
>
> makes kernel choose either SYSCALL or SYSENTER vdso.
> So it's all covered.
How carefully did you check the latter bit? In my experience, if KVM
is used, your cpu will report as your native CPU's manufacturer
regardless of who actually makes the emulated CPU. -machine accel=tcg
turns that off.
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs
2016-03-12 18:05 ` Andy Lutomirski
@ 2016-03-12 21:41 ` Denys Vlasenko
0 siblings, 0 replies; 6+ messages in thread
From: Denys Vlasenko @ 2016-03-12 21:41 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Steven Rostedt, Borislav Petkov, H. Peter Anvin,
Frederic Weisbecker, Will Drewry, Kees Cook, X86 ML,
linux-kernel
On 03/12/2016 07:05 PM, Andy Lutomirski wrote:
> On Sat, Mar 12, 2016 at 9:53 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> On 03/12/2016 04:38 PM, Ingo Molnar wrote:
>>>
>>> * Denys Vlasenko <dvlasenk@redhat.com> wrote:
>>>
>>>> Use of a temporary R8 register here seems to be unnecessary.
>>>>
>>>> "push %r8" is a two-byte insn (it needs REX prefix to specify R8),
>>>> "push $0" is two-byte too. It seems just using the latter would be
>>>> no worse.
>>>>
>>>> Thus, code had an unnecessary "xorq %r8,%r8" insn.
>>>
>>> Neat!
>>>
>>>> It probably costs nothing in execution time here since we are probably
>>>> limited by store bandwidth at this point, but still.
>>>>
>>>> Run-tested under QEMU: 32-bit calls still work:
>>>>
>>>> / # ./test_syscall_vdso32
>>>
>>> Did you manage to test all 3 compat variants:
>>>
>>>> @@ -72,24 +72,23 @@ ENTRY(entry_SYSENTER_compat)
>>>> @@ -205,17 +204,16 @@ ENTRY(entry_SYSCALL_compat)
>>>> @@ -316,11 +314,10 @@ ENTRY(entry_INT80_compat)
>>
>> Yes.
>>
>> test_syscall_vdso32 checks vdso syscall (if available)
>> and direct int80 syscall.
>> Booting two times, with different qemu flags:
>>
>> qemu-system-x86_64 -cpu Opteron_G4
>> qemu-system-x86_64 -cpu SandyBridge
>>
>> makes kernel choose either SYSCALL or SYSENTER vdso.
>> So it's all covered.
>
> How carefully did you check the latter bit?
To double-check, I built a kernel with intentionally crippled
SYSENTER handling (infinite loop).
qemu-system-x86_64 -cpu Opteron_G4 - works
qemu-system-x86_64 -cpu SandyBridge - ./test_syscall_vdso_32 hung
This proves that -cpu SandyBridge does cause SYSENTER path to be used.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-03-12 21:41 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-11 16:53 [PATCH] x86/asm/entry/32: simplify pushes of zeroed pt_regs->REGs Denys Vlasenko
2016-03-12 15:38 ` Ingo Molnar
2016-03-12 17:53 ` Denys Vlasenko
2016-03-12 18:05 ` Andy Lutomirski
2016-03-12 21:41 ` Denys Vlasenko
2016-03-12 15:45 ` Ingo Molnar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).