All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
@ 2018-04-17 14:36 Andy Lutomirski
  2018-04-17 15:00 ` Denys Vlasenko
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Andy Lutomirski @ 2018-04-17 14:36 UTC (permalink / raw)
  To: x86, LKML
  Cc: Borislav Petkov, Dominik Brodowski, Denys Vlasenko, Andy Lutomirski

32-bit user code that uses int $80 doesn't care about r8-r11.  There is,
however, some 64-bit user code that intentionally uses int $0x80 to
invoke 32-bit system calls.  From what I've seen, basically all such
code assumes that r8-r15 are all preserved, but the kernel clobbers
r8-r11.  Since I doubt that there's any code that depends on int $0x80
zeroing r8-r11, change the kernel to preserve them.

I suspect that very little user code is broken by the old clobber,
since r8-r11 are only rarely allocated by gcc, and they're clobbered
by function calls, so they only way we'd see a problem is if the
same function that invokes int $0x80 also spills something important
to one of these registers.

The current behavior seems to date back to the historical commit
"[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
preserved.  I can't find any explanation of why this change was made.

This patch also updates the test_syscall_vdso_32 testcase to verify
the new behavior, and it strengthens the test to make sure that
the kernel doesn't accidentally permute r8..r15.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Ingo, Thomas: this could be a -stable candidate, but it's apparently not
severe enough for many people to have noticed.

 arch/x86/entry/entry_64_compat.S                |  8 +++---
 tools/testing/selftests/x86/test_syscall_vdso.c | 35 +++++++++++++++----------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 08425c42f8b7..e4b94b7494c6 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -368,13 +368,13 @@ ENTRY(entry_INT80_compat)
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq   $0			/* pt_regs->r8  = 0 */
+	pushq   %r8			/* pt_regs->r8 */
 	xorl	%r8d, %r8d		/* nospec   r8 */
-	pushq   $0			/* pt_regs->r9  = 0 */
+	pushq   %r9			/* pt_regs->r9 */
 	xorl	%r9d, %r9d		/* nospec   r9 */
-	pushq   $0			/* pt_regs->r10 = 0 */
+	pushq   %r10			/* pt_regs->r10 */
 	xorl	%r10d, %r10d		/* nospec   r10 */
-	pushq   $0			/* pt_regs->r11 = 0 */
+	pushq   %r11			/* pt_regs->r11 */
 	xorl	%r11d, %r11d		/* nospec   r11 */
 	pushq   %rbx                    /* pt_regs->rbx */
 	xorl	%ebx, %ebx		/* nospec   rbx */
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
index 40370354d4c1..c9c3281077bc 100644
--- a/tools/testing/selftests/x86/test_syscall_vdso.c
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -100,12 +100,19 @@ asm (
 	"	shl	$32, %r8\n"
 	"	orq	$0x7f7f7f7f, %r8\n"
 	"	movq	%r8, %r9\n"
-	"	movq	%r8, %r10\n"
-	"	movq	%r8, %r11\n"
-	"	movq	%r8, %r12\n"
-	"	movq	%r8, %r13\n"
-	"	movq	%r8, %r14\n"
-	"	movq	%r8, %r15\n"
+	"	incq	%r9\n"
+	"	movq	%r9, %r10\n"
+	"	incq	%r10\n"
+	"	movq	%r10, %r11\n"
+	"	incq	%r11\n"
+	"	movq	%r11, %r12\n"
+	"	incq	%r12\n"
+	"	movq	%r12, %r13\n"
+	"	incq	%r13\n"
+	"	movq	%r13, %r14\n"
+	"	incq	%r14\n"
+	"	movq	%r14, %r15\n"
+	"	incq	%r15\n"
 	"	ret\n"
 	"	.code32\n"
 	"	.popsection\n"
@@ -128,12 +135,13 @@ int check_regs64(void)
 	int err = 0;
 	int num = 8;
 	uint64_t *r64 = &regs64.r8;
+	uint64_t expected = 0x7f7f7f7f7f7f7f7fULL;
 
 	if (!kernel_is_64bit)
 		return 0;
 
 	do {
-		if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+		if (*r64 == expected++)
 			continue; /* register did not change */
 		if (syscall_addr != (long)&int80) {
 			/*
@@ -147,18 +155,17 @@ int check_regs64(void)
 				continue;
 			}
 		} else {
-			/* INT80 syscall entrypoint can be used by
+			/*
+			 * INT80 syscall entrypoint can be used by
 			 * 64-bit programs too, unlike SYSCALL/SYSENTER.
 			 * Therefore it must preserve R12+
 			 * (they are callee-saved registers in 64-bit C ABI).
 			 *
-			 * This was probably historically not intended,
-			 * but R8..11 are clobbered (cleared to 0).
-			 * IOW: they are the only registers which aren't
-			 * preserved across INT80 syscall.
+			 * Starting in Linux 4.17 (and any kernel that
+			 * backports the change), R8..11 are preserved.
+			 * Historically (and probably unintentionally), they
+			 * were clobbered or zeroed.
 			 */
-			if (*r64 == 0 && num <= 11)
-				continue;
 		}
 		printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
 		err++;
-- 
2.14.3

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
@ 2018-04-17 15:00 ` Denys Vlasenko
  2018-04-18 16:53   ` Andy Lutomirski
  2018-04-23 12:50 ` Borislav Petkov
  2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski
  2 siblings, 1 reply; 6+ messages in thread
From: Denys Vlasenko @ 2018-04-17 15:00 UTC (permalink / raw)
  To: Andy Lutomirski, x86, LKML; +Cc: Borislav Petkov, Dominik Brodowski

On 04/17/2018 04:36 PM, Andy Lutomirski wrote:
> 32-bit user code that uses int $80 doesn't care about r8-r11.  There is,
> however, some 64-bit user code that intentionally uses int $0x80 to
> invoke 32-bit system calls.  From what I've seen, basically all such
> code assumes that r8-r15 are all preserved, but the kernel clobbers
> r8-r11.  Since I doubt that there's any code that depends on int $0x80
> zeroing r8-r11, change the kernel to preserve them.
> 
> I suspect that very little user code is broken by the old clobber,
> since r8-r11 are only rarely allocated by gcc, and they're clobbered
> by function calls, so they only way we'd see a problem is if the
> same function that invokes int $0x80 also spills something important
> to one of these registers.
> 
> The current behavior seems to date back to the historical commit
> "[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
> preserved.  I can't find any explanation of why this change was made.

This means that the new behavior is there for some 8 years already.
Whoever was impacted by it, probably already switched to the new ABI.

Current ABI is "weaker", it allows kernel to save fewer registers.

Which is generally a good thing, since saving/restoring things cost
cycles, and sometimes painful on entry paths where you may desperately
need a scratch register or two. (Recall this one? -
...
         movq    %rsp, PER_CPU_VAR(rsp_scratch)
         movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
         /* Construct struct pt_regs on stack */
         pushq   $__USER_DS                      /* pt_regs->ss */
         pushq   PER_CPU_VAR(rsp_scratch)        /* pt_regs->sp */
...
wouldn't it be _great_ if one of GPRs would be available here
to hold userspace %rsp?
)

If userspace needs some registers saved, it's trivial for it to have:

	push reg1
	push reg2
	int  0x80
	pop  reg2
	pop  reg1

OTOH if userspace _does not_ need some registers saved,
but they are defined as saved by the entrypoint ABI, then save/restore work
is done every time, even when not needed.

Thus, I propose to retain the current behavior.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 15:00 ` Denys Vlasenko
@ 2018-04-18 16:53   ` Andy Lutomirski
  2018-04-18 17:13     ` Denys Vlasenko
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Lutomirski @ 2018-04-18 16:53 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Andy Lutomirski, X86 ML, LKML, Borislav Petkov, Dominik Brodowski

On Tue, Apr 17, 2018 at 8:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
> This means that the new behavior is there for some 8 years already.
> Whoever was impacted by it, probably already switched to the new ABI.
>
> Current ABI is "weaker", it allows kernel to save fewer registers.
>
> Which is generally a good thing, since saving/restoring things cost
> cycles, and sometimes painful on entry paths where you may desperately
> need a scratch register or two. (Recall this one? -
> ...
>         movq    %rsp, PER_CPU_VAR(rsp_scratch)
>         movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>         /* Construct struct pt_regs on stack */
>         pushq   $__USER_DS                      /* pt_regs->ss */
>         pushq   PER_CPU_VAR(rsp_scratch)        /* pt_regs->sp */
> ...
> wouldn't it be _great_ if one of GPRs would be available here
> to hold userspace %rsp?
> )

But this is the int $0x80 entry, which uses the stack sanely and
doesn't have this problem at all.

>
> If userspace needs some registers saved, it's trivial for it to have:
>
>         push reg1
>         push reg2
>         int  0x80
>         pop  reg2
>         pop  reg1
>
> OTOH if userspace _does not_ need some registers saved,
> but they are defined as saved by the entrypoint ABI, then save/restore work
> is done every time, even when not needed.
>
> Thus, I propose to retain the current behavior.

The problems are:

1. If you look up how to do int $0x80, every answer you get doesn't
mention any clobbers.  The code works on x86_32 and seems to work on
x86_64.  I think we should make it actually work.

2. It's very easy to make this mistake and get away with it for a long
time, and the failure modes are hard to debug.  gcc doesn't allocate
r8..r11 that often, and there are plenty of contexts (near end of a
leaf function) where r8..r11 are dead even if they were allocated.  So
there is probably a decent body of code out there that makes this
mistake and is okay for now.  But if anyone ever compiles it with LTO,
it's reasonably likely to go boom.

So I think we should fix it in the interest of avoiding weird bugs.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-18 16:53   ` Andy Lutomirski
@ 2018-04-18 17:13     ` Denys Vlasenko
  0 siblings, 0 replies; 6+ messages in thread
From: Denys Vlasenko @ 2018-04-18 17:13 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: X86 ML, LKML, Borislav Petkov, Dominik Brodowski

On 04/18/2018 06:53 PM, Andy Lutomirski wrote:
> On Tue, Apr 17, 2018 at 8:00 AM, Denys Vlasenko <dvlasenk@redhat.com> wrote:
>> This means that the new behavior is there for some 8 years already.
>> Whoever was impacted by it, probably already switched to the new ABI.
>>
>> Current ABI is "weaker", it allows kernel to save fewer registers.
>>
>> Which is generally a good thing, since saving/restoring things cost
>> cycles, and sometimes painful on entry paths where you may desperately
>> need a scratch register or two. (Recall this one? -
>> ...
>>          movq    %rsp, PER_CPU_VAR(rsp_scratch)
>>          movq    PER_CPU_VAR(cpu_current_top_of_stack), %rsp
>>          /* Construct struct pt_regs on stack */
>>          pushq   $__USER_DS                      /* pt_regs->ss */
>>          pushq   PER_CPU_VAR(rsp_scratch)        /* pt_regs->sp */
>> ...
>> wouldn't it be _great_ if one of GPRs would be available here
>> to hold userspace %rsp?
>> )
> 
> But this is the int $0x80 entry, which uses the stack sanely and
> doesn't have this problem at all.

It was a general point why not committing to save every register
may help on the callee (in this case kernel) side.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
  2018-04-17 15:00 ` Denys Vlasenko
@ 2018-04-23 12:50 ` Borislav Petkov
  2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski
  2 siblings, 0 replies; 6+ messages in thread
From: Borislav Petkov @ 2018-04-23 12:50 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: x86, LKML, Dominik Brodowski, Denys Vlasenko

On Tue, Apr 17, 2018 at 07:36:36AM -0700, Andy Lutomirski wrote:
> 32-bit user code that uses int $80 doesn't care about r8-r11.  There is,
> however, some 64-bit user code that intentionally uses int $0x80 to
> invoke 32-bit system calls.  From what I've seen, basically all such
> code assumes that r8-r15 are all preserved, but the kernel clobbers
> r8-r11.  Since I doubt that there's any code that depends on int $0x80
> zeroing r8-r11, change the kernel to preserve them.
> 
> I suspect that very little user code is broken by the old clobber,
> since r8-r11 are only rarely allocated by gcc, and they're clobbered
> by function calls, so they only way we'd see a problem is if the
> same function that invokes int $0x80 also spills something important
> to one of these registers.
> 
> The current behavior seems to date back to the historical commit
> "[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
> preserved.  I can't find any explanation of why this change was made.

Probably because r8-r11 are callee-clobbered, according to ABI so
someone decided to whack them so that code which doesn't adhere to the
ABI would fall on its face...

Also, looking at PUSH_AND_CLEAR_REGS and how we call it on the 64-bit
entry path, we probably should keep clearing those regs to avoid
speculation crap.

Methinks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:x86/pti] x86/entry/64/compat: Preserve r8-r11 in int $0x80
  2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
  2018-04-17 15:00 ` Denys Vlasenko
  2018-04-23 12:50 ` Borislav Petkov
@ 2018-04-27 15:12 ` tip-bot for Andy Lutomirski
  2 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Andy Lutomirski @ 2018-04-27 15:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, luto, hpa, bp, linux-kernel, dvlasenk, linux, tglx

Commit-ID:  8bb2610bc4967f19672444a7b0407367f1540028
Gitweb:     https://git.kernel.org/tip/8bb2610bc4967f19672444a7b0407367f1540028
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Tue, 17 Apr 2018 07:36:36 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 27 Apr 2018 17:07:58 +0200

x86/entry/64/compat: Preserve r8-r11 in int $0x80

32-bit user code that uses int $80 doesn't care about r8-r11.  There is,
however, some 64-bit user code that intentionally uses int $0x80 to invoke
32-bit system calls.  From what I've seen, basically all such code assumes
that r8-r15 are all preserved, but the kernel clobbers r8-r11.  Since I
doubt that there's any code that depends on int $0x80 zeroing r8-r11,
change the kernel to preserve them.

I suspect that very little user code is broken by the old clobber, since
r8-r11 are only rarely allocated by gcc, and they're clobbered by function
calls, so they only way we'd see a problem is if the same function that
invokes int $0x80 also spills something important to one of these
registers.

The current behavior seems to date back to the historical commit
"[PATCH] x86-64 merge for 2.6.4".  Before that, all regs were
preserved.  I can't find any explanation of why this change was made.

Update the test_syscall_vdso_32 testcase as well to verify the new
behavior, and it strengthens the test to make sure that the kernel doesn't
accidentally permute r8..r15.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org

---
 arch/x86/entry/entry_64_compat.S                |  8 +++---
 tools/testing/selftests/x86/test_syscall_vdso.c | 35 +++++++++++++++----------
 2 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 9af927e59d49..9de7f1e1dede 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -84,13 +84,13 @@ ENTRY(entry_SYSENTER_compat)
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq   $0			/* pt_regs->r8  = 0 */
+	pushq   %r8			/* pt_regs->r8 */
 	xorl	%r8d, %r8d		/* nospec   r8 */
-	pushq   $0			/* pt_regs->r9  = 0 */
+	pushq   %r9			/* pt_regs->r9 */
 	xorl	%r9d, %r9d		/* nospec   r9 */
-	pushq   $0			/* pt_regs->r10 = 0 */
+	pushq   %r10			/* pt_regs->r10 */
 	xorl	%r10d, %r10d		/* nospec   r10 */
-	pushq   $0			/* pt_regs->r11 = 0 */
+	pushq   %r11			/* pt_regs->r11 */
 	xorl	%r11d, %r11d		/* nospec   r11 */
 	pushq   %rbx                    /* pt_regs->rbx */
 	xorl	%ebx, %ebx		/* nospec   rbx */
diff --git a/tools/testing/selftests/x86/test_syscall_vdso.c b/tools/testing/selftests/x86/test_syscall_vdso.c
index 40370354d4c1..c9c3281077bc 100644
--- a/tools/testing/selftests/x86/test_syscall_vdso.c
+++ b/tools/testing/selftests/x86/test_syscall_vdso.c
@@ -100,12 +100,19 @@ asm (
 	"	shl	$32, %r8\n"
 	"	orq	$0x7f7f7f7f, %r8\n"
 	"	movq	%r8, %r9\n"
-	"	movq	%r8, %r10\n"
-	"	movq	%r8, %r11\n"
-	"	movq	%r8, %r12\n"
-	"	movq	%r8, %r13\n"
-	"	movq	%r8, %r14\n"
-	"	movq	%r8, %r15\n"
+	"	incq	%r9\n"
+	"	movq	%r9, %r10\n"
+	"	incq	%r10\n"
+	"	movq	%r10, %r11\n"
+	"	incq	%r11\n"
+	"	movq	%r11, %r12\n"
+	"	incq	%r12\n"
+	"	movq	%r12, %r13\n"
+	"	incq	%r13\n"
+	"	movq	%r13, %r14\n"
+	"	incq	%r14\n"
+	"	movq	%r14, %r15\n"
+	"	incq	%r15\n"
 	"	ret\n"
 	"	.code32\n"
 	"	.popsection\n"
@@ -128,12 +135,13 @@ int check_regs64(void)
 	int err = 0;
 	int num = 8;
 	uint64_t *r64 = &regs64.r8;
+	uint64_t expected = 0x7f7f7f7f7f7f7f7fULL;
 
 	if (!kernel_is_64bit)
 		return 0;
 
 	do {
-		if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+		if (*r64 == expected++)
 			continue; /* register did not change */
 		if (syscall_addr != (long)&int80) {
 			/*
@@ -147,18 +155,17 @@ int check_regs64(void)
 				continue;
 			}
 		} else {
-			/* INT80 syscall entrypoint can be used by
+			/*
+			 * INT80 syscall entrypoint can be used by
 			 * 64-bit programs too, unlike SYSCALL/SYSENTER.
 			 * Therefore it must preserve R12+
 			 * (they are callee-saved registers in 64-bit C ABI).
 			 *
-			 * This was probably historically not intended,
-			 * but R8..11 are clobbered (cleared to 0).
-			 * IOW: they are the only registers which aren't
-			 * preserved across INT80 syscall.
+			 * Starting in Linux 4.17 (and any kernel that
+			 * backports the change), R8..11 are preserved.
+			 * Historically (and probably unintentionally), they
+			 * were clobbered or zeroed.
 			 */
-			if (*r64 == 0 && num <= 11)
-				continue;
 		}
 		printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
 		err++;

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-04-27 15:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-17 14:36 [PATCH] x86/entry/64/compat: Preserve r8-r11 in int $0x80 Andy Lutomirski
2018-04-17 15:00 ` Denys Vlasenko
2018-04-18 16:53   ` Andy Lutomirski
2018-04-18 17:13     ` Denys Vlasenko
2018-04-23 12:50 ` Borislav Petkov
2018-04-27 15:12 ` [tip:x86/pti] " tip-bot for Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.