[v2] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
diff mbox series

Message ID 2b5bb0f417a95dfdcb50c6c41702bb0455fc535a.1430091970.git.luto@kernel.org
State New, archived
Headers show
Series
  • [v2] x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue
Related show

Commit Message

Andy Lutomirski April 26, 2015, 11:47 p.m. UTC
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
with SS == 0 results in an invalid usermode state in which SS is
apparently equal to __USER_DS but causes #SS if used.

Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
ensuring that SYSRET never happens with SS set to NULL.

This was exposed by a recent vDSO cleanup.

Fixes: e7d6eefaaa44 x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---

Changes since v1:
 - Improve comments per Denys' advice
 - Force SS to __KERNEL_DS -- don't just check for NULL.

Please apply my test case patch, too.

arch/x86/ia32/ia32entry.S         |  7 +++++++
 arch/x86/include/asm/cpufeature.h |  1 +
 arch/x86/kernel/cpu/amd.c         |  3 +++
 arch/x86/kernel/entry_64.S        |  9 +++++++++
 arch/x86/kernel/process_64.c      | 28 ++++++++++++++++++++++++++++
 5 files changed, 48 insertions(+)

Comments

Linus Torvalds April 27, 2015, 12:51 a.m. UTC | #1
Just a heads-up to the x86 people: I'm going to merge this directly,
since I'm doing -rc1 momentarily, and without this patch 32-bit
user-land on a 64-bit kernel is flaky on all AMD CPU's. Which I don't
want for -rc1.

                         Linus

On Sun, Apr 26, 2015 at 4:47 PM, Andy Lutomirski <luto@kernel.org> wrote:
> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
> with SS == 0 results in an invalid usermode state in which SS is
> apparently equal to __USER_DS but causes #SS if used.
>
> Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
> ensuring that SYSRET never happens with SS set to NULL.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
H. Peter Anvin April 27, 2015, 1:31 a.m. UTC | #2
In case it matters:

Acked-by: H. Peter Anvin <hpa@zytor.com>

On April 26, 2015 5:51:26 PM PDT, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>Just a heads-up to the x86 people: I'm going to merge this directly,
>since I'm doing -rc1 momentarily, and without this patch 32-bit
>user-land on a 64-bit kernel is flaky on all AMD CPU's. Which I don't
>want for -rc1.
>
>                         Linus
>
>On Sun, Apr 26, 2015 at 4:47 PM, Andy Lutomirski <luto@kernel.org>
>wrote:
>> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
>> with SS == 0 results in an invalid usermode state in which SS is
>> apparently equal to __USER_DS but causes #SS if used.
>>
>> Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
>> ensuring that SYSRET never happens with SS set to NULL.

Patch
diff mbox series

diff --git a/arch/x86/ia32/ia32entry.S b/arch/x86/ia32/ia32entry.S
index 3c9fadf8b95c..c84b90bd30a7 100644
--- a/arch/x86/ia32/ia32entry.S
+++ b/arch/x86/ia32/ia32entry.S
@@ -421,6 +421,13 @@  sysretl_from_sys_call:
 	 * cs and ss are loaded from MSRs.
 	 * (Note: 32bit->32bit SYSRET is different: since r11
 	 * does not exist, it merely sets eflags.IF=1).
+	 *
+	 * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
+	 * descriptor is not reinitialized.  This means that we must
+	 * avoid SYSRET with SS == NULL, which could happen if we schedule,
+	 * exit the kernel, and re-enter using an interrupt vector.  (All
+	 * interrupt entries on x86_64 set SS to NULL.)  We prevent that
+	 * from happening by reloading SS in __switch_to.
 	 */
 	USERGS_SYSRET32
 
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 854c04b3c9c2..7e244f626301 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -257,6 +257,7 @@ 
 #define X86_BUG_11AP		X86_BUG(5) /* Bad local APIC aka 11AP */
 #define X86_BUG_FXSAVE_LEAK	X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */
 #define X86_BUG_CLFLUSH_MONITOR	X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */
+#define X86_BUG_SYSRET_SS_ATTRS	X86_BUG(8) /* SYSRET doesn't fix up SS attrs */
 
 #if defined(__KERNEL__) && !defined(__ASSEMBLY__)
 
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index fd470ebf924e..e4cf63301ff4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -720,6 +720,9 @@  static void init_amd(struct cpuinfo_x86 *c)
 	if (!cpu_has(c, X86_FEATURE_3DNOWPREFETCH))
 		if (cpu_has(c, X86_FEATURE_3DNOW) || cpu_has(c, X86_FEATURE_LM))
 			set_cpu_cap(c, X86_FEATURE_3DNOWPREFETCH);
+
+	/* AMD CPUs don't reset SS attributes on SYSRET */
+	set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 0034012d04ea..2a14b1568f8a 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -295,6 +295,15 @@  system_call_fastpath:
 	 * rflags from r11 (but RF and VM bits are forced to 0),
 	 * cs and ss are loaded from MSRs.
 	 * Restoration of rflags re-enables interrupts.
+	 *
+	 * NB: On AMD CPUs with the X86_BUG_SYSRET_SS_ATTRS bug, the ss
+	 * descriptor is not reinitialized.  This means that we should
+	 * avoid SYSRET with SS == NULL, which could happen if we schedule,
+	 * exit the kernel, and re-enter using an interrupt vector.  (All
+	 * interrupt entries on x86_64 set SS to NULL.)  We prevent that
+	 * from happening by reloading SS in __switch_to.  (Actually
+	 * detecting the failure in 64-bit userspace is tricky but can be
+	 * done.)
 	 */
 	USERGS_SYSRET64
 
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 4baaa972f52a..ddfdbf74f174 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -419,6 +419,34 @@  __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 		     task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV))
 		__switch_to_xtra(prev_p, next_p, tss);
 
+	if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
+		/*
+		 * AMD CPUs have a misfeature: SYSRET sets the SS selector but
+		 * does not update the cached descriptor.  As a result, if we
+		 * do SYSRET while SS is NULL, we'll end up in user mode with
+		 * SS apparently equal to __USER_DS but actually unusable.
+		 *
+		 * The straightforward workaround would be to fix it up just
+		 * before SYSRET, but that would slow down the system call
+		 * fast paths.  Instead, we ensure that SS is never NULL in
+		 * system call context.  We do this by replacing NULL SS
+		 * selectors at every context switch.  SYSCALL sets up a valid
+		 * SS, so the only way to get NULL is to re-enter the kernel
+		 * from CPL 3 through an interrupt.  Since that can't happen
+		 * in the same task as a running syscall, we are guaranteed to
+		 * context switch between every interrupt vector entry and a
+		 * subsequent SYSRET.
+		 *
+		 * We read SS first because SS reads are much faster than
+		 * writes.  Out of caution, we force SS to __KERNEL_DS even if
+		 * it previously had a different non-NULL value.
+		 */
+		unsigned short ss_sel;
+		savesegment(ss, ss_sel);
+		if (ss_sel != __KERNEL_DS)
+			loadsegment(ss, __KERNEL_DS);
+	}
+
 	return prev_p;
 }