linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/39 v8] PTI support for x86-32
@ 2018-07-18  9:40 Joerg Roedel
  2018-07-18  9:40 ` [PATCH 01/39] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c Joerg Roedel
                   ` (40 more replies)
  0 siblings, 41 replies; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

Hi,

here is version 8 of my patches to enable PTI on x86-32. The
last version received some good review, most of which has been
worked into this version.

I didn't rebase to v4.18-rc5, as that base didn't boot on
x86-32 because of a regression introduced by

	e181ae0c5db9 ('mm: zero unavailable pages before memmap init')

But that regression is already being worked on. The rebase I
tried showed no conflicts, so these patches should apply
cleanly there as well.

The changes since v7 are:

	* Fixed kbuild error (one patch failed to build)

	* Removed segment loading changes from SAVE_ALL

	* More restrictive entry-stack check in
	  SWITCH_TO_KERNEL_STACK

	* Renamed TSS_entry_stack to TSS_entry2task_stack

	* Documented properly what will go into TSS.sp1

	* Fixed comment for clearing high-bits of the dword
	  containing CS-slot in pt_regs

	* Fixed X86_FEATURE_PCID check for x86-32 in pti_init()

	* Made entry-debugging depend on CONFIG_DEBUG_ENTRY
	  instead of a new config option

	* Dropped cpu_current_top_of_stack->tss.sp1 patch.
	  It was actually subtly broken on x86-32 because
	  there is a difference between the task-stack
	  pointer and cpu_current_top_of_stack. The formula
	  is:

		task_stack = cpu_current_top_of_stack - padding

	  On x86-64 the padding is zero, so there is no
	  difference, but on x86-32 it is 8 or 16 bytes, so
	  cpu_current_top_of_stack can't point to tss.sp1
	  without breaking current_pt_regs(). See the sketch
	  after this list.

	* Renamed update_sp0 to update_task_stack() and made
	  that function update TSS.sp1 on x86-32. This is
	  also needed for VM86 mode. I think it can also be
	  implemented this way on x86-64, but that will be a
	  separate patch outside of this patch-set.
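
As a rough illustration of why that dropped patch was broken,
here is a minimal userspace model of the formula above (not
kernel code; THREAD_SIZE, the padding value and the address
are made up for illustration):

	#include <stdio.h>

	#define THREAD_SIZE	8192UL
	#define PADDING		8UL	/* 8 or 16 on x86-32, 0 on x86-64 */

	int main(void)
	{
		unsigned long stack_page = 0xc1000000UL;	/* hypothetical */
		unsigned long cpu_current_top_of_stack = stack_page + THREAD_SIZE;
		unsigned long task_stack = cpu_current_top_of_stack - PADDING;

		/*
		 * With PADDING != 0 the two pointers differ, so pointing
		 * cpu_current_top_of_stack at tss.sp1 would skew
		 * current_pt_regs() by PADDING bytes.
		 */
		printf("top_of_stack=%#lx task_stack=%#lx\n",
		       cpu_current_top_of_stack, task_stack);
		return 0;
	}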

The patches still need fixes that are already in the tip-tree
to work correctly. I merged those fixes into the branch I
pushed to

	git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v8

for easier testing. The code survived >12h overnight testing
with my usual

	* 'perf top' for NMI load

	* x86-selftests in a loop (except mpx and pkeys
	  which are not supported on the machine)

	* kernel-compile in a loop

all in parallel. I also boot-tested x86-64 and a !PAE config
again and ran my GLB-test to make sure that the global
mappings between the user and kernel page-tables are
identical. All of that succeeded and showed no regressions.

Previous versions of this patch-set are:

	* For v7:
	  Post : https://lore.kernel.org/lkml/1531308586-29340-1-git-send-email-joro@8bytes.org/
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v7

	* For v6:
	  Post : https://lore.kernel.org/lkml/1524498460-25530-1-git-send-email-joro@8bytes.org/
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v6

	* For v5:
	  Post : https://marc.info/?l=linux-kernel&m=152389297705480&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v5

	* For v4:
	  Post : https://marc.info/?l=linux-kernel&m=152122860630236&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v4

	* For v3:
	  Post : https://marc.info/?l=linux-kernel&m=152024559419876&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v3

	* For v2:
	  Post : https://marc.info/?l=linux-kernel&m=151816914932088&w=2
	  Git  : git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v2

Please review.

Thanks,

	Joerg


Joerg Roedel (39):
  x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c
  x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack
  x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler
  x86/entry/32: Put ESPFIX code into a macro
  x86/entry/32: Unshare NMI return path
  x86/entry/32: Split off return-to-kernel path
  x86/entry/32: Enter the kernel via trampoline stack
  x86/entry/32: Leave the kernel via trampoline stack
  x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI
  x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  x86/entry/32: Simplify debug entry point
  x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  x86/entry/32: Add PTI cr3 switches to NMI handler code
  x86/entry: Rename update_sp0 to update_task_stack
  x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl
  x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled
  x86/pgtable/32: Allocate 8k page-tables when PTI is enabled
  x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h
  x86/pgtable: Move pti_set_user_pgtbl() to pgtable.h
  x86/pgtable: Move two more functions from pgtable_64.h to pgtable.h
  x86/mm/pae: Populate valid user PGD entries
  x86/mm/pae: Populate the user page-table with user pgd's
  x86/mm/legacy: Populate the user page-table with user pgd's
  x86/mm/pti: Add an overflow check to pti_clone_pmds()
  x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32
  x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit
  x86/mm/pti: Keep permissions when cloning kernel text in
    pti_clone_kernel_text()
  x86/mm/pti: Introduce pti_finalize()
  x86/mm/pti: Clone entry-text again in pti_finalize()
  x86/mm/dump_pagetables: Define INIT_PGD
  x86/pgtable/pae: Use separate kernel PMDs for user page-table
  x86/ldt: Reserve address-space range on 32 bit for the LDT
  x86/ldt: Define LDT_END_ADDR
  x86/ldt: Split out sanity check in map_ldt_struct()
  x86/ldt: Enable LDT user-mapping for PAE
  x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32
  x86/mm/pti: Add Warning when booting on a PCID capable CPU
  x86/entry/32: Add debug code to check entry/exit cr3

 arch/x86/entry/entry_32.S                   | 624 +++++++++++++++++++++++-----
 arch/x86/include/asm/mmu_context.h          |   5 -
 arch/x86/include/asm/pgtable-2level.h       |   9 +
 arch/x86/include/asm/pgtable-2level_types.h |   3 +
 arch/x86/include/asm/pgtable-3level.h       |   7 +
 arch/x86/include/asm/pgtable-3level_types.h |   6 +-
 arch/x86/include/asm/pgtable.h              |  87 ++++
 arch/x86/include/asm/pgtable_32.h           |   2 -
 arch/x86/include/asm/pgtable_32_types.h     |   9 +-
 arch/x86/include/asm/pgtable_64.h           |  89 +---
 arch/x86/include/asm/pgtable_64_types.h     |   3 +
 arch/x86/include/asm/pgtable_types.h        |  28 +-
 arch/x86/include/asm/processor-flags.h      |   8 +-
 arch/x86/include/asm/pti.h                  |   3 +-
 arch/x86/include/asm/sections.h             |   1 +
 arch/x86/include/asm/switch_to.h            |  16 +-
 arch/x86/kernel/asm-offsets.c               |   5 +
 arch/x86/kernel/asm-offsets_32.c            |  10 +-
 arch/x86/kernel/asm-offsets_64.c            |   2 -
 arch/x86/kernel/cpu/common.c                |   5 +-
 arch/x86/kernel/head_32.S                   |  20 +-
 arch/x86/kernel/ldt.c                       | 137 ++++--
 arch/x86/kernel/process.c                   |   2 -
 arch/x86/kernel/process_32.c                |   2 +-
 arch/x86/kernel/process_64.c                |   2 +-
 arch/x86/kernel/vm86_32.c                   |   4 +-
 arch/x86/kernel/vmlinux.lds.S               |  17 +-
 arch/x86/mm/dump_pagetables.c               |  21 +-
 arch/x86/mm/init_64.c                       |   6 -
 arch/x86/mm/pgtable.c                       | 105 ++++-
 arch/x86/mm/pti.c                           |  73 +++-
 include/linux/pti.h                         |   1 +
 init/main.c                                 |   7 +
 security/Kconfig                            |   2 +-
 34 files changed, 1014 insertions(+), 307 deletions(-)

-- 
2.7.4



* [PATCH 01/39] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:18   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 02/39] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack Joerg Roedel
                   ` (39 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

These offsets will be used in 32-bit assembly code as well,
so make them available to all of the x86 code.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/asm-offsets.c    | 4 ++++
 arch/x86/kernel/asm-offsets_64.c | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index dcb008c..a1e1628 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -103,4 +103,8 @@ void common(void) {
 	OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
 	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
 	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
+
+	/* Offset for sp0 and sp1 into the tss_struct */
+	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
+	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 }
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index b2dcd16..3b9405e 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -65,8 +65,6 @@ int main(void)
 #undef ENTRY
 
 	OFFSET(TSS_ist, tss_struct, x86_tss.ist);
-	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
-	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR
-- 
2.7.4



* [PATCH 02/39] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
  2018-07-18  9:40 ` [PATCH 01/39] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:19   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 03/39] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler Joerg Roedel
                   ` (38 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The stack address doesn't need to be stored in tss.sp0 if we
switch the stack manually, as we do on sysenter. Rename the
offset so that it still makes sense when we change its
location.

We will also use this stack for all kernel entry points, not
just sysenter. Reflect that, and the fact that it is the
offset to the task-stack location, in the name as well.
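
For illustration, here is a minimal userspace model of that
offset computation (assumed, simplified layout -- the real
structures live in cpu_entry_area, and offsetofend() in the
kernel is offsetof() plus the member size):

	#include <stdio.h>
	#include <stddef.h>

	struct entry_stack_model { unsigned long words[64]; };

	struct cpu_entry_area_model {
		struct entry_stack_model entry_stack_page;
		unsigned long other[4];	/* stand-in for fields before sp0 */
		unsigned long tss_sp0;	/* stands in for tss.x86_tss.sp0 */
	};

	int main(void)
	{
		/* offsetof(..., tss.sp0) - offsetofend(..., entry_stack_page) */
		long off = offsetof(struct cpu_entry_area_model, tss_sp0)
			 - (offsetof(struct cpu_entry_area_model, entry_stack_page)
			    + sizeof(struct entry_stack_model));

		/*
		 * On sysenter %esp points at the end of the entry stack,
		 * so TSS_entry2task_stack(%esp) reads exactly this slot.
		 */
		printf("TSS_entry2task_stack = %ld\n", off);
		return 0;
	}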

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S        | 2 +-
 arch/x86/kernel/asm-offsets_32.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index c371bfe..39f711a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -412,7 +412,7 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
-	movl	TSS_sysenter_sp0(%esp), %esp
+	movl	TSS_entry2task_stack(%esp), %esp
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index a4a3be3..15b3f45 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -46,8 +46,9 @@ void foo(void)
 	OFFSET(saved_context_gdt_desc, saved_context, gdt_desc);
 	BLANK();
 
-	/* Offset from the sysenter stack to tss.sp0 */
-	DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	/* Offset from the entry stack to task stack stored in TSS */
+	DEFINE(TSS_entry2task_stack,
+	       offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
 #ifdef CONFIG_STACKPROTECTOR
-- 
2.7.4



* [PATCH 03/39] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
  2018-07-18  9:40 ` [PATCH 01/39] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c Joerg Roedel
  2018-07-18  9:40 ` [PATCH 02/39] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:19   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 04/39] x86/entry/32: Put ESPFIX code into a macro Joerg Roedel
                   ` (37 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

We want x86_tss.sp0 to point to the entry stack later, to use
it as a trampoline stack for other kernel entry points
besides SYSENTER.

So store the task-stack pointer in x86_tss.sp1, which is
otherwise unused by the hardware, as Linux doesn't make use
of ring 1.
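
As a rough sketch of the two TSS slots after this change
(field names abbreviated and addresses made up; the real
struct is tss_struct in asm/processor.h):

	#include <stdio.h>

	struct x86_tss_model {
		unsigned long sp0;	/* entry/trampoline stack, used by hardware */
		unsigned long sp1;	/* unused by hardware: holds the task stack */
	};

	int main(void)
	{
		struct x86_tss_model tss = { .sp0 = 0xfffff000UL /* hypothetical */ };

		/* at context switch, mirror next->thread.sp0 into sp1 */
		unsigned long next_thread_sp0 = 0xc1002000UL;	/* hypothetical */
		tss.sp1 = next_thread_sp0;

		/* the SYSENTER path then loads its stack pointer from sp1 */
		printf("entry stack %#lx, task stack %#lx\n", tss.sp0, tss.sp1);
		return 0;
	}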

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/asm-offsets_32.c | 9 +++++++--
 arch/x86/kernel/process_32.c     | 2 ++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 15b3f45..82826f2 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -46,9 +46,14 @@ void foo(void)
 	OFFSET(saved_context_gdt_desc, saved_context, gdt_desc);
 	BLANK();
 
-	/* Offset from the entry stack to task stack stored in TSS */
+	/*
+	 * Offset from the entry stack to task stack stored in TSS. Kernel entry
+	 * happens on the per-cpu entry-stack, and the asm code switches to the
+	 * task-stack pointer stored in x86_tss.sp1, which is a copy of
+	 * task->thread.sp0 where entry code can find it.
+	 */
 	DEFINE(TSS_entry2task_stack,
-	       offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	       offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
 #ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0ae659d..ec62cc7 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -290,6 +290,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
+	/* SYSENTER reads the task-stack from tss.sp1 */
+	this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
 
 	/*
 	 * Restore %gs if needed (which is common)
-- 
2.7.4



* [PATCH 04/39] x86/entry/32: Put ESPFIX code into a macro
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (2 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 03/39] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:20   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 05/39] x86/entry/32: Unshare NMI return path Joerg Roedel
                   ` (36 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

This makes it easier to split up the shared iret code path.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 97 ++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 39f711a..ef7d653 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -221,6 +221,54 @@
 	POP_GS_EX
 .endm
 
+.macro CHECK_AND_APPLY_ESPFIX
+#ifdef CONFIG_X86_ESPFIX32
+#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
+
+	ALTERNATIVE	"jmp .Lend_\@", "", X86_BUG_ESPFIX
+
+	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
+	/*
+	 * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
+	 * are returning to the kernel.
+	 * See comments in process.c:copy_thread() for details.
+	 */
+	movb	PT_OLDSS(%esp), %ah
+	movb	PT_CS(%esp), %al
+	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
+	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
+	jne	.Lend_\@	# returning to user-space with LDT SS
+
+	/*
+	 * Setup and switch to ESPFIX stack
+	 *
+	 * We're returning to userspace with a 16 bit stack. The CPU will not
+	 * restore the high word of ESP for us on executing iret... This is an
+	 * "official" bug of all the x86-compatible CPUs, which we can work
+	 * around to make dosemu and wine happy. We do this by preloading the
+	 * high word of ESP with the high word of the userspace ESP while
+	 * compensating for the offset by changing to the ESPFIX segment with
+	 * a base address that matches for the difference.
+	 */
+	mov	%esp, %edx			/* load kernel esp */
+	mov	PT_OLDESP(%esp), %eax		/* load userspace esp */
+	mov	%dx, %ax			/* eax: new kernel esp */
+	sub	%eax, %edx			/* offset (low word is 0) */
+	shr	$16, %edx
+	mov	%dl, GDT_ESPFIX_SS + 4		/* bits 16..23 */
+	mov	%dh, GDT_ESPFIX_SS + 7		/* bits 24..31 */
+	pushl	$__ESPFIX_SS
+	pushl	%eax				/* new kernel esp */
+	/*
+	 * Disable interrupts, but do not irqtrace this section: we
+	 * will soon execute iret and the tracer was already set to
+	 * the irqstate after the IRET:
+	 */
+	DISABLE_INTERRUPTS(CLBR_ANY)
+	lss	(%esp), %esp			/* switch to espfix segment */
+.Lend_\@:
+#endif /* CONFIG_X86_ESPFIX32 */
+.endm
 /*
  * %eax: prev task
  * %edx: next task
@@ -547,21 +595,7 @@ ENTRY(entry_INT80_32)
 restore_all:
 	TRACE_IRQS_IRET
 .Lrestore_all_notrace:
-#ifdef CONFIG_X86_ESPFIX32
-	ALTERNATIVE	"jmp .Lrestore_nocheck", "", X86_BUG_ESPFIX
-
-	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
-	/*
-	 * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
-	 * are returning to the kernel.
-	 * See comments in process.c:copy_thread() for details.
-	 */
-	movb	PT_OLDSS(%esp), %ah
-	movb	PT_CS(%esp), %al
-	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
-	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
-	je .Lldt_ss				# returning to user-space with LDT SS
-#endif
+	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
 .Lirq_return:
@@ -579,39 +613,6 @@ ENTRY(iret_exc	)
 	jmp	common_exception
 .previous
 	_ASM_EXTABLE(.Lirq_return, iret_exc)
-
-#ifdef CONFIG_X86_ESPFIX32
-.Lldt_ss:
-/*
- * Setup and switch to ESPFIX stack
- *
- * We're returning to userspace with a 16 bit stack. The CPU will not
- * restore the high word of ESP for us on executing iret... This is an
- * "official" bug of all the x86-compatible CPUs, which we can work
- * around to make dosemu and wine happy. We do this by preloading the
- * high word of ESP with the high word of the userspace ESP while
- * compensating for the offset by changing to the ESPFIX segment with
- * a base address that matches for the difference.
- */
-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
-	mov	%esp, %edx			/* load kernel esp */
-	mov	PT_OLDESP(%esp), %eax		/* load userspace esp */
-	mov	%dx, %ax			/* eax: new kernel esp */
-	sub	%eax, %edx			/* offset (low word is 0) */
-	shr	$16, %edx
-	mov	%dl, GDT_ESPFIX_SS + 4		/* bits 16..23 */
-	mov	%dh, GDT_ESPFIX_SS + 7		/* bits 24..31 */
-	pushl	$__ESPFIX_SS
-	pushl	%eax				/* new kernel esp */
-	/*
-	 * Disable interrupts, but do not irqtrace this section: we
-	 * will soon execute iret and the tracer was already set to
-	 * the irqstate after the IRET:
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	lss	(%esp), %esp			/* switch to espfix segment */
-	jmp	.Lrestore_nocheck
-#endif
 ENDPROC(entry_INT80_32)
 
 .macro FIXUP_ESPFIX_STACK
-- 
2.7.4



* [PATCH 05/39] x86/entry/32: Unshare NMI return path
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (3 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 04/39] x86/entry/32: Put ESPFIX code into a macro Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:21   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 06/39] x86/entry/32: Split off return-to-kernel path Joerg Roedel
                   ` (35 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

NMI will no longer use most of the shared return path,
because the NMI path needs special handling once the CR3
switches for PTI are added. This patch prepares for that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ef7d653..4364131 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1017,7 +1017,7 @@ ENTRY(nmi)
 
 	/* Not on SYSENTER stack. */
 	call	do_nmi
-	jmp	.Lrestore_all_notrace
+	jmp	.Lnmi_return
 
 .Lnmi_from_sysenter_stack:
 	/*
@@ -1028,7 +1028,11 @@ ENTRY(nmi)
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	do_nmi
 	movl	%ebx, %esp
-	jmp	.Lrestore_all_notrace
+
+.Lnmi_return:
+	CHECK_AND_APPLY_ESPFIX
+	RESTORE_REGS 4
+	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
 .Lnmi_espfix_stack:
-- 
2.7.4



* [PATCH 06/39] x86/entry/32: Split off return-to-kernel path
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (4 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 05/39] x86/entry/32: Unshare NMI return path Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:21   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
                   ` (34 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Use a separate return path when we know we are returning to
the kernel. This allows us to put the PTI cr3-switch and the
switch to the entry-stack into the return-to-user path
without further checking.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 4364131..7251c4f 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -65,7 +65,7 @@
 # define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 # define preempt_stop(clobbers)
-# define resume_kernel		restore_all
+# define resume_kernel		restore_all_kernel
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -399,9 +399,9 @@ ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 .Lneed_resched:
 	cmpl	$0, PER_CPU_VAR(__preempt_count)
-	jnz	restore_all
+	jnz	restore_all_kernel
 	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
-	jz	restore_all
+	jz	restore_all_kernel
 	call	preempt_schedule_irq
 	jmp	.Lneed_resched
 END(resume_kernel)
@@ -606,6 +606,11 @@ restore_all:
 	 */
 	INTERRUPT_RETURN
 
+restore_all_kernel:
+	TRACE_IRQS_IRET
+	RESTORE_REGS 4
+	jmp	.Lirq_return
+
 .section .fixup, "ax"
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
-- 
2.7.4



* [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (5 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 06/39] x86/entry/32: Split off return-to-kernel path Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-18 18:09   ` Brian Gerst
  2018-07-19 23:22   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 08/39] x86/entry/32: Leave " Joerg Roedel
                   ` (33 subsequent siblings)
  40 siblings, 2 replies; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Use the entry-stack as a trampoline to enter the kernel. The
entry-stack is already in the cpu_entry_area and will be
mapped into userspace when PTI is enabled.
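
Here is a rough userspace model of what the new
SWITCH_TO_KERNEL_STACK macro does (stack sizes and the frame
size are assumptions, and the VM86 special case is omitted;
the real logic is in the entry_32.S hunk below):

	#include <stdio.h>
	#include <string.h>

	#define SIZEOF_entry_stack	512UL
	#define PTREGS_SIZE		(17 * 4)	/* assumed frame size */

	/* returns the new stack pointer after copying the frame */
	static unsigned char *switch_to_kernel_stack(unsigned char *esp,
						     unsigned char *entry_stack,
						     unsigned char *task_stack_top)
	{
		/* bail out if we are not on the entry stack */
		if (esp < entry_stack || esp >= entry_stack + SIZEOF_entry_stack)
			return esp;

		/*
		 * Allocate the frame on the task stack first, then copy --
		 * an NMI hitting the entry stack must not clobber the copy.
		 */
		unsigned char *new_esp = task_stack_top - PTREGS_SIZE;
		memcpy(new_esp, esp, PTREGS_SIZE);
		return new_esp;
	}

	int main(void)
	{
		static unsigned char entry_stack[SIZEOF_entry_stack];
		static unsigned char task_stack[4096];

		unsigned char *esp = entry_stack + SIZEOF_entry_stack - PTREGS_SIZE;
		esp = switch_to_kernel_stack(esp, entry_stack,
					     task_stack + sizeof(task_stack));
		printf("now on task stack: %p\n", (void *)esp);
		return 0;
	}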

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S        | 119 ++++++++++++++++++++++++++++++++-------
 arch/x86/include/asm/switch_to.h |  14 ++++-
 arch/x86/kernel/asm-offsets.c    |   1 +
 arch/x86/kernel/cpu/common.c     |   5 +-
 arch/x86/kernel/process.c        |   2 -
 arch/x86/kernel/process_32.c     |   2 -
 6 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7251c4f..fea49ec 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -154,7 +154,7 @@
 
 #endif /* CONFIG_X86_32_LAZY_GS */
 
-.macro SAVE_ALL pt_regs_ax=%eax
+.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
 	cld
 	PUSH_GS
 	pushl	%fs
@@ -173,6 +173,12 @@
 	movl	$(__KERNEL_PERCPU), %edx
 	movl	%edx, %fs
 	SET_KERNEL_GS %edx
+
+	/* Switch to kernel stack if necessary */
+.if \switch_stacks > 0
+	SWITCH_TO_KERNEL_STACK
+.endif
+
 .endm
 
 /*
@@ -269,6 +275,73 @@
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
 .endm
+
+
+/*
+ * Called with pt_regs fully populated and kernel segments loaded,
+ * so we can access PER_CPU and use the integer registers.
+ *
+ * We need to be very careful here with the %esp switch, because an NMI
+ * can happen everywhere. If the NMI handler finds itself on the
+ * entry-stack, it will overwrite the task-stack and everything we
+ * copied there. So allocate the stack-frame on the task-stack and
+ * switch to it before we do any copying.
+ */
+.macro SWITCH_TO_KERNEL_STACK
+
+	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+	/* Are we on the entry stack? Bail out if not! */
+	movl	PER_CPU_VAR(cpu_entry_area), %ecx
+	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
+	subl	%esp, %ecx	/* ecx = (end of entry_stack) - esp */
+	cmpl	$SIZEOF_entry_stack, %ecx
+	jae	.Lend_\@
+
+	/* Load stack pointer into %esi and %edi */
+	movl	%esp, %esi
+	movl	%esi, %edi
+
+	/* Move %edi to the top of the entry stack */
+	andl	$(MASK_entry_stack), %edi
+	addl	$(SIZEOF_entry_stack), %edi
+
+	/* Load top of task-stack into %edi */
+	movl	TSS_entry2task_stack(%edi), %edi
+
+	/* Bytes to copy */
+	movl	$PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+	testl	$X86_EFLAGS_VM, PT_EFLAGS(%esi)
+	jz	.Lcopy_pt_regs_\@
+
+	/*
+	 * Stack-frame contains 4 additional segment registers when
+	 * coming from VM86 mode
+	 */
+	addl	$(4 * 4), %ecx
+
+.Lcopy_pt_regs_\@:
+#endif
+
+	/* Allocate frame on task-stack */
+	subl	%ecx, %edi
+
+	/* Switch to task-stack */
+	movl	%edi, %esp
+
+	/*
+	 * We are now on the task-stack and can safely copy over the
+	 * stack-frame
+	 */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+.Lend_\@:
+.endm
+
 /*
  * %eax: prev task
  * %edx: next task
@@ -469,7 +542,7 @@ ENTRY(entry_SYSENTER_32)
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
 	pushl	%eax			/* pt_regs->orig_ax */
-	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
+	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest, stack already switched */
 
 	/*
 	 * SYSENTER doesn't filter flags, so we need to clear NT, AC
@@ -580,7 +653,8 @@ ENDPROC(entry_SYSENTER_32)
 ENTRY(entry_INT80_32)
 	ASM_CLAC
 	pushl	%eax			/* pt_regs->orig_ax */
-	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
+
+	SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1	/* save rest */
 
 	/*
 	 * User mode is traced as though IRQs are on, and the interrupt gate
@@ -677,7 +751,8 @@ END(irq_entries_start)
 common_interrupt:
 	ASM_CLAC
 	addl	$-0x80, (%esp)			/* Adjust vector into the [-256, -1] range */
-	SAVE_ALL
+
+	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	movl	%esp, %eax
@@ -685,16 +760,16 @@ common_interrupt:
 	jmp	ret_from_intr
 ENDPROC(common_interrupt)
 
-#define BUILD_INTERRUPT3(name, nr, fn)	\
-ENTRY(name)				\
-	ASM_CLAC;			\
-	pushl	$~(nr);			\
-	SAVE_ALL;			\
-	ENCODE_FRAME_POINTER;		\
-	TRACE_IRQS_OFF			\
-	movl	%esp, %eax;		\
-	call	fn;			\
-	jmp	ret_from_intr;		\
+#define BUILD_INTERRUPT3(name, nr, fn)			\
+ENTRY(name)						\
+	ASM_CLAC;					\
+	pushl	$~(nr);					\
+	SAVE_ALL switch_stacks=1;			\
+	ENCODE_FRAME_POINTER;				\
+	TRACE_IRQS_OFF					\
+	movl	%esp, %eax;				\
+	call	fn;					\
+	jmp	ret_from_intr;				\
 ENDPROC(name)
 
 #define BUILD_INTERRUPT(name, nr)		\
@@ -926,16 +1001,20 @@ common_exception:
 	pushl	%es
 	pushl	%ds
 	pushl	%eax
+	movl	$(__USER_DS), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	$(__KERNEL_PERCPU), %eax
+	movl	%eax, %fs
 	pushl	%ebp
 	pushl	%edi
 	pushl	%esi
 	pushl	%edx
 	pushl	%ecx
 	pushl	%ebx
+	SWITCH_TO_KERNEL_STACK
 	ENCODE_FRAME_POINTER
 	cld
-	movl	$(__KERNEL_PERCPU), %ecx
-	movl	%ecx, %fs
 	UNWIND_ESPFIX_STACK
 	GS_TO_REG %ecx
 	movl	PT_GS(%esp), %edi		# get the function address
@@ -943,9 +1022,6 @@ common_exception:
 	movl	$-1, PT_ORIG_EAX(%esp)		# no syscall to restart
 	REG_TO_PTGS %ecx
 	SET_KERNEL_GS %ecx
-	movl	$(__USER_DS), %ecx
-	movl	%ecx, %ds
-	movl	%ecx, %es
 	TRACE_IRQS_OFF
 	movl	%esp, %eax			# pt_regs pointer
 	CALL_NOSPEC %edi
@@ -964,6 +1040,7 @@ ENTRY(debug)
 	 */
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
+
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# error code 0
@@ -999,6 +1076,7 @@ END(debug)
  */
 ENTRY(nmi)
 	ASM_CLAC
+
 #ifdef CONFIG_X86_ESPFIX32
 	pushl	%eax
 	movl	%ss, %eax
@@ -1066,7 +1144,8 @@ END(nmi)
 ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
-	SAVE_ALL
+
+	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# zero error code
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index eb5f799..8bc2f70 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -89,13 +89,23 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 /* This is used when switching tasks or entering/exiting vm86 mode. */
 static inline void update_sp0(struct task_struct *task)
 {
-	/* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */
+	/* sp0 always points to the entry trampoline stack, which is constant: */
 #ifdef CONFIG_X86_32
-	load_sp0(task->thread.sp0);
+	if (static_cpu_has(X86_FEATURE_XENPV))
+		load_sp0(task->thread.sp0);
+	else
+		this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
 #else
+	/*
+	 * x86-64 updates x86_tss.sp1 via cpu_current_top_of_stack. That
+	 * doesn't work on x86-32 because sp1 and
+	 * cpu_current_top_of_stack have different values (because of
+	 * the non-zero stack-padding on 32bit).
+	 */
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task_top_of_stack(task));
 #endif
+
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index a1e1628..01de31d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -103,6 +103,7 @@ void common(void) {
 	OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
 	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
 	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
+	DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
 
 	/* Offset for sp0 and sp1 into the tss_struct */
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index eb4cb3e..43a927e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1804,11 +1804,12 @@ void cpu_init(void)
 	enter_lazy_tlb(&init_mm, curr);
 
 	/*
-	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
-	 * task never enters user mode.
+	 * Initialize the TSS.  sp0 points to the entry trampoline stack
+	 * regardless of what task is running.
 	 */
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 	load_TR_desc();
+	load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
 
 	load_mm_ldt(&init_mm);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 30ca2d1..c93fcfd 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,14 +57,12 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 		 */
 		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 
-#ifdef CONFIG_X86_64
 		/*
 		 * .sp1 is cpu_current_top_of_stack.  The init task never
 		 * runs user code, but cpu_current_top_of_stack should still
 		 * be well defined before the first context switch.
 		 */
 		.sp1 = TOP_OF_INIT_STACK,
-#endif
 
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ec62cc7..0ae659d 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -290,8 +290,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
-	/* SYSENTER reads the task-stack from tss.sp1 */
-	this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
 
 	/*
 	 * Restore %gs if needed (which is common)
-- 
2.7.4



* [PATCH 08/39] x86/entry/32: Leave the kernel via trampoline stack
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (6 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:22   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 09/39] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI Joerg Roedel
                   ` (32 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Switch back to the trampoline stack before returning to
userspace.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 79 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index fea49ec..a905e62 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -343,6 +343,60 @@
 .endm
 
 /*
+ * Switch back from the kernel stack to the entry stack.
+ *
+ * The %esp register must point to pt_regs on the task stack. It will
+ * first calculate the size of the stack-frame to copy, depending on
+ * whether we return to VM86 mode or not. With that it uses 'rep movsl'
+ * to copy the contents of the stack over to the entry stack.
+ *
+ * We must be very careful here, as we can't trust the contents of the
+ * task-stack once we switched to the entry-stack. When an NMI happens
+ * while on the entry-stack, the NMI handler will switch back to the top
+ * of the task stack, overwriting our stack-frame we are about to copy.
+ * Therefore we switch the stack only after everything is copied over.
+ */
+.macro SWITCH_TO_ENTRY_STACK
+
+	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+	/* Bytes to copy */
+	movl	$PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+	testl	$(X86_EFLAGS_VM), PT_EFLAGS(%esp)
+	jz	.Lcopy_pt_regs_\@
+
+	/* Additional 4 registers to copy when returning to VM86 mode */
+	addl    $(4 * 4), %ecx
+
+.Lcopy_pt_regs_\@:
+#endif
+
+	/* Initialize source and destination for movsl */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+	subl	%ecx, %edi
+	movl	%esp, %esi
+
+	/* Save future stack pointer in %ebx */
+	movl	%edi, %ebx
+
+	/* Copy over the stack-frame */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+	/*
+	 * Switch to entry-stack - needs to happen after everything is
+	 * copied because the NMI handler will overwrite the task-stack
+	 * when on entry-stack
+	 */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -581,25 +635,45 @@ ENTRY(entry_SYSENTER_32)
 
 /* Opportunistic SYSEXIT */
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
+
+	/*
+	 * Setup entry stack - we keep the pointer in %eax and do the
+	 * switch after almost all user-state is restored.
+	 */
+
+	/* Load entry stack pointer and allocate frame for eflags/eax */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %eax
+	subl	$(2*4), %eax
+
+	/* Copy eflags and eax to entry stack */
+	movl	PT_EFLAGS(%esp), %edi
+	movl	PT_EAX(%esp), %esi
+	movl	%edi, (%eax)
+	movl	%esi, 4(%eax)
+
+	/* Restore user registers and segments */
 	movl	PT_EIP(%esp), %edx	/* pt_regs->ip */
 	movl	PT_OLDESP(%esp), %ecx	/* pt_regs->sp */
 1:	mov	PT_FS(%esp), %fs
 	PTGS_TO_GS
+
 	popl	%ebx			/* pt_regs->bx */
 	addl	$2*4, %esp		/* skip pt_regs->cx and pt_regs->dx */
 	popl	%esi			/* pt_regs->si */
 	popl	%edi			/* pt_regs->di */
 	popl	%ebp			/* pt_regs->bp */
-	popl	%eax			/* pt_regs->ax */
+
+	/* Switch to entry stack */
+	movl	%eax, %esp
 
 	/*
 	 * Restore all flags except IF. (We restore IF separately because
 	 * STI gives a one-instruction window in which we won't be interrupted,
 	 * whereas POPF does not.)
 	 */
-	addl	$PT_EFLAGS-PT_DS, %esp	/* point esp at pt_regs->flags */
 	btrl	$X86_EFLAGS_IF_BIT, (%esp)
 	popfl
+	popl	%eax
 
 	/*
 	 * Return back to the vDSO, which will pop ecx and edx.
@@ -668,6 +742,7 @@ ENTRY(entry_INT80_32)
 
 restore_all:
 	TRACE_IRQS_IRET
+	SWITCH_TO_ENTRY_STACK
 .Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
-- 
2.7.4



* [PATCH 09/39] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (7 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 08/39] x86/entry/32: Leave " Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:23   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
                   ` (31 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

These macros will be used in the NMI handler code and
replace plain SAVE_ALL and RESTORE_REGS there. We will add
the NMI-specific CR3-switch to these macros later.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a905e62..7635925 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -181,6 +181,9 @@
 
 .endm
 
+.macro SAVE_ALL_NMI
+	SAVE_ALL
+.endm
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
@@ -227,6 +230,10 @@
 	POP_GS_EX
 .endm
 
+.macro RESTORE_ALL_NMI pop=0
+	RESTORE_REGS pop=\pop
+.endm
+
 .macro CHECK_AND_APPLY_ESPFIX
 #ifdef CONFIG_X86_ESPFIX32
 #define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
@@ -1161,7 +1168,7 @@ ENTRY(nmi)
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
-	SAVE_ALL
+	SAVE_ALL_NMI
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
@@ -1189,7 +1196,7 @@ ENTRY(nmi)
 
 .Lnmi_return:
 	CHECK_AND_APPLY_ESPFIX
-	RESTORE_REGS 4
+	RESTORE_ALL_NMI pop=4
 	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1205,12 +1212,12 @@ ENTRY(nmi)
 	pushl	16(%esp)
 	.endr
 	pushl	%eax
-	SAVE_ALL
+	SAVE_ALL_NMI
 	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
-	RESTORE_REGS
+	RESTORE_ALL_NMI
 	lss	12+4(%esp), %esp		# back to espfix stack
 	jmp	.Lirq_return
 #endif
-- 
2.7.4



* [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (8 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 09/39] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:23   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-10-12 18:29   ` [PATCH 10/39] " Jan Kiszka
  2018-07-18  9:40 ` [PATCH 11/39] x86/entry/32: Simplify debug entry point Joerg Roedel
                   ` (30 subsequent siblings)
  40 siblings, 2 replies; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

It can happen that we enter the kernel from kernel-mode and
on the entry-stack. The most common way this happens is when
we get an exception while loading the user-space segment
registers on the kernel-to-userspace exit path.

The segment loading needs to be done after the entry-stack
switch, because the stack-switch needs kernel %fs for
per_cpu access.

When this happens, we need to make sure that we leave the
kernel with the entry-stack again, so that the interrupted
code-path runs on the right stack when switching to the
user-cr3.

We detect this condition on kernel-entry by checking CS.RPL
and %esp, and if it happens, we copy the complete contents
of the entry stack over to the task-stack. This needs to be
done because once we enter the exception handlers we might
be scheduled out or even migrated to a different CPU, so we
can't rely on the entry-stack contents. We also leave a
marker in the stack-frame to detect this condition on the
exit path.

On the exit path the copy is reversed, we copy all of the
remaining task-stack back to the entry-stack and switch
to it.
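
The marker lives in the upper bits of the dword that holds
the word-sized CS value in pt_regs, which the hardware leaves
unused. A tiny model of the mark/detect/clear cycle (the
constant matches the patch; the CS value is just a sample):

	#include <stdio.h>
	#include <stdint.h>

	#define CS_FROM_ENTRY_STACK	(1u << 31)

	int main(void)
	{
		uint32_t cs_slot = 0x0060;	/* sample kernel CS value */

		/* kernel entry from the entry-stack: set the marker */
		cs_slot |= CS_FROM_ENTRY_STACK;

		/* exit path: detect the marker, clear it, copy back */
		if (cs_slot & CS_FROM_ENTRY_STACK) {
			cs_slot &= ~CS_FROM_ENTRY_STACK;
			/* ... copy task-stack contents back and switch ... */
		}

		printf("CS slot restored to %#x\n", cs_slot);
		return 0;
	}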

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 116 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 115 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7635925..9d6eceb 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -294,6 +294,9 @@
  * copied there. So allocate the stack-frame on the task-stack and
  * switch to it before we do any copying.
  */
+
+#define CS_FROM_ENTRY_STACK	(1 << 31)
+
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
@@ -316,6 +319,16 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry2task_stack(%edi), %edi
 
+	/*
+	 * Clear unused upper bits of the dword containing the word-sized CS
+	 * slot in pt_regs in case hardware didn't clear it for us.
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
+	/* Special case - entry from kernel mode via entry stack */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lentry_from_kernel_\@
+
 	/* Bytes to copy */
 	movl	$PTREGS_SIZE, %ecx
 
@@ -329,8 +342,8 @@
 	 */
 	addl	$(4 * 4), %ecx
 
-.Lcopy_pt_regs_\@:
 #endif
+.Lcopy_pt_regs_\@:
 
 	/* Allocate frame on task-stack */
 	subl	%ecx, %edi
@@ -346,6 +359,56 @@
 	cld
 	rep movsl
 
+	jmp .Lend_\@
+
+.Lentry_from_kernel_\@:
+
+	/*
+	 * This handles the case when we enter the kernel from
+	 * kernel-mode and %esp points to the entry-stack. When this
+	 * happens we need to switch to the task-stack to run C code,
+	 * but switch back to the entry-stack again when we approach
+	 * iret and return to the interrupted code-path. This usually
+	 * happens when we hit an exception while restoring user-space
+	 * segment registers on the way back to user-space.
+	 *
+	 * When we switch to the task-stack here, we can't trust the
+	 * contents of the entry-stack anymore, as the exception handler
+	 * might be scheduled out or moved to another CPU. Therefore we
+	 * copy the complete entry-stack to the task-stack and set a
+	 * marker in the iret-frame (bit 31 of the CS dword) to detect
+	 * what we've done on the iret path.
+	 *
+	 * On the iret path we copy everything back and switch to the
+	 * entry-stack, so that the interrupted kernel code-path
+	 * continues on the same stack it was interrupted with.
+	 *
+	 * Be aware that an NMI can happen anytime in this code.
+	 *
+	 * %esi: Entry-Stack pointer (same as %esp)
+	 * %edi: Top of the task stack
+	 */
+
+	/* Calculate number of bytes on the entry stack in %ecx */
+	movl	%esi, %ecx
+
+	/* %ecx to the top of entry-stack */
+	andl	$(MASK_entry_stack), %ecx
+	addl	$(SIZEOF_entry_stack), %ecx
+
+	/* Number of bytes on the entry stack to %ecx */
+	sub	%esi, %ecx
+
+	/* Mark stackframe as coming from entry stack */
+	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+
+	/*
+	 * %esi and %edi are unchanged, %ecx contains the number of
+	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
+	 * the stack-frame on task-stack and copy everything over
+	 */
+	jmp .Lcopy_pt_regs_\@
+
 .Lend_\@:
 .endm
 
@@ -404,6 +467,56 @@
 .endm
 
 /*
+ * This macro handles the case when we return to kernel-mode on the iret
+ * path and have to switch back to the entry stack.
+ *
+ * See the comments below the .Lentry_from_kernel_\@ label in the
+ * SWITCH_TO_KERNEL_STACK macro for more details.
+ */
+.macro PARANOID_EXIT_TO_KERNEL_MODE
+
+	/*
+	 * Test if we entered the kernel with the entry-stack. Most
+	 * likely we did not, because this code only runs on the
+	 * return-to-kernel path.
+	 */
+	testl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Unlikely slow-path */
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
+
+	/* Copy the remaining task-stack contents to entry-stack */
+	movl	%esp, %esi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+
+	/* Bytes on the task-stack to ecx */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %ecx
+	subl	%esi, %ecx
+
+	/* Allocate stack-frame on entry-stack */
+	subl	%ecx, %edi
+
+	/*
+	 * Save future stack-pointer, we must not switch until the
+	 * copy is done, otherwise the NMI handler could destroy the
+	 * contents of the task-stack we are about to copy.
+	 */
+	movl	%edi, %ebx
+
+	/* Do the copy */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+	/* Safe to switch to entry-stack now */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -764,6 +877,7 @@ restore_all:
 
 restore_all_kernel:
 	TRACE_IRQS_IRET
+	PARANOID_EXIT_TO_KERNEL_MODE
 	RESTORE_REGS 4
 	jmp	.Lirq_return
 
-- 
2.7.4



* [PATCH 11/39] x86/entry/32: Simplify debug entry point
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (9 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:24   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 12/39] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points Joerg Roedel
                   ` (29 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The common exception entry code now handles the
entry-from-sysenter-stack situation and makes sure to leave
on the same stack the kernel was entered on.

So the special handling in the debug entry code is no longer
needed.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 35 +++--------------------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 9d6eceb..dbf7d61 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1226,41 +1226,12 @@ END(common_exception)
 
 ENTRY(debug)
 	/*
-	 * #DB can happen at the first instruction of
-	 * entry_SYSENTER_32 or in Xen's SYSENTER prologue.  If this
-	 * happens, then we will be running on a very small stack.  We
-	 * need to detect this condition and switch to the thread
-	 * stack before calling any C code at all.
-	 *
-	 * If you edit this code, keep in mind that NMIs can happen in here.
+	 * Entry from sysenter is now handled in common_exception
 	 */
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
-
-	SAVE_ALL
-	ENCODE_FRAME_POINTER
-	xorl	%edx, %edx			# error code 0
-	movl	%esp, %eax			# pt_regs pointer
-
-	/* Are we currently on the SYSENTER stack? */
-	movl	PER_CPU_VAR(cpu_entry_area), %ecx
-	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
-	subl	%eax, %ecx	/* ecx = (end of entry_stack) - esp */
-	cmpl	$SIZEOF_entry_stack, %ecx
-	jb	.Ldebug_from_sysenter_stack
-
-	TRACE_IRQS_OFF
-	call	do_debug
-	jmp	ret_from_exception
-
-.Ldebug_from_sysenter_stack:
-	/* We're on the SYSENTER stack.  Switch off. */
-	movl	%esp, %ebx
-	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
-	TRACE_IRQS_OFF
-	call	do_debug
-	movl	%ebx, %esp
-	jmp	ret_from_exception
+	pushl	$do_debug
+	jmp	common_exception
 END(debug)
 
 /*
-- 
2.7.4



* [PATCH 12/39] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (10 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 11/39] x86/entry/32: Simplify debug entry point Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:24   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 13/39] x86/entry/32: Add PTI cr3 switches to NMI handler code Joerg Roedel
                   ` (28 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Add unconditional switches between the user and kernel cr3
to all non-NMI entry and exit points.
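
The switch itself only toggles a single cr3 bit: the user
copy of the page-table directory sits one page above the
kernel one, so bit PAGE_SHIFT selects between them. A small
model of the two macros (the cr3 value is made up; the real
code is in the entry_32.S hunk below):

	#include <stdio.h>

	#define PAGE_SHIFT	12
	#define PTI_SWITCH_MASK	(1UL << PAGE_SHIFT)

	int main(void)
	{
		unsigned long cr3 = 0x1000000UL;	/* hypothetical kernel cr3 */

		/* SWITCH_TO_USER_CR3: set the bit on exit to userspace */
		unsigned long user_cr3 = cr3 | PTI_SWITCH_MASK;

		/* SWITCH_TO_KERNEL_CR3: clear it again on kernel entry */
		unsigned long kernel_cr3 = user_cr3 & ~PTI_SWITCH_MASK;

		printf("user %#lx <-> kernel %#lx\n", user_cr3, kernel_cr3);
		return 0;
	}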

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 86 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 82 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index dbf7d61..60b28df 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -77,6 +77,8 @@
 #endif
 .endm
 
+#define PTI_SWITCH_MASK         (1 << PAGE_SHIFT)
+
 /*
  * User gs save/restore
  *
@@ -154,6 +156,33 @@
 
 #endif /* CONFIG_X86_32_LAZY_GS */
 
+/* Unconditionally switch to user cr3 */
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	movl	%cr3, \scratch_reg
+	orl	$PTI_SWITCH_MASK, \scratch_reg
+	movl	\scratch_reg, %cr3
+.Lend_\@:
+.endm
+
+/*
+ * Switch to kernel cr3 if not already loaded and return current cr3 in
+ * \scratch_reg
+ */
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	movl	%cr3, \scratch_reg
+	/* Test if we are already on kernel CR3 */
+	testl	$PTI_SWITCH_MASK, \scratch_reg
+	jz	.Lend_\@
+	andl	$(~PTI_SWITCH_MASK), \scratch_reg
+	movl	\scratch_reg, %cr3
+	/* Return original CR3 in \scratch_reg */
+	orl	$PTI_SWITCH_MASK, \scratch_reg
+.Lend_\@:
+.endm
+
 .macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
 	cld
 	PUSH_GS
@@ -283,7 +312,6 @@
 #endif /* CONFIG_X86_ESPFIX32 */
 .endm
 
-
 /*
  * Called with pt_regs fully populated and kernel segments loaded,
  * so we can access PER_CPU and use the integer registers.
@@ -296,11 +324,19 @@
  */
 
 #define CS_FROM_ENTRY_STACK	(1 << 31)
+#define CS_FROM_USER_CR3	(1 << 30)
 
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
 
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+
+	/*
+	 * %eax now contains the entry cr3 and we carry it forward in
+	 * that register for the time this macro runs
+	 */
+
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %ecx
 	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
@@ -370,7 +406,8 @@
 	 * but switch back to the entry-stack again when we approach
 	 * iret and return to the interrupted code-path. This usually
 	 * happens when we hit an exception while restoring user-space
-	 * segment registers on the way back to user-space.
+	 * segment registers on the way back to user-space or when the
+	 * sysenter handler runs with eflags.tf set.
 	 *
 	 * When we switch to the task-stack here, we can't trust the
 	 * contents of the entry-stack anymore, as the exception handler
@@ -387,6 +424,7 @@
 	 *
 	 * %esi: Entry-Stack pointer (same as %esp)
 	 * %edi: Top of the task stack
+	 * %eax: CR3 on kernel entry
 	 */
 
 	/* Calculate number of bytes on the entry stack in %ecx */
@@ -403,6 +441,14 @@
 	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
 
 	/*
+	 * Test the cr3 used to enter the kernel and add a marker
+	 * so that we can switch back to it before iret.
+	 */
+	testl	$PTI_SWITCH_MASK, %eax
+	jz	.Lcopy_pt_regs_\@
+	orl	$CS_FROM_USER_CR3, PT_CS(%esp)
+
+	/*
 	 * %esi and %edi are unchanged, %ecx contains the number of
 	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
 	 * the stack-frame on task-stack and copy everything over
@@ -468,7 +514,7 @@
 
 /*
  * This macro handles the case when we return to kernel-mode on the iret
- * path and have to switch back to the entry stack.
+ * path and have to switch back to the entry stack and/or user-cr3
  *
  * See the comments below the .Lentry_from_kernel_\@ label in the
  * SWITCH_TO_KERNEL_STACK macro for more details.
@@ -514,6 +560,18 @@
 	/* Safe to switch to entry-stack now */
 	movl	%ebx, %esp
 
+	/*
+	 * We came from entry-stack and need to check if we also need to
+	 * switch back to user cr3.
+	 */
+	testl	$CS_FROM_USER_CR3, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_USER_CR3), PT_CS(%esp)
+
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
 .Lend_\@:
 .endm
 /*
@@ -707,7 +765,20 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
+	/*
+	 * On entry-stack with all userspace-regs live - save and
+	 * restore eflags and %eax to use it as scratch-reg for the cr3
+	 * switch.
+	 */
+	pushfl
+	pushl	%eax
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+	popl	%eax
+	popfl
+
+	/* Stack empty again, switch to task stack */
 	movl	TSS_entry2task_stack(%esp), %esp
+
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
@@ -786,6 +857,9 @@ ENTRY(entry_SYSENTER_32)
 	/* Switch to entry stack */
 	movl	%eax, %esp
 
+	/* Now ready to switch the cr3 */
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
 	/*
 	 * Restore all flags except IF. (We restore IF separately because
 	 * STI gives a one-instruction window in which we won't be interrupted,
@@ -866,7 +940,11 @@ restore_all:
 .Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
-	RESTORE_REGS 4				# skip orig_eax/error_code
+	/* Switch back to user CR3 */
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
+	/* Restore user state */
+	RESTORE_REGS pop=4			# skip orig_eax/error_code
 .Lirq_return:
 	/*
 	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
-- 
2.7.4


* [PATCH 13/39] x86/entry/32: Add PTI cr3 switches to NMI handler code
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (11 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 12/39] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:25   ` [tip:x86/pti] x86/entry/32: Add PTI CR3 " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 14/39] x86/entry: Rename update_sp0 to update_task_stack Joerg Roedel
                   ` (27 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The NMI handler is special, as it needs to leave with the
same cr3 it was entered with. This is necessary because the
NMI can interrupt kernel code that is running with the
user-cr3 already loaded.
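
In C-like pseudocode, the invariant the new SAVE_ALL_NMI /
RESTORE_ALL_NMI pair maintains looks like this (read_cr3() and
write_cr3() stand in for the real accessors; purely illustrative):

	unsigned long saved_cr3;

	saved_cr3 = read_cr3();			 /* SAVE_ALL_NMI: user or kernel cr3 */
	write_cr3(saved_cr3 & ~PTI_SWITCH_MASK); /* run the handler on kernel cr3 */
	do_nmi();
	if (saved_cr3 & PTI_SWITCH_MASK)	 /* RESTORE_ALL_NMI */
		write_cr3(saved_cr3);		 /* go back to user cr3 verbatim */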

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 60b28df..b1541c7 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -210,8 +210,19 @@
 
 .endm
 
-.macro SAVE_ALL_NMI
+.macro SAVE_ALL_NMI cr3_reg:req
 	SAVE_ALL
+
+	/*
+	 * Now switch the CR3 when PTI is enabled.
+	 *
+	 * We can enter with either user or kernel cr3, the code will
+	 * store the old cr3 in \cr3_reg and switches to the kernel cr3
+	 * if necessary.
+	 */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=\cr3_reg
+
+.Lend_\@:
 .endm
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
@@ -259,7 +270,23 @@
 	POP_GS_EX
 .endm
 
-.macro RESTORE_ALL_NMI pop=0
+.macro RESTORE_ALL_NMI cr3_reg:req pop=0
+	/*
+	 * Now switch the CR3 when PTI is enabled.
+	 *
+	 * We enter with kernel cr3 and switch the cr3 to the value
+	 * stored on \cr3_reg, which is either a user or a kernel cr3.
+	 */
+	ALTERNATIVE "jmp .Lswitched_\@", "", X86_FEATURE_PTI
+
+	testl	$PTI_SWITCH_MASK, \cr3_reg
+	jz	.Lswitched_\@
+
+	/* User cr3 in \cr3_reg - write it to hardware cr3 */
+	movl	\cr3_reg, %cr3
+
+.Lswitched_\@:
+
 	RESTORE_REGS pop=\pop
 .endm
 
@@ -1331,7 +1358,7 @@ ENTRY(nmi)
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
-	SAVE_ALL_NMI
+	SAVE_ALL_NMI cr3_reg=%edi
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
@@ -1359,7 +1386,7 @@ ENTRY(nmi)
 
 .Lnmi_return:
 	CHECK_AND_APPLY_ESPFIX
-	RESTORE_ALL_NMI pop=4
+	RESTORE_ALL_NMI cr3_reg=%edi pop=4
 	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1375,12 +1402,12 @@ ENTRY(nmi)
 	pushl	16(%esp)
 	.endr
 	pushl	%eax
-	SAVE_ALL_NMI
+	SAVE_ALL_NMI cr3_reg=%edi
 	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
-	RESTORE_ALL_NMI
+	RESTORE_ALL_NMI cr3_reg=%edi
 	lss	12+4(%esp), %esp		# back to espfix stack
 	jmp	.Lirq_return
 #endif
-- 
2.7.4


* [PATCH 14/39] x86/entry: Rename update_sp0 to update_task_stack
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (12 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 13/39] x86/entry/32: Add PTI cr3 switches to NMI handler code Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:25   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 15/39] x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl Joerg Roedel
                   ` (26 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The function does not update sp0 anymore but makes the
task-stack visible to the entry code, either by writing it
to sp1 or by doing a hypercall. Rename the function to get
rid of the misleading name.
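
A rough sketch of what the renamed function amounts to once this
series is complete (simplified and illustrative, not the literal
kernel code; the TSS.sp1 update on x86-32 is added in a later
patch):

	static inline void update_task_stack(struct task_struct *task)
	{
	#ifdef CONFIG_X86_32
		/* The entry code finds the task-stack via TSS.sp1 */
		this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
	#else
		/* On Xen PV, updating the stack needs a hypercall */
		if (static_cpu_has(X86_FEATURE_XENPV))
			load_sp0(task_top_of_stack(task));
	#endif
	}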

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/switch_to.h | 2 +-
 arch/x86/kernel/process_32.c     | 2 +-
 arch/x86/kernel/process_64.c     | 2 +-
 arch/x86/kernel/vm86_32.c        | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 8bc2f70..36bd243 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -87,7 +87,7 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 #endif
 
 /* This is used when switching tasks or entering/exiting vm86 mode. */
-static inline void update_sp0(struct task_struct *task)
+static inline void update_task_stack(struct task_struct *task)
 {
 	/* sp0 always points to the entry trampoline stack, which is constant: */
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0ae659d..2924fd4 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -285,7 +285,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * current_thread_info().  Refresh the SYSENTER configuration in
 	 * case prev or next is vm86.
 	 */
-	update_sp0(next_p);
+	update_task_stack(next_p);
 	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445..476e3dd 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -478,7 +478,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
 	/* Reload sp0. */
-	update_sp0(next_p);
+	update_task_stack(next_p);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 9d0b5af..1c03e4a 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -149,7 +149,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	update_sp0(tsk);
+	update_task_stack(tsk);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	preempt_enable();
@@ -374,7 +374,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	update_sp0(tsk);
+	update_task_stack(tsk);
 	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)
-- 
2.7.4


* [PATCH 15/39] x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (13 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 14/39] x86/entry: Rename update_sp0 to update_task_stack Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:26   ` [tip:x86/pti] x86/pgtable: Rename pti_set_user_pgd() to pti_set_user_pgtbl() tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 16/39] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled Joerg Roedel
                   ` (25 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

With the way page-table folding is implemented on 32 bit, we
are not only setting PGDs with this function, but also PUDs
and even PMDs. Give the function a more generic name to
reflect that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable_64.h | 12 ++++++------
 arch/x86/mm/pti.c                 |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 3c5385f..9406c4f 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -196,21 +196,21 @@ static inline bool pgdp_maps_userspace(void *__ptr)
 }
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd);
 
 /*
  * Take a PGD location (pgdp) and a pgd value that needs to be set there.
  * Populates the user and returns the resulting PGD that must be set in
  * the kernel copy of the page tables.
  */
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
 {
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return pgd;
-	return __pti_set_user_pgd(pgdp, pgd);
+	return __pti_set_user_pgtbl(pgdp, pgd);
 }
 #else
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
 {
 	return pgd;
 }
@@ -226,7 +226,7 @@ static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 	}
 
 	pgd = native_make_pgd(native_p4d_val(p4d));
-	pgd = pti_set_user_pgd((pgd_t *)p4dp, pgd);
+	pgd = pti_set_user_pgtbl((pgd_t *)p4dp, pgd);
 	*p4dp = native_make_p4d(native_pgd_val(pgd));
 }
 
@@ -237,7 +237,7 @@ static inline void native_p4d_clear(p4d_t *p4d)
 
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
-	*pgdp = pti_set_user_pgd(pgdp, pgd);
+	*pgdp = pti_set_user_pgtbl(pgdp, pgd);
 }
 
 static inline void native_pgd_clear(pgd_t *pgd)
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 4d418e7..f512222 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -117,7 +117,7 @@ void __init pti_check_boottime_disable(void)
 	setup_force_cpu_cap(X86_FEATURE_PTI);
 }
 
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
 {
 	/*
 	 * Changes to the high (kernel) portion of the kernelmode page
-- 
2.7.4


* [PATCH 16/39] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (14 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 15/39] x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:26   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 17/39] x86/pgtable/32: Allocate 8k page-tables " Joerg Roedel
                   ` (24 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

With PTI we need to map the per-process LDT into the kernel
address-space for each process, so we need separate kernel
PMDs per PGD.
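
Spelled out, the effect of the changed define (illustrative
comment, not patch code):

	/*
	 * With PTI enabled, SHARED_KERNEL_PMD evaluates to 0, so the
	 * pgd allocator takes the !SHARED_KERNEL_PMD path and every
	 * process gets its own kernel PMD pages instead of sharing
	 * the swapper ones. That is what makes a per-process LDT
	 * mapping in the kernel address range possible.
	 */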

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable-3level_types.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 6a59a6d..78038e0 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -21,9 +21,10 @@ typedef union {
 #endif	/* !__ASSEMBLY__ */
 
 #ifdef CONFIG_PARAVIRT
-#define SHARED_KERNEL_PMD	(pv_info.shared_kernel_pmd)
+#define SHARED_KERNEL_PMD	((!static_cpu_has(X86_FEATURE_PTI) &&	\
+				 (pv_info.shared_kernel_pmd)))
 #else
-#define SHARED_KERNEL_PMD	1
+#define SHARED_KERNEL_PMD	(!static_cpu_has(X86_FEATURE_PTI))
 #endif
 
 /*
-- 
2.7.4


* [PATCH 17/39] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (15 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 16/39] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:27   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 18/39] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Joerg Roedel
                   ` (23 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Allocate a kernel and a user page-table root when PTI is
enabled. Also allocate a full page per root for PAE because
otherwise the bit to flip in cr3 to switch between them
would be non-constant, which creates a lot of hassle.
Leave that optimization for later.
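
To illustrate the constant-bit argument (hypothetical addresses,
sketch only):

	/*
	 * One order-1 (8k, 8k-aligned) allocation holds both roots,
	 * e.g. the kernel pgd at 0x...4000 and the user pgd at
	 * 0x...5000. The two addresses then always differ in bit 12
	 * only, so the entry code can flip cr3 with a constant mask.
	 */
	unsigned long kernel_pgd = __get_free_pages(GFP_KERNEL, 1);
	unsigned long user_pgd   = kernel_pgd + PAGE_SIZE; /* == kernel_pgd | BIT(12) */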

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/head_32.S | 20 +++++++++++++++-----
 arch/x86/mm/pgtable.c     |  5 +++--
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index abe6df1..30f9cb2 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -512,11 +512,18 @@ ENTRY(initial_code)
 ENTRY(setup_once_ref)
 	.long setup_once
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#define	PGD_ALIGN	(2 * PAGE_SIZE)
+#define PTI_USER_PGD_FILL	1024
+#else
+#define	PGD_ALIGN	(PAGE_SIZE)
+#define PTI_USER_PGD_FILL	0
+#endif
 /*
  * BSS section
  */
 __PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
+	.align PGD_ALIGN
 #ifdef CONFIG_X86_PAE
 .globl initial_pg_pmd
 initial_pg_pmd:
@@ -526,14 +533,17 @@ initial_pg_pmd:
 initial_page_table:
 	.fill 1024,4,0
 #endif
+	.align PGD_ALIGN
 initial_pg_fixmap:
 	.fill 1024,4,0
-.globl empty_zero_page
-empty_zero_page:
-	.fill 4096,1,0
 .globl swapper_pg_dir
+	.align PGD_ALIGN
 swapper_pg_dir:
 	.fill 1024,4,0
+	.fill PTI_USER_PGD_FILL,4,0
+.globl empty_zero_page
+empty_zero_page:
+	.fill 4096,1,0
 EXPORT_SYMBOL(empty_zero_page)
 
 /*
@@ -542,7 +552,7 @@ EXPORT_SYMBOL(empty_zero_page)
 #ifdef CONFIG_X86_PAE
 __PAGE_ALIGNED_DATA
 	/* Page-aligned for the benefit of paravirt? */
-	.align PAGE_SIZE
+	.align PGD_ALIGN
 ENTRY(initial_page_table)
 	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
 # if KPMDS == 3
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 47b5951..db6fb77 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -343,7 +343,8 @@ static inline pgd_t *_pgd_alloc(void)
 	 * We allocate one page for pgd.
 	 */
 	if (!SHARED_KERNEL_PMD)
-		return (pgd_t *)__get_free_page(PGALLOC_GFP);
+		return (pgd_t *)__get_free_pages(PGALLOC_GFP,
+						 PGD_ALLOCATION_ORDER);
 
 	/*
 	 * Now PAE kernel is not running as a Xen domain. We can allocate
@@ -355,7 +356,7 @@ static inline pgd_t *_pgd_alloc(void)
 static inline void _pgd_free(pgd_t *pgd)
 {
 	if (!SHARED_KERNEL_PMD)
-		free_page((unsigned long)pgd);
+		free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
 	else
 		kmem_cache_free(pgd_cache, pgd);
 }
-- 
2.7.4


* [PATCH 18/39] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (16 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 17/39] x86/pgtable/32: Allocate 8k page-tables " Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:27   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 19/39] x86/pgtable: Move pti_set_user_pgtbl() " Joerg Roedel
                   ` (22 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Make them available on 32 bit and make clone_pgd_range() happy.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable.h    | 49 +++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/pgtable_64.h | 49 ---------------------------------------
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5715647..eb47432 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1155,6 +1155,55 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
+ * (8k-aligned and 8k in size).  The kernel one is at the beginning 4k and
+ * the user one is in the last 4k.  To switch between them, you
+ * just need to flip the 12th bit in their addresses.
+ */
+#define PTI_PGTABLE_SWITCH_BIT	PAGE_SHIFT
+
+/*
+ * This generates better code than the inline assembly in
+ * __set_bit().
+ */
+static inline void *ptr_set_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr |= BIT(bit);
+	return (void *)__ptr;
+}
+static inline void *ptr_clear_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr &= ~BIT(bit);
+	return (void *)__ptr;
+}
+
+static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+{
+	return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
+{
+	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
+{
+	return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
+{
+	return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
 /*
  * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
  *
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 9406c4f..4adba19 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -132,55 +132,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 #endif
 }
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-/*
- * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
- * (8k-aligned and 8k in size).  The kernel one is at the beginning 4k and
- * the user one is in the last 4k.  To switch between them, you
- * just need to flip the 12th bit in their addresses.
- */
-#define PTI_PGTABLE_SWITCH_BIT	PAGE_SHIFT
-
-/*
- * This generates better code than the inline assembly in
- * __set_bit().
- */
-static inline void *ptr_set_bit(void *ptr, int bit)
-{
-	unsigned long __ptr = (unsigned long)ptr;
-
-	__ptr |= BIT(bit);
-	return (void *)__ptr;
-}
-static inline void *ptr_clear_bit(void *ptr, int bit)
-{
-	unsigned long __ptr = (unsigned long)ptr;
-
-	__ptr &= ~BIT(bit);
-	return (void *)__ptr;
-}
-
-static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
-{
-	return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
-{
-	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
-{
-	return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
-{
-	return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-#endif /* CONFIG_PAGE_TABLE_ISOLATION */
-
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.
-- 
2.7.4


* [PATCH 19/39] x86/pgtable: Move pti_set_user_pgtbl() to pgtable.h
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (17 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 18/39] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:28   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 20/39] x86/pgtable: Move two more functions from pgtable_64.h " Joerg Roedel
                   ` (21 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

There it is also usable from 32 bit code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index eb47432..cc117161 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -640,8 +640,31 @@ static inline int is_new_memtype_allowed(u64 paddr, unsigned long size,
 
 pmd_t *populate_extra_pmd(unsigned long vaddr);
 pte_t *populate_extra_pte(unsigned long vaddr);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
+{
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return pgd;
+	return __pti_set_user_pgtbl(pgdp, pgd);
+}
+#else   /* CONFIG_PAGE_TABLE_ISOLATION */
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
+{
+	return pgd;
+}
+#endif  /* CONFIG_PAGE_TABLE_ISOLATION */
+
 #endif	/* __ASSEMBLY__ */
 
+
 #ifdef CONFIG_X86_32
 # include <asm/pgtable_32.h>
 #else
-- 
2.7.4


* [PATCH 20/39] x86/pgtable: Move two more functions from pgtable_64.h to pgtable.h
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (18 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 19/39] x86/pgtable: Move pti_set_user_pgtbl() " Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:28   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 21/39] x86/mm/pae: Populate valid user PGD entries Joerg Roedel
                   ` (20 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

These two functions are required for PTI on 32 bit:

	* pgdp_maps_userspace()
	* pgd_large()

Also re-implement pgdp_maps_userspace() so that it will work
on 64 and 32 bit kernels.
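
As a sanity check, the values PGD_KERNEL_START takes in common
configurations (illustrative arithmetic):

	/*
	 * 2-level, VMSPLIT_3G: 0xC0000000 >> 22 = 768  (of 1024 entries)
	 * 3-level (PAE):       0xC0000000 >> 30 = 3    (of    4 entries)
	 * 4-level x86-64:      (4096 / 2) / 8   = 256  (of  512 entries)
	 */

pgdp_maps_userspace() then only has to compare an entry's offset
within the page against this cut-off.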

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable-2level_types.h |  3 +++
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 arch/x86/include/asm/pgtable.h              | 15 ++++++++++++
 arch/x86/include/asm/pgtable_32.h           |  2 --
 arch/x86/include/asm/pgtable_64.h           | 36 -----------------------------
 arch/x86/include/asm/pgtable_64_types.h     |  2 ++
 6 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level_types.h b/arch/x86/include/asm/pgtable-2level_types.h
index f982ef8..6deb6cd 100644
--- a/arch/x86/include/asm/pgtable-2level_types.h
+++ b/arch/x86/include/asm/pgtable-2level_types.h
@@ -35,4 +35,7 @@ typedef union {
 
 #define PTRS_PER_PTE	1024
 
+/* This covers all VMSPLIT_* and VMSPLIT_*_OPT variants */
+#define PGD_KERNEL_START	(CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
+
 #endif /* _ASM_X86_PGTABLE_2LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 78038e0..858358a 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -46,5 +46,6 @@ typedef union {
 #define PTRS_PER_PTE	512
 
 #define MAX_POSSIBLE_PHYSMEM_BITS	36
+#define PGD_KERNEL_START	(CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
 
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index cc117161..e39088cb 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1177,6 +1177,21 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+/*
+ * Page table pages are page-aligned.  The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
+ */
+static inline bool pgdp_maps_userspace(void *__ptr)
+{
+	unsigned long ptr = (unsigned long)__ptr;
+
+	return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < PGD_KERNEL_START);
+}
+
+static inline int pgd_large(pgd_t pgd) { return 0; }
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 /*
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 88a056b..b3ec519 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -34,8 +34,6 @@ static inline void check_pgt_cache(void) { }
 void paging_init(void);
 void sync_initial_page_table(void);
 
-static inline int pgd_large(pgd_t pgd) { return 0; }
-
 /*
  * Define this if things work differently on an i386 and an i486:
  * it will (on an i486) warn about kernel memory accesses that are
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 4adba19..acb6970 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -132,41 +132,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 #endif
 }
 
-/*
- * Page table pages are page-aligned.  The lower half of the top
- * level is used for userspace and the top half for the kernel.
- *
- * Returns true for parts of the PGD that map userspace and
- * false for the parts that map the kernel.
- */
-static inline bool pgdp_maps_userspace(void *__ptr)
-{
-	unsigned long ptr = (unsigned long)__ptr;
-
-	return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
-}
-
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd);
-
-/*
- * Take a PGD location (pgdp) and a pgd value that needs to be set there.
- * Populates the user and returns the resulting PGD that must be set in
- * the kernel copy of the page tables.
- */
-static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
-{
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		return pgd;
-	return __pti_set_user_pgtbl(pgdp, pgd);
-}
-#else
-static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
-{
-	return pgd;
-}
-#endif
-
 static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
 	pgd_t pgd;
@@ -206,7 +171,6 @@ extern void sync_global_pgds(unsigned long start, unsigned long end);
 /*
  * Level 4 access.
  */
-static inline int pgd_large(pgd_t pgd) { return 0; }
 #define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE)
 
 /* PUD - Level3 access */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 054765a..066e0ab 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -153,4 +153,6 @@ extern unsigned int ptrs_per_p4d;
 
 #define EARLY_DYNAMIC_PAGE_TABLES	64
 
+#define PGD_KERNEL_START	((PAGE_SIZE / 2) / sizeof(pgd_t))
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
-- 
2.7.4


* [PATCH 21/39] x86/mm/pae: Populate valid user PGD entries
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (19 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 20/39] x86/pgtable: Move two more functions from pgtable_64.h " Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:29   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:40 ` [PATCH 22/39] x86/mm/pae: Populate the user page-table with user pgd's Joerg Roedel
                   ` (19 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Generic page-table code populates all non-leaf entries with
_KERNPG_TABLE bits set. This is fine for all paging modes
except PAE.

In PAE mode only a subset of the bits is allowed to be set.
Make sure we only set allowed bits by masking out the
reserved bits.
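
A worked example with illustrative values: generic code builds
non-leaf entries with _KERNPG_TABLE (P|RW|A|D = 0x63), but of
those four bits only P is architecturally allowed in a PAE PGD
entry (PDPTE):

	pgdval_t pa  = 0x1000000;		/* hypothetical table address  */
	pgdval_t val = pa | _KERNPG_TABLE;	/* pa | 0x063: P, RW, A, D set */

	val &= PGD_ALLOWED_BITS;		/* pa | 0x001: only P survives,
						   RW/A/D are reserved MBZ here */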

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable_types.h | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 99fff85..b64acb0 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -50,6 +50,7 @@
 #define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
 #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
+#define _PAGE_SOFTW3	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW3)
 #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
 #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
 #define _PAGE_SPECIAL	(_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
@@ -266,14 +267,37 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
 
 typedef struct { pgdval_t pgd; } pgd_t;
 
+#ifdef CONFIG_X86_PAE
+
+/*
+ * PHYSICAL_PAGE_MASK might be non-constant when SME is compiled in, so we can't
+ * use it here.
+ */
+
+#define PGD_PAE_PAGE_MASK	((signed long)PAGE_MASK)
+#define PGD_PAE_PHYS_MASK	(((1ULL << __PHYSICAL_MASK_SHIFT)-1) & PGD_PAE_PAGE_MASK)
+
+/*
+ * PAE allows Base Address, P, PWT, PCD and AVL bits to be set in PGD entries.
+ * All other bits are Reserved MBZ
+ */
+#define PGD_ALLOWED_BITS	(PGD_PAE_PHYS_MASK | _PAGE_PRESENT | \
+				 _PAGE_PWT | _PAGE_PCD | \
+				 _PAGE_SOFTW1 | _PAGE_SOFTW2 | _PAGE_SOFTW3)
+
+#else
+/* No need to mask any bits for !PAE */
+#define PGD_ALLOWED_BITS	(~0ULL)
+#endif
+
 static inline pgd_t native_make_pgd(pgdval_t val)
 {
-	return (pgd_t) { val };
+	return (pgd_t) { val & PGD_ALLOWED_BITS };
 }
 
 static inline pgdval_t native_pgd_val(pgd_t pgd)
 {
-	return pgd.pgd;
+	return pgd.pgd & PGD_ALLOWED_BITS;
 }
 
 static inline pgdval_t pgd_flags(pgd_t pgd)
-- 
2.7.4


* [PATCH 22/39] x86/mm/pae: Populate the user page-table with user pgd's
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (20 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 21/39] x86/mm/pae: Populate valid user PGD entries Joerg Roedel
@ 2018-07-18  9:40 ` Joerg Roedel
  2018-07-19 23:30   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 23/39] x86/mm/legacy: " Joerg Roedel
                   ` (18 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:40 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

When we populate a PGD entry, make sure we populate it in
the user page-table too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable-3level.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index f24df59..f2ca313 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -98,6 +98,9 @@ static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline void native_set_pud(pud_t *pudp, pud_t pud)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pud.p4d.pgd = pti_set_user_pgtbl(&pudp->p4d.pgd, pud.p4d.pgd);
+#endif
 	set_64bit((unsigned long long *)(pudp), native_pud_val(pud));
 }
 
@@ -229,6 +232,10 @@ static inline pud_t native_pudp_get_and_clear(pud_t *pudp)
 {
 	union split_pud res, *orig = (union split_pud *)pudp;
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgtbl(&pudp->p4d.pgd, __pgd(0));
+#endif
+
 	/* xchg acts as a barrier before setting of the high bits */
 	res.pud_low = xchg(&orig->pud_low, 0);
 	res.pud_high = orig->pud_high;
-- 
2.7.4


* [PATCH 23/39] x86/mm/legacy: Populate the user page-table with user pgd's
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (21 preceding siblings ...)
  2018-07-18  9:40 ` [PATCH 22/39] x86/mm/pae: Populate the user page-table with user pgd's Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:30   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 24/39] x86/mm/pti: Add an overflow check to pti_clone_pmds() Joerg Roedel
                   ` (17 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Also populate the user-space pgd's in the user page-table.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable-2level.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 685ffe8..c399ea5 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -19,6 +19,9 @@ static inline void native_set_pte(pte_t *ptep , pte_t pte)
 
 static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pmd.pud.p4d.pgd = pti_set_user_pgtbl(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd);
+#endif
 	*pmdp = pmd;
 }
 
@@ -58,6 +61,9 @@ static inline pte_t native_ptep_get_and_clear(pte_t *xp)
 #ifdef CONFIG_SMP
 static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgtbl(&xp->pud.p4d.pgd, __pgd(0));
+#endif
 	return __pmd(xchg((pmdval_t *)xp, 0));
 }
 #else
@@ -67,6 +73,9 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 #ifdef CONFIG_SMP
 static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgtbl(&xp->p4d.pgd, __pgd(0));
+#endif
 	return __pud(xchg((pudval_t *)xp, 0));
 }
 #else
-- 
2.7.4


* [PATCH 24/39] x86/mm/pti: Add an overflow check to pti_clone_pmds()
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (22 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 23/39] x86/mm/legacy: " Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:31   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 25/39] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Joerg Roedel
                   ` (16 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The addr counter will overflow if we clone the last PMD of
the address space, resulting in an endless loop.

Check for that and bail out of the loop when it happens.
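
A worked example of the wrap-around on 32 bit (illustrative
numbers, PMD_SIZE == 2M with PAE):

	/*
	 * Cloning up to the last PMD of the address space:
	 *
	 *   addr = 0xffe00000  -> clone it, then addr += PMD_SIZE
	 *   addr = 0x00000000  -> wrapped, but still < end, so the
	 *                         loop would start over forever
	 *
	 * A wrapped addr is necessarily below start, so the new
	 * "if (addr < start) break;" terminates the loop instead.
	 */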

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/pti.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index f512222..dc02fd4 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -297,6 +297,10 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 		p4d_t *p4d;
 		pud_t *pud;
 
+		/* Overflow check */
+		if (addr < start)
+			break;
+
 		pgd = pgd_offset_k(addr);
 		if (WARN_ON(pgd_none(*pgd)))
 			return;
-- 
2.7.4


* [PATCH 25/39] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (23 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 24/39] x86/mm/pti: Add an overflow check to pti_clone_pmds() Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:31   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 26/39] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level " Joerg Roedel
                   ` (15 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Move it out of the X86_64-specific processor defines so
that it is visible for 32 bit too.

Reviewed-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/processor-flags.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 625a52a..02c2cbd 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -39,10 +39,6 @@
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-# define X86_CR3_PTI_PCID_USER_BIT	11
-#endif
-
 #else
 /*
  * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
@@ -53,4 +49,8 @@
 #define CR3_NOFLUSH	0
 #endif
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_PCID_USER_BIT	11
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */
-- 
2.7.4


* [PATCH 26/39] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (24 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 25/39] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:32   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 27/39] x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit Joerg Roedel
                   ` (14 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Cloning on the P4D level would clone the complete kernel
address space into the user-space page-tables for PAE
kernels. Cloning on PMD level is fine for PAE and legacy
paging.
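
To make the geometry concrete (illustrative):

	/*
	 * PAE, VMSPLIT_3G: the top level (PDPT) has only 4 entries
	 * of 1GB each, and the whole kernel lives in entry 3.
	 * Cloning on that level would alias the complete kernel
	 * address space into the user page-table. Cloning individual
	 * 2M PMD entries maps only the CPU_ENTRY_AREA range instead.
	 */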

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/pti.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index dc02fd4..2eadab0 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -348,6 +348,7 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 	}
 }
 
+#ifdef CONFIG_X86_64
 /*
  * Clone a single p4d (i.e. a top-level entry on 4-level systems and a
  * next-level entry on 5-level systems.
@@ -371,6 +372,25 @@ static void __init pti_clone_user_shared(void)
 	pti_clone_p4d(CPU_ENTRY_AREA_BASE);
 }
 
+#else /* CONFIG_X86_64 */
+
+/*
+ * On 32 bit PAE systems with 1GB of Kernel address space there is only
+ * one pgd/p4d for the whole kernel. Cloning that would map the whole
+ * address space into the user page-tables, making PTI useless. So clone
+ * the page-table on the PMD level to prevent that.
+ */
+static void __init pti_clone_user_shared(void)
+{
+	unsigned long start, end;
+
+	start = CPU_ENTRY_AREA_BASE;
+	end   = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES);
+
+	pti_clone_pmds(start, end, 0);
+}
+#endif /* CONFIG_X86_64 */
+
 /*
  * Clone the ESPFIX P4D into the user space visible page table
  */
-- 
2.7.4


* [PATCH 27/39] x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (25 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 26/39] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level " Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:32   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 28/39] x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text() Joerg Roedel
                   ` (13 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The pti_clone_kernel_text() function references
__end_rodata_hpage_align, which is only present on x86-64.
This makes sense as the end of the rodata section is not
huge-page aligned on 32 bit.

Nevertheless, the function needs a symbol that points at
the right address on both 32 and 64 bit. Introduce
__end_rodata_aligned for that purpose and use it in
pti_clone_kernel_text().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/sections.h |  1 +
 arch/x86/kernel/vmlinux.lds.S   | 17 ++++++++++-------
 arch/x86/mm/pti.c               |  2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 5c019d2..4a911a3 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -7,6 +7,7 @@
 
 extern char __brk_base[], __brk_limit[];
 extern struct exception_table_entry __stop___ex_table[];
+extern char __end_rodata_aligned[];
 
 #if defined(CONFIG_X86_64)
 extern char __end_rodata_hpage_align[];
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 5e1458f..8bde0a4 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -55,19 +55,22 @@ jiffies_64 = jiffies;
  * so we can enable protection checks as well as retain 2MB large page
  * mappings for kernel text.
  */
-#define X64_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
+#define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
 
-#define X64_ALIGN_RODATA_END					\
+#define X86_ALIGN_RODATA_END					\
 		. = ALIGN(HPAGE_SIZE);				\
-		__end_rodata_hpage_align = .;
+		__end_rodata_hpage_align = .;			\
+		__end_rodata_aligned = .;
 
 #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
 #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
 
 #else
 
-#define X64_ALIGN_RODATA_BEGIN
-#define X64_ALIGN_RODATA_END
+#define X86_ALIGN_RODATA_BEGIN
+#define X86_ALIGN_RODATA_END					\
+		. = ALIGN(PAGE_SIZE);				\
+		__end_rodata_aligned = .;
 
 #define ALIGN_ENTRY_TEXT_BEGIN
 #define ALIGN_ENTRY_TEXT_END
@@ -141,9 +144,9 @@ SECTIONS
 
 	/* .text should occupy whole number of pages */
 	. = ALIGN(PAGE_SIZE);
-	X64_ALIGN_RODATA_BEGIN
+	X86_ALIGN_RODATA_BEGIN
 	RO_DATA(PAGE_SIZE)
-	X64_ALIGN_RODATA_END
+	X86_ALIGN_RODATA_END
 
 	/* Data */
 	.data : AT(ADDR(.data) - LOAD_OFFSET) {
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 2eadab0..4f6e933 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -470,7 +470,7 @@ void pti_clone_kernel_text(void)
 	 * clone the areas past rodata, they might contain secrets.
 	 */
 	unsigned long start = PFN_ALIGN(_text);
-	unsigned long end = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end = (unsigned long)__end_rodata_aligned;
 
 	if (!pti_kernel_image_global_ok())
 		return;
-- 
2.7.4


* [PATCH 28/39] x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text()
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (26 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 27/39] x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:33   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 29/39] x86/mm/pti: Introduce pti_finalize() Joerg Roedel
                   ` (12 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Mapping the kernel text area to user-space only makes sense
if it has the same permissions as in the kernel page-table.
If the permissions differ, this will cause a TLB reload
when using the kernel page-table, which is as good as not
mapping it at all.

On 64-bit kernels this patch makes no difference, as the
whole range cloned by pti_clone_kernel_text() is mapped RO
anyway. On 32 bit there are writeable mappings in the range,
so just keep the permissions as they are.
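
For context, a hedged summary of the helper's contract as used
here (see pti_clone_pmds() in pti.c):

	/*
	 * pti_clone_pmds(start, end, clear): each cloned PMD is
	 * installed as pmd_clear_flags(*pmd, clear) in the user
	 * page-table. With clear == _PAGE_RW the user copy of the
	 * text was forced read-only; with clear == 0 it keeps the
	 * kernel permissions bit-for-bit.
	 */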

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/pti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 4f6e933..fc77054 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -482,7 +482,7 @@ void pti_clone_kernel_text(void)
 	 * pti_set_kernel_image_nonglobal() did to clear the
 	 * global bit.
 	 */
-	pti_clone_pmds(start, end, _PAGE_RW);
+	pti_clone_pmds(start, end, 0);
 }
 
 /*
-- 
2.7.4


* [PATCH 29/39] x86/mm/pti: Introduce pti_finalize()
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (27 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 28/39] x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text() Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:33   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 30/39] x86/mm/pti: Clone entry-text again in pti_finalize() Joerg Roedel
                   ` (11 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Introduce a new function to finalize the kernel mappings in
the userspace page-table after all RO/NX protections have been
applied to the kernel mappings.

Also move the call to pti_clone_kernel_text() to that
function so that it will run on 32 bit kernels too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pti.h |  3 +--
 arch/x86/mm/init_64.c      |  6 ------
 arch/x86/mm/pti.c          | 14 +++++++++++++-
 include/linux/pti.h        |  1 +
 init/main.c                |  7 +++++++
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/pti.h b/arch/x86/include/asm/pti.h
index 38a17f1..5df09a0 100644
--- a/arch/x86/include/asm/pti.h
+++ b/arch/x86/include/asm/pti.h
@@ -6,10 +6,9 @@
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 extern void pti_init(void);
 extern void pti_check_boottime_disable(void);
-extern void pti_clone_kernel_text(void);
+extern void pti_finalize(void);
 #else
 static inline void pti_check_boottime_disable(void) { }
-static inline void pti_clone_kernel_text(void) { }
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a688617..9b19f9a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1291,12 +1291,6 @@ void mark_rodata_ro(void)
 			(unsigned long) __va(__pa_symbol(_sdata)));
 
 	debug_checkwx();
-
-	/*
-	 * Do this after all of the manipulation of the
-	 * kernel text page tables are complete.
-	 */
-	pti_clone_kernel_text();
 }
 
 int kern_addr_valid(unsigned long addr)
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index fc77054..1825f30 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -462,7 +462,7 @@ static inline bool pti_kernel_image_global_ok(void)
  * For some configurations, map all of kernel text into the user page
  * tables.  This reduces TLB misses, especially on non-PCID systems.
  */
-void pti_clone_kernel_text(void)
+static void pti_clone_kernel_text(void)
 {
 	/*
 	 * rodata is part of the kernel image and is normally
@@ -526,3 +526,15 @@ void __init pti_init(void)
 	pti_setup_espfix64();
 	pti_setup_vsyscall();
 }
+
+/*
+ * Finalize the kernel mappings in the userspace page-table.
+ */
+void pti_finalize(void)
+{
+	/*
+	 * Do this after all of the manipulation of the
+	 * kernel text page tables are complete.
+	 */
+	pti_clone_kernel_text();
+}
diff --git a/include/linux/pti.h b/include/linux/pti.h
index 0174883..1a941ef 100644
--- a/include/linux/pti.h
+++ b/include/linux/pti.h
@@ -6,6 +6,7 @@
 #include <asm/pti.h>
 #else
 static inline void pti_init(void) { }
+static inline void pti_finalize(void) { }
 #endif
 
 #endif
diff --git a/init/main.c b/init/main.c
index 3b4ada1..fcfef46 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1065,6 +1065,13 @@ static int __ref kernel_init(void *unused)
 	jump_label_invalidate_initmem();
 	free_initmem();
 	mark_readonly();
+
+	/*
+	 * Kernel mappings are now finalized - update the userspace page-table
+	 * to finalize PTI.
+	 */
+	pti_finalize();
+
 	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 30/39] x86/mm/pti: Clone entry-text again in pti_finalize()
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (28 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 29/39] x86/mm/pti: Introduce pti_finalize() Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:34   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 31/39] x86/mm/dump_pagetables: Define INIT_PGD Joerg Roedel
                   ` (10 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The mapping for the entry-text might have changed in the
kernel after it was cloned to the user page-table. Clone it
again to bring the user page-table back in sync with the
kernel mapping.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/pti.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 1825f30..b879ccd 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -404,7 +404,7 @@ static void __init pti_setup_espfix64(void)
 /*
  * Clone the populated PMDs of the entry and irqentry text and force it RO.
  */
-static void __init pti_clone_entry_text(void)
+static void pti_clone_entry_text(void)
 {
 	pti_clone_pmds((unsigned long) __entry_text_start,
 			(unsigned long) __irqentry_text_end,
@@ -528,13 +528,18 @@ void __init pti_init(void)
 }
 
 /*
- * Finalize the kernel mappings in the userspace page-table.
+ * Finalize the kernel mappings in the userspace page-table. Some of the
+ * mappings for the kernel image might have changed since pti_init()
+ * cloned them. This is because parts of the kernel image have been
+ * mapped RO and/or NX.  These changes need to be cloned again to the
+ * userspace page-table.
  */
 void pti_finalize(void)
 {
 	/*
-	 * Do this after all of the manipulation of the
-	 * kernel text page tables are complete.
+	 * We need to clone everything (again) that maps parts of the
+	 * kernel image.
 	 */
+	pti_clone_entry_text();
 	pti_clone_kernel_text();
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 31/39] x86/mm/dump_pagetables: Define INIT_PGD
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (29 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 30/39] x86/mm/pti: Clone entry-text again in pti_finalize() Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:34   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table Joerg Roedel
                   ` (9 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Define INIT_PGD to point to the correct initial page-table
for 32 and 64 bit and use it where needed. This fixes the
build on 32 bit with CONFIG_PAGE_TABLE_ISOLATION enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/dump_pagetables.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 2f3c919..e6fd0cd 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -111,6 +111,8 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
+#define INIT_PGD	((pgd_t *) &init_top_pgt)
+
 #else /* CONFIG_X86_64 */
 
 enum address_markers_idx {
@@ -139,6 +141,8 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
+#define INIT_PGD	(swapper_pg_dir)
+
 #endif /* !CONFIG_X86_64 */
 
 /* Multipliers for offsets within the PTEs */
@@ -496,11 +500,7 @@ static inline bool is_hypervisor_range(int idx)
 static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 				       bool checkwx, bool dmesg)
 {
-#ifdef CONFIG_X86_64
-	pgd_t *start = (pgd_t *) &init_top_pgt;
-#else
-	pgd_t *start = swapper_pg_dir;
-#endif
+	pgd_t *start = INIT_PGD;
 	pgprotval_t prot, eff;
 	int i;
 	struct pg_state st = {};
@@ -566,7 +566,7 @@ EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
 static void ptdump_walk_user_pgd_level_checkwx(void)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-	pgd_t *pgd = (pgd_t *) &init_top_pgt;
+	pgd_t *pgd = INIT_PGD;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (30 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 31/39] x86/mm/dump_pagetables: Define INIT_PGD Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:35   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-10-05 14:06   ` [PATCH 32/39] " Arnd Bergmann
  2018-07-18  9:41 ` [PATCH 33/39] x86/ldt: Reserve address-space range on 32 bit for the LDT Joerg Roedel
                   ` (8 subsequent siblings)
  40 siblings, 2 replies; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

When PTI is enabled, we need separate kernel PMDs in the user
page-table to map the per-process LDT for user-space.
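
The error-unwind this introduces in pgd_alloc() follows the usual
goto pattern of freeing in reverse allocation order. A simplified
user-space sketch of that ordering (all names are stand-ins, not
the real kernel API):

#include <stdlib.h>

struct mm_sketch { void *pgd, *pmds, *u_pmds; };

/* Allocate in order, unwind in reverse order on failure. */
static int pgd_alloc_sketch(struct mm_sketch *mm)
{
	mm->pgd = malloc(4096);
	if (!mm->pgd)
		goto out;

	mm->pmds = malloc(4096);	/* kernel PMDs */
	if (!mm->pmds)
		goto out_free_pgd;

	mm->u_pmds = malloc(4096);	/* user PMDs, PTI only */
	if (!mm->u_pmds)
		goto out_free_pmds;

	return 0;

out_free_pmds:
	free(mm->pmds);
out_free_pgd:
	free(mm->pgd);
out:
	return -1;
}

int main(void)
{
	struct mm_sketch mm;

	return pgd_alloc_sketch(&mm) ? 1 : 0;
}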

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/pgtable.c | 100 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 81 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index db6fb77..8e4e63d 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -182,6 +182,14 @@ static void pgd_dtor(pgd_t *pgd)
  */
 #define PREALLOCATED_PMDS	UNSHARED_PTRS_PER_PGD
 
+/*
+ * We allocate separate PMDs for the kernel part of the user page-table
+ * when PTI is enabled. We need them to map the per-process LDT into the
+ * user-space page-table.
+ */
+#define PREALLOCATED_USER_PMDS	 (static_cpu_has(X86_FEATURE_PTI) ? \
+					KERNEL_PGD_PTRS : 0)
+
 void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 {
 	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
@@ -202,14 +210,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 
 /* No need to prepopulate any pagetable entries in non-PAE modes. */
 #define PREALLOCATED_PMDS	0
-
+#define PREALLOCATED_USER_PMDS	 0
 #endif	/* CONFIG_X86_PAE */
 
-static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++)
+	for (i = 0; i < count; i++)
 		if (pmds[i]) {
 			pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
 			free_page((unsigned long)pmds[i]);
@@ -217,7 +225,7 @@ static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
 		}
 }
 
-static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 	bool failed = false;
@@ -226,7 +234,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	if (mm == &init_mm)
 		gfp &= ~__GFP_ACCOUNT;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
+	for (i = 0; i < count; i++) {
 		pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
 		if (!pmd)
 			failed = true;
@@ -241,7 +249,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	}
 
 	if (failed) {
-		free_pmds(mm, pmds);
+		free_pmds(mm, pmds, count);
 		return -ENOMEM;
 	}
 
@@ -254,23 +262,38 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
  * preallocate which never got a corresponding vma will need to be
  * freed manually.
  */
+static void mop_up_one_pmd(struct mm_struct *mm, pgd_t *pgdp)
+{
+	pgd_t pgd = *pgdp;
+
+	if (pgd_val(pgd) != 0) {
+		pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+
+		*pgdp = native_make_pgd(0);
+
+		paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
+		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
+	}
+}
+
 static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
-		pgd_t pgd = pgdp[i];
+	for (i = 0; i < PREALLOCATED_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i]);
 
-		if (pgd_val(pgd) != 0) {
-			pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 
-			pgdp[i] = native_make_pgd(0);
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
 
-			paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
-			pmd_free(mm, pmd);
-			mm_dec_nr_pmds(mm);
-		}
-	}
+	pgdp = kernel_to_user_pgdp(pgdp);
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i + KERNEL_PGD_BOUNDARY]);
+#endif
 }
 
 static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
@@ -296,6 +319,38 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
 	}
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+	pgd_t *s_pgd = kernel_to_user_pgdp(swapper_pg_dir);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	p4d_t *u_p4d;
+	pud_t *u_pud;
+	int i;
+
+	u_p4d = p4d_offset(u_pgd, 0);
+	u_pud = pud_offset(u_p4d, 0);
+
+	s_pgd += KERNEL_PGD_BOUNDARY;
+	u_pud += KERNEL_PGD_BOUNDARY;
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++, u_pud++, s_pgd++) {
+		pmd_t *pmd = pmds[i];
+
+		memcpy(pmd, (pmd_t *)pgd_page_vaddr(*s_pgd),
+		       sizeof(pmd_t) * PTRS_PER_PMD);
+
+		pud_populate(mm, u_pud, pmd);
+	}
+
+}
+#else
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+}
+#endif
 /*
  * Xen paravirt assumes pgd table should be in one page. 64 bit kernel also
  * assumes that pgd should be in one page.
@@ -376,6 +431,7 @@ static inline void _pgd_free(pgd_t *pgd)
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	pgd_t *pgd;
+	pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
 	pmd_t *pmds[PREALLOCATED_PMDS];
 
 	pgd = _pgd_alloc();
@@ -385,12 +441,15 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	mm->pgd = pgd;
 
-	if (preallocate_pmds(mm, pmds) != 0)
+	if (preallocate_pmds(mm, pmds, PREALLOCATED_PMDS) != 0)
 		goto out_free_pgd;
 
-	if (paravirt_pgd_alloc(mm) != 0)
+	if (preallocate_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS) != 0)
 		goto out_free_pmds;
 
+	if (paravirt_pgd_alloc(mm) != 0)
+		goto out_free_user_pmds;
+
 	/*
 	 * Make sure that pre-populating the pmds is atomic with
 	 * respect to anything walking the pgd_list, so that they
@@ -400,13 +459,16 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	pgd_ctor(mm, pgd);
 	pgd_prepopulate_pmd(mm, pgd, pmds);
+	pgd_prepopulate_user_pmd(mm, pgd, u_pmds);
 
 	spin_unlock(&pgd_lock);
 
 	return pgd;
 
+out_free_user_pmds:
+	free_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS);
 out_free_pmds:
-	free_pmds(mm, pmds);
+	free_pmds(mm, pmds, PREALLOCATED_PMDS);
 out_free_pgd:
 	_pgd_free(pgd);
 out:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 33/39] x86/ldt: Reserve address-space range on 32 bit for the LDT
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (31 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:35   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 34/39] x86/ldt: Define LDT_END_ADDR Joerg Roedel
                   ` (7 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Reserve 2MB/4MB of address-space for mapping the LDT to
user-space on 32 bit PTI kernels.
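
To illustrate the arithmetic the new macros use, here is a small
user-space sketch. PMD_SIZE is the PAE value (2 MiB) and the
CPU_ENTRY_AREA_BASE value is an assumed example, not the real
kernel constant:

#include <stdio.h>

#define PAGE_SIZE		4096UL
#define PMD_SIZE		(2UL << 20)	/* PAE: 2 MiB */
#define PMD_MASK		(~(PMD_SIZE - 1))
#define CPU_ENTRY_AREA_BASE	0xff400000UL	/* assumed example */

#define LDT_BASE_ADDR	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
#define PKMAP_BASE	((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)

int main(void)
{
	/* each area is carved out one PMD below the previous one */
	printf("LDT:   %#lx - %#lx\n", LDT_BASE_ADDR,
	       LDT_BASE_ADDR + PMD_SIZE);
	printf("PKMAP: %#lx\n", PKMAP_BASE);
	return 0;
}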

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable_32_types.h | 7 +++++--
 arch/x86/mm/dump_pagetables.c           | 9 +++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index d9a001a..7297810 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -50,13 +50,16 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
 	((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1))   \
 	 & PMD_MASK)
 
-#define PKMAP_BASE		\
+#define LDT_BASE_ADDR		\
 	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
 
+#define PKMAP_BASE		\
+	((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)
+
 #ifdef CONFIG_HIGHMEM
 # define VMALLOC_END	(PKMAP_BASE - 2 * PAGE_SIZE)
 #else
-# define VMALLOC_END	(CPU_ENTRY_AREA_BASE - 2 * PAGE_SIZE)
+# define VMALLOC_END	(LDT_BASE_ADDR - 2 * PAGE_SIZE)
 #endif
 
 #define MODULES_VADDR	VMALLOC_START
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index e6fd0cd..ccd92c4 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -123,6 +123,9 @@ enum address_markers_idx {
 #ifdef CONFIG_HIGHMEM
 	PKMAP_BASE_NR,
 #endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	LDT_NR,
+#endif
 	CPU_ENTRY_AREA_NR,
 	FIXADDR_START_NR,
 	END_OF_SPACE_NR,
@@ -136,6 +139,9 @@ static struct addr_marker address_markers[] = {
 #ifdef CONFIG_HIGHMEM
 	[PKMAP_BASE_NR]		= { 0UL,		"Persistent kmap() Area" },
 #endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	[LDT_NR]		= { 0UL,		"LDT remap" },
+#endif
 	[CPU_ENTRY_AREA_NR]	= { 0UL,		"CPU entry area" },
 	[FIXADDR_START_NR]	= { 0UL,		"Fixmap area" },
 	[END_OF_SPACE_NR]	= { -1,			NULL }
@@ -609,6 +615,9 @@ static int __init pt_dump_init(void)
 # endif
 	address_markers[FIXADDR_START_NR].start_address = FIXADDR_START;
 	address_markers[CPU_ENTRY_AREA_NR].start_address = CPU_ENTRY_AREA_BASE;
+# ifdef CONFIG_MODIFY_LDT_SYSCALL
+	address_markers[LDT_NR].start_address = LDT_BASE_ADDR;
+# endif
 #endif
 	return 0;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 34/39] x86/ldt: Define LDT_END_ADDR
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (32 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 33/39] x86/ldt: Reserve address-space range on 32 bit for the LDT Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:36   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 35/39] x86/ldt: Split out sanity check in map_ldt_struct() Joerg Roedel
                   ` (6 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

LDT_END_ADDR marks the end of the address-space range reserved
for the LDT. The LDT code will use it when unmapping the LDT
from user-space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable_32_types.h | 2 ++
 arch/x86/include/asm/pgtable_64_types.h | 1 +
 arch/x86/kernel/ldt.c                   | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index 7297810..b0bc0ff 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -53,6 +53,8 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
 #define LDT_BASE_ADDR		\
 	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
 
+#define LDT_END_ADDR		(LDT_BASE_ADDR + PMD_SIZE)
+
 #define PKMAP_BASE		\
 	((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 066e0ab..04edd2d 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -115,6 +115,7 @@ extern unsigned int ptrs_per_p4d;
 #define LDT_PGD_ENTRY_L5	-112UL
 #define LDT_PGD_ENTRY		(pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
 #define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
+#define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
 
 #define __VMALLOC_BASE_L4	0xffffc90000000000UL
 #define __VMALLOC_BASE_L5 	0xffa0000000000000UL
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c9b1402..e921b3d 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -206,7 +206,7 @@ static void free_ldt_pgtables(struct mm_struct *mm)
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 	struct mmu_gather tlb;
 	unsigned long start = LDT_BASE_ADDR;
-	unsigned long end = start + (1UL << PGDIR_SHIFT);
+	unsigned long end = LDT_END_ADDR;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 35/39] x86/ldt: Split out sanity check in map_ldt_struct()
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (33 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 34/39] x86/ldt: Define LDT_END_ADDR Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:36   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 36/39] x86/ldt: Enable LDT user-mapping for PAE Joerg Roedel
                   ` (5 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

This splits out the mapping sanity check and the actual
mapping of the LDT to user-space from the map_ldt_struct()
function, so that both are re-usable for PAE paging.
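
The invariant behind the split-out do_sanity_check() can be modeled
in a few lines of plain C (a simplified model, not kernel code;
pti_enabled stands in for static_cpu_has(X86_FEATURE_PTI)):

#include <assert.h>
#include <stdbool.h>

/* An existing LDT implies the top-level entries were already
 * populated; a first mapping implies they were not. */
static void sanity_check(bool have_ldt, bool pti_enabled,
			 bool had_kernel, bool had_user)
{
	if (have_ldt) {
		assert(had_kernel);
		if (pti_enabled)
			assert(had_user);
	} else {
		assert(!had_kernel);
		if (pti_enabled)
			assert(!had_user);
	}
}

int main(void)
{
	sanity_check(true,  true, true,  true);		/* second mapping */
	sanity_check(false, true, false, false);	/* first mapping */
	return 0;
}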

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/ldt.c | 82 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 58 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index e921b3d..69af9a0 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -100,6 +100,49 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 	return new_ldt;
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+static void do_sanity_check(struct mm_struct *mm,
+			    bool had_kernel_mapping,
+			    bool had_user_mapping)
+{
+	if (mm->context.ldt) {
+		/*
+		 * We already had an LDT.  The top-level entry should already
+		 * have been allocated and synchronized with the usermode
+		 * tables.
+		 */
+		WARN_ON(!had_kernel_mapping);
+		if (static_cpu_has(X86_FEATURE_PTI))
+			WARN_ON(!had_user_mapping);
+	} else {
+		/*
+		 * This is the first time we're mapping an LDT for this process.
+		 * Sync the pgd to the usermode tables.
+		 */
+		WARN_ON(had_kernel_mapping);
+		if (static_cpu_has(X86_FEATURE_PTI))
+			WARN_ON(had_user_mapping);
+	}
+}
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+
+	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+		set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	bool had_kernel = (pgd->pgd != 0);
+	bool had_user   = (kernel_to_user_pgdp(pgd)->pgd != 0);
+
+	do_sanity_check(mm, had_kernel, had_user);
+}
+
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
@@ -115,9 +158,8 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 static int
 map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 {
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-	bool is_vmalloc, had_top_level_entry;
 	unsigned long va;
+	bool is_vmalloc;
 	spinlock_t *ptl;
 	pgd_t *pgd;
 	int i;
@@ -131,13 +173,15 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	 */
 	WARN_ON(ldt->slot != -1);
 
+	/* Check if the current mappings are sane */
+	sanity_check_ldt_mapping(mm);
+
 	/*
 	 * Did we already have the top level entry allocated?  We can't
 	 * use pgd_none() for this because it doesn't do anything on
 	 * 4-level page table kernels.
 	 */
 	pgd = pgd_offset(mm, LDT_BASE_ADDR);
-	had_top_level_entry = (pgd->pgd != 0);
 
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
@@ -172,35 +216,25 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 		pte_unmap_unlock(ptep, ptl);
 	}
 
-	if (mm->context.ldt) {
-		/*
-		 * We already had an LDT.  The top-level entry should already
-		 * have been allocated and synchronized with the usermode
-		 * tables.
-		 */
-		WARN_ON(!had_top_level_entry);
-		if (static_cpu_has(X86_FEATURE_PTI))
-			WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
-	} else {
-		/*
-		 * This is the first time we're mapping an LDT for this process.
-		 * Sync the pgd to the usermode tables.
-		 */
-		WARN_ON(had_top_level_entry);
-		if (static_cpu_has(X86_FEATURE_PTI)) {
-			WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
-			set_pgd(kernel_to_user_pgdp(pgd), *pgd);
-		}
-	}
+	/* Propagate LDT mapping to the user page-table */
+	map_ldt_struct_to_user(mm);
 
 	va = (unsigned long)ldt_slot_va(slot);
 	flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0);
 
 	ldt->slot = slot;
-#endif
 	return 0;
 }
 
+#else /* !CONFIG_PAGE_TABLE_ISOLATION */
+
+static int
+map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+	return 0;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
 static void free_ldt_pgtables(struct mm_struct *mm)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 36/39] x86/ldt: Enable LDT user-mapping for PAE
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (34 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 35/39] x86/ldt: Split out sanity check in map_ldt_struct() Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:37   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 37/39] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Joerg Roedel
                   ` (4 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

This adds the needed special case for PAE to get the LDT
mapped into the user page-table when PTI is enabled. The big
difference from the other paging modes is that we don't have a
full top-level PGD entry available for the LDT, but only a
PMD entry.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/mmu_context.h |  5 ----
 arch/x86/kernel/ldt.c              | 53 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index bbc796e..eeeb928 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -71,12 +71,7 @@ struct ldt_struct {
 
 static inline void *ldt_slot_va(int slot)
 {
-#ifdef CONFIG_X86_64
 	return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
-#else
-	BUG();
-	return (void *)fix_to_virt(FIX_HOLE);
-#endif
 }
 
 /*
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 69af9a0..733e6ac 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -126,6 +126,57 @@ static void do_sanity_check(struct mm_struct *mm,
 	}
 }
 
+#ifdef CONFIG_X86_PAE
+
+static pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
+{
+	p4d_t *p4d;
+	pud_t *pud;
+
+	if (pgd->pgd == 0)
+		return NULL;
+
+	p4d = p4d_offset(pgd, va);
+	if (p4d_none(*p4d))
+		return NULL;
+
+	pud = pud_offset(p4d, va);
+	if (pud_none(*pud))
+		return NULL;
+
+	return pmd_offset(pud, va);
+}
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+	pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	pmd_t *k_pmd, *u_pmd;
+
+	k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+	u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+
+	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+		set_pmd(u_pmd, *k_pmd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+	pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	bool had_kernel, had_user;
+	pmd_t *k_pmd, *u_pmd;
+
+	k_pmd      = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+	u_pmd      = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+	had_kernel = (k_pmd->pmd != 0);
+	had_user   = (u_pmd->pmd != 0);
+
+	do_sanity_check(mm, had_kernel, had_user);
+}
+
+#else /* !CONFIG_X86_PAE */
+
 static void map_ldt_struct_to_user(struct mm_struct *mm)
 {
 	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
@@ -143,6 +194,8 @@ static void sanity_check_ldt_mapping(struct mm_struct *mm)
 	do_sanity_check(mm, had_kernel, had_user);
 }
 
+#endif /* CONFIG_X86_PAE */
+
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 37/39] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (35 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 36/39] x86/ldt: Enable LDT user-mapping for PAE Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:37   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 38/39] x86/mm/pti: Add Warning when booting on a PCID capable CPU Joerg Roedel
                   ` (3 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Allow PTI to be compiled on x86_32.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 security/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/Kconfig b/security/Kconfig
index c430206..afa91c6 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -57,7 +57,7 @@ config SECURITY_NETWORK
 config PAGE_TABLE_ISOLATION
 	bool "Remove the kernel mapping in user mode"
 	default y
-	depends on X86_64 && !UML
+	depends on X86 && !UML
 	help
 	  This feature reduces the number of hardware side channels by
 	  ensuring that the majority of kernel addresses are not mapped
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 38/39] x86/mm/pti: Add Warning when booting on a PCID capable CPU
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (36 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 37/39] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:38   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  2018-07-18  9:41 ` [PATCH 39/39] x86/entry/32: Add debug code to check entry/exit cr3 Joerg Roedel
                   ` (2 subsequent siblings)
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Warn the user when performance can be significantly improved
by switching to a 64-bit kernel.
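
The detection can be reproduced from user-space; a minimal sketch
using GCC's <cpuid.h> to test CPUID.01H:ECX bit 17 (PCID), the same
bit the hunk below probes with a bitwise '&':

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 1;

	/* CPUID.01H:ECX bit 17 is PCID */
	printf("PCID %ssupported\n", (ecx & (1u << 17)) ? "" : "not ");
	return 0;
}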

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/pti.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index b879ccd..be8d2cd 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -517,6 +517,28 @@ void __init pti_init(void)
 
 	pr_info("enabled\n");
 
+#ifdef CONFIG_X86_32
+	/*
+	 * We would check for X86_FEATURE_PCID here. But the init-code
+	 * clears the feature flag on 32 bit because PCID is not
+	 * supported on 32 bit anyway. To print the warning we need to
+	 * query cpuid directly again.
+	 */
+	if (cpuid_ecx(0x1) & BIT(17)) {
+		/* Use printk to work around pr_fmt() */
+		printk(KERN_WARNING "\n");
+		printk(KERN_WARNING "************************************************************\n");
+		printk(KERN_WARNING "** WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!  **\n");
+		printk(KERN_WARNING "**                                                        **\n");
+		printk(KERN_WARNING "** You are using 32-bit PTI on a 64-bit PCID-capable CPU. **\n");
+		printk(KERN_WARNING "** Your performance will increase dramatically if you     **\n");
+		printk(KERN_WARNING "** switch to a 64-bit kernel!                             **\n");
+		printk(KERN_WARNING "**                                                        **\n");
+		printk(KERN_WARNING "** WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!  **\n");
+		printk(KERN_WARNING "************************************************************\n");
+	}
+#endif
+
 	pti_clone_user_shared();
 
 	/* Undo all global bits from the init pagetables in head_64.S: */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [PATCH 39/39] x86/entry/32: Add debug code to check entry/exit cr3
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (37 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 38/39] x86/mm/pti: Add Warning when booting on a PCID capable CPU Joerg Roedel
@ 2018-07-18  9:41 ` Joerg Roedel
  2018-07-19 23:38   ` [tip:x86/pti] x86/entry/32: Add debug code to check entry/exit CR3 tip-bot for Joerg Roedel
  2018-07-18 11:59 ` [PATCH 00/39 v8] PTI support for x86-32 Pavel Machek
  2018-07-19 23:21 ` Thomas Gleixner
  40 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18  9:41 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Add code to check whether we enter and leave the kernel with
the correct CR3, and make it depend on CONFIG_DEBUG_ENTRY.
This is needed because there is no NX protection for
user-addresses in the kernel-CR3 on x86-32, so this type of
bug would not be noticed otherwise.
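
The check boils down to testing a single CR3 bit that distinguishes
the user from the kernel page-table. A sketch of that logic; the
PTI_SWITCH_MASK value here is an assumption for illustration, based
on the two page-tables living in adjacent pages:

#include <stdio.h>

#define PTI_SWITCH_MASK	(1UL << 12)	/* assumed illustrative value */

static int on_user_cr3(unsigned long cr3)
{
	return (cr3 & PTI_SWITCH_MASK) != 0;
}

int main(void)
{
	printf("kernel cr3: %d, user cr3: %d\n",
	       on_user_cr3(0x1000000UL), on_user_cr3(0x1001000UL));
	return 0;
}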

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b1541c7..010cdb4 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -166,6 +166,24 @@
 .Lend_\@:
 .endm
 
+.macro BUG_IF_WRONG_CR3 no_user_check=0
+#ifdef CONFIG_DEBUG_ENTRY
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	.if \no_user_check == 0
+	/* coming from usermode? */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lend_\@
+	.endif
+	/* On user-cr3? */
+	movl	%cr3, %eax
+	testl	$PTI_SWITCH_MASK, %eax
+	jnz	.Lend_\@
+	/* From userspace with kernel cr3 - BUG */
+	ud2
+.Lend_\@:
+#endif
+.endm
+
 /*
  * Switch to kernel cr3 if not already loaded and return current cr3 in
  * \scratch_reg
@@ -213,6 +231,8 @@
 .macro SAVE_ALL_NMI cr3_reg:req
 	SAVE_ALL
 
+	BUG_IF_WRONG_CR3
+
 	/*
 	 * Now switch the CR3 when PTI is enabled.
 	 *
@@ -224,6 +244,7 @@
 
 .Lend_\@:
 .endm
+
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
@@ -287,6 +308,8 @@
 
 .Lswitched_\@:
 
+	BUG_IF_WRONG_CR3
+
 	RESTORE_REGS pop=\pop
 .endm
 
@@ -357,6 +380,8 @@
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
 
+	BUG_IF_WRONG_CR3
+
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
 
 	/*
@@ -799,6 +824,7 @@ ENTRY(entry_SYSENTER_32)
 	 */
 	pushfl
 	pushl	%eax
+	BUG_IF_WRONG_CR3 no_user_check=1
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
 	popl	%eax
 	popfl
@@ -893,6 +919,7 @@ ENTRY(entry_SYSENTER_32)
 	 * whereas POPF does not.)
 	 */
 	btrl	$X86_EFLAGS_IF_BIT, (%esp)
+	BUG_IF_WRONG_CR3 no_user_check=1
 	popfl
 	popl	%eax
 
@@ -970,6 +997,8 @@ restore_all:
 	/* Switch back to user CR3 */
 	SWITCH_TO_USER_CR3 scratch_reg=%eax
 
+	BUG_IF_WRONG_CR3
+
 	/* Restore user state */
 	RESTORE_REGS pop=4			# skip orig_eax/error_code
 .Lirq_return:
@@ -983,6 +1012,7 @@ restore_all:
 restore_all_kernel:
 	TRACE_IRQS_IRET
 	PARANOID_EXIT_TO_KERNEL_MODE
+	BUG_IF_WRONG_CR3
 	RESTORE_REGS 4
 	jmp	.Lirq_return
 
@@ -990,6 +1020,19 @@ restore_all_kernel:
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
 	pushl	$do_iret_error
+
+#ifdef CONFIG_DEBUG_ENTRY
+	/*
+	 * The stack-frame here is the one that iret faulted on, so it's a
+	 * return-to-user frame. We are on kernel-cr3 because we come here from
+	 * the fixup code. This confuses the CR3 checker, so switch to user-cr3
+	 * as the checker expects it.
+	 */
+	pushl	%eax
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+	popl	%eax
+#endif
+
 	jmp	common_exception
 .previous
 	_ASM_EXTABLE(.Lirq_return, iret_exc)
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH 00/39 v8] PTI support for x86-32
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (38 preceding siblings ...)
  2018-07-18  9:41 ` [PATCH 39/39] x86/entry/32: Add debug code to check entry/exit cr3 Joerg Roedel
@ 2018-07-18 11:59 ` Pavel Machek
  2018-07-19 23:21 ` Thomas Gleixner
  40 siblings, 0 replies; 105+ messages in thread
From: Pavel Machek @ 2018-07-18 11:59 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli, Waiman Long,
	David H . Gutteridge, jroedel

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

On Wed 2018-07-18 11:40:37, Joerg Roedel wrote:
> Hi,
> 
> here is version 8 of my patches to enable PTI on x86-32. The
> last version got some good review which I mostly worked into
> this version.


> for easier testing. The code survived >12h overnight testing
> with my usual
> 
> 	* 'perf top' for NMI load
> 
> 	* x86-selftests in a loop (except mpx and pkeys
> 	  which are not supported on the machine)
> 
> 	* kernel-compile in a loop
> 
> all in parallel. I also boot-tested x86-64 and !PAE config
> again and ran my GLB-test to make sure that the global
> mappings between user and kernel page-table are identical.
> All that succeeded and showed no regressions.

For the record:

Tested-by: Pavel Machek <pavel@ucw.cz>

(on top of 4.18.0-rc5-next-20180718)

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack
  2018-07-18  9:40 ` [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
@ 2018-07-18 18:09   ` Brian Gerst
  2018-07-19 20:52     ` Thomas Gleixner
  2018-07-19 23:22   ` [tip:x86/pti] " tip-bot for Joerg Roedel
  1 sibling, 1 reply; 105+ messages in thread
From: Brian Gerst @ 2018-07-18 18:09 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux-MM,
	Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg Kroah-Hartman, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, dhgutteridge, Joerg Roedel

On Wed, Jul 18, 2018 at 5:41 AM Joerg Roedel <joro@8bytes.org> wrote:
>
> From: Joerg Roedel <jroedel@suse.de>
>
> Use the entry-stack as a trampoline to enter the kernel. The
> entry-stack is already in the cpu_entry_area and will be
> mapped to userspace when PTI is enabled.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/entry/entry_32.S        | 119 ++++++++++++++++++++++++++++++++-------
>  arch/x86/include/asm/switch_to.h |  14 ++++-
>  arch/x86/kernel/asm-offsets.c    |   1 +
>  arch/x86/kernel/cpu/common.c     |   5 +-
>  arch/x86/kernel/process.c        |   2 -
>  arch/x86/kernel/process_32.c     |   2 -
>  6 files changed, 115 insertions(+), 28 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 7251c4f..fea49ec 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -154,7 +154,7 @@
>
>  #endif /* CONFIG_X86_32_LAZY_GS */
>
> -.macro SAVE_ALL pt_regs_ax=%eax
> +.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
>         cld
>         PUSH_GS
>         pushl   %fs
> @@ -173,6 +173,12 @@
>         movl    $(__KERNEL_PERCPU), %edx
>         movl    %edx, %fs
>         SET_KERNEL_GS %edx
> +
> +       /* Switch to kernel stack if necessary */
> +.if \switch_stacks > 0
> +       SWITCH_TO_KERNEL_STACK
> +.endif
> +
>  .endm
>
>  /*
> @@ -269,6 +275,73 @@
>  .Lend_\@:
>  #endif /* CONFIG_X86_ESPFIX32 */
>  .endm
> +
> +
> +/*
> + * Called with pt_regs fully populated and kernel segments loaded,
> + * so we can access PER_CPU and use the integer registers.
> + *
> + * We need to be very careful here with the %esp switch, because an NMI
> + * can happen everywhere. If the NMI handler finds itself on the
> + * entry-stack, it will overwrite the task-stack and everything we
> + * copied there. So allocate the stack-frame on the task-stack and
> + * switch to it before we do any copying.
> + */
> +.macro SWITCH_TO_KERNEL_STACK
> +
> +       ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
> +
> +       /* Are we on the entry stack? Bail out if not! */
> +       movl    PER_CPU_VAR(cpu_entry_area), %ecx
> +       addl    $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
> +       subl    %esp, %ecx      /* ecx = (end of entry_stack) - esp */
> +       cmpl    $SIZEOF_entry_stack, %ecx
> +       jae     .Lend_\@
> +
> +       /* Load stack pointer into %esi and %edi */
> +       movl    %esp, %esi
> +       movl    %esi, %edi
> +
> +       /* Move %edi to the top of the entry stack */
> +       andl    $(MASK_entry_stack), %edi
> +       addl    $(SIZEOF_entry_stack), %edi
> +
> +       /* Load top of task-stack into %edi */
> +       movl    TSS_entry2task_stack(%edi), %edi
> +
> +       /* Bytes to copy */
> +       movl    $PTREGS_SIZE, %ecx
> +
> +#ifdef CONFIG_VM86
> +       testl   $X86_EFLAGS_VM, PT_EFLAGS(%esi)
> +       jz      .Lcopy_pt_regs_\@
> +
> +       /*
> +        * Stack-frame contains 4 additional segment registers when
> +        * coming from VM86 mode
> +        */
> +       addl    $(4 * 4), %ecx
> +
> +.Lcopy_pt_regs_\@:
> +#endif
> +
> +       /* Allocate frame on task-stack */
> +       subl    %ecx, %edi
> +
> +       /* Switch to task-stack */
> +       movl    %edi, %esp
> +
> +       /*
> +        * We are now on the task-stack and can safely copy over the
> +        * stack-frame
> +        */
> +       shrl    $2, %ecx

This shift can be removed if you divide the constants by 4 above.
Ditto on the exit path in the next patch.

--
Brian Gerst

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack
  2018-07-18 18:09   ` Brian Gerst
@ 2018-07-19 20:52     ` Thomas Gleixner
  0 siblings, 0 replies; 105+ messages in thread
From: Thomas Gleixner @ 2018-07-19 20:52 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Joerg Roedel, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux-MM,
	Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg Kroah-Hartman, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, dhgutteridge, Joerg Roedel

On Wed, 18 Jul 2018, Brian Gerst wrote:
> > +.Lcopy_pt_regs_\@:
> > +#endif
> > +
> > +       /* Allocate frame on task-stack */
> > +       subl    %ecx, %edi
> > +
> > +       /* Switch to task-stack */
> > +       movl    %edi, %esp
> > +
> > +       /*
> > +        * We are now on the task-stack and can safely copy over the
> > +        * stack-frame
> > +        */
> > +       shrl    $2, %ecx
> 
> This shift can be removed if you divide the constants by 4 above.
> Ditto on the exit path in the next patch.

No, the

> > +       /* Allocate frame on task-stack */
> > +       subl    %ecx, %edi

needs the full value in bytes ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 105+ messages in thread

* [tip:x86/pti] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c
  2018-07-18  9:40 ` [PATCH 01/39] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c Joerg Roedel
@ 2018-07-19 23:18   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:18 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: brgerst, peterz, jroedel, torvalds, jgross, gregkh, jpoimboe,
	luto, hpa, dvlasenk, eduval, mingo, dhgutteridge, bp,
	boris.ostrovsky, linux-kernel, aarcange, dave.hansen, pavel,
	tglx, will.deacon, David.Laight, jkosina, llong

Commit-ID:  9e97b73fdb235345a826519862a52a7398c89eb8
Gitweb:     https://git.kernel.org/tip/9e97b73fdb235345a826519862a52a7398c89eb8
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:38 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:35 +0200

x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c

These offsets will be used in 32 bit assembly code as well, so make them
available for all of x86 code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-2-git-send-email-joro@8bytes.org

---
 arch/x86/kernel/asm-offsets.c    | 4 ++++
 arch/x86/kernel/asm-offsets_64.c | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index dcb008c320fe..a1e16286a832 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -103,4 +103,8 @@ void common(void) {
 	OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
 	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
 	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
+
+	/* Offset for sp0 and sp1 into the tss_struct */
+	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
+	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 }
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index b2dcd161f514..3b9405e7ba2b 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -65,8 +65,6 @@ int main(void)
 #undef ENTRY
 
 	OFFSET(TSS_ist, tss_struct, x86_tss.ist);
-	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
-	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	BLANK();
 
 #ifdef CONFIG_STACKPROTECTOR

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [tip:x86/pti] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack
  2018-07-18  9:40 ` [PATCH 02/39] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack Joerg Roedel
@ 2018-07-19 23:19   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: llong, will.deacon, linux-kernel, aarcange, jpoimboe,
	dhgutteridge, eduval, pavel, dvlasenk, mingo, David.Laight,
	brgerst, torvalds, tglx, bp, hpa, luto, dave.hansen, jroedel,
	jkosina, peterz, gregkh, jgross, boris.ostrovsky

Commit-ID:  ae2e565bc6aaee3f3db420fec5fdd39755c9813e
Gitweb:     https://git.kernel.org/tip/ae2e565bc6aaee3f3db420fec5fdd39755c9813e
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:39 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:35 +0200

x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack

The stack address doesn't need to be stored in tss.sp0 if the stack is
switched manually like on sysenter. Rename the offset so that it still
makes sense when its location is changed in later patches.

This stack will also be used for all kernel-entry points, not just
sysenter. Reflect that, and the fact that it is the offset to the
task-stack location, in the name as well.
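
The value is just the distance between two build-time struct
offsets. A stand-alone sketch of the same pattern (the struct
layout is invented; offsetofend mirrors the kernel's macro):

#include <stdio.h>
#include <stddef.h>

#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* invented layout, standing in for cpu_entry_area */
struct entry_area {
	char stack[256];	/* entry_stack_page.stack */
	char other[64];		/* unrelated members in between */
	unsigned long sp0;	/* tss.x86_tss.sp0 */
};

int main(void)
{
	/* distance from the end of the entry stack to sp0 */
	printf("offset = %zu\n",
	       offsetof(struct entry_area, sp0) -
	       offsetofend(struct entry_area, stack));
	return 0;
}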

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-3-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S        | 2 +-
 arch/x86/kernel/asm-offsets_32.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index c371bfee137a..39f711ad0fc9 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -412,7 +412,7 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
-	movl	TSS_sysenter_sp0(%esp), %esp
+	movl	TSS_entry2task_stack(%esp), %esp
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index a4a3be399f4b..15b3f45b69cd 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -46,8 +46,9 @@ void foo(void)
 	OFFSET(saved_context_gdt_desc, saved_context, gdt_desc);
 	BLANK();
 
-	/* Offset from the sysenter stack to tss.sp0 */
-	DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	/* Offset from the entry stack to task stack stored in TSS */
+	DEFINE(TSS_entry2task_stack,
+	       offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
 #ifdef CONFIG_STACKPROTECTOR

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [tip:x86/pti] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler
  2018-07-18  9:40 ` [PATCH 03/39] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler Joerg Roedel
@ 2018-07-19 23:19   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:19 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, jroedel, luto, aarcange, llong, pavel, dave.hansen,
	mingo, will.deacon, eduval, dvlasenk, linux-kernel, David.Laight,
	tglx, dhgutteridge, brgerst, torvalds, jpoimboe, jgross, jkosina,
	hpa, gregkh, boris.ostrovsky, bp

Commit-ID:  a6b744f3ce9d017dd86b28355de2d8e0d36496d4
Gitweb:     https://git.kernel.org/tip/a6b744f3ce9d017dd86b28355de2d8e0d36496d4
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:40 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:36 +0200

x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler

x86_tss.sp0 will later be used to point to the entry stack, so that it can
serve as a trampoline stack for other kernel entry points besides SYSENTER.

So store the real task stack pointer in x86_tss.sp1, which is otherwise
unused by the hardware, as Linux doesn't make use of Ring 1.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-4-git-send-email-joro@8bytes.org

---
 arch/x86/kernel/asm-offsets_32.c | 9 +++++++--
 arch/x86/kernel/process_32.c     | 2 ++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index 15b3f45b69cd..82826f2275cc 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -46,9 +46,14 @@ void foo(void)
 	OFFSET(saved_context_gdt_desc, saved_context, gdt_desc);
 	BLANK();
 
-	/* Offset from the entry stack to task stack stored in TSS */
+	/*
+	 * Offset from the entry stack to task stack stored in TSS. Kernel entry
+	 * happens on the per-cpu entry-stack, and the asm code switches to the
+	 * task-stack pointer stored in x86_tss.sp1, which is a copy of
+	 * task->thread.sp0 where entry code can find it.
+	 */
 	DEFINE(TSS_entry2task_stack,
-	       offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	       offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
 #ifdef CONFIG_STACKPROTECTOR
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0ae659de21eb..ec62cc77388d 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -290,6 +290,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
+	/* SYSENTER reads the task-stack from tss.sp1 */
+	this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
 
 	/*
 	 * Restore %gs if needed (which is common)

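The contract established here can be summarized in a short C sketch. The
helper names below are hypothetical, not kernel API; the point is only
that __switch_to() publishes the next task's stack pointer in tss.sp1 and
the SYSENTER entry path consumes it, leaving sp0 free for the trampoline
stack:

/* Simplified stand-in for cpu_tss_rw.x86_tss */
struct x86_tss { unsigned long sp0, sp1; };
static struct x86_tss cpu_tss;

/* __switch_to() side: publish the task-stack pointer for entry code */
static void publish_task_stack(unsigned long thread_sp0)
{
	cpu_tss.sp1 = thread_sp0;	/* SYSENTER reads the task stack from sp1 */
}

/* entry_SYSENTER_32 side: movl TSS_entry2task_stack(%esp), %esp */
static unsigned long sysenter_task_stack(void)
{
	return cpu_tss.sp1;
}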

* [tip:x86/pti] x86/entry/32: Put ESPFIX code into a macro
  2018-07-18  9:40 ` [PATCH 04/39] x86/entry/32: Put ESPFIX code into a macro Joerg Roedel
@ 2018-07-19 23:20   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:20 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: gregkh, hpa, dave.hansen, eduval, aarcange, dhgutteridge,
	brgerst, jgross, mingo, peterz, David.Laight, luto, will.deacon,
	linux-kernel, jroedel, torvalds, dvlasenk, tglx, jkosina, bp,
	jpoimboe, llong, boris.ostrovsky, pavel

Commit-ID:  46eabca284f9f27fe94698a068a4a21ba93b4975
Gitweb:     https://git.kernel.org/tip/46eabca284f9f27fe94698a068a4a21ba93b4975
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:41 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:36 +0200

x86/entry/32: Put ESPFIX code into a macro

This makes it easier to split up the shared iret code path.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-5-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 97 ++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 48 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 39f711ad0fc9..ef7d65315e8e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -221,6 +221,54 @@
 	POP_GS_EX
 .endm
 
+.macro CHECK_AND_APPLY_ESPFIX
+#ifdef CONFIG_X86_ESPFIX32
+#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
+
+	ALTERNATIVE	"jmp .Lend_\@", "", X86_BUG_ESPFIX
+
+	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
+	/*
+	 * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
+	 * are returning to the kernel.
+	 * See comments in process.c:copy_thread() for details.
+	 */
+	movb	PT_OLDSS(%esp), %ah
+	movb	PT_CS(%esp), %al
+	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
+	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
+	jne	.Lend_\@	# returning to user-space with LDT SS
+
+	/*
+	 * Setup and switch to ESPFIX stack
+	 *
+	 * We're returning to userspace with a 16 bit stack. The CPU will not
+	 * restore the high word of ESP for us on executing iret... This is an
+	 * "official" bug of all the x86-compatible CPUs, which we can work
+	 * around to make dosemu and wine happy. We do this by preloading the
+	 * high word of ESP with the high word of the userspace ESP while
+	 * compensating for the offset by changing to the ESPFIX segment with
+	 * a base address that matches for the difference.
+	 */
+	mov	%esp, %edx			/* load kernel esp */
+	mov	PT_OLDESP(%esp), %eax		/* load userspace esp */
+	mov	%dx, %ax			/* eax: new kernel esp */
+	sub	%eax, %edx			/* offset (low word is 0) */
+	shr	$16, %edx
+	mov	%dl, GDT_ESPFIX_SS + 4		/* bits 16..23 */
+	mov	%dh, GDT_ESPFIX_SS + 7		/* bits 24..31 */
+	pushl	$__ESPFIX_SS
+	pushl	%eax				/* new kernel esp */
+	/*
+	 * Disable interrupts, but do not irqtrace this section: we
+	 * will soon execute iret and the tracer was already set to
+	 * the irqstate after the IRET:
+	 */
+	DISABLE_INTERRUPTS(CLBR_ANY)
+	lss	(%esp), %esp			/* switch to espfix segment */
+.Lend_\@:
+#endif /* CONFIG_X86_ESPFIX32 */
+.endm
 /*
  * %eax: prev task
  * %edx: next task
@@ -547,21 +595,7 @@ ENTRY(entry_INT80_32)
 restore_all:
 	TRACE_IRQS_IRET
 .Lrestore_all_notrace:
-#ifdef CONFIG_X86_ESPFIX32
-	ALTERNATIVE	"jmp .Lrestore_nocheck", "", X86_BUG_ESPFIX
-
-	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
-	/*
-	 * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
-	 * are returning to the kernel.
-	 * See comments in process.c:copy_thread() for details.
-	 */
-	movb	PT_OLDSS(%esp), %ah
-	movb	PT_CS(%esp), %al
-	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
-	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
-	je .Lldt_ss				# returning to user-space with LDT SS
-#endif
+	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
 .Lirq_return:
@@ -579,39 +613,6 @@ ENTRY(iret_exc	)
 	jmp	common_exception
 .previous
 	_ASM_EXTABLE(.Lirq_return, iret_exc)
-
-#ifdef CONFIG_X86_ESPFIX32
-.Lldt_ss:
-/*
- * Setup and switch to ESPFIX stack
- *
- * We're returning to userspace with a 16 bit stack. The CPU will not
- * restore the high word of ESP for us on executing iret... This is an
- * "official" bug of all the x86-compatible CPUs, which we can work
- * around to make dosemu and wine happy. We do this by preloading the
- * high word of ESP with the high word of the userspace ESP while
- * compensating for the offset by changing to the ESPFIX segment with
- * a base address that matches for the difference.
- */
-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
-	mov	%esp, %edx			/* load kernel esp */
-	mov	PT_OLDESP(%esp), %eax		/* load userspace esp */
-	mov	%dx, %ax			/* eax: new kernel esp */
-	sub	%eax, %edx			/* offset (low word is 0) */
-	shr	$16, %edx
-	mov	%dl, GDT_ESPFIX_SS + 4		/* bits 16..23 */
-	mov	%dh, GDT_ESPFIX_SS + 7		/* bits 24..31 */
-	pushl	$__ESPFIX_SS
-	pushl	%eax				/* new kernel esp */
-	/*
-	 * Disable interrupts, but do not irqtrace this section: we
-	 * will soon execute iret and the tracer was already set to
-	 * the irqstate after the IRET:
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	lss	(%esp), %esp			/* switch to espfix segment */
-	jmp	.Lrestore_nocheck
-#endif
 ENDPROC(entry_INT80_32)
 
 .macro FIXUP_ESPFIX_STACK

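The pointer arithmetic in CHECK_AND_APPLY_ESPFIX is easier to follow in C.
A rough model (illustration only, with hypothetical names): the new
visible stack pointer takes the user's high word and the kernel's low
word, and the ESPFIX segment base compensates for the difference so that
linear addresses still hit the kernel stack:

#include <stdint.h>

struct espfix { uint32_t new_esp; uint32_t seg_base; };

static struct espfix espfix_setup(uint32_t kernel_esp, uint32_t user_esp)
{
	struct espfix fix;

	/* mov %dx, %ax: user's high word, kernel's low word */
	fix.new_esp  = (user_esp & 0xffff0000u) | (kernel_esp & 0x0000ffffu);

	/* sub %eax, %edx: the low word is 0 by construction */
	fix.seg_base = kernel_esp - fix.new_esp;

	/* Invariant: fix.seg_base + fix.new_esp == kernel_esp */
	return fix;
}

The asm then stores only bits 16..31 of seg_base into the GDT entry,
which is what the 'shr $16, %edx' and the two byte moves accomplish.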

* [tip:x86/pti] x86/entry/32: Unshare NMI return path
  2018-07-18  9:40 ` [PATCH 05/39] x86/entry/32: Unshare NMI return path Joerg Roedel
@ 2018-07-19 23:21   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, will.deacon, hpa, jgross, aarcange, luto, dvlasenk,
	jpoimboe, mingo, dhgutteridge, brgerst, eduval, llong,
	David.Laight, tglx, bp, pavel, gregkh, jroedel, jkosina, peterz,
	dave.hansen, torvalds, boris.ostrovsky

Commit-ID:  8e676ced31e9d1448d3ffc4159586a259cc67f30
Gitweb:     https://git.kernel.org/tip/8e676ced31e9d1448d3ffc4159586a259cc67f30
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:42 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:36 +0200

x86/entry/32: Unshare NMI return path

NMI will no longer use most of the shared return path, because NMI needs
special handling when the CR3 switches for PTI are added. Prepare for that
change.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-6-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ef7d65315e8e..43641310b6e3 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1017,7 +1017,7 @@ ENTRY(nmi)
 
 	/* Not on SYSENTER stack. */
 	call	do_nmi
-	jmp	.Lrestore_all_notrace
+	jmp	.Lnmi_return
 
 .Lnmi_from_sysenter_stack:
 	/*
@@ -1028,7 +1028,11 @@ ENTRY(nmi)
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	do_nmi
 	movl	%ebx, %esp
-	jmp	.Lrestore_all_notrace
+
+.Lnmi_return:
+	CHECK_AND_APPLY_ESPFIX
+	RESTORE_REGS 4
+	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
 .Lnmi_espfix_stack:


* [tip:x86/pti] x86/entry/32: Split off return-to-kernel path
  2018-07-18  9:40 ` [PATCH 06/39] x86/entry/32: Split off return-to-kernel path Joerg Roedel
@ 2018-07-19 23:21   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, linux-kernel, peterz, eduval, bp, mingo, tglx,
	dvlasenk, jpoimboe, torvalds, hpa, will.deacon, aarcange, llong,
	jgross, David.Laight, dhgutteridge, jroedel, gregkh, brgerst,
	jkosina, luto, pavel, boris.ostrovsky

Commit-ID:  0d2eb73b29996684d5bbb72f85c74b47b4c359f7
Gitweb:     https://git.kernel.org/tip/0d2eb73b29996684d5bbb72f85c74b47b4c359f7
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:43 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:37 +0200

x86/entry/32: Split off return-to-kernel path

Use a separate return path when returning to the kernel.

This allows putting the PTI cr3-switch and the switch to the entry-stack
into the return-to-user path without further checking.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-7-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 43641310b6e3..7251c4f3e99e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -65,7 +65,7 @@
 # define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 # define preempt_stop(clobbers)
-# define resume_kernel		restore_all
+# define resume_kernel		restore_all_kernel
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -399,9 +399,9 @@ ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 .Lneed_resched:
 	cmpl	$0, PER_CPU_VAR(__preempt_count)
-	jnz	restore_all
+	jnz	restore_all_kernel
 	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
-	jz	restore_all
+	jz	restore_all_kernel
 	call	preempt_schedule_irq
 	jmp	.Lneed_resched
 END(resume_kernel)
@@ -606,6 +606,11 @@ restore_all:
 	 */
 	INTERRUPT_RETURN
 
+restore_all_kernel:
+	TRACE_IRQS_IRET
+	RESTORE_REGS 4
+	jmp	.Lirq_return
+
 .section .fixup, "ax"
 ENTRY(iret_exc	)
 	pushl	$0				# no error code


* Re: [PATCH 00/39 v8] PTI support for x86-32
  2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
                   ` (39 preceding siblings ...)
  2018-07-18 11:59 ` [PATCH 00/39 v8] PTI support for x86-32 Pavel Machek
@ 2018-07-19 23:21 ` Thomas Gleixner
  2018-07-20  7:59   ` Joerg Roedel
  40 siblings, 1 reply; 105+ messages in thread
From: Thomas Gleixner @ 2018-07-19 23:21 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, H . Peter Anvin, x86, linux-kernel, linux-mm,
	Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, aliguori, daniel.gruss,
	hughd, keescook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel

Joerg,

On Wed, 18 Jul 2018, Joerg Roedel wrote:
> 
> here is version 8 of my patches to enable PTI on x86-32. The
> last version got some good review which I mostly worked into
> this version.

I went over the whole set once again and did not find any real issues. As
the outstanding review comments are addressed, I decided that only broader
exposure can shake out any remaining issues. Applied and pushed out, so it
should show up in linux-next soon.

The mm regression seems to be sorted, so there is no immediate fallout
expected.

Thanks for your patience in reworking this over and over. Thanks to Andy
for putting his entry-focused eyes on it more than once. Great work!

Thanks,

	tglx



* [tip:x86/pti] x86/entry/32: Enter the kernel via trampoline stack
  2018-07-18  9:40 ` [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
  2018-07-18 18:09   ` Brian Gerst
@ 2018-07-19 23:22   ` tip-bot for Joerg Roedel
  1 sibling, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, brgerst, bp, gregkh, torvalds, pavel, jpoimboe,
	jkosina, will.deacon, dhgutteridge, luto, eduval, llong, peterz,
	boris.ostrovsky, linux-kernel, David.Laight, hpa, tglx, jroedel,
	jgross, aarcange, mingo, dvlasenk

Commit-ID:  45d7b255747c21fc4b1f5043bee0754d39c3bdbf
Gitweb:     https://git.kernel.org/tip/45d7b255747c21fc4b1f5043bee0754d39c3bdbf
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:44 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:37 +0200

x86/entry/32: Enter the kernel via trampoline stack

Use the entry-stack as a trampoline to enter the kernel. The entry-stack is
already in the cpu_entry_area and will be mapped to userspace when PTI is
enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-8-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S        | 119 ++++++++++++++++++++++++++++++++-------
 arch/x86/include/asm/switch_to.h |  14 ++++-
 arch/x86/kernel/asm-offsets.c    |   1 +
 arch/x86/kernel/cpu/common.c     |   5 +-
 arch/x86/kernel/process.c        |   2 -
 arch/x86/kernel/process_32.c     |   2 -
 6 files changed, 115 insertions(+), 28 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 7251c4f3e99e..fea49ec345ba 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -154,7 +154,7 @@
 
 #endif /* CONFIG_X86_32_LAZY_GS */
 
-.macro SAVE_ALL pt_regs_ax=%eax
+.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
 	cld
 	PUSH_GS
 	pushl	%fs
@@ -173,6 +173,12 @@
 	movl	$(__KERNEL_PERCPU), %edx
 	movl	%edx, %fs
 	SET_KERNEL_GS %edx
+
+	/* Switch to kernel stack if necessary */
+.if \switch_stacks > 0
+	SWITCH_TO_KERNEL_STACK
+.endif
+
 .endm
 
 /*
@@ -269,6 +275,73 @@
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
 .endm
+
+
+/*
+ * Called with pt_regs fully populated and kernel segments loaded,
+ * so we can access PER_CPU and use the integer registers.
+ *
+ * We need to be very careful here with the %esp switch, because an NMI
+ * can happen everywhere. If the NMI handler finds itself on the
+ * entry-stack, it will overwrite the task-stack and everything we
+ * copied there. So allocate the stack-frame on the task-stack and
+ * switch to it before we do any copying.
+ */
+.macro SWITCH_TO_KERNEL_STACK
+
+	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+	/* Are we on the entry stack? Bail out if not! */
+	movl	PER_CPU_VAR(cpu_entry_area), %ecx
+	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
+	subl	%esp, %ecx	/* ecx = (end of entry_stack) - esp */
+	cmpl	$SIZEOF_entry_stack, %ecx
+	jae	.Lend_\@
+
+	/* Load stack pointer into %esi and %edi */
+	movl	%esp, %esi
+	movl	%esi, %edi
+
+	/* Move %edi to the top of the entry stack */
+	andl	$(MASK_entry_stack), %edi
+	addl	$(SIZEOF_entry_stack), %edi
+
+	/* Load top of task-stack into %edi */
+	movl	TSS_entry2task_stack(%edi), %edi
+
+	/* Bytes to copy */
+	movl	$PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+	testl	$X86_EFLAGS_VM, PT_EFLAGS(%esi)
+	jz	.Lcopy_pt_regs_\@
+
+	/*
+	 * Stack-frame contains 4 additional segment registers when
+	 * coming from VM86 mode
+	 */
+	addl	$(4 * 4), %ecx
+
+.Lcopy_pt_regs_\@:
+#endif
+
+	/* Allocate frame on task-stack */
+	subl	%ecx, %edi
+
+	/* Switch to task-stack */
+	movl	%edi, %esp
+
+	/*
+	 * We are now on the task-stack and can safely copy over the
+	 * stack-frame
+	 */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+.Lend_\@:
+.endm
+
 /*
  * %eax: prev task
  * %edx: next task
@@ -469,7 +542,7 @@ ENTRY(entry_SYSENTER_32)
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
 	pushl	%eax			/* pt_regs->orig_ax */
-	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
+	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest, stack already switched */
 
 	/*
 	 * SYSENTER doesn't filter flags, so we need to clear NT, AC
@@ -580,7 +653,8 @@ ENDPROC(entry_SYSENTER_32)
 ENTRY(entry_INT80_32)
 	ASM_CLAC
 	pushl	%eax			/* pt_regs->orig_ax */
-	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
+
+	SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1	/* save rest */
 
 	/*
 	 * User mode is traced as though IRQs are on, and the interrupt gate
@@ -677,7 +751,8 @@ END(irq_entries_start)
 common_interrupt:
 	ASM_CLAC
 	addl	$-0x80, (%esp)			/* Adjust vector into the [-256, -1] range */
-	SAVE_ALL
+
+	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	movl	%esp, %eax
@@ -685,16 +760,16 @@ common_interrupt:
 	jmp	ret_from_intr
 ENDPROC(common_interrupt)
 
-#define BUILD_INTERRUPT3(name, nr, fn)	\
-ENTRY(name)				\
-	ASM_CLAC;			\
-	pushl	$~(nr);			\
-	SAVE_ALL;			\
-	ENCODE_FRAME_POINTER;		\
-	TRACE_IRQS_OFF			\
-	movl	%esp, %eax;		\
-	call	fn;			\
-	jmp	ret_from_intr;		\
+#define BUILD_INTERRUPT3(name, nr, fn)			\
+ENTRY(name)						\
+	ASM_CLAC;					\
+	pushl	$~(nr);					\
+	SAVE_ALL switch_stacks=1;			\
+	ENCODE_FRAME_POINTER;				\
+	TRACE_IRQS_OFF					\
+	movl	%esp, %eax;				\
+	call	fn;					\
+	jmp	ret_from_intr;				\
 ENDPROC(name)
 
 #define BUILD_INTERRUPT(name, nr)		\
@@ -926,16 +1001,20 @@ common_exception:
 	pushl	%es
 	pushl	%ds
 	pushl	%eax
+	movl	$(__USER_DS), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	$(__KERNEL_PERCPU), %eax
+	movl	%eax, %fs
 	pushl	%ebp
 	pushl	%edi
 	pushl	%esi
 	pushl	%edx
 	pushl	%ecx
 	pushl	%ebx
+	SWITCH_TO_KERNEL_STACK
 	ENCODE_FRAME_POINTER
 	cld
-	movl	$(__KERNEL_PERCPU), %ecx
-	movl	%ecx, %fs
 	UNWIND_ESPFIX_STACK
 	GS_TO_REG %ecx
 	movl	PT_GS(%esp), %edi		# get the function address
@@ -943,9 +1022,6 @@ common_exception:
 	movl	$-1, PT_ORIG_EAX(%esp)		# no syscall to restart
 	REG_TO_PTGS %ecx
 	SET_KERNEL_GS %ecx
-	movl	$(__USER_DS), %ecx
-	movl	%ecx, %ds
-	movl	%ecx, %es
 	TRACE_IRQS_OFF
 	movl	%esp, %eax			# pt_regs pointer
 	CALL_NOSPEC %edi
@@ -964,6 +1040,7 @@ ENTRY(debug)
 	 */
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
+
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# error code 0
@@ -999,6 +1076,7 @@ END(debug)
  */
 ENTRY(nmi)
 	ASM_CLAC
+
 #ifdef CONFIG_X86_ESPFIX32
 	pushl	%eax
 	movl	%ss, %eax
@@ -1066,7 +1144,8 @@ END(nmi)
 ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
-	SAVE_ALL
+
+	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# zero error code
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index eb5f7999a893..8bc2f70b452e 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -89,13 +89,23 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 /* This is used when switching tasks or entering/exiting vm86 mode. */
 static inline void update_sp0(struct task_struct *task)
 {
-	/* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */
+	/* sp0 always points to the entry trampoline stack, which is constant: */
 #ifdef CONFIG_X86_32
-	load_sp0(task->thread.sp0);
+	if (static_cpu_has(X86_FEATURE_XENPV))
+		load_sp0(task->thread.sp0);
+	else
+		this_cpu_write(cpu_tss_rw.x86_tss.sp1, task->thread.sp0);
 #else
+	/*
+	 * x86-64 updates x86_tss.sp1 via cpu_current_top_of_stack. That
+	 * doesn't work on x86-32 because sp1 and
+	 * cpu_current_top_of_stack have different values (because of
+	 * the non-zero stack-padding on 32bit).
+	 */
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task_top_of_stack(task));
 #endif
+
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index a1e16286a832..01de31db300d 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -103,6 +103,7 @@ void common(void) {
 	OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
 	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
 	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
+	DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
 
 	/* Offset for sp0 and sp1 into the tss_struct */
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index eb4cb3efd20e..43a927eb9c09 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1804,11 +1804,12 @@ void cpu_init(void)
 	enter_lazy_tlb(&init_mm, curr);
 
 	/*
-	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
-	 * task never enters user mode.
+	 * Initialize the TSS.  sp0 points to the entry trampoline stack
+	 * regardless of what task is running.
 	 */
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 	load_TR_desc();
+	load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
 
 	load_mm_ldt(&init_mm);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 30ca2d1a9231..c93fcfdf1673 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,14 +57,12 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 		 */
 		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 
-#ifdef CONFIG_X86_64
 		/*
 		 * .sp1 is cpu_current_top_of_stack.  The init task never
 		 * runs user code, but cpu_current_top_of_stack should still
 		 * be well defined before the first context switch.
 		 */
 		.sp1 = TOP_OF_INIT_STACK,
-#endif
 
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ec62cc77388d..0ae659de21eb 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -290,8 +290,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
-	/* SYSENTER reads the task-stack from tss.sp1 */
-	this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
 
 	/*
 	 * Restore %gs if needed (which is common)

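The core of SWITCH_TO_KERNEL_STACK can be modeled in C (illustration only;
the helper, the addresses, and the sizes are hypothetical, and frame_bytes
corresponds to PTREGS_SIZE plus the extra VM86 segment slots where
applicable):

#include <stdint.h>
#include <string.h>

#define SIZEOF_ENTRY_STACK	4096u	/* assumed size for this sketch */

static uint32_t switch_to_kernel_stack(uint32_t esp,
				       uint32_t entry_stack_end,
				       uint32_t task_stack_top,
				       uint32_t frame_bytes)
{
	/* Bail out if the kernel was not entered on the entry stack */
	if (entry_stack_end - esp >= SIZEOF_ENTRY_STACK)
		return esp;

	/* Allocate the frame on the task stack ... */
	uint32_t new_esp = task_stack_top - frame_bytes;

	/*
	 * ... and copy. In the asm, %esp is switched to the task stack
	 * before the 'rep movsl' so that an NMI arriving mid-copy runs
	 * on the task stack instead of overwriting it.
	 */
	memcpy((void *)(uintptr_t)new_esp, (const void *)(uintptr_t)esp,
	       frame_bytes);

	return new_esp;
}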

* [tip:x86/pti] x86/entry/32: Leave the kernel via trampoline stack
  2018-07-18  9:40 ` [PATCH 08/39] x86/entry/32: Leave " Joerg Roedel
@ 2018-07-19 23:22   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, mingo, will.deacon, jgross, tglx, eduval, torvalds,
	dhgutteridge, dave.hansen, peterz, hpa, brgerst, jpoimboe, luto,
	boris.ostrovsky, David.Laight, bp, aarcange, llong, linux-kernel,
	gregkh, jroedel, jkosina, pavel

Commit-ID:  e5862d0515ad970ccec6208ecf5bb0cffe291ea3
Gitweb:     https://git.kernel.org/tip/e5862d0515ad970ccec6208ecf5bb0cffe291ea3
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:45 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:37 +0200

x86/entry/32: Leave the kernel via trampoline stack

Switch back to the trampoline stack before returning to userspace.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-9-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 79 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index fea49ec345ba..a905e6215ea9 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -342,6 +342,60 @@
 .Lend_\@:
 .endm
 
+/*
+ * Switch back from the kernel stack to the entry stack.
+ *
+ * The %esp register must point to pt_regs on the task stack. It will
+ * first calculate the size of the stack-frame to copy, depending on
+ * whether we return to VM86 mode or not. With that it uses 'rep movsl'
+ * to copy the contents of the stack over to the entry stack.
+ *
+ * We must be very careful here, as we can't trust the contents of the
+ * task-stack once we switched to the entry-stack. When an NMI happens
+ * while on the entry-stack, the NMI handler will switch back to the top
+ * of the task stack, overwriting our stack-frame we are about to copy.
+ * Therefore we switch the stack only after everything is copied over.
+ */
+.macro SWITCH_TO_ENTRY_STACK
+
+	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+	/* Bytes to copy */
+	movl	$PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+	testl	$(X86_EFLAGS_VM), PT_EFLAGS(%esp)
+	jz	.Lcopy_pt_regs_\@
+
+	/* Additional 4 registers to copy when returning to VM86 mode */
+	addl    $(4 * 4), %ecx
+
+.Lcopy_pt_regs_\@:
+#endif
+
+	/* Initialize source and destination for movsl */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+	subl	%ecx, %edi
+	movl	%esp, %esi
+
+	/* Save future stack pointer in %ebx */
+	movl	%edi, %ebx
+
+	/* Copy over the stack-frame */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+	/*
+	 * Switch to entry-stack - needs to happen after everything is
+	 * copied because the NMI handler will overwrite the task-stack
+	 * when on entry-stack
+	 */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+
 /*
  * %eax: prev task
  * %edx: next task
@@ -581,25 +635,45 @@ ENTRY(entry_SYSENTER_32)
 
 /* Opportunistic SYSEXIT */
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
+
+	/*
+	 * Setup entry stack - we keep the pointer in %eax and do the
+	 * switch after almost all user-state is restored.
+	 */
+
+	/* Load entry stack pointer and allocate frame for eflags/eax */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %eax
+	subl	$(2*4), %eax
+
+	/* Copy eflags and eax to entry stack */
+	movl	PT_EFLAGS(%esp), %edi
+	movl	PT_EAX(%esp), %esi
+	movl	%edi, (%eax)
+	movl	%esi, 4(%eax)
+
+	/* Restore user registers and segments */
 	movl	PT_EIP(%esp), %edx	/* pt_regs->ip */
 	movl	PT_OLDESP(%esp), %ecx	/* pt_regs->sp */
 1:	mov	PT_FS(%esp), %fs
 	PTGS_TO_GS
+
 	popl	%ebx			/* pt_regs->bx */
 	addl	$2*4, %esp		/* skip pt_regs->cx and pt_regs->dx */
 	popl	%esi			/* pt_regs->si */
 	popl	%edi			/* pt_regs->di */
 	popl	%ebp			/* pt_regs->bp */
-	popl	%eax			/* pt_regs->ax */
+
+	/* Switch to entry stack */
+	movl	%eax, %esp
 
 	/*
 	 * Restore all flags except IF. (We restore IF separately because
 	 * STI gives a one-instruction window in which we won't be interrupted,
 	 * whereas POPF does not.)
 	 */
-	addl	$PT_EFLAGS-PT_DS, %esp	/* point esp at pt_regs->flags */
 	btrl	$X86_EFLAGS_IF_BIT, (%esp)
 	popfl
+	popl	%eax
 
 	/*
 	 * Return back to the vDSO, which will pop ecx and edx.
@@ -668,6 +742,7 @@ ENTRY(entry_INT80_32)
 
 restore_all:
 	TRACE_IRQS_IRET
+	SWITCH_TO_ENTRY_STACK
 .Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:

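The reverse direction follows the same pattern; a minimal C sketch of
SWITCH_TO_ENTRY_STACK (illustration only, hypothetical helper): copy
pt_regs from the task stack to just below tss.sp0 on the entry stack, and
switch stacks only after the copy, because an NMI hitting while already
on the entry stack would overwrite the frame being copied:

#include <stdint.h>
#include <string.h>

static uint32_t switch_to_entry_stack(uint32_t esp,	/* pt_regs on task stack */
				      uint32_t tss_sp0,	/* top of entry stack */
				      uint32_t frame_bytes)
{
	uint32_t new_esp = tss_sp0 - frame_bytes;

	/* Copy first; the task stack is still the live stack */
	memcpy((void *)(uintptr_t)new_esp, (const void *)(uintptr_t)esp,
	       frame_bytes);

	/* Only now is it safe to run on the entry stack */
	return new_esp;
}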

* [tip:x86/pti] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI
  2018-07-18  9:40 ` [PATCH 09/39] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI Joerg Roedel
@ 2018-07-19 23:23   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David.Laight, peterz, hpa, llong, dhgutteridge, luto, dvlasenk,
	jroedel, mingo, eduval, will.deacon, dave.hansen, jpoimboe,
	pavel, aarcange, bp, jgross, boris.ostrovsky, jkosina, gregkh,
	tglx, brgerst, linux-kernel, torvalds

Commit-ID:  8b376fae0514dc7ee04786e2327169e39d12e51b
Gitweb:     https://git.kernel.org/tip/8b376fae0514dc7ee04786e2327169e39d12e51b
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:46 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:38 +0200

x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI

These macros will be used in the NMI handler code and replace plain
SAVE_ALL and RESTORE_REGS there.

The NMI-specific CR3-switch will be added to these macros later.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-10-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index a905e6215ea9..763592596727 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -181,6 +181,9 @@
 
 .endm
 
+.macro SAVE_ALL_NMI
+	SAVE_ALL
+.endm
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
@@ -227,6 +230,10 @@
 	POP_GS_EX
 .endm
 
+.macro RESTORE_ALL_NMI pop=0
+	RESTORE_REGS pop=\pop
+.endm
+
 .macro CHECK_AND_APPLY_ESPFIX
 #ifdef CONFIG_X86_ESPFIX32
 #define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
@@ -1161,7 +1168,7 @@ ENTRY(nmi)
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
-	SAVE_ALL
+	SAVE_ALL_NMI
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
@@ -1189,7 +1196,7 @@ ENTRY(nmi)
 
 .Lnmi_return:
 	CHECK_AND_APPLY_ESPFIX
-	RESTORE_REGS 4
+	RESTORE_ALL_NMI pop=4
 	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1205,12 +1212,12 @@ ENTRY(nmi)
 	pushl	16(%esp)
 	.endr
 	pushl	%eax
-	SAVE_ALL
+	SAVE_ALL_NMI
 	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
-	RESTORE_REGS
+	RESTORE_ALL_NMI
 	lss	12+4(%esp), %esp		# back to espfix stack
 	jmp	.Lirq_return
 #endif


* [tip:x86/pti] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-18  9:40 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
@ 2018-07-19 23:23   ` tip-bot for Joerg Roedel
  2018-10-12 18:29   ` [PATCH 10/39] " Jan Kiszka
  1 sibling, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, dave.hansen, aarcange, llong, peterz, eduval, brgerst,
	mingo, tglx, dvlasenk, linux-kernel, jroedel, bp, dhgutteridge,
	David.Laight, jkosina, pavel, torvalds, gregkh, boris.ostrovsky,
	will.deacon, jpoimboe, hpa, jgross

Commit-ID:  b92a165df17ee6e616e43107730f06bf6ecf5d8d
Gitweb:     https://git.kernel.org/tip/b92a165df17ee6e616e43107730f06bf6ecf5d8d
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:47 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:38 +0200

x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack

It is possible that the kernel is entered from kernel-mode and on the
entry-stack. The most common way this happens is when an exception is
triggered while loading the user-space segment registers on the
kernel-to-userspace exit path.

The segment loading needs to be done after the entry-stack switch, because
the stack-switch needs kernel %fs for per_cpu access.

When this happens, make sure to leave the kernel with the entry-stack
again, so that the interrupted code-path runs on the right stack when
switching to the user-cr3.

Detect this condition on kernel-entry by checking CS.RPL and %esp, and if
it happens, copy over the complete content of the entry stack to the
task-stack.  This needs to be done because once the exception handler is
entered, the task might be scheduled out or even migrated to a different
CPU, so the code cannot rely on the entry-stack contents. Leave a marker in the
stack-frame to detect this condition on the exit path.

On the exit path the copy is reversed, copy all of the remaining task-stack
back to the entry-stack and switch to it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-11-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 116 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 115 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 763592596727..9d6eceba0461 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -294,6 +294,9 @@
  * copied there. So allocate the stack-frame on the task-stack and
  * switch to it before we do any copying.
  */
+
+#define CS_FROM_ENTRY_STACK	(1 << 31)
+
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
@@ -316,6 +319,16 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry2task_stack(%edi), %edi
 
+	/*
+	 * Clear unused upper bits of the dword containing the word-sized CS
+	 * slot in pt_regs in case hardware didn't clear it for us.
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
+	/* Special case - entry from kernel mode via entry stack */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lentry_from_kernel_\@
+
 	/* Bytes to copy */
 	movl	$PTREGS_SIZE, %ecx
 
@@ -329,8 +342,8 @@
 	 */
 	addl	$(4 * 4), %ecx
 
-.Lcopy_pt_regs_\@:
 #endif
+.Lcopy_pt_regs_\@:
 
 	/* Allocate frame on task-stack */
 	subl	%ecx, %edi
@@ -346,6 +359,56 @@
 	cld
 	rep movsl
 
+	jmp .Lend_\@
+
+.Lentry_from_kernel_\@:
+
+	/*
+	 * This handles the case when we enter the kernel from
+	 * kernel-mode and %esp points to the entry-stack. When this
+	 * happens we need to switch to the task-stack to run C code,
+	 * but switch back to the entry-stack again when we approach
+	 * iret and return to the interrupted code-path. This usually
+	 * happens when we hit an exception while restoring user-space
+	 * segment registers on the way back to user-space.
+	 *
+	 * When we switch to the task-stack here, we can't trust the
+	 * contents of the entry-stack anymore, as the exception handler
+	 * might be scheduled out or moved to another CPU. Therefore we
+	 * copy the complete entry-stack to the task-stack and set a
+	 * marker in the iret-frame (bit 31 of the CS dword) to detect
+	 * what we've done on the iret path.
+	 *
+	 * On the iret path we copy everything back and switch to the
+	 * entry-stack, so that the interrupted kernel code-path
+	 * continues on the same stack it was interrupted with.
+	 *
+	 * Be aware that an NMI can happen anytime in this code.
+	 *
+	 * %esi: Entry-Stack pointer (same as %esp)
+	 * %edi: Top of the task stack
+	 */
+
+	/* Calculate number of bytes on the entry stack in %ecx */
+	movl	%esi, %ecx
+
+	/* %ecx to the top of entry-stack */
+	andl	$(MASK_entry_stack), %ecx
+	addl	$(SIZEOF_entry_stack), %ecx
+
+	/* Number of bytes on the entry stack to %ecx */
+	sub	%esi, %ecx
+
+	/* Mark stackframe as coming from entry stack */
+	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+
+	/*
+	 * %esi and %edi are unchanged, %ecx contains the number of
+	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
+	 * the stack-frame on task-stack and copy everything over
+	 */
+	jmp .Lcopy_pt_regs_\@
+
 .Lend_\@:
 .endm
 
@@ -403,6 +466,56 @@
 .Lend_\@:
 .endm
 
+/*
+ * This macro handles the case when we return to kernel-mode on the iret
+ * path and have to switch back to the entry stack.
+ *
+ * See the comments below the .Lentry_from_kernel_\@ label in the
+ * SWITCH_TO_KERNEL_STACK macro for more details.
+ */
+.macro PARANOID_EXIT_TO_KERNEL_MODE
+
+	/*
+	 * Test if we entered the kernel with the entry-stack. Most
+	 * likely we did not, because this code only runs on the
+	 * return-to-kernel path.
+	 */
+	testl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Unlikely slow-path */
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
+
+	/* Copy the remaining task-stack contents to entry-stack */
+	movl	%esp, %esi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+
+	/* Bytes on the task-stack to ecx */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %ecx
+	subl	%esi, %ecx
+
+	/* Allocate stack-frame on entry-stack */
+	subl	%ecx, %edi
+
+	/*
+	 * Save future stack-pointer, we must not switch until the
+	 * copy is done, otherwise the NMI handler could destroy the
+	 * contents of the task-stack we are about to copy.
+	 */
+	movl	%edi, %ebx
+
+	/* Do the copy */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+	/* Safe to switch to entry-stack now */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
 /*
  * %eax: prev task
  * %edx: next task
@@ -764,6 +877,7 @@ restore_all:
 
 restore_all_kernel:
 	TRACE_IRQS_IRET
+	PARANOID_EXIT_TO_KERNEL_MODE
 	RESTORE_REGS 4
 	jmp	.Lirq_return
 

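The marker mechanism itself is compact enough to model in C: CS is only
16 bits wide, so the upper bits of its dword slot in pt_regs are free to
carry state from the entry path to the exit path (the helper names below
are hypothetical):

#include <stdint.h>

#define CS_FROM_ENTRY_STACK	(1u << 31)

/* Entry path: clear the unused bits, then set the marker */
static void mark_from_entry_stack(uint32_t *pt_cs)
{
	*pt_cs &= 0x0000ffffu;
	*pt_cs |= CS_FROM_ENTRY_STACK;
}

/* Exit path: test the marker and clear it before iret */
static int test_and_clear_from_entry_stack(uint32_t *pt_cs)
{
	if (!(*pt_cs & CS_FROM_ENTRY_STACK))
		return 0;
	*pt_cs &= ~CS_FROM_ENTRY_STACK;
	return 1;
}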

* [tip:x86/pti] x86/entry/32: Simplify debug entry point
  2018-07-18  9:40 ` [PATCH 11/39] x86/entry/32: Simplify debug entry point Joerg Roedel
@ 2018-07-19 23:24   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David.Laight, jroedel, jpoimboe, jkosina, luto, pavel, mingo, bp,
	linux-kernel, hpa, boris.ostrovsky, brgerst, dvlasenk, tglx,
	aarcange, eduval, peterz, dhgutteridge, gregkh, will.deacon,
	llong, dave.hansen, jgross, torvalds

Commit-ID:  929b44eb5739bf11d4a9bce85d7346bd955fc24d
Gitweb:     https://git.kernel.org/tip/929b44eb5739bf11d4a9bce85d7346bd955fc24d
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:48 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:39 +0200

x86/entry/32: Simplify debug entry point

The common exception entry code now handles the entry-from-sysenter stack
situation and makes sure to leave the kernel on the same stack it was
entered on.

So the special handling in the debug entry code is no longer needed.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-12-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 35 +++--------------------------------
 1 file changed, 3 insertions(+), 32 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 9d6eceba0461..dbf7d619dcd6 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1226,41 +1226,12 @@ END(common_exception)
 
 ENTRY(debug)
 	/*
-	 * #DB can happen at the first instruction of
-	 * entry_SYSENTER_32 or in Xen's SYSENTER prologue.  If this
-	 * happens, then we will be running on a very small stack.  We
-	 * need to detect this condition and switch to the thread
-	 * stack before calling any C code at all.
-	 *
-	 * If you edit this code, keep in mind that NMIs can happen in here.
+	 * Entry from sysenter is now handled in common_exception
 	 */
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
-
-	SAVE_ALL
-	ENCODE_FRAME_POINTER
-	xorl	%edx, %edx			# error code 0
-	movl	%esp, %eax			# pt_regs pointer
-
-	/* Are we currently on the SYSENTER stack? */
-	movl	PER_CPU_VAR(cpu_entry_area), %ecx
-	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
-	subl	%eax, %ecx	/* ecx = (end of entry_stack) - esp */
-	cmpl	$SIZEOF_entry_stack, %ecx
-	jb	.Ldebug_from_sysenter_stack
-
-	TRACE_IRQS_OFF
-	call	do_debug
-	jmp	ret_from_exception
-
-.Ldebug_from_sysenter_stack:
-	/* We're on the SYSENTER stack.  Switch off. */
-	movl	%esp, %ebx
-	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
-	TRACE_IRQS_OFF
-	call	do_debug
-	movl	%ebx, %esp
-	jmp	ret_from_exception
+	pushl	$do_debug
+	jmp	common_exception
 END(debug)
 
 /*


* [tip:x86/pti] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-07-18  9:40 ` [PATCH 12/39] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points Joerg Roedel
@ 2018-07-19 23:24   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:24 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, eduval, bp, torvalds, aarcange, jkosina, pavel,
	David.Laight, mingo, jpoimboe, brgerst, luto, jgross, dvlasenk,
	hpa, dhgutteridge, peterz, jroedel, dave.hansen, boris.ostrovsky,
	will.deacon, gregkh, llong, linux-kernel

Commit-ID:  e464fb9f241ddf46815b31ca594af96f2699a78e
Gitweb:     https://git.kernel.org/tip/e464fb9f241ddf46815b31ca594af96f2699a78e
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:49 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:39 +0200

x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points

Add unconditional cr3 switches between user and kernel cr3 to all non-NMI
entry and exit points.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-13-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 86 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 82 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index dbf7d619dcd6..60b28dfa00dc 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -77,6 +77,8 @@
 #endif
 .endm
 
+#define PTI_SWITCH_MASK         (1 << PAGE_SHIFT)
+
 /*
  * User gs save/restore
  *
@@ -154,6 +156,33 @@
 
 #endif /* CONFIG_X86_32_LAZY_GS */
 
+/* Unconditionally switch to user cr3 */
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	movl	%cr3, \scratch_reg
+	orl	$PTI_SWITCH_MASK, \scratch_reg
+	movl	\scratch_reg, %cr3
+.Lend_\@:
+.endm
+
+/*
+ * Switch to kernel cr3 if not already loaded and return current cr3 in
+ * \scratch_reg
+ */
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	movl	%cr3, \scratch_reg
+	/* Test if we are already on kernel CR3 */
+	testl	$PTI_SWITCH_MASK, \scratch_reg
+	jz	.Lend_\@
+	andl	$(~PTI_SWITCH_MASK), \scratch_reg
+	movl	\scratch_reg, %cr3
+	/* Return original CR3 in \scratch_reg */
+	orl	$PTI_SWITCH_MASK, \scratch_reg
+.Lend_\@:
+.endm
+
 .macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
 	cld
 	PUSH_GS
@@ -283,7 +312,6 @@
 #endif /* CONFIG_X86_ESPFIX32 */
 .endm
 
-
 /*
  * Called with pt_regs fully populated and kernel segments loaded,
  * so we can access PER_CPU and use the integer registers.
@@ -296,11 +324,19 @@
  */
 
 #define CS_FROM_ENTRY_STACK	(1 << 31)
+#define CS_FROM_USER_CR3	(1 << 30)
 
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
 
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+
+	/*
+	 * %eax now contains the entry cr3 and we carry it forward in
+	 * that register for the time this macro runs
+	 */
+
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %ecx
 	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
@@ -370,7 +406,8 @@
 	 * but switch back to the entry-stack again when we approach
 	 * iret and return to the interrupted code-path. This usually
 	 * happens when we hit an exception while restoring user-space
-	 * segment registers on the way back to user-space.
+	 * segment registers on the way back to user-space or when the
+	 * sysenter handler runs with eflags.tf set.
 	 *
 	 * When we switch to the task-stack here, we can't trust the
 	 * contents of the entry-stack anymore, as the exception handler
@@ -387,6 +424,7 @@
 	 *
 	 * %esi: Entry-Stack pointer (same as %esp)
 	 * %edi: Top of the task stack
+	 * %eax: CR3 on kernel entry
 	 */
 
 	/* Calculate number of bytes on the entry stack in %ecx */
@@ -402,6 +440,14 @@
 	/* Mark stackframe as coming from entry stack */
 	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
 
+	/*
+	 * Test the cr3 used to enter the kernel and add a marker
+	 * so that we can switch back to it before iret.
+	 */
+	testl	$PTI_SWITCH_MASK, %eax
+	jz	.Lcopy_pt_regs_\@
+	orl	$CS_FROM_USER_CR3, PT_CS(%esp)
+
 	/*
 	 * %esi and %edi are unchanged, %ecx contains the number of
 	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
@@ -468,7 +514,7 @@
 
 /*
  * This macro handles the case when we return to kernel-mode on the iret
- * path and have to switch back to the entry stack.
+ * path and have to switch back to the entry stack and/or user-cr3
  *
  * See the comments below the .Lentry_from_kernel_\@ label in the
  * SWITCH_TO_KERNEL_STACK macro for more details.
@@ -514,6 +560,18 @@
 	/* Safe to switch to entry-stack now */
 	movl	%ebx, %esp
 
+	/*
+	 * We came from entry-stack and need to check if we also need to
+	 * switch back to user cr3.
+	 */
+	testl	$CS_FROM_USER_CR3, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_USER_CR3), PT_CS(%esp)
+
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
 .Lend_\@:
 .endm
 /*
@@ -707,7 +765,20 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
+	/*
+	 * On entry-stack with all userspace-regs live - save and
+	 * restore eflags and %eax to use it as scratch-reg for the cr3
+	 * switch.
+	 */
+	pushfl
+	pushl	%eax
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+	popl	%eax
+	popfl
+
+	/* Stack empty again, switch to task stack */
 	movl	TSS_entry2task_stack(%esp), %esp
+
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
@@ -786,6 +857,9 @@ ENTRY(entry_SYSENTER_32)
 	/* Switch to entry stack */
 	movl	%eax, %esp
 
+	/* Now ready to switch the cr3 */
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
 	/*
 	 * Restore all flags except IF. (We restore IF separately because
 	 * STI gives a one-instruction window in which we won't be interrupted,
@@ -866,7 +940,11 @@ restore_all:
 .Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
-	RESTORE_REGS 4				# skip orig_eax/error_code
+	/* Switch back to user CR3 */
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
+	/* Restore user state */
+	RESTORE_REGS pop=4			# skip orig_eax/error_code
 .Lirq_return:
 	/*
 	 * ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization


* [tip:x86/pti] x86/entry/32: Add PTI CR3 switches to NMI handler code
  2018-07-18  9:40 ` [PATCH 13/39] x86/entry/32: Add PTI cr3 switches to NMI handler code Joerg Roedel
@ 2018-07-19 23:25   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, llong, dave.hansen, jpoimboe, luto, mingo, jgross,
	tglx, jkosina, linux-kernel, will.deacon, David.Laight,
	boris.ostrovsky, bp, hpa, eduval, dhgutteridge, dvlasenk,
	brgerst, jroedel, gregkh, pavel, aarcange, peterz

Commit-ID:  b65bef400689ceee7108c2d47fb97ae91f4d1440
Gitweb:     https://git.kernel.org/tip/b65bef400689ceee7108c2d47fb97ae91f4d1440
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:50 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:39 +0200

x86/entry/32: Add PTI CR3 switches to NMI handler code

The NMI handler is special, as it needs to leave with the same CR3 as it
was entered with. This is required because the NMI can happen within kernel
context but with user CR3 already loaded, i.e. after switching to user CR3
but before returning to user space.
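
In C, the invariant that the SAVE_ALL_NMI/RESTORE_ALL_NMI changes below
implement looks roughly like this (a hedged sketch, not the kernel's
verbatim code; PTI_SWITCH_MASK marks the user CR3):

	static void nmi_cr3_invariant_sketch(void)
	{
		unsigned long saved_cr3 = __read_cr3();	/* user OR kernel CR3 */

		/* SAVE_ALL_NMI: switch to the kernel CR3 if necessary */
		if (saved_cr3 & PTI_SWITCH_MASK)
			write_cr3(saved_cr3 & ~PTI_SWITCH_MASK);

		/* ... handle the NMI on the kernel CR3 ... */

		/* RESTORE_ALL_NMI: leave with exactly the CR3 we entered with */
		if (saved_cr3 & PTI_SWITCH_MASK)
			write_cr3(saved_cr3);
	}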

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-14-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 39 +++++++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 60b28dfa00dc..b1541c74c71a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -210,8 +210,19 @@
 
 .endm
 
-.macro SAVE_ALL_NMI
+.macro SAVE_ALL_NMI cr3_reg:req
 	SAVE_ALL
+
+	/*
+	 * Now switch the CR3 when PTI is enabled.
+	 *
+	 * We can enter with either user or kernel cr3; the code will
+	 * store the old cr3 in \cr3_reg and switch to the kernel cr3
+	 * if necessary.
+	 */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=\cr3_reg
+
+.Lend_\@:
 .endm
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
@@ -259,7 +270,23 @@
 	POP_GS_EX
 .endm
 
-.macro RESTORE_ALL_NMI pop=0
+.macro RESTORE_ALL_NMI cr3_reg:req pop=0
+	/*
+	 * Now switch the CR3 when PTI is enabled.
+	 *
+	 * We enter with kernel cr3 and switch the cr3 to the value
+	 * stored in \cr3_reg, which is either a user or a kernel cr3.
+	 */
+	ALTERNATIVE "jmp .Lswitched_\@", "", X86_FEATURE_PTI
+
+	testl	$PTI_SWITCH_MASK, \cr3_reg
+	jz	.Lswitched_\@
+
+	/* User cr3 in \cr3_reg - write it to hardware cr3 */
+	movl	\cr3_reg, %cr3
+
+.Lswitched_\@:
+
 	RESTORE_REGS pop=\pop
 .endm
 
@@ -1331,7 +1358,7 @@ ENTRY(nmi)
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
-	SAVE_ALL_NMI
+	SAVE_ALL_NMI cr3_reg=%edi
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
@@ -1359,7 +1386,7 @@ ENTRY(nmi)
 
 .Lnmi_return:
 	CHECK_AND_APPLY_ESPFIX
-	RESTORE_ALL_NMI pop=4
+	RESTORE_ALL_NMI cr3_reg=%edi pop=4
 	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1375,12 +1402,12 @@ ENTRY(nmi)
 	pushl	16(%esp)
 	.endr
 	pushl	%eax
-	SAVE_ALL_NMI
+	SAVE_ALL_NMI cr3_reg=%edi
 	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
-	RESTORE_ALL_NMI
+	RESTORE_ALL_NMI cr3_reg=%edi
 	lss	12+4(%esp), %esp		# back to espfix stack
 	jmp	.Lirq_return
 #endif


* [tip:x86/pti] x86/entry: Rename update_sp0 to update_task_stack
  2018-07-18  9:40 ` [PATCH 14/39] x86/entry: Rename update_sp0 to update_task_stack Joerg Roedel
@ 2018-07-19 23:25   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, aarcange, dvlasenk, gregkh, peterz, luto, will.deacon, pavel,
	hpa, boris.ostrovsky, linux-kernel, torvalds, eduval, jgross,
	tglx, llong, brgerst, David.Laight, jkosina, jpoimboe, jroedel,
	mingo, dhgutteridge, dave.hansen

Commit-ID:  252e1a0526304f0f3f6888fc09e81cb220f957f3
Gitweb:     https://git.kernel.org/tip/252e1a0526304f0f3f6888fc09e81cb220f957f3
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:51 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:40 +0200

x86/entry: Rename update_sp0 to update_task_stack

The function does not update sp0 anymore but makes the task-stack visible
to the entry code. This is done either by writing it to sp1 or by doing a
hypercall. Rename the function to get rid of the misleading name.
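
The function body is mostly outside the hunk below; its rough shape (a
sketch based on this series' description, not necessarily the verbatim
code) is:

	static inline void update_task_stack(struct task_struct *task)
	{
	#ifdef CONFIG_X86_32
		if (static_cpu_has(X86_FEATURE_XENPV))
			load_sp0(task->thread.sp0);		/* hypercall */
		else
			this_cpu_write(cpu_tss_rw.x86_tss.sp1,	/* write sp1 */
				       task->thread.sp0);
	#else
		/* Xen PV enters the kernel on the thread stack */
		if (static_cpu_has(X86_FEATURE_XENPV))
			load_sp0(task_top_of_stack(task));
	#endif
	}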

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-15-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/switch_to.h | 2 +-
 arch/x86/kernel/process_32.c     | 2 +-
 arch/x86/kernel/process_64.c     | 2 +-
 arch/x86/kernel/vm86_32.c        | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index 8bc2f70b452e..36bd243843d6 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -87,7 +87,7 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 #endif
 
 /* This is used when switching tasks or entering/exiting vm86 mode. */
-static inline void update_sp0(struct task_struct *task)
+static inline void update_task_stack(struct task_struct *task)
 {
 	/* sp0 always points to the entry trampoline stack, which is constant: */
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 0ae659de21eb..2924fd447e61 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -285,7 +285,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 * current_thread_info().  Refresh the SYSENTER configuration in
 	 * case prev or next is vm86.
 	 */
-	update_sp0(next_p);
+	update_task_stack(next_p);
 	refresh_sysenter_cs(next);
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 12bb445fb98d..476e3ddf8890 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -478,7 +478,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
 
 	/* Reload sp0. */
-	update_sp0(next_p);
+	update_task_stack(next_p);
 
 	/*
 	 * Now maybe reload the debug registers and handle I/O bitmaps
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c
index 9d0b5af7db91..1c03e4aa6474 100644
--- a/arch/x86/kernel/vm86_32.c
+++ b/arch/x86/kernel/vm86_32.c
@@ -149,7 +149,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval)
 	preempt_disable();
 	tsk->thread.sp0 = vm86->saved_sp0;
 	tsk->thread.sysenter_cs = __KERNEL_CS;
-	update_sp0(tsk);
+	update_task_stack(tsk);
 	refresh_sysenter_cs(&tsk->thread);
 	vm86->saved_sp0 = 0;
 	preempt_enable();
@@ -374,7 +374,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus)
 		refresh_sysenter_cs(&tsk->thread);
 	}
 
-	update_sp0(tsk);
+	update_task_stack(tsk);
 	preempt_enable();
 
 	if (vm86->flags & VM86_SCREEN_BITMAP)


* [tip:x86/pti] x86/pgtable: Rename pti_set_user_pgd() to pti_set_user_pgtbl()
  2018-07-18  9:40 ` [PATCH 15/39] x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl Joerg Roedel
@ 2018-07-19 23:26   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dhgutteridge, peterz, llong, gregkh, aarcange, eduval, jroedel,
	dvlasenk, David.Laight, pavel, jpoimboe, tglx, boris.ostrovsky,
	brgerst, luto, mingo, hpa, will.deacon, torvalds, bp,
	linux-kernel, jkosina, jgross, dave.hansen

Commit-ID:  23b772883d1ddcf7fdf883614b88b2a6205db4da
Gitweb:     https://git.kernel.org/tip/23b772883d1ddcf7fdf883614b88b2a6205db4da
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:52 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:40 +0200

x86/pgtable: Rename pti_set_user_pgd() to pti_set_user_pgtbl()

The way page-table folding is implemented on 32 bit, this function ends up
setting not only PGDs, but also PUDs and even PMDs. Give it a more generic
name to reflect that.
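
For readers unfamiliar with the folding: on 32 bit the intermediate levels
collapse into one another, so a write at the PMD or PUD level lands in the
same storage as a PGD. A sketch of the folded types (illustrative,
following the generic no-p4d/no-pud/no-pmd headers):

	typedef struct { unsigned long pgd; } pgd_t;
	typedef struct { pgd_t pgd; } p4d_t;	/* p4d folded into pgd */
	typedef struct { p4d_t p4d; } pud_t;	/* pud folded into p4d */
	typedef struct { pud_t pud; } pmd_t;	/* pmd folded into pud */

	/* A PMD write hence reaches the renamed helper as:
	 *
	 *	pmd.pud.p4d.pgd = pti_set_user_pgtbl(&pmdp->pud.p4d.pgd,
	 *					     pmd.pud.p4d.pgd);
	 */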

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-16-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable_64.h | 12 ++++++------
 arch/x86/mm/pti.c                 |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 3c5385f9a88f..9406c4fe1d38 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -196,21 +196,21 @@ static inline bool pgdp_maps_userspace(void *__ptr)
 }
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd);
 
 /*
  * Take a PGD location (pgdp) and a pgd value that needs to be set there.
  * Populates the user and returns the resulting PGD that must be set in
  * the kernel copy of the page tables.
  */
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
 {
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return pgd;
-	return __pti_set_user_pgd(pgdp, pgd);
+	return __pti_set_user_pgtbl(pgdp, pgd);
 }
 #else
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
 {
 	return pgd;
 }
@@ -226,7 +226,7 @@ static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 	}
 
 	pgd = native_make_pgd(native_p4d_val(p4d));
-	pgd = pti_set_user_pgd((pgd_t *)p4dp, pgd);
+	pgd = pti_set_user_pgtbl((pgd_t *)p4dp, pgd);
 	*p4dp = native_make_p4d(native_pgd_val(pgd));
 }
 
@@ -237,7 +237,7 @@ static inline void native_p4d_clear(p4d_t *p4d)
 
 static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd)
 {
-	*pgdp = pti_set_user_pgd(pgdp, pgd);
+	*pgdp = pti_set_user_pgtbl(pgdp, pgd);
 }
 
 static inline void native_pgd_clear(pgd_t *pgd)
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 9277e9ba92b5..71fba17c9d7c 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -117,7 +117,7 @@ enable:
 	setup_force_cpu_cap(X86_FEATURE_PTI);
 }
 
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
 {
 	/*
 	 * Changes to the high (kernel) portion of the kernelmode page


* [tip:x86/pti] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled
  2018-07-18  9:40 ` [PATCH 16/39] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled Joerg Roedel
@ 2018-07-19 23:26   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jgross, jpoimboe, gregkh, will.deacon, dhgutteridge, pavel,
	torvalds, aarcange, llong, linux-kernel, David.Laight, jkosina,
	luto, bp, brgerst, mingo, tglx, hpa, boris.ostrovsky, dvlasenk,
	dave.hansen, eduval, peterz, jroedel

Commit-ID:  7ffcf1497c8ab59a705bfafb7401876fd2f6f71e
Gitweb:     https://git.kernel.org/tip/7ffcf1497c8ab59a705bfafb7401876fd2f6f71e
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:53 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:40 +0200

x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled

With PTI the per-process LDT must be mapped into the kernel address-space
for each process, which requires separate kernel PMDs per PGD.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-17-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable-3level_types.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 6a59a6d0cc50..78038e057876 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -21,9 +21,10 @@ typedef union {
 #endif	/* !__ASSEMBLY__ */
 
 #ifdef CONFIG_PARAVIRT
-#define SHARED_KERNEL_PMD	(pv_info.shared_kernel_pmd)
+#define SHARED_KERNEL_PMD	((!static_cpu_has(X86_FEATURE_PTI) &&	\
+				 (pv_info.shared_kernel_pmd)))
 #else
-#define SHARED_KERNEL_PMD	1
+#define SHARED_KERNEL_PMD	(!static_cpu_has(X86_FEATURE_PTI))
 #endif
 
 /*


* [tip:x86/pti] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled
  2018-07-18  9:40 ` [PATCH 17/39] x86/pgtable/32: Allocate 8k page-tables " Joerg Roedel
@ 2018-07-19 23:27   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dhgutteridge, will.deacon, peterz, mingo, hpa, torvalds, jgross,
	dave.hansen, linux-kernel, pavel, jroedel, jkosina, gregkh, tglx,
	llong, eduval, aarcange, luto, jpoimboe, dvlasenk,
	boris.ostrovsky, David.Laight, bp, brgerst

Commit-ID:  e3238faf20fb1b51a814497751398ab525a2c884
Gitweb:     https://git.kernel.org/tip/e3238faf20fb1b51a814497751398ab525a2c884
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:54 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:41 +0200

x86/pgtable/32: Allocate 8k page-tables when PTI is enabled

Allocate a kernel and a user page-table root when PTI is enabled. Also
allocate a full page per root for PAE because otherwise the bit to flip in
CR3 to switch between them would be non-constant, which creates a lot of
hassle.  Keep that for a later optimization.
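
To see why the full page matters: with the kernel and user roots adjacent
and the pair 8k-aligned, switching between them reduces to flipping one
constant bit in CR3. A sketch (the helper names here are made up;
PTI_PGTABLE_SWITCH_BIT is defined as PAGE_SHIFT later in this series):

	static inline unsigned long kern_to_user_cr3(unsigned long cr3)
	{
		return cr3 | (1UL << PTI_PGTABLE_SWITCH_BIT);	/* set bit 12 */
	}

	static inline unsigned long user_to_kern_cr3(unsigned long cr3)
	{
		return cr3 & ~(1UL << PTI_PGTABLE_SWITCH_BIT);	/* clear bit 12 */
	}

Without the 8k alignment the distance between the two roots would depend
on the allocation, and the entry code could not use a constant mask.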

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-18-git-send-email-joro@8bytes.org

---
 arch/x86/kernel/head_32.S | 20 +++++++++++++++-----
 arch/x86/mm/pgtable.c     |  5 +++--
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index abe6df15a8fb..30f9cb2c0b55 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -512,11 +512,18 @@ ENTRY(initial_code)
 ENTRY(setup_once_ref)
 	.long setup_once
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#define	PGD_ALIGN	(2 * PAGE_SIZE)
+#define PTI_USER_PGD_FILL	1024
+#else
+#define	PGD_ALIGN	(PAGE_SIZE)
+#define PTI_USER_PGD_FILL	0
+#endif
 /*
  * BSS section
  */
 __PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
+	.align PGD_ALIGN
 #ifdef CONFIG_X86_PAE
 .globl initial_pg_pmd
 initial_pg_pmd:
@@ -526,14 +533,17 @@ initial_pg_pmd:
 initial_page_table:
 	.fill 1024,4,0
 #endif
+	.align PGD_ALIGN
 initial_pg_fixmap:
 	.fill 1024,4,0
-.globl empty_zero_page
-empty_zero_page:
-	.fill 4096,1,0
 .globl swapper_pg_dir
+	.align PGD_ALIGN
 swapper_pg_dir:
 	.fill 1024,4,0
+	.fill PTI_USER_PGD_FILL,4,0
+.globl empty_zero_page
+empty_zero_page:
+	.fill 4096,1,0
 EXPORT_SYMBOL(empty_zero_page)
 
 /*
@@ -542,7 +552,7 @@ EXPORT_SYMBOL(empty_zero_page)
 #ifdef CONFIG_X86_PAE
 __PAGE_ALIGNED_DATA
 	/* Page-aligned for the benefit of paravirt? */
-	.align PAGE_SIZE
+	.align PGD_ALIGN
 ENTRY(initial_page_table)
 	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
 # if KPMDS == 3
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 47b5951e592b..db6fb7740bf7 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -343,7 +343,8 @@ static inline pgd_t *_pgd_alloc(void)
 	 * We allocate one page for pgd.
 	 */
 	if (!SHARED_KERNEL_PMD)
-		return (pgd_t *)__get_free_page(PGALLOC_GFP);
+		return (pgd_t *)__get_free_pages(PGALLOC_GFP,
+						 PGD_ALLOCATION_ORDER);
 
 	/*
 	 * Now PAE kernel is not running as a Xen domain. We can allocate
@@ -355,7 +356,7 @@ static inline pgd_t *_pgd_alloc(void)
 static inline void _pgd_free(pgd_t *pgd)
 {
 	if (!SHARED_KERNEL_PMD)
-		free_page((unsigned long)pgd);
+		free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
 	else
 		kmem_cache_free(pgd_cache, pgd);
 }


* [tip:x86/pti] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h
  2018-07-18  9:40 ` [PATCH 18/39] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Joerg Roedel
@ 2018-07-19 23:27   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David.Laight, boris.ostrovsky, bp, peterz, dvlasenk, torvalds,
	mingo, pavel, hpa, jpoimboe, dave.hansen, will.deacon, jkosina,
	jroedel, jgross, eduval, linux-kernel, gregkh, luto, llong,
	aarcange, dhgutteridge, brgerst, tglx

Commit-ID:  8372d66865deb45ee3ec21401a9c80f231b728c8
Gitweb:     https://git.kernel.org/tip/8372d66865deb45ee3ec21401a9c80f231b728c8
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:55 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:41 +0200

x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h

Make them available on 32 bit and keep clone_pgd_range() happy.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-19-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable.h    | 49 +++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/pgtable_64.h | 49 ---------------------------------------
 2 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 5715647fc4fe..eb474329d751 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1155,6 +1155,55 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 }
 #endif
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
+ * (8k-aligned and 8k in size).  The kernel one is at the beginning 4k and
+ * the user one is in the last 4k.  To switch between them, you
+ * just need to flip the 12th bit in their addresses.
+ */
+#define PTI_PGTABLE_SWITCH_BIT	PAGE_SHIFT
+
+/*
+ * This generates better code than the inline assembly in
+ * __set_bit().
+ */
+static inline void *ptr_set_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr |= BIT(bit);
+	return (void *)__ptr;
+}
+static inline void *ptr_clear_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr &= ~BIT(bit);
+	return (void *)__ptr;
+}
+
+static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+{
+	return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
+{
+	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
+{
+	return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
+{
+	return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
 /*
  * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
  *
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 9406c4fe1d38..4adba19c7bf6 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -132,55 +132,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 #endif
 }
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-/*
- * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
- * (8k-aligned and 8k in size).  The kernel one is at the beginning 4k and
- * the user one is in the last 4k.  To switch between them, you
- * just need to flip the 12th bit in their addresses.
- */
-#define PTI_PGTABLE_SWITCH_BIT	PAGE_SHIFT
-
-/*
- * This generates better code than the inline assembly in
- * __set_bit().
- */
-static inline void *ptr_set_bit(void *ptr, int bit)
-{
-	unsigned long __ptr = (unsigned long)ptr;
-
-	__ptr |= BIT(bit);
-	return (void *)__ptr;
-}
-static inline void *ptr_clear_bit(void *ptr, int bit)
-{
-	unsigned long __ptr = (unsigned long)ptr;
-
-	__ptr &= ~BIT(bit);
-	return (void *)__ptr;
-}
-
-static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
-{
-	return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
-{
-	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
-{
-	return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
-{
-	return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-#endif /* CONFIG_PAGE_TABLE_ISOLATION */
-
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.


* [tip:x86/pti] x86/pgtable: Move pti_set_user_pgtbl() to pgtable.h
  2018-07-18  9:40 ` [PATCH 19/39] x86/pgtable: Move pti_set_user_pgtbl() " Joerg Roedel
@ 2018-07-19 23:28   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, jkosina, luto, tglx, aarcange, bp, llong, eduval,
	dave.hansen, peterz, jroedel, jpoimboe, linux-kernel,
	boris.ostrovsky, dvlasenk, gregkh, torvalds, pavel, David.Laight,
	will.deacon, hpa, brgerst, jgross, dhgutteridge

Commit-ID:  fcbbd977572cfe5a3dcc97d663bf7480431a07ca
Gitweb:     https://git.kernel.org/tip/fcbbd977572cfe5a3dcc97d663bf7480431a07ca
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:56 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:42 +0200

x86/pgtable: Move pti_set_user_pgtbl() to pgtable.h

There it is also usable from 32 bit code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-20-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index eb474329d751..cc117161f13d 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -640,8 +640,31 @@ static inline int is_new_memtype_allowed(u64 paddr, unsigned long size,
 
 pmd_t *populate_extra_pmd(unsigned long vaddr);
 pte_t *populate_extra_pte(unsigned long vaddr);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
+{
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return pgd;
+	return __pti_set_user_pgtbl(pgdp, pgd);
+}
+#else   /* CONFIG_PAGE_TABLE_ISOLATION */
+static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
+{
+	return pgd;
+}
+#endif  /* CONFIG_PAGE_TABLE_ISOLATION */
+
 #endif	/* __ASSEMBLY__ */
 
+
 #ifdef CONFIG_X86_32
 # include <asm/pgtable_32.h>
 #else


* [tip:x86/pti] x86/pgtable: Move two more functions from pgtable_64.h to pgtable.h
  2018-07-18  9:40 ` [PATCH 20/39] x86/pgtable: Move two more functions from pgtable_64.h " Joerg Roedel
@ 2018-07-19 23:28   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave.hansen, linux-kernel, mingo, torvalds, bp, dhgutteridge,
	jgross, boris.ostrovsky, tglx, jkosina, aarcange, jpoimboe,
	pavel, hpa, luto, gregkh, David.Laight, jroedel, peterz, brgerst,
	will.deacon, eduval, llong, dvlasenk

Commit-ID:  76e258add7b653b60037ee4b25ebc40da6a35c4a
Gitweb:     https://git.kernel.org/tip/76e258add7b653b60037ee4b25ebc40da6a35c4a
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:57 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:42 +0200

x86/pgtable: Move two more functions from pgtable_64.h to pgtable.h

These two functions are required for PTI on 32 bit:

	* pgdp_maps_userspace()
	* pgd_large()

Also re-implement pgdp_maps_userspace() so that it will work on 64 and 32
bit kernels.
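
Worked example of the new formula (the configuration values are
assumptions for illustration):

	/*
	 * 32-bit PAE, VMSPLIT_3G: CONFIG_PAGE_OFFSET = 0xC0000000 and
	 * PGDIR_SHIFT = 30, so PGD_KERNEL_START = 0xC0000000 >> 30 = 3.
	 * PGD entries 0-2 map userspace, entry 3 maps the kernel.
	 *
	 * 64 bit: PGD_KERNEL_START = (PAGE_SIZE / 2) / sizeof(pgd_t)
	 * = 2048 / 8 = 256, i.e. the lower half of the page - matching
	 * the old "(ptr & ~PAGE_MASK) < (PAGE_SIZE / 2)" check.
	 */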

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-21-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable-2level_types.h |  3 +++
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 arch/x86/include/asm/pgtable.h              | 15 ++++++++++++
 arch/x86/include/asm/pgtable_32.h           |  2 --
 arch/x86/include/asm/pgtable_64.h           | 36 -----------------------------
 arch/x86/include/asm/pgtable_64_types.h     |  2 ++
 6 files changed, 21 insertions(+), 38 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level_types.h b/arch/x86/include/asm/pgtable-2level_types.h
index f982ef808e7e..6deb6cd236e3 100644
--- a/arch/x86/include/asm/pgtable-2level_types.h
+++ b/arch/x86/include/asm/pgtable-2level_types.h
@@ -35,4 +35,7 @@ typedef union {
 
 #define PTRS_PER_PTE	1024
 
+/* This covers all VMSPLIT_* and VMSPLIT_*_OPT variants */
+#define PGD_KERNEL_START	(CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
+
 #endif /* _ASM_X86_PGTABLE_2LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 78038e057876..858358a82b14 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -46,5 +46,6 @@ typedef union {
 #define PTRS_PER_PTE	512
 
 #define MAX_POSSIBLE_PHYSMEM_BITS	36
+#define PGD_KERNEL_START	(CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
 
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index cc117161f13d..e39088cb59ab 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1177,6 +1177,21 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 	}
 }
 #endif
+/*
+ * Page table pages are page-aligned.  The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
+ */
+static inline bool pgdp_maps_userspace(void *__ptr)
+{
+	unsigned long ptr = (unsigned long)__ptr;
+
+	return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < PGD_KERNEL_START);
+}
+
+static inline int pgd_large(pgd_t pgd) { return 0; }
 
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 /*
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index 88a056b01db4..b3ec519e3982 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -34,8 +34,6 @@ static inline void check_pgt_cache(void) { }
 void paging_init(void);
 void sync_initial_page_table(void);
 
-static inline int pgd_large(pgd_t pgd) { return 0; }
-
 /*
  * Define this if things work differently on an i386 and an i486:
  * it will (on an i486) warn about kernel memory accesses that are
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 4adba19c7bf6..acb6970e7bcf 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -132,41 +132,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 #endif
 }
 
-/*
- * Page table pages are page-aligned.  The lower half of the top
- * level is used for userspace and the top half for the kernel.
- *
- * Returns true for parts of the PGD that map userspace and
- * false for the parts that map the kernel.
- */
-static inline bool pgdp_maps_userspace(void *__ptr)
-{
-	unsigned long ptr = (unsigned long)__ptr;
-
-	return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
-}
-
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd);
-
-/*
- * Take a PGD location (pgdp) and a pgd value that needs to be set there.
- * Populates the user and returns the resulting PGD that must be set in
- * the kernel copy of the page tables.
- */
-static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
-{
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		return pgd;
-	return __pti_set_user_pgtbl(pgdp, pgd);
-}
-#else
-static inline pgd_t pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
-{
-	return pgd;
-}
-#endif
-
 static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
 	pgd_t pgd;
@@ -206,7 +171,6 @@ extern void sync_global_pgds(unsigned long start, unsigned long end);
 /*
  * Level 4 access.
  */
-static inline int pgd_large(pgd_t pgd) { return 0; }
 #define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE)
 
 /* PUD - Level3 access */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 054765ab2da2..066e0ab9ffab 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -153,4 +153,6 @@ extern unsigned int ptrs_per_p4d;
 
 #define EARLY_DYNAMIC_PAGE_TABLES	64
 
+#define PGD_KERNEL_START	((PAGE_SIZE / 2) / sizeof(pgd_t))
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */


* [tip:x86/pti] x86/mm/pae: Populate valid user PGD entries
  2018-07-18  9:40 ` [PATCH 21/39] x86/mm/pae: Populate valid user PGD entries Joerg Roedel
@ 2018-07-19 23:29   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: will.deacon, jroedel, tglx, dave.hansen, pavel, hpa, luto,
	gregkh, David.Laight, dhgutteridge, peterz, jkosina, bp, llong,
	dvlasenk, boris.ostrovsky, eduval, torvalds, brgerst, aarcange,
	mingo, linux-kernel, jpoimboe, jgross

Commit-ID:  6c0df8689494e1fefa685377676fa8192291a0eb
Gitweb:     https://git.kernel.org/tip/6c0df8689494e1fefa685377676fa8192291a0eb
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:58 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:42 +0200

x86/mm/pae: Populate valid user PGD entries

Generic page-table code populates all non-leaf entries with _KERNPG_TABLE
bits set. This is fine for all paging modes except PAE.

In PAE mode only a subset of the bits is allowed to be set.  Make sure to
only set allowed bits by masking out the reserved bits.
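
A concrete example of the masking (bit values taken from pgtable_types.h;
the scenario itself is illustrative):

	/*
	 * A PGD entry built with _KERNPG_TABLE bits carries P, RW,
	 * ACCESSED and DIRTY.  RW, ACCESSED and DIRTY are reserved-MBZ
	 * in a PAE PGD, so native_make_pgd() masks them out:
	 *
	 *	val                    = 0x1000 | P | RW | A | D = 0x1063
	 *	val & PGD_ALLOWED_BITS = 0x1000 | P              = 0x1001
	 */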

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-22-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable_types.h | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 99fff853c944..b64acb08a62b 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -50,6 +50,7 @@
 #define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
 #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
+#define _PAGE_SOFTW3	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW3)
 #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
 #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
 #define _PAGE_SPECIAL	(_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
@@ -266,14 +267,37 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
 
 typedef struct { pgdval_t pgd; } pgd_t;
 
+#ifdef CONFIG_X86_PAE
+
+/*
+ * PHYSICAL_PAGE_MASK might be non-constant when SME is compiled in, so we can't
+ * use it here.
+ */
+
+#define PGD_PAE_PAGE_MASK	((signed long)PAGE_MASK)
+#define PGD_PAE_PHYS_MASK	(((1ULL << __PHYSICAL_MASK_SHIFT)-1) & PGD_PAE_PAGE_MASK)
+
+/*
+ * PAE allows Base Address, P, PWT, PCD and AVL bits to be set in PGD entries.
+ * All other bits are Reserved MBZ
+ */
+#define PGD_ALLOWED_BITS	(PGD_PAE_PHYS_MASK | _PAGE_PRESENT | \
+				 _PAGE_PWT | _PAGE_PCD | \
+				 _PAGE_SOFTW1 | _PAGE_SOFTW2 | _PAGE_SOFTW3)
+
+#else
+/* No need to mask any bits for !PAE */
+#define PGD_ALLOWED_BITS	(~0ULL)
+#endif
+
 static inline pgd_t native_make_pgd(pgdval_t val)
 {
-	return (pgd_t) { val };
+	return (pgd_t) { val & PGD_ALLOWED_BITS };
 }
 
 static inline pgdval_t native_pgd_val(pgd_t pgd)
 {
-	return pgd.pgd;
+	return pgd.pgd & PGD_ALLOWED_BITS;
 }
 
 static inline pgdval_t pgd_flags(pgd_t pgd)


* [tip:x86/pti] x86/mm/pae: Populate the user page-table with user pgd's
  2018-07-18  9:40 ` [PATCH 22/39] x86/mm/pae: Populate the user page-table with user pgd's Joerg Roedel
@ 2018-07-19 23:30   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, peterz, jpoimboe, hpa, David.Laight, llong, jgross,
	boris.ostrovsky, mingo, jroedel, luto, will.deacon, eduval,
	dave.hansen, aarcange, linux-kernel, dhgutteridge, dvlasenk,
	gregkh, pavel, torvalds, jkosina, bp, brgerst

Commit-ID:  9b7b8bbd7f6ba4ef7caa5a078ead70237e12d045
Gitweb:     https://git.kernel.org/tip/9b7b8bbd7f6ba4ef7caa5a078ead70237e12d045
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:40:59 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:43 +0200

x86/mm/pae: Populate the user page-table with user pgd's

When a PGD entry is populated, make sure to populate it in the user
page-table too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-23-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable-3level.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index f24df59c40b2..f2ca3139ca22 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -98,6 +98,9 @@ static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline void native_set_pud(pud_t *pudp, pud_t pud)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pud.p4d.pgd = pti_set_user_pgtbl(&pudp->p4d.pgd, pud.p4d.pgd);
+#endif
 	set_64bit((unsigned long long *)(pudp), native_pud_val(pud));
 }
 
@@ -229,6 +232,10 @@ static inline pud_t native_pudp_get_and_clear(pud_t *pudp)
 {
 	union split_pud res, *orig = (union split_pud *)pudp;
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgtbl(&pudp->p4d.pgd, __pgd(0));
+#endif
+
 	/* xchg acts as a barrier before setting of the high bits */
 	res.pud_low = xchg(&orig->pud_low, 0);
 	res.pud_high = orig->pud_high;


* [tip:x86/pti] x86/mm/legacy: Populate the user page-table with user pgd's
  2018-07-18  9:41 ` [PATCH 23/39] x86/mm/legacy: " Joerg Roedel
@ 2018-07-19 23:30   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:30 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dhgutteridge, dvlasenk, jroedel, hpa, gregkh, peterz,
	David.Laight, jgross, luto, linux-kernel, brgerst, bp, llong,
	dave.hansen, aarcange, pavel, eduval, will.deacon, mingo,
	torvalds, jkosina, jpoimboe, tglx, boris.ostrovsky

Commit-ID:  1f40a46cf47c12d93a5ad9dccd82bd36ff8f956a
Gitweb:     https://git.kernel.org/tip/1f40a46cf47c12d93a5ad9dccd82bd36ff8f956a
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:00 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:43 +0200

x86/mm/legacy: Populate the user page-table with user pgd's

Also populate the user-space pgd's in the user page-table.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-24-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable-2level.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 685ffe8a0eaf..c399ea5eea41 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -19,6 +19,9 @@ static inline void native_set_pte(pte_t *ptep , pte_t pte)
 
 static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pmd.pud.p4d.pgd = pti_set_user_pgtbl(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd);
+#endif
 	*pmdp = pmd;
 }
 
@@ -58,6 +61,9 @@ static inline pte_t native_ptep_get_and_clear(pte_t *xp)
 #ifdef CONFIG_SMP
 static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgtbl(&xp->pud.p4d.pgd, __pgd(0));
+#endif
 	return __pmd(xchg((pmdval_t *)xp, 0));
 }
 #else
@@ -67,6 +73,9 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 #ifdef CONFIG_SMP
 static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgtbl(&xp->p4d.pgd, __pgd(0));
+#endif
 	return __pud(xchg((pudval_t *)xp, 0));
 }
 #else


* [tip:x86/pti] x86/mm/pti: Add an overflow check to pti_clone_pmds()
  2018-07-18  9:41 ` [PATCH 24/39] x86/mm/pti: Add an overflow check to pti_clone_pmds() Joerg Roedel
@ 2018-07-19 23:31   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, bp, dave.hansen, boris.ostrovsky,
	jpoimboe, brgerst, David.Laight, jkosina, pavel, tglx, dvlasenk,
	dhgutteridge, jroedel, hpa, llong, peterz, aarcange, eduval,
	jgross, mingo, will.deacon, gregkh, luto

Commit-ID:  935232ce28dfabff1171e5a7113b2d865fa9ee63
Gitweb:     https://git.kernel.org/tip/935232ce28dfabff1171e5a7113b2d865fa9ee63
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:01 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:43 +0200

x86/mm/pti: Add an overflow check to pti_clone_pmds()

The addr counter will overflow if the last PMD of the address space is
cloned, resulting in an endless loop.

Check for that and bail out of the loop when it happens.
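
A minimal userspace demonstration of the wrap-around (assuming a 32-bit
address space and 2MB PMDs; the kernel loop itself differs in detail):

	#include <stdio.h>

	int main(void)
	{
		unsigned int start = 0xffe00000u;	/* last 2MB PMD */
		unsigned int end   = 0xffffffffu;
		unsigned int addr;

		for (addr = start; addr < end; addr += 0x200000u) {
			if (addr < start) {	/* the new overflow check */
				puts("wrapped - bailing out");
				break;
			}
			printf("clone PMD at 0x%08x\n", addr);
			/* addr += 0x200000 wraps to 0 here; without the
			 * check the loop would never terminate */
		}
		return 0;
	}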

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-25-git-send-email-joro@8bytes.org

---
 arch/x86/mm/pti.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 71fba17c9d7c..79217868dd13 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -297,6 +297,10 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 		p4d_t *p4d;
 		pud_t *pud;
 
+		/* Overflow check */
+		if (addr < start)
+			break;
+
 		pgd = pgd_offset_k(addr);
 		if (WARN_ON(pgd_none(*pgd)))
 			return;


* [tip:x86/pti] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  2018-07-18  9:41 ` [PATCH 25/39] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Joerg Roedel
@ 2018-07-19 23:31   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:31 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, brgerst, jgross, dvlasenk, gregkh, jkosina, jpoimboe,
	eduval, dave.hansen, linux-kernel, bp, hpa, dhgutteridge,
	torvalds, boris.ostrovsky, llong, tglx, will.deacon, jroedel,
	aarcange, David.Laight, mingo, luto, pavel

Commit-ID:  2c1b9fbe83412598d2dccdd448147336b085e0c6
Gitweb:     https://git.kernel.org/tip/2c1b9fbe83412598d2dccdd448147336b085e0c6
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:02 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:44 +0200

x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32

Move it out of the X86_64-specific processor defines so that it is visible
for 32 bit too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-26-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/processor-flags.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 625a52a5594f..02c2cbda4a74 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -39,10 +39,6 @@
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-# define X86_CR3_PTI_PCID_USER_BIT	11
-#endif
-
 #else
 /*
  * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
@@ -53,4 +49,8 @@
 #define CR3_NOFLUSH	0
 #endif
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_PCID_USER_BIT	11
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */


* [tip:x86/pti] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32
  2018-07-18  9:41 ` [PATCH 26/39] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level " Joerg Roedel
@ 2018-07-19 23:32   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: aarcange, peterz, gregkh, dvlasenk, tglx, hpa, llong,
	dave.hansen, torvalds, jpoimboe, eduval, mingo, brgerst, luto,
	jgross, boris.ostrovsky, bp, will.deacon, linux-kernel,
	dhgutteridge, David.Laight, jroedel, jkosina, pavel

Commit-ID:  f94560cd6b5117f8913f4c42f4d9a405c26ddc1c
Gitweb:     https://git.kernel.org/tip/f94560cd6b5117f8913f4c42f4d9a405c26ddc1c
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:03 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:44 +0200

x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32

Cloning on the P4D level would clone the complete kernel address space into
the user-space page-tables for PAE kernels. Cloning on PMD level is fine
for PAE and legacy paging.
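
The arithmetic behind this (illustrative, PAE values):

	/*
	 * Coverage per level on 32-bit PAE:
	 *
	 *	PGD entry: 1 << PGDIR_SHIFT = 1 << 30 = 1GB
	 *	PMD entry: 1 << PMD_SHIFT   = 1 << 21 = 2MB
	 *
	 * With VMSPLIT_3G the whole kernel fits into the single top
	 * 1GB PGD entry, so cloning at that level would share
	 * everything; cloning 2MB PMDs shares only the intended range.
	 */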

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-27-git-send-email-joro@8bytes.org

---
 arch/x86/mm/pti.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 79217868dd13..a594e3b6401a 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -348,6 +348,7 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 	}
 }
 
+#ifdef CONFIG_X86_64
 /*
  * Clone a single p4d (i.e. a top-level entry on 4-level systems and a
  * next-level entry on 5-level systems).
@@ -371,6 +372,25 @@ static void __init pti_clone_user_shared(void)
 	pti_clone_p4d(CPU_ENTRY_AREA_BASE);
 }
 
+#else /* CONFIG_X86_64 */
+
+/*
+ * On 32 bit PAE systems with 1GB of Kernel address space there is only
+ * one pgd/p4d for the whole kernel. Cloning that would map the whole
+ * address space into the user page-tables, making PTI useless. So clone
+ * the page-table on the PMD level to prevent that.
+ */
+static void __init pti_clone_user_shared(void)
+{
+	unsigned long start, end;
+
+	start = CPU_ENTRY_AREA_BASE;
+	end   = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES);
+
+	pti_clone_pmds(start, end, 0);
+}
+#endif /* CONFIG_X86_64 */
+
 /*
  * Clone the ESPFIX P4D into the user space visible page table
  */
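
For illustration, a minimal user-space sketch of the arithmetic behind this
choice. The entry-area size below is an assumed placeholder, not the
kernel's actual value; the point is the difference in granularity:

	/* Illustration only, not kernel code: coverage of a single
	 * top-level (pgd/p4d) clone on 32-bit PAE versus a
	 * PMD-granular clone of the CPU entry area. */
	#include <stdio.h>

	#define PAE_P4D_COVERAGE	(1ULL << 30)	/* 4 entries x 1 GiB */
	#define PMD_SIZE		(1ULL << 21)	/* 2 MiB with PAE */
	#define ENTRY_AREA_SIZE		(40 * 4096ULL)	/* assumed size */

	int main(void)
	{
		unsigned long long pmds = (ENTRY_AREA_SIZE + PMD_SIZE - 1) / PMD_SIZE;

		printf("p4d-level clone maps %llu MiB\n", PAE_P4D_COVERAGE >> 20);
		printf("PMD-level clone maps %llu MiB (%llu PMDs)\n",
		       (pmds * PMD_SIZE) >> 20, pmds);
		return 0;
	}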


* [tip:x86/pti] x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit
  2018-07-18  9:41 ` [PATCH 27/39] x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit Joerg Roedel
@ 2018-07-19 23:32   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:32 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: gregkh, boris.ostrovsky, will.deacon, dvlasenk, llong, jkosina,
	pavel, luto, peterz, torvalds, dhgutteridge, eduval, jgross,
	tglx, bp, brgerst, David.Laight, jroedel, dave.hansen, jpoimboe,
	linux-kernel, hpa, aarcange, mingo

Commit-ID:  39d668e04edad25abe184fb329ce35a131146ee5
Gitweb:     https://git.kernel.org/tip/39d668e04edad25abe184fb329ce35a131146ee5
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:04 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:44 +0200

x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit

The pti_clone_kernel_text() function references __end_rodata_hpage_align,
which is only present on x86-64.  This makes sense as the end of the rodata
section is not huge-page aligned on 32 bit.

Nevertheless, the function needs a symbol that points at the right address
for both 32 and 64 bit. Introduce __end_rodata_aligned for that purpose and
use it in pti_clone_kernel_text().

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-28-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/sections.h |  1 +
 arch/x86/kernel/vmlinux.lds.S   | 17 ++++++++++-------
 arch/x86/mm/pti.c               |  2 +-
 3 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
index 5c019d23d06b..4a911a382ade 100644
--- a/arch/x86/include/asm/sections.h
+++ b/arch/x86/include/asm/sections.h
@@ -7,6 +7,7 @@
 
 extern char __brk_base[], __brk_limit[];
 extern struct exception_table_entry __stop___ex_table[];
+extern char __end_rodata_aligned[];
 
 #if defined(CONFIG_X86_64)
 extern char __end_rodata_hpage_align[];
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 5e1458f609a1..8bde0a419f86 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -55,19 +55,22 @@ jiffies_64 = jiffies;
  * so we can enable protection checks as well as retain 2MB large page
  * mappings for kernel text.
  */
-#define X64_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
+#define X86_ALIGN_RODATA_BEGIN	. = ALIGN(HPAGE_SIZE);
 
-#define X64_ALIGN_RODATA_END					\
+#define X86_ALIGN_RODATA_END					\
 		. = ALIGN(HPAGE_SIZE);				\
-		__end_rodata_hpage_align = .;
+		__end_rodata_hpage_align = .;			\
+		__end_rodata_aligned = .;
 
 #define ALIGN_ENTRY_TEXT_BEGIN	. = ALIGN(PMD_SIZE);
 #define ALIGN_ENTRY_TEXT_END	. = ALIGN(PMD_SIZE);
 
 #else
 
-#define X64_ALIGN_RODATA_BEGIN
-#define X64_ALIGN_RODATA_END
+#define X86_ALIGN_RODATA_BEGIN
+#define X86_ALIGN_RODATA_END					\
+		. = ALIGN(PAGE_SIZE);				\
+		__end_rodata_aligned = .;
 
 #define ALIGN_ENTRY_TEXT_BEGIN
 #define ALIGN_ENTRY_TEXT_END
@@ -141,9 +144,9 @@ SECTIONS
 
 	/* .text should occupy whole number of pages */
 	. = ALIGN(PAGE_SIZE);
-	X64_ALIGN_RODATA_BEGIN
+	X86_ALIGN_RODATA_BEGIN
 	RO_DATA(PAGE_SIZE)
-	X64_ALIGN_RODATA_END
+	X86_ALIGN_RODATA_END
 
 	/* Data */
 	.data : AT(ADDR(.data) - LOAD_OFFSET) {
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index a594e3b6401a..453d23760941 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -470,7 +470,7 @@ void pti_clone_kernel_text(void)
 	 * clone the areas past rodata, they might contain secrets.
 	 */
 	unsigned long start = PFN_ALIGN(_text);
-	unsigned long end = (unsigned long)__end_rodata_hpage_align;
+	unsigned long end = (unsigned long)__end_rodata_aligned;
 
 	if (!pti_kernel_image_global_ok())
 		return;
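
The new symbol declared in asm/sections.h above is consumed from C like any
other linker-provided section marker; a minimal sketch:

	/* Sketch only: reading a linker-script symbol from C. The
	 * extern declaration matches the hunk above; the address of
	 * the array is the end of the (aligned) rodata section. */
	extern char __end_rodata_aligned[];

	static unsigned long rodata_end(void)
	{
		return (unsigned long)__end_rodata_aligned;
	}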


* [tip:x86/pti] x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text()
  2018-07-18  9:41 ` [PATCH 28/39] x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text() Joerg Roedel
@ 2018-07-19 23:33   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, jpoimboe, dave.hansen, tglx, peterz, pavel, bp, luto,
	mingo, boris.ostrovsky, jkosina, aarcange, David.Laight, hpa,
	dvlasenk, will.deacon, jgross, linux-kernel, dhgutteridge, llong,
	brgerst, jroedel, eduval, gregkh

Commit-ID:  1ac228a7c87f697d1d01eb6362a6b5246705b0dd
Gitweb:     https://git.kernel.org/tip/1ac228a7c87f697d1d01eb6362a6b5246705b0dd
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:05 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:45 +0200

x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text()

Mapping the kernel text area to user-space only makes sense if it has the
same permissions as in the kernel page-table.  If permissions differ, this
will cause a TLB reload when using the kernel page-table, which is as good
as not mapping it at all.

On 64-bit kernels this patch makes no difference, as the whole range cloned
by pti_clone_kernel_text() is mapped RO anyway. On 32 bit there are
writeable mappings in the range, so just keep the permissions as they are.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-29-git-send-email-joro@8bytes.org

---
 arch/x86/mm/pti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 453d23760941..e41ee93c430d 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -482,7 +482,7 @@ void pti_clone_kernel_text(void)
 	 * pti_set_kernel_image_nonglobal() did to clear the
 	 * global bit.
 	 */
-	pti_clone_pmds(start, end, _PAGE_RW);
+	pti_clone_pmds(start, end, 0);
 }
 
 /*
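
A minimal sketch of what the third argument changes, simplified from
pti_clone_pmds(): each cloned PMD is the kernel PMD with the 'clear' bits
masked out, so a mask of 0 preserves the kernel permissions while _PAGE_RW
forced the user copy read-only:

	/* Simplified model of the cloning step in pti_clone_pmds(). */
	#include <stdint.h>

	#define _PAGE_RW	(1ULL << 1)	/* x86 R/W bit */

	static uint64_t clone_pmd(uint64_t kernel_pmd, uint64_t clear)
	{
		return kernel_pmd & ~clear;	/* user copy */
	}
	/* clone_pmd(pmd, _PAGE_RW): strips write permission (old code).
	 * clone_pmd(pmd, 0):        keeps the permissions as they are. */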


* [tip:x86/pti] x86/mm/pti: Introduce pti_finalize()
  2018-07-18  9:41 ` [PATCH 29/39] x86/mm/pti: Introduce pti_finalize() Joerg Roedel
@ 2018-07-19 23:33   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, dave.hansen, llong, tglx, brgerst, jgross, peterz,
	pavel, mingo, jpoimboe, boris.ostrovsky, luto, will.deacon,
	dhgutteridge, jroedel, aarcange, eduval, jkosina, gregkh, hpa,
	David.Laight, dvlasenk, bp, linux-kernel

Commit-ID:  b976690f5db26fbc7c2be413bfa0fbd270547a94
Gitweb:     https://git.kernel.org/tip/b976690f5db26fbc7c2be413bfa0fbd270547a94
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:06 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:45 +0200

x86/mm/pti: Introduce pti_finalize()

Introduce a new function to finalize the kernel mappings in the userspace
page-table after all RO/NX protections have been applied to the kernel
mappings.

Also move the call to pti_clone_kernel_text() to that function so that it
will run on 32 bit kernels too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-30-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pti.h |  3 +--
 arch/x86/mm/init_64.c      |  6 ------
 arch/x86/mm/pti.c          | 14 +++++++++++++-
 include/linux/pti.h        |  1 +
 init/main.c                |  7 +++++++
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/pti.h b/arch/x86/include/asm/pti.h
index 38a17f1d5c9d..5df09a0b80b8 100644
--- a/arch/x86/include/asm/pti.h
+++ b/arch/x86/include/asm/pti.h
@@ -6,10 +6,9 @@
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 extern void pti_init(void);
 extern void pti_check_boottime_disable(void);
-extern void pti_clone_kernel_text(void);
+extern void pti_finalize(void);
 #else
 static inline void pti_check_boottime_disable(void) { }
-static inline void pti_clone_kernel_text(void) { }
 #endif
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a688617c727e..9b19f9a8948e 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1291,12 +1291,6 @@ void mark_rodata_ro(void)
 			(unsigned long) __va(__pa_symbol(_sdata)));
 
 	debug_checkwx();
-
-	/*
-	 * Do this after all of the manipulation of the
-	 * kernel text page tables are complete.
-	 */
-	pti_clone_kernel_text();
 }
 
 int kern_addr_valid(unsigned long addr)
diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index e41ee93c430d..fcfb815d420f 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -462,7 +462,7 @@ static inline bool pti_kernel_image_global_ok(void)
  * For some configurations, map all of kernel text into the user page
  * tables.  This reduces TLB misses, especially on non-PCID systems.
  */
-void pti_clone_kernel_text(void)
+static void pti_clone_kernel_text(void)
 {
 	/*
 	 * rodata is part of the kernel image and is normally
@@ -526,3 +526,15 @@ void __init pti_init(void)
 	pti_setup_espfix64();
 	pti_setup_vsyscall();
 }
+
+/*
+ * Finalize the kernel mappings in the userspace page-table.
+ */
+void pti_finalize(void)
+{
+	/*
+	 * Do this after all of the manipulation of the
+	 * kernel text page tables are complete.
+	 */
+	pti_clone_kernel_text();
+}
diff --git a/include/linux/pti.h b/include/linux/pti.h
index 0174883a935a..1a941efcaa62 100644
--- a/include/linux/pti.h
+++ b/include/linux/pti.h
@@ -6,6 +6,7 @@
 #include <asm/pti.h>
 #else
 static inline void pti_init(void) { }
+static inline void pti_finalize(void) { }
 #endif
 
 #endif
diff --git a/init/main.c b/init/main.c
index 3b4ada11ed52..fcfef46c935a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1065,6 +1065,13 @@ static int __ref kernel_init(void *unused)
 	jump_label_invalidate_initmem();
 	free_initmem();
 	mark_readonly();
+
+	/*
+	 * Kernel mappings are now finalized - update the userspace page-table
+	 * to finalize PTI.
+	 */
+	pti_finalize();
+
 	system_state = SYSTEM_RUNNING;
 	numa_default_policy();
 


* [tip:x86/pti] x86/mm/pti: Clone entry-text again in pti_finalize()
  2018-07-18  9:41 ` [PATCH 30/39] x86/mm/pti: Clone entry-text again in pti_finalize() Joerg Roedel
@ 2018-07-19 23:34   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: David.Laight, mingo, hpa, dhgutteridge, pavel, dvlasenk, brgerst,
	peterz, tglx, jkosina, bp, luto, jpoimboe, gregkh, aarcange,
	will.deacon, torvalds, dave.hansen, linux-kernel,
	boris.ostrovsky, jroedel, llong, eduval, jgross

Commit-ID:  ba0364e260ab37c02975557dbecc014a26072236
Gitweb:     https://git.kernel.org/tip/ba0364e260ab37c02975557dbecc014a26072236
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:07 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:45 +0200

x86/mm/pti: Clone entry-text again in pti_finalize()

The mapping for the entry-text might have changed in the kernel after it was
cloned to the user page-table. Clone it again to bring the mapping in the
user page-table back in sync with the kernel.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-31-git-send-email-joro@8bytes.org

---
 arch/x86/mm/pti.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index fcfb815d420f..a536ecc91847 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -404,7 +404,7 @@ static void __init pti_setup_espfix64(void)
 /*
  * Clone the populated PMDs of the entry and irqentry text and force it RO.
  */
-static void __init pti_clone_entry_text(void)
+static void pti_clone_entry_text(void)
 {
 	pti_clone_pmds((unsigned long) __entry_text_start,
 			(unsigned long) __irqentry_text_end,
@@ -528,13 +528,18 @@ void __init pti_init(void)
 }
 
 /*
- * Finalize the kernel mappings in the userspace page-table.
+ * Finalize the kernel mappings in the userspace page-table. Some of the
+ * mappings for the kernel image might have changed since pti_init()
+ * cloned them. This is because parts of the kernel image have been
+ * mapped RO and/or NX.  These changes need to be cloned again to the
+ * userspace page-table.
  */
 void pti_finalize(void)
 {
 	/*
-	 * Do this after all of the manipulation of the
-	 * kernel text page tables are complete.
+	 * We need to clone everything (again) that maps parts of the
+	 * kernel image.
 	 */
+	pti_clone_entry_text();
 	pti_clone_kernel_text();
 }


* [tip:x86/pti] x86/mm/dump_pagetables: Define INIT_PGD
  2018-07-18  9:41 ` [PATCH 31/39] x86/mm/dump_pagetables: Define INIT_PGD Joerg Roedel
@ 2018-07-19 23:34   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: llong, jpoimboe, boris.ostrovsky, luto, will.deacon, peterz,
	jgross, dhgutteridge, bp, dvlasenk, torvalds, gregkh, hpa,
	brgerst, jkosina, jroedel, mingo, aarcange, linux-kernel, eduval,
	dave.hansen, pavel, David.Laight, tglx

Commit-ID:  4e8537e4a7a15402b87c424b22c25c9e59681d16
Gitweb:     https://git.kernel.org/tip/4e8537e4a7a15402b87c424b22c25c9e59681d16
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:08 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:46 +0200

x86/mm/dump_pagetables: Define INIT_PGD

Define INIT_PGD to point to the correct initial page-table for 32 and 64
bit and use it where needed. This fixes the build on 32 bit with
CONFIG_PAGE_TABLE_ISOLATION enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-32-git-send-email-joro@8bytes.org

---
 arch/x86/mm/dump_pagetables.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 2f3c9196b834..e6fd0cdef9ad 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -111,6 +111,8 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
+#define INIT_PGD	((pgd_t *) &init_top_pgt)
+
 #else /* CONFIG_X86_64 */
 
 enum address_markers_idx {
@@ -139,6 +141,8 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
+#define INIT_PGD	(swapper_pg_dir)
+
 #endif /* !CONFIG_X86_64 */
 
 /* Multipliers for offsets within the PTEs */
@@ -496,11 +500,7 @@ static inline bool is_hypervisor_range(int idx)
 static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 				       bool checkwx, bool dmesg)
 {
-#ifdef CONFIG_X86_64
-	pgd_t *start = (pgd_t *) &init_top_pgt;
-#else
-	pgd_t *start = swapper_pg_dir;
-#endif
+	pgd_t *start = INIT_PGD;
 	pgprotval_t prot, eff;
 	int i;
 	struct pg_state st = {};
@@ -566,7 +566,7 @@ EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
 static void ptdump_walk_user_pgd_level_checkwx(void)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-	pgd_t *pgd = (pgd_t *) &init_top_pgt;
+	pgd_t *pgd = INIT_PGD;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;


* [tip:x86/pti] x86/pgtable/pae: Use separate kernel PMDs for user page-table
  2018-07-18  9:41 ` [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table Joerg Roedel
@ 2018-07-19 23:35   ` tip-bot for Joerg Roedel
  2018-10-05 14:06   ` [PATCH 32/39] " Arnd Bergmann
  1 sibling, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jgross, boris.ostrovsky, dave.hansen, pavel, gregkh, will.deacon,
	brgerst, tglx, hpa, luto, jroedel, mingo, eduval, jkosina,
	linux-kernel, llong, torvalds, bp, aarcange, jpoimboe,
	dhgutteridge, peterz, dvlasenk, David.Laight

Commit-ID:  f59dbe9ca6707eb7ffd0e24359085651c2d7df48
Gitweb:     https://git.kernel.org/tip/f59dbe9ca6707eb7ffd0e24359085651c2d7df48
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:09 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:46 +0200

x86/pgtable/pae: Use separate kernel PMDs for user page-table

When PTI is enabled, separate kernel PMDs in the user page-table are
required to map the per-process LDT for user-space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-33-git-send-email-joro@8bytes.org

---
 arch/x86/mm/pgtable.c | 100 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 81 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index db6fb7740bf7..8e4e63d46d81 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -182,6 +182,14 @@ static void pgd_dtor(pgd_t *pgd)
  */
 #define PREALLOCATED_PMDS	UNSHARED_PTRS_PER_PGD
 
+/*
+ * We allocate separate PMDs for the kernel part of the user page-table
+ * when PTI is enabled. We need them to map the per-process LDT into the
+ * user-space page-table.
+ */
+#define PREALLOCATED_USER_PMDS	 (static_cpu_has(X86_FEATURE_PTI) ? \
+					KERNEL_PGD_PTRS : 0)
+
 void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 {
 	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
@@ -202,14 +210,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 
 /* No need to prepopulate any pagetable entries in non-PAE modes. */
 #define PREALLOCATED_PMDS	0
-
+#define PREALLOCATED_USER_PMDS	 0
 #endif	/* CONFIG_X86_PAE */
 
-static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++)
+	for (i = 0; i < count; i++)
 		if (pmds[i]) {
 			pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
 			free_page((unsigned long)pmds[i]);
@@ -217,7 +225,7 @@ static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
 		}
 }
 
-static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 	bool failed = false;
@@ -226,7 +234,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	if (mm == &init_mm)
 		gfp &= ~__GFP_ACCOUNT;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
+	for (i = 0; i < count; i++) {
 		pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
 		if (!pmd)
 			failed = true;
@@ -241,7 +249,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	}
 
 	if (failed) {
-		free_pmds(mm, pmds);
+		free_pmds(mm, pmds, count);
 		return -ENOMEM;
 	}
 
@@ -254,23 +262,38 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
  * preallocate which never got a corresponding vma will need to be
  * freed manually.
  */
+static void mop_up_one_pmd(struct mm_struct *mm, pgd_t *pgdp)
+{
+	pgd_t pgd = *pgdp;
+
+	if (pgd_val(pgd) != 0) {
+		pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+
+		*pgdp = native_make_pgd(0);
+
+		paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
+		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
+	}
+}
+
 static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
-		pgd_t pgd = pgdp[i];
+	for (i = 0; i < PREALLOCATED_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i]);
 
-		if (pgd_val(pgd) != 0) {
-			pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 
-			pgdp[i] = native_make_pgd(0);
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
 
-			paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
-			pmd_free(mm, pmd);
-			mm_dec_nr_pmds(mm);
-		}
-	}
+	pgdp = kernel_to_user_pgdp(pgdp);
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i + KERNEL_PGD_BOUNDARY]);
+#endif
 }
 
 static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
@@ -296,6 +319,38 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
 	}
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+	pgd_t *s_pgd = kernel_to_user_pgdp(swapper_pg_dir);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	p4d_t *u_p4d;
+	pud_t *u_pud;
+	int i;
+
+	u_p4d = p4d_offset(u_pgd, 0);
+	u_pud = pud_offset(u_p4d, 0);
+
+	s_pgd += KERNEL_PGD_BOUNDARY;
+	u_pud += KERNEL_PGD_BOUNDARY;
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++, u_pud++, s_pgd++) {
+		pmd_t *pmd = pmds[i];
+
+		memcpy(pmd, (pmd_t *)pgd_page_vaddr(*s_pgd),
+		       sizeof(pmd_t) * PTRS_PER_PMD);
+
+		pud_populate(mm, u_pud, pmd);
+	}
+
+}
+#else
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+}
+#endif
 /*
  * Xen paravirt assumes pgd table should be in one page. 64 bit kernel also
  * assumes that pgd should be in one page.
@@ -376,6 +431,7 @@ static inline void _pgd_free(pgd_t *pgd)
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	pgd_t *pgd;
+	pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
 	pmd_t *pmds[PREALLOCATED_PMDS];
 
 	pgd = _pgd_alloc();
@@ -385,12 +441,15 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	mm->pgd = pgd;
 
-	if (preallocate_pmds(mm, pmds) != 0)
+	if (preallocate_pmds(mm, pmds, PREALLOCATED_PMDS) != 0)
 		goto out_free_pgd;
 
-	if (paravirt_pgd_alloc(mm) != 0)
+	if (preallocate_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS) != 0)
 		goto out_free_pmds;
 
+	if (paravirt_pgd_alloc(mm) != 0)
+		goto out_free_user_pmds;
+
 	/*
 	 * Make sure that pre-populating the pmds is atomic with
 	 * respect to anything walking the pgd_list, so that they
@@ -400,13 +459,16 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	pgd_ctor(mm, pgd);
 	pgd_prepopulate_pmd(mm, pgd, pmds);
+	pgd_prepopulate_user_pmd(mm, pgd, u_pmds);
 
 	spin_unlock(&pgd_lock);
 
 	return pgd;
 
+out_free_user_pmds:
+	free_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS);
 out_free_pmds:
-	free_pmds(mm, pmds);
+	free_pmds(mm, pmds, PREALLOCATED_PMDS);
 out_free_pgd:
 	_pgd_free(pgd);
 out:
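
A minimal sketch (with hypothetical helpers, not the kernel API) of the
allocation and unwind order pgd_alloc() follows after this change: the user
PMDs are allocated after the kernel PMDs and freed first on failure.

	#include <stdlib.h>

	static int alloc_set(void **set)	{ *set = malloc(64); return *set ? 0 : -1; }
	static void free_set(void *set)		{ free(set); }

	static int sketch_pgd_alloc(void)
	{
		void *pmds, *u_pmds;

		if (alloc_set(&pmds))
			goto out;
		if (alloc_set(&u_pmds))
			goto out_free_pmds;

		/* ... pgd_ctor() and prepopulation of both PMD sets ... */
		return 0;

	out_free_pmds:
		free_set(pmds);
	out:
		return -1;
	}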


* [tip:x86/pti] x86/ldt: Reserve address-space range on 32 bit for the LDT
  2018-07-18  9:41 ` [PATCH 33/39] x86/ldt: Reserve address-space range on 32 bit for the LDT Joerg Roedel
@ 2018-07-19 23:35   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jpoimboe, will.deacon, jgross, boris.ostrovsky, tglx, mingo,
	peterz, jroedel, brgerst, torvalds, bp, dave.hansen, pavel,
	dvlasenk, llong, aarcange, hpa, linux-kernel, gregkh,
	dhgutteridge, David.Laight, luto, jkosina, eduval

Commit-ID:  f3e48e546c42e31c0c095a6f917a4ad64668608c
Gitweb:     https://git.kernel.org/tip/f3e48e546c42e31c0c095a6f917a4ad64668608c
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:10 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:47 +0200

x86/ldt: Reserve address-space range on 32 bit for the LDT

Reserve 2MB/4MB of address-space for mapping the LDT to user-space on 32
bit PTI kernels.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-34-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable_32_types.h | 7 +++++--
 arch/x86/mm/dump_pagetables.c           | 9 +++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index d9a001a4a872..7297810747cf 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -50,13 +50,16 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
 	((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1))   \
 	 & PMD_MASK)
 
-#define PKMAP_BASE		\
+#define LDT_BASE_ADDR		\
 	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
 
+#define PKMAP_BASE		\
+	((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)
+
 #ifdef CONFIG_HIGHMEM
 # define VMALLOC_END	(PKMAP_BASE - 2 * PAGE_SIZE)
 #else
-# define VMALLOC_END	(CPU_ENTRY_AREA_BASE - 2 * PAGE_SIZE)
+# define VMALLOC_END	(LDT_BASE_ADDR - 2 * PAGE_SIZE)
 #endif
 
 #define MODULES_VADDR	VMALLOC_START
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index e6fd0cdef9ad..ccd92c4da57c 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -122,6 +122,9 @@ enum address_markers_idx {
 	VMALLOC_END_NR,
 #ifdef CONFIG_HIGHMEM
 	PKMAP_BASE_NR,
+#endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	LDT_NR,
 #endif
 	CPU_ENTRY_AREA_NR,
 	FIXADDR_START_NR,
@@ -135,6 +138,9 @@ static struct addr_marker address_markers[] = {
 	[VMALLOC_END_NR]	= { 0UL,		"vmalloc() End" },
 #ifdef CONFIG_HIGHMEM
 	[PKMAP_BASE_NR]		= { 0UL,		"Persistent kmap() Area" },
+#endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	[LDT_NR]		= { 0UL,		"LDT remap" },
 #endif
 	[CPU_ENTRY_AREA_NR]	= { 0UL,		"CPU entry area" },
 	[FIXADDR_START_NR]	= { 0UL,		"Fixmap area" },
@@ -609,6 +615,9 @@ static int __init pt_dump_init(void)
 # endif
 	address_markers[FIXADDR_START_NR].start_address = FIXADDR_START;
 	address_markers[CPU_ENTRY_AREA_NR].start_address = CPU_ENTRY_AREA_BASE;
+# ifdef CONFIG_MODIFY_LDT_SYSCALL
+	address_markers[LDT_NR].start_address = LDT_BASE_ADDR;
+# endif
 #endif
 	return 0;
 }
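
A user-space sketch of the address arithmetic above. The values of
FIXADDR_TOT_START and CPU_ENTRY_AREA_PAGES are assumptions (the real ones
are configuration-dependent); what matters is that PMD_MASK rounds each
base down to a PMD boundary, so the LDT gets a 2MB (PAE) slot of its own
between the CPU entry area and PKMAP:

	/* Illustration only; input values are assumed, not the kernel's. */
	#include <stdio.h>

	#define PMD_SIZE		(1UL << 21)	/* 2 MiB with PAE */
	#define PMD_MASK		(~(PMD_SIZE - 1))
	#define PAGE_SIZE		4096UL
	#define FIXADDR_TOT_START	0xffc00000UL	/* assumed */
	#define CPU_ENTRY_AREA_PAGES	40UL		/* assumed */

	int main(void)
	{
		unsigned long cea, ldt, pkmap;

		cea   = (FIXADDR_TOT_START -
			 PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK;
		ldt   = (cea - PAGE_SIZE) & PMD_MASK;
		pkmap = (ldt - PAGE_SIZE) & PMD_MASK;

		printf("CPU entry area: %#lx\n", cea);
		printf("LDT remap:      %#lx (one full PMD)\n", ldt);
		printf("PKMAP base:     %#lx\n", pkmap);
		return 0;
	}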


* [tip:x86/pti] x86/ldt: Define LDT_END_ADDR
  2018-07-18  9:41 ` [PATCH 34/39] x86/ldt: Define LDT_END_ADDR Joerg Roedel
@ 2018-07-19 23:36   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: bp, eduval, jgross, dhgutteridge, jpoimboe, dave.hansen, pavel,
	luto, will.deacon, David.Laight, aarcange, linux-kernel, brgerst,
	mingo, jkosina, tglx, hpa, boris.ostrovsky, peterz, torvalds,
	dvlasenk, llong, jroedel, gregkh

Commit-ID:  8195d869d118bc30bf0be8d0c5d8849d6f58529b
Gitweb:     https://git.kernel.org/tip/8195d869d118bc30bf0be8d0c5d8849d6f58529b
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:11 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:47 +0200

x86/ldt: Define LDT_END_ADDR

LDT_END_ADDR marks the end of the address-space range reserved for the LDT.
The LDT code will use it when unmapping the LDT for user-space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-35-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/pgtable_32_types.h | 2 ++
 arch/x86/include/asm/pgtable_64_types.h | 1 +
 arch/x86/kernel/ldt.c                   | 2 +-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index 7297810747cf..b0bc0fff5f1f 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -53,6 +53,8 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
 #define LDT_BASE_ADDR		\
 	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
 
+#define LDT_END_ADDR		(LDT_BASE_ADDR + PMD_SIZE)
+
 #define PKMAP_BASE		\
 	((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 066e0ab9ffab..04edd2d58211 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -115,6 +115,7 @@ extern unsigned int ptrs_per_p4d;
 #define LDT_PGD_ENTRY_L5	-112UL
 #define LDT_PGD_ENTRY		(pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
 #define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
+#define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
 
 #define __VMALLOC_BASE_L4	0xffffc90000000000UL
 #define __VMALLOC_BASE_L5 	0xffa0000000000000UL
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index c9b14020f4dd..e921b3d494d5 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -206,7 +206,7 @@ static void free_ldt_pgtables(struct mm_struct *mm)
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 	struct mmu_gather tlb;
 	unsigned long start = LDT_BASE_ADDR;
-	unsigned long end = start + (1UL << PGDIR_SHIFT);
+	unsigned long end = LDT_END_ADDR;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;


* [tip:x86/pti] x86/ldt: Split out sanity check in map_ldt_struct()
  2018-07-18  9:41 ` [PATCH 35/39] x86/ldt: Split out sanity check in map_ldt_struct() Joerg Roedel
@ 2018-07-19 23:36   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jpoimboe, dave.hansen, dvlasenk, peterz, torvalds, brgerst,
	llong, dhgutteridge, jgross, pavel, boris.ostrovsky, gregkh,
	tglx, hpa, aarcange, eduval, bp, jroedel, mingo, jkosina,
	will.deacon, David.Laight, linux-kernel, luto

Commit-ID:  9bae3197e15dd5e03ce8e237db6fe4486b08a775
Gitweb:     https://git.kernel.org/tip/9bae3197e15dd5e03ce8e237db6fe4486b08a775
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:12 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:47 +0200

x86/ldt: Split out sanity check in map_ldt_struct()

This splits the mapping sanity check and the actual mapping of the LDT to
user-space out of the map_ldt_struct() function, so that both parts are
re-usable for PAE paging.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-36-git-send-email-joro@8bytes.org

---
 arch/x86/kernel/ldt.c | 82 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 58 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index e921b3d494d5..69af9a0d57b7 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -100,6 +100,49 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 	return new_ldt;
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+static void do_sanity_check(struct mm_struct *mm,
+			    bool had_kernel_mapping,
+			    bool had_user_mapping)
+{
+	if (mm->context.ldt) {
+		/*
+		 * We already had an LDT.  The top-level entry should already
+		 * have been allocated and synchronized with the usermode
+		 * tables.
+		 */
+		WARN_ON(!had_kernel_mapping);
+		if (static_cpu_has(X86_FEATURE_PTI))
+			WARN_ON(!had_user_mapping);
+	} else {
+		/*
+		 * This is the first time we're mapping an LDT for this process.
+		 * Sync the pgd to the usermode tables.
+		 */
+		WARN_ON(had_kernel_mapping);
+		if (static_cpu_has(X86_FEATURE_PTI))
+			WARN_ON(had_user_mapping);
+	}
+}
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+
+	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+		set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	bool had_kernel = (pgd->pgd != 0);
+	bool had_user   = (kernel_to_user_pgdp(pgd)->pgd != 0);
+
+	do_sanity_check(mm, had_kernel, had_user);
+}
+
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
@@ -115,9 +158,8 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 static int
 map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 {
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-	bool is_vmalloc, had_top_level_entry;
 	unsigned long va;
+	bool is_vmalloc;
 	spinlock_t *ptl;
 	pgd_t *pgd;
 	int i;
@@ -131,13 +173,15 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	 */
 	WARN_ON(ldt->slot != -1);
 
+	/* Check if the current mappings are sane */
+	sanity_check_ldt_mapping(mm);
+
 	/*
 	 * Did we already have the top level entry allocated?  We can't
	 * use pgd_none() for this because it doesn't do anything on
 	 * 4-level page table kernels.
 	 */
 	pgd = pgd_offset(mm, LDT_BASE_ADDR);
-	had_top_level_entry = (pgd->pgd != 0);
 
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
@@ -172,35 +216,25 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 		pte_unmap_unlock(ptep, ptl);
 	}
 
-	if (mm->context.ldt) {
-		/*
-		 * We already had an LDT.  The top-level entry should already
-		 * have been allocated and synchronized with the usermode
-		 * tables.
-		 */
-		WARN_ON(!had_top_level_entry);
-		if (static_cpu_has(X86_FEATURE_PTI))
-			WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
-	} else {
-		/*
-		 * This is the first time we're mapping an LDT for this process.
-		 * Sync the pgd to the usermode tables.
-		 */
-		WARN_ON(had_top_level_entry);
-		if (static_cpu_has(X86_FEATURE_PTI)) {
-			WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
-			set_pgd(kernel_to_user_pgdp(pgd), *pgd);
-		}
-	}
+	/* Propagate LDT mapping to the user page-table */
+	map_ldt_struct_to_user(mm);
 
 	va = (unsigned long)ldt_slot_va(slot);
 	flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0);
 
 	ldt->slot = slot;
-#endif
 	return 0;
 }
 
+#else /* !CONFIG_PAGE_TABLE_ISOLATION */
+
+static int
+map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+	return 0;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
 static void free_ldt_pgtables(struct mm_struct *mm)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION


* [tip:x86/pti] x86/ldt: Enable LDT user-mapping for PAE
  2018-07-18  9:41 ` [PATCH 36/39] x86/ldt: Enable LDT user-mapping for PAE Joerg Roedel
@ 2018-07-19 23:37   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: luto, pavel, aarcange, jkosina, dhgutteridge, torvalds, bp,
	boris.ostrovsky, mingo, hpa, brgerst, jpoimboe, dvlasenk, peterz,
	dave.hansen, jgross, tglx, gregkh, eduval, will.deacon, jroedel,
	llong, linux-kernel, David.Laight

Commit-ID:  6df934b92a549cb3badb6d576f71aeb133e2f110
Gitweb:     https://git.kernel.org/tip/6df934b92a549cb3badb6d576f71aeb133e2f110
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:13 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:48 +0200

x86/ldt: Enable LDT user-mapping for PAE

This adds the special case needed for PAE to get the LDT mapped into the
user page-table when PTI is enabled. The big difference from the other
paging modes is that on PAE there is no full top-level PGD entry available
for the LDT, only a PMD entry.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-37-git-send-email-joro@8bytes.org

---
 arch/x86/include/asm/mmu_context.h |  5 ----
 arch/x86/kernel/ldt.c              | 53 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index bbc796eb0a3b..eeeb9289c764 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -71,12 +71,7 @@ struct ldt_struct {
 
 static inline void *ldt_slot_va(int slot)
 {
-#ifdef CONFIG_X86_64
 	return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
-#else
-	BUG();
-	return (void *)fix_to_virt(FIX_HOLE);
-#endif
 }
 
 /*
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 69af9a0d57b7..733e6ace0fa4 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -126,6 +126,57 @@ static void do_sanity_check(struct mm_struct *mm,
 	}
 }
 
+#ifdef CONFIG_X86_PAE
+
+static pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
+{
+	p4d_t *p4d;
+	pud_t *pud;
+
+	if (pgd->pgd == 0)
+		return NULL;
+
+	p4d = p4d_offset(pgd, va);
+	if (p4d_none(*p4d))
+		return NULL;
+
+	pud = pud_offset(p4d, va);
+	if (pud_none(*pud))
+		return NULL;
+
+	return pmd_offset(pud, va);
+}
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+	pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	pmd_t *k_pmd, *u_pmd;
+
+	k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+	u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+
+	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+		set_pmd(u_pmd, *k_pmd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+	pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	bool had_kernel, had_user;
+	pmd_t *k_pmd, *u_pmd;
+
+	k_pmd      = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+	u_pmd      = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+	had_kernel = (k_pmd->pmd != 0);
+	had_user   = (u_pmd->pmd != 0);
+
+	do_sanity_check(mm, had_kernel, had_user);
+}
+
+#else /* !CONFIG_X86_PAE */
+
 static void map_ldt_struct_to_user(struct mm_struct *mm)
 {
 	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
@@ -143,6 +194,8 @@ static void sanity_check_ldt_mapping(struct mm_struct *mm)
 	do_sanity_check(mm, had_kernel, had_user);
 }
 
+#endif /* CONFIG_X86_PAE */
+
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
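
A small sketch of why only a PMD-level sync is possible here. With PAE
there are just four top-level entries of 1GB each, and the LDT range lives
inside the same top-level entry as the rest of the kernel. The base address
below is an assumption for illustration:

	/* Illustration only: locating an assumed LDT_BASE_ADDR in the
	 * PAE paging hierarchy (4 PGD entries, 512 PMDs per PGD). */
	#include <stdio.h>

	int main(void)
	{
		unsigned long ldt_base = 0xff800000UL;	/* assumed */

		printf("PGD index: %lu (of 4, 1 GiB each)\n", ldt_base >> 30);
		printf("PMD index: %lu (of 512, 2 MiB each)\n",
		       (ldt_base >> 21) & 0x1ff);
		return 0;
	}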


* [tip:x86/pti] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32
  2018-07-18  9:41 ` [PATCH 37/39] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Joerg Roedel
@ 2018-07-19 23:37   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jpoimboe, hpa, aarcange, tglx, linux-kernel, boris.ostrovsky,
	brgerst, bp, jgross, luto, jkosina, peterz, will.deacon,
	torvalds, jroedel, dhgutteridge, David.Laight, dvlasenk, mingo,
	eduval, gregkh, llong, pavel, dave.hansen

Commit-ID:  7757d607c6b31867777de42e1fb0210b9c5d8b70
Gitweb:     https://git.kernel.org/tip/7757d607c6b31867777de42e1fb0210b9c5d8b70
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:14 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:48 +0200

x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32

Allow PTI to be compiled on x86_32.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-38-git-send-email-joro@8bytes.org

---
 security/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/Kconfig b/security/Kconfig
index c4302067a3ad..afa91c6f06bb 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -57,7 +57,7 @@ config SECURITY_NETWORK
 config PAGE_TABLE_ISOLATION
 	bool "Remove the kernel mapping in user mode"
 	default y
-	depends on X86_64 && !UML
+	depends on X86 && !UML
 	help
 	  This feature reduces the number of hardware side channels by
 	  ensuring that the majority of kernel addresses are not mapped


* [tip:x86/pti] x86/mm/pti: Add Warning when booting on a PCID capable CPU
  2018-07-18  9:41 ` [PATCH 38/39] x86/mm/pti: Add Warning when booting on a PCID capable CPU Joerg Roedel
@ 2018-07-19 23:38   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: gregkh, llong, bp, David.Laight, torvalds, linux-kernel, jgross,
	jroedel, aarcange, brgerst, dvlasenk, jkosina, jpoimboe, hpa,
	eduval, tglx, dave.hansen, mingo, will.deacon, peterz,
	dhgutteridge, pavel, luto, boris.ostrovsky

Commit-ID:  5e8105950a8b3e03e805299b4d05020ee4eda31a
Gitweb:     https://git.kernel.org/tip/5e8105950a8b3e03e805299b4d05020ee4eda31a
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:15 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:48 +0200

x86/mm/pti: Add Warning when booting on a PCID capable CPU

Warn the user if performance could be significantly improved by switching
to a 64-bit kernel.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-39-git-send-email-joro@8bytes.org

---
 arch/x86/mm/pti.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index a536ecc91847..7b1c85759005 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -517,6 +517,28 @@ void __init pti_init(void)
 
 	pr_info("enabled\n");
 
+#ifdef CONFIG_X86_32
+	/*
+	 * We cannot check for X86_FEATURE_PCID here, because the
+	 * init-code clears the feature flag on 32 bit as PCID is not
+	 * supported on 32 bit anyway. To print the warning, the CPUID
+	 * bit has to be checked directly.
+	 */
+	if (cpuid_ecx(0x1) & BIT(17)) {
+		/* Use printk to work around pr_fmt() */
+		printk(KERN_WARNING "\n");
+		printk(KERN_WARNING "************************************************************\n");
+		printk(KERN_WARNING "** WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!  **\n");
+		printk(KERN_WARNING "**                                                        **\n");
+		printk(KERN_WARNING "** You are using 32-bit PTI on a 64-bit PCID-capable CPU. **\n");
+		printk(KERN_WARNING "** Your performance will increase dramatically if you     **\n");
+		printk(KERN_WARNING "** switch to a 64-bit kernel!                             **\n");
+		printk(KERN_WARNING "**                                                        **\n");
+		printk(KERN_WARNING "** WARNING! WARNING! WARNING! WARNING! WARNING! WARNING!  **\n");
+		printk(KERN_WARNING "************************************************************\n");
+	}
+#endif
+
 	pti_clone_user_shared();
 
 	/* Undo all global bits from the init pagetables in head_64.S: */
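
The same check can be reproduced from user space. A minimal sketch using
GCC's <cpuid.h>: PCID is CPUID leaf 1, ECX bit 17, and the test has to be a
bitwise AND:

	/* User-space sketch: detect PCID the way the warning does. */
	#include <stdio.h>
	#include <cpuid.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
			return 1;

		/* Bitwise AND: a logical '&&' would be true for any
		 * non-zero ECX, not only when bit 17 is set. */
		printf("PCID: %s\n", (ecx & (1u << 17)) ? "yes" : "no");
		return 0;
	}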


* [tip:x86/pti] x86/entry/32: Add debug code to check entry/exit CR3
  2018-07-18  9:41 ` [PATCH 39/39] x86/entry/32: Add debug code to check entry/exit cr3 Joerg Roedel
@ 2018-07-19 23:38   ` tip-bot for Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: tip-bot for Joerg Roedel @ 2018-07-19 23:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dvlasenk, boris.ostrovsky, David.Laight, llong, luto, hpa,
	jroedel, peterz, pavel, eduval, aarcange, jpoimboe, jgross,
	mingo, bp, torvalds, brgerst, dhgutteridge, linux-kernel,
	will.deacon, dave.hansen, jkosina, tglx, gregkh

Commit-ID:  97193702c6d353e12aefc0fb2f73a98ca421cd56
Gitweb:     https://git.kernel.org/tip/97193702c6d353e12aefc0fb2f73a98ca421cd56
Author:     Joerg Roedel <jroedel@suse.de>
AuthorDate: Wed, 18 Jul 2018 11:41:16 +0200
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Jul 2018 01:11:49 +0200

x86/entry/32: Add debug code to check entry/exit CR3

Add code to check whether the kernel is entered and left with the correct
CR3 and make it depend on CONFIG_DEBUG_ENTRY.  This is needed because there
is no NX protection of user-addresses in the kernel-CR3 on x86-32 and that
type of bug would not be detected otherwise.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Pavel Machek <pavel@ucw.cz>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: "David H . Gutteridge" <dhgutteridge@sympatico.ca>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1531906876-13451-40-git-send-email-joro@8bytes.org

---
 arch/x86/entry/entry_32.S | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b1541c74c71a..010cdb41e3c7 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -166,6 +166,24 @@
 .Lend_\@:
 .endm
 
+.macro BUG_IF_WRONG_CR3 no_user_check=0
+#ifdef CONFIG_DEBUG_ENTRY
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	.if \no_user_check == 0
+	/* coming from usermode? */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lend_\@
+	.endif
+	/* On user-cr3? */
+	movl	%cr3, %eax
+	testl	$PTI_SWITCH_MASK, %eax
+	jnz	.Lend_\@
+	/* From userspace with kernel cr3 - BUG */
+	ud2
+.Lend_\@:
+#endif
+.endm
+
 /*
  * Switch to kernel cr3 if not already loaded and return current cr3 in
  * \scratch_reg
@@ -213,6 +231,8 @@
 .macro SAVE_ALL_NMI cr3_reg:req
 	SAVE_ALL
 
+	BUG_IF_WRONG_CR3
+
 	/*
 	 * Now switch the CR3 when PTI is enabled.
 	 *
@@ -224,6 +244,7 @@
 
 .Lend_\@:
 .endm
+
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
@@ -287,6 +308,8 @@
 
 .Lswitched_\@:
 
+	BUG_IF_WRONG_CR3
+
 	RESTORE_REGS pop=\pop
 .endm
 
@@ -357,6 +380,8 @@
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
 
+	BUG_IF_WRONG_CR3
+
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
 
 	/*
@@ -799,6 +824,7 @@ ENTRY(entry_SYSENTER_32)
 	 */
 	pushfl
 	pushl	%eax
+	BUG_IF_WRONG_CR3 no_user_check=1
 	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
 	popl	%eax
 	popfl
@@ -893,6 +919,7 @@ ENTRY(entry_SYSENTER_32)
 	 * whereas POPF does not.)
 	 */
 	btrl	$X86_EFLAGS_IF_BIT, (%esp)
+	BUG_IF_WRONG_CR3 no_user_check=1
 	popfl
 	popl	%eax
 
@@ -970,6 +997,8 @@ restore_all:
 	/* Switch back to user CR3 */
 	SWITCH_TO_USER_CR3 scratch_reg=%eax
 
+	BUG_IF_WRONG_CR3
+
 	/* Restore user state */
 	RESTORE_REGS pop=4			# skip orig_eax/error_code
 .Lirq_return:
@@ -983,6 +1012,7 @@ restore_all:
 restore_all_kernel:
 	TRACE_IRQS_IRET
 	PARANOID_EXIT_TO_KERNEL_MODE
+	BUG_IF_WRONG_CR3
 	RESTORE_REGS 4
 	jmp	.Lirq_return
 
@@ -990,6 +1020,19 @@ restore_all_kernel:
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
 	pushl	$do_iret_error
+
+#ifdef CONFIG_DEBUG_ENTRY
+	/*
+	 * The stack-frame here is the one that iret faulted on, so it's a
+	 * return-to-user frame. We are on kernel-cr3 because we come here from
+	 * the fixup code. This confuses the CR3 checker, so switch to user-cr3
+	 * as the checker expects it.
+	 */
+	pushl	%eax
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+	popl	%eax
+#endif
+
 	jmp	common_exception
 .previous
 	_ASM_EXTABLE(.Lirq_return, iret_exc)
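
To make the invariant explicit: a hedged, standalone C rendering of what
BUG_IF_WRONG_CR3 asserts (the function name, the bool parameter and the
mask value are illustrative only - in the real entry code the check runs
in asm on the raw CR3 register):

	#include <stdbool.h>

	#define PTI_SWITCH_MASK (1UL << 12)	/* sketch: bit selecting the user PGD of the pair */

	static void bug_if_wrong_cr3(unsigned long cr3, bool from_usermode)
	{
		/* Coming from user mode, CR3 must still be the user one */
		if (from_usermode && !(cr3 & PTI_SWITCH_MASK))
			__builtin_trap();	/* stands in for the ud2 above */
	}

	int main(void)
	{
		bug_if_wrong_cr3(0x4000UL | PTI_SWITCH_MASK, true);	/* ok: user CR3 from usermode */
		bug_if_wrong_cr3(0x4000UL, false);			/* ok: kernel-mode entry */
		/* bug_if_wrong_cr3(0x4000UL, true) would trap */
		return 0;
	}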

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH 00/39 v8] PTI support for x86-32
  2018-07-19 23:21 ` Thomas Gleixner
@ 2018-07-20  7:59   ` Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: Joerg Roedel @ 2018-07-20  7:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Joerg Roedel, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, David H . Gutteridge

Hi Thomas,

On Fri, Jul 20, 2018 at 01:21:33AM +0200, Thomas Gleixner wrote:
> On Wed, 18 Jul 2018, Joerg Roedel wrote:
> > 
> > here is version 8 of my patches to enable PTI on x86-32. The
> > last version got some good review which I mostly worked into
> > this version.
> 
> I went over the whole set once again and did not find any real issues. As
> the outstanding review comments are addressed, I decided that only broader
> exposure can shake out eventually remaining issues. Applied and pushed out,
> so it should show up in linux-next soon.
> 
> The mm regression seems to be sorted, so there is no immediate fallout
> expected.
> 
> Thanks for your patience in reworking this over and over. Thanks to Andy
> for putting his entry-focused eyes on it more than once. Great work!

Thanks a lot too! Let's hope things will go smoothly from here...
I will also continue testing and improving the code; currently I am
working on a relaxed paranoid entry/exit path suggested by Andy.


Regards,

	Joerg


^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table
  2018-07-18  9:41 ` [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table Joerg Roedel
  2018-07-19 23:35   ` [tip:x86/pti] " tip-bot for Joerg Roedel
@ 2018-10-05 14:06   ` Arnd Bergmann
  1 sibling, 0 replies; 105+ messages in thread
From: Arnd Bergmann @ 2018-10-05 14:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux-MM,
	Linus Torvalds, Andy Lutomirski, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, gregkh, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, dhgutteridge, Joerg Roedel

On Wed, Jul 18, 2018 at 11:43 AM Joerg Roedel <joro@8bytes.org> wrote:
>  arch/x86/mm/pgtable.c | 100 ++++++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 81 insertions(+), 19 deletions(-)
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index db6fb77..8e4e63d 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -182,6 +182,14 @@ static void pgd_dtor(pgd_t *pgd)
>   */
>  #define PREALLOCATED_PMDS      UNSHARED_PTRS_PER_PGD
>
> +/*
> + * We allocate separate PMDs for the kernel part of the user page-table
> + * when PTI is enabled. We need them to map the per-process LDT into the
> + * user-space page-table.
> + */
> +#define PREALLOCATED_USER_PMDS  (static_cpu_has(X86_FEATURE_PTI) ? \
> +                                       KERNEL_PGD_PTRS : 0)

>   * Xen paravirt assumes pgd table should be in one page. 64 bit kernel also
>   * assumes that pgd should be in one page.
> @@ -376,6 +431,7 @@ static inline void _pgd_free(pgd_t *pgd)
>  pgd_t *pgd_alloc(struct mm_struct *mm)
>  {
>         pgd_t *pgd;
> +       pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
>         pmd_t *pmds[PREALLOCATED_PMDS];
>

This commit from back in July now causes a build warning after the patch
from Kees that enables -Wvla:

In file included from /git/arm-soc/include/linux/kernel.h:15,
                 from /git/arm-soc/include/asm-generic/bug.h:18,
                 from /git/arm-soc/arch/x86/include/asm/bug.h:83,
                 from /git/arm-soc/include/linux/bug.h:5,
                 from /git/arm-soc/include/linux/mmdebug.h:5,
                 from /git/arm-soc/include/linux/mm.h:9,
                 from /git/arm-soc/arch/x86/mm/pgtable.c:2:
/git/arm-soc/arch/x86/mm/pgtable.c: In function 'pgd_alloc':
/git/arm-soc/include/linux/build_bug.h:29:45: error: ISO C90 forbids
variable length array 'u_pmds' [-Werror=vla]
 #define BUILD_BUG_ON_ZERO(e) (sizeof(struct { int:(-!!(e)); }))
                                             ^
/git/arm-soc/arch/x86/include/asm/cpufeature.h:85:5: note: in
expansion of macro 'BUILD_BUG_ON_ZERO'
     BUILD_BUG_ON_ZERO(NCAPINTS != 19))
     ^~~~~~~~~~~~~~~~~
/git/arm-soc/arch/x86/include/asm/cpufeature.h:111:32: note: in
expansion of macro 'REQUIRED_MASK_BIT_SET'
  (__builtin_constant_p(bit) && REQUIRED_MASK_BIT_SET(bit) ? 1 : \
                                ^~~~~~~~~~~~~~~~~~~~~
/git/arm-soc/arch/x86/include/asm/cpufeature.h:129:27: note: in
expansion of macro 'cpu_has'
 #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit)
                           ^~~~~~~
/git/arm-soc/arch/x86/include/asm/cpufeature.h:209:3: note: in
expansion of macro 'boot_cpu_has'
   boot_cpu_has(bit) :    \
   ^~~~~~~~~~~~
/git/arm-soc/arch/x86/mm/pgtable.c:190:34: note: in expansion of macro
'static_cpu_has'
 #define PREALLOCATED_USER_PMDS  (static_cpu_has(X86_FEATURE_PTI) ? \
                                  ^~~~~~~~~~~~~~
/git/arm-soc/arch/x86/mm/pgtable.c:431:16: note: in expansion of macro
'PREALLOCATED_USER_PMDS'
  pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
                ^~~~~~~~~~~~~~~~~~~~~~

       Arnd
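
A standalone illustration of the warning and the usual way out (plain C,
not the kernel code; MAX_PMDS stands in for a compile-time upper bound
such as KERNEL_PGD_PTRS): an array sized by a runtime condition is a VLA,
while sizing it by the maximum and bounding the used portion at runtime
keeps the stack frame fixed.

	#include <stdio.h>

	#define MAX_PMDS 4

	static int feature_enabled(void)	/* stands in for static_cpu_has() */
	{
		return 1;
	}

	int main(void)
	{
		/* int vla[feature_enabled() ? MAX_PMDS : 0];  triggers -Wvla */
		int pmds[MAX_PMDS];		/* fixed upper bound instead */
		int used = feature_enabled() ? MAX_PMDS : 0;

		for (int i = 0; i < used; i++)
			pmds[i] = i;

		printf("initialized %d of %d slots\n", used, MAX_PMDS);
		return 0;
	}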

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-18  9:40 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
  2018-07-19 23:23   ` [tip:x86/pti] " tip-bot for Joerg Roedel
@ 2018-10-12 18:29   ` Jan Kiszka
  2018-10-13  9:54     ` [PATCH] x86/entry/32: Fix setup of CS high bits Jan Kiszka
  1 sibling, 1 reply; 105+ messages in thread
From: Jan Kiszka @ 2018-10-12 18:29 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli

[-- Attachment #1: Type: text/plain, Size: 6753 bytes --]

On 18.07.18 11:40, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> It can happen that we enter the kernel from kernel-mode and
> on the entry-stack. The most common way this happens is when
> we get an exception while loading the user-space segment
> registers on the kernel-to-userspace exit path.
> 
> The segment loading needs to be done after the entry-stack
> switch, because the stack-switch needs kernel %fs for
> per_cpu access.
> 
> When this happens, we need to make sure that we leave the
> kernel with the entry-stack again, so that the interrupted
> code-path runs on the right stack when switching to the
> user-cr3.
> 
> We do this by detecting this condition on kernel-entry by
> checking CS.RPL and %esp, and if it happens, we copy over
> the complete content of the entry stack to the task-stack.
> This needs to be done because once we enter the exception
> handlers we might be scheduled out or even migrated to a
> different CPU, so that we can't rely on the entry-stack
> contents. We also leave a marker in the stack-frame to
> detect this condition on the exit path.
> 
> On the exit path the copy is reversed, we copy all of the
> remaining task-stack back to the entry-stack and switch
> to it.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>   arch/x86/entry/entry_32.S | 116 +++++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 115 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 7635925..9d6eceb 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -294,6 +294,9 @@
>    * copied there. So allocate the stack-frame on the task-stack and
>    * switch to it before we do any copying.
>    */
> +
> +#define CS_FROM_ENTRY_STACK	(1 << 31)
> +
>   .macro SWITCH_TO_KERNEL_STACK
>   
>   	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
> @@ -316,6 +319,16 @@
>   	/* Load top of task-stack into %edi */
>   	movl	TSS_entry2task_stack(%edi), %edi
>   
> +	/*
> +	 * Clear unused upper bits of the dword containing the word-sized CS
> +	 * slot in pt_regs in case hardware didn't clear it for us.
> +	 */
> +	andl	$(0x0000ffff), PT_CS(%esp)
> +
> +	/* Special case - entry from kernel mode via entry stack */
> +	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
> +	jz	.Lentry_from_kernel_\@
> +
>   	/* Bytes to copy */
>   	movl	$PTREGS_SIZE, %ecx
>   
> @@ -329,8 +342,8 @@
>   	 */
>   	addl	$(4 * 4), %ecx
>   
> -.Lcopy_pt_regs_\@:
>   #endif
> +.Lcopy_pt_regs_\@:
>   
>   	/* Allocate frame on task-stack */
>   	subl	%ecx, %edi
> @@ -346,6 +359,56 @@
>   	cld
>   	rep movsl
>   
> +	jmp .Lend_\@
> +
> +.Lentry_from_kernel_\@:
> +
> +	/*
> +	 * This handles the case when we enter the kernel from
> +	 * kernel-mode and %esp points to the entry-stack. When this
> +	 * happens we need to switch to the task-stack to run C code,
> +	 * but switch back to the entry-stack again when we approach
> +	 * iret and return to the interrupted code-path. This usually
> +	 * happens when we hit an exception while restoring user-space
> +	 * segment registers on the way back to user-space.
> +	 *
> +	 * When we switch to the task-stack here, we can't trust the
> +	 * contents of the entry-stack anymore, as the exception handler
> +	 * might be scheduled out or moved to another CPU. Therefore we
> +	 * copy the complete entry-stack to the task-stack and set a
> +	 * marker in the iret-frame (bit 31 of the CS dword) to detect
> +	 * what we've done on the iret path.
> +	 *
> +	 * On the iret path we copy everything back and switch to the
> +	 * entry-stack, so that the interrupted kernel code-path
> +	 * continues on the same stack it was interrupted with.
> +	 *
> +	 * Be aware that an NMI can happen anytime in this code.
> +	 *
> +	 * %esi: Entry-Stack pointer (same as %esp)
> +	 * %edi: Top of the task stack
> +	 */
> +
> +	/* Calculate number of bytes on the entry stack in %ecx */
> +	movl	%esi, %ecx
> +
> +	/* %ecx to the top of entry-stack */
> +	andl	$(MASK_entry_stack), %ecx
> +	addl	$(SIZEOF_entry_stack), %ecx
> +
> +	/* Number of bytes on the entry stack to %ecx */
> +	sub	%esi, %ecx
> +
> +	/* Mark stackframe as coming from entry stack */
> +	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
> +
> +	/*
> +	 * %esi and %edi are unchanged, %ecx contains the number of
> +	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
> +	 * the stack-frame on task-stack and copy everything over
> +	 */
> +	jmp .Lcopy_pt_regs_\@
> +
>   .Lend_\@:
>   .endm
>   
> @@ -404,6 +467,56 @@
>   .endm
>   
>   /*
> + * This macro handles the case when we return to kernel-mode on the iret
> + * path and have to switch back to the entry stack.
> + *
> + * See the comments below the .Lentry_from_kernel_\@ label in the
> + * SWITCH_TO_KERNEL_STACK macro for more details.
> + */
> +.macro PARANOID_EXIT_TO_KERNEL_MODE
> +
> +	/*
> +	 * Test if we entered the kernel with the entry-stack. Most
> +	 * likely we did not, because this code only runs on the
> +	 * return-to-kernel path.
> +	 */
> +	testl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
> +	jz	.Lend_\@
> +
> +	/* Unlikely slow-path */
> +
> +	/* Clear marker from stack-frame */
> +	andl	$(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
> +
> +	/* Copy the remaining task-stack contents to entry-stack */
> +	movl	%esp, %esi
> +	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
> +
> +	/* Bytes on the task-stack to ecx */
> +	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %ecx
> +	subl	%esi, %ecx
> +
> +	/* Allocate stack-frame on entry-stack */
> +	subl	%ecx, %edi
> +
> +	/*
> +	 * Save future stack-pointer, we must not switch until the
> +	 * copy is done, otherwise the NMI handler could destroy the
> +	 * contents of the task-stack we are about to copy.
> +	 */
> +	movl	%edi, %ebx
> +
> +	/* Do the copy */
> +	shrl	$2, %ecx
> +	cld
> +	rep movsl
> +
> +	/* Safe to switch to entry-stack now */
> +	movl	%ebx, %esp
> +
> +.Lend_\@:
> +.endm
> +/*
>    * %eax: prev task
>    * %edx: next task
>    */
> @@ -764,6 +877,7 @@ restore_all:
>   
>   restore_all_kernel:
>   	TRACE_IRQS_IRET
> +	PARANOID_EXIT_TO_KERNEL_MODE
>   	RESTORE_REGS 4
>   	jmp	.Lirq_return
>   
> 

I've bisected a boot breakage on an Intel Quark board (config attached) down 
to this commit (b92a165df17e; I additionally had to apply d1b47a7c9efc). The 
kernel prints out nothing once this commit is in.

The board is a Siemens IOT2000; I will check whether this can also be triggered 
on a similar Galileo Gen2. QEMU does not reproduce it, unfortunately.

The commit looks unsuspicious at first glance - maybe it is just changing some 
layout in an unfortunate way. Any ideas?

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

[-- Attachment #2: .config.xz --]
[-- Type: application/x-xz, Size: 25424 bytes --]
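
For readers following the quoted SWITCH_TO_KERNEL_STACK hunk: a rough,
runnable C model of the .Lentry_from_kernel_\@ copy (all names and the
stack size are illustrative, not kernel API). It computes how many bytes
currently live on the entry stack, carves out a frame on the task stack
and copies them over, much as the asm does with cld; rep movsl:

	#include <stdint.h>
	#include <string.h>

	#define SIZEOF_ENTRY_STACK ((uintptr_t)512)		/* sketch value */
	#define MASK_ENTRY_STACK   (~(SIZEOF_ENTRY_STACK - 1))

	static uint8_t *copy_to_task_stack(uint8_t *entry_sp, uint8_t *task_top)
	{
		/* top of the entry stack that contains entry_sp */
		uintptr_t top = ((uintptr_t)entry_sp & MASK_ENTRY_STACK) + SIZEOF_ENTRY_STACK;
		size_t bytes = top - (uintptr_t)entry_sp;	/* bytes on the entry stack */

		uint8_t *frame = task_top - bytes;	/* allocate frame on the task stack */
		memcpy(frame, entry_sp, bytes);		/* the cld; rep movsl */
		return frame;				/* new stack pointer */
	}

	int main(void)
	{
		static uint8_t entry_stack[512] __attribute__((aligned(512)));
		static uint8_t task_stack[4096];
		uint8_t *sp = entry_stack + sizeof(entry_stack) - 64;	/* 64 bytes pushed */

		uint8_t *new_sp = copy_to_task_stack(sp, task_stack + sizeof(task_stack));
		return new_sp == task_stack + sizeof(task_stack) - 64 ? 0 : 1;
	}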

^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH] x86/entry/32: Fix setup of CS high bits
  2018-10-12 18:29   ` [PATCH 10/39] " Jan Kiszka
@ 2018-10-13  9:54     ` Jan Kiszka
  2018-10-13 15:12       ` Andy Lutomirski
                         ` (2 more replies)
  0 siblings, 3 replies; 105+ messages in thread
From: Jan Kiszka @ 2018-10-13  9:54 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli

From: Jan Kiszka <jan.kiszka@siemens.com>

Even if we are not on an entry stack, we have to initialize the CS high
bits because we are unconditionally evaluating them in
PARANOID_EXIT_TO_KERNEL_MODE. Failing to do so broke the boot on Galileo
Gen2 and IOT2000 boards.

Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 arch/x86/entry/entry_32.S | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2767c625a52c..95c94d48ecd2 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -389,6 +389,12 @@
 	 * that register for the time this macro runs
 	 */
 
+	/*
+	 * Clear unused upper bits of the dword containing the word-sized CS
+	 * slot in pt_regs in case hardware didn't clear it for us.
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %ecx
 	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
@@ -407,12 +413,6 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry2task_stack(%edi), %edi
 
-	/*
-	 * Clear unused upper bits of the dword containing the word-sized CS
-	 * slot in pt_regs in case hardware didn't clear it for us.
-	 */
-	andl	$(0x0000ffff), PT_CS(%esp)
-
 	/* Special case - entry from kernel mode via entry stack */
 #ifdef CONFIG_VM86
 	movl	PT_EFLAGS(%esp), %ecx		# mix EFLAGS and CS
-- 
2.16.4
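
To spell out what those "high bits" are: the 16-bit CS value sits in a
32-bit pt_regs slot, and the entry code parks software flags in the upper
half. A small standalone sketch of that encoding (the flag constants
mirror the entry code; everything else is illustrative):

	#include <stdint.h>
	#include <stdio.h>

	#define CS_FROM_ENTRY_STACK (1u << 31)	/* set in SWITCH_TO_KERNEL_STACK */
	#define CS_FROM_USER_CR3    (1u << 30)

	int main(void)
	{
		uint32_t cs_slot = 0xdead0010;	/* high word undefined after the hardware push */

		cs_slot &= 0x0000ffff;		/* the andl in the patch: drop stale bits */
		cs_slot |= CS_FROM_ENTRY_STACK;	/* marker the exit path will test */

		printf("from entry stack: %s\n",
		       (cs_slot & CS_FROM_ENTRY_STACK) ? "yes" : "no");
		return 0;
	}

Without the initial andl, the exit path can see a leftover bit 31 and take
the slow copy path for a frame that never came from the entry stack - which
is exactly the boot hang this patch fixes.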

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH] x86/entry/32: Fix setup of CS high bits
  2018-10-13  9:54     ` [PATCH] x86/entry/32: Fix setup of CS high bits Jan Kiszka
@ 2018-10-13 15:12       ` Andy Lutomirski
  2018-10-15 13:08         ` Jan Kiszka
  2018-10-15  9:10       ` Joerg Roedel
  2018-10-15 14:09       ` [PATCH v2] " Jan Kiszka
  2 siblings, 1 reply; 105+ messages in thread
From: Andy Lutomirski @ 2018-10-13 15:12 UTC (permalink / raw)
  To: jan.kiszka
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Andrew Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli

On Sat, Oct 13, 2018 at 3:02 AM Jan Kiszka <jan.kiszka@web.de> wrote:
>
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> Even if we are not on an entry stack, we have to initialize the CS high
> bits because we are unconditionally evaluating them in
> PARANOID_EXIT_TO_KERNEL_MODE. Failing to do so broke the boot on Galileo
> Gen2 and IOT2000 boards.
>
> Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  arch/x86/entry/entry_32.S | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 2767c625a52c..95c94d48ecd2 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -389,6 +389,12 @@
>          * that register for the time this macro runs
>          */
>
> +       /*
> +        * Clear unused upper bits of the dword containing the word-sized CS
> +        * slot in pt_regs in case hardware didn't clear it for us.
> +        */
> +       andl    $(0x0000ffff), PT_CS(%esp)
> +

Please improve the comment. Since commit:

commit 385eca8f277c4c34f361a4c3a088fd876d29ae21
Author: Andy Lutomirski <luto@kernel.org>
Date:   Fri Jul 28 06:00:30 2017 -0700

    x86/asm/32: Make pt_regs's segment registers be 16 bits

Those fields are genuinely 16 bit.  So the comment should say
something like "Those high bits are used for CS_FROM_ENTRY_STACK and
CS_FROM_USER_CR3".

Also, can you fold something like this in:

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2767c625a52c..358eed8cf62a 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -171,7 +171,7 @@
        ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
        .if \no_user_check == 0
        /* coming from usermode? */
-       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
+       testb   $SEGMENT_RPL_MASK, PT_CS(%esp)
        jz      .Lend_\@
        .endif
        /* On user-cr3? */
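
The byte-sized test works because SEGMENT_RPL_MASK is 0x3 and lives in the
lowest byte of the CS slot, so testb never reads the software flag bits
parked in the high word. A tiny standalone sketch (values illustrative;
0x10 models a kernel CS with RPL 0):

	#include <assert.h>
	#include <stdint.h>

	#define SEGMENT_RPL_MASK    0x3u
	#define CS_FROM_ENTRY_STACK (1u << 31)

	int main(void)
	{
		uint32_t cs_slot = 0x10u | CS_FROM_ENTRY_STACK;

		/* Like testb: only the low byte participates in the RPL check */
		assert(((uint8_t)cs_slot & SEGMENT_RPL_MASK) == 0);
		return 0;
	}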

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH] x86/entry/32: Fix setup of CS high bits
  2018-10-13  9:54     ` [PATCH] x86/entry/32: Fix setup of CS high bits Jan Kiszka
  2018-10-13 15:12       ` Andy Lutomirski
@ 2018-10-15  9:10       ` Joerg Roedel
  2018-10-15 14:09       ` [PATCH v2] " Jan Kiszka
  2 siblings, 0 replies; 105+ messages in thread
From: Joerg Roedel @ 2018-10-15  9:10 UTC (permalink / raw)
  To: Jan Kiszka
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli

Hey Jan,

thanks for tracking this down and sending the fix! So your hardware
probably doesn't zero out the CS high bits, and the code then wrongly
detects that it came from the entry stack on return. Clearing the bits
earlier, before the entry-stack check, makes sense.

Acked-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>

On Sat, Oct 13, 2018 at 11:54:54AM +0200, Jan Kiszka wrote:
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 2767c625a52c..95c94d48ecd2 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -389,6 +389,12 @@
>  	 * that register for the time this macro runs
>  	 */
>  
> +	/*
> +	 * Clear unused upper bits of the dword containing the word-sized CS
> +	 * slot in pt_regs in case hardware didn't clear it for us.
> +	 */
> +	andl	$(0x0000ffff), PT_CS(%esp)
> +
>  	/* Are we on the entry stack? Bail out if not! */
>  	movl	PER_CPU_VAR(cpu_entry_area), %ecx
>  	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
> @@ -407,12 +413,6 @@
>  	/* Load top of task-stack into %edi */
>  	movl	TSS_entry2task_stack(%edi), %edi
>  
> -	/*
> -	 * Clear unused upper bits of the dword containing the word-sized CS
> -	 * slot in pt_regs in case hardware didn't clear it for us.
> -	 */
> -	andl	$(0x0000ffff), PT_CS(%esp)
> -
>  	/* Special case - entry from kernel mode via entry stack */
>  #ifdef CONFIG_VM86
>  	movl	PT_EFLAGS(%esp), %ecx		# mix EFLAGS and CS
> -- 
> 2.16.4

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH] x86/entry/32: Fix setup of CS high bits
  2018-10-13 15:12       ` Andy Lutomirski
@ 2018-10-15 13:08         ` Jan Kiszka
  2018-10-15 13:14           ` David Laight
  0 siblings, 1 reply; 105+ messages in thread
From: Jan Kiszka @ 2018-10-15 13:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli

On 13.10.18 17:12, Andy Lutomirski wrote:
> On Sat, Oct 13, 2018 at 3:02 AM Jan Kiszka <jan.kiszka@web.de> wrote:
>>
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> Even if we are not on an entry stack, we have to initialize the CS high
>> bits because we are unconditionally evaluating them in
>> PARANOID_EXIT_TO_KERNEL_MODE. Failing to do so broke the boot on Galileo
>> Gen2 and IOT2000 boards.
>>
>> Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>   arch/x86/entry/entry_32.S | 12 ++++++------
>>   1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>> index 2767c625a52c..95c94d48ecd2 100644
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -389,6 +389,12 @@
>>           * that register for the time this macro runs
>>           */
>>
>> +       /*
>> +        * Clear unused upper bits of the dword containing the word-sized CS
>> +        * slot in pt_regs in case hardware didn't clear it for us.
>> +        */
>> +       andl    $(0x0000ffff), PT_CS(%esp)
>> +
> 
> Please improve the comment. Since commit:
> 
> commit 385eca8f277c4c34f361a4c3a088fd876d29ae21
> Author: Andy Lutomirski <luto@kernel.org>
> Date:   Fri Jul 28 06:00:30 2017 -0700
> 
>      x86/asm/32: Make pt_regs's segment registers be 16 bits
> 
> Those fields are genuinely 16 bit.  So the comment should say
> something like "Those high bits are used for CS_FROM_ENTRY_STACK and
> CS_FROM_USER_CR3".

/*
  * The high bits of the CS dword (__csh) are used for
  * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
  * hardware didn't do this for us.
  */

OK? I will send out v2 with this wording soon.

> 
> Also, can you fold something like this in:
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 2767c625a52c..358eed8cf62a 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -171,7 +171,7 @@
>          ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
>          .if \no_user_check == 0
>          /* coming from usermode? */
> -       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> +       testb   $SEGMENT_RPL_MASK, PT_CS(%esp)
>          jz      .Lend_\@
>          .endif
>          /* On user-cr3? */
> 

Makes sense, but this looks like a separate patch to me.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [PATCH] x86/entry/32: Fix setup of CS high bits
  2018-10-15 13:08         ` Jan Kiszka
@ 2018-10-15 13:14           ` David Laight
  2018-10-15 13:18             ` Jan Kiszka
  0 siblings, 1 reply; 105+ messages in thread
From: David Laight @ 2018-10-15 13:14 UTC (permalink / raw)
  To: 'Jan Kiszka', Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli

From: Jan Kiszka
> Sent: 15 October 2018 14:09
...
> > Those fields are genuinely 16 bit.  So the comment should say
> > something like "Those high bits are used for CS_FROM_ENTRY_STACK and
> > CS_FROM_USER_CR3".
> 
> /*
>   * The high bits of the CS dword (__csh) are used for
>   * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
>   * hardware didn't do this for us.
>   */

What's a 'dword' ? :-)

On a 32-bit processor a 'word' will be 32 bits, so a 'double-word'
would be 64 bits.
One of the worst names to use.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH] x86/entry/32: Fix setup of CS high bits
  2018-10-15 13:14           ` David Laight
@ 2018-10-15 13:18             ` Jan Kiszka
  2018-10-15 13:29               ` David Laight
  0 siblings, 1 reply; 105+ messages in thread
From: Jan Kiszka @ 2018-10-15 13:18 UTC (permalink / raw)
  To: David Laight, Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli

On 15.10.18 15:14, David Laight wrote:
> From: Jan Kiszka
>> Sent: 15 October 2018 14:09
> ...
>>> Those fields are genuinely 16 bit.  So the comment should say
>>> something like "Those high bits are used for CS_FROM_ENTRY_STACK and
>>> CS_FROM_USER_CR3".
>>
>> /*
>>    * The high bits of the CS dword (__csh) are used for
>>    * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
>>    * hardware didn't do this for us.
>>    */
> 
> What's a 'dword' ? :-)
> 
> On a 32-bit processor a 'word' will be 32 bits, so a 'double-word'
> would be 64 bits.
> One of the worst names to use.

That's ia32 nomenclature: a doubleword (dword) is a 32-bit value.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux

^ permalink raw reply	[flat|nested] 105+ messages in thread

* RE: [PATCH] x86/entry/32: Fix setup of CS high bits
  2018-10-15 13:18             ` Jan Kiszka
@ 2018-10-15 13:29               ` David Laight
  0 siblings, 0 replies; 105+ messages in thread
From: David Laight @ 2018-10-15 13:29 UTC (permalink / raw)
  To: 'Jan Kiszka', Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli

From: Jan Kiszka
> On 15.10.18 15:14, David Laight wrote:
> > From: Jan Kiszka
> >> Sent: 15 October 2018 14:09
> > ...
> >>> Those fields are genuinely 16 bit.  So the comment should say
> >>> something like "Those high bits are used for CS_FROM_ENTRY_STACK and
> >>> CS_FROM_USER_CR3".
> >>
> >> /*
> >>    * The high bits of the CS dword (__csh) are used for
> >>    * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
> >>    * hardware didn't do this for us.
> >>    */
> >
> > What's a 'dword' ? :-)
> >
> > On a 32-bit processor a 'word' will be 32 bits, so a 'double-word'
> > would be 64 bits.
> > One of the worst names to use.
> 
> That's ia32 nomenclature: a doubleword (dword) is a 32-bit value.

I think you missed the :-)
I don't think linux uses that term very often.

Any guesses as to what type DWORD_PTR is?
(in a well-known 64-bit environment that uses UPPER_CASE for types).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 105+ messages in thread

* [PATCH v2] x86/entry/32: Fix setup of CS high bits
  2018-10-13  9:54     ` [PATCH] x86/entry/32: Fix setup of CS high bits Jan Kiszka
  2018-10-13 15:12       ` Andy Lutomirski
  2018-10-15  9:10       ` Joerg Roedel
@ 2018-10-15 14:09       ` Jan Kiszka
  2018-10-15 15:09         ` [tip:x86/urgent] x86/entry/32: Clear the " tip-bot for Jan Kiszka
  2018-10-18  6:21         ` tip-bot for Jan Kiszka
  2 siblings, 2 replies; 105+ messages in thread
From: Jan Kiszka @ 2018-10-15 14:09 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli

Even if we are not on an entry stack, we have to initialize the CS high
bits because we are unconditionally evaluating them in
PARANOID_EXIT_TO_KERNEL_MODE. Failing to do so broke the boot on Galileo
Gen2 and IOT2000 boards.

Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Acked-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
---

Changes in v2:
 - adjusted comment according to Andy's feedback
 - added Jörg's ack/review (assuming the comment change does not affect it)

 arch/x86/entry/entry_32.S | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2767c625a52c..fbbf1ba57ec6 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -389,6 +389,13 @@
 	 * that register for the time this macro runs
 	 */
 
+	/*
+	 * The high bits of the CS dword (__csh) are used for
+	 * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
+	 * hardware didn't do this for us.
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %ecx
 	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
@@ -407,12 +414,6 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry2task_stack(%edi), %edi
 
-	/*
-	 * Clear unused upper bits of the dword containing the word-sized CS
-	 * slot in pt_regs in case hardware didn't clear it for us.
-	 */
-	andl	$(0x0000ffff), PT_CS(%esp)
-
 	/* Special case - entry from kernel mode via entry stack */
 #ifdef CONFIG_VM86
 	movl	PT_EFLAGS(%esp), %ecx		# mix EFLAGS and CS
-- 
2.16.4

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [tip:x86/urgent] x86/entry/32: Clear the CS high bits
  2018-10-15 14:09       ` [PATCH v2] " Jan Kiszka
@ 2018-10-15 15:09         ` tip-bot for Jan Kiszka
  2018-10-18  6:21         ` tip-bot for Jan Kiszka
  1 sibling, 0 replies; 105+ messages in thread
From: tip-bot for Jan Kiszka @ 2018-10-15 15:09 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jpoimboe, dave.hansen, torvalds, jan.kiszka, dvlasenk, bp,
	linux-mm, David.Laight, linux-kernel, luto, jgross, hpa, brgerst,
	eduval, gregkh, jroedel, tglx, peterz, mingo, will.deacon, x86,
	jkosina, boris.ostrovsky, aarcange

Commit-ID:  8cad6c58c9effb59b830bcf0103d8267ad2e312d
Gitweb:     https://git.kernel.org/tip/8cad6c58c9effb59b830bcf0103d8267ad2e312d
Author:     Jan Kiszka <jan.kiszka@siemens.com>
AuthorDate: Mon, 15 Oct 2018 16:09:29 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Mon, 15 Oct 2018 16:54:28 +0200

x86/entry/32: Clear the CS high bits

Even if not on an entry stack, the CS's high bits must be
initialized because they are unconditionally evaluated in
PARANOID_EXIT_TO_KERNEL_MODE.

Failing to do so broke the boot on Galileo Gen2 and IOT2000 boards.

 [ bp: Make the commit message tone passive and impartial. ]

Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Brian Gerst <brgerst@gmail.com>
CC: Dave Hansen <dave.hansen@intel.com>
CC: David Laight <David.Laight@aculab.com>
CC: Denys Vlasenko <dvlasenk@redhat.com>
CC: Eduardo Valentin <eduval@amazon.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: Juergen Gross <jgross@suse.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Will Deacon <will.deacon@arm.com>
CC: aliguori@amazon.com
CC: daniel.gruss@iaik.tugraz.at
CC: hughd@google.com
CC: keescook@google.com
CC: linux-mm <linux-mm@kvack.org>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/f271c747-1714-5a5b-a71f-ae189a093b8d@siemens.com
---
 arch/x86/entry/entry_32.S | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2767c625a52c..fbbf1ba57ec6 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -389,6 +389,13 @@
 	 * that register for the time this macro runs
 	 */
 
+	/*
+	 * The high bits of the CS dword (__csh) are used for
+	 * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
+	 * hardware didn't do this for us.
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %ecx
 	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
@@ -407,12 +414,6 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry2task_stack(%edi), %edi
 
-	/*
-	 * Clear unused upper bits of the dword containing the word-sized CS
-	 * slot in pt_regs in case hardware didn't clear it for us.
-	 */
-	andl	$(0x0000ffff), PT_CS(%esp)
-
 	/* Special case - entry from kernel mode via entry stack */
 #ifdef CONFIG_VM86
 	movl	PT_EFLAGS(%esp), %ecx		# mix EFLAGS and CS

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* [tip:x86/urgent] x86/entry/32: Clear the CS high bits
  2018-10-15 14:09       ` [PATCH v2] " Jan Kiszka
  2018-10-15 15:09         ` [tip:x86/urgent] x86/entry/32: Clear the " tip-bot for Jan Kiszka
@ 2018-10-18  6:21         ` tip-bot for Jan Kiszka
  1 sibling, 0 replies; 105+ messages in thread
From: tip-bot for Jan Kiszka @ 2018-10-18  6:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jkosina, jpoimboe, x86, jgross, aarcange, linux-kernel, jroedel,
	gregkh, mingo, luto, boris.ostrovsky, linux-mm, torvalds,
	dave.hansen, jan.kiszka, eduval, hpa, dvlasenk, tglx, brgerst,
	David.Laight, peterz, bp, will.deacon

Commit-ID:  04f4f954b69526d7af8ffb8e5780f08b8a6cda2d
Gitweb:     https://git.kernel.org/tip/04f4f954b69526d7af8ffb8e5780f08b8a6cda2d
Author:     Jan Kiszka <jan.kiszka@siemens.com>
AuthorDate: Mon, 15 Oct 2018 16:09:29 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 17 Oct 2018 12:30:20 +0200

x86/entry/32: Clear the CS high bits

Even if not on an entry stack, the CS's high bits must be
initialized because they are unconditionally evaluated in
PARANOID_EXIT_TO_KERNEL_MODE.

Failing to do so broke the boot on Galileo Gen2 and IOT2000 boards.

 [ bp: Make the commit message tone passive and impartial. ]

Fixes: b92a165df17e ("x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Joerg Roedel <jroedel@suse.de>
Acked-by: Joerg Roedel <jroedel@suse.de>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Andrea Arcangeli <aarcange@redhat.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
CC: Brian Gerst <brgerst@gmail.com>
CC: Dave Hansen <dave.hansen@intel.com>
CC: David Laight <David.Laight@aculab.com>
CC: Denys Vlasenko <dvlasenk@redhat.com>
CC: Eduardo Valentin <eduval@amazon.com>
CC: Greg KH <gregkh@linuxfoundation.org>
CC: Ingo Molnar <mingo@kernel.org>
CC: Jiri Kosina <jkosina@suse.cz>
CC: Josh Poimboeuf <jpoimboe@redhat.com>
CC: Juergen Gross <jgross@suse.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Will Deacon <will.deacon@arm.com>
CC: aliguori@amazon.com
CC: daniel.gruss@iaik.tugraz.at
CC: hughd@google.com
CC: keescook@google.com
CC: linux-mm <linux-mm@kvack.org>
CC: x86-ml <x86@kernel.org>
Link: http://lkml.kernel.org/r/f271c747-1714-5a5b-a71f-ae189a093b8d@siemens.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/entry/entry_32.S | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2767c625a52c..fbbf1ba57ec6 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -389,6 +389,13 @@
 	 * that register for the time this macro runs
 	 */
 
+	/*
+	 * The high bits of the CS dword (__csh) are used for
+	 * CS_FROM_ENTRY_STACK and CS_FROM_USER_CR3. Clear them in case
+	 * hardware didn't do this for us.
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %ecx
 	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
@@ -407,12 +414,6 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry2task_stack(%edi), %edi
 
-	/*
-	 * Clear unused upper bits of the dword containing the word-sized CS
-	 * slot in pt_regs in case hardware didn't clear it for us.
-	 */
-	andl	$(0x0000ffff), PT_CS(%esp)
-
 	/* Special case - entry from kernel mode via entry stack */
 #ifdef CONFIG_VM86
 	movl	PT_EFLAGS(%esp), %ecx		# mix EFLAGS and CS

^ permalink raw reply related	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-17 20:06               ` Andy Lutomirski
@ 2018-07-18 11:59                 ` Joerg Roedel
  0 siblings, 0 replies; 105+ messages in thread
From: Joerg Roedel @ 2018-07-18 11:59 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Andy Lutomirski, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Linus Torvalds,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge

On Tue, Jul 17, 2018 at 01:06:11PM -0700, Andy Lutomirski wrote:
> Yes, we obviously need to restore the correct cr3.  But I really don't
> like the code that rewrites the stack frame that we're about to IRET
> to, especially when it doesn't seem to serve a purpose.  I'd much
> rather the code just get its CR3 right and do the IRET and trust that
> the frame it's returning to is still there.

Okay, I'll give it a try and if it works without the copying we can put
that on top of this patch-set. This also has the benefit that we can
revert it later if it causes problems down the road.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-17  7:15             ` Joerg Roedel
@ 2018-07-17 20:06               ` Andy Lutomirski
  2018-07-18 11:59                 ` Joerg Roedel
  0 siblings, 1 reply; 105+ messages in thread
From: Andy Lutomirski @ 2018-07-17 20:06 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Linus Torvalds,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge

On Tue, Jul 17, 2018 at 12:15 AM, Joerg Roedel <jroedel@suse.de> wrote:
> On Sat, Jul 14, 2018 at 07:36:47AM -0700, Andy Lutomirski wrote:
>> But I’m still unconvinced. If any code executed with IRQs enabled on
>> the entry stack, then that code is terminally buggy. If you’re
>> executing with IRQs off, you’re not going to get migrated.  64-bit
>> kernels run on percpu stacks all the time, and it’s not a problem.
>
> The code switches to the kernel-stack and kernel-cr3 and just remembers
> where it came from (to handle the entry-from-kernel with entry-stack
> and/or user-cr3 case). IRQs are disabled in the entry-code path. But
> ultimately it calls into C code to handle the exception. And there IRQs
> might get enabled again.
>
>> IRET errors are genuinely special and, if they’re causing a problem
>> for you, we should fix them the same way we deal with them on x86_64.
>
> Right, IRET is handled differently and doesn't need this patch. But the
> segment-writing exceptions do.
>
> If you insist on it I can try to implement the assumption that we don't
> get preempted in this code-path. That will safe us some cycles for
> copying stack contents in this unlikely slow-path. But we definitly need
> to handle the entry-from-kernel with entry-stack and/or user-cr3 case
> correctly and make a switch to kernel-stack/cr3 because we are going to
> call into C-code.
>
>

Yes, we obviously need to restore the correct cr3.  But I really don't
like the code that rewrites the stack frame that we're about to IRET
to, especially when it doesn't seem to serve a purpose.  I'd much
rather the code just get its CR3 right and do the IRET and trust that
the frame it's returning to is still there.

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-14 14:36           ` Andy Lutomirski
@ 2018-07-17  7:15             ` Joerg Roedel
  2018-07-17 20:06               ` Andy Lutomirski
  0 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-17  7:15 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Linus Torvalds,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge

On Sat, Jul 14, 2018 at 07:36:47AM -0700, Andy Lutomirski wrote:
> But I’m still unconvinced. If any code executed with IRQs enabled on
> the entry stack, then that code is terminally buggy. If you’re
> executing with IRQs off, you’re not going to get migrated.  64-bit
> kernels run on percpu stacks all the time, and it’s not a problem.

The code switches to the kernel-stack and kernel-cr3 and just remembers
where it came from (to handle the entry-from-kernel with entry-stack
and/or user-cr3 case). IRQs are disabled in the entry-code path. But
ultimately it calls into C code to handle the exception. And there IRQs
might get enabled again.

> IRET errors are genuinely special and, if they’re causing a problem
> for you, we should fix them the same way we deal with them on x86_64.

Right, IRET is handled differently and doesn't need this patch. But the
segment-writing exceptions do.

If you insist on it I can try to implement the assumption that we don't
get preempted in this code-path. That will save us some cycles for
copying stack contents in this unlikely slow-path. But we definitely need
to handle the entry-from-kernel with entry-stack and/or user-cr3 case
correctly and make a switch to kernel-stack/cr3 because we are going to
call into C-code.


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-14  8:01         ` Joerg Roedel
@ 2018-07-14 14:36           ` Andy Lutomirski
  2018-07-17  7:15             ` Joerg Roedel
  0 siblings, 1 reply; 105+ messages in thread
From: Andy Lutomirski @ 2018-07-14 14:36 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Linus Torvalds,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge



> On Jul 14, 2018, at 1:01 AM, Joerg Roedel <jroedel@suse.de> wrote:
> 
> On Fri, Jul 13, 2018 at 11:26:54PM -0700, Andy Lutomirski wrote:
>>> So based on that, I did the above because the entry-stack is a per-cpu
>>> data structure and I am not sure that we always return from the exception
>>> on the same CPU where we got it. Therefore the path is called
>>> PARANOID_... :)
>> 
>> But we should just be able to IRET and end up right back on the entry
>> stack where we were when we got interrupted.
> 
> Yeah, but using another CPU's entry-stack is a bad idea, no? Especially
> since the owning CPU might have overwritten our content there already.
> 
>> On x86_64, we *definitely* can’t schedule in NMI, MCE, or #DB because
>> we’re on a percpu stack. Are you *sure* we need this patch?
> 
> I am sure we need this patch, but not 100% sure that we really can
> change CPUs in this path. We are not only talking about NMI, #MC and
> #DB, but also about #GP and every other exception that can happen while
> writing segment registers or on iret. With this implementation we are
> on the safe side for this unlikely slow-path.

Oh, right, exceptions while writing segment regs. IRET is special, though.

But I’m still unconvinced. If any code executed with IRQs enabled on the entry stack, then that code is terminally buggy. If you’re executing with IRQs off, you’re not going to get migrated.  64-bit kernels run on percpu stacks all the time, and it’s not a problem.

IRET errors are genuinely special and, if they’re causing a problem for you, we should fix them the same way we deal with them on x86_64.

> 
> Regards,
> 
>    Joerg

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-14  6:26       ` Andy Lutomirski
@ 2018-07-14  8:01         ` Joerg Roedel
  2018-07-14 14:36           ` Andy Lutomirski
  0 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-14  8:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Linus Torvalds,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge

On Fri, Jul 13, 2018 at 11:26:54PM -0700, Andy Lutomirski wrote:
> > So based on that, I did the above because the entry-stack is a per-cpu
> > data structure and I am not sure that we always return from the exception
> > on the same CPU where we got it. Therefore the path is called
> > PARANOID_... :)
> 
> But we should just be able to IRET and end up right back on the entry
> stack where we were when we got interrupted.

Yeah, but using another CPU's entry-stack is a bad idea, no? Especially
since the owning CPU might have overwritten our content there already.

> On x86_64, we *definitely* can’t schedule in NMI, MCE, or #DB because
> we’re on a percpu stack. Are you *sure* we need this patch?

I am sure we need this patch, but not 100% sure that we really can
change CPUs in this path. We are not only talking about NMI, #MC and
#DB, but also about #GP and every other exception that can happen while
writing segment registers or on iret. With this implementation we are
on the safe side for this unlikely slow-path.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-14  5:21     ` Joerg Roedel
@ 2018-07-14  6:26       ` Andy Lutomirski
  2018-07-14  8:01         ` Joerg Roedel
  0 siblings, 1 reply; 105+ messages in thread
From: Andy Lutomirski @ 2018-07-14  6:26 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Linus Torvalds,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge



> On Jul 13, 2018, at 10:21 PM, Joerg Roedel <jroedel@suse.de> wrote:
> 
>> On Fri, Jul 13, 2018 at 04:31:02PM -0700, Andy Lutomirski wrote:
>> What you're really doing is keeping it available for an extra flag.
>> Please update the comment as such.  But see below.
> 
> Thanks, will do.
> 
>>> +.macro PARANOID_EXIT_TO_KERNEL_MODE
>>> +
>>> +       /*
>>> +        * Test if we entered the kernel with the entry-stack. Most
>>> +        * likely we did not, because this code only runs on the
>>> +        * return-to-kernel path.
>>> +        */
>>> +       testl   $CS_FROM_ENTRY_STACK, PT_CS(%esp)
>>> +       jz      .Lend_\@
>>> +
>>> +       /* Unlikely slow-path */
>>> +
>>> +       /* Clear marker from stack-frame */
>>> +       andl    $(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
>>> +
>>> +       /* Copy the remaining task-stack contents to entry-stack */
>>> +       movl    %esp, %esi
>>> +       movl    PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
>> 
>> I'm confused.  Why do we need any special handling here at all?  How
>> could we end up with the contents of the stack frame we interrupted in
>> a corrupt state?
>> 
>> I guess I don't understand why this patch is needed.
> 
> The patch is needed because we can get exceptions in kernel-mode while
> we are already on user-cr3 and entry-stack. In this case we need to
> return with user-cr3 and entry-stack to the kernel too, otherwise we
> would go to user-space with kernel-cr3.
> 
> So based on that, I did the above because the entry-stack is a per-cpu
> data structure and I am not sure that we always return from the exception
> on the same CPU where we got it. Therefore the path is called
> PARANOID_... :)

But we should just be able to IRET and end up right back on the entry stack where we were when we got interrupted.

On x86_64, we *definitely* can’t schedule in NMI, MCE, or #DB because we’re on a percpu stack. Are you *sure* we need this patch?

> 
> 
> Regards,
> 
>    Joerg
> 

^ permalink raw reply	[flat|nested] 105+ messages in thread

* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-13 23:31   ` Andy Lutomirski
@ 2018-07-14  5:21     ` Joerg Roedel
  2018-07-14  6:26       ` Andy Lutomirski
  0 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-14  5:21 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, David H . Gutteridge

On Fri, Jul 13, 2018 at 04:31:02PM -0700, Andy Lutomirski wrote:
> What you're really doing is keeping it available for an extra flag.
> Please update the comment as such.  But see below.

Thanks, will do.

> > +.macro PARANOID_EXIT_TO_KERNEL_MODE
> > +
> > +       /*
> > +        * Test if we entered the kernel with the entry-stack. Most
> > +        * likely we did not, because this code only runs on the
> > +        * return-to-kernel path.
> > +        */
> > +       testl   $CS_FROM_ENTRY_STACK, PT_CS(%esp)
> > +       jz      .Lend_\@
> > +
> > +       /* Unlikely slow-path */
> > +
> > +       /* Clear marker from stack-frame */
> > +       andl    $(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
> > +
> > +       /* Copy the remaining task-stack contents to entry-stack */
> > +       movl    %esp, %esi
> > +       movl    PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
> 
> I'm confused.  Why do we need any special handling here at all?  How
> could we end up with the contents of the stack frame we interrupted in
> a corrupt state?
> 
> I guess I don't understand why this patch is needed.

The patch is needed because we can get exceptions in kernel-mode while
we are already on user-cr3 and entry-stack. In this case we need to
return with user-cr3 and entry-stack to the kernel too, otherwise we
would go to user-space with kernel-cr3.

So based on that, I did the above because the entry-stack is a per-cpu
data structure and I am not sure that we always return from the exception
on the same CPU where we got it. Therefore the path is called
PARANOID_... :)
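
A rough sketch of what I'm worried about (instructions as in the
patch; whether the migration can really happen here is exactly the
part I'm not sure about):

	/* on CPU A, at entry time: */
	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi	/* CPU A's entry-stack */

	/* ... handler runs, task gets migrated to CPU B ... */

	/* on CPU B, at exit time: */
	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi	/* CPU B's entry-stack */

Whatever we left on CPU A's entry-stack may have been clobbered in the
meantime by the next task entering the kernel on CPU A.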


Regards,

	Joerg



* Re: [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-11 11:29 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
@ 2018-07-13 23:31   ` Andy Lutomirski
  2018-07-14  5:21     ` Joerg Roedel
  0 siblings, 1 reply; 105+ messages in thread
From: Andy Lutomirski @ 2018-07-13 23:31 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML, LKML,
	Linux-MM, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, David H . Gutteridge, Joerg Roedel

On Wed, Jul 11, 2018 at 4:29 AM, Joerg Roedel <joro@8bytes.org> wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> It can happen that we enter the kernel from kernel-mode and
> on the entry-stack. The most common way this happens is when
> we get an exception while loading the user-space segment
> registers on the kernel-to-userspace exit path.
>
> The segment loading needs to be done after the entry-stack
> switch, because the stack-switch needs kernel %fs for
> per_cpu access.
>
> When this happens, we need to make sure that we leave the
> kernel with the entry-stack again, so that the interrupted
> code-path runs on the right stack when switching to the
> user-cr3.
>
> To do this, we detect the condition on kernel-entry by checking
> CS.RPL and %esp, and if it happens, we copy over
> the complete content of the entry stack to the task-stack.
> This needs to be done because once we enter the exception
> handlers we might be scheduled out or even migrated to a
> different CPU, so that we can't rely on the entry-stack
> contents. We also leave a marker in the stack-frame to
> detect this condition on the exit path.
>
> On the exit path the copy is reversed, we copy all of the
> remaining task-stack back to the entry-stack and switch
> to it.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/entry/entry_32.S | 116 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 115 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 3d1a114..b3af76e 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -299,6 +299,9 @@
>   * copied there. So allocate the stack-frame on the task-stack and
>   * switch to it before we do any copying.
>   */
> +
> +#define CS_FROM_ENTRY_STACK    (1 << 31)
> +
>  .macro SWITCH_TO_KERNEL_STACK
>
>         ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
> @@ -320,6 +323,16 @@
>         /* Load top of task-stack into %edi */
>         movl    TSS_entry_stack(%edi), %edi
>
> +       /*
> +        * Clear upper bits of the CS slot in pt_regs in case hardware
> +        * didn't clear it for us
> +        */
> +       andl    $(0x0000ffff), PT_CS(%esp)

The comment is highly confusing, given that the upper bits aren't part
of the slot any more:

commit 385eca8f277c4c34f361a4c3a088fd876d29ae21
Author: Andy Lutomirski <luto@kernel.org>
Date:   Fri Jul 28 06:00:30 2017 -0700

    x86/asm/32: Make pt_regs's segment registers be 16 bits

What you're really doing is keeping it available for an extra flag.
Please update the comment as such.  But see below.
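
To spell it out, since that commit the dword at PT_CS looks roughly
like this (field names from memory):

	/*
	 * In 32-bit pt_regs the former "unsigned long cs" is now:
	 *
	 *	unsigned short cs;	the real selector
	 *	unsigned short __csh;	padding, not guaranteed to be
	 *				cleared by hardware
	 *
	 * so the high 16 bits of the dword at PT_CS(%esp) carry no
	 * hardware state; they are free for software flags like
	 * CS_FROM_ENTRY_STACK (bit 31), which is why the andl above
	 * clears them first.
	 */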

> +
> +       /* Special case - entry from kernel mode via entry stack */
> +       testl   $SEGMENT_RPL_MASK, PT_CS(%esp)
> +       jz      .Lentry_from_kernel_\@
> +
>         /* Bytes to copy */
>         movl    $PTREGS_SIZE, %ecx
>
> @@ -333,8 +346,8 @@
>          */
>         addl    $(4 * 4), %ecx
>
> -.Lcopy_pt_regs_\@:
>  #endif
> +.Lcopy_pt_regs_\@:
>
>         /* Allocate frame on task-stack */
>         subl    %ecx, %edi
> @@ -350,6 +363,56 @@
>         cld
>         rep movsl
>
> +       jmp .Lend_\@
> +
> +.Lentry_from_kernel_\@:
> +
> +       /*
> +        * This handles the case when we enter the kernel from
> +        * kernel-mode and %esp points to the entry-stack. When this
> +        * happens we need to switch to the task-stack to run C code,
> +        * but switch back to the entry-stack again when we approach
> +        * iret and return to the interrupted code-path. This usually
> +        * happens when we hit an exception while restoring user-space
> +        * segment registers on the way back to user-space.
> +        *
> +        * When we switch to the task-stack here, we can't trust the
> +        * contents of the entry-stack anymore, as the exception handler
> +        * might be scheduled out or moved to another CPU. Therefore we
> +        * copy the complete entry-stack to the task-stack and set a
> +        * marker in the iret-frame (bit 31 of the CS dword) to detect
> +        * what we've done on the iret path.
> +        *
> +        * On the iret path we copy everything back and switch to the
> +        * entry-stack, so that the interrupted kernel code-path
> +        * continues on the same stack it was interrupted with.
> +        *
> +        * Be aware that an NMI can happen anytime in this code.
> +        *
> +        * %esi: Entry-Stack pointer (same as %esp)
> +        * %edi: Top of the task stack
> +        */
> +
> +       /* Calculate number of bytes on the entry stack in %ecx */
> +       movl    %esi, %ecx
> +
> +       /* %ecx to the top of entry-stack */
> +       andl    $(MASK_entry_stack), %ecx
> +       addl    $(SIZEOF_entry_stack), %ecx
> +
> +       /* Number of bytes on the entry stack to %ecx */
> +       sub     %esi, %ecx
> +
> +       /* Mark stackframe as coming from entry stack */
> +       orl     $CS_FROM_ENTRY_STACK, PT_CS(%esp)
> +
> +       /*
> +        * %esi and %edi are unchanged, %ecx contains the number of
> +        * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
> +        * the stack-frame on task-stack and copy everything over
> +        */
> +       jmp .Lcopy_pt_regs_\@
> +
>  .Lend_\@:
>  .endm
>
> @@ -408,6 +471,56 @@
>  .endm
>
>  /*
> + * This macro handles the case when we return to kernel-mode on the iret
> + * path and have to switch back to the entry stack.
> + *
> + * See the comments below the .Lentry_from_kernel_\@ label in the
> + * SWITCH_TO_KERNEL_STACK macro for more details.
> + */
> +.macro PARANOID_EXIT_TO_KERNEL_MODE
> +
> +       /*
> +        * Test if we entered the kernel with the entry-stack. Most
> +        * likely we did not, because this code only runs on the
> +        * return-to-kernel path.
> +        */
> +       testl   $CS_FROM_ENTRY_STACK, PT_CS(%esp)
> +       jz      .Lend_\@
> +
> +       /* Unlikely slow-path */
> +
> +       /* Clear marker from stack-frame */
> +       andl    $(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
> +
> +       /* Copy the remaining task-stack contents to entry-stack */
> +       movl    %esp, %esi
> +       movl    PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi

I'm confused.  Why do we need any special handling here at all?  How
could we end up with the contents of the stack frame we interrupted in
a corrupt state?

I guess I don't understand why this patch is needed.


* [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-07-11 11:29 [PATCH 00/39 v7] " Joerg Roedel
@ 2018-07-11 11:29 ` Joerg Roedel
  2018-07-13 23:31   ` Andy Lutomirski
  0 siblings, 1 reply; 105+ messages in thread
From: Joerg Roedel @ 2018-07-11 11:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek,
	David H . Gutteridge, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

It can happen that we enter the kernel from kernel-mode and
on the entry-stack. The most common way this happens is when
we get an exception while loading the user-space segment
registers on the kernel-to-userspace exit path.

The segment loading needs to be done after the entry-stack
switch, because the stack-switch needs kernel %fs for
per_cpu access.
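
(A sketch of why, simplified: per-cpu data is reached through the %fs
segment on 32-bit, so e.g.

	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi

is an %fs-relative access. Once the user %fs is loaded, the same
expression would reference user memory, or fault, instead of the
per-cpu area.)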

When this happens, we need to make sure that we leave the
kernel with the entry-stack again, so that the interrupted
code-path runs on the right stack when switching to the
user-cr3.

To do this, we detect the condition on kernel-entry by checking
CS.RPL and %esp, and if it happens, we copy over
the complete content of the entry stack to the task-stack.
This needs to be done because once we enter the exception
handlers we might be scheduled out or even migrated to a
different CPU, so that we can't rely on the entry-stack
contents. We also leave a marker in the stack-frame to
detect this condition on the exit path.

On the exit path the copy is reversed, we copy all of the
remaining task-stack back to the entry-stack and switch
to it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 116 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 115 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 3d1a114..b3af76e 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -299,6 +299,9 @@
  * copied there. So allocate the stack-frame on the task-stack and
  * switch to it before we do any copying.
  */
+
+#define CS_FROM_ENTRY_STACK	(1 << 31)
+
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
@@ -320,6 +323,16 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry_stack(%edi), %edi
 
+	/*
+	 * Clear upper bits of the CS slot in pt_regs in case hardware
+	 * didn't clear it for us
+	 */
+	andl	$(0x0000ffff), PT_CS(%esp)
+
+	/* Special case - entry from kernel mode via entry stack */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lentry_from_kernel_\@
+
 	/* Bytes to copy */
 	movl	$PTREGS_SIZE, %ecx
 
@@ -333,8 +346,8 @@
 	 */
 	addl	$(4 * 4), %ecx
 
-.Lcopy_pt_regs_\@:
 #endif
+.Lcopy_pt_regs_\@:
 
 	/* Allocate frame on task-stack */
 	subl	%ecx, %edi
@@ -350,6 +363,56 @@
 	cld
 	rep movsl
 
+	jmp .Lend_\@
+
+.Lentry_from_kernel_\@:
+
+	/*
+	 * This handles the case when we enter the kernel from
+	 * kernel-mode and %esp points to the entry-stack. When this
+	 * happens we need to switch to the task-stack to run C code,
+	 * but switch back to the entry-stack again when we approach
+	 * iret and return to the interrupted code-path. This usually
+	 * happens when we hit an exception while restoring user-space
+	 * segment registers on the way back to user-space.
+	 *
+	 * When we switch to the task-stack here, we can't trust the
+	 * contents of the entry-stack anymore, as the exception handler
+	 * might be scheduled out or moved to another CPU. Therefore we
+	 * copy the complete entry-stack to the task-stack and set a
+	 * marker in the iret-frame (bit 31 of the CS dword) to detect
+	 * what we've done on the iret path.
+	 *
+	 * On the iret path we copy everything back and switch to the
+	 * entry-stack, so that the interrupted kernel code-path
+	 * continues on the same stack it was interrupted with.
+	 *
+	 * Be aware that an NMI can happen anytime in this code.
+	 *
+	 * %esi: Entry-Stack pointer (same as %esp)
+	 * %edi: Top of the task stack
+	 */
+
+	/* Calculate number of bytes on the entry stack in %ecx */
+	movl	%esi, %ecx
+
+	/* %ecx to the top of entry-stack */
+	andl	$(MASK_entry_stack), %ecx
+	addl	$(SIZEOF_entry_stack), %ecx
+
+	/* Number of bytes on the entry stack to %ecx */
+	sub	%esi, %ecx
+
+	/* Mark stackframe as coming from entry stack */
+	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+
+	/*
+	 * %esi and %edi are unchanged, %ecx contains the number of
+	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
+	 * the stack-frame on task-stack and copy everything over
+	 */
+	jmp .Lcopy_pt_regs_\@
+
 .Lend_\@:
 .endm
 
@@ -408,6 +471,56 @@
 .endm
 
 /*
+ * This macro handles the case when we return to kernel-mode on the iret
+ * path and have to switch back to the entry stack.
+ *
+ * See the comments below the .Lentry_from_kernel_\@ label in the
+ * SWITCH_TO_KERNEL_STACK macro for more details.
+ */
+.macro PARANOID_EXIT_TO_KERNEL_MODE
+
+	/*
+	 * Test if we entered the kernel with the entry-stack. Most
+	 * likely we did not, because this code only runs on the
+	 * return-to-kernel path.
+	 */
+	testl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Unlikely slow-path */
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
+
+	/* Copy the remaining task-stack contents to entry-stack */
+	movl	%esp, %esi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+
+	/* Bytes on the task-stack to ecx */
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %ecx
+	subl	%esi, %ecx
+
+	/* Allocate stack-frame on entry-stack */
+	subl	%ecx, %edi
+
+	/*
+	 * Save future stack-pointer, we must not switch until the
+	 * copy is done, otherwise the NMI handler could destroy the
+	 * contents of the task-stack we are about to copy.
+	 */
+	movl	%edi, %ebx
+
+	/* Do the copy */
+	shrl	$2, %ecx
+	cld
+	rep movsl
+
+	/* Safe to switch to entry-stack now */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -769,6 +882,7 @@ restore_all:
 
 restore_all_kernel:
 	TRACE_IRQS_IRET
+	PARANOID_EXIT_TO_KERNEL_MODE
 	RESTORE_REGS 4
 	jmp	.Lirq_return
 
-- 
2.7.4



end of thread, other threads:[~2018-10-18  6:22 UTC | newest]

Thread overview: 105+ messages
2018-07-18  9:40 [PATCH 00/39 v8] PTI support for x86-32 Joerg Roedel
2018-07-18  9:40 ` [PATCH 01/39] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c Joerg Roedel
2018-07-19 23:18   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 02/39] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry2task_stack Joerg Roedel
2018-07-19 23:19   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 03/39] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler Joerg Roedel
2018-07-19 23:19   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 04/39] x86/entry/32: Put ESPFIX code into a macro Joerg Roedel
2018-07-19 23:20   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 05/39] x86/entry/32: Unshare NMI return path Joerg Roedel
2018-07-19 23:21   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 06/39] x86/entry/32: Split off return-to-kernel path Joerg Roedel
2018-07-19 23:21   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
2018-07-18 18:09   ` Brian Gerst
2018-07-19 20:52     ` Thomas Gleixner
2018-07-19 23:22   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 08/39] x86/entry/32: Leave " Joerg Roedel
2018-07-19 23:22   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 09/39] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI Joerg Roedel
2018-07-19 23:23   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
2018-07-19 23:23   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-10-12 18:29   ` [PATCH 10/39] " Jan Kiszka
2018-10-13  9:54     ` [PATCH] x86/entry/32: Fix setup of CS high bits Jan Kiszka
2018-10-13 15:12       ` Andy Lutomirski
2018-10-15 13:08         ` Jan Kiszka
2018-10-15 13:14           ` David Laight
2018-10-15 13:18             ` Jan Kiszka
2018-10-15 13:29               ` David Laight
2018-10-15  9:10       ` Joerg Roedel
2018-10-15 14:09       ` [PATCH v2] " Jan Kiszka
2018-10-15 15:09         ` [tip:x86/urgent] x86/entry/32: Clear the " tip-bot for Jan Kiszka
2018-10-18  6:21         ` tip-bot for Jan Kiszka
2018-07-18  9:40 ` [PATCH 11/39] x86/entry/32: Simplify debug entry point Joerg Roedel
2018-07-19 23:24   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 12/39] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points Joerg Roedel
2018-07-19 23:24   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 13/39] x86/entry/32: Add PTI cr3 switches to NMI handler code Joerg Roedel
2018-07-19 23:25   ` [tip:x86/pti] x86/entry/32: Add PTI CR3 " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 14/39] x86/entry: Rename update_sp0 to update_task_stack Joerg Roedel
2018-07-19 23:25   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 15/39] x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl Joerg Roedel
2018-07-19 23:26   ` [tip:x86/pti] x86/pgtable: Rename pti_set_user_pgd() to pti_set_user_pgtbl() tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 16/39] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled Joerg Roedel
2018-07-19 23:26   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 17/39] x86/pgtable/32: Allocate 8k page-tables " Joerg Roedel
2018-07-19 23:27   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 18/39] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Joerg Roedel
2018-07-19 23:27   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 19/39] x86/pgtable: Move pti_set_user_pgtbl() " Joerg Roedel
2018-07-19 23:28   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 20/39] x86/pgtable: Move two more functions from pgtable_64.h " Joerg Roedel
2018-07-19 23:28   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 21/39] x86/mm/pae: Populate valid user PGD entries Joerg Roedel
2018-07-19 23:29   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:40 ` [PATCH 22/39] x86/mm/pae: Populate the user page-table with user pgd's Joerg Roedel
2018-07-19 23:30   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 23/39] x86/mm/legacy: " Joerg Roedel
2018-07-19 23:30   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 24/39] x86/mm/pti: Add an overflow check to pti_clone_pmds() Joerg Roedel
2018-07-19 23:31   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 25/39] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Joerg Roedel
2018-07-19 23:31   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 26/39] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level " Joerg Roedel
2018-07-19 23:32   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 27/39] x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit Joerg Roedel
2018-07-19 23:32   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 28/39] x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text() Joerg Roedel
2018-07-19 23:33   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 29/39] x86/mm/pti: Introduce pti_finalize() Joerg Roedel
2018-07-19 23:33   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 30/39] x86/mm/pti: Clone entry-text again in pti_finalize() Joerg Roedel
2018-07-19 23:34   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 31/39] x86/mm/dump_pagetables: Define INIT_PGD Joerg Roedel
2018-07-19 23:34   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table Joerg Roedel
2018-07-19 23:35   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-10-05 14:06   ` [PATCH 32/39] " Arnd Bergmann
2018-07-18  9:41 ` [PATCH 33/39] x86/ldt: Reserve address-space range on 32 bit for the LDT Joerg Roedel
2018-07-19 23:35   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 34/39] x86/ldt: Define LDT_END_ADDR Joerg Roedel
2018-07-19 23:36   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 35/39] x86/ldt: Split out sanity check in map_ldt_struct() Joerg Roedel
2018-07-19 23:36   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 36/39] x86/ldt: Enable LDT user-mapping for PAE Joerg Roedel
2018-07-19 23:37   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 37/39] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Joerg Roedel
2018-07-19 23:37   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 38/39] x86/mm/pti: Add Warning when booting on a PCID capable CPU Joerg Roedel
2018-07-19 23:38   ` [tip:x86/pti] " tip-bot for Joerg Roedel
2018-07-18  9:41 ` [PATCH 39/39] x86/entry/32: Add debug code to check entry/exit cr3 Joerg Roedel
2018-07-19 23:38   ` [tip:x86/pti] x86/entry/32: Add debug code to check entry/exit CR3 tip-bot for Joerg Roedel
2018-07-18 11:59 ` [PATCH 00/39 v8] PTI support for x86-32 Pavel Machek
2018-07-19 23:21 ` Thomas Gleixner
2018-07-20  7:59   ` Joerg Roedel
  -- strict thread matches above, loose matches on Subject: below --
2018-07-11 11:29 [PATCH 00/39 v7] " Joerg Roedel
2018-07-11 11:29 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
2018-07-13 23:31   ` Andy Lutomirski
2018-07-14  5:21     ` Joerg Roedel
2018-07-14  6:26       ` Andy Lutomirski
2018-07-14  8:01         ` Joerg Roedel
2018-07-14 14:36           ` Andy Lutomirski
2018-07-17  7:15             ` Joerg Roedel
2018-07-17 20:06               ` Andy Lutomirski
2018-07-18 11:59                 ` Joerg Roedel
