* [PATCH 00/31 v2] PTI support for x86_32
@ 2018-02-09  9:25 ` Joerg Roedel
  0 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

Hi,

here is the second version of my PTI implementation for
x86_32, based on tip/x86-pti-for-linus. It took a lot longer
than I had hoped, as there were a number of obstacles along
the way. It is also no longer the small patch-set that v1
was, but unlike v1 this one actually works :)

The biggest changes were necessary in the entry code. A lot
of it is moving code around, but there are also significant
changes to get all cases covered. This includes NMIs and
exceptions on the kernel exit-path where we are already on
the entry-stack. To make this work I decided to mostly split
up the common kernel-exit path into return-to-kernel,
return-to-user and return-from-nmi parts.

On the page-table side I had to add a lot of special cases
for PAE because PAE paging is so, well, special. The biggest
example here is the LDT mapping code, which needs to work on
the PMD level instead of PGD when PAE is enabled.

During development I also experimented with unshared PMDs
between the kernel and the user page-tables for PAE. It
worked by allocating 8k PMDs and using the lower half for
the kernel and the upper half for the user page-table. While
this worked and allowed me to NX-protect the user-space
address-range in the kernel page-table, it also required 5
order-1 allocations in low-mem for each process. In my
testing I got this to fail pretty quickly and trigger OOM,
so I abandoned the approach for now.
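
The figure of 5 order-1 allocations can be reproduced with a little
arithmetic. The sketch below assumes 4 KiB pages, a PAE PGD with four
entries, one 8k PMD per PGD entry and one 8k PGD per process; that
breakdown is an assumption for illustration and is not spelled out in
this mail. All names are hypothetical.

```c
/* Assumed constants for 32-bit PAE with 4 KiB pages (illustrative only). */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_PAE_PGD_ENTRIES	4u	/* a PAE PGD has four entries */

/* Size in bytes of an order-N allocation, i.e. 2^N contiguous pages. */
static unsigned long sketch_order_size(unsigned int order)
{
	return (unsigned long)SKETCH_PAGE_SIZE << order;
}

/*
 * Assumed breakdown: one 8k PGD plus one 8k PMD (kernel half and
 * user half) for each of the four PGD entries.
 */
static unsigned int sketch_order1_allocs_per_process(void)
{
	return 1u + SKETCH_PAE_PGD_ENTRIES;
}

/* Physically contiguous low-mem consumed per process under that model. */
static unsigned long sketch_lowmem_bytes_per_process(void)
{
	return sketch_order1_allocs_per_process() * sketch_order_size(1);
}
```

Every one of these allocations needs two physically contiguous pages
in low-mem, so fragmentation is a likely reason such a scheme fails
long before memory is actually exhausted.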

Here is how I tested these patches:

	* Booted on a real machine (4C/8T, 16GB RAM) and ran
	  an overnight load-test with 'perf top' running
	  (for the NMIs), the ldt_gdt selftest running in a
	  loop (for more stress on the entry/exit path) and
	  a -j16 kernel compile also running in a loop. The
	  box survived the test, which ran for more than 18
	  hours.

	* Tested most x86 selftests in the kernel on the
	  real machine. This showed no regressions. I did
	  not run the mpx and protection-key tests, as the
	  machine does not support these features, and I
	  also skipped the check_initial_reg_state test, as
	  it caused problems while compiling and fixing it
	  didn't seem relevant enough for this patch-set.

	* Boot tested all valid combinations of [NO]HIGHMEM* vs.
	  VMSPLIT* vs. PAE in KVM. All booted fine.

	* Did compile-tests with various configs (allyes,
	  allmod, defconfig, ..., basically what I usually
	  use to test the iommu-tree as well). All compiled
	  fine.

	* Some basic compile, boot and runtime testing of
	  64 bit to make sure I didn't break anything there.

I did not explicitly test wine and dosemu, but since the
vm86 and the ldt_gdt self-tests all passed fine I am
confident that those will also still work.

XENPV is also untested from my side, but I added checks to
not do the stack switches in the entry-code when XENPV is
enabled, so hopefully it works. But someone should test it,
of course.

I also pushed these patches to

	git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v2

for easier testing.

I do not claim that I've found the best solution for every
problem I encountered, so please review and give me feedback
on what I should change or solve differently. Of course I am
also interested in all bugs that may still be in there.

Thanks a lot,

       Joerg


Joerg Roedel (31):
  x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c
  x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry_stack
  x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler
  x86/entry/32: Put ESPFIX code into a macro
  x86/entry/32: Unshare NMI return path
  x86/entry/32: Split off return-to-kernel path
  x86/entry/32: Restore segments before int registers
  x86/entry/32: Enter the kernel via trampoline stack
  x86/entry/32: Leave the kernel via trampoline stack
  x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI
  x86/entry/32: Add PTI cr3 switches to NMI handler code
  x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled
  x86/pgtable/32: Allocate 8k page-tables when PTI is enabled
  x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h
  x86/pgtable: Move pti_set_user_pgd() to pgtable.h
  x86/pgtable: Move two more functions from pgtable_64.h to pgtable.h
  x86/mm/pae: Populate valid user PGD entries
  x86/mm/pae: Populate the user page-table with user pgd's
  x86/mm/legacy: Populate the user page-table with user pgd's
  x86/mm/pti: Add an overflow check to pti_clone_pmds()
  x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32
  x86/mm/dump_pagetables: Define INIT_PGD
  x86/pgtable/pae: Use separate kernel PMDs for user page-table
  x86/ldt: Reserve address-space range on 32 bit for the LDT
  x86/ldt: Define LDT_END_ADDR
  x86/ldt: Split out sanity check in map_ldt_struct()
  x86/ldt: Enable LDT user-mapping for PAE
  x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32

 arch/x86/entry/entry_32.S                   | 581 ++++++++++++++++++++++------
 arch/x86/include/asm/mmu_context.h          |   4 -
 arch/x86/include/asm/pgtable-2level.h       |   9 +
 arch/x86/include/asm/pgtable-2level_types.h |   3 +
 arch/x86/include/asm/pgtable-3level.h       |   7 +
 arch/x86/include/asm/pgtable-3level_types.h |   6 +-
 arch/x86/include/asm/pgtable.h              |  88 +++++
 arch/x86/include/asm/pgtable_32_types.h     |   9 +-
 arch/x86/include/asm/pgtable_64.h           |  85 ----
 arch/x86/include/asm/pgtable_64_types.h     |   4 +
 arch/x86/include/asm/pgtable_types.h        |  26 +-
 arch/x86/include/asm/processor-flags.h      |   8 +-
 arch/x86/include/asm/switch_to.h            |   6 +-
 arch/x86/kernel/asm-offsets.c               |   5 +
 arch/x86/kernel/asm-offsets_32.c            |   2 +-
 arch/x86/kernel/asm-offsets_64.c            |   2 -
 arch/x86/kernel/cpu/common.c                |   5 +-
 arch/x86/kernel/head_32.S                   |  20 +-
 arch/x86/kernel/ldt.c                       | 137 +++++--
 arch/x86/kernel/process.c                   |   2 -
 arch/x86/kernel/process_32.c                |  10 +-
 arch/x86/mm/dump_pagetables.c               |  21 +-
 arch/x86/mm/pgtable.c                       | 105 ++++-
 arch/x86/mm/pti.c                           |  24 ++
 security/Kconfig                            |   2 +-
 25 files changed, 888 insertions(+), 283 deletions(-)

-- 
2.7.4

* [PATCH 01/31] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

These offsets will be used in 32 bit assembly code as well,
so make them available to all x86 code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/asm-offsets.c    | 4 ++++
 arch/x86/kernel/asm-offsets_64.c | 2 --
 2 files changed, 4 insertions(+), 2 deletions(-)
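
As an illustration of what these OFFSET() entries produce: each one
boils down to an offsetof() on the named struct, exported as a
constant that assembly code can use. The sketch below uses simplified
stand-in structs that only model the first fields of the 32-bit
hardware TSS; the *_sketch names and read_u32_at() are hypothetical
and not part of this patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the start of the 32-bit hardware TSS (assumed layout). */
struct x86_hw_tss_sketch {
	uint16_t back_link, __blh;
	uint32_t sp0;
	uint16_t ss0, __ss0h;
	uint32_t sp1;
};

/* Stand-in for tss_struct, which embeds the hardware TSS as x86_tss. */
struct tss_struct_sketch {
	struct x86_hw_tss_sketch x86_tss;
};

/* What OFFSET(TSS_sp0, ...) and OFFSET(TSS_sp1, ...) would evaluate to here. */
enum {
	SKETCH_TSS_sp0 = offsetof(struct tss_struct_sketch, x86_tss.sp0),
	SKETCH_TSS_sp1 = offsetof(struct tss_struct_sketch, x86_tss.sp1),
};

/* Read a 32-bit value at a byte offset, as "movl OFF(%reg), %reg" would. */
static uint32_t read_u32_at(const void *base, size_t off)
{
	uint32_t v;

	memcpy(&v, (const char *)base + off, sizeof(v));
	return v;
}
```

With such constants, assembly code can address tss->x86_tss.sp1 as
SKETCH_TSS_sp1(%reg) without knowing the C struct layout.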

diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 76417a9..232152c 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -103,4 +103,8 @@ void common(void) {
 	OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
 	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
 	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
+
+	/* Offset for sp0 and sp1 into the tss_struct */
+	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
+	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 }
diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c
index bf51e51..d2eba73 100644
--- a/arch/x86/kernel/asm-offsets_64.c
+++ b/arch/x86/kernel/asm-offsets_64.c
@@ -65,8 +65,6 @@ int main(void)
 #undef ENTRY
 
 	OFFSET(TSS_ist, tss_struct, x86_tss.ist);
-	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
-	OFFSET(TSS_sp1, tss_struct, x86_tss.sp1);
 	BLANK();
 
 #ifdef CONFIG_CC_STACKPROTECTOR
-- 
2.7.4

* [PATCH 02/31] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry_stack
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The stack address doesn't need to be stored in tss.sp0 if
we switch manually like on sysenter. Rename the offset so
that it still makes sense when we change its location.

We will also use this stack for all kernel-entry points, not
just sysenter. Reflect that in the name as well.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S        | 2 +-
 arch/x86/kernel/asm-offsets_32.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
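
For readers unfamiliar with this constant: it is not an absolute
offset but the distance from the top of the entry stack to the TSS
field that holds the task stack pointer, so that a single
"movl TSS_entry_stack(%esp), %esp" can load it while %esp still
points at the top of the entry stack. A minimal sketch with made-up
stand-in structs (the *_sketch names, the padding and the sizes are
hypothetical, only the arithmetic mirrors the DEFINE() below):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* End offset of a member, modeled after the kernel's offsetofend(). */
#define offsetofend_sketch(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

struct entry_stack_sketch {
	uint32_t words[64];		/* stack grows down from its end */
};

struct tss_sketch {
	uint32_t back_link;
	uint32_t sp0;
};

struct cpu_entry_area_sketch {
	struct entry_stack_sketch entry_stack;
	uint32_t pad[8];		/* whatever lies in between */
	struct tss_sketch tss;
};

/* Distance from the top of the entry stack to tss.sp0. */
static const ptrdiff_t TSS_entry_stack_sketch =
	(ptrdiff_t)offsetof(struct cpu_entry_area_sketch, tss.sp0) -
	(ptrdiff_t)offsetofend_sketch(struct cpu_entry_area_sketch, entry_stack);

/* C equivalent of "movl TSS_entry_stack(%esp), %esp". */
static uint32_t load_task_stack_sketch(const struct cpu_entry_area_sketch *area)
{
	/* On entry, the stack pointer sits at the top of the entry stack. */
	const char *esp = (const char *)area +
		offsetofend_sketch(struct cpu_entry_area_sketch, entry_stack);
	uint32_t v;

	memcpy(&v, esp + TSS_entry_stack_sketch, sizeof(v));
	return v;
}
```

Because the constant is relative, the name only has to say which
stack it is measured from, not which TSS field it ends up pointing to.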

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 2a35b1e..e659776 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -413,7 +413,7 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
-	movl	TSS_sysenter_sp0(%esp), %esp
+	movl	TSS_entry_stack(%esp), %esp
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index fa1261e..f452bfd 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -47,7 +47,7 @@ void foo(void)
 	BLANK();
 
 	/* Offset from the sysenter stack to tss.sp0 */
-	DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	DEFINE(TSS_entry_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
 #ifdef CONFIG_CC_STACKPROTECTOR
-- 
2.7.4

* [PATCH 03/31] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

We want x86_tss.sp0 to point to the entry stack later, so
that it can be used as a trampoline stack for other kernel
entry points besides SYSENTER.

So store the task stack pointer in x86_tss.sp1, which is
otherwise unused by the hardware, as Linux doesn't make use
of Ring 1.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/asm-offsets_32.c | 2 +-
 arch/x86/kernel/process_32.c     | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)
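
The intended division of labour between the two fields can be
modeled in a few lines of C. This is only a sketch under simplified
assumptions: the structs and function names are hypothetical, and the
real code uses per-cpu accessors rather than plain pointers.

```c
#include <stdint.h>

/* Only the two TSS fields that matter for this patch. */
struct tss_model {
	uint32_t sp0;	/* loaded by hardware on ring 3 -> ring 0 transitions */
	uint32_t sp1;	/* ignored by hardware as long as Ring 1 is unused */
};

struct task_model {
	uint32_t sp0;	/* top of this task's kernel stack */
};

/* Context switch: mirror the next task's stack top into tss.sp1. */
static void switch_to_model(struct tss_model *tss, const struct task_model *next)
{
	tss->sp1 = next->sp0;
}

/* SYSENTER handler: fetch the task stack from sp1 instead of sp0. */
static uint32_t sysenter_task_stack_model(const struct tss_model *tss)
{
	return tss->sp1;
}
```

Since software is the only reader of sp1, sp0 becomes free to hold
the entry-stack address in a later patch without affecting SYSENTER.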

diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c
index f452bfd..b97e48c 100644
--- a/arch/x86/kernel/asm-offsets_32.c
+++ b/arch/x86/kernel/asm-offsets_32.c
@@ -47,7 +47,7 @@ void foo(void)
 	BLANK();
 
 	/* Offset from the sysenter stack to tss.sp0 */
-	DEFINE(TSS_entry_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) -
+	DEFINE(TSS_entry_stack, offsetof(struct cpu_entry_area, tss.x86_tss.sp1) -
 	       offsetofend(struct cpu_entry_area, entry_stack_page.stack));
 
 #ifdef CONFIG_CC_STACKPROTECTOR
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 5224c60..097d36a 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -292,6 +292,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	this_cpu_write(cpu_current_top_of_stack,
 		       (unsigned long)task_stack_page(next_p) +
 		       THREAD_SIZE);
+	/* SYSENTER reads the task-stack from tss.sp1 */
+	this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
 
 	/*
 	 * Restore %gs if needed (which is common)
-- 
2.7.4

* [PATCH 04/31] x86/entry/32: Put ESPFIX code into a macro
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

This makes it easier to split up the shared iret code path.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 97 ++++++++++++++++++++++++-----------------------
 1 file changed, 49 insertions(+), 48 deletions(-)
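
For reviewers who prefer C over assembly, the two steps of the macro
can be modeled roughly as follows. This is a sketch, not the kernel
code: the constants are restated here with the values they are
assumed to have on x86, and the function and struct names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match the kernel definitions on x86. */
#define X86_EFLAGS_VM		0x00020000u	/* virtual-8086 mode */
#define SEGMENT_RPL_MASK	0x3u
#define SEGMENT_TI_MASK		0x4u
#define SEGMENT_LDT		0x4u
#define USER_RPL		0x3u

/*
 * Step 1: mix EFLAGS, SS and CS into one word, as the movl/movb
 * sequence does, and test it with a single compare. ESPFIX applies
 * only when returning to ring 3, outside vm86 mode, with an SS
 * selector that refers to the LDT.
 */
static bool espfix_needed_model(uint32_t eflags, uint16_t ss, uint16_t cs)
{
	uint32_t mix = (eflags & 0xffff0000u) |
		       ((uint32_t)(ss & 0xffu) << 8) |
		       (uint32_t)(cs & 0xffu);

	mix &= X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK;
	return mix == ((SEGMENT_LDT << 8) | USER_RPL);
}

struct espfix_model {
	uint32_t new_esp;	/* ESP to use with the ESPFIX segment */
	uint32_t seg_base;	/* bits 16..31 go into the GDT entry */
};

/*
 * Step 2: keep the low word of the kernel ESP, take the high word
 * from the user ESP, and pick a segment base that makes up for the
 * difference, so that seg_base + new_esp is the real kernel ESP.
 */
static struct espfix_model espfix_compute_model(uint32_t kernel_esp,
						uint32_t user_esp)
{
	struct espfix_model r;

	r.new_esp = (user_esp & 0xffff0000u) | (kernel_esp & 0x0000ffffu);
	r.seg_base = kernel_esp - r.new_esp;	/* low 16 bits are zero */
	return r;
}
```

Note that the macro jumps to .Lend_\@ when the compare does not
match, which corresponds to espfix_needed_model() returning false.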

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e659776..0289bde 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -221,6 +221,54 @@
 	POP_GS_EX
 .endm
 
+.macro CHECK_AND_APPLY_ESPFIX
+#ifdef CONFIG_X86_ESPFIX32
+#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
+
+	ALTERNATIVE	"jmp .Lend_\@", "", X86_BUG_ESPFIX
+
+	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
+	/*
+	 * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
+	 * are returning to the kernel.
+	 * See comments in process.c:copy_thread() for details.
+	 */
+	movb	PT_OLDSS(%esp), %ah
+	movb	PT_CS(%esp), %al
+	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
+	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
+	jne	.Lend_\@	# returning to user-space with LDT SS
+
+	/*
+	 * Setup and switch to ESPFIX stack
+	 *
+	 * We're returning to userspace with a 16 bit stack. The CPU will not
+	 * restore the high word of ESP for us on executing iret... This is an
+	 * "official" bug of all the x86-compatible CPUs, which we can work
+	 * around to make dosemu and wine happy. We do this by preloading the
+	 * high word of ESP with the high word of the userspace ESP while
+	 * compensating for the offset by changing to the ESPFIX segment with
+	 * a base address that matches for the difference.
+	 */
+	mov	%esp, %edx			/* load kernel esp */
+	mov	PT_OLDESP(%esp), %eax		/* load userspace esp */
+	mov	%dx, %ax			/* eax: new kernel esp */
+	sub	%eax, %edx			/* offset (low word is 0) */
+	shr	$16, %edx
+	mov	%dl, GDT_ESPFIX_SS + 4		/* bits 16..23 */
+	mov	%dh, GDT_ESPFIX_SS + 7		/* bits 24..31 */
+	pushl	$__ESPFIX_SS
+	pushl	%eax				/* new kernel esp */
+	/*
+	 * Disable interrupts, but do not irqtrace this section: we
+	 * will soon execute iret and the tracer was already set to
+	 * the irqstate after the IRET:
+	 */
+	DISABLE_INTERRUPTS(CLBR_ANY)
+	lss	(%esp), %esp			/* switch to espfix segment */
+.Lend_\@:
+#endif /* CONFIG_X86_ESPFIX32 */
+.endm
 /*
  * %eax: prev task
  * %edx: next task
@@ -548,21 +596,7 @@ ENTRY(entry_INT80_32)
 restore_all:
 	TRACE_IRQS_IRET
 .Lrestore_all_notrace:
-#ifdef CONFIG_X86_ESPFIX32
-	ALTERNATIVE	"jmp .Lrestore_nocheck", "", X86_BUG_ESPFIX
-
-	movl	PT_EFLAGS(%esp), %eax		# mix EFLAGS, SS and CS
-	/*
-	 * Warning: PT_OLDSS(%esp) contains the wrong/random values if we
-	 * are returning to the kernel.
-	 * See comments in process.c:copy_thread() for details.
-	 */
-	movb	PT_OLDSS(%esp), %ah
-	movb	PT_CS(%esp), %al
-	andl	$(X86_EFLAGS_VM | (SEGMENT_TI_MASK << 8) | SEGMENT_RPL_MASK), %eax
-	cmpl	$((SEGMENT_LDT << 8) | USER_RPL), %eax
-	je .Lldt_ss				# returning to user-space with LDT SS
-#endif
+	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
 	RESTORE_REGS 4				# skip orig_eax/error_code
 .Lirq_return:
@@ -575,39 +609,6 @@ ENTRY(iret_exc	)
 	jmp	common_exception
 .previous
 	_ASM_EXTABLE(.Lirq_return, iret_exc)
-
-#ifdef CONFIG_X86_ESPFIX32
-.Lldt_ss:
-/*
- * Setup and switch to ESPFIX stack
- *
- * We're returning to userspace with a 16 bit stack. The CPU will not
- * restore the high word of ESP for us on executing iret... This is an
- * "official" bug of all the x86-compatible CPUs, which we can work
- * around to make dosemu and wine happy. We do this by preloading the
- * high word of ESP with the high word of the userspace ESP while
- * compensating for the offset by changing to the ESPFIX segment with
- * a base address that matches for the difference.
- */
-#define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
-	mov	%esp, %edx			/* load kernel esp */
-	mov	PT_OLDESP(%esp), %eax		/* load userspace esp */
-	mov	%dx, %ax			/* eax: new kernel esp */
-	sub	%eax, %edx			/* offset (low word is 0) */
-	shr	$16, %edx
-	mov	%dl, GDT_ESPFIX_SS + 4		/* bits 16..23 */
-	mov	%dh, GDT_ESPFIX_SS + 7		/* bits 24..31 */
-	pushl	$__ESPFIX_SS
-	pushl	%eax				/* new kernel esp */
-	/*
-	 * Disable interrupts, but do not irqtrace this section: we
-	 * will soon execute iret and the tracer was already set to
-	 * the irqstate after the IRET:
-	 */
-	DISABLE_INTERRUPTS(CLBR_ANY)
-	lss	(%esp), %esp			/* switch to espfix segment */
-	jmp	.Lrestore_nocheck
-#endif
 ENDPROC(entry_INT80_32)
 
 .macro FIXUP_ESPFIX_STACK
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 05/31] x86/entry/32: Unshare NMI return path
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

NMI will no longer use most of the shared return path,
because NMI needs special handling when the CR3 switches for
PTI are added. This patch prepares for that.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 0289bde..00ae759 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1007,7 +1007,7 @@ ENTRY(nmi)
 
 	/* Not on SYSENTER stack. */
 	call	do_nmi
-	jmp	.Lrestore_all_notrace
+	jmp	.Lnmi_return
 
 .Lnmi_from_sysenter_stack:
 	/*
@@ -1018,7 +1018,11 @@ ENTRY(nmi)
 	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	call	do_nmi
 	movl	%ebx, %esp
-	jmp	.Lrestore_all_notrace
+
+.Lnmi_return:
+	CHECK_AND_APPLY_ESPFIX
+	RESTORE_REGS 4
+	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
 .Lnmi_espfix_stack:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 06/31] x86/entry/32: Split off return-to-kernel path
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Use a separate return path when we know we are returning to
the kernel. This allows us to put the PTI cr3-switch and the
switch to the entry-stack into the return-to-user path
without further checking.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 00ae759..9bd7718 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -65,7 +65,7 @@
 # define preempt_stop(clobbers)	DISABLE_INTERRUPTS(clobbers); TRACE_IRQS_OFF
 #else
 # define preempt_stop(clobbers)
-# define resume_kernel		restore_all
+# define resume_kernel		restore_all_kernel
 #endif
 
 .macro TRACE_IRQS_IRET
@@ -400,9 +400,9 @@ ENTRY(resume_kernel)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 .Lneed_resched:
 	cmpl	$0, PER_CPU_VAR(__preempt_count)
-	jnz	restore_all
+	jnz	restore_all_kernel
 	testl	$X86_EFLAGS_IF, PT_EFLAGS(%esp)	# interrupts off (exception path) ?
-	jz	restore_all
+	jz	restore_all_kernel
 	call	preempt_schedule_irq
 	jmp	.Lneed_resched
 END(resume_kernel)
@@ -602,6 +602,11 @@ restore_all:
 .Lirq_return:
 	INTERRUPT_RETURN
 
+restore_all_kernel:
+	TRACE_IRQS_IRET
+	RESTORE_REGS 4
+	jmp	.Lirq_return
+
 .section .fixup, "ax"
 ENTRY(iret_exc	)
 	pushl	$0				# no error code
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 07/31] x86/entry/32: Restore segments before int registers
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Restoring the segments can cause exceptions that need to be
handled. With PTI enabled, we still need to be on kernel cr3
when the exception happens. For the cr3-switch we need
at least one integer scratch register, so we can't switch
with the user integer registers already loaded.

Avoid a push/pop cycle to free a register for the cr3 switch
by restoring the segments first. That way the integer
registers are not live yet and we can use them for the cr3
switch.

This also helps in the NMI path, where we need to leave with
the same cr3 as we entered. There we still have the
callee-saved registers live when switching cr3s.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 50 ++++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 29 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 9bd7718..b39c5e2 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -92,11 +92,6 @@
 .macro PUSH_GS
 	pushl	$0
 .endm
-.macro POP_GS pop=0
-	addl	$(4 + \pop), %esp
-.endm
-.macro POP_GS_EX
-.endm
 
  /* all the rest are no-op */
 .macro PTGS_TO_GS
@@ -116,20 +111,6 @@
 	pushl	%gs
 .endm
 
-.macro POP_GS pop=0
-98:	popl	%gs
-  .if \pop <> 0
-	add	$\pop, %esp
-  .endif
-.endm
-.macro POP_GS_EX
-.pushsection .fixup, "ax"
-99:	movl	$0, (%esp)
-	jmp	98b
-.popsection
-	_ASM_EXTABLE(98b, 99b)
-.endm
-
 .macro PTGS_TO_GS
 98:	mov	PT_GS(%esp), %gs
 .endm
@@ -201,24 +182,35 @@
 	popl	%eax
 .endm
 
-.macro RESTORE_REGS pop=0
-	RESTORE_INT_REGS
-1:	popl	%ds
-2:	popl	%es
-3:	popl	%fs
-	POP_GS \pop
+.macro RESTORE_SEGMENTS
+1:	mov	PT_DS(%esp), %ds
+2:	mov	PT_ES(%esp), %es
+3:	mov	PT_FS(%esp), %fs
+	PTGS_TO_GS
 .pushsection .fixup, "ax"
-4:	movl	$0, (%esp)
+4:	movl	$0, PT_DS(%esp)
 	jmp	1b
-5:	movl	$0, (%esp)
+5:	movl	$0, PT_ES(%esp)
 	jmp	2b
-6:	movl	$0, (%esp)
+6:	movl	$0, PT_FS(%esp)
 	jmp	3b
 .popsection
 	_ASM_EXTABLE(1b, 4b)
 	_ASM_EXTABLE(2b, 5b)
 	_ASM_EXTABLE(3b, 6b)
-	POP_GS_EX
+	PTGS_TO_GS_EX
+.endm
+
+.macro RESTORE_SKIP_SEGMENTS pop=0
+	/* Jump over the segments stored on stack */
+	addl	$((4 * 4) + \pop), %esp
+.endm
+
+.macro RESTORE_REGS pop=0
+	RESTORE_SEGMENTS
+	RESTORE_INT_REGS
+	/* Skip over already restored segment registers */
+	RESTORE_SKIP_SEGMENTS \pop
 .endm
 
 .macro CHECK_AND_APPLY_ESPFIX
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 08/31] x86/entry/32: Enter the kernel via trampoline stack
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Use the entry-stack as a trampoline to enter the kernel. The
entry-stack is already in the cpu_entry_area and will be
mapped to userspace when PTI is enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S        | 135 +++++++++++++++++++++++++++++++--------
 arch/x86/include/asm/switch_to.h |   6 +-
 arch/x86/kernel/asm-offsets.c    |   1 +
 arch/x86/kernel/cpu/common.c     |   5 +-
 arch/x86/kernel/process.c        |   2 -
 arch/x86/kernel/process_32.c     |  10 +--
 6 files changed, 120 insertions(+), 39 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b39c5e2..e714b53 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -135,25 +135,36 @@
 
 #endif /* CONFIG_X86_32_LAZY_GS */
 
-.macro SAVE_ALL pt_regs_ax=%eax
+.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
 	cld
+	/* Push segment registers and %eax */
 	PUSH_GS
 	pushl	%fs
 	pushl	%es
 	pushl	%ds
 	pushl	\pt_regs_ax
+
+	/* Load kernel segments */
+	movl	$(__USER_DS), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	$(__KERNEL_PERCPU), %eax
+	movl	%eax, %fs
+	SET_KERNEL_GS %eax
+
+	/* Push integer registers and complete PT_REGS */
 	pushl	%ebp
 	pushl	%edi
 	pushl	%esi
 	pushl	%edx
 	pushl	%ecx
 	pushl	%ebx
-	movl	$(__USER_DS), %edx
-	movl	%edx, %ds
-	movl	%edx, %es
-	movl	$(__KERNEL_PERCPU), %edx
-	movl	%edx, %fs
-	SET_KERNEL_GS %edx
+
+	/* Switch to kernel stack if necessary */
+.if \switch_stacks > 0
+	SWITCH_TO_KERNEL_STACK
+.endif
+
 .endm
 
 /*
@@ -261,6 +272,71 @@
 .Lend_\@:
 #endif /* CONFIG_X86_ESPFIX32 */
 .endm
+
+
+/*
+ * Called with pt_regs fully populated and kernel segments loaded,
+ * so we can access PER_CPU and use the integer registers.
+ *
+ * We need to be very careful here with the %esp switch, because an NMI
+ * can happen everywhere. If the NMI handler finds itself on the
+ * entry-stack, it will overwrite the task-stack and everything we
+ * copied there. So allocate the stack-frame on the task-stack and
+ * switch to it before we do any copying.
+ */
+.macro SWITCH_TO_KERNEL_STACK
+
+	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+	/* Are we on the entry stack? Bail out if not! */
+	movl	PER_CPU_VAR(cpu_entry_area), %edi
+	addl	$CPU_ENTRY_AREA_entry_stack, %edi
+	cmpl	%esp, %edi
+	jae	.Lend_\@
+
+	/* Load stack pointer into %esi and %edi */
+	movl	%esp, %esi
+	movl	%esi, %edi
+
+	/* Move %edi to the top of the entry stack */
+	andl	$(MASK_entry_stack), %edi
+	addl	$(SIZEOF_entry_stack), %edi
+
+	/* Load top of task-stack into %edi */
+	movl	TSS_entry_stack(%edi), %edi
+
+	/* Bytes to copy */
+	movl	$PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+	testl	$X86_EFLAGS_VM, PT_EFLAGS(%esi)
+	jz	.Lcopy_pt_regs_\@
+
+	/*
+	 * Stack-frame contains 4 additional segment registers when
+	 * coming from VM86 mode
+	 */
+	addl	$(4 * 4), %ecx
+
+.Lcopy_pt_regs_\@:
+#endif
+
+	/* Allocate frame on task-stack */
+	subl	%ecx, %edi
+
+	/* Switch to task-stack */
+	movl	%edi, %esp
+
+	/*
+	 * We are now on the task-stack and can safely copy over the
+	 * stack-frame
+	 */
+	cld
+	rep movsb
+
+.Lend_\@:
+.endm
+
 /*
  * %eax: prev task
  * %edx: next task
@@ -454,6 +530,7 @@ ENTRY(xen_sysenter_target)
  */
 ENTRY(entry_SYSENTER_32)
 	movl	TSS_entry_stack(%esp), %esp
+
 .Lsysenter_past_esp:
 	pushl	$__USER_DS		/* pt_regs->ss */
 	pushl	%ebp			/* pt_regs->sp (stashed in bp) */
@@ -462,7 +539,7 @@ ENTRY(entry_SYSENTER_32)
 	pushl	$__USER_CS		/* pt_regs->cs */
 	pushl	$0			/* pt_regs->ip = 0 (placeholder) */
 	pushl	%eax			/* pt_regs->orig_ax */
-	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
+	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest, stack already switched */
 
 	/*
 	 * SYSENTER doesn't filter flags, so we need to clear NT, AC
@@ -573,7 +650,8 @@ ENDPROC(entry_SYSENTER_32)
 ENTRY(entry_INT80_32)
 	ASM_CLAC
 	pushl	%eax			/* pt_regs->orig_ax */
-	SAVE_ALL pt_regs_ax=$-ENOSYS	/* save rest */
+
+	SAVE_ALL pt_regs_ax=$-ENOSYS switch_stacks=1	/* save rest */
 
 	/*
 	 * User mode is traced as though IRQs are on, and the interrupt gate
@@ -665,7 +743,8 @@ END(irq_entries_start)
 common_interrupt:
 	ASM_CLAC
 	addl	$-0x80, (%esp)			/* Adjust vector into the [-256, -1] range */
-	SAVE_ALL
+
+	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	movl	%esp, %eax
@@ -673,16 +752,16 @@ common_interrupt:
 	jmp	ret_from_intr
 ENDPROC(common_interrupt)
 
-#define BUILD_INTERRUPT3(name, nr, fn)	\
-ENTRY(name)				\
-	ASM_CLAC;			\
-	pushl	$~(nr);			\
-	SAVE_ALL;			\
-	ENCODE_FRAME_POINTER;		\
-	TRACE_IRQS_OFF			\
-	movl	%esp, %eax;		\
-	call	fn;			\
-	jmp	ret_from_intr;		\
+#define BUILD_INTERRUPT3(name, nr, fn)			\
+ENTRY(name)						\
+	ASM_CLAC;					\
+	pushl	$~(nr);					\
+	SAVE_ALL switch_stacks=1;			\
+	ENCODE_FRAME_POINTER;				\
+	TRACE_IRQS_OFF					\
+	movl	%esp, %eax;				\
+	call	fn;					\
+	jmp	ret_from_intr;				\
 ENDPROC(name)
 
 #define BUILD_INTERRUPT(name, nr)		\
@@ -908,16 +987,20 @@ common_exception:
 	pushl	%es
 	pushl	%ds
 	pushl	%eax
+	movl	$(__USER_DS), %eax
+	movl	%eax, %ds
+	movl	%eax, %es
+	movl	$(__KERNEL_PERCPU), %eax
+	movl	%eax, %fs
 	pushl	%ebp
 	pushl	%edi
 	pushl	%esi
 	pushl	%edx
 	pushl	%ecx
 	pushl	%ebx
+	SWITCH_TO_KERNEL_STACK
 	ENCODE_FRAME_POINTER
 	cld
-	movl	$(__KERNEL_PERCPU), %ecx
-	movl	%ecx, %fs
 	UNWIND_ESPFIX_STACK
 	GS_TO_REG %ecx
 	movl	PT_GS(%esp), %edi		# get the function address
@@ -925,9 +1008,6 @@ common_exception:
 	movl	$-1, PT_ORIG_EAX(%esp)		# no syscall to restart
 	REG_TO_PTGS %ecx
 	SET_KERNEL_GS %ecx
-	movl	$(__USER_DS), %ecx
-	movl	%ecx, %ds
-	movl	%ecx, %es
 	TRACE_IRQS_OFF
 	movl	%esp, %eax			# pt_regs pointer
 	CALL_NOSPEC %edi
@@ -946,6 +1026,7 @@ ENTRY(debug)
 	 */
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
+
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# error code 0
@@ -981,6 +1062,7 @@ END(debug)
  */
 ENTRY(nmi)
 	ASM_CLAC
+
 #ifdef CONFIG_X86_ESPFIX32
 	pushl	%eax
 	movl	%ss, %eax
@@ -1048,7 +1130,8 @@ END(nmi)
 ENTRY(int3)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
-	SAVE_ALL
+
+	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 	TRACE_IRQS_OFF
 	xorl	%edx, %edx			# zero error code
diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h
index eb5f799..20e5f7ab 100644
--- a/arch/x86/include/asm/switch_to.h
+++ b/arch/x86/include/asm/switch_to.h
@@ -89,13 +89,9 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread)
 /* This is used when switching tasks or entering/exiting vm86 mode. */
 static inline void update_sp0(struct task_struct *task)
 {
-	/* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */
-#ifdef CONFIG_X86_32
-	load_sp0(task->thread.sp0);
-#else
+	/* sp0 always points to the entry trampoline stack, which is constant: */
 	if (static_cpu_has(X86_FEATURE_XENPV))
 		load_sp0(task_top_of_stack(task));
-#endif
 }
 
 #endif /* _ASM_X86_SWITCH_TO_H */
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 232152c..86f06e8 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -103,6 +103,7 @@ void common(void) {
 	OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline);
 	OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page);
 	DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack));
+	DEFINE(MASK_entry_stack, (~(sizeof(struct entry_stack) - 1)));
 
 	/* Offset for sp0 and sp1 into the tss_struct */
 	OFFSET(TSS_sp0, tss_struct, x86_tss.sp0);
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d63f4b57..a0ed348 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1709,11 +1709,12 @@ void cpu_init(void)
 	enter_lazy_tlb(&init_mm, curr);
 
 	/*
-	 * Initialize the TSS.  Don't bother initializing sp0, as the initial
-	 * task never enters user mode.
+	 * Initialize the TSS.  sp0 points to the entry trampoline stack
+	 * regardless of what task is running.
 	 */
 	set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss);
 	load_TR_desc();
+	load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1));
 
 	load_mm_ldt(&init_mm);
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index cb368c2..7fab4fa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -57,14 +57,12 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = {
 		 */
 		.sp0 = (1UL << (BITS_PER_LONG-1)) + 1,
 
-#ifdef CONFIG_X86_64
 		/*
 		 * .sp1 is cpu_current_top_of_stack.  The init task never
 		 * runs user code, but cpu_current_top_of_stack should still
 		 * be well defined before the first context switch.
 		 */
 		.sp1 = TOP_OF_INIT_STACK,
-#endif
 
 #ifdef CONFIG_X86_32
 		.ss0 = __KERNEL_DS,
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 097d36a..3f3a8c6 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -289,10 +289,12 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 */
 	update_sp0(next_p);
 	refresh_sysenter_cs(next);
-	this_cpu_write(cpu_current_top_of_stack,
-		       (unsigned long)task_stack_page(next_p) +
-		       THREAD_SIZE);
-	/* SYSENTER reads the task-stack from tss.sp1 */
+	this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p));
+	/*
+	 * TODO: Find a way to let cpu_current_top_of_stack point to
+	 * cpu_tss_rw.x86_tss.sp1. Doing so now results in stack corruption with
+	 * iret exceptions.
+	 */
 	this_cpu_write(cpu_tss_rw.x86_tss.sp1, next_p->thread.sp0);
 
 	/*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Switch back to the trampoline stack before returning to
userspace.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 55 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)
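
For illustration only, not part of the patch: a minimal user-space C
sketch of the ordering that SWITCH_TO_ENTRY_STACK relies on. The frame is
copied to the entry stack first and the stack pointer is switched last.
The names model_switch_to_entry_stack, MODEL_PTREGS_SIZE and
MODEL_VM86_EXTRA are hypothetical stand-ins; the real sizes come from
asm-offsets, and the real code runs in assembly with NMIs possible at any
instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for this model; the kernel takes them from asm-offsets. */
#define MODEL_PTREGS_SIZE 68u		/* stand-in for PTREGS_SIZE */
#define MODEL_VM86_EXTRA  (4u * 4u)	/* four extra segment registers */

/*
 * Model of the exit-path stack switch.
 *
 * task_sp   plays the role of %esp, pointing at pt_regs on the task stack.
 * entry_top plays the role of tss.sp0, the top of the entry stack.
 *
 * Returns the stack pointer to use on the entry stack.
 */
static uint8_t *model_switch_to_entry_stack(const uint8_t *task_sp,
					    uint8_t *entry_top,
					    int returning_to_vm86)
{
	size_t bytes = MODEL_PTREGS_SIZE;	/* movl $PTREGS_SIZE, %ecx */
	uint8_t *new_sp;

	assert(task_sp != NULL && entry_top != NULL);

	if (returning_to_vm86)
		bytes += MODEL_VM86_EXTRA;	/* addl $(4 * 4), %ecx */

	new_sp = entry_top - bytes;		/* subl %ecx, %edi */

	/* Copy while still "running" on the task stack (rep movsb). */
	memcpy(new_sp, task_sp, bytes);

	/* Only now hand out the new stack pointer (movl %ebx, %esp). */
	return new_sp;
}
```

In this model the caller would assign the return value to its stack
pointer variable only after the call returns, which mirrors the macro
keeping the future %esp in %ebx until the copy is complete.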

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index e714b53..ba3d8c2 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -338,6 +338,59 @@
 .endm
 
 /*
+ * Switch back from the kernel stack to the entry stack.
+ *
+ * The %esp register must point to pt_regs on the task stack. The macro
+ * first calculates the size of the stack-frame to copy, depending on
+ * whether we return to VM86 mode or not. It then uses 'rep movsb' to
+ * copy the stack-frame over to the entry stack.
+ *
+ * We must be very careful here, as we can't trust the contents of the
+ * task-stack once we have switched to the entry-stack. When an NMI
+ * happens while on the entry-stack, the NMI handler will switch back to
+ * the top of the task stack, overwriting the stack-frame we are about
+ * to copy. Therefore we switch the stack only after everything is copied.
+ */
+.macro SWITCH_TO_ENTRY_STACK
+
+	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
+
+	/* Bytes to copy */
+	movl	$PTREGS_SIZE, %ecx
+
+#ifdef CONFIG_VM86
+	testl	$(X86_EFLAGS_VM), PT_EFLAGS(%esp)
+	jz	.Lcopy_pt_regs_\@
+
+	/* Additional 4 registers to copy when returning to VM86 mode */
+	addl	$(4 * 4), %ecx
+
+.Lcopy_pt_regs_\@:
+#endif
+
+	/* Initialize source and destination for movsb */
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+	subl	%ecx, %edi
+	movl	%esp, %esi
+
+	/* Save future stack pointer in %ebx */
+	movl	%edi, %ebx
+
+	/* Copy over the stack-frame */
+	cld
+	rep movsb
+
+	/*
+	 * Switch to entry-stack - needs to happen after everything is
+	 * copied because the NMI handler will overwrite the task-stack
+	 * when on entry-stack
+	 */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -578,6 +631,7 @@ ENTRY(entry_SYSENTER_32)
 
 /* Opportunistic SYSEXIT */
 	TRACE_IRQS_ON			/* User mode traces as IRQs on. */
+	SWITCH_TO_ENTRY_STACK		/* Switch to per-cpu entry stack */
 	movl	PT_EIP(%esp), %edx	/* pt_regs->ip */
 	movl	PT_OLDESP(%esp), %ecx	/* pt_regs->sp */
 1:	mov	PT_FS(%esp), %fs
@@ -665,6 +719,7 @@ ENTRY(entry_INT80_32)
 
 restore_all:
 	TRACE_IRQS_IRET
+	SWITCH_TO_ENTRY_STACK
 .Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 10/31] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

These macros will be used in the NMI handler code and
replace plain SAVE_ALL and RESTORE_REGS there. We will add
the NMI-specific CR3-switch to these macros later.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index ba3d8c2..be1d814 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -167,6 +167,9 @@
 
 .endm
 
+.macro SAVE_ALL_NMI
+	SAVE_ALL
+.endm
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
  * frame pointer is replaced with an encoded pointer to pt_regs.  The encoding
@@ -224,6 +227,18 @@
 	RESTORE_SKIP_SEGMENTS \pop
 .endm
 
+.macro RESTORE_ALL_NMI pop=0
+	/*
+	 * Restore segments - might cause exceptions when loading
+	 * user-space segments
+	 */
+	RESTORE_SEGMENTS
+
+	/* Restore integer registers and unwind stack to iret frame */
+	RESTORE_INT_REGS
+	RESTORE_SKIP_SEGMENTS \pop
+.endm
+
 .macro CHECK_AND_APPLY_ESPFIX
 #ifdef CONFIG_X86_ESPFIX32
 #define GDT_ESPFIX_SS PER_CPU_VAR(gdt_page) + (GDT_ENTRY_ESPFIX_SS * 8)
@@ -1127,7 +1142,7 @@ ENTRY(nmi)
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
-	SAVE_ALL
+	SAVE_ALL_NMI
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
@@ -1155,7 +1170,7 @@ ENTRY(nmi)
 
 .Lnmi_return:
 	CHECK_AND_APPLY_ESPFIX
-	RESTORE_REGS 4
+	RESTORE_ALL_NMI pop=4
 	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1171,12 +1186,12 @@ ENTRY(nmi)
 	pushl	16(%esp)
 	.endr
 	pushl	%eax
-	SAVE_ALL
+	SAVE_ALL_NMI
 	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
-	RESTORE_REGS
+	RESTORE_ALL_NMI
 	lss	12+4(%esp), %esp		# back to espfix stack
 	jmp	.Lirq_return
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 11/31] x86/entry/32: Add PTI cr3 switches to NMI handler code
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The NMI handler is special, as it needs to leave with the
same cr3 as it was entered with. We need to do this because
we could enter the NMI handler from kernel code with
user-cr3 already loaded.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 52 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 46 insertions(+), 6 deletions(-)
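
For illustration only, not part of the patch: a small user-space C model
of what SAVE_ALL_NMI and RESTORE_ALL_NMI do with cr3. It assumes, as the
macros imply, that bit PAGE_SHIFT of cr3 distinguishes the user page
table from the kernel one, and it uses PAGE_SHIFT = 12. The names
model_cr3, model_nmi_enter and model_nmi_exit are hypothetical, and the
ALTERNATIVE that skips everything when PTI is disabled is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12u
#define MODEL_PTI_SWITCH_MASK	(UINT32_C(1) << MODEL_PAGE_SHIFT)

/* Stand-in for the hardware %cr3 register. */
static uint32_t model_cr3;

/*
 * Model of the cr3 part of SAVE_ALL_NMI. Returns the value that the
 * macro leaves in \cr3_reg: the cr3 the NMI was entered with.
 */
static uint32_t model_nmi_enter(void)
{
	uint32_t reg = model_cr3;		/* movl %cr3, \cr3_reg */

	if (!(reg & MODEL_PTI_SWITCH_MASK))
		return reg;			/* already on kernel cr3 */

	reg &= ~MODEL_PTI_SWITCH_MASK;		/* andl $(~PTI_SWITCH_MASK) */
	model_cr3 = reg;			/* movl \cr3_reg, %cr3 */
	reg |= MODEL_PTI_SWITCH_MASK;		/* keep the user value for exit */
	return reg;
}

/* Model of the cr3 part of RESTORE_ALL_NMI. */
static void model_nmi_exit(uint32_t saved)
{
	/* The handler body is expected to have run on the kernel cr3. */
	assert((model_cr3 & MODEL_PTI_SWITCH_MASK) == 0);

	if (saved & MODEL_PTI_SWITCH_MASK)
		model_cr3 = saved;		/* movl \cr3_reg, %cr3 */
}
```

In both cases (entered on user cr3 or on kernel cr3) the model leaves
model_cr3 at exit equal to its value at entry, which is the property the
commit message asks for.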

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index be1d814..9693485 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -77,6 +77,8 @@
 #endif
 .endm
 
+#define PTI_SWITCH_MASK         (1 << PAGE_SHIFT)
+
 /*
  * User gs save/restore
  *
@@ -167,8 +169,30 @@
 
 .endm
 
-.macro SAVE_ALL_NMI
+.macro SAVE_ALL_NMI cr3_reg:req
 	SAVE_ALL
+
+	/*
+	 * Now switch the CR3 when PTI is enabled.
+	 *
+	 * We can enter with either user or kernel cr3. The code stores
+	 * the old cr3 in \cr3_reg and switches to the kernel cr3 if
+	 * necessary.
+	 */
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	movl	%cr3, \cr3_reg
+	testl	$PTI_SWITCH_MASK, \cr3_reg
+	jz	.Lend_\@	/* Already on kernel cr3 */
+
+	/* On user cr3 - write new kernel cr3 */
+	andl	$(~PTI_SWITCH_MASK), \cr3_reg
+	movl	\cr3_reg, %cr3
+
+	/* Restore user cr3 value in \cr3_reg for the exit path */
+	orl	$PTI_SWITCH_MASK, \cr3_reg
+
+.Lend_\@:
 .endm
 /*
  * This is a sneaky trick to help the unwinder find pt_regs on the stack.  The
@@ -227,13 +251,29 @@
 	RESTORE_SKIP_SEGMENTS \pop
 .endm
 
-.macro RESTORE_ALL_NMI pop=0
+.macro RESTORE_ALL_NMI cr3_reg:req pop=0
 	/*
 	 * Restore segments - might cause exceptions when loading
 	 * user-space segments
 	 */
 	RESTORE_SEGMENTS
 
+	/*
+	 * Now switch the CR3 when PTI is enabled.
+	 *
+	 * We enter with kernel cr3 and switch the cr3 to the value
+	 * stored in \cr3_reg, which is either a user or a kernel cr3.
+	 */
+	ALTERNATIVE "jmp .Lswitched_\@", "", X86_FEATURE_PTI
+
+	testl	$PTI_SWITCH_MASK, \cr3_reg
+	jz	.Lswitched_\@
+
+	/* User cr3 in \cr3_reg - write it to hardware cr3 */
+	movl	\cr3_reg, %cr3
+
+.Lswitched_\@:
+
 	/* Restore integer registers and unwind stack to iret frame */
 	RESTORE_INT_REGS
 	RESTORE_SKIP_SEGMENTS \pop
@@ -1142,7 +1182,7 @@ ENTRY(nmi)
 #endif
 
 	pushl	%eax				# pt_regs->orig_ax
-	SAVE_ALL_NMI
+	SAVE_ALL_NMI cr3_reg=%edi
 	ENCODE_FRAME_POINTER
 	xorl	%edx, %edx			# zero error code
 	movl	%esp, %eax			# pt_regs pointer
@@ -1170,7 +1210,7 @@ ENTRY(nmi)
 
 .Lnmi_return:
 	CHECK_AND_APPLY_ESPFIX
-	RESTORE_ALL_NMI pop=4
+	RESTORE_ALL_NMI cr3_reg=%edi pop=4
 	jmp	.Lirq_return
 
 #ifdef CONFIG_X86_ESPFIX32
@@ -1186,12 +1226,12 @@ ENTRY(nmi)
 	pushl	16(%esp)
 	.endr
 	pushl	%eax
-	SAVE_ALL_NMI
+	SAVE_ALL_NMI cr3_reg=%edi
 	ENCODE_FRAME_POINTER
 	FIXUP_ESPFIX_STACK			# %eax == %esp
 	xorl	%edx, %edx			# zero error code
 	call	do_nmi
-	RESTORE_ALL_NMI
+	RESTORE_ALL_NMI cr3_reg=%edi
 	lss	12+4(%esp), %esp		# back to espfix stack
 	jmp	.Lirq_return
 #endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Add unconditional cr3 switches between user and kernel cr3
to all non-NMI entry and exit points.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 59 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 58 insertions(+), 1 deletion(-)
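
For illustration only, not part of the patch: a user-space C sketch of
the bit arithmetic behind SWITCH_TO_USER_CR3 and SWITCH_TO_KERNEL_CR3.
It assumes PAGE_SHIFT = 12 and treats cr3 as a plain 32-bit value; the
function names are hypothetical. The point it tries to show is that both
operations are idempotent, so an entry point such as the debug handler
can apply the kernel switch without knowing which cr3 is currently
loaded.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT	12u
#define MODEL_PTI_SWITCH_MASK	(UINT32_C(1) << MODEL_PAGE_SHIFT)

/* orl $PTI_SWITCH_MASK, \scratch_reg */
static uint32_t model_to_user_cr3(uint32_t cr3)
{
	return cr3 | MODEL_PTI_SWITCH_MASK;
}

/* andl $(~PTI_SWITCH_MASK), \scratch_reg */
static uint32_t model_to_kernel_cr3(uint32_t cr3)
{
	return cr3 & ~MODEL_PTI_SWITCH_MASK;
}

/* One round trip: enter from user mode, run the handler, return. */
static uint32_t model_syscall_roundtrip(uint32_t user_cr3)
{
	uint32_t cr3 = model_to_kernel_cr3(user_cr3);	/* entry path */

	/* The handler body is expected to see the kernel cr3. */
	assert((cr3 & MODEL_PTI_SWITCH_MASK) == 0);

	return model_to_user_cr3(cr3);			/* exit path */
}
```

The model leaves out the cost of the cr3 write itself: the macros write
%cr3 even when the value does not change.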

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 9693485..b5ef003 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -328,6 +328,25 @@
 #endif /* CONFIG_X86_ESPFIX32 */
 .endm
 
+/* Unconditionally switch to user cr3 */
+.macro SWITCH_TO_USER_CR3 scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+
+	movl	%cr3, \scratch_reg
+	orl	$PTI_SWITCH_MASK, \scratch_reg
+	movl	\scratch_reg, %cr3
+.Lend_\@:
+.endm
+
+/* Unconditionally switch to kernel cr3 */
+.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
+	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
+	movl	%cr3, \scratch_reg
+	andl	$(~PTI_SWITCH_MASK), \scratch_reg
+	movl	\scratch_reg, %cr3
+.Lend_\@:
+.endm
+
 
 /*
  * Called with pt_regs fully populated and kernel segments loaded,
@@ -343,6 +362,8 @@
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
 
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+
 	/* Are we on the entry stack? Bail out if not! */
 	movl	PER_CPU_VAR(cpu_entry_area), %edi
 	addl	$CPU_ENTRY_AREA_entry_stack, %edi
@@ -637,6 +658,18 @@ ENTRY(xen_sysenter_target)
  * 0(%ebp) arg6
  */
 ENTRY(entry_SYSENTER_32)
+	/*
+	 * On entry-stack with all userspace-regs live - save and
+	 * restore eflags and %eax to use it as scratch-reg for the cr3
+	 * switch.
+	 */
+	pushfl
+	pushl	%eax
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+	popl	%eax
+	popfl
+
+	/* Stack empty again, switch to task stack */
 	movl	TSS_entry_stack(%esp), %esp
 
 .Lsysenter_past_esp:
@@ -691,6 +724,10 @@ ENTRY(entry_SYSENTER_32)
 	movl	PT_OLDESP(%esp), %ecx	/* pt_regs->sp */
 1:	mov	PT_FS(%esp), %fs
 	PTGS_TO_GS
+
+	/* Segments are restored - switch to user cr3 */
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
 	popl	%ebx			/* pt_regs->bx */
 	addl	$2*4, %esp		/* skip pt_regs->cx and pt_regs->dx */
 	popl	%esi			/* pt_regs->si */
@@ -778,7 +815,23 @@ restore_all:
 .Lrestore_all_notrace:
 	CHECK_AND_APPLY_ESPFIX
 .Lrestore_nocheck:
-	RESTORE_REGS 4				# skip orig_eax/error_code
+	/*
+	 * First restore user segments. This can cause exceptions, so we
+	 * run it with kernel cr3.
+	 */
+	RESTORE_SEGMENTS
+
+	/*
+	 * Segments are restored - no more exceptions from here on except on
+	 * iret, but that is handled safely.
+	 */
+	SWITCH_TO_USER_CR3 scratch_reg=%eax
+
+	/* Restore rest */
+	RESTORE_INT_REGS
+
+	/* Unwind stack to the iret frame */
+	RESTORE_SKIP_SEGMENTS 4			# skip orig_eax/error_code
 .Lirq_return:
 	INTERRUPT_RETURN
 
@@ -1139,6 +1192,10 @@ ENTRY(debug)
 
 	SAVE_ALL
 	ENCODE_FRAME_POINTER
+
+	/* Make sure we are running on kernel cr3 */
+	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
+
 	xorl	%edx, %edx			# error code 0
 	movl	%esp, %eax			# pt_regs pointer
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 13/31] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

It can happen that we enter the kernel from kernel-mode and
on the entry-stack. The most common way this happens is when
we get an exception while loading the user-space segment
registers on the kernel-to-userspace exit path.

The segment loading needs to be done after the entry-stack
switch, because the stack-switch needs kernel %fs for
per_cpu access.

When this happens, we need to make sure that we leave the
kernel with the entry-stack again, so that the interrupted
code-path runs on the right stack when switching to the
user-cr3.

We do this by detecting this condition on kernel-entry by
checking CS.RPL and %esp, and if it happens, we copy over
the complete content of the entry stack to the task-stack.
This needs to be done because once we enter the exception
handlers we might be scheduled out or even migrated to a
different CPU, so that we can't rely on the entry-stack
contents. We also leave a marker in the stack-frame to
detect this condition on the exit path.

On the exit path the copy is reversed: we copy all of the
remaining task-stack contents back to the entry-stack and
switch to it.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/entry/entry_32.S | 109 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 108 insertions(+), 1 deletion(-)
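
The detection and byte-count logic described above can be sketched in
C as follows. This is only an illustration under stated assumptions:
the entry-stack size (4096 here) and the selector values used in the
demo are placeholders, the real MASK_entry_stack and
SIZEOF_entry_stack come from asm-offsets, and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the entry stack is a power-of-two size and aligned to it. */
#define SKETCH_SIZEOF_ENTRY_STACK	UINT32_C(4096)
#define SKETCH_MASK_ENTRY_STACK		(~(SKETCH_SIZEOF_ENTRY_STACK - 1))
#define SKETCH_SEGMENT_RPL_MASK		UINT32_C(0x3)
#define SKETCH_CS_FROM_ENTRY_STACK	(UINT32_C(1) << 31)

/* RPL == 0 in the saved CS means the entry came from kernel mode. */
static int sketch_entry_from_kernel(uint32_t saved_cs)
{
	return (saved_cs & SKETCH_SEGMENT_RPL_MASK) == 0;
}

/* Bytes in use between the stack pointer and the top of the entry stack. */
static uint32_t sketch_entry_stack_bytes(uint32_t sp)
{
	uint32_t top = (sp & SKETCH_MASK_ENTRY_STACK) + SKETCH_SIZEOF_ENTRY_STACK;

	return top - sp;
}

/* Set, test and clear the marker in the upper half of the CS dword. */
static uint32_t sketch_mark_cs(uint32_t cs)   { return cs | SKETCH_CS_FROM_ENTRY_STACK; }
static int sketch_cs_marked(uint32_t cs)      { return (cs & SKETCH_CS_FROM_ENTRY_STACK) != 0; }
static uint32_t sketch_unmark_cs(uint32_t cs) { return cs & ~SKETCH_CS_FROM_ENTRY_STACK; }

/* Usage: a kernel-mode frame 64 bytes below the top of the entry stack. */
static void sketch_entry_stack_demo(void)
{
	uint32_t sp = UINT32_C(0xff001fc0);	/* hypothetical %esp */
	uint32_t cs = UINT32_C(0x60);		/* illustrative kernel selector, RPL 0 */

	assert(sketch_entry_from_kernel(cs));
	assert(sketch_entry_stack_bytes(sp) == 64);

	cs = sketch_mark_cs(cs);		/* entry path */
	assert(sketch_cs_marked(cs));
	cs = sketch_unmark_cs(cs);		/* exit path */
	assert(cs == UINT32_C(0x60));
}
```

The marker lives in bit 31 because the selector itself only occupies
the low 16 bits of the saved CS dword, so the upper bits are free to
carry state between the entry and exit paths.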

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index b5ef003..d94dab6 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -358,6 +358,9 @@
  * copied there. So allocate the stack-frame on the task-stack and
  * switch to it before we do any copying.
  */
+
+#define CS_FROM_ENTRY_STACK	(1 << 31)
+
 .macro SWITCH_TO_KERNEL_STACK
 
 	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
@@ -381,6 +384,10 @@
 	/* Load top of task-stack into %edi */
 	movl	TSS_entry_stack(%edi), %edi
 
+	/* Special case - entry from kernel mode via entry stack */
+	testl	$SEGMENT_RPL_MASK, PT_CS(%esp)
+	jz	.Lentry_from_kernel_\@
+
 	/* Bytes to copy */
 	movl	$PTREGS_SIZE, %ecx
 
@@ -394,8 +401,8 @@
 	 */
 	addl	$(4 * 4), %ecx
 
-.Lcopy_pt_regs_\@:
 #endif
+.Lcopy_pt_regs_\@:
 
 	/* Allocate frame on task-stack */
 	subl	%ecx, %edi
@@ -410,6 +417,56 @@
 	cld
 	rep movsb
 
+	jmp .Lend_\@
+
+.Lentry_from_kernel_\@:
+
+	/*
+	 * This handles the case when we enter the kernel from
+	 * kernel-mode and %esp points to the entry-stack. When this
+	 * happens we need to switch to the task-stack to run C code,
+	 * but switch back to the entry-stack again when we approach
+	 * iret and return to the interrupted code-path. This usually
+	 * happens when we hit an exception while restoring user-space
+	 * segment registers on the way back to user-space.
+	 *
+	 * When we switch to the task-stack here, we can't trust the
+	 * contents of the entry-stack anymore, as the exception handler
+	 * might be scheduled out or moved to another CPU. Therefore we
+	 * copy the complete entry-stack to the task-stack and set a
+	 * marker in the iret-frame (bit 31 of the CS dword) to detect
+	 * what we've done on the iret path.
+	 *
+	 * On the iret path we copy everything back and switch to the
+	 * entry-stack, so that the interrupted kernel code-path
+	 * continues on the same stack it was interrupted with.
+	 *
+	 * Be aware that an NMI can happen anytime in this code.
+	 *
+	 * %esi: Entry-Stack pointer (same as %esp)
+	 * %edi: Top of the task stack
+	 */
+
+	/* Calculate number of bytes on the entry stack in %ecx */
+	movl	%esi, %ecx
+
+	/* %ecx to the top of entry-stack */
+	andl	$(MASK_entry_stack), %ecx
+	addl	$(SIZEOF_entry_stack), %ecx
+
+	/* Number of bytes on the entry stack to %ecx */
+	sub	%esi, %ecx
+
+	/* Mark stackframe as coming from entry stack */
+	orl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+
+	/*
+	 * %esi and %edi are unchanged, %ecx contains the number of
+	 * bytes to copy. The code at .Lcopy_pt_regs_\@ will allocate
+	 * the stack-frame on task-stack and copy everything over
+	 */
+	jmp .Lcopy_pt_regs_\@
+
 .Lend_\@:
 .endm
 
@@ -467,6 +524,55 @@
 .endm
 
 /*
+ * This macro handles the case when we return to kernel-mode on the iret
+ * path and have to switch back to the entry stack.
+ *
+ * See the comments below the .Lentry_from_kernel_\@ label in the
+ * SWITCH_TO_KERNEL_STACK macro for more details.
+ */
+.macro PARANOID_EXIT_TO_KERNEL_MODE
+
+	/*
+	 * Test if we entered the kernel with the entry-stack. Most
+	 * likely we did not, because this code only runs on the
+	 * return-to-kernel path.
+	 */
+	testl	$CS_FROM_ENTRY_STACK, PT_CS(%esp)
+	jz	.Lend_\@
+
+	/* Unlikely slow-path */
+
+	/* Clear marker from stack-frame */
+	andl	$(~CS_FROM_ENTRY_STACK), PT_CS(%esp)
+
+	/* Copy the remaining task-stack contents to entry-stack */
+	movl	%esp, %esi
+	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %edi
+
+	/* Bytes on the task-stack to ecx */
+	movl	PER_CPU_VAR(cpu_current_top_of_stack), %ecx
+	subl	%esi, %ecx
+
+	/* Allocate stack-frame on entry-stack */
+	subl	%ecx, %edi
+
+	/*
+	 * Save future stack-pointer, we must not switch until the
+	 * copy is done, otherwise the NMI handler could destroy the
+	 * contents of the task-stack we are about to copy.
+	 */
+	movl	%edi, %ebx
+
+	/* Do the copy */
+	cld
+	rep movsb
+
+	/* Safe to switch to entry-stack now */
+	movl	%ebx, %esp
+
+.Lend_\@:
+.endm
+/*
  * %eax: prev task
  * %edx: next task
  */
@@ -837,6 +943,7 @@ restore_all:
 
 restore_all_kernel:
 	TRACE_IRQS_IRET
+	PARANOID_EXIT_TO_KERNEL_MODE
 	RESTORE_REGS 4
 	jmp	.Lirq_return
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 14/31] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

With PTI we need to map the per-process LDT into the kernel
address-space for each process, so we need separate kernel
PMDs per PGD.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable-3level_types.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
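
For illustration, the resulting condition can be written as a plain C
predicate. This is a sketch only: the two boolean parameters stand in
for static_cpu_has(X86_FEATURE_PTI) and pv_info.shared_kernel_pmd,
and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Kernel PMDs stay shared only when PTI is off and, with paravirt,
 * the hypervisor interface allows sharing. Without CONFIG_PARAVIRT
 * the second argument is effectively always true.
 */
static bool sketch_shared_kernel_pmd(bool pti_enabled, bool pv_shared_kernel_pmd)
{
	return !pti_enabled && pv_shared_kernel_pmd;
}

/* Usage: enabling PTI always forces per-PGD kernel PMDs. */
static void sketch_shared_pmd_demo(void)
{
	assert(sketch_shared_kernel_pmd(false, true));
	assert(!sketch_shared_kernel_pmd(false, false));
	assert(!sketch_shared_kernel_pmd(true, true));
	assert(!sketch_shared_kernel_pmd(true, false));
}
```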

diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index 876b4c7..ed8a200 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -21,9 +21,10 @@ typedef union {
 #endif	/* !__ASSEMBLY__ */
 
 #ifdef CONFIG_PARAVIRT
-#define SHARED_KERNEL_PMD	(pv_info.shared_kernel_pmd)
+#define SHARED_KERNEL_PMD	((!static_cpu_has(X86_FEATURE_PTI) &&	\
+				 (pv_info.shared_kernel_pmd)))
 #else
-#define SHARED_KERNEL_PMD	1
+#define SHARED_KERNEL_PMD	(!static_cpu_has(X86_FEATURE_PTI))
 #endif
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 15/31] x86/pgtable/32: Allocate 8k page-tables when PTI is enabled
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Allocate a kernel and a user page-table root when PTI is
enabled. Also allocate a full page per root for PAE because
otherwise the bit to flip in cr3 to switch between them
would be non-constant, which creates a lot of hassle.
Using less than a full page per root is left as a later
optimization.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/kernel/head_32.S | 20 +++++++++++++++-----
 arch/x86/mm/pgtable.c     |  5 +++--
 2 files changed, 18 insertions(+), 7 deletions(-)
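
A user-space sketch of why the 8k allocation makes the cr3 switch bit
constant. It assumes 4k pages and an order-1 allocation (what
PGD_ALLOCATION_ORDER is expected to be with PTI; its definition is
not part of this patch) and uses C11 aligned_alloc() as a stand-in
for __get_free_pages(). All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE	((uintptr_t)4096)
#define SKETCH_PGD_ORDER	1	/* assumption: order-1, i.e. two pages */

/* Allocate two pages aligned to their total size, like an order-1 block. */
static void *sketch_pgd_alloc(void)
{
	size_t size = (size_t)SKETCH_PAGE_SIZE << SKETCH_PGD_ORDER;

	return aligned_alloc(size, size);
}

/*
 * With 8k alignment the kernel root has bit 12 clear, so the user
 * root one page above it differs in exactly that bit.
 */
static int sketch_switch_bit_is_constant(const void *pgd)
{
	uintptr_t kernel_root = (uintptr_t)pgd;
	uintptr_t user_root = kernel_root + SKETCH_PAGE_SIZE;

	return (kernel_root & SKETCH_PAGE_SIZE) == 0 &&
	       user_root == (kernel_root | SKETCH_PAGE_SIZE);
}

/* Usage: an 8k-aligned block qualifies, a merely 4k-aligned address may not. */
static void sketch_pgd_demo(void)
{
	void *pgd = sketch_pgd_alloc();

	assert(pgd != NULL);
	assert(sketch_switch_bit_is_constant(pgd));
	assert(!sketch_switch_bit_is_constant((void *)(uintptr_t)0x5000));
	free(pgd);
}
```

The same reasoning is behind the PGD_ALIGN changes in head_32.S: the
statically allocated roots need the alignment the page allocator
gives order-1 blocks for free.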

diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index c290209..1f35d60 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -512,11 +512,18 @@ ENTRY(initial_code)
 ENTRY(setup_once_ref)
 	.long setup_once
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+#define	PGD_ALIGN	(2 * PAGE_SIZE)
+#define PTI_USER_PGD_FILL	1024
+#else
+#define	PGD_ALIGN	(PAGE_SIZE)
+#define PTI_USER_PGD_FILL	0
+#endif
 /*
  * BSS section
  */
 __PAGE_ALIGNED_BSS
-	.align PAGE_SIZE
+	.align PGD_ALIGN
 #ifdef CONFIG_X86_PAE
 .globl initial_pg_pmd
 initial_pg_pmd:
@@ -526,14 +533,17 @@ initial_pg_pmd:
 initial_page_table:
 	.fill 1024,4,0
 #endif
+	.align PGD_ALIGN
 initial_pg_fixmap:
 	.fill 1024,4,0
-.globl empty_zero_page
-empty_zero_page:
-	.fill 4096,1,0
 .globl swapper_pg_dir
+	.align PGD_ALIGN
 swapper_pg_dir:
 	.fill 1024,4,0
+	.fill PTI_USER_PGD_FILL,4,0
+.globl empty_zero_page
+empty_zero_page:
+	.fill 4096,1,0
 EXPORT_SYMBOL(empty_zero_page)
 
 /*
@@ -542,7 +552,7 @@ EXPORT_SYMBOL(empty_zero_page)
 #ifdef CONFIG_X86_PAE
 __PAGE_ALIGNED_DATA
 	/* Page-aligned for the benefit of paravirt? */
-	.align PAGE_SIZE
+	.align PGD_ALIGN
 ENTRY(initial_page_table)
 	.long	pa(initial_pg_pmd+PGD_IDENT_ATTR),0	/* low identity map */
 # if KPMDS == 3
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 004abf9..a81d42e 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -338,7 +338,8 @@ static inline pgd_t *_pgd_alloc(void)
 	 * We allocate one page for pgd.
 	 */
 	if (!SHARED_KERNEL_PMD)
-		return (pgd_t *)__get_free_page(PGALLOC_GFP);
+		return (pgd_t *)__get_free_pages(PGALLOC_GFP,
+						 PGD_ALLOCATION_ORDER);
 
 	/*
 	 * Now PAE kernel is not running as a Xen domain. We can allocate
@@ -350,7 +351,7 @@ static inline pgd_t *_pgd_alloc(void)
 static inline void _pgd_free(pgd_t *pgd)
 {
 	if (!SHARED_KERNEL_PMD)
-		free_page((unsigned long)pgd);
+		free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER);
 	else
 		kmem_cache_free(pgd_cache, pgd);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 16/31] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Make them available on 32 bit and keep clone_pgd_range() happy.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable.h    | 49 +++++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/pgtable_64.h | 49 ---------------------------------------
 2 files changed, 49 insertions(+), 49 deletions(-)
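
A small user-space illustration of how the moved helpers behave when
given a pointer to an entry inside the kernel root rather than the
root itself. It is a sketch under assumptions: 4-byte entries with
1024 entries per root (the non-PAE 32-bit case), a static 8k-aligned
array in place of a real page-table, and hypothetical sketch_* names
mirroring the kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SWITCH_BIT	12	/* PAGE_SHIFT for 4k pages */

typedef uint32_t sketch_pgd_t;		/* assumption: non-PAE 32-bit entry */

static void *sketch_ptr_set_bit(void *ptr, int bit)
{
	uintptr_t p = (uintptr_t)ptr;

	p |= (uintptr_t)1 << bit;
	return (void *)p;
}

static void *sketch_ptr_clear_bit(void *ptr, int bit)
{
	uintptr_t p = (uintptr_t)ptr;

	p &= ~((uintptr_t)1 << bit);
	return (void *)p;
}

static sketch_pgd_t *sketch_kernel_to_user_pgdp(sketch_pgd_t *pgdp)
{
	return sketch_ptr_set_bit(pgdp, SKETCH_SWITCH_BIT);
}

static sketch_pgd_t *sketch_user_to_kernel_pgdp(sketch_pgd_t *pgdp)
{
	return sketch_ptr_clear_bit(pgdp, SKETCH_SWITCH_BIT);
}

/* Usage: entry 5 of the kernel root maps to entry 5 of the user root. */
static void sketch_pgdp_demo(void)
{
	/* Kernel root in the first 4k, user root in the second 4k. */
	static _Alignas(8192) sketch_pgd_t roots[2 * 1024];
	sketch_pgd_t *kernel_entry = &roots[5];
	sketch_pgd_t *user_entry = sketch_kernel_to_user_pgdp(kernel_entry);

	assert(user_entry == &roots[1024 + 5]);
	assert(sketch_user_to_kernel_pgdp(user_entry) == kernel_entry);
	assert(sketch_user_to_kernel_pgdp(kernel_entry) == kernel_entry);
}
```

Since only one address bit changes, the offset within the page is
preserved, which is what lets a caller such as clone_pgd_range()
translate destination and source pointers independently.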

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index e42b894..0a9f746 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1109,6 +1109,55 @@ static inline int pud_write(pud_t pud)
 	return pud_flags(pud) & _PAGE_RW;
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+/*
+ * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
+ * (8k-aligned and 8k in size).  The kernel one is at the beginning 4k and
+ * the user one is in the last 4k.  To switch between them, you
+ * just need to flip the 12th bit in their addresses.
+ */
+#define PTI_PGTABLE_SWITCH_BIT	PAGE_SHIFT
+
+/*
+ * This generates better code than the inline assembly in
+ * __set_bit().
+ */
+static inline void *ptr_set_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr |= BIT(bit);
+	return (void *)__ptr;
+}
+static inline void *ptr_clear_bit(void *ptr, int bit)
+{
+	unsigned long __ptr = (unsigned long)ptr;
+
+	__ptr &= ~BIT(bit);
+	return (void *)__ptr;
+}
+
+static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
+{
+	return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
+{
+	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
+{
+	return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+
+static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
+{
+	return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
 /*
  * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
  *
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 81462e9..58d7f10 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -131,55 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 #endif
 }
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-/*
- * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
- * (8k-aligned and 8k in size).  The kernel one is at the beginning 4k and
- * the user one is in the last 4k.  To switch between them, you
- * just need to flip the 12th bit in their addresses.
- */
-#define PTI_PGTABLE_SWITCH_BIT	PAGE_SHIFT
-
-/*
- * This generates better code than the inline assembly in
- * __set_bit().
- */
-static inline void *ptr_set_bit(void *ptr, int bit)
-{
-	unsigned long __ptr = (unsigned long)ptr;
-
-	__ptr |= BIT(bit);
-	return (void *)__ptr;
-}
-static inline void *ptr_clear_bit(void *ptr, int bit)
-{
-	unsigned long __ptr = (unsigned long)ptr;
-
-	__ptr &= ~BIT(bit);
-	return (void *)__ptr;
-}
-
-static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp)
-{
-	return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp)
-{
-	return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp)
-{
-	return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-
-static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp)
-{
-	return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT);
-}
-#endif /* CONFIG_PAGE_TABLE_ISOLATION */
-
 /*
  * Page table pages are page-aligned.  The lower half of the top
  * level is used for userspace and the top half for the kernel.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread
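
Note on the layout described in the comment above: the kernel and user
top-level tables sit in one 8k-aligned, 8k-sized block, so converting a
pointer between the two copies only sets or clears bit 12. The following is
a minimal user-space sketch of that arithmetic, not the kernel code. All
sk_* names are hypothetical, 4k pages are assumed, and a static buffer
stands in for the order-1 allocation.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT  12                     /* assumption: 4k pages */
#define SK_PAGE_SIZE   (1UL << SK_PAGE_SHIFT)
#define SK_SWITCH_BIT  SK_PAGE_SHIFT          /* plays the role of PTI_PGTABLE_SWITCH_BIT */

/* Same idea as ptr_set_bit(): move from the kernel half to the user half. */
static inline void *sk_kernel_to_user(void *p)
{
	return (void *)((uintptr_t)p | ((uintptr_t)1 << SK_SWITCH_BIT));
}

/* Same idea as ptr_clear_bit(): move from the user half to the kernel half. */
static inline void *sk_user_to_kernel(void *p)
{
	return (void *)((uintptr_t)p & ~((uintptr_t)1 << SK_SWITCH_BIT));
}

/* 8k-aligned, 8k-sized buffer standing in for the order-1 page pair. */
static _Alignas(2 * SK_PAGE_SIZE) unsigned char sk_pgd_pair[2 * SK_PAGE_SIZE];

/* Checks that an entry in the kernel half maps to the same offset in the user half. */
int sk_pgd_pair_demo(void)
{
	unsigned char *kernel_pgd = sk_pgd_pair;
	unsigned char *user_pgd = sk_kernel_to_user(kernel_pgd);
	unsigned char *entry = kernel_pgd + 0x18;

	assert(user_pgd == kernel_pgd + SK_PAGE_SIZE);
	assert(sk_user_to_kernel(user_pgd) == (void *)kernel_pgd);
	assert(sk_kernel_to_user(entry) == (void *)(user_pgd + 0x18));
	return 0;
}
```

The conversion only works because of the 8k alignment: with a block that is
merely 4k-aligned, bit 12 of the kernel table's address could already be set
and the two halves could not be told apart.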

* [PATCH 17/31] x86/pgtable: Move pti_set_user_pgd() to pgtable.h
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

In pgtable.h the function is also usable from 32-bit code.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable.h    | 23 +++++++++++++++++++++++
 arch/x86/include/asm/pgtable_64.h | 21 ---------------------
 2 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 0a9f746..c9d3cf9 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -618,8 +618,31 @@ static inline int is_new_memtype_allowed(u64 paddr, unsigned long size,
 
 pmd_t *populate_extra_pmd(unsigned long vaddr);
 pte_t *populate_extra_pte(unsigned long vaddr);
+
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
+
+/*
+ * Take a PGD location (pgdp) and a pgd value that needs to be set there.
+ * Populates the user and returns the resulting PGD that must be set in
+ * the kernel copy of the page tables.
+ */
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return pgd;
+	return __pti_set_user_pgd(pgdp, pgd);
+}
+#else   /* CONFIG_PAGE_TABLE_ISOLATION */
+static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
+{
+	return pgd;
+}
+#endif  /* CONFIG_PAGE_TABLE_ISOLATION */
+
 #endif	/* __ASSEMBLY__ */
 
+
 #ifdef CONFIG_X86_32
 # include <asm/pgtable_32.h>
 #else
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 58d7f10..396664d 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -145,27 +145,6 @@ static inline bool pgdp_maps_userspace(void *__ptr)
 	return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
 }
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd);
-
-/*
- * Take a PGD location (pgdp) and a pgd value that needs to be set there.
- * Populates the user and returns the resulting PGD that must be set in
- * the kernel copy of the page tables.
- */
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
-{
-	if (!static_cpu_has(X86_FEATURE_PTI))
-		return pgd;
-	return __pti_set_user_pgd(pgdp, pgd);
-}
-#else
-static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd)
-{
-	return pgd;
-}
-#endif
-
 static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
 #if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread
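
Note on the wrapper moved by this patch: pti_set_user_pgd() is an inline
fast path that returns the value unchanged when PTI is off and otherwise
defers to the out-of-line __pti_set_user_pgd(). A rough user-space sketch of
that gating pattern follows. The sk_* names are hypothetical, a plain bool
stands in for static_cpu_has(X86_FEATURE_PTI), and the slow path here is
only a placeholder that counts calls.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t pgd; } sk_pgd_t;

/* Stand-in for static_cpu_has(X86_FEATURE_PTI); assumption for this sketch. */
static bool sk_pti_enabled;

/* Counts how often the out-of-line path ran, for illustration only. */
static unsigned int sk_slow_calls;

/* Placeholder for the out-of-line helper; the real one updates the user copy. */
static sk_pgd_t sk_slow_set_user_pgd(sk_pgd_t *pgdp, sk_pgd_t pgd)
{
	(void)pgdp;
	sk_slow_calls++;
	return pgd;
}

/* Inline wrapper: cheap early return when the feature is disabled. */
static inline sk_pgd_t sk_set_user_pgd(sk_pgd_t *pgdp, sk_pgd_t pgd)
{
	if (!sk_pti_enabled)
		return pgd;
	return sk_slow_set_user_pgd(pgdp, pgd);
}
```

A caller would store the returned value in the kernel copy, for example
`*slot = sk_set_user_pgd(slot, val);`, matching the contract in the comment
("returns the resulting PGD that must be set in the kernel copy").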

* [PATCH 18/31] x86/pgtable: Move two more functions from pgtable_64.h to pgtable.h
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

These two functions are required for PTI on 32 bit:

	* pgdp_maps_userspace()
	* pgd_large()

Also re-implement pgdp_maps_userspace() so that it works on
both 64-bit and 32-bit kernels.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable-2level_types.h |  3 +++
 arch/x86/include/asm/pgtable-3level_types.h |  1 +
 arch/x86/include/asm/pgtable.h              | 16 ++++++++++++++++
 arch/x86/include/asm/pgtable_64.h           | 15 ---------------
 arch/x86/include/asm/pgtable_64_types.h     |  2 ++
 5 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level_types.h b/arch/x86/include/asm/pgtable-2level_types.h
index f982ef8..6deb6cd 100644
--- a/arch/x86/include/asm/pgtable-2level_types.h
+++ b/arch/x86/include/asm/pgtable-2level_types.h
@@ -35,4 +35,7 @@ typedef union {
 
 #define PTRS_PER_PTE	1024
 
+/* This covers all VMSPLIT_* and VMSPLIT_*_OPT variants */
+#define PGD_KERNEL_START	(CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
+
 #endif /* _ASM_X86_PGTABLE_2LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable-3level_types.h b/arch/x86/include/asm/pgtable-3level_types.h
index ed8a200..925ac1b 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -45,5 +45,6 @@ typedef union {
  */
 #define PTRS_PER_PTE	512
 
+#define PGD_KERNEL_START	(CONFIG_PAGE_OFFSET >> PGDIR_SHIFT)
 
 #endif /* _ASM_X86_PGTABLE_3LEVEL_DEFS_H */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index c9d3cf9..82151f6 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1132,6 +1132,22 @@ static inline int pud_write(pud_t pud)
 	return pud_flags(pud) & _PAGE_RW;
 }
 
+/*
+ * Page table pages are page-aligned.  The lower half of the top
+ * level is used for userspace and the top half for the kernel.
+ *
+ * Returns true for parts of the PGD that map userspace and
+ * false for the parts that map the kernel.
+ */
+static inline bool pgdp_maps_userspace(void *__ptr)
+{
+	unsigned long ptr = (unsigned long)__ptr;
+
+	return (((ptr & ~PAGE_MASK) / sizeof(pgd_t)) < PGD_KERNEL_START);
+}
+
+static inline int pgd_large(pgd_t pgd) { return 0; }
+
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 /*
  * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages
diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 396664d..50a02a3 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -131,20 +131,6 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 #endif
 }
 
-/*
- * Page table pages are page-aligned.  The lower half of the top
- * level is used for userspace and the top half for the kernel.
- *
- * Returns true for parts of the PGD that map userspace and
- * false for the parts that map the kernel.
- */
-static inline bool pgdp_maps_userspace(void *__ptr)
-{
-	unsigned long ptr = (unsigned long)__ptr;
-
-	return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2);
-}
-
 static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d)
 {
 #if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL)
@@ -187,7 +173,6 @@ extern void sync_global_pgds(unsigned long start, unsigned long end);
 /*
  * Level 4 access.
  */
-static inline int pgd_large(pgd_t pgd) { return 0; }
 #define mk_kernel_pgd(address) __pgd((address) | _KERNPG_TABLE)
 
 /* PUD - Level3 access */
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 6b8f73d..e57003a 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -124,4 +124,6 @@ typedef struct { pteval_t pte; } pte_t;
 
 #define EARLY_DYNAMIC_PAGE_TABLES	64
 
+#define PGD_KERNEL_START	((PAGE_SIZE / 2) / sizeof(pgd_t))
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread
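
Note on the re-implemented check: pgdp_maps_userspace() now turns the
pointer into an entry index within its page and compares that index with
PGD_KERNEL_START. The sketch below works through the three layouts in user
space. It assumes 4k pages and the default 3G/1G split (PAGE_OFFSET
0xC0000000) on 32 bit; the sk_* names and SK_* constants are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PAGE_SIZE      4096UL
#define SK_PAGE_OFFSET    0xC0000000UL  /* assumption: default 3G/1G split */

/* First kernel PGD index per layout, following the PGD_KERNEL_START definitions. */
#define SK_KSTART_2LEVEL  (SK_PAGE_OFFSET >> 22)     /* PGDIR_SHIFT 22 -> 768 */
#define SK_KSTART_PAE     (SK_PAGE_OFFSET >> 30)     /* PGDIR_SHIFT 30 -> 3   */
#define SK_KSTART_64      ((SK_PAGE_SIZE / 2) / 8)   /* 8-byte entries -> 256 */

/*
 * Generic form of the check: offset within the page, divided by the entry
 * size, gives the entry index; indices below kernel_start map userspace.
 */
static bool sk_maps_userspace(uintptr_t ptr, size_t entry_size,
			      unsigned long kernel_start)
{
	return ((ptr & (SK_PAGE_SIZE - 1)) / entry_size) < kernel_start;
}
```

Because only the offset within the page is used, the result is the same for
an entry in the kernel copy and for the matching entry in the user copy 4k
above it.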

* [PATCH 19/31] x86/mm/pae: Populate valid user PGD entries
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Generic page-table code populates all non-leaf entries with
_KERNPG_TABLE bits set. This is fine for all paging modes
except PAE.

In PAE mode only a subset of the bits is allowed to be set.
Make sure we only set allowed bits by masking out the
reserved bits.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable_types.h | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 3696398..5027470 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -50,6 +50,7 @@
 #define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
 #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
 #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
+#define _PAGE_SOFTW3	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW3)
 #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
 #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
 #define _PAGE_SPECIAL	(_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
@@ -267,14 +268,35 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
 
 typedef struct { pgdval_t pgd; } pgd_t;
 
+#ifdef CONFIG_X86_PAE
+
+/*
+ * PHYSICAL_PAGE_MASK might be non-constant when SME is compiled in, so we can't
+ * use it here.
+ */
+#define PGD_PAE_PHYS_MASK	(((1ULL << __PHYSICAL_MASK_SHIFT)-1) & PAGE_MASK)
+
+/*
+ * PAE allows Base Address, P, PWT, PCD and AVL bits to be set in PGD entries.
+ * All other bits are Reserved MBZ
+ */
+#define PGD_ALLOWED_BITS	(PGD_PAE_PHYS_MASK | _PAGE_PRESENT | \
+				 _PAGE_PWT | _PAGE_PCD | \
+				 _PAGE_SOFTW1 | _PAGE_SOFTW2 | _PAGE_SOFTW3 )
+
+#else
+/* No need to mask any bits for !PAE */
+#define PGD_ALLOWED_BITS	(~0ULL)
+#endif
+
 static inline pgd_t native_make_pgd(pgdval_t val)
 {
-	return (pgd_t) { val };
+	return (pgd_t) { val & PGD_ALLOWED_BITS };
 }
 
 static inline pgdval_t native_pgd_val(pgd_t pgd)
 {
-	return pgd.pgd;
+	return pgd.pgd & PGD_ALLOWED_BITS;
 }
 
 static inline pgdval_t pgd_flags(pgd_t pgd)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread
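
Note on the masking: the effect of PGD_ALLOWED_BITS on a value carrying
_KERNPG_TABLE-style flags can be checked with a small user-space sketch.
The bit positions are the architectural x86 ones. The SK_* names are
hypothetical, the physical mask shift of 44 and the flag set used for
SK_KERNPG_TABLE are assumptions, and the page mask is deliberately written
as a 64-bit constant so that address bits above 4G survive the AND.

```c
#include <stdint.h>

#define SK_PAGE_PRESENT   (1ULL << 0)
#define SK_PAGE_RW        (1ULL << 1)
#define SK_PAGE_PWT       (1ULL << 3)
#define SK_PAGE_PCD       (1ULL << 4)
#define SK_PAGE_ACCESSED  (1ULL << 5)
#define SK_PAGE_DIRTY     (1ULL << 6)
#define SK_PAGE_SOFTW1    (1ULL << 9)
#define SK_PAGE_SOFTW2    (1ULL << 10)
#define SK_PAGE_SOFTW3    (1ULL << 11)

#define SK_PHYS_MASK_SHIFT 44             /* assumption for PAE */
#define SK_PAGE_MASK       (~0xfffULL)    /* 64-bit on purpose, see note above */

#define SK_PGD_PAE_PHYS_MASK \
	(((1ULL << SK_PHYS_MASK_SHIFT) - 1) & SK_PAGE_MASK)

/* Base address, P, PWT, PCD and the three AVL bits, as listed in the patch. */
#define SK_PGD_ALLOWED_BITS \
	(SK_PGD_PAE_PHYS_MASK | SK_PAGE_PRESENT | SK_PAGE_PWT | SK_PAGE_PCD | \
	 SK_PAGE_SOFTW1 | SK_PAGE_SOFTW2 | SK_PAGE_SOFTW3)

/* Assumed flag set that generic code ORs into non-leaf entries. */
#define SK_KERNPG_TABLE \
	(SK_PAGE_PRESENT | SK_PAGE_RW | SK_PAGE_ACCESSED | SK_PAGE_DIRTY)

/* Model of native_make_pgd() with the mask applied. */
static inline uint64_t sk_make_pgd(uint64_t val)
{
	return val & SK_PGD_ALLOWED_BITS;
}
```

With these definitions, an entry built as `0x12345000 | SK_KERNPG_TABLE`
keeps its address and the present bit, while RW, accessed and dirty are
dropped.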

* [PATCH 20/31] x86/mm/pae: Populate the user page-table with user pgd's
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

When we populate a PGD entry, make sure we populate it in
the user page-table too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/pgtable-3level.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
index bc4af54..1a0661b 100644
--- a/arch/x86/include/asm/pgtable-3level.h
+++ b/arch/x86/include/asm/pgtable-3level.h
@@ -98,6 +98,9 @@ static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 static inline void native_set_pud(pud_t *pudp, pud_t pud)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pud.p4d.pgd = pti_set_user_pgd(&pudp->p4d.pgd, pud.p4d.pgd);
+#endif
 	set_64bit((unsigned long long *)(pudp), native_pud_val(pud));
 }
 
@@ -194,6 +197,10 @@ static inline pud_t native_pudp_get_and_clear(pud_t *pudp)
 {
 	union split_pud res, *orig = (union split_pud *)pudp;
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgd(&pudp->p4d.pgd, __pgd(0));
+#endif
+
 	/* xchg acts as a barrier before setting of the high bits */
 	res.pud_low = xchg(&orig->pud_low, 0);
 	res.pud_high = orig->pud_high;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread
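
Note on the two hooks: both the set path and the get-and-clear path now
update the user copy before touching the kernel copy. The toy model below
illustrates that ordering for a four-entry PAE top level. It is a sketch
under assumptions, not the kernel code: the sk_* names are hypothetical, a
3G/1G split is assumed, only entries that map userspace are mirrored (an
assumption about the out-of-line helper, which is not part of this patch),
and any adjustment the helper may make to the returned value is ignored.

```c
#include <stdint.h>

#define SK_PTRS_PER_PGD  4   /* PAE has four top-level entries */
#define SK_KERNEL_START  3   /* assumption: 3G/1G split */

/* Kernel and user copies of the top-level table, kept side by side. */
struct sk_pgd_pair {
	uint64_t kernel[SK_PTRS_PER_PGD];
	uint64_t user[SK_PTRS_PER_PGD];
};

/* Set path: mirror userspace entries into the user copy, then write the kernel copy. */
static void sk_set_pgd(struct sk_pgd_pair *p, unsigned int idx, uint64_t val)
{
	if (idx < SK_KERNEL_START)
		p->user[idx] = val;
	p->kernel[idx] = val;
}

/* Clear path: zero the user copy first, then fetch and clear the kernel copy. */
static uint64_t sk_get_and_clear_pgd(struct sk_pgd_pair *p, unsigned int idx)
{
	uint64_t old;

	if (idx < SK_KERNEL_START)
		p->user[idx] = 0;
	old = p->kernel[idx];
	p->kernel[idx] = 0;
	return old;
}
```

In this model a stale user entry can never outlive the kernel entry it was
copied from, which is the property the added hooks are meant to preserve.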

* [PATCH 21/31] x86/mm/legacy: Populate the user page-table with user pgd's
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Also populate the user-space pgd's in the user page-table.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
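Note for readers, not part of the commit: a minimal user-space model of
the mirroring idea, assuming that pti_set_user_pgd() copies entries that
map user space into the user copy of the page-table and leaves kernel
entries out of it. The names sketch_set_pgd, sketch_mm and the
SKETCH_* constants are hypothetical, the 3G/1G split is an assumption,
and the real helper works on pgd pointers rather than array indexes.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTRS_PER_PGD	1024	/* legacy 2-level paging: 1024 top-level entries */
#define SKETCH_KERNEL_BOUNDARY	768	/* assumed 3G/1G split: entries >= 768 map the kernel */

typedef uint32_t sketch_pgd_t;

struct sketch_mm {
	sketch_pgd_t kernel_pgd[SKETCH_PTRS_PER_PGD];	/* full view, active in kernel mode */
	sketch_pgd_t user_pgd[SKETCH_PTRS_PER_PGD];	/* reduced view, active in user mode */
};

/* Write one top-level entry and mirror it into the user view if it maps user space. */
static void sketch_set_pgd(struct sketch_mm *mm, unsigned int idx,
			   sketch_pgd_t val)
{
	assert(idx < SKETCH_PTRS_PER_PGD);

	if (idx < SKETCH_KERNEL_BOUNDARY)
		mm->user_pgd[idx] = val;

	mm->kernel_pgd[idx] = val;
}
```

Writing a zero value through the same function models the
get_and_clear paths in the diff, which pass __pgd(0) so that the user
copy is cleared together with the kernel copy.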
 arch/x86/include/asm/pgtable-2level.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index 685ffe8..d772554 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -19,6 +19,9 @@ static inline void native_set_pte(pte_t *ptep , pte_t pte)
 
 static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pmd.pud.p4d.pgd = pti_set_user_pgd(&pmdp->pud.p4d.pgd, pmd.pud.p4d.pgd);
+#endif
 	*pmdp = pmd;
 }
 
@@ -58,6 +61,9 @@ static inline pte_t native_ptep_get_and_clear(pte_t *xp)
 #ifdef CONFIG_SMP
 static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgd(&xp->pud.p4d.pgd, __pgd(0));
+#endif
 	return __pmd(xchg((pmdval_t *)xp, 0));
 }
 #else
@@ -67,6 +73,9 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 #ifdef CONFIG_SMP
 static inline pud_t native_pudp_get_and_clear(pud_t *xp)
 {
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	pti_set_user_pgd(&xp->p4d.pgd, __pgd(0));
+#endif
 	return __pud(xchg((pudval_t *)xp, 0));
 }
 #else
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 22/31] x86/mm/pti: Add an overflow check to pti_clone_pmds()
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

The addr counter will overflow if we clone the last PMD of
the address space, resulting in an endless loop.

Check for that and bail out of the loop when it happens.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
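Note for readers, not part of the commit: a small user-space sketch of
the wrap-around described above, assuming a 32-bit address type and a
2MB step. count_pmd_steps, SKETCH_PMD_SIZE and the max_steps cap are
hypothetical; the cap only exists so that the unguarded variant
terminates in this model.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for PMD_SIZE with PAE paging: one PMD entry maps 2 MiB. */
#define SKETCH_PMD_SIZE	((uint32_t)1 << 21)

/*
 * Count the PMD-sized steps a loop of the form
 *	for (addr = start; addr < end; addr += PMD_SIZE)
 * takes over [start, end). With overflow_check set, the loop stops
 * once addr has wrapped around below start, as in the patch.
 */
static unsigned int count_pmd_steps(uint32_t start, uint32_t end,
				    int overflow_check,
				    unsigned int max_steps)
{
	unsigned int steps = 0;
	uint32_t addr;

	assert(start <= end);

	for (addr = start; addr < end; addr += SKETCH_PMD_SIZE) {
		/* Same test as the patch: addr wrapped past zero. */
		if (overflow_check && addr < start)
			break;

		/* Safety cap for the model only. */
		if (steps == max_steps)
			break;

		steps++;
	}

	return steps;
}
```

With start = 0xffc00000 and end = 0xffffffff the guarded loop visits the
last two PMDs and stops, while the unguarded loop wraps to address 0 and
keeps going until the cap is hit.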
 arch/x86/mm/pti.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index ce38f16..7f5e698 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -282,6 +282,10 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 		p4d_t *p4d;
 		pud_t *pud;
 
+		/* Overflow check */
+		if (addr < start)
+			break;
+
 		pgd = pgd_offset_k(addr);
 		if (WARN_ON(pgd_none(*pgd)))
 			return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 23/31] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Move X86_CR3_PTI_PCID_USER_BIT out of the X86_64-specific
processor defines so that it is visible on 32 bit too.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/include/asm/processor-flags.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 625a52a..02c2cbd 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -39,10 +39,6 @@
 #define CR3_PCID_MASK	0xFFFull
 #define CR3_NOFLUSH	BIT_ULL(63)
 
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-# define X86_CR3_PTI_PCID_USER_BIT	11
-#endif
-
 #else
 /*
  * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save
@@ -53,4 +49,8 @@
 #define CR3_NOFLUSH	0
 #endif
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+# define X86_CR3_PTI_PCID_USER_BIT	11
+#endif
+
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 24/31] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level on x86_32
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Cloning on the P4D level would clone the complete kernel
address space into the user-space page-tables for PAE
kernels. Cloning on PMD level is fine for PAE and legacy
paging.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
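Note for readers, not part of the commit: a rough calculation of how
much address space becomes visible to user space depending on the level
the clone happens at. With PAE a top-level entry maps 1GB and a PMD
entry maps 2MB. The function cloned_bytes, the SKETCH_* names and the
addresses in the usage note are hypothetical and only illustrate the
granularity argument.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAE_PGD_SHIFT	30	/* PAE top-level entry maps 1 GiB */
#define SKETCH_PAE_PMD_SHIFT	21	/* PAE PMD entry maps 2 MiB */

/*
 * Bytes that end up mapped when every entry touching [start, end)
 * is cloned at the level given by shift.
 */
static uint64_t cloned_bytes(uint32_t start, uint32_t end,
			     unsigned int shift)
{
	uint64_t entry_size = (uint64_t)1 << shift;
	uint64_t first, last;

	assert(start < end);

	first = (uint64_t)start >> shift;
	last  = ((uint64_t)end - 1) >> shift;

	return (last - first + 1) * entry_size;
}
```

For an assumed 1MB area at 0xff800000, cloning on the PMD level exposes
2MB, while cloning the top-level entry exposes the full 1GB that holds
the kernel mappings in a 3G/1G split.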
 arch/x86/mm/pti.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c
index 7f5e698..ec9852a 100644
--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -312,6 +312,7 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear)
 	}
 }
 
+#ifdef CONFIG_X86_64
 /*
  * Clone a single p4d (i.e. a top-level entry on 4-level systems and a
  * next-level entry on 5-level systems.
@@ -335,6 +336,25 @@ static void __init pti_clone_user_shared(void)
 	pti_clone_p4d(CPU_ENTRY_AREA_BASE);
 }
 
+#else /* CONFIG_X86_64 */
+
+/*
+ * On 32 bit PAE systems with 1GB of Kernel address space there is only
+ * one pgd/p4d for the whole kernel. Cloning that would map the whole
+ * address space into the user page-tables, making PTI useless. So clone
+ * the page-table on the PMD level to prevent that.
+ */
+static void __init pti_clone_user_shared(void)
+{
+	unsigned long start, end;
+
+	start = CPU_ENTRY_AREA_BASE;
+	end   = start + (PAGE_SIZE * CPU_ENTRY_AREA_PAGES);
+
+	pti_clone_pmds(start, end, _PAGE_GLOBAL);
+}
+#endif /* CONFIG_X86_64 */
+
 /*
  * Clone the ESPFIX P4D into the user space visinble page table
  */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 25/31] x86/mm/dump_pagetables: Define INIT_PGD
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Define INIT_PGD to point to the correct initial page-table
for 32 and 64 bit and use it where needed. This fixes the
build on 32 bit with CONFIG_PAGE_TABLE_ISOLATION enabled.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/dump_pagetables.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 2a4849e..2151ebb 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -105,6 +105,8 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
+#define INIT_PGD	((pgd_t *) &init_top_pgt)
+
 #else /* CONFIG_X86_64 */
 
 enum address_markers_idx {
@@ -133,6 +135,8 @@ static struct addr_marker address_markers[] = {
 	[END_OF_SPACE_NR]	= { -1,			NULL }
 };
 
+#define INIT_PGD	(swapper_pg_dir)
+
 #endif /* !CONFIG_X86_64 */
 
 /* Multipliers for offsets within the PTEs */
@@ -478,11 +482,7 @@ static inline bool is_hypervisor_range(int idx)
 static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
 				       bool checkwx, bool dmesg)
 {
-#ifdef CONFIG_X86_64
-	pgd_t *start = (pgd_t *) &init_top_pgt;
-#else
-	pgd_t *start = swapper_pg_dir;
-#endif
+	pgd_t *start = INIT_PGD;
 	pgprotval_t prot;
 	int i;
 	struct pg_state st = {};
@@ -543,7 +543,7 @@ EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs);
 static void ptdump_walk_user_pgd_level_checkwx(void)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-	pgd_t *pgd = (pgd_t *) &init_top_pgt;
+	pgd_t *pgd = INIT_PGD;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 26/31] x86/pgtable/pae: Use separate kernel PMDs for user page-table
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

We need separate kernel PMDs in the user page-table when PTI
is enabled to map the per-process LDT for user-space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
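Note for readers, not part of the commit: a user-space sketch of why a
private copy of the kernel PMD page is useful. Each process starts from
a copy of a template page, as pgd_prepopulate_user_pmd() does with
memcpy(), and can then receive its own LDT mapping without changing
what other processes see. The sketch_* names, the slot index and the
entry values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PTRS_PER_PMD	512	/* PAE: 512 eight-byte entries per PMD page */

typedef uint64_t sketch_pmd_t;

/* Give a process its own copy of a template kernel PMD page. */
static void sketch_copy_kernel_pmd(sketch_pmd_t *dst,
				   const sketch_pmd_t *tmpl)
{
	memcpy(dst, tmpl, sizeof(sketch_pmd_t) * SKETCH_PTRS_PER_PMD);
}

/* Install a per-process mapping (for example the LDT) in a private copy. */
static void sketch_map_slot(sketch_pmd_t *pmd, unsigned int idx,
			    sketch_pmd_t entry)
{
	assert(idx < SKETCH_PTRS_PER_PMD);
	pmd[idx] = entry;
}
```

Two processes that each get a copy via sketch_copy_kernel_pmd() can map
different LDT pages into the same slot, and the template stays
untouched. With a shared PMD page that would not be possible.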
 arch/x86/mm/pgtable.c | 100 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 81 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a81d42e..d95bc7b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -177,6 +177,14 @@ static void pgd_dtor(pgd_t *pgd)
  */
 #define PREALLOCATED_PMDS	UNSHARED_PTRS_PER_PGD
 
+/*
+ * We allocate separate PMDs for the kernel part of the user page-table
+ * when PTI is enabled. We need them to map the per-process LDT into the
+ * user-space page-table.
+ */
+#define PREALLOCATED_USER_PMDS	 (static_cpu_has(X86_FEATURE_PTI) ? \
+					KERNEL_PGD_PTRS : 0)
+
 void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 {
 	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
@@ -197,14 +205,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 
 /* No need to prepopulate any pagetable entries in non-PAE modes. */
 #define PREALLOCATED_PMDS	0
-
+#define PREALLOCATED_USER_PMDS	 0
 #endif	/* CONFIG_X86_PAE */
 
-static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++)
+	for(i = 0; i < count; i++)
 		if (pmds[i]) {
 			pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
 			free_page((unsigned long)pmds[i]);
@@ -212,7 +220,7 @@ static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
 		}
 }
 
-static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 	bool failed = false;
@@ -221,7 +229,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	if (mm == &init_mm)
 		gfp &= ~__GFP_ACCOUNT;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
+	for(i = 0; i < count; i++) {
 		pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
 		if (!pmd)
 			failed = true;
@@ -236,7 +244,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	}
 
 	if (failed) {
-		free_pmds(mm, pmds);
+		free_pmds(mm, pmds, count);
 		return -ENOMEM;
 	}
 
@@ -249,23 +257,38 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
  * preallocate which never got a corresponding vma will need to be
  * freed manually.
  */
+static void mop_up_one_pmd(struct mm_struct *mm, pgd_t *pgdp)
+{
+	pgd_t pgd = *pgdp;
+
+	if (pgd_val(pgd) != 0) {
+		pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+
+		*pgdp = native_make_pgd(0);
+
+		paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
+		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
+	}
+}
+
 static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
-		pgd_t pgd = pgdp[i];
+	for(i = 0; i < PREALLOCATED_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i]);
 
-		if (pgd_val(pgd) != 0) {
-			pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 
-			pgdp[i] = native_make_pgd(0);
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
 
-			paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
-			pmd_free(mm, pmd);
-			mm_dec_nr_pmds(mm);
-		}
-	}
+	pgdp = kernel_to_user_pgdp(pgdp);
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i + KERNEL_PGD_BOUNDARY]);
+#endif
 }
 
 static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
@@ -291,6 +314,38 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
 	}
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+	pgd_t *s_pgd = kernel_to_user_pgdp(swapper_pg_dir);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	p4d_t *u_p4d;
+	pud_t *u_pud;
+	int i;
+
+	u_p4d = p4d_offset(u_pgd, 0);
+	u_pud = pud_offset(u_p4d, 0);
+
+	s_pgd += KERNEL_PGD_BOUNDARY;
+	u_pud += KERNEL_PGD_BOUNDARY;
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++, u_pud++, s_pgd++) {
+		pmd_t *pmd = pmds[i];
+
+		memcpy(pmd, (pmd_t *)pgd_page_vaddr(*s_pgd),
+		       sizeof(pmd_t) * PTRS_PER_PMD);
+
+		pud_populate(mm, u_pud, pmd);
+	}
+
+}
+#else
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+}
+#endif
 /*
  * Xen paravirt assumes pgd table should be in one page. 64 bit kernel also
  * assumes that pgd should be in one page.
@@ -371,6 +426,7 @@ static inline void _pgd_free(pgd_t *pgd)
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	pgd_t *pgd;
+	pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
 	pmd_t *pmds[PREALLOCATED_PMDS];
 
 	pgd = _pgd_alloc();
@@ -380,12 +436,15 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	mm->pgd = pgd;
 
-	if (preallocate_pmds(mm, pmds) != 0)
+	if (preallocate_pmds(mm, pmds, PREALLOCATED_PMDS) != 0)
 		goto out_free_pgd;
 
-	if (paravirt_pgd_alloc(mm) != 0)
+	if (preallocate_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS) != 0)
 		goto out_free_pmds;
 
+	if (paravirt_pgd_alloc(mm) != 0)
+		goto out_free_user_pmds;
+
 	/*
 	 * Make sure that pre-populating the pmds is atomic with
 	 * respect to anything walking the pgd_list, so that they
@@ -395,13 +454,16 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	pgd_ctor(mm, pgd);
 	pgd_prepopulate_pmd(mm, pgd, pmds);
+	pgd_prepopulate_user_pmd(mm, pgd, u_pmds);
 
 	spin_unlock(&pgd_lock);
 
 	return pgd;
 
+out_free_user_pmds:
+	free_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS);
 out_free_pmds:
-	free_pmds(mm, pmds);
+	free_pmds(mm, pmds, PREALLOCATED_PMDS);
 out_free_pgd:
 	_pgd_free(pgd);
 out:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 26/31] x86/pgtable/pae: Use separate kernel PMDs for user page-table
@ 2018-02-09  9:25   ` Joerg Roedel
  0 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

We need separate kernel PMDs in the user page-table when PTI
is enabled to map the per-process LDT for user-space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 arch/x86/mm/pgtable.c | 100 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 81 insertions(+), 19 deletions(-)

diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index a81d42e..d95bc7b 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -177,6 +177,14 @@ static void pgd_dtor(pgd_t *pgd)
  */
 #define PREALLOCATED_PMDS	UNSHARED_PTRS_PER_PGD
 
+/*
+ * We allocate separate PMDs for the kernel part of the user page-table
+ * when PTI is enabled. We need them to map the per-process LDT into the
+ * user-space page-table.
+ */
+#define PREALLOCATED_USER_PMDS	 (static_cpu_has(X86_FEATURE_PTI) ? \
+					KERNEL_PGD_PTRS : 0)
+
 void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 {
 	paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
@@ -197,14 +205,14 @@ void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
 
 /* No need to prepopulate any pagetable entries in non-PAE modes. */
 #define PREALLOCATED_PMDS	0
-
+#define PREALLOCATED_USER_PMDS	 0
 #endif	/* CONFIG_X86_PAE */
 
-static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static void free_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++)
+	for(i = 0; i < count; i++)
 		if (pmds[i]) {
 			pgtable_pmd_page_dtor(virt_to_page(pmds[i]));
 			free_page((unsigned long)pmds[i]);
@@ -212,7 +220,7 @@ static void free_pmds(struct mm_struct *mm, pmd_t *pmds[])
 		}
 }
 
-static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
+static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[], int count)
 {
 	int i;
 	bool failed = false;
@@ -221,7 +229,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	if (mm == &init_mm)
 		gfp &= ~__GFP_ACCOUNT;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
+	for(i = 0; i < count; i++) {
 		pmd_t *pmd = (pmd_t *)__get_free_page(gfp);
 		if (!pmd)
 			failed = true;
@@ -236,7 +244,7 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
 	}
 
 	if (failed) {
-		free_pmds(mm, pmds);
+		free_pmds(mm, pmds, count);
 		return -ENOMEM;
 	}
 
@@ -249,23 +257,38 @@ static int preallocate_pmds(struct mm_struct *mm, pmd_t *pmds[])
  * preallocate which never got a corresponding vma will need to be
  * freed manually.
  */
+static void mop_up_one_pmd(struct mm_struct *mm, pgd_t *pgdp)
+{
+	pgd_t pgd = *pgdp;
+
+	if (pgd_val(pgd) != 0) {
+		pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+
+		*pgdp = native_make_pgd(0);
+
+		paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
+		pmd_free(mm, pmd);
+		mm_dec_nr_pmds(mm);
+	}
+}
+
 static void pgd_mop_up_pmds(struct mm_struct *mm, pgd_t *pgdp)
 {
 	int i;
 
-	for(i = 0; i < PREALLOCATED_PMDS; i++) {
-		pgd_t pgd = pgdp[i];
+	for(i = 0; i < PREALLOCATED_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i]);
 
-		if (pgd_val(pgd) != 0) {
-			pmd_t *pmd = (pmd_t *)pgd_page_vaddr(pgd);
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
 
-			pgdp[i] = native_make_pgd(0);
+	if (!static_cpu_has(X86_FEATURE_PTI))
+		return;
 
-			paravirt_release_pmd(pgd_val(pgd) >> PAGE_SHIFT);
-			pmd_free(mm, pmd);
-			mm_dec_nr_pmds(mm);
-		}
-	}
+	pgdp = kernel_to_user_pgdp(pgdp);
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++)
+		mop_up_one_pmd(mm, &pgdp[i + KERNEL_PGD_BOUNDARY]);
+#endif
 }
 
 static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
@@ -291,6 +314,38 @@ static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
 	}
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+	pgd_t *s_pgd = kernel_to_user_pgdp(swapper_pg_dir);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	p4d_t *u_p4d;
+	pud_t *u_pud;
+	int i;
+
+	u_p4d = p4d_offset(u_pgd, 0);
+	u_pud = pud_offset(u_p4d, 0);
+
+	s_pgd += KERNEL_PGD_BOUNDARY;
+	u_pud += KERNEL_PGD_BOUNDARY;
+
+	for (i = 0; i < PREALLOCATED_USER_PMDS; i++, u_pud++, s_pgd++) {
+		pmd_t *pmd = pmds[i];
+
+		memcpy(pmd, (pmd_t *)pgd_page_vaddr(*s_pgd),
+		       sizeof(pmd_t) * PTRS_PER_PMD);
+
+		pud_populate(mm, u_pud, pmd);
+	}
+
+}
+#else
+static void pgd_prepopulate_user_pmd(struct mm_struct *mm,
+				     pgd_t *k_pgd, pmd_t *pmds[])
+{
+}
+#endif
 /*
  * Xen paravirt assumes pgd table should be in one page. 64 bit kernel also
  * assumes that pgd should be in one page.
@@ -371,6 +426,7 @@ static inline void _pgd_free(pgd_t *pgd)
 pgd_t *pgd_alloc(struct mm_struct *mm)
 {
 	pgd_t *pgd;
+	pmd_t *u_pmds[PREALLOCATED_USER_PMDS];
 	pmd_t *pmds[PREALLOCATED_PMDS];
 
 	pgd = _pgd_alloc();
@@ -380,12 +436,15 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	mm->pgd = pgd;
 
-	if (preallocate_pmds(mm, pmds) != 0)
+	if (preallocate_pmds(mm, pmds, PREALLOCATED_PMDS) != 0)
 		goto out_free_pgd;
 
-	if (paravirt_pgd_alloc(mm) != 0)
+	if (preallocate_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS) != 0)
 		goto out_free_pmds;
 
+	if (paravirt_pgd_alloc(mm) != 0)
+		goto out_free_user_pmds;
+
 	/*
 	 * Make sure that pre-populating the pmds is atomic with
 	 * respect to anything walking the pgd_list, so that they
@@ -395,13 +454,16 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
 
 	pgd_ctor(mm, pgd);
 	pgd_prepopulate_pmd(mm, pgd, pmds);
+	pgd_prepopulate_user_pmd(mm, pgd, u_pmds);
 
 	spin_unlock(&pgd_lock);
 
 	return pgd;
 
+out_free_user_pmds:
+	free_pmds(mm, u_pmds, PREALLOCATED_USER_PMDS);
 out_free_pmds:
-	free_pmds(mm, pmds);
+	free_pmds(mm, pmds, PREALLOCATED_PMDS);
 out_free_pgd:
 	_pgd_free(pgd);
 out:
-- 
2.7.4
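
The error path that the diff above adds to pgd_alloc() unwinds in
reverse allocation order: each label frees one resource and falls
through to the labels for everything allocated before it. A minimal
user-space sketch of that pattern follows. The names toy_tables,
toy_alloc() and the fail_at parameter are hypothetical and exist only
to simulate a failure at a given step; this is not the kernel code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the three allocations made in pgd_alloc(). */
struct toy_tables {
	void *pgd;
	void *pmds;
	void *user_pmds;
};

/* Simulated allocator: fails exactly at step 'fail_at' (0 = never). */
static void *toy_alloc(int step, int fail_at)
{
	return step == fail_at ? NULL : malloc(64);
}

/* Returns 0 on success, -1 on failure with every pointer reset to NULL. */
static int toy_tables_alloc(struct toy_tables *t, int fail_at)
{
	t->pgd = t->pmds = t->user_pmds = NULL;

	t->pgd = toy_alloc(1, fail_at);
	if (!t->pgd)
		goto out;

	t->pmds = toy_alloc(2, fail_at);
	if (!t->pmds)
		goto out_free_pgd;

	t->user_pmds = toy_alloc(3, fail_at);
	if (!t->user_pmds)
		goto out_free_pmds;

	/* Step 4 stands in for a failing paravirt_pgd_alloc(). */
	if (fail_at == 4)
		goto out_free_user_pmds;

	return 0;

out_free_user_pmds:
	free(t->user_pmds);
	t->user_pmds = NULL;
out_free_pmds:
	free(t->pmds);
	t->pmds = NULL;
out_free_pgd:
	free(t->pgd);
	t->pgd = NULL;
out:
	return -1;
}
```

Placing the new out_free_user_pmds label above out_free_pmds is what
lets a failure after the user PMDs were allocated release them as well
as the kernel PMDs and the PGD.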

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 27/31] x86/ldt: Reserve address-space range on 32 bit for the LDT
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Reserve one PMD-sized range of address-space (2MB with PAE,
4MB without) for mapping the LDT to user-space on 32-bit PTI
kernels.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
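The arithmetic behind the new macros can be tried out in user space.
The sketch below mirrors the LDT_BASE_ADDR and PKMAP_BASE expressions
from the diff; the helper names are made up for illustration and
the CPU_ENTRY_AREA_BASE value used in the usage note is a hypothetical
PMD-aligned address, not one taken from a real kernel build.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u	/* 4 KiB pages */

/* Round addr down to a multiple of pmd_size (the effect of "& PMD_MASK"). */
static uint32_t pmd_align_down(uint32_t addr, uint32_t pmd_size)
{
	/* pmd_size must be a power of two for the mask to be valid. */
	assert(pmd_size != 0 && (pmd_size & (pmd_size - 1)) == 0);
	return addr & ~(pmd_size - 1);
}

/* Mirrors: LDT_BASE_ADDR = (CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK */
static uint32_t ldt_base_addr(uint32_t cpu_entry_area_base, uint32_t pmd_size)
{
	return pmd_align_down(cpu_entry_area_base - PAGE_SIZE, pmd_size);
}

/* Mirrors: PKMAP_BASE = (LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK */
static uint32_t pkmap_base(uint32_t ldt_base, uint32_t pmd_size)
{
	return pmd_align_down(ldt_base - PAGE_SIZE, pmd_size);
}
```

With a hypothetical CPU_ENTRY_AREA_BASE of 0xff800000, a 2MB PMD (PAE)
gives an LDT base of 0xff600000 and a PKMAP base of 0xff400000, while a
4MB PMD (non-PAE) gives 0xff400000 and 0xff000000. Because each base is
already PMD-aligned, stepping back one page and aligning down always
lands exactly one PMD lower, which is where the 2MB/4MB comes from.
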
 arch/x86/include/asm/pgtable_32_types.h | 7 +++++--
 arch/x86/mm/dump_pagetables.c           | 9 +++++++++
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index 0777e18..eb2e97a 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -48,13 +48,16 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
 	((FIXADDR_TOT_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1))   \
 	 & PMD_MASK)
 
-#define PKMAP_BASE		\
+#define LDT_BASE_ADDR		\
 	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
 
+#define PKMAP_BASE		\
+	((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)
+
 #ifdef CONFIG_HIGHMEM
 # define VMALLOC_END	(PKMAP_BASE - 2 * PAGE_SIZE)
 #else
-# define VMALLOC_END	(CPU_ENTRY_AREA_BASE - 2 * PAGE_SIZE)
+# define VMALLOC_END	(LDT_BASE_ADDR - 2 * PAGE_SIZE)
 #endif
 
 #define MODULES_VADDR	VMALLOC_START
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 2151ebb..fdefdf0 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -117,6 +117,9 @@ enum address_markers_idx {
 #ifdef CONFIG_HIGHMEM
 	PKMAP_BASE_NR,
 #endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	LDT_NR,
+#endif
 	CPU_ENTRY_AREA_NR,
 	FIXADDR_START_NR,
 	END_OF_SPACE_NR,
@@ -130,6 +133,9 @@ static struct addr_marker address_markers[] = {
 #ifdef CONFIG_HIGHMEM
 	[PKMAP_BASE_NR]		= { 0UL,		"Persistent kmap() Area" },
 #endif
+#ifdef CONFIG_MODIFY_LDT_SYSCALL
+	[LDT_NR]		= { 0UL,		"LDT remap" },
+#endif
 	[CPU_ENTRY_AREA_NR]	= { 0UL,		"CPU entry area" },
 	[FIXADDR_START_NR]	= { 0UL,		"Fixmap area" },
 	[END_OF_SPACE_NR]	= { -1,			NULL }
@@ -579,6 +585,9 @@ static int __init pt_dump_init(void)
 # endif
 	address_markers[FIXADDR_START_NR].start_address = FIXADDR_START;
 	address_markers[CPU_ENTRY_AREA_NR].start_address = CPU_ENTRY_AREA_BASE;
+# ifdef CONFIG_MODIFY_LDT_SYSCALL
+	address_markers[LDT_NR].start_address = LDT_BASE_ADDR;
+# endif
 #endif
 	return 0;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 28/31] x86/ldt: Define LDT_END_ADDR
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Define LDT_END_ADDR, which marks the end of the address-space
range reserved for the LDT. The LDT code will use it when
unmapping the LDT from user-space.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
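For the 64-bit definitions the resulting range can be checked with a
few lines of user-space C. The sketch below reproduces the
LDT_PGD_ENTRY << PGDIR_SHIFT computation; the function names are made
up for illustration, and the PGDIR_SHIFT values of 39 (4-level) and 48
(5-level) are passed in as parameters rather than taken from kernel
headers.

```c
#include <stdint.h>

/* Mirrors: LDT_BASE_ADDR = LDT_PGD_ENTRY << PGDIR_SHIFT */
static uint64_t ldt_base_addr(int64_t pgd_entry, unsigned int pgdir_shift)
{
	/*
	 * Convert to unsigned before shifting; left-shifting a negative
	 * signed value is undefined behaviour in C.
	 */
	return (uint64_t)pgd_entry << pgdir_shift;
}

/* Mirrors: LDT_END_ADDR = LDT_BASE_ADDR + PGDIR_SIZE */
static uint64_t ldt_end_addr(uint64_t base, unsigned int pgdir_shift)
{
	return base + ((uint64_t)1 << pgdir_shift);
}
```

With LDT_PGD_ENTRY = -3 and a shift of 39 this yields the range
0xfffffe8000000000 to 0xffffff0000000000, and with -112 and a shift of
48 it yields 0xff90000000000000 to 0xff91000000000000. On 64 bit the
new macro therefore equals the old start + (1UL << PGDIR_SHIFT)
expression in free_ldt_pgtables(); only the 32-bit definition, which
adds PMD_SIZE instead, changes the size of the range.
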
 arch/x86/include/asm/pgtable_32_types.h | 2 ++
 arch/x86/include/asm/pgtable_64_types.h | 2 ++
 arch/x86/kernel/ldt.c                   | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h
index eb2e97a..02bd445 100644
--- a/arch/x86/include/asm/pgtable_32_types.h
+++ b/arch/x86/include/asm/pgtable_32_types.h
@@ -51,6 +51,8 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */
 #define LDT_BASE_ADDR		\
 	((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK)
 
+#define LDT_END_ADDR		(LDT_BASE_ADDR + PMD_SIZE)
+
 #define PKMAP_BASE		\
 	((LDT_BASE_ADDR - PAGE_SIZE) & PMD_MASK)
 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index e57003a..15188baa 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -90,12 +90,14 @@ typedef struct { pteval_t pte; } pte_t;
 # define __VMEMMAP_BASE		_AC(0xffd4000000000000, UL)
 # define LDT_PGD_ENTRY		_AC(-112, UL)
 # define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
+# define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
 #else
 # define VMALLOC_SIZE_TB	_AC(32, UL)
 # define __VMALLOC_BASE		_AC(0xffffc90000000000, UL)
 # define __VMEMMAP_BASE		_AC(0xffffea0000000000, UL)
 # define LDT_PGD_ENTRY		_AC(-3, UL)
 # define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
+# define LDT_END_ADDR		(LDT_BASE_ADDR + PGDIR_SIZE)
 #endif
 
 #ifdef CONFIG_RANDOMIZE_MEMORY
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 26d713e..f3c2fbf 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -202,7 +202,7 @@ static void free_ldt_pgtables(struct mm_struct *mm)
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
 	struct mmu_gather tlb;
 	unsigned long start = LDT_BASE_ADDR;
-	unsigned long end = start + (1UL << PGDIR_SHIFT);
+	unsigned long end = LDT_END_ADDR;
 
 	if (!static_cpu_has(X86_FEATURE_PTI))
 		return;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 29/31] x86/ldt: Split out sanity check in map_ldt_struct()
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Split out the mapping sanity check and the actual mapping of
the LDT to user-space from map_ldt_struct() so that both can
be re-used for PAE paging.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
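The invariant that do_sanity_check() enforces can be modelled as a
pure function. The sketch below is a user-space approximation with a
hypothetical name: it counts the WARN_ON() conditions that would fire
instead of emitting warnings, and takes the mm state and the PTI
feature bit as plain booleans.

```c
#include <stdbool.h>

/* Returns the number of WARN_ON() conditions that would trigger. */
static int count_ldt_mapping_warnings(bool has_ldt, bool pti_enabled,
				      bool had_kernel_mapping,
				      bool had_user_mapping)
{
	int warnings = 0;

	if (has_ldt) {
		/* An LDT was mapped before: both mappings must already exist. */
		warnings += !had_kernel_mapping;
		if (pti_enabled)
			warnings += !had_user_mapping;
	} else {
		/* First LDT for this mm: neither mapping may exist yet. */
		warnings += had_kernel_mapping;
		if (pti_enabled)
			warnings += had_user_mapping;
	}

	return warnings;
}
```

Keeping the check independent of how had_kernel_mapping and
had_user_mapping were obtained is what allows a different page-table
level to feed the same check later.
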
 arch/x86/kernel/ldt.c | 82 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 58 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index f3c2fbf..8ab7df9 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -100,6 +100,49 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 	return new_ldt;
 }
 
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+
+static void do_sanity_check(struct mm_struct *mm,
+			    bool had_kernel_mapping,
+			    bool had_user_mapping)
+{
+	if (mm->context.ldt) {
+		/*
+		 * We already had an LDT.  The top-level entry should already
+		 * have been allocated and synchronized with the usermode
+		 * tables.
+		 */
+		WARN_ON(!had_kernel_mapping);
+		if (static_cpu_has(X86_FEATURE_PTI))
+			WARN_ON(!had_user_mapping);
+	} else {
+		/*
+		 * This is the first time we're mapping an LDT for this process.
+		 * Sync the pgd to the usermode tables.
+		 */
+		WARN_ON(had_kernel_mapping);
+		if (static_cpu_has(X86_FEATURE_PTI))
+			WARN_ON(had_user_mapping);
+	}
+}
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+
+	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+		set_pgd(kernel_to_user_pgdp(pgd), *pgd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	bool had_kernel = (pgd->pgd != 0);
+	bool had_user   = (kernel_to_user_pgdp(pgd)->pgd != 0);
+
+	do_sanity_check(mm, had_kernel, had_user);
+}
+
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
@@ -115,9 +158,8 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries)
 static int
 map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 {
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-	bool is_vmalloc, had_top_level_entry;
 	unsigned long va;
+	bool is_vmalloc;
 	spinlock_t *ptl;
 	pgd_t *pgd;
 	int i;
@@ -131,13 +173,15 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 	 */
 	WARN_ON(ldt->slot != -1);
 
+	/* Check if the current mappings are sane */
+	sanity_check_ldt_mapping(mm);
+
 	/*
 	 * Did we already have the top level entry allocated?  We can't
 	 * use pgd_none() for this because it doesn't do anything on
 	 * 4-level page table kernels.
 	 */
 	pgd = pgd_offset(mm, LDT_BASE_ADDR);
-	had_top_level_entry = (pgd->pgd != 0);
 
 	is_vmalloc = is_vmalloc_addr(ldt->entries);
 
@@ -168,35 +212,25 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
 		pte_unmap_unlock(ptep, ptl);
 	}
 
-	if (mm->context.ldt) {
-		/*
-		 * We already had an LDT.  The top-level entry should already
-		 * have been allocated and synchronized with the usermode
-		 * tables.
-		 */
-		WARN_ON(!had_top_level_entry);
-		if (static_cpu_has(X86_FEATURE_PTI))
-			WARN_ON(!kernel_to_user_pgdp(pgd)->pgd);
-	} else {
-		/*
-		 * This is the first time we're mapping an LDT for this process.
-		 * Sync the pgd to the usermode tables.
-		 */
-		WARN_ON(had_top_level_entry);
-		if (static_cpu_has(X86_FEATURE_PTI)) {
-			WARN_ON(kernel_to_user_pgdp(pgd)->pgd);
-			set_pgd(kernel_to_user_pgdp(pgd), *pgd);
-		}
-	}
+	/* Propagate LDT mapping to the user page-table */
+	map_ldt_struct_to_user(mm);
 
 	va = (unsigned long)ldt_slot_va(slot);
 	flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0);
 
 	ldt->slot = slot;
-#endif
 	return 0;
 }
 
+#else /* !CONFIG_PAGE_TABLE_ISOLATION */
+
+static int
+map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot)
+{
+	return 0;
+}
+#endif /* CONFIG_PAGE_TABLE_ISOLATION */
+
 static void free_ldt_pgtables(struct mm_struct *mm)
 {
 #ifdef CONFIG_PAGE_TABLE_ISOLATION
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 30/31] x86/ldt: Enable LDT user-mapping for PAE
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Add the special case needed for PAE to get the LDT mapped
into the user page-table when PTI is enabled. The big
difference from the other paging modes is that we don't have
a full top-level PGD entry available for the LDT, but only a
PMD entry.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
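To illustrate why PAE has to work on the PMD level: with PAE a
top-level entry covers 1GB and a PMD entry covers 2MB, so the LDT range
is a single PMD slot inside a PMD page that also maps other things. The
toy model below walks to that slot in a kernel and a user table and
copies the entry across. All types and names are hypothetical, and the
NULL check on the walk results is an addition in this sketch that the
patch itself does not perform.

```c
#include <stddef.h>
#include <stdint.h>

#define PGDIR_SHIFT	30	/* PAE: each top-level entry covers 1GB */
#define PMD_SHIFT	21	/* PAE: each PMD entry covers 2MB */
#define PTRS_PER_PGD	4
#define PTRS_PER_PMD	512

typedef uint64_t pmd_entry;

/* Toy top level: each slot points to a PMD page, or NULL if unpopulated. */
struct toy_pgd {
	pmd_entry *pmd_page[PTRS_PER_PGD];
};

/* Return a pointer to the PMD slot covering va, or NULL if none exists. */
static pmd_entry *toy_pmd_walk(struct toy_pgd *pgd, uint32_t va)
{
	pmd_entry *page = pgd->pmd_page[va >> PGDIR_SHIFT];

	if (!page)
		return NULL;

	return &page[(va >> PMD_SHIFT) & (PTRS_PER_PMD - 1)];
}

/* Copy the kernel PMD entry for ldt_base into the user table. */
static int toy_map_ldt_to_user(struct toy_pgd *kernel, struct toy_pgd *user,
			       uint32_t ldt_base)
{
	pmd_entry *k_pmd = toy_pmd_walk(kernel, ldt_base);
	pmd_entry *u_pmd = toy_pmd_walk(user, ldt_base);

	/* Assumption of this sketch: fail instead of dereferencing NULL. */
	if (!k_pmd || !u_pmd)
		return -1;

	*u_pmd = *k_pmd;
	return 0;
}
```

Only the one slot for the LDT range is synchronised; the remaining
entries of the user PMD page are left untouched.
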
 arch/x86/include/asm/mmu_context.h |  4 ---
 arch/x86/kernel/ldt.c              | 53 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index c931b88..af96cfb 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -70,11 +70,7 @@ struct ldt_struct {
 
 static inline void *ldt_slot_va(int slot)
 {
-#ifdef CONFIG_X86_64
 	return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot);
-#else
-	BUG();
-#endif
 }
 
 /*
diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c
index 8ab7df9..7787451 100644
--- a/arch/x86/kernel/ldt.c
+++ b/arch/x86/kernel/ldt.c
@@ -126,6 +126,57 @@ static void do_sanity_check(struct mm_struct *mm,
 	}
 }
 
+#ifdef CONFIG_X86_PAE
+
+static pmd_t *pgd_to_pmd_walk(pgd_t *pgd, unsigned long va)
+{
+	p4d_t *p4d;
+	pud_t *pud;
+
+	if (pgd->pgd == 0)
+		return NULL;
+
+	p4d = p4d_offset(pgd, va);
+	if (p4d_none(*p4d))
+		return NULL;
+
+	pud = pud_offset(p4d, va);
+	if (pud_none(*pud))
+		return NULL;
+
+	return pmd_offset(pud, va);
+}
+
+static void map_ldt_struct_to_user(struct mm_struct *mm)
+{
+	pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	pmd_t *k_pmd, *u_pmd;
+
+	k_pmd = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+	u_pmd = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+
+	if (static_cpu_has(X86_FEATURE_PTI) && !mm->context.ldt)
+		set_pmd(u_pmd, *k_pmd);
+}
+
+static void sanity_check_ldt_mapping(struct mm_struct *mm)
+{
+	pgd_t *k_pgd = pgd_offset(mm, LDT_BASE_ADDR);
+	pgd_t *u_pgd = kernel_to_user_pgdp(k_pgd);
+	bool had_kernel, had_user;
+	pmd_t *k_pmd, *u_pmd;
+
+	k_pmd      = pgd_to_pmd_walk(k_pgd, LDT_BASE_ADDR);
+	u_pmd      = pgd_to_pmd_walk(u_pgd, LDT_BASE_ADDR);
+	had_kernel = (k_pmd->pmd != 0);
+	had_user   = (u_pmd->pmd != 0);
+
+	do_sanity_check(mm, had_kernel, had_user);
+}
+
+#else /* !CONFIG_X86_PAE */
+
 static void map_ldt_struct_to_user(struct mm_struct *mm)
 {
 	pgd_t *pgd = pgd_offset(mm, LDT_BASE_ADDR);
@@ -143,6 +194,8 @@ static void sanity_check_ldt_mapping(struct mm_struct *mm)
 	do_sanity_check(mm, had_kernel, had_user);
 }
 
+#endif /* CONFIG_X86_PAE */
+
 /*
  * If PTI is enabled, this maps the LDT into the kernelmode and
  * usermode tables for the given mm.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* [PATCH 31/31] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09  9:25   ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09  9:25 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel, joro

From: Joerg Roedel <jroedel@suse.de>

Allow PTI to be compiled on x86_32.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 security/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/Kconfig b/security/Kconfig
index b0cb9a5..93d85fd 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -57,7 +57,7 @@ config SECURITY_NETWORK
 config PAGE_TABLE_ISOLATION
 	bool "Remove the kernel mapping in user mode"
 	default y
-	depends on X86_64 && !UML
+	depends on X86 && !UML
 	help
 	  This feature reduces the number of hardware side channels by
 	  ensuring that the majority of kernel addresses are not mapped
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09 12:11   ` Juergen Gross
  -1 siblings, 0 replies; 183+ messages in thread
From: Juergen Gross @ 2018-02-09 12:11 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, jroedel

On 09/02/18 10:25, Joerg Roedel wrote:
> Hi,
> 
> here is the second version of my PTI implementation for
> x86_32, based on tip/x86-pti-for-linus. It took a lot longer
> than I had hoped, but there have been a number of obstacles
> on the way. It also isn't the small patch-set anymore that v1
> was, but compared to it this one actually works :)
> 
> The biggest changes were necessary in the entry code, a lot
> of it is moving code around, but there are also significant
> changes to get all cases covered. This includes NMIs and
> exceptions on the kernel exit-path where we are already on
> the entry-stack. To make this work I decided to mostly split
> up the common kernel-exit path into a return-to-kernel,
> return-to-user and return-from-nmi part.
> 
> On the page-table side I had to do a lot of special cases
> for PAE because PAE paging is so, well, special. The biggest
> example here is the LDT mapping code, which needs to work on
> the PMD level instead of PGD when PAE is enabled.
> 
> During development I also experimented with unshared PMDs
> between the kernel and the user page-tables for PAE. It
> worked by allocating 8k PMDs and using the lower half for
> the kernel and the upper half for the user page-table. While
> this worked and allowed me to NX-protect the user-space
> address-range in the kernel page-table, it also required 5
> order-1 allocations in low-mem for each process. In my
> testing I got this to fail pretty quickly and trigger OOM,
> so I abandoned the approach for now.
> 
> Here is how I tested these patches:
> 
> 	* Booted on a real machine (4C/8T, 16GB RAM) and ran
> 	  an overnight load-test with 'perf top' running
> 	  (for the NMIs), the ldt_gdt selftest running in a
> 	  loop (for more stress on the entry/exit path) and
> 	  a -j16 kernel compile also running in a loop. The
> 	  box survived the test, which ran for more than 18
> 	  hours.
> 
> 	* Tested most x86 selftests in the kernel on the
> 	  real machine. This showed no regressions. I did
> 	  not run the mpx and protection-key tests, as the
> 	  machine does not support these features, and I
> 	  also skipped the check_initial_reg_state test, as
> 	  it made problems while compiling and it didn't
> 	  seem relevant enough to fix that for this
> 	  patch-set.
> 
> 	* Boot tested all valid combinations of [NO]HIGHMEM* vs.
> 	  VMSPLIT* vs. PAE in KVM. All booted fine.
> 
> 	* Did compile-tests with various configs (allyes,
> 	  allmod, defconfig, ..., basically what I usually
> 	  use to test the iommu-tree as well). All compiled
> 	  fine.
> 
> 	* Some basic compile, boot and runtime testing of
> 	  64 bit to make sure I didn't break anything there.
> 
> I did not explicitly test wine and dosemu, but since the
> vm86 and the ldt_gdt self-tests all passed fine I am
> confident that those will also still work.
> 
> XENPV is also untested from my side, but I added checks to
> not do the stack switches in the entry-code when XENPV is
> enabled, so hopefully it works. But someone should test it,
> of course.

That's unfortunate. A 32-bit XENPV kernel is vulnerable to Meltdown, too.
I'll check whether 32-bit XENPV still works, though.

Adding support for KPTI with Xen PV should probably be done later. :-)


Juergen

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 12:11   ` Juergen Gross
@ 2018-02-09 13:35     ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09 13:35 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, aliguori, daniel.gruss,
	hughd, keescook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	jroedel

Hi Juergen,

On Fri, Feb 09, 2018 at 01:11:42PM +0100, Juergen Gross wrote:
> On 09/02/18 10:25, Joerg Roedel wrote:
> > XENPV is also untested from my side, but I added checks to
> > not do the stack switches in the entry-code when XENPV is
> > enabled, so hopefully it works. But someone should test it,
> > of course.
> 
> That's unfortunate. 32 bit XENPV kernel is vulnerable to Meltdown, too.
> I'll have a look whether 32 bit XENPV is still working, though.
> 
> Adding support for KPTI with Xen PV should probably be done later. :-)

I'm not sure how much is missing to make it work there. One point is
certainly writing the right stack into tss.sp0 for xenpv on 32-bit; that
write is currently guarded by a check so it only happens for !xenpv.

But let's first test the code as-is on XENPV and see if it still boots
:)


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 13:35     ` Joerg Roedel
@ 2018-02-09 13:54       ` Andrew Cooper
  -1 siblings, 0 replies; 183+ messages in thread
From: Andrew Cooper @ 2018-02-09 13:54 UTC (permalink / raw)
  To: Joerg Roedel, Juergen Gross
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, aliguori, daniel.gruss,
	hughd, keescook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	jroedel

On 09/02/18 13:35, Joerg Roedel wrote:
> Hi Juergen,
>
> On Fri, Feb 09, 2018 at 01:11:42PM +0100, Juergen Gross wrote:
>> On 09/02/18 10:25, Joerg Roedel wrote:
>>> XENPV is also untested from my side, but I added checks to
>>> not do the stack switches in the entry-code when XENPV is
>>> enabled, so hopefully it works. But someone should test it,
>>> of course.
>> That's unfortunate. 32 bit XENPV kernel is vulnerable to Meltdown, too.
>> I'll have a look whether 32 bit XENPV is still working, though.
>>
>> Adding support for KPTI with Xen PV should probably be done later. :-)
> Not sure how much is missing to make it work there, one point is
> certainly to write the right stack into tss.sp0 for xenpv on 32bit. This
> write has a check to only happen for !xenpv.
>
> But let's first test the code as-is on XENPV and see if it still boots
> :)

IMO, the only sensible way to do KPTI + Xen PV is to have Xen do the
pagetable switch for 32bit like we already do for 64bit guests.  All
context switches already pass through the hypervisor, and it saves the
guest having to make the updates itself (which will trap for auditing)
or having to juggle the set_stack_base() semantics.

~Andrew

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09  9:25   ` Joerg Roedel
@ 2018-02-09 17:05     ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-09 17:05 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg KH, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, Joerg Roedel

On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
> +
> +       /* Copy over the stack-frame */
> +       cld
> +       rep movsb

Ugh. This is going to be horrendous. Maybe not noticeable on modern
CPUs, but the whole 32-bit code is kind of pointless on a modern CPU.

At least use "rep movsl". If the kernel stack isn't 4-byte aligned,
you have issues.

             Linus
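
For illustration only (this is not code from the patch): the suggestion
above amounts to copying the frame in 4-byte units rather than single
bytes. A minimal C sketch of that idea follows. The function name
copy_frame_words() and the alignment assert are assumptions made for
this example; the entry code itself would presumably do the equivalent
in assembly by loading the dword count into %ecx before "rep movsl".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a stack frame one 32-bit word at a time, the C analogue of
 * "rep movsl" with the word count in %ecx.
 * Assumption: the length is a multiple of 4 and both pointers are
 * 4-byte aligned, as a 32-bit kernel stack frame is expected to be.
 */
void copy_frame_words(uint32_t *dst, const uint32_t *src, size_t bytes)
{
	size_t words = bytes / sizeof(uint32_t);	/* byte count -> dword count */

	assert(bytes % sizeof(uint32_t) == 0);
	while (words--)
		*dst++ = *src++;
}
```

Compared with a byte-wise copy, the loop runs a quarter as many
iterations for the same frame, which is the point of preferring movsl
over movsb here.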

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 17:05     ` Linus Torvalds
@ 2018-02-09 17:17       ` Denys Vlasenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Denys Vlasenko @ 2018-02-09 17:17 UTC (permalink / raw)
  To: Linus Torvalds, Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	Joerg Roedel

On 02/09/2018 06:05 PM, Linus Torvalds wrote:
> On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
>> +
>> +       /* Copy over the stack-frame */
>> +       cld
>> +       rep movsb
> 
> Ugh. This is going to be horrendous. Maybe not noticeable on modern
> CPU's, but the whole 32-bit code is kind of pointless on a modern CPU.
> 
> At least use "rep movsl". If the kernel stack isn't 4-byte aligned,
> you have issues.

Indeed, "rep movs" has some setup overhead that makes it undesirable
for small sizes. In my testing, moving less than 128 bytes with "rep movs"
is a loss.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 23/31] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32
  2018-02-09  9:25   ` Joerg Roedel
@ 2018-02-09 17:42     ` Andy Lutomirski
  -1 siblings, 0 replies; 183+ messages in thread
From: Andy Lutomirski @ 2018-02-09 17:42 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML, LKML,
	Linux-MM, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, Joerg Roedel

On Fri, Feb 9, 2018 at 9:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> Move it out of the X86_64 specific processor defines so
> that it's visible for 32bit too.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>

Reviewed-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 17:05     ` Linus Torvalds
@ 2018-02-09 17:43       ` Andy Lutomirski
  -1 siblings, 0 replies; 183+ messages in thread
From: Andy Lutomirski @ 2018-02-09 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg KH, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, Joerg Roedel

On Fri, Feb 9, 2018 at 5:05 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
>> +
>> +       /* Copy over the stack-frame */
>> +       cld
>> +       rep movsb
>
> Ugh. This is going to be horrendous. Maybe not noticeable on modern
> CPU's, but the whole 32-bit code is kind of pointless on a modern CPU.
>
> At least use "rep movsl". If the kernel stack isn't 4-byte aligned,
> you have issues.
>

The 64-bit code mostly uses a bunch of push instructions for this.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09  9:25 ` Joerg Roedel
@ 2018-02-09 17:47   ` Andy Lutomirski
  -1 siblings, 0 replies; 183+ messages in thread
From: Andy Lutomirski @ 2018-02-09 17:47 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML, LKML,
	Linux-MM, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, Joerg Roedel

On Fri, Feb 9, 2018 at 9:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
> Hi,
>
> here is the second version of my PTI implementation for
> x86_32, based on tip/x86-pti-for-linus. It took a lot longer
> than I had hoped, but there have been a number of obstacles
> on the way. It also isn't the small patch-set anymore that v1
> was, but compared to it this one actually works :)

One thing worth noting is that performance of this whole series is
going to be abysmal due to the complete lack of 32-bit PCID.  Maybe
any kernel built with this option set that runs on a CPU that has the
PCID bit set in CPUID should print a big fat warning like "WARNING:
you are using 32-bit PTI on a 64-bit PCID-capable CPU.  Your
performance will increase dramatically if you switch to a 64-bit
kernel."
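
A rough sketch of the check suggested above, for illustration only.
PCID support is reported in CPUID leaf 1, ECX bit 17. The helper name
pti32_should_warn() and its parameters are hypothetical; a kernel
implementation would more likely test its existing CPU feature flags
than a raw CPUID value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 1, ECX bit 17 reports PCID support. */
#define CPUID_1_ECX_PCID (UINT32_C(1) << 17)

/*
 * Hypothetical helper: return true when the "switch to a 64-bit
 * kernel" warning should be printed, i.e. a 32-bit kernel with PTI
 * enabled is running on a CPU that advertises PCID.
 * cpuid1_ecx is the ECX value returned by CPUID leaf 1.
 */
bool pti32_should_warn(bool kernel_is_32bit, bool pti_enabled,
		       uint32_t cpuid1_ecx)
{
	return kernel_is_32bit && pti_enabled &&
	       (cpuid1_ecx & CPUID_1_ECX_PCID) != 0;
}
```

Keeping the decision in a pure function separates it from how the
CPUID value is obtained and how the message is printed.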

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 20/31] x86/mm/pae: Populate the user page-table with user pgd's
  2018-02-09  9:25   ` Joerg Roedel
@ 2018-02-09 17:48     ` Andy Lutomirski
  -1 siblings, 0 replies; 183+ messages in thread
From: Andy Lutomirski @ 2018-02-09 17:48 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML, LKML,
	Linux-MM, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, Joerg Roedel

On Fri, Feb 9, 2018 at 9:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> When we populate a PGD entry, make sure we populate it in
> the user page-table too.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/include/asm/pgtable-3level.h | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
> index bc4af54..1a0661b 100644
> --- a/arch/x86/include/asm/pgtable-3level.h
> +++ b/arch/x86/include/asm/pgtable-3level.h
> @@ -98,6 +98,9 @@ static inline void native_set_pmd(pmd_t *pmdp, pmd_t pmd)
>
>  static inline void native_set_pud(pud_t *pudp, pud_t pud)
>  {
> +#ifdef CONFIG_PAGE_TABLE_ISOLATION
> +       pud.p4d.pgd = pti_set_user_pgd(&pudp->p4d.pgd, pud.p4d.pgd);
> +#endif
>         set_64bit((unsigned long long *)(pudp), native_pud_val(pud));
>  }
>
> @@ -194,6 +197,10 @@ static inline pud_t native_pudp_get_and_clear(pud_t *pudp)
>  {
>         union split_pud res, *orig = (union split_pud *)pudp;
>
> +#ifdef CONFIG_PAGE_TABLE_ISOLATION
> +       pti_set_user_pgd(&pudp->p4d.pgd, __pgd(0));
> +#endif
> +
>         /* xchg acts as a barrier before setting of the high bits */
>         res.pud_low = xchg(&orig->pud_low, 0);
>         res.pud_high = orig->pud_high;

Can you rename the helper from pti_set_user_pgd() to
pti_set_user_top_level_entry() or similar?  The name was already a bit
absurd, but now it's just nuts.
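
For readers following the quoted hunk, here is a minimal sketch of the mirroring idea, written as a toy model and not as kernel code. Every name in it (toy_mm, toy_set_top_level_entry, and so on) is hypothetical, the four-entry top level and the 3G/1G split are assumptions, and so is the rule that only slots covering user space are mirrored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: a PAE-style top level with four 64-bit entries.
 * All names here are hypothetical and are not kernel APIs. */
#define TOY_TOP_LEVEL_ENTRIES 4
#define TOY_KERNEL_FIRST_INDEX 3 /* assumes a 3G/1G split */

struct toy_mm {
	uint64_t kernel_top[TOY_TOP_LEVEL_ENTRIES]; /* used in kernel mode */
	uint64_t user_top[TOY_TOP_LEVEL_ENTRIES];   /* used in user mode */
};

/* True when the slot at idx covers user-space addresses. */
static bool toy_maps_userspace(unsigned int idx)
{
	return idx < TOY_KERNEL_FIRST_INDEX;
}

/* Write val into the kernel table and mirror it into the user table
 * when the slot covers user space. Writing 0 clears both copies. */
static void toy_set_top_level_entry(struct toy_mm *mm, unsigned int idx,
				    uint64_t val)
{
	if (toy_maps_userspace(idx))
		mm->user_top[idx] = val;
	mm->kernel_top[idx] = val;
}
```

In this model the "set" and "get and clear" paths of the patch both reduce to calls of the same helper, the second one with a value of 0.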

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 17:05     ` Linus Torvalds
@ 2018-02-09 19:02       ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09 19:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg KH, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

On Fri, Feb 09, 2018 at 09:05:02AM -0800, Linus Torvalds wrote:
> On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > +
> > +       /* Copy over the stack-frame */
> > +       cld
> > +       rep movsb
> 
> Ugh. This is going to be horrendous. Maybe not noticeable on modern
> CPU's, but the whole 32-bit code is kind of pointless on a modern CPU.
> 
> At least use "rep movsl". If the kernel stack isn't 4-byte aligned,
> you have issues.

Okay, I used movsb because I remembered that being the recommendation
for the most efficient memcpy, and it saves me an instruction. But that
is probably only true on modern CPUs. I'll change it to use movsl here
and in the other patches.
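
As a rough illustration of the difference between the two string instructions, the following portable C sketch models "rep movsl" as a word-wise copy. The helper names are hypothetical, and reading the "saves me an instruction" remark as the byte-to-word count conversion is only one plausible interpretation, not something the thread states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable model of "rep movsl": copy count 32-bit words from low to
 * high addresses, as with the direction flag cleared by "cld".
 * Hypothetical helper, not kernel code. */
static void toy_rep_movsl(uint32_t *dst, const uint32_t *src, size_t count)
{
	while (count--)
		*dst++ = *src++;
}

/* Copy a stack frame whose size is given in bytes. The size has to be
 * a multiple of 4, and the shift converts it to a word count, a step
 * that a byte-wise "rep movsb" would not need. */
static void toy_copy_frame(uint32_t *dst, const uint32_t *src, size_t bytes)
{
	assert(bytes % 4 == 0);
	toy_rep_movsl(dst, src, bytes >> 2);
}
```

The alignment assertion mirrors the point made above: if the frame size were not a multiple of 4, the word-wise copy would not be usable as is.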

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 17:43       ` Andy Lutomirski
@ 2018-02-09 19:06         ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09 19:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, linux-mm, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Fri, Feb 09, 2018 at 05:43:55PM +0000, Andy Lutomirski wrote:
 
> The 64-bit code mostly uses a bunch of push instructions for this.

I had it implemented with tons of push instructions first, but that
doesn't work in cases where the stack switch needs to happen only after
everything is copied over.

So I switched to 'rep movsb', which in my eyes also makes the code
easier to understand.


	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 20/31] x86/mm/pae: Populate the user page-table with user pgd's
  2018-02-09 17:48     ` Andy Lutomirski
@ 2018-02-09 19:09       ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09 19:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Fri, Feb 09, 2018 at 05:48:36PM +0000, Andy Lutomirski wrote:
> Can you rename the helper from pti_set_user_pgd() to
> pti_set_user_top_level_entry() or similar?  The name was already a bit
> absurd, but now it's just nuts.

Sure, I can do that, but pti_set_user_top_level_entry() is a bit long.
I'll try to find a shorter name.


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 17:47   ` Andy Lutomirski
@ 2018-02-09 19:11     ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09 19:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

Hi Andy,

On Fri, Feb 09, 2018 at 05:47:43PM +0000, Andy Lutomirski wrote:
> One thing worth noting is that performance of this whole series is
> going to be abysmal due to the complete lack of 32-bit PCID.  Maybe
> any kernel built with this option set that runs on a CPU that has the
> PCID bit set in CPUID should print a big fat warning like "WARNING:
> you are using 32-bit PTI on a 64-bit PCID-capable CPU.  Your
> performance will increase dramatically if you switch to a 64-bit
> kernel."

Thanks for your review. I can add this warning, but I just hope that not
a lot of people will actually see it :)
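
A minimal sketch of how such a check could be structured, assuming the caller has already obtained the ECX value of CPUID leaf 1 (PCID support is reported in bit 17 there). The function names and the pti_32bit_enabled parameter are hypothetical; the message text is the one proposed in the quote above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID leaf 1 reports PCID support in ECX bit 17. */
#define TOY_CPUID_1_ECX_PCID (UINT32_C(1) << 17)

/* Hypothetical helper: warn only when 32-bit PTI is active and the
 * CPU advertises PCID, i.e. it could run a 64-bit kernel instead. */
static bool toy_should_warn_pcid(uint32_t cpuid_1_ecx, bool pti_32bit_enabled)
{
	return pti_32bit_enabled && (cpuid_1_ecx & TOY_CPUID_1_ECX_PCID) != 0;
}

/* Return the warning text proposed in the thread, or NULL if no
 * warning applies. */
static const char *toy_pcid_warning(uint32_t cpuid_1_ecx,
				    bool pti_32bit_enabled)
{
	if (!toy_should_warn_pcid(cpuid_1_ecx, pti_32bit_enabled))
		return NULL;
	return "WARNING: you are using 32-bit PTI on a 64-bit PCID-capable "
	       "CPU.  Your performance will increase dramatically if you "
	       "switch to a 64-bit kernel.";
}
```

Keeping the decision separate from the message text makes the condition easy to test without executing CPUID.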


	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 19:02       ` Joerg Roedel
@ 2018-02-09 19:17         ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-09 19:17 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg KH, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

On Fri, Feb 9, 2018 at 11:02 AM, Joerg Roedel <jroedel@suse.de> wrote:
>
> Okay, I used movsb because I remembered that being the recommendation
> for the most efficient memcpy, and it saves me an instruction. But that
> is probably only true on modern CPUs.

Yeah, it's only true on the very latest uarchs, and even there it's
not perfect for small copies.

On the older machines that are relevant for 32-bit code, it's often
tens of cycles just for the ucode overhead, I think, and "rep movsb"
actually does things literally a byte at a time.

              Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 19:17         ` Linus Torvalds
@ 2018-02-09 19:25           ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-09 19:25 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg KH, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

On Fri, Feb 09, 2018 at 11:17:35AM -0800, Linus Torvalds wrote:
> Yeah, it's only true on the very latest uarchs, and even there it's
> not perfect for small copies.
> 
> On the older machines that are relevant for 32-bit code, it's often
> tens of cycles just for the ucode overhead, I think, and "rep movsb"
> actually does things literally a byte at a time.

Ugh, okay. So I'll switch to movsl; that should at least perform on par
with the chain of 'pushl' instructions I had before.


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 19:02       ` Joerg Roedel
@ 2018-02-09 19:30         ` Denys Vlasenko
  -1 siblings, 0 replies; 183+ messages in thread
From: Denys Vlasenko @ 2018-02-09 19:30 UTC (permalink / raw)
  To: Joerg Roedel, Linus Torvalds
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

On 02/09/2018 08:02 PM, Joerg Roedel wrote:
> On Fri, Feb 09, 2018 at 09:05:02AM -0800, Linus Torvalds wrote:
>> On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
>>> +
>>> +       /* Copy over the stack-frame */
>>> +       cld
>>> +       rep movsb
>>
>> Ugh. This is going to be horrendous. Maybe not noticeable on modern
>> CPU's, but the whole 32-bit code is kind of pointless on a modern CPU.
>>
>> At least use "rep movsl". If the kernel stack isn't 4-byte aligned,
>> you have issues.
>
> Okay, I used movsb because I remembered that being the recommendation
> for the most efficient memcpy, and it saves me an instruction. But that
> is probably only true on modern CPUs.

It's fast (copies data with full-width loads and stores,
up to 64-byte wide on latest Intel CPUs), but this kicks in only for
largish blocks. In your case, you are copying less than 100 bytes.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 19:25           ` Joerg Roedel
@ 2018-02-09 19:48             ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-09 19:48 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg KH, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

On Fri, Feb 9, 2018 at 11:25 AM, Joerg Roedel <jroedel@suse.de> wrote:
>
> Ugh, okay. So I'll switch to movsl; that should at least perform on par
> with the chain of 'pushl' instructions I had before.

It should generally be roughly in the same ballpark.

I think the instruction scheduling ends up basically breaking around
microcoded instructions, which is why you'll get something like 12+n
cycles for "rep movs" on some uarchs, but at that point it's probably
mostly in the noise compared to all the other nasty PTI things.

You won't see any of the _real_ advantages (which are about moving
cachelines at a time), so with smallish copies you really only see the
downsides of "rep movs", which is mainly that instruction scheduling
hiccup with any microcode.

But with the iret and the cr3 movement, you aren't going to have a
nice well-behaved pipeline anyway.

                Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 17:47   ` Andy Lutomirski
  (?)
  (?)
@ 2018-02-09 21:09   ` Pavel Machek
  2018-02-09 21:11       ` Linus Torvalds
  2018-02-09 21:28       ` Andrew Cooper
  -1 siblings, 2 replies; 183+ messages in thread
From: Pavel Machek @ 2018-02-09 21:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Joerg Roedel

On Fri 2018-02-09 17:47:43, Andy Lutomirski wrote:
> On Fri, Feb 9, 2018 at 9:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > Hi,
> >
> > here is the second version of my PTI implementation for
> > x86_32, based on tip/x86-pti-for-linus. It took a lot longer
> > than I had hoped, but there have been a number of obstacles
> > on the way. It also isn't the small patch-set anymore that v1
> > was, but compared to it this one actually works :)
> 
> One thing worth noting is that performance of this whole series is
> going to be abysmal due to the complete lack of 32-bit PCID.  Maybe
> any kernel built with this option set that runs on a CPU that has the

What kind of slowdown are we talking about here?

> PCID bit set in CPUID should print a big fat warning like "WARNING:
> you are using 32-bit PTI on a 64-bit PCID-capable CPU.  Your
> performance will increase dramatically if you switch to a 64-bit
> kernel."

Hardware supports PCID even on 32-bit kernels, no?

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 21:09   ` Pavel Machek
@ 2018-02-09 21:11       ` Linus Torvalds
  2018-02-09 21:28       ` Andrew Cooper
  1 sibling, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-09 21:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Joerg Roedel

On Fri, Feb 9, 2018 at 1:09 PM, Pavel Machek <pavel@ucw.cz> wrote:
>
> Hardware supports PCID even on 32-bit kernels, no?

We're not adding support for it even if it were possible. No way.

Christ, even if you want to run 32-bit user code, you'd better run a
64-bit kernel. Backporting the PCID bits to something that no actual
real developer will ever use is crazy and unacceptable.

             Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 21:09   ` Pavel Machek
@ 2018-02-09 21:28       ` Andrew Cooper
  2018-02-09 21:28       ` Andrew Cooper
  1 sibling, 0 replies; 183+ messages in thread
From: Andrew Cooper @ 2018-02-09 21:28 UTC (permalink / raw)
  To: Pavel Machek, Andy Lutomirski
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Joerg Roedel

On 09/02/2018 21:09, Pavel Machek wrote:
> On Fri 2018-02-09 17:47:43, Andy Lutomirski wrote:
>> PCID bit set in CPUID should print a big fat warning like "WARNING:
>> you are using 32-bit PTI on a 64-bit PCID-capable CPU.  Your
>> performance will increase dramatically if you switch to a 64-bit
>> kernel."
> Hardware supports PCID even on 32-bit kernels, no?

Attempting to set CR4.PCIDE is disallowed outside of long mode.  It is
strictly a 64bit-only feature.

~Andrew
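
As a rough illustration of the check behind the warning Andy proposed (a
sketch only, not code from this series): PCID is advertised in
CPUID.01H:ECX bit 17 and long mode in CPUID.80000001H:EDX bit 29. The
helper name pti32_should_warn() and its raw-register parameters are
assumptions made for this example; how the register values are read is
deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CPUID feature bits. */
#define CPUID_1_ECX_PCID	(UINT32_C(1) << 17)	/* CPUID.01H:ECX[17] */
#define CPUID_80000001_EDX_LM	(UINT32_C(1) << 29)	/* CPUID.80000001H:EDX[29] */

/*
 * Hypothetical helper, not taken from the patch set: decide whether a
 * 32-bit PTI kernel should print the suggested warning.
 *
 * leaf1_ecx - ECX as returned by CPUID leaf 0x00000001
 * ext1_edx  - EDX as returned by CPUID leaf 0x80000001
 */
static bool pti32_should_warn(uint32_t leaf1_ecx, uint32_t ext1_edx)
{
	bool has_pcid = (leaf1_ecx & CPUID_1_ECX_PCID) != 0;
	bool has_long_mode = (ext1_edx & CPUID_80000001_EDX_LM) != 0;

	/* CR4.PCIDE can only be set in long mode, so require both bits. */
	return has_pcid && has_long_mode;
}
```

In the kernel itself this would presumably be expressed through the
existing CPU feature flags rather than raw CPUID values.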

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 19:11     ` Joerg Roedel
@ 2018-02-10  9:15       ` Adam Borowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Adam Borowski @ 2018-02-10  9:15 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Linus Torvalds,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

On Fri, Feb 09, 2018 at 08:11:12PM +0100, Joerg Roedel wrote:
> On Fri, Feb 09, 2018 at 05:47:43PM +0000, Andy Lutomirski wrote:
> > One thing worth noting is that performance of this whole series is
> > going to be abysmal due to the complete lack of 32-bit PCID.  Maybe
> > any kernel built with this option set that runs on a CPU that has the
> > PCID bit set in CPUID should print a big fat warning like "WARNING:
> > you are using 32-bit PTI on a 64-bit PCID-capable CPU.  Your
> > performance will increase dramatically if you switch to a 64-bit
> > kernel."
> 
> Thanks for your review. I can add this warning, but I just hope that not
> a lot of people will actually see it :)

Alas, we got some data:
https://popcon.debian.org/ says 20% of x86 users have i386 as their main ABI
(current; people with popcon installed).

Of those, 80% use 32-bit kernels: i686 881, x86_64 229, i586 14
(uname -m included in bug reports; data for 2016) -- and bug reporters tend
to have more clue than the average user.  There's no way so many folks still
use pre-2004 computers.

Thus, if you could include that big fat warning, distro developers would be
thankful.  Make it show fiery letters if you can.  Preferably, we'd want a
huge mallet reach out of the screen and bonk the user on the head, but with
that impossible, a scary message would help.

Let them use 32-bit userland, but if someone runs such a kernel on a modern
machine, some kind of verbal abuse is warranted.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀ 
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
⠈⠳⣄⠀⠀⠀⠀ 

^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 17:17       ` Denys Vlasenko
@ 2018-02-10 15:26         ` David Laight
  -1 siblings, 0 replies; 183+ messages in thread
From: David Laight @ 2018-02-10 15:26 UTC (permalink / raw)
  To: 'Denys Vlasenko', Linus Torvalds, Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, Joerg Roedel

From: Denys Vlasenko
> Sent: 09 February 2018 17:17
> On 02/09/2018 06:05 PM, Linus Torvalds wrote:
> > On Fri, Feb 9, 2018 at 1:25 AM, Joerg Roedel <joro@8bytes.org> wrote:
> >> +
> >> +       /* Copy over the stack-frame */
> >> +       cld
> >> +       rep movsb
> >
> > Ugh. This is going to be horrendous. Maybe not noticeable on modern
> > CPU's, but the whole 32-bit code is kind of pointless on a modern CPU.
> >
> > At least use "rep movsl". If the kernel stack isn't 4-byte aligned,
> > you have issues.

The alignment doesn't matter, 'rep movsl' will still work.

> Indeed, "rep movs" has some setup overhead that makes it undesirable
> for small sizes. In my testing, moving less than 128 bytes with "rep movs"
> is a loss.

It very much depends on the cpu.

Recent (Haswell?) Intel cpus have hardware support for optimising 'rep movsb'
for cached memory locations so that it is fast regardless of the alignments.
The setup cost is fairly small.

The previous generation had an optimisation for 'rep movsb' for less than
7 bytes, but for larger values the setup cost was significantly higher.
On these cpus you needed to use 'rep movsd' (64 bits, i.e. 'rep movsq', is
best) for the bulk of a copy.

Actually, instead of using 'rep movsb' to copy the odd few bytes, for
memcpy() you can copy the last (misaligned) 8 bytes first then use
'rep movsd' for the bulk of the copy.
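
The tail-first trick can be sketched in portable C. This is only an
illustration under stated assumptions: copy_tail_first() is a
hypothetical name, the buffers must not overlap, and n must be at least
8. The fixed-size memcpy() calls stand in for the word moves; in an
assembly version the loop would be a single 'rep movsl' (32-bit words,
'movsd' in Intel syntax) or 'rep movsq' (64-bit words).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy n >= 8 bytes between non-overlapping buffers.
 * The last 8 bytes are copied first with one unaligned access, then the
 * rest is moved in whole 8-byte words, so no byte-granular tail loop is
 * needed. A few tail bytes may be written twice, with the same value.
 */
static void copy_tail_first(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	uint64_t word;
	size_t i;

	assert(n >= 8);

	/* 1. The last (possibly misaligned) 8 bytes. */
	memcpy(&word, s + n - 8, sizeof(word));
	memcpy(d + n - 8, &word, sizeof(word));

	/* 2. The bulk: floor(n / 8) whole words from the start. */
	for (i = 0; i + 8 <= n; i += 8) {
		memcpy(&word, s + i, sizeof(word));
		memcpy(d + i, &word, sizeof(word));
	}
}
```

The bulk loop covers bytes [0, 8 * floor(n / 8)) and the tail copy
covers [n - 8, n), so together they always cover the whole buffer.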

On Netburst P4 the setup cost for any 'rep movs' was something like 45 clocks.
You really didn't want to use them for short copies.
(A C compiler from a well known OS supplier will 'optimise' any copy loop
into 'rep movsb' - not entirely the best of optimisations!)

I also managed to match the per-cycle cost of 'rep movsl' with a copy
loop on my Athlon-700 (but not the setup cost, on a P4 I might have
beaten the setup cost as well).

	David


^ permalink raw reply	[flat|nested] 183+ messages in thread

* RE: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-09 19:48             ` Linus Torvalds
@ 2018-02-10 15:41               ` David Laight
  -1 siblings, 0 replies; 183+ messages in thread
From: David Laight @ 2018-02-10 15:41 UTC (permalink / raw)
  To: 'Linus Torvalds', Joerg Roedel
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, linux-mm,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

From: Linus Torvalds
> Sent: 09 February 2018 19:49
...
> I think the instruction scheduling ends up basically breaking around
> microcoded instructions, which is why you'll get something like 12+n
> cycles for "rep movs" on some uarchs, but at that point it's probably
> mostly in the noise compared to all the other nasty PTI things.

Or 48+n on P4

> You won't see any of the _real_ advantages (which are about moving
> cachelines at a time), so with smallish copies you really only see the
> downsides of "rep movs", which is mainly that instruction scheduling
> hiccup with any microcode.

I thought that the hardware optimisation for 'rep movsb' on recent
Intel cpus generated word sized memory accesses even for misaligned
short transfers.
My thoughts were that they'd implemented a cache line sized barrel
shift register.
If that isn't true then using it for all memcpy() is probably stupid
(but not as stupid as doing all memcpy backwards!)

	David



^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 09/31] x86/entry/32: Leave the kernel via trampoline stack
  2018-02-10 15:26         ` David Laight
@ 2018-02-10 20:18           ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-10 20:18 UTC (permalink / raw)
  To: David Laight
  Cc: Denys Vlasenko, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, linux-mm, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, Joerg Roedel

On Sat, Feb 10, 2018 at 7:26 AM, David Laight <David.Laight@aculab.com> wrote:
>
> The alignment doesn't matter, 'rep movsl' will still work.

.. no it won't. It might not copy the last two bytes or whatever,
because the shift of the count will have ignored the low bits.

But since an unaligned stack pointer really shouldn't be an issue,
it's fine to not care.
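
To make the arithmetic concrete, here is a small sketch of what happens
when a byte count is turned into a 'rep movsl' count by shifting. The
function names are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* Bytes actually moved when a byte count is converted to a 'movsl' count. */
static size_t movsl_bytes_copied(size_t nbytes)
{
	size_t ndwords = nbytes >> 2;	/* the shift drops the low two bits */

	return ndwords * 4;
}

/* Bytes left uncopied: 0..3, non-zero only if nbytes is not a multiple of 4. */
static size_t movsl_bytes_missed(size_t nbytes)
{
	return nbytes & 3;
}
```

For an 18-byte frame the shift yields 4 dwords, so 16 bytes are copied
and the last 2 are missed; for any multiple of 4 nothing is lost, which
is why a 4-byte aligned stack frame makes this a non-issue.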

>> Indeed, "rep movs" has some setup overhead that makes it undesirable
>> for small sizes. In my testing, moving less than 128 bytes with "rep movs"
>> is a loss.
>
> It very much depends on the cpu.

No again.

It does NOT depend on the CPU, since the only CPU's that are relevant
to this patch are the ones that don't do 64-bit. If you run a 32-bit
Linux on a 64-bit CPU, performance simply isn't an issue. The problem
is between keyboard and chair, not in the kernel.

And absolutely *no* 32-bit-only CPU does "rep movs" really well.  Some
of them do it even worse than others (P4), but none of them do a great
job.

That said, none of them should do _such_ a shitty job that this will
be in the least noticeable compared to all the crazy %cr3 stuff.

                Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-10  9:15       ` Adam Borowski
@ 2018-02-10 20:22         ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-10 20:22 UTC (permalink / raw)
  To: Adam Borowski
  Cc: Joerg Roedel, Andy Lutomirski, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, X86 ML, LKML, Linux-MM,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

On Sat, Feb 10, 2018 at 1:15 AM, Adam Borowski <kilobyte@angband.pl> wrote:
>
> Alas, we got some data:
> https://popcon.debian.org/ says 20% of x86 users have i386 as their main ABI
> (current; people with popcon installed).

One of the issues I've seen is that people often simply move a disk
(or copy an installation) when upgrading machines.

Does Debian make it easy to upgrade to a 64-bit kernel if you have a
32-bit install? Because if not, then it's entirely possible that a lot
of people started out with a 32-bit install (maybe they even had a
64-bit kernel, but they started when the 32-bit install was the
default one), and never upgraded their kernel.

It really should be easy to _just_ upgrade the kernel. But if the
distro doesn't support it, most people won't do it.

                Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-10 20:22         ` Linus Torvalds
@ 2018-02-11 10:59           ` Adam Borowski
  -1 siblings, 0 replies; 183+ messages in thread
From: Adam Borowski @ 2018-02-11 10:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Andy Lutomirski, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, X86 ML, LKML, Linux-MM,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

On Sat, Feb 10, 2018 at 12:22:59PM -0800, Linus Torvalds wrote:
> On Sat, Feb 10, 2018 at 1:15 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> >
> > Alas, we got some data:
> > https://popcon.debian.org/ says 20% of x86 users have i386 as their main ABI
> > (current; people with popcon installed).
> 
> One of the issues I've seen is that people often simply move a disk
> (or copy an installation) when upgrading machines.

Less skilled users (i.e., most of them) had until recently a different hurdle:
the "32-bit" option was shown way too prominently, without an explanation.

> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
> 32-bit install?

Quite easy, yeah.  Crossgrading userspace is not for the faint of heart,
but changing just the kernel is fine.

> Because if not, then it's entirely possible that a lot
> of people started out with a 32-bit install (maybe they even had a
> 64-bit kernel, but they started when the 32-bit install was the
> default one), and never upgraded their kernel.
> 
> It really should be easy to _just_ upgrade the kernel. But if the
> distro doesn't support it, most people won't do it.

I just realized that, for people who use distro kernels, the right way is a
message during upgrade.  I.e., it's up to the packagers rather than you.

Having a dire-sounding printk, though, would still be nice.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢰⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
⠈⠳⣄⠀⠀⠀⠀

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 10:59           ` Adam Borowski
  (?)
@ 2018-02-11 17:40           ` Mark D Rustad
  2018-02-11 19:42               ` Andy Lutomirski
  2018-02-13  8:54               ` Greg KH
  -1 siblings, 2 replies; 183+ messages in thread
From: Mark D Rustad @ 2018-02-11 17:40 UTC (permalink / raw)
  To: Adam Borowski
  Cc: Linus Torvalds, Joerg Roedel, Andy Lutomirski, Joerg Roedel,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin, X86 ML, LKML,
	Linux-MM, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	Brian Gerst, David Laight, Denys Vlasenko, Eduardo Valentin,
	Greg KH, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

[-- Attachment #1: Type: text/plain, Size: 651 bytes --]

> On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> 
>> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
>> 32-bit install?
> 
> Quite easy, yeah.  Crossgrading userspace is not for the faint of the heart,
> but changing just the kernel is fine.

ISTR that iSCSI doesn't work when running a 64-bit kernel with a 32-bit userspace. I remember someone offered kernel patches to fix it, but I think they were rejected. I haven't messed with that stuff in many years, so perhaps the userspace side now has accommodation for it. It might be something to check on.

--
Mark Rustad, MRustad@gmail.com


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-09 19:11     ` Joerg Roedel
@ 2018-02-11 19:13       ` Ingo Molnar
  -1 siblings, 0 replies; 183+ messages in thread
From: Ingo Molnar @ 2018-02-11 19:13 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andy Lutomirski, Joerg Roedel, Thomas Gleixner, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek


* Joerg Roedel <jroedel@suse.de> wrote:

> Hi Andy,
> 
> On Fri, Feb 09, 2018 at 05:47:43PM +0000, Andy Lutomirski wrote:
> > One thing worth noting is that performance of this whole series is
> > going to be abysmal due to the complete lack of 32-bit PCID.  Maybe
> > any kernel built with this option set that runs on a CPU that has the
> > PCID bit set in CPUID should print a big fat warning like "WARNING:
> > you are using 32-bit PTI on a 64-bit PCID-capable CPU.  Your
> > performance will increase dramatically if you switch to a 64-bit
> > kernel."
> 
> Thanks for your review. I can add this warning, but I just hope that not
> a lot of people will actually see it :)

Could you please measure the PTI kernel vs. vanilla kernel?

Nothing complex, just perf's built-in scheduler and syscall benchmark should be 
enough:

   perf stat --null --sync --repeat 10 perf bench sched messaging -g 20

this should give us a pretty good worst-case overhead figure for process 
workloads.

Add '-t' to test threaded workloads as well:

   perf stat --null --sync --repeat 10 perf bench sched messaging -g 20 -t

The 10 runs used should be enough to reach good stability in practice:

 Performance counter stats for 'perf bench sched messaging -g 20 -t' (10 runs):

       0.380742219 seconds time elapsed                                          ( +-  0.73% )

Maybe do the same on the 64-bit kernel as well, so that we have 4 good data points 
on the same hardware?

Thanks,

	Ingo
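
The four data points asked for here (32-bit and 64-bit, each with and
without PTI) reduce to one overhead figure per kernel. A minimal sketch,
assuming the inputs are the mean "seconds time elapsed" values that
perf stat reports; pti_overhead_percent() is a hypothetical helper, not
part of perf.

```c
/*
 * Hypothetical helper: relative slowdown, in percent, of the PTI run
 * over the vanilla run on the same hardware and kernel bitness.
 *
 * vanilla_secs - mean elapsed time without PTI (must be > 0)
 * pti_secs     - mean elapsed time with PTI
 */
static double pti_overhead_percent(double vanilla_secs, double pti_secs)
{
	return (pti_secs - vanilla_secs) / vanilla_secs * 100.0;
}
```

A positive result means the PTI kernel is slower. Comparing the 32-bit
figure with the 64-bit one would show how much of the cost comes from
the lack of PCID.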

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 17:40           ` Mark D Rustad
@ 2018-02-11 19:42               ` Andy Lutomirski
  2018-02-13  8:54               ` Greg KH
  1 sibling, 0 replies; 183+ messages in thread
From: Andy Lutomirski @ 2018-02-11 19:42 UTC (permalink / raw)
  To: Mark D Rustad
  Cc: Adam Borowski, Linus Torvalds, Joerg Roedel, Andy Lutomirski,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek



On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com> wrote:

>> On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
>> 
>>> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
>>> 32-bit install?
>> 
>> Quite easy, yeah.  Crossgrading userspace is not for the faint of the heart,
>> but changing just the kernel is fine.
> 
> ISTR that iscsi doesn't work when running a 64-bit kernel with a 32-bit userspace. I remember someone offered kernel patches to fix it, but I think they were rejected. I haven't messed with that stuff in many years, so perhaps the userspace side now has accommodation for it. It might be something to check on.
> 

At the risk of suggesting heresy, should we consider removing x86_32 support at some point?

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 19:42               ` Andy Lutomirski
@ 2018-02-11 20:14                 ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-11 20:14 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mark D Rustad, Adam Borowski, Joerg Roedel, Andy Lutomirski,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Sun, Feb 11, 2018 at 11:42 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>
> At the risk of suggesting heresy, should we consider removing x86_32 support at some point?

Maybe in five or ten years. Not in the near future, I'm afraid.

We removed support for the 80386 back at the end of 2012. At that
point it just became too painful to support at all.

In contrast, we could continue supporting 32-bit for a long time. Even
if it might be in some degraded form (ie no PTI at all, or only a slow
one).

It's not really a lot of pain for people who don't care.

                Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 19:42               ` Andy Lutomirski
@ 2018-02-11 22:12                 ` James Bottomley
  -1 siblings, 0 replies; 183+ messages in thread
From: James Bottomley @ 2018-02-11 22:12 UTC (permalink / raw)
  To: Andy Lutomirski, Mark D Rustad
  Cc: Adam Borowski, Linus Torvalds, Joerg Roedel, Andy Lutomirski,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Sun, 2018-02-11 at 11:42 -0800, Andy Lutomirski wrote:
> 
> On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com> wrote:
> 
> > 
> > > 
> > > On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl>
> > > wrote:
> > > 
> > > > 
> > > > Does Debian make it easy to upgrade to a 64-bit kernel if you
> > > > have a
> > > > 32-bit install?
> > > 
> > > Quite easy, yeah.  Crossgrading userspace is not for the faint of
> > > the heart,
> > > but changing just the kernel is fine.
> > 
> > ISTR that iscsi doesn't work when running a 64-bit kernel with a
> > 32-bit userspace. I remember someone offered kernel patches to fix
> > it, but I think they were rejected. I haven't messed with that
> > stuff in many years, so perhaps the userspace side now has
> > accommodation for it. It might be something to check on.
> > 
> 
> At the risk of suggesting heresy, should we consider removing x86_32
> support at some point?

Hey, my cloud server is 32 bit:

bedivere:~# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Pentium(R) 4 CPU 2.80GHz
stepping	: 9
microcode	: 0x2e
cpu MHz		: 2813.464
[...]

I suspect a lot of people are in the same position: grandfathered in on
an old hosting plan, but not really willing to switch to a new 64 bit
box because the monthly cost about doubles and nothing it does is
currently anywhere up to (let alone over) the capacity of the single
686 processor.

The thing which is making me consider it is actually getting a TPM to
protect the private keys, but doubling the monthly cost is still a huge
disincentive.

James

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 22:12                 ` James Bottomley
@ 2018-02-11 22:30                   ` Andy Lutomirski
  -1 siblings, 0 replies; 183+ messages in thread
From: Andy Lutomirski @ 2018-02-11 22:30 UTC (permalink / raw)
  To: James Bottomley
  Cc: Mark D Rustad, Adam Borowski, Linus Torvalds, Joerg Roedel,
	Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek



> On Feb 11, 2018, at 2:12 PM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> 
>> On Sun, 2018-02-11 at 11:42 -0800, Andy Lutomirski wrote:
>> 
>>> On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com> wrote:
>>> 
>>> 
>>>> 
>>>> On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl>
>>>> wrote:
>>>> 
>>>>> 
>>>>> Does Debian make it easy to upgrade to a 64-bit kernel if you
>>>>> have a
>>>>> 32-bit install?
>>>> 
>>>> Quite easy, yeah.  Crossgrading userspace is not for the faint of
>>>> the heart,
>>>> but changing just the kernel is fine.
>>> 
>>> ISTR that iscsi doesn't work when running a 64-bit kernel with a
>>> 32-bit userspace. I remember someone offered kernel patches to fix
>>> it, but I think they were rejected. I haven't messed with that
>>> stuff in many years, so perhaps the userspace side now has
>>> accommodation for it. It might be something to check on.
>>> 
>> 
>> At the risk of suggesting heresy, should we consider removing x86_32
>> support at some point?
> 
> Hey, my cloud server is 32 bit:
> 
> bedivere:~# cat /proc/cpuinfo 
> processor    : 0
> vendor_id    : GenuineIntel
> cpu family    : 15
> model        : 2
> model name    : Intel(R) Pentium(R) 4 CPU 2.80GHz
> stepping    : 9
> microcode    : 0x2e
> cpu MHz        : 2813.464
> [...]
> 
> I suspect a lot of people are in the same position: grandfathered in on
> an old hosting plan, but not really willing to switch to a new 64 bit
> box because the monthly cost about doubles and nothing it does is
> currently anywhere up to (let alone over) the capacity of the single
> 686 processor.
> 
> The thing which is making me consider it is actually getting a TPM to
> protect the private keys, but doubling the monthly cost is still a huge
> disincentive.

Where are they hosting this?  Last I checked, replacing a P4 and motherboard with something new paid for itself in about a year in power savings.

> 
> James
> 

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 19:42               ` Andy Lutomirski
@ 2018-02-11 22:34                 ` Pavel Machek
  -1 siblings, 0 replies; 183+ messages in thread
From: Pavel Machek @ 2018-02-11 22:34 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mark D Rustad, Adam Borowski, Linus Torvalds, Joerg Roedel,
	Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long

On Sun 2018-02-11 11:42:47, Andy Lutomirski wrote:
> 
> 
> On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com> wrote:
> 
> >> On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> >> 
> >>> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
> >>> 32-bit install?
> >> 
> >> Quite easy, yeah.  Crossgrading userspace is not for the faint of the heart,
> >> but changing just the kernel is fine.
> > 
> > ISTR that iscsi doesn't work when running a 64-bit kernel with a 32-bit userspace. I remember someone offered kernel patches to fix it, but I think they were rejected. I haven't messed with that stuff in many years, so perhaps the userspace side now has accommodation for it. It might be something to check on.
> > 

It might make sense to retry those patches... if we want people to
move to 64bit kernels, it makes sense to provide complete emulation
for 32bit distros...

> At the risk of suggesting heresy, should we consider removing x86_32
> support at some point?

Heresy!

We still support 486s. I don't know if anyone really uses them, but
T40p is an acceptable machine to ssh from. X60 is still the machine I
prefer for traveling. Fast enough if you don't compile, and nicer
keyboard/screen than X220...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 19:42               ` Andy Lutomirski
@ 2018-02-11 23:25                 ` Alan Cox
  -1 siblings, 0 replies; 183+ messages in thread
From: Alan Cox @ 2018-02-11 23:25 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mark D Rustad, Adam Borowski, Linus Torvalds, Joerg Roedel,
	Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Sun, 11 Feb 2018 11:42:47 -0800
Andy Lutomirski <luto@amacapital.net> wrote:

> On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com> wrote:
> 
> >> On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> >>   
> >>> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
> >>> 32-bit install?  
> >> 
> >> Quite easy, yeah.  Crossgrading userspace is not for the faint of the heart,
> >> but changing just the kernel is fine.  
> > 
> > ISTR that iscsi doesn't work when running a 64-bit kernel with a 32-bit userspace. I remember someone offered kernel patches to fix it, but I think they were rejected. I haven't messed with that stuff in many years, so perhaps the userspace side now has accommodation for it. It might be something to check on.
> >   
> 
> At the risk of suggesting heresy, should we consider removing x86_32 support at some point?

Probably - although it's still relevant for Quark. I can't think of any
other in-production 32bit only processor at this point. Big core Intel
went 64bit 2006 or so, atoms mostly 2008 or so (with some stragglers that
are 32 or 64 bit depending if it's enabled) until 2011 (Cedartrail)

If someone stuck a fork in it just after the next long term kernel
release then by the time that expired it would probably be historical
interest only.

Does it not depend if there is someone crazy enough to maintain it
however - 68000 is doing fine 8)

Alan

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 22:30                   ` Andy Lutomirski
@ 2018-02-11 23:47                     ` James Bottomley
  -1 siblings, 0 replies; 183+ messages in thread
From: James Bottomley @ 2018-02-11 23:47 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mark D Rustad, Adam Borowski, Linus Torvalds, Joerg Roedel,
	Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Sun, 2018-02-11 at 14:30 -0800, Andy Lutomirski wrote:
> 
> > 
> > On Feb 11, 2018, at 2:12 PM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
> > 
> > > 
> > > On Sun, 2018-02-11 at 11:42 -0800, Andy Lutomirski wrote:
> > > 
> > > > 
> > > > On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com>
> > > > wrote:
> > > > 
> > > > 
> > > > > 
> > > > > 
> > > > > > On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> > > > > 
> > > > > > 
> > > > > > 
> > > > > > Does Debian make it easy to upgrade to a 64-bit kernel if
> > > > > > you
> > > > > > have a
> > > > > > 32-bit install?
> > > > > 
> > > > > Quite easy, yeah.  Crossgrading userspace is not for the
> > > > > faint of the heart, but changing just the kernel is fine.
> > > > 
> > > > ISTR that iscsi doesn't work when running a 64-bit kernel with
> > > > a 32-bit userspace. I remember someone offered kernel patches
> > > > to fix it, but I think they were rejected. I haven't messed
> > > > with that stuff in many years, so perhaps the userspace side
> > > > now has accommodation for it. It might be something to check
> > > > on.
> > > > 
> > > 
> > > At the risk of suggesting heresy, should we consider removing
> > > x86_32 support at some point?
> > 
> > Hey, my cloud server is 32 bit:
> > 
> > bedivere:~# cat /proc/cpuinfo 
> > processor    : 0
> > vendor_id    : GenuineIntel
> > cpu family    : 15
> > model        : 2
> > model name    : Intel(R) Pentium(R) 4 CPU 2.80GHz
> > stepping    : 9
> > microcode    : 0x2e
> > cpu MHz        : 2813.464
> > [...]
> > 
> > I suspect a lot of people are in the same position: grandfathered
> > in on an old hosting plan, but not really willing to switch to a
> > new 64 bit box because the monthly cost about doubles and nothing
> > it does is currently anywhere up to (let alone over) the capacity
> > of the single 686 processor.
> > 
> > The thing which is making me consider it is actually getting a TPM
> > to protect the private keys, but doubling the monthly cost is still
> > a huge disincentive.
> 
> Where are they hosting this?  Last I checked, replacing a P4 and
> motherboard with something new paid for itself in about a year in
> power savings.

It's a rented server not a co-lo cage.  I don't doubt it's costing the
hosting provider, but they're keeping my rates low.

James

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 23:25                 ` Alan Cox
@ 2018-02-12 10:16                   ` Anders Larsen
  -1 siblings, 0 replies; 183+ messages in thread
From: Anders Larsen @ 2018-02-12 10:16 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andy Lutomirski, Mark D Rustad, Adam Borowski, Linus Torvalds,
	Joerg Roedel, Andy Lutomirski, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, X86 ML, LKML, Linux-MM,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss

On Sunday, 11 February 2018 23:25 Alan Cox wrote:
> On Sun, 11 Feb 2018 11:42:47 -0800
> 
> Andy Lutomirski <luto@amacapital.net> wrote:
> > On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com> wrote:
> > >> On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> > >>> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
> > >>> 32-bit install?
> > >> 
> > >> Quite easy, yeah.  Crossgrading userspace is not for the faint of the
> > >> heart, but changing just the kernel is fine.
> > > 
> > > ISTR that iscsi doesn't work when running a 64-bit kernel with a 32-bit
> > > userspace. I remember someone offered kernel patches to fix it, but I
> > > think they were rejected. I haven't messed with that stuff in many
> > > years, so perhaps the userspace side now has accommodation for it. It
> > > might be something to check on.
> > > 
> > At the risk of suggesting heresy, should we consider removing x86_32
> > support at some point?
> 
> Probably - although it's still relevant for Quark. I can't think of any
> other in-production 32bit only processor at this point. Big core Intel
> went 64bit 2006 or so, atoms mostly 2008 or so (with some stragglers that
> are 32 or 64 bit depending if it's enabled) until 2011 (Cedartrail)

FWIW the Atom E6xx series (Tunnel Creek) is 32bit only and still in
production; my employer is using those beasts in several devices - and I'm
fighting an uphill battle to have those products ship with a recent kernel
(for certain values of recent)

> If someone stuck a fork in it just after the next long term kernel
> release then by the time that expired it would probably be historical
> interest only.
> 
> Does it not depend if there is someone crazy enough to maintain it
> however - 68000 is doing fine 8)

Cheers
Anders

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 19:13       ` Ingo Molnar
@ 2018-02-12 14:51         ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-12 14:51 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Joerg Roedel, Andy Lutomirski, Thomas Gleixner, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Linus Torvalds, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

Hi Ingo,

On Sun, Feb 11, 2018 at 08:13:12PM +0100, Ingo Molnar wrote:
> Could you please measure the PTI kernel vs. vanilla kernel?

Okay, did that, here is the data. The test machine is a Xeon E5-1620v2,
which is Ivy Bridge based (no PCIE) and has 4C/8T.

I ran the 2 tests you suggested:

	* Test-1: perf stat --null --sync --repeat 10 perf bench sched messaging -g 20

	* Test-2: perf stat --null --sync --repeat 10 perf bench sched messaging -g 20 -t

The tests ran on these kernels:

	* tip-32-pae: current top of tip/x86-pti-for-linus branch,
	              compiled as a 32 bit kernel with PAE
	              (commit b2ac58f90540e39324e7a29a7ad471407ae0bf48)

	* pti-32-pae: Same as above with my patches on-top, as on

	              git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux.git pti-x32-v2

	              compiled as a 32 bit kernel with PAE
	              (commit dbb0074f778b396a11e0c897fef9d0c4583e7ccb)

	* pti-off-64: current top of tip/x86-pti-for-linus branch,
	              compiled as a 64 bit kernel, booted with pti=off
	              (commit b2ac58f90540e39324e7a29a7ad471407ae0bf48)

	* pti-on-64: current top of tip/x86-pti-for-linus branch,
	             compiled as a 64 bit kernel, booted with pti=on
	             (commit b2ac58f90540e39324e7a29a7ad471407ae0bf48)

Results are:
	            | Test-1             | Test-2          
	------------+--------------------+-----------------
	tip-32-pae  | 0.28s (+-0.44%)    | 0.27s (+-2.15%) 
	------------+--------------------+-----------------
	pti-32-pae  | 0.44s (+-0.40%)    | 0.42s (+-0.48%) 
	------------+--------------------+-----------------
	pti-off-64  | 0.24s (+-0.40%)    | 0.25s (+-1.31%) 
	------------+--------------------+-----------------
	pti-on-64   | 0.30s (+-0.47%)    | 0.31s (+-0.95%)

On 32 bit with PTI enabled the test needs 157% (non-threaded) and
156% (threaded) of time compared to the non-PTI baseline.

On 64 bit these numbers are 125% (non-threaded) and 124% (threaded).

The pti-32-pae kernel still used 'rep movsb' in the entry code. I
replaced that with 'rep movsl' and measured again, but overhead is still
around 152%.

I also measured cycles with 'perf record' to see where the additional
time is spent. The report showed around 25% in entry_SYSENTER_32 for
the pti-32-pae kernel. The same report on the tip-32-pae kernel shows
around 2.5% for the same symbol.

The entry_SYSENTER_32 path does no stack-copy on entry (it only
push/pops 8 bytes for the cr3 switch), but one full pt_regs copy on
exit. The exit-path was easy to optimize, I got it to the point where it
only copied 8 bytes to the entry stack (flags and eax).  This way I got
the 'perf report' numbers for entry_SYSENTER_32 down to around 20%, but
the overall numbers for Test-1 and Test-2 are still at around 150% of
the baseline.

So it seems that most of the additional time is actually spent switching
the cr3s.

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 17:40           ` Mark D Rustad
@ 2018-02-13  8:54               ` Greg KH
  2018-02-13  8:54               ` Greg KH
  1 sibling, 0 replies; 183+ messages in thread
From: Greg KH @ 2018-02-13  8:54 UTC (permalink / raw)
  To: Mark D Rustad
  Cc: Adam Borowski, Linus Torvalds, Joerg Roedel, Andy Lutomirski,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

On Sun, Feb 11, 2018 at 09:40:41AM -0800, Mark D Rustad wrote:
> > On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> > 
> >> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
> >> 32-bit install?
> > 
> > Quite easy, yeah.  Crossgrading userspace is not for the faint of the heart,
> > but changing just the kernel is fine.
> 
> ISTR that iscsi doesn't work when running a 64-bit kernel with a
> 32-bit userspace. I remember someone offered kernel patches to fix it,
> but I think they were rejected. I haven't messed with that stuff in
> many years, so perhaps the userspace side now has accommodation for
> it. It might be something to check on.

IPSEC doesn't work with a 64bit kernel and 32bit userspace right now.

Back in 2015 someone started to work on that, and properly marked that
the kernel could not handle this with commit 74005991b78a ("xfrm: Do not
parse 32bits compiled xfrm netlink msg on 64bits host")

This is starting to be hit by some Android systems that are moving
(yeah, slowly) to 4.4 :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-13  8:54               ` Greg KH
@ 2018-02-13 17:25                 ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-02-13 17:25 UTC (permalink / raw)
  To: Greg KH
  Cc: Mark D Rustad, Adam Borowski, Joerg Roedel, Andy Lutomirski,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

On Tue, Feb 13, 2018 at 12:54 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Sun, Feb 11, 2018 at 09:40:41AM -0800, Mark D Rustad wrote:
>>
>> ISTR that iscsi doesn't work when running a 64-bit kernel with a
>> 32-bit userspace. I remember someone offered kernel patches to fix it,
>> but I think they were rejected. I haven't messed with that stuff in
>> many years, so perhaps the userspace side now has accommodation for
>> it. It might be something to check on.
>
> IPSEC doesn't work with a 64bit kernel and 32bit userspace right now.
>
> Back in 2015 someone started to work on that, and properly marked that
> the kernel could not handle this with commit 74005991b78a ("xfrm: Do not
> parse 32bits compiled xfrm netlink msg on 64bits host")
>
> This is starting to be hit by some Android systems that are moving
> (yeah, slowly) to 4.4 :(

Does anybody have test-programs/harnesses for this?

This is an area I care deeply about: I really want people to not have
any excuses for not upgrading to a 64-bit kernel.  It used to be
autofs (I actually added that whole "packetized pipe" model just to
make automount "just work" even though the stupid protocol used a
pipe to send command packets that had different layout on 32-bit and
64-bit).

See commit 64f371bc3107 ("autofs: make the autofsv5 packet file
descriptor use a packetized pipe") for some discussion of that
particular saga.

Some drm people used to run 32-bit kernels because of compat worries,
and that would have been a disaster. They got very good about not
having compat issues.

So let's try to fix the iscsi and ipsec issues. Not that anybody sane
should use that overly complex ipsec thing, and I think we should
strive to merge WireGuard and get people moved over to that instead,
but I haven't heard anything from davem about it since I last asked..
I have some hope that it's slowly happening.

             Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-13 17:25                 ` Linus Torvalds
@ 2018-02-14  8:54                   ` Greg KH
  -1 siblings, 0 replies; 183+ messages in thread
From: Greg KH @ 2018-02-14  8:54 UTC (permalink / raw)
  To: Lorenzo Colitti, Linus Torvalds
  Cc: Mark D Rustad, Adam Borowski, Joerg Roedel, Andy Lutomirski,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	X86 ML, LKML, Linux-MM, Dave Hansen, Josh Poimboeuf,
	Juergen Gross, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Will Deacon, Liguori, Anthony, Daniel Gruss,
	Hugh Dickins, Kees Cook, Andrea Arcangeli, Waiman Long,
	Pavel Machek

On Tue, Feb 13, 2018 at 09:25:34AM -0800, Linus Torvalds wrote:
> On Tue, Feb 13, 2018 at 12:54 AM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Sun, Feb 11, 2018 at 09:40:41AM -0800, Mark D Rustad wrote:
> >>
> >> ISTR that iscsi doesn't work when running a 64-bit kernel with a
> >> 32-bit userspace. I remember someone offered kernel patches to fix it,
> >> but I think they were rejected. I haven't messed with that stuff in
> >> many years, so perhaps the userspace side now has accommodation for
> >> it. It might be something to check on.
> >
> > IPSEC doesn't work with a 64bit kernel and 32bit userspace right now.
> >
> > Back in 2015 someone started to work on that, and properly marked that
> > the kernel could not handle this with commit 74005991b78a ("xfrm: Do not
> > parse 32bits compiled xfrm netlink msg on 64bits host")
> >
> > This is starting to be hit by some Android systems that are moving
> > (yeah, slowly) to 4.4 :(
> 
> Does anybody have test-programs/harnesses for this?

Lorenzo (now on the To: line), is the one that I think is looking into
this, and should have some sort of test for it.  Lorenzo?

> This is an area I care deeply about: I really want people to not have
> any excuses for not upgrading to a 64-bit kernel.  It used to be
> autofs (I actually added that whole "packetized pipe" model just to
> make automount "just work" even though the stupid protocol used a
> pipe to send command packets that had different layout on 32-bit and
> 64-bit).
> 
> See commit 64f371bc3107 ("autofs: make the autofsv5 packet file
> descriptor use a packetized pipe") for some discussion of that
> particular saga.
> 
> Some drm people used to run 32-bit kernels because of compat worries,
> and that would have been a disaster. They got very good about not
> having compat issues.
> 
> So let's try to fix the iscsi and ipsec issues. Not that anybody sane
> should use that overly complex ipsec thing, and I think we should
> strive to merge WireGuard and get people moved over to that instead,
> but I haven't heard anything from davem about it since I last asked..
> I have some hope that it's slowly happening.

WireGuard is still being worked on, it needs some crypto library changes
that should be coming soon, but will probably be 6 months out at the
earliest to get merged.  There are still lots of people using IPSEC :(

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/31] x86/mm/pae: Populate valid user PGD entries
  2018-02-09  9:25   ` Joerg Roedel
@ 2018-02-14  9:45     ` Juergen Gross
  -1 siblings, 0 replies; 183+ messages in thread
From: Juergen Gross @ 2018-02-14  9:45 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, jroedel

On 09/02/18 10:25, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
> 
> Generic page-table code populates all non-leaf entries with
> _KERNPG_TABLE bits set. This is fine for all paging modes
> except PAE.
> 
> In PAE mode only a subset of the bits is allowed to be set.
> Make sure we only set allowed bits by masking out the
> reserved bits.
> 
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/include/asm/pgtable_types.h | 26 ++++++++++++++++++++++++--
>  1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 3696398..5027470 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -50,6 +50,7 @@
>  #define _PAGE_GLOBAL	(_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL)
>  #define _PAGE_SOFTW1	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1)
>  #define _PAGE_SOFTW2	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW2)
> +#define _PAGE_SOFTW3	(_AT(pteval_t, 1) << _PAGE_BIT_SOFTW3)
>  #define _PAGE_PAT	(_AT(pteval_t, 1) << _PAGE_BIT_PAT)
>  #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE)
>  #define _PAGE_SPECIAL	(_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL)
> @@ -267,14 +268,35 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
>  
>  typedef struct { pgdval_t pgd; } pgd_t;
>  
> +#ifdef CONFIG_X86_PAE
> +
> +/*
> + * PHYSICAL_PAGE_MASK might be non-constant when SME is compiled in, so we can't
> + * use it here.
> + */
> +#define PGD_PAE_PHYS_MASK	(((1ULL << __PHYSICAL_MASK_SHIFT)-1) & PAGE_MASK)

I think PAGE_MASK is a 32 bit value here, so you are chopping off
the high physical address bits.

With that corrected the kernel is coming up as Xen PV guest.
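
For illustration, the truncation can be reproduced in a small userspace
sketch. It assumes a 32-bit kernel where PAGE_MASK is a 32-bit unsigned
long (modelled here with uint32_t) and uses 44 as a stand-in value for
__PHYSICAL_MASK_SHIFT. The PAGE_MASK_32/PAGE_MASK_64 macros and the
helper names are hypothetical, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT		12
#define PHYSICAL_MASK_SHIFT	44	/* assumption: stand-in for __PHYSICAL_MASK_SHIFT */

/* Models PAGE_MASK on a 32-bit kernel: only 32 bits wide. */
#define PAGE_MASK_32	((uint32_t)~((UINT32_C(1) << PAGE_SHIFT) - 1))
/* Same mask built from a 64-bit constant. */
#define PAGE_MASK_64	(~((UINT64_C(1) << PAGE_SHIFT) - 1))

/* The 32-bit mask is zero-extended before the AND, clearing bits 32 and up. */
uint64_t pgd_phys_mask_narrow(void)
{
	return ((UINT64_C(1) << PHYSICAL_MASK_SHIFT) - 1) & PAGE_MASK_32;
}

/* With a 64-bit wide page mask the high physical address bits survive. */
uint64_t pgd_phys_mask_wide(void)
{
	return ((UINT64_C(1) << PHYSICAL_MASK_SHIFT) - 1) & PAGE_MASK_64;
}
```

With these values the narrow variant yields 0xfffff000, so any physical
address above 4GB loses its upper bits, while the wide variant keeps
bits 12 to 43.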


Juergen

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 19/31] x86/mm/pae: Populate valid user PGD entries
  2018-02-14  9:45     ` Juergen Gross
@ 2018-02-14 10:00       ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-14 10:00 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Peter Zijlstra, Borislav Petkov, Jiri Kosina,
	Boris Ostrovsky, Brian Gerst, David Laight, Denys Vlasenko,
	Eduardo Valentin, Greg KH, Will Deacon, aliguori, daniel.gruss,
	hughd, keescook, Andrea Arcangeli, Waiman Long, Pavel Machek,
	jroedel

Hi Juergen,

On Wed, Feb 14, 2018 at 10:45:53AM +0100, Juergen Gross wrote:
> On 09/02/18 10:25, Joerg Roedel wrote:
> > +#ifdef CONFIG_X86_PAE
> > +
> > +/*
> > + * PHYSICAL_PAGE_MASK might be non-constant when SME is compiled in, so we can't
> > + * use it here.
> > + */
> > +#define PGD_PAE_PHYS_MASK	(((1ULL << __PHYSICAL_MASK_SHIFT)-1) & PAGE_MASK)
> 
> I think PAGE_MASK is a 32 bit value here, so you are chopping off
> the high physical address bits.
> 
> With that corrected the kernel is coming up as Xen PV guest.

Cool, thanks for testing these patches and debugging the breakage on
Xen-PV. I'll fix that in the next version.


Thanks again,

       Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-11 19:42               ` Andy Lutomirski
                                 ` (4 preceding siblings ...)
  (?)
@ 2018-02-14 10:43               ` Pavel Machek
  2018-02-15  3:44                 ` joe.korty
  -1 siblings, 1 reply; 183+ messages in thread
From: Pavel Machek @ 2018-02-14 10:43 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Mark D Rustad, Adam Borowski, Linus Torvalds, Joerg Roedel,
	Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long

[-- Attachment #1: Type: text/plain, Size: 1834 bytes --]

On Sun 2018-02-11 11:42:47, Andy Lutomirski wrote:
> 
> 
> On Feb 11, 2018, at 9:40 AM, Mark D Rustad <mrustad@gmail.com> wrote:
> 
> >> On Feb 11, 2018, at 2:59 AM, Adam Borowski <kilobyte@angband.pl> wrote:
> >> 
> >>> Does Debian make it easy to upgrade to a 64-bit kernel if you have a
> >>> 32-bit install?
> >> 
> >> Quite easy, yeah.  Crossgrading userspace is not for the faint of the heart,
> >> but changing just the kernel is fine.
> > 
> > ISTR that iscsi doesn't work when running a 64-bit kernel with a 32-bit userspace. I remember someone offered kernel patches to fix it, but I think they were rejected. I haven't messed with that stuff in many years, so perhaps the userspace side now has accommodation for it. It might be something to check on.
> > 
> 
> At the risk of suggesting heresy, should we consider removing x86_32 support at some point?

We have just found out that majority of 64-bit machines are broken in
rather fundamental ways (Spectre) and Intel does not even look
interested in fixing that (because it would make them look bad on
benchmarks).

Even when the Spectre bug is mitigated... this looks like can of worms
that can not be closed.

OTOH -- we do know that there are non-broken machines out there,
unfortunately they are mostly 32-bit :-). Removing support for
majority of working machines may not be good idea...

[And I really hope future CPUs get at least option to treat cache miss
as a side-effect -- thus disallowed during speculation -- and probably
option to turn off speculation altogether. AFAICT, it should "only"
result in 50% slowdown -- or that was result in some riscv
presentation.]

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-14 10:43               ` Pavel Machek
@ 2018-02-15  3:44                 ` joe.korty
  2018-02-16 14:34                   ` Pavel Machek
  0 siblings, 1 reply; 183+ messages in thread
From: joe.korty @ 2018-02-15  3:44 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Andy Lutomirski, int-list-linux-mm

On Wed, Feb 14, 2018 at 11:43:42AM +0100, Pavel Machek wrote:
> We have just found out that majority of 64-bit machines are broken in
> rather fundamental ways (Spectre) and Intel does not even look
> interested in fixing that (because it would make them look bad on
> benchmarks).
> 
> Even when the Spectre bug is mitigated... this looks like can of worms
> that can not be closed.
> 
> OTOH -- we do know that there are non-broken machines out there,
> unfortunately they are mostly 32-bit :-). Removing support for
> majority of working machines may not be good idea...
> 
> [And I really hope future CPUs get at least option to treat cache miss
> as a side-effect -- thus disallowed during speculation -- and probably
> option to turn off speculation altogether. AFAICT, it should "only"
> result in 50% slowdown -- or that was result in some riscv
> presentation.]

Or, future CPU designs introduce shadow caches and shadow
TLBs which only speculation loads and sees and which
become real only if and when the resultant speculative
calculations become real.

Joe

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-15  3:44                 ` joe.korty
@ 2018-02-16 14:34                   ` Pavel Machek
  0 siblings, 0 replies; 183+ messages in thread
From: Pavel Machek @ 2018-02-16 14:34 UTC (permalink / raw)
  To: joe.korty; +Cc: Andy Lutomirski, int-list-linux-mm

[-- Attachment #1: Type: text/plain, Size: 1465 bytes --]

On Wed 2018-02-14 22:44:44, joe.korty@concurrent-rt.com wrote:
> On Wed, Feb 14, 2018 at 11:43:42AM +0100, Pavel Machek wrote:
> > We have just found out that majority of 64-bit machines are broken in
> > rather fundamental ways (Spectre) and Intel does not even look
> > interested in fixing that (because it would make them look bad on
> > benchmarks).
> > 
> > Even when the Spectre bug is mitigated... this looks like can of worms
> > that can not be closed.
> > 
> > OTOH -- we do know that there are non-broken machines out there,
> > unfortunately they are mostly 32-bit :-). Removing support for
> > majority of working machines may not be good idea...
> > 
> > [And I really hope future CPUs get at least option to treat cache miss
> > as a side-effect -- thus disallowed during speculation -- and probably
> > option to turn off speculation altogether. AFAICT, it should "only"
> > result in 50% slowdown -- or that was result in some riscv
> > presentation.]
> 
> Or, future CPU designs introduce shadow caches and shadow
> TLBs which only speculation loads and sees and which
> become real only if and when the resultant speculative
> calculations become real.

Yes, that could help.

But there's still sidechannel in the RAM: it has row buffer
or something like that.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-14  8:54                   ` Greg KH
@ 2018-02-21 10:26                     ` Lorenzo Colitti
  -1 siblings, 0 replies; 183+ messages in thread
From: Lorenzo Colitti @ 2018-02-21 10:26 UTC (permalink / raw)
  To: Greg KH
  Cc: Linus Torvalds, Mark D Rustad, Adam Borowski, Joerg Roedel,
	Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, Florian Westphal

On Wed, Feb 14, 2018 at 5:54 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> > > IPSEC doesn't work with a 64bit kernel and 32bit userspace right now.
> > >
> > > Back in 2015 someone started to work on that, and properly marked that
> > > the kernel could not handle this with commit 74005991b78a ("xfrm: Do not
> > > parse 32bits compiled xfrm netlink msg on 64bits host")
> > >
> > > This is starting to be hit by some Android systems that are moving
> > > (yeah, slowly) to 4.4 :(
> >
> > Does anybody have test-programs/harnesses for this?
>
> Lorenzo (now on the To: line), is the one that I think is looking into
> this, and should have some sort of test for it.  Lorenzo?

Sorry for the late reply here. The issue is that the xfrm uapi structs
don't specify padding at the end, so they're a different size on
32-bit and 64-bit archs. This by itself would be fine, as the kernel
could just ignore the (lack of) padding. But some of these structs
contain others (e.g., xfrm_userspi_info contains xfrm_usersa_info),
and in that case the whole layout after the contained struct is
different.
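
A small userspace sketch of the nesting problem follows. The structs are
hypothetical stand-ins, not the real xfrm definitions, and the
aligned(4) typedef (a GCC/Clang attribute) is only a way to model an ABI
in which a 64-bit integer needs 4-byte alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types: the same 64-bit integer with two alignments. */
typedef uint64_t __attribute__((aligned(8))) u64_align8;
typedef uint64_t __attribute__((aligned(4))) u64_align4;

/* 12 bytes of data; the 8-byte aligned variant gets 4 bytes of tail padding. */
struct inner_a8 { u64_align8 bytes; uint32_t id; };
struct inner_a4 { u64_align4 bytes; uint32_t id; };

/* Embedding the inner struct moves every following member. */
struct outer_a8 { struct inner_a8 info; uint32_t min; uint32_t max; };
struct outer_a4 { struct inner_a4 info; uint32_t min; uint32_t max; };

size_t min_offset_a8(void) { return offsetof(struct outer_a8, min); }
size_t min_offset_a4(void) { return offsetof(struct outer_a4, min); }
```

Here min sits at offset 12 in the 4-byte aligned layout and at offset 16
in the 8-byte aligned one, so every field after the embedded struct is
shifted, not just the trailing padding.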

On another thread Florian pointed out that he once wrote a patch to
fix this - https://patchwork.ozlabs.org/patch/45855/ . Florian, think
you could revive that?

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-21 10:26                     ` Lorenzo Colitti
@ 2018-02-21 16:59                       ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2018-02-21 16:59 UTC (permalink / raw)
  To: Lorenzo Colitti
  Cc: Greg KH, Linus Torvalds, Mark D Rustad, Adam Borowski,
	Joerg Roedel, Andy Lutomirski, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, X86 ML, LKML, Linux-MM,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Will Deacon,
	Liguori, Anthony, Daniel Gruss, Hugh Dickins, Kees Cook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, Florian Westphal

On Wed, Feb 21, 2018 at 11:26 AM, Lorenzo Colitti <lorenzo@google.com> wrote:
> On Wed, Feb 14, 2018 at 5:54 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> > > IPSEC doesn't work with a 64bit kernel and 32bit userspace right now.
>> > >
>> > > Back in 2015 someone started to work on that, and properly marked that
>> > > the kernel could not handle this with commit 74005991b78a ("xfrm: Do not
>> > > parse 32bits compiled xfrm netlink msg on 64bits host")
>> > >
>> > > This is starting to be hit by some Android systems that are moving
>> > > (yeah, slowly) to 4.4 :(
>> >
>> > Does anybody have test-programs/harnesses for this?
>>
>> Lorenzo (now on the To: line), is the one that I think is looking into
>> this, and should have some sort of test for it.  Lorenzo?
>
> Sorry for the late reply here. The issue is that the xfrm uapi structs
> don't specify padding at the end, so they're a different size on
> 32-bit and 64-bit archs. This by itself would be fine, as the kernel
> could just ignore the (lack of) padding. But some of these structs
> contain others (e.g., xfrm_userspi_info contains xfrm_usersa_info),
> and in that case the whole layout after the contained struct is
> different.

So this is x86 specific then and it already works correctly on all
other architectures (especially arm64 Android), right?

      Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-21 16:59                       ` Arnd Bergmann
@ 2018-02-22 11:10                         ` Greg KH
  -1 siblings, 0 replies; 183+ messages in thread
From: Greg KH @ 2018-02-22 11:10 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Lorenzo Colitti, Linus Torvalds, Mark D Rustad, Adam Borowski,
	Joerg Roedel, Andy Lutomirski, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, X86 ML, LKML, Linux-MM,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Will Deacon,
	Liguori, Anthony, Daniel Gruss, Hugh Dickins, Kees Cook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, Florian Westphal

On Wed, Feb 21, 2018 at 05:59:34PM +0100, Arnd Bergmann wrote:
> On Wed, Feb 21, 2018 at 11:26 AM, Lorenzo Colitti <lorenzo@google.com> wrote:
> > On Wed, Feb 14, 2018 at 5:54 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> >> > > IPSEC doesn't work with a 64bit kernel and 32bit userspace right now.
> >> > >
> >> > > Back in 2015 someone started to work on that, and properly marked that
> >> > > the kernel could not handle this with commit 74005991b78a ("xfrm: Do not
> >> > > parse 32bits compiled xfrm netlink msg on 64bits host")
> >> > >
> >> > > This is starting to be hit by some Android systems that are moving
> >> > > (yeah, slowly) to 4.4 :(
> >> >
> >> > Does anybody have test-programs/harnesses for this?
> >>
> >> Lorenzo (now on the To: line), is the one that I think is looking into
> >> this, and should have some sort of test for it.  Lorenzo?
> >
> > Sorry for the late reply here. The issue is that the xfrm uapi structs
> > don't specify padding at the end, so they're a different size on
> > 32-bit and 64-bit archs. This by itself would be fine, as the kernel
> > could just ignore the (lack of) padding. But some of these structs
> > contain others (e.g., xfrm_userspi_info contains xfrm_usersa_info),
> > and in that case the whole layout after the contained struct is
> > different.
> 
> So this is x86 specific then and it already works correctly on all
> other architectures (especially arm64 Android), right?

Why is this an x86-specific issue?  I think people have noticed this
with ARM systems given that the original bug report I saw was for an
ARM Android-based system that had a 64bit kernel and 32bit userspace.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-22 11:10                         ` Greg KH
@ 2018-02-22 11:18                           ` Arnd Bergmann
  -1 siblings, 0 replies; 183+ messages in thread
From: Arnd Bergmann @ 2018-02-22 11:18 UTC (permalink / raw)
  To: Greg KH
  Cc: Lorenzo Colitti, Linus Torvalds, Mark D Rustad, Adam Borowski,
	Joerg Roedel, Andy Lutomirski, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, X86 ML, LKML, Linux-MM,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Will Deacon,
	Liguori, Anthony, Daniel Gruss, Hugh Dickins, Kees Cook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, Florian Westphal

On Thu, Feb 22, 2018 at 12:10 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Wed, Feb 21, 2018 at 05:59:34PM +0100, Arnd Bergmann wrote:
>> On Wed, Feb 21, 2018 at 11:26 AM, Lorenzo Colitti <lorenzo@google.com> wrote:
>> > On Wed, Feb 14, 2018 at 5:54 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
>> >> > > IPSEC doesn't work with a 64bit kernel and 32bit userspace right now.
>> >> > >
>> >> > > Back in 2015 someone started to work on that, and properly marked that
>> >> > > the kernel could not handle this with commit 74005991b78a ("xfrm: Do not
>> >> > > parse 32bits compiled xfrm netlink msg on 64bits host")
>> >> > >
>> >> > > This is starting to be hit by some Android systems that are moving
>> >> > > (yeah, slowly) to 4.4 :(
>> >> >
>> >> > Does anybody have test-programs/harnesses for this?
>> >>
>> >> Lorenzo (now on the To: line), is the one that I think is looking into
>> >> this, and should have some sort of test for it.  Lorenzo?
>> >
>> > Sorry for the late reply here. The issue is that the xfrm uapi structs
>> > don't specify padding at the end, so they're a different size on
>> > 32-bit and 64-bit archs. This by itself would be fine, as the kernel
>> > could just ignore the (lack of) padding. But some of these structs
>> > contain others (e.g., xfrm_userspi_info contains xfrm_usersa_info),
>> > and in that case the whole layout after the contained struct is
>> > different.
>>
>> So this is x86 specific then and it already works correctly on all
>> other architectures (especially arm64 Android), right?
>
> Why is this an x86-specific issue?  I think people have noticed this
> with ARM systems given that the original bug report I saw was for an
> ARM Android-based system that had a 64bit kernel and 32bit userspace.

The patch Lorenzo linked to is only for x86, it addresses the fact that
the padding at the end of xfrm_usersa_info differs between 32-bit x86
and all other architectures because of the x86 specific quirk that u64
variables have 32-bit alignment.

xfrm_usersa_info should have the exact same layout on arm32, arm64
and x86_64.
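
As a rough illustration of what bridging that difference involves, here
is a userspace sketch that reads a message laid out with 4-byte aligned
u64 and copies it field by field into the native layout. All struct and
function names are hypothetical, the structs are not the real xfrm ones,
and this is not necessarily how the linked patch does it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Models the 32-bit x86 quirk: u64 with only 4-byte alignment. */
typedef uint64_t __attribute__((aligned(4))) u64_align4;

/* Hypothetical layout as produced by 32-bit x86 userspace (20 bytes). */
struct info_x86_32 { u64_align4 bytes; uint32_t id; };
struct msg_x86_32  { struct info_x86_32 info; uint32_t min, max; };

/* Hypothetical native layout with naturally aligned u64. */
struct info_native { uint64_t bytes; uint32_t id; };
struct msg_native  { struct info_native info; uint32_t min, max; };

/* Copy member by member so each value lands at its native offset. */
void msg_from_x86_32(struct msg_native *dst, const void *buf)
{
	struct msg_x86_32 src;

	memcpy(&src, buf, sizeof(src));	/* buf may be unaligned */
	dst->info.bytes = src.info.bytes;
	dst->info.id    = src.info.id;
	dst->min        = src.min;
	dst->max        = src.max;
}
```

A plain memcpy() of the whole message would not work here, because min
and max live at different offsets in the two layouts.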

      Arnd

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-02-09  9:25   ` Joerg Roedel
@ 2018-02-27 19:18     ` Waiman Long
  -1 siblings, 0 replies; 183+ messages in thread
From: Waiman Long @ 2018-02-27 19:18 UTC (permalink / raw)
  To: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin
  Cc: x86, linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek, jroedel

On 02/09/2018 04:25 AM, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> Add unconditional cr3 switches between user and kernel cr3
> to all non-NMI entry and exit points.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  arch/x86/entry/entry_32.S | 59 ++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 9693485..b5ef003 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -328,6 +328,25 @@
>  #endif /* CONFIG_X86_ESPFIX32 */
>  .endm
>  
> +/* Unconditionally switch to user cr3 */
> +.macro SWITCH_TO_USER_CR3 scratch_reg:req
> +	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> +
> +	movl	%cr3, \scratch_reg
> +	orl	$PTI_SWITCH_MASK, \scratch_reg
> +	movl	\scratch_reg, %cr3
> +.Lend_\@:
> +.endm
> +
> +/* Unconditionally switch to kernel cr3 */
> +.macro SWITCH_TO_KERNEL_CR3 scratch_reg:req
> +	ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI
> +	movl	%cr3, \scratch_reg
> +	andl	$(~PTI_SWITCH_MASK), \scratch_reg
> +	movl	\scratch_reg, %cr3
> +.Lend_\@:
> +.endm
> +
>  
>  /*
>   * Called with pt_regs fully populated and kernel segments loaded,
> @@ -343,6 +362,8 @@
>  
>  	ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
>  
> +	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> +
>  	/* Are we on the entry stack? Bail out if not! */
>  	movl	PER_CPU_VAR(cpu_entry_area), %edi
>  	addl	$CPU_ENTRY_AREA_entry_stack, %edi
> @@ -637,6 +658,18 @@ ENTRY(xen_sysenter_target)
>   * 0(%ebp) arg6
>   */
>  ENTRY(entry_SYSENTER_32)
> +	/*
> +	 * On entry-stack with all userspace-regs live - save and
> +	 * restore eflags and %eax to use it as scratch-reg for the cr3
> +	 * switch.
> +	 */
> +	pushfl
> +	pushl	%eax
> +	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> +	popl	%eax
> +	popfl
> +
> +	/* Stack empty again, switch to task stack */
>  	movl	TSS_entry_stack(%esp), %esp
>  
>  .Lsysenter_past_esp:
> @@ -691,6 +724,10 @@ ENTRY(entry_SYSENTER_32)
>  	movl	PT_OLDESP(%esp), %ecx	/* pt_regs->sp */
>  1:	mov	PT_FS(%esp), %fs
>  	PTGS_TO_GS
> +
> +	/* Segments are restored - switch to user cr3 */
> +	SWITCH_TO_USER_CR3 scratch_reg=%eax
> +
>  	popl	%ebx			/* pt_regs->bx */
>  	addl	$2*4, %esp		/* skip pt_regs->cx and pt_regs->dx */
>  	popl	%esi			/* pt_regs->si */
> @@ -778,7 +815,23 @@ restore_all:
>  .Lrestore_all_notrace:
>  	CHECK_AND_APPLY_ESPFIX
>  .Lrestore_nocheck:
> -	RESTORE_REGS 4				# skip orig_eax/error_code
> +	/*
> +	 * First restore user segments. This can cause exceptions, so we
> +	 * run it with kernel cr3.
> +	 */
> +	RESTORE_SEGMENTS
> +
> +	/*
> +	 * Segments are restored - no more exceptions from here on except on
> +	 * iret, but that handled safely.
> +	 */
> +	SWITCH_TO_USER_CR3 scratch_reg=%eax
> +
> +	/* Restore rest */
> +	RESTORE_INT_REGS
> +
> +	/* Unwind stack to the iret frame */
> +	RESTORE_SKIP_SEGMENTS 4			# skip orig_eax/error_code
>  .Lirq_return:
>  	INTERRUPT_RETURN
>  
> @@ -1139,6 +1192,10 @@ ENTRY(debug)
>  
>  	SAVE_ALL
>  	ENCODE_FRAME_POINTER
> +
> +	/* Make sure we are running on kernel cr3 */
> +	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> +
>  	xorl	%edx, %edx			# error code 0
>  	movl	%esp, %eax			# pt_regs pointer
>  

The debug exception calls ret_from_exception on exit. If coming from
userspace, the C function prepare_exit_to_usermode() will be called.
With the PTI-32 code, this means that the function will be called on the
entry stack instead of the task stack. This can be problematic, as macros
like current won't work anymore.

-Longman
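
[Editorial sketch, not part of the original mail.] The two macros in the
quoted patch only set or clear one mask in cr3. A C model of that
arithmetic follows. The value of PTI_SWITCH_MASK is not shown in the
quoted hunk, so the single bit used here (bit 12, i.e. the user
page-table root one page above the kernel one) is an assumption made for
illustration, and all names carry a _SKETCH suffix to mark them as
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration only; not taken from the quoted patch. */
#define PAGE_SHIFT_SKETCH	12
#define PTI_SWITCH_MASK_SKETCH	(UINT32_C(1) << PAGE_SHIFT_SKETCH)

/* Models SWITCH_TO_USER_CR3: orl $PTI_SWITCH_MASK, \scratch_reg */
static inline uint32_t to_user_cr3(uint32_t cr3)
{
	return cr3 | PTI_SWITCH_MASK_SKETCH;
}

/* Models SWITCH_TO_KERNEL_CR3: andl $(~PTI_SWITCH_MASK), \scratch_reg */
static inline uint32_t to_kernel_cr3(uint32_t cr3)
{
	return cr3 & ~PTI_SWITCH_MASK_SKETCH;
}
```

Both operations are idempotent, which is why the macros can be applied
unconditionally without first checking which cr3 is currently loaded.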

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-02-27 19:18     ` Waiman Long
@ 2018-03-01 12:03       ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-03-01 12:03 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, jroedel

On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
> On 02/09/2018 04:25 AM, Joerg Roedel wrote:
> >  	SAVE_ALL
> >  	ENCODE_FRAME_POINTER
> > +
> > +	/* Make sure we are running on kernel cr3 */
> > +	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> > +
> >  	xorl	%edx, %edx			# error code 0
> >  	movl	%esp, %eax			# pt_regs pointer
> >  
> 
> The debug exception calls ret_from_exception on exit. If coming from
> userspace, the C function prepare_exit_to_usermode() will be called.
> With the PTI-32 code, it means that function will be called with the
> entry stack instead of the task stack. This can be problematic as macro
> like current won't work anymore.

This is not different from before, no? The debug handler can already be
entered on the entry stack before this patch-set.


	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-02-27 19:18     ` Waiman Long
@ 2018-03-01 13:34       ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-03-01 13:34 UTC (permalink / raw)
  To: Waiman Long
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, jroedel

On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
> > +	/* Make sure we are running on kernel cr3 */
> > +	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> > +
> >  	xorl	%edx, %edx			# error code 0
> >  	movl	%esp, %eax			# pt_regs pointer
> >  
> 
> The debug exception calls ret_from_exception on exit. If coming from
> userspace, the C function prepare_exit_to_usermode() will be called.
> With the PTI-32 code, it means that function will be called with the
> entry stack instead of the task stack. This can be problematic as macro
> like current won't work anymore.

Okay, I had another look at the debug handler. As I said before, it
already handles the from-entry-stack case, but with these patches it
gets more likely that we actually hit that path.

Also, with the special handling for from-kernel-with-entry-stack
situations we can simplify the debug handler and make it more robust
with the diff below. Thoughts?

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 8c149f5..844aff1 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1318,33 +1318,14 @@ ENTRY(debug)
 	ASM_CLAC
 	pushl	$-1				# mark this as an int
 
-	SAVE_ALL
+	SAVE_ALL switch_stacks=1
 	ENCODE_FRAME_POINTER
 
-	/* Make sure we are running on kernel cr3 */
-	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
-
 	xorl	%edx, %edx			# error code 0
 	movl	%esp, %eax			# pt_regs pointer
 
-	/* Are we currently on the SYSENTER stack? */
-	movl	PER_CPU_VAR(cpu_entry_area), %ecx
-	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
-	subl	%eax, %ecx	/* ecx = (end of entry_stack) - esp */
-	cmpl	$SIZEOF_entry_stack, %ecx
-	jb	.Ldebug_from_sysenter_stack
-
-	TRACE_IRQS_OFF
-	call	do_debug
-	jmp	ret_from_exception
-
-.Ldebug_from_sysenter_stack:
-	/* We're on the SYSENTER stack.  Switch off. */
-	movl	%esp, %ebx
-	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
 	TRACE_IRQS_OFF
 	call	do_debug
-	movl	%ebx, %esp
 	jmp	ret_from_exception
 END(debug)
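
[Editorial sketch, not part of the original mail.] The block removed by
the diff above tests whether %esp lies within the entry stack using a
single unsigned comparison. A C model of that check, with hypothetical
parameter names standing in for the per-CPU entry-stack address and
SIZEOF_entry_stack:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Mirrors: ecx = (end of entry_stack) - esp; cmpl $SIZEOF, %ecx; jb ...
 * The subtraction wraps for sp above the stack end, so one unsigned
 * compare covers both the lower and the upper bound.
 */
static bool on_entry_stack(uint32_t sp, uint32_t stack_base,
			   uint32_t stack_size)
{
	uint32_t stack_end = stack_base + stack_size;

	return (uint32_t)(stack_end - sp) < stack_size;
}
```

With a stack at 0x1000 of size 0x1000, sp values in (0x1000, 0x2000]
are reported as being on the stack; anything at or below 0x1000 or
above 0x2000 is not.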
 

^ permalink raw reply related	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-01 13:34       ` Joerg Roedel
@ 2018-03-01 14:33         ` Waiman Long
  -1 siblings, 0 replies; 183+ messages in thread
From: Waiman Long @ 2018-03-01 14:33 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86, linux-kernel,
	linux-mm, Linus Torvalds, Andy Lutomirski, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, aliguori,
	daniel.gruss, hughd, keescook, Andrea Arcangeli, Waiman Long,
	Pavel Machek, jroedel

On 03/01/2018 08:34 AM, Joerg Roedel wrote:
> On Tue, Feb 27, 2018 at 02:18:36PM -0500, Waiman Long wrote:
>>> +	/* Make sure we are running on kernel cr3 */
>>> +	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
>>> +
>>>  	xorl	%edx, %edx			# error code 0
>>>  	movl	%esp, %eax			# pt_regs pointer
>>>  
>> The debug exception calls ret_from_exception on exit. If coming from
>> userspace, the C function prepare_exit_to_usermode() will be called.
>> With the PTI-32 code, it means that function will be called with the
>> entry stack instead of the task stack. This can be problematic as macro
>> like current won't work anymore.
> Okay, I had another look at the debug handler. As I said before, it
> already handles the from-entry-stack case, but with these patches it
> gets more likely that we actually hit that path.
>
> Also, with the special handling for from-kernel-with-entry-stack
> situations we can simplify the debug handler and make it more robust
> with the diff below. Thoughts?
>
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index 8c149f5..844aff1 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -1318,33 +1318,14 @@ ENTRY(debug)
>  	ASM_CLAC
>  	pushl	$-1				# mark this as an int
>  
> -	SAVE_ALL
> +	SAVE_ALL switch_stacks=1
>  	ENCODE_FRAME_POINTER
>  
> -	/* Make sure we are running on kernel cr3 */
> -	SWITCH_TO_KERNEL_CR3 scratch_reg=%eax
> -
>  	xorl	%edx, %edx			# error code 0
>  	movl	%esp, %eax			# pt_regs pointer
>  
> -	/* Are we currently on the SYSENTER stack? */
> -	movl	PER_CPU_VAR(cpu_entry_area), %ecx
> -	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
> -	subl	%eax, %ecx	/* ecx = (end of entry_stack) - esp */
> -	cmpl	$SIZEOF_entry_stack, %ecx
> -	jb	.Ldebug_from_sysenter_stack
> -
> -	TRACE_IRQS_OFF
> -	call	do_debug
> -	jmp	ret_from_exception
> -
> -.Ldebug_from_sysenter_stack:
> -	/* We're on the SYSENTER stack.  Switch off. */
> -	movl	%esp, %ebx
> -	movl	PER_CPU_VAR(cpu_current_top_of_stack), %esp
>  	TRACE_IRQS_OFF
>  	call	do_debug
> -	movl	%ebx, %esp
>  	jmp	ret_from_exception
>  END(debug)
>  
>
>
I think that should fix the issue of debug exceptions from userspace.

One thing that I am not certain about is whether a debug exception can
happen even if the IF flag is cleared. If it can, the debug exception
should be handled like an NMI, as the state of CR3 can be indeterminate
if the exception happens in the entry/exit code.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-01 14:33         ` Waiman Long
@ 2018-03-01 16:50           ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-03-01 16:50 UTC (permalink / raw)
  To: Waiman Long
  Cc: Joerg Roedel, Thomas Gleixner, Ingo Molnar, H . Peter Anvin, x86,
	linux-kernel, linux-mm, Linus Torvalds, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, Brian Gerst,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, aliguori, daniel.gruss, hughd, keescook,
	Andrea Arcangeli, Waiman Long, Pavel Machek

On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote:
> On 03/01/2018 08:34 AM, Joerg Roedel wrote:
> I think that should fix the issue of debug exception from userspace.
> 
> One thing that I am not certain about is whether debug exception can
> happen even if the IF flag is cleared. If it can, debug exception should
> be handled like NMI as the state of the CR3 can be indeterminate if the
> exception happens in the entry/exit code.

I am actually not 100% sure where it can happen. From the code, it can
happen from anywhere, except when we are running on an espfix stack.

So I am not sure we need the same complex handling that NMIs need wrt.
switching the cr3s.


	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-01 16:50           ` Joerg Roedel
@ 2018-03-01 18:24             ` Brian Gerst
  -1 siblings, 0 replies; 183+ messages in thread
From: Brian Gerst @ 2018-03-01 18:24 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Waiman Long, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Linux-MM, Linus Torvalds,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

On Thu, Mar 1, 2018 at 11:50 AM, Joerg Roedel <jroedel@suse.de> wrote:
> On Thu, Mar 01, 2018 at 09:33:11AM -0500, Waiman Long wrote:
>> On 03/01/2018 08:34 AM, Joerg Roedel wrote:
>> I think that should fix the issue of debug exception from userspace.
>>
>> One thing that I am not certain about is whether debug exception can
>> happen even if the IF flag is cleared. If it can, debug exception should
>> be handled like NMI as the state of the CR3 can be indeterminate if the
>> exception happens in the entry/exit code.
>
> I am actually not 100% sure where it can happen, from the code it can
> happen from anywhere, except when we are running on an espfix stack.
>
> So I am not sure we need the same complex handling NMIs need wrt. to
> switching the cr3s.

The IF flag only affects external maskable interrupts, not traps or
faults.  You do need to check CR3 because SYSENTER does not clear TF
and will immediately cause a debug trap on kernel entry (with user
CR3) if set.  That is why the code existed before to check for the
entry stack for debug/NMI.
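
To make the IF/TF distinction concrete, here is a minimal C sketch; it is
not code from the series. The bit positions are the architectural EFLAGS
ones (TF is bit 8, IF is bit 9). The two helper names are hypothetical and
only model which flag gates which event.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EFLAGS bits. */
#define X86_EFLAGS_TF (UINT32_C(1) << 8) /* trap flag: single-step #DB */
#define X86_EFLAGS_IF (UINT32_C(1) << 9) /* interrupt flag: maskable IRQs */

/* Hypothetical helper: can an external maskable interrupt be delivered? */
static bool maskable_irq_deliverable(uint32_t eflags)
{
	return (eflags & X86_EFLAGS_IF) != 0;
}

/*
 * Hypothetical helper: will the next instruction raise a single-step #DB?
 * Only TF is consulted; IF plays no part in the decision.
 */
static bool single_step_db_pending(uint32_t eflags)
{
	return (eflags & X86_EFLAGS_TF) != 0;
}
```

With EFLAGS = 0x102 (TF set, IF clear) the first helper returns false and
the second returns true. That is the SYSENTER case: interrupts are off, yet
a debug trap can still fire.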

--
Brian Gerst

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-01 18:24             ` Brian Gerst
@ 2018-03-01 18:36               ` Dave Hansen
  -1 siblings, 0 replies; 183+ messages in thread
From: Dave Hansen @ 2018-03-01 18:36 UTC (permalink / raw)
  To: Brian Gerst, Joerg Roedel
  Cc: Waiman Long, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Linux-MM, Linus Torvalds,
	Andy Lutomirski, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On 03/01/2018 10:24 AM, Brian Gerst wrote:
> One thing that I am not certain about is whether debug exception can
> happen even if the IF flag is cleared. If it can, debug exception should
> be handled like NMI as the state of the CR3 can be indeterminate if the
> exception happens in the entry/exit code.

It can happen with IF cleared.  I ran into it during PTI development
more than once.  That's why the debug fault code uses paranoid_entry on
64-bit just like the NMI code.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-01 18:24             ` Brian Gerst
@ 2018-03-01 18:38               ` Linus Torvalds
  -1 siblings, 0 replies; 183+ messages in thread
From: Linus Torvalds @ 2018-03-01 18:38 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Joerg Roedel, Waiman Long, Joerg Roedel, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Linux-MM, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Thu, Mar 1, 2018 at 10:24 AM, Brian Gerst <brgerst@gmail.com> wrote:
>
> The IF flag only affects external maskable interrupts, not traps or
> faults.  You do need to check CR3 because SYSENTER does not clear TF
> and will immediately cause a debug trap on kernel entry (with user
> CR3) if set.  That is why the code existed before to check for the
> entry stack for debug/NMI.

Note that debug traps can happen regardless of TF. Think kgdb etc.
Arguably kgdb users get what they deserve, but still... I think root
can set kernel breakpoints too.

          Linus

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-01 18:24             ` Brian Gerst
@ 2018-03-02  9:07               ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-03-02  9:07 UTC (permalink / raw)
  To: Brian Gerst
  Cc: Joerg Roedel, Waiman Long, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Linux-MM, Linus Torvalds,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

On Thu, Mar 01, 2018 at 01:24:39PM -0500, Brian Gerst wrote:
> The IF flag only affects external maskable interrupts, not traps or
> faults.  You do need to check CR3 because SYSENTER does not clear TF
> and will immediately cause a debug trap on kernel entry (with user
> CR3) if set.  That is why the code existed before to check for the
> entry stack for debug/NMI.

Yeah, okay, thanks for the clarification. This also means the #DB
handler needs to leave with the same cr3 as it entered. I'll work that
into my patches.
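
One way to picture that requirement is the small C model below. It is only
a sketch under stated assumptions, not the assembly from the patches: cr3
is modelled as a plain struct field, PTI_USER_CR3_BIT (bit 12) is an
assumed marker for the user page-table, and the struct and function names
are hypothetical.

```c
#include <stdint.h>

/* Assumption: the user page-table differs from the kernel one in bit 12. */
#define PTI_USER_CR3_BIT (UINT32_C(1) << 12)

/* Hypothetical stand-in for the CPU register, for illustration only. */
struct cpu_model {
	uint32_t cr3;
};

/*
 * On #DB entry: remember the cr3 we arrived with, then make sure the
 * handler body runs on the kernel page-table.
 */
static uint32_t db_entry_switch_cr3(struct cpu_model *cpu)
{
	uint32_t entry_cr3 = cpu->cr3;

	if (entry_cr3 & PTI_USER_CR3_BIT)
		cpu->cr3 = entry_cr3 & ~PTI_USER_CR3_BIT;

	return entry_cr3;
}

/*
 * On #DB exit: put back whatever cr3 was active on entry, so interrupted
 * entry/exit code finds the page-table it expects.
 */
static void db_exit_restore_cr3(struct cpu_model *cpu, uint32_t entry_cr3)
{
	if (cpu->cr3 != entry_cr3)
		cpu->cr3 = entry_cr3;
}
```

The point of returning the saved value rather than deciding from the
interrupted privilege level is that a #DB can hit kernel-mode entry code
that is still running on the user cr3.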

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-01 18:38               ` Linus Torvalds
@ 2018-03-02  9:10                 ` Joerg Roedel
  -1 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-03-02  9:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Brian Gerst, Joerg Roedel, Waiman Long, Thomas Gleixner,
	Ingo Molnar, H . Peter Anvin, the arch/x86 maintainers,
	Linux Kernel Mailing List, Linux-MM, Andy Lutomirski,
	Dave Hansen, Josh Poimboeuf, Juergen Gross, Peter Zijlstra,
	Borislav Petkov, Jiri Kosina, Boris Ostrovsky, David Laight,
	Denys Vlasenko, Eduardo Valentin, Greg KH, Will Deacon, Liguori,
	Anthony, Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek

On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote:
> Note that debug traps can happen regardless of TF. Think kgdb etc.
> Arguably kgdb users get what they deserve, but still... I think root
> can set kernel breakpoints too.

But that seems to be broken right now, at least wrt. the espfix code,
where there is no handling for it in the #DB handler. Can userspace really
set arbitrary kernel breakpoints?


	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-13 17:25                 ` Linus Torvalds
  (?)
@ 2018-03-06 15:39                   ` Jason A. Donenfeld
  -1 siblings, 0 replies; 183+ messages in thread
From: Jason A. Donenfeld @ 2018-03-06 15:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Greg KH, Mark D Rustad, Adam Borowski, Joerg Roedel,
	Andy Lutomirski, Joerg Roedel, Thomas Gleixner, Ingo Molnar,
	H . Peter Anvin, X86 ML, LKML, Linux-MM, Dave Hansen,
	Josh Poimboeuf, Juergen Gross, Peter Zijlstra, Borislav Petkov,
	Jiri Kosina, Boris Ostrovsky, Brian Gerst, David Laight,
	Denys Vlasenko, Eduardo Valentin, Will Deacon, Liguori, Anthony,
	Daniel Gruss, Hugh Dickins, Kees Cook, Andrea Arcangeli,
	Waiman Long, Pavel Machek, WireGuard mailing list

Hi Linus,

On Tue, Feb 13, 2018 at 6:25 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> So let's try to fix the iscsi and ipsec issues. Not that anybody sane
> should use that overly complex ipsec thing, and I think we should
> strive to merge WireGuard and get people moved over to that instead,
> but I haven't heard anything from davem about it since I last asked..
> I have some hope that it's slowly happening.

Sorry for missing this comment earlier. We're really quite close to a
point where we can post our v1 patchset. I'm headed on the road this
week, but will be back on the first of April, and I expect that
sometime in the spring we should begin to have the cycles of posting
patches and receiving reviews and getting this into shape for shipping
upstream.

So, fear not, we're still rolling ahead!

Regards,
Jason

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points
  2018-03-02  9:10                 ` Joerg Roedel
  (?)
@ 2018-03-16 20:55                 ` Andy Lutomirski
  -1 siblings, 0 replies; 183+ messages in thread
From: Andy Lutomirski @ 2018-03-16 20:55 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Linus Torvalds, Brian Gerst, Joerg Roedel, Waiman Long,
	Thomas Gleixner, Ingo Molnar, H . Peter Anvin,
	the arch/x86 maintainers, Linux Kernel Mailing List, Linux-MM,
	Andy Lutomirski, Dave Hansen, Josh Poimboeuf, Juergen Gross,
	Peter Zijlstra, Borislav Petkov, Jiri Kosina, Boris Ostrovsky,
	David Laight, Denys Vlasenko, Eduardo Valentin, Greg KH,
	Will Deacon, Liguori, Anthony, Daniel Gruss, Hugh Dickins,
	Kees Cook, Andrea Arcangeli, Waiman Long, Pavel Machek

On Fri, Mar 2, 2018 at 9:10 AM, Joerg Roedel <joro@8bytes.org> wrote:
> On Thu, Mar 01, 2018 at 10:38:21AM -0800, Linus Torvalds wrote:
>> Note that debug traps can happen regardless of TF. Think kgdb etc.
>> Arguably kgdb users get what they deserve, but still... I think root
>> can set kernel breakpoints too.
>
> But that seems to be broken right now, at least wrt. the espfix code,
> where there is no handling for it in the #DB handler. Can userspace really
> set arbitrary kernel breakpoints?
>

As far as I'm concerned, I don't try to support kernel debugger users
setting arbitrary breakpoints in the kernel entry text.

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
  2018-02-20  3:45 David H. Gutteridge
@ 2018-02-20  8:40 ` Joerg Roedel
  0 siblings, 0 replies; 183+ messages in thread
From: Joerg Roedel @ 2018-02-20  8:40 UTC (permalink / raw)
  To: David H. Gutteridge; +Cc: linux-kernel

Hi David,

thanks a lot for your testing, much appreciated!

On Mon, Feb 19, 2018 at 10:45:56PM -0500, David H. Gutteridge wrote:
> (1) There is a regression when the QXL display driver is enabled; the
> VM hangs during boot. (QXL has been a source of similar trouble in the
> past.) I don't have an example trace for it at present.
> 
> (2) There is a regression when the VGA display driver is enabled; it
> intermittently (but reproducibly) faults, which makes it impossible
> to boot to the graphical login manager.

Can you please send me the kernel-config used and the qemu command-line
of the VM? I'll try to reproduce this here.

> [   25.439213] kernel BUG at arch/x86/mm/fault.c:268!
> [   25.439218] invalid opcode: 0000 [#1] SMP PTI
> [   25.439218] Modules linked in: bochs_drm(+) ttm snd_hda_core
> drm_kms_helper snd_hwdep drm snd_seq snd_seq_device snd_pcm snd_timer
> snd pcspkr virtio_balloon i2c_piix4 soundcore virtio_console 8139too
> crc32c_intel virtio_pci virtio_ring serio_raw virtio 8139cp ata_generic
> mii pata_acpi floppy qemu_fw_cfg
> [   25.439236] CPU: 1 PID: 545 Comm: systemd-udevd Tainted:
> G        W        4.15.0+ #1
> [   25.439237] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.10.2-1 04/01/2014
> [   25.439241] EIP: vmalloc_fault+0x1e7/0x210
> [   25.439242] EFLAGS: 00010083 CPU: 1
> [   25.439243] EAX: 02788000 EBX: d78ecdf8 ECX: 00000080 EDX: 00000000
> [   25.439244] ESI: 000fd000 EDI: fd0000f3 EBP: f3f639a0 ESP: f3f63988
> [   25.439245]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [   25.439246] CR0: 80050033 CR2: f7e00000 CR3: 33e3a000 CR4: 000006f0

This is a kernel-cr3, so that is at least not the issue.
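
For readers wondering how that can be read off the register dump, here is
a hedged sketch. It assumes the 8k page-table layout of this series puts
the kernel PGD in the lower 4k page and the user PGD in the upper one, so
that bit 12 of CR3 tells the two apart. The macro and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the user PGD sits 4k above the kernel PGD, i.e. bit 12 set. */
#define ASSUMED_USER_PGD_BIT (UINT32_C(1) << 12)

/* Hypothetical helper: does this CR3 value point at the user page-table? */
static bool cr3_is_user(uint32_t cr3)
{
	return (cr3 & ASSUMED_USER_PGD_BIT) != 0;
}
```

For the CR3 value in the trace, 0x33e3a000, bit 12 is clear, so under this
assumption cr3_is_user() returns false and the fault was taken on the
kernel page-table.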


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 183+ messages in thread

* Re: [PATCH 00/31 v2] PTI support for x86_32
@ 2018-02-20  3:45 David H. Gutteridge
  2018-02-20  8:40 ` Joerg Roedel
  0 siblings, 1 reply; 183+ messages in thread
From: David H. Gutteridge @ 2018-02-20  3:45 UTC (permalink / raw)
  To: joro, linux-kernel

On 09/02/18 10:25, Joerg Roedel wrote:
> Hi,
> 
> here is the second version of my PTI implementation for
> x86_32, based on tip/x86-pti-for-linus. It took a lot longer
> than I had hoped, but there have been a number of obstacles
> on the way. It also isn't the small patch-set anymore that v1
> was, but compared to it this one actually works :)
[...]
> I do not claim that I've found the best solution for every
> problem I encountered, so please review and give me feedback
> on what I should change or solve differently. Of course I am
> also interested in all bugs that may still be in there.
>
> Thanks a lot,
>
>       Joerg

Hello,

I thought I'd try my hand at testing this patch set from an end user's
perspective. I built a test kernel based on Fedora's
config-4.15.2-300.fc27.i686+PAE, the only change obviously being the
addition of CONFIG_PAGE_TABLE_ISOLATION=y. I ran this kernel in two
test environments: an LG X110 netbook, which has an Atom N270 with 1GB
of RAM (booted with "pti=on"), and a QEMU VM emulating a quad Core i7
Nehalem setup. (The X110 is the only i686 hardware I had on hand I
could practically use. I figured it'd be a suitable low-end hardware
spec to work with, even though no one realistically would force-enable
PTI on it.)

Testing consisted in part of using the laptop's Mate session to
remotely render the VM's Xfce session, while both had PTI enabled on
their test kernels. The VM also successfully ran the basic kernel
tests and the performance test suite that Fedora provides for
community testing (https://pagure.io/kernel-tests.git). (Well, it had
a hiccup with the performance testing, but that's apparently unrelated
to the PTI patches.) The laptop was also used for various everyday
activities, like web browsing using Firefox, and document editing
using LibreOffice Writer. (It obviously isn't a star at this, but it
was usable.)

General results:

X110: no issues whatsoever. (I was actually expecting more of a
noticeable performance hit in some aspects.)

QEMU VM: I encountered two similar issues:

(1) There is a regression when the QXL display driver is enabled; the
VM hangs during boot. (QXL has been a source of similar trouble in the
past.) I don't have an example trace for it at present.

(2) There is a regression when the VGA display driver is enabled; it
intermittently (but reproducibly) faults, which makes it impossible
to boot to the graphical login manager.

[   25.430588] [drm] Found bochs VGA, ID 0xb0c0.
[   25.431212] [drm] Framebuffer size 16384 kB @ 0xfd000000, mmio @ 0xfebd4000.
[   25.432586] [TTM] Zone  kernel: Available graphics memory: 426476 kiB
[   25.433099] [TTM] Zone highmem: Available graphics memory: 1549744 kiB
[   25.433890] [TTM] Initializing pool allocator
[   25.434863] [TTM] Initializing DMA pool allocator
[   25.436767] ------------[ cut here ]------------
[   25.439213] kernel BUG at arch/x86/mm/fault.c:268!
[   25.439218] invalid opcode: 0000 [#1] SMP PTI
[   25.439218] Modules linked in: bochs_drm(+) ttm snd_hda_core drm_kms_helper snd_hwdep drm snd_seq snd_seq_device snd_pcm snd_timer snd pcspkr virtio_balloon i2c_piix4 soundcore virtio_console 8139too crc32c_intel virtio_pci virtio_ring serio_raw virtio 8139cp ata_generic mii pata_acpi floppy qemu_fw_cfg
[   25.439236] CPU: 1 PID: 545 Comm: systemd-udevd Tainted: G        W        4.15.0+ #1
[   25.439237] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[   25.439241] EIP: vmalloc_fault+0x1e7/0x210
[   25.439242] EFLAGS: 00010083 CPU: 1
[   25.439243] EAX: 02788000 EBX: d78ecdf8 ECX: 00000080 EDX: 00000000
[   25.439244] ESI: 000fd000 EDI: fd0000f3 EBP: f3f639a0 ESP: f3f63988
[   25.439245]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   25.439246] CR0: 80050033 CR2: f7e00000 CR3: 33e3a000 CR4: 000006f0
[   25.439249] Call Trace:
[   25.439254]  ? kvm_async_pf_task_wake+0x100/0x100
[   25.439256]  __do_page_fault+0x34d/0x4d0
[   25.439257]  ? __ioremap_caller+0x23a/0x3d0
[   25.439259]  ? kvm_async_pf_task_wake+0x100/0x100
[   25.439260]  do_page_fault+0x27/0xe0
[   25.439261]  ? kvm_async_pf_task_wake+0x100/0x100
[   25.439263]  do_async_page_fault+0x55/0x80
[   25.439265]  common_exception+0xef/0xf6
[   25.439268] EIP: memset+0xb/0x20
[   25.439268] EFLAGS: 00010206 CPU: 1
[   25.439269] EAX: 00000000 EBX: f7e00000 ECX: 00300000 EDX: 00000000
[   25.439270] ESI: f3f63b5c EDI: f7e00000 EBP: f3f63a58 ESP: f3f63a50
[   25.439271]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   25.439278]  ttm_bo_move_memcpy+0x47c/0x4a0 [ttm]
[   25.439283]  ttm_bo_handle_move_mem+0x55a/0x580 [ttm]
[   25.439286]  ? ttm_bo_mem_space+0x394/0x460 [ttm]
[   25.439290]  ttm_bo_validate+0x116/0x130 [ttm]
[   25.439294]  bochs_bo_pin+0xa1/0x170 [bochs_drm]
[   25.439297]  bochsfb_create+0xce/0x310 [bochs_drm]
[   25.439308]  __drm_fb_helper_initial_config_and_unlock+0x1cc/0x460 [drm_kms_helper]
[   25.439314]  drm_fb_helper_initial_config+0x35/0x40 [drm_kms_helper]
[   25.439317]  bochs_fbdev_init+0x74/0x80 [bochs_drm]
[   25.439319]  bochs_load+0x7a/0x90 [bochs_drm]
[   25.439333]  drm_dev_register+0x133/0x1b0 [drm]
[   25.439343]  drm_get_pci_dev+0x86/0x160 [drm]
[   25.439346]  bochs_pci_probe+0xcb/0x110 [bochs_drm]
[   25.439348]  ? bochs_load+0x90/0x90 [bochs_drm]
[   25.439351]  pci_device_probe+0xc7/0x160
[   25.439353]  driver_probe_device+0x2dc/0x460
[   25.439354]  __driver_attach+0x99/0xe0
[   25.439356]  ? driver_probe_device+0x460/0x460
[   25.439357]  bus_for_each_dev+0x5a/0xa0
[   25.439359]  driver_attach+0x19/0x20
[   25.439360]  ? driver_probe_device+0x460/0x460
[   25.439362]  bus_add_driver+0x187/0x230
[   25.439363]  ? 0xf7afa000
[   25.439364]  driver_register+0x56/0xd0
[   25.439365]  ? 0xf7afa000
[   25.439367]  __pci_register_driver+0x3a/0x40
[   25.439369]  bochs_init+0x41/0x1000 [bochs_drm]
[   25.439371]  do_one_initcall+0x49/0x170
[   25.439373]  ? _cond_resched+0x2a/0x40
[   25.439375]  ? kmem_cache_alloc_trace+0x175/0x1e0
[   25.439376]  ? do_init_module+0x21/0x1dc
[   25.439378]  ? do_init_module+0x21/0x1dc
[   25.439379]  do_init_module+0x50/0x1dc
[   25.439380]  load_module+0x1fce/0x28e0
[   25.439383]  SyS_finit_module+0x8a/0xe0
[   25.439385]  do_fast_syscall_32+0x81/0x1b0
[   25.439518]  entry_SYSENTER_32+0x5f/0xb9
[   25.439519] EIP: 0xb7f21cf9
[   25.439520] EFLAGS: 00000246 CPU: 1
[   25.439521] EAX: ffffffda EBX: 00000011 ECX: b7afae75 EDX: 00000000
[   25.439522] ESI: 019d5740 EDI: 019acc00 EBP: 019ade00 ESP: bff9bb4c
[   25.439524]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
[   25.439525] Code: e2 00 f0 1f 00 81 ea 00 00 20 00 21 d0 8b 55 e8 89 c6 81 e2 ff 0f 00 00 0f ac d6 0c 8d 04 b6 c1 e0 03 39 45 ec 0f 84 27 ff ff ff <0f> 0b 8d b4 26 00 00 00 00 83 c4 0c ba ff ff ff ff 5b 89 d0 5e
[   25.439547] EIP: vmalloc_fault+0x1e7/0x210 SS:ESP: 0068:f3f63988
[   25.439548] ---[ end trace 18f2d11043a28ec0 ]---

The Virtio and VMVGA display drivers both worked consistently for me.

I haven't tested a non-PAE kernel, but can do so if it's of interest.
Or I can provide further details or testing if need be. If so, please
CC me. I hope this is of some use.

Regards,

Dave

^ permalink raw reply	[flat|nested] 183+ messages in thread

end of thread, other threads:[~2018-03-16 20:55 UTC | newest]

Thread overview: 183+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-09  9:25 [PATCH 00/31 v2] PTI support for x86_32 Joerg Roedel
2018-02-09  9:25 ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 01/31] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 02/31] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry_stack Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 03/31] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 04/31] x86/entry/32: Put ESPFIX code into a macro Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 05/31] x86/entry/32: Unshare NMI return path Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 06/31] x86/entry/32: Split off return-to-kernel path Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 07/31] x86/entry/32: Restore segments before int registers Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 08/31] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 09/31] x86/entry/32: Leave " Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09 17:05   ` Linus Torvalds
2018-02-09 17:05     ` Linus Torvalds
2018-02-09 17:17     ` Denys Vlasenko
2018-02-09 17:17       ` Denys Vlasenko
2018-02-10 15:26       ` David Laight
2018-02-10 15:26         ` David Laight
2018-02-10 20:18         ` Linus Torvalds
2018-02-10 20:18           ` Linus Torvalds
2018-02-09 17:43     ` Andy Lutomirski
2018-02-09 17:43       ` Andy Lutomirski
2018-02-09 19:06       ` Joerg Roedel
2018-02-09 19:06         ` Joerg Roedel
2018-02-09 19:02     ` Joerg Roedel
2018-02-09 19:02       ` Joerg Roedel
2018-02-09 19:17       ` Linus Torvalds
2018-02-09 19:17         ` Linus Torvalds
2018-02-09 19:25         ` Joerg Roedel
2018-02-09 19:25           ` Joerg Roedel
2018-02-09 19:48           ` Linus Torvalds
2018-02-09 19:48             ` Linus Torvalds
2018-02-10 15:41             ` David Laight
2018-02-10 15:41               ` David Laight
2018-02-09 19:30       ` Denys Vlasenko
2018-02-09 19:30         ` Denys Vlasenko
2018-02-09  9:25 ` [PATCH 10/31] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 11/31] x86/entry/32: Add PTI cr3 switches to NMI handler code Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 12/31] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-27 19:18   ` Waiman Long
2018-02-27 19:18     ` Waiman Long
2018-03-01 12:03     ` Joerg Roedel
2018-03-01 12:03       ` Joerg Roedel
2018-03-01 13:34     ` Joerg Roedel
2018-03-01 13:34       ` Joerg Roedel
2018-03-01 14:33       ` Waiman Long
2018-03-01 14:33         ` Waiman Long
2018-03-01 16:50         ` Joerg Roedel
2018-03-01 16:50           ` Joerg Roedel
2018-03-01 18:24           ` Brian Gerst
2018-03-01 18:24             ` Brian Gerst
2018-03-01 18:36             ` Dave Hansen
2018-03-01 18:36               ` Dave Hansen
2018-03-01 18:38             ` Linus Torvalds
2018-03-01 18:38               ` Linus Torvalds
2018-03-02  9:10               ` Joerg Roedel
2018-03-02  9:10                 ` Joerg Roedel
2018-03-16 20:55                 ` Andy Lutomirski
2018-03-02  9:07             ` Joerg Roedel
2018-03-02  9:07               ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 13/31] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 14/31] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 15/31] x86/pgtable/32: Allocate 8k page-tables " Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 16/31] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 17/31] x86/pgtable: Move pti_set_user_pgd() " Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 18/31] x86/pgtable: Move two more functions from pgtable_64.h " Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 19/31] x86/mm/pae: Populate valid user PGD entries Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-14  9:45   ` Juergen Gross
2018-02-14  9:45     ` Juergen Gross
2018-02-14 10:00     ` Joerg Roedel
2018-02-14 10:00       ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 20/31] x86/mm/pae: Populate the user page-table with user pgd's Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09 17:48   ` Andy Lutomirski
2018-02-09 17:48     ` Andy Lutomirski
2018-02-09 19:09     ` Joerg Roedel
2018-02-09 19:09       ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 21/31] x86/mm/legacy: " Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 22/31] x86/mm/pti: Add an overflow check to pti_clone_pmds() Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 23/31] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09 17:42   ` Andy Lutomirski
2018-02-09 17:42     ` Andy Lutomirski
2018-02-09  9:25 ` [PATCH 24/31] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level " Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 25/31] x86/mm/dump_pagetables: Define INIT_PGD Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 26/31] x86/pgtable/pae: Use separate kernel PMDs for user page-table Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 27/31] x86/ldt: Reserve address-space range on 32 bit for the LDT Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 28/31] x86/ldt: Define LDT_END_ADDR Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 29/31] x86/ldt: Split out sanity check in map_ldt_struct() Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 30/31] x86/ldt: Enable LDT user-mapping for PAE Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09  9:25 ` [PATCH 31/31] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Joerg Roedel
2018-02-09  9:25   ` Joerg Roedel
2018-02-09 12:11 ` [PATCH 00/31 v2] PTI support " Juergen Gross
2018-02-09 12:11   ` Juergen Gross
2018-02-09 13:35   ` Joerg Roedel
2018-02-09 13:35     ` Joerg Roedel
2018-02-09 13:54     ` Andrew Cooper
2018-02-09 13:54       ` Andrew Cooper
2018-02-09 17:47 ` Andy Lutomirski
2018-02-09 17:47   ` Andy Lutomirski
2018-02-09 19:11   ` Joerg Roedel
2018-02-09 19:11     ` Joerg Roedel
2018-02-10  9:15     ` Adam Borowski
2018-02-10  9:15       ` Adam Borowski
2018-02-10 20:22       ` Linus Torvalds
2018-02-10 20:22         ` Linus Torvalds
2018-02-11 10:59         ` Adam Borowski
2018-02-11 10:59           ` Adam Borowski
2018-02-11 17:40           ` Mark D Rustad
2018-02-11 19:42             ` Andy Lutomirski
2018-02-11 19:42               ` Andy Lutomirski
2018-02-11 20:14               ` Linus Torvalds
2018-02-11 20:14                 ` Linus Torvalds
2018-02-11 22:12               ` James Bottomley
2018-02-11 22:12                 ` James Bottomley
2018-02-11 22:30                 ` Andy Lutomirski
2018-02-11 22:30                   ` Andy Lutomirski
2018-02-11 23:47                   ` James Bottomley
2018-02-11 23:47                     ` James Bottomley
2018-02-11 22:34               ` Pavel Machek
2018-02-11 22:34                 ` Pavel Machek
2018-02-11 23:25               ` Alan Cox
2018-02-11 23:25                 ` Alan Cox
2018-02-12 10:16                 ` Anders Larsen
2018-02-12 10:16                   ` Anders Larsen
2018-02-14 10:43               ` Pavel Machek
2018-02-15  3:44                 ` joe.korty
2018-02-16 14:34                   ` Pavel Machek
2018-02-13  8:54             ` Greg KH
2018-02-13  8:54               ` Greg KH
2018-02-13 17:25               ` Linus Torvalds
2018-02-13 17:25                 ` Linus Torvalds
2018-02-14  8:54                 ` Greg KH
2018-02-14  8:54                   ` Greg KH
2018-02-21 10:26                   ` Lorenzo Colitti
2018-02-21 10:26                     ` Lorenzo Colitti
2018-02-21 16:59                     ` Arnd Bergmann
2018-02-21 16:59                       ` Arnd Bergmann
2018-02-22 11:10                       ` Greg KH
2018-02-22 11:10                         ` Greg KH
2018-02-22 11:18                         ` Arnd Bergmann
2018-02-22 11:18                           ` Arnd Bergmann
2018-03-06 15:39                 ` Jason A. Donenfeld
2018-03-06 15:39                   ` Jason A. Donenfeld
2018-03-06 15:39                   ` Jason A. Donenfeld
2018-02-11 19:13     ` Ingo Molnar
2018-02-11 19:13       ` Ingo Molnar
2018-02-12 14:51       ` Joerg Roedel
2018-02-12 14:51         ` Joerg Roedel
2018-02-09 21:09   ` Pavel Machek
2018-02-09 21:11     ` Linus Torvalds
2018-02-09 21:11       ` Linus Torvalds
2018-02-09 21:28     ` Andrew Cooper
2018-02-09 21:28       ` Andrew Cooper
2018-02-20  3:45 David H. Gutteridge
2018-02-20  8:40 ` Joerg Roedel
