linux-kernel.vger.kernel.org archive mirror
* [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
@ 2017-11-17 18:21 Will Deacon
  2017-11-17 18:21 ` [PATCH 01/18] arm64: mm: Use non-global mappings for kernel space Will Deacon
                   ` (21 more replies)
  0 siblings, 22 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

Hi all,

This patch series implements something along the lines of KAISER for arm64:

  https://gruss.cc/files/kaiser.pdf

although I wrote this from scratch because the paper has some funny
assumptions about how the architecture works. There is a patch series
in review for x86, which follows a similar approach:

  http://lkml.kernel.org/r/20171110193058.BECA7D88@viggo.jf.intel.com

and the topic was recently covered by LWN (currently subscriber-only):

  https://lwn.net/Articles/738975/

The basic idea is that transitions to and from userspace are proxied
through a trampoline page which is mapped into a separate page table and
can switch the full kernel mapping in and out on exception entry and
exit respectively. This is a valuable defence against various KASLR and
timing attacks, particularly as the trampoline page is at a fixed virtual
address and therefore the kernel text can be randomized independently.

The major consequences of the trampoline are:

  * We can no longer make use of global mappings for kernel space, so
    each task is assigned two ASIDs: one for user mappings and one for
    kernel mappings

  * Our ASID moves into TTBR1 so that we can quickly switch between the
    trampoline and kernel page tables

  * Switching TTBR0 always requires use of the zero page, so we can
    dispense with some of our errata workaround code

  * entry.S gets more complicated to read

The performance hit from this series isn't as bad as I feared: things
like cyclictest and kernbench seem to be largely unaffected, although
syscall micro-benchmarks appear to show that syscall overhead is roughly
doubled, and this has an impact on things like hackbench, which exhibits
a ~10% hit due to its heavy context-switching.
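
As a point of reference, numbers of this kind can be reproduced with
something as small as the sketch below: a tight getpid() loop timed from
userspace, run with and without the series. This is only an illustration
of the sort of syscall micro-benchmark meant above, not the harness used
for the figures quoted here.

  /*
   * Minimal syscall micro-benchmark sketch; build with something like
   * "gcc -O2 bench.c" and compare ns/syscall before and after.
   */
  #include <stdio.h>
  #include <time.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  int main(void)
  {
          const long iters = 1000000;
          struct timespec t0, t1;
          long i;

          clock_gettime(CLOCK_MONOTONIC, &t0);
          for (i = 0; i < iters; i++)
                  syscall(SYS_getpid);    /* force a real syscall each iteration */
          clock_gettime(CLOCK_MONOTONIC, &t1);

          printf("%.1f ns per syscall\n",
                 ((t1.tv_sec - t0.tv_sec) * 1e9 +
                  (t1.tv_nsec - t0.tv_nsec)) / iters);
          return 0;
  }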

The patches are based on 4.14 and are also pushed here:

  git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git kaiser

Feedback welcome,

Will

--->8

Will Deacon (18):
  arm64: mm: Use non-global mappings for kernel space
  arm64: mm: Temporarily disable ARM64_SW_TTBR0_PAN
  arm64: mm: Move ASID from TTBR0 to TTBR1
  arm64: mm: Remove pre_ttbr0_update_workaround for Falkor erratum
    #E1003
  arm64: mm: Rename post_ttbr0_update_workaround
  arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN
  arm64: mm: Allocate ASIDs in pairs
  arm64: mm: Add arm64_kernel_mapped_at_el0 helper using static key
  arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI
  arm64: entry: Add exception trampoline page for exceptions from EL0
  arm64: mm: Map entry trampoline into trampoline and kernel page tables
  arm64: entry: Explicitly pass exception level to kernel_ventry macro
  arm64: entry: Hook up entry trampoline to exception vectors
  arm64: erratum: Work around Falkor erratum #E1003 in trampoline code
  arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native
    tasks
  arm64: entry: Add fake CPU feature for mapping the kernel at EL0
  arm64: makefile: Ensure TEXT_OFFSET doesn't overlap with trampoline
  arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0

 arch/arm64/Kconfig                      |  30 +++--
 arch/arm64/Makefile                     |  18 ++-
 arch/arm64/include/asm/asm-uaccess.h    |  25 ++--
 arch/arm64/include/asm/assembler.h      |  27 +----
 arch/arm64/include/asm/cpucaps.h        |   3 +-
 arch/arm64/include/asm/kernel-pgtable.h |  12 +-
 arch/arm64/include/asm/memory.h         |   1 +
 arch/arm64/include/asm/mmu.h            |  12 ++
 arch/arm64/include/asm/mmu_context.h    |   9 +-
 arch/arm64/include/asm/pgtable-hwdef.h  |   1 +
 arch/arm64/include/asm/pgtable-prot.h   |  21 +++-
 arch/arm64/include/asm/pgtable.h        |   1 +
 arch/arm64/include/asm/proc-fns.h       |   6 -
 arch/arm64/include/asm/tlbflush.h       |  16 ++-
 arch/arm64/include/asm/uaccess.h        |  21 +++-
 arch/arm64/kernel/cpufeature.c          |  11 ++
 arch/arm64/kernel/entry.S               | 195 ++++++++++++++++++++++++++------
 arch/arm64/kernel/process.c             |  12 +-
 arch/arm64/kernel/vmlinux.lds.S         |  17 +++
 arch/arm64/lib/clear_user.S             |   2 +-
 arch/arm64/lib/copy_from_user.S         |   2 +-
 arch/arm64/lib/copy_in_user.S           |   2 +-
 arch/arm64/lib/copy_to_user.S           |   2 +-
 arch/arm64/mm/cache.S                   |   2 +-
 arch/arm64/mm/context.c                 |  36 +++---
 arch/arm64/mm/mmu.c                     |  60 ++++++++++
 arch/arm64/mm/proc.S                    |  12 +-
 arch/arm64/xen/hypercall.S              |   2 +-
 28 files changed, 418 insertions(+), 140 deletions(-)

-- 
2.1.4


* [PATCH 01/18] arm64: mm: Use non-global mappings for kernel space
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 02/18] arm64: mm: Temporarily disable ARM64_SW_TTBR0_PAN Will Deacon
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

In preparation for unmapping the kernel whilst running in userspace,
make the kernel mappings non-global so we can avoid expensive TLB
invalidation on kernel exit to userspace.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/kernel-pgtable.h | 12 ++++++++++--
 arch/arm64/include/asm/pgtable-prot.h   | 21 +++++++++++++++------
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index 7803343e5881..77a27af01371 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -78,8 +78,16 @@
 /*
  * Initial memory map attributes.
  */
-#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
-#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+#define _SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
+#define _SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+#define SWAPPER_PTE_FLAGS	(_SWAPPER_PTE_FLAGS | PTE_NG)
+#define SWAPPER_PMD_FLAGS	(_SWAPPER_PMD_FLAGS | PMD_SECT_NG)
+#else
+#define SWAPPER_PTE_FLAGS	_SWAPPER_PTE_FLAGS
+#define SWAPPER_PMD_FLAGS	_SWAPPER_PMD_FLAGS
+#endif
 
 #if ARM64_SWAPPER_USES_SECTION_MAPS
 #define SWAPPER_MM_MMUFLAGS	(PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 0a5635fb0ef9..22a926825e3f 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -34,8 +34,16 @@
 
 #include <asm/pgtable-types.h>
 
-#define PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
-#define PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+#define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
+#define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+#define PROT_DEFAULT		(_PROT_DEFAULT | PTE_NG)
+#define PROT_SECT_DEFAULT	(_PROT_SECT_DEFAULT | PMD_SECT_NG)
+#else
+#define PROT_DEFAULT		_PROT_DEFAULT
+#define PROT_SECT_DEFAULT	_PROT_SECT_DEFAULT
+#endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
 
 #define PROT_DEVICE_nGnRnE	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_DIRTY | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRnE))
 #define PROT_DEVICE_nGnRE	(PROT_DEFAULT | PTE_PXN | PTE_UXN | PTE_DIRTY | PTE_WRITE | PTE_ATTRINDX(MT_DEVICE_nGnRE))
@@ -48,6 +56,7 @@
 #define PROT_SECT_NORMAL_EXEC	(PROT_SECT_DEFAULT | PMD_SECT_UXN | PMD_ATTRINDX(MT_NORMAL))
 
 #define _PAGE_DEFAULT		(PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL))
+#define _HYP_PAGE_DEFAULT	(_PAGE_DEFAULT & ~PTE_NG)
 
 #define PAGE_KERNEL		__pgprot(_PAGE_DEFAULT | PTE_PXN | PTE_UXN | PTE_DIRTY | PTE_WRITE)
 #define PAGE_KERNEL_RO		__pgprot(_PAGE_DEFAULT | PTE_PXN | PTE_UXN | PTE_DIRTY | PTE_RDONLY)
@@ -55,15 +64,15 @@
 #define PAGE_KERNEL_EXEC	__pgprot(_PAGE_DEFAULT | PTE_UXN | PTE_DIRTY | PTE_WRITE)
 #define PAGE_KERNEL_EXEC_CONT	__pgprot(_PAGE_DEFAULT | PTE_UXN | PTE_DIRTY | PTE_WRITE | PTE_CONT)
 
-#define PAGE_HYP		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_HYP_XN)
-#define PAGE_HYP_EXEC		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY)
-#define PAGE_HYP_RO		__pgprot(_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
+#define PAGE_HYP		__pgprot(_HYP_PAGE_DEFAULT | PTE_HYP | PTE_HYP_XN)
+#define PAGE_HYP_EXEC		__pgprot(_HYP_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY)
+#define PAGE_HYP_RO		__pgprot(_HYP_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
 #define PAGE_HYP_DEVICE		__pgprot(PROT_DEVICE_nGnRE | PTE_HYP)
 
 #define PAGE_S2			__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_NORMAL) | PTE_S2_RDONLY)
 #define PAGE_S2_DEVICE		__pgprot(PROT_DEFAULT | PTE_S2_MEMATTR(MT_S2_DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_UXN)
 
-#define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_PXN | PTE_UXN)
+#define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
 #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
 #define PAGE_SHARED_EXEC	__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_NG | PTE_PXN | PTE_WRITE)
 #define PAGE_READONLY		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
-- 
2.1.4


* [PATCH 02/18] arm64: mm: Temporarily disable ARM64_SW_TTBR0_PAN
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
  2017-11-17 18:21 ` [PATCH 01/18] arm64: mm: Use non-global mappings for kernel space Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 03/18] arm64: mm: Move ASID from TTBR0 to TTBR1 Will Deacon
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

We're about to rework the way ASIDs are allocated, switch_mm is
implemented and low-level kernel entry/exit is handled, so keep the
ARM64_SW_TTBR0_PAN code out of the way whilst we do the heavy lifting.

It will be re-enabled in a subsequent patch.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..582bbd77c390 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -872,6 +872,7 @@ endif
 
 config ARM64_SW_TTBR0_PAN
 	bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
+	depends on BROKEN       # Temporary while switch_mm is reworked
 	help
 	  Enabling this option prevents the kernel from accessing
 	  user-space memory directly by pointing TTBR0_EL1 to a reserved
-- 
2.1.4


* [PATCH 03/18] arm64: mm: Move ASID from TTBR0 to TTBR1
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
  2017-11-17 18:21 ` [PATCH 01/18] arm64: mm: Use non-global mappings for kernel space Will Deacon
  2017-11-17 18:21 ` [PATCH 02/18] arm64: mm: Temporarily disable ARM64_SW_TTBR0_PAN Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 04/18] arm64: mm: Remove pre_ttbr0_update_workaround for Falkor erratum #E1003 Will Deacon
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

In preparation for mapping kernelspace and userspace with different
ASIDs, move the ASID to TTBR1 and update switch_mm to context-switch
TTBR0 via an invalid mapping (the zero page).
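
As a rough illustration of where the ASID now lives (a C sketch only;
the helper name is made up, and the real update is the
"bfi x2, x1, #48, #16" in cpu_do_switch_mm below): with TCR_EL1.A1 set,
the ASID is taken from bits [63:48] of TTBR1_EL1, so composing the new
TTBR1 value amounts to:

  #include <stdint.h>

  #define TTBR_ASID_SHIFT 48
  #define TTBR_ASID_MASK  (0xffffULL << TTBR_ASID_SHIFT)

  /* Illustrative only: place a 16-bit ASID into bits [63:48] of TTBR1. */
  static inline uint64_t ttbr1_with_asid(uint64_t ttbr1, uint64_t asid)
  {
          return (ttbr1 & ~TTBR_ASID_MASK) |
                 ((asid & 0xffff) << TTBR_ASID_SHIFT);
  }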

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h   | 7 +++++++
 arch/arm64/include/asm/pgtable-hwdef.h | 1 +
 arch/arm64/include/asm/proc-fns.h      | 6 ------
 arch/arm64/mm/proc.S                   | 9 ++++++---
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 3257895a9b5e..56723bcbfaaa 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -37,6 +37,13 @@
 #include <asm/sysreg.h>
 #include <asm/tlbflush.h>
 
+#define cpu_switch_mm(pgd,mm)				\
+do {							\
+	BUG_ON(pgd == swapper_pg_dir);			\
+	cpu_set_reserved_ttbr0();			\
+	cpu_do_switch_mm(virt_to_phys(pgd),mm);		\
+} while (0)
+
 static inline void contextidr_thread_switch(struct task_struct *next)
 {
 	if (!IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR))
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index eb0c2bd90de9..8df4cb6ac6f7 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -272,6 +272,7 @@
 #define TCR_TG1_4K		(UL(2) << TCR_TG1_SHIFT)
 #define TCR_TG1_64K		(UL(3) << TCR_TG1_SHIFT)
 
+#define TCR_A1			(UL(1) << 22)
 #define TCR_ASID16		(UL(1) << 36)
 #define TCR_TBI0		(UL(1) << 37)
 #define TCR_HA			(UL(1) << 39)
diff --git a/arch/arm64/include/asm/proc-fns.h b/arch/arm64/include/asm/proc-fns.h
index 14ad6e4e87d1..16cef2e8449e 100644
--- a/arch/arm64/include/asm/proc-fns.h
+++ b/arch/arm64/include/asm/proc-fns.h
@@ -35,12 +35,6 @@ extern u64 cpu_do_resume(phys_addr_t ptr, u64 idmap_ttbr);
 
 #include <asm/memory.h>
 
-#define cpu_switch_mm(pgd,mm)				\
-do {							\
-	BUG_ON(pgd == swapper_pg_dir);			\
-	cpu_do_switch_mm(virt_to_phys(pgd),mm);		\
-} while (0)
-
 #endif /* __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* __ASM_PROCFNS_H */
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 877d42fb0df6..0bd7550b7230 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -139,9 +139,12 @@ ENDPROC(cpu_do_resume)
  */
 ENTRY(cpu_do_switch_mm)
 	pre_ttbr0_update_workaround x0, x2, x3
+	mrs	x2, ttbr1_el1
 	mmid	x1, x1				// get mm->context.id
-	bfi	x0, x1, #48, #16		// set the ASID
-	msr	ttbr0_el1, x0			// set TTBR0
+	bfi	x2, x1, #48, #16		// set the ASID
+	msr	ttbr1_el1, x2			// in TTBR1 (since TCR.A1 is set)
+	isb
+	msr	ttbr0_el1, x0			// now update TTBR0
 	isb
 	post_ttbr0_update_workaround
 	ret
@@ -225,7 +228,7 @@ ENTRY(__cpu_setup)
 	 * both user and kernel.
 	 */
 	ldr	x10, =TCR_TxSZ(VA_BITS) | TCR_CACHE_FLAGS | TCR_SMP_FLAGS | \
-			TCR_TG_FLAGS | TCR_ASID16 | TCR_TBI0
+			TCR_TG_FLAGS | TCR_ASID16 | TCR_TBI0 | TCR_A1
 	tcr_set_idmap_t0sz	x10, x9
 
 	/*
-- 
2.1.4


* [PATCH 04/18] arm64: mm: Remove pre_ttbr0_update_workaround for Falkor erratum #E1003
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (2 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 03/18] arm64: mm: Move ASID from TTBR0 to TTBR1 Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 05/18] arm64: mm: Rename post_ttbr0_update_workaround Will Deacon
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

The pre_ttbr0_update_workaround hook is called prior to context-switching
TTBR0 because Falkor erratum E1003 can cause TLB allocation with the wrong
ASID if both the ASID and the base address of the TTBR are updated at
the same time.

With the ASID sitting safely in TTBR1, we no longer update things
atomically, so we can remove the pre_ttbr0_update_workaround macro as
it's no longer required. The erratum infrastructure and documentation
are left around for #E1003, as they will be required by the entry
trampoline code in a future patch.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/assembler.h   | 22 ----------------------
 arch/arm64/include/asm/mmu_context.h |  2 --
 arch/arm64/mm/context.c              | 11 -----------
 arch/arm64/mm/proc.S                 |  1 -
 4 files changed, 36 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index d58a6253c6ab..8359148858cb 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -25,7 +25,6 @@
 
 #include <asm/asm-offsets.h>
 #include <asm/cpufeature.h>
-#include <asm/mmu_context.h>
 #include <asm/page.h>
 #include <asm/pgtable-hwdef.h>
 #include <asm/ptrace.h>
@@ -465,27 +464,6 @@ alternative_endif
 	.endm
 
 /*
- * Errata workaround prior to TTBR0_EL1 update
- *
- * 	val:	TTBR value with new BADDR, preserved
- * 	tmp0:	temporary register, clobbered
- * 	tmp1:	other temporary register, clobbered
- */
-	.macro	pre_ttbr0_update_workaround, val, tmp0, tmp1
-#ifdef CONFIG_QCOM_FALKOR_ERRATUM_1003
-alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
-	mrs	\tmp0, ttbr0_el1
-	mov	\tmp1, #FALKOR_RESERVED_ASID
-	bfi	\tmp0, \tmp1, #48, #16		// reserved ASID + old BADDR
-	msr	ttbr0_el1, \tmp0
-	isb
-	bfi	\tmp0, \val, #0, #48		// reserved ASID + new BADDR
-	msr	ttbr0_el1, \tmp0
-	isb
-alternative_else_nop_endif
-#endif
-	.endm
-
 /*
  * Errata workaround post TTBR0_EL1 update.
  */
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 56723bcbfaaa..6d93bd545906 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -19,8 +19,6 @@
 #ifndef __ASM_MMU_CONTEXT_H
 #define __ASM_MMU_CONTEXT_H
 
-#define FALKOR_RESERVED_ASID	1
-
 #ifndef __ASSEMBLY__
 
 #include <linux/compiler.h>
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index ab9f5f0fb2c7..78816e476491 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -79,13 +79,6 @@ void verify_cpu_asid_bits(void)
 	}
 }
 
-static void set_reserved_asid_bits(void)
-{
-	if (IS_ENABLED(CONFIG_QCOM_FALKOR_ERRATUM_1003) &&
-	    cpus_have_const_cap(ARM64_WORKAROUND_QCOM_FALKOR_E1003))
-		__set_bit(FALKOR_RESERVED_ASID, asid_map);
-}
-
 static void flush_context(unsigned int cpu)
 {
 	int i;
@@ -94,8 +87,6 @@ static void flush_context(unsigned int cpu)
 	/* Update the list of reserved ASIDs and the ASID bitmap. */
 	bitmap_clear(asid_map, 0, NUM_USER_ASIDS);
 
-	set_reserved_asid_bits();
-
 	/*
 	 * Ensure the generation bump is observed before we xchg the
 	 * active_asids.
@@ -250,8 +241,6 @@ static int asids_init(void)
 		panic("Failed to allocate bitmap for %lu ASIDs\n",
 		      NUM_USER_ASIDS);
 
-	set_reserved_asid_bits();
-
 	pr_info("ASID allocator initialised with %lu entries\n", NUM_USER_ASIDS);
 	return 0;
 }
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 0bd7550b7230..1623150ed0a6 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -138,7 +138,6 @@ ENDPROC(cpu_do_resume)
  *	- pgd_phys - physical address of new TTB
  */
 ENTRY(cpu_do_switch_mm)
-	pre_ttbr0_update_workaround x0, x2, x3
 	mrs	x2, ttbr1_el1
 	mmid	x1, x1				// get mm->context.id
 	bfi	x2, x1, #48, #16		// set the ASID
-- 
2.1.4


* [PATCH 05/18] arm64: mm: Rename post_ttbr0_update_workaround
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (3 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 04/18] arm64: mm: Remove pre_ttbr0_update_workaround for Falkor erratum #E1003 Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 06/18] arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN Will Deacon
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

The post_ttbr0_update_workaround hook applies to any change to TTBRx_EL1.
Since we're using TTBR1 for the ASID, rename the hook to make it clearer
as to what it's doing.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/assembler.h | 5 ++---
 arch/arm64/kernel/entry.S          | 2 +-
 arch/arm64/mm/proc.S               | 2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 8359148858cb..622316a8c82b 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -464,10 +464,9 @@ alternative_endif
 	.endm
 
 /*
-/*
- * Errata workaround post TTBR0_EL1 update.
+ * Errata workaround post TTBRx_EL1 update.
  */
-	.macro	post_ttbr0_update_workaround
+	.macro	post_ttbr_update_workaround
 #ifdef CONFIG_CAVIUM_ERRATUM_27456
 alternative_if ARM64_WORKAROUND_CAVIUM_27456
 	ic	iallu
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e1c59d4008a8..dd5fa2c3d489 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -255,7 +255,7 @@ alternative_else_nop_endif
 	 * Cavium erratum 27456 (broadcast TLBI instructions may cause I-cache
 	 * corruption).
 	 */
-	post_ttbr0_update_workaround
+	post_ttbr_update_workaround
 	.endif
 1:
 	.if	\el != 0
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 1623150ed0a6..447537c1699d 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -145,7 +145,7 @@ ENTRY(cpu_do_switch_mm)
 	isb
 	msr	ttbr0_el1, x0			// now update TTBR0
 	isb
-	post_ttbr0_update_workaround
+	post_ttbr_update_workaround
 	ret
 ENDPROC(cpu_do_switch_mm)
 
-- 
2.1.4


* [PATCH 06/18] arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (4 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 05/18] arm64: mm: Rename post_ttbr0_update_workaround Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 07/18] arm64: mm: Allocate ASIDs in pairs Will Deacon
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

With the ASID now installed in TTBR1, we can re-enable ARM64_SW_TTBR0_PAN
by ensuring that we switch to a reserved ASID of zero when disabling
user access and restore the active user ASID on the uaccess enable path.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig                   |  1 -
 arch/arm64/include/asm/asm-uaccess.h | 25 +++++++++++++++++--------
 arch/arm64/include/asm/uaccess.h     | 21 +++++++++++++++++----
 arch/arm64/kernel/entry.S            |  4 ++--
 arch/arm64/lib/clear_user.S          |  2 +-
 arch/arm64/lib/copy_from_user.S      |  2 +-
 arch/arm64/lib/copy_in_user.S        |  2 +-
 arch/arm64/lib/copy_to_user.S        |  2 +-
 arch/arm64/mm/cache.S                |  2 +-
 arch/arm64/xen/hypercall.S           |  2 +-
 10 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 582bbd77c390..0df64a6a56d4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -872,7 +872,6 @@ endif
 
 config ARM64_SW_TTBR0_PAN
 	bool "Emulate Privileged Access Never using TTBR0_EL1 switching"
-	depends on BROKEN       # Temporary while switch_mm is reworked
 	help
 	  Enabling this option prevents the kernel from accessing
 	  user-space memory directly by pointing TTBR0_EL1 to a reserved
diff --git a/arch/arm64/include/asm/asm-uaccess.h b/arch/arm64/include/asm/asm-uaccess.h
index b3da6c886835..21b8cf304028 100644
--- a/arch/arm64/include/asm/asm-uaccess.h
+++ b/arch/arm64/include/asm/asm-uaccess.h
@@ -16,11 +16,20 @@
 	add	\tmp1, \tmp1, #SWAPPER_DIR_SIZE	// reserved_ttbr0 at the end of swapper_pg_dir
 	msr	ttbr0_el1, \tmp1		// set reserved TTBR0_EL1
 	isb
+	sub	\tmp1, \tmp1, #SWAPPER_DIR_SIZE
+	bic	\tmp1, \tmp1, #(0xffff << 48)
+	msr	ttbr1_el1, \tmp1		// set reserved ASID
+	isb
 	.endm
 
-	.macro	__uaccess_ttbr0_enable, tmp1
+	.macro	__uaccess_ttbr0_enable, tmp1, tmp2
 	get_thread_info \tmp1
 	ldr	\tmp1, [\tmp1, #TSK_TI_TTBR0]	// load saved TTBR0_EL1
+	mrs	\tmp2, ttbr1_el1
+	extr    \tmp2, \tmp2, \tmp1, #48
+	ror     \tmp2, \tmp2, #16
+	msr	ttbr1_el1, \tmp2		// set the active ASID
+	isb
 	msr	ttbr0_el1, \tmp1		// set the non-PAN TTBR0_EL1
 	isb
 	.endm
@@ -31,18 +40,18 @@ alternative_if_not ARM64_HAS_PAN
 alternative_else_nop_endif
 	.endm
 
-	.macro	uaccess_ttbr0_enable, tmp1, tmp2
+	.macro	uaccess_ttbr0_enable, tmp1, tmp2, tmp3
 alternative_if_not ARM64_HAS_PAN
-	save_and_disable_irq \tmp2		// avoid preemption
-	__uaccess_ttbr0_enable \tmp1
-	restore_irq \tmp2
+	save_and_disable_irq \tmp3		// avoid preemption
+	__uaccess_ttbr0_enable \tmp1, \tmp2
+	restore_irq \tmp3
 alternative_else_nop_endif
 	.endm
 #else
 	.macro	uaccess_ttbr0_disable, tmp1
 	.endm
 
-	.macro	uaccess_ttbr0_enable, tmp1, tmp2
+	.macro	uaccess_ttbr0_enable, tmp1, tmp2, tmp3
 	.endm
 #endif
 
@@ -56,8 +65,8 @@ alternative_if ARM64_ALT_PAN_NOT_UAO
 alternative_else_nop_endif
 	.endm
 
-	.macro	uaccess_enable_not_uao, tmp1, tmp2
-	uaccess_ttbr0_enable \tmp1, \tmp2
+	.macro	uaccess_enable_not_uao, tmp1, tmp2, tmp3
+	uaccess_ttbr0_enable \tmp1, \tmp2, \tmp3
 alternative_if ARM64_ALT_PAN_NOT_UAO
 	SET_PSTATE_PAN(0)
 alternative_else_nop_endif
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index fc0f9eb66039..750a3b76a01c 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -107,15 +107,19 @@ static inline void __uaccess_ttbr0_disable(void)
 {
 	unsigned long ttbr;
 
+	ttbr = read_sysreg(ttbr1_el1);
 	/* reserved_ttbr0 placed at the end of swapper_pg_dir */
-	ttbr = read_sysreg(ttbr1_el1) + SWAPPER_DIR_SIZE;
-	write_sysreg(ttbr, ttbr0_el1);
+	write_sysreg(ttbr + SWAPPER_DIR_SIZE, ttbr0_el1);
+	isb();
+	/* Set reserved ASID */
+	ttbr &= ~(0xffffUL << 48);
+	write_sysreg(ttbr, ttbr1_el1);
 	isb();
 }
 
 static inline void __uaccess_ttbr0_enable(void)
 {
-	unsigned long flags;
+	unsigned long flags, ttbr0, ttbr1;
 
 	/*
 	 * Disable interrupts to avoid preemption between reading the 'ttbr0'
@@ -123,7 +127,16 @@ static inline void __uaccess_ttbr0_enable(void)
 	 * roll-over and an update of 'ttbr0'.
 	 */
 	local_irq_save(flags);
-	write_sysreg(current_thread_info()->ttbr0, ttbr0_el1);
+	ttbr0 = current_thread_info()->ttbr0;
+
+	/* Restore active ASID */
+	ttbr1 = read_sysreg(ttbr1_el1);
+	ttbr1 |= ttbr0 & (0xffffUL << 48);
+	write_sysreg(ttbr1, ttbr1_el1);
+	isb();
+
+	/* Restore user page table */
+	write_sysreg(ttbr0, ttbr0_el1);
 	isb();
 	local_irq_restore(flags);
 }
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index dd5fa2c3d489..e2afc15a1535 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -184,7 +184,7 @@ alternative_if ARM64_HAS_PAN
 alternative_else_nop_endif
 
 	.if	\el != 0
-	mrs	x21, ttbr0_el1
+	mrs	x21, ttbr1_el1
 	tst	x21, #0xffff << 48		// Check for the reserved ASID
 	orr	x23, x23, #PSR_PAN_BIT		// Set the emulated PAN in the saved SPSR
 	b.eq	1f				// TTBR0 access already disabled
@@ -246,7 +246,7 @@ alternative_else_nop_endif
 	tbnz	x22, #22, 1f			// Skip re-enabling TTBR0 access if the PSR_PAN_BIT is set
 	.endif
 
-	__uaccess_ttbr0_enable x0
+	__uaccess_ttbr0_enable x0, x1
 
 	.if	\el == 0
 	/*
diff --git a/arch/arm64/lib/clear_user.S b/arch/arm64/lib/clear_user.S
index e88fb99c1561..8f9c4641e706 100644
--- a/arch/arm64/lib/clear_user.S
+++ b/arch/arm64/lib/clear_user.S
@@ -30,7 +30,7 @@
  * Alignment fixed up by hardware.
  */
 ENTRY(__clear_user)
-	uaccess_enable_not_uao x2, x3
+	uaccess_enable_not_uao x2, x3, x4
 	mov	x2, x1			// save the size for fixup return
 	subs	x1, x1, #8
 	b.mi	2f
diff --git a/arch/arm64/lib/copy_from_user.S b/arch/arm64/lib/copy_from_user.S
index 4b5d826895ff..69d86a80f3e2 100644
--- a/arch/arm64/lib/copy_from_user.S
+++ b/arch/arm64/lib/copy_from_user.S
@@ -64,7 +64,7 @@
 
 end	.req	x5
 ENTRY(__arch_copy_from_user)
-	uaccess_enable_not_uao x3, x4
+	uaccess_enable_not_uao x3, x4, x5
 	add	end, x0, x2
 #include "copy_template.S"
 	uaccess_disable_not_uao x3
diff --git a/arch/arm64/lib/copy_in_user.S b/arch/arm64/lib/copy_in_user.S
index b24a830419ad..e442b531252a 100644
--- a/arch/arm64/lib/copy_in_user.S
+++ b/arch/arm64/lib/copy_in_user.S
@@ -65,7 +65,7 @@
 
 end	.req	x5
 ENTRY(raw_copy_in_user)
-	uaccess_enable_not_uao x3, x4
+	uaccess_enable_not_uao x3, x4, x5
 	add	end, x0, x2
 #include "copy_template.S"
 	uaccess_disable_not_uao x3
diff --git a/arch/arm64/lib/copy_to_user.S b/arch/arm64/lib/copy_to_user.S
index 351f0766f7a6..318f15d5c336 100644
--- a/arch/arm64/lib/copy_to_user.S
+++ b/arch/arm64/lib/copy_to_user.S
@@ -63,7 +63,7 @@
 
 end	.req	x5
 ENTRY(__arch_copy_to_user)
-	uaccess_enable_not_uao x3, x4
+	uaccess_enable_not_uao x3, x4, x5
 	add	end, x0, x2
 #include "copy_template.S"
 	uaccess_disable_not_uao x3
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 7f1dbe962cf5..6cd20a8c0952 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -49,7 +49,7 @@ ENTRY(flush_icache_range)
  *	- end     - virtual end address of region
  */
 ENTRY(__flush_cache_user_range)
-	uaccess_ttbr0_enable x2, x3
+	uaccess_ttbr0_enable x2, x3, x4
 	dcache_line_size x2, x3
 	sub	x3, x2, #1
 	bic	x4, x0, x3
diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S
index 401ceb71540c..acdbd2c9e899 100644
--- a/arch/arm64/xen/hypercall.S
+++ b/arch/arm64/xen/hypercall.S
@@ -101,7 +101,7 @@ ENTRY(privcmd_call)
 	 * need the explicit uaccess_enable/disable if the TTBR0 PAN emulation
 	 * is enabled (it implies that hardware UAO and PAN disabled).
 	 */
-	uaccess_ttbr0_enable x6, x7
+	uaccess_ttbr0_enable x6, x7, x8
 	hvc XEN_IMM
 
 	/*
-- 
2.1.4


* [PATCH 07/18] arm64: mm: Allocate ASIDs in pairs
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (5 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 06/18] arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 08/18] arm64: mm: Add arm64_kernel_mapped_at_el0 helper using static key Will Deacon
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

In preparation for separate kernel/user ASIDs, allocate them in pairs
for each mm_struct. The bottom bit distinguishes the two: if it is set,
then the ASID will map only userspace.
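
A minimal sketch of the pairing (helper names are hypothetical; the real
conversion is the asid2idx/idx2asid pair in the diff below): each
allocator index yields an even ASID for use while the kernel is mapped,
and the same value with the bottom bit set for userspace:

  #include <stdint.h>

  /* Illustrative only: even/odd ASID pairing per allocator index. */
  static inline uint64_t kernel_asid(uint64_t idx)
  {
          return idx << 1;                /* even: used while the kernel is mapped */
  }

  static inline uint64_t user_asid(uint64_t idx)
  {
          return (idx << 1) | 1;          /* odd: maps userspace only */
  }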

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu.h |  1 +
 arch/arm64/mm/context.c      | 25 +++++++++++++++++--------
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 0d34bf0a89c7..01bfb184f2a8 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -17,6 +17,7 @@
 #define __ASM_MMU_H
 
 #define MMCF_AARCH32	0x1	/* mm context flag for AArch32 executables */
+#define USER_ASID_FLAG	(UL(1) << 48)
 
 typedef struct {
 	atomic64_t	id;
diff --git a/arch/arm64/mm/context.c b/arch/arm64/mm/context.c
index 78816e476491..db28958d9e4f 100644
--- a/arch/arm64/mm/context.c
+++ b/arch/arm64/mm/context.c
@@ -39,7 +39,16 @@ static cpumask_t tlb_flush_pending;
 
 #define ASID_MASK		(~GENMASK(asid_bits - 1, 0))
 #define ASID_FIRST_VERSION	(1UL << asid_bits)
-#define NUM_USER_ASIDS		ASID_FIRST_VERSION
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+#define NUM_USER_ASIDS		(ASID_FIRST_VERSION >> 1)
+#define asid2idx(asid)		(((asid) & ~ASID_MASK) >> 1)
+#define idx2asid(idx)		(((idx) << 1) & ~ASID_MASK)
+#else
+#define NUM_USER_ASIDS		(ASID_FIRST_VERSION)
+#define asid2idx(asid)		((asid) & ~ASID_MASK)
+#define idx2asid(idx)		asid2idx(idx)
+#endif
 
 /* Get the ASIDBits supported by the current CPU */
 static u32 get_cpu_asid_bits(void)
@@ -104,7 +113,7 @@ static void flush_context(unsigned int cpu)
 		 */
 		if (asid == 0)
 			asid = per_cpu(reserved_asids, i);
-		__set_bit(asid & ~ASID_MASK, asid_map);
+		__set_bit(asid2idx(asid), asid_map);
 		per_cpu(reserved_asids, i) = asid;
 	}
 
@@ -156,16 +165,16 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 		 * We had a valid ASID in a previous life, so try to re-use
 		 * it if possible.
 		 */
-		asid &= ~ASID_MASK;
-		if (!__test_and_set_bit(asid, asid_map))
+		if (!__test_and_set_bit(asid2idx(asid), asid_map))
 			return newasid;
 	}
 
 	/*
 	 * Allocate a free ASID. If we can't find one, take a note of the
-	 * currently active ASIDs and mark the TLBs as requiring flushes.
-	 * We always count from ASID #1, as we use ASID #0 when setting a
-	 * reserved TTBR0 for the init_mm.
+	 * currently active ASIDs and mark the TLBs as requiring flushes.  We
+	 * always count from ASID #2 (index 1), as we use ASID #0 when setting
+	 * a reserved TTBR0 for the init_mm and we allocate ASIDs in even/odd
+	 * pairs.
 	 */
 	asid = find_next_zero_bit(asid_map, NUM_USER_ASIDS, cur_idx);
 	if (asid != NUM_USER_ASIDS)
@@ -182,7 +191,7 @@ static u64 new_context(struct mm_struct *mm, unsigned int cpu)
 set_asid:
 	__set_bit(asid, asid_map);
 	cur_idx = asid;
-	return asid | generation;
+	return idx2asid(asid) | generation;
 }
 
 void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
-- 
2.1.4


* [PATCH 08/18] arm64: mm: Add arm64_kernel_mapped_at_el0 helper using static key
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (6 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 07/18] arm64: mm: Allocate ASIDs in pairs Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 09/18] arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI Will Deacon
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

In order for code such as TLB invalidation to operate efficiently when
the decision to map the kernel at EL0 is determined at runtime, this
patch introduces a helper function, arm64_kernel_mapped_at_el0, which
uses a static key that will later be hooked up to a command-line option.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu.h | 11 +++++++++++
 arch/arm64/mm/mmu.c          |  5 +++++
 2 files changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 01bfb184f2a8..a84f851409ca 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -19,6 +19,8 @@
 #define MMCF_AARCH32	0x1	/* mm context flag for AArch32 executables */
 #define USER_ASID_FLAG	(UL(1) << 48)
 
+#ifndef __ASSEMBLY__
+
 typedef struct {
 	atomic64_t	id;
 	void		*vdso;
@@ -32,6 +34,14 @@ typedef struct {
  */
 #define ASID(mm)	((mm)->context.id.counter & 0xffff)
 
+DECLARE_STATIC_KEY_TRUE(__unmap_kernel_at_el0);
+
+static inline bool arm64_kernel_mapped_at_el0(void)
+{
+	return !IS_ENABLED(CONFIG_UNMAP_KERNEL_AT_EL0) ||
+	       !static_branch_likely(&__unmap_kernel_at_el0);
+}
+
 extern void paging_init(void);
 extern void bootmem_init(void);
 extern void __iomem *early_io_map(phys_addr_t phys, unsigned long virt);
@@ -42,4 +52,5 @@ extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 extern void *fixmap_remap_fdt(phys_addr_t dt_phys);
 extern void mark_linear_text_alias_ro(void);
 
+#endif	/* !__ASSEMBLY__ */
 #endif
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index f1eb15e0e864..a75858267b6d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -525,6 +525,11 @@ static int __init parse_rodata(char *arg)
 }
 early_param("rodata", parse_rodata);
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+DEFINE_STATIC_KEY_TRUE(__unmap_kernel_at_el0);
+EXPORT_SYMBOL_GPL(__unmap_kernel_at_el0);
+#endif
+
 /*
  * Create fine-grained mappings for the kernel.
  */
-- 
2.1.4


* [PATCH 09/18] arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (7 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 08/18] arm64: mm: Add arm64_kernel_mapped_at_el0 helper using static key Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 10/18] arm64: entry: Add exception trampoline page for exceptions from EL0 Will Deacon
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

Since an mm has both a kernel and a user ASID, we need to ensure that
broadcast TLB maintenance targets both address spaces so that things
like CoW continue to work with the uaccess primitives in the kernel.
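
Concretely, the extra maintenance amounts to re-issuing each ASID-tagged
TLBI with the bottom ASID bit set. A C sketch of that encoding (the
helper name is made up; the real thing is the __tlbi_user macro in the
diff below, with USER_ASID_FLAG from the ASID-pairing patch):

  #include <stdint.h>

  /* Illustrative only: TLBI arguments carry the ASID in bits [63:48], so
   * the userspace alias of an operation is the same argument with the
   * bottom ASID bit (bit 48) ORed in. */
  static inline uint64_t tlbi_user_alias(uint64_t tlbi_arg)
  {
          return tlbi_arg | (1ULL << 48);
  }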

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index af1c76981911..42d250ec74b1 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -23,6 +23,7 @@
 
 #include <linux/sched.h>
 #include <asm/cputype.h>
+#include <asm/mmu.h>
 
 /*
  * Raw TLBI operations.
@@ -54,6 +55,11 @@
 
 #define __tlbi(op, ...)		__TLBI_N(op, ##__VA_ARGS__, 1, 0)
 
+#define __tlbi_user(op, arg) do {						\
+	if (!arm64_kernel_mapped_at_el0())					\
+		__tlbi(op, (arg) | USER_ASID_FLAG);				\
+} while (0)
+
 /*
  *	TLB Management
  *	==============
@@ -115,6 +121,7 @@ static inline void flush_tlb_mm(struct mm_struct *mm)
 
 	dsb(ishst);
 	__tlbi(aside1is, asid);
+	__tlbi_user(aside1is, asid);
 	dsb(ish);
 }
 
@@ -125,6 +132,7 @@ static inline void flush_tlb_page(struct vm_area_struct *vma,
 
 	dsb(ishst);
 	__tlbi(vale1is, addr);
+	__tlbi_user(vale1is, addr);
 	dsb(ish);
 }
 
@@ -151,10 +159,13 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 
 	dsb(ishst);
 	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12)) {
-		if (last_level)
+		if (last_level) {
 			__tlbi(vale1is, addr);
-		else
+			__tlbi_user(vale1is, addr);
+		} else {
 			__tlbi(vae1is, addr);
+			__tlbi_user(vae1is, addr);
+		}
 	}
 	dsb(ish);
 }
@@ -194,6 +205,7 @@ static inline void __flush_tlb_pgtable(struct mm_struct *mm,
 	unsigned long addr = uaddr >> 12 | (ASID(mm) << 48);
 
 	__tlbi(vae1is, addr);
+	__tlbi_user(vae1is, addr);
 	dsb(ish);
 }
 
-- 
2.1.4


* [PATCH 10/18] arm64: entry: Add exception trampoline page for exceptions from EL0
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (8 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 09/18] arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 11/18] arm64: mm: Map entry trampoline into trampoline and kernel page tables Will Deacon
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

To allow unmapping of the kernel whilst running at EL0, we need to
point the exception vectors at an entry trampoline that can map/unmap
the kernel on entry/exit respectively.

This patch adds the trampoline page, although it is not yet plugged
into the vector table and is therefore unused.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/entry.S       | 85 +++++++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/vmlinux.lds.S | 17 +++++++++
 2 files changed, 102 insertions(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e2afc15a1535..d850af724c8c 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -29,6 +29,7 @@
 #include <asm/esr.h>
 #include <asm/irq.h>
 #include <asm/memory.h>
+#include <asm/mmu.h>
 #include <asm/ptrace.h>
 #include <asm/thread_info.h>
 #include <asm/asm-uaccess.h>
@@ -895,6 +896,90 @@ __ni_sys_trace:
 
 	.popsection				// .entry.text
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+/*
+ * Exception vectors trampoline.
+ */
+	.pushsection ".entry.tramp.text", "ax"
+
+	.macro tramp_map_kernel, tmp
+	mrs	\tmp, ttbr1_el1
+	sub	\tmp, \tmp, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE)
+	bic	\tmp, \tmp, #USER_ASID_FLAG
+	msr	ttbr1_el1, \tmp
+	.endm
+
+	.macro tramp_unmap_kernel, tmp
+	mrs	\tmp, ttbr1_el1
+	add	\tmp, \tmp, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE)
+	orr	\tmp, \tmp, #USER_ASID_FLAG
+	msr	ttbr1_el1, \tmp
+	/*
+	 * We avoid running the post_ttbr_update_workaround here because the
+	 * user and kernel ASIDs don't have conflicting mappings, so any
+	 * "blessing" as described in:
+	 *
+	 *   http://lkml.kernel.org/r/56BB848A.6060603@caviumnetworks.com
+	 *
+	 * will not hurt correctness. Whilst this may partially defeat the
+	 * point of using split ASIDs in the first place, it avoids
+	 * the hit of invalidating the entire I-cache on every return to
+	 * userspace.
+	 */
+	.endm
+
+	.macro tramp_ventry, regsize = 64
+	.align	7
+1:
+	.if	\regsize == 64
+	msr	tpidrro_el0, x30
+	.endif
+	tramp_map_kernel	x30
+	ldr	x30, =vectors
+	prfm	plil1strm, [x30, #(1b - tramp_vectors)]
+	msr	vbar_el1, x30
+	add	x30, x30, #(1b - tramp_vectors)
+	isb
+	br	x30
+	.endm
+
+	.macro tramp_exit, regsize = 64
+	adr	x30, tramp_vectors
+	msr	vbar_el1, x30
+	tramp_unmap_kernel	x30
+	.if	\regsize == 64
+	mrs	x30, far_el1
+	.endif
+	eret
+	.endm
+
+	.align	11
+ENTRY(tramp_vectors)
+	.space	0x400
+
+	tramp_ventry
+	tramp_ventry
+	tramp_ventry
+	tramp_ventry
+
+	tramp_ventry	32
+	tramp_ventry	32
+	tramp_ventry	32
+	tramp_ventry	32
+END(tramp_vectors)
+
+ENTRY(tramp_exit_native)
+	tramp_exit
+END(tramp_exit_native)
+
+ENTRY(tramp_exit_compat)
+	tramp_exit	32
+END(tramp_exit_compat)
+
+	.ltorg
+	.popsection				// .entry.tramp.text
+#endif /* CONFIG_UNMAP_KERNEL_AT_EL0 */
+
 /*
  * Special system call wrappers.
  */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 7da3e5c366a0..6b4260f22aab 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -57,6 +57,17 @@ jiffies = jiffies_64;
 #define HIBERNATE_TEXT
 #endif
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+#define TRAMP_TEXT					\
+	. = ALIGN(PAGE_SIZE);				\
+	VMLINUX_SYMBOL(__entry_tramp_text_start) = .;	\
+	*(.entry.tramp.text)				\
+	. = ALIGN(PAGE_SIZE);				\
+	VMLINUX_SYMBOL(__entry_tramp_text_end) = .;
+#else
+#define TRAMP_TEXT
+#endif
+
 /*
  * The size of the PE/COFF section that covers the kernel image, which
  * runs from stext to _edata, must be a round multiple of the PE/COFF
@@ -113,6 +124,7 @@ SECTIONS
 			HYPERVISOR_TEXT
 			IDMAP_TEXT
 			HIBERNATE_TEXT
+			TRAMP_TEXT
 			*(.fixup)
 			*(.gnu.warning)
 		. = ALIGN(16);
@@ -214,6 +226,11 @@ SECTIONS
 	. += RESERVED_TTBR0_SIZE;
 #endif
 
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+	tramp_pg_dir = .;
+	. += PAGE_SIZE;
+#endif
+
 	__pecoff_data_size = ABSOLUTE(. - __initdata_begin);
 	_end = .;
 
-- 
2.1.4


* [PATCH 11/18] arm64: mm: Map entry trampoline into trampoline and kernel page tables
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (9 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 10/18] arm64: entry: Add exception trampoline page for exceptions from EL0 Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 12/18] arm64: entry: Explicitly pass exception level to kernel_ventry macro Will Deacon
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

The exception entry trampoline needs to be mapped at the same virtual
address in both the trampoline page table (which maps nothing else)
and the kernel page table, so that we can swizzle TTBR1_EL1 on
exceptions from, and returns to, EL0.

This patch maps the trampoline at a fixed virtual address (TRAMP_VALIAS),
which allows the kernel proper to be randomized with respect to the
trampoline when KASLR is enabled.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/memory.h  |  1 +
 arch/arm64/include/asm/pgtable.h |  1 +
 arch/arm64/mm/mmu.c              | 48 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+)

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index f7c4d2146aed..18a3cb86ef17 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -70,6 +70,7 @@
 #define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
 	(UL(1) << (VA_BITS - 1)) + 1)
 #define KIMAGE_VADDR		(MODULES_END)
+#define TRAMP_VALIAS		(KIMAGE_VADDR)
 #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
 #define MODULES_VADDR		(VA_START + KASAN_SHADOW_SIZE)
 #define MODULES_VSIZE		(SZ_128M)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b46e54c2399b..2f3b58a1d434 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -667,6 +667,7 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm,
 
 extern pgd_t swapper_pg_dir[PTRS_PER_PGD];
 extern pgd_t idmap_pg_dir[PTRS_PER_PGD];
+extern pgd_t tramp_pg_dir[PTRS_PER_PGD];
 
 /*
  * Encode and decode a swap entry:
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a75858267b6d..5ce5cb1249da 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -528,6 +528,51 @@ early_param("rodata", parse_rodata);
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 DEFINE_STATIC_KEY_TRUE(__unmap_kernel_at_el0);
 EXPORT_SYMBOL_GPL(__unmap_kernel_at_el0);
+
+static void __init add_tramp_vma(void)
+{
+	extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+	static struct vm_struct vmlinux_tramp;
+	unsigned long size = (unsigned long)__entry_tramp_text_end -
+			     (unsigned long)__entry_tramp_text_start;
+
+	vmlinux_tramp = (struct vm_struct) {
+		.addr		= (void *)TRAMP_VALIAS,
+		.phys_addr	= __pa_symbol(__entry_tramp_text_start),
+		.size		= size + PAGE_SIZE,
+		.flags		= VM_MAP,
+		.caller		= __builtin_return_address(0),
+
+	};
+
+	vm_area_add_early(&vmlinux_tramp);
+}
+
+static int __init map_entry_trampoline(void)
+{
+	extern char __entry_tramp_text_start[], __entry_tramp_text_end[];
+
+	pgprot_t prot = rodata_enabled ? PAGE_KERNEL_ROX : PAGE_KERNEL_EXEC;
+	phys_addr_t size = (unsigned long)__entry_tramp_text_end -
+			   (unsigned long)__entry_tramp_text_start;
+	phys_addr_t pa_start = __pa_symbol(__entry_tramp_text_start);
+
+	/* The trampoline is always mapped and can therefore be global */
+	pgprot_val(prot) &= ~PTE_NG;
+
+	/* Map only the text into the trampoline page table */
+	memset((char *)tramp_pg_dir, 0, PGD_SIZE);
+	__create_pgd_mapping(tramp_pg_dir, pa_start, TRAMP_VALIAS, size, prot,
+			     pgd_pgtable_alloc, 0);
+
+	/* ...as well as the kernel page table */
+	__create_pgd_mapping(init_mm.pgd, pa_start, TRAMP_VALIAS, size, prot,
+			     pgd_pgtable_alloc, 0);
+	return 0;
+}
+core_initcall(map_entry_trampoline);
+#else
+static void __init add_tramp_vma(void) {}
 #endif
 
 /*
@@ -559,6 +604,9 @@ static void __init map_kernel(pgd_t *pgd)
 			   &vmlinux_initdata, 0, VM_NO_GUARD);
 	map_kernel_segment(pgd, _data, _end, PAGE_KERNEL, &vmlinux_data, 0, 0);
 
+	/* Add a VMA for the trampoline page, which will be mapped later on */
+	add_tramp_vma();
+
 	if (!pgd_val(*pgd_offset_raw(pgd, FIXADDR_START))) {
 		/*
 		 * The fixmap falls in a separate pgd to the kernel, and doesn't
-- 
2.1.4


* [PATCH 12/18] arm64: entry: Explicitly pass exception level to kernel_ventry macro
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (10 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 11/18] arm64: mm: Map entry trampoline into trampoline and kernel page tables Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 13/18] arm64: entry: Hook up entry trampoline to exception vectors Will Deacon
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

We will need to treat exceptions from EL0 differently in kernel_ventry,
so rework the macro to take the exception level as an argument and
construct the branch target using that.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/entry.S | 46 +++++++++++++++++++++++-----------------------
 1 file changed, 23 insertions(+), 23 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index d850af724c8c..e98cf3064509 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -70,7 +70,7 @@
 #define BAD_FIQ		2
 #define BAD_ERROR	3
 
-	.macro kernel_ventry	label
+	.macro kernel_ventry, el, label, regsize = 64
 	.align 7
 	sub	sp, sp, #S_FRAME_SIZE
 #ifdef CONFIG_VMAP_STACK
@@ -83,7 +83,7 @@
 	tbnz	x0, #THREAD_SHIFT, 0f
 	sub	x0, sp, x0			// x0'' = sp' - x0' = (sp + x0) - sp = x0
 	sub	sp, sp, x0			// sp'' = sp' - x0 = (sp + x0) - x0 = sp
-	b	\label
+	b	el\()\el\()_\label
 
 0:
 	/*
@@ -115,7 +115,7 @@
 	sub	sp, sp, x0
 	mrs	x0, tpidrro_el0
 #endif
-	b	\label
+	b	el\()\el\()_\label
 	.endm
 
 	.macro	kernel_entry, el, regsize = 64
@@ -366,31 +366,31 @@ tsk	.req	x28		// current thread_info
 
 	.align	11
 ENTRY(vectors)
-	kernel_ventry	el1_sync_invalid		// Synchronous EL1t
-	kernel_ventry	el1_irq_invalid			// IRQ EL1t
-	kernel_ventry	el1_fiq_invalid			// FIQ EL1t
-	kernel_ventry	el1_error_invalid		// Error EL1t
+	kernel_ventry	1, sync_invalid			// Synchronous EL1t
+	kernel_ventry	1, irq_invalid			// IRQ EL1t
+	kernel_ventry	1, fiq_invalid			// FIQ EL1t
+	kernel_ventry	1, error_invalid		// Error EL1t
 
-	kernel_ventry	el1_sync			// Synchronous EL1h
-	kernel_ventry	el1_irq				// IRQ EL1h
-	kernel_ventry	el1_fiq_invalid			// FIQ EL1h
-	kernel_ventry	el1_error_invalid		// Error EL1h
+	kernel_ventry	1, sync				// Synchronous EL1h
+	kernel_ventry	1, irq				// IRQ EL1h
+	kernel_ventry	1, fiq_invalid			// FIQ EL1h
+	kernel_ventry	1, error_invalid		// Error EL1h
 
-	kernel_ventry	el0_sync			// Synchronous 64-bit EL0
-	kernel_ventry	el0_irq				// IRQ 64-bit EL0
-	kernel_ventry	el0_fiq_invalid			// FIQ 64-bit EL0
-	kernel_ventry	el0_error_invalid		// Error 64-bit EL0
+	kernel_ventry	0, sync				// Synchronous 64-bit EL0
+	kernel_ventry	0, irq				// IRQ 64-bit EL0
+	kernel_ventry	0, fiq_invalid			// FIQ 64-bit EL0
+	kernel_ventry	0, error_invalid		// Error 64-bit EL0
 
 #ifdef CONFIG_COMPAT
-	kernel_ventry	el0_sync_compat			// Synchronous 32-bit EL0
-	kernel_ventry	el0_irq_compat			// IRQ 32-bit EL0
-	kernel_ventry	el0_fiq_invalid_compat		// FIQ 32-bit EL0
-	kernel_ventry	el0_error_invalid_compat	// Error 32-bit EL0
+	kernel_ventry	0, sync_compat, 32		// Synchronous 32-bit EL0
+	kernel_ventry	0, irq_compat, 32		// IRQ 32-bit EL0
+	kernel_ventry	0, fiq_invalid_compat, 32	// FIQ 32-bit EL0
+	kernel_ventry	0, error_invalid_compat, 32	// Error 32-bit EL0
 #else
-	kernel_ventry	el0_sync_invalid		// Synchronous 32-bit EL0
-	kernel_ventry	el0_irq_invalid			// IRQ 32-bit EL0
-	kernel_ventry	el0_fiq_invalid			// FIQ 32-bit EL0
-	kernel_ventry	el0_error_invalid		// Error 32-bit EL0
+	kernel_ventry	0, sync_invalid, 32		// Synchronous 32-bit EL0
+	kernel_ventry	0, irq_invalid, 32		// IRQ 32-bit EL0
+	kernel_ventry	0, fiq_invalid, 32		// FIQ 32-bit EL0
+	kernel_ventry	0, error_invalid, 32		// Error 32-bit EL0
 #endif
 END(vectors)
 
-- 
2.1.4


* [PATCH 13/18] arm64: entry: Hook up entry trampoline to exception vectors
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (11 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 12/18] arm64: entry: Explicitly pass exception level to kernel_ventry macro Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 14/18] arm64: erratum: Work around Falkor erratum #E1003 in trampoline code Will Deacon
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

Hook up the entry trampoline to our exception vectors so that all
exceptions from and returns to EL0 go via the trampoline, which swizzles
the vector base register accordingly. Transitioning to and from the
kernel clobbers x30, so we use tpidrro_el0 and far_el1 as scratch
registers for native tasks.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/entry.S | 46 ++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index e98cf3064509..a839b94bba05 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -72,6 +72,16 @@
 
 	.macro kernel_ventry, el, label, regsize = 64
 	.align 7
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+	.if	\el == 0
+	.if	\regsize == 64
+	mrs	x30, tpidrro_el0
+	.else
+	mov	x30, xzr
+	.endif
+	.endif
+#endif
+
 	sub	sp, sp, #S_FRAME_SIZE
 #ifdef CONFIG_VMAP_STACK
 	/*
@@ -118,6 +128,11 @@
 	b	el\()\el\()_\label
 	.endm
 
+	.macro tramp_alias, dst, sym
+	mov_q	\dst, TRAMP_VALIAS
+	add	\dst, \dst, #(\sym - .entry.tramp.text)
+	.endm
+
 	.macro	kernel_entry, el, regsize = 64
 	.if	\regsize == 32
 	mov	w0, w0				// zero upper 32 bits of x0
@@ -265,25 +280,39 @@ alternative_else_nop_endif
 2:
 #endif
 
+	msr	elr_el1, x21			// set up the return data
+	msr	spsr_el1, x22
+	ldr	lr, [sp, #S_LR]
+
 	.if	\el == 0
 	ldr	x23, [sp, #S_SP]		// load return stack pointer
 	msr	sp_el0, x23
+	tbz	x22, #4, 3f
+
 #ifdef CONFIG_ARM64_ERRATUM_845719
 alternative_if ARM64_WORKAROUND_845719
-	tbz	x22, #4, 1f
 #ifdef CONFIG_PID_IN_CONTEXTIDR
 	mrs	x29, contextidr_el1
 	msr	contextidr_el1, x29
 #else
 	msr contextidr_el1, xzr
 #endif
-1:
 alternative_else_nop_endif
 #endif
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+	tramp_alias	x30, tramp_exit_compat
+	b	4f
+3:
+	msr	tpidrro_el0, xzr
+	msr	far_el1, x30
+	tramp_alias	x30, tramp_exit_native
+4:
+	prfm	plil1strm, [x30]
+#else
+3:
+#endif
 	.endif
 
-	msr	elr_el1, x21			// set up the return data
-	msr	spsr_el1, x22
 	ldp	x0, x1, [sp, #16 * 0]
 	ldp	x2, x3, [sp, #16 * 1]
 	ldp	x4, x5, [sp, #16 * 2]
@@ -299,9 +328,14 @@ alternative_else_nop_endif
 	ldp	x24, x25, [sp, #16 * 12]
 	ldp	x26, x27, [sp, #16 * 13]
 	ldp	x28, x29, [sp, #16 * 14]
-	ldr	lr, [sp, #S_LR]
 	add	sp, sp, #S_FRAME_SIZE		// restore sp
-	eret					// return to kernel
+
+#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+	.if	\el == 0
+	br	x30
+	.endif
+#endif
+	eret
 	.endm
 
 	.macro	irq_stack_entry
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 14/18] arm64: erratum: Work around Falkor erratum #E1003 in trampoline code
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (12 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 13/18] arm64: entry: Hook up entry trampoline to exception vectors Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-18  0:27   ` Stephen Boyd
  2017-11-17 18:21 ` [PATCH 15/18] arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks Will Deacon
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

We rely on an atomic swizzling of TTBR1 when transitioning from the entry
trampoline to the kernel proper on an exception. We can't rely on this
atomicity in the face of Falkor erratum #E1003, so on affected cores we
can issue a TLB invalidation prior to jumping into the kernel. There is
still the possibility of a TLB conflict here due to conflicting walk
cache entries, but this doesn't appear to be the case on these CPUs in
practice.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig        | 17 +++++------------
 arch/arm64/kernel/entry.S |  8 ++++++++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0df64a6a56d4..f0fcbfc2262e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -504,20 +504,13 @@ config CAVIUM_ERRATUM_30115
 config QCOM_FALKOR_ERRATUM_1003
 	bool "Falkor E1003: Incorrect translation due to ASID change"
 	default y
-	select ARM64_PAN if ARM64_SW_TTBR0_PAN
 	help
 	  On Falkor v1, an incorrect ASID may be cached in the TLB when ASID
-	  and BADDR are changed together in TTBRx_EL1. The workaround for this
-	  issue is to use a reserved ASID in cpu_do_switch_mm() before
-	  switching to the new ASID. Saying Y here selects ARM64_PAN if
-	  ARM64_SW_TTBR0_PAN is selected. This is done because implementing and
-	  maintaining the E1003 workaround in the software PAN emulation code
-	  would be an unnecessary complication. The affected Falkor v1 CPU
-	  implements ARMv8.1 hardware PAN support and using hardware PAN
-	  support versus software PAN emulation is mutually exclusive at
-	  runtime.
-
-	  If unsure, say Y.
+	  and BADDR are changed together in TTBRx_EL1. Since we keep the ASID
+	  in TTBR1_EL1, this situation only occurs in the entry trampoline and
+	  then only for entries in the walk cache, since the leaf translation
+	  is unchanged. Work around the erratum by invalidating the walk cache
+	  entries for the trampoline before entering the kernel proper.
 
 config QCOM_FALKOR_ERRATUM_1009
 	bool "Falkor E1009: Prematurely complete a DSB after a TLBI"
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a839b94bba05..a600879939ce 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -941,6 +941,14 @@ __ni_sys_trace:
 	sub	\tmp, \tmp, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE)
 	bic	\tmp, \tmp, #USER_ASID_FLAG
 	msr	ttbr1_el1, \tmp
+alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
+	isb
+	movk	\tmp, #:abs_g2_nc:(TRAMP_VALIAS >> 12)
+	movk	\tmp, #:abs_g1_nc:(TRAMP_VALIAS >> 12)
+	movk	\tmp, #:abs_g0_nc:(TRAMP_VALIAS >> 12)
+	tlbi	vae1, \tmp
+	dsb	nsh
+alternative_else_nop_endif
 	.endm
 
 	.macro tramp_unmap_kernel, tmp
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 15/18] arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (13 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 14/18] arm64: erratum: Work around Falkor erratum #E1003 in trampoline code Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:21 ` [PATCH 16/18] arm64: entry: Add fake CPU feature for mapping the kernel at EL0 Will Deacon
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

We zero tpidrro_el0 on the return path to userspace since it is used as a
scratch register during exception entry. When the entry trampoline is in
use, we can therefore avoid zeroing tpidrro_el0 in the context-switch path
for native tasks.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/process.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 2dc0f8482210..c2841bda60be 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -305,16 +305,14 @@ void tls_preserve_current_state(void)
 
 static void tls_thread_switch(struct task_struct *next)
 {
-	unsigned long tpidr, tpidrro;
-
 	tls_preserve_current_state();
 
-	tpidr = *task_user_tls(next);
-	tpidrro = is_compat_thread(task_thread_info(next)) ?
-		  next->thread.tp_value : 0;
+	if (is_compat_thread(task_thread_info(next)))
+		write_sysreg(next->thread.tp_value, tpidrro_el0);
+	else if (arm64_kernel_mapped_at_el0())
+		write_sysreg(0, tpidrro_el0);
 
-	write_sysreg(tpidr, tpidr_el0);
-	write_sysreg(tpidrro, tpidrro_el0);
+	write_sysreg(*task_user_tls(next), tpidr_el0);
 }
 
 /* Restore the UAO state depending on next's addr_limit */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 16/18] arm64: entry: Add fake CPU feature for mapping the kernel at EL0
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (14 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 15/18] arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks Will Deacon
@ 2017-11-17 18:21 ` Will Deacon
  2017-11-17 18:22 ` [PATCH 17/18] arm64: makefile: Ensure TEXT_OFFSET doesn't overlap with trampoline Will Deacon
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:21 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

Allow explicit disabling of the entry trampoline on the kernel command
line by adding a fake CPU feature (ARM64_MAP_KERNEL_AT_EL0) that can
be used to apply alternative sequences to our entry code and avoid use
of the trampoline altogether.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/cpucaps.h |  3 ++-
 arch/arm64/kernel/cpufeature.c   | 11 +++++++++++
 arch/arm64/kernel/entry.S        |  6 +++++-
 arch/arm64/mm/mmu.c              |  7 +++++++
 4 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h
index 8da621627d7c..f61d85f76683 100644
--- a/arch/arm64/include/asm/cpucaps.h
+++ b/arch/arm64/include/asm/cpucaps.h
@@ -40,7 +40,8 @@
 #define ARM64_WORKAROUND_858921			19
 #define ARM64_WORKAROUND_CAVIUM_30115		20
 #define ARM64_HAS_DCPOP				21
+#define ARM64_MAP_KERNEL_AT_EL0			22
 
-#define ARM64_NCAPS				22
+#define ARM64_NCAPS				23
 
 #endif /* __ASM_CPUCAPS_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 21e2c95d24e7..aa6b90de6591 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -796,6 +796,12 @@ static bool has_no_fpsimd(const struct arm64_cpu_capabilities *entry, int __unus
 					ID_AA64PFR0_FP_SHIFT) < 0;
 }
 
+static bool map_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
+			      int __unused)
+{
+	return arm64_kernel_mapped_at_el0();
+}
+
 static const struct arm64_cpu_capabilities arm64_features[] = {
 	{
 		.desc = "GIC system register CPU interface",
@@ -883,6 +889,11 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.matches = hyp_offset_low,
 	},
 	{
+		.capability = ARM64_MAP_KERNEL_AT_EL0,
+		.def_scope = SCOPE_SYSTEM,
+		.matches = map_kernel_at_el0,
+	},
+	{
 		/* FP/SIMD is not implemented */
 		.capability = ARM64_HAS_NO_FPSIMD,
 		.def_scope = SCOPE_SYSTEM,
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a600879939ce..a74253defc5b 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -73,6 +73,7 @@
 	.macro kernel_ventry, el, label, regsize = 64
 	.align 7
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+alternative_if_not ARM64_MAP_KERNEL_AT_EL0
 	.if	\el == 0
 	.if	\regsize == 64
 	mrs	x30, tpidrro_el0
@@ -80,6 +81,7 @@
 	mov	x30, xzr
 	.endif
 	.endif
+alternative_else_nop_endif
 #endif
 
 	sub	sp, sp, #S_FRAME_SIZE
@@ -300,6 +302,7 @@ alternative_if ARM64_WORKAROUND_845719
 alternative_else_nop_endif
 #endif
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
+alternative_if_not ARM64_MAP_KERNEL_AT_EL0
 	tramp_alias	x30, tramp_exit_compat
 	b	4f
 3:
@@ -308,6 +311,7 @@ alternative_else_nop_endif
 	tramp_alias	x30, tramp_exit_native
 4:
 	prfm	plil1strm, [x30]
+alternative_else_nop_endif
 #else
 3:
 #endif
@@ -332,7 +336,7 @@ alternative_else_nop_endif
 
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 	.if	\el == 0
-	br	x30
+	alternative_insn "br x30", nop, ARM64_MAP_KERNEL_AT_EL0
 	.endif
 #endif
 	eret
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5ce5cb1249da..dab987f2912c 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -571,6 +571,13 @@ static int __init map_entry_trampoline(void)
 	return 0;
 }
 core_initcall(map_entry_trampoline);
+
+static int __init parse_nokaiser(char *__unused)
+{
+	static_branch_disable(&__unmap_kernel_at_el0);
+	return 0;
+}
+__setup("nokaiser", parse_nokaiser);
 #else
 static void __init add_tramp_vma(void) {}
 #endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 17/18] arm64: makefile: Ensure TEXT_OFFSET doesn't overlap with trampoline
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (15 preceding siblings ...)
  2017-11-17 18:21 ` [PATCH 16/18] arm64: entry: Add fake CPU feature for mapping the kernel at EL0 Will Deacon
@ 2017-11-17 18:22 ` Will Deacon
  2017-11-17 18:22 ` [PATCH 18/18] arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0 Will Deacon
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

When CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET=y, we could end up with a
TEXT_OFFSET of less than 2 * PAGE_SIZE, which would result in an
overlap with the trampoline and a panic on boot.

Fix this by restricting the minimum value of the random TEXT_OFFSET so
that it is never less than 2 pages when CONFIG_UNMAP_KERNEL_AT_EL0 is
enabled.

I do wonder whether we should just remove
CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET completely, since we're realistically
never going to be able to change our offset from 0x80000, but this keeps
the dream alive for now.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Makefile | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 939b310913cf..b60ac6c43ccd 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -87,9 +87,21 @@ head-y		:= arch/arm64/kernel/head.o
 
 # The byte offset of the kernel image in RAM from the start of RAM.
 ifeq ($(CONFIG_ARM64_RANDOMIZE_TEXT_OFFSET), y)
-TEXT_OFFSET := $(shell awk "BEGIN {srand(); printf \"0x%06x\n\", \
-		 int(2 * 1024 * 1024 / (2 ^ $(CONFIG_ARM64_PAGE_SHIFT)) * \
-		 rand()) * (2 ^ $(CONFIG_ARM64_PAGE_SHIFT))}")
+TEXT_OFFSET := $(shell awk							\
+		"BEGIN {							\
+			srand();						\
+			page_size = 2 ^ $(CONFIG_ARM64_PAGE_SHIFT);		\
+			tramp_size = 0;						\
+			if (\" $(CONFIG_UNMAP_KERNEL_AT_EL0)\" == \" y\") {	\
+				tramp_size = 2 * page_size;			\
+			}							\
+			offset = int(2 * 1024 * 1024 / page_size * rand());	\
+			offset *= page_size;					\
+			if (offset < tramp_size) {				\
+				offset = tramp_size;				\
+			}							\
+			printf \"0x%06x\n\", offset;				\
+		}")
 else
 TEXT_OFFSET := 0x00080000
 endif
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 18/18] arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (16 preceding siblings ...)
  2017-11-17 18:22 ` [PATCH 17/18] arm64: makefile: Ensure TEXT_OFFSET doesn't overlap with trampoline Will Deacon
@ 2017-11-17 18:22 ` Will Deacon
  2017-11-22 16:52   ` Marc Zyngier
  2017-11-18  0:19 ` [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Stephen Boyd
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2017-11-17 18:22 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook, Will Deacon

Add a Kconfig entry to control use of the entry trampoline, which allows
us to unmap the kernel whilst running in userspace and improve the
robustness of KASLR.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index f0fcbfc2262e..f99ffb88843a 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -796,6 +796,19 @@ config FORCE_MAX_ZONEORDER
 	  However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
 	  4M allocations matching the default size used by generic code.
 
+config UNMAP_KERNEL_AT_EL0
+	bool "Unmap kernel when running in userspace (aka \"KAISER\")"
+	default y
+	help
+	  Some attacks against KASLR make use of the timing difference between
+	  a permission fault which could arise from a page table entry that is
+	  present in the TLB, and a translation fault which always requires a
+	  page table walk. This option defends against these attacks by unmapping
+	  the kernel whilst running in userspace, therefore forcing translation
+	  faults for all of kernel space.
+
+	  If unsure, say Y.
+
 menuconfig ARMV8_DEPRECATED
 	bool "Emulate deprecated/obsolete ARMv8 instructions"
 	depends on COMPAT
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (17 preceding siblings ...)
  2017-11-17 18:22 ` [PATCH 18/18] arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0 Will Deacon
@ 2017-11-18  0:19 ` Stephen Boyd
  2017-11-20 18:03   ` Will Deacon
  2017-11-18 15:25 ` Ard Biesheuvel
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 45+ messages in thread
From: Stephen Boyd @ 2017-11-18  0:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, dave.hansen, keescook

On 11/17, Will Deacon wrote:
> Hi all,
> 
> This patch series implements something along the lines of KAISER for arm64:
> 
>   https://gruss.cc/files/kaiser.pdf
> 
> although I wrote this from scratch because the paper has some funny
> assumptions about how the architecture works. There is a patch series
> in review for x86, which follows a similar approach:
> 
>   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
> 
> and the topic was recently covered by LWN (currently subscriber-only):
> 
>   https://lwn.net/Articles/738975/
> 
> The basic idea is that transitions to and from userspace are proxied
> through a trampoline page which is mapped into a separate page table and
> can switch the full kernel mapping in and out on exception entry and
> exit respectively. This is a valuable defence against various KASLR and
> timing attacks, particularly as the trampoline page is at a fixed virtual
> address and therefore the kernel text can be randomized independently.
> 
> The major consequences of the trampoline are:
> 
>   * We can no longer make use of global mappings for kernel space, so
>     each task is assigned two ASIDs: one for user mappings and one for
>     kernel mappings
> 
>   * Our ASID moves into TTBR1 so that we can quickly switch between the
>     trampoline and kernel page tables
> 
>   * Switching TTBR0 always requires use of the zero page, so we can
>     dispense with some of our errata workaround code.
> 
>   * entry.S gets more complicated to read
> 
> The performance hit from this series isn't as bad as I feared: things
> like cyclictest and kernbench seem to be largely unaffected, although
> syscall micro-benchmarks appear to show that syscall overhead is roughly
> doubled, and this has an impact on things like hackbench which exhibits
> a ~10% hit due to its heavy context-switching.

Do you have performance benchmark numbers on CPUs with the Falkor
errata? I'm interested to see how much the TLB invalidate hurts
heavy context-switching workloads on these CPUs.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 14/18] arm64: erratum: Work around Falkor erratum #E1003 in trampoline code
  2017-11-17 18:21 ` [PATCH 14/18] arm64: erratum: Work around Falkor erratum #E1003 in trampoline code Will Deacon
@ 2017-11-18  0:27   ` Stephen Boyd
  2017-11-20 18:05     ` Will Deacon
  0 siblings, 1 reply; 45+ messages in thread
From: Stephen Boyd @ 2017-11-18  0:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, dave.hansen, keescook

On 11/17, Will Deacon wrote:
> We rely on an atomic swizzling of TTBR1 when transitioning from the entry
> trampoline to the kernel proper on an exception. We can't rely on this
> atomicity in the face of Falkor erratum #E1003, so on affected cores we
> can issue a TLB invalidation prior to jumping into the kernel. There is
> still the possibility of a TLB conflict here due to conflicting walk
> cache entries, but this doesn't appear to be the case on these CPUs in
> practice.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/Kconfig        | 17 +++++------------
>  arch/arm64/kernel/entry.S |  8 ++++++++
>  2 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 0df64a6a56d4..f0fcbfc2262e 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -504,20 +504,13 @@ config CAVIUM_ERRATUM_30115
>  config QCOM_FALKOR_ERRATUM_1003
>  	bool "Falkor E1003: Incorrect translation due to ASID change"
>  	default y
> -	select ARM64_PAN if ARM64_SW_TTBR0_PAN

Cool, this sort of complicates the backport of the Kryo MIDR
update of this errata to stable trees though.

> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index a839b94bba05..a600879939ce 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -941,6 +941,14 @@ __ni_sys_trace:
>  	sub	\tmp, \tmp, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE)
>  	bic	\tmp, \tmp, #USER_ASID_FLAG
>  	msr	ttbr1_el1, \tmp
> +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003

Shouldn't we put this inside an #ifdef QCOM_FALKOR_ERRATUM_1003
so that we don't even emit nops in case we have the errata
disabled? Or did I miss something in the alternatives assembly
code?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (18 preceding siblings ...)
  2017-11-18  0:19 ` [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Stephen Boyd
@ 2017-11-18 15:25 ` Ard Biesheuvel
  2017-11-20 18:06   ` Will Deacon
  2017-11-20 22:50 ` Laura Abbott
  2017-11-22 16:19 ` Pavel Machek
  21 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-18 15:25 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, Catalin Marinas, Mark Rutland,
	Stephen Boyd, Dave Hansen, Kees Cook

On 17 November 2017 at 18:21, Will Deacon <will.deacon@arm.com> wrote:
> Hi all,
>
> This patch series implements something along the lines of KAISER for arm64:
>
>   https://gruss.cc/files/kaiser.pdf
>
> although I wrote this from scratch because the paper has some funny
> assumptions about how the architecture works. There is a patch series
> in review for x86, which follows a similar approach:
>
>   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
>
> and the topic was recently covered by LWN (currently subscriber-only):
>
>   https://lwn.net/Articles/738975/
>
> The basic idea is that transitions to and from userspace are proxied
> through a trampoline page which is mapped into a separate page table and
> can switch the full kernel mapping in and out on exception entry and
> exit respectively. This is a valuable defence against various KASLR and
> timing attacks, particularly as the trampoline page is at a fixed virtual
> address and therefore the kernel text can be randomized independently.
>
> The major consequences of the trampoline are:
>
>   * We can no longer make use of global mappings for kernel space, so
>     each task is assigned two ASIDs: one for user mappings and one for
>     kernel mappings
>
>   * Our ASID moves into TTBR1 so that we can quickly switch between the
>     trampoline and kernel page tables
>
>   * Switching TTBR0 always requires use of the zero page, so we can
>     dispense with some of our errata workaround code.
>
>   * entry.S gets more complicated to read
>
> The performance hit from this series isn't as bad as I feared: things
> like cyclictest and kernbench seem to be largely unaffected, although
> syscall micro-benchmarks appear to show that syscall overhead is roughly
> doubled, and this has an impact on things like hackbench which exhibits
> a ~10% hit due to its heavy context-switching.
>
> Patches based on 4.14 and also pushed here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git kaiser
>
> Feedback welcome,
>
> Will
>

Very nice! I am quite pleased, because this makes KASLR much more
useful than it is now.

My main question is why we need a separate trampoline vector table: it
seems to me that with some minor surgery (as proposed below), we can
make the kernel_ventry macro instantiations tolerant for being loaded
somewhere in the fixmap (which I think is a better place for this than
at the base of the VMALLOC space), removing the need to change
vbar_el1 back and forth. The only downside is that exceptions taken
from EL1 will also use absolute addressing, but I don't think that is
a huge price to pay.

-------------->8------------------
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index f8ce4cdd3bb5..7f89ebc690b1 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -71,6 +71,20 @@

  .macro kernel_ventry, el, label, regsize = 64
  .align 7
+alternative_if_not ARM64_MAP_KERNEL_AT_EL0
+ .if \regsize == 64
+ msr tpidrro_el0, x30 // preserve x30
+ .endif
+ .if \el == 0
+ mrs x30, ttbr1_el1
+ sub x30, x30, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE)
+ bic x30, x30, #USER_ASID_FLAG
+ msr ttbr1_el1, x30
+ isb
+ .endif
+ ldr x30, =el\()\el\()_\label
+alternative_else_nop_endif
+
  sub sp, sp, #S_FRAME_SIZE
 #ifdef CONFIG_VMAP_STACK
  /*
@@ -82,7 +96,11 @@
  tbnz x0, #THREAD_SHIFT, 0f
  sub x0, sp, x0 // x0'' = sp' - x0' = (sp + x0) - sp = x0
  sub sp, sp, x0 // sp'' = sp' - x0 = (sp + x0) - x0 = sp
+alternative_if_not ARM64_MAP_KERNEL_AT_EL0
+ br x30
+alternative_else
  b el\()\el\()_\label
+alternative_endif

 0:
  /*
@@ -91,6 +109,10 @@
  * userspace, and can clobber EL0 registers to free up GPRs.
  */

+alternative_if_not ARM64_MAP_KERNEL_AT_EL0
+ mrs x30, tpidrro_el0 // restore x30
+alternative_else_nop_endif
+
  /* Stash the original SP (minus S_FRAME_SIZE) in tpidr_el0. */
  msr tpidr_el0, x0

@@ -98,8 +120,11 @@
  sub x0, sp, x0
  msr tpidrro_el0, x0

- /* Switch to the overflow stack */
- adr_this_cpu sp, overflow_stack + OVERFLOW_STACK_SIZE, x0
+ /* Switch to the overflow stack of this CPU */
+ ldr x0, =overflow_stack + OVERFLOW_STACK_SIZE
+ mov sp, x0
+ mrs x0, tpidr_el1
+ add sp, sp, x0

  /*
  * Check whether we were already on the overflow stack. This may happen
@@ -108,19 +133,30 @@
  mrs x0, tpidr_el0 // sp of interrupted context
  sub x0, sp, x0 // delta with top of overflow stack
  tst x0, #~(OVERFLOW_STACK_SIZE - 1) // within range?
- b.ne __bad_stack // no? -> bad stack pointer
+ b.eq 1f
+ ldr x0, =__bad_stack // no? -> bad stack pointer
+ br x0

  /* We were already on the overflow stack. Restore sp/x0 and carry on. */
- sub sp, sp, x0
+1: sub sp, sp, x0
  mrs x0, tpidrro_el0
 #endif
+alternative_if_not ARM64_MAP_KERNEL_AT_EL0
+ br x30
+alternative_else
  b el\()\el\()_\label
+alternative_endif
  .endm

- .macro kernel_entry, el, regsize = 64
+ .macro kernel_entry, el, regsize = 64, restore_x30 = 1
  .if \regsize == 32
  mov w0, w0 // zero upper 32 bits of x0
  .endif
+ .if \restore_x30
+alternative_if_not ARM64_MAP_KERNEL_AT_EL0
+ mrs x30, tpidrro_el0 // restore x30
+alternative_else_nop_endif
+ .endif
  stp x0, x1, [sp, #16 * 0]
  stp x2, x3, [sp, #16 * 1]
  stp x4, x5, [sp, #16 * 2]
@@ -363,7 +399,7 @@ tsk .req x28 // current thread_info
  */
  .pushsection ".entry.text", "ax"

- .align 11
+ .align PAGE_SHIFT
 ENTRY(vectors)
  kernel_ventry 1, sync_invalid // Synchronous EL1t
  kernel_ventry 1, irq_invalid // IRQ EL1t
@@ -391,6 +427,8 @@ ENTRY(vectors)
  kernel_ventry 0, fiq_invalid, 32 // FIQ 32-bit EL0
  kernel_ventry 0, error_invalid, 32 // Error 32-bit EL0
 #endif
+ .ltorg
+ .align PAGE_SHIFT
 END(vectors)

 #ifdef CONFIG_VMAP_STACK
@@ -408,7 +446,7 @@ __bad_stack:
  * S_FRAME_SIZE) was stashed in tpidr_el0 by kernel_ventry.
  */
  sub sp, sp, #S_FRAME_SIZE
- kernel_entry 1
+ kernel_entry 1, restore_x30=0
  mrs x0, tpidr_el0
  add x0, x0, #S_FRAME_SIZE
  str x0, [sp, #S_SP]

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-18  0:19 ` [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Stephen Boyd
@ 2017-11-20 18:03   ` Will Deacon
  0 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-20 18:03 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, dave.hansen, keescook

On Fri, Nov 17, 2017 at 04:19:35PM -0800, Stephen Boyd wrote:
> On 11/17, Will Deacon wrote:
> > Hi all,
> > 
> > This patch series implements something along the lines of KAISER for arm64:
> > 
> >   https://gruss.cc/files/kaiser.pdf
> > 
> > although I wrote this from scratch because the paper has some funny
> > assumptions about how the architecture works. There is a patch series
> > in review for x86, which follows a similar approach:
> > 
> >   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
> > 
> > and the topic was recently covered by LWN (currently subscriber-only):
> > 
> >   https://lwn.net/Articles/738975/
> > 
> > The basic idea is that transitions to and from userspace are proxied
> > through a trampoline page which is mapped into a separate page table and
> > can switch the full kernel mapping in and out on exception entry and
> > exit respectively. This is a valuable defence against various KASLR and
> > timing attacks, particularly as the trampoline page is at a fixed virtual
> > address and therefore the kernel text can be randomized independently.
> > 
> > The major consequences of the trampoline are:
> > 
> >   * We can no longer make use of global mappings for kernel space, so
> >     each task is assigned two ASIDs: one for user mappings and one for
> >     kernel mappings
> > 
> >   * Our ASID moves into TTBR1 so that we can quickly switch between the
> >     trampoline and kernel page tables
> > 
> >   * Switching TTBR0 always requires use of the zero page, so we can
> >     dispense with some of our errata workaround code.
> > 
> >   * entry.S gets more complicated to read
> > 
> > The performance hit from this series isn't as bad as I feared: things
> > like cyclictest and kernbench seem to be largely unaffected, although
> > syscall micro-benchmarks appear to show that syscall overhead is roughly
> > doubled, and this has an impact on things like hackbench which exhibits
> > a ~10% hit due to its heavy context-switching.
> 
> Do you have performance benchmark numbers on CPUs with the Falkor
> errata? I'm interested to see how much the TLB invalidate hurts
> heavy context-switching workloads on these CPUs.

I don't, but I'm also not sure what I can do about it if it's an issue.

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 14/18] arm64: erratum: Work around Falkor erratum #E1003 in trampoline code
  2017-11-18  0:27   ` Stephen Boyd
@ 2017-11-20 18:05     ` Will Deacon
  0 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-20 18:05 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, dave.hansen, keescook

On Fri, Nov 17, 2017 at 04:27:14PM -0800, Stephen Boyd wrote:
> On 11/17, Will Deacon wrote:
> > We rely on an atomic swizzling of TTBR1 when transitioning from the entry
> > trampoline to the kernel proper on an exception. We can't rely on this
> > atomicity in the face of Falkor erratum #E1003, so on affected cores we
> > can issue a TLB invalidation prior to jumping into the kernel. There is
> > still the possibility of a TLB conflict here due to conflicting walk
> > cache entries, but this doesn't appear to be the case on these CPUs in
> > practice.
> > 
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/arm64/Kconfig        | 17 +++++------------
> >  arch/arm64/kernel/entry.S |  8 ++++++++
> >  2 files changed, 13 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 0df64a6a56d4..f0fcbfc2262e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -504,20 +504,13 @@ config CAVIUM_ERRATUM_30115
> >  config QCOM_FALKOR_ERRATUM_1003
> >  	bool "Falkor E1003: Incorrect translation due to ASID change"
> >  	default y
> > -	select ARM64_PAN if ARM64_SW_TTBR0_PAN
> 
> Cool, this sort of complicates the backport of the Kryo MIDR
> update of this errata to stable trees though.

Yeah, you may have to do a separate version for -stable if you don't
want to backport parts of this series.

> > diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> > index a839b94bba05..a600879939ce 100644
> > --- a/arch/arm64/kernel/entry.S
> > +++ b/arch/arm64/kernel/entry.S
> > @@ -941,6 +941,14 @@ __ni_sys_trace:
> >  	sub	\tmp, \tmp, #(SWAPPER_DIR_SIZE + RESERVED_TTBR0_SIZE)
> >  	bic	\tmp, \tmp, #USER_ASID_FLAG
> >  	msr	ttbr1_el1, \tmp
> > +alternative_if ARM64_WORKAROUND_QCOM_FALKOR_E1003
> 
> Shouldn't we put this inside an #ifdef QCOM_FALKOR_ERRATUM_1003
> so that we don't even emit nops in case we have the errata
> disabled? Or did I miss something in the alternatives assembly
> code?

Yes, you're right. Thanks!

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-18 15:25 ` Ard Biesheuvel
@ 2017-11-20 18:06   ` Will Deacon
  2017-11-20 18:20     ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2017-11-20 18:06 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, Catalin Marinas, Mark Rutland,
	Stephen Boyd, Dave Hansen, Kees Cook

Hi Ard,

Cheers for having a look.

On Sat, Nov 18, 2017 at 03:25:06PM +0000, Ard Biesheuvel wrote:
> On 17 November 2017 at 18:21, Will Deacon <will.deacon@arm.com> wrote:
> > This patch series implements something along the lines of KAISER for arm64:
> 
> Very nice! I am quite pleased, because this makes KASLR much more
> useful than it is now.

Agreed. I might actually start enabling now ;)

> My main question is why we need a separate trampoline vector table: it
> seems to me that with some minor surgery (as proposed below), we can
> make the kernel_ventry macro instantiations tolerant for being loaded
> somewhere in the fixmap (which I think is a better place for this than
> at the base of the VMALLOC space), removing the need to change
> vbar_el1 back and forth. The only downside is that exceptions taken
> from EL1 will also use absolute addressing, but I don't think that is
> a huge price to pay.

I think there are two aspects to this:

1. Moving the vectors to the fixmap
2. Avoiding the vbar toggle

I think (1) is a good idea, so I'll hack that up for v2. The vbar toggle
isn't as obvious: avoiding it adds some overhead to EL1 irq entry because
we're writing tpidrro_el0 as well as loading from the literal pool. I think
that it also makes the code more difficult to reason about because we'd have
to make sure we don't try to use the fixmap mapping before it's actually
mapped, which I think would mean we'd need a set of early vectors that we
then switch away from in a CPU hotplug notifier or something.

I'll see if I can measure the cost of the current vbar switching to get
an idea of the potential performance available.

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-20 18:06   ` Will Deacon
@ 2017-11-20 18:20     ` Ard Biesheuvel
  2017-11-22 19:37       ` Will Deacon
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-20 18:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, Catalin Marinas, Mark Rutland,
	Stephen Boyd, Dave Hansen, Kees Cook

On 20 November 2017 at 18:06, Will Deacon <will.deacon@arm.com> wrote:
> Hi Ard,
>
> Cheers for having a look.
>
> On Sat, Nov 18, 2017 at 03:25:06PM +0000, Ard Biesheuvel wrote:
>> On 17 November 2017 at 18:21, Will Deacon <will.deacon@arm.com> wrote:
>> > This patch series implements something along the lines of KAISER for arm64:
>>
>> Very nice! I am quite pleased, because this makes KASLR much more
>> useful than it is now.
>
> Agreed. I might actually start enabling now ;)
>

I think it makes more sense to have it enabled on your phone than on the
devboard on your desk.

>> My main question is why we need a separate trampoline vector table: it
>> seems to me that with some minor surgery (as proposed below), we can
>> make the kernel_ventry macro instantiations tolerant for being loaded
>> somewhere in the fixmap (which I think is a better place for this than
>> at the base of the VMALLOC space), removing the need to change
>> vbar_el1 back and forth. The only downside is that exceptions taken
>> from EL1 will also use absolute addressing, but I don't think that is
>> a huge price to pay.
>
> I think there are two aspects to this:
>
> 1. Moving the vectors to the fixmap
> 2. Avoiding the vbar toggle
>
> I think (1) is a good idea, so I'll hack that up for v2. The vbar toggle
> isn't as obvious: avoiding it adds some overhead to EL1 irq entry because
> we're writing tpidrro_el0 as well as loading from the literal pool.

Yeah, but in what workloads are interrupts taken while running in the
kernel a dominant factor?

> I think
> that it also makes the code more difficult to reason about because we'd have
> to make sure we don't try to use the fixmap mapping before it's actually
> mapped, which I think would mean we'd need a set of early vectors that we
> then switch away from in a CPU hotplug notifier or something.
>

I don't think this is necessary. The vector page with absolute
addressing would tolerate being accessed via its natural mapping
inside the kernel image as well as via the mapping in the fixmap
region.

> I'll see if I can measure the cost of the current vbar switching to get
> an idea of the potential performance available.
>

Yeah, makes sense. If the bulk of the performance hit is elsewhere,
there's no point in focusing on this bit.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (19 preceding siblings ...)
  2017-11-18 15:25 ` Ard Biesheuvel
@ 2017-11-20 22:50 ` Laura Abbott
  2017-11-22 19:37   ` Will Deacon
  2017-11-22 16:19 ` Pavel Machek
  21 siblings, 1 reply; 45+ messages in thread
From: Laura Abbott @ 2017-11-20 22:50 UTC (permalink / raw)
  To: Will Deacon, linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook

On 11/17/2017 10:21 AM, Will Deacon wrote:
> Hi all,
> 
> This patch series implements something along the lines of KAISER for arm64:
> 
>    https://gruss.cc/files/kaiser.pdf
> 
> although I wrote this from scratch because the paper has some funny
> assumptions about how the architecture works. There is a patch series
> in review for x86, which follows a similar approach:
> 
>    http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
> 
> and the topic was recently covered by LWN (currently subscriber-only):
> 
>    https://lwn.net/Articles/738975/
> 
> The basic idea is that transitions to and from userspace are proxied
> through a trampoline page which is mapped into a separate page table and
> can switch the full kernel mapping in and out on exception entry and
> exit respectively. This is a valuable defence against various KASLR and
> timing attacks, particularly as the trampoline page is at a fixed virtual
> address and therefore the kernel text can be randomized independently.
> 
> The major consequences of the trampoline are:
> 
>    * We can no longer make use of global mappings for kernel space, so
>      each task is assigned two ASIDs: one for user mappings and one for
>      kernel mappings
> 
>    * Our ASID moves into TTBR1 so that we can quickly switch between the
>      trampoline and kernel page tables
> 
>    * Switching TTBR0 always requires use of the zero page, so we can
>      dispense with some of our errata workaround code.
> 
>    * entry.S gets more complicated to read
> 
> The performance hit from this series isn't as bad as I feared: things
> like cyclictest and kernbench seem to be largely unaffected, although
> syscall micro-benchmarks appear to show that syscall overhead is roughly
> doubled, and this has an impact on things like hackbench which exhibits
> a ~10% hit due to its heavy context-switching.
> 
> Patches based on 4.14 and also pushed here:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git kaiser
> 
> Feedback welcome,
> 
> Will
> 

Passed some basic tests on Hikey Android and my Mustang box. I'll
leave the Mustang building kernels for a few days. You're welcome
to add Tested-by or I can re-test on v2.

Thanks,
Laura

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
                   ` (20 preceding siblings ...)
  2017-11-20 22:50 ` Laura Abbott
@ 2017-11-22 16:19 ` Pavel Machek
  2017-11-22 19:37   ` Will Deacon
  2017-11-22 21:19   ` Ard Biesheuvel
  21 siblings, 2 replies; 45+ messages in thread
From: Pavel Machek @ 2017-11-22 16:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, sboyd, dave.hansen, keescook

Hi!

> This patch series implements something along the lines of KAISER for arm64:
> 
>   https://gruss.cc/files/kaiser.pdf
> 
> although I wrote this from scratch because the paper has some funny
> assumptions about how the architecture works. There is a patch series
> in review for x86, which follows a similar approach:
> 
>   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
> 
> and the topic was recently covered by LWN (currently subscriber-only):
> 
>   https://lwn.net/Articles/738975/
> 
> The basic idea is that transitions to and from userspace are proxied
> through a trampoline page which is mapped into a separate page table and
> can switch the full kernel mapping in and out on exception entry and
> exit respectively. This is a valuable defence against various KASLR and
> timing attacks, particularly as the trampoline page is at a fixed virtual
> address and therefore the kernel text can be randomized
> independently.

If I'm willing to do timing attacks to defeat KASLR... what prevents
me from using CPU caches to do that?

There was blackhat talk about exactly that IIRC...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 18/18] arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0
  2017-11-17 18:22 ` [PATCH 18/18] arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0 Will Deacon
@ 2017-11-22 16:52   ` Marc Zyngier
  2017-11-22 19:36     ` Will Deacon
  0 siblings, 1 reply; 45+ messages in thread
From: Marc Zyngier @ 2017-11-22 16:52 UTC (permalink / raw)
  To: Will Deacon, linux-arm-kernel
  Cc: linux-kernel, catalin.marinas, mark.rutland, ard.biesheuvel,
	sboyd, dave.hansen, keescook

Hi Will,

On 17/11/17 18:22, Will Deacon wrote:
> Add a Kconfig entry to control use of the entry trampoline, which allows
> us to unmap the kernel whilst running in userspace and improve the
> robustness of KASLR.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/Kconfig | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index f0fcbfc2262e..f99ffb88843a 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -796,6 +796,19 @@ config FORCE_MAX_ZONEORDER
>  	  However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
>  	  4M allocations matching the default size used by generic code.
>  
> +config UNMAP_KERNEL_AT_EL0
> +	bool "Unmap kernel when running in userspace (aka \"KAISER\")"
> +	default y
> +	help
> +	  Some attacks against KASLR make use of the timing difference between
> +	  a permission fault which could arise from a page table entry that is
> +	  present in the TLB, and a translation fault which always requires a
> +	  page table walk. This option defends against these attacks by unmapping
> +	  the kernel whilst running in userspace, therefore forcing translation
> +	  faults for all of kernel space.
> +
> +	  If unsure, say Y.
> +
>  menuconfig ARMV8_DEPRECATED
>  	bool "Emulate deprecated/obsolete ARMv8 instructions"
>  	depends on COMPAT
> 

Since this seems to be the recommended setting, I wonder if there is any
real value in keeping the old code around. My hunch is that the lack of
use in the field will make it fragile and that it will eventually bit-rot.

Do you have any plan to eventually drop the non-KAISER switch code?

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 18/18] arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0
  2017-11-22 16:52   ` Marc Zyngier
@ 2017-11-22 19:36     ` Will Deacon
  0 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-22 19:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, sboyd, dave.hansen, keescook

Hi Marc,

On Wed, Nov 22, 2017 at 04:52:36PM +0000, Marc Zyngier wrote:
> On 17/11/17 18:22, Will Deacon wrote:
> > Add a Kconfig entry to control use of the entry trampoline, which allows
> > us to unmap the kernel whilst running in userspace and improve the
> > robustness of KASLR.
> > 
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/arm64/Kconfig | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index f0fcbfc2262e..f99ffb88843a 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -796,6 +796,19 @@ config FORCE_MAX_ZONEORDER
> >  	  However for 4K, we choose a higher default value, 11 as opposed to 10, giving us
> >  	  4M allocations matching the default size used by generic code.
> >  
> > +config UNMAP_KERNEL_AT_EL0
> > +	bool "Unmap kernel when running in userspace (aka \"KAISER\")"
> > +	default y
> > +	help
> > +	  Some attacks against KASLR make use of the timing difference between
> > +	  a permission fault which could arise from a page table entry that is
> > +	  present in the TLB, and a translation fault which always requires a
> > +	  page table walk. This option defends against these attacks by unmapping
> > +	  the kernel whilst running in userspace, therefore forcing translation
> > +	  faults for all of kernel space.
> > +
> > +	  If unsure, say Y.
> > +
> >  menuconfig ARMV8_DEPRECATED
> >  	bool "Emulate deprecated/obsolete ARMv8 instructions"
> >  	depends on COMPAT
> > 
> 
> Since this seems to be the recommended setting, I wonder if there is any
> real value in keeping the old code around. My hunch is that the lack of
> use in the field will make it fragile and that it will eventually bit-rot.
> 
> Do you have any plan to eventually drop the non-KAISER switch code?

Good question. I think having a command-line option for this makes sense,
but I'd expect distributions to run with the config option enabled. That
basically leaves us with two changes we could make by forcing the option
to be enabled:

  1. The changes in the linker script could be made unconditional
  2. We could drop support for global mappings altogether

I'd be in favour of that because it avoids some conceptual complexity and
would allow us to rely on nG mappings in the future (there might be
something fun we can do with the percpu area, but there are dragons here).

I'd be interested in other people's opinions on this one.

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-20 22:50 ` Laura Abbott
@ 2017-11-22 19:37   ` Will Deacon
  0 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-22 19:37 UTC (permalink / raw)
  To: Laura Abbott
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, sboyd, dave.hansen, keescook

On Mon, Nov 20, 2017 at 02:50:58PM -0800, Laura Abbott wrote:
> On 11/17/2017 10:21 AM, Will Deacon wrote:
> >This patch series implements something along the lines of KAISER for arm64:
> 
> Passed some basic tests on Hikey Android and my Mustang box. I'll
> leave the Mustang building kernels for a few days. You're welcome
> to add Tested-by or I can re-test on v2.

Cheers, Laura. I've got a few changes for v2 based on Ard's feedback, so if
you could retest that when I post it then it would be much appreciated.

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-22 16:19 ` Pavel Machek
@ 2017-11-22 19:37   ` Will Deacon
  2017-11-22 22:36     ` Pavel Machek
  2017-11-22 21:19   ` Ard Biesheuvel
  1 sibling, 1 reply; 45+ messages in thread
From: Will Deacon @ 2017-11-22 19:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, sboyd, dave.hansen, keescook

On Wed, Nov 22, 2017 at 05:19:14PM +0100, Pavel Machek wrote:
> > This patch series implements something along the lines of KAISER for arm64:
> > 
> >   https://gruss.cc/files/kaiser.pdf
> > 
> > although I wrote this from scratch because the paper has some funny
> > assumptions about how the architecture works. There is a patch series
> > in review for x86, which follows a similar approach:
> > 
> >   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
> > 
> > and the topic was recently covered by LWN (currently subscriber-only):
> > 
> >   https://lwn.net/Articles/738975/
> > 
> > The basic idea is that transitions to and from userspace are proxied
> > through a trampoline page which is mapped into a separate page table and
> > can switch the full kernel mapping in and out on exception entry and
> > exit respectively. This is a valuable defence against various KASLR and
> > timing attacks, particularly as the trampoline page is at a fixed virtual
> > address and therefore the kernel text can be randomized
> > independently.
> 
> If I'm willing to do timing attacks to defeat KASLR... what prevents
> me from using CPU caches to do that?

Is that a rhetorical question? If not, then I'm probably not the best person
to answer it. All I'm doing here is protecting against a class of attacks on
kaslr that make use of the TLB/page-table walker to determine where the
kernel is mapped.

> There was blackhat talk about exactly that IIRC...

Got a link? I'd be interested to see how the idea works in case there's an
orthogonal defence against it.

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-20 18:20     ` Ard Biesheuvel
@ 2017-11-22 19:37       ` Will Deacon
  0 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-11-22 19:37 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: linux-arm-kernel, linux-kernel, Catalin Marinas, Mark Rutland,
	Stephen Boyd, Dave Hansen, Kees Cook

On Mon, Nov 20, 2017 at 06:20:39PM +0000, Ard Biesheuvel wrote:
> On 20 November 2017 at 18:06, Will Deacon <will.deacon@arm.com> wrote:
> > I'll see if I can measure the cost of the current vbar switching to get
> > an idea of the potential performance available.
> >
> 
> Yeah, makes sense. If the bulk of the performance hit is elsewhere,
> there's no point in focusing on this bit.

I had a go at implementing a variant on your suggestion where we avoid
swizzling the vbar on exception entry/exit but I couldn't reliably measure a
difference in performance. It appears that the ISB needed by the TTBR change
is dominant, so the vbar write is insignificant.
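
A rough sketch of the kind of syscall micro-benchmark this refers to (the
iteration count and the use of syscall(SYS_getpid) with clock_gettime()
are assumptions for illustration, not the actual harness used here):

#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define ITERS	1000000

int main(void)
{
	struct timespec start, end;
	uint64_t ns;
	long i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < ITERS; i++)
		syscall(SYS_getpid);	/* cheap round trip through the entry/exit path */
	clock_gettime(CLOCK_MONOTONIC, &end);

	ns = (uint64_t)(end.tv_sec - start.tv_sec) * 1000000000ull +
	     (end.tv_nsec - start.tv_nsec);
	printf("%.1f ns per syscall\n", (double)ns / ITERS);
	return 0;
}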

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-22 16:19 ` Pavel Machek
  2017-11-22 19:37   ` Will Deacon
@ 2017-11-22 21:19   ` Ard Biesheuvel
  2017-11-22 22:33     ` Pavel Machek
  1 sibling, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-22 21:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

On 22 November 2017 at 16:19, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> This patch series implements something along the lines of KAISER for arm64:
>>
>>   https://gruss.cc/files/kaiser.pdf
>>
>> although I wrote this from scratch because the paper has some funny
>> assumptions about how the architecture works. There is a patch series
>> in review for x86, which follows a similar approach:
>>
>>   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
>>
>> and the topic was recently covered by LWN (currently subscriber-only):
>>
>>   https://lwn.net/Articles/738975/
>>
>> The basic idea is that transitions to and from userspace are proxied
>> through a trampoline page which is mapped into a separate page table and
>> can switch the full kernel mapping in and out on exception entry and
>> exit respectively. This is a valuable defence against various KASLR and
>> timing attacks, particularly as the trampoline page is at a fixed virtual
>> address and therefore the kernel text can be randomized
>> independently.
>
> If I'm willing to do timing attacks to defeat KASLR... what prevents
> me from using CPU caches to do that?
>

Because it is impossible to get a cache hit on an access to an unmapped address?

> There was blackhat talk about exactly that IIRC...
>                                                                         Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-22 21:19   ` Ard Biesheuvel
@ 2017-11-22 22:33     ` Pavel Machek
  2017-11-22 23:19       ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2017-11-22 22:33 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

On Wed 2017-11-22 21:19:28, Ard Biesheuvel wrote:
> On 22 November 2017 at 16:19, Pavel Machek <pavel@ucw.cz> wrote:
> > Hi!
> >
> >> This patch series implements something along the lines of KAISER for arm64:
> >>
> >>   https://gruss.cc/files/kaiser.pdf
> >>
> >> although I wrote this from scratch because the paper has some funny
> >> assumptions about how the architecture works. There is a patch series
> >> in review for x86, which follows a similar approach:
> >>
> >>   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
> >>
> >> and the topic was recently covered by LWN (currently subscriber-only):
> >>
> >>   https://lwn.net/Articles/738975/
> >>
> >> The basic idea is that transitions to and from userspace are proxied
> >> through a trampoline page which is mapped into a separate page table and
> >> can switch the full kernel mapping in and out on exception entry and
> >> exit respectively. This is a valuable defence against various KASLR and
> >> timing attacks, particularly as the trampoline page is at a fixed virtual
> >> address and therefore the kernel text can be randomized
> >> independently.
> >
> > If I'm willing to do timing attacks to defeat KASLR... what prevents
> > me from using CPU caches to do that?
> >
> 
> Because it is impossible to get a cache hit on an access to an
> unmapped address?

Um, no, I don't need to be able to access kernel addresses directly. I
just put some data in the _same place in the cache where kernel data
would go_, then do a syscall and check whether my data is still cached.
Caches don't have infinite associativity.
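
A minimal user-space sketch of that prime+probe measurement could look
like the following (the eviction buffer size, the 64-byte line size and
the use of clock_gettime()/SYS_getpid are assumptions for illustration,
not a tuned attack):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define EVICT_SIZE	(8 * 1024 * 1024)	/* assumed larger than the last-level cache */
#define LINE		64			/* assumed cache line size */

static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
}

int main(void)
{
	volatile char *buf = malloc(EVICT_SIZE);
	uint64_t t0, t1;
	size_t i;

	if (!buf)
		return 1;

	/* Prime: fill the cache sets with our own lines. */
	for (i = 0; i < EVICT_SIZE; i += LINE)
		buf[i]++;

	/* Enter the kernel; its accesses may evict some of those lines. */
	syscall(SYS_getpid);

	/* Probe: time how long it takes to touch the same lines again. */
	t0 = now_ns();
	for (i = 0; i < EVICT_SIZE; i += LINE)
		buf[i]++;
	t1 = now_ns();

	printf("probe took %llu ns\n", (unsigned long long)(t1 - t0));
	return 0;
}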

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-22 19:37   ` Will Deacon
@ 2017-11-22 22:36     ` Pavel Machek
  0 siblings, 0 replies; 45+ messages in thread
From: Pavel Machek @ 2017-11-22 22:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-arm-kernel, linux-kernel, catalin.marinas, mark.rutland,
	ard.biesheuvel, sboyd, dave.hansen, keescook

On Wed 2017-11-22 19:37:14, Will Deacon wrote:
> On Wed, Nov 22, 2017 at 05:19:14PM +0100, Pavel Machek wrote:
> > > This patch series implements something along the lines of KAISER for arm64:
> > > 
> > >   https://gruss.cc/files/kaiser.pdf
> > > 
> > > although I wrote this from scratch because the paper has some funny
> > > assumptions about how the architecture works. There is a patch series
> > > in review for x86, which follows a similar approach:
> > > 
> > >   http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
> > > 
> > > and the topic was recently covered by LWN (currently subscriber-only):
> > > 
> > >   https://lwn.net/Articles/738975/
> > > 
> > > The basic idea is that transitions to and from userspace are proxied
> > > through a trampoline page which is mapped into a separate page table and
> > > can switch the full kernel mapping in and out on exception entry and
> > > exit respectively. This is a valuable defence against various KASLR and
> > > timing attacks, particularly as the trampoline page is at a fixed virtual
> > > address and therefore the kernel text can be randomized
> > > independently.
> > 
> > If I'm willing to do timing attacks to defeat KASLR... what prevents
> > me from using CPU caches to do that?
> 
> Is that a rhetorical question? If not, then I'm probably not the best person
> to answer it. All I'm doing here is protecting against a class of attacks on
> kaslr that make use of the TLB/page-table walker to determine where the
> kernel is mapped.

Yeah. What I'm saying is that I can use cache effects to probe where
the kernel is mapped (and what it is doing).

> > There was blackhat talk about exactly that IIRC...
> 
> Got a link? I'd be interested to see how the idea works in case there's an
> orthogonal defence against it.

https://www.youtube.com/watch?v=9KsnFWejpQg

(Tell me if it is not the right one).

As for defenses... yes. "maxcpus=1" and flushing the caches on every
switch to usermode will do the trick :-).

Ok, so that was sarcastic. I'm not sure a good defense exists. ARM is
better than i386 because reading the time and flushing the caches are
privileged, but...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-22 22:33     ` Pavel Machek
@ 2017-11-22 23:19       ` Ard Biesheuvel
  2017-11-22 23:37         ` Pavel Machek
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-22 23:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook


> On 22 Nov 2017, at 22:33, Pavel Machek <pavel@ucw.cz> wrote:
> 
>> On Wed 2017-11-22 21:19:28, Ard Biesheuvel wrote:
>>> On 22 November 2017 at 16:19, Pavel Machek <pavel@ucw.cz> wrote:
>>> Hi!
>>> 
>>>> This patch series implements something along the lines of KAISER for arm64:
>>>> 
>>>>  https://gruss.cc/files/kaiser.pdf
>>>> 
>>>> although I wrote this from scratch because the paper has some funny
>>>> assumptions about how the architecture works. There is a patch series
>>>> in review for x86, which follows a similar approach:
>>>> 
>>>>  http://lkml.kernel.org/r/<20171110193058.BECA7D88@viggo.jf.intel.com>
>>>> 
>>>> and the topic was recently covered by LWN (currently subscriber-only):
>>>> 
>>>>  https://lwn.net/Articles/738975/
>>>> 
>>>> The basic idea is that transitions to and from userspace are proxied
>>>> through a trampoline page which is mapped into a separate page table and
>>>> can switch the full kernel mapping in and out on exception entry and
>>>> exit respectively. This is a valuable defence against various KASLR and
>>>> timing attacks, particularly as the trampoline page is at a fixed virtual
>>>> address and therefore the kernel text can be randomized
>>>> independently.
>>> 
>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
>>> me from using CPU caches to do that?
>>> 
>> 
>> Because it is impossible to get a cache hit on an access to an
>> unmapped address?
> 
> Um, no, I don't need to be able to directly access kernel addresses. I
> just put some data in _same place in cache where kernel data would
> go_, then do syscall and look if my data are still cached. Caches
> don't have infinite associativity.
> 

Ah ok. Interesting.

But how does that leak address bits that are covered by the tag?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-22 23:19       ` Ard Biesheuvel
@ 2017-11-22 23:37         ` Pavel Machek
  2017-11-23  6:51           ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2017-11-22 23:37 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

[-- Attachment #1: Type: text/plain, Size: 945 bytes --]

Hi!

> >>> If I'm willing to do timing attacks to defeat KASLR... what prevents
> >>> me from using CPU caches to do that?
> >>> 
> >> 
> >> Because it is impossible to get a cache hit on an access to an
> >> unmapped address?
> > 
> > Um, no, I don't need to be able to directly access kernel addresses. I
> > just put some data in _same place in cache where kernel data would
> > go_, then do syscall and look if my data are still cached. Caches
> > don't have infinite associativity.
> > 
> 
> Ah ok. Interesting.
> 
> But how does that leak address bits that are covered by the tag?

Same as leaking any other address bits? Caches are "virtually
indexed", and tag does not come into play...

Maybe this explains it?

https://www.youtube.com/watch?v=9KsnFWejpQg
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-22 23:37         ` Pavel Machek
@ 2017-11-23  6:51           ` Ard Biesheuvel
  2017-11-23  9:07             ` Pavel Machek
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-23  6:51 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook



> On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
> 
> Hi!
> 
>>>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
>>>>> me from using CPU caches to do that?
>>>>> 
>>>> 
>>>> Because it is impossible to get a cache hit on an access to an
>>>> unmapped address?
>>> 
>>> Um, no, I don't need to be able to directly access kernel addresses. I
>>> just put some data in _same place in cache where kernel data would
>>> go_, then do syscall and look if my data are still cached. Caches
>>> don't have infinite associativity.
>>> 
>> 
>> Ah ok. Interesting.
>> 
>> But how does that leak address bits that are covered by the tag?
> 
> Same as leaking any other address bits? Caches are "virtually
> indexed",

Not on arm64, although I don’t see how that is relevant if you are trying to defeat kaslr.

> and tag does not come into play...
> 

Well, I must be missing something then, because I don’t see how knowledge about which userland address shares a cache way with a kernel address can leak anything beyond the bits that make up the index (i.e., which cache way is being shared)

> Maybe this explains it?
> 

No not really. It explains how cache timing can be used as a side channel, not how it defeats kaslr.

Thanks,
Ard.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-23  6:51           ` Ard Biesheuvel
@ 2017-11-23  9:07             ` Pavel Machek
  2017-11-23  9:23               ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2017-11-23  9:07 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]

Hi!

> > On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
> > 
> > Hi!
> > 
> >>>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
> >>>>> me from using CPU caches to do that?
> >>>>> 
> >>>> 
> >>>> Because it is impossible to get a cache hit on an access to an
> >>>> unmapped address?
> >>> 
> >>> Um, no, I don't need to be able to directly access kernel addresses. I
> >>> just put some data in _same place in cache where kernel data would
> >>> go_, then do syscall and look if my data are still cached. Caches
> >>> don't have infinite associativity.
> >>> 
> >> 
> >> Ah ok. Interesting.
> >> 
> >> But how does that leak address bits that are covered by the tag?
> > 
> > Same as leaking any other address bits? Caches are "virtually
> > indexed",
> 
> Not on arm64, although I don’t see how that is relevant if you are trying to defeat kaslr.
> 
> > and tag does not come into play...
> > 
> 
> Well, I must be missing something then, because I don’t see how knowledge about which userland address shares a cache way with a kernel address can leak anything beyond the bits that make up the index (i.e., which cache way is being shared)
> 

Well, KASLR is about keeping bits of kernel virtual address secret
from userland. Leaking them through cache sidechannel means KASLR is
defeated.


> > Maybe this explains it?
> > 
> 
> No not really. It explains how cache timing can be used as a side channel, not how it defeats kaslr.

Ok, look at this one:

https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX-wp.pdf

You can use timing instead of TSX, right?
     	 	    	     	       	      	     	       	    Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-23  9:07             ` Pavel Machek
@ 2017-11-23  9:23               ` Ard Biesheuvel
  2017-11-23 10:46                 ` Pavel Machek
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-23  9:23 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

On 23 November 2017 at 09:07, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> > On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
>> >
>> > Hi!
>> >
>> >>>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
>> >>>>> me from using CPU caches to do that?
>> >>>>>
>> >>>>
>> >>>> Because it is impossible to get a cache hit on an access to an
>> >>>> unmapped address?
>> >>>
>> >>> Um, no, I don't need to be able to directly access kernel addresses. I
>> >>> just put some data in _same place in cache where kernel data would
>> >>> go_, then do syscall and look if my data are still cached. Caches
>> >>> don't have infinite associativity.
>> >>>
>> >>
>> >> Ah ok. Interesting.
>> >>
>> >> But how does that leak address bits that are covered by the tag?
>> >
>> > Same as leaking any other address bits? Caches are "virtually
>> > indexed",
>>
>> Not on arm64, although I don’t see how that is relevant if you are trying to defeat kaslr.
>>
>> > and tag does not come into play...
>> >
>>
>> Well, I must be missing something then, because I don’t see how knowledge about which userland address shares a cache way with a kernel address can leak anything beyond the bits that make up the index (i.e., which cache way is being shared)
>>
>
> Well, KASLR is about keeping bits of kernel virtual address secret
> from userland. Leaking them through cache sidechannel means KASLR is
> defeated.
>

Yes, that is what you claim. But you are not explaining how any of the
bits that we do want to keep secret can be discovered by making
inferences from which lines in a primed cache were evicted during a
syscall.

The cache index maps to low order bits. You can use this, e.g., to
attack table based AES, because there is only ~4 KB worth of tables,
and you are interested in finding out which exact entries of the table
were read by the process under attack.

You are saying the same approach will help you discover 30 high order
bits of a virtual kernel address, by observing the cache evictions in
a physically indexed physically tagged cache. How?
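
To put numbers on that argument, a small sketch (the L2 geometry of 64-byte lines and 2048 sets, i.e. a 2 MiB 16-way cache, is an assumption rather than any particular CPU): the set index covers only bits [16:6] of the physical address, so two addresses that differ only above bit 16 produce the same eviction pattern, and the high-order bits KASLR randomizes never reach it.

/*
 * Which address bits can a set index expose?
 */
#include <stdint.h>
#include <stdio.h>

#define LINE_SHIFT	6	/* 64-byte lines	*/
#define NSETS		2048	/* assumed geometry	*/

static unsigned int set_index(uint64_t addr)
{
	return (addr >> LINE_SHIFT) & (NSETS - 1);
}

int main(void)
{
	uint64_t a = 0x40000000aa40ull;	/* made-up addresses that differ */
	uint64_t b = 0x7f000000aa40ull;	/* only above bit 16             */

	printf("set(a)=%u set(b)=%u\n", set_index(a), set_index(b));
	return 0;
}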

>
>> > Maybe this explains it?
>> >
>>
>> No not really. It explains how cache timing can be used as a side channel, not how it defeats kaslr.
>
> Ok, look at this one:
>
> https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX-wp.pdf
>
> You can use timing instead of TSX, right?

The TSX attack is TLB-based, not cache-based.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-23  9:23               ` Ard Biesheuvel
@ 2017-11-23 10:46                 ` Pavel Machek
  2017-11-23 11:38                   ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2017-11-23 10:46 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

[-- Attachment #1: Type: text/plain, Size: 2630 bytes --]

On Thu 2017-11-23 09:23:02, Ard Biesheuvel wrote:
> On 23 November 2017 at 09:07, Pavel Machek <pavel@ucw.cz> wrote:
> > Hi!
> >
> >> > On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
> >> >
> >> > Hi!
> >> >
> >> >>>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
> >> >>>>> me from using CPU caches to do that?
> >> >>>>>
> >> >>>>
> >> >>>> Because it is impossible to get a cache hit on an access to an
> >> >>>> unmapped address?
> >> >>>
> >> >>> Um, no, I don't need to be able to directly access kernel addresses. I
> >> >>> just put some data in _same place in cache where kernel data would
> >> >>> go_, then do syscall and look if my data are still cached. Caches
> >> >>> don't have infinite associativity.
> >> >>>
> >> >>
> >> >> Ah ok. Interesting.
> >> >>
> >> >> But how does that leak address bits that are covered by the tag?
> >> >
> >> > Same as leaking any other address bits? Caches are "virtually
> >> > indexed",
> >>
> >> Not on arm64, although I don’t see how that is relevant if you are trying to defeat kaslr.
> >>
> >> > and tag does not come into play...
> >> >
> >>
> >> Well, I must be missing something then, because I don’t see how knowledge about which userland address shares a cache way with a kernel address can leak anything beyond the bits that make up the index (i.e., which cache way is being shared)
> >>
> >
> > Well, KASLR is about keeping bits of kernel virtual address secret
> > from userland. Leaking them through cache sidechannel means KASLR is
> > defeated.
> >
> 
> Yes, that is what you claim. But you are not explaining how any of the
> bits that we do want to keep secret can be discovered by making
> inferences from which lines in a primed cache were evicted during a
> syscall.
> 
> The cache index maps to low order bits. You can use this, e.g., to
> attack table based AES, because there is only ~4 KB worth of tables,
> and you are interested in finding out which exact entries of the table
> were read by the process under attack.
> 
> You are saying the same approach will help you discover 30 high order
> bits of a virtual kernel address, by observing the cache evictions in
> a physically indexed physically tagged cache. How?

I assumed high bits are hashed into cache index. I might have been
wrong. Anyway, page tables are about same size as AES tables. So...:

http://cve.circl.lu/cve/CVE-2017-5927
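
For reference, a sketch of the property that work relies on, assuming an arm64 4 KiB granule with 4 levels of translation (the example address is made up): at each level the walker loads one 8-byte descriptor whose offset within the table page comes straight from 9 bits of the virtual address, so the cache line it touches (offset / 64) encodes 3 of those bits per level.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t va = 0xffff800008123456ull;	/* example address only */
	int shift[] = { 39, 30, 21, 12 };	/* level 0..3 index positions */

	for (int lvl = 0; lvl < 4; lvl++) {
		unsigned int idx    = (va >> shift[lvl]) & 0x1ff;	/* 9-bit table index */
		unsigned int offset = idx * 8;				/* descriptor offset */

		printf("level %d: index %3u -> line %2u of the table page\n",
		       lvl, idx, offset / 64);
	}
	return 0;
}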

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-23 10:46                 ` Pavel Machek
@ 2017-11-23 11:38                   ` Ard Biesheuvel
  2017-11-23 17:54                     ` Pavel Machek
  0 siblings, 1 reply; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-23 11:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

On 23 November 2017 at 10:46, Pavel Machek <pavel@ucw.cz> wrote:
> On Thu 2017-11-23 09:23:02, Ard Biesheuvel wrote:
>> On 23 November 2017 at 09:07, Pavel Machek <pavel@ucw.cz> wrote:
>> > Hi!
>> >
>> >> > On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
>> >> >
>> >> > Hi!
>> >> >
>> >> >>>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
>> >> >>>>> me from using CPU caches to do that?
>> >> >>>>>
>> >> >>>>
>> >> >>>> Because it is impossible to get a cache hit on an access to an
>> >> >>>> unmapped address?
>> >> >>>
>> >> >>> Um, no, I don't need to be able to directly access kernel addresses. I
>> >> >>> just put some data in _same place in cache where kernel data would
>> >> >>> go_, then do syscall and look if my data are still cached. Caches
>> >> >>> don't have infinite associativity.
>> >> >>>
>> >> >>
>> >> >> Ah ok. Interesting.
>> >> >>
>> >> >> But how does that leak address bits that are covered by the tag?
>> >> >
>> >> > Same as leaking any other address bits? Caches are "virtually
>> >> > indexed",
>> >>
>> >> Not on arm64, although I don’t see how that is relevant if you are trying to defeat kaslr.
>> >>
>> >> > and tag does not come into play...
>> >> >
>> >>
>> >> Well, I must be missing something then, because I don’t see how knowledge about which userland address shares a cache way with a kernel address can leak anything beyond the bits that make up the index (i.e., which cache way is being shared)
>> >>
>> >
>> > Well, KASLR is about keeping bits of kernel virtual address secret
>> > from userland. Leaking them through cache sidechannel means KASLR is
>> > defeated.
>> >
>>
>> Yes, that is what you claim. But you are not explaining how any of the
>> bits that we do want to keep secret can be discovered by making
>> inferences from which lines in a primed cache were evicted during a
>> syscall.
>>
>> The cache index maps to low order bits. You can use this, e.g., to
>> attack table based AES, because there is only ~4 KB worth of tables,
>> and you are interested in finding out which exact entries of the table
>> were read by the process under attack.
>>
>> You are saying the same approach will help you discover 30 high order
>> bits of a virtual kernel address, by observing the cache evictions in
>> a physically indexed physically tagged cache. How?
>
> I assumed high bits are hashed into cache index. I might have been
> wrong. Anyway, page tables are about same size as AES tables. So...:
>
> http://cve.circl.lu/cve/CVE-2017-5927
>

Very interesting paper. Can you explain why you think its findings can
be extrapolated to apply to attacks across address spaces? Because
that is what would be required for it to be able to defeat KASLR.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-23 11:38                   ` Ard Biesheuvel
@ 2017-11-23 17:54                     ` Pavel Machek
  2017-11-23 18:17                       ` Ard Biesheuvel
  0 siblings, 1 reply; 45+ messages in thread
From: Pavel Machek @ 2017-11-23 17:54 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

[-- Attachment #1: Type: text/plain, Size: 3414 bytes --]

On Thu 2017-11-23 11:38:52, Ard Biesheuvel wrote:
> On 23 November 2017 at 10:46, Pavel Machek <pavel@ucw.cz> wrote:
> > On Thu 2017-11-23 09:23:02, Ard Biesheuvel wrote:
> >> On 23 November 2017 at 09:07, Pavel Machek <pavel@ucw.cz> wrote:
> >> > Hi!
> >> >
> >> >> > On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
> >> >> >
> >> >> > Hi!
> >> >> >
> >> >> >>>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
> >> >> >>>>> me from using CPU caches to do that?
> >> >> >>>>>
> >> >> >>>>
> >> >> >>>> Because it is impossible to get a cache hit on an access to an
> >> >> >>>> unmapped address?
> >> >> >>>
> >> >> >>> Um, no, I don't need to be able to directly access kernel addresses. I
> >> >> >>> just put some data in _same place in cache where kernel data would
> >> >> >>> go_, then do syscall and look if my data are still cached. Caches
> >> >> >>> don't have infinite associativity.
> >> >> >>>
> >> >> >>
> >> >> >> Ah ok. Interesting.
> >> >> >>
> >> >> >> But how does that leak address bits that are covered by the tag?
> >> >> >
> >> >> > Same as leaking any other address bits? Caches are "virtually
> >> >> > indexed",
> >> >>
> >> >> Not on arm64, although I don’t see how that is relevant if you are trying to defeat kaslr.
> >> >>
> >> >> > and tag does not come into play...
> >> >> >
> >> >>
> >> >> Well, I must be missing something then, because I don’t see how knowledge about which userland address shares a cache way with a kernel address can leak anything beyond the bits that make up the index (i.e., which cache way is being shared)
> >> >>
> >> >
> >> > Well, KASLR is about keeping bits of kernel virtual address secret
> >> > from userland. Leaking them through cache sidechannel means KASLR is
> >> > defeated.
> >> >
> >>
> >> Yes, that is what you claim. But you are not explaining how any of the
> >> bits that we do want to keep secret can be discovered by making
> >> inferences from which lines in a primed cache were evicted during a
> >> syscall.
> >>
> >> The cache index maps to low order bits. You can use this, e.g., to
> >> attack table based AES, because there is only ~4 KB worth of tables,
> >> and you are interested in finding out which exact entries of the table
> >> were read by the process under attack.
> >>
> >> You are saying the same approach will help you discover 30 high order
> >> bits of a virtual kernel address, by observing the cache evictions in
> >> a physically indexed physically tagged cache. How?
> >
> > I assumed high bits are hashed into cache index. I might have been
> > wrong. Anyway, page tables are about same size as AES tables. So...:
> >
> > http://cve.circl.lu/cve/CVE-2017-5927
> >
> 
> Very interesting paper. Can you explain why you think its findings can
> be extrapolated to apply to attacks across address spaces? Because
> that is what would be required for it to be able to defeat KASLR.

Can you explain why not?

You clearly understand AES tables can be attacked cross-address-space,
and there's no reason page tables could not be attacked the same way. I'm
not saying that's the best way to launch the attack, but it certainly
looks possible to me.

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER)
  2017-11-23 17:54                     ` Pavel Machek
@ 2017-11-23 18:17                       ` Ard Biesheuvel
  0 siblings, 0 replies; 45+ messages in thread
From: Ard Biesheuvel @ 2017-11-23 18:17 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Will Deacon, linux-arm-kernel, linux-kernel, Catalin Marinas,
	Mark Rutland, Stephen Boyd, Dave Hansen, Kees Cook

On 23 November 2017 at 17:54, Pavel Machek <pavel@ucw.cz> wrote:
> On Thu 2017-11-23 11:38:52, Ard Biesheuvel wrote:
>> On 23 November 2017 at 10:46, Pavel Machek <pavel@ucw.cz> wrote:
>> > On Thu 2017-11-23 09:23:02, Ard Biesheuvel wrote:
>> >> On 23 November 2017 at 09:07, Pavel Machek <pavel@ucw.cz> wrote:
>> >> > Hi!
>> >> >
>> >> >> > On 22 Nov 2017, at 23:37, Pavel Machek <pavel@ucw.cz> wrote:
>> >> >> >
>> >> >> > Hi!
>> >> >> >
>> >> >> >>>>> If I'm willing to do timing attacks to defeat KASLR... what prevents
>> >> >> >>>>> me from using CPU caches to do that?
>> >> >> >>>>>
>> >> >> >>>>
>> >> >> >>>> Because it is impossible to get a cache hit on an access to an
>> >> >> >>>> unmapped address?
>> >> >> >>>
>> >> >> >>> Um, no, I don't need to be able to directly access kernel addresses. I
>> >> >> >>> just put some data in _same place in cache where kernel data would
>> >> >> >>> go_, then do syscall and look if my data are still cached. Caches
>> >> >> >>> don't have infinite associativity.
>> >> >> >>>
>> >> >> >>
>> >> >> >> Ah ok. Interesting.
>> >> >> >>
>> >> >> >> But how does that leak address bits that are covered by the tag?
>> >> >> >
>> >> >> > Same as leaking any other address bits? Caches are "virtually
>> >> >> > indexed",
>> >> >>
>> >> >> Not on arm64, although I don’t see how that is relevant if you are trying to defeat kaslr.
>> >> >>
>> >> >> > and tag does not come into play...
>> >> >> >
>> >> >>
>> >> >> Well, I must be missing something then, because I don’t see how knowledge about which userland address shares a cache way with a kernel address can leak anything beyond the bits that make up the index (i.e., which cache way is being shared)
>> >> >>
>> >> >
>> >> > Well, KASLR is about keeping bits of kernel virtual address secret
>> >> > from userland. Leaking them through cache sidechannel means KASLR is
>> >> > defeated.
>> >> >
>> >>
>> >> Yes, that is what you claim. But you are not explaining how any of the
>> >> bits that we do want to keep secret can be discovered by making
>> >> inferences from which lines in a primed cache were evicted during a
>> >> syscall.
>> >>
>> >> The cache index maps to low order bits. You can use this, e.g., to
>> >> attack table based AES, because there is only ~4 KB worth of tables,
>> >> and you are interested in finding out which exact entries of the table
>> >> were read by the process under attack.
>> >>
>> >> You are saying the same approach will help you discover 30 high order
>> >> bits of a virtual kernel address, by observing the cache evictions in
>> >> a physically indexed physically tagged cache. How?
>> >
>> > I assumed high bits are hashed into cache index. I might have been
>> > wrong. Anyway, page tables are about same size as AES tables. So...:
>> >
>> > http://cve.circl.lu/cve/CVE-2017-5927
>> >
>>
>> Very interesting paper. Can you explain why you think its findings can
>> be extrapolated to apply to attacks across address spaces? Because
>> that is what would be required for it to be able to defeat KASLR.
>
> Can you explain why not?
>
> You clearly understand AES tables can be attacked cross-address-space,
> and there's no reason page tables could not be attacked same way. I'm
> not saying that's the best way to launch the attack, but it certainly
> looks possible to me.
>

There are two sides to this:
- on the one hand, a round trip into the kernel is quite likely to
result in many more cache evictions than the ones from which you will
be able to infer what address was being resolved by the page table
walker, adding noise to the signal,
- on the other hand, the kernel mappings are deliberately coarse-grained
so that they can be cached in the TLB with literally only a
handful of entries, so it is not guaranteed that a TLB miss will occur
that results in a page table walk that you are interested in.

Given the statistical approach, it may simply mean taking more
samples, but how many more? 10x? 100,000x? Given that the current attack
takes tens of seconds to mount, that is a significant limitation. For
the TLB side, it may help to mount an additional attack to prime the
TLB, but that itself is likely to add noise to the cache state
measurements.
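
A toy simulation of that sampling question (every number below is invented and nothing is measured): a small extra cost on one set, buried in much larger random "syscall" noise per probe, still stands out once enough trials are averaged, which gives a feel for why the required sample count, and hence the attack time, grows quickly with the noise.

#include <stdio.h>
#include <stdlib.h>

#define NSETS	64
#define TRIALS	100000
#define TARGET	17	/* the set our imaginary page table walk touches */

static double noisy_probe(int set)
{
	double noise  = (double)(rand() % 400);		/* unrelated evictions */
	double signal = (set == TARGET) ? 20.0 : 0.0;	/* the bit we care about */

	return noise + signal;
}

int main(void)
{
	static double mean[NSETS];

	for (int t = 0; t < TRIALS; t++)
		for (int s = 0; s < NSETS; s++)
			mean[s] += noisy_probe(s) / TRIALS;

	int loudest = 0;
	for (int s = 1; s < NSETS; s++)
		if (mean[s] > mean[loudest])
			loudest = s;

	printf("loudest set: %d (mean %.1f vs %.1f next to it)\n",
	       loudest, mean[loudest], mean[(loudest + 1) % NSETS]);
	return 0;
}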

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-11-23 18:17 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-17 18:21 [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Will Deacon
2017-11-17 18:21 ` [PATCH 01/18] arm64: mm: Use non-global mappings for kernel space Will Deacon
2017-11-17 18:21 ` [PATCH 02/18] arm64: mm: Temporarily disable ARM64_SW_TTBR0_PAN Will Deacon
2017-11-17 18:21 ` [PATCH 03/18] arm64: mm: Move ASID from TTBR0 to TTBR1 Will Deacon
2017-11-17 18:21 ` [PATCH 04/18] arm64: mm: Remove pre_ttbr0_update_workaround for Falkor erratum #E1003 Will Deacon
2017-11-17 18:21 ` [PATCH 05/18] arm64: mm: Rename post_ttbr0_update_workaround Will Deacon
2017-11-17 18:21 ` [PATCH 06/18] arm64: mm: Fix and re-enable ARM64_SW_TTBR0_PAN Will Deacon
2017-11-17 18:21 ` [PATCH 07/18] arm64: mm: Allocate ASIDs in pairs Will Deacon
2017-11-17 18:21 ` [PATCH 08/18] arm64: mm: Add arm64_kernel_mapped_at_el0 helper using static key Will Deacon
2017-11-17 18:21 ` [PATCH 09/18] arm64: mm: Invalidate both kernel and user ASIDs when performing TLBI Will Deacon
2017-11-17 18:21 ` [PATCH 10/18] arm64: entry: Add exception trampoline page for exceptions from EL0 Will Deacon
2017-11-17 18:21 ` [PATCH 11/18] arm64: mm: Map entry trampoline into trampoline and kernel page tables Will Deacon
2017-11-17 18:21 ` [PATCH 12/18] arm64: entry: Explicitly pass exception level to kernel_ventry macro Will Deacon
2017-11-17 18:21 ` [PATCH 13/18] arm64: entry: Hook up entry trampoline to exception vectors Will Deacon
2017-11-17 18:21 ` [PATCH 14/18] arm64: erratum: Work around Falkor erratum #E1003 in trampoline code Will Deacon
2017-11-18  0:27   ` Stephen Boyd
2017-11-20 18:05     ` Will Deacon
2017-11-17 18:21 ` [PATCH 15/18] arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks Will Deacon
2017-11-17 18:21 ` [PATCH 16/18] arm64: entry: Add fake CPU feature for mapping the kernel at EL0 Will Deacon
2017-11-17 18:22 ` [PATCH 17/18] arm64: makefile: Ensure TEXT_OFFSET doesn't overlap with trampoline Will Deacon
2017-11-17 18:22 ` [PATCH 18/18] arm64: Kconfig: Add CONFIG_UNMAP_KERNEL_AT_EL0 Will Deacon
2017-11-22 16:52   ` Marc Zyngier
2017-11-22 19:36     ` Will Deacon
2017-11-18  0:19 ` [PATCH 00/18] arm64: Unmap the kernel whilst running in userspace (KAISER) Stephen Boyd
2017-11-20 18:03   ` Will Deacon
2017-11-18 15:25 ` Ard Biesheuvel
2017-11-20 18:06   ` Will Deacon
2017-11-20 18:20     ` Ard Biesheuvel
2017-11-22 19:37       ` Will Deacon
2017-11-20 22:50 ` Laura Abbott
2017-11-22 19:37   ` Will Deacon
2017-11-22 16:19 ` Pavel Machek
2017-11-22 19:37   ` Will Deacon
2017-11-22 22:36     ` Pavel Machek
2017-11-22 21:19   ` Ard Biesheuvel
2017-11-22 22:33     ` Pavel Machek
2017-11-22 23:19       ` Ard Biesheuvel
2017-11-22 23:37         ` Pavel Machek
2017-11-23  6:51           ` Ard Biesheuvel
2017-11-23  9:07             ` Pavel Machek
2017-11-23  9:23               ` Ard Biesheuvel
2017-11-23 10:46                 ` Pavel Machek
2017-11-23 11:38                   ` Ard Biesheuvel
2017-11-23 17:54                     ` Pavel Machek
2017-11-23 18:17                       ` Ard Biesheuvel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).