* [PATCH v3 0/6] 32bit ARM branch predictor hardening
@ 2018-01-25 15:21 Marc Zyngier
  2018-01-25 15:21 ` [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17 Marc Zyngier
                   ` (8 more replies)
  0 siblings, 9 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-25 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

This small series implements some basic BP hardening by invalidating
the BTB on 32bit ARM CPUs that are known to be susceptible to aliasing
attacks (Spectre variant 2). It doesn't help non-ARM 32bit CPUs, nor
32bit kernels that run on 64bit capable CPUs. This series doesn't
mitigate Spectre variant 1 either.

These patches are closely modelled against what we do on arm64,
although simpler as we can rely on an architected instruction to
perform the invalidation. The notable exception is Cortex-A15, where
BTB invalidation behaves like a NOP, and the only way to shoot the
predictor down is to invalidate the icache *and* to have ACTLR[0] set
to 1 (which is a secure-only operation).
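
For reference, the two CP15 operations the series relies on boil down
to the following (patches 2 and 5 add exactly these definitions):

	/* Architected BTB invalidation (behaves as a NOP on Cortex-A15) */
	#define BPIALL		__ACCESS_CP15(c7, 0, c5, 6)
	/* Icache invalidation, which only takes the Cortex-A15 BTB down
	   with it if ACTLR[0] has been set from secure mode */
	#define ICIALLU		__ACCESS_CP15(c7, 0, c5, 0)

	write_sysreg(0, BPIALL);	/* Cortex-A8/A9/A12/A17 */
	write_sysreg(0, ICIALLU);	/* Cortex-A15 */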

The first patch reuses the Cortex-A8 BTB invalidation in switch_mm and
generalises it to be used on all affected CPUs. The second performs the
same invalidation on prefetch abort outside of the userspace
range. The third one nukes it on guest exit, and results in some major
surgery as we cannot take a branch from the vectors (that, and Thumb2
being a massive pain).

Patches 4 to 6 do a similar thing for Cortex-A15, using the
aforementioned ICIALLU.

To sum up the requirements:
- Both Cortex-A8 and Cortex-A15 need to have ACTLR.IBE (bit 0) set to
  1 from secure mode. For Cortex-A8, this overlaps with
  ARM_ERRATA_430973 which also requires it.
- Cortex-A9, A12 and A17 do not require any extra configuration.

Note 1: Contrary to the initial version, this new series relies on
the arm64/kpti branch (I reuse the per-CPU vector hook for KVM).

Note 2: M-class CPUs are not affected and for R-class cores, the
mitigation doesn't make much sense since we do not enforce user/kernel
isolation.

* From v2:
  - Fixed !MMU build
  - Small KVM optimisation (suggested by Robin)
  - Fixed register zeroing in cpu_v7_btbinv_switch_mm (noticed by
    Andre)
  
* From v1:
  - Fixed broken hyp_fiq vector (noticed by Ard)
  - Fixed broken BTB invalidation in LPAE switch_mm (reported by Andre)
  - Revamped invalidation on PABT (noticed by James on arm64,
    suggested by Will)
  - Rewrote the whole HYP sequence, as Thumb2 was pretty unhappy about
    arithmetic with the stack pointer

Marc Zyngier (6):
  arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17
  arm: Invalidate BTB on prefetch abort outside of user mapping on
    Cortex A8, A9, A12 and A17
  arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17
  arm: Add icache invalidation on switch_mm for Cortex-A15
  arm: Invalidate icache on prefetch abort outside of user mapping on
    Cortex-A15
  arm: KVM: Invalidate icache on guest exit for Cortex-A15

 arch/arm/include/asm/cp15.h    |  3 ++
 arch/arm/include/asm/kvm_asm.h |  2 -
 arch/arm/include/asm/kvm_mmu.h | 17 ++++++++-
 arch/arm/kvm/hyp/hyp-entry.S   | 85 +++++++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/fault.c            | 29 ++++++++++++++
 arch/arm/mm/fsr-2level.c       |  4 +-
 arch/arm/mm/fsr-3level.c       | 67 ++++++++++++++++++++++++++++++++-
 arch/arm/mm/proc-v7-2level.S   | 14 ++++++-
 arch/arm/mm/proc-v7-3level.S   | 22 +++++++++++
 arch/arm/mm/proc-v7.S          | 48 ++++++++++++++++--------
 10 files changed, 265 insertions(+), 26 deletions(-)

-- 
2.14.2

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
@ 2018-01-25 15:21 ` Marc Zyngier
  2018-01-26 20:44   ` Florian Fainelli
  2018-01-25 15:21 ` [PATCH v3 2/6] arm: Invalidate BTB on prefetch abort outside of user mapping on Cortex A8, A9, " Marc Zyngier
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Marc Zyngier @ 2018-01-25 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

In order to avoid aliasing attacks against the branch predictor,
some implementations require the BTB to be invalidated when switching
from one user context to another.

For this, we reuse the existing implementation for Cortex-A8, and
apply it to A9, A12 and A17.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/mm/proc-v7-2level.S |  4 ++--
 arch/arm/mm/proc-v7-3level.S |  6 ++++++
 arch/arm/mm/proc-v7.S        | 30 +++++++++++++++---------------
 3 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
index c6141a5435c3..0422e58b74e8 100644
--- a/arch/arm/mm/proc-v7-2level.S
+++ b/arch/arm/mm/proc-v7-2level.S
@@ -41,7 +41,7 @@
  *	even on Cortex-A8 revisions not affected by 430973.
  *	If IBE is not set, the flush BTAC/BTB won't do anything.
  */
-ENTRY(cpu_ca8_switch_mm)
+ENTRY(cpu_v7_btbinv_switch_mm)
 #ifdef CONFIG_MMU
 	mov	r2, #0
 	mcr	p15, 0, r2, c7, c5, 6		@ flush BTAC/BTB
@@ -66,7 +66,7 @@ ENTRY(cpu_v7_switch_mm)
 #endif
 	bx	lr
 ENDPROC(cpu_v7_switch_mm)
-ENDPROC(cpu_ca8_switch_mm)
+ENDPROC(cpu_v7_btbinv_switch_mm)
 
 /*
  *	cpu_v7_set_pte_ext(ptep, pte)
diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 7d16bbc4102b..934272e1fa08 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -54,6 +54,11 @@
  * Set the translation table base pointer to be pgd_phys (physical address of
  * the new TTB).
  */
+ENTRY(cpu_v7_btbinv_switch_mm)
+#ifdef CONFIG_MMU
+	mov	r2, #0
+	mcr	p15, 0, r2, c7, c5, 6			@ flush BTAC/BTB
+#endif
 ENTRY(cpu_v7_switch_mm)
 #ifdef CONFIG_MMU
 	mmid	r2, r2
@@ -64,6 +69,7 @@ ENTRY(cpu_v7_switch_mm)
 #endif
 	ret	lr
 ENDPROC(cpu_v7_switch_mm)
+ENDPROC(cpu_v7_btbinv_switch_mm)
 
 #ifdef __ARMEB__
 #define rl r3
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 01d64c0b2563..0a14967fd400 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -159,18 +159,18 @@ ENDPROC(cpu_v7_do_resume)
 #endif
 
 /*
- * Cortex-A8
+ * Cortex-A8/A12/A17 that require a BTB invalidation on switch_mm
  */
-	globl_equ	cpu_ca8_proc_init,	cpu_v7_proc_init
-	globl_equ	cpu_ca8_proc_fin,	cpu_v7_proc_fin
-	globl_equ	cpu_ca8_reset,		cpu_v7_reset
-	globl_equ	cpu_ca8_do_idle,	cpu_v7_do_idle
-	globl_equ	cpu_ca8_dcache_clean_area, cpu_v7_dcache_clean_area
-	globl_equ	cpu_ca8_set_pte_ext,	cpu_v7_set_pte_ext
-	globl_equ	cpu_ca8_suspend_size,	cpu_v7_suspend_size
+	globl_equ	cpu_v7_btbinv_proc_init,	cpu_v7_proc_init
+	globl_equ	cpu_v7_btbinv_proc_fin,		cpu_v7_proc_fin
+	globl_equ	cpu_v7_btbinv_reset,		cpu_v7_reset
+	globl_equ	cpu_v7_btbinv_do_idle,		cpu_v7_do_idle
+	globl_equ	cpu_v7_btbinv_dcache_clean_area, cpu_v7_dcache_clean_area
+	globl_equ	cpu_v7_btbinv_set_pte_ext,	cpu_v7_set_pte_ext
+	globl_equ	cpu_v7_btbinv_suspend_size,	cpu_v7_suspend_size
 #ifdef CONFIG_ARM_CPU_SUSPEND
-	globl_equ	cpu_ca8_do_suspend,	cpu_v7_do_suspend
-	globl_equ	cpu_ca8_do_resume,	cpu_v7_do_resume
+	globl_equ	cpu_v7_btbinv_do_suspend,	cpu_v7_do_suspend
+	globl_equ	cpu_v7_btbinv_do_resume,	cpu_v7_do_resume
 #endif
 
 /*
@@ -181,7 +181,7 @@ ENDPROC(cpu_v7_do_resume)
 	globl_equ	cpu_ca9mp_reset,	cpu_v7_reset
 	globl_equ	cpu_ca9mp_do_idle,	cpu_v7_do_idle
 	globl_equ	cpu_ca9mp_dcache_clean_area, cpu_v7_dcache_clean_area
-	globl_equ	cpu_ca9mp_switch_mm,	cpu_v7_switch_mm
+	globl_equ	cpu_ca9mp_switch_mm,	cpu_v7_btbinv_switch_mm
 	globl_equ	cpu_ca9mp_set_pte_ext,	cpu_v7_set_pte_ext
 .globl	cpu_ca9mp_suspend_size
 .equ	cpu_ca9mp_suspend_size, cpu_v7_suspend_size + 4 * 2
@@ -548,8 +548,8 @@ __v7_setup_stack:
 
 	@ define struct processor (see <asm/proc-fns.h> and proc-macros.S)
 	define_processor_functions v7, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
+	define_processor_functions v7_btbinv, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
 #ifndef CONFIG_ARM_LPAE
-	define_processor_functions ca8, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
 	define_processor_functions ca9mp, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
 #endif
 #ifdef CONFIG_CPU_PJ4B
@@ -614,7 +614,7 @@ __v7_ca9mp_proc_info:
 __v7_ca8_proc_info:
 	.long	0x410fc080
 	.long	0xff0ffff0
-	__v7_proc __v7_ca8_proc_info, __v7_setup, proc_fns = ca8_processor_functions
+	__v7_proc __v7_ca8_proc_info, __v7_setup, proc_fns = v7_btbinv_processor_functions
 	.size	__v7_ca8_proc_info, . - __v7_ca8_proc_info
 
 #endif	/* CONFIG_ARM_LPAE */
@@ -658,7 +658,7 @@ __v7_ca7mp_proc_info:
 __v7_ca12mp_proc_info:
 	.long	0x410fc0d0
 	.long	0xff0ffff0
-	__v7_proc __v7_ca12mp_proc_info, __v7_ca12mp_setup
+	__v7_proc __v7_ca12mp_proc_info, __v7_ca12mp_setup, proc_fns = v7_btbinv_processor_functions
 	.size	__v7_ca12mp_proc_info, . - __v7_ca12mp_proc_info
 
 	/*
@@ -688,7 +688,7 @@ __v7_b15mp_proc_info:
 __v7_ca17mp_proc_info:
 	.long	0x410fc0e0
 	.long	0xff0ffff0
-	__v7_proc __v7_ca17mp_proc_info, __v7_ca17mp_setup
+	__v7_proc __v7_ca17mp_proc_info, __v7_ca17mp_setup, proc_fns = v7_btbinv_processor_functions
 	.size	__v7_ca17mp_proc_info, . - __v7_ca17mp_proc_info
 
 	/*
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 2/6] arm: Invalidate BTB on prefetch abort outside of user mapping on Cortex A8, A9, A12 and A17
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
  2018-01-25 15:21 ` [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17 Marc Zyngier
@ 2018-01-25 15:21 ` Marc Zyngier
  2018-01-31  2:13   ` Fabio Estevam
  2018-01-25 15:21 ` [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17 Marc Zyngier
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 32+ messages in thread
From: Marc Zyngier @ 2018-01-25 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

In order to prevent aliasing attacks on the branch predictor,
invalidate the BTB on CPUs that are known to be affected when taking
a prefetch abort on an address that is outside of the user task limit.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/cp15.h |  2 ++
 arch/arm/mm/fault.c         | 25 +++++++++++++++++
 arch/arm/mm/fsr-2level.c    |  4 +--
 arch/arm/mm/fsr-3level.c    | 67 ++++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 95 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/cp15.h b/arch/arm/include/asm/cp15.h
index 4c9fa72b59f5..9e900ae855aa 100644
--- a/arch/arm/include/asm/cp15.h
+++ b/arch/arm/include/asm/cp15.h
@@ -65,6 +65,8 @@
 #define __write_sysreg(v, r, w, c, t)	asm volatile(w " " c : : "r" ((t)(v)))
 #define write_sysreg(v, ...)		__write_sysreg(v, __VA_ARGS__)
 
+#define BPIALL				__ACCESS_CP15(c7, 0, c5, 6)
+
 extern unsigned long cr_alignment;	/* defined in entry-armv.S */
 
 static inline unsigned long get_cr(void)
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 42f585379e19..d400d1c5a409 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -21,6 +21,7 @@
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 
+#include <asm/cp15.h>
 #include <asm/exception.h>
 #include <asm/pgtable.h>
 #include <asm/system_misc.h>
@@ -181,6 +182,7 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr,
 	si.si_errno = 0;
 	si.si_code = code;
 	si.si_addr = (void __user *)addr;
+
 	force_sig_info(sig, &si, tsk);
 }
 
@@ -396,12 +398,35 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 	__do_kernel_fault(mm, addr, fsr, regs);
 	return 0;
 }
+
+static int
+do_pabt_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
+{
+	if (addr > TASK_SIZE) {
+		switch(read_cpuid_part()) {
+		case ARM_CPU_PART_CORTEX_A8:
+		case ARM_CPU_PART_CORTEX_A9:
+		case ARM_CPU_PART_CORTEX_A12:
+		case ARM_CPU_PART_CORTEX_A17:
+			write_sysreg(0, BPIALL);
+			break;
+		}
+	}
+
+	return do_page_fault(addr, fsr, regs);
+}
 #else					/* CONFIG_MMU */
 static int
 do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 {
 	return 0;
 }
+
+static int
+do_pabt_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
+{
+	return 0;
+}
 #endif					/* CONFIG_MMU */
 
 /*
diff --git a/arch/arm/mm/fsr-2level.c b/arch/arm/mm/fsr-2level.c
index f2be95197265..ca07f72d8624 100644
--- a/arch/arm/mm/fsr-2level.c
+++ b/arch/arm/mm/fsr-2level.c
@@ -51,7 +51,7 @@ static struct fsr_info ifsr_info[] = {
 	{ do_bad,		SIGBUS,  0,		"unknown 4"			   },
 	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"section translation fault"	   },
 	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"page access flag fault"	   },
-	{ do_page_fault,	SIGSEGV, SEGV_MAPERR,	"page translation fault"	   },
+	{ do_pabt_page_fault,	SIGSEGV, SEGV_MAPERR,	"page translation fault"	   },
 	{ do_bad,		SIGBUS,	 0,		"external abort on non-linefetch"  },
 	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"section domain fault"		   },
 	{ do_bad,		SIGBUS,  0,		"unknown 10"			   },
@@ -59,7 +59,7 @@ static struct fsr_info ifsr_info[] = {
 	{ do_bad,		SIGBUS,	 0,		"external abort on translation"	   },
 	{ do_sect_fault,	SIGSEGV, SEGV_ACCERR,	"section permission fault"	   },
 	{ do_bad,		SIGBUS,	 0,		"external abort on translation"	   },
-	{ do_page_fault,	SIGSEGV, SEGV_ACCERR,	"page permission fault"		   },
+	{ do_pabt_page_fault,	SIGSEGV, SEGV_ACCERR,	"page permission fault"		   },
 	{ do_bad,		SIGBUS,  0,		"unknown 16"			   },
 	{ do_bad,		SIGBUS,  0,		"unknown 17"			   },
 	{ do_bad,		SIGBUS,  0,		"unknown 18"			   },
diff --git a/arch/arm/mm/fsr-3level.c b/arch/arm/mm/fsr-3level.c
index d0ae2963656a..88cfc7d06a30 100644
--- a/arch/arm/mm/fsr-3level.c
+++ b/arch/arm/mm/fsr-3level.c
@@ -66,4 +66,69 @@ static struct fsr_info fsr_info[] = {
 	{ do_bad,		SIGBUS,  0,		"unknown 63"			},
 };
 
-#define ifsr_info	fsr_info
+static struct fsr_info ifsr_info[] = {
+	{ do_bad,		SIGBUS,  0,		"unknown 0"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 1"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 2"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 3"			},
+	{ do_bad,		SIGBUS,  0,		"reserved translation fault"	},
+	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 1 translation fault"	},
+	{ do_translation_fault,	SIGSEGV, SEGV_MAPERR,	"level 2 translation fault"	},
+	{ do_pabt_page_fault,	SIGSEGV, SEGV_MAPERR,	"level 3 translation fault"	},
+	{ do_bad,		SIGBUS,  0,		"reserved access flag fault"	},
+	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"level 1 access flag fault"	},
+	{ do_pabt_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 access flag fault"	},
+	{ do_pabt_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 access flag fault"	},
+	{ do_bad,		SIGBUS,  0,		"reserved permission fault"	},
+	{ do_bad,		SIGSEGV, SEGV_ACCERR,	"level 1 permission fault"	},
+	{ do_pabt_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 2 permission fault"	},
+	{ do_pabt_page_fault,	SIGSEGV, SEGV_ACCERR,	"level 3 permission fault"	},
+	{ do_bad,		SIGBUS,  0,		"synchronous external abort"	},
+	{ do_bad,		SIGBUS,  0,		"asynchronous external abort"	},
+	{ do_bad,		SIGBUS,  0,		"unknown 18"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 19"			},
+	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous abort (translation table walk)" },
+	{ do_bad,		SIGBUS,  0,		"synchronous parity error"	},
+	{ do_bad,		SIGBUS,  0,		"asynchronous parity error"	},
+	{ do_bad,		SIGBUS,  0,		"unknown 26"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 27"			},
+	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk" },
+	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk" },
+	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk" },
+	{ do_bad,		SIGBUS,  0,		"synchronous parity error (translation table walk" },
+	{ do_bad,		SIGBUS,  0,		"unknown 32"			},
+	{ do_bad,		SIGBUS,  BUS_ADRALN,	"alignment fault"		},
+	{ do_bad,		SIGBUS,  0,		"debug event"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 35"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 36"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 37"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 38"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 39"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 40"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 41"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 42"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 43"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 44"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 45"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 46"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 47"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 48"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 49"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 50"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 51"			},
+	{ do_bad,		SIGBUS,  0,		"implementation fault (lockdown abort)" },
+	{ do_bad,		SIGBUS,  0,		"unknown 53"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 54"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 55"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 56"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 57"			},
+	{ do_bad,		SIGBUS,  0,		"implementation fault (coprocessor abort)" },
+	{ do_bad,		SIGBUS,  0,		"unknown 59"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 60"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 61"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 62"			},
+	{ do_bad,		SIGBUS,  0,		"unknown 63"			},
+};
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
  2018-01-25 15:21 ` [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17 Marc Zyngier
  2018-01-25 15:21 ` [PATCH v3 2/6] arm: Invalidate BTB on prefetch abort outside of user mapping on Cortex A8, A9, " Marc Zyngier
@ 2018-01-25 15:21 ` Marc Zyngier
  2018-01-26  9:23   ` Christoffer Dall
  2018-01-26 17:12   ` Robin Murphy
  2018-01-25 15:21 ` [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15 Marc Zyngier
                   ` (5 subsequent siblings)
  8 siblings, 2 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-25 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

In order to avoid aliasing attacks against the branch predictor,
let's invalidate the BTB on guest exit. This is made complicated
by the fact that we cannot take a branch before invalidating the
BTB.

We only apply this to A12 and A17, which are the only two ARM
cores on which this is useful.
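
The vector trick is easier to follow with a sketch (illustrative, not
part of the patch itself): SP is 8-byte aligned on entry, and landing
in the vector page at slot i falls through the remaining
"add sp, sp, #1" instructions, so that

	sp & 7 == 7 - i		@ FIQ, the last slot, adds nothing

The handler can then recover the entry number without ever taking a
branch:

	mov	r0, sp
	and	r0, r0, #7	@ entry number (7 = reset ... 0 = FIQ)
	sub	sp, sp, r0	@ re-align the stack

before dispatching through the vect_br macro.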

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_asm.h |  2 --
 arch/arm/include/asm/kvm_mmu.h | 13 ++++++++-
 arch/arm/kvm/hyp/hyp-entry.S   | 62 ++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 72 insertions(+), 5 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 36dd2962a42d..df24ed48977d 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -61,8 +61,6 @@ struct kvm_vcpu;
 extern char __kvm_hyp_init[];
 extern char __kvm_hyp_init_end[];
 
-extern char __kvm_hyp_vector[];
-
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index eb46fc81a440..b47db5b9e407 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -37,6 +37,7 @@
 
 #include <linux/highmem.h>
 #include <asm/cacheflush.h>
+#include <asm/cputype.h>
 #include <asm/pgalloc.h>
 #include <asm/stage2_pgtable.h>
 
@@ -223,7 +224,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
 
 static inline void *kvm_get_hyp_vector(void)
 {
-	return kvm_ksym_ref(__kvm_hyp_vector);
+	extern char __kvm_hyp_vector[];
+	extern char __kvm_hyp_vector_bp_inv[];
+
+	switch(read_cpuid_part()) {
+	case ARM_CPU_PART_CORTEX_A12:
+	case ARM_CPU_PART_CORTEX_A17:
+		return kvm_ksym_ref(__kvm_hyp_vector_bp_inv);
+
+	default:
+		return kvm_ksym_ref(__kvm_hyp_vector);
+	}
 }
 
 static inline int kvm_map_vectors(void)
diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
index 95a2faefc070..aab6b0c06a19 100644
--- a/arch/arm/kvm/hyp/hyp-entry.S
+++ b/arch/arm/kvm/hyp/hyp-entry.S
@@ -70,6 +70,57 @@ __kvm_hyp_vector:
 	W(b)	hyp_hvc
 	W(b)	hyp_irq
 	W(b)	hyp_fiq
+	
+	.align 5
+__kvm_hyp_vector_bp_inv:
+	.global __kvm_hyp_vector_bp_inv
+
+	/*
+	 * We encode the exception entry in the bottom 3 bits of
+	 * SP, and we have to guarantee to be 8 bytes aligned.
+	 */
+	W(add)	sp, sp, #1	/* Reset 	  7 */
+	W(add)	sp, sp, #1	/* Undef	  6 */
+	W(add)	sp, sp, #1	/* Syscall	  5 */
+	W(add)	sp, sp, #1	/* Prefetch abort 4 */
+	W(add)	sp, sp, #1	/* Data abort	  3 */
+	W(add)	sp, sp, #1	/* HVC		  2 */
+	W(add)	sp, sp, #1	/* IRQ		  1 */
+	W(nop)			/* FIQ		  0 */
+
+	mcr	p15, 0, r0, c7, c5, 6	/* BPIALL */
+	isb
+
+	/*
+	 * Yet another silly hack: Use VPIDR as a temp register.
+	 * Thumb2 is really a pain, as SP cannot be used with most
+	 * of the bitwise instructions. The vect_br macro ensures
+	 * things gets cleaned-up.
+	 */
+	mcr	p15, 4, r0, c0, c0, 0	/* VPIDR */
+	mov	r0, sp
+	and	r0, r0, #7
+	sub	sp, sp, r0
+	push	{r1, r2}
+	mov	r1, r0
+	mrc	p15, 4, r0, c0, c0, 0	/* VPIDR */
+	mrc	p15, 0, r2, c0, c0, 0	/* MIDR  */
+	mcr	p15, 4, r2, c0, c0, 0	/* VPIDR */
+
+.macro vect_br val, targ
+	cmp	r1, #\val
+	popeq	{r1, r2}
+	beq	\targ
+.endm
+
+	vect_br	0, hyp_fiq
+	vect_br	1, hyp_irq
+	vect_br	2, hyp_hvc
+	vect_br	3, hyp_dabt
+	vect_br	4, hyp_pabt
+	vect_br	5, hyp_svc
+	vect_br	6, hyp_undef
+	vect_br	7, hyp_reset
 
 .macro invalid_vector label, cause
 	.align
@@ -149,7 +200,14 @@ hyp_hvc:
 	bx	ip
 
 1:
-	push	{lr}
+	/*
+	 * Pushing r2 here is just a way of keeping the stack aligned to
+	 * 8 bytes on any path that can trigger a HYP exception. Here,
+	 * we may well be about to jump into the guest, and the guest
+	 * exit would otherwise be badly decoded by our fancy
+	 * "decode-exception-without-a-branch" code...
+	 */
+	push	{r2, lr}
 
 	mov	lr, r0
 	mov	r0, r1
@@ -159,7 +217,7 @@ hyp_hvc:
 THUMB(	orr	lr, #1)
 	blx	lr			@ Call the HYP function
 
-	pop	{lr}
+	pop	{r2, lr}
 	eret
 
 guest_trap:
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
                   ` (2 preceding siblings ...)
  2018-01-25 15:21 ` [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17 Marc Zyngier
@ 2018-01-25 15:21 ` Marc Zyngier
  2018-01-26  9:14   ` Christoffer Dall
  2018-01-27 22:23   ` Florian Fainelli
  2018-01-25 15:21 ` [PATCH v3 5/6] arm: Invalidate icache on prefetch abort outside of user mapping on Cortex-A15 Marc Zyngier
                   ` (4 subsequent siblings)
  8 siblings, 2 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-25 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

In order to avoid aliasing attacks against the branch predictor,
Cortex-A15 requires the BTB to be invalidated when switching
from one user context to another. The only way to do so on this
CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
mode.
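
For completeness, the secure-side requirement amounts to something
like the following firmware fragment (illustrative only, not part of
this patch):

	mrc	p15, 0, r0, c1, c0, 1	@ read ACTLR
	orr	r0, r0, #1		@ set IBE (bit 0)
	mcr	p15, 0, r0, c1, c0, 1	@ write ACTLR (secure-only)
	isb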

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/mm/proc-v7-2level.S | 10 ++++++++++
 arch/arm/mm/proc-v7-3level.S | 16 ++++++++++++++++
 arch/arm/mm/proc-v7.S        | 18 +++++++++++++++++-
 3 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
index 0422e58b74e8..7dc9e1c69039 100644
--- a/arch/arm/mm/proc-v7-2level.S
+++ b/arch/arm/mm/proc-v7-2level.S
@@ -40,7 +40,17 @@
  *	Note that we always need to flush BTAC/BTB if IBE is set
  *	even on Cortex-A8 revisions not affected by 430973.
  *	If IBE is not set, the flush BTAC/BTB won't do anything.
+ *
+ *	Cortex-A15 requires ACTLR[0] to be set from secure in order
+ *	for the icache invalidation to also invalidate the BTB.
  */
+ENTRY(cpu_ca15_switch_mm)
+#ifdef CONFIG_MMU
+	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
+	isb
+	b	cpu_v7_switch_mm
+#endif
+ENDPROC(cpu_ca15_switch_mm)
 ENTRY(cpu_v7_btbinv_switch_mm)
 #ifdef CONFIG_MMU
 	mov	r2, #0
diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
index 934272e1fa08..cae6bb4da956 100644
--- a/arch/arm/mm/proc-v7-3level.S
+++ b/arch/arm/mm/proc-v7-3level.S
@@ -71,6 +71,22 @@ ENTRY(cpu_v7_switch_mm)
 ENDPROC(cpu_v7_switch_mm)
 ENDPROC(cpu_v7_btbinv_switch_mm)
 
+/*
+ *	Cortex-A15 requires ACTLR[0] to be set from secure in order
+ *	for the icache invalidation to also invalidate the BTB.
+ */
+ENTRY(cpu_ca15_switch_mm)
+#ifdef CONFIG_MMU
+	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
+	mmid	r2, r2
+	asid	r2, r2
+	orr	rpgdh, rpgdh, r2, lsl #(48 - 32)	@ upper 32-bits of pgd
+	mcrr	p15, 0, rpgdl, rpgdh, c2		@ set TTB 0
+	isb
+#endif
+	ret	lr
+ENDPROC(cpu_ca15_switch_mm)
+
 #ifdef __ARMEB__
 #define rl r3
 #define rh r2
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 0a14967fd400..9310fd9aa1cf 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -173,6 +173,21 @@ ENDPROC(cpu_v7_do_resume)
 	globl_equ	cpu_v7_btbinv_do_resume,	cpu_v7_do_resume
 #endif
 
+/*
+ * Cortex-A15 that require an icache invalidation on switch_mm
+ */
+	globl_equ	cpu_ca15_proc_init,		cpu_v7_proc_init
+	globl_equ	cpu_ca15_proc_fin,		cpu_v7_proc_fin
+	globl_equ	cpu_ca15_reset,			cpu_v7_reset
+	globl_equ	cpu_ca15_do_idle,		cpu_v7_do_idle
+	globl_equ	cpu_ca15_dcache_clean_area, 	cpu_v7_dcache_clean_area
+	globl_equ	cpu_ca15_set_pte_ext,		cpu_v7_set_pte_ext
+	globl_equ	cpu_ca15_suspend_size,		cpu_v7_suspend_size
+#ifdef CONFIG_ARM_CPU_SUSPEND
+	globl_equ	cpu_ca15_do_suspend,		cpu_v7_do_suspend
+	globl_equ	cpu_ca15_do_resume,		cpu_v7_do_resume
+#endif
+
 /*
  * Cortex-A9 processor functions
  */
@@ -549,6 +564,7 @@ __v7_setup_stack:
 	@ define struct processor (see <asm/proc-fns.h> and proc-macros.S)
 	define_processor_functions v7, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
 	define_processor_functions v7_btbinv, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
+	define_processor_functions ca15, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
 #ifndef CONFIG_ARM_LPAE
 	define_processor_functions ca9mp, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
 #endif
@@ -668,7 +684,7 @@ __v7_ca12mp_proc_info:
 __v7_ca15mp_proc_info:
 	.long	0x410fc0f0
 	.long	0xff0ffff0
-	__v7_proc __v7_ca15mp_proc_info, __v7_ca15mp_setup
+	__v7_proc __v7_ca15mp_proc_info, __v7_ca15mp_setup, proc_fns = ca15_processor_functions
 	.size	__v7_ca15mp_proc_info, . - __v7_ca15mp_proc_info
 
 	/*
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 5/6] arm: Invalidate icache on prefetch abort outside of user mapping on Cortex-A15
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
                   ` (3 preceding siblings ...)
  2018-01-25 15:21 ` [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15 Marc Zyngier
@ 2018-01-25 15:21 ` Marc Zyngier
  2018-01-25 15:21 ` [PATCH v3 6/6] arm: KVM: Invalidate icache on guest exit for Cortex-A15 Marc Zyngier
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-25 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

In order to prevent aliasing attacks on the branch predictor,
invalidate the icache on Cortex-A15, which has the side effect
of invalidating the BTB. This requires ACTLR[0] to be set to 1
(secure operation).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/cp15.h | 1 +
 arch/arm/mm/fault.c         | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/arm/include/asm/cp15.h b/arch/arm/include/asm/cp15.h
index 9e900ae855aa..07e27f212dc7 100644
--- a/arch/arm/include/asm/cp15.h
+++ b/arch/arm/include/asm/cp15.h
@@ -66,6 +66,7 @@
 #define write_sysreg(v, ...)		__write_sysreg(v, __VA_ARGS__)
 
 #define BPIALL				__ACCESS_CP15(c7, 0, c5, 6)
+#define ICIALLU				__ACCESS_CP15(c7, 0, c5, 0)
 
 extern unsigned long cr_alignment;	/* defined in entry-armv.S */
 
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index d400d1c5a409..cda1023e3f67 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -410,6 +410,10 @@ do_pabt_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
 		case ARM_CPU_PART_CORTEX_A17:
 			write_sysreg(0, BPIALL);
 			break;
+
+		case ARM_CPU_PART_CORTEX_A15:
+			write_sysreg(0, ICIALLU);
+			break;
 		}
 	}
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 6/6] arm: KVM: Invalidate icache on guest exit for Cortex-A15
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
                   ` (4 preceding siblings ...)
  2018-01-25 15:21 ` [PATCH v3 5/6] arm: Invalidate icache on prefetch abort outside of user mapping on Cortex-A15 Marc Zyngier
@ 2018-01-25 15:21 ` Marc Zyngier
  2018-01-26  9:30 ` [PATCH v3 0/6] 32bit ARM branch predictor hardening Christoffer Dall
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-25 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

In order to avoid aliasing attacks against the branch predictor
on Cortex-A15, let's invalidate the BTB on guest exit, which can
only be done by invalidating the icache (with ACTLR[0] being set).

We use the same hack as for A12/A17 to perform the vector decoding.

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h |  4 ++++
 arch/arm/kvm/hyp/hyp-entry.S   | 25 ++++++++++++++++++++++++-
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index b47db5b9e407..72ffb4d27fde 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -226,12 +226,16 @@ static inline void *kvm_get_hyp_vector(void)
 {
 	extern char __kvm_hyp_vector[];
 	extern char __kvm_hyp_vector_bp_inv[];
+	extern char __kvm_hyp_vector_ic_inv[];
 
 	switch(read_cpuid_part()) {
 	case ARM_CPU_PART_CORTEX_A12:
 	case ARM_CPU_PART_CORTEX_A17:
 		return kvm_ksym_ref(__kvm_hyp_vector_bp_inv);
 
+	case ARM_CPU_PART_CORTEX_A15:
+		return kvm_ksym_ref(__kvm_hyp_vector_ic_inv);
+
 	default:
 		return kvm_ksym_ref(__kvm_hyp_vector);
 	}
diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
index aab6b0c06a19..2377ed86e20b 100644
--- a/arch/arm/kvm/hyp/hyp-entry.S
+++ b/arch/arm/kvm/hyp/hyp-entry.S
@@ -70,7 +70,29 @@ __kvm_hyp_vector:
 	W(b)	hyp_hvc
 	W(b)	hyp_irq
 	W(b)	hyp_fiq
-	
+
+	.align 5
+__kvm_hyp_vector_ic_inv:
+	.global __kvm_hyp_vector_ic_inv
+
+	/*
+	 * We encode the exception entry in the bottom 3 bits of
+	 * SP, and we have to guarantee to be 8 bytes aligned.
+	 */
+	W(add)	sp, sp, #1	/* Reset 	  7 */
+	W(add)	sp, sp, #1	/* Undef	  6 */
+	W(add)	sp, sp, #1	/* Syscall	  5 */
+	W(add)	sp, sp, #1	/* Prefetch abort 4 */
+	W(add)	sp, sp, #1	/* Data abort	  3 */
+	W(add)	sp, sp, #1	/* HVC		  2 */
+	W(add)	sp, sp, #1	/* IRQ		  1 */
+	W(nop)			/* FIQ		  0 */
+
+	mcr	p15, 0, r0, c7, c5, 0	/* ICIALLU */
+	isb
+
+	b	decode_vectors
+
 	.align 5
 __kvm_hyp_vector_bp_inv:
 	.global __kvm_hyp_vector_bp_inv
@@ -91,6 +113,7 @@ __kvm_hyp_vector_bp_inv:
 	mcr	p15, 0, r0, c7, c5, 6	/* BPIALL */
 	isb
 
+decode_vectors:
 	/*
 	 * Yet another silly hack: Use VPIDR as a temp register.
 	 * Thumb2 is really a pain, as SP cannot be used with most
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-25 15:21 ` [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15 Marc Zyngier
@ 2018-01-26  9:14   ` Christoffer Dall
  2018-01-26  9:30     ` Marc Zyngier
  2018-01-27 22:23   ` Florian Fainelli
  1 sibling, 1 reply; 32+ messages in thread
From: Christoffer Dall @ 2018-01-26  9:14 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 25, 2018 at 03:21:37PM +0000, Marc Zyngier wrote:
> In order to avoid aliasing attacks against the branch predictor,
> Cortex-A15 requires the BTB to be invalidated when switching
> from one user context to another. The only way to do so on this
> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
> mode.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/mm/proc-v7-2level.S | 10 ++++++++++
>  arch/arm/mm/proc-v7-3level.S | 16 ++++++++++++++++
>  arch/arm/mm/proc-v7.S        | 18 +++++++++++++++++-
>  3 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
> index 0422e58b74e8..7dc9e1c69039 100644
> --- a/arch/arm/mm/proc-v7-2level.S
> +++ b/arch/arm/mm/proc-v7-2level.S
> @@ -40,7 +40,17 @@
>   *	Note that we always need to flush BTAC/BTB if IBE is set
>   *	even on Cortex-A8 revisions not affected by 430973.
>   *	If IBE is not set, the flush BTAC/BTB won't do anything.
> + *
> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
> + *	for the icache invalidation to also invalidate the BTB.
>   */

Seems like we can read (but not write) this bit from non-secure.  Should
we test if it's set somewhere during boot and warn the user if not?
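
Something like this would do (an untested sketch, and the helper name
is made up; ACTLR is c1, c0, 1 and is readable from non-secure):

	static inline u32 read_actlr(void)
	{
		u32 actlr;

		asm volatile("mrc p15, 0, %0, c1, c0, 1" : "=r" (actlr));
		return actlr;
	}

	if (!(read_actlr() & 1))
		pr_warn("CPU%d: ACTLR.IBE clear, BP hardening is a nop\n",
			smp_processor_id());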

> +ENTRY(cpu_ca15_switch_mm)
> +#ifdef CONFIG_MMU
> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
> +	isb
> +	b	cpu_v7_switch_mm
> +#endif
> +ENDPROC(cpu_ca15_switch_mm)
>  ENTRY(cpu_v7_btbinv_switch_mm)
>  #ifdef CONFIG_MMU
>  	mov	r2, #0
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 934272e1fa08..cae6bb4da956 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -71,6 +71,22 @@ ENTRY(cpu_v7_switch_mm)
>  ENDPROC(cpu_v7_switch_mm)
>  ENDPROC(cpu_v7_btbinv_switch_mm)
>  
> +/*
> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
> + *	for the icache invalidation to also invalidate the BTB.
> + */
> +ENTRY(cpu_ca15_switch_mm)
> +#ifdef CONFIG_MMU
> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
> +	mmid	r2, r2
> +	asid	r2, r2
> +	orr	rpgdh, rpgdh, r2, lsl #(48 - 32)	@ upper 32-bits of pgd
> +	mcrr	p15, 0, rpgdl, rpgdh, c2		@ set TTB 0
> +	isb
> +#endif
> +	ret	lr
> +ENDPROC(cpu_ca15_switch_mm)
> +

There's some potential for code sharing with cpu_v7_switch_mm here,
either via a macro or by simply calling cpu_v7_switch_mm from
cpu_ca15_switch_mm, but I'm not sure if we care?

>  #ifdef __ARMEB__
>  #define rl r3
>  #define rh r2
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 0a14967fd400..9310fd9aa1cf 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -173,6 +173,21 @@ ENDPROC(cpu_v7_do_resume)
>  	globl_equ	cpu_v7_btbinv_do_resume,	cpu_v7_do_resume
>  #endif
>  
> +/*
> + * Cortex-A15 that require an icache invalidation on switch_mm

uber nit: The wording is weird here, how about "Cortex-A15 requires
an..." ?

> + */
> +	globl_equ	cpu_ca15_proc_init,		cpu_v7_proc_init
> +	globl_equ	cpu_ca15_proc_fin,		cpu_v7_proc_fin
> +	globl_equ	cpu_ca15_reset,			cpu_v7_reset
> +	globl_equ	cpu_ca15_do_idle,		cpu_v7_do_idle
> +	globl_equ	cpu_ca15_dcache_clean_area, 	cpu_v7_dcache_clean_area
> +	globl_equ	cpu_ca15_set_pte_ext,		cpu_v7_set_pte_ext
> +	globl_equ	cpu_ca15_suspend_size,		cpu_v7_suspend_size
> +#ifdef CONFIG_ARM_CPU_SUSPEND
> +	globl_equ	cpu_ca15_do_suspend,		cpu_v7_do_suspend
> +	globl_equ	cpu_ca15_do_resume,		cpu_v7_do_resume
> +#endif
> +
>  /*
>   * Cortex-A9 processor functions
>   */
> @@ -549,6 +564,7 @@ __v7_setup_stack:
>  	@ define struct processor (see <asm/proc-fns.h> and proc-macros.S)
>  	define_processor_functions v7, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
>  	define_processor_functions v7_btbinv, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
> +	define_processor_functions ca15, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
>  #ifndef CONFIG_ARM_LPAE
>  	define_processor_functions ca9mp, dabort=v7_early_abort, pabort=v7_pabort, suspend=1
>  #endif
> @@ -668,7 +684,7 @@ __v7_ca12mp_proc_info:
>  __v7_ca15mp_proc_info:
>  	.long	0x410fc0f0
>  	.long	0xff0ffff0
> -	__v7_proc __v7_ca15mp_proc_info, __v7_ca15mp_setup
> +	__v7_proc __v7_ca15mp_proc_info, __v7_ca15mp_setup, proc_fns = ca15_processor_functions
>  	.size	__v7_ca15mp_proc_info, . - __v7_ca15mp_proc_info
>  
>  	/*
> -- 
> 2.14.2
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17
  2018-01-25 15:21 ` [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17 Marc Zyngier
@ 2018-01-26  9:23   ` Christoffer Dall
  2018-01-26 17:12   ` Robin Murphy
  1 sibling, 0 replies; 32+ messages in thread
From: Christoffer Dall @ 2018-01-26  9:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 25, 2018 at 03:21:36PM +0000, Marc Zyngier wrote:
> In order to avoid aliasing attacks against the branch predictor,
> let's invalidate the BTB on guest exit. This is made complicated
> by the fact that we cannot take a branch before invalidating the
> BTB.
> 
> We only apply this to A12 and A17, which are the only two ARM
> cores on which this is useful.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_asm.h |  2 --
>  arch/arm/include/asm/kvm_mmu.h | 13 ++++++++-
>  arch/arm/kvm/hyp/hyp-entry.S   | 62 ++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 72 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 36dd2962a42d..df24ed48977d 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -61,8 +61,6 @@ struct kvm_vcpu;
>  extern char __kvm_hyp_init[];
>  extern char __kvm_hyp_init_end[];
>  
> -extern char __kvm_hyp_vector[];
> -
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index eb46fc81a440..b47db5b9e407 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -37,6 +37,7 @@
>  
>  #include <linux/highmem.h>
>  #include <asm/cacheflush.h>
> +#include <asm/cputype.h>
>  #include <asm/pgalloc.h>
>  #include <asm/stage2_pgtable.h>
>  
> @@ -223,7 +224,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
>  
>  static inline void *kvm_get_hyp_vector(void)
>  {
> -	return kvm_ksym_ref(__kvm_hyp_vector);
> +	extern char __kvm_hyp_vector[];
> +	extern char __kvm_hyp_vector_bp_inv[];
> +
> +	switch(read_cpuid_part()) {
> +	case ARM_CPU_PART_CORTEX_A12:
> +	case ARM_CPU_PART_CORTEX_A17:
> +		return kvm_ksym_ref(__kvm_hyp_vector_bp_inv);
> +
> +	default:
> +		return kvm_ksym_ref(__kvm_hyp_vector);
> +	}
>  }
>  
>  static inline int kvm_map_vectors(void)
> diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
> index 95a2faefc070..aab6b0c06a19 100644
> --- a/arch/arm/kvm/hyp/hyp-entry.S
> +++ b/arch/arm/kvm/hyp/hyp-entry.S
> @@ -70,6 +70,57 @@ __kvm_hyp_vector:
>  	W(b)	hyp_hvc
>  	W(b)	hyp_irq
>  	W(b)	hyp_fiq
> +	

nit: trailing white space

> +	.align 5
> +__kvm_hyp_vector_bp_inv:
> +	.global __kvm_hyp_vector_bp_inv
> +
> +	/*
> +	 * We encode the exception entry in the bottom 3 bits of
> +	 * SP, and we have to guarantee to be 8 bytes aligned.
> +	 */
> +	W(add)	sp, sp, #1	/* Reset 	  7 */
> +	W(add)	sp, sp, #1	/* Undef	  6 */
> +	W(add)	sp, sp, #1	/* Syscall	  5 */
> +	W(add)	sp, sp, #1	/* Prefetch abort 4 */
> +	W(add)	sp, sp, #1	/* Data abort	  3 */
> +	W(add)	sp, sp, #1	/* HVC		  2 */
> +	W(add)	sp, sp, #1	/* IRQ		  1 */
> +	W(nop)			/* FIQ		  0 */
> +
> +	mcr	p15, 0, r0, c7, c5, 6	/* BPIALL */
> +	isb
> +
> +	/*
> +	 * Yet another silly hack: Use VPIDR as a temp register.
> +	 * Thumb2 is really a pain, as SP cannot be used with most
> +	 * of the bitwise instructions. The vect_br macro ensures
> +	 * things gets cleaned-up.
> +	 */
> +	mcr	p15, 4, r0, c0, c0, 0	/* VPIDR */
> +	mov	r0, sp
> +	and	r0, r0, #7
> +	sub	sp, sp, r0
> +	push	{r1, r2}
> +	mov	r1, r0
> +	mrc	p15, 4, r0, c0, c0, 0	/* VPIDR */
> +	mrc	p15, 0, r2, c0, c0, 0	/* MIDR  */
> +	mcr	p15, 4, r2, c0, c0, 0	/* VPIDR */
> +
> +.macro vect_br val, targ
> +	cmp	r1, #\val
> +	popeq	{r1, r2}
> +	beq	\targ
> +.endm
> +
> +	vect_br	0, hyp_fiq
> +	vect_br	1, hyp_irq
> +	vect_br	2, hyp_hvc
> +	vect_br	3, hyp_dabt
> +	vect_br	4, hyp_pabt
> +	vect_br	5, hyp_svc
> +	vect_br	6, hyp_undef
> +	vect_br	7, hyp_reset
>  
>  .macro invalid_vector label, cause
>  	.align
> @@ -149,7 +200,14 @@ hyp_hvc:
>  	bx	ip
>  
>  1:
> -	push	{lr}
> +	/*
> +	 * Pushing r2 here is just a way of keeping the stack aligned to
> +	 * 8 bytes on any path that can trigger a HYP exception. Here,
> +	 * we may well be about to jump into the guest, and the guest
> +	 * exit would otherwise be badly decoded by our fancy
> +	 * "decode-exception-without-a-branch" code...
> +	 */
> +	push	{r2, lr}
>  
>  	mov	lr, r0
>  	mov	r0, r1
> @@ -159,7 +217,7 @@ hyp_hvc:
>  THUMB(	orr	lr, #1)
>  	blx	lr			@ Call the HYP function
>  
> -	pop	{lr}
> +	pop	{r2, lr}
>  	eret
>  
>  guest_trap:
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
                   ` (5 preceding siblings ...)
  2018-01-25 15:21 ` [PATCH v3 6/6] arm: KVM: Invalidate icache on guest exit for Cortex-A15 Marc Zyngier
@ 2018-01-26  9:30 ` Christoffer Dall
  2018-01-26 16:39 ` Andre Przywara
  2018-01-29 11:36 ` Hanjun Guo
  8 siblings, 0 replies; 32+ messages in thread
From: Christoffer Dall @ 2018-01-26  9:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jan 25, 2018 at 03:21:33PM +0000, Marc Zyngier wrote:
> This small series implements some basic BP hardening by invalidating
> the BTB on 32bit ARM CPUs that are known to be susceptible to aliasing
> attacks (Spectre variant 2). It doesn't help non-ARM 32bit CPUs, nor
> 32bit kernels that run on 64bit capable CPUs. This series doesn't
> mitigate Spectre variant 1 either.
> 
> These patches are closely modelled against what we do on arm64,
> although simpler as we can rely on an architected instruction to
> perform the invalidation. The notable exception is Cortex-A15, where
> BTB invalidation behaves like a NOP, and the only way to shoot the
> predictor down is to invalidate the icache *and* to have ACTLR[0] set
> to 1 (which is a secure-only operation).
> 
> The first patch reuses the Cortex-A8 BTB invalidation in switch_mm and
> generalises it to be used on all affected CPUs. The second performs the
> same invalidation on prefetch abort outside of the userspace
> range. The third one nukes it on guest exit, and results in some major
> surgery as we cannot take a branch from the vectors (that, and Thumb2
> being a massive pain).
> 
> Patches 4 to 6 do a similar thing for Cortex-A15, using the
> aforementioned ICIALLU.
> 
> To sum up the requirements:
> - Both Cortex-A8 and Cortex-A15 need to have ACTLR.IBE (bit 0) set to
>   1 from secure mode. For Cortex-A8, this overlaps with
>   ARM_ERRATA_430973 which also requires it.
> - Cortex-A9, A12 and A17 do not require any extra configuration.
> 
> Note 1: Contrary to the initial version, this new series relies on
> the arm64/kpti branch (I reuse the per-CPU vector hook for KVM).
> 
> Note 2: M-class CPUs are not affected and for R-class cores, the
> mitigation doesn't make much sense since we do not enforce user/kernel
> isolation.

Besides the minor nits in some of the patches, for the series:

Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-26  9:14   ` Christoffer Dall
@ 2018-01-26  9:30     ` Marc Zyngier
  2018-01-26 16:20       ` Florian Fainelli
  2018-01-26 17:20       ` Robin Murphy
  0 siblings, 2 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-26  9:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/01/18 09:14, Christoffer Dall wrote:
> On Thu, Jan 25, 2018 at 03:21:37PM +0000, Marc Zyngier wrote:
>> In order to avoid aliasing attacks against the branch predictor,
>> Cortex-A15 requires the BTB to be invalidated when switching
>> from one user context to another. The only way to do so on this
>> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
>> mode.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/mm/proc-v7-2level.S | 10 ++++++++++
>>  arch/arm/mm/proc-v7-3level.S | 16 ++++++++++++++++
>>  arch/arm/mm/proc-v7.S        | 18 +++++++++++++++++-
>>  3 files changed, 43 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
>> index 0422e58b74e8..7dc9e1c69039 100644
>> --- a/arch/arm/mm/proc-v7-2level.S
>> +++ b/arch/arm/mm/proc-v7-2level.S
>> @@ -40,7 +40,17 @@
>>   *	Note that we always need to flush BTAC/BTB if IBE is set
>>   *	even on Cortex-A8 revisions not affected by 430973.
>>   *	If IBE is not set, the flush BTAC/BTB won't do anything.
>> + *
>> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
>> + *	for the icache invalidation to also invalidate the BTB.
>>   */
> 
> Seems like we can read (but not write) this bit from non-secure.  Should
> we test if it's set somewhere during boot and warn the user if not?

That would have to be something that happens much later in the boot
process, as the proc structure is picked up very early, before we run
any C code.

I could add some detection code later, but we'll be stuck with an icache
invalidation for nothing. Not a pretty situation.

> 
>> +ENTRY(cpu_ca15_switch_mm)
>> +#ifdef CONFIG_MMU
>> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
>> +	isb
>> +	b	cpu_v7_switch_mm
>> +#endif
>> +ENDPROC(cpu_ca15_switch_mm)
>>  ENTRY(cpu_v7_btbinv_switch_mm)
>>  #ifdef CONFIG_MMU
>>  	mov	r2, #0
>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>> index 934272e1fa08..cae6bb4da956 100644
>> --- a/arch/arm/mm/proc-v7-3level.S
>> +++ b/arch/arm/mm/proc-v7-3level.S
>> @@ -71,6 +71,22 @@ ENTRY(cpu_v7_switch_mm)
>>  ENDPROC(cpu_v7_switch_mm)
>>  ENDPROC(cpu_v7_btbinv_switch_mm)
>>  
>> +/*
>> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
>> + *	for the icache invalidation to also invalidate the BTB.
>> + */
>> +ENTRY(cpu_ca15_switch_mm)
>> +#ifdef CONFIG_MMU
>> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
>> +	mmid	r2, r2
>> +	asid	r2, r2
>> +	orr	rpgdh, rpgdh, r2, lsl #(48 - 32)	@ upper 32-bits of pgd
>> +	mcrr	p15, 0, rpgdl, rpgdh, c2		@ set TTB 0
>> +	isb
>> +#endif
>> +	ret	lr
>> +ENDPROC(cpu_ca15_switch_mm)
>> +
> 
> There's some potential for code sharing with cpu_v7_switch_mm here,
> either via a macro or by simply calling cpu_v7_switch_mm from
> cpu_ca15_switch_mm, but I'm not sure if we care?

We could, but most things seem to be duplicated between the two
page-table implementations. I can look into it if this is going anywhere.

> 
>>  #ifdef __ARMEB__
>>  #define rl r3
>>  #define rh r2
>> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
>> index 0a14967fd400..9310fd9aa1cf 100644
>> --- a/arch/arm/mm/proc-v7.S
>> +++ b/arch/arm/mm/proc-v7.S
>> @@ -173,6 +173,21 @@ ENDPROC(cpu_v7_do_resume)
>>  	globl_equ	cpu_v7_btbinv_do_resume,	cpu_v7_do_resume
>>  #endif
>>  
>> +/*
>> + * Cortex-A15 that require an icache invalidation on switch_mm
> 
> uber nit: The wording is weird here, how about "Cortex-A15 requires
> an..." ?

Sure.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-26  9:30     ` Marc Zyngier
@ 2018-01-26 16:20       ` Florian Fainelli
  2018-01-26 16:33         ` Marc Zyngier
  2018-01-26 17:20       ` Robin Murphy
  1 sibling, 1 reply; 32+ messages in thread
From: Florian Fainelli @ 2018-01-26 16:20 UTC (permalink / raw)
  To: linux-arm-kernel



On 01/26/2018 01:30 AM, Marc Zyngier wrote:
> On 26/01/18 09:14, Christoffer Dall wrote:
>> On Thu, Jan 25, 2018 at 03:21:37PM +0000, Marc Zyngier wrote:
>>> In order to avoid aliasing attacks against the branch predictor,
>>> Cortex-A15 requires the BTB to be invalidated when switching
>>> from one user context to another. The only way to do so on this
>>> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
>>> mode.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>  arch/arm/mm/proc-v7-2level.S | 10 ++++++++++
>>>  arch/arm/mm/proc-v7-3level.S | 16 ++++++++++++++++
>>>  arch/arm/mm/proc-v7.S        | 18 +++++++++++++++++-
>>>  3 files changed, 43 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
>>> index 0422e58b74e8..7dc9e1c69039 100644
>>> --- a/arch/arm/mm/proc-v7-2level.S
>>> +++ b/arch/arm/mm/proc-v7-2level.S
>>> @@ -40,7 +40,17 @@
>>>   *	Note that we always need to flush BTAC/BTB if IBE is set
>>>   *	even on Cortex-A8 revisions not affected by 430973.
>>>   *	If IBE is not set, the flush BTAC/BTB won't do anything.
>>> + *
>>> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
>>> + *	for the icache invalidation to also invalidate the BTB.
>>>   */
>>
>> Seems like we can read (but not write) this bit from non-secure.  Should
>> we test if it's set somewhere during boot and warn the user if not?
> 
> That would have to be something that happens much later in the boot
> process, as the proc structure is picked up very early, before we run
> any C code.
> 
> I could add some detection code later, but we'll be stuck with an icache
> invalidation for nothing. Not a pretty situation.

Would it make sense to put some code in the decompressor (maybe some
people don't use the decompressor...) similar to how we check for an
ARMv7 CPU w/o LPAE attempting to boot an LPAE-enabled kernel, for instance.

This is not technically a hard failure if ACTLR[0] is not set;
everything will still work fine, it just is not ideal from a security
perspective.

Should we have a Kconfig option, similar to what Nishanth proposed for
u-boot, with which we could guard an optional check against ACTLR[0]
being set?
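
Something as simple as this could work (hypothetical fragment, the
symbol name is made up):

	config HARDEN_BRANCH_PREDICTOR_CHECK_IBE
		bool "Warn if ACTLR.IBE is not set on Cortex-A8/A15"
		depends on CPU_V7
		help
		  The BTB/icache invalidation used for branch predictor
		  hardening on Cortex-A8 and Cortex-A15 only works when
		  secure firmware has set ACTLR.IBE. Say Y to check the
		  bit at boot and warn if it is clear.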

> 
>>
>>> +ENTRY(cpu_ca15_switch_mm)
>>> +#ifdef CONFIG_MMU
>>> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
>>> +	isb
>>> +	b	cpu_v7_switch_mm
>>> +#endif
>>> +ENDPROC(cpu_ca15_switch_mm)
>>>  ENTRY(cpu_v7_btbinv_switch_mm)
>>>  #ifdef CONFIG_MMU
>>>  	mov	r2, #0
>>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>>> index 934272e1fa08..cae6bb4da956 100644
>>> --- a/arch/arm/mm/proc-v7-3level.S
>>> +++ b/arch/arm/mm/proc-v7-3level.S
>>> @@ -71,6 +71,22 @@ ENTRY(cpu_v7_switch_mm)
>>>  ENDPROC(cpu_v7_switch_mm)
>>>  ENDPROC(cpu_v7_btbinv_switch_mm)
>>>  
>>> +/*
>>> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
>>> + *	for the icache invalidation to also invalidate the BTB.
>>> + */
>>> +ENTRY(cpu_ca15_switch_mm)
>>> +#ifdef CONFIG_MMU
>>> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
>>> +	mmid	r2, r2
>>> +	asid	r2, r2
>>> +	orr	rpgdh, rpgdh, r2, lsl #(48 - 32)	@ upper 32-bits of pgd
>>> +	mcrr	p15, 0, rpgdl, rpgdh, c2		@ set TTB 0
>>> +	isb
>>> +#endif
>>> +	ret	lr
>>> +ENDPROC(cpu_ca15_switch_mm)
>>> +
>>
>> There's some potential for code sharing with cpu_v7_switch_mm here,
>> either via a macro or by simply calling cpu_v7_switch_mm from
>> cpu_ca15_switch_mm, but I'm not sure if we care?
> 
> We could, but most things seem to be duplicated between the two
> page-table implementations. I can look into it if this is going anywhere.
> 
>>
>>>  #ifdef __ARMEB__
>>>  #define rl r3
>>>  #define rh r2
>>> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
>>> index 0a14967fd400..9310fd9aa1cf 100644
>>> --- a/arch/arm/mm/proc-v7.S
>>> +++ b/arch/arm/mm/proc-v7.S
>>> @@ -173,6 +173,21 @@ ENDPROC(cpu_v7_do_resume)
>>>  	globl_equ	cpu_v7_btbinv_do_resume,	cpu_v7_do_resume
>>>  #endif
>>>  
>>> +/*
>>> + * Cortex-A15 that require an icache invalidation on switch_mm
>>
>> uber nit: The wording is weird here, how about "Cortex-A15 requires
>> an..." ?
> 
> Sure.
> 
> Thanks,
> 
> 	M.
> 

-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-26 16:20       ` Florian Fainelli
@ 2018-01-26 16:33         ` Marc Zyngier
  0 siblings, 0 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-26 16:33 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/01/18 16:20, Florian Fainelli wrote:
> 
> 
> On 01/26/2018 01:30 AM, Marc Zyngier wrote:
>> On 26/01/18 09:14, Christoffer Dall wrote:
>>> On Thu, Jan 25, 2018 at 03:21:37PM +0000, Marc Zyngier wrote:
>>>> In order to avoid aliasing attacks against the branch predictor,
>>>> Cortex-A15 requires the BTB to be invalidated when switching
>>>> from one user context to another. The only way to do so on this
>>>> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
>>>> mode.
>>>>
>>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>>> ---
>>>>  arch/arm/mm/proc-v7-2level.S | 10 ++++++++++
>>>>  arch/arm/mm/proc-v7-3level.S | 16 ++++++++++++++++
>>>>  arch/arm/mm/proc-v7.S        | 18 +++++++++++++++++-
>>>>  3 files changed, 43 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
>>>> index 0422e58b74e8..7dc9e1c69039 100644
>>>> --- a/arch/arm/mm/proc-v7-2level.S
>>>> +++ b/arch/arm/mm/proc-v7-2level.S
>>>> @@ -40,7 +40,17 @@
>>>>   *	Note that we always need to flush BTAC/BTB if IBE is set
>>>>   *	even on Cortex-A8 revisions not affected by 430973.
>>>>   *	If IBE is not set, the flush BTAC/BTB won't do anything.
>>>> + *
>>>> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
>>>> + *	for the icache invalidation to also invalidate the BTB.
>>>>   */
>>>
>>> Seems like we can read (but not write) this bit from non-secure.  Should
>>> we test if it's set somewhere during boot and warn the user if not?
>>
>> That would have to be something that happens much later in the boot
>> process, as the proc structure is picked up very early, before we run
>> any C code.
>>
>> I could add some detection code later, but we'll be stuck with an icache
>> invalidation for nothing. Not a pretty situation.
> 
> Would it make sense to put some code in the decompressor (maybe some
> people don't use the decompressor...) similar to how we check for an
> ARMv7 CPU w/o LPAE attempting to boot an LPAE-enabled kernel, for instance.

Which only works if you have CONFIG_DEBUG_LL, and is thus not generally
usable. Also, the LPAE test is not in the decompressor, but in the
kernel itself (head-common.S). Not that it changes anything...

Also, testing that at early boot is not that useful, as you only have
the boot CPU enabled at this point. You want to check that you have
whatever configuration is required on a per-CPU basis, which means you
have to hook this into the SMP bring-up code. And play nice with things
like big-little, in both SMP and MCPM configurations.
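
For illustration only (reusing the hypothetical read_actlr() helper
sketched earlier in the thread), something like this would run on
each CPU as it comes online, big-little included:

	static int bp_hardening_check_cpu(unsigned int cpu)
	{
		if (read_cpuid_part() == ARM_CPU_PART_CORTEX_A15 &&
		    !(read_actlr() & 1))
			pr_warn("CPU%u: ACTLR.IBE clear, hardening is a nop\n",
				cpu);
		return 0;
	}

	cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "arm/bp_hardening:online",
			  bp_hardening_check_cpu, NULL);

Whether the dynamic hotplug state runs early enough is another
question.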

> This is not technically a hard failure if ACTLR[0] is not set;
> everything will still work fine, it just is not ideal from a security
> perspective.
> 
> Should we have a Kconfig option, similar to what Nishanth proposed for
> u-boot, with which we could guard an optional check against ACTLR[0]
> being set?

Sure, why not... If you have something in mind, I can review it.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
                   ` (6 preceding siblings ...)
  2018-01-26  9:30 ` [PATCH v3 0/6] 32bit ARM branch predictor hardening Christoffer Dall
@ 2018-01-26 16:39 ` Andre Przywara
  2018-01-29 11:36 ` Hanjun Guo
  8 siblings, 0 replies; 32+ messages in thread
From: Andre Przywara @ 2018-01-26 16:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

On 25/01/18 15:21, Marc Zyngier wrote:
> This small series implements some basic BP hardening by invalidating
> the BTB on 32bit ARM CPUs that are known to be susceptible to aliasing
> attacks (Spectre variant 2). It doesn't help non-ARM 32bit CPUs, nor
> 32bit kernels that run on 64bit capable CPUs. This series doesn't
> mitigate Spectre variant 1 either.
> 
> These patches are closely modelled against what we do on arm64,
> although simpler as we can rely on an architected instruction to
> perform the invalidation. The notable exception is Cortex-A15, where
> BTB invalidation behaves like a NOP, and the only way to shoot the
> predictor down is to invalidate the icache *and* to have ACTLR[0] set
> to 1 (which is a secure-only operation).

FWIW, I tested this series briefly on a Calxeda Midway (Cortex-A15).
Both with and without ACTLR[0] set (done in the secure firmware code),
it booted fine and I ran two KVM guests under some stress.

So at least no regression here.
Thanks for putting this together!


Tested-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre.

> The first patch reuses the Cortex-A8 BTB invalidation in switch_mm and
> generalises it to be used on all affected CPUs. The second perform the
> same invalidation on prefetch abort outside of the userspace
> range. The third one nukes it on guest exit, and results in some major
> surgery as we cannot take a branch from the vectors (that, and Thumb2
> being a massive pain).
> 
> Patches 4 to 6 are doing a similar thing for Cortex-A15, which the
> aforementioned ICIALLU.
> 
> To sum up the requirements:
> - Both Cortex-A8 and Cortex-A15 need to have ACTLR.IBE (bit 0) set to
>   1 from secure mode. For Cortex-A8, this overlaps with
>   ARM_ERRATA_430973 which also requires it.
> - Cortex-A9, A12 and A17 do not require any extra configuration.
> 
> Note 1: Contrary to the initial version, this new series relies on
> the arm64/kpti branch (I reuse the per-CPU vector hook for KVM).
> 
> Note 2: M-class CPUs are not affected and for R-class cores, the
> mitigation doesn't make much sense since we do not enforce user/kernel
> isolation.
> 
> * From v2:
>   - Fixed !MMU build
>   - Small KVM optimisation (suggested by Robin)
>   - Fixed register zeroing in cpu_v7_btbinv_switch_mm (noticed by
>     Andre)
>   
> * From v1:
>   - Fixed broken hyp_fiq vector (noticed by Ard)
>   - Fixed broken BTB invalidation in LPAE switch_mm (reported by Andre)
>   - Revamped invalidation on PABT (noticed by James on arm64,
>     suggested by Will)
>   - Rewrote the whole HYP sequence, as Thumb2 was pretty unhappy about
>     arithmetic with the stack pointer
> 
> Marc Zyngier (6):
>   arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17
>   arm: Invalidate BTB on prefetch abort outside of user mapping on
>     Cortex A8, A9, A12 and A17
>   arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17
>   arm: Add icache invalidation on switch_mm for Cortex-A15
>   arm: Invalidate icache on prefetch abort outside of user mapping on
>     Cortex-A15
>   arm: KVM: Invalidate icache on guest exit for Cortex-A15
> 
>  arch/arm/include/asm/cp15.h    |  3 ++
>  arch/arm/include/asm/kvm_asm.h |  2 -
>  arch/arm/include/asm/kvm_mmu.h | 17 ++++++++-
>  arch/arm/kvm/hyp/hyp-entry.S   | 85 +++++++++++++++++++++++++++++++++++++++++-
>  arch/arm/mm/fault.c            | 29 ++++++++++++++
>  arch/arm/mm/fsr-2level.c       |  4 +-
>  arch/arm/mm/fsr-3level.c       | 67 ++++++++++++++++++++++++++++++++-
>  arch/arm/mm/proc-v7-2level.S   | 14 ++++++-
>  arch/arm/mm/proc-v7-3level.S   | 22 +++++++++++
>  arch/arm/mm/proc-v7.S          | 48 ++++++++++++++++--------
>  10 files changed, 265 insertions(+), 26 deletions(-)
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17
  2018-01-25 15:21 ` [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17 Marc Zyngier
  2018-01-26  9:23   ` Christoffer Dall
@ 2018-01-26 17:12   ` Robin Murphy
  2018-01-31 12:11     ` Marc Zyngier
  1 sibling, 1 reply; 32+ messages in thread
From: Robin Murphy @ 2018-01-26 17:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 25/01/18 15:21, Marc Zyngier wrote:
> In order to avoid aliasing attacks against the branch predictor,
> let's invalidate the BTB on guest exit. This is made complicated
> by the fact that we cannot take a branch before invalidating the
> BTB.
> 
> We only apply this to A12 and A17, which are the only two ARM
> cores on which this useful.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>   arch/arm/include/asm/kvm_asm.h |  2 --
>   arch/arm/include/asm/kvm_mmu.h | 13 ++++++++-
>   arch/arm/kvm/hyp/hyp-entry.S   | 62 ++++++++++++++++++++++++++++++++++++++++--
>   3 files changed, 72 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
> index 36dd2962a42d..df24ed48977d 100644
> --- a/arch/arm/include/asm/kvm_asm.h
> +++ b/arch/arm/include/asm/kvm_asm.h
> @@ -61,8 +61,6 @@ struct kvm_vcpu;
>   extern char __kvm_hyp_init[];
>   extern char __kvm_hyp_init_end[];
>   
> -extern char __kvm_hyp_vector[];
> -
>   extern void __kvm_flush_vm_context(void);
>   extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>   extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
> index eb46fc81a440..b47db5b9e407 100644
> --- a/arch/arm/include/asm/kvm_mmu.h
> +++ b/arch/arm/include/asm/kvm_mmu.h
> @@ -37,6 +37,7 @@
>   
>   #include <linux/highmem.h>
>   #include <asm/cacheflush.h>
> +#include <asm/cputype.h>
>   #include <asm/pgalloc.h>
>   #include <asm/stage2_pgtable.h>
>   
> @@ -223,7 +224,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
>   
>   static inline void *kvm_get_hyp_vector(void)
>   {
> -	return kvm_ksym_ref(__kvm_hyp_vector);
> +	extern char __kvm_hyp_vector[];
> +	extern char __kvm_hyp_vector_bp_inv[];
> +
> +	switch(read_cpuid_part()) {
> +	case ARM_CPU_PART_CORTEX_A12:
> +	case ARM_CPU_PART_CORTEX_A17:
> +		return kvm_ksym_ref(__kvm_hyp_vector_bp_inv);
> +
> +	default:
> +		return kvm_ksym_ref(__kvm_hyp_vector);
> +	}
>   }
>   
>   static inline int kvm_map_vectors(void)
> diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
> index 95a2faefc070..aab6b0c06a19 100644
> --- a/arch/arm/kvm/hyp/hyp-entry.S
> +++ b/arch/arm/kvm/hyp/hyp-entry.S
> @@ -70,6 +70,57 @@ __kvm_hyp_vector:
>   	W(b)	hyp_hvc
>   	W(b)	hyp_irq
>   	W(b)	hyp_fiq
> +	
> +	.align 5
> +__kvm_hyp_vector_bp_inv:
> +	.global __kvm_hyp_vector_bp_inv
> +
> +	/*
> +	 * We encode the exception entry in the bottom 3 bits of
> +	 * SP, and we have to guarantee to be 8 bytes aligned.
> +	 */
> +	W(add)	sp, sp, #1	/* Reset 	  7 */
> +	W(add)	sp, sp, #1	/* Undef	  6 */
> +	W(add)	sp, sp, #1	/* Syscall	  5 */
> +	W(add)	sp, sp, #1	/* Prefetch abort 4 */
> +	W(add)	sp, sp, #1	/* Data abort	  3 */
> +	W(add)	sp, sp, #1	/* HVC		  2 */
> +	W(add)	sp, sp, #1	/* IRQ		  1 */
> +	W(nop)			/* FIQ		  0 */
> +
> +	mcr	p15, 0, r0, c7, c5, 6	/* BPIALL */
> +	isb
> +

The below is quite a bit of faff; might it be worth an

#ifdef CONFIG_THUMB2_KERNEL

> +	/*
> +	 * Yet another silly hack: Use VPIDR as a temp register.
> +	 * Thumb2 is really a pain, as SP cannot be used with most
> +	 * of the bitwise instructions. The vect_br macro ensures
> +	 * things gets cleaned-up.
> +	 */
> +	mcr	p15, 4, r0, c0, c0, 0	/* VPIDR */
> +	mov	r0, sp
> +	and	r0, r0, #7
> +	sub	sp, sp, r0
> +	push	{r1, r2}
> +	mov	r1, r0
> +	mrc	p15, 4, r0, c0, c0, 0	/* VPIDR */
> +	mrc	p15, 0, r2, c0, c0, 0	/* MIDR  */
> +	mcr	p15, 4, r2, c0, c0, 0	/* VPIDR */

#endif

> +
> +.macro vect_br val, targ

ARM(cmp		sp, #\val)
THUMB(cmp	r1, #\val)
THUMB(popeq	{r1, r2})

> +	beq	\targ
> +.endm

...to keep the "normal" path relatively streamlined?

Robin.

> +
> +	vect_br	0, hyp_fiq
> +	vect_br	1, hyp_irq
> +	vect_br	2, hyp_hvc
> +	vect_br	3, hyp_dabt
> +	vect_br	4, hyp_pabt
> +	vect_br	5, hyp_svc
> +	vect_br	6, hyp_undef
> +	vect_br	7, hyp_reset
>   
>   .macro invalid_vector label, cause
>   	.align
> @@ -149,7 +200,14 @@ hyp_hvc:
>   	bx	ip
>   
>   1:
> -	push	{lr}
> +	/*
> +	 * Pushing r2 here is just a way of keeping the stack aligned to
> +	 * 8 bytes on any path that can trigger a HYP exception. Here,
> +	 * we may well be about to jump into the guest, and the guest
> +	 * exit would otherwise be badly decoded by our fancy
> +	 * "decode-exception-without-a-branch" code...
> +	 */
> +	push	{r2, lr}
>   
>   	mov	lr, r0
>   	mov	r0, r1
> @@ -159,7 +217,7 @@ hyp_hvc:
>   THUMB(	orr	lr, #1)
>   	blx	lr			@ Call the HYP function
>   
> -	pop	{lr}
> +	pop	{r2, lr}
>   	eret
>   
>   guest_trap:
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-26  9:30     ` Marc Zyngier
  2018-01-26 16:20       ` Florian Fainelli
@ 2018-01-26 17:20       ` Robin Murphy
  1 sibling, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2018-01-26 17:20 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/01/18 09:30, Marc Zyngier wrote:
> On 26/01/18 09:14, Christoffer Dall wrote:
>> On Thu, Jan 25, 2018 at 03:21:37PM +0000, Marc Zyngier wrote:
>>> In order to avoid aliasing attacks against the branch predictor,
>>> Cortex-A15 require to invalidate the BTB when switching
>>> from one user context to another. The only way to do so on this
>>> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
>>> mode.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>   arch/arm/mm/proc-v7-2level.S | 10 ++++++++++
>>>   arch/arm/mm/proc-v7-3level.S | 16 ++++++++++++++++
>>>   arch/arm/mm/proc-v7.S        | 18 +++++++++++++++++-
>>>   3 files changed, 43 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
>>> index 0422e58b74e8..7dc9e1c69039 100644
>>> --- a/arch/arm/mm/proc-v7-2level.S
>>> +++ b/arch/arm/mm/proc-v7-2level.S
>>> @@ -40,7 +40,17 @@
>>>    *	Note that we always need to flush BTAC/BTB if IBE is set
>>>    *	even on Cortex-A8 revisions not affected by 430973.
>>>    *	If IBE is not set, the flush BTAC/BTB won't do anything.
>>> + *
>>> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
>>> + *	for the icache invalidation to also invalidate the BTB.
>>>    */
>>
>> Seems like we can read (but not write) this bit from non-secure.  Should
>> we test if it's set somewhere during boot and warn the user if not?
> 
> That would have to be something that happens much later in the boot
> process, as the proc structure is picked up very early, before we run
> any C code.
> 
> I could add some detection code later, but we'll be stuck with a icache
> invalidation for nothing. Not a pretty situation.
> 
>>
>>> +ENTRY(cpu_ca15_switch_mm)
>>> +#ifdef CONFIG_MMU
>>> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
>>> +	isb
>>> +	b	cpu_v7_switch_mm
>>> +#endif
>>> +ENDPROC(cpu_ca15_switch_mm)
>>>   ENTRY(cpu_v7_btbinv_switch_mm)
>>>   #ifdef CONFIG_MMU
>>>   	mov	r2, #0
>>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>>> index 934272e1fa08..cae6bb4da956 100644
>>> --- a/arch/arm/mm/proc-v7-3level.S
>>> +++ b/arch/arm/mm/proc-v7-3level.S
>>> @@ -71,6 +71,22 @@ ENTRY(cpu_v7_switch_mm)
>>>   ENDPROC(cpu_v7_switch_mm)
>>>   ENDPROC(cpu_v7_btbinv_switch_mm)
>>>   
>>> +/*
>>> + *	Cortex-A15 requires ACTLR[0] to be set from secure in order
>>> + *	for the icache invalidation to also invalidate the BTB.
>>> + */
>>> +ENTRY(cpu_ca15_switch_mm)
>>> +#ifdef CONFIG_MMU
>>> +	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
>>> +	mmid	r2, r2
>>> +	asid	r2, r2
>>> +	orr	rpgdh, rpgdh, r2, lsl #(48 - 32)	@ upper 32-bits of pgd
>>> +	mcrr	p15, 0, rpgdl, rpgdh, c2		@ set TTB 0
>>> +	isb
>>> +#endif
>>> +	ret	lr
>>> +ENDPROC(cpu_ca15_switch_mm)
>>> +
>>
>> There's some potential for code shaing with cpu_v7_switch_mm here,
>> either via a macro or by simply calling cpu_v7_switch_mm from
>> cpu_ca15_switch_mm, but I'm not sure if we care?
> 
> We could, but most things seems to been duplicated between the two
> page-table implementations. I can look in to it if this is going anywhere.

Could this not just do the equivalent of the cpu_ca8 version for 2-level 
paging, i.e. execute the ICIALLU then fall through - everything else is 
the same, right?
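
i.e. something like the sketch below, assuming cpu_ca15_switch_mm can
be placed immediately in front of cpu_v7_switch_mm (or in front of
cpu_v7_btbinv_switch_mm, at the cost of a redundant BPIALL):

ENTRY(cpu_ca15_switch_mm)
#ifdef CONFIG_MMU
	mcr	p15, 0, r0, c7, c5, 0			@ ICIALLU
	isb
#endif
	@ fall through
ENTRY(cpu_v7_switch_mm)
#ifdef CONFIG_MMU
	mmid	r2, r2
	asid	r2, r2
	orr	rpgdh, rpgdh, r2, lsl #(48 - 32)	@ upper 32-bits of pgd
	mcrr	p15, 0, rpgdl, rpgdh, c2		@ set TTB 0
	isb
#endif
	ret	lr
ENDPROC(cpu_v7_switch_mm)
ENDPROC(cpu_ca15_switch_mm)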

Robin.

> 
>>
>>>   #ifdef __ARMEB__
>>>   #define rl r3
>>>   #define rh r2
>>> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
>>> index 0a14967fd400..9310fd9aa1cf 100644
>>> --- a/arch/arm/mm/proc-v7.S
>>> +++ b/arch/arm/mm/proc-v7.S
>>> @@ -173,6 +173,21 @@ ENDPROC(cpu_v7_do_resume)
>>>   	globl_equ	cpu_v7_btbinv_do_resume,	cpu_v7_do_resume
>>>   #endif
>>>   
>>> +/*
>>> + * Cortex-A15 that require an icache invalidation on switch_mm
>>
>> uber nit: The wording is weird here, how about "Cortex-A15 requires
>> an..." ?
> 
> Sure.
> 
> Thanks,
> 
> 	M.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17
  2018-01-25 15:21 ` [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17 Marc Zyngier
@ 2018-01-26 20:44   ` Florian Fainelli
  2018-01-30 17:27     ` Marc Zyngier
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Fainelli @ 2018-01-26 20:44 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/25/2018 07:21 AM, Marc Zyngier wrote:
> In order to avoid aliasing attacks against the branch predictor,
> some implementations require to invalidate the BTB when switching
> from one user context to another.
> 
> For this, we reuse the existing implementation for Cortex-A8, and
> apply it to A9, A12 and A17.

Should this read: and apply it to A8, A9, A12 and A17?

> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/mm/proc-v7-2level.S |  4 ++--
>  arch/arm/mm/proc-v7-3level.S |  6 ++++++
>  arch/arm/mm/proc-v7.S        | 30 +++++++++++++++---------------
>  3 files changed, 23 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
> index c6141a5435c3..0422e58b74e8 100644
> --- a/arch/arm/mm/proc-v7-2level.S
> +++ b/arch/arm/mm/proc-v7-2level.S
> @@ -41,7 +41,7 @@
>   *	even on Cortex-A8 revisions not affected by 430973.
>   *	If IBE is not set, the flush BTAC/BTB won't do anything.
>   */
> -ENTRY(cpu_ca8_switch_mm)
> +ENTRY(cpu_v7_btbinv_switch_mm)
>  #ifdef CONFIG_MMU
>  	mov	r2, #0
>  	mcr	p15, 0, r2, c7, c5, 6		@ flush BTAC/BTB
> @@ -66,7 +66,7 @@ ENTRY(cpu_v7_switch_mm)
>  #endif
>  	bx	lr
>  ENDPROC(cpu_v7_switch_mm)
> -ENDPROC(cpu_ca8_switch_mm)
> +ENDPROC(cpu_v7_btbinv_switch_mm)
>  
>  /*
>   *	cpu_v7_set_pte_ext(ptep, pte)
> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
> index 7d16bbc4102b..934272e1fa08 100644
> --- a/arch/arm/mm/proc-v7-3level.S
> +++ b/arch/arm/mm/proc-v7-3level.S
> @@ -54,6 +54,11 @@
>   * Set the translation table base pointer to be pgd_phys (physical address of
>   * the new TTB).
>   */
> +ENTRY(cpu_v7_btbinv_switch_mm)
> +#ifdef CONFIG_MMU
> +	mov	r2, #0
> +	mcr	p15, 0, r2, c7, c5, 6			@ flush BTAC/BTB
> +#endif
>  ENTRY(cpu_v7_switch_mm)
>  #ifdef CONFIG_MMU
>  	mmid	r2, r2
> @@ -64,6 +69,7 @@ ENTRY(cpu_v7_switch_mm)
>  #endif
>  	ret	lr
>  ENDPROC(cpu_v7_switch_mm)
> +ENDPROC(cpu_v7_btbinv_switch_mm)
>  
>  #ifdef __ARMEB__
>  #define rl r3
> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
> index 01d64c0b2563..0a14967fd400 100644
> --- a/arch/arm/mm/proc-v7.S
> +++ b/arch/arm/mm/proc-v7.S
> @@ -159,18 +159,18 @@ ENDPROC(cpu_v7_do_resume)
>  #endif
>  
>  /*
> - * Cortex-A8
> + * Cortex-A8/A12/A17 that require a BTB invalidation on switch_mm

Should this read Cortex-A8/A9/A12/A17?
-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-25 15:21 ` [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15 Marc Zyngier
  2018-01-26  9:14   ` Christoffer Dall
@ 2018-01-27 22:23   ` Florian Fainelli
  2018-01-28 11:55     ` Marc Zyngier
  1 sibling, 1 reply; 32+ messages in thread
From: Florian Fainelli @ 2018-01-27 22:23 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/25/2018 07:21 AM, Marc Zyngier wrote:
> In order to avoid aliasing attacks against the branch predictor,
> Cortex-A15 require to invalidate the BTB when switching
> from one user context to another. The only way to do so on this
> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
> mode.

Even though this is a platform design mistake, let's say your Linux
kernel boots in secure supervisor mode: we could have code that tries
to set ACTLR[0] as early as possible, since the writes are silently
ignored when executing from non-secure mode. If Linux is booted
normally in either PL1 or PL2 non-secure, we could still check
ACTLR[0].

My concern is that without doing this, we may have a hard time catching
improper firmware, as well as getting bogus bug reports. This is
completely RFC though, since:

- I could not quite figure out yet why an update of ca15_actlr_status
from assembly is not reflected in the C code, despite using PC-relative
loads

- this is done in __ca15mp_setup and __b15mp_setup because we know by
then that we have either of these two CPUs, but we could presumably
move this check under a Kconfig option and earlier, where all errata
are checked

-- 
Florian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: actlr-kernel-check.patch
Type: text/x-patch
Size: 4133 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20180127/da1e1735/attachment-0001.bin>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-27 22:23   ` Florian Fainelli
@ 2018-01-28 11:55     ` Marc Zyngier
  2018-01-29 18:05       ` Florian Fainelli
  0 siblings, 1 reply; 32+ messages in thread
From: Marc Zyngier @ 2018-01-28 11:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 27 Jan 2018 14:23:57 -0800
Florian Fainelli <f.fainelli@gmail.com> wrote:

> On 01/25/2018 07:21 AM, Marc Zyngier wrote:
> > In order to avoid aliasing attacks against the branch predictor,
> > Cortex-A15 require to invalidate the BTB when switching
> > from one user context to another. The only way to do so on this
> > CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
> > mode.  
> 
> Even though this is a platform design mistake, let's say your Linux
> kernel boots in secure supervisor mode, we could have code
> that tries to set ACTLR[0] as early as possible, since the writes are
> ignored if executing from non-secure mode. If Linux is booted normally
> either PL1 or PL2 non-secure we could still check ACTLR[0].
> 
> My concern is that without doing this, we may have a hard time catching
> improper firmware as well as having bogus bug reports. This is
> completely RFC though since:
> 
> - I could not quite figure out yet why update ca15_actlr_status from
> assembly is not reflected in the C code despite using PC relative loads

One potential problem is that you're writing this with the MMU off, and
reading it with the MMU on. Unless you've added CMOs to make this
visible, it will not be reliable.
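
For illustration, the kind of sequence the write side would need while
the MMU and caches are still off (ca15_actlr_status, the register
usage and the flag sitting alone in its cache line are all assumptions
based on Florian's description):

	adr	r1, ca15_actlr_status	@ hypothetical flag, PC-relative,
					@ assumed cache-line aligned and
					@ alone in its line
	mrc	p15, 0, r0, c1, c0, 1	@ read ACTLR
	and	r0, r0, #1		@ keep the IBE bit
	str	r0, [r1]		@ MMU off: the store goes straight
					@ to memory, but a stale copy of
					@ the line may still sit in the
					@ D-cache
	mcr	p15, 0, r1, c7, c6, 1	@ DCIMVAC: invalidate that line so
					@ a cacheable read with the MMU on
					@ fetches the fresh value
	dsb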

Also, I'm not sure how this works with MCPM.

> - this is done in __ca15mp_setup and __b15mp_setup because we know by
> then that we have either of these two CPUs but we could presumably move
> this check under a Kconfig option and earlier where all erratas are checked

Thanks,

	M.
-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
                   ` (7 preceding siblings ...)
  2018-01-26 16:39 ` Andre Przywara
@ 2018-01-29 11:36 ` Hanjun Guo
  2018-01-29 14:58   ` Nishanth Menon
  8 siblings, 1 reply; 32+ messages in thread
From: Hanjun Guo @ 2018-01-29 11:36 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On 2018/1/25 23:21, Marc Zyngier wrote:
> This small series implements some basic BP hardening by invalidating
> the BTB on 32bit ARM CPUs that are known to be susceptible to aliasing
> attacks (Spectre variant 2). It doesn't help non-ARM 32bit CPUs, nor
> 32bit kernels that run on 64bit capable CPUs. This series doesn't
> mitigate Spectre variant 1 either.
> 
> These patches are closely modelled against what we do on arm64,
> although simpler as we can rely on an architected instruction to
> perform the invalidation. The notable exception is Cortex-A15, where
> BTB invalidation behaves like a NOP, and the only way to shoot the
> predictor down is to invalidate the icache *and* to have ACTLR[0] set
> to 1 (which is a secure-only operation).
> 
> The first patch reuses the Cortex-A8 BTB invalidation in switch_mm and
> generalises it to be used on all affected CPUs. The second perform the
> same invalidation on prefetch abort outside of the userspace
> range. The third one nukes it on guest exit, and results in some major
> surgery as we cannot take a branch from the vectors (that, and Thumb2
> being a massive pain).
> 
> Patches 4 to 6 are doing a similar thing for Cortex-A15, which the
> aforementioned ICIALLU.

I have tested this patch set on A9- and A15-based machines (without
KVM support); it booted OK and passed our test framework, which has
thousands of test cases (not related to performance :) ). So, for
patches 1, 2, 4 and 5 in this set:

Tested-by: Hanjun Guo <hanjun.guo@linaro.org>

By the way, this patch set just enables branch predictor hardening
on arm32 unconditionally, but some machines (such as wireless network
base stations) are not exposed to users who could exploit variant 2,
and those machines are pretty performance-sensitive, so can we
introduce a Kconfig or boot option to disable branch predictor
hardening?
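
As a sketch of the boot-option half of that request, a hypothetical
early knob could be wired up as below. The parameter name and the flag
are made up, and the hard part is not shown: the proc function
pointers (and thus the hardened switch_mm) are selected before any C
code runs, so the flag would have to be consulted wherever the
hardened paths are installed.

#include <linux/cache.h>
#include <linux/init.h>

/* hypothetical global, consulted by the code installing the hardened
 * vectors and switch_mm variants */
static bool btb_hardening_disabled __ro_after_init;

static int __init nospectre_v2_setup(char *str)
{
	btb_hardening_disabled = true;
	return 0;
}
early_param("nospectre_v2", nospectre_v2_setup);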

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-29 11:36 ` Hanjun Guo
@ 2018-01-29 14:58   ` Nishanth Menon
  2018-01-31 12:45     ` Hanjun Guo
  0 siblings, 1 reply; 32+ messages in thread
From: Nishanth Menon @ 2018-01-29 14:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jan 29, 2018 at 5:36 AM, Hanjun Guo <guohanjun@huawei.com> wrote:
[...]

> By the way, this patch set just enable branch predictor hardening
> on arm32 unconditionally, but some of machines (such as wireless
> network base station) will not be exposed to user to take advantage
> of variant 2, and those machines will be pretty sensitive for
> performance, so can we introduce Kconfig or boot option to disable
> branch predictor hardening as an option?

I am curious: Have you seen performance degradation with this series?
If yes, is it possible to share the information?

---
Regards,
Nishanth Menon

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-28 11:55     ` Marc Zyngier
@ 2018-01-29 18:05       ` Florian Fainelli
  2018-01-29 18:13         ` Marc Zyngier
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Fainelli @ 2018-01-29 18:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/28/2018 03:55 AM, Marc Zyngier wrote:
> On Sat, 27 Jan 2018 14:23:57 -0800
> Florian Fainelli <f.fainelli@gmail.com> wrote:
> 
>> On 01/25/2018 07:21 AM, Marc Zyngier wrote:
>>> In order to avoid aliasing attacks against the branch predictor,
>>> Cortex-A15 require to invalidate the BTB when switching
>>> from one user context to another. The only way to do so on this
>>> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
>>> mode.  
>>
>> Even though this is a platform design mistake, let's say your Linux
>> kernel boots in secure supervisor mode, we could have code
>> that tries to set ACTLR[0] as early as possible, since the writes are
>> ignored if executing from non-secure mode. If Linux is booted normally
>> either PL1 or PL2 non-secure we could still check ACTLR[0].
>>
>> My concern is that without doing this, we may have a hard time catching
>> improper firmware as well as having bogus bug reports. This is
>> completely RFC though since:
>>
>> - I could not quite figure out yet why update ca15_actlr_status from
>> assembly is not reflected in the C code despite using PC relative loads
> 
> One potential problem is that you're writing this with the MMU off, and
> reading it with the MMU on. Unless you've added CMOs to make this
> visible, it will not be reliable.

I did not, let's change that.

> 
> Also, I'm not sure how this works with MCPM.

I just started looking at MCPM; do you know roughly at what point the
non-MCPM and MCPM entry points converge? If we moved the check for
ACTLR[0] there, would that be too late?

> 
>> - this is done in __ca15mp_setup and __b15mp_setup because we know by
>> then that we have either of these two CPUs but we could presumably move
>> this check under a Kconfig option and earlier where all erratas are checked
> 
> Thanks,
> 
> 	M.
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15
  2018-01-29 18:05       ` Florian Fainelli
@ 2018-01-29 18:13         ` Marc Zyngier
  0 siblings, 0 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-29 18:13 UTC (permalink / raw)
  To: linux-arm-kernel

On 29/01/18 18:05, Florian Fainelli wrote:
> On 01/28/2018 03:55 AM, Marc Zyngier wrote:
>> On Sat, 27 Jan 2018 14:23:57 -0800
>> Florian Fainelli <f.fainelli@gmail.com> wrote:
>>
>>> On 01/25/2018 07:21 AM, Marc Zyngier wrote:
>>>> In order to avoid aliasing attacks against the branch predictor,
>>>> Cortex-A15 require to invalidate the BTB when switching
>>>> from one user context to another. The only way to do so on this
>>>> CPU is to perform an ICIALLU, having set ACTLR[0] to 1 from secure
>>>> mode.  
>>>
>>> Even though this is a platform design mistake, let's say your Linux
>>> kernel boots in secure supervisor mode, we could have code
>>> that tries to set ACTLR[0] as early as possible, since the writes are
>>> ignored if executing from non-secure mode. If Linux is booted normally
>>> either PL1 or PL2 non-secure we could still check ACTLR[0].
>>>
>>> My concern is that without doing this, we may have a hard time catching
>>> improper firmware as well as having bogus bug reports. This is
>>> completely RFC though since:
>>>
>>> - I could not quite figure out yet why update ca15_actlr_status from
>>> assembly is not reflected in the C code despite using PC relative loads
>>
>> One potential problem is that you're writing this with the MMU off, and
>> reading it with the MMU on. Unless you've added CMOs to make this
>> visible, it will not be reliable.
> 
> I did not, let's change that.
> 
>>
>> Also, I'm not sure how this works with MCPM.
> 
> I just started looking at MCPM, do you know roughly at what point do
> non-MCPM and MCPM entry points converge? If we moved the check for
> ACTLR[0] there, would that be too late.

I have no idea. You'll have to reverse-engineer it (MCPM makes me feel
slightly sick).

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17
  2018-01-26 20:44   ` Florian Fainelli
@ 2018-01-30 17:27     ` Marc Zyngier
  0 siblings, 0 replies; 32+ messages in thread
From: Marc Zyngier @ 2018-01-30 17:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/01/18 20:44, Florian Fainelli wrote:
> On 01/25/2018 07:21 AM, Marc Zyngier wrote:
>> In order to avoid aliasing attacks against the branch predictor,
>> some implementations require to invalidate the BTB when switching
>> from one user context to another.
>>
>> For this, we reuse the existing implementation for Cortex-A8, and
>> apply it to A9, A12 and A17.
> 
> Should this read: and apply it to A8, A9, A12 and A17

A8 already has a BTB invalidation there. We are expanding the use of
that function to other v7 cores. Or am I missing something obvious?

> 
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>  arch/arm/mm/proc-v7-2level.S |  4 ++--
>>  arch/arm/mm/proc-v7-3level.S |  6 ++++++
>>  arch/arm/mm/proc-v7.S        | 30 +++++++++++++++---------------
>>  3 files changed, 23 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/arm/mm/proc-v7-2level.S b/arch/arm/mm/proc-v7-2level.S
>> index c6141a5435c3..0422e58b74e8 100644
>> --- a/arch/arm/mm/proc-v7-2level.S
>> +++ b/arch/arm/mm/proc-v7-2level.S
>> @@ -41,7 +41,7 @@
>>   *	even on Cortex-A8 revisions not affected by 430973.
>>   *	If IBE is not set, the flush BTAC/BTB won't do anything.
>>   */
>> -ENTRY(cpu_ca8_switch_mm)
>> +ENTRY(cpu_v7_btbinv_switch_mm)
>>  #ifdef CONFIG_MMU
>>  	mov	r2, #0
>>  	mcr	p15, 0, r2, c7, c5, 6		@ flush BTAC/BTB
>> @@ -66,7 +66,7 @@ ENTRY(cpu_v7_switch_mm)
>>  #endif
>>  	bx	lr
>>  ENDPROC(cpu_v7_switch_mm)
>> -ENDPROC(cpu_ca8_switch_mm)
>> +ENDPROC(cpu_v7_btbinv_switch_mm)
>>  
>>  /*
>>   *	cpu_v7_set_pte_ext(ptep, pte)
>> diff --git a/arch/arm/mm/proc-v7-3level.S b/arch/arm/mm/proc-v7-3level.S
>> index 7d16bbc4102b..934272e1fa08 100644
>> --- a/arch/arm/mm/proc-v7-3level.S
>> +++ b/arch/arm/mm/proc-v7-3level.S
>> @@ -54,6 +54,11 @@
>>   * Set the translation table base pointer to be pgd_phys (physical address of
>>   * the new TTB).
>>   */
>> +ENTRY(cpu_v7_btbinv_switch_mm)
>> +#ifdef CONFIG_MMU
>> +	mov	r2, #0
>> +	mcr	p15, 0, r2, c7, c5, 6			@ flush BTAC/BTB
>> +#endif
>>  ENTRY(cpu_v7_switch_mm)
>>  #ifdef CONFIG_MMU
>>  	mmid	r2, r2
>> @@ -64,6 +69,7 @@ ENTRY(cpu_v7_switch_mm)
>>  #endif
>>  	ret	lr
>>  ENDPROC(cpu_v7_switch_mm)
>> +ENDPROC(cpu_v7_btbinv_switch_mm)
>>  
>>  #ifdef __ARMEB__
>>  #define rl r3
>> diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
>> index 01d64c0b2563..0a14967fd400 100644
>> --- a/arch/arm/mm/proc-v7.S
>> +++ b/arch/arm/mm/proc-v7.S
>> @@ -159,18 +159,18 @@ ENDPROC(cpu_v7_do_resume)
>>  #endif
>>  
>>  /*
>> - * Cortex-A8
>> + * Cortex-A8/A12/A17 that require a BTB invalidation on switch_mm
> 
> Should this read Cortex-A8/A9/A12/A17?

Indeed, I missed our dear A9 friend.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 2/6] arm: Invalidate BTB on prefetch abort outside of user mapping on Cortex A8, A9, A12 and A17
  2018-01-25 15:21 ` [PATCH v3 2/6] arm: Invalidate BTB on prefetch abort outside of user mapping on Cortex A8, A9, " Marc Zyngier
@ 2018-01-31  2:13   ` Fabio Estevam
  0 siblings, 0 replies; 32+ messages in thread
From: Fabio Estevam @ 2018-01-31  2:13 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Marc,

On Thu, Jan 25, 2018 at 1:21 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:

> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -21,6 +21,7 @@
>  #include <linux/highmem.h>
>  #include <linux/perf_event.h>
>
> +#include <asm/cp15.h>
>  #include <asm/exception.h>
>  #include <asm/pgtable.h>
>  #include <asm/system_misc.h>
> @@ -181,6 +182,7 @@ __do_user_fault(struct task_struct *tsk, unsigned long addr,
>         si.si_errno = 0;
>         si.si_code = code;
>         si.si_addr = (void __user *)addr;
> +

Unneeded blank line.

>         force_sig_info(sig, &si, tsk);
>  }
>
> @@ -396,12 +398,35 @@ do_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
>         __do_kernel_fault(mm, addr, fsr, regs);
>         return 0;
>  }
> +
> +static int
> +do_pabt_page_fault(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
> +{
> +       if (addr > TASK_SIZE) {
> +               switch(read_cpuid_part()) {

checkpatch complains here:

ERROR: space required before the open parenthesis '('
#62: FILE: arch/arm/mm/fault.c:406:
+        switch(read_cpuid_part()) {

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17
  2018-01-26 17:12   ` Robin Murphy
@ 2018-01-31 12:11     ` Marc Zyngier
  2018-01-31 14:25       ` Robin Murphy
  0 siblings, 1 reply; 32+ messages in thread
From: Marc Zyngier @ 2018-01-31 12:11 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Robin,

On 26/01/18 17:12, Robin Murphy wrote:
> On 25/01/18 15:21, Marc Zyngier wrote:
>> In order to avoid aliasing attacks against the branch predictor,
>> let's invalidate the BTB on guest exit. This is made complicated
>> by the fact that we cannot take a branch before invalidating the
>> BTB.
>>
>> We only apply this to A12 and A17, which are the only two ARM
>> cores on which this useful.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>>   arch/arm/include/asm/kvm_asm.h |  2 --
>>   arch/arm/include/asm/kvm_mmu.h | 13 ++++++++-
>>   arch/arm/kvm/hyp/hyp-entry.S   | 62 ++++++++++++++++++++++++++++++++++++++++--
>>   3 files changed, 72 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>> index 36dd2962a42d..df24ed48977d 100644
>> --- a/arch/arm/include/asm/kvm_asm.h
>> +++ b/arch/arm/include/asm/kvm_asm.h
>> @@ -61,8 +61,6 @@ struct kvm_vcpu;
>>   extern char __kvm_hyp_init[];
>>   extern char __kvm_hyp_init_end[];
>>   
>> -extern char __kvm_hyp_vector[];
>> -
>>   extern void __kvm_flush_vm_context(void);
>>   extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>   extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>> index eb46fc81a440..b47db5b9e407 100644
>> --- a/arch/arm/include/asm/kvm_mmu.h
>> +++ b/arch/arm/include/asm/kvm_mmu.h
>> @@ -37,6 +37,7 @@
>>   
>>   #include <linux/highmem.h>
>>   #include <asm/cacheflush.h>
>> +#include <asm/cputype.h>
>>   #include <asm/pgalloc.h>
>>   #include <asm/stage2_pgtable.h>
>>   
>> @@ -223,7 +224,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
>>   
>>   static inline void *kvm_get_hyp_vector(void)
>>   {
>> -	return kvm_ksym_ref(__kvm_hyp_vector);
>> +	extern char __kvm_hyp_vector[];
>> +	extern char __kvm_hyp_vector_bp_inv[];
>> +
>> +	switch(read_cpuid_part()) {
>> +	case ARM_CPU_PART_CORTEX_A12:
>> +	case ARM_CPU_PART_CORTEX_A17:
>> +		return kvm_ksym_ref(__kvm_hyp_vector_bp_inv);
>> +
>> +	default:
>> +		return kvm_ksym_ref(__kvm_hyp_vector);
>> +	}
>>   }
>>   
>>   static inline int kvm_map_vectors(void)
>> diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
>> index 95a2faefc070..aab6b0c06a19 100644
>> --- a/arch/arm/kvm/hyp/hyp-entry.S
>> +++ b/arch/arm/kvm/hyp/hyp-entry.S
>> @@ -70,6 +70,57 @@ __kvm_hyp_vector:
>>   	W(b)	hyp_hvc
>>   	W(b)	hyp_irq
>>   	W(b)	hyp_fiq
>> +	
>> +	.align 5
>> +__kvm_hyp_vector_bp_inv:
>> +	.global __kvm_hyp_vector_bp_inv
>> +
>> +	/*
>> +	 * We encode the exception entry in the bottom 3 bits of
>> +	 * SP, and we have to guarantee to be 8 bytes aligned.
>> +	 */
>> +	W(add)	sp, sp, #1	/* Reset 	  7 */
>> +	W(add)	sp, sp, #1	/* Undef	  6 */
>> +	W(add)	sp, sp, #1	/* Syscall	  5 */
>> +	W(add)	sp, sp, #1	/* Prefetch abort 4 */
>> +	W(add)	sp, sp, #1	/* Data abort	  3 */
>> +	W(add)	sp, sp, #1	/* HVC		  2 */
>> +	W(add)	sp, sp, #1	/* IRQ		  1 */
>> +	W(nop)			/* FIQ		  0 */
>> +
>> +	mcr	p15, 0, r0, c7, c5, 6	/* BPIALL */
>> +	isb
>> +
> 
> The below is quite a bit of faff; might it be worth an
> 
> #ifdef CONFIG_THUMB2_KERNEL
> 
>> +	/*
>> +	 * Yet another silly hack: Use VPIDR as a temp register.
>> +	 * Thumb2 is really a pain, as SP cannot be used with most
>> +	 * of the bitwise instructions. The vect_br macro ensures
>> +	 * things gets cleaned-up.
>> +	 */
>> +	mcr	p15, 4, r0, c0, c0, 0	/* VPIDR */
>> +	mov	r0, sp
>> +	and	r0, r0, #7
>> +	sub	sp, sp, r0
>> +	push	{r1, r2}
>> +	mov	r1, r0
>> +	mrc	p15, 4, r0, c0, c0, 0	/* VPIDR */
>> +	mrc	p15, 0, r2, c0, c0, 0	/* MIDR  */
>> +	mcr	p15, 4, r2, c0, c0, 0	/* VPIDR */
> 
> #endif
> 
>> +
>> +.macro vect_br val, targ
> 
> ARM(cmp		sp, #val)

Doesn't quite work, as we still have all the top bits that contain the
stack address. But I like the idea of making it faster for non-T2. How
about this instead?

diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
index 2377ed86e20b..23c954a9e441 100644
--- a/arch/arm/kvm/hyp/hyp-entry.S
+++ b/arch/arm/kvm/hyp/hyp-entry.S
@@ -114,6 +114,8 @@ __kvm_hyp_vector_bp_inv:
 	isb
 
 decode_vectors:
+
+#ifdef CONFIG_THUMB2_KERNEL
 	/*
 	 * Yet another silly hack: Use VPIDR as a temp register.
 	 * Thumb2 is really a pain, as SP cannot be used with most
@@ -129,10 +131,16 @@ decode_vectors:
 	mrc	p15, 4, r0, c0, c0, 0	/* VPIDR */
 	mrc	p15, 0, r2, c0, c0, 0	/* MIDR  */
 	mcr	p15, 4, r2, c0, c0, 0	/* VPIDR */
+#endif
 
 .macro vect_br val, targ
-	cmp	r1, #\val
-	popeq	{r1, r2}
+ARM(	eor	sp, sp, #\val	)
+ARM(	tst	sp, #7		)
+ARM(	eorne	sp, sp, #\val	)
+
+THUMB(	cmp	r1, #\val	)
+THUMB(	popeq	{r1, r2}	)
+
 	beq	\targ
 .endm
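 @ To spell out the ARM-mode path (a reading of the macro above, not
 @ part of the patch): when the low bits of SP match \val, the EOR
 @ clears them, restoring the 8-byte-aligned SP, TST then reads zero,
 @ the EORNE is skipped and the BEQ is taken. On a mismatch, TST sees
 @ a non-zero residue and the EORNE undoes the EOR before we fall
 @ through to test the next value.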
 


> THUMB(cmp	r1, #\val)
> THUMB(popeq	{r1, r2}
> 
>> +	beq	\targ
>> +.endm
> 
> ...to keep the "normal" path relatively streamlined?

I think the above achieves it... Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-29 14:58   ` Nishanth Menon
@ 2018-01-31 12:45     ` Hanjun Guo
  2018-01-31 18:53       ` Florian Fainelli
  0 siblings, 1 reply; 32+ messages in thread
From: Hanjun Guo @ 2018-01-31 12:45 UTC (permalink / raw)
  To: linux-arm-kernel

On 2018/1/29 22:58, Nishanth Menon wrote:
> On Mon, Jan 29, 2018 at 5:36 AM, Hanjun Guo <guohanjun@huawei.com> wrote:
> [...]
> 
>> By the way, this patch set just enable branch predictor hardening
>> on arm32 unconditionally, but some of machines (such as wireless
>> network base station) will not be exposed to user to take advantage
>> of variant 2, and those machines will be pretty sensitive for
>> performance, so can we introduce Kconfig or boot option to disable
>> branch predictor hardening as an option?
> 
> I am curious: Have you seen performance degradation with this series?
> If yes, is it possible to share the information?

Sorry for the late reply. The context-switch performance (CFS) drops
by about 6%~12% (on an A9-based machine) in the first round of testing,
but the data is not stable; I need to retest and then I will update
here.

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17
  2018-01-31 12:11     ` Marc Zyngier
@ 2018-01-31 14:25       ` Robin Murphy
  0 siblings, 0 replies; 32+ messages in thread
From: Robin Murphy @ 2018-01-31 14:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 31/01/18 12:11, Marc Zyngier wrote:
> Hi Robin,
> 
> On 26/01/18 17:12, Robin Murphy wrote:
>> On 25/01/18 15:21, Marc Zyngier wrote:
>>> In order to avoid aliasing attacks against the branch predictor,
>>> let's invalidate the BTB on guest exit. This is made complicated
>>> by the fact that we cannot take a branch before invalidating the
>>> BTB.
>>>
>>> We only apply this to A12 and A17, which are the only two ARM
>>> cores on which this useful.
>>>
>>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>>> ---
>>>    arch/arm/include/asm/kvm_asm.h |  2 --
>>>    arch/arm/include/asm/kvm_mmu.h | 13 ++++++++-
>>>    arch/arm/kvm/hyp/hyp-entry.S   | 62 ++++++++++++++++++++++++++++++++++++++++--
>>>    3 files changed, 72 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
>>> index 36dd2962a42d..df24ed48977d 100644
>>> --- a/arch/arm/include/asm/kvm_asm.h
>>> +++ b/arch/arm/include/asm/kvm_asm.h
>>> @@ -61,8 +61,6 @@ struct kvm_vcpu;
>>>    extern char __kvm_hyp_init[];
>>>    extern char __kvm_hyp_init_end[];
>>>    
>>> -extern char __kvm_hyp_vector[];
>>> -
>>>    extern void __kvm_flush_vm_context(void);
>>>    extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
>>>    extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>>> diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
>>> index eb46fc81a440..b47db5b9e407 100644
>>> --- a/arch/arm/include/asm/kvm_mmu.h
>>> +++ b/arch/arm/include/asm/kvm_mmu.h
>>> @@ -37,6 +37,7 @@
>>>    
>>>    #include <linux/highmem.h>
>>>    #include <asm/cacheflush.h>
>>> +#include <asm/cputype.h>
>>>    #include <asm/pgalloc.h>
>>>    #include <asm/stage2_pgtable.h>
>>>    
>>> @@ -223,7 +224,17 @@ static inline unsigned int kvm_get_vmid_bits(void)
>>>    
>>>    static inline void *kvm_get_hyp_vector(void)
>>>    {
>>> -	return kvm_ksym_ref(__kvm_hyp_vector);
>>> +	extern char __kvm_hyp_vector[];
>>> +	extern char __kvm_hyp_vector_bp_inv[];
>>> +
>>> +	switch(read_cpuid_part()) {
>>> +	case ARM_CPU_PART_CORTEX_A12:
>>> +	case ARM_CPU_PART_CORTEX_A17:
>>> +		return kvm_ksym_ref(__kvm_hyp_vector_bp_inv);
>>> +
>>> +	default:
>>> +		return kvm_ksym_ref(__kvm_hyp_vector);
>>> +	}
>>>    }
>>>    
>>>    static inline int kvm_map_vectors(void)
>>> diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
>>> index 95a2faefc070..aab6b0c06a19 100644
>>> --- a/arch/arm/kvm/hyp/hyp-entry.S
>>> +++ b/arch/arm/kvm/hyp/hyp-entry.S
>>> @@ -70,6 +70,57 @@ __kvm_hyp_vector:
>>>    	W(b)	hyp_hvc
>>>    	W(b)	hyp_irq
>>>    	W(b)	hyp_fiq
>>> +	
>>> +	.align 5
>>> +__kvm_hyp_vector_bp_inv:
>>> +	.global __kvm_hyp_vector_bp_inv
>>> +
>>> +	/*
>>> +	 * We encode the exception entry in the bottom 3 bits of
>>> +	 * SP, and we have to guarantee to be 8 bytes aligned.
>>> +	 */
>>> +	W(add)	sp, sp, #1	/* Reset 	  7 */
>>> +	W(add)	sp, sp, #1	/* Undef	  6 */
>>> +	W(add)	sp, sp, #1	/* Syscall	  5 */
>>> +	W(add)	sp, sp, #1	/* Prefetch abort 4 */
>>> +	W(add)	sp, sp, #1	/* Data abort	  3 */
>>> +	W(add)	sp, sp, #1	/* HVC		  2 */
>>> +	W(add)	sp, sp, #1	/* IRQ		  1 */
>>> +	W(nop)			/* FIQ		  0 */
>>> +
>>> +	mcr	p15, 0, r0, c7, c5, 6	/* BPIALL */
>>> +	isb
>>> +
>>
>> The below is quite a bit of faff; might it be worth an
>>
>> #ifdef CONFIG_THUMB2_KERNEL
>>
>>> +	/*
>>> +	 * Yet another silly hack: Use VPIDR as a temp register.
>>> +	 * Thumb2 is really a pain, as SP cannot be used with most
>>> +	 * of the bitwise instructions. The vect_br macro ensures
>>> +	 * things gets cleaned-up.
>>> +	 */
>>> +	mcr	p15, 4, r0, c0, c0, 0	/* VPIDR */
>>> +	mov	r0, sp
>>> +	and	r0, r0, #7
>>> +	sub	sp, sp, r0
>>> +	push	{r1, r2}
>>> +	mov	r1, r0
>>> +	mrc	p15, 4, r0, c0, c0, 0	/* VPIDR */
>>> +	mrc	p15, 0, r2, c0, c0, 0	/* MIDR  */
>>> +	mcr	p15, 4, r2, c0, c0, 0	/* VPIDR */
>>
>> #endif
>>
>>> +
>>> +.macro vect_br val, targ
>>
>> ARM(cmp		sp, #val)
> 
> Doesn't quite work, as we still have all the top bits that contain the
> stack address. But I like the idea of making it baster for non-T2. How
> about this instead?

Right, the CMP is indeed totally bogus - I hadn't exactly reasoned this 
through in detail ;)

> diff --git a/arch/arm/kvm/hyp/hyp-entry.S b/arch/arm/kvm/hyp/hyp-entry.S
> index 2377ed86e20b..23c954a9e441 100644
> --- a/arch/arm/kvm/hyp/hyp-entry.S
> +++ b/arch/arm/kvm/hyp/hyp-entry.S
> @@ -114,6 +114,8 @@ __kvm_hyp_vector_bp_inv:
>   	isb
>   
>   decode_vectors:
> +
> +#ifdef CONFIG_THUMB2_KERNEL
>   	/*
>   	 * Yet another silly hack: Use VPIDR as a temp register.
>   	 * Thumb2 is really a pain, as SP cannot be used with most
> @@ -129,10 +131,16 @@ decode_vectors:
>   	mrc	p15, 4, r0, c0, c0, 0	/* VPIDR */
>   	mrc	p15, 0, r2, c0, c0, 0	/* MIDR  */
>   	mcr	p15, 4, r2, c0, c0, 0	/* VPIDR */
> +#endif
>   
>   .macro vect_br val, targ
> -	cmp	r1, #\val
> -	popeq	{r1, r2}
> +ARM(	eor	sp, sp, #\val	)
> +ARM(	tst	sp, #7		)
> +ARM(	eorne	sp, sp, #\val	)
> +
> +THUMB(	cmp	r1, #\val	)
> +THUMB(	popeq	{r1, r2}	)
> +
>   	beq	\targ
>   .endm
>   
> 
> 
>> THUMB(cmp	r1, #\val)
>> THUMB(popeq	{r1, r2}
>>
>>> +	beq	\targ
>>> +.endm
>>
>> ...to keep the "normal" path relatively streamlined?
> 
> I think the above achieves it... Thoughts?

Yeah, that looks like it should do the trick; very cunning!

Robin.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-31 12:45     ` Hanjun Guo
@ 2018-01-31 18:53       ` Florian Fainelli
  2018-01-31 19:07         ` Marc Zyngier
  0 siblings, 1 reply; 32+ messages in thread
From: Florian Fainelli @ 2018-01-31 18:53 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/31/2018 04:45 AM, Hanjun Guo wrote:
> On 2018/1/29 22:58, Nishanth Menon wrote:
>> On Mon, Jan 29, 2018 at 5:36 AM, Hanjun Guo <guohanjun@huawei.com> wrote:
>> [...]
>>
>>> By the way, this patch set just enable branch predictor hardening
>>> on arm32 unconditionally, but some of machines (such as wireless
>>> network base station) will not be exposed to user to take advantage
>>> of variant 2, and those machines will be pretty sensitive for
>>> performance, so can we introduce Kconfig or boot option to disable
>>> branch predictor hardening as an option?
>>
>> I am curious: Have you seen performance degradation with this series?
>> If yes, is it possible to share the information?
> 
> Sorry for the late reply, the performance data for context switch (CFS)
> is about 6%~12% drop (A9 based machine) for the first around test, but
> the data is not stable, I need to retest then I will update here.

What tool did you use to measure this? On a Brahma-B15 platform
clocked at 1.5GHz, across kernels 4.1 and 4.9 (4.15 in progress as we
speak), I measured the following with two memory configurations, one
giving 256MB of usable memory and another giving 3GB; the results
below are only for the most extreme 256MB case. This runs 13 groups
because the ASID space is 256 entries, so it should force at least two
full ASID generation rollovers (hackbench spawns 40 tasks per group,
so 13 groups is roughly 520 tasks, more than 2 x 256), assuming the
logic is correct here.

for i in $(seq 0 9)
do
	hackbench 13 process 10000
done

Average values, in seconds:

1) 4.1.45, ACTLR[0] = 0, no spectre variant 2 patches: 114.2666
2) 4.1.45, ACTLR[0] = 1, no spectre variant 2 patches: 114.2952
3) 4.1.45, ACTLR[0] = 1, spectre variant 2 patches: 115.5853

=> 3) is a 1.15% degradation against 1)

4) 4.9.51, ACTLR[0] = 0, no spectre variant 2 patches: 130.7676
5) 4.9.51, ACTLR[0] = 1, no spectre variant 2 patches: 130.6848
6) 4.9.51, ACTLR[0] = 1, spectre variant 2 patches: 132.4274

=> 6) is a 1.26% degradation against 4)

The relative differences between 4.1 and 4.9 appear consistent (with
4.9 being slower for a reason I don't know).

Marc, are there any performance tests/results that you ran that you
could share?
-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-31 18:53       ` Florian Fainelli
@ 2018-01-31 19:07         ` Marc Zyngier
  2018-01-31 19:54           ` André Przywara
  0 siblings, 1 reply; 32+ messages in thread
From: Marc Zyngier @ 2018-01-31 19:07 UTC (permalink / raw)
  To: linux-arm-kernel

On 31/01/18 18:53, Florian Fainelli wrote:
> On 01/31/2018 04:45 AM, Hanjun Guo wrote:
>> On 2018/1/29 22:58, Nishanth Menon wrote:
>>> On Mon, Jan 29, 2018 at 5:36 AM, Hanjun Guo <guohanjun@huawei.com> wrote:
>>> [...]
>>>
>>>> By the way, this patch set just enable branch predictor hardening
>>>> on arm32 unconditionally, but some of machines (such as wireless
>>>> network base station) will not be exposed to user to take advantage
>>>> of variant 2, and those machines will be pretty sensitive for
>>>> performance, so can we introduce Kconfig or boot option to disable
>>>> branch predictor hardening as an option?
>>>
>>> I am curious: Have you seen performance degradation with this series?
>>> If yes, is it possible to share the information?
>>
>> Sorry for the late reply, the performance data for context switch (CFS)
>> is about 6%~12% drop (A9 based machine) for the first around test, but
>> the data is not stable, I need to retest then I will update here.
> 
> What tool did you use to measure this? On a Brahma-B15 platform clocked
> at 1.5Ghz, across kernels 4.1, 4.9 (4.15 in progress as we speak), I
> measured the following, with two memory configurations, one giving 256MB
> of usable memory, another giving 3GB of usable memory, results below are
> only the most extreme 256MB case. This is running 13 groups because the
> ASID space is 256bits so this should force at least two full ASID
> generation rollovers (assuming the logic is correct here).
> 
> for i in $(seq 0 9)
> do
> 	hackbench 13 process 10000
> done
> 
> Average values, in seconds:
> 
> 1) 4.1.45, ACTLR[0] = 0, no spectre variant 2 patches: 114,2666
> 2) 4.1.45, ACTLR[0] = 1, no spectre variant 2 patches: 114,2952
> 3) 4.1.45, ACTLR[0] =1 , spectre variant 2 patches: 115,5853
> 
> => 3) is a 1.15% degradation against 1)
> 
> 4.9.51, ACTLR[0] = 0, no spectre variant 2 patches: 130,7676
> 4.9.51, ACTLR[0] = 1, no spectre variant 2 patches: 130,6848
> 4.9.51, ACTLR[0] =1 , spectre variant 2 patches: 132,4274
> 
> => 3) is a 1.26% degradation against 1)
> 
> The relative differences between 4.1 and 4.9 appear consistent (with 4.9
> being slower for a reason I ignore).
> 
> Marc, are there any performance tests/results that you ran that you
> could share?

None. I usually don't run benchmarks, because they are not
representative of a real workload. I urge people to run their own real
workload, as it is very unlikely to have hackbench's profile...

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-31 19:07         ` Marc Zyngier
@ 2018-01-31 19:54           ` André Przywara
  2018-01-31 20:37             ` Florian Fainelli
  0 siblings, 1 reply; 32+ messages in thread
From: André Przywara @ 2018-01-31 19:54 UTC (permalink / raw)
  To: linux-arm-kernel

On 31/01/18 19:07, Marc Zyngier wrote:
> On 31/01/18 18:53, Florian Fainelli wrote:
>> On 01/31/2018 04:45 AM, Hanjun Guo wrote:
>>> On 2018/1/29 22:58, Nishanth Menon wrote:
>>>> On Mon, Jan 29, 2018 at 5:36 AM, Hanjun Guo <guohanjun@huawei.com> wrote:
>>>> [...]
>>>>
>>>>> By the way, this patch set just enable branch predictor hardening
>>>>> on arm32 unconditionally, but some of machines (such as wireless
>>>>> network base station) will not be exposed to user to take advantage
>>>>> of variant 2, and those machines will be pretty sensitive for
>>>>> performance, so can we introduce Kconfig or boot option to disable
>>>>> branch predictor hardening as an option?
>>>>
>>>> I am curious: Have you seen performance degradation with this series?
>>>> If yes, is it possible to share the information?
>>>
>>> Sorry for the late reply, the performance data for context switch (CFS)
>>> is about 6%~12% drop (A9 based machine) for the first around test, but
>>> the data is not stable, I need to retest then I will update here.
>>
>> What tool did you use to measure this? On a Brahma-B15 platform clocked
>> at 1.5Ghz, across kernels 4.1, 4.9 (4.15 in progress as we speak), I
>> measured the following, with two memory configurations, one giving 256MB
>> of usable memory, another giving 3GB of usable memory, results below are
>> only the most extreme 256MB case. This is running 13 groups because the
>> ASID space is 256bits so this should force at least two full ASID
>> generation rollovers (assuming the logic is correct here).
>>
>> for i in $(seq 0 9)
>> do
>> 	hackbench 13 process 10000
>> done
>>
>> Average values, in seconds:
>>
>> 1) 4.1.45, ACTLR[0] = 0, no spectre variant 2 patches: 114,2666
>> 2) 4.1.45, ACTLR[0] = 1, no spectre variant 2 patches: 114,2952
>> 3) 4.1.45, ACTLR[0] =1 , spectre variant 2 patches: 115,5853
>>
>> => 3) is a 1.15% degradation against 1)
>>
>> 4.9.51, ACTLR[0] = 0, no spectre variant 2 patches: 130,7676
>> 4.9.51, ACTLR[0] = 1, no spectre variant 2 patches: 130,6848
>> 4.9.51, ACTLR[0] =1 , spectre variant 2 patches: 132,4274
>>
>> => 3) is a 1.26% degradation against 1)
>>
>> The relative differences between 4.1 and 4.9 appear consistent (with 4.9
>> being slower for a reason I ignore).
>>
>> Marc, are there any performance tests/results that you ran that you
>> could share?
> 
> None. I usually don't run benchmarks, because they are not
> representative of a real workload. I urge people to run their own real
> workload, as it is very unlikely to have hackbench's profile...

Very true.

Out of curiosity (and to prove that the patches and my home-baked
firmware fix actually had an effect), I also ran hackbench (of course!)
on a Calxeda Midway (4*Cortex-A15, 8GB RAM). Native runs showed only
very little degradation, not unlike Florian's numbers.
Running hackbench in a KVM guest however showed a bigger impact, which
is of course somewhat expected.

Cheers,
Andre.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 0/6] 32bit ARM branch predictor hardening
  2018-01-31 19:54           ` André Przywara
@ 2018-01-31 20:37             ` Florian Fainelli
  0 siblings, 0 replies; 32+ messages in thread
From: Florian Fainelli @ 2018-01-31 20:37 UTC (permalink / raw)
  To: linux-arm-kernel

On 01/31/2018 11:54 AM, André Przywara wrote:
> On 31/01/18 19:07, Marc Zyngier wrote:
>> On 31/01/18 18:53, Florian Fainelli wrote:
>>> On 01/31/2018 04:45 AM, Hanjun Guo wrote:
>>>> On 2018/1/29 22:58, Nishanth Menon wrote:
>>>>> On Mon, Jan 29, 2018 at 5:36 AM, Hanjun Guo <guohanjun@huawei.com> wrote:
>>>>> [...]
>>>>>
>>>>>> By the way, this patch set just enable branch predictor hardening
>>>>>> on arm32 unconditionally, but some of machines (such as wireless
>>>>>> network base station) will not be exposed to user to take advantage
>>>>>> of variant 2, and those machines will be pretty sensitive for
>>>>>> performance, so can we introduce Kconfig or boot option to disable
>>>>>> branch predictor hardening as an option?
>>>>>
>>>>> I am curious: Have you seen performance degradation with this series?
>>>>> If yes, is it possible to share the information?
>>>>
>>>> Sorry for the late reply, the performance data for context switch (CFS)
>>>> is about 6%~12% drop (A9 based machine) for the first around test, but
>>>> the data is not stable, I need to retest then I will update here.
>>>
>>> What tool did you use to measure this? On a Brahma-B15 platform clocked
>>> at 1.5Ghz, across kernels 4.1, 4.9 (4.15 in progress as we speak), I
>>> measured the following, with two memory configurations, one giving 256MB
>>> of usable memory, another giving 3GB of usable memory, results below are
>>> only the most extreme 256MB case. This is running 13 groups because the
>>> ASID space is 256bits so this should force at least two full ASID
>>> generation rollovers (assuming the logic is correct here).
>>>
>>> for i in $(seq 0 9)
>>> do
>>> 	hackbench 13 process 10000
>>> done
>>>
>>> Average values, in seconds:
>>>
>>> 1) 4.1.45, ACTLR[0] = 0, no spectre variant 2 patches: 114,2666
>>> 2) 4.1.45, ACTLR[0] = 1, no spectre variant 2 patches: 114,2952
>>> 3) 4.1.45, ACTLR[0] =1 , spectre variant 2 patches: 115,5853
>>>
>>> => 3) is a 1.15% degradation against 1)
>>>
>>> 4.9.51, ACTLR[0] = 0, no spectre variant 2 patches: 130,7676
>>> 4.9.51, ACTLR[0] = 1, no spectre variant 2 patches: 130,6848
>>> 4.9.51, ACTLR[0] =1 , spectre variant 2 patches: 132,4274
>>>
>>> => 3) is a 1.26% degradation against 1)
>>>
>>> The relative differences between 4.1 and 4.9 appear consistent (with 4.9
>>> being slower for a reason I ignore).
>>>
>>> Marc, are there any performance tests/results that you ran that you
>>> could share?
>>
>> None. I usually don't run benchmarks, because they are not
>> representative of a real workload. I urge people to run their own real
>> workload, as it is very unlikely to have hackbench's profile...
> 
> Very true.

Of course, but that does not mean you don't want to characterize some
sort of worst case scenario ;)

> 
> Out of curiosity (and to prove that the patches and my home-baked
> firmware fix actually had an effect), I also ran hackbench (of course!)
> on a Calxeda Midway (4*Cortex-A15, 8GB RAM). Native runs showed only
> very little degradation, not unlike Florian's numbers.

How did you run hackbench, out of curiosity?

> Running hackbench in a KVM guest however showed a bigger impact, which
> is of course somewhat expected.

That I have not done, as I'm not too interested in KVM, but this is
indeed expected.
-- 
Florian

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2018-01-31 20:37 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-25 15:21 [PATCH v3 0/6] 32bit ARM branch predictor hardening Marc Zyngier
2018-01-25 15:21 ` [PATCH v3 1/6] arm: Add BTB invalidation on switch_mm for Cortex-A9, A12 and A17 Marc Zyngier
2018-01-26 20:44   ` Florian Fainelli
2018-01-30 17:27     ` Marc Zyngier
2018-01-25 15:21 ` [PATCH v3 2/6] arm: Invalidate BTB on prefetch abort outside of user mapping on Cortex A8, A9, " Marc Zyngier
2018-01-31  2:13   ` Fabio Estevam
2018-01-25 15:21 ` [PATCH v3 3/6] arm: KVM: Invalidate BTB on guest exit for Cortex-A12/A17 Marc Zyngier
2018-01-26  9:23   ` Christoffer Dall
2018-01-26 17:12   ` Robin Murphy
2018-01-31 12:11     ` Marc Zyngier
2018-01-31 14:25       ` Robin Murphy
2018-01-25 15:21 ` [PATCH v3 4/6] arm: Add icache invalidation on switch_mm for Cortex-A15 Marc Zyngier
2018-01-26  9:14   ` Christoffer Dall
2018-01-26  9:30     ` Marc Zyngier
2018-01-26 16:20       ` Florian Fainelli
2018-01-26 16:33         ` Marc Zyngier
2018-01-26 17:20       ` Robin Murphy
2018-01-27 22:23   ` Florian Fainelli
2018-01-28 11:55     ` Marc Zyngier
2018-01-29 18:05       ` Florian Fainelli
2018-01-29 18:13         ` Marc Zyngier
2018-01-25 15:21 ` [PATCH v3 5/6] arm: Invalidate icache on prefetch abort outside of user mapping on Cortex-A15 Marc Zyngier
2018-01-25 15:21 ` [PATCH v3 6/6] arm: KVM: Invalidate icache on guest exit for Cortex-A15 Marc Zyngier
2018-01-26  9:30 ` [PATCH v3 0/6] 32bit ARM branch predictor hardening Christoffer Dall
2018-01-26 16:39 ` Andre Przywara
2018-01-29 11:36 ` Hanjun Guo
2018-01-29 14:58   ` Nishanth Menon
2018-01-31 12:45     ` Hanjun Guo
2018-01-31 18:53       ` Florian Fainelli
2018-01-31 19:07         ` Marc Zyngier
2018-01-31 19:54           ` André Przywara
2018-01-31 20:37             ` Florian Fainelli
