These two patches are based on the work by Marc Zyngier and addresses Cavium ThunderX2 erratum 219. This erratum (originally reported by ARM folks) is from an interesting use of the prefetch instruction in the KPTI patchset. The prefetch was done between a TTBR change and the corresponding ISB, and this occasionally caused a crash on ThunderX2. The first patch removes the troublesome prefetch for ThunderX2. The second patch addresses the case where the issue can be triggered from a guest kernel. The workaround in this case is to trap TTBR accesses by setting HCR_EL2.TVM for guests and doing the system register update from EL2 in a fast path. Due to the nature of the erratum, the trap-and-emulate is only needed when SMT is enabled. The overhead of trap-and-emulate is expected to be negligible on most workloads. A command line option kvm-arm.vm_msr_trap has been provided to override trapping on guest TTBR updates. This is to address a very limited case where a user wants to run SMT enabled, with a trustworthy guest kernel, and wants to avoid the performance overhead associated with emulating the address translation register changes. JC Jayachandran Chandrasekharan Nair (1): arm64: KVM: Add option to trap and emulate guest VM sysreg updates Marc Zyngier (1): arm64: Workaround for Cavium ThunderX2 erratum 219 .../admin-guide/kernel-parameters.txt | 5 + Documentation/arm64/silicon-errata.rst | 2 + arch/arm/include/asm/kvm_host.h | 1 + arch/arm64/Kconfig | 12 ++ arch/arm64/include/asm/cpucaps.h | 3 +- arch/arm64/include/asm/kvm_host.h | 2 + arch/arm64/kernel/cpu_errata.c | 15 +++ arch/arm64/kernel/entry.S | 2 + arch/arm64/kvm/hyp/switch.c | 115 +++++++++++++++++- virt/kvm/arm/arm.c | 2 + 10 files changed, 156 insertions(+), 3 deletions(-) -- 2.17.1 _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
From: Marc Zyngier <marc.zyngier@arm.com> A prefetch, load or store instruction after a TTBR change but before the corresponding context synchronization barrier can cause a spurious Data Abort on Cavium ThunderX2. An optimization introduced in commit c7b9adaf85f818 ("arm64: entry: Add exception trampoline page for exceptions from EL0") for prefetching trampoline vectors ends up triggering this issue when KPTI is enabled. Workaround is to turn off the prefetch for ThunderX2 where it does not have an advantage. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [Updated commit message and erratum information - jnair] Signed-off-by: Jayachandran Chandrasekharan Nair <jnair@marvell.com> Cc: stable@kernel.org --- Documentation/arm64/silicon-errata.rst | 2 ++ arch/arm64/Kconfig | 12 ++++++++++++ arch/arm64/include/asm/cpucaps.h | 3 ++- arch/arm64/kernel/cpu_errata.c | 15 +++++++++++++++ arch/arm64/kernel/entry.S | 2 ++ 5 files changed, 33 insertions(+), 1 deletion(-) diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index 17ea3fecddaa..ab7ed2fd072f 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -107,6 +107,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | Cavium | ThunderX2 SMMUv3| #126 | N/A | +----------------+-----------------+-----------------+-----------------------------+ +| Cavium | ThunderX2 Core | #219 | CAVIUM_TX2_ERRATUM_219 | ++----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ | Freescale/NXP | LS2080A/LS1043A | A-008585 | FSL_ERRATUM_A008585 | +----------------+-----------------+-----------------+-----------------------------+ diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 41a9b4257b72..d2c7c9b22dae 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -617,6 +617,18 @@ config CAVIUM_ERRATUM_30115 If unsure, say Y. +config CAVIUM_TX2_ERRATUM_219 + bool "Cavium ThunderX2 erratum 219: PRFM between TTBR change and ISB fails" + default y + help + On Cavium ThunderX2, a load/store/prefetch instruction after a + change to TTBR and before the corresponding context synchronization + operation can cause a spurious Data Abort to be delivered to any + hardware thread in the CPU core. + + Workaround is to drop the optimization which does this. If unsure + say Y + config QCOM_FALKOR_ERRATUM_1003 bool "Falkor E1003: Incorrect translation due to ASID change" default y diff --git a/arch/arm64/include/asm/cpucaps.h b/arch/arm64/include/asm/cpucaps.h index f19fe4b9acc4..a0666dcff72a 100644 --- a/arch/arm64/include/asm/cpucaps.h +++ b/arch/arm64/include/asm/cpucaps.h @@ -52,7 +52,8 @@ #define ARM64_HAS_IRQ_PRIO_MASKING 42 #define ARM64_HAS_DCPODP 43 #define ARM64_WORKAROUND_1463225 44 +#define ARM64_WORKAROUND_CAVIUM_TX2_219 45 -#define ARM64_NCAPS 45 +#define ARM64_NCAPS 46 #endif /* __ASM_CPUCAPS_H */ diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index 1e43ba5c79b7..1bbb89d0379a 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -712,6 +712,14 @@ static const struct midr_range erratum_1418040_list[] = { }; #endif +#ifdef CONFIG_CAVIUM_TX2_ERRATUM_219 +static const struct midr_range tx2_family_cpus[] = { + MIDR_ALL_VERSIONS(MIDR_BRCM_VULCAN), + MIDR_ALL_VERSIONS(MIDR_CAVIUM_THUNDERX2), + {}, +}; +#endif + const struct arm64_cpu_capabilities arm64_errata[] = { #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE { @@ -851,6 +859,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = { .type = ARM64_CPUCAP_LOCAL_CPU_ERRATUM, .matches = has_cortex_a76_erratum_1463225, }, +#endif +#ifdef CONFIG_CAVIUM_TX2_ERRATUM_219 + { + .desc = "ThunderX2 erratum 219", + .capability = ARM64_WORKAROUND_CAVIUM_TX2_219, + ERRATA_MIDR_RANGE_LIST(tx2_family_cpus), + }, #endif { } diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 84a822748c84..c282e6570a5b 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -1070,7 +1070,9 @@ alternative_insn isb, nop, ARM64_WORKAROUND_QCOM_FALKOR_E1003 #else ldr x30, =vectors #endif +alternative_if_not ARM64_WORKAROUND_CAVIUM_TX2_219 prfm plil1strm, [x30, #(1b - tramp_vectors)] +alternative_else_nop_endif msr vbar_el1, x30 add x30, x30, #(1b - tramp_vectors) isb -- 2.17.1 _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Enable use of HCR_EL2.TVM to trap and emulate updates to TTBRx_EL1 and other address translation related system registers. To minimize the overhead, traps caused by this flag are emulated in a fast path and we don't go all the way back to the main sysreg handling code, unless the rest of the hypervisor expects to see these accesses. On Cavium ThunderX2, this option enabled by default when SMT > 1, to ensure that a guest kernel cannot trigger erratum 219. We provide a kernel command line option "kvm-arm.vm_msr_trap" to turn this option off - this is to handle the case where the user has a trustable guest kernel, and wants to avoid the trap overhead. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> [Marc's original patch has been changed significantly to add the boot time parameter - jnair] Signed-off-by: Jayachandran Chandrasekharan Nair <jnair@marvell.com> --- .../admin-guide/kernel-parameters.txt | 5 + arch/arm/include/asm/kvm_host.h | 1 + arch/arm64/include/asm/kvm_host.h | 2 + arch/arm64/kvm/hyp/switch.c | 115 +++++++++++++++++- virt/kvm/arm/arm.c | 2 + 5 files changed, 123 insertions(+), 2 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index c7ac2f3ac99f..2028b002bc99 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2078,6 +2078,11 @@ [KVM,ARM] Allow use of GICv4 for direct injection of LPIs. + kvm-arm.vm_msr_trap= + [KVM,ARM] Trap guest accesses to address translation + related system registers. Can be set to 0 to turn off + erratum 219 workaround on Cavium ThunderX2. + kvm-intel.ept= [KVM,Intel] Disable extended page tables (virtualized MMU) support on capable Intel chips. Default is 1 (enabled) diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h index 8a37c8e89777..a9cc1be336ef 100644 --- a/arch/arm/include/asm/kvm_host.h +++ b/arch/arm/include/asm/kvm_host.h @@ -42,6 +42,7 @@ DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); static inline int kvm_arm_init_sve(void) { return 0; } +static inline int kvm_arm_config_vm_msr_trap(void) { } u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode); int __attribute_const__ kvm_target_cpu(void); diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index f656169db8c3..4939e795e2ee 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -50,6 +50,8 @@ DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use); extern unsigned int kvm_sve_max_vl; int kvm_arm_init_sve(void); +void kvm_arm_config_vm_msr_trap(void); + int __attribute_const__ kvm_target_cpu(void); int kvm_reset_vcpu(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu); diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c index 3d3815020e36..cbba53714305 100644 --- a/arch/arm64/kvm/hyp/switch.c +++ b/arch/arm64/kvm/hyp/switch.c @@ -120,10 +120,15 @@ static void __hyp_text __activate_traps_nvhe(struct kvm_vcpu *vcpu) write_sysreg(val, cptr_el2); } +/* Key to set HCR_EL2.TVM when running a guest */ +static DEFINE_STATIC_KEY_FALSE(vm_msr_trap); + static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu) { u64 hcr = vcpu->arch.hcr_el2; + if (static_branch_unlikely(&vm_msr_trap)) + hcr |= HCR_TVM; write_sysreg(hcr, hcr_el2); if (cpus_have_const_cap(ARM64_HAS_RAS_EXTN) && (hcr & HCR_VSE)) @@ -174,8 +179,10 @@ static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu) * the crucial bit is "On taking a vSError interrupt, * HCR_EL2.VSE is cleared to 0." */ - if (vcpu->arch.hcr_el2 & HCR_VSE) - vcpu->arch.hcr_el2 = read_sysreg(hcr_el2); + if (vcpu->arch.hcr_el2 & HCR_VSE) { + vcpu->arch.hcr_el2 &= ~HCR_VSE; + vcpu->arch.hcr_el2 |= read_sysreg(hcr_el2) & HCR_VSE; + } if (has_vhe()) deactivate_traps_vhe(); @@ -380,6 +387,61 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu) return true; } +static bool __hyp_text handle_hcr_tvm(struct kvm_vcpu *vcpu) +{ + u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_hsr(vcpu)); + int rt = kvm_vcpu_sys_get_rt(vcpu); + u64 val = vcpu_get_reg(vcpu, rt); + + /* + * The normal sysreg handling code expects to see the traps, + * let's not do anything here. + */ + if (vcpu->arch.hcr_el2 & HCR_TVM) + return false; + + switch (sysreg) { + case SYS_SCTLR_EL1: + write_sysreg_el1(val, SYS_SCTLR); + break; + case SYS_TTBR0_EL1: + write_sysreg_el1(val, SYS_TTBR0); + break; + case SYS_TTBR1_EL1: + write_sysreg_el1(val, SYS_TTBR1); + break; + case SYS_TCR_EL1: + write_sysreg_el1(val, SYS_TCR); + break; + case SYS_ESR_EL1: + write_sysreg_el1(val, SYS_ESR); + break; + case SYS_FAR_EL1: + write_sysreg_el1(val, SYS_FAR); + break; + case SYS_AFSR0_EL1: + write_sysreg_el1(val, SYS_AFSR0); + break; + case SYS_AFSR1_EL1: + write_sysreg_el1(val, SYS_AFSR1); + break; + case SYS_MAIR_EL1: + write_sysreg_el1(val, SYS_MAIR); + break; + case SYS_AMAIR_EL1: + write_sysreg_el1(val, SYS_AMAIR); + break; + case SYS_CONTEXTIDR_EL1: + write_sysreg_el1(val, SYS_CONTEXTIDR); + break; + default: + return false; + } + + __kvm_skip_instr(vcpu); + return true; +} + /* * Return true when we were able to fixup the guest exit and should return to * the guest, false when we should restore the host state and return to the @@ -399,6 +461,11 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code) if (*exit_code != ARM_EXCEPTION_TRAP) goto exit; + if (static_branch_unlikely(&vm_msr_trap) && + kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 && + handle_hcr_tvm(vcpu)) + return true; + /* * We trap the first access to the FP/SIMD to save the host context * and restore the guest context lazily. @@ -718,3 +785,47 @@ void __hyp_text __noreturn hyp_panic(struct kvm_cpu_context *host_ctxt) unreachable(); } + +/* command line setting, values : -1 : off, 1 : on, 0 : not set */ +static int vm_msr_trap_arg; + +static int __init handle_early_vm_msr_trap_arg(char *buf) +{ + bool val; + int rv; + + rv = strtobool(buf, &val); + vm_msr_trap_arg = val ? 1 : -1; + return rv; +} +early_param("kvm-arm.vm_msr_trap", handle_early_vm_msr_trap_arg); + +void kvm_arm_config_vm_msr_trap(void) +{ + bool needed = false; + +#ifdef CONFIG_CAVIUM_TX2_ERRATUM_219 + if (cpus_have_const_cap(ARM64_WORKAROUND_CAVIUM_TX2_219)) { + int i; + + /* needed if Aff0 of any CPU is non-zero, i.e, SMT > 1 */ + for_each_possible_cpu(i) { + if (MPIDR_AFFINITY_LEVEL(cpu_logical_map(i), 0) != 0) { + needed = true; + break; + } + } + } +#endif + if (needed) { + if (vm_msr_trap_arg == -1) + pr_warn("KVM: Cavium ThunderX2 erratum 219 workaround forced off!\n"); + else + vm_msr_trap_arg = 1; + } + + if (vm_msr_trap_arg > 0) { + static_branch_enable(&vm_msr_trap); + kvm_info("Using HCR_EL2.TVM to trap guest VM updates.\n"); + } +} diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c index 86c6aa1cb58e..799c4bb308be 100644 --- a/virt/kvm/arm/arm.c +++ b/virt/kvm/arm/arm.c @@ -1696,6 +1696,8 @@ int kvm_arch_init(void *opaque) if (err) return err; + kvm_arm_config_vm_msr_trap(); + if (!in_hyp_mode) { err = init_hyp_mode(); if (err) -- 2.17.1 _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
Hi JC, Thanks for posting this. On Fri, Oct 11, 2019 at 10:35:21AM +0000, Jayachandran Chandrasekharan Nair wrote: > These two patches are based on the work by Marc Zyngier and addresses > Cavium ThunderX2 erratum 219. > > This erratum (originally reported by ARM folks) is from an interesting > use of the prefetch instruction in the KPTI patchset. The prefetch > was done between a TTBR change and the corresponding ISB, and this > occasionally caused a crash on ThunderX2. > > The first patch removes the troublesome prefetch for ThunderX2. > The second patch addresses the case where the issue can be triggered > from a guest kernel. The workaround in this case is to trap TTBR > accesses by setting HCR_EL2.TVM for guests and doing the system > register update from EL2 in a fast path. FWIW, I was already planning to send the following to Linus: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=errata/tx2-219 so please base any changes on top of that branch. > Due to the nature of the erratum, the trap-and-emulate is only > needed when SMT is enabled. > > The overhead of trap-and-emulate is expected to be negligible on most > workloads. A command line option kvm-arm.vm_msr_trap has been > provided to override trapping on guest TTBR updates. This is to > address a very limited case where a user wants to run SMT enabled, > with a trustworthy guest kernel, and wants to avoid the performance > overhead associated with emulating the address translation register > changes. Do you have any performance data to show the impact of the workaround on non-kpti guests? I don't think we can justify the inclusion of a cmdline option for this without figures showing that it's really necessary. Otherwise, the "very limited case" really is a niche scenario where the CONFIG option can simply be disabled. Cheers, Will _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
On Fri, Oct 11, 2019 at 11:44:55AM +0100, Will Deacon wrote: > Hi JC, > > Thanks for posting this. > > On Fri, Oct 11, 2019 at 10:35:21AM +0000, Jayachandran Chandrasekharan Nair wrote: > > These two patches are based on the work by Marc Zyngier and addresses > > Cavium ThunderX2 erratum 219. > > > > This erratum (originally reported by ARM folks) is from an interesting > > use of the prefetch instruction in the KPTI patchset. The prefetch > > was done between a TTBR change and the corresponding ISB, and this > > occasionally caused a crash on ThunderX2. > > > > The first patch removes the troublesome prefetch for ThunderX2. > > The second patch addresses the case where the issue can be triggered > > from a guest kernel. The workaround in this case is to trap TTBR > > accesses by setting HCR_EL2.TVM for guests and doing the system > > register update from EL2 in a fast path. > > FWIW, I was already planning to send the following to Linus: > > https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=errata/tx2-219 > > so please base any changes on top of that branch. Please consider taking my patchset as is, if you don't have issues with patches. > > Due to the nature of the erratum, the trap-and-emulate is only > > needed when SMT is enabled. > > > > The overhead of trap-and-emulate is expected to be negligible on most > > workloads. A command line option kvm-arm.vm_msr_trap has been > > provided to override trapping on guest TTBR updates. This is to > > address a very limited case where a user wants to run SMT enabled, > > with a trustworthy guest kernel, and wants to avoid the performance > > overhead associated with emulating the address translation register > > changes. > > Do you have any performance data to show the impact of the workaround on > non-kpti guests? I don't think we can justify the inclusion of a cmdline > option for this without figures showing that it's really necessary. > Otherwise, the "very limited case" really is a niche scenario where the > CONFIG option can simply be disabled. In my view, you are switching the responsibility here. Even one use case should be enough reason not to force a performance regression that cannot be opted out of. You are expected to leave as much policy as reasonable to the user with safe (and least astonishing) defaults. A class of guest usecases involve running stock images (linux or non-linux). The arm64 server ecosystem is still in development, so we should allow someone evaluating a server system to turn off or on options as much as possible without forcing a re-compile. Also, the run-time option is generic enough that it can be switched on/off for any platform, not just the one affected by this erratum. So, I disagree with the queued patchset - but I will leave you to make your call on which way to go. JC _______________________________________________ kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/mailman/listinfo/kvmarm