* [PATCH v4 0/2] KVM: arm64: PMU: Correct the handling of PMUSERENR_EL0
@ 2023-04-16  4:53 Reiji Watanabe
  2023-04-16  4:53 ` [PATCH v4 1/2] KVM: arm64: PMU: Restore the host's PMUSERENR_EL0 Reiji Watanabe
  2023-04-16  4:53 ` [PATCH v4 2/2] KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded Reiji Watanabe
  0 siblings, 2 replies; 3+ messages in thread
From: Reiji Watanabe @ 2023-04-16  4:53 UTC (permalink / raw)
  To: Marc Zyngier, Mark Rutland, Oliver Upton, Will Deacon,
	Catalin Marinas, kvmarm
  Cc: kvm, linux-arm-kernel, James Morse, Alexandru Elisei, Zenghui Yu,
	Suzuki K Poulose, Paolo Bonzini, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Shaoqin Huang, Rob Herring,
	Reiji Watanabe

This series fixes bugs in KVM's handling of PMUSERENR_EL0.

With userspace (EL0) PMU counter access support [1], the perf
subsystem sets the CR and ER bits of PMUSERENR_EL0 as needed to
allow EL0 direct access to PMU counters.  However, KVM assumes
that the register value is always zero for the host EL0, which
leads to the following two problems in handling the register:

[A] The host EL0 might lose direct access to PMU counters, as
    KVM always clears PMUSERENR_EL0 before returning to userspace
    (a sketch of such an access follows below).

[B] With VHE, the guest EL0 access to PMU counters might be trapped
    to EL1 instead of to EL2 (even when PMUSERENR_EL0 for the guest
    indicates that the guest EL0 has access to the counters).
    This is because, with VHE, KVM sets the ER, CR, SW and EN bits
    of PMUSERENR_EL0 to 1 on vcpu_load() to ensure that PMU accesses
    from the guest EL0 are trapped to EL2, but those bits might be
    cleared by the perf subsystem after vcpu_load() (when PMU
    counters are programmed for vPMU emulation).
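
For context, below is a minimal, hypothetical sketch of the kind of
direct EL0 counter access that [A] and [B] are about.  It is not part
of this series; the perf_event_open()/mmap() setup, the config1 bit
used to request user access, and the kernel.perf_user_access sysctl
are assumptions based on the arm64 userspace counter access support
from [1], and error handling is omitted.

#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

static uint64_t read_cycle_counter(void)
{
	uint64_t val;

	/* Traps (SIGILL) if PMUSERENR_EL0.CR is clear, which is what [A] causes. */
	__asm__ volatile("mrs %0, pmccntr_el0" : "=r" (val));
	return val;
}

int main(void)
{
	struct perf_event_attr attr = {
		.size = sizeof(attr),
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES,
		.config1 = 0x2,		/* assumed: request userspace counter access */
		.exclude_kernel = 1,
	};
	int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	struct perf_event_mmap_page *pc;

	if (fd < 0)
		return 1;

	pc = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0);
	if (pc != MAP_FAILED && pc->cap_user_rdpmc)
		printf("cycles: %llu\n",
		       (unsigned long long)read_cycle_counter());

	close(fd);
	return 0;
}

Once perf has set PMUSERENR_EL0.CR for such a task, KVM clearing the
register behind perf's back (problem [A]) makes the mrs above trap
and the task receives SIGILL, even though cap_user_rdpmc was set.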

Patch 1 fixes [A], and Patch 2 fixes [B].
The series is based on v6.3-rc6.

v4:
 - Introduce NO_DEBUG_IRQFLAGS to exclude warn_bogus_irq_restore()
   from the nVHE hyp code.  This addresses the issue [2] reported
   by the kernel test robot <lkp@intel.com>.

v3: https://lore.kernel.org/all/20230415164029.526895-1-reijiw@google.com/
 - While vcpu_{put,load}() are manipulating PMUSERENR_EL0,
   disable IRQs to prevent a race condition between these
   operations and IPIs that update PMUSERENR_EL0. [Mark]

v2: https://lore.kernel.org/all/20230408034759.2369068-1-reijiw@google.com/
 - Save the PMUSERENR_EL0 for the host in the sysreg array of
   kvm_host_data. [Marc]
 - Don't let armv8pmu_start() overwrite PMUSERENR if the vCPU
   is loaded; instead, have KVM update the saved shadow register
   value for the host. [Marc, Mark]

v1: https://lore.kernel.org/all/20230329002136.2463442-1-reijiw@google.com/

[1] https://github.com/torvalds/linux/commit/83a7a4d643d33a8b74a42229346b7ed7139fcef9
[2] https://lore.kernel.org/all/202304160658.Oqr1xZbi-lkp@intel.com/

Reiji Watanabe (2):
  KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
  KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded

 arch/arm64/include/asm/kvm_host.h       |  7 +++++
 arch/arm64/kernel/perf_event.c          | 21 ++++++++++++--
 arch/arm64/kvm/hyp/include/hyp/switch.h | 37 +++++++++++++++++++++++--
 arch/arm64/kvm/hyp/nvhe/Makefile        |  2 +-
 arch/arm64/kvm/pmu.c                    | 25 +++++++++++++++++
 include/linux/irqflags.h                |  6 ++--
 6 files changed, 89 insertions(+), 9 deletions(-)


base-commit: 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d
-- 
2.40.0.634.g4ca3ef3211-goog



* [PATCH v4 1/2] KVM: arm64: PMU: Restore the host's PMUSERENR_EL0
  2023-04-16  4:53 [PATCH v4 0/2] KVM: arm64: PMU: Correct the handling of PMUSERENR_EL0 Reiji Watanabe
@ 2023-04-16  4:53 ` Reiji Watanabe
  2023-04-16  4:53 ` [PATCH v4 2/2] KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded Reiji Watanabe
  1 sibling, 0 replies; 3+ messages in thread
From: Reiji Watanabe @ 2023-04-16  4:53 UTC (permalink / raw)
  To: Marc Zyngier, Mark Rutland, Oliver Upton, Will Deacon,
	Catalin Marinas, kvmarm
  Cc: kvm, linux-arm-kernel, James Morse, Alexandru Elisei, Zenghui Yu,
	Suzuki K Poulose, Paolo Bonzini, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Shaoqin Huang, Rob Herring,
	Reiji Watanabe

Restore the host's PMUSERENR_EL0 value instead of clearing it
before returning to userspace, as the host EL0 might have direct
access to PMU registers (some bits of PMUSERENR_EL0 might not be
zero for the host EL0).

Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 07d37ff88a3f..6718731729fd 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -81,7 +81,12 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 	 * EL1 instead of being trapped to EL2.
 	 */
 	if (kvm_arm_support_pmu_v3()) {
+		struct kvm_cpu_context *hctxt;
+
 		write_sysreg(0, pmselr_el0);
+
+		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+		ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
 		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
 	}
 
@@ -105,8 +110,12 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 	write_sysreg(vcpu->arch.mdcr_el2_host, mdcr_el2);
 
 	write_sysreg(0, hstr_el2);
-	if (kvm_arm_support_pmu_v3())
-		write_sysreg(0, pmuserenr_el0);
+	if (kvm_arm_support_pmu_v3()) {
+		struct kvm_cpu_context *hctxt;
+
+		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+		write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
+	}
 
 	if (cpus_have_final_cap(ARM64_SME)) {
 		sysreg_clear_set_s(SYS_HFGRTR_EL2, 0,
-- 
2.40.0.634.g4ca3ef3211-goog



* [PATCH v4 2/2] KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded
  2023-04-16  4:53 [PATCH v4 0/2] KVM: arm64: PMU: Correct the handling of PMUSERENR_EL0 Reiji Watanabe
  2023-04-16  4:53 ` [PATCH v4 1/2] KVM: arm64: PMU: Restore the host's PMUSERENR_EL0 Reiji Watanabe
@ 2023-04-16  4:53 ` Reiji Watanabe
  1 sibling, 0 replies; 3+ messages in thread
From: Reiji Watanabe @ 2023-04-16  4:53 UTC (permalink / raw)
  To: Marc Zyngier, Mark Rutland, Oliver Upton, Will Deacon,
	Catalin Marinas, kvmarm
  Cc: kvm, linux-arm-kernel, James Morse, Alexandru Elisei, Zenghui Yu,
	Suzuki K Poulose, Paolo Bonzini, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Shaoqin Huang, Rob Herring,
	Reiji Watanabe

Currently, with VHE, KVM sets the ER, CR, SW and EN bits of
PMUSERENR_EL0 to 1 on vcpu_load(), and saves and restores the
register value for the host on vcpu_load() and vcpu_put().
If those bits are cleared on a pCPU with a vCPU loaded
(armv8pmu_start() would do that when PMU counters are
programmed for the guest), PMU accesses from the guest EL0 might
be trapped to the guest EL1 directly, regardless of the current
PMUSERENR_EL0 value of the vCPU.

Fix this by not letting armv8pmu_start() overwrite PMUSERENR_EL0
on the pCPU where PMUSERENR_EL0 for the guest is loaded, and
instead updating the saved shadow register value for the host,
so that the value can be restored on vcpu_put() later.
While vcpu_{put,load}() are manipulating PMUSERENR_EL0, disable
IRQs to prevent a race condition between these operations and
IPIs that attempt to update PMUSERENR_EL0 for the host EL0.

As this change (disabling IRQs) is applied to the nVHE hyp code,
unwanted code (e.g. trace_hardirqs_off()) would be included in
the hyp code when CONFIG_TRACE_IRQFLAGS and/or CONFIG_DEBUG_IRQFLAGS
are enabled.  Introduce NO_TRACE_IRQFLAGS and NO_DEBUG_IRQFLAGS
macros to locally disable CONFIG_TRACE_IRQFLAGS or
CONFIG_DEBUG_IRQFLAGS in the nVHE hyp code.

Suggested-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Marc Zyngier <maz@kernel.org>
Fixes: 83a7a4d643d3 ("arm64: perf: Enable PMU counter userspace access for perf event")
Signed-off-by: Reiji Watanabe <reijiw@google.com>
---
 arch/arm64/include/asm/kvm_host.h       |  7 +++++++
 arch/arm64/kernel/perf_event.c          | 21 ++++++++++++++++++---
 arch/arm64/kvm/hyp/include/hyp/switch.h | 24 ++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/Makefile        |  2 +-
 arch/arm64/kvm/pmu.c                    | 25 +++++++++++++++++++++++++
 include/linux/irqflags.h                |  6 +++---
 6 files changed, 78 insertions(+), 7 deletions(-)
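
To make the ordering easier to follow, here is a hedged, userspace-only
model of the handoff implemented below.  The variables and helpers
(hw_pmuserenr, host_shadow, pmuserenr_on_cpu, vcpu_load_pmu(),
vcpu_put_pmu(), update_pmuserenr()) are stand-ins that only mirror the
shadow register, the PMUSERENR_ON_CPU flag and the perf-side update
hook; IRQ masking and the real per-CPU machinery are omitted, and the
bit values follow the architectural layout for readability only.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define USERENR_EN	(1u << 0)
#define USERENR_SW	(1u << 1)
#define USERENR_CR	(1u << 2)
#define USERENR_ER	(1u << 3)
#define USERENR_TRAP_GUEST_EL0 \
	(USERENR_EN | USERENR_SW | USERENR_CR | USERENR_ER)

static uint32_t hw_pmuserenr;	/* stands in for the physical PMUSERENR_EL0 */
static uint32_t host_shadow;	/* stands in for the saved host register value */
static bool pmuserenr_on_cpu;	/* stands in for the PMUSERENR_ON_CPU vCPU flag */

static void vcpu_load_pmu(void)
{
	host_shadow = hw_pmuserenr;		/* save the host value */
	hw_pmuserenr = USERENR_TRAP_GUEST_EL0;	/* make guest EL0 accesses trap to EL2 */
	pmuserenr_on_cpu = true;
}

/* What perf calls to set the host value while counters are reprogrammed. */
static void update_pmuserenr(uint32_t val)
{
	if (pmuserenr_on_cpu) {
		host_shadow = val;	/* defer: the guest value is live on the CPU */
		return;
	}
	hw_pmuserenr = val;
}

static void vcpu_put_pmu(void)
{
	hw_pmuserenr = host_shadow;	/* restore the (possibly updated) host value */
	pmuserenr_on_cpu = false;
}

int main(void)
{
	hw_pmuserenr = USERENR_CR | USERENR_ER;	/* host EL0 had direct access */

	vcpu_load_pmu();
	update_pmuserenr(USERENR_CR | USERENR_ER);	/* perf reprograms counters */
	assert(hw_pmuserenr == USERENR_TRAP_GUEST_EL0);	/* guest traps stay intact */
	vcpu_put_pmu();
	assert(hw_pmuserenr == (USERENR_CR | USERENR_ER)); /* host access preserved */

	printf("final PMUSERENR (model): 0x%x\n", hw_pmuserenr);
	return 0;
}

A perf update that arrives while the guest value is live only touches
the shadow copy, so guest EL0 accesses keep trapping to EL2 and the
host still gets its latest value back on vcpu_put().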

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bcd774d74f34..c49cfda2740a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -668,6 +668,8 @@ struct kvm_vcpu_arch {
 /* Software step state is Active-pending */
 #define DBG_SS_ACTIVE_PENDING	__vcpu_single_flag(sflags, BIT(5))
 
+/* PMUSERENR for the guest EL0 is on physical CPU */
+#define PMUSERENR_ON_CPU	__vcpu_single_flag(sflags, BIT(6))
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
 #define vcpu_sve_pffr(vcpu) (kern_hyp_va((vcpu)->arch.sve_state) +	\
@@ -1028,9 +1030,14 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
 #ifdef CONFIG_KVM
 void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
 void kvm_clr_pmu_events(u32 clr);
+bool kvm_set_pmuserenr(u64 val);
 #else
 static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
 static inline void kvm_clr_pmu_events(u32 clr) {}
+static inline bool kvm_set_pmuserenr(u64 val)
+{
+	return false;
+}
 #endif
 
 void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index dde06c0f97f3..33bb5f548f8a 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -741,9 +741,25 @@ static inline u32 armv8pmu_getreset_flags(void)
 	return value;
 }
 
+static void update_pmuserenr(u64 val)
+{
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * The current PMUSERENR_EL0 value might be the value for the guest.
+	 * If that's the case, have KVM keep track of the register value
+	 * for the host EL0 so that KVM can restore it before returning to
+	 * the host EL0. Otherwise, update the register now.
+	 */
+	if (kvm_set_pmuserenr(val))
+		return;
+
+	write_sysreg(val, pmuserenr_el0);
+}
+
 static void armv8pmu_disable_user_access(void)
 {
-	write_sysreg(0, pmuserenr_el0);
+	update_pmuserenr(0);
 }
 
 static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
@@ -759,8 +775,7 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 			armv8pmu_write_evcntr(i, 0);
 	}
 
-	write_sysreg(0, pmuserenr_el0);
-	write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR, pmuserenr_el0);
+	update_pmuserenr(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR);
 }
 
 static void armv8pmu_enable_event(struct perf_event *event)
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index 6718731729fd..7e73be12cfaf 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -82,12 +82,24 @@ static inline void __activate_traps_common(struct kvm_vcpu *vcpu)
 	 */
 	if (kvm_arm_support_pmu_v3()) {
 		struct kvm_cpu_context *hctxt;
+		unsigned long flags;
 
 		write_sysreg(0, pmselr_el0);
 
 		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+
+		/*
+		 * Disable IRQs to prevent a race condition between the
+		 * following code and IPIs that attempt to update
+		 * PMUSERENR_EL0. See also kvm_set_pmuserenr().
+		 */
+		local_irq_save(flags);
+
 		ctxt_sys_reg(hctxt, PMUSERENR_EL0) = read_sysreg(pmuserenr_el0);
 		write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0);
+		vcpu_set_flag(vcpu, PMUSERENR_ON_CPU);
+
+		local_irq_restore(flags);
 	}
 
 	vcpu->arch.mdcr_el2_host = read_sysreg(mdcr_el2);
@@ -112,9 +124,21 @@ static inline void __deactivate_traps_common(struct kvm_vcpu *vcpu)
 	write_sysreg(0, hstr_el2);
 	if (kvm_arm_support_pmu_v3()) {
 		struct kvm_cpu_context *hctxt;
+		unsigned long flags;
 
 		hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+
+		/*
+		 * Disable IRQs to prevent a race condition between the
+		 * following code and IPIs that attempt to update
+		 * PMUSERENR_EL0. See also kvm_set_pmuserenr().
+		 */
+		local_irq_save(flags);
+
 		write_sysreg(ctxt_sys_reg(hctxt, PMUSERENR_EL0), pmuserenr_el0);
+		vcpu_clear_flag(vcpu, PMUSERENR_ON_CPU);
+
+		local_irq_restore(flags);
 	}
 
 	if (cpus_have_final_cap(ARM64_SME)) {
diff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile
index 530347cdebe3..2e31d37512c7 100644
--- a/arch/arm64/kvm/hyp/nvhe/Makefile
+++ b/arch/arm64/kvm/hyp/nvhe/Makefile
@@ -10,7 +10,7 @@ asflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS
 # will explode instantly (Words of Marc Zyngier). So introduce a generic flag
 # __DISABLE_TRACE_MMIO__ to disable MMIO tracing for nVHE KVM.
 ccflags-y := -D__KVM_NVHE_HYPERVISOR__ -D__DISABLE_EXPORTS -D__DISABLE_TRACE_MMIO__
-ccflags-y += -fno-stack-protector	\
+ccflags-y += -fno-stack-protector -DNO_TRACE_IRQFLAGS -DNO_DEBUG_IRQFLAGS \
 	     -DDISABLE_BRANCH_PROFILING	\
 	     $(DISABLE_STACKLEAK_PLUGIN)
 
diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c
index 7887133d15f0..d6a863853bfe 100644
--- a/arch/arm64/kvm/pmu.c
+++ b/arch/arm64/kvm/pmu.c
@@ -209,3 +209,28 @@ void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu)
 	kvm_vcpu_pmu_enable_el0(events_host);
 	kvm_vcpu_pmu_disable_el0(events_guest);
 }
+
+/*
+ * With VHE, keep track of the PMUSERENR_EL0 value for the host EL0 on the pCPU
+ * where PMUSERENR_EL0 for the guest is loaded, since PMUSERENR_EL0 is switched
+ * to the value for the guest on vcpu_load().  The value for the host EL0
+ * will be restored on vcpu_put(), before returning to the host EL0.
+ *
+ * Return true if KVM takes care of the register. Otherwise return false.
+ */
+bool kvm_set_pmuserenr(u64 val)
+{
+	struct kvm_cpu_context *hctxt;
+	struct kvm_vcpu *vcpu;
+
+	if (!kvm_arm_support_pmu_v3() || !has_vhe())
+		return false;
+
+	vcpu = kvm_get_running_vcpu();
+	if (!vcpu || !vcpu_get_flag(vcpu, PMUSERENR_ON_CPU))
+		return false;
+
+	hctxt = &this_cpu_ptr(&kvm_host_data)->host_ctxt;
+	ctxt_sys_reg(hctxt, PMUSERENR_EL0) = val;
+	return true;
+}
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 5ec0fa71399e..bcf3b969d459 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -156,7 +156,7 @@ do {						\
 # define start_critical_timings() do { } while (0)
 #endif
 
-#ifdef CONFIG_DEBUG_IRQFLAGS
+#if defined CONFIG_DEBUG_IRQFLAGS && !defined(NO_DEBUG_IRQFLAGS)
 extern void warn_bogus_irq_restore(void);
 #define raw_check_bogus_irq_restore()			\
 	do {						\
@@ -198,9 +198,9 @@ extern void warn_bogus_irq_restore(void);
 
 /*
  * The local_irq_*() APIs are equal to the raw_local_irq*()
- * if !TRACE_IRQFLAGS.
+ * if !TRACE_IRQFLAGS or if NO_TRACE_IRQFLAGS is locally set.
  */
-#ifdef CONFIG_TRACE_IRQFLAGS
+#if defined CONFIG_TRACE_IRQFLAGS && !defined(NO_TRACE_IRQFLAGS)
 
 #define local_irq_enable()				\
 	do {						\
-- 
2.40.0.634.g4ca3ef3211-goog



