* [PATCH 0/4] KVM: x86/pmu: Counter reprogramming fixes
@ 2022-09-23  0:13 Sean Christopherson
  2022-09-23  0:13 ` [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-09-23  0:13 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Aaron Lewis, Like Xu, Wanpeng Li

The headliner is Like's pair of patches to fix a bug where KVM sleeps with
IRQs disabled due to attempting to pause and reprogram PMU counters in the
VM-Exit fast path.

Patches 1 and 2 (mine) fix mostly theoretical issues found when reviewing
Like's code, and would conflict with Like's patches if posted separately.

This is _very_ lightly tested, borderline RFC, but Like's updated PMU
KUT tests pass, as do the filter selftests, and I'm feeling lucky.

And I'm offline until Monday, so a traditional Friday patch bomb is in
order ;-)

Like Xu (2):
  KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()
  KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter

Sean Christopherson (2):
  KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
  KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or
    disallowed

 arch/x86/include/asm/kvm_host.h | 17 ++++--
 arch/x86/kvm/pmu.c              | 92 +++++++++++++++++++++------------
 arch/x86/kvm/pmu.h              |  6 ++-
 arch/x86/kvm/svm/pmu.c          |  4 +-
 arch/x86/kvm/vmx/pmu_intel.c    | 10 ++--
 5 files changed, 86 insertions(+), 43 deletions(-)


base-commit: efc8a6befa592de7e2fd4927b1b9a56a5ec7d8cf
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
  2022-09-23  0:13 [PATCH 0/4] KVM: x86/pmu: Counter reprogramming fixes Sean Christopherson
@ 2022-09-23  0:13 ` Sean Christopherson
  2022-10-13 12:01   ` Like Xu
  2022-09-23  0:13 ` [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-09-23  0:13 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Aaron Lewis, Like Xu, Wanpeng Li

Force vCPUs to reprogram all counters on a PMU filter change to provide
a sane ABI for userspace.  Use the existing KVM_REQ_PMU to do the
programming, and take advantage of the fact that the reprogram_pmi bitmap
fits in a u64 to set all bits in a single atomic update.  Note, setting
the bitmap and making the request needs to be done _after_ the SRCU
synchronization to ensure that vCPUs will reprogram using the new filter.

KVM's current "lazy" approach is confusing and non-deterministic.  It's
confusing because, from a developer perspective, the code is buggy as it
makes zero sense to let userspace modify the filter but then not actually
enforce the new filter.  The lazy approach is non-deterministic because
KVM enforces the filter whenever a counter is reprogrammed, not just on
guest WRMSRs, i.e. a guest might gain/lose access to an event at random
times depending on what is going on in the host.

Note, the resulting behavior is still non-deterministic while the filter
is in flux.  If userspace wants to guarantee deterministic behavior, all
vCPUs should be paused during the filter update.
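
As an aside, the overlay trick is easy to demo in isolation.  A minimal,
hypothetical userspace sketch (C11 atomics standing in for the kernel's
atomic64_t and set_bit(); an illustration of the union, not kernel code):

  #include <stdatomic.h>
  #include <stdint.h>
  #include <stdio.h>

  /* The bitmap and the 64-bit atomic alias the same storage. */
  union reprogram_bitmap {
  	unsigned long bits[64 / (8 * sizeof(unsigned long))];
  	_Atomic uint64_t all;
  };

  int main(void)
  {
  	union reprogram_bitmap rb = { .bits = { 0 } };

  	/* Per-counter path: flag a single counter (set_bit() in the kernel). */
  	rb.bits[0] |= 1ul << 3;

  	/* Filter-change path: flag every counter in one atomic store. */
  	atomic_store(&rb.all, -1ull);

  	printf("bits[0] = %#lx\n", rb.bits[0]);
  	return 0;
  }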

Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
Cc: Aaron Lewis <aaronlewis@google.com>
Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h | 11 ++++++++++-
 arch/x86/kvm/pmu.c              | 15 +++++++++++++--
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b3ce723efb43..462f041ede9f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -519,7 +519,16 @@ struct kvm_pmu {
 	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
 	struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
 	struct irq_work irq_work;
-	DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX);
+
+	/*
+	 * Overlay the bitmap with a 64-bit atomic so that all bits can be
+	 * set in a single access, e.g. to reprogram all counters when the PMU
+	 * filter changes.
+	 */
+	union {
+		DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX);
+		atomic64_t __reprogram_pmi;
+	};
 	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
 	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d9b9a0f0db17..4504987cbbe2 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -577,6 +577,8 @@ EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_pmu_event_filter tmp, *filter;
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
 	size_t size;
 	int r;
 
@@ -613,9 +615,18 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
 	mutex_lock(&kvm->lock);
 	filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
 				     mutex_is_locked(&kvm->lock));
-	mutex_unlock(&kvm->lock);
-
 	synchronize_srcu_expedited(&kvm->srcu);
+
+	BUILD_BUG_ON(sizeof(((struct kvm_pmu *)0)->reprogram_pmi) >
+		     sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));
+
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
+
+	kvm_make_all_cpus_request(kvm, KVM_REQ_PMU);
+
+	mutex_unlock(&kvm->lock);
+
 	r = 0;
 cleanup:
 	kfree(filter);
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed
  2022-09-23  0:13 [PATCH 0/4] KVM: x86/pmu: Counter reprogramming fixes Sean Christopherson
  2022-09-23  0:13 ` [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change Sean Christopherson
@ 2022-09-23  0:13 ` Sean Christopherson
  2022-10-14  7:14   ` Like Xu
  2022-09-23  0:13 ` [PATCH 3/4] KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event() Sean Christopherson
  2022-09-23  0:13 ` [PATCH 4/4] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter Sean Christopherson
  3 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-09-23  0:13 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Aaron Lewis, Like Xu, Wanpeng Li

When reprogramming a counter, clear the counter's "reprogram pending" bit
if the counter is disabled (by the guest) or is disallowed (by the
userspace filter).  In both cases, there's no need to re-attempt
programming on the next coincident KVM_REQ_PMU as enabling the counter by
either method will trigger reprogramming.
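
Distilled to its control flow, the clear-vs-retry behavior looks roughly
like the hypothetical sketch below (the struct fields and helper logic are
illustrative stand-ins, not KVM's):

  #include <stdbool.h>
  #include <stdio.h>

  struct pmc_sketch {
  	bool enabled;       /* enabled by the guest */
  	bool allowed;       /* allowed by the userspace filter */
  	bool hw_contended;  /* perf_event creation would fail */
  	bool pending;       /* the "reprogram" bit */
  };

  static void reprogram_sketch(struct pmc_sketch *pmc)
  {
  	if (!pmc->enabled || !pmc->allowed)
  		goto reprogram_complete;  /* re-enabling will set the bit again */

  	if (pmc->hw_contended)
  		return;  /* leave the bit set; retry on the next KVM_REQ_PMU */

  	printf("counter programmed\n");

  reprogram_complete:
  	pmc->pending = false;
  }

  int main(void)
  {
  	struct pmc_sketch pmc = {
  		.enabled = true, .allowed = true,
  		.hw_contended = false, .pending = true,
  	};

  	reprogram_sketch(&pmc);
  	printf("pending = %d\n", pmc.pending);
  	return 0;
  }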

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/pmu.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4504987cbbe2..4cd99320019b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -150,9 +150,9 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 	__kvm_perf_overflow(pmc, true);
 }
 
-static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
-				  u64 config, bool exclude_user,
-				  bool exclude_kernel, bool intr)
+static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
+				 bool exclude_user, bool exclude_kernel,
+				 bool intr)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	struct perf_event *event;
@@ -204,14 +204,14 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	if (IS_ERR(event)) {
 		pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
 			    PTR_ERR(event), pmc->idx);
-		return;
+		return PTR_ERR(event);
 	}
 
 	pmc->perf_event = event;
 	pmc_to_pmu(pmc)->event_count++;
-	clear_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
 	pmc->is_paused = false;
 	pmc->intr = intr || pebs;
+	return 0;
 }
 
 static void pmc_pause_counter(struct kvm_pmc *pmc)
@@ -245,7 +245,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
 	perf_event_enable(pmc->perf_event);
 	pmc->is_paused = false;
 
-	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
 	return true;
 }
 
@@ -303,10 +302,10 @@ void reprogram_counter(struct kvm_pmc *pmc)
 	pmc_pause_counter(pmc);
 
 	if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
-		return;
+		goto reprogram_complete;
 
 	if (!check_pmu_event_filter(pmc))
-		return;
+		goto reprogram_complete;
 
 	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
 		printk_once("kvm pmu: pin control bit is ignored\n");
@@ -324,16 +323,27 @@ void reprogram_counter(struct kvm_pmc *pmc)
 	}
 
 	if (pmc->current_config == new_config && pmc_resume_counter(pmc))
-		return;
+		goto reprogram_complete;
 
 	pmc_release_perf_event(pmc);
 
 	pmc->current_config = new_config;
-	pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
-			      (eventsel & pmu->raw_event_mask),
-			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
-			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
-			      eventsel & ARCH_PERFMON_EVENTSEL_INT);
+
+	/*
+	 * If reprogramming fails, e.g. due to contention, leave the counter's
+	 * reprogram bit set, i.e. opportunistically try again on the next PMU
+	 * refresh.  Don't make a new request as doing so can stall the guest
+	 * if reprogramming repeatedly fails.
+	 */
+	if (pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
+				  (eventsel & pmu->raw_event_mask),
+				  !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
+				  !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
+				  eventsel & ARCH_PERFMON_EVENTSEL_INT))
+		return;
+
+reprogram_complete:
+	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 EXPORT_SYMBOL_GPL(reprogram_counter);
 
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 3/4] KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event()
  2022-09-23  0:13 [PATCH 0/4] KVM: x86/pmu: Counter reprogramming fixes Sean Christopherson
  2022-09-23  0:13 ` [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change Sean Christopherson
  2022-09-23  0:13 ` [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed Sean Christopherson
@ 2022-09-23  0:13 ` Sean Christopherson
  2022-09-23  0:13 ` [PATCH 4/4] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter Sean Christopherson
  3 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-09-23  0:13 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Aaron Lewis, Like Xu, Wanpeng Li

From: Like Xu <likexu@tencent.com>

Batch reprogramming PMU counters by setting KVM_REQ_PMU and thus
deferring reprogramming to kvm_pmu_handle_event() to avoid reprogramming
a counter multiple times during a single VM-Exit.

Deferring programming will also allow KVM to fix a bug where immediately
reprogramming a counter can result in sleeping (taking a mutex) while
interrupts are disabled in the VM-Exit fastpath.

Introduce kvm_pmu_request_counter_reprogam() to make it obvious that
KVM is _requesting_ a reprogram and not actually doing the reprogram.

Opportunistically refine related comments to avoid misunderstandings.
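
The request-then-drain pattern itself is simple; a hypothetical userspace
sketch (the names are loose analogues of reprogram_pmi, KVM_REQ_PMU and
kvm_pmu_handle_event(), not the real implementation):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  struct vcpu_sketch {
  	uint64_t reprogram_pending;  /* analogue of pmu->reprogram_pmi */
  	bool req_pmu;                /* analogue of KVM_REQ_PMU */
  };

  /* Cheap and IRQ-safe: just record that work is needed. */
  static void request_reprogram(struct vcpu_sketch *v, int idx)
  {
  	v->reprogram_pending |= 1ull << idx;
  	v->req_pmu = true;
  }

  /* Runs once at a safe point, where sleeping is allowed. */
  static void handle_pmu_event(struct vcpu_sketch *v)
  {
  	for (int idx = 0; idx < 64; idx++) {
  		if (v->reprogram_pending & (1ull << idx)) {
  			printf("reprogramming counter %d\n", idx);
  			v->reprogram_pending &= ~(1ull << idx);
  		}
  	}
  	v->req_pmu = false;
  }

  int main(void)
  {
  	struct vcpu_sketch v = { 0 };

  	/* Two writes in one exit mark the counter dirty only once... */
  	request_reprogram(&v, 2);
  	request_reprogram(&v, 2);

  	/* ...so it is reprogrammed exactly once. */
  	handle_pmu_event(&v);
  	return 0;
  }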

Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-5-likexu@tencent.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.c              | 17 ++++++++++++-----
 arch/x86/kvm/pmu.h              |  6 +++++-
 arch/x86/kvm/svm/pmu.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    |  6 +++---
 5 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 462f041ede9f..12dcfc9330e7 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -493,6 +493,7 @@ struct kvm_pmc {
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
 	/*
+	 * only for creating or reusing perf_event,
 	 * eventsel value for general purpose counters,
 	 * ctrl value for fixed counters.
 	 */
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4cd99320019b..d8330e6064ab 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -101,7 +101,11 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	bool skip_pmi = false;
 
-	/* Ignore counters that have been reprogrammed already. */
+	/*
+	 * Ignore overflow events for counters that are scheduled to be
+	 * reprogrammed, e.g. if a PMI for the previous event races with KVM's
+	 * handling of a related guest WRMSR.
+	 */
 	if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
 		return;
 
@@ -292,7 +296,7 @@ static bool check_pmu_event_filter(struct kvm_pmc *pmc)
 	return allow_event;
 }
 
-void reprogram_counter(struct kvm_pmc *pmc)
+static void reprogram_counter(struct kvm_pmc *pmc)
 {
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	u64 eventsel = pmc->eventsel;
@@ -345,7 +349,6 @@ void reprogram_counter(struct kvm_pmc *pmc)
 reprogram_complete:
 	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
-EXPORT_SYMBOL_GPL(reprogram_counter);
 
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 {
@@ -355,10 +358,11 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 	for_each_set_bit(bit, pmu->reprogram_pmi, X86_PMC_IDX_MAX) {
 		struct kvm_pmc *pmc = static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit);
 
-		if (unlikely(!pmc || !pmc->perf_event)) {
+		if (unlikely(!pmc)) {
 			clear_bit(bit, pmu->reprogram_pmi);
 			continue;
 		}
+
 		reprogram_counter(pmc);
 	}
 
@@ -552,12 +556,15 @@ static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
 static inline bool cpl_is_matched(struct kvm_pmc *pmc)
 {
 	bool select_os, select_user;
-	u64 config = pmc->current_config;
+	u64 config;
 
 	if (pmc_is_gp(pmc)) {
+		config = pmc->eventsel;
 		select_os = config & ARCH_PERFMON_EVENTSEL_OS;
 		select_user = config & ARCH_PERFMON_EVENTSEL_USR;
 	} else {
+		config = fixed_ctrl_field(pmc_to_pmu(pmc)->fixed_ctr_ctrl,
+					  pmc->idx - INTEL_PMC_IDX_FIXED);
 		select_os = config & 0x1;
 		select_user = config & 0x2;
 	}
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5cc5721f260b..85ff3c0588ba 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -183,7 +183,11 @@ static inline void kvm_init_pmu_capability(void)
 					     KVM_PMC_MAX_FIXED);
 }
 
-void reprogram_counter(struct kvm_pmc *pmc);
+static inline void kvm_pmu_request_counter_reprogam(struct kvm_pmc *pmc)
+{
+	set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
+	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
+}
 
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b68956299fa8..041aa898e1bc 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -159,7 +159,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		data &= ~pmu->reserved_bits;
 		if (data != pmc->eventsel) {
 			pmc->eventsel = data;
-			reprogram_counter(pmc);
+			kvm_pmu_request_counter_reprogam(pmc);
 		}
 		return 0;
 	}
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 25b70a85bef5..e38518afc265 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -52,7 +52,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 		pmc = get_fixed_pmc(pmu, MSR_CORE_PERF_FIXED_CTR0 + i);
 
 		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
-		reprogram_counter(pmc);
+		kvm_pmu_request_counter_reprogam(pmc);
 	}
 }
 
@@ -76,7 +76,7 @@ static void reprogram_counters(struct kvm_pmu *pmu, u64 diff)
 	for_each_set_bit(bit, (unsigned long *)&diff, X86_PMC_IDX_MAX) {
 		pmc = intel_pmc_idx_to_pmc(pmu, bit);
 		if (pmc)
-			reprogram_counter(pmc);
+			kvm_pmu_request_counter_reprogam(pmc);
 	}
 }
 
@@ -477,7 +477,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 				reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
 			if (!(data & reserved_bits)) {
 				pmc->eventsel = data;
-				reprogram_counter(pmc);
+				kvm_pmu_request_counter_reprogam(pmc);
 				return 0;
 			}
 		} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, false))
-- 
2.37.3.998.g577e59143f-goog



* [PATCH 4/4] KVM: x86/pmu: Defer counter emulated overflow via pmc->prev_counter
  2022-09-23  0:13 [PATCH 0/4] KVM: x86/pmu: Counter reprogramming fixes Sean Christopherson
                   ` (2 preceding siblings ...)
  2022-09-23  0:13 ` [PATCH 3/4] KVM: x86/pmu: Defer reprogram_counter() to kvm_pmu_handle_event() Sean Christopherson
@ 2022-09-23  0:13 ` Sean Christopherson
  3 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-09-23  0:13 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Aaron Lewis, Like Xu, Wanpeng Li

From: Like Xu <likexu@tencent.com>

Defer reprogramming counters and handling overflow via KVM_REQ_PMU
when incrementing counters.  KVM skips emulated WRMSR in the VM-Exit
fastpath, the fastpath runs with IRQs disabled, skipping instructions
can increment and reprogram counters, reprogramming counters can
sleep, and sleeping is disallowed while IRQs are disabled.

 [*] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
 [*] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 2981888, name: CPU 15/KVM
 [*] preempt_count: 1, expected: 0
 [*] RCU nest depth: 0, expected: 0
 [*] INFO: lockdep is turned off.
 [*] irq event stamp: 0
 [*] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
 [*] hardirqs last disabled at (0): [<ffffffff8121222a>] copy_process+0x146a/0x62d0
 [*] softirqs last  enabled at (0): [<ffffffff81212269>] copy_process+0x14a9/0x62d0
 [*] softirqs last disabled at (0): [<0000000000000000>] 0x0
 [*] Preemption disabled at:
 [*] [<ffffffffc2063fc1>] vcpu_enter_guest+0x1001/0x3dc0 [kvm]
 [*] CPU: 17 PID: 2981888 Comm: CPU 15/KVM Kdump: 5.19.0-rc1-g239111db364c-dirty #2
 [*] Call Trace:
 [*]  <TASK>
 [*]  dump_stack_lvl+0x6c/0x9b
 [*]  __might_resched.cold+0x22e/0x297
 [*]  __mutex_lock+0xc0/0x23b0
 [*]  perf_event_ctx_lock_nested+0x18f/0x340
 [*]  perf_event_pause+0x1a/0x110
 [*]  reprogram_counter+0x2af/0x1490 [kvm]
 [*]  kvm_pmu_trigger_event+0x429/0x950 [kvm]
 [*]  kvm_skip_emulated_instruction+0x48/0x90 [kvm]
 [*]  handle_fastpath_set_msr_irqoff+0x349/0x3b0 [kvm]
 [*]  vmx_vcpu_run+0x268e/0x3b80 [kvm_intel]
 [*]  vcpu_enter_guest+0x1d22/0x3dc0 [kvm]

Add a field to kvm_pmc to track the previous counter value in order
to defer overflow detection to kvm_pmu_handle_event() (the counter must
be paused before handling overflow, and that may increment the counter).

Opportunistically shrink sizeof(struct kvm_pmc) a bit.
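
The deferred check relies on unsigned wraparound: after an emulated
increment, counter < prev_counter if and only if the counter rolled over.
A hypothetical standalone sketch (a 48-bit counter width is assumed purely
for illustration):

  #include <stdint.h>
  #include <stdio.h>

  /* Analogue of pmc_bitmask() for an assumed 48-bit counter width. */
  #define COUNTER_MASK ((1ull << 48) - 1)

  struct pmc_sketch {
  	uint64_t counter;
  	uint64_t prev_counter;
  };

  /* Analogue of kvm_pmu_incr_counter(): record the old value, then bump. */
  static void incr_counter(struct pmc_sketch *pmc)
  {
  	pmc->prev_counter = pmc->counter;
  	pmc->counter = (pmc->counter + 1) & COUNTER_MASK;
  }

  int main(void)
  {
  	/* Start one tick below overflow. */
  	struct pmc_sketch pmc = { .counter = COUNTER_MASK, .prev_counter = 0 };

  	incr_counter(&pmc);

  	/* The deferred check, as done later in reprogram_counter(). */
  	if (pmc.counter < pmc.prev_counter)
  		printf("emulated overflow detected\n");

  	pmc.prev_counter = 0;	/* mirror of reprogram_complete */
  	return 0;
  }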

Suggested-by: Wanpeng Li <wanpengli@tencent.com>
Fixes: 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
Signed-off-by: Like Xu <likexu@tencent.com>
Link: https://lore.kernel.org/r/20220831085328.45489-6-likexu@tencent.com
[sean: avoid re-triggering KVM_REQ_PMU on overflow, tweak changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/kvm_host.h |  5 +++--
 arch/x86/kvm/pmu.c              | 32 ++++++++++++++++----------------
 arch/x86/kvm/svm/pmu.c          |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c    |  4 ++--
 4 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 12dcfc9330e7..9639404f2856 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -488,7 +488,10 @@ enum pmc_type {
 struct kvm_pmc {
 	enum pmc_type type;
 	u8 idx;
+	bool is_paused;
+	bool intr;
 	u64 counter;
+	u64 prev_counter;
 	u64 eventsel;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
@@ -498,8 +501,6 @@ struct kvm_pmc {
 	 * ctrl value for fixed counters.
 	 */
 	u64 current_config;
-	bool is_paused;
-	bool intr;
 };
 
 #define KVM_PMC_MAX_FIXED	3
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d8330e6064ab..935c9d80ab50 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -101,14 +101,6 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
 	bool skip_pmi = false;
 
-	/*
-	 * Ignore overflow events for counters that are scheduled to be
-	 * reprogrammed, e.g. if a PMI for the previous event races with KVM's
-	 * handling of a related guest WRMSR.
-	 */
-	if (test_and_set_bit(pmc->idx, pmu->reprogram_pmi))
-		return;
-
 	if (pmc->perf_event && pmc->perf_event->attr.precise_ip) {
 		if (!in_pmi) {
 			/*
@@ -126,7 +118,6 @@ static inline void __kvm_perf_overflow(struct kvm_pmc *pmc, bool in_pmi)
 	} else {
 		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
 	}
-	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 
 	if (!pmc->intr || skip_pmi)
 		return;
@@ -151,7 +142,17 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 {
 	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
 
+	/*
+	 * Ignore overflow events for counters that are scheduled to be
+	 * reprogrammed, e.g. if a PMI for the previous event races with KVM's
+	 * handling of a related guest WRMSR.
+	 */
+	if (test_and_set_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi))
+		return;
+
 	__kvm_perf_overflow(pmc, true);
+
+	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 }
 
 static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
@@ -311,6 +312,9 @@ static void reprogram_counter(struct kvm_pmc *pmc)
 	if (!check_pmu_event_filter(pmc))
 		goto reprogram_complete;
 
+	if (pmc->counter < pmc->prev_counter)
+		__kvm_perf_overflow(pmc, false);
+
 	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
 		printk_once("kvm pmu: pin control bit is ignored\n");
 
@@ -348,6 +352,7 @@ static void reprogram_counter(struct kvm_pmc *pmc)
 
 reprogram_complete:
 	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
+	pmc->prev_counter = 0;
 }
 
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
@@ -536,14 +541,9 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 
 static void kvm_pmu_incr_counter(struct kvm_pmc *pmc)
 {
-	u64 prev_count;
-
-	prev_count = pmc->counter;
+	pmc->prev_counter = pmc->counter;
 	pmc->counter = (pmc->counter + 1) & pmc_bitmask(pmc);
-
-	reprogram_counter(pmc);
-	if (pmc->counter < prev_count)
-		__kvm_perf_overflow(pmc, false);
+	kvm_pmu_request_counter_reprogam(pmc);
 }
 
 static inline bool eventsel_match_perf_hw_id(struct kvm_pmc *pmc,
diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 041aa898e1bc..2ec420b85d6a 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -211,7 +211,7 @@ static void amd_pmu_reset(struct kvm_vcpu *vcpu)
 		struct kvm_pmc *pmc = &pmu->gp_counters[i];
 
 		pmc_stop_counter(pmc);
-		pmc->counter = pmc->eventsel = 0;
+		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index e38518afc265..1bf5d4b00296 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -647,14 +647,14 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
 		pmc = &pmu->gp_counters[i];
 
 		pmc_stop_counter(pmc);
-		pmc->counter = pmc->eventsel = 0;
+		pmc->counter = pmc->prev_counter = pmc->eventsel = 0;
 	}
 
 	for (i = 0; i < KVM_PMC_MAX_FIXED; i++) {
 		pmc = &pmu->fixed_counters[i];
 
 		pmc_stop_counter(pmc);
-		pmc->counter = 0;
+		pmc->counter = pmc->prev_counter = 0;
 	}
 
 	pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0;
-- 
2.37.3.998.g577e59143f-goog



* Re: [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
  2022-09-23  0:13 ` [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change Sean Christopherson
@ 2022-10-13 12:01   ` Like Xu
  2022-10-13 20:53     ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Like Xu @ 2022-10-13 12:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Aaron Lewis, Like Xu, Wanpeng Li, Paolo Bonzini

Firstly, thanks for your comments that spewed out around vpmu.

On 23/9/2022 8:13 am, Sean Christopherson wrote:
> Force vCPUs to reprogram all counters on a PMU filter change to provide
> a sane ABI for userspace.  Use the existing KVM_REQ_PMU to do the
> programming, and take advantage of the fact that the reprogram_pmi bitmap
> fits in a u64 to set all bits in a single atomic update.  Note, setting
> the bitmap and making the request needs to be done _after_ the SRCU
> synchronization to ensure that vCPUs will reprogram using the new filter.
> 
> KVM's current "lazy" approach is confusing and non-deterministic.  It's

The resolute lazy approach was introduced in patch 03, right after this change.

> confusing because, from a developer perspective, the code is buggy as it
> makes zero sense to let userspace modify the filter but then not actually
> enforce the new filter.  The lazy approach is non-deterministic because
> KVM enforces the filter whenever a counter is reprogrammed, not just on
> guest WRMSRs, i.e. a guest might gain/lose access to an event at random
> times depending on what is going on in the host.
> 
> Note, the resulting behavior is still non-deterministic while the filter
> is in flux.  If userspace wants to guarantee deterministic behavior, all
> vCPUs should be paused during the filter update.
> 
> Fixes: 66bb8a065f5a ("KVM: x86: PMU Event Filter")
> Cc: Aaron Lewis <aaronlewis@google.com>
> Jim Mattson <jmattson@google.com>

miss "Cc:" ?

> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/include/asm/kvm_host.h | 11 ++++++++++-
>   arch/x86/kvm/pmu.c              | 15 +++++++++++++--
>   2 files changed, 23 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index b3ce723efb43..462f041ede9f 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -519,7 +519,16 @@ struct kvm_pmu {
>   	struct kvm_pmc gp_counters[INTEL_PMC_MAX_GENERIC];
>   	struct kvm_pmc fixed_counters[KVM_PMC_MAX_FIXED];
>   	struct irq_work irq_work;
> -	DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX);
> +
> +	/*
> +	 * Overlay the bitmap with a 64-bit atomic so that all bits can be
> +	 * set in a single access, e.g. to reprogram all counters when the PMU
> +	 * filter changes.
> +	 */
> +	union {
> +		DECLARE_BITMAP(reprogram_pmi, X86_PMC_IDX_MAX);
> +		atomic64_t __reprogram_pmi;
> +	};
>   	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
>   	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
>   
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index d9b9a0f0db17..4504987cbbe2 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -577,6 +577,8 @@ EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
>   int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_pmu_event_filter tmp, *filter;
> +	struct kvm_vcpu *vcpu;
> +	unsigned long i;
>   	size_t size;
>   	int r;
>   
> @@ -613,9 +615,18 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
>   	mutex_lock(&kvm->lock);
>   	filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
>   				     mutex_is_locked(&kvm->lock));
> -	mutex_unlock(&kvm->lock);
> -
>   	synchronize_srcu_expedited(&kvm->srcu);

The relative order of mutex_unlock() and synchronize_srcu_expedited() has
been reversed, extending the kvm->lock critical section.  The motivation
is also not explicitly stated in the commit message.

> +
> +	BUILD_BUG_ON(sizeof(((struct kvm_pmu *)0)->reprogram_pmi) >
> +		     sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));
> +
> +	kvm_for_each_vcpu(i, vcpu, kvm)
> +		atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);

How about:
	bitmap_copy(pmu->reprogram_pmi, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
to avoid further cycles on calls of 
"static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit)" ?

> +
> +	kvm_make_all_cpus_request(kvm, KVM_REQ_PMU);
> +
> +	mutex_unlock(&kvm->lock);
> +
>   	r = 0;
>   cleanup:
>   	kfree(filter);


* Re: [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
  2022-10-13 12:01   ` Like Xu
@ 2022-10-13 20:53     ` Sean Christopherson
  2022-10-14  6:41       ` Like Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Sean Christopherson @ 2022-10-13 20:53 UTC (permalink / raw)
  To: Like Xu
  Cc: kvm, linux-kernel, Aaron Lewis, Like Xu, Wanpeng Li, Paolo Bonzini

On Thu, Oct 13, 2022, Like Xu wrote:
> Firstly, thanks for your comments that spewed out around vpmu.
> 
> On 23/9/2022 8:13 am, Sean Christopherson wrote:
> > Force vCPUs to reprogram all counters on a PMU filter change to provide
> > a sane ABI for userspace.  Use the existing KVM_REQ_PMU to do the
> > programming, and take advantage of the fact that the reprogram_pmi bitmap
> > fits in a u64 to set all bits in a single atomic update.  Note, setting
> > the bitmap and making the request needs to be done _after_ the SRCU
> > synchronization to ensure that vCPUs will reprogram using the new filter.
> > 
> > KVM's current "lazy" approach is confusing and non-deterministic.  It's
> 
> The resolute lazy approach was introduced in patch 03, right after this change.

This is referring to the lazy recognition of the filter, not the deferred
reprogramming of the counters.  Regardless of whether reprogramming is handled
via request or in-line, KVM is still lazily recognizing the new filter as vCPUs
won't pick up the new filter until the _guest_ triggers a refresh.

> > @@ -613,9 +615,18 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
> >   	mutex_lock(&kvm->lock);
> >   	filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
> >   				     mutex_is_locked(&kvm->lock));
> > -	mutex_unlock(&kvm->lock);
> > -
> >   	synchronize_srcu_expedited(&kvm->srcu);
> 
> The relative order of mutex_unlock() and synchronize_srcu_expedited() has
> been reversed, extending the kvm->lock critical section.  The motivation
> is also not explicitly stated in the commit message.

I'll add a blurb, after I re-convince myself that the sync+request needs to be
done under kvm->lock.

> > +	BUILD_BUG_ON(sizeof(((struct kvm_pmu *)0)->reprogram_pmi) >
> > +		     sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));
> > +
> > +	kvm_for_each_vcpu(i, vcpu, kvm)
> > +		atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
> 
> How about:
> 	bitmap_copy(pmu->reprogram_pmi, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
> to avoid further cycles on calls of
> "static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit)" ?

bitmap_copy() was my first choice too, but unfortunately it doesn't guarantee
atomicity and could lead to data corruption if the target vCPU is concurrently
modifying the bitmap.
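
For illustration only, the difference in hypothetical userspace form (C11
atomics standing in for the kernel's set_bit() and atomic64_set()):

  #include <stdatomic.h>
  #include <stdint.h>
  #include <stdio.h>

  /* Shared bitmap; the vCPU side uses atomic RMW bit ops on it. */
  static _Atomic uint64_t reprogram_pmi;

  int main(void)
  {
  	/* vCPU path, analogous to set_bit(): an atomic read-modify-write. */
  	atomic_fetch_or(&reprogram_pmi, 1ull << 5);

  	/*
  	 * Filter path: one atomic store can never be observed half-done.
  	 * A word-by-word bitmap_copy(), by contrast, is plain loads and
  	 * stores, so a concurrent set_bit()/clear_bit() landing between
  	 * the copy's load and store could be lost.
  	 */
  	atomic_store(&reprogram_pmi, -1ull);

  	printf("reprogram_pmi = %#llx\n",
  	       (unsigned long long)atomic_load(&reprogram_pmi));
  	return 0;
  }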


* Re: [PATCH 1/4] KVM: x86/pmu: Force reprogramming of all counters on PMU filter change
  2022-10-13 20:53     ` Sean Christopherson
@ 2022-10-14  6:41       ` Like Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Like Xu @ 2022-10-14  6:41 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Aaron Lewis, Wanpeng Li, Paolo Bonzini

On 14/10/2022 4:53 am, Sean Christopherson wrote:
> On Thu, Oct 13, 2022, Like Xu wrote:
>> Firstly, thanks for your comments that spewed out around vpmu.
>>
>> On 23/9/2022 8:13 am, Sean Christopherson wrote:
>>> Force vCPUs to reprogram all counters on a PMU filter change to provide
>>> a sane ABI for userspace.  Use the existing KVM_REQ_PMU to do the
>>> programming, and take advantage of the fact that the reprogram_pmi bitmap
>>> fits in a u64 to set all bits in a single atomic update.  Note, setting
>>> the bitmap and making the request needs to be done _after_ the SRCU
>>> synchronization to ensure that vCPUs will reprogram using the new filter.
>>>
>>> KVM's current "lazy" approach is confusing and non-deterministic.  It's
>>
>> The resolute lazy approach was introduced in patch 03, right after this change.
> 
> This is referring to the lazy recognition of the filter, not the deferred
> reprogramming of the counters.  Regardless of whether reprogramming is handled
> via request or in-line, KVM is still lazily recognizing the new filter as vCPUs
> won't pick up the new filter until the _guest_ triggers a refresh.

It may still be too late for the PMU filter to take effect.  To eliminate
this non-determinism, should we kick all vPMU-enabled vCPUs right after
making the KVM_REQ_PMU requests?

> 
>>> @@ -613,9 +615,18 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
>>>    	mutex_lock(&kvm->lock);
>>>    	filter = rcu_replace_pointer(kvm->arch.pmu_event_filter, filter,
>>>    				     mutex_is_locked(&kvm->lock));
>>> -	mutex_unlock(&kvm->lock);
>>> -
>>>    	synchronize_srcu_expedited(&kvm->srcu);
>>
>> The relative order of mutex_unlock() and synchronize_srcu_expedited() has
>> been reversed, extending the kvm->lock critical section.  The motivation
>> is also not explicitly stated in the commit message.
> 
> I'll add a blurb, after I re-convince myself that the sync+request needs to be
> done under kvm->lock.
> 
>>> +	BUILD_BUG_ON(sizeof(((struct kvm_pmu *)0)->reprogram_pmi) >
>>> +		     sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));
>>> +
>>> +	kvm_for_each_vcpu(i, vcpu, kvm)
>>> +		atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
>>
>> How about:
>> 	bitmap_copy(pmu->reprogram_pmi, pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
>> to avoid further cycles on calls of
>> "static_call(kvm_x86_pmu_pmc_idx_to_pmc)(pmu, bit)" ?
> 
> bitmap_copy() was my first choice too, but unfortunately it doesn't guarantee
> atomicity and could lead to data corruption if the target vCPU is concurrently
> modifying the bitmap.

Indeed, it may help to reuse "pmu->global_ctrl_mask" instead of "-1ull":

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 4504987cbbe2..8e279f816e27 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -621,7 +621,8 @@ int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp)
  		     sizeof(((struct kvm_pmu *)0)->__reprogram_pmi));

  	kvm_for_each_vcpu(i, vcpu, kvm)
-		atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi, -1ull);
+		atomic64_set(&vcpu_to_pmu(vcpu)->__reprogram_pmi,
+			     pmu->global_ctrl_mask);

  	kvm_make_all_cpus_request(kvm, KVM_REQ_PMU);

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index b68956299fa8..a946c1c57e1d 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -185,6 +185,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
  	pmu->nr_arch_fixed_counters = 0;
  	pmu->global_status = 0;
  	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
+	pmu->global_ctrl_mask = ~((1ull << pmu->nr_arch_gp_counters) - 1);
  }

  static void amd_pmu_init(struct kvm_vcpu *vcpu)
-- 
2.38.0


* Re: [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed
  2022-09-23  0:13 ` [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed Sean Christopherson
@ 2022-10-14  7:14   ` Like Xu
  2022-10-14 16:26     ` Sean Christopherson
  0 siblings, 1 reply; 10+ messages in thread
From: Like Xu @ 2022-10-14  7:14 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, linux-kernel, Aaron Lewis, Wanpeng Li, Paolo Bonzini

Regarding the subject line: the "reprogram" bit is _only_ used to keep
track of pmc->perf_event, not whether the counter is disabled.

On 23/9/2022 8:13 am, Sean Christopherson wrote:
> When reprogramming a counter, clear the counter's "reprogram pending" bit
> if the counter is disabled (by the guest) or is disallowed (by the
> userspace filter).  In both cases, there's no need to re-attempt
> programming on the next coincident KVM_REQ_PMU as enabling the counter by
> either method will trigger reprogramming.

Perhaps we could move the check_pmu_event_filter() towards the top of the call 
stack.

> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/pmu.c | 38 ++++++++++++++++++++++++--------------
>   1 file changed, 24 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 4504987cbbe2..4cd99320019b 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -150,9 +150,9 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
>   	__kvm_perf_overflow(pmc, true);
>   }
>   
> -static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
> -				  u64 config, bool exclude_user,
> -				  bool exclude_kernel, bool intr)
> +static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
> +				 bool exclude_user, bool exclude_kernel,
> +				 bool intr)
>   {
>   	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
>   	struct perf_event *event;
> @@ -204,14 +204,14 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>   	if (IS_ERR(event)) {
>   		pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
>   			    PTR_ERR(event), pmc->idx);
> -		return;
> +		return PTR_ERR(event);
>   	}
>   
>   	pmc->perf_event = event;
>   	pmc_to_pmu(pmc)->event_count++;
> -	clear_bit(pmc->idx, pmc_to_pmu(pmc)->reprogram_pmi);
>   	pmc->is_paused = false;
>   	pmc->intr = intr || pebs;
> +	return 0;
>   }
>   
>   static void pmc_pause_counter(struct kvm_pmc *pmc)
> @@ -245,7 +245,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
>   	perf_event_enable(pmc->perf_event);
>   	pmc->is_paused = false;
>   
> -	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);

This change is very suspicious.

>   	return true;
>   }
>   
> @@ -303,10 +302,10 @@ void reprogram_counter(struct kvm_pmc *pmc)
>   	pmc_pause_counter(pmc);
>   
>   	if (!pmc_speculative_in_use(pmc) || !pmc_is_enabled(pmc))
> -		return;
> +		goto reprogram_complete;
>   
>   	if (!check_pmu_event_filter(pmc))
> -		return;
> +		goto reprogram_complete;
>   
>   	if (eventsel & ARCH_PERFMON_EVENTSEL_PIN_CONTROL)
>   		printk_once("kvm pmu: pin control bit is ignored\n");
> @@ -324,16 +323,27 @@ void reprogram_counter(struct kvm_pmc *pmc)
>   	}
>   
>   	if (pmc->current_config == new_config && pmc_resume_counter(pmc))
> -		return;
> +		goto reprogram_complete;
>   
>   	pmc_release_perf_event(pmc);
>   
>   	pmc->current_config = new_config;
> -	pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
> -			      (eventsel & pmu->raw_event_mask),
> -			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> -			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> -			      eventsel & ARCH_PERFMON_EVENTSEL_INT);
> +
> +	/*
> +	 * If reprogramming fails, e.g. due to contention, leave the counter's
> +	 * reprogram bit set, i.e. opportunistically try again on the next PMU

This is what we need; in the upstream case we need to keep trying to
reprogram in order to occupy the hardware.

> +	 * refresh.  Don't make a new request as doing so can stall the guest
> +	 * if reprogramming repeatedly fails.

This does not happen; the guest still enters w/o perf_event backend support
and the vPMU is broken until the next VM-Exit.

There is no need to endlessly call kvm_pmu_handle_event() when reprogramming fails.

> +	 */
> +	if (pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
> +				  (eventsel & pmu->raw_event_mask),
> +				  !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> +				  !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> +				  eventsel & ARCH_PERFMON_EVENTSEL_INT))
> +		return;
> +
> +reprogram_complete:
> +	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
>   }
>   EXPORT_SYMBOL_GPL(reprogram_counter);
>   


* Re: [PATCH 2/4] KVM: x86/pmu: Clear "reprogram" bit if counter is disabled or disallowed
  2022-10-14  7:14   ` Like Xu
@ 2022-10-14 16:26     ` Sean Christopherson
  0 siblings, 0 replies; 10+ messages in thread
From: Sean Christopherson @ 2022-10-14 16:26 UTC (permalink / raw)
  To: Like Xu; +Cc: kvm, linux-kernel, Aaron Lewis, Wanpeng Li, Paolo Bonzini

On Fri, Oct 14, 2022, Like Xu wrote:
> Regarding the subject line: the "reprogram" bit is _only_ used to keep
> track of pmc->perf_event, not whether the counter is disabled.
> 
> On 23/9/2022 8:13 am, Sean Christopherson wrote:
> > When reprogramming a counter, clear the counter's "reprogram pending" bit
> > if the counter is disabled (by the guest) or is disallowed (by the
> > userspace filter).  In both cases, there's no need to re-attempt
> > programming on the next coincident KVM_REQ_PMU as enabling the counter by
> > either method will trigger reprogramming.
> 
> Perhaps we could move the check_pmu_event_filter() towards the top of the
> call stack.

Top of what call stack exactly?  reprogram_counter() has multiple callers, and
the filter check is already near the top of reprogram_counter().

> > @@ -245,7 +245,6 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
> >   	perf_event_enable(pmc->perf_event);
> >   	pmc->is_paused = false;
> > -	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
> 
> This change is very suspicious.

In the current code, pmc_resume_counter() clears the bit iff it returns true.
With this patch, reprogram_counter() is guaranteed to clear the bit if
pmc_resume_counter() returns true.

	if (pmc->current_config == new_config && pmc_resume_counter(pmc))
		goto reprogram_complete;

	pmc_release_perf_event(pmc);

	pmc->current_config = new_config;

	/*
	 * If reprogramming fails, e.g. due to contention, leave the counter's
	 * reprogram bit set, i.e. opportunistically try again on the next PMU
	 * refresh.  Don't make a new request as doing so can stall the guest
	 * if reprogramming repeatedly fails.
	 */
	if (pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
				  (eventsel & pmu->raw_event_mask),
				  !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
				  !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
				  eventsel & ARCH_PERFMON_EVENTSEL_INT))
		return;

reprogram_complete:
	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
	pmc->prev_counter = 0;

> > @@ -324,16 +323,27 @@ void reprogram_counter(struct kvm_pmc *pmc)
> >   	}
> >   	if (pmc->current_config == new_config && pmc_resume_counter(pmc))
> > -		return;
> > +		goto reprogram_complete;
> >   	pmc_release_perf_event(pmc);
> >   	pmc->current_config = new_config;
> > -	pmc_reprogram_counter(pmc, PERF_TYPE_RAW,
> > -			      (eventsel & pmu->raw_event_mask),
> > -			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
> > -			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
> > -			      eventsel & ARCH_PERFMON_EVENTSEL_INT);
> > +
> > +	/*
> > +	 * If reprogramming fails, e.g. due to contention, leave the counter's
> > +	 * reprogram bit set, i.e. opportunistically try again on the next PMU
> 
> This is what we need; in the upstream case we need to keep trying to
> reprogram in order to occupy the hardware.

Maybe in an ideal world, but in reality KVM can't guarantee that programming will
ever succeed.  Making a new KVM_REQ_PMU will prevent entering the guest, i.e. will
effectively hang the vCPU.  Breaking the vPMU isn't great, but hanging the guest
is worse.

> > +	 * refresh.  Don't make a new request as doing so can stall the guest
> > +	 * if reprogramming repeatedly fails.
> 
> This does not happen; the guest still enters w/o perf_event backend support
> and the vPMU is broken until the next VM-Exit.
> 
> There is no need to endlessly call kvm_pmu_handle_event() when reprogramming fails.

Yes, that's what the above comment is calling out, or at least trying to call out.

