* [PATCH v3 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event
@ 2019-10-21 16:06 Like Xu
  2019-10-21 16:06 ` [PATCH v3 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period Like Xu
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Like Xu @ 2019-10-21 16:06 UTC (permalink / raw)
  To: pbonzini, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

This patch series improves vPMU efficiency for guests, which is mainly
measured by guest NMI handler latency in basic perf usages [1][2] with a
hardware PMU. It is not a passthrough solution but builds on the legacy
vPMU implementation.

With this optimization, the average latency of the guest NMI handler is
reduced from 104923 ns to 48393 ns (~2.16x speedup on CLX-AP with 5.4-rc4,
w/ perf_v4_pmi=n). If the host disables the watchdog, the minimum latency
of the guest NMI handler improves by ~3413x and the average by ~786x. The
run time of workloads with perf attached inside the guest is reduced
significantly by this optimization.

The general idea (defined in patch 5/6) is to reuse the last created
perf_event for the same vPMC when the newly requested config is exactly
the same as current_config (the one used by the last
pmc_reprogram_counter()) AND the new event period is appropriate and
accepted (via perf_event_period() from patch 1/6). Until it is suitable
for reuse, the perf_event stays disabled; on reuse, a hardware counter is
assigned again to serve the vPMC.
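
A condensed sketch of the reuse path (taken from the
reprogram_gp_counter() hunk in patch 5/6; trailing arguments elided):

	if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
		return;

	pmc_release_perf_event(pmc);

	pmc->current_config = eventsel;
	pmc_reprogram_counter(pmc, type, config, ...);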

If a disabled perf_event is no longer reused, a lazy release mechanism
(defined in patch 6/6) takes over: in short, disabled perf_events are
released in the call of kvm_pmu_handle_event() once the vcpu is next
scheduled in, if the guest didn't WRMSR the relevant MSRs during the last
sched time slice. In kvm_arch_sched_in(), KVM_REQ_PMU is requested if
pmu->event_count has not been reduced to zero, and kvm_pmu_cleanup() runs
only once per sched time slice to ensure that the overhead is very limited.
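
As a rough flow (condensed from patch 6/6, not a literal call chain):

	kvm_arch_sched_in():			/* vcpu gets scheduled in */
		if (pmu->version && unlikely(pmu->event_count)) {
			pmu->need_cleanup = true;
			kvm_make_request(KVM_REQ_PMU, vcpu);
		}
	kvm_pmu_handle_event():			/* first run after sched_in */
		if (unlikely(pmu->need_cleanup))
			kvm_pmu_cleanup(vcpu);	/* release unmarked events */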

The first two patches are for perf reviewers and the last four patches
are for kvm reviewers. Please check each commit for more details and
share your comments with us.

Thanks,
Like Xu

---
[1] multiplexing sampling mode usage: perf record  -e \
`perf list | grep Hardware | grep event |\
awk '{print $1}' | head -n 10 |tr '\n' ',' | sed 's/,$//' ` ./ftest
[2] single event count mode usage: perf stat -e branch-misses ./ftest

---

Changes in v3:
- optimize perf_event_pause() for no child event 
- rename programed_config to current_config
- rename lazy_release_ctrl to pmc_in_use
- rename kvm_pmu_ops callbacks from msr_idx to rdpmc_idx
- add a new kvm_pmu_ops callback msr_idx_to_pmc
- use DECLARE_BITMAP to declare bitmap
- set up a bitmap 'pmu->all_valid_pmc_idx'
- move kvm_pmu_cleanup to kvm_pmu_handle_event
- update performance data based on 5.4-rc4 on CLX-AP

Changes in v2:
- use perf_event_pause() to disable, read, reset by only one lock;
- use __perf_event_read_value() after _perf_event_disable();
- replace bitfields with 'u8 event_count; bool need_cleanup;';
- refine comments and commit messages;
- fix two issues reported by kbuild test robot for ARCH=[nds32|sh]

v2:
https://lore.kernel.org/kvm/20191013091533.12971-1-like.xu@linux.intel.com/

v1:
https://lore.kernel.org/kvm/20190930072257.43352-1-like.xu@linux.intel.com/

Like Xu (6):
  perf/core: Provide a kernel-internal interface to recalibrate event
    period
  perf/core: Provide a kernel-internal interface to pause perf_event
  KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_idx
  KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback
  KVM: x86/vPMU: Reuse perf_event to avoid unnecessary
    pmc_reprogram_counter
  KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

 arch/x86/include/asm/kvm_host.h |  19 ++++++
 arch/x86/kvm/pmu.c              | 112 ++++++++++++++++++++++++++++++--
 arch/x86/kvm/pmu.h              |  23 +++++--
 arch/x86/kvm/pmu_amd.c          |  24 +++++--
 arch/x86/kvm/vmx/pmu_intel.c    |  29 +++++++--
 arch/x86/kvm/x86.c              |   8 ++-
 include/linux/perf_event.h      |  10 +++
 kernel/events/core.c            |  46 +++++++++++--
 8 files changed, 240 insertions(+), 31 deletions(-)

-- 
2.21.0



* [PATCH v3 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period
  2019-10-21 16:06 [PATCH v3 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
@ 2019-10-21 16:06 ` Like Xu
  2019-10-21 16:06 ` [PATCH v3 2/6] perf/core: Provide a kernel-internal interface to pause perf_event Like Xu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-21 16:06 UTC (permalink / raw)
  To: pbonzini, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

Currently, perf_event_period() is only reachable by user tools via
ioctl. Following the naming convention, export perf_event_period() for
kernel users (such as KVM) that may recalibrate the event period of
their assigned counters according to their requirements.

perf_event_period() is an external accessor, just like
perf_event_{en,dis}able(), and should thus use perf_event_ctx_lock().
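
For example (a sketch based on the KVM usage in patch 5/6 of this
series), a kernel user can recalibrate the sample period of its
assigned counter like this:

	/* returns 0 if the new period is accepted by the perf core */
	if (perf_event_period(pmc->perf_event,
			      (-pmc->counter) & pmc_bitmask(pmc)))
		return false;	/* fall back to release-and-reprogram */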

Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 include/linux/perf_event.h |  5 +++++
 kernel/events/core.c       | 28 +++++++++++++++++++++-------
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 61448c19a132..d601df36e671 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1336,6 +1336,7 @@ extern void perf_event_disable_local(struct perf_event *event);
 extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
+extern int perf_event_period(struct perf_event *event, u64 value);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1415,6 +1416,10 @@ static inline void perf_event_disable(struct perf_event *event)		{ }
 static inline int __perf_event_disable(void *info)			{ return -1; }
 static inline void perf_event_task_tick(void)				{ }
 static inline int perf_event_release_kernel(struct perf_event *event)	{ return 0; }
+static inline int perf_event_period(struct perf_event *event, u64 value)
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 9ec0b0bfddbd..e1b83d2731da 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5106,16 +5106,11 @@ static int perf_event_check_period(struct perf_event *event, u64 value)
 	return event->pmu->check_period(event, value);
 }
 
-static int perf_event_period(struct perf_event *event, u64 __user *arg)
+static int _perf_event_period(struct perf_event *event, u64 value)
 {
-	u64 value;
-
 	if (!is_sampling_event(event))
 		return -EINVAL;
 
-	if (copy_from_user(&value, arg, sizeof(value)))
-		return -EFAULT;
-
 	if (!value)
 		return -EINVAL;
 
@@ -5133,6 +5128,19 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 	return 0;
 }
 
+int perf_event_period(struct perf_event *event, u64 value)
+{
+	struct perf_event_context *ctx;
+	int ret;
+
+	ctx = perf_event_ctx_lock(event);
+	ret = _perf_event_period(event, value);
+	perf_event_ctx_unlock(event, ctx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(perf_event_period);
+
 static const struct file_operations perf_fops;
 
 static inline int perf_fget_light(int fd, struct fd *p)
@@ -5176,8 +5184,14 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		return _perf_event_refresh(event, arg);
 
 	case PERF_EVENT_IOC_PERIOD:
-		return perf_event_period(event, (u64 __user *)arg);
+	{
+		u64 value;
+
+		if (copy_from_user(&value, (u64 __user *)arg, sizeof(value)))
+			return -EFAULT;
 
+		return _perf_event_period(event, value);
+	}
 	case PERF_EVENT_IOC_ID:
 	{
 		u64 id = primary_event_id(event);
-- 
2.21.0



* [PATCH v3 2/6] perf/core: Provide a kernel-internal interface to pause perf_event
  2019-10-21 16:06 [PATCH v3 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
  2019-10-21 16:06 ` [PATCH v3 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period Like Xu
@ 2019-10-21 16:06 ` Like Xu
  2019-10-21 16:06 ` [PATCH v3 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_idx Like Xu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-21 16:06 UTC (permalink / raw)
  To: pbonzini, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

Export perf_event_pause() as an external accessor for kernel users (such
as KVM) that need to both disable a perf_event and read its count while
holding perf_event_ctx_lock only once. The counter value can also
optionally be reset.
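
For example (a sketch based on the KVM usage in patch 5/6 of this
series), pausing a vPMC's event and folding the delta into the
guest-visible counter:

	/* disable the event and read its count; 'true' also resets it */
	counter += perf_event_pause(pmc->perf_event, true);
	pmc->counter = counter & pmc_bitmask(pmc);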

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 include/linux/perf_event.h |  5 +++++
 kernel/events/core.c       | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d601df36e671..e9768bfc76f6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1337,6 +1337,7 @@ extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
 extern int perf_event_period(struct perf_event *event, u64 value);
+extern u64 perf_event_pause(struct perf_event *event, bool reset);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1420,6 +1421,10 @@ static inline int perf_event_period(struct perf_event *event, u64 value)
 {
 	return -EINVAL;
 }
+static inline u64 perf_event_pause(struct perf_event *event, bool reset)
+{
+	return 0;
+}
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e1b83d2731da..fc9f5ebf4849 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5029,6 +5029,24 @@ static void _perf_event_reset(struct perf_event *event)
 	perf_event_update_userpage(event);
 }
 
+/* Assume it's not an event with inherit set. */
+u64 perf_event_pause(struct perf_event *event, bool reset)
+{
+	struct perf_event_context *ctx;
+	u64 count;
+
+	ctx = perf_event_ctx_lock(event);
+	WARN_ON_ONCE(event->attr.inherit);
+	_perf_event_disable(event);
+	count = local64_read(&event->count);
+	if (reset)
+		local64_set(&event->count, 0);
+	perf_event_ctx_unlock(event, ctx);
+
+	return count;
+}
+EXPORT_SYMBOL_GPL(perf_event_pause);
+
 /*
  * Holding the top-level event's child_mutex means that any
  * descendant process that has inherited this event will block
-- 
2.21.0



* [PATCH v3 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_idx
  2019-10-21 16:06 [PATCH v3 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
  2019-10-21 16:06 ` [PATCH v3 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period Like Xu
  2019-10-21 16:06 ` [PATCH v3 2/6] perf/core: Provide a kernel-internal interface to pause perf_event Like Xu
@ 2019-10-21 16:06 ` Like Xu
  2019-10-23 17:55   ` Jim Mattson
  2019-10-21 16:06 ` [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback Like Xu
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Like Xu @ 2019-10-21 16:06 UTC (permalink / raw)
  To: pbonzini, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

The legacy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so
its 'idx' is strictly limited to an rdpmc_idx, which can be mapped
exactly to a kvm_pmc. Let's make this semantic explicit by renaming the
existing msr_idx_to_pmc to rdpmc_idx_to_pmc, and is_valid_msr_idx to
is_valid_rdpmc_idx (likewise for kvm_pmu_is_valid_msr_idx).

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/pmu.c           |  6 +++---
 arch/x86/kvm/pmu.h           |  8 ++++----
 arch/x86/kvm/pmu_amd.c       |  9 +++++----
 arch/x86/kvm/vmx/pmu_intel.c | 10 +++++-----
 arch/x86/kvm/x86.c           |  2 +-
 5 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 46875bbd0419..de362ba0df01 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -271,9 +271,9 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 }
 
 /* check if idx is a valid index to access PMU */
-int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
+int kvm_pmu_is_valid_rdpmc_idx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
-	return kvm_x86_ops->pmu_ops->is_valid_msr_idx(vcpu, idx);
+	return kvm_x86_ops->pmu_ops->is_valid_rdpmc_idx(vcpu, idx);
 }
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx)
@@ -323,7 +323,7 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	if (is_vmware_backdoor_pmc(idx))
 		return kvm_pmu_rdpmc_vmware(vcpu, idx, data);
 
-	pmc = kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, idx, &mask);
+	pmc = kvm_x86_ops->pmu_ops->rdpmc_idx_to_pmc(vcpu, idx, &mask);
 	if (!pmc)
 		return 1;
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 58265f761c3b..c3faa70d9886 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -25,9 +25,9 @@ struct kvm_pmu_ops {
 	unsigned (*find_fixed_event)(int idx);
 	bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
-	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, unsigned idx,
-					  u64 *mask);
-	int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx);
+	struct kvm_pmc *(*rdpmc_idx_to_pmc)(struct kvm_vcpu *vcpu,
+		unsigned int idx, u64 *mask);
+	int (*is_valid_rdpmc_idx)(struct kvm_vcpu *vcpu, unsigned int idx);
 	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
 	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
@@ -110,7 +110,7 @@ void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
-int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx);
+int kvm_pmu_is_valid_rdpmc_idx(struct kvm_vcpu *vcpu, unsigned int idx);
 bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index c8388389a3b0..759b090f8140 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -174,7 +174,7 @@ static struct kvm_pmc *amd_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 }
 
 /* returns 0 if idx's corresponding MSR exists; otherwise returns 1. */
-static int amd_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
+static int amd_is_valid_rdpmc_idx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 
@@ -184,7 +184,8 @@ static int amd_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
 }
 
 /* idx is the ECX register of RDPMC instruction */
-static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *mask)
+static struct kvm_pmc *amd_rdpmc_idx_to_pmc(struct kvm_vcpu *vcpu,
+	unsigned int idx, u64 *mask)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct kvm_pmc *counters;
@@ -306,8 +307,8 @@ struct kvm_pmu_ops amd_pmu_ops = {
 	.find_fixed_event = amd_find_fixed_event,
 	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
-	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
-	.is_valid_msr_idx = amd_is_valid_msr_idx,
+	.rdpmc_idx_to_pmc = amd_rdpmc_idx_to_pmc,
+	.is_valid_rdpmc_idx = amd_is_valid_rdpmc_idx,
 	.is_valid_msr = amd_is_valid_msr,
 	.get_msr = amd_pmu_get_msr,
 	.set_msr = amd_pmu_set_msr,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 3e9c059099e9..8b999dae15f2 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -111,7 +111,7 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 }
 
 /* returns 0 if idx's corresponding MSR exists; otherwise returns 1. */
-static int intel_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
+static int intel_is_valid_rdpmc_idx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	bool fixed = idx & (1u << 30);
@@ -122,8 +122,8 @@ static int intel_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
 		(fixed && idx >= pmu->nr_arch_fixed_counters);
 }
 
-static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu,
-					    unsigned idx, u64 *mask)
+static struct kvm_pmc *intel_rdpmc_idx_to_pmc(struct kvm_vcpu *vcpu,
+					    unsigned int idx, u64 *mask)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	bool fixed = idx & (1u << 30);
@@ -366,8 +366,8 @@ struct kvm_pmu_ops intel_pmu_ops = {
 	.find_fixed_event = intel_find_fixed_event,
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
-	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
-	.is_valid_msr_idx = intel_is_valid_msr_idx,
+	.rdpmc_idx_to_pmc = intel_rdpmc_idx_to_pmc,
+	.is_valid_rdpmc_idx = intel_is_valid_rdpmc_idx,
 	.is_valid_msr = intel_is_valid_msr,
 	.get_msr = intel_pmu_get_msr,
 	.set_msr = intel_pmu_set_msr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 661e2bf38526..72ce691fd45d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6111,7 +6111,7 @@ static void emulator_set_smbase(struct x86_emulate_ctxt *ctxt, u64 smbase)
 static int emulator_check_pmc(struct x86_emulate_ctxt *ctxt,
 			      u32 pmc)
 {
-	return kvm_pmu_is_valid_msr_idx(emul_to_vcpu(ctxt), pmc);
+	return kvm_pmu_is_valid_rdpmc_idx(emul_to_vcpu(ctxt), pmc);
 }
 
 static int emulator_read_pmc(struct x86_emulate_ctxt *ctxt,
-- 
2.21.0



* [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback
  2019-10-21 16:06 [PATCH v3 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
                   ` (2 preceding siblings ...)
  2019-10-21 16:06 ` [PATCH v3 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_idx Like Xu
@ 2019-10-21 16:06 ` Like Xu
  2019-10-26  5:28   ` [RFC PATCH] KVM: x86/vPMU: intel_msr_idx_to_pmc() can be static kbuild test robot
  2019-10-26  5:28   ` [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback kbuild test robot
  2019-10-21 16:06 ` [PATCH v3 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter Like Xu
  2019-10-21 16:06 ` [PATCH v3 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC Like Xu
  5 siblings, 2 replies; 13+ messages in thread
From: Like Xu @ 2019-10-21 16:06 UTC (permalink / raw)
  To: pbonzini, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*,
change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) ||
.is_valid_msr(vcpu, msr)", and let AMD simply return false from
.is_valid_msr.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/pmu.c           |  3 ++-
 arch/x86/kvm/pmu.h           |  1 +
 arch/x86/kvm/pmu_amd.c       | 15 +++++++++++----
 arch/x86/kvm/vmx/pmu_intel.c | 13 +++++++++++++
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index de362ba0df01..39a259b701e0 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -339,7 +339,8 @@ void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
 
 bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 {
-	return kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
+	return kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, msr) ||
+		kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
 }
 
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index c3faa70d9886..3b4f1942d206 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -27,6 +27,7 @@ struct kvm_pmu_ops {
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
 	struct kvm_pmc *(*rdpmc_idx_to_pmc)(struct kvm_vcpu *vcpu,
 		unsigned int idx, u64 *mask);
+	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, u32 msr);
 	int (*is_valid_rdpmc_idx)(struct kvm_vcpu *vcpu, unsigned int idx);
 	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
 	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 759b090f8140..88fe3f5f5bd1 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -199,14 +199,20 @@ static struct kvm_pmc *amd_rdpmc_idx_to_pmc(struct kvm_vcpu *vcpu,
 }
 
 static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+	/* please use kvm_pmu_is_valid_msr() instead */
+	return false;
+}
+
+struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-	int ret = false;
+	struct kvm_pmc *pmc;
 
-	ret = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER) ||
-		get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
+	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
+	pmc = pmc ? pmc : get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
 
-	return ret;
+	return pmc;
 }
 
 static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
@@ -308,6 +314,7 @@ struct kvm_pmu_ops amd_pmu_ops = {
 	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
 	.rdpmc_idx_to_pmc = amd_rdpmc_idx_to_pmc,
+	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
 	.is_valid_rdpmc_idx = amd_is_valid_rdpmc_idx,
 	.is_valid_msr = amd_is_valid_msr,
 	.get_msr = amd_pmu_get_msr,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 8b999dae15f2..714afcd9244c 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -162,6 +162,18 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	return ret;
 }
 
+struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+
+	pmc = get_fixed_pmc(pmu, msr);
+	pmc = pmc ? pmc : get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0);
+	pmc = pmc ? pmc : get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0);
+
+	return pmc;
+}
+
 static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -367,6 +379,7 @@ struct kvm_pmu_ops intel_pmu_ops = {
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
 	.rdpmc_idx_to_pmc = intel_rdpmc_idx_to_pmc,
+	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
 	.is_valid_rdpmc_idx = intel_is_valid_rdpmc_idx,
 	.is_valid_msr = intel_is_valid_msr,
 	.get_msr = intel_pmu_get_msr,
-- 
2.21.0



* [PATCH v3 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter
  2019-10-21 16:06 [PATCH v3 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
                   ` (3 preceding siblings ...)
  2019-10-21 16:06 ` [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback Like Xu
@ 2019-10-21 16:06 ` Like Xu
  2019-10-21 16:06 ` [PATCH v3 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC Like Xu
  5 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-21 16:06 UTC (permalink / raw)
  To: pbonzini, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

The perf_event_create_kernel_counter() call in pmc_reprogram_counter()
is a heavyweight and high-frequency operation, especially when the host
disables the watchdog (observed maximum: 21000000 ns), which leads to an
unacceptable latency in the guest NMI handler and limits the use of
vPMUs in the guest.

When a vPMC is fully enabled, the legacy reprogram_*_counter() stops and
releases its existing perf_event (if any) every time, EVEN though in most
cases an almost identical perf_event will be created and configured again
right afterwards.

For each vPMC, if the requested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its current config AND a new sample period based
on pmc->counter is accepted by the host perf interface, the current event
can be reused safely, just as a newly created one would be. Otherwise, do
release the undesirable perf_event and reprogram a new one as usual.

Calling pmc_pause_counter() (disable, read and reset the event) and
pmc_resume_counter() (recalibrate the period and re-enable the event) as
the guest expects is lightweight compared to release-and-create on every
condition. Instead of comparing the filterable event->attr or hw.config,
a new 'u64 current_config' field is added to save the last originally
programmed config for each vPMC.

Based on this implementation, the number of calls to pmc_reprogram_counter()
is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event.
With multiplexed perf sampling, the average latency of the guest NMI
handler is reduced from 104923 ns to 48393 ns (~2.16x speedup). If the
host disables the watchdog, the minimum latency of the guest NMI handler
improves by ~3413x (from 20407603 ns to 5979 ns) and the average by ~786x.
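
A worked example of the period recalibration in pmc_resume_counter(),
assuming a 48-bit gp counter so that pmc_bitmask(pmc) == (1ULL << 48) - 1:

	pmc->counter = 0xFFFFFFFFF448;			/* -3000 in 48 bits */
	period = (-pmc->counter) & pmc_bitmask(pmc);	/* == 3000 */

i.e. the reused event overflows (and injects a PMI) after exactly the
3000 events the guest still expects.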

Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  5 ++++
 arch/x86/kvm/pmu.c              | 45 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/pmu.h              | 12 +++++++--
 arch/x86/kvm/pmu_amd.c          |  1 +
 arch/x86/kvm/vmx/pmu_intel.c    |  2 ++
 5 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 50eb430b0ad8..ccce4aaa44df 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -451,6 +451,11 @@ struct kvm_pmc {
 	u64 eventsel;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
+	/*
+	 * eventsel value for general purpose counters,
+	 * ctrl value for fixed counters.
+	 */
+	u64 current_config;
 };
 
 struct kvm_pmu {
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 39a259b701e0..80a17377ec81 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -140,6 +140,35 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
+static void pmc_pause_counter(struct kvm_pmc *pmc)
+{
+	u64 counter = pmc->counter;
+
+	if (!pmc->perf_event)
+		return;
+
+	/* update counter, reset event value to avoid redundant accumulation */
+	counter += perf_event_pause(pmc->perf_event, true);
+	pmc->counter = counter & pmc_bitmask(pmc);
+}
+
+static bool pmc_resume_counter(struct kvm_pmc *pmc)
+{
+	if (!pmc->perf_event)
+		return false;
+
+	/* recalibrate sample period and check if it's accepted by perf core */
+	if (perf_event_period(pmc->perf_event,
+			(-pmc->counter) & pmc_bitmask(pmc)))
+		return false;
+
+	/* reuse perf_event to serve as pmc_reprogram_counter() does */
+	perf_event_enable(pmc->perf_event);
+
+	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
+	return true;
+}
+
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 {
 	unsigned config, type = PERF_TYPE_RAW;
@@ -154,7 +183,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 
 	pmc->eventsel = eventsel;
 
-	pmc_stop_counter(pmc);
+	pmc_pause_counter(pmc);
 
 	if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
 		return;
@@ -193,6 +222,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	if (type == PERF_TYPE_RAW)
 		config = eventsel & X86_RAW_EVENT_MASK;
 
+	if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
+		return;
+
+	pmc_release_perf_event(pmc);
+
+	pmc->current_config = eventsel;
 	pmc_reprogram_counter(pmc, type, config,
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
@@ -209,7 +244,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 	struct kvm_pmu_event_filter *filter;
 	struct kvm *kvm = pmc->vcpu->kvm;
 
-	pmc_stop_counter(pmc);
+	pmc_pause_counter(pmc);
 
 	if (!en_field || !pmc_is_enabled(pmc))
 		return;
@@ -224,6 +259,12 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 			return;
 	}
 
+	if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
+		return;
+
+	pmc_release_perf_event(pmc);
+
+	pmc->current_config = (u64)ctrl;
 	pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
 			      kvm_x86_ops->pmu_ops->find_fixed_event(idx),
 			      !(en_field & 0x2), /* exclude user */
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 3b4f1942d206..4bf1d25c92d3 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -56,12 +56,20 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
 	return counter & pmc_bitmask(pmc);
 }
 
-static inline void pmc_stop_counter(struct kvm_pmc *pmc)
+static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 {
 	if (pmc->perf_event) {
-		pmc->counter = pmc_read_counter(pmc);
 		perf_event_release_kernel(pmc->perf_event);
 		pmc->perf_event = NULL;
+		pmc->current_config = 0;
+	}
+}
+
+static inline void pmc_stop_counter(struct kvm_pmc *pmc)
+{
+	if (pmc->perf_event) {
+		pmc->counter = pmc_read_counter(pmc);
+		pmc_release_perf_event(pmc);
 	}
 }
 
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 88fe3f5f5bd1..0ed2cc7c5902 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -292,6 +292,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
 		pmu->gp_counters[i].type = KVM_PMC_GP;
 		pmu->gp_counters[i].vcpu = vcpu;
 		pmu->gp_counters[i].idx = i;
+		pmu->gp_counters[i].current_config = 0;
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 714afcd9244c..002b98a8977e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -340,12 +340,14 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 		pmu->gp_counters[i].type = KVM_PMC_GP;
 		pmu->gp_counters[i].vcpu = vcpu;
 		pmu->gp_counters[i].idx = i;
+		pmu->gp_counters[i].current_config = 0;
 	}
 
 	for (i = 0; i < INTEL_PMC_MAX_FIXED; i++) {
 		pmu->fixed_counters[i].type = KVM_PMC_FIXED;
 		pmu->fixed_counters[i].vcpu = vcpu;
 		pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
+		pmu->fixed_counters[i].current_config = 0;
 	}
 }
 
-- 
2.21.0



* [PATCH v3 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC
  2019-10-21 16:06 [PATCH v3 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
                   ` (4 preceding siblings ...)
  2019-10-21 16:06 ` [PATCH v3 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter Like Xu
@ 2019-10-21 16:06 ` Like Xu
  2019-10-22 10:47   ` Paolo Bonzini
  5 siblings, 1 reply; 13+ messages in thread
From: Like Xu @ 2019-10-21 16:06 UTC (permalink / raw)
  To: pbonzini, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

Currently, a host perf_event is created to emulate vPMC functionality.
It is unpredictable whether a disabled perf_event will ever be reused.
If disabled perf_events are not reused for a considerable period of
time, those obsolete events add host context switch overhead that could
have been avoided.

If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu
sched time slice, and the vPMC's independent enable bit isn't set, we
can predict that the guest has finished using this vPMC: request
KVM_REQ_PMU in kvm_arch_sched_in() and release those perf_events in the
first call of kvm_pmu_handle_event() after the vcpu is scheduled in.

This lazy mechanism delays the event release to the beginning of the
next sched time slice if the vPMC's MSRs aren't changed during the
current one. If the guest comes back to use this vPMC in the next time
slice, a new perf_event is re-created via
perf_event_create_kernel_counter() as usual.
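
For reference, the layout of the pmc_in_use/all_valid_pmc_idx bitmaps
(as spelled out in the review discussion below):

	Intel: [0 .. INTEL_PMC_MAX_GENERIC-1] <=> gp counters
	       [INTEL_PMC_IDX_FIXED .. INTEL_PMC_IDX_FIXED + 2] <=> fixed
	AMD:   [0 .. AMD64_NUM_COUNTERS-1] <=> gp counters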

Suggested-by: Wei W Wang <wei.w.wang@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 14 ++++++++
 arch/x86/kvm/pmu.c              | 58 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/pmu.h              |  2 ++
 arch/x86/kvm/pmu_amd.c          |  1 +
 arch/x86/kvm/vmx/pmu_intel.c    |  6 ++++
 arch/x86/kvm/x86.c              |  6 ++++
 6 files changed, 87 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ccce4aaa44df..b8ee62cf669b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -475,6 +475,20 @@ struct kvm_pmu {
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
 	struct irq_work irq_work;
 	u64 reprogram_pmi;
+	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
+	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
+
+	/*
+	 * The gate to release perf_events not marked in
+	 * pmc_in_use only once in a vcpu time slice.
+	 */
+	bool need_cleanup;
+
+	/*
+	 * The total number of programmed perf_events; it helps to avoid
+	 * a redundant check before cleanup if the guest doesn't use vPMU at all.
+	 */
+	u8 event_count;
 };
 
 struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 80a17377ec81..a8793f965941 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -137,6 +137,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	}
 
 	pmc->perf_event = event;
+	pmc_to_pmu(pmc)->event_count++;
 	clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
@@ -309,6 +310,15 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 
 		reprogram_counter(pmu, bit);
 	}
+
+	/*
+	 * vPMU uses a lazy method to release the perf_events created for
+	 * features emulation when the related MSRs weren't accessed during
+	 * last vcpu time slice. Technically, this cleanup check happens on
+	 * the first call of vcpu_enter_guest after the vcpu gets scheduled in.
+	 */
+	if (unlikely(pmu->need_cleanup))
+		kvm_pmu_cleanup(vcpu);
 }
 
 /* check if idx is a valid index to access PMU */
@@ -384,6 +394,15 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 		kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
 }
 
+static void kvm_pmu_mark_pmc_in_use(struct kvm_vcpu *vcpu, u32 msr)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc = kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, msr);
+
+	if (pmc)
+		__set_bit(pmc->idx, pmu->pmc_in_use);
+}
+
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
 	return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr, data);
@@ -391,6 +410,7 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+	kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index);
 	return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info);
 }
 
@@ -418,9 +438,47 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
 	memset(pmu, 0, sizeof(*pmu));
 	kvm_x86_ops->pmu_ops->init(vcpu);
 	init_irq_work(&pmu->irq_work, kvm_pmi_trigger_fn);
+	pmu->event_count = 0;
+	pmu->need_cleanup = false;
 	kvm_pmu_refresh(vcpu);
 }
 
+static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+	if (pmc_is_fixed(pmc))
+		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+			pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
+
+	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
+}
+
+void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc = NULL;
+	DECLARE_BITMAP(bitmask, X86_PMC_IDX_MAX);
+	int i;
+
+	/* do cleanup before the first time of running vcpu after sched_in */
+	pmu->need_cleanup = false;
+
+	bitmap_andnot(bitmask, pmu->all_valid_pmc_idx,
+		      pmu->pmc_in_use, X86_PMC_IDX_MAX);
+
+	/* release events for unmarked vPMCs in the last sched time slice */
+	for_each_set_bit(i, bitmask, X86_PMC_IDX_MAX) {
+		pmc = kvm_x86_ops->pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+		if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
+			pmc_stop_counter(pmc);
+	}
+
+	/* reset vPMC lazy-release bitmap for this sched time slice */
+	bitmap_zero(pmu->pmc_in_use, X86_PMC_IDX_MAX);
+}
+
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 {
 	kvm_pmu_reset(vcpu);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 4bf1d25c92d3..325c12b5c9b4 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -62,6 +62,7 @@ static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 		perf_event_release_kernel(pmc->perf_event);
 		pmc->perf_event = NULL;
 		pmc->current_config = 0;
+		pmc_to_pmu(pmc)->event_count--;
 	}
 }
 
@@ -126,6 +127,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
 void kvm_pmu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_init(struct kvm_vcpu *vcpu);
+void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
 
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 0ed2cc7c5902..f0aa291f9963 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -279,6 +279,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->nr_arch_fixed_counters = 0;
 	pmu->global_status = 0;
+	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 }
 
 static void amd_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 002b98a8977e..a00197291f81 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -46,6 +46,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 		if (old_ctrl == new_ctrl)
 			continue;
 
+		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
 		reprogram_fixed_counter(pmc, new_ctrl, i);
 	}
 
@@ -329,6 +330,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
 	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
 		pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
+
+	bitmap_set(pmu->all_valid_pmc_idx,
+		0, pmu->nr_arch_gp_counters);
+	bitmap_set(pmu->all_valid_pmc_idx,
+		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
 }
 
 static void intel_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 72ce691fd45d..a18cb93e80d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9415,7 +9415,13 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
 	vcpu->arch.l1tf_flush_l1d = true;
+	if (pmu->version && unlikely(pmu->event_count)) {
+		pmu->need_cleanup = true;
+		kvm_make_request(KVM_REQ_PMU, vcpu);
+	}
 	kvm_x86_ops->sched_in(vcpu, cpu);
 }
 
-- 
2.21.0



* Re: [PATCH v3 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC
  2019-10-21 16:06 ` [PATCH v3 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC Like Xu
@ 2019-10-22 10:47   ` Paolo Bonzini
  2019-10-22 12:00     ` Like Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2019-10-22 10:47 UTC (permalink / raw)
  To: Like Xu, peterz, kvm
  Cc: like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

On 21/10/19 18:06, Like Xu wrote:
>  
> +		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
>  		reprogram_fixed_counter(pmc, new_ctrl, i);
>  	}
>  
> @@ -329,6 +330,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
>  	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
>  		pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
> +
> +	bitmap_set(pmu->all_valid_pmc_idx,
> +		0, pmu->nr_arch_gp_counters);
> +	bitmap_set(pmu->all_valid_pmc_idx,
> +		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);

The offset needs to be INTEL_PMC_IDX_FIXED for GP counters, and 0 for
fixed counters, otherwise pmc_in_use and all_valid_pmc_idx are not in sync.

Paolo



* Re: [PATCH v3 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC
  2019-10-22 10:47   ` Paolo Bonzini
@ 2019-10-22 12:00     ` Like Xu
  2019-10-22 14:09       ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Like Xu @ 2019-10-22 12:00 UTC (permalink / raw)
  To: Paolo Bonzini, kvm
  Cc: peterz, like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

Hi Paolo,
On 2019/10/22 18:47, Paolo Bonzini wrote:
> On 21/10/19 18:06, Like Xu wrote:
>>   
>> +		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
>>   		reprogram_fixed_counter(pmc, new_ctrl, i);
>>   	}
>>   
>> @@ -329,6 +330,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>>   	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
>>   	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
>>   		pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
>> +
>> +	bitmap_set(pmu->all_valid_pmc_idx,
>> +		0, pmu->nr_arch_gp_counters);
>> +	bitmap_set(pmu->all_valid_pmc_idx,
>> +		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
> 
> The offset needs to be INTEL_PMC_IDX_FIXED for GP counters, and 0 for
> fixed counters, otherwise pmc_in_use and all_valid_pmc_idx are not in sync.
> 

First, the bitmap_set is declared as:

	static __always_inline void bitmap_set(unsigned long *map,
	unsigned int start, unsigned int nbits)

Second, the structure of pmu->pmc_in_use is in the following format:

   Intel: [0 .. INTEL_PMC_MAX_GENERIC-1] <=> gp counters
        	 [INTEL_PMC_IDX_FIXED .. INTEL_PMC_IDX_FIXED + 2] <=> fixed
   AMD:   [0 .. AMD64_NUM_COUNTERS-1] <=> gp counters

Then let me translate your suggestion to the following code:

	bitmap_set(pmu->all_valid_pmc_idx, 0,
		   pmu->nr_arch_fixed_counters);
	bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED,
		   pmu->nr_arch_gp_counters);

and the above code doesn't pass the following verification patch:

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a8793f965941..0a73bc8c587d 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -469,6 +469,7 @@ void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)

        /* release events for unmarked vPMCs in the last sched time slice */
         for_each_set_bit(i, bitmask, X86_PMC_IDX_MAX) {
+               pr_info("%s, do cleanup check for i = %d", __func__, i);
                 pmc = kvm_x86_ops->pmu_ops->pmc_idx_to_pmc(pmu, i);

                 if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))

The print message never stops after the guest user finishes the perf
command, and it unexpectedly checks the invalid idx i = 35.

However, my code does work just as you intend.

By the way, how about the other kvm patches?

> Paolo
> 
> 



* Re: [PATCH v3 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC
  2019-10-22 12:00     ` Like Xu
@ 2019-10-22 14:09       ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2019-10-22 14:09 UTC (permalink / raw)
  To: Like Xu, kvm
  Cc: peterz, like.xu, linux-kernel, jmattson, sean.j.christopherson,
	wei.w.wang, kan.liang

On 22/10/19 14:00, Like Xu wrote:
> 
> Second, the structure of pmu->pmc_in_use is in the following format:
> 
>   Intel: [0 .. INTEL_PMC_MAX_GENERIC-1] <=> gp counters
>             [INTEL_PMC_IDX_FIXED .. INTEL_PMC_IDX_FIXED + 2] <=> fixed
>   AMD:   [0 .. AMD64_NUM_COUNTERS-1] <=> gp counters

Sorry---I confused INTEL_PMC_MAX_FIXED and INTEL_PMC_IDX_FIXED.

The patches look good, I'll give them another look since I obviously
wasn't very much awake when reviewing them.

Paolo



* Re: [PATCH v3 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_idx
  2019-10-21 16:06 ` [PATCH v3 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_idx Like Xu
@ 2019-10-23 17:55   ` Jim Mattson
  0 siblings, 0 replies; 13+ messages in thread
From: Jim Mattson @ 2019-10-23 17:55 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Peter Zijlstra, kvm list, like.xu, LKML,
	Sean Christopherson, Wei Wang, Kan Liang

On Tue, Oct 22, 2019 at 1:12 AM Like Xu <like.xu@linux.intel.com> wrote:
>
> The leagcy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so
> this name is restrictedly limited to rdpmc_idx which could be indexed
> exactly to a kvm_pmc. Let's restrict its semantic by renaming the
> existing msr_idx_to_pmc to rdpmc_idx_to_pmc, and is_valid_msr_idx to
> is_valid_rdpmc_idx (likewise for kvm_pmu_is_valid_msr_idx).
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
Nit: The ECX argument to RDPMC is more than just an index (in fact,
intel_is_valid_msr_idx() extracts the index from the provided ECX
value), so I'd suggest s/rdpmc_idx/rdpmc_ecx/g.

Reviewed-by: Jim Mattson <jmattson@google.com>


* [RFC PATCH] KVM: x86/vPMU: intel_msr_idx_to_pmc() can be static
  2019-10-21 16:06 ` [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback Like Xu
@ 2019-10-26  5:28   ` kbuild test robot
  2019-10-26  5:28   ` [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback kbuild test robot
  1 sibling, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2019-10-26  5:28 UTC (permalink / raw)
  To: Like Xu
  Cc: kbuild-all, pbonzini, peterz, kvm, like.xu, linux-kernel,
	jmattson, sean.j.christopherson, wei.w.wang, kan.liang


Fixes: 0ca8f3f79a8c ("KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback")
Signed-off-by: kbuild test robot <lkp@intel.com>
---
 pmu_amd.c       |    2 +-
 vmx/pmu_intel.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 88fe3f5f5bd18..ce61d180cf70c 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -204,7 +204,7 @@ static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	return false;
 }
 
-struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
+static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct kvm_pmc *pmc;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 714afcd9244ca..95f0ff9e39f3c 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -162,7 +162,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	return ret;
 }
 
-struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
+static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct kvm_pmc *pmc;


* Re: [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback
  2019-10-21 16:06 ` [PATCH v3 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback Like Xu
  2019-10-26  5:28   ` [RFC PATCH] KVM: x86/vPMU: intel_msr_idx_to_pmc() can be static kbuild test robot
@ 2019-10-26  5:28   ` kbuild test robot
  1 sibling, 0 replies; 13+ messages in thread
From: kbuild test robot @ 2019-10-26  5:28 UTC (permalink / raw)
  To: Like Xu
  Cc: kbuild-all, pbonzini, peterz, kvm, like.xu, linux-kernel,
	jmattson, sean.j.christopherson, wei.w.wang, kan.liang

Hi Like,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/linux-next]
[cannot apply to v5.4-rc4 next-20191025]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Like-Xu/KVM-x86-vPMU-Efficiency-optimization-by-reusing-last-created-perf_event/20191024-164128
base:   https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.1-dirty
        make ARCH=x86_64 allmodconfig
        make C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

>> arch/x86/kvm/vmx/pmu_intel.c:165:16: sparse: sparse: symbol 'intel_msr_idx_to_pmc' was not declared. Should it be static?
--
>> arch/x86/kvm/pmu_amd.c:207:16: sparse: sparse: symbol 'amd_msr_idx_to_pmc' was not declared. Should it be static?

Please review and possibly fold the followup patch.

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

