All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/6]  KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event
@ 2019-10-27 10:52 Like Xu
  2019-10-27 10:52 ` [PATCH v4 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period Like Xu
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Like Xu @ 2019-10-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Sean Christopherson, Jim Mattson, Wanpeng Li, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Ingo Molnar,
	Jiri Olsa, Joerg Roedel, Namhyung Kim, Thomas Gleixner,
	Vitaly Kuznetsov, kan.liang, wei.w.wang, linux-kernel, kvm

Hi Community,

For perf subsystem, please help review first two patches.
For kvm subsystem, please help review last four patches.

This patch series is going to improve vPMU Efficiency for guest which is
mainly measured by guest NMI handler latency in such as basic perf usages
[1][2] with hardware PMU. It's not a passthrough solution but based on the
legacy vPMU implementation.

With this optimization, the average latency of the guest NMI handler is
reduced from 104923 ns to 48393 ns (~2.16x speed up on CLX-AP with 5.4-rc4,
w/ perf_v4_pmi=n). If host disables the watchdog, the minimum latency of
guest NMI handler could be speed up at ~3413x and in the average at ~786x.
The run time of workload with perf attached inside the guest could be
reduced significantly with this optimization.

The general idea (defined in patch 5/6) is to reuse last created event
for the same vPMC when the new requested config is the exactly same as the
current_config (used by last pmc_reprogram_counter()) AND the new event
period is appropriate and accepted (via perf_event_period() in patch 1/6).
Before reusing the perf_event, it will be disabled until it's suitable for
reuse and a hardware counter will be reassigned again to serve vPMC.

If the disabled perf_event is no longer reused, we do a lazy release
mechanism (defined in patch 6/6) which in a short is to release the
disabled perf_events in the call of kvm_pmu_handle_event since the vcpu
gets next scheduled in if guest doesn't WRMSR its MSRs in the last sched
time slice. In the kvm_arch_sched_in(), KVM_REQ_PMU is requested if the
pmu->event_count has not been reduced to zero and then do kvm_pmu_cleanup
only once for a sched time slice to ensure that overhead is very limited.

Please check each commit for more details and share your comments with us.

Thanks,
Like Xu

---
[1] multiplexing sampling mode usage: perf record  -e \
`perf list | grep Hardware | grep event |\
awk '{print $1}' | head -n 10 |tr '\n' ',' | sed 's/,$//' ` ./ftest
[2] single event count mode usage: perf stat -e branch-misses ./ftest

---
Changes in v4:
- s/rdpmc_idx/rdpmc_ecx/g (Jim Mattson)
- make *_msr_idx_to_pmc static (kbuild test robot)

Changes in v3:
- optimize perf_event_pause() for no child event 
- rename programed_config to programed_config
- rename lazy_release_ctrl to pmc_in_use
- rename kvm_pmu_ops callbacks form msr_idx to rdpmc_idx
- add a new kvm_pmu_ops callback msr_idx_to_pmc
- use DECLARE_BITMAP to declare bitmap
- set up a bitmap 'pmu->all_valid_pmc_idx'
- move kvm_pmu_cleanup to kvm_pmu_handle_event
- update performance data based on 5.4-rc4 on CLX-AP

Changes in v2:
- use perf_event_pause() to disable, read, reset by only one lock;
- use __perf_event_read_value() after _perf_event_disable();
- replace bitfields with 'u8 event_count; bool need_cleanup;';
- refine comments and commit messages;
- fix two issues reported by kbuild test robot for ARCH=[nds32|sh]

v3:
https://lore.kernel.org/kvm/20191021160651.49508-1-like.xu@linux.intel.com/

v2:
https://lore.kernel.org/kvm/20191013091533.12971-1-like.xu@linux.intel.com/

v1:
https://lore.kernel.org/kvm/20190930072257.43352-1-like.xu@linux.intel.com/

Like Xu (6):
  perf/core: Provide a kernel-internal interface to recalibrate event
    period
  perf/core: Provide a kernel-internal interface to pause perf_event
  KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx
  KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback
  KVM: x86/vPMU: Reuse perf_event to avoid unnecessary
    pmc_reprogram_counter
  KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

 arch/x86/include/asm/kvm_host.h |  19 ++++++
 arch/x86/kvm/pmu.c              | 112 ++++++++++++++++++++++++++++++--
 arch/x86/kvm/pmu.h              |  23 +++++--
 arch/x86/kvm/pmu_amd.c          |  24 +++++--
 arch/x86/kvm/vmx/pmu_intel.c    |  29 +++++++--
 arch/x86/kvm/x86.c              |   8 ++-
 include/linux/perf_event.h      |  10 +++
 kernel/events/core.c            |  46 +++++++++++--
 8 files changed, 240 insertions(+), 31 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v4 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period
  2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
@ 2019-10-27 10:52 ` Like Xu
  2019-10-27 10:52 ` [PATCH v4 2/6] perf/core: Provide a kernel-internal interface to pause perf_event Like Xu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Sean Christopherson, Jim Mattson, Wanpeng Li, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Ingo Molnar,
	Jiri Olsa, Joerg Roedel, Namhyung Kim, Thomas Gleixner,
	Vitaly Kuznetsov, kan.liang, wei.w.wang, linux-kernel, kvm

Currently, perf_event_period() is used by user tools via ioctl. Based on
naming convention, exporting perf_event_period() for kernel users (such
as KVM) who may recalibrate the event period for their assigned counter
according to their requirements.

The perf_event_period() is an external accessor, just like the
perf_event_{en,dis}able() and should thus use perf_event_ctx_lock().

Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 include/linux/perf_event.h |  5 +++++
 kernel/events/core.c       | 28 +++++++++++++++++++++-------
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 61448c19a132..d601df36e671 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1336,6 +1336,7 @@ extern void perf_event_disable_local(struct perf_event *event);
 extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
+extern int perf_event_period(struct perf_event *event, u64 value);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1415,6 +1416,10 @@ static inline void perf_event_disable(struct perf_event *event)		{ }
 static inline int __perf_event_disable(void *info)			{ return -1; }
 static inline void perf_event_task_tick(void)				{ }
 static inline int perf_event_release_kernel(struct perf_event *event)	{ return 0; }
+static inline int perf_event_period(struct perf_event *event, u64 value)
+{
+	return -EINVAL;
+}
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index bb3748d29b04..2fbb485ae880 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5106,16 +5106,11 @@ static int perf_event_check_period(struct perf_event *event, u64 value)
 	return event->pmu->check_period(event, value);
 }
 
-static int perf_event_period(struct perf_event *event, u64 __user *arg)
+static int _perf_event_period(struct perf_event *event, u64 value)
 {
-	u64 value;
-
 	if (!is_sampling_event(event))
 		return -EINVAL;
 
-	if (copy_from_user(&value, arg, sizeof(value)))
-		return -EFAULT;
-
 	if (!value)
 		return -EINVAL;
 
@@ -5133,6 +5128,19 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 	return 0;
 }
 
+int perf_event_period(struct perf_event *event, u64 value)
+{
+	struct perf_event_context *ctx;
+	int ret;
+
+	ctx = perf_event_ctx_lock(event);
+	ret = _perf_event_period(event, value);
+	perf_event_ctx_unlock(event, ctx);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(perf_event_period);
+
 static const struct file_operations perf_fops;
 
 static inline int perf_fget_light(int fd, struct fd *p)
@@ -5176,8 +5184,14 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		return _perf_event_refresh(event, arg);
 
 	case PERF_EVENT_IOC_PERIOD:
-		return perf_event_period(event, (u64 __user *)arg);
+	{
+		u64 value;
+
+		if (copy_from_user(&value, (u64 __user *)arg, sizeof(value)))
+			return -EFAULT;
 
+		return _perf_event_period(event, value);
+	}
 	case PERF_EVENT_IOC_ID:
 	{
 		u64 id = primary_event_id(event);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 2/6] perf/core: Provide a kernel-internal interface to pause perf_event
  2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
  2019-10-27 10:52 ` [PATCH v4 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period Like Xu
@ 2019-10-27 10:52 ` Like Xu
  2019-10-27 10:52 ` [PATCH v4 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx Like Xu
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Sean Christopherson, Jim Mattson, Wanpeng Li, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Ingo Molnar,
	Jiri Olsa, Joerg Roedel, Namhyung Kim, Thomas Gleixner,
	Vitaly Kuznetsov, kan.liang, wei.w.wang, linux-kernel, kvm

Exporting perf_event_pause() as an external accessor for kernel users (such
as KVM) who may do both disable perf_event and read count with just one
time to hold perf_event_ctx_lock. Also the value could be reset optionally.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 include/linux/perf_event.h |  5 +++++
 kernel/events/core.c       | 18 ++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d601df36e671..e9768bfc76f6 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1337,6 +1337,7 @@ extern void perf_event_disable_inatomic(struct perf_event *event);
 extern void perf_event_task_tick(void);
 extern int perf_event_account_interrupt(struct perf_event *event);
 extern int perf_event_period(struct perf_event *event, u64 value);
+extern u64 perf_event_pause(struct perf_event *event, bool reset);
 #else /* !CONFIG_PERF_EVENTS: */
 static inline void *
 perf_aux_output_begin(struct perf_output_handle *handle,
@@ -1420,6 +1421,10 @@ static inline int perf_event_period(struct perf_event *event, u64 value)
 {
 	return -EINVAL;
 }
+static inline u64 perf_event_pause(struct perf_event *event, bool reset)
+{
+	return 0;
+}
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2fbb485ae880..35cc901e0e4f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5029,6 +5029,24 @@ static void _perf_event_reset(struct perf_event *event)
 	perf_event_update_userpage(event);
 }
 
+/* Assume it's not an event with inherit set. */
+u64 perf_event_pause(struct perf_event *event, bool reset)
+{
+	struct perf_event_context *ctx;
+	u64 count;
+
+	ctx = perf_event_ctx_lock(event);
+	WARN_ON_ONCE(event->attr.inherit);
+	_perf_event_disable(event);
+	count = local64_read(&event->count);
+	if (reset)
+		local64_set(&event->count, 0);
+	perf_event_ctx_unlock(event, ctx);
+
+	return count;
+}
+EXPORT_SYMBOL_GPL(perf_event_pause);
+
 /*
  * Holding the top-level event's child_mutex means that any
  * descendant process that has inherited this event will block
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx
  2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
  2019-10-27 10:52 ` [PATCH v4 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period Like Xu
  2019-10-27 10:52 ` [PATCH v4 2/6] perf/core: Provide a kernel-internal interface to pause perf_event Like Xu
@ 2019-10-27 10:52 ` Like Xu
  2019-10-27 10:52 ` [PATCH v4 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback Like Xu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Sean Christopherson, Jim Mattson, Wanpeng Li, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Ingo Molnar,
	Jiri Olsa, Joerg Roedel, Namhyung Kim, Thomas Gleixner,
	Vitaly Kuznetsov, kan.liang, wei.w.wang, linux-kernel, kvm

The leagcy pmu_ops->msr_idx_to_pmc is only called in kvm_pmu_rdpmc, so
this name is restrictedly limited to rdpmc_ecx which could be indexed
exactly to a kvm_pmc. Let's restrict its semantic by renaming the
existing msr_idx_to_pmc to rdpmc_ecx_to_pmc, and is_valid_msr_idx to
is_valid_rdpmc_ecx (likewise for kvm_pmu_is_valid_msr_idx).

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/pmu.c           |  6 +++---
 arch/x86/kvm/pmu.h           |  8 ++++----
 arch/x86/kvm/pmu_amd.c       |  9 +++++----
 arch/x86/kvm/vmx/pmu_intel.c | 10 +++++-----
 arch/x86/kvm/x86.c           |  2 +-
 5 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 46875bbd0419..1d3252d4f9d4 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -271,9 +271,9 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 }
 
 /* check if idx is a valid index to access PMU */
-int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
+int kvm_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
-	return kvm_x86_ops->pmu_ops->is_valid_msr_idx(vcpu, idx);
+	return kvm_x86_ops->pmu_ops->is_valid_rdpmc_ecx(vcpu, idx);
 }
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx)
@@ -323,7 +323,7 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	if (is_vmware_backdoor_pmc(idx))
 		return kvm_pmu_rdpmc_vmware(vcpu, idx, data);
 
-	pmc = kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, idx, &mask);
+	pmc = kvm_x86_ops->pmu_ops->rdpmc_ecx_to_pmc(vcpu, idx, &mask);
 	if (!pmc)
 		return 1;
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 58265f761c3b..c4a80fe285a5 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -25,9 +25,9 @@ struct kvm_pmu_ops {
 	unsigned (*find_fixed_event)(int idx);
 	bool (*pmc_is_enabled)(struct kvm_pmc *pmc);
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
-	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, unsigned idx,
-					  u64 *mask);
-	int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx);
+	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
+		unsigned int idx, u64 *mask);
+	int (*is_valid_rdpmc_ecx)(struct kvm_vcpu *vcpu, unsigned int idx);
 	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
 	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
@@ -110,7 +110,7 @@ void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
-int kvm_pmu_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx);
+int kvm_pmu_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx);
 bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr);
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index c8388389a3b0..a4a6d8a09f70 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -174,7 +174,7 @@ static struct kvm_pmc *amd_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 }
 
 /* returns 0 if idx's corresponding MSR exists; otherwise returns 1. */
-static int amd_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
+static int amd_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 
@@ -184,7 +184,8 @@ static int amd_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
 }
 
 /* idx is the ECX register of RDPMC instruction */
-static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *mask)
+static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
+	unsigned int idx, u64 *mask)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct kvm_pmc *counters;
@@ -306,8 +307,8 @@ struct kvm_pmu_ops amd_pmu_ops = {
 	.find_fixed_event = amd_find_fixed_event,
 	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
-	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
-	.is_valid_msr_idx = amd_is_valid_msr_idx,
+	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
+	.is_valid_rdpmc_ecx = amd_is_valid_rdpmc_ecx,
 	.is_valid_msr = amd_is_valid_msr,
 	.get_msr = amd_pmu_get_msr,
 	.set_msr = amd_pmu_set_msr,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 3e9c059099e9..7a8067ec19bb 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -111,7 +111,7 @@ static struct kvm_pmc *intel_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 }
 
 /* returns 0 if idx's corresponding MSR exists; otherwise returns 1. */
-static int intel_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
+static int intel_is_valid_rdpmc_ecx(struct kvm_vcpu *vcpu, unsigned int idx)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	bool fixed = idx & (1u << 30);
@@ -122,8 +122,8 @@ static int intel_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
 		(fixed && idx >= pmu->nr_arch_fixed_counters);
 }
 
-static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu,
-					    unsigned idx, u64 *mask)
+static struct kvm_pmc *intel_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
+					    unsigned int idx, u64 *mask)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	bool fixed = idx & (1u << 30);
@@ -366,8 +366,8 @@ struct kvm_pmu_ops intel_pmu_ops = {
 	.find_fixed_event = intel_find_fixed_event,
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
-	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
-	.is_valid_msr_idx = intel_is_valid_msr_idx,
+	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
+	.is_valid_rdpmc_ecx = intel_is_valid_rdpmc_ecx,
 	.is_valid_msr = intel_is_valid_msr,
 	.get_msr = intel_pmu_get_msr,
 	.set_msr = intel_pmu_set_msr,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff395f812719..726a74e1c6a1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6108,7 +6108,7 @@ static void emulator_set_smbase(struct x86_emulate_ctxt *ctxt, u64 smbase)
 static int emulator_check_pmc(struct x86_emulate_ctxt *ctxt,
 			      u32 pmc)
 {
-	return kvm_pmu_is_valid_msr_idx(emul_to_vcpu(ctxt), pmc);
+	return kvm_pmu_is_valid_rdpmc_ecx(emul_to_vcpu(ctxt), pmc);
 }
 
 static int emulator_read_pmc(struct x86_emulate_ctxt *ctxt,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback
  2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
                   ` (2 preceding siblings ...)
  2019-10-27 10:52 ` [PATCH v4 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx Like Xu
@ 2019-10-27 10:52 ` Like Xu
  2019-10-27 10:52 ` [PATCH v4 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter Like Xu
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Sean Christopherson, Jim Mattson, Wanpeng Li, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Ingo Molnar,
	Jiri Olsa, Joerg Roedel, Namhyung Kim, Thomas Gleixner,
	Vitaly Kuznetsov, kan.liang, wei.w.wang, linux-kernel, kvm

Introduce a new callback msr_idx_to_pmc that returns a struct kvm_pmc*,
and change kvm_pmu_is_valid_msr to return ".msr_idx_to_pmc(vcpu, msr) ||
.is_valid_msr(vcpu, msr)" and AMD just returns false from .is_valid_msr.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/pmu.c           |  3 ++-
 arch/x86/kvm/pmu.h           |  1 +
 arch/x86/kvm/pmu_amd.c       | 15 +++++++++++----
 arch/x86/kvm/vmx/pmu_intel.c | 13 +++++++++++++
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 1d3252d4f9d4..2674a6659ba4 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -339,7 +339,8 @@ void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
 
 bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 {
-	return kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
+	return kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, msr) ||
+		kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
 }
 
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index c4a80fe285a5..b253dd5e56cf 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -27,6 +27,7 @@ struct kvm_pmu_ops {
 	struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
 	struct kvm_pmc *(*rdpmc_ecx_to_pmc)(struct kvm_vcpu *vcpu,
 		unsigned int idx, u64 *mask);
+	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, u32 msr);
 	int (*is_valid_rdpmc_ecx)(struct kvm_vcpu *vcpu, unsigned int idx);
 	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
 	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index a4a6d8a09f70..8e7372e10251 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -199,14 +199,20 @@ static struct kvm_pmc *amd_rdpmc_ecx_to_pmc(struct kvm_vcpu *vcpu,
 }
 
 static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
+{
+	/* please use kvm_pmu_is_valid_msr() instead */
+	return false;
+}
+
+static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
-	int ret = false;
+	struct kvm_pmc *pmc;
 
-	ret = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER) ||
-		get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
+	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
+	pmc = pmc ? pmc : get_gp_pmc_amd(pmu, msr, PMU_TYPE_EVNTSEL);
 
-	return ret;
+	return pmc;
 }
 
 static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
@@ -308,6 +314,7 @@ struct kvm_pmu_ops amd_pmu_ops = {
 	.pmc_is_enabled = amd_pmc_is_enabled,
 	.pmc_idx_to_pmc = amd_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = amd_rdpmc_ecx_to_pmc,
+	.msr_idx_to_pmc = amd_msr_idx_to_pmc,
 	.is_valid_rdpmc_ecx = amd_is_valid_rdpmc_ecx,
 	.is_valid_msr = amd_is_valid_msr,
 	.get_msr = amd_pmu_get_msr,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 7a8067ec19bb..dcde142327ca 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -162,6 +162,18 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	return ret;
 }
 
+static struct kvm_pmc *intel_msr_idx_to_pmc(struct kvm_vcpu *vcpu, u32 msr)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc;
+
+	pmc = get_fixed_pmc(pmu, msr);
+	pmc = pmc ? pmc : get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0);
+	pmc = pmc ? pmc : get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0);
+
+	return pmc;
+}
+
 static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -367,6 +379,7 @@ struct kvm_pmu_ops intel_pmu_ops = {
 	.pmc_is_enabled = intel_pmc_is_enabled,
 	.pmc_idx_to_pmc = intel_pmc_idx_to_pmc,
 	.rdpmc_ecx_to_pmc = intel_rdpmc_ecx_to_pmc,
+	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
 	.is_valid_rdpmc_ecx = intel_is_valid_rdpmc_ecx,
 	.is_valid_msr = intel_is_valid_msr,
 	.get_msr = intel_pmu_get_msr,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter
  2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
                   ` (3 preceding siblings ...)
  2019-10-27 10:52 ` [PATCH v4 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback Like Xu
@ 2019-10-27 10:52 ` Like Xu
  2019-11-14  3:51   ` Wanpeng Li
  2019-10-27 10:52 ` [PATCH v4 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC Like Xu
  2019-10-28 16:43 ` [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Peter Zijlstra
  6 siblings, 1 reply; 13+ messages in thread
From: Like Xu @ 2019-10-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Sean Christopherson, Jim Mattson, Wanpeng Li, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Ingo Molnar,
	Jiri Olsa, Joerg Roedel, Namhyung Kim, Thomas Gleixner,
	Vitaly Kuznetsov, kan.liang, wei.w.wang, linux-kernel, kvm

The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is
a heavyweight and high-frequency operation, especially when host disables
the watchdog (maximum 21000000 ns) which leads to an unacceptable latency
of the guest NMI handler. It limits the use of vPMUs in the guest.

When a vPMC is fully enabled, the legacy reprogram_*_counter() would stop
and release its existing perf_event (if any) every time EVEN in most cases
almost the same requested perf_event will be created and configured again.

For each vPMC, if the reuqested config ('u64 eventsel' for gp and 'u8 ctrl'
for fixed) is the same as its current config AND a new sample period based
on pmc->counter is accepted by host perf interface, the current event could
be reused safely as a new created one does. Otherwise, do release the
undesirable perf_event and reprogram a new one as usual.

It's light-weight to call pmc_pause_counter (disable, read and reset event)
and pmc_resume_counter (recalibrate period and re-enable event) as guest
expects instead of release-and-create again on any condition. Compared to
use the filterable event->attr or hw.config, a new 'u64 current_config'
field is added to save the last original programed config for each vPMC.

Based on this implementation, the number of calls to pmc_reprogram_counter
is reduced by ~82.5% for a gp sampling event and ~99.9% for a fixed event.
In the usage of multiplexing perf sampling mode, the average latency of the
guest NMI handler is reduced from 104923 ns to 48393 ns (~2.16x speed up).
If host disables watchdog, the minimum latecy of guest NMI handler could be
speed up at ~3413x (from 20407603 to 5979 ns) and at ~786x in the average.

Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h |  5 ++++
 arch/x86/kvm/pmu.c              | 45 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/pmu.h              | 12 +++++++--
 arch/x86/kvm/pmu_amd.c          |  1 +
 arch/x86/kvm/vmx/pmu_intel.c    |  2 ++
 5 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 24d6598dea29..c4e0da8e899c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -451,6 +451,11 @@ struct kvm_pmc {
 	u64 eventsel;
 	struct perf_event *perf_event;
 	struct kvm_vcpu *vcpu;
+	/*
+	 * eventsel value for general purpose counters,
+	 * ctrl value for fixed counters.
+	 */
+	u64 current_config;
 };
 
 struct kvm_pmu {
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 2674a6659ba4..47a01a75a8fa 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -140,6 +140,35 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
+static void pmc_pause_counter(struct kvm_pmc *pmc)
+{
+	u64 counter = pmc->counter;
+
+	if (!pmc->perf_event)
+		return;
+
+	/* update counter, reset event value to avoid redundant accumulation */
+	counter += perf_event_pause(pmc->perf_event, true);
+	pmc->counter = counter & pmc_bitmask(pmc);
+}
+
+static bool pmc_resume_counter(struct kvm_pmc *pmc)
+{
+	if (!pmc->perf_event)
+		return false;
+
+	/* recalibrate sample period and check if it's accepted by perf core */
+	if (perf_event_period(pmc->perf_event,
+			(-pmc->counter) & pmc_bitmask(pmc)))
+		return false;
+
+	/* reuse perf_event to serve as pmc_reprogram_counter() does*/
+	perf_event_enable(pmc->perf_event);
+
+	clear_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->reprogram_pmi);
+	return true;
+}
+
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 {
 	unsigned config, type = PERF_TYPE_RAW;
@@ -154,7 +183,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 
 	pmc->eventsel = eventsel;
 
-	pmc_stop_counter(pmc);
+	pmc_pause_counter(pmc);
 
 	if (!(eventsel & ARCH_PERFMON_EVENTSEL_ENABLE) || !pmc_is_enabled(pmc))
 		return;
@@ -193,6 +222,12 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
 	if (type == PERF_TYPE_RAW)
 		config = eventsel & X86_RAW_EVENT_MASK;
 
+	if (pmc->current_config == eventsel && pmc_resume_counter(pmc))
+		return;
+
+	pmc_release_perf_event(pmc);
+
+	pmc->current_config = eventsel;
 	pmc_reprogram_counter(pmc, type, config,
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_USR),
 			      !(eventsel & ARCH_PERFMON_EVENTSEL_OS),
@@ -209,7 +244,7 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 	struct kvm_pmu_event_filter *filter;
 	struct kvm *kvm = pmc->vcpu->kvm;
 
-	pmc_stop_counter(pmc);
+	pmc_pause_counter(pmc);
 
 	if (!en_field || !pmc_is_enabled(pmc))
 		return;
@@ -224,6 +259,12 @@ void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int idx)
 			return;
 	}
 
+	if (pmc->current_config == (u64)ctrl && pmc_resume_counter(pmc))
+		return;
+
+	pmc_release_perf_event(pmc);
+
+	pmc->current_config = (u64)ctrl;
 	pmc_reprogram_counter(pmc, PERF_TYPE_HARDWARE,
 			      kvm_x86_ops->pmu_ops->find_fixed_event(idx),
 			      !(en_field & 0x2), /* exclude user */
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index b253dd5e56cf..7eba298587dc 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -56,12 +56,20 @@ static inline u64 pmc_read_counter(struct kvm_pmc *pmc)
 	return counter & pmc_bitmask(pmc);
 }
 
-static inline void pmc_stop_counter(struct kvm_pmc *pmc)
+static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 {
 	if (pmc->perf_event) {
-		pmc->counter = pmc_read_counter(pmc);
 		perf_event_release_kernel(pmc->perf_event);
 		pmc->perf_event = NULL;
+		pmc->current_config = 0;
+	}
+}
+
+static inline void pmc_stop_counter(struct kvm_pmc *pmc)
+{
+	if (pmc->perf_event) {
+		pmc->counter = pmc_read_counter(pmc);
+		pmc_release_perf_event(pmc);
 	}
 }
 
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 8e7372e10251..aaa065989ea1 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -292,6 +292,7 @@ static void amd_pmu_init(struct kvm_vcpu *vcpu)
 		pmu->gp_counters[i].type = KVM_PMC_GP;
 		pmu->gp_counters[i].vcpu = vcpu;
 		pmu->gp_counters[i].idx = i;
+		pmu->gp_counters[i].current_config = 0;
 	}
 }
 
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index dcde142327ca..9b1ddc42f604 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -340,12 +340,14 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 		pmu->gp_counters[i].type = KVM_PMC_GP;
 		pmu->gp_counters[i].vcpu = vcpu;
 		pmu->gp_counters[i].idx = i;
+		pmu->gp_counters[i].current_config = 0;
 	}
 
 	for (i = 0; i < INTEL_PMC_MAX_FIXED; i++) {
 		pmu->fixed_counters[i].type = KVM_PMC_FIXED;
 		pmu->fixed_counters[i].vcpu = vcpu;
 		pmu->fixed_counters[i].idx = i + INTEL_PMC_IDX_FIXED;
+		pmu->fixed_counters[i].current_config = 0;
 	}
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v4 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC
  2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
                   ` (4 preceding siblings ...)
  2019-10-27 10:52 ` [PATCH v4 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter Like Xu
@ 2019-10-27 10:52 ` Like Xu
  2019-10-28 16:43 ` [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Peter Zijlstra
  6 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-10-27 10:52 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Sean Christopherson, Jim Mattson, Wanpeng Li, Alexander Shishkin,
	Arnaldo Carvalho de Melo, Borislav Petkov, Ingo Molnar,
	Jiri Olsa, Joerg Roedel, Namhyung Kim, Thomas Gleixner,
	Vitaly Kuznetsov, kan.liang, wei.w.wang, linux-kernel, kvm

Currently, a host perf_event is created for a vPMC functionality emulation.
It’s unpredictable to determine if a disabled perf_event will be reused.
If they are disabled and are not reused for a considerable period of time,
those obsolete perf_events would increase host context switch overhead that
could have been avoided.

If the guest doesn't WRMSR any of the vPMC's MSRs during an entire vcpu
sched time slice, and its independent enable bit of the vPMC isn't set,
we can predict that the guest has finished the use of this vPMC, and then
do request KVM_REQ_PMU in kvm_arch_sched_in and release those perf_events
in the first call of kvm_pmu_handle_event() after the vcpu is scheduled in.

This lazy mechanism delays the event release time to the beginning of the
next scheduled time slice if vPMC's MSRs aren't changed during this time
slice. If guest comes back to use this vPMC in next time slice, a new perf
event would be re-created via perf_event_create_kernel_counter() as usual.

Suggested-by: Wei Wang <wei.w.wang@intel.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 14 ++++++++
 arch/x86/kvm/pmu.c              | 58 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/pmu.h              |  2 ++
 arch/x86/kvm/pmu_amd.c          |  1 +
 arch/x86/kvm/vmx/pmu_intel.c    |  6 ++++
 arch/x86/kvm/x86.c              |  6 ++++
 6 files changed, 87 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c4e0da8e899c..1f489ffa3e9b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -475,6 +475,20 @@ struct kvm_pmu {
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
 	struct irq_work irq_work;
 	u64 reprogram_pmi;
+	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
+	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
+
+	/*
+	 * The gate to release perf_events not marked in
+	 * pmc_in_use only once in a vcpu time slice.
+	 */
+	bool need_cleanup;
+
+	/*
+	 * The total number of programmed perf_events and it helps to avoid
+	 * redundant check before cleanup if guest don't use vPMU at all.
+	 */
+	u8 event_count;
 };
 
 struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 47a01a75a8fa..0655fcde190f 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -137,6 +137,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 	}
 
 	pmc->perf_event = event;
+	pmc_to_pmu(pmc)->event_count++;
 	clear_bit(pmc->idx, (unsigned long*)&pmc_to_pmu(pmc)->reprogram_pmi);
 }
 
@@ -309,6 +310,15 @@ void kvm_pmu_handle_event(struct kvm_vcpu *vcpu)
 
 		reprogram_counter(pmu, bit);
 	}
+
+	/*
+	 * vPMU uses a lazy method to release the perf_events created for
+	 * features emulation when the related MSRs weren't accessed during
+	 * last vcpu time slice. Technically, this cleanup check happens on
+	 * the first call of vcpu_enter_guest after the vcpu gets scheduled in.
+	 */
+	if (unlikely(pmu->need_cleanup))
+		kvm_pmu_cleanup(vcpu);
 }
 
 /* check if idx is a valid index to access PMU */
@@ -384,6 +394,15 @@ bool kvm_pmu_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 		kvm_x86_ops->pmu_ops->is_valid_msr(vcpu, msr);
 }
 
+static void kvm_pmu_mark_pmc_in_use(struct kvm_vcpu *vcpu, u32 msr)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc = kvm_x86_ops->pmu_ops->msr_idx_to_pmc(vcpu, msr);
+
+	if (pmc)
+		__set_bit(pmc->idx, pmu->pmc_in_use);
+}
+
 int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
 	return kvm_x86_ops->pmu_ops->get_msr(vcpu, msr, data);
@@ -391,6 +410,7 @@ int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 
 int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+	kvm_pmu_mark_pmc_in_use(vcpu, msr_info->index);
 	return kvm_x86_ops->pmu_ops->set_msr(vcpu, msr_info);
 }
 
@@ -418,9 +438,47 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
 	memset(pmu, 0, sizeof(*pmu));
 	kvm_x86_ops->pmu_ops->init(vcpu);
 	init_irq_work(&pmu->irq_work, kvm_pmi_trigger_fn);
+	pmu->event_count = 0;
+	pmu->need_cleanup = false;
 	kvm_pmu_refresh(vcpu);
 }
 
+static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+	if (pmc_is_fixed(pmc))
+		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+			pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
+
+	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
+}
+
+void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct kvm_pmc *pmc = NULL;
+	DECLARE_BITMAP(bitmask, X86_PMC_IDX_MAX);
+	int i;
+
+	/* do cleanup before the first time of running vcpu after sched_in */
+	pmu->need_cleanup = false;
+
+	bitmap_andnot(bitmask, pmu->all_valid_pmc_idx,
+		      pmu->pmc_in_use, X86_PMC_IDX_MAX);
+
+	/* release events for unmarked vPMCs in the last sched time slice */
+	for_each_set_bit(i, bitmask, X86_PMC_IDX_MAX) {
+		pmc = kvm_x86_ops->pmu_ops->pmc_idx_to_pmc(pmu, i);
+
+		if (pmc && pmc->perf_event && !pmc_speculative_in_use(pmc))
+			pmc_stop_counter(pmc);
+	}
+
+	/* reset vPMC lazy-release bitmap for this sched time slice */
+	bitmap_zero(pmu->pmc_in_use, X86_PMC_IDX_MAX);
+}
+
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu)
 {
 	kvm_pmu_reset(vcpu);
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 7eba298587dc..b7a625874203 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -62,6 +62,7 @@ static inline void pmc_release_perf_event(struct kvm_pmc *pmc)
 		perf_event_release_kernel(pmc->perf_event);
 		pmc->perf_event = NULL;
 		pmc->current_config = 0;
+		pmc_to_pmu(pmc)->event_count--;
 	}
 }
 
@@ -126,6 +127,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 void kvm_pmu_refresh(struct kvm_vcpu *vcpu);
 void kvm_pmu_reset(struct kvm_vcpu *vcpu);
 void kvm_pmu_init(struct kvm_vcpu *vcpu);
+void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
 
diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index aaa065989ea1..e5223ff83a56 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -279,6 +279,7 @@ static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->nr_arch_fixed_counters = 0;
 	pmu->global_status = 0;
+	bitmap_set(pmu->all_valid_pmc_idx, 0, pmu->nr_arch_gp_counters);
 }
 
 static void amd_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9b1ddc42f604..b5a16379f534 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -46,6 +46,7 @@ static void reprogram_fixed_counters(struct kvm_pmu *pmu, u64 data)
 		if (old_ctrl == new_ctrl)
 			continue;
 
+		__set_bit(INTEL_PMC_IDX_FIXED + i, pmu->pmc_in_use);
 		reprogram_fixed_counter(pmc, new_ctrl, i);
 	}
 
@@ -329,6 +330,11 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	    (boot_cpu_has(X86_FEATURE_HLE) || boot_cpu_has(X86_FEATURE_RTM)) &&
 	    (entry->ebx & (X86_FEATURE_HLE|X86_FEATURE_RTM)))
 		pmu->reserved_bits ^= HSW_IN_TX|HSW_IN_TX_CHECKPOINTED;
+
+	bitmap_set(pmu->all_valid_pmc_idx,
+		0, pmu->nr_arch_gp_counters);
+	bitmap_set(pmu->all_valid_pmc_idx,
+		INTEL_PMC_MAX_GENERIC, pmu->nr_arch_fixed_counters);
 }
 
 static void intel_pmu_init(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 726a74e1c6a1..66da24253452 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9416,7 +9416,13 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
 	vcpu->arch.l1tf_flush_l1d = true;
+	if (pmu->version && unlikely(pmu->event_count)) {
+		pmu->need_cleanup = true;
+		kvm_make_request(KVM_REQ_PMU, vcpu);
+	}
 	kvm_x86_ops->sched_in(vcpu, cpu);
 }
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/6]  KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event
  2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
                   ` (5 preceding siblings ...)
  2019-10-27 10:52 ` [PATCH v4 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC Like Xu
@ 2019-10-28 16:43 ` Peter Zijlstra
  2019-11-12  6:08   ` Like Xu
  6 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2019-10-28 16:43 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Sean Christopherson, Jim Mattson, Wanpeng Li,
	Alexander Shishkin, Arnaldo Carvalho de Melo, Borislav Petkov,
	Ingo Molnar, Jiri Olsa, Joerg Roedel, Namhyung Kim,
	Thomas Gleixner, Vitaly Kuznetsov, kan.liang, wei.w.wang,
	linux-kernel, kvm

On Sun, Oct 27, 2019 at 06:52:37PM +0800, Like Xu wrote:
> For perf subsystem, please help review first two patches.

> Like Xu (6):
>   perf/core: Provide a kernel-internal interface to recalibrate event
>     period
>   perf/core: Provide a kernel-internal interface to pause perf_event

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event
  2019-10-28 16:43 ` [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Peter Zijlstra
@ 2019-11-12  6:08   ` Like Xu
  2019-11-12 10:05     ` Paolo Bonzini
  2019-11-12 10:09     ` Paolo Bonzini
  0 siblings, 2 replies; 13+ messages in thread
From: Like Xu @ 2019-11-12  6:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Peter Zijlstra, linux-kernel, kvm

Hi Paolo,

On 2019/10/29 0:43, Peter Zijlstra wrote:
> On Sun, Oct 27, 2019 at 06:52:37PM +0800, Like Xu wrote:
>> For perf subsystem, please help review first two patches.
> 
>> Like Xu (6):
>>    perf/core: Provide a kernel-internal interface to recalibrate event
>>      period
>>    perf/core: Provide a kernel-internal interface to pause perf_event
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 

Would you mind to revisit the following patches for upstream ?

 >    KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx
 >    KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback
 >    KVM: x86/vPMU: Reuse perf_event to avoid unnecessary
 >      pmc_reprogram_counter
 >    KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

For vPMU, please review two more patches as well:
+ 
https://lore.kernel.org/kvm/20191030164418.2957-1-like.xu@linux.intel.com/ 
(kvm)
+ 
https://lore.kernel.org/lkml/20191105140955.22504-1-like.xu@linux.intel.com/ 
(perf)

Thanks,
Like Xu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event
  2019-11-12  6:08   ` Like Xu
@ 2019-11-12 10:05     ` Paolo Bonzini
  2019-11-12 10:09     ` Paolo Bonzini
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2019-11-12 10:05 UTC (permalink / raw)
  To: Like Xu; +Cc: Peter Zijlstra, linux-kernel, kvm

On 12/11/19 07:08, Like Xu wrote:
> Hi Paolo,
> 
> On 2019/10/29 0:43, Peter Zijlstra wrote:
>> On Sun, Oct 27, 2019 at 06:52:37PM +0800, Like Xu wrote:
>>> For perf subsystem, please help review first two patches.
>>
>>> Like Xu (6):
>>>    perf/core: Provide a kernel-internal interface to recalibrate event
>>>      period
>>>    perf/core: Provide a kernel-internal interface to pause perf_event
>>
>> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>
> 
> Would you mind to revisit the following patches for upstream ?
> 
>>    KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx
>>    KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback
>>    KVM: x86/vPMU: Reuse perf_event to avoid unnecessary
>>      pmc_reprogram_counter
>>    KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC

Queued, thanks.

Paolo

> For vPMU, please review two more patches as well:
> +
> https://lore.kernel.org/kvm/20191030164418.2957-1-like.xu@linux.intel.com/
> (kvm)
> +
> https://lore.kernel.org/lkml/20191105140955.22504-1-like.xu@linux.intel.com/
> (perf)
> 
> Thanks,
> Like Xu
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event
  2019-11-12  6:08   ` Like Xu
  2019-11-12 10:05     ` Paolo Bonzini
@ 2019-11-12 10:09     ` Paolo Bonzini
  1 sibling, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2019-11-12 10:09 UTC (permalink / raw)
  To: Like Xu; +Cc: Peter Zijlstra, linux-kernel, kvm

On 12/11/19 07:08, Like Xu wrote:
> 
> For vPMU, please review two more patches as well:
> +
> https://lore.kernel.org/kvm/20191030164418.2957-1-like.xu@linux.intel.com/
> (kvm)

If I understand this patch correctly, you are patching the CPUID values
passed to the KVM_SET_CPUID2 ioctl if they are not valid for the host.
Generally we don't do that, if there is garbage in CPUID the behavior of
the guest will be unreliable.

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter
  2019-10-27 10:52 ` [PATCH v4 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter Like Xu
@ 2019-11-14  3:51   ` Wanpeng Li
  2019-11-14  7:13     ` Like Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Wanpeng Li @ 2019-11-14  3:51 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Sean Christopherson, Jim Mattson,
	Wanpeng Li, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Borislav Petkov, Ingo Molnar, Jiri Olsa, Joerg Roedel,
	Namhyung Kim, Thomas Gleixner, Vitaly Kuznetsov, kan.liang,
	wei.w.wang, LKML, kvm

On Mon, 28 Oct 2019 at 21:06, Like Xu <like.xu@linux.intel.com> wrote:
>
> The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is
> a heavyweight and high-frequency operation, especially when host disables
> the watchdog (maximum 21000000 ns) which leads to an unacceptable latency

Why when host disables the watchdog,
perf_event_create_kernel_counter() is more heavyweight and
high-frequency operation?

    Wanpeng

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter
  2019-11-14  3:51   ` Wanpeng Li
@ 2019-11-14  7:13     ` Like Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Like Xu @ 2019-11-14  7:13 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Peter Zijlstra, Paolo Bonzini, Sean Christopherson, Jim Mattson,
	Wanpeng Li, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Borislav Petkov, Ingo Molnar, Jiri Olsa, Joerg Roedel,
	Namhyung Kim, Thomas Gleixner, Vitaly Kuznetsov, kan.liang,
	wei.w.wang, LKML, kvm

Hi Wanpeng,

On 2019/11/14 11:51, Wanpeng Li wrote:
> On Mon, 28 Oct 2019 at 21:06, Like Xu <like.xu@linux.intel.com> wrote:
>>
>> The perf_event_create_kernel_counter() in the pmc_reprogram_counter() is
>> a heavyweight and high-frequency operation, especially when host disables
>> the watchdog (maximum 21000000 ns) which leads to an unacceptable latency
> 
> Why when host disables the watchdog,
> perf_event_create_kernel_counter() is more heavyweight and
> high-frequency operation?
> 
>      Wanpeng
> 

- Fist, let me assume you do have experienced the fact that the perf 
behavior on guest for multiple hardware events is extremely sluggish when 
you disable watchdog on host. The setting of host watchdog is
the uncontrollability the patch series wants to eliminate for vPMU users.

- Disabling host watchdog brings higher frequency is imprecise. In legacy 
vPMU code, the operation is high-frequency regardless of the host watchdog 
setting. The exact frequency depends on perf sampling frequency and guest 
pmu driver pattern.

- The sched_clock() time consumed by perf_event_create_kernel_counter() is 
tested on various x86 platforms and the values suddenly become larger when 
and only when host disables watchdog. Sometimes watchdog damages the 
accuracy. In the early stages of exploration, we found if host disables 
watchdog, the synchronize_rcu() from account_event() in perf_event_alloc() 
becomes much more heavyweight and it seems to be a general necessary 
mechanism. The deeper reason behind this is undefined.

Thanks,
Like Xu

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-11-14  7:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-27 10:52 [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Like Xu
2019-10-27 10:52 ` [PATCH v4 1/6] perf/core: Provide a kernel-internal interface to recalibrate event period Like Xu
2019-10-27 10:52 ` [PATCH v4 2/6] perf/core: Provide a kernel-internal interface to pause perf_event Like Xu
2019-10-27 10:52 ` [PATCH v4 3/6] KVM: x86/vPMU: Rename pmu_ops callbacks from msr_idx to rdpmc_ecx Like Xu
2019-10-27 10:52 ` [PATCH v4 4/6] KVM: x86/vPMU: Introduce a new kvm_pmu_ops->msr_idx_to_pmc callback Like Xu
2019-10-27 10:52 ` [PATCH v4 5/6] KVM: x86/vPMU: Reuse perf_event to avoid unnecessary pmc_reprogram_counter Like Xu
2019-11-14  3:51   ` Wanpeng Li
2019-11-14  7:13     ` Like Xu
2019-10-27 10:52 ` [PATCH v4 6/6] KVM: x86/vPMU: Add lazy mechanism to release perf_event per vPMC Like Xu
2019-10-28 16:43 ` [PATCH v4 0/6] KVM: x86/vPMU: Efficiency optimization by reusing last created perf_event Peter Zijlstra
2019-11-12  6:08   ` Like Xu
2019-11-12 10:05     ` Paolo Bonzini
2019-11-12 10:09     ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.