* [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
@ 2021-05-11  2:41 Like Xu
  2021-05-11  2:41 ` [PATCH v6 01/16] perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server Like Xu
                   ` (16 more replies)
  0 siblings, 17 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:41 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

A new kernel cycle has begun, and this version looks promising.

The guest Precise Event Based Sampling (PEBS) feature can provide the
architectural state of the instruction executed right after the
instruction that caused the event. It requires a new hardware facility
that is currently only available on Intel Ice Lake Server platforms.
This patch set enables the basic PEBS feature for KVM guests on ICX.

We can use the PEBS feature in a Linux guest just as on a native host:

  # perf record -e instructions:ppp ./br_instr a
  # perf record -c 100000 -e instructions:pp ./br_instr a

To emulate the guest PEBS facility for the above perf usage,
we need to implement two code paths:

1) Fast path

This path is used when the host-assigned physical PMC has the same
index as the virtual PMC (e.g. physical PMC0 is used to emulate
virtual PMC0). It covers the most common use cases.

2) Slow path

This path is used when the host-assigned physical PMC has a different
index from the virtual PMC (e.g. physical PMC1 is used to emulate
virtual PMC0). In this case, KVM needs to rewrite the PEBS records to
change the applicable counter indexes to the virtual PMC indexes
(they would otherwise contain the physical counter indexes written by
the PEBS facility), and to switch the counter reset values to the
offsets corresponding to the physical counter indexes in the DS data
structure, as sketched below.
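
For reference, the per-record rewriting that the slow path would have
to perform can be sketched as follows (illustrative only; the record
layout is simplified and the names are assumptions, not code from
this series):

/* Illustrative slow-path sketch: remap one guest PEBS record. */
struct guest_pebs_record {
	u64 applicable_counters;	/* hardware wrote the physical index */
	/* ... remaining Basic/Adaptive record groups elided ... */
};

static void remap_pebs_record(struct guest_pebs_record *rec,
			      unsigned int phys_idx, unsigned int virt_idx)
{
	if (rec->applicable_counters & BIT_ULL(phys_idx)) {
		rec->applicable_counters &= ~BIT_ULL(phys_idx);
		rec->applicable_counters |= BIT_ULL(virt_idx);
	}
}

The counter reset values in the DS management area would need the
equivalent virtual-to-physical switch.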

The previous version [0] enabled both the fast path and the slow path,
which seemed a bit too complex as a first step. In this patch set, we
start with the fast path to get the basic guest PEBS enabled, while
keeping the slow path disabled. A more focused discussion on the slow
path [1] is planned for a separate patch set as the next step.

Compared to what later steps will provide, this patch set does not
support host and guest PEBS being enabled at the same time, nor does
it emulate guest PEBS when a counter is cross-mapped (neither of
these is a typical scenario).

With the basic support, the guest can retrieve correct PEBS
information from its own PEBS records on Ice Lake servers. It is also
expected to work when migrating to another Ice Lake server, and no
regression in host perf is expected.

Here are the results of a PEBS test from guest and host for the same workload:

perf report on guest:
# Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1473377250
# Overhead  Command   Shared Object      Symbol
  57.74%  br_instr  br_instr           [.] lfsr_cond
  41.40%  br_instr  br_instr           [.] cmp_end
   0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire

perf report on host:
# Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1462721386
# Overhead  Command   Shared Object     Symbol
  57.90%  br_instr  br_instr          [.] lfsr_cond
  41.95%  br_instr  br_instr          [.] cmp_end
   0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire

Conclusion: the profiling results on the guest are similar to those on the host.

The minimum guest kernel version is v5.4, or any kernel with the
Ice Lake server PEBS support backported.

Please check each commit for more details, and feel free to comment.

Previous:
https://lore.kernel.org/kvm/20210415032016.166201-1-like.xu@linux.intel.com/

[0] https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
[1] https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/

V5 -> V6 Changelog:
- Rebased on the latest kvm/queue tree;
- Fix a git rebase issue (Liuxiangdong);
- Adjust the patch sequence 06/07 for bisection (Liuxiangdong);

Like Xu (16):
  perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
  perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
  perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
  KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
  KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
  KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
  KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
  KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
  KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
  KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
  KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
  KVM: x86/cpuid: Refactor host/guest CPU model consistency check
  KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64

 arch/x86/events/core.c            |   5 +-
 arch/x86/events/intel/core.c      | 129 ++++++++++++++++++++++++------
 arch/x86/events/perf_event.h      |   5 +-
 arch/x86/include/asm/kvm_host.h   |  16 ++++
 arch/x86/include/asm/msr-index.h  |   6 ++
 arch/x86/include/asm/perf_event.h |   5 +-
 arch/x86/kvm/cpuid.c              |  24 ++----
 arch/x86/kvm/cpuid.h              |   5 ++
 arch/x86/kvm/pmu.c                |  50 +++++++++---
 arch/x86/kvm/pmu.h                |  38 +++++++++
 arch/x86/kvm/vmx/capabilities.h   |  26 ++++--
 arch/x86/kvm/vmx/pmu_intel.c      | 115 +++++++++++++++++++++-----
 arch/x86/kvm/vmx/vmx.c            |  24 +++++-
 arch/x86/kvm/vmx/vmx.h            |   2 +-
 arch/x86/kvm/x86.c                |  14 ++--
 15 files changed, 368 insertions(+), 96 deletions(-)

-- 
2.31.1



* [PATCH v6 01/16] perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
@ 2021-05-11  2:41 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest Like Xu
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:41 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

The new hardware facility supporting guest PEBS is only available on
Intel Ice Lake Server platforms for now. KVM will check this
capability through perf_get_x86_pmu_capability() instead of
hard-coding the CPU models in the KVM code. If it is supported, the
PEBS capability will be exposed to the guest.
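
For illustration (a sketch under assumptions, not part of this patch),
the KVM side can consume the new capability bit like this:

	bool guest_pebs_ok;	/* hypothetical flag, for illustration */
	struct x86_pmu_capability cap;

	perf_get_x86_pmu_capability(&cap);
	/* no hard-coded CPU-model check needed on the KVM side */
	guest_pebs_ok = cap.pebs_vmx;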

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/core.c            | 1 +
 arch/x86/events/intel/core.c      | 1 +
 arch/x86/events/perf_event.h      | 3 ++-
 arch/x86/include/asm/perf_event.h | 1 +
 4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8e509325c2c3..ccbe5b239c22 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2958,5 +2958,6 @@ void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 	cap->bit_width_fixed	= x86_pmu.cntval_bits;
 	cap->events_mask	= (unsigned int)x86_pmu.events_maskl;
 	cap->events_mask_len	= x86_pmu.events_mask_len;
+	cap->pebs_vmx		= x86_pmu.pebs_vmx;
 }
 EXPORT_SYMBOL_GPL(perf_get_x86_pmu_capability);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2521d03de5e0..b6e45ee10e16 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6028,6 +6028,7 @@ __init int intel_pmu_init(void)
 
 	case INTEL_FAM6_ICELAKE_X:
 	case INTEL_FAM6_ICELAKE_D:
+		x86_pmu.pebs_vmx = 1;
 		pmem = true;
 		fallthrough;
 	case INTEL_FAM6_ICELAKE_L:
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e7d4fd..6a0f768c5330 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -796,7 +796,8 @@ struct x86_pmu {
 			pebs_prec_dist		:1,
 			pebs_no_tlb		:1,
 			pebs_no_isolation	:1,
-			pebs_block		:1;
+			pebs_block		:1,
+			pebs_vmx		:1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
 	int		max_pebs_events;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 544f41a179fb..6a6e707905be 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -192,6 +192,7 @@ struct x86_pmu_capability {
 	int		bit_width_fixed;
 	unsigned int	events_mask;
 	int		events_mask_len;
+	unsigned int	pebs_vmx	:1;
 };
 
 /*
-- 
2.31.1



* [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
  2021-05-11  2:41 ` [PATCH v6 01/16] perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-17  8:16   ` Peter Zijlstra
  2021-05-11  2:42 ` [PATCH v6 03/16] perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values Like Xu
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

With PEBS virtualization, the guest PEBS records get delivered to the
guest DS area, and the host PMI handler uses perf_guest_cbs->is_in_guest()
to distinguish whether the PMI comes from guest code, as is done for
Intel PT.

No matter how many guest PEBS counters have overflowed, triggering one
fake event is enough. The fake event causes the KVM PMI callback to be
called, thereby injecting the PEBS overflow PMI into the guest.

KVM may inject the PMI with BUFFER_OVF set even if the guest DS is
empty. That should be harmless: the guest PEBS handler will still
retrieve the correct information from its own PEBS records buffer.
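
Concretely, perf_event_overflow() on the fake event invokes the
kvm_perf_overflow_intr() handler that KVM registered when creating the
guest PEBS event (see patch 07), which sets the buffer-overflow bit in
the guest's GLOBAL_STATUS and requests a PMI injection into the guest.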

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/core.c | 40 ++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b6e45ee10e16..092ecacf8345 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2780,6 +2780,43 @@ static void intel_pmu_reset(void)
 	local_irq_restore(flags);
 }
 
+/*
+ * We may be running with guest PEBS events created by KVM, and the
+ * PEBS records are logged into the guest's DS and invisible to host.
+ *
+ * In the case of guest PEBS overflow, we only trigger a fake event
+ * to emulate the PEBS overflow PMI for guest PEBS counters in KVM.
+ * The guest will then vm-entry and check the guest DS area to read
+ * the guest PEBS records.
+ *
+ * The contents and other behavior of the guest event do not matter.
+ */
+static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
+				      struct perf_sample_data *data)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	u64 guest_pebs_idxs = cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask;
+	struct perf_event *event = NULL;
+	int bit;
+
+	if (!x86_pmu.pebs_active || !guest_pebs_idxs)
+		return;
+
+	for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs,
+			 INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed) {
+		event = cpuc->events[bit];
+		if (!event->attr.precise_ip)
+			continue;
+
+		perf_sample_data_init(data, 0, event->hw.last_period);
+		if (perf_event_overflow(event, data, regs))
+			x86_pmu_stop(event, 0);
+
+		/* Inject one fake event is enough. */
+		break;
+	}
+}
+
 static int handle_pmi_common(struct pt_regs *regs, u64 status)
 {
 	struct perf_sample_data data;
@@ -2831,6 +2868,9 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 		u64 pebs_enabled = cpuc->pebs_enabled;
 
 		handled++;
+		if (x86_pmu.pebs_vmx && perf_guest_cbs &&
+		    perf_guest_cbs->is_in_guest())
+			x86_pmu_handle_guest_pebs(regs, &data);
 		x86_pmu.drain_pebs(regs, &data);
 		status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
 
-- 
2.31.1



* [PATCH v6 03/16] perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
  2021-05-11  2:41 ` [PATCH v6 01/16] perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server Like Xu
  2021-05-11  2:42 ` [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled Like Xu
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

Splitting the logic for determining the guest values is unnecessarily
confusing, and potentially fragile. Perf should have full knowledge and
control of what values are loaded for the guest.

If we change .guest_get_msrs() to take a struct kvm_pmu pointer, then it
can generate the full set of guest values by grabbing guest ds_area and
pebs_data_cfg. Alternatively, .guest_get_msrs() could take the desired
guest MSR values directly (ds_area and pebs_data_cfg), but kvm_pmu is
vendor agnostic, so we don't see any reason to not just pass the pointer.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/core.c            | 4 ++--
 arch/x86/events/intel/core.c      | 4 ++--
 arch/x86/events/perf_event.h      | 2 +-
 arch/x86/include/asm/perf_event.h | 4 ++--
 arch/x86/kvm/vmx/vmx.c            | 3 ++-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index ccbe5b239c22..88e6540fe876 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -689,9 +689,9 @@ void x86_pmu_disable_all(void)
 	}
 }
 
-struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
+struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data)
 {
-	return static_call(x86_pmu_guest_get_msrs)(nr);
+	return static_call(x86_pmu_guest_get_msrs)(nr, data);
 }
 EXPORT_SYMBOL_GPL(perf_guest_get_msrs);
 
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 092ecacf8345..2f89fd599842 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3893,7 +3893,7 @@ static int intel_pmu_hw_config(struct perf_event *event)
 	return 0;
 }
 
-static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
+static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
@@ -3926,7 +3926,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
 	return arr;
 }
 
-static struct perf_guest_switch_msr *core_guest_get_msrs(int *nr)
+static struct perf_guest_switch_msr *core_guest_get_msrs(int *nr, void *data)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 6a0f768c5330..685a1a4e9438 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -876,7 +876,7 @@ struct x86_pmu {
 	/*
 	 * Intel host/guest support (KVM)
 	 */
-	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr);
+	struct perf_guest_switch_msr *(*guest_get_msrs)(int *nr, void *data);
 
 	/*
 	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 6a6e707905be..d5957b68906b 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -491,10 +491,10 @@ static inline void perf_check_microcode(void) { }
 #endif
 
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
-extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data);
 extern int x86_perf_get_lbr(struct x86_pmu_lbr *lbr);
 #else
-struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr, void *data);
 static inline int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
 {
 	return -1;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f2fd447eed45..df5c1c7f9bd3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6594,9 +6594,10 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 {
 	int i, nr_msrs;
 	struct perf_guest_switch_msr *msrs;
+	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
 
 	/* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */
-	msrs = perf_guest_get_msrs(&nr_msrs);
+	msrs = perf_guest_get_msrs(&nr_msrs, (void *)pmu);
 	if (!msrs)
 		return;
 
-- 
2.31.1



* [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (2 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 03/16] perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-12  1:58   ` Venkatesh Srinivas
  2021-05-11  2:42 ` [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter Like Xu
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu, Yao Yuan

On Intel platforms, software can use the IA32_MISC_ENABLE[7] bit to
detect whether the processor supports the performance monitoring
facility.

Whether the bit is set depends on whether the vPMU is enabled for the
guest, and a guest write to this read-only bit will be ignored.
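
As an illustration of the guest-visible behavior (a sketch, not part
of this patch), guest software may probe the bit like this:

	u64 misc_enable;

	rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
	if (misc_enable & MSR_IA32_MISC_ENABLE_EMON)
		pr_info("performance monitoring facility available\n");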

Cc: Yao Yuan <yuan.yao@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 1 +
 arch/x86/kvm/x86.c           | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9efc1a6b8693..d9dbebe03cae 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	if (!pmu->version)
 		return;
 
+	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
 	perf_get_x86_pmu_capability(&x86_pmu);
 
 	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bd550eaf683..abe3ea69078c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3211,6 +3211,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		}
 		break;
 	case MSR_IA32_MISC_ENABLE:
+		data &= ~MSR_IA32_MISC_ENABLE_EMON;
 		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
 		    ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
 			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
-- 
2.31.1



* [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (3 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-17  8:18   ` Peter Zijlstra
  2021-05-11  2:42 ` [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS Like Xu
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu, Luwei Kang

The mask value of the fixed counter control register should be
dynamically adjusted based on the number of fixed counters. This patch
introduces a variable that holds the reserved bits of the fixed counter
control register. This is needed for later Ice Lake fixed counter
changes.
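
For example, with three fixed counters the loop added below clears the
0xb bit pattern at offsets 0, 4 and 8, so fixed_ctr_ctrl_mask ends up
as 0xfffffffffffff444, matching the constant that was previously
hard-coded in intel_pmu_set_msr().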

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/vmx/pmu_intel.c    | 6 +++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..49b421bd3dd8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -457,6 +457,7 @@ struct kvm_pmu {
 	unsigned nr_arch_fixed_counters;
 	unsigned available_event_types;
 	u64 fixed_ctr_ctrl;
+	u64 fixed_ctr_ctrl_mask;
 	u64 global_ctrl;
 	u64 global_status;
 	u64 global_ovf_ctrl;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index d9dbebe03cae..ac7fe714e6c1 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -400,7 +400,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 		if (pmu->fixed_ctr_ctrl == data)
 			return 0;
-		if (!(data & 0xfffffffffffff444ull)) {
+		if (!(data & pmu->fixed_ctr_ctrl_mask)) {
 			reprogram_fixed_counters(pmu, data);
 			return 0;
 		}
@@ -470,6 +470,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	struct kvm_cpuid_entry2 *entry;
 	union cpuid10_eax eax;
 	union cpuid10_edx edx;
+	int i;
 
 	pmu->nr_arch_gp_counters = 0;
 	pmu->nr_arch_fixed_counters = 0;
@@ -477,6 +478,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->counter_bitmask[KVM_PMC_FIXED] = 0;
 	pmu->version = 0;
 	pmu->reserved_bits = 0xffffffff00200000ull;
+	pmu->fixed_ctr_ctrl_mask = ~0ull;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
 	if (!entry)
@@ -511,6 +513,8 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 			((u64)1 << edx.split.bit_width_fixed) - 1;
 	}
 
+	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
+		pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));
 	pmu->global_ctrl = ((1ull << pmu->nr_arch_gp_counters) - 1) |
 		(((1ull << pmu->nr_arch_fixed_counters) - 1) << INTEL_PMC_IDX_FIXED);
 	pmu->global_ctrl_mask = ~pmu->global_ctrl;
-- 
2.31.1



* [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (4 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-17  8:32   ` Peter Zijlstra
  2021-05-17  8:33   ` Peter Zijlstra
  2021-05-11  2:42 ` [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter Like Xu
                   ` (10 subsequent siblings)
  16 siblings, 2 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu, Luwei Kang

If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, the
IA32_PEBS_ENABLE MSR exists and all architecturally enumerated fixed
and general purpose counters have corresponding bits in IA32_PEBS_ENABLE
that enable generation of PEBS records. The general-purpose counter bits
start at bit IA32_PEBS_ENABLE[0], and the fixed counter bits start at
bit IA32_PEBS_ENABLE[32].

When guest PEBS is enabled, the IA32_PEBS_ENABLE MSR will be added to
the MSR list returned by perf_guest_get_msrs() and atomically switched
during the VMX transitions, just like the CORE_PERF_GLOBAL_CTRL MSR.

Based on whether the platform supports x86_pmu.pebs_vmx, the way more
MSRs are added to arr[] in intel_guest_get_msrs() is also refactored
for extensibility.
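
For example, for a guest with 4 GP and 3 fixed counters, the writable
IA32_PEBS_ENABLE bits computed below are [3:0] plus [34:32] when
PEBS_BASELINE is set (pebs_enable_mask = ~global_ctrl), and only [3:0]
otherwise.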

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/core.c     | 60 +++++++++++++++++++++-----------
 arch/x86/include/asm/kvm_host.h  |  3 ++
 arch/x86/include/asm/msr-index.h |  6 ++++
 arch/x86/kvm/vmx/pmu_intel.c     | 31 +++++++++++++++++
 4 files changed, 79 insertions(+), 21 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2f89fd599842..c791765f4761 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3898,31 +3898,49 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
 	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
+	u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
+		cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);
+
+	*nr = 0;
+	arr[(*nr)++] = (struct perf_guest_switch_msr){
+		.msr = MSR_CORE_PERF_GLOBAL_CTRL,
+		.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
+		.guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
+	};
 
-	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
-	arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
-	arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
-		arr[0].guest &= ~cpuc->pebs_enabled;
-	else
-		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
-	*nr = 1;
+	if (!x86_pmu.pebs)
+		return arr;
 
-	if (x86_pmu.pebs && x86_pmu.pebs_no_isolation) {
-		/*
-		 * If PMU counter has PEBS enabled it is not enough to
-		 * disable counter on a guest entry since PEBS memory
-		 * write can overshoot guest entry and corrupt guest
-		 * memory. Disabling PEBS solves the problem.
-		 *
-		 * Don't do this if the CPU already enforces it.
-		 */
-		arr[1].msr = MSR_IA32_PEBS_ENABLE;
-		arr[1].host = cpuc->pebs_enabled;
-		arr[1].guest = 0;
-		*nr = 2;
+	/*
+	 * If PMU counter has PEBS enabled it is not enough to
+	 * disable counter on a guest entry since PEBS memory
+	 * write can overshoot guest entry and corrupt guest
+	 * memory. Disabling PEBS solves the problem.
+	 *
+	 * Don't do this if the CPU already enforces it.
+	 */
+	if (x86_pmu.pebs_no_isolation) {
+		arr[(*nr)++] = (struct perf_guest_switch_msr){
+			.msr = MSR_IA32_PEBS_ENABLE,
+			.host = cpuc->pebs_enabled,
+			.guest = 0,
+		};
+		return arr;
 	}
 
+	if (!x86_pmu.pebs_vmx)
+		return arr;
+
+	arr[*nr] = (struct perf_guest_switch_msr){
+		.msr = MSR_IA32_PEBS_ENABLE,
+		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
+		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
+	};
+
+	/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
+	arr[0].guest |= arr[*nr].guest;
+
+	++(*nr);
 	return arr;
 }
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 49b421bd3dd8..0a42079560ac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -473,6 +473,9 @@ struct kvm_pmu {
 	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
 	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
 
+	u64 pebs_enable;
+	u64 pebs_enable_mask;
+
 	/*
 	 * The gate to release perf_events not marked in
 	 * pmc_in_use only once in a vcpu time slice.
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 742d89a00721..1ab3f280f3a9 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -189,6 +189,12 @@
 #define PERF_CAP_PT_IDX			16
 
 #define MSR_PEBS_LD_LAT_THRESHOLD	0x000003f6
+#define PERF_CAP_PEBS_TRAP             BIT_ULL(6)
+#define PERF_CAP_ARCH_REG              BIT_ULL(7)
+#define PERF_CAP_PEBS_FORMAT           0xf00
+#define PERF_CAP_PEBS_BASELINE         BIT_ULL(14)
+#define PERF_CAP_PEBS_MASK	(PERF_CAP_PEBS_TRAP | PERF_CAP_ARCH_REG | \
+				 PERF_CAP_PEBS_FORMAT | PERF_CAP_PEBS_BASELINE)
 
 #define MSR_IA32_RTIT_CTL		0x00000570
 #define RTIT_CTL_TRACEEN		BIT(0)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index ac7fe714e6c1..9938b485c31c 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -220,6 +220,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		ret = pmu->version > 1;
 		break;
+	case MSR_IA32_PEBS_ENABLE:
+		ret = vcpu->arch.perf_capabilities & PERF_CAP_PEBS_FORMAT;
+		break;
 	default:
 		ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
 			get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -367,6 +370,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		msr_info->data = pmu->global_ovf_ctrl;
 		return 0;
+	case MSR_IA32_PEBS_ENABLE:
+		msr_info->data = pmu->pebs_enable;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -427,6 +433,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		}
 		break;
+	case MSR_IA32_PEBS_ENABLE:
+		if (pmu->pebs_enable == data)
+			return 0;
+		if (!(data & pmu->pebs_enable_mask)) {
+			pmu->pebs_enable = data;
+			return 0;
+		}
+		break;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -479,6 +493,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->version = 0;
 	pmu->reserved_bits = 0xffffffff00200000ull;
 	pmu->fixed_ctr_ctrl_mask = ~0ull;
+	pmu->pebs_enable_mask = ~0ull;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
 	if (!entry)
@@ -545,6 +560,22 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 
 	if (lbr_desc->records.nr)
 		bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1);
+
+	if (vcpu->arch.perf_capabilities & PERF_CAP_PEBS_FORMAT) {
+		if (vcpu->arch.perf_capabilities & PERF_CAP_PEBS_BASELINE) {
+			pmu->pebs_enable_mask = ~pmu->global_ctrl;
+			pmu->reserved_bits &= ~ICL_EVENTSEL_ADAPTIVE;
+			for (i = 0; i < pmu->nr_arch_fixed_counters; i++) {
+				pmu->fixed_ctr_ctrl_mask &=
+					~(1ULL << (INTEL_PMC_IDX_FIXED + i * 4));
+			}
+		} else {
+			pmu->pebs_enable_mask =
+				~((1ull << pmu->nr_arch_gp_counters) - 1);
+		}
+	} else {
+		vcpu->arch.perf_capabilities &= ~PERF_CAP_PEBS_MASK;
+	}
 }
 
 static void intel_pmu_init(struct kvm_vcpu *vcpu)
-- 
2.31.1



* [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (5 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-17  8:39   ` Peter Zijlstra
  2021-05-17  9:14   ` Peter Zijlstra
  2021-05-11  2:42 ` [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS Like Xu
                   ` (9 subsequent siblings)
  16 siblings, 2 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

When a guest counter is configured as a PEBS counter through
IA32_PEBS_ENABLE, the guest PEBS event will be reprogrammed by
configuring a non-zero precision level in the perf_event_attr.

The guest PEBS overflow PMI bit will be set in the guest GLOBAL_STATUS
MSR when the PEBS facility generates a PEBS overflow PMI based on the
guest IA32_DS_AREA MSR.

Even with the same counter index and the same event code and
mask, guest PEBS events will not be reused for non-PEBS events.
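
For reference, attr.precise_ip corresponds to the perf event "p"
modifiers (:p/:pp/:ppp map to precise_ip 1/2/3), and any non-zero
value is enough to make host perf treat the event as a PEBS event.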

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/pmu.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 827886c12c16..0f86c1142f17 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -74,11 +74,21 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
 {
 	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
 	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+	bool skip_pmi = false;
 
 	if (!test_and_set_bit(pmc->idx, pmu->reprogram_pmi)) {
-		__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
+		if (perf_event->attr.precise_ip) {
+			/* Indicate PEBS overflow PMI to guest. */
+			skip_pmi = __test_and_set_bit(GLOBAL_STATUS_BUFFER_OVF_BIT,
+						      (unsigned long *)&pmu->global_status);
+		} else {
+			__set_bit(pmc->idx, (unsigned long *)&pmu->global_status);
+		}
 		kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 
+		if (skip_pmi)
+			return;
+
 		/*
 		 * Inject PMI. If vcpu was in a guest mode during NMI PMI
 		 * can be ejected on a guest mode re-entry. Otherwise we can't
@@ -99,6 +109,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 				  bool exclude_kernel, bool intr,
 				  bool in_tx, bool in_tx_cp)
 {
+	struct kvm_pmu *pmu = vcpu_to_pmu(pmc->vcpu);
 	struct perf_event *event;
 	struct perf_event_attr attr = {
 		.type = type,
@@ -110,6 +121,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		.exclude_kernel = exclude_kernel,
 		.config = config,
 	};
+	bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
 
 	attr.sample_period = get_sample_period(pmc, pmc->counter);
 
@@ -124,9 +136,23 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		attr.sample_period = 0;
 		attr.config |= HSW_IN_TX_CHECKPOINTED;
 	}
+	if (pebs) {
+		/*
+		 * The non-zero precision level of guest event makes the ordinary
+		 * guest event becomes a guest PEBS event and triggers the host
+		 * PEBS PMI handler to determine whether the PEBS overflow PMI
+		 * comes from the host counters or the guest.
+		 *
+		 * For most PEBS hardware events, the difference in the software
+		 * precision levels of guest and host PEBS events will not affect
+		 * the accuracy of the PEBS profiling result, because the "event IP"
+		 * in the PEBS record is calibrated on the guest side.
+		 */
+		attr.precise_ip = 1;
+	}
 
 	event = perf_event_create_kernel_counter(&attr, -1, current,
-						 intr ? kvm_perf_overflow_intr :
+						 (intr || pebs) ? kvm_perf_overflow_intr :
 						 kvm_perf_overflow, pmc);
 	if (IS_ERR(event)) {
 		pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
@@ -161,6 +187,10 @@ static bool pmc_resume_counter(struct kvm_pmc *pmc)
 			      get_sample_period(pmc, pmc->counter)))
 		return false;
 
+	if (!test_bit(pmc->idx, (unsigned long *)&pmc_to_pmu(pmc)->pebs_enable) &&
+	    pmc->perf_event->attr.precise_ip)
+		return false;
+
 	/* reuse perf_event to serve as pmc_reprogram_counter() does*/
 	perf_event_enable(pmc->perf_event);
 
-- 
2.31.1



* [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (6 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-12  5:16   ` Xu, Like
  2021-05-17 13:26   ` Peter Zijlstra
  2021-05-11  2:42 ` [PATCH v6 09/16] KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS Like Xu
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

When CPUID.01H:EDX.DS[21] is set, the IA32_DS_AREA MSR exists and points
to the linear address of the first byte of the DS buffer management area,
which is used to manage the PEBS records.

When guest PEBS is enabled, the IA32_DS_AREA MSR will be added to the
MSR list returned by perf_guest_get_msrs() and switched during the VMX
transitions just like the CORE_PERF_GLOBAL_CTRL MSR. A WRMSR to the
IA32_DS_AREA MSR raises a #GP(0) if the source register contains a
non-canonical address.

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/core.c    | 11 ++++++++++-
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/vmx/pmu_intel.c    | 11 +++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index c791765f4761..de3bc8dfe85e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -21,6 +21,7 @@
 #include <asm/intel_pt.h>
 #include <asm/apic.h>
 #include <asm/cpu_device_id.h>
+#include <asm/kvm_host.h>
 
 #include "../perf_event.h"
 
@@ -3897,6 +3898,8 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
+	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
+	struct kvm_pmu *pmu = (struct kvm_pmu *)data;
 	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
 	u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
 		cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);
@@ -3908,7 +3911,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 		.guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
 	};
 
-	if (!x86_pmu.pebs)
+	if (!pmu || !x86_pmu.pebs_vmx)
 		return arr;
 
 	/*
@@ -3931,6 +3934,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 	if (!x86_pmu.pebs_vmx)
 		return arr;
 
+	arr[(*nr)++] = (struct perf_guest_switch_msr){
+		.msr = MSR_IA32_DS_AREA,
+		.host = (unsigned long)ds,
+		.guest = pmu->ds_area,
+	};
+
 	arr[*nr] = (struct perf_guest_switch_msr){
 		.msr = MSR_IA32_PEBS_ENABLE,
 		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0a42079560ac..296bc3eecdc6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -473,6 +473,7 @@ struct kvm_pmu {
 	DECLARE_BITMAP(all_valid_pmc_idx, X86_PMC_IDX_MAX);
 	DECLARE_BITMAP(pmc_in_use, X86_PMC_IDX_MAX);
 
+	u64 ds_area;
 	u64 pebs_enable;
 	u64 pebs_enable_mask;
 
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9938b485c31c..5584b8dfadb3 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -223,6 +223,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	case MSR_IA32_PEBS_ENABLE:
 		ret = vcpu->arch.perf_capabilities & PERF_CAP_PEBS_FORMAT;
 		break;
+	case MSR_IA32_DS_AREA:
+		ret = guest_cpuid_has(vcpu, X86_FEATURE_DS);
+		break;
 	default:
 		ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
 			get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -373,6 +376,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_PEBS_ENABLE:
 		msr_info->data = pmu->pebs_enable;
 		return 0;
+	case MSR_IA32_DS_AREA:
+		msr_info->data = pmu->ds_area;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -441,6 +447,11 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		}
 		break;
+	case MSR_IA32_DS_AREA:
+		if (is_noncanonical_address(data, vcpu))
+			return 1;
+		pmu->ds_area = data;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
-- 
2.31.1



* [PATCH v6 09/16] KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (7 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 10/16] KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled Like Xu
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu, Luwei Kang

If IA32_PERF_CAPABILITIES.PEBS_BASELINE [bit 14] is set, adaptive PEBS
is supported. The PEBS_DATA_CFG MSR and the adaptive record enable
bits (IA32_PERFEVTSELx.Adaptive_Record and IA32_FIXED_CTR_CTRL.
FCx_Adaptive_Record) are also supported.

Adaptive PEBS provides software the capability to configure the PEBS
records to capture only the data of interest, keeping the record size
compact. An overflow of PMCx results in generation of an adaptive PEBS
record with state information based on the selections specified in
MSR_PEBS_DATA_CFG. By default, the record only contains the Basic
group.

When guest adaptive PEBS is enabled, the PEBS_DATA_CFG MSR will be
added to the MSR list returned by perf_guest_get_msrs() and switched
during the VMX transitions just like the CORE_PERF_GLOBAL_CTRL MSR.
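
For reference, the ~0xff00000full mask set below reserves everything
except the group-enable bits PEBS_DATA_CFG[3:0] (Memory Info, GPRs,
XMMs, LBRs) and the LBR entry count in PEBS_DATA_CFG[31:24].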

Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/core.c    |  8 ++++++++
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx/pmu_intel.c    | 16 ++++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index de3bc8dfe85e..18843412718a 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3940,6 +3940,14 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 		.guest = pmu->ds_area,
 	};
 
+	if (x86_pmu.intel_cap.pebs_baseline) {
+		arr[(*nr)++] = (struct perf_guest_switch_msr){
+			.msr = MSR_PEBS_DATA_CFG,
+			.host = cpuc->pebs_data_cfg,
+			.guest = pmu->pebs_data_cfg,
+		};
+	}
+
 	arr[*nr] = (struct perf_guest_switch_msr){
 		.msr = MSR_IA32_PEBS_ENABLE,
 		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 296bc3eecdc6..b4deb7820397 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -476,6 +476,8 @@ struct kvm_pmu {
 	u64 ds_area;
 	u64 pebs_enable;
 	u64 pebs_enable_mask;
+	u64 pebs_data_cfg;
+	u64 pebs_data_cfg_mask;
 
 	/*
 	 * The gate to release perf_events not marked in
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 5584b8dfadb3..58f32a55cc2e 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -226,6 +226,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	case MSR_IA32_DS_AREA:
 		ret = guest_cpuid_has(vcpu, X86_FEATURE_DS);
 		break;
+	case MSR_PEBS_DATA_CFG:
+		ret = vcpu->arch.perf_capabilities & PERF_CAP_PEBS_BASELINE;
+		break;
 	default:
 		ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
 			get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -379,6 +382,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_DS_AREA:
 		msr_info->data = pmu->ds_area;
 		return 0;
+	case MSR_PEBS_DATA_CFG:
+		msr_info->data = pmu->pebs_data_cfg;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -452,6 +458,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		pmu->ds_area = data;
 		return 0;
+	case MSR_PEBS_DATA_CFG:
+		if (pmu->pebs_data_cfg == data)
+			return 0;
+		if (!(data & pmu->pebs_data_cfg_mask)) {
+			pmu->pebs_data_cfg = data;
+			return 0;
+		}
+		break;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -505,6 +519,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	pmu->reserved_bits = 0xffffffff00200000ull;
 	pmu->fixed_ctr_ctrl_mask = ~0ull;
 	pmu->pebs_enable_mask = ~0ull;
+	pmu->pebs_data_cfg_mask = ~0ull;
 
 	entry = kvm_find_cpuid_entry(vcpu, 0xa, 0);
 	if (!entry)
@@ -580,6 +595,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 				pmu->fixed_ctr_ctrl_mask &=
 					~(1ULL << (INTEL_PMC_IDX_FIXED + i * 4));
 			}
+			pmu->pebs_data_cfg_mask = ~0xff00000full;
 		} else {
 			pmu->pebs_enable_mask =
 				~((1ull << pmu->nr_arch_gp_counters) - 1);
-- 
2.31.1



* [PATCH v6 10/16] KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (8 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 09/16] KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 11/16] KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter Like Xu
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

Bit 12 of IA32_MISC_ENABLE represents "Processor Event Based Sampling
Unavailable (RO)":
	1 = PEBS is not supported.
	0 = PEBS is supported.

A guest write to this read-only PEBS_UNAVAIL bit will raise a #GP(0)
when guest PEBS is enabled. Some PEBS drivers in the guest may care
about this bit.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 2 ++
 arch/x86/kvm/x86.c           | 4 ++++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 58f32a55cc2e..296246bf253d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -588,6 +588,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		bitmap_set(pmu->all_valid_pmc_idx, INTEL_PMC_IDX_FIXED_VLBR, 1);
 
 	if (vcpu->arch.perf_capabilities & PERF_CAP_PEBS_FORMAT) {
+		vcpu->arch.ia32_misc_enable_msr &= ~MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
 		if (vcpu->arch.perf_capabilities & PERF_CAP_PEBS_BASELINE) {
 			pmu->pebs_enable_mask = ~pmu->global_ctrl;
 			pmu->reserved_bits &= ~ICL_EVENTSEL_ADAPTIVE;
@@ -601,6 +602,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 				~((1ull << pmu->nr_arch_gp_counters) - 1);
 		}
 	} else {
+		vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL;
 		vcpu->arch.perf_capabilities &= ~PERF_CAP_PEBS_MASK;
 	}
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index abe3ea69078c..c1ab5bcf75cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3212,6 +3212,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_MISC_ENABLE:
 		data &= ~MSR_IA32_MISC_ENABLE_EMON;
+		if (!msr_info->host_initiated &&
+		    (vcpu->arch.perf_capabilities & PERF_CAP_PEBS_FORMAT) &&
+		    (data & MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL))
+			return 1;
 		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT) &&
 		    ((vcpu->arch.ia32_misc_enable_msr ^ data) & MSR_IA32_MISC_ENABLE_MWAIT)) {
 			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
-- 
2.31.1



* [PATCH v6 11/16] KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (9 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 10/16] KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 12/16] KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h Like Xu
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

The PEBS-PDIR facility on Ice Lake server is supported on IA32_FIXED0
only. If the guest configures counter 32 (INTEL_PMC_IDX_FIXED + 0,
i.e. fixed counter 0) and PEBS is enabled, the PEBS-PDIR facility is
supposed to be used, in which case KVM adjusts attr.precise_ip to 3
and requests host perf to assign exactly the requested counter, or
fail.

The CPU model check is also required, since some platforms may place
the PEBS-PDIR facility on another counter index.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/pmu.c | 2 ++
 arch/x86/kvm/pmu.h | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0f86c1142f17..d3f746877d1b 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -149,6 +149,8 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
 		 * in the PEBS record is calibrated on the guest side.
 		 */
 		attr.precise_ip = 1;
+		if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32)
+			attr.precise_ip = 3;
 	}
 
 	event = perf_event_create_kernel_counter(&attr, -1, current,
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 67e753edfa22..1af86ae1d3f2 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -4,6 +4,8 @@
 
 #include <linux/nospec.h>
 
+#include <asm/cpu_device_id.h>
+
 #define vcpu_to_pmu(vcpu) (&(vcpu)->arch.pmu)
 #define pmu_to_vcpu(pmu)  (container_of((pmu), struct kvm_vcpu, arch.pmu))
 #define pmc_to_pmu(pmc)   (&(pmc)->vcpu->arch.pmu)
@@ -16,6 +18,11 @@
 #define VMWARE_BACKDOOR_PMC_APPARENT_TIME	0x10002
 
 #define MAX_FIXED_COUNTERS	3
+static const struct x86_cpu_id vmx_icl_pebs_cpu[] = {
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, NULL),
+	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, NULL),
+	{}
+};
 
 struct kvm_event_hw_type_mapping {
 	u8 eventsel;
-- 
2.31.1



* [PATCH v6 12/16] KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (10 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 11/16] KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 13/16] KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations Like Xu
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

Moving pmc_speculative_in_use() to pmu.h allows this inline function
to be reused by more callers in more files, such as pmu_intel.c.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/pmu.c | 11 -----------
 arch/x86/kvm/pmu.h | 11 +++++++++++
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index d3f746877d1b..666a5e90a3cb 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -477,17 +477,6 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu)
 	kvm_pmu_refresh(vcpu);
 }
 
-static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
-{
-	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
-
-	if (pmc_is_fixed(pmc))
-		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
-			pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
-
-	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
-}
-
 /* Release perf_events for vPMCs that have been unused for a full time slice.  */
 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 1af86ae1d3f2..ef5b6ee8fdc7 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -149,6 +149,17 @@ static inline u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
 	return sample_period;
 }
 
+static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu = pmc_to_pmu(pmc);
+
+	if (pmc_is_fixed(pmc))
+		return fixed_ctrl_field(pmu->fixed_ctr_ctrl,
+			pmc->idx - INTEL_PMC_IDX_FIXED) & 0x3;
+
+	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
+}
+
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
 void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
 void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
-- 
2.31.1



* [PATCH v6 13/16] KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (11 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 12/16] KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 14/16] KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability Like Xu
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

Guest PEBS will be disabled when a host user profiles KVM and its
user space through the same PEBS facility, OR when host perf does not
schedule the guest PEBS counters in a one-to-one mapping manner
(neither of these is a typical scenario).

In both cases the PEBS records already in the guest DS buffer remain
accurate. The two restrictions above are checked before each vm-entry,
and only when guest PEBS is deemed to be enabled.
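
To make the cross-mapping bookkeeping concrete: for every enabled guest
counter, compare the index the guest programmed with the index host perf
actually assigned, and accumulate the mismatches into a mask of host
counters whose guest PEBS enable bit must be cleared before vm-entry. A
rough user-space model of that logic (the structure below is a
simplified stand-in, not KVM's):

#include <stdio.h>
#include <stdint.h>

struct vpmc {
        int guest_idx;  /* index the guest programmed */
        int host_idx;   /* index host perf assigned, -1 if none */
        int enabled;
};

int main(void)
{
        struct vpmc pmcs[] = {
                { .guest_idx = 0, .host_idx = 0, .enabled = 1 }, /* identical: fast path */
                { .guest_idx = 1, .host_idx = 2, .enabled = 1 }, /* cross-mapped */
        };
        uint64_t host_cross_mapped_mask = 0;
        unsigned int i;

        for (i = 0; i < sizeof(pmcs) / sizeof(pmcs[0]); i++) {
                if (!pmcs[i].enabled || pmcs[i].host_idx < 0)
                        continue;
                if (pmcs[i].guest_idx != pmcs[i].host_idx)
                        host_cross_mapped_mask |= 1ULL << pmcs[i].host_idx;
        }

        /* These host counters must not carry guest PEBS bits at vm-entry. */
        printf("host_cross_mapped_mask = 0x%llx\n",
               (unsigned long long)host_cross_mapped_mask);
        return 0;
}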

Suggested-by: Wei Wang <wei.w.wang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/core.c    | 11 +++++++++--
 arch/x86/include/asm/kvm_host.h |  9 +++++++++
 arch/x86/kvm/vmx/pmu_intel.c    | 19 +++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c          |  4 ++++
 arch/x86/kvm/vmx/vmx.h          |  1 +
 5 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 18843412718a..678958df2ce9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3954,8 +3954,15 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
 		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
 	};
 
-	/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
-	arr[0].guest |= arr[*nr].guest;
+	if (arr[*nr].host) {
+		/* Disable guest PEBS if host PEBS is enabled. */
+		arr[*nr].guest = 0;
+	} else {
+		/* Disable guest PEBS for cross-mapped PEBS counters. */
+		arr[*nr].guest &= ~pmu->host_cross_mapped_mask;
+		/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
+		arr[0].guest |= arr[*nr].guest;
+	}
 
 	++(*nr);
 	return arr;
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b4deb7820397..15bff609fd57 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -479,6 +479,15 @@ struct kvm_pmu {
 	u64 pebs_data_cfg;
 	u64 pebs_data_cfg_mask;
 
+	/*
+	 * If a guest counter is cross-mapped to host counter with different
+	 * index, its PEBS capability will be temporarily disabled.
+	 *
+	 * The user should make sure that this mask is updated
+	 * after disabling interrupts and before perf_guest_get_msrs();
+	 */
+	u64 host_cross_mapped_mask;
+
 	/*
 	 * The gate to release perf_events not marked in
 	 * pmc_in_use only once in a vcpu time slice.
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 296246bf253d..28152d7fd12d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -770,6 +770,25 @@ static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 		intel_pmu_release_guest_lbr_event(vcpu);
 }
 
+void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu)
+{
+	struct kvm_pmc *pmc = NULL;
+	int bit;
+
+	for_each_set_bit(bit, (unsigned long *)&pmu->global_ctrl,
+			 X86_PMC_IDX_MAX) {
+		pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, bit);
+
+		if (!pmc || !pmc_speculative_in_use(pmc) ||
+		    !pmc_is_enabled(pmc))
+			continue;
+
+		if (pmc->perf_event && (pmc->idx != pmc->perf_event->hw.idx))
+			pmu->host_cross_mapped_mask |=
+				BIT_ULL(pmc->perf_event->hw.idx);
+	}
+}
+
 struct kvm_pmu_ops intel_pmu_ops = {
 	.find_arch_event = intel_find_arch_event,
 	.find_fixed_event = intel_find_fixed_event,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index df5c1c7f9bd3..e43d58020c75 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6596,6 +6596,10 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
 	struct perf_guest_switch_msr *msrs;
 	struct kvm_pmu *pmu = vcpu_to_pmu(&vmx->vcpu);
 
+	pmu->host_cross_mapped_mask = 0;
+	if (pmu->pebs_enable & pmu->global_ctrl)
+		intel_pmu_cross_mapped_check(pmu);
+
 	/* Note, nr_msrs may be garbage if perf_guest_get_msrs() returns NULL. */
 	msrs = perf_guest_get_msrs(&nr_msrs, (void *)pmu);
 	if (!msrs)
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 16e4e457ba23..72f1175e474b 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -96,6 +96,7 @@ union vmx_exit_reason {
 #define vcpu_to_lbr_desc(vcpu) (&to_vmx(vcpu)->lbr_desc)
 #define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)
 
+void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu);
 bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
 bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 14/16] KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (12 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 13/16] KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 15/16] KVM: x86/cpuid: Refactor host/guest CPU model consistency check Like Xu
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

The information returned by the perf_get_x86_pmu_capability() interface
does not change at runtime, so an exported global "struct
x86_pmu_capability" (kvm_pmu_cap) can be introduced and shared by all
guests in KVM; it is initialized once before hardware_setup().
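
The pattern is the usual cache-once-at-setup one: query the immutable
capability a single time, clamp it to what KVM supports, and let every
later consumer read the cached copy. A hedged user-space sketch of the
idea (the struct, limits and probe function are stand-ins):

#include <stdio.h>
#include <string.h>

#define MAX_FIXED_COUNTERS 3

struct pmu_capability {
        int version;
        int num_counters_gp;
        int num_counters_fixed;
};

static struct pmu_capability pmu_cap; /* shared, initialized once */

/* Stand-in for perf_get_x86_pmu_capability(): pretend to probe hw. */
static void probe_hw(struct pmu_capability *cap)
{
        cap->version = 5;
        cap->num_counters_gp = 8;
        cap->num_counters_fixed = 4;
}

static void init_pmu_capability(void)
{
        probe_hw(&pmu_cap);

        /* No architectural PMU at all: expose nothing. */
        if (!pmu_cap.version)
                memset(&pmu_cap, 0, sizeof(pmu_cap));

        /* Clamp to what the hypervisor actually virtualizes. */
        if (pmu_cap.version > 2)
                pmu_cap.version = 2;
        if (pmu_cap.num_counters_fixed > MAX_FIXED_COUNTERS)
                pmu_cap.num_counters_fixed = MAX_FIXED_COUNTERS;
}

int main(void)
{
        init_pmu_capability(); /* once, before "hardware_setup()" */
        printf("version=%d gp=%d fixed=%d\n", pmu_cap.version,
               pmu_cap.num_counters_gp, pmu_cap.num_counters_fixed);
        return 0;
}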

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c         | 24 +++++++-----------------
 arch/x86/kvm/pmu.c           |  3 +++
 arch/x86/kvm/pmu.h           | 20 ++++++++++++++++++++
 arch/x86/kvm/vmx/pmu_intel.c | 17 ++++++++---------
 arch/x86/kvm/x86.c           |  9 ++++-----
 5 files changed, 42 insertions(+), 31 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 9a48f138832d..a654fac41c22 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -744,32 +744,22 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 	case 9:
 		break;
 	case 0xa: { /* Architectural Performance Monitoring */
-		struct x86_pmu_capability cap;
 		union cpuid10_eax eax;
 		union cpuid10_edx edx;
 
-		perf_get_x86_pmu_capability(&cap);
+		eax.split.version_id = kvm_pmu_cap.version;
+		eax.split.num_counters = kvm_pmu_cap.num_counters_gp;
+		eax.split.bit_width = kvm_pmu_cap.bit_width_gp;
+		eax.split.mask_length = kvm_pmu_cap.events_mask_len;
 
-		/*
-		 * Only support guest architectural pmu on a host
-		 * with architectural pmu.
-		 */
-		if (!cap.version)
-			memset(&cap, 0, sizeof(cap));
-
-		eax.split.version_id = min(cap.version, 2);
-		eax.split.num_counters = cap.num_counters_gp;
-		eax.split.bit_width = cap.bit_width_gp;
-		eax.split.mask_length = cap.events_mask_len;
-
-		edx.split.num_counters_fixed = min(cap.num_counters_fixed, MAX_FIXED_COUNTERS);
-		edx.split.bit_width_fixed = cap.bit_width_fixed;
+		edx.split.num_counters_fixed = kvm_pmu_cap.num_counters_fixed;
+		edx.split.bit_width_fixed = kvm_pmu_cap.bit_width_fixed;
 		edx.split.anythread_deprecated = 1;
 		edx.split.reserved1 = 0;
 		edx.split.reserved2 = 0;
 
 		entry->eax = eax.full;
-		entry->ebx = cap.events_mask;
+		entry->ebx = kvm_pmu_cap.events_mask;
 		entry->ecx = 0;
 		entry->edx = edx.full;
 		break;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 666a5e90a3cb..4798bf991b60 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -19,6 +19,9 @@
 #include "lapic.h"
 #include "pmu.h"
 
+struct x86_pmu_capability __read_mostly kvm_pmu_cap;
+EXPORT_SYMBOL_GPL(kvm_pmu_cap);
+
 /* This is enough to filter the vast majority of currently defined events. */
 #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
 
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index ef5b6ee8fdc7..832cf56e6924 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -160,6 +160,24 @@ static inline bool pmc_speculative_in_use(struct kvm_pmc *pmc)
 	return pmc->eventsel & ARCH_PERFMON_EVENTSEL_ENABLE;
 }
 
+extern struct x86_pmu_capability kvm_pmu_cap;
+
+static inline void kvm_init_pmu_capability(void)
+{
+	perf_get_x86_pmu_capability(&kvm_pmu_cap);
+
+	/*
+	 * Only support guest architectural pmu on
+	 * a host with architectural pmu.
+	 */
+	if (!kvm_pmu_cap.version)
+		memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
+
+	kvm_pmu_cap.version = min(kvm_pmu_cap.version, 2);
+	kvm_pmu_cap.num_counters_fixed = min(kvm_pmu_cap.num_counters_fixed,
+					     MAX_FIXED_COUNTERS);
+}
+
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
 void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
 void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
@@ -177,9 +195,11 @@ void kvm_pmu_init(struct kvm_vcpu *vcpu);
 void kvm_pmu_cleanup(struct kvm_vcpu *vcpu);
 void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_vm_ioctl_set_pmu_event_filter(struct kvm *kvm, void __user *argp);
+void kvm_init_pmu_capability(void);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 
 extern struct kvm_pmu_ops intel_pmu_ops;
 extern struct kvm_pmu_ops amd_pmu_ops;
+
 #endif /* __KVM_X86_PMU_H */
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 28152d7fd12d..d0610716675b 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -504,8 +504,6 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
-
-	struct x86_pmu_capability x86_pmu;
 	struct kvm_cpuid_entry2 *entry;
 	union cpuid10_eax eax;
 	union cpuid10_edx edx;
@@ -532,13 +530,14 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 		return;
 
 	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
-	perf_get_x86_pmu_capability(&x86_pmu);
 
 	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
-					 x86_pmu.num_counters_gp);
-	eax.split.bit_width = min_t(int, eax.split.bit_width, x86_pmu.bit_width_gp);
+					 kvm_pmu_cap.num_counters_gp);
+	eax.split.bit_width = min_t(int, eax.split.bit_width,
+				    kvm_pmu_cap.bit_width_gp);
 	pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << eax.split.bit_width) - 1;
-	eax.split.mask_length = min_t(int, eax.split.mask_length, x86_pmu.events_mask_len);
+	eax.split.mask_length = min_t(int, eax.split.mask_length,
+				      kvm_pmu_cap.events_mask_len);
 	pmu->available_event_types = ~entry->ebx &
 					((1ull << eax.split.mask_length) - 1);
 
@@ -547,9 +546,9 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 	} else {
 		pmu->nr_arch_fixed_counters =
 			min_t(int, edx.split.num_counters_fixed,
-			      x86_pmu.num_counters_fixed);
-		edx.split.bit_width_fixed = min_t(int,
-			edx.split.bit_width_fixed, x86_pmu.bit_width_fixed);
+			      kvm_pmu_cap.num_counters_fixed);
+		edx.split.bit_width_fixed = min_t(int, edx.split.bit_width_fixed,
+						  kvm_pmu_cap.bit_width_fixed);
 		pmu->counter_bitmask[KVM_PMC_FIXED] =
 			((u64)1 << edx.split.bit_width_fixed) - 1;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c1ab5bcf75cc..0a86a9f34dce 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5969,15 +5969,12 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 static void kvm_init_msr_list(void)
 {
-	struct x86_pmu_capability x86_pmu;
 	u32 dummy[2];
 	unsigned i;
 
 	BUILD_BUG_ON_MSG(INTEL_PMC_MAX_FIXED != 4,
 			 "Please update the fixed PMCs in msrs_to_saved_all[]");
 
-	perf_get_x86_pmu_capability(&x86_pmu);
-
 	num_msrs_to_save = 0;
 	num_emulated_msrs = 0;
 	num_msr_based_features = 0;
@@ -6029,12 +6026,12 @@ static void kvm_init_msr_list(void)
 			break;
 		case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
 			if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
-			    min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
+			    min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
 				continue;
 			break;
 		case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 17:
 			if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
-			    min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
+			    min(INTEL_PMC_MAX_GENERIC, kvm_pmu_cap.num_counters_gp))
 				continue;
 			break;
 		default:
@@ -10618,6 +10615,8 @@ int kvm_arch_hardware_setup(void *opaque)
 	if (boot_cpu_has(X86_FEATURE_XSAVES))
 		rdmsrl(MSR_IA32_XSS, host_xss);
 
+	kvm_init_pmu_capability();
+
 	r = ops->hardware_setup();
 	if (r != 0)
 		return r;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 15/16] KVM: x86/cpuid: Refactor host/guest CPU model consistency check
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (13 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 14/16] KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-11  2:42 ` [PATCH v6 16/16] KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64 Like Xu
  2021-05-15 10:30 ` [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Liuxiangdong
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

For the same purpose, the legacy intel_pmu_lbr_is_compatible() can be
renamed to cpuid_model_is_consistent() for reuse by more callers, and
the LBR-specific comment can be deleted along the way.
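
The renamed helper only compares x86 model numbers. For reference, the
model number comes out of CPUID leaf 1 EAX, with the extended model bits
folded in for family 6 and newer parts; a user-space sketch of that
decoding (GCC/Clang __get_cpuid builtin, not KVM code):

#include <stdio.h>
#include <cpuid.h>

static unsigned int x86_model(unsigned int eax)
{
        unsigned int family = (eax >> 8) & 0xf;
        unsigned int model = (eax >> 4) & 0xf;

        /* Extended model bits only apply to family 6 and above. */
        if (family >= 6)
                model |= ((eax >> 16) & 0xf) << 4;
        return model;
}

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                return 1;
        /* cpuid_model_is_consistent() boils down to comparing two of these. */
        printf("x86 model: 0x%x\n", x86_model(eax));
        return 0;
}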

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/cpuid.h         |  5 +++++
 arch/x86/kvm/vmx/pmu_intel.c | 12 +-----------
 arch/x86/kvm/vmx/vmx.c       |  2 +-
 arch/x86/kvm/vmx/vmx.h       |  1 -
 4 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index c99edfff7f82..439ce776b9a0 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -143,6 +143,11 @@ static inline int guest_cpuid_model(struct kvm_vcpu *vcpu)
 	return x86_model(best->eax);
 }
 
+static inline bool cpuid_model_is_consistent(struct kvm_vcpu *vcpu)
+{
+	return boot_cpu_data.x86_model == guest_cpuid_model(vcpu);
+}
+
 static inline int guest_cpuid_stepping(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index d0610716675b..a706d3597720 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -173,16 +173,6 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
 	return get_gp_pmc(pmu, msr, MSR_IA32_PMC0);
 }
 
-bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
-{
-	/*
-	 * As a first step, a guest could only enable LBR feature if its
-	 * cpu model is the same as the host because the LBR registers
-	 * would be pass-through to the guest and they're model specific.
-	 */
-	return boot_cpu_data.x86_model == guest_cpuid_model(vcpu);
-}
-
 bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
 {
 	struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
@@ -578,7 +568,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
 
 	nested_vmx_pmu_entry_exit_ctls_update(vcpu);
 
-	if (intel_pmu_lbr_is_compatible(vcpu))
+	if (cpuid_model_is_consistent(vcpu))
 		x86_perf_get_lbr(&lbr_desc->records);
 	else
 		lbr_desc->records.nr = 0;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e43d58020c75..e11efe9d2ff4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2306,7 +2306,7 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if ((data & PMU_CAP_LBR_FMT) !=
 			    (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
 				return 1;
-			if (!intel_pmu_lbr_is_compatible(vcpu))
+			if (!cpuid_model_is_consistent(vcpu))
 				return 1;
 		}
 		ret = kvm_set_msr_common(vcpu, msr_info);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 72f1175e474b..3afdcebb0a11 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -97,7 +97,6 @@ union vmx_exit_reason {
 #define vcpu_to_lbr_records(vcpu) (&to_vmx(vcpu)->lbr_desc.records)
 
 void intel_pmu_cross_mapped_check(struct kvm_pmu *pmu);
-bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu);
 bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu);
 
 int intel_pmu_create_guest_lbr_event(struct kvm_vcpu *vcpu);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH v6 16/16] KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (14 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 15/16] KVM: x86/cpuid: Refactor host/guest CPU model consistency check Like Xu
@ 2021-05-11  2:42 ` Like Xu
  2021-05-15 10:30 ` [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Liuxiangdong
  16 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-11  2:42 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu, Luwei Kang

The CPUID features PDCM, DS and DTES64 are required for the PEBS
feature. KVM exposes the CPUID feature bits PDCM, DS and DTES64 to the
guest when PEBS is supported by KVM on the Ice Lake server platforms.
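
From inside a guest, all three bits live in CPUID leaf 1: DS is EDX[21],
DTES64 is ECX[2] and PDCM is ECX[15]. A quick user-space probe (bit
positions per the SDM):

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                return 1;

        printf("DS     (EDX[21]): %u\n", (edx >> 21) & 1);
        printf("DTES64 (ECX[2]) : %u\n", (ecx >> 2) & 1);
        printf("PDCM   (ECX[15]): %u\n", (ecx >> 15) & 1);
        return 0;
}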

Originally-by: Andi Kleen <ak@linux.intel.com>
Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Co-developed-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/vmx/capabilities.h | 26 ++++++++++++++++++--------
 arch/x86/kvm/vmx/vmx.c          | 15 +++++++++++++++
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 8dee8a5fbc17..fd8c9822db9e 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -5,6 +5,7 @@
 #include <asm/vmx.h>
 
 #include "lapic.h"
+#include "pmu.h"
 
 extern bool __read_mostly enable_vpid;
 extern bool __read_mostly flexpriority_enabled;
@@ -378,20 +379,29 @@ static inline bool vmx_pt_mode_is_host_guest(void)
 	return pt_mode == PT_MODE_HOST_GUEST;
 }
 
-static inline u64 vmx_get_perf_capabilities(void)
+static inline bool vmx_pebs_supported(void)
 {
-	u64 perf_cap = 0;
-
-	if (boot_cpu_has(X86_FEATURE_PDCM))
-		rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
-
-	perf_cap &= PMU_CAP_LBR_FMT;
+	return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_vmx;
+}
 
+static inline u64 vmx_get_perf_capabilities(void)
+{
 	/*
 	 * Since counters are virtualized, KVM would support full
 	 * width counting unconditionally, even if the host lacks it.
 	 */
-	return PMU_CAP_FW_WRITES | perf_cap;
+	u64 perf_cap = PMU_CAP_FW_WRITES;
+	u64 host_perf_cap = 0;
+
+	if (boot_cpu_has(X86_FEATURE_PDCM))
+		rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
+
+	perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+
+	if (vmx_pebs_supported())
+		perf_cap |= host_perf_cap & PERF_CAP_PEBS_MASK;
+
+	return perf_cap;
 }
 
 static inline u64 vmx_supported_debugctl(void)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e11efe9d2ff4..8d7dcf0ce4a3 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2309,6 +2309,17 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			if (!cpuid_model_is_consistent(vcpu))
 				return 1;
 		}
+		if (data & PERF_CAP_PEBS_FORMAT) {
+			if ((data & PERF_CAP_PEBS_MASK) !=
+			    (vmx_get_perf_capabilities() & PERF_CAP_PEBS_MASK))
+				return 1;
+			if (!guest_cpuid_has(vcpu, X86_FEATURE_DS))
+				return 1;
+			if (!guest_cpuid_has(vcpu, X86_FEATURE_DTES64))
+				return 1;
+			if (!cpuid_model_is_consistent(vcpu))
+				return 1;
+		}
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
 
@@ -7338,6 +7349,10 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
 	if (vmx_pt_mode_is_host_guest())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
+	if (vmx_pebs_supported()) {
+		kvm_cpu_cap_check_and_set(X86_FEATURE_DS);
+		kvm_cpu_cap_check_and_set(X86_FEATURE_DTES64);
+	}
 
 	if (!enable_sgx) {
 		kvm_cpu_cap_clear(X86_FEATURE_SGX);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-11  2:42 ` [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled Like Xu
@ 2021-05-12  1:58   ` Venkatesh Srinivas
  2021-05-12  5:00     ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Venkatesh Srinivas @ 2021-05-12  1:58 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Borislav Petkov,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang, eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan

On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
> detect whether the processor supports performance monitoring facility.
>
> It depends on whether the PMU is enabled for the guest, and a software write
> operation to this available bit will be ignored.

Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
documented someplace?

Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>

> Cc: Yao Yuan <yuan.yao@intel.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/kvm/vmx/pmu_intel.c | 1 +
>  arch/x86/kvm/x86.c           | 1 +
>  2 files changed, 2 insertions(+)
>
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 9efc1a6b8693..d9dbebe03cae 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>  	if (!pmu->version)
>  		return;
>
> +	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>  	perf_get_x86_pmu_capability(&x86_pmu);
>
>  	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5bd550eaf683..abe3ea69078c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3211,6 +3211,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
> msr_data *msr_info)
>  		}
>  		break;
>  	case MSR_IA32_MISC_ENABLE:
> +		data &= ~MSR_IA32_MISC_ENABLE_EMON;
>  		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
> &&
>  		    ((vcpu->arch.ia32_misc_enable_msr ^ data) &
> MSR_IA32_MISC_ENABLE_MWAIT)) {
>  			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> --
> 2.31.1
>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-12  1:58   ` Venkatesh Srinivas
@ 2021-05-12  5:00     ` Xu, Like
  2021-05-12 15:18       ` Sean Christopherson
  0 siblings, 1 reply; 56+ messages in thread
From: Xu, Like @ 2021-05-12  5:00 UTC (permalink / raw)
  To: Venkatesh Srinivas
  Cc: Peter Zijlstra, Paolo Bonzini, Borislav Petkov,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang, eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan, Like Xu

Hi Venkatesh Srinivas,

On 2021/5/12 9:58, Venkatesh Srinivas wrote:
> On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
>> detect whether the processor supports performance monitoring facility.
>>
> > It depends on whether the PMU is enabled for the guest, and a software write
>> operation to this available bit will be ignored.
> Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
> documented someplace?

The bit[7] behavior of the real hardware on the native host is quite
suspicious.

To keep the semantics consistent and simple, we propose ignoring the
write operation in the virtualized world, since whether or not to
expose the PMU is configured by the hypervisor user space and not by
the guest side.

I assume your "reviewed-by" also points this out. Thanks.

>
> Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
>
>> Cc: Yao Yuan <yuan.yao@intel.com>
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> ---
>>   arch/x86/kvm/vmx/pmu_intel.c | 1 +
>>   arch/x86/kvm/x86.c           | 1 +
>>   2 files changed, 2 insertions(+)
>>
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 9efc1a6b8693..d9dbebe03cae 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>>   	if (!pmu->version)
>>   		return;
>>
>> +	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
>>   	perf_get_x86_pmu_capability(&x86_pmu);
>>
>>   	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 5bd550eaf683..abe3ea69078c 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3211,6 +3211,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
>> msr_data *msr_info)
>>   		}
>>   		break;
>>   	case MSR_IA32_MISC_ENABLE:
>> +		data &= ~MSR_IA32_MISC_ENABLE_EMON;
>>   		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
>> &&
>>   		    ((vcpu->arch.ia32_misc_enable_msr ^ data) &
>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>   			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>> --
>> 2.31.1
>>
>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
  2021-05-11  2:42 ` [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS Like Xu
@ 2021-05-12  5:16   ` Xu, Like
  2021-05-17 13:26   ` Peter Zijlstra
  1 sibling, 0 replies; 56+ messages in thread
From: Xu, Like @ 2021-05-12  5:16 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Like Xu

On 2021/5/11 10:42, Like Xu wrote:
> @@ -3908,7 +3911,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>   		.guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
>   	};
>   
> -	if (!x86_pmu.pebs)
> +	if (!pmu || !x86_pmu.pebs_vmx)
>   		return arr;
>   
>   	/*
> @@ -3931,6 +3934,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>   	if (!x86_pmu.pebs_vmx)
>   		return arr;
>   
> +	arr[(*nr)++] = (struct perf_guest_switch_msr){
> +		.msr = MSR_IA32_DS_AREA,
> +		.host = (unsigned long)ds,
> +		.guest = pmu->ds_area,
> +	};
> +
>   	arr[*nr] = (struct perf_guest_switch_msr){
>   		.msr = MSR_IA32_PEBS_ENABLE,
>   		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,

Sorry, this part should be:

@@ -3928,9 +3931,15 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
                 return arr;
         }

-       if (!x86_pmu.pebs_vmx)
+       if (!pmu || !x86_pmu.pebs_vmx)
                 return arr;

+       arr[(*nr)++] = (struct perf_guest_switch_msr){
+               .msr = MSR_IA32_DS_AREA,
+               .host = (unsigned long)ds,
+               .guest = pmu->ds_area,
+       };
+
         arr[*nr] = (struct perf_guest_switch_msr){
                 .msr = MSR_IA32_PEBS_ENABLE,
                 .host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-12  5:00     ` Xu, Like
@ 2021-05-12 15:18       ` Sean Christopherson
  2021-05-13  2:50         ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Sean Christopherson @ 2021-05-12 15:18 UTC (permalink / raw)
  To: Xu, Like
  Cc: Venkatesh Srinivas, Peter Zijlstra, Paolo Bonzini,
	Borislav Petkov, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang, eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan, Like Xu

On Wed, May 12, 2021, Xu, Like wrote:
> Hi Venkatesh Srinivas,
> 
> On 2021/5/12 9:58, Venkatesh Srinivas wrote:
> > On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
> > > On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
> > > detect whether the processor supports performance monitoring facility.
> > > 
> > > It depends on whether the PMU is enabled for the guest, and a software write
> > > operation to this available bit will be ignored.
> > Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
> > documented someplace?
> 
> The bit[7] behavior of the real hardware on the native host is quite
> suspicious.

Ugh.  Can you file an SDM bug to get the wording and accessibility updated?  The
current phrasing is a mess:

  Performance Monitoring Available (R)
  1 = Performance monitoring enabled.
  0 = Performance monitoring disabled.

The (R) is ambiguous because most other entries that are read-only use (RO), and
the "enabled vs. disabled" implies the bit is writable and really does control
the PMU.  But on my Haswell system, it's read-only.  Assuming the bit is supposed
to be a read-only "PMU supported bit", the SDM should be:

  Performance Monitoring Available (RO)
  1 = Performance monitoring supported.
  0 = Performance monitoring not supported.

And please update the changelog to explain the "why" of whatever the behavior
ends up being.  The "what" is obvious from the code.

> To keep the semantics consistent and simple, we propose ignoring write
> operation in the virtualized world, since whether or not to expose PMU is
> configured by the hypervisor user space and not by the guest side.

Making up our own architectural behavior because it's convenient is not a
good idea.

> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > index 9efc1a6b8693..d9dbebe03cae 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> > >   	if (!pmu->version)
> > >   		return;
> > > 
> > > +	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;

Hmm, normally I would say overwriting the guest's value is a bad idea, but if
the bit really is a read-only "PMU supported" bit, then this is the correct
behavior, albeit weird if userspace does a late CPUID update (though that's
weird no matter what).

> > >   	perf_get_x86_pmu_capability(&x86_pmu);
> > > 
> > >   	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 5bd550eaf683..abe3ea69078c 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3211,6 +3211,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
> > > msr_data *msr_info)
> > >   		}
> > >   		break;
> > >   	case MSR_IA32_MISC_ENABLE:
> > > +		data &= ~MSR_IA32_MISC_ENABLE_EMON;

However, this is not.  If it's a read-only bit, then toggling the bit should
cause a #GP.
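
For what it's worth, the two options can be modeled in a few lines; this
is a toy wrmsr handler to make the ignore-write vs. #GP semantics
explicit, not KVM's actual code path:

#include <stdio.h>
#include <stdint.h>

#define MISC_ENABLE_EMON (1ULL << 7) /* Performance Monitoring Available */

/* Toy wrmsr emulation: returns 0 on success, 1 to signal #GP. */
static int wrmsr_misc_enable(uint64_t *msr, uint64_t data, int strict)
{
        if ((data ^ *msr) & MISC_ENABLE_EMON) {
                if (strict)
                        return 1;                /* read-only bit toggled: #GP */
                data &= ~MISC_ENABLE_EMON;       /* ignore-write policy: */
                data |= *msr & MISC_ENABLE_EMON; /* keep the old bit value */
        }
        *msr = data;
        return 0;
}

int main(void)
{
        uint64_t msr = MISC_ENABLE_EMON;

        printf("ignore policy -> %s, bit7=%u\n",
               wrmsr_misc_enable(&msr, 0, 0) ? "#GP" : "ok",
               (unsigned int)((msr >> 7) & 1));

        msr = MISC_ENABLE_EMON;
        printf("strict policy -> %s, bit7=%u\n",
               wrmsr_misc_enable(&msr, 0, 1) ? "#GP" : "ok",
               (unsigned int)((msr >> 7) & 1));
        return 0;
}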

> > >   		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
> > > &&
> > >   		    ((vcpu->arch.ia32_misc_enable_msr ^ data) &
> > > MSR_IA32_MISC_ENABLE_MWAIT)) {
> > >   			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> > > --

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-12 15:18       ` Sean Christopherson
@ 2021-05-13  2:50         ` Xu, Like
  2021-05-17 18:43           ` Venkatesh Srinivas
  2021-05-17 21:16           ` Sean Christopherson
  0 siblings, 2 replies; 56+ messages in thread
From: Xu, Like @ 2021-05-13  2:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Venkatesh Srinivas, Peter Zijlstra, Paolo Bonzini,
	Borislav Petkov, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang, eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan, Like Xu

On 2021/5/12 23:18, Sean Christopherson wrote:
> On Wed, May 12, 2021, Xu, Like wrote:
>> Hi Venkatesh Srinivas,
>>
>> On 2021/5/12 9:58, Venkatesh Srinivas wrote:
>>> On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
>>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
>>>> detect whether the processor supports performance monitoring facility.
>>>>
>>>> It depends on whether the PMU is enabled for the guest, and a software write
>>>> operation to this available bit will be ignored.
>>> Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
>>> documented someplace?
>> The bit[7] behavior of the real hardware on the native host is quite
>> suspicious.
> Ugh.  Can you file an SDM bug to get the wording and accessibility updated?  The
> current phrasing is a mess:
>
>    Performance Monitoring Available (R)
>    1 = Performance monitoring enabled.
>    0 = Performance monitoring disabled.
>
> The (R) is ambiguous because most other entries that are read-only use (RO), and
> the "enabled vs. disabled" implies the bit is writable and really does control
> the PMU.  But on my Haswell system, it's read-only.

On your Haswell system, does it cause a #GP or is the write silently
ignored when you change this bit?

> Assuming the bit is supposed
> to be a read-only "PMU supported bit", the SDM should be:
>
>    Performance Monitoring Available (RO)
>    1 = Performance monitoring supported.
>    0 = Performance monitoring not supported.
>
> And please update the changelog to explain the "why" of whatever the behavior
> ends up being.  The "what" is obvious from the code.

Thanks for your "why" comment.

>
>> To keep the semantics consistent and simple, we propose ignoring write
>> operation in the virtualized world, since whether or not to expose PMU is
>> configured by the hypervisor user space and not by the guest side.
> Making up our own architectural behavior because it's convenient is not a
> good idea.

Sometimes we do change it.

For example, the scope of some MSRs may be "shared at the core level",
but we likely keep them as "thread level" variables in KVM out of
convenience.

>
>>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>>>> index 9efc1a6b8693..d9dbebe03cae 100644
>>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
>>>>    	if (!pmu->version)
>>>>    		return;
>>>>
>>>> +	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
> Hmm, normally I would say overwriting the guest's value is a bad idea, but if
> the bit really is a read-only "PMU supported" bit, then this is the correct
> behavior, albeit weird if userspace does a late CPUID update (though that's
> weird no matter what).
>
>>>>    	perf_get_x86_pmu_capability(&x86_pmu);
>>>>
>>>>    	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index 5bd550eaf683..abe3ea69078c 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -3211,6 +3211,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
>>>> msr_data *msr_info)
>>>>    		}
>>>>    		break;
>>>>    	case MSR_IA32_MISC_ENABLE:
>>>> +		data &= ~MSR_IA32_MISC_ENABLE_EMON;
> However, this is not.  If it's a read-only bit, then toggling the bit should
> cause a #GP.

The proposal here is to treat it as an unchangeable bit and not raise
a #GP if the guest changes it.

That may differ from the host behavior, but it avoids potential issues
if some guest code changes the bit while performance monitoring is in
use.

Does this make sense to you, or do you want to keep it strictly the
same as the host side?

>
>>>>    		if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
>>>> &&
>>>>    		    ((vcpu->arch.ia32_misc_enable_msr ^ data) &
>>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
>>>>    			if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
>>>> --


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
  2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
                   ` (15 preceding siblings ...)
  2021-05-11  2:42 ` [PATCH v6 16/16] KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64 Like Xu
@ 2021-05-15 10:30 ` Liuxiangdong
  2021-05-17  6:38   ` Like Xu
  16 siblings, 1 reply; 56+ messages in thread
From: Liuxiangdong @ 2021-05-15 10:30 UTC (permalink / raw)
  To: Like Xu, Peter Zijlstra, Paolo Bonzini
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, linux-kernel, x86, kvm, Fangyi (Eric),
	Xiexiangyou



On 2021/5/11 10:41, Like Xu wrote:
> A new kernel cycle has begun, and this version looks promising.
>
> The guest Precise Event Based Sampling (PEBS) feature can provide
> an architectural state of the instruction executed after the guest
> instruction that exactly caused the event. It needs new hardware
> facility only available on Intel Ice Lake Server platforms. This
> patch set enables the basic PEBS feature for KVM guests on ICX.
>
> We can use PEBS feature on the Linux guest like native:
>
>    # perf record -e instructions:ppp ./br_instr a
>    # perf record -c 100000 -e instructions:pp ./br_instr a

Hi, Like.
Has the qemu patch been modified?

https://lore.kernel.org/kvm/f4dcb068-2ddf-428f-50ad-39f65cad3710@intel.com/ ?


> To emulate guest PEBS facility for the above perf usages,
> we need to implement 2 code paths:
>
> 1) Fast path
>
> This is when the host assigned physical PMC has an identical index as
> the virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
> This path is used in most common use cases.
>
> 2) Slow path
>
> This is when the host assigned physical PMC has a different index
> from the virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0)
> In this case, KVM needs to rewrite the PEBS records to change the
> applicable counter indexes to the virtual PMC indexes, which would
> otherwise contain the physical counter index written by PEBS facility,
> and switch the counter reset values to the offset corresponding to
> the physical counter indexes in the DS data structure.
>
> The previous version [0] enables both fast path and slow path, which
> seems a bit more complex as the first step. In this patchset, we want
> to start with the fast path to get the basic guest PEBS enabled while
> keeping the slow path disabled. More focused discussion on the slow
> path [1] is planned to be put to another patchset in the next step.
>
> Compared to later versions in subsequent steps, the functionality
> to support host-guest PEBS both enabled and the functionality to
> emulate guest PEBS when the counter is cross-mapped are missing
> in this patch set (neither of these are typical scenarios).
>
> With the basic support, the guest can retrieve the correct PEBS
> information from its own PEBS records on the Ice Lake servers.
> And we expect it should work when migrating to another Ice Lake
> and no regression about host perf is expected.
>
> Here are the results of pebs test from guest/host for same workload:
>
> perf report on guest:
> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1473377250
> # Overhead  Command   Shared Object      Symbol
>    57.74%  br_instr  br_instr           [.] lfsr_cond
>    41.40%  br_instr  br_instr           [.] cmp_end
>     0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire
>
> perf report on host:
> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1462721386
> # Overhead  Command   Shared Object     Symbol
>    57.90%  br_instr  br_instr          [.] lfsr_cond
>    41.95%  br_instr  br_instr          [.] cmp_end
>     0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire
> >     Conclusion: the profiling results on the guest are similar to that on the host.
>
> A minimum guest kernel version may be v5.4 or a backport version
> that supports Icelake server PEBS.
>
> Please check more details in each commit and feel free to comment.
>
> Previous:
> https://lore.kernel.org/kvm/20210415032016.166201-1-like.xu@linux.intel.com/
>
> [0] https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
> [1] https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/
>
> V5 -> V6 Changelog:
> - Rebased on the latest kvm/queue tree;
> - Fix a git rebase issue (Liuxiangdong);
> - Adjust the patch sequence 06/07 for bisection (Liuxiangdong);
>
> Like Xu (16):
>    perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
>    perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
>    perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
>    KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
>    KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
>    KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
>    KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
>    KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
>    KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
>    KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
>    KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
>    KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
>    KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
>    KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
>    KVM: x86/cpuid: Refactor host/guest CPU model consistency check
>    KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
>
>   arch/x86/events/core.c            |   5 +-
>   arch/x86/events/intel/core.c      | 129 ++++++++++++++++++++++++------
>   arch/x86/events/perf_event.h      |   5 +-
>   arch/x86/include/asm/kvm_host.h   |  16 ++++
>   arch/x86/include/asm/msr-index.h  |   6 ++
>   arch/x86/include/asm/perf_event.h |   5 +-
>   arch/x86/kvm/cpuid.c              |  24 ++----
>   arch/x86/kvm/cpuid.h              |   5 ++
>   arch/x86/kvm/pmu.c                |  50 +++++++++---
>   arch/x86/kvm/pmu.h                |  38 +++++++++
>   arch/x86/kvm/vmx/capabilities.h   |  26 ++++--
>   arch/x86/kvm/vmx/pmu_intel.c      | 115 +++++++++++++++++++++-----
>   arch/x86/kvm/vmx/vmx.c            |  24 +++++-
>   arch/x86/kvm/vmx/vmx.h            |   2 +-
>   arch/x86/kvm/x86.c                |  14 ++--
>   15 files changed, 368 insertions(+), 96 deletions(-)
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
  2021-05-15 10:30 ` [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Liuxiangdong
@ 2021-05-17  6:38   ` Like Xu
  2021-05-18 12:23     ` Liuxiangdong
  0 siblings, 1 reply; 56+ messages in thread
From: Like Xu @ 2021-05-17  6:38 UTC (permalink / raw)
  To: Liuxiangdong
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, linux-kernel, x86, kvm, Fangyi (Eric),
	Xiexiangyou, Peter Zijlstra, Paolo Bonzini

Hi xiangdong,

On 2021/5/15 18:30, Liuxiangdong wrote:
> 
> 
> On 2021/5/11 10:41, Like Xu wrote:
>> A new kernel cycle has begun, and this version looks promising.
>>
>> The guest Precise Event Based Sampling (PEBS) feature can provide
>> an architectural state of the instruction executed after the guest
>> instruction that exactly caused the event. It needs new hardware
>> facility only available on Intel Ice Lake Server platforms. This
>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>
>> We can use PEBS feature on the Linux guest like native:
>>
>>    # perf record -e instructions:ppp ./br_instr a
>>    # perf record -c 100000 -e instructions:pp ./br_instr a
> 
> Hi, Like.
> Has the qemu patch been modified?
> 
> https://lore.kernel.org/kvm/f4dcb068-2ddf-428f-50ad-39f65cad3710@intel.com/ ?

I think the qemu part still works based on
609d7596524ab204ccd71ef42c9eee4c7c338ea4 (tag: v6.0.0).

When the LBR qemu patch receives the ACK from the maintainer,
I will submit the PEBS qemu support because their changes are very similar.

Please help review this version and
feel free to add your comments or "Reviewed-by".

Thanks,
Like Xu

> 
> 
>> To emulate guest PEBS facility for the above perf usages,
>> we need to implement 2 code paths:
>>
>> 1) Fast path
>>
>> This is when the host assigned physical PMC has an identical index as
>> the virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>> This path is used in most common use cases.
>>
>> 2) Slow path
>>
>> This is when the host assigned physical PMC has a different index
>> from the virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0)
>> In this case, KVM needs to rewrite the PEBS records to change the
>> applicable counter indexes to the virtual PMC indexes, which would
>> otherwise contain the physical counter index written by PEBS facility,
>> and switch the counter reset values to the offset corresponding to
>> the physical counter indexes in the DS data structure.
>>
>> The previous version [0] enables both fast path and slow path, which
>> seems a bit more complex as the first step. In this patchset, we want
>> to start with the fast path to get the basic guest PEBS enabled while
>> keeping the slow path disabled. More focused discussion on the slow
>> path [1] is planned to be put to another patchset in the next step.
>>
>> Compared to later versions in subsequent steps, the functionality
>> to support host-guest PEBS both enabled and the functionality to
>> emulate guest PEBS when the counter is cross-mapped are missing
>> in this patch set (neither of these are typical scenarios).
>>
>> With the basic support, the guest can retrieve the correct PEBS
>> information from its own PEBS records on the Ice Lake servers.
>> And we expect it should work when migrating to another Ice Lake
>> and no regression about host perf is expected.
>>
>> Here are the results of pebs test from guest/host for same workload:
>>
>> perf report on guest:
>> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1473377250
>> # Overhead  Command   Shared Object      Symbol
>>    57.74%  br_instr  br_instr           [.] lfsr_cond
>>    41.40%  br_instr  br_instr           [.] cmp_end
>>     0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire
>>
>> perf report on host:
>> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 1462721386
>> # Overhead  Command   Shared Object     Symbol
>>    57.90%  br_instr  br_instr          [.] lfsr_cond
>>    41.95%  br_instr  br_instr          [.] cmp_end
>>     0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire
>>     Conclusion: the profiling results on the guest are similar to that on the host.
>>
>> A minimum guest kernel version may be v5.4 or a backport version
>> that supports Icelake server PEBS.
>>
>> Please check more details in each commit and feel free to comment.
>>
>> Previous:
>> https://lore.kernel.org/kvm/20210415032016.166201-1-like.xu@linux.intel.com/
>>
>> [0] https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
>> [1] https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/
>>
>> V5 -> V6 Changelog:
>> - Rebased on the latest kvm/queue tree;
>> - Fix a git rebase issue (Liuxiangdong);
>> - Adjust the patch sequence 06/07 for bisection (Liuxiangdong);
>>
>> Like Xu (16):
>>    perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
>>    perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
>>    perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
>>    KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
>>    KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
>>    KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
>>    KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
>>    KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
>>    KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
>>    KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
>>    KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
>>    KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
>>    KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
>>    KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
>>    KVM: x86/cpuid: Refactor host/guest CPU model consistency check
>>    KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
>>
>>   arch/x86/events/core.c            |   5 +-
>>   arch/x86/events/intel/core.c      | 129 ++++++++++++++++++++++++------
>>   arch/x86/events/perf_event.h      |   5 +-
>>   arch/x86/include/asm/kvm_host.h   |  16 ++++
>>   arch/x86/include/asm/msr-index.h  |   6 ++
>>   arch/x86/include/asm/perf_event.h |   5 +-
>>   arch/x86/kvm/cpuid.c              |  24 ++----
>>   arch/x86/kvm/cpuid.h              |   5 ++
>>   arch/x86/kvm/pmu.c                |  50 +++++++++---
>>   arch/x86/kvm/pmu.h                |  38 +++++++++
>>   arch/x86/kvm/vmx/capabilities.h   |  26 ++++--
>>   arch/x86/kvm/vmx/pmu_intel.c      | 115 +++++++++++++++++++++-----
>>   arch/x86/kvm/vmx/vmx.c            |  24 +++++-
>>   arch/x86/kvm/vmx/vmx.h            |   2 +-
>>   arch/x86/kvm/x86.c                |  14 ++--
>>   15 files changed, 368 insertions(+), 96 deletions(-)
>>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
  2021-05-11  2:42 ` [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest Like Xu
@ 2021-05-17  8:16   ` Peter Zijlstra
  2021-05-18  7:38     ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-17  8:16 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm

On Tue, May 11, 2021 at 10:42:00AM +0800, Like Xu wrote:
> With PEBS virtualization, the guest PEBS records get delivered to the
> guest DS, and the host pmi handler uses perf_guest_cbs->is_in_guest()
> to distinguish whether the PMI comes from the guest code like Intel PT.
> 
> No matter how many guest PEBS counters are overflowed, only triggering
> one fake event is enough. The fake event causes the KVM PMI callback to
> be called, thereby injecting the PEBS overflow PMI into the guest.
> 
> KVM may inject the PMI with BUFFER_OVF set, even if the guest DS is
> empty. That should really be harmless. Thus guest PEBS handler would
> retrieve the correct information from its own PEBS records buffer.
> 
> Originally-by: Andi Kleen <ak@linux.intel.com>
> Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/events/intel/core.c | 40 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index b6e45ee10e16..092ecacf8345 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2780,6 +2780,43 @@ static void intel_pmu_reset(void)
>  	local_irq_restore(flags);
>  }
>  
> +/*
> + * We may be running with guest PEBS events created by KVM, and the
> + * PEBS records are logged into the guest's DS and invisible to host.
> + *
> + * In the case of guest PEBS overflow, we only trigger a fake event
> + * to emulate the PEBS overflow PMI for guest PEBS counters in KVM.
> + * The guest will then vm-entry and check the guest DS area to read
> + * the guest PEBS records.
> + *
> + * The contents and other behavior of the guest event do not matter.
> + */
> +static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
> +				      struct perf_sample_data *data)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	u64 guest_pebs_idxs = cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask;
> +	struct perf_event *event = NULL;
> +	int bit;
> +
> +	if (!x86_pmu.pebs_active || !guest_pebs_idxs)
> +		return;
> +
> +	for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs,
> +			 INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed) {
> +		event = cpuc->events[bit];
> +		if (!event->attr.precise_ip)
> +			continue;
> +
> +		perf_sample_data_init(data, 0, event->hw.last_period);
> +		if (perf_event_overflow(event, data, regs))
> +			x86_pmu_stop(event, 0);
> +
> +		/* Inject one fake event is enough. */
> +		break;
> +	}
> +}
> +
>  static int handle_pmi_common(struct pt_regs *regs, u64 status)
>  {
>  	struct perf_sample_data data;
> @@ -2831,6 +2868,9 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>  		u64 pebs_enabled = cpuc->pebs_enabled;
>  
>  		handled++;
> +		if (x86_pmu.pebs_vmx && perf_guest_cbs &&
> +		    perf_guest_cbs->is_in_guest())
> +			x86_pmu_handle_guest_pebs(regs, &data);
>  		x86_pmu.drain_pebs(regs, &data);
>  		status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
>  

I'm thinking you have your conditions in the wrong order; would it not
be much cheaper to first check: '!x86_pmu.pebs_active || !guest_pebs_idxs'
than to do that horrible indirect ->is_in_guest() call?

After all, if the guest doesn't have PEBS enabled, who cares if we're
currently in a guest or not.
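
That is, hoist the cheap cache-local tests in front of the indirect
call so the common case never pays for it; one possible shape of the
call site (untested sketch against the quoted hunk):

		/* Cheap local checks first; the indirect call only when they pass. */
		if (x86_pmu.pebs_vmx && x86_pmu.pebs_active &&
		    (cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask) &&
		    perf_guest_cbs && perf_guest_cbs->is_in_guest())
			x86_pmu_handle_guest_pebs(regs, &data);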

Also, something like the below perhaps (arm64 and xen need fixing up at
the very least) could make all that perf_guest_cbs stuff suck less.


---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8e509325c2c3..c8f8fb7c0536 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -90,6 +90,26 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);
  */
 DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
 
+DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
+DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
+DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
+
+void arch_perf_update_guest_cbs(void)
+{
+	static_call_update(x86_guest_state, (void *)&__static_call_return0);
+	static_call_update(x86_guest_get_ip, (void *)&__static_call_return0);
+	static_call_update(x86_guest_handle_intel_pt_intr, (void *)&__static_call_return0);
+
+	if (perf_guest_cbs && perf_guest_cbs->state)
+		static_call_update(x86_guest_state, perf_guest_cbs->state);
+
+	if (perf_guest_cbs && perf_guest_cbs->get_ip)
+		static_call_update(x86_guest_get_ip, perf_guest_cbs->get_ip);
+
+	if (perf_guest_cbs && perf_guest_cbs->handle_intel_pt_intr)
+		static_call_update(x86_guest_handle_intel_pt_intr, perf_guest_cbs->handle_intel_pt_intr);
+}
+
 u64 __read_mostly hw_cache_event_ids
 				[PERF_COUNT_HW_CACHE_MAX]
 				[PERF_COUNT_HW_CACHE_OP_MAX]
@@ -2736,7 +2756,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
 	struct unwind_state state;
 	unsigned long addr;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (static_call(x86_guest_state)()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2839,7 +2859,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
 	struct stack_frame frame;
 	const struct stack_frame __user *fp;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+	if (static_call(x86_guest_state)()) {
 		/* TODO: We don't support guest os callchain now */
 		return;
 	}
@@ -2916,18 +2936,21 @@ static unsigned long code_segment_base(struct pt_regs *regs)
 
 unsigned long perf_instruction_pointer(struct pt_regs *regs)
 {
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-		return perf_guest_cbs->get_guest_ip();
+	unsigned long ip = static_call(x86_guest_get_ip)();
+
+	if (likely(!ip))
+		ip = regs->ip + code_segment_base(regs);
 
-	return regs->ip + code_segment_base(regs);
+	return ip;
 }
 
 unsigned long perf_misc_flags(struct pt_regs *regs)
 {
+	unsigned int guest = static_call(x86_guest_state)();
 	int misc = 0;
 
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		if (perf_guest_cbs->is_user_mode())
+	if (guest) {
+		if (guest & PERF_GUEST_USER)
 			misc |= PERF_RECORD_MISC_GUEST_USER;
 		else
 			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2521d03de5e0..ac422c45f940 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2780,6 +2780,8 @@ static void intel_pmu_reset(void)
 	local_irq_restore(flags);
 }
 
+DECLARE_STATIC_CALL(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
+
 static int handle_pmi_common(struct pt_regs *regs, u64 status)
 {
 	struct perf_sample_data data;
@@ -2850,10 +2852,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	 */
 	if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
 		handled++;
-		if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
-			perf_guest_cbs->handle_intel_pt_intr))
-			perf_guest_cbs->handle_intel_pt_intr();
-		else
+		if (!static_call(x86_guest_handle_intel_pt_intr)())
 			intel_pt_interrupt();
 	}
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..2a24e615fa4a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1812,7 +1812,7 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
 int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
 void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);
 
-int kvm_is_in_guest(void);
+unsigned int kvm_guest_state(void);
 
 void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
 				     u32 size);
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 827886c12c16..2dcbd1b30004 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -87,7 +87,7 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
 		 * woken up. So we should wake it, but this is impossible from
 		 * NMI context. Do it from irq work instead.
 		 */
-		if (!kvm_is_in_guest())
+		if (!kvm_guest_state())
 			irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
 		else
 			kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbc4e04e67ad..88f709b3759c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8035,44 +8035,46 @@ static void kvm_timer_init(void)
 DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
 EXPORT_PER_CPU_SYMBOL_GPL(current_vcpu);
 
-int kvm_is_in_guest(void)
+static unsigned int kvm_guest_state(void)
 {
-	return __this_cpu_read(current_vcpu) != NULL;
-}
-
-static int kvm_is_user_mode(void)
-{
-	int user_mode = 3;
+	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
+	unsigned int state = 0;
 
-	if (__this_cpu_read(current_vcpu))
-		user_mode = static_call(kvm_x86_get_cpl)(__this_cpu_read(current_vcpu));
+	if (vcpu)
+		state |= PERF_GUEST_ACTIVE;
+	if (static_call(kvm_x86_get_cpl)(vcpu))
+		state |= PERF_GUEST_USER;
 
-	return user_mode != 0;
+	return state;
 }
 
-static unsigned long kvm_get_guest_ip(void)
+static unsigned long kvm_guest_get_ip(void)
 {
+	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
 	unsigned long ip = 0;
 
-	if (__this_cpu_read(current_vcpu))
-		ip = kvm_rip_read(__this_cpu_read(current_vcpu));
+	if (vcpu)
+		ip = kvm_rip_read(vcpu);
 
 	return ip;
 }
 
-static void kvm_handle_intel_pt_intr(void)
+static unsigned int kvm_handle_intel_pt_intr(void)
 {
 	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
 
+	if (!vcpu)
+		return 0;
+
 	kvm_make_request(KVM_REQ_PMI, vcpu);
 	__set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT,
 			(unsigned long *)&vcpu->arch.pmu.global_status);
+	return 1;
 }
 
 static struct perf_guest_info_callbacks kvm_guest_cbs = {
-	.is_in_guest		= kvm_is_in_guest,
-	.is_user_mode		= kvm_is_user_mode,
-	.get_guest_ip		= kvm_get_guest_ip,
+	.state			= kvm_guest_state,
+	.get_ip			= kvm_guest_get_ip,
 	.handle_intel_pt_intr	= kvm_handle_intel_pt_intr,
 };
 
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f5a6a2f069ed..7eae1fd22db3 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -26,11 +26,13 @@
 # include <asm/local64.h>
 #endif
 
+#define PERF_GUEST_ACTIVE	0x01
+#define PERF_GUEST_USER		0x02
+
 struct perf_guest_info_callbacks {
-	int				(*is_in_guest)(void);
-	int				(*is_user_mode)(void);
-	unsigned long			(*get_guest_ip)(void);
-	void				(*handle_intel_pt_intr)(void);
+	unsigned int			(*state)(void);
+	unsigned long			(*get_ip)(void);
+	unsigned int			(*handle_intel_pt_intr)(void);
 };
 
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
@@ -1237,6 +1239,8 @@ extern void perf_event_bpf_event(struct bpf_prog *prog,
 				 u16 flags);
 
 extern struct perf_guest_info_callbacks *perf_guest_cbs;
+extern void __weak arch_perf_update_guest_cbs(void);
+
 extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2e947a485898..aec531fc9c90 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6486,9 +6486,17 @@ static void perf_pending_event(struct irq_work *entry)
  */
 struct perf_guest_info_callbacks *perf_guest_cbs;
 
+void __weak arch_perf_update_guest_cbs(void)
+{
+}
+
 int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
 {
+	if (WARN_ON_ONCE(perf_guest_cbs))
+		return -EBUSY;
+
 	perf_guest_cbs = cbs;
+	arch_perf_update_guest_cbs();
 	return 0;
 }
 EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
  2021-05-11  2:42 ` [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter Like Xu
@ 2021-05-17  8:18   ` Peter Zijlstra
  2021-05-18  7:55     ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-17  8:18 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Luwei Kang

On Tue, May 11, 2021 at 10:42:03AM +0800, Like Xu wrote:
> The mask value of the fixed counter control register should be dynamically
> adjusted with the number of fixed counters. This patch introduces a
> variable that includes the reserved bits of fixed counter control
> registers. This is needed for later Ice Lake fixed counter changes.
> 
> Co-developed-by: Luwei Kang <luwei.kang@intel.com>
> Signed-off-by: Luwei Kang <luwei.kang@intel.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/include/asm/kvm_host.h | 1 +
>  arch/x86/kvm/vmx/pmu_intel.c    | 6 +++++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55efbacfc244..49b421bd3dd8 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -457,6 +457,7 @@ struct kvm_pmu {
>  	unsigned nr_arch_fixed_counters;
>  	unsigned available_event_types;
>  	u64 fixed_ctr_ctrl;
> +	u64 fixed_ctr_ctrl_mask;
>  	u64 global_ctrl;
>  	u64 global_status;
>  	u64 global_ovf_ctrl;
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index d9dbebe03cae..ac7fe714e6c1 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -400,7 +400,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_CORE_PERF_FIXED_CTR_CTRL:
>  		if (pmu->fixed_ctr_ctrl == data)
>  			return 0;
> -		if (!(data & 0xfffffffffffff444ull)) {
> +		if (!(data & pmu->fixed_ctr_ctrl_mask)) {

Don't we already have hardware with more than 3 fixed counters?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  2021-05-11  2:42 ` [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS Like Xu
@ 2021-05-17  8:32   ` Peter Zijlstra
  2021-05-18  8:44     ` Xu, Like
  2021-05-17  8:33   ` Peter Zijlstra
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-17  8:32 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Luwei Kang

On Tue, May 11, 2021 at 10:42:04AM +0800, Like Xu wrote:
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2f89fd599842..c791765f4761 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3898,31 +3898,49 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
>  	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
> +	u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
> +		cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);
> +
> +	*nr = 0;
> +	arr[(*nr)++] = (struct perf_guest_switch_msr){
> +		.msr = MSR_CORE_PERF_GLOBAL_CTRL,
> +		.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> +		.guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> +	};
>  
> +	if (!x86_pmu.pebs)
> +		return arr;
>  
> +	/*
> +	 * If PMU counter has PEBS enabled it is not enough to
> +	 * disable counter on a guest entry since PEBS memory
> +	 * write can overshoot guest entry and corrupt guest
> +	 * memory. Disabling PEBS solves the problem.
> +	 *
> +	 * Don't do this if the CPU already enforces it.
> +	 */
> +	if (x86_pmu.pebs_no_isolation) {
> +		arr[(*nr)++] = (struct perf_guest_switch_msr){
> +			.msr = MSR_IA32_PEBS_ENABLE,
> +			.host = cpuc->pebs_enabled,
> +			.guest = 0,
> +		};
> +		return arr;
>  	}
>  
> +	if (!x86_pmu.pebs_vmx)
> +		return arr;
> +
> +	arr[*nr] = (struct perf_guest_switch_msr){
> +		.msr = MSR_IA32_PEBS_ENABLE,
> +		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
> +		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
> +	};
> +
> +	/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
> +	arr[0].guest |= arr[*nr].guest;
> +
> +	++(*nr);
>  	return arr;
>  }

ISTR saying I was confused as heck by this function; I still don't see
clarifying comments :/

What's .host and .guest ?
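
Even a one-line comment per field would help; a sketch, going by how
KVM's atomic_switch_perf_msrs() consumes the array across
VM-entry/VM-exit:

	/*
	 * .msr:   MSR to switch atomically at VM-entry/VM-exit
	 * .host:  value the MSR should hold while the host runs
	 * .guest: value the MSR should hold while the guest runs
	 */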

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  2021-05-11  2:42 ` [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS Like Xu
  2021-05-17  8:32   ` Peter Zijlstra
@ 2021-05-17  8:33   ` Peter Zijlstra
  2021-05-18  8:13     ` Xu, Like
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-17  8:33 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Luwei Kang

On Tue, May 11, 2021 at 10:42:04AM +0800, Like Xu wrote:
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2f89fd599842..c791765f4761 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3898,31 +3898,49 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
>  	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
> +	u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
> +		cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);

> -	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
> -		arr[0].guest &= ~cpuc->pebs_enabled;
> -	else
> -		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
> -	*nr = 1;

Instead of endlessly mucking about with branches, do we want something
like this instead?

---
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 2521d03de5e0..bcfba11196c8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2819,10 +2819,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	 * counters from the GLOBAL_STATUS mask and we always process PEBS
 	 * events via drain_pebs().
 	 */
-	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
-		status &= ~cpuc->pebs_enabled;
-	else
-		status &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
+	status &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
 
 	/*
 	 * PEBS overflow sets bit 62 in the global status register
@@ -3862,10 +3859,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
 	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
 	arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
 	arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
-		arr[0].guest &= ~cpuc->pebs_enabled;
-	else
-		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
+	arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
 	*nr = 1;
 
 	if (x86_pmu.pebs && x86_pmu.pebs_no_isolation) {
@@ -5546,6 +5540,7 @@ __init int intel_pmu_init(void)
 	x86_pmu.events_mask_len		= eax.split.mask_length;
 
 	x86_pmu.max_pebs_events		= min_t(unsigned, MAX_PEBS_EVENTS, x86_pmu.num_counters);
+	x86_pmu.pebs_capable		= PEBS_COUNTER_MASK;
 
 	/*
 	 * Quirk: v2 perfmon does not report fixed-purpose events, so
@@ -5730,6 +5725,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_aliases = NULL;
 		x86_pmu.pebs_prec_dist = true;
 		x86_pmu.lbr_pt_coexist = true;
+		x86_pmu.pebs_capable = ~0ULL;
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_PEBS_ALL;
 		x86_pmu.get_event_constraints = glp_get_event_constraints;
@@ -6080,6 +6076,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_aliases = NULL;
 		x86_pmu.pebs_prec_dist = true;
 		x86_pmu.pebs_block = true;
+		x86_pmu.pebs_capable = ~0ULL;
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
 		x86_pmu.flags |= PMU_FL_PEBS_ALL;
@@ -6123,6 +6120,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_aliases = NULL;
 		x86_pmu.pebs_prec_dist = true;
 		x86_pmu.pebs_block = true;
+		x86_pmu.pebs_capable = ~0ULL;
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
 		x86_pmu.flags |= PMU_FL_PEBS_ALL;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 27fa85e7d4fd..6f3cf81ccb1b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -805,6 +805,7 @@ struct x86_pmu {
 	void		(*pebs_aliases)(struct perf_event *event);
 	unsigned long	large_pebs_flags;
 	u64		rtm_abort_event;
+	u64		pebs_capable;
 
 	/*
 	 * Intel LBR

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-11  2:42 ` [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter Like Xu
@ 2021-05-17  8:39   ` Peter Zijlstra
  2021-05-17 14:44     ` Andi Kleen
  2021-05-17  9:14   ` Peter Zijlstra
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-17  8:39 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm

On Tue, May 11, 2021 at 10:42:05AM +0800, Like Xu wrote:
> +	if (pebs) {
> +		/*
> +		 * The non-zero precision level of guest event makes the ordinary
> +		 * guest event becomes a guest PEBS event and triggers the host
> +		 * PEBS PMI handler to determine whether the PEBS overflow PMI
> +		 * comes from the host counters or the guest.
> +		 *
> +		 * For most PEBS hardware events, the difference in the software
> +		 * precision levels of guest and host PEBS events will not affect
> +		 * the accuracy of the PEBS profiling result, because the "event IP"
> +		 * in the PEBS record is calibrated on the guest side.
> +		 */
> +		attr.precise_ip = 1;
> +	}

You've just destroyed precdist, no?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-11  2:42 ` [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter Like Xu
  2021-05-17  8:39   ` Peter Zijlstra
@ 2021-05-17  9:14   ` Peter Zijlstra
  2021-05-18 13:28     ` Xu, Like
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-17  9:14 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm

On Tue, May 11, 2021 at 10:42:05AM +0800, Like Xu wrote:
> @@ -99,6 +109,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>  				  bool exclude_kernel, bool intr,
>  				  bool in_tx, bool in_tx_cp)
>  {
> +	struct kvm_pmu *pmu = vcpu_to_pmu(pmc->vcpu);
>  	struct perf_event *event;
>  	struct perf_event_attr attr = {
>  		.type = type,
> @@ -110,6 +121,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>  		.exclude_kernel = exclude_kernel,
>  		.config = config,
>  	};
> +	bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
>  
>  	attr.sample_period = get_sample_period(pmc, pmc->counter);
>  
> @@ -124,9 +136,23 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>  		attr.sample_period = 0;
>  		attr.config |= HSW_IN_TX_CHECKPOINTED;
>  	}
> +	if (pebs) {
> +		/*
> +		 * The non-zero precision level of guest event makes the ordinary
> +		 * guest event becomes a guest PEBS event and triggers the host
> +		 * PEBS PMI handler to determine whether the PEBS overflow PMI
> +		 * comes from the host counters or the guest.
> +		 *
> +		 * For most PEBS hardware events, the difference in the software
> +		 * precision levels of guest and host PEBS events will not affect
> +		 * the accuracy of the PEBS profiling result, because the "event IP"
> +		 * in the PEBS record is calibrated on the guest side.
> +		 */
> +		attr.precise_ip = 1;
> +	}
>  
>  	event = perf_event_create_kernel_counter(&attr, -1, current,
> -						 intr ? kvm_perf_overflow_intr :
> +						 (intr || pebs) ? kvm_perf_overflow_intr :
>  						 kvm_perf_overflow, pmc);

How would pebs && !intr be possible? Also, wouldn't this be more legible
when written like:

	perf_overflow_handler_t ovf = kvm_perf_overflow;

	...

	if (intr)
		ovf = kvm_perf_overflow_intr;

	...

	event = perf_event_create_kernel_counter(&attr, -1, current, ovf, pmc);


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
  2021-05-11  2:42 ` [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS Like Xu
  2021-05-12  5:16   ` Xu, Like
@ 2021-05-17 13:26   ` Peter Zijlstra
  2021-05-17 14:50     ` Andi Kleen
  1 sibling, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-17 13:26 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm

On Tue, May 11, 2021 at 10:42:06AM +0800, Like Xu wrote:
> @@ -3897,6 +3898,8 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>  {
>  	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>  	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
> +	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
> +	struct kvm_pmu *pmu = (struct kvm_pmu *)data;

You can do without the cast, this is C, 'void *' silently casts to any
other pointer type.
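
I.e. a plain:

	struct kvm_pmu *pmu = data;	/* implicit void * conversion */

is enough.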

>  	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
>  	u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
>  		cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);

> @@ -3931,6 +3934,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>  	if (!x86_pmu.pebs_vmx)
>  		return arr;
>  
> +	arr[(*nr)++] = (struct perf_guest_switch_msr){
> +		.msr = MSR_IA32_DS_AREA,
> +		.host = (unsigned long)ds,

Using:
		(unsigned long)cpuc->ds;

was too complicated? :-)

> +		.guest = pmu->ds_area,
> +	};
> +
>  	arr[*nr] = (struct perf_guest_switch_msr){
>  		.msr = MSR_IA32_PEBS_ENABLE,
>  		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-17  8:39   ` Peter Zijlstra
@ 2021-05-17 14:44     ` Andi Kleen
  2021-05-18  8:47       ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Andi Kleen @ 2021-05-17 14:44 UTC (permalink / raw)
  To: Peter Zijlstra, Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm


On 5/17/2021 1:39 AM, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 10:42:05AM +0800, Like Xu wrote:
>> +	if (pebs) {
>> +		/*
>> +		 * The non-zero precision level of guest event makes the ordinary
>> +		 * guest event becomes a guest PEBS event and triggers the host
>> +		 * PEBS PMI handler to determine whether the PEBS overflow PMI
>> +		 * comes from the host counters or the guest.
>> +		 *
>> +		 * For most PEBS hardware events, the difference in the software
>> +		 * precision levels of guest and host PEBS events will not affect
>> +		 * the accuracy of the PEBS profiling result, because the "event IP"
>> +		 * in the PEBS record is calibrated on the guest side.
>> +		 */
>> +		attr.precise_ip = 1;
>> +	}
> You've just destroyed precdist, no?

precdist can mean multiple things:

- Convert cycles to the precise INST_RETIRED event. That is not 
meaningful for virtualization because "cycles" doesn't exist, just the 
raw events.

- For GLC+ and TNT+ it will force the event to a specific counter that 
is more precise. This would indeed be "destroyed", but right now the
patch kit only supports Ice Lake, which doesn't support that anyway.

So I think the code is correct for now, but will need to be changed for 
later CPUs. Should perhaps fix the comment though to discuss this.
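
Perhaps something along these lines (just a sketch):

	/*
	 * precise_ip = 1 is enough to mark this as a guest PEBS event;
	 * higher precdist levels only matter on parts (GLC+/TNT+) that
	 * steer precise events to a specific, more precise counter,
	 * which this Ice-Lake-only series does not support yet.
	 */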


-Andi



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
  2021-05-17 13:26   ` Peter Zijlstra
@ 2021-05-17 14:50     ` Andi Kleen
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2021-05-17 14:50 UTC (permalink / raw)
  To: Peter Zijlstra, Like Xu
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm


On 5/17/2021 6:26 AM, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 10:42:06AM +0800, Like Xu wrote:
>> @@ -3897,6 +3898,8 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>>   {
>>   	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>   	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
>> +	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
>> +	struct kvm_pmu *pmu = (struct kvm_pmu *)data;
> You can do without the cast, this is C, 'void *' silently casts to any
> other pointer type.

FWIW, doing C++-like casts for void * is fairly standard C coding
style. I generally prefer it too, for better documentation. K&R is
written this way.

-Andi (my last email on this topic to avoid any bike shedding)



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-13  2:50         ` Xu, Like
@ 2021-05-17 18:43           ` Venkatesh Srinivas
  2021-05-17 21:19             ` Sean Christopherson
  2021-05-17 21:16           ` Sean Christopherson
  1 sibling, 1 reply; 56+ messages in thread
From: Venkatesh Srinivas @ 2021-05-17 18:43 UTC (permalink / raw)
  To: Xu, Like
  Cc: Sean Christopherson, Peter Zijlstra, Paolo Bonzini,
	Borislav Petkov, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang,
	Stephane Eranian, liuxiangdong5, linux-kernel, x86, kvm,
	Yao Yuan, Like Xu

On Wed, May 12, 2021 at 7:50 PM Xu, Like <like.xu@intel.com> wrote:
>
> On 2021/5/12 23:18, Sean Christopherson wrote:
> > On Wed, May 12, 2021, Xu, Like wrote:
> >> Hi Venkatesh Srinivas,
> >>
> >> On 2021/5/12 9:58, Venkatesh Srinivas wrote:
> >>> On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
> >>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
> >>>> detect whether the processor supports the performance monitoring facility.
> >>>>
> >>>> It depends on whether the PMU is enabled for the guest, and a software write
> >>>> operation to this available bit will be ignored.
> >>> Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
> >>> documented someplace?
> >> The bit[7] behavior of the real hardware on the native host is quite
> >> suspicious.
> > Ugh.  Can you file an SDM bug to get the wording and accessibility updated?  The
> > current phrasing is a mess:
> >
> >    Performance Monitoring Available (R)
> >    1 = Performance monitoring enabled.
> >    0 = Performance monitoring disabled.
> >
> > The (R) is ambiguous because most other entries that are read-only use (RO), and
> > the "enabled vs. disabled" implies the bit is writable and really does control
> > the PMU.  But on my Haswell system, it's read-only.
>
> On your Haswell system, does it cause a #GP or stay silent if you change this
> bit?
>
> > Assuming the bit is supposed
> > to be a read-only "PMU supported bit", the SDM should be:
> >
> >    Performance Monitoring Available (RO)
> >    1 = Performance monitoring supported.
> >    0 = Performance monitoring not supported.

Can't speak to Haswell, but on Apollo Lake/Goldmont, this bit is _not_ set
natively and we get a #GP when attempting to set it, even though the PMU
is available.

Should this bit be conditional on the host having it set?

> >
> > And please update the changelog to explain the "why" of whatever the behavior
> > ends up being.  The "what" is obvious from the code.
>
> Thanks for your "why" comment.
>
> >
> >> To keep the semantics consistent and simple, we propose ignoring write
> >> operations in the virtualized world, since whether or not to expose the PMU
> >> is configured by the hypervisor user space and not by the guest side.
> > Making up our own architectural behavior because it's convient is not a good
> > idea.
>
> Sometimes we do change it.
>
> For example, the scope of some MSRs may be "core level shared",
> but we likely keep it as a "thread level" variable in KVM out of
> convenience.
>
> >
> >>>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> >>>> index 9efc1a6b8693..d9dbebe03cae 100644
> >>>> --- a/arch/x86/kvm/vmx/pmu_intel.c
> >>>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> >>>> @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> >>>>            if (!pmu->version)
> >>>>                    return;
> >>>>
> >>>> +  vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
> > Hmm, normally I would say overwriting the guest's value is a bad idea, but if
> > the bit really is a read-only "PMU supported" bit, then this is the correct
> > behavior, albeit weird if userspace does a late CPUID update (though that's
> > weird no matter what).
> >
> >>>>            perf_get_x86_pmu_capability(&x86_pmu);
> >>>>
> >>>>            pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index 5bd550eaf683..abe3ea69078c 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -3211,6 +3211,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
> >>>> msr_data *msr_info)
> >>>>                    }
> >>>>                    break;
> >>>>            case MSR_IA32_MISC_ENABLE:
> >>>> +          data &= ~MSR_IA32_MISC_ENABLE_EMON;
> > However, this is not.  If it's a read-only bit, then toggling the bit should
> > cause a #GP.
>
> The proposal here is to make it an unchangeable bit
> and not raise a #GP if the guest changes it.
>
> It may differ from the host behavior, but
> it doesn't cause potential issues if some guest code
> changes it during the use of performance monitoring.
>
> Does this make sense to you or do you want to
> keep it strictly the same as the host side?
>
> >
> >>>>                    if (!kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT)
> >>>> &&
> >>>>                        ((vcpu->arch.ia32_misc_enable_msr ^ data) &
> >>>> MSR_IA32_MISC_ENABLE_MWAIT)) {
> >>>>                            if (!guest_cpuid_has(vcpu, X86_FEATURE_XMM3))
> >>>> --
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-13  2:50         ` Xu, Like
  2021-05-17 18:43           ` Venkatesh Srinivas
@ 2021-05-17 21:16           ` Sean Christopherson
  2021-05-17 23:51             ` Sean Christopherson
  1 sibling, 1 reply; 56+ messages in thread
From: Sean Christopherson @ 2021-05-17 21:16 UTC (permalink / raw)
  To: Xu, Like
  Cc: Venkatesh Srinivas, Peter Zijlstra, Paolo Bonzini,
	Borislav Petkov, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang, eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan, Like Xu

On Thu, May 13, 2021, Xu, Like wrote:
> On 2021/5/12 23:18, Sean Christopherson wrote:
> > On Wed, May 12, 2021, Xu, Like wrote:
> > > Hi Venkatesh Srinivas,
> > > 
> > > On 2021/5/12 9:58, Venkatesh Srinivas wrote:
> > > > On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
> > > > > On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
> > > > > detect whether the processor supports the performance monitoring facility.
> > > > > 
> > > > > It depends on whether the PMU is enabled for the guest, and a software write
> > > > > operation to this available bit will be ignored.
> > > > Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
> > > > documented someplace?
> > > The bit[7] behavior of the real hardware on the native host is quite
> > > suspicious.
> > Ugh.  Can you file an SDM bug to get the wording and accessibility updated?  The
> > current phrasing is a mess:
> > 
> >    Performance Monitoring Available (R)
> >    1 = Performance monitoring enabled.
> >    0 = Performance monitoring disabled.
> > 
> > The (R) is ambiguous because most other entries that are read-only use (RO), and
> > the "enabled vs. disabled" implies the bit is writable and really does control
> > the PMU.  But on my Haswell system, it's read-only.
> 
> On your Haswell system, does it cause a #GP or stay silent if you change this
> bit?

Attempting to clear the bit generates a #GP.

> > Assuming the bit is supposed
> > to be a read-only "PMU supported bit", the SDM should be:
> > 
> >    Performance Monitoring Available (RO)
> >    1 = Performance monitoring supported.
> >    0 = Performance monitoring not supported.
> > 
> > And please update the changelog to explain the "why" of whatever the behavior
> > ends up being.  The "what" is obvious from the code.
> 
> Thanks for your "why" comment.
> 
> > 
> > > To keep the semantics consistent and simple, we propose ignoring write
> > > operations in the virtualized world, since whether or not to expose the PMU
> > > is configured by the hypervisor user space and not by the guest side.
> > Making up our own architectural behavior because it's convient is not a good
> > idea.
> 
> Sometimes we do change it.
> 
> For example, the scope of some MSRs may be "core level shared",
> but we likely keep it as a "thread level" variable in KVM out of
> convenience.

Thread vs. core scope is not architectural behavior.  Maybe you could argue that
it is for architectural MSRs, but even that is tenuous, e.g. SPEC_CTRL has this:

  The MSR bits are defined as logical processor scope. On some core
  implementations, the bits may impact sibling logical processors on the same core.

Regardless, the flaws of an inaccurate virtual CPU topology are well known, and
are a far cry from directly violating the SDM (assuming the SDM is fixed...).

> > > > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > > > index 9efc1a6b8693..d9dbebe03cae 100644
> > > > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > > > @@ -488,6 +488,7 @@ static void intel_pmu_refresh(struct kvm_vcpu *vcpu)
> > > > >    	if (!pmu->version)
> > > > >    		return;
> > > > > 
> > > > > +	vcpu->arch.ia32_misc_enable_msr |= MSR_IA32_MISC_ENABLE_EMON;
> > Hmm, normally I would say overwriting the guest's value is a bad idea, but if
> > the bit really is a read-only "PMU supported" bit, then this is the correct
> > behavior, albeit weird if userspace does a late CPUID update (though that's
> > weird no matter what).
> > 
> > > > >    	perf_get_x86_pmu_capability(&x86_pmu);
> > > > > 
> > > > >    	pmu->nr_arch_gp_counters = min_t(int, eax.split.num_counters,
> > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > index 5bd550eaf683..abe3ea69078c 100644
> > > > > --- a/arch/x86/kvm/x86.c
> > > > > +++ b/arch/x86/kvm/x86.c
> > > > > @@ -3211,6 +3211,7 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct
> > > > > msr_data *msr_info)
> > > > >    		}
> > > > >    		break;
> > > > >    	case MSR_IA32_MISC_ENABLE:
> > > > > +		data &= ~MSR_IA32_MISC_ENABLE_EMON;
> > However, this is not.  If it's a read-only bit, then toggling the bit should
> > cause a #GP.
> 
> The proposal here is to make it an unchangeable bit and not raise
> a #GP if the guest changes it.
> 
> It may differ from the host behavior, but it doesn't cause potential issues
> if some guest code changes it during the use of performance monitoring.
> 
> Does this make sense to you or do you want to keep it strictly the same as
> the host side?

Strictly the same as bare metal.  I don't see any reason to eat writes from the
guest.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-17 18:43           ` Venkatesh Srinivas
@ 2021-05-17 21:19             ` Sean Christopherson
  0 siblings, 0 replies; 56+ messages in thread
From: Sean Christopherson @ 2021-05-17 21:19 UTC (permalink / raw)
  To: Venkatesh Srinivas
  Cc: Xu, Like, Peter Zijlstra, Paolo Bonzini, Borislav Petkov,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, Stephane Eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan, Like Xu

On Mon, May 17, 2021, Venkatesh Srinivas wrote:
> Should this bit be conditional on the host having it set?

No need, KVM advertises the architectural PMU to userspace iff hardware itself
has an architectural PMU.  Userspace is free to lie to its guests so long as doing
so doesn't put KVM at risk.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-17 21:16           ` Sean Christopherson
@ 2021-05-17 23:51             ` Sean Christopherson
  2021-05-18  7:49               ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Sean Christopherson @ 2021-05-17 23:51 UTC (permalink / raw)
  To: Xu, Like
  Cc: Venkatesh Srinivas, Peter Zijlstra, Paolo Bonzini,
	Borislav Petkov, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang, eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan, Like Xu

On Mon, May 17, 2021, Sean Christopherson wrote:
> On Thu, May 13, 2021, Xu, Like wrote:
> > On 2021/5/12 23:18, Sean Christopherson wrote:
> > > On Wed, May 12, 2021, Xu, Like wrote:
> > > > Hi Venkatesh Srinivas,
> > > > 
> > > > On 2021/5/12 9:58, Venkatesh Srinivas wrote:
> > > > > On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
> > > > > > On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
> > > > > > detect whether the processor supports the performance monitoring facility.
> > > > > > 
> > > > > > It depends on whether the PMU is enabled for the guest, and a software write
> > > > > > operation to this available bit will be ignored.
> > > > > Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
> > > > > documented someplace?
> > > > The bit[7] behavior of the real hardware on the native host is quite
> > > > suspicious.
> > > Ugh.  Can you file an SDM bug to get the wording and accessibility updated?  The
> > > current phrasing is a mess:
> > > 
> > >    Performance Monitoring Available (R)
> > >    1 = Performance monitoring enabled.
> > >    0 = Performance monitoring disabled.
> > > 
> > > The (R) is ambiguous because most other entries that are read-only use (RO), and
> > > the "enabled vs. disabled" implies the bit is writable and really does control
> > > the PMU.  But on my Haswell system, it's read-only.
> > 
> > On your Haswell system, does it cause a #GP or stay silent if you change this
> > bit?
> 
> Attempting to clear the bit generates a #GP.

*sigh*

Venkatesh and I are exhausting our brown paper bag supply.

Attempting to clear bit 7 is ignored on both Haswell and Goldmont.  There is _no_ #GP;
the toggle is simply ignored.  I forgot to specify hex format (multiple times),
and Venkatesh accessed the wrong MSR (0x10a instead of 0x1a0).
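
For anyone retracing this with msr-tools (hex throughout; 0x1a0 is
IA32_MISC_ENABLE, bit 7 the PMU-available bit), something like:

  # rdmsr 0x1a0            # e.g. 850089 -- bit 7 set
  # wrmsr 0x1a0 0x850009   # try to clear bit 7
  # rdmsr 0x1a0            # still 850089; the write is silently ignored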

So your proposal to ignore the toggle in KVM is the way to go, but please
document in the changelog that that behavior matches bare metal.

It would be nice to get the SDM cleaned up to use "supported/unsupported", and to
pick one of (R), (RO), and (R/O) for all MSR entries for consistency, but that
may be a pipe dream.

Sorry for the run-around :-/

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
  2021-05-17  8:16   ` Peter Zijlstra
@ 2021-05-18  7:38     ` Xu, Like
  2021-05-18  8:37       ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Xu, Like @ 2021-05-18  7:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On 2021/5/17 16:16, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 10:42:00AM +0800, Like Xu wrote:
>> With PEBS virtualization, the guest PEBS records get delivered to the
>> guest DS, and the host PMI handler uses perf_guest_cbs->is_in_guest()
>> to distinguish whether the PMI comes from guest code, as Intel PT does.
>>
>> No matter how many guest PEBS counters have overflowed, triggering only
>> one fake event is enough. The fake event causes the KVM PMI callback to
>> be called, thereby injecting the PEBS overflow PMI into the guest.
>>
>> KVM may inject the PMI with BUFFER_OVF set, even if the guest DS is
>> empty. That should really be harmless. Thus the guest PEBS handler would
>> retrieve the correct information from its own PEBS records buffer.
>>
>> Originally-by: Andi Kleen <ak@linux.intel.com>
>> Co-developed-by: Kan Liang <kan.liang@linux.intel.com>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> ---
>>   arch/x86/events/intel/core.c | 40 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 40 insertions(+)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index b6e45ee10e16..092ecacf8345 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -2780,6 +2780,43 @@ static void intel_pmu_reset(void)
>>   	local_irq_restore(flags);
>>   }
>>   
>> +/*
>> + * We may be running with guest PEBS events created by KVM, and the
>> + * PEBS records are logged into the guest's DS and invisible to host.
>> + *
>> + * In the case of guest PEBS overflow, we only trigger a fake event
>> + * to emulate the PEBS overflow PMI for guest PEBS counters in KVM.
>> + * The guest will then vm-entry and check the guest DS area to read
>> + * the guest PEBS records.
>> + *
>> + * The contents and other behavior of the guest event do not matter.
>> + */
>> +static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
>> +				      struct perf_sample_data *data)
>> +{
>> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>> +	u64 guest_pebs_idxs = cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask;
>> +	struct perf_event *event = NULL;
>> +	int bit;
>> +
>> +	if (!x86_pmu.pebs_active || !guest_pebs_idxs)
>> +		return;
>> +
>> +	for_each_set_bit(bit, (unsigned long *)&guest_pebs_idxs,
>> +			 INTEL_PMC_IDX_FIXED + x86_pmu.num_counters_fixed) {
>> +		event = cpuc->events[bit];
>> +		if (!event->attr.precise_ip)
>> +			continue;
>> +
>> +		perf_sample_data_init(data, 0, event->hw.last_period);
>> +		if (perf_event_overflow(event, data, regs))
>> +			x86_pmu_stop(event, 0);
>> +
>> +		/* Injecting one fake event is enough. */
>> +		break;
>> +	}
>> +}
>> +
>>   static int handle_pmi_common(struct pt_regs *regs, u64 status)
>>   {
>>   	struct perf_sample_data data;
>> @@ -2831,6 +2868,9 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>>   		u64 pebs_enabled = cpuc->pebs_enabled;
>>   
>>   		handled++;
>> +		if (x86_pmu.pebs_vmx && perf_guest_cbs &&
>> +		    perf_guest_cbs->is_in_guest())
>> +			x86_pmu_handle_guest_pebs(regs, &data);
>>   		x86_pmu.drain_pebs(regs, &data);
>>   		status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
>>   
> I'm thinking you have your conditions in the wrong order; would it not
> be much cheaper to first check: '!x86_pmu.pebs_active || !guest_pebs_idxs'
> than to do that horrible indirect ->is_in_guest() call?
>
> After all, if the guest doesn't have PEBS enabled, who cares if we're
> currently in a guest or not.

Yes, it makes sense. How about:

@@ -2833,6 +2867,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
                 u64 pebs_enabled = cpuc->pebs_enabled;

                 handled++;
+               if (x86_pmu.pebs_vmx && x86_pmu.pebs_active &&
+                   (cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask) &&
+                   (static_call(x86_guest_state)() & PERF_GUEST_ACTIVE))
+                       x86_pmu_handle_guest_pebs(regs, &data);
                 x86_pmu.drain_pebs(regs, &data);
                 status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;

>
> Also, something like the below perhaps (arm64 and xen need fixing up at
> the very least) could make all that perf_guest_cbs stuff suck less.

How about this commit message for your patch below:

From: "Peter Zijlstra (Intel)" <peterz@infradead.org>

x86/core: Use static_call to rewrite perf_guest_info_callbacks

The two fields named "is_in_guest" and "is_user_mode" in
perf_guest_info_callbacks are replaced with a new multiplexed member
named "state", and the "get_guest_ip" field will be renamed to "get_ip".

The application of DEFINE_STATIC_CALL_RET0 (arm64 and xen need fixing
up at the very least) could make all that perf_guest_cbs stuff suck less.
For KVM, these callbacks are updated in kvm_arch_init().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

----

I'm not sure if you have a strong reason to violate the checkpatch rule:

ERROR: Using weak declarations can have unintended link defects
#238: FILE: include/linux/perf_event.h:1242:
+extern void __weak arch_perf_update_guest_cbs(void);

?

>
> ---
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 8e509325c2c3..c8f8fb7c0536 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -90,6 +90,26 @@ DEFINE_STATIC_CALL_NULL(x86_pmu_pebs_aliases, *x86_pmu.pebs_aliases);
>    */
>   DEFINE_STATIC_CALL_RET0(x86_pmu_guest_get_msrs, *x86_pmu.guest_get_msrs);
>   
> +DEFINE_STATIC_CALL_RET0(x86_guest_state, *(perf_guest_cbs->state));
> +DEFINE_STATIC_CALL_RET0(x86_guest_get_ip, *(perf_guest_cbs->get_ip));
> +DEFINE_STATIC_CALL_RET0(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
> +
> +void arch_perf_update_guest_cbs(void)
> +{
> +	static_call_update(x86_guest_state, (void *)&__static_call_return0);
> +	static_call_update(x86_guest_get_ip, (void *)&__static_call_return0);
> +	static_call_update(x86_guest_handle_intel_pt_intr, (void *)&__static_call_return0);
> +
> +	if (perf_guest_cbs && perf_guest_cbs->state)
> +		static_call_update(x86_guest_state, perf_guest_cbs->state);
> +
> +	if (perf_guest_cbs && perf_guest_cbs->get_ip)
> +		static_call_update(x86_guest_get_ip, perf_guest_cbs->get_ip);
> +
> +	if (perf_guest_cbs && perf_guest_cbs->handle_intel_pt_intr)
> +		static_call_update(x86_guest_handle_intel_pt_intr, perf_guest_cbs->handle_intel_pt_intr);
> +}
> +
>   u64 __read_mostly hw_cache_event_ids
>   				[PERF_COUNT_HW_CACHE_MAX]
>   				[PERF_COUNT_HW_CACHE_OP_MAX]
> @@ -2736,7 +2756,7 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *re
>   	struct unwind_state state;
>   	unsigned long addr;
>   
> -	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> +	if (static_call(x86_guest_state)()) {
>   		/* TODO: We don't support guest os callchain now */
>   		return;
>   	}
> @@ -2839,7 +2859,7 @@ perf_callchain_user(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs
>   	struct stack_frame frame;
>   	const struct stack_frame __user *fp;
>   
> -	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> +	if (static_call(x86_guest_state)()) {
>   		/* TODO: We don't support guest os callchain now */
>   		return;
>   	}
> @@ -2916,18 +2936,21 @@ static unsigned long code_segment_base(struct pt_regs *regs)
>   
>   unsigned long perf_instruction_pointer(struct pt_regs *regs)
>   {
> -	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
> -		return perf_guest_cbs->get_guest_ip();
> +	unsigned long ip = static_call(x86_guest_get_ip)();
> +
> +	if (likely(!ip))
> +		ip = regs->ip + code_segment_base(regs);
>   
> -	return regs->ip + code_segment_base(regs);
> +	return ip;
>   }
>   
>   unsigned long perf_misc_flags(struct pt_regs *regs)
>   {
> +	unsigned int guest = static_call(x86_guest_state)();
>   	int misc = 0;
>   
> -	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
> -		if (perf_guest_cbs->is_user_mode())
> +	if (guest) {
> +		if (guest & PERF_GUEST_USER)
>   			misc |= PERF_RECORD_MISC_GUEST_USER;
>   		else
>   			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2521d03de5e0..ac422c45f940 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2780,6 +2780,8 @@ static void intel_pmu_reset(void)
>   	local_irq_restore(flags);
>   }
>   
> +DECLARE_STATIC_CALL(x86_guest_handle_intel_pt_intr, *(perf_guest_cbs->handle_intel_pt_intr));
> +
>   static int handle_pmi_common(struct pt_regs *regs, u64 status)
>   {
>   	struct perf_sample_data data;
> @@ -2850,10 +2852,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>   	 */
>   	if (__test_and_clear_bit(GLOBAL_STATUS_TRACE_TOPAPMI_BIT, (unsigned long *)&status)) {
>   		handled++;
> -		if (unlikely(perf_guest_cbs && perf_guest_cbs->is_in_guest() &&
> -			perf_guest_cbs->handle_intel_pt_intr))
> -			perf_guest_cbs->handle_intel_pt_intr();
> -		else
> +		if (!static_call(x86_guest_handle_intel_pt_intr)())
>   			intel_pt_interrupt();
>   	}
>   
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55efbacfc244..2a24e615fa4a 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1812,7 +1812,7 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu);
>   int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
>   void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu);
>   
> -int kvm_is_in_guest(void);
> +unsigned int kvm_guest_state(void);
>   
>   void __user *__x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
>   				     u32 size);
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 827886c12c16..2dcbd1b30004 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -87,7 +87,7 @@ static void kvm_perf_overflow_intr(struct perf_event *perf_event,
>   		 * woken up. So we should wake it, but this is impossible from
>   		 * NMI context. Do it from irq work instead.
>   		 */
> -		if (!kvm_is_in_guest())
> +		if (!kvm_guest_state())
>   			irq_work_queue(&pmc_to_pmu(pmc)->irq_work);
>   		else
>   			kvm_make_request(KVM_REQ_PMI, pmc->vcpu);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bbc4e04e67ad..88f709b3759c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8035,44 +8035,46 @@ static void kvm_timer_init(void)
>   DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu);
>   EXPORT_PER_CPU_SYMBOL_GPL(current_vcpu);
>   
> -int kvm_is_in_guest(void)
> +static unsigned int kvm_guest_state(void)
>   {
> -	return __this_cpu_read(current_vcpu) != NULL;
> -}
> -
> -static int kvm_is_user_mode(void)
> -{
> -	int user_mode = 3;
> +	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
> +	unsigned int state = 0;
>   
> -	if (__this_cpu_read(current_vcpu))
> -		user_mode = static_call(kvm_x86_get_cpl)(__this_cpu_read(current_vcpu));
> +	if (vcpu)
> +		state |= PERF_GUEST_ACTIVE;
> +	if (static_call(kvm_x86_get_cpl)(vcpu))
> +		state |= PERF_GUEST_USER;
>   
> -	return user_mode != 0;
> +	return state;
>   }
>   
> -static unsigned long kvm_get_guest_ip(void)
> +static unsigned long kvm_guest_get_ip(void)
>   {
> +	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
>   	unsigned long ip = 0;
>   
> -	if (__this_cpu_read(current_vcpu))
> -		ip = kvm_rip_read(__this_cpu_read(current_vcpu));
> +	if (vcpu)
> +		ip = kvm_rip_read(vcpu);
>   
>   	return ip;
>   }
>   
> -static void kvm_handle_intel_pt_intr(void)
> +static unsigned int kvm_handle_intel_pt_intr(void)
>   {
>   	struct kvm_vcpu *vcpu = __this_cpu_read(current_vcpu);
>   
> +	if (!vcpu)
> +		return 0;
> +
>   	kvm_make_request(KVM_REQ_PMI, vcpu);
>   	__set_bit(MSR_CORE_PERF_GLOBAL_OVF_CTRL_TRACE_TOPA_PMI_BIT,
>   			(unsigned long *)&vcpu->arch.pmu.global_status);
> +	return 1;
>   }
>   
>   static struct perf_guest_info_callbacks kvm_guest_cbs = {
> -	.is_in_guest		= kvm_is_in_guest,
> -	.is_user_mode		= kvm_is_user_mode,
> -	.get_guest_ip		= kvm_get_guest_ip,
> +	.state			= kvm_guest_state,
> +	.get_ip			= kvm_guest_get_ip,
>   	.handle_intel_pt_intr	= kvm_handle_intel_pt_intr,
>   };
>   
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index f5a6a2f069ed..7eae1fd22db3 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -26,11 +26,13 @@
>   # include <asm/local64.h>
>   #endif
>   
> +#define PERF_GUEST_ACTIVE	0x01
> +#define PERF_GUEST_USER		0x02
> +
>   struct perf_guest_info_callbacks {
> -	int				(*is_in_guest)(void);
> -	int				(*is_user_mode)(void);
> -	unsigned long			(*get_guest_ip)(void);
> -	void				(*handle_intel_pt_intr)(void);
> +	unsigned int			(*state)(void);
> +	unsigned long			(*get_ip)(void);
> +	unsigned int			(*handle_intel_pt_intr)(void);
>   };
>   
>   #ifdef CONFIG_HAVE_HW_BREAKPOINT
> @@ -1237,6 +1239,8 @@ extern void perf_event_bpf_event(struct bpf_prog *prog,
>   				 u16 flags);
>   
>   extern struct perf_guest_info_callbacks *perf_guest_cbs;
> +extern void __weak arch_perf_update_guest_cbs(void);
> +
>   extern int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
>   extern int perf_unregister_guest_info_callbacks(struct perf_guest_info_callbacks *callbacks);
>   
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 2e947a485898..aec531fc9c90 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -6486,9 +6486,17 @@ static void perf_pending_event(struct irq_work *entry)
>    */
>   struct perf_guest_info_callbacks *perf_guest_cbs;
>   
> +void __weak arch_perf_update_guest_cbs(void)
> +{
> +}
> +
>   int perf_register_guest_info_callbacks(struct perf_guest_info_callbacks *cbs)
>   {
> +	if (WARN_ON_ONCE(perf_guest_cbs))
> +		return -EBUSY;
> +
>   	perf_guest_cbs = cbs;
> +	arch_perf_update_guest_cbs();
>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(perf_register_guest_info_callbacks);


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
  2021-05-17 23:51             ` Sean Christopherson
@ 2021-05-18  7:49               ` Xu, Like
  0 siblings, 0 replies; 56+ messages in thread
From: Xu, Like @ 2021-05-18  7:49 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Venkatesh Srinivas, Peter Zijlstra, Paolo Bonzini,
	Borislav Petkov, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, weijiang.yang, Kan Liang, ak, wei.w.wang, eranian,
	liuxiangdong5, linux-kernel, x86, kvm, Yao Yuan, Like Xu

On 2021/5/18 7:51, Sean Christopherson wrote:
> On Mon, May 17, 2021, Sean Christopherson wrote:
>> On Thu, May 13, 2021, Xu, Like wrote:
>>> On 2021/5/12 23:18, Sean Christopherson wrote:
>>>> On Wed, May 12, 2021, Xu, Like wrote:
>>>>> Hi Venkatesh Srinivas,
>>>>>
>>>>> On 2021/5/12 9:58, Venkatesh Srinivas wrote:
>>>>>> On 5/10/21, Like Xu <like.xu@linux.intel.com> wrote:
>>>>>>> On Intel platforms, the software can use the IA32_MISC_ENABLE[7] bit to
>>>>>>> detect whether the processor supports the performance monitoring facility.
>>>>>>>
>>>>>>> It depends on whether the PMU is enabled for the guest, and a software write
>>>>>>> operation to this available bit will be ignored.
>>>>>> Is the behavior that writes to IA32_MISC_ENABLE[7] are ignored (rather than #GP)
>>>>>> documented someplace?
>>>>> The bit[7] behavior of the real hardware on the native host is quite
>>>>> suspicious.
>>>> Ugh.  Can you file an SDM bug to get the wording and accessibility updated?  The
>>>> current phrasing is a mess:
>>>>
>>>>     Performance Monitoring Available (R)
>>>>     1 = Performance monitoring enabled.
>>>>     0 = Performance monitoring disabled.
>>>>
>>>> The (R) is ambiguous because most other entries that are read-only use (RO), and
>>>> the "enabled vs. disabled" implies the bit is writable and really does control
>>>> the PMU.  But on my Haswell system, it's read-only.
>>> On your Haswell system, does it cause a #GP or stay silent if you change this
>>> bit?
>> Attempting to clear the bit generates a #GP.
> *sigh*
>
> Venkatesh and I are exhausting our brown paper bag supply.
>
> Attempting to clear bit 7 is ignored on both Haswell and Goldmont.  There is _no_ #GP;
> the toggle is simply ignored.  I forgot to specify hex format (multiple times),
> and Venkatesh accessed the wrong MSR (0x10a instead of 0x1a0).

*sigh*

>
> So your proposal to ignore the toggle in KVM is the way to go, but please
> document in the changelog that that behavior matches bare metal.

Thank you, I will clearly state it in the commit message.

>
> It would be nice to get the SDM cleaned up to use "supported/unsupported", and to
> pick one of (R), (RO), and (R/O) for all MSR entries for consistency, but that
> may be a pipe dream.

Glad you could review my code. I have reported this issue internally.

>
> Sorry for the run-around :-/


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
  2021-05-17  8:18   ` Peter Zijlstra
@ 2021-05-18  7:55     ` Xu, Like
  2021-05-18  8:35       ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Xu, Like @ 2021-05-18  7:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On 2021/5/17 16:18, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 10:42:03AM +0800, Like Xu wrote:
>> The mask value of the fixed counter control register should be dynamically
>> adjusted with the number of fixed counters. This patch introduces a
>> variable that includes the reserved bits of fixed counter control
>> registers. This is needed for later Ice Lake fixed counter changes.
>>
>> Co-developed-by: Luwei Kang <luwei.kang@intel.com>
>> Signed-off-by: Luwei Kang <luwei.kang@intel.com>
>> Signed-off-by: Like Xu <like.xu@linux.intel.com>
>> ---
>>   arch/x86/include/asm/kvm_host.h | 1 +
>>   arch/x86/kvm/vmx/pmu_intel.c    | 6 +++++-
>>   2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 55efbacfc244..49b421bd3dd8 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -457,6 +457,7 @@ struct kvm_pmu {
>>   	unsigned nr_arch_fixed_counters;
>>   	unsigned available_event_types;
>>   	u64 fixed_ctr_ctrl;
>> +	u64 fixed_ctr_ctrl_mask;
>>   	u64 global_ctrl;
>>   	u64 global_status;
>>   	u64 global_ovf_ctrl;
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index d9dbebe03cae..ac7fe714e6c1 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -400,7 +400,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   	case MSR_CORE_PERF_FIXED_CTR_CTRL:
>>   		if (pmu->fixed_ctr_ctrl == data)
>>   			return 0;
>> -		if (!(data & 0xfffffffffffff444ull)) {
>> +		if (!(data & pmu->fixed_ctr_ctrl_mask)) {
> Don't we already have hardware with more than 3 fixed counters?

Yes, so we update this mask based on the value of pmu->nr_arch_fixed_counters:

+    for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
+        pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));

I assume this comment will not result in any code changes for this patch.
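
Spelled out, the refresh logic is roughly the following (a sketch of the
intent inside intel_pmu_refresh(); surrounding code elided):

	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
	int i;

	/* Every control bit is reserved until proven otherwise... */
	pmu->fixed_ctr_ctrl_mask = ~0ull;

	/*
	 * ...then unreserve the OS/USR enable and PMI bits (0xb) in the
	 * 4-bit field of each fixed counter the guest actually has.
	 * AnyThread (bit 2 of each field) stays reserved, which is also
	 * how the old hardcoded 0xfffffffffffff444ull falls out for
	 * three fixed counters.
	 */
	for (i = 0; i < pmu->nr_arch_fixed_counters; i++)
		pmu->fixed_ctr_ctrl_mask &= ~(0xbull << (i * 4));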

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  2021-05-17  8:33   ` Peter Zijlstra
@ 2021-05-18  8:13     ` Xu, Like
  0 siblings, 0 replies; 56+ messages in thread
From: Xu, Like @ 2021-05-18  8:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On 2021/5/17 16:33, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 10:42:04AM +0800, Like Xu wrote:
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 2f89fd599842..c791765f4761 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -3898,31 +3898,49 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>>   	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>   	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
>>   	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
>> +	u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
>> +		cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);
>> -	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
>> -		arr[0].guest &= ~cpuc->pebs_enabled;
>> -	else
>> -		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
>> -	*nr = 1;
> Instead of endlessly mucking about with branches, do we want something
> like this instead?

Fine by me. How about this commit message for your patch below:

x86/perf/core: Add pebs_capable to store valid PEBS_COUNTER_MASK value

The PEBS counter mask value is computed repeatedly in
intel_guest_get_msrs(). Cache it in a new x86_pmu.pebs_capable field
so it can be read directly instead of endlessly mucking about with
branches.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

>
> ---
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 2521d03de5e0..bcfba11196c8 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -2819,10 +2819,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>   	 * counters from the GLOBAL_STATUS mask and we always process PEBS
>   	 * events via drain_pebs().
>   	 */
> -	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
> -		status &= ~cpuc->pebs_enabled;
> -	else
> -		status &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
> +	status &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
>   
>   	/*
>   	 * PEBS overflow sets bit 62 in the global status register
> @@ -3862,10 +3859,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
>   	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>   	arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>   	arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> -	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
> -		arr[0].guest &= ~cpuc->pebs_enabled;
> -	else
> -		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
> +	arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
>   	*nr = 1;
>   
>   	if (x86_pmu.pebs && x86_pmu.pebs_no_isolation) {
> @@ -5546,6 +5540,7 @@ __init int intel_pmu_init(void)
>   	x86_pmu.events_mask_len		= eax.split.mask_length;
>   
>   	x86_pmu.max_pebs_events		= min_t(unsigned, MAX_PEBS_EVENTS, x86_pmu.num_counters);
> +	x86_pmu.pebs_capable		= PEBS_COUNTER_MASK;
>   
>   	/*
>   	 * Quirk: v2 perfmon does not report fixed-purpose events, so
> @@ -5730,6 +5725,7 @@ __init int intel_pmu_init(void)
>   		x86_pmu.pebs_aliases = NULL;
>   		x86_pmu.pebs_prec_dist = true;
>   		x86_pmu.lbr_pt_coexist = true;
> +		x86_pmu.pebs_capable = ~0ULL;
>   		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>   		x86_pmu.flags |= PMU_FL_PEBS_ALL;
>   		x86_pmu.get_event_constraints = glp_get_event_constraints;
> @@ -6080,6 +6076,7 @@ __init int intel_pmu_init(void)
>   		x86_pmu.pebs_aliases = NULL;
>   		x86_pmu.pebs_prec_dist = true;
>   		x86_pmu.pebs_block = true;
> +		x86_pmu.pebs_capable = ~0ULL;
>   		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>   		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>   		x86_pmu.flags |= PMU_FL_PEBS_ALL;
> @@ -6123,6 +6120,7 @@ __init int intel_pmu_init(void)
>   		x86_pmu.pebs_aliases = NULL;
>   		x86_pmu.pebs_prec_dist = true;
>   		x86_pmu.pebs_block = true;
> +		x86_pmu.pebs_capable = ~0ULL;
>   		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
>   		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
>   		x86_pmu.flags |= PMU_FL_PEBS_ALL;
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 27fa85e7d4fd..6f3cf81ccb1b 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -805,6 +805,7 @@ struct x86_pmu {
>   	void		(*pebs_aliases)(struct perf_event *event);
>   	unsigned long	large_pebs_flags;
>   	u64		rtm_abort_event;
> +	u64		pebs_capable;
>   
>   	/*
>   	 * Intel LBR


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
  2021-05-18  7:55     ` Xu, Like
@ 2021-05-18  8:35       ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-18  8:35 UTC (permalink / raw)
  To: Xu, Like
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On Tue, May 18, 2021 at 03:55:13PM +0800, Xu, Like wrote:
> On 2021/5/17 16:18, Peter Zijlstra wrote:
> > On Tue, May 11, 2021 at 10:42:03AM +0800, Like Xu wrote:
> > > The mask value of the fixed counter control register should be dynamically
> > > adjusted with the number of fixed counters. This patch introduces a
> > > variable that includes the reserved bits of fixed counter control
> > > registers. This is needed for later Ice Lake fixed counter changes.
> > > 
> > > Co-developed-by: Luwei Kang <luwei.kang@intel.com>
> > > Signed-off-by: Luwei Kang <luwei.kang@intel.com>
> > > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > > ---
> > >   arch/x86/include/asm/kvm_host.h | 1 +
> > >   arch/x86/kvm/vmx/pmu_intel.c    | 6 +++++-
> > >   2 files changed, 6 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 55efbacfc244..49b421bd3dd8 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -457,6 +457,7 @@ struct kvm_pmu {
> > >   	unsigned nr_arch_fixed_counters;
> > >   	unsigned available_event_types;
> > >   	u64 fixed_ctr_ctrl;
> > > +	u64 fixed_ctr_ctrl_mask;
> > >   	u64 global_ctrl;
> > >   	u64 global_status;
> > >   	u64 global_ovf_ctrl;
> > > diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> > > index d9dbebe03cae..ac7fe714e6c1 100644
> > > --- a/arch/x86/kvm/vmx/pmu_intel.c
> > > +++ b/arch/x86/kvm/vmx/pmu_intel.c
> > > @@ -400,7 +400,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >   	case MSR_CORE_PERF_FIXED_CTR_CTRL:
> > >   		if (pmu->fixed_ctr_ctrl == data)
> > >   			return 0;
> > > -		if (!(data & 0xfffffffffffff444ull)) {
> > > +		if (!(data & pmu->fixed_ctr_ctrl_mask)) {
> > Don't we already have hardware with more than 3 fixed counters?
> 
> Yes, so we update this mask based on the value of pmu->nr_arch_fixed_counters:

Yes, I saw that, but the Changelog makes it appear this is only relevant
to ice lake, which I think is not fully correct.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
  2021-05-18  7:38     ` Xu, Like
@ 2021-05-18  8:37       ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-18  8:37 UTC (permalink / raw)
  To: Xu, Like
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On Tue, May 18, 2021 at 03:38:52PM +0800, Xu, Like wrote:

> > I'm thinking you have your conditions in the wrong order; would it not
> > be much cheaper to first check: '!x86_pmu.pebs_active || !guest_pebs_idx'
> > than to do that horrible indirect ->is_in_guest() call?
> > 
> > After all, if the guest doesn't have PEBS enabled, who cares if we're
> > currently in a guest or not.
> 
> Yes, it makes sense. How about:
> 
> @@ -2833,6 +2867,10 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>                 u64 pebs_enabled = cpuc->pebs_enabled;
> 
>                 handled++;
> +               if (x86_pmu.pebs_vmx && x86_pmu.pebs_active &&
> +                   (cpuc->pebs_enabled & ~cpuc->intel_ctrl_host_mask) &&
> +                   (static_call(x86_guest_state)() & PERF_GUEST_ACTIVE))
> +                       x86_pmu_handle_guest_pebs(regs, &data);

This is terrible, just call x86_pmu_handle_guest_pebs() unconditionally
and put all the ugly inside it.

>                 x86_pmu.drain_pebs(regs, &data);
>                 status &= intel_ctrl | GLOBAL_STATUS_TRACE_TOPAPMI;
> 
> > 
> > Also, something like the below perhaps (arm64 and xen need fixing up at
> > the very least) could make all that perf_guest_cbs stuff suck less.
> 
> How about this commit message for your patch below:
> 
> From: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> 
> x86/core: Use static_call to rewrite perf_guest_info_callbacks
> 
> The two fields named "is_in_guest" and "is_user_mode" in
> perf_guest_info_callbacks are replaced with a new multiplexed member
> named "state", and the "get_guest_ip" field is renamed to "get_ip".
> 
> The application of DEFINE_STATIC_CALL_RET0 (arm64 and xen need fixing
> up at the very least) could make all that perf_guest_cbs stuff suck less.
> For KVM, these callbacks will be updated in the kvm_arch_init().
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Well, you *do* need to fix up arm64 and xen, we can't very well break
their builds can we now.

> ----
> 
I'm not sure if you have a strong reason to violate the check-patch rule:
> 
> ERROR: Using weak declarations can have unintended link defects
> #238: FILE: include/linux/perf_event.h:1242:
> +extern void __weak arch_perf_update_guest_cbs(void);

Copy/paste fail I think. I didn't really put much effort into the patch,
only made sure defconfig+kvm_guest.config compiled.
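
For completeness, pushing all of that inside the helper gives the caller an
unconditional call and keeps the cheap checks first (a sketch; the helper
name and PERF_GUEST_ACTIVE are taken from the draft patches in this thread):

	static void x86_pmu_handle_guest_pebs(struct pt_regs *regs,
					      struct perf_sample_data *data)
	{
		struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
		u64 guest_pebs_idxs = cpuc->pebs_enabled &
				      ~cpuc->intel_ctrl_host_mask;

		/* Bail cheaply when the guest owns no PEBS counters. */
		if (!x86_pmu.pebs_vmx || !x86_pmu.pebs_active ||
		    !guest_pebs_idxs)
			return;

		/* Only then pay for the indirect guest-state call. */
		if (!(static_call(x86_guest_state)() & PERF_GUEST_ACTIVE))
			return;

		/* ... forward the PEBS overflow PMI to the guest ... */
	}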

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  2021-05-17  8:32   ` Peter Zijlstra
@ 2021-05-18  8:44     ` Xu, Like
  2021-05-18 13:42       ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Xu, Like @ 2021-05-18  8:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Luwei Kang, Like Xu

On 2021/5/17 16:32, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 10:42:04AM +0800, Like Xu wrote:
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 2f89fd599842..c791765f4761 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -3898,31 +3898,49 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>>   	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>   	struct perf_guest_switch_msr *arr = cpuc->guest_switch_msrs;
>>   	u64 intel_ctrl = hybrid(cpuc->pmu, intel_ctrl);
>> +	u64 pebs_mask = (x86_pmu.flags & PMU_FL_PEBS_ALL) ?
>> +		cpuc->pebs_enabled : (cpuc->pebs_enabled & PEBS_COUNTER_MASK);
>> +
>> +	*nr = 0;
>> +	arr[(*nr)++] = (struct perf_guest_switch_msr){
>> +		.msr = MSR_CORE_PERF_GLOBAL_CTRL,
>> +		.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
>> +		.guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
>> +	};
>>   
>> +	if (!x86_pmu.pebs)
>> +		return arr;
>>   
>> +	/*
>> +	 * If PMU counter has PEBS enabled it is not enough to
>> +	 * disable counter on a guest entry since PEBS memory
>> +	 * write can overshoot guest entry and corrupt guest
>> +	 * memory. Disabling PEBS solves the problem.
>> +	 *
>> +	 * Don't do this if the CPU already enforces it.
>> +	 */
>> +	if (x86_pmu.pebs_no_isolation) {
>> +		arr[(*nr)++] = (struct perf_guest_switch_msr){
>> +			.msr = MSR_IA32_PEBS_ENABLE,
>> +			.host = cpuc->pebs_enabled,
>> +			.guest = 0,
>> +		};
>> +		return arr;
>>   	}
>>   
>> +	if (!x86_pmu.pebs_vmx)
>> +		return arr;
>> +
>> +	arr[*nr] = (struct perf_guest_switch_msr){
>> +		.msr = MSR_IA32_PEBS_ENABLE,
>> +		.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
>> +		.guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
>> +	};
>> +
>> +	/* Set hw GLOBAL_CTRL bits for PEBS counter when it runs for guest */
>> +	arr[0].guest |= arr[*nr].guest;
>> +
>> +	++(*nr);
>>   	return arr;
>>   }
> ISTR saying I was confused as heck by this function, I still don't see
> clarifying comments :/
>
> What's .host and .guest ?

Would adding the following comment help?

+/*
+ * Currently, the only caller of this function is atomic_switch_perf_msrs().
+ * The host perf context helps to prepare the values of the real hardware for
+ * a set of MSRs that need to be switched atomically in a vmx transaction.
+ *
+ * For example, the pseudocode needed to add a new msr should look like:
+ *
+ * arr[(*nr)++] = (struct perf_guest_switch_msr){
+ *     .msr = the hardware msr address,
+ *     .host = the value the hardware has when it doesn't run a guest,
+ *     .guest = the value the hardware has when it runs a guest,
+ * };
+ *
+ * These values have nothing to do with the emulated values the guest sees
+ * when it uses {RD,WR}MSR, which should be handled in the KVM context.
+ */
  static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
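
As a concrete reference for what ".host" and ".guest" end up meaning, the
consumer looks roughly like this (paraphrased from the pre-series
atomic_switch_perf_msrs() in arch/x86/kvm/vmx/vmx.c; this series
additionally threads a kvm_pmu pointer through perf_guest_get_msrs()):

	static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
	{
		int i, nr_msrs;
		struct perf_guest_switch_msr *msrs;

		msrs = perf_guest_get_msrs(&nr_msrs);
		if (!msrs)
			return;

		for (i = 0; i < nr_msrs; i++)
			if (msrs[i].host == msrs[i].guest)
				clear_atomic_switch_msr(vmx, msrs[i].msr);
			else
				/* Loaded on VM-entry, restored on VM-exit. */
				add_atomic_switch_msr(vmx, msrs[i].msr,
						      msrs[i].guest,
						      msrs[i].host, false);
	}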



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-17 14:44     ` Andi Kleen
@ 2021-05-18  8:47       ` Peter Zijlstra
  2021-05-18 13:15         ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-18  8:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Like Xu, Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm

On Mon, May 17, 2021 at 07:44:15AM -0700, Andi Kleen wrote:
> 
> On 5/17/2021 1:39 AM, Peter Zijlstra wrote:
> > On Tue, May 11, 2021 at 10:42:05AM +0800, Like Xu wrote:
> > > +	if (pebs) {
> > > +		/*
> > > +		 * The non-zero precision level of guest event makes the ordinary
> > > +		 * guest event becomes a guest PEBS event and triggers the host
> > > +		 * PEBS PMI handler to determine whether the PEBS overflow PMI
> > > +		 * comes from the host counters or the guest.
> > > +		 *
> > > +		 * For most PEBS hardware events, the difference in the software
> > > +		 * precision levels of guest and host PEBS events will not affect
> > > +		 * the accuracy of the PEBS profiling result, because the "event IP"
> > > +		 * in the PEBS record is calibrated on the guest side.
> > > +		 */
> > > +		attr.precise_ip = 1;
> > > +	}
> > You've just destroyed precdist, no?
> 
> precdist can mean multiple things:
> 
> - Convert cycles to the precise INST_RETIRED event. That is not meaningful
> for virtualization because "cycles" doesn't exist, just the raw events.
> 
> - For GLC+ and TNT+ it will force the event to a specific counter that is
> more precise. This would be indeed "destroyed", but right now the patch kit
> only supports Icelake which doesn't support that anyways.
> 
> So I think the code is correct for now, but will need to be changed for
> later CPUs. Should perhaps fix the comment though to discuss this.

OK, can we then do a better comment that explains *why* this is correct
now and what needs help later?

Because IIUC the only reason it is correct now is because:

 - we only support ICL

   * and ICL has pebs_format>=2, so {1,2} are the same
   * and ICL doesn't have precise_ip==3 support

 - Other hardware (GLC+, TNT+) that could possibly care here
   is unsupported atm. but needs changes.

None of which is actually mentioned in that comment it does have.
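
Something like this, perhaps, which just collects the points above into the
comment (a sketch):

	/*
	 * attr.precise_ip = 1 is good enough for now: only Ice Lake is
	 * supported, where pebs_format >= 2 makes precise levels 1 and 2
	 * behave the same, and precise_ip == 3 is not available for
	 * these events.  GLC+/TNT+ force precise events onto specific,
	 * more precise counters, so this must be revisited before those
	 * models are supported.
	 */
	attr.precise_ip = 1;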

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
  2021-05-17  6:38   ` Like Xu
@ 2021-05-18 12:23     ` Liuxiangdong
  2021-05-18 12:40       ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Liuxiangdong @ 2021-05-18 12:23 UTC (permalink / raw)
  To: Like Xu
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, linux-kernel, x86, kvm, Fangyi (Eric),
	Xiexiangyou, Peter Zijlstra, Paolo Bonzini



On 2021/5/17 14:38, Like Xu wrote:
> Hi xiangdong,
>
> On 2021/5/15 18:30, Liuxiangdong wrote:
>>
>>
>> On 2021/5/11 10:41, Like Xu wrote:
>>> A new kernel cycle has begun, and this version looks promising.
>>>
>>> The guest Precise Event Based Sampling (PEBS) feature can provide
>>> an architectural state of the instruction executed after the guest
>>> instruction that exactly caused the event. It needs new hardware
>>> facility only available on Intel Ice Lake Server platforms. This
>>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>>
>>> We can use PEBS feature on the Linux guest like native:
>>>
>>>    # perf record -e instructions:ppp ./br_instr a
>>>    # perf record -c 100000 -e instructions:pp ./br_instr a
>>
>> Hi, Like.
>> Has the qemu patch been modified?
>>
>> https://lore.kernel.org/kvm/f4dcb068-2ddf-428f-50ad-39f65cad3710@intel.com/ 
>> ?
>
> I think the qemu part still works based on
> 609d7596524ab204ccd71ef42c9eee4c7c338ea4 (tag: v6.0.0).
>

Yes. I applied these two qemu patches to qemu v6.0.0 and this kvm
patch set to the latest kvm tree.

I can see the pebs flag in the guest (Linux 5.11) on Ice Lake (Model: 106,
Model name: Intel(R) Xeon(R) Platinum 8378A CPU),
and I can use PEBS like this.

     # perf record -e instructions:pp

It can work normally.

But there is no sampling when I use "perf record -e events:pp" or just
"perf record" in the guest
unless I delete patch 09 and patch 13 from this kvm patch set.


Have you tried "perf record -e events:pp" with this patch set? Does it
work normally?



Thanks!
Xiangdong Liu



> When the LBR qemu patch receives the ACK from the maintainer,
> I will submit PEBS qemu support because their changes are very similar.
>
> Please help review this version and
> feel free to add your comments or "Reviewed-by".
>
> Thanks,
> Like Xu
>
>>
>>
>>> To emulate guest PEBS facility for the above perf usages,
>>> we need to implement 2 code paths:
>>>
>>> 1) Fast path
>>>
>>> This is when the host assigned physical PMC has an identical index as
>>> the virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>>> This path is used in most common use cases.
>>>
>>> 2) Slow path
>>>
>>> This is when the host assigned physical PMC has a different index
>>> from the virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0)
>>> In this case, KVM needs to rewrite the PEBS records to change the
>>> applicable counter indexes to the virtual PMC indexes, which would
>>> otherwise contain the physical counter index written by PEBS facility,
>>> and switch the counter reset values to the offset corresponding to
>>> the physical counter indexes in the DS data structure.
>>>
>>> The previous version [0] enables both fast path and slow path, which
>>> seems a bit more complex as the first step. In this patchset, we want
>>> to start with the fast path to get the basic guest PEBS enabled while
>>> keeping the slow path disabled. More focused discussion on the slow
>>> path [1] is planned to be put to another patchset in the next step.
>>>
>>> Compared to later versions in subsequent steps, the functionality
>>> to support host-guest PEBS both enabled and the functionality to
>>> emulate guest PEBS when the counter is cross-mapped are missing
>>> in this patch set (neither of these are typical scenarios).
>>>
>>> With the basic support, the guest can retrieve the correct PEBS
>>> information from its own PEBS records on the Ice Lake servers.
>>> And we expect it should work when migrating to another Ice Lake
>>> and no regression about host perf is expected.
>>>
>>> Here are the results of pebs test from guest/host for same workload:
>>>
>>> perf report on guest:
>>> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 
>>> 1473377250
>>> # Overhead  Command   Shared Object      Symbol
>>>    57.74%  br_instr  br_instr           [.] lfsr_cond
>>>    41.40%  br_instr  br_instr           [.] cmp_end
>>>     0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire
>>>
>>> perf report on host:
>>> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 
>>> 1462721386
>>> # Overhead  Command   Shared Object     Symbol
>>>    57.90%  br_instr  br_instr          [.] lfsr_cond
>>>    41.95%  br_instr  br_instr          [.] cmp_end
>>>     0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire
>>>     Conclusion: the profiling results on the guest are similar
>>> to that on the host.
>>>
>>> A minimum guest kernel version may be v5.4 or a backport version that
>>> supports Icelake server PEBS.
>>>
>>> Please check more details in each commit and feel free to comment.
>>>
>>> Previous:
>>> https://lore.kernel.org/kvm/20210415032016.166201-1-like.xu@linux.intel.com/ 
>>>
>>>
>>> [0] 
>>> https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
>>> [1] 
>>> https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/ 
>>>
>>>
>>> V5 -> V6 Changelog:
>>> - Rebased on the latest kvm/queue tree;
>>> - Fix a git rebase issue (Liuxiangdong);
>>> - Adjust the patch sequence 06/07 for bisection (Liuxiangdong);
>>>
>>> Like Xu (16):
>>>    perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
>>>    perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
>>>    perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
>>>    KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
>>>    KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
>>>    KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
>>>    KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
>>>    KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
>>>    KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive 
>>> PEBS
>>>    KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
>>>    KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR 
>>> counter
>>>    KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
>>>    KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
>>>    KVM: x86/pmu: Add kvm_pmu_cap to optimize 
>>> perf_get_x86_pmu_capability
>>>    KVM: x86/cpuid: Refactor host/guest CPU model consistency check
>>>    KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
>>>
>>>   arch/x86/events/core.c            |   5 +-
>>>   arch/x86/events/intel/core.c      | 129 
>>> ++++++++++++++++++++++++------
>>>   arch/x86/events/perf_event.h      |   5 +-
>>>   arch/x86/include/asm/kvm_host.h   |  16 ++++
>>>   arch/x86/include/asm/msr-index.h  |   6 ++
>>>   arch/x86/include/asm/perf_event.h |   5 +-
>>>   arch/x86/kvm/cpuid.c              |  24 ++----
>>>   arch/x86/kvm/cpuid.h              |   5 ++
>>>   arch/x86/kvm/pmu.c                |  50 +++++++++---
>>>   arch/x86/kvm/pmu.h                |  38 +++++++++
>>>   arch/x86/kvm/vmx/capabilities.h   |  26 ++++--
>>>   arch/x86/kvm/vmx/pmu_intel.c      | 115 +++++++++++++++++++++-----
>>>   arch/x86/kvm/vmx/vmx.c            |  24 +++++-
>>>   arch/x86/kvm/vmx/vmx.h            |   2 +-
>>>   arch/x86/kvm/x86.c                |  14 ++--
>>>   15 files changed, 368 insertions(+), 96 deletions(-)
>>>
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
  2021-05-18 12:23     ` Liuxiangdong
@ 2021-05-18 12:40       ` Xu, Like
  2021-05-18 13:15         ` Liuxiangdong
  2021-05-19  1:44         ` Liuxiangdong
  0 siblings, 2 replies; 56+ messages in thread
From: Xu, Like @ 2021-05-18 12:40 UTC (permalink / raw)
  To: Liuxiangdong
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, linux-kernel, x86, kvm, Fangyi (Eric),
	Xiexiangyou, Peter Zijlstra, Paolo Bonzini, Like Xu

On 2021/5/18 20:23, Liuxiangdong wrote:
>
>
> On 2021/5/17 14:38, Like Xu wrote:
>> Hi xiangdong,
>>
>> On 2021/5/15 18:30, Liuxiangdong wrote:
>>>
>>>
>>> On 2021/5/11 10:41, Like Xu wrote:
>>>> A new kernel cycle has begun, and this version looks promising.
>>>>
>>>> The guest Precise Event Based Sampling (PEBS) feature can provide
>>>> an architectural state of the instruction executed after the guest
>>>> instruction that exactly caused the event. It needs new hardware
>>>> facility only available on Intel Ice Lake Server platforms. This
>>>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>>>
>>>> We can use PEBS feature on the Linux guest like native:
>>>>
>>>>    # perf record -e instructions:ppp ./br_instr a
>>>>    # perf record -c 100000 -e instructions:pp ./br_instr a
>>>
>>> Hi, Like.
>>> Has the qemu patch been modified?
>>>
>>> https://lore.kernel.org/kvm/f4dcb068-2ddf-428f-50ad-39f65cad3710@intel.com/ 
>>> ?
>>
>> I think the qemu part still works based on
>> 609d7596524ab204ccd71ef42c9eee4c7c338ea4 (tag: v6.0.0).
>>
>
> Yes. I applied these two qemu patches to qemu v6.0.0 and this kvm patch
> set to the latest kvm tree.
>
> I can see the pebs flag in the guest (Linux 5.11) on Ice Lake (Model: 106,
> Model name: Intel(R) Xeon(R) Platinum 8378A CPU),
> and I can use PEBS like this.
>
>     # perf record -e instructions:pp
>
> It can work normally.
>
> But there is no sampling when I use "perf record -e events:pp" or just
> "perf record" in the guest
> unless I delete patch 09 and patch 13 from this kvm patch set.
>
>

With patches 9 and 13, does the basic counter sampling still work?
You may retry w/ "echo 0 > /proc/sys/kernel/watchdog" on the host and guest.

> Have you tried "perf record -e events:pp" with this patch set? Does it
> work normally?

All my PEBS testcases passed. You may dump the guest MSR traces from your
testcase and share them with me.

>
>
>
> Thanks!
> Xiangdong Liu
>
>
>
>> When the LBR qemu patch receives the ACK from the maintainer,
>> I will submit PEBS qemu support because their changes are very similar.
>>
>> Please help review this version and
>> feel free to add your comments or "Reviewed-by".
>>
>> Thanks,
>> Like Xu
>>
>>>
>>>
>>>> To emulate guest PEBS facility for the above perf usages,
>>>> we need to implement 2 code paths:
>>>>
>>>> 1) Fast path
>>>>
>>>> This is when the host assigned physical PMC has an identical index as
>>>> the virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>>>> This path is used in most common use cases.
>>>>
>>>> 2) Slow path
>>>>
>>>> This is when the host assigned physical PMC has a different index
>>>> from the virtual PMC (e.g. using physical PMC1 to emulate virtual PMC0)
>>>> In this case, KVM needs to rewrite the PEBS records to change the
>>>> applicable counter indexes to the virtual PMC indexes, which would
>>>> otherwise contain the physical counter index written by PEBS facility,
>>>> and switch the counter reset values to the offset corresponding to
>>>> the physical counter indexes in the DS data structure.
>>>>
>>>> The previous version [0] enables both fast path and slow path, which
>>>> seems a bit more complex as the first step. In this patchset, we want
>>>> to start with the fast path to get the basic guest PEBS enabled while
>>>> keeping the slow path disabled. More focused discussion on the slow
>>>> path [1] is planned to be put to another patchset in the next step.
>>>>
>>>> Compared to later versions in subsequent steps, the functionality
>>>> to support host-guest PEBS both enabled and the functionality to
>>>> emulate guest PEBS when the counter is cross-mapped are missing
>>>> in this patch set (neither of these are typical scenarios).
>>>>
>>>> With the basic support, the guest can retrieve the correct PEBS
>>>> information from its own PEBS records on the Ice Lake servers.
>>>> And we expect it should work when migrating to another Ice Lake
>>>> and no regression about host perf is expected.
>>>>
>>>> Here are the results of pebs test from guest/host for same workload:
>>>>
>>>> perf report on guest:
>>>> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 
>>>> 1473377250
>>>> # Overhead  Command   Shared Object      Symbol
>>>>    57.74%  br_instr  br_instr           [.] lfsr_cond
>>>>    41.40%  br_instr  br_instr           [.] cmp_end
>>>>     0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire
>>>>
>>>> perf report on host:
>>>> # Samples: 2K of event 'instructions:ppp', # Event count (approx.): 
>>>> 1462721386
>>>> # Overhead  Command   Shared Object     Symbol
>>>>    57.90%  br_instr  br_instr          [.] lfsr_cond
>>>>    41.95%  br_instr  br_instr          [.] cmp_end
>>>>     0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire
>>>>     Conclusion: the profiling results on the guest are similar to that
>>>> on the host.
>>>>
>>>> A minimum guest kernel version may be v5.4 or a backport version that
>>>> supports Icelake server PEBS.
>>>>
>>>> Please check more details in each commit and feel free to comment.
>>>>
>>>> Previous:
>>>> https://lore.kernel.org/kvm/20210415032016.166201-1-like.xu@linux.intel.com/ 
>>>>
>>>>
>>>> [0] 
>>>> https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
>>>> [1] 
>>>> https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/ 
>>>>
>>>>
>>>> V5 -> V6 Changelog:
>>>> - Rebased on the latest kvm/queue tree;
>>>> - Fix a git rebase issue (Liuxiangdong);
>>>> - Adjust the patch sequence 06/07 for bisection (Liuxiangdong);
>>>>
>>>> Like Xu (16):
>>>>    perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
>>>>    perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
>>>>    perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values
>>>>    KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled
>>>>    KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
>>>>    KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
>>>>    KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
>>>>    KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
>>>>    KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS
>>>>    KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled
>>>>    KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter
>>>>    KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
>>>>    KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations
>>>>    KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability
>>>>    KVM: x86/cpuid: Refactor host/guest CPU model consistency check
>>>>    KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
>>>>
>>>>   arch/x86/events/core.c            |   5 +-
>>>>   arch/x86/events/intel/core.c      | 129 ++++++++++++++++++++++++------
>>>>   arch/x86/events/perf_event.h      |   5 +-
>>>>   arch/x86/include/asm/kvm_host.h   |  16 ++++
>>>>   arch/x86/include/asm/msr-index.h  |   6 ++
>>>>   arch/x86/include/asm/perf_event.h |   5 +-
>>>>   arch/x86/kvm/cpuid.c              |  24 ++----
>>>>   arch/x86/kvm/cpuid.h              |   5 ++
>>>>   arch/x86/kvm/pmu.c                |  50 +++++++++---
>>>>   arch/x86/kvm/pmu.h                |  38 +++++++++
>>>>   arch/x86/kvm/vmx/capabilities.h   |  26 ++++--
>>>>   arch/x86/kvm/vmx/pmu_intel.c      | 115 +++++++++++++++++++++-----
>>>>   arch/x86/kvm/vmx/vmx.c            |  24 +++++-
>>>>   arch/x86/kvm/vmx/vmx.h            |   2 +-
>>>>   arch/x86/kvm/x86.c                |  14 ++--
>>>>   15 files changed, 368 insertions(+), 96 deletions(-)
>>>>
>>
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
  2021-05-18 12:40       ` Xu, Like
@ 2021-05-18 13:15         ` Liuxiangdong
  2021-05-19  1:44         ` Liuxiangdong
  1 sibling, 0 replies; 56+ messages in thread
From: Liuxiangdong @ 2021-05-18 13:15 UTC (permalink / raw)
  To: Xu, Like
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, linux-kernel, x86, kvm, Fangyi (Eric),
	Xiexiangyou, Peter Zijlstra, Paolo Bonzini, Like Xu



On 2021/5/18 20:40, Xu, Like wrote:
> On 2021/5/18 20:23, Liuxiangdong wrote:
>>
>>
>> On 2021/5/17 14:38, Like Xu wrote:
>>> Hi xiangdong,
>>>
>>> On 2021/5/15 18:30, Liuxiangdong wrote:
>>>>
>>>>
>>>> On 2021/5/11 10:41, Like Xu wrote:
>>>>> A new kernel cycle has begun, and this version looks promising.
>>>>>
>>>>> The guest Precise Event Based Sampling (PEBS) feature can provide
>>>>> an architectural state of the instruction executed after the guest
>>>>> instruction that exactly caused the event. It needs new hardware
>>>>> facility only available on Intel Ice Lake Server platforms. This
>>>>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>>>>
>>>>> We can use PEBS feature on the Linux guest like native:
>>>>>
>>>>>    # perf record -e instructions:ppp ./br_instr a
>>>>>    # perf record -c 100000 -e instructions:pp ./br_instr a
>>>>
>>>> Hi, Like.
>>>> Has the qemu patch been modified?
>>>>
>>>> https://lore.kernel.org/kvm/f4dcb068-2ddf-428f-50ad-39f65cad3710@intel.com/ 
>>>> ?
>>>
>>> I think the qemu part still works based on
>>> 609d7596524ab204ccd71ef42c9eee4c7c338ea4 (tag: v6.0.0).
>>>
>>
>> Yes. I applied these two qemu patches to qemu v6.0.0 and this kvm
>> patch set to the latest kvm tree.
>>
>> I can see the pebs flag in the guest (Linux 5.11) on Ice Lake (Model: 106,
>> Model name: Intel(R) Xeon(R) Platinum 8378A CPU),
>> and I can use PEBS like this.
>>
>>     # perf record -e instructions:pp
>>
>> It can work normally.
>>
>> But there is no sampling when I use "perf record -e events:pp" or
>> just "perf record" in the guest
>> unless I delete patch 09 and patch 13 from this kvm patch set.
>>
>>
>
> With patches 9 and 13, does the basic counter sampling still work?
> You may retry w/ "echo 0 > /proc/sys/kernel/watchdog" on the host and
> guest.
>

Yes. It works!  Thanks!


>> Have you tried "perf record -e events:pp" with this patch set? Does
>> it work normally?
>
> All my PEBS testcases passed. You may dump the guest MSR traces from your
> testcase and share them with me.
>
>>
>>
>>
>> Thanks!
>> Xiangdong Liu
>>
>>
>>
>>> When the LBR qemu patch receives the ACK from the maintainer,
>>> I will submit PEBS qemu support because their changes are very similar.
>>>
>>> Please help review this version and
>>> feel free to add your comments or "Reviewed-by".
>>>
>>> Thanks,
>>> Like Xu
>>>
>>>>
>>>>
>>>>> To emulate guest PEBS facility for the above perf usages,
>>>>> we need to implement 2 code paths:
>>>>>
>>>>> 1) Fast path
>>>>>
>>>>> This is when the host assigned physical PMC has an identical index as
>>>>> the virtual PMC (e.g. using physical PMC0 to emulate virtual PMC0).
>>>>> This path is used in most common use cases.
>>>>>
>>>>> 2) Slow path
>>>>>
>>>>> This is when the host assigned physical PMC has a different index
>>>>> from the virtual PMC (e.g. using physical PMC1 to emulate virtual 
>>>>> PMC0)
>>>>> In this case, KVM needs to rewrite the PEBS records to change the
>>>>> applicable counter indexes to the virtual PMC indexes, which would
>>>>> otherwise contain the physical counter index written by PEBS 
>>>>> facility,
>>>>> and switch the counter reset values to the offset corresponding to
>>>>> the physical counter indexes in the DS data structure.
>>>>>
>>>>> The previous version [0] enables both fast path and slow path, which
>>>>> seems a bit more complex as the first step. In this patchset, we want
>>>>> to start with the fast path to get the basic guest PEBS enabled while
>>>>> keeping the slow path disabled. More focused discussion on the slow
>>>>> path [1] is planned to be put to another patchset in the next step.
>>>>>
>>>>> Compared to later versions in subsequent steps, the functionality
>>>>> to support host-guest PEBS both enabled and the functionality to
>>>>> emulate guest PEBS when the counter is cross-mapped are missing
>>>>> in this patch set (neither of these are typical scenarios).
>>>>>
>>>>> With the basic support, the guest can retrieve the correct PEBS
>>>>> information from its own PEBS records on the Ice Lake servers.
>>>>> And we expect it should work when migrating to another Ice Lake
>>>>> and no regression about host perf is expected.
>>>>>
>>>>> Here are the results of pebs test from guest/host for same workload:
>>>>>
>>>>> perf report on guest:
>>>>> # Samples: 2K of event 'instructions:ppp', # Event count 
>>>>> (approx.): 1473377250
>>>>> # Overhead  Command   Shared Object      Symbol
>>>>>    57.74%  br_instr  br_instr           [.] lfsr_cond
>>>>>    41.40%  br_instr  br_instr           [.] cmp_end
>>>>>     0.21%  br_instr  [kernel.kallsyms]  [k] __lock_acquire
>>>>>
>>>>> perf report on host:
>>>>> # Samples: 2K of event 'instructions:ppp', # Event count 
>>>>> (approx.): 1462721386
>>>>> # Overhead  Command   Shared Object     Symbol
>>>>>    57.90%  br_instr  br_instr          [.] lfsr_cond
>>>>>    41.95%  br_instr  br_instr          [.] cmp_end
>>>>>     0.05%  br_instr  [kernel.vmlinux]  [k] lock_acquire
>>>>>     Conclusion: the profiling results on the guest are similar
>>>>> to that on the host.
>>>>>
>>>>> A minimum guest kernel version may be v5.4 or a backport version that
>>>>> supports Icelake server PEBS.
>>>>>
>>>>> Please check more details in each commit and feel free to comment.
>>>>>
>>>>> Previous:
>>>>> https://lore.kernel.org/kvm/20210415032016.166201-1-like.xu@linux.intel.com/ 
>>>>>
>>>>>
>>>>> [0] 
>>>>> https://lore.kernel.org/kvm/20210104131542.495413-1-like.xu@linux.intel.com/
>>>>> [1] 
>>>>> https://lore.kernel.org/kvm/20210115191113.nktlnmivc3edstiv@two.firstfloor.org/ 
>>>>>
>>>>>
>>>>> V5 -> V6 Changelog:
>>>>> - Rebased on the latest kvm/queue tree;
>>>>> - Fix a git rebase issue (Liuxiangdong);
>>>>> - Adjust the patch sequence 06/07 for bisection (Liuxiangdong);
>>>>>
>>>>> Like Xu (16):
>>>>>    perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server
>>>>>    perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest
>>>>>    perf/x86/core: Pass "struct kvm_pmu *" to determine the guest 
>>>>> values
>>>>>    KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is 
>>>>> enabled
>>>>>    KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter
>>>>>    KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
>>>>>    KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
>>>>>    KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS
>>>>>    KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support 
>>>>> adaptive PEBS
>>>>>    KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is 
>>>>> enabled
>>>>>    KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR 
>>>>> counter
>>>>>    KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h
>>>>>    KVM: x86/pmu: Disable guest PEBS temporarily in two rare 
>>>>> situations
>>>>>    KVM: x86/pmu: Add kvm_pmu_cap to optimize 
>>>>> perf_get_x86_pmu_capability
>>>>>    KVM: x86/cpuid: Refactor host/guest CPU model consistency check
>>>>>    KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64
>>>>>
>>>>>   arch/x86/events/core.c            |   5 +-
>>>>>   arch/x86/events/intel/core.c      | 129 
>>>>> ++++++++++++++++++++++++------
>>>>>   arch/x86/events/perf_event.h      |   5 +-
>>>>>   arch/x86/include/asm/kvm_host.h   |  16 ++++
>>>>>   arch/x86/include/asm/msr-index.h  |   6 ++
>>>>>   arch/x86/include/asm/perf_event.h |   5 +-
>>>>>   arch/x86/kvm/cpuid.c              |  24 ++----
>>>>>   arch/x86/kvm/cpuid.h              |   5 ++
>>>>>   arch/x86/kvm/pmu.c                |  50 +++++++++---
>>>>>   arch/x86/kvm/pmu.h                |  38 +++++++++
>>>>>   arch/x86/kvm/vmx/capabilities.h   |  26 ++++--
>>>>>   arch/x86/kvm/vmx/pmu_intel.c      | 115 +++++++++++++++++++++-----
>>>>>   arch/x86/kvm/vmx/vmx.c            |  24 +++++-
>>>>>   arch/x86/kvm/vmx/vmx.h            |   2 +-
>>>>>   arch/x86/kvm/x86.c                |  14 ++--
>>>>>   15 files changed, 368 insertions(+), 96 deletions(-)
>>>>>
>>>
>>
>


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-18  8:47       ` Peter Zijlstra
@ 2021-05-18 13:15         ` Xu, Like
  2021-05-18 15:58           ` Andi Kleen
  0 siblings, 1 reply; 56+ messages in thread
From: Xu, Like @ 2021-05-18 13:15 UTC (permalink / raw)
  To: Peter Zijlstra, Andi Kleen
  Cc: Like Xu, Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm

On 2021/5/18 16:47, Peter Zijlstra wrote:
> On Mon, May 17, 2021 at 07:44:15AM -0700, Andi Kleen wrote:
>> On 5/17/2021 1:39 AM, Peter Zijlstra wrote:
>>> On Tue, May 11, 2021 at 10:42:05AM +0800, Like Xu wrote:
>>>> +	if (pebs) {
>>>> +		/*
>>>> +		 * The non-zero precision level of guest event makes the ordinary
>>>> +		 * guest event becomes a guest PEBS event and triggers the host
>>>> +		 * PEBS PMI handler to determine whether the PEBS overflow PMI
>>>> +		 * comes from the host counters or the guest.
>>>> +		 *
>>>> +		 * For most PEBS hardware events, the difference in the software
>>>> +		 * precision levels of guest and host PEBS events will not affect
>>>> +		 * the accuracy of the PEBS profiling result, because the "event IP"
>>>> +		 * in the PEBS record is calibrated on the guest side.
>>>> +		 */
>>>> +		attr.precise_ip = 1;
>>>> +	}
>>> You've just destroyed precdist, no?
>> precdist can mean multiple things:
>>
>> - Convert cycles to the precise INST_RETIRED event. That is not meaningful
>> for virtualization because "cycles" doesn't exist, just the raw events.
>>
>> - For GLC+ and TNT+ it will force the event to a specific counter that is
>> more precise. This would be indeed "destroyed", but right now the patch kit
>> only supports Icelake which doesn't support that anyways.
>>
>> So I think the code is correct for now, but will need to be changed for
>> later CPUs. Should perhaps fix the comment though to discuss this.
> OK, can we then do a better comment that explains *why* this is correct
> now and what needs help later?
>
> Because IIUC the only reason it is correct now is because:
>
>   - we only support ICL
>
>     * and ICL has pebs_format>=2, so {1,2} are the same
>     * and ICL doesn't have precise_ip==3 support
>
>   - Other hardware (GLC+, TNT+) that could possibly care here
>     is unsupported atm. but needs changes.
>
> None of which is actually mentioned in that comment it does have.

Hi Andi & Peter,

By "precdist", do you mean the"Precise Distribution of Instructions Retired 
(PDIR) Facility"?

The SDM says the Ice Lake microarchitecture does support PEBS-PDIR, on
IA32_FIXED0 only.
And this patch kit enables it in patch 0011, please take a look.
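
Roughly, the adjustment there is (a sketch of the intent; the exact
condition in patch 0011 may differ):

	/*
	 * On Ice Lake, PEBS-PDIR is only available on fixed counter 0,
	 * so request the highest precision level only when the guest
	 * event lands on that counter (INTEL_PMC_IDX_FIXED is its index
	 * in the global counter bitmap).
	 */
	if (pebs && pmc->idx == INTEL_PMC_IDX_FIXED)
		attr.precise_ip = 3;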

Or do I miss something about precdist on ICL ?

Thanks,
Like Xu






^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-17  9:14   ` Peter Zijlstra
@ 2021-05-18 13:28     ` Xu, Like
  2021-05-18 13:36       ` Peter Zijlstra
  0 siblings, 1 reply; 56+ messages in thread
From: Xu, Like @ 2021-05-18 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On 2021/5/17 17:14, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 10:42:05AM +0800, Like Xu wrote:
>> @@ -99,6 +109,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>>   				  bool exclude_kernel, bool intr,
>>   				  bool in_tx, bool in_tx_cp)
>>   {
>> +	struct kvm_pmu *pmu = vcpu_to_pmu(pmc->vcpu);
>>   	struct perf_event *event;
>>   	struct perf_event_attr attr = {
>>   		.type = type,
>> @@ -110,6 +121,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>>   		.exclude_kernel = exclude_kernel,
>>   		.config = config,
>>   	};
>> +	bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
>>   
>>   	attr.sample_period = get_sample_period(pmc, pmc->counter);
>>   
>> @@ -124,9 +136,23 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>>   		attr.sample_period = 0;
>>   		attr.config |= HSW_IN_TX_CHECKPOINTED;
>>   	}
>> +	if (pebs) {
>> +		/*
>> +		 * The non-zero precision level of guest event makes the ordinary
>> +		 * guest event becomes a guest PEBS event and triggers the host
>> +		 * PEBS PMI handler to determine whether the PEBS overflow PMI
>> +		 * comes from the host counters or the guest.
>> +		 *
>> +		 * For most PEBS hardware events, the difference in the software
>> +		 * precision levels of guest and host PEBS events will not affect
>> +		 * the accuracy of the PEBS profiling result, because the "event IP"
>> +		 * in the PEBS record is calibrated on the guest side.
>> +		 */
>> +		attr.precise_ip = 1;
>> +	}
>>   
>>   	event = perf_event_create_kernel_counter(&attr, -1, current,
>> -						 intr ? kvm_perf_overflow_intr :
>> +						 (intr || pebs) ? kvm_perf_overflow_intr :
>>   						 kvm_perf_overflow, pmc);
> How would pebs && !intr be possible?

I don't think it's possible.

> Also; wouldn't this be more legible
> when written like:
>
> 	perf_overflow_handler_t ovf = kvm_perf_overflow;
>
> 	...
>
> 	if (intr)
> 		ovf = kvm_perf_overflow_intr;
>
> 	...
>
> 	event = perf_event_create_kernel_counter(&attr, -1, current, ovf, pmc);
>

Please yell if you don't like this:

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 711294babb97..a607f5a1b9cd 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -122,6 +122,8 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
                 .config = config,
         };
         bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
+       perf_overflow_handler_t ovf = (intr || pebs) ?
+               kvm_perf_overflow_intr : kvm_perf_overflow;

         attr.sample_period = get_sample_period(pmc, pmc->counter);

@@ -151,9 +153,7 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
                 attr.precise_ip = 1;
         }

-       event = perf_event_create_kernel_counter(&attr, -1, current,
-                                                (intr || pebs) ? kvm_perf_overflow_intr :
-                                                kvm_perf_overflow, pmc);
+       event = perf_event_create_kernel_counter(&attr, -1, current, ovf, pmc);
         if (IS_ERR(event)) {
                 pr_debug_ratelimited("kvm_pmu: event creation failed %ld for pmc->idx = %d\n",
                             PTR_ERR(event), pmc->idx);



^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-18 13:28     ` Xu, Like
@ 2021-05-18 13:36       ` Peter Zijlstra
  2021-05-18 14:05         ` Xu, Like
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-18 13:36 UTC (permalink / raw)
  To: Xu, Like
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On Tue, May 18, 2021 at 09:28:52PM +0800, Xu, Like wrote:

> > How would pebs && !intr be possible?
> 
> I don't think it's possible.

And yet you keep that 'intr||pebs' weirdness :/

> > Also; wouldn't this be more legible
> > when written like:
> > 
> > 	perf_overflow_handler_t ovf = kvm_perf_overflow;
> > 
> > 	...
> > 
> > 	if (intr)
> > 		ovf = kvm_perf_overflow_intr;
> > 
> > 	...
> > 
> > 	event = perf_event_create_kernel_counter(&attr, -1, current, ovf, pmc);
> > 
> 
> Please yell if you don't like this:
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 711294babb97..a607f5a1b9cd 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -122,6 +122,8 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type,
>                 .config = config,
>         };
>         bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
> +       perf_overflow_handler_t ovf = (intr || pebs) ?
> +               kvm_perf_overflow_intr : kvm_perf_overflow;

This, that's exactly the kind of code I wanted to get rid of. ?: has
its place I suppose, but you're creating dense ugly code for no reason.

	perf_overflow_handler_t ovf = kvm_perf_overflow;

	if (intr)
		ovf = kvm_perf_overflow_intr;

Is so much easier to read. And if you really worry about that pebs
thing; you can add:

	WARN_ON_ONCE(pebs && !intr);
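
Folded together, the final shape would then be something like (a sketch):

	bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
	perf_overflow_handler_t ovf = kvm_perf_overflow;

	/* A guest PEBS counter must always take the interrupting handler. */
	WARN_ON_ONCE(pebs && !intr);
	if (intr)
		ovf = kvm_perf_overflow_intr;

	...

	event = perf_event_create_kernel_counter(&attr, -1, current, ovf, pmc);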


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS
  2021-05-18  8:44     ` Xu, Like
@ 2021-05-18 13:42       ` Peter Zijlstra
  0 siblings, 0 replies; 56+ messages in thread
From: Peter Zijlstra @ 2021-05-18 13:42 UTC (permalink / raw)
  To: Xu, Like
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Luwei Kang, Like Xu

On Tue, May 18, 2021 at 04:44:13PM +0800, Xu, Like wrote:
> Will adding the following comments help you ?
> 
> +/*
> + * Currently, the only caller of this function is atomic_switch_perf_msrs().
> + * The host perf context helps to prepare the values of the real hardware for
> + * a set of MSRs that need to be switched atomically in a vmx transaction.
> + *
> + * For example, the pseudocode needed to add a new msr should look like:
> + *
> + * arr[(*nr)++] = (struct perf_guest_switch_msr){
> + *     .msr = the hardware msr address,
> + *     .host = the value the hardware has when it doesn't run a guest,
> + *     .guest = the value the hardware has when it runs a guest,

So personally I think the .host and .guest naming is terrible here,
because both values are host values. But I don't know enough about virt
to know if there's an accepted nomenclature for this.

> + * };
> + *
> + * These values have nothing to do with the emulated values the guest sees
> + * when it uses {RD,WR}MSR, which should be handled in the KVM context.
> + */
>  static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)

Yes, now at least one can understand wth this function does, even though
the actual naming is still horrible. Thanks!

Additionally, would it make sense to add a pointer to the KVM code that
does the emulation for each MSR listed in this function?

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-18 13:36       ` Peter Zijlstra
@ 2021-05-18 14:05         ` Xu, Like
  0 siblings, 0 replies; 56+ messages in thread
From: Xu, Like @ 2021-05-18 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, ak, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm, Like Xu

On 2021/5/18 21:36, Peter Zijlstra wrote:
> On Tue, May 18, 2021 at 09:28:52PM +0800, Xu, Like wrote:
>
>>> How would pebs && !intr be possible?
>> I don't think it's possible.
> And yet you keep that 'intr||pebs' weirdness :/
>
>>> Also; wouldn't this be more legible
>>> when written like:
>>>
>>> 	perf_overflow_handler_t ovf = kvm_perf_overflow;
>>>
>>> 	...
>>>
>>> 	if (intr)
>>> 		ovf = kvm_perf_overflow_intr;
>>>
>>> 	...
>>>
>>> 	event = perf_event_create_kernel_counter(&attr, -1, current, ovf, pmc);
>>>
>> Please yell if you don't like this:
>>
>> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
>> index 711294babb97..a607f5a1b9cd 100644
>> --- a/arch/x86/kvm/pmu.c
>> +++ b/arch/x86/kvm/pmu.c
>> @@ -122,6 +122,8 @@ static void pmc_reprogram_counter(struct kvm_pmc *pmc,
>> u32 type,
>>                  .config = config,
>>          };
>>          bool pebs = test_bit(pmc->idx, (unsigned long *)&pmu->pebs_enable);
>> +       perf_overflow_handler_t ovf = (intr || pebs) ?
>> +               kvm_perf_overflow_intr : kvm_perf_overflow;
> This, that's exactly the kind of code I wanted to get rid of. ?: has
> it's place I suppose, but you're creating dense ugly code for no reason.
>
> 	perf_overflow_handler_t ovf = kvm_perf_overflow;
>
> 	if (intr)
> 		ovf = kvm_perf_overflow_intr;
>
> Is so much easier to read. And if you really worry about that pebs
> thing; you can add:
>
> 	WARN_ON_ONCE(pebs && !intr);
>

Thanks!  Glad you could review my code.
As the newer generation, we do appreciate your patient guidance on code
taste.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter
  2021-05-18 13:15         ` Xu, Like
@ 2021-05-18 15:58           ` Andi Kleen
  0 siblings, 0 replies; 56+ messages in thread
From: Andi Kleen @ 2021-05-18 15:58 UTC (permalink / raw)
  To: Xu, Like, Peter Zijlstra
  Cc: Like Xu, Paolo Bonzini, Borislav Petkov, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	weijiang.yang, Kan Liang, wei.w.wang, eranian, liuxiangdong5,
	linux-kernel, x86, kvm


>
> By "precdist", do you mean the"Precise Distribution of Instructions 
> Retired (PDIR) Facility"?


This was referring to perf's precise_ip field.


>
> The SDM says the Ice Lake microarchitecture supports PEBS-PDIR on
> IA32_FIXED0 only, and this patch kit enables it in patch 0011; please
> take a look.
>
> Or am I missing something about precdist on ICL?


On Ice Lake everything is fine.

It would just need changes on other CPUs, but that can be done later.


-Andi
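
For context, the adjustment in patch 0011 can be sketched as follows;
the helper name and exact policy are illustrative assumptions, not the
patch's literal code. The hardware constraint it encodes is the one
stated above: on Ice Lake, PEBS-PDIR is only available on fixed
counter 0 (IA32_FIXED0), which perf selects via the maximum precise_ip:

	/*
	 * If the guest programs PEBS on its fixed counter 0, request
	 * precise_ip == 3 so perf schedules the backing host event on
	 * IA32_FIXED0 as well, keeping PDIR semantics intact.
	 */
	static void adjust_precise_ip_for_pdir(struct kvm_pmc *pmc,
					       struct perf_event_attr *attr)
	{
		if (pmc->idx == INTEL_PMC_IDX_FIXED)	/* fixed counter 0 */
			attr->precise_ip = 3;
		else
			attr->precise_ip = 1;	/* basic PEBS is sufficient */
	}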



^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
  2021-05-18 12:40       ` Xu, Like
  2021-05-18 13:15         ` Liuxiangdong
@ 2021-05-19  1:44         ` Liuxiangdong
  2021-05-21  1:37           ` Like Xu
  1 sibling, 1 reply; 56+ messages in thread
From: Liuxiangdong @ 2021-05-19  1:44 UTC (permalink / raw)
  To: Xu, Like
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, linux-kernel, x86, kvm, Fangyi (Eric),
	Xiexiangyou, Peter Zijlstra, Paolo Bonzini, Like Xu



On 2021/5/18 20:40, Xu, Like wrote:
> On 2021/5/18 20:23, Liuxiangdong wrote:
>>
>>
>> On 2021/5/17 14:38, Like Xu wrote:
>>> Hi Xiangdong,
>>>
>>> On 2021/5/15 18:30, Liuxiangdong wrote:
>>>>
>>>>
>>>> On 2021/5/11 10:41, Like Xu wrote:
>>>>> A new kernel cycle has begun, and this version looks promising.
>>>>>
>>>>> The guest Precise Event Based Sampling (PEBS) feature can provide
>>>>> an architectural state of the instruction executed after the guest
>>>>> instruction that exactly caused the event. It needs new hardware
>>>>> facility only available on Intel Ice Lake Server platforms. This
>>>>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>>>>
>>>>> We can use PEBS feature on the Linux guest like native:
>>>>>
>>>>>    # perf record -e instructions:ppp ./br_instr a
>>>>>    # perf record -c 100000 -e instructions:pp ./br_instr a
>>>>
>>>> Hi, Like.
>>>> Has the qemu patch been modified?
>>>>
>>>> https://lore.kernel.org/kvm/f4dcb068-2ddf-428f-50ad-39f65cad3710@intel.com/ 
>>>> ?
>>>
>>> I think the qemu part still works based on
>>> 609d7596524ab204ccd71ef42c9eee4c7c338ea4 (tag: v6.0.0).
>>>
>>
>> Yes. I applied these two qemu patches to qemu v6.0.0 and this kvm
>> patch set to the latest kvm tree.
>>
>> I can see the pebs flags in the guest (Linux 5.11) on the Ice Lake
>> (Model: 106, Model name: Intel(R) Xeon(R) Platinum 8378A CPU),
>> and I can use PEBS like this:
>>
>>     # perf record -e instructions:pp
>>
>> It works normally.
>>
>> But there is no sampling when I use "perf record -e events:pp" or
>> just "perf record" in the guest,
>> unless I delete patch 09 and patch 13 from this kvm patch set.
>>
>>
>
> With patches 9 and 13, does basic counter sampling still work?
> You may retry with "echo 0 > /proc/sys/kernel/watchdog" on the host
> and guest.
>

In fact, I didn't use "echo 0 > /proc/sys/kernel/watchdog" when I tried
the v3 PEBS patches on Ice Lake.
Why should we use it now? What does it have to do with sampling?

Thanks!

>> Have you tried "perf record -e events:pp" with this patch set? Does
>> it work normally?
>
> All my PEBS testcases passed. You may dump the guest MSR traces from
> your testcase and share them with me.
>
>>
>>
>>
>> Thanks!
>> Xiangdong Liu
>>
>>
>>
>>> When the LBR qemu patch receives the ACK from the maintainer,
>>> I will submit PEBS qemu support because their changes are very similar.
>>>
>>> Please help review this version and
>>> feel free to add your comments or "Reviewed-by".
>>>
>>> Thanks,
>>> Like Xu
>>>
>>>>
>>>>> [remainder of the quoted cover letter snipped]


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS
  2021-05-19  1:44         ` Liuxiangdong
@ 2021-05-21  1:37           ` Like Xu
  0 siblings, 0 replies; 56+ messages in thread
From: Like Xu @ 2021-05-21  1:37 UTC (permalink / raw)
  To: Liuxiangdong
  Cc: Borislav Petkov, Sean Christopherson, Vitaly Kuznetsov,
	Wanpeng Li, Jim Mattson, Joerg Roedel, weijiang.yang, Kan Liang,
	ak, wei.w.wang, eranian, linux-kernel, x86, kvm, Fangyi (Eric),
	Xiexiangyou, Peter Zijlstra, Paolo Bonzini, Zhu, Lingshan, Xu,
	Like

On 2021/5/19 9:44, Liuxiangdong wrote:
> 
> 
> On 2021/5/18 20:40, Xu, Like wrote:
>> On 2021/5/18 20:23, Liuxiangdong wrote:
>>>
>>>
>>> On 2021/5/17 14:38, Like Xu wrote:
>>>> Hi Xiangdong,
>>>>
>>>> On 2021/5/15 18:30, Liuxiangdong wrote:
>>>>>
>>>>>
>>>>> On 2021/5/11 10:41, Like Xu wrote:
>>>>>> A new kernel cycle has begun, and this version looks promising.
>>>>>>
>>>>>> The guest Precise Event Based Sampling (PEBS) feature can provide
>>>>>> an architectural state of the instruction executed after the guest
>>>>>> instruction that exactly caused the event. It needs new hardware
>>>>>> facility only available on Intel Ice Lake Server platforms. This
>>>>>> patch set enables the basic PEBS feature for KVM guests on ICX.
>>>>>>
>>>>>> We can use PEBS feature on the Linux guest like native:
>>>>>>
>>>>>>    # perf record -e instructions:ppp ./br_instr a
>>>>>>    # perf record -c 100000 -e instructions:pp ./br_instr a
>>>>>
>>>>> Hi, Like.
>>>>> Has the qemu patch been modified?
>>>>>
>>>>> https://lore.kernel.org/kvm/f4dcb068-2ddf-428f-50ad-39f65cad3710@intel.com/ 
>>>>> ?
>>>>
>>>> I think the qemu part still works based on
>>>> 609d7596524ab204ccd71ef42c9eee4c7c338ea4 (tag: v6.0.0).
>>>>
>>>
>>> Yes. I applied these two qemu patches to qemu v6.0.0 and this kvm
>>> patch set to the latest kvm tree.
>>>
>>> I can see the pebs flags in the guest (Linux 5.11) on the Ice Lake
>>> (Model: 106, Model name: Intel(R) Xeon(R) Platinum 8378A CPU),
>>> and I can use PEBS like this:
>>>
>>>     # perf record -e instructions:pp
>>>
>>> It works normally.
>>>
>>> But there is no sampling when I use "perf record -e events:pp" or
>>> just "perf record" in the guest,
>>> unless I delete patch 09 and patch 13 from this kvm patch set.
>>>
>>>
>>
>> With patches 9 and 13, does basic counter sampling still work?
>> You may retry with "echo 0 > /proc/sys/kernel/watchdog" on the host
>> and guest.
>>
> 
> In fact, I didn't use "echo 0 > /proc/sys/kernel/watchdog" when I tried
> the v3 PEBS patches on Ice Lake.
> Why should we use it now? What does it have to do with sampling?

In the recent patch sets, we disable guest PEBS when the guest
PEBS counter is cross-mapped to a host PEBS counter with a
different index.

When the watchdog feature is in use on Intel platforms, it may take
a cycles hw counter on the host, which can cause the guest PEBS
counter to be temporarily disabled if it becomes cross-mapped.

Check patch 0013 for more details.
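
A minimal sketch of that check follows; the helper and field names are
assumptions based on the series' description, not the literal patch.
Since the NMI watchdog pins a cycles counter on the host, disabling it
makes an identical guest/host counter mapping much more likely:

	/*
	 * Mask a guest PEBS counter out of the effective PEBS_ENABLE
	 * whenever perf backs it with a host counter of a different
	 * index, because the fast path cannot rewrite the counter
	 * index that the PEBS facility stores in each record.
	 */
	u64 pebs_enable = pmu->pebs_enable;
	int bit;

	for_each_set_bit(bit, (unsigned long *)&pmu->pebs_enable,
			 X86_PMC_IDX_MAX) {
		struct kvm_pmc *pmc = kvm_x86_ops.pmu_ops->pmc_idx_to_pmc(pmu, bit);

		if (pmc && pmc->perf_event && pmc->perf_event->hw.idx != bit)
			pebs_enable &= ~BIT_ULL(bit);	/* cross-mapped */
	}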

> 
> Thanks!
> 
>>> Have you tried "perf record -e events:pp" with this patch set? Does
>>> it work normally?
>>
>> All my PEBS testcases passed. You may dump the guest MSR traces from
>> your testcase and share them with me.
>>
>>>
>>>
>>>
>>> Thanks!
>>> Xiangdong Liu
>>>
>>>
>>>
>>>> When the LBR qemu patch receives the ACK from the maintainer,
>>>> I will submit PEBS qemu support because their changes are very similar.
>>>>
>>>> Please help review this version and
>>>> feel free to add your comments or "Reviewed-by".
>>>>
>>>> Thanks,
>>>> Like Xu
>>>>
>>>>>
>>>>>> [remainder of the quoted cover letter snipped]


^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2021-05-21  1:37 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-11  2:41 [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Like Xu
2021-05-11  2:41 ` [PATCH v6 01/16] perf/x86/intel: Add EPT-Friendly PEBS for Ice Lake Server Like Xu
2021-05-11  2:42 ` [PATCH v6 02/16] perf/x86/intel: Handle guest PEBS overflow PMI for KVM guest Like Xu
2021-05-17  8:16   ` Peter Zijlstra
2021-05-18  7:38     ` Xu, Like
2021-05-18  8:37       ` Peter Zijlstra
2021-05-11  2:42 ` [PATCH v6 03/16] perf/x86/core: Pass "struct kvm_pmu *" to determine the guest values Like Xu
2021-05-11  2:42 ` [PATCH v6 04/16] KVM: x86/pmu: Set MSR_IA32_MISC_ENABLE_EMON bit when vPMU is enabled Like Xu
2021-05-12  1:58   ` Venkatesh Srinivas
2021-05-12  5:00     ` Xu, Like
2021-05-12 15:18       ` Sean Christopherson
2021-05-13  2:50         ` Xu, Like
2021-05-17 18:43           ` Venkatesh Srinivas
2021-05-17 21:19             ` Sean Christopherson
2021-05-17 21:16           ` Sean Christopherson
2021-05-17 23:51             ` Sean Christopherson
2021-05-18  7:49               ` Xu, Like
2021-05-11  2:42 ` [PATCH v6 05/16] KVM: x86/pmu: Introduce the ctrl_mask value for fixed counter Like Xu
2021-05-17  8:18   ` Peter Zijlstra
2021-05-18  7:55     ` Xu, Like
2021-05-18  8:35       ` Peter Zijlstra
2021-05-11  2:42 ` [PATCH v6 06/16] KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS Like Xu
2021-05-17  8:32   ` Peter Zijlstra
2021-05-18  8:44     ` Xu, Like
2021-05-18 13:42       ` Peter Zijlstra
2021-05-17  8:33   ` Peter Zijlstra
2021-05-18  8:13     ` Xu, Like
2021-05-11  2:42 ` [PATCH v6 07/16] KVM: x86/pmu: Reprogram PEBS event to emulate guest PEBS counter Like Xu
2021-05-17  8:39   ` Peter Zijlstra
2021-05-17 14:44     ` Andi Kleen
2021-05-18  8:47       ` Peter Zijlstra
2021-05-18 13:15         ` Xu, Like
2021-05-18 15:58           ` Andi Kleen
2021-05-17  9:14   ` Peter Zijlstra
2021-05-18 13:28     ` Xu, Like
2021-05-18 13:36       ` Peter Zijlstra
2021-05-18 14:05         ` Xu, Like
2021-05-11  2:42 ` [PATCH v6 08/16] KVM: x86/pmu: Add IA32_DS_AREA MSR emulation to support guest DS Like Xu
2021-05-12  5:16   ` Xu, Like
2021-05-17 13:26   ` Peter Zijlstra
2021-05-17 14:50     ` Andi Kleen
2021-05-11  2:42 ` [PATCH v6 09/16] KVM: x86/pmu: Add PEBS_DATA_CFG MSR emulation to support adaptive PEBS Like Xu
2021-05-11  2:42 ` [PATCH v6 10/16] KVM: x86: Set PEBS_UNAVAIL in IA32_MISC_ENABLE when PEBS is enabled Like Xu
2021-05-11  2:42 ` [PATCH v6 11/16] KVM: x86/pmu: Adjust precise_ip to emulate Ice Lake guest PDIR counter Like Xu
2021-05-11  2:42 ` [PATCH v6 12/16] KVM: x86/pmu: Move pmc_speculative_in_use() to arch/x86/kvm/pmu.h Like Xu
2021-05-11  2:42 ` [PATCH v6 13/16] KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations Like Xu
2021-05-11  2:42 ` [PATCH v6 14/16] KVM: x86/pmu: Add kvm_pmu_cap to optimize perf_get_x86_pmu_capability Like Xu
2021-05-11  2:42 ` [PATCH v6 15/16] KVM: x86/cpuid: Refactor host/guest CPU model consistency check Like Xu
2021-05-11  2:42 ` [PATCH v6 16/16] KVM: x86/pmu: Expose CPUIDs feature bits PDCM, DS, DTES64 Like Xu
2021-05-15 10:30 ` [PATCH v6 00/16] KVM: x86/pmu: Add *basic* support to enable guest PEBS via DS Liuxiangdong
2021-05-17  6:38   ` Like Xu
2021-05-18 12:23     ` Liuxiangdong
2021-05-18 12:40       ` Xu, Like
2021-05-18 13:15         ` Liuxiangdong
2021-05-19  1:44         ` Liuxiangdong
2021-05-21  1:37           ` Like Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).