linux-kernel.vger.kernel.org archive mirror
* [PATCH RESEND v3 0/3] KVM: x86/pmu: Enable guest PEBS for SPR and later models
@ 2022-12-06  8:29 Like Xu
  2022-12-06  8:29 ` [PATCH RESEND v3 1/3] KVM: x86/pmu: Disable guest PEBS on hybrid cpu due to heterogeneity Like Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Like Xu @ 2022-12-06  8:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Sean Christopherson, linux-kernel, kvm

Hi,

Finally, SPR will go live in early 2023. Virtualization support for SPR
PEBS (kvm.x86.vpmu.pebs_ept) has been officially documented in the Intel
SDM (June 2022), and this patch set has been validated on a late-stepping
machine.

Let's see if this new rebased revision will satisfy everyone's appetite.

Previous:

V3:
https://lore.kernel.org/kvm/20221109082802.27543-1-likexu@tencent.com/
V2:
https://lore.kernel.org/kvm/20220922051929.89484-1-likexu@tencent.com/

V2 -> V3 Changelog:
- Expand the commit message on the PDist/PDIR handling; (Sean)
- Refine confusing comments on event precise level and TNT+; (Sean)
- Use pmc_get_pebs_precise_level() instead of need_max_precise(); (Sean)
- Move the HYBRID_CPU change into a separate patch; (Sean)
- Land KVM changes before perf core changes; (Sean)
- Align code indentation; (Sean) // VScode is quite good for kernel dev.

Like Xu (3):
  KVM: x86/pmu: Disable guest PEBS on hybrid cpu due to heterogeneity
  KVM: x86/pmu: Add PDIR++ and PDist support for SPR and later models
  perf/x86/intel: Expose EPT-friendly PEBS for SPR and future models

 arch/x86/events/intel/core.c    |  1 +
 arch/x86/events/intel/ds.c      |  4 ++-
 arch/x86/kvm/pmu.c              | 45 ++++++++++++++++++++++++---------
 arch/x86/kvm/vmx/capabilities.h |  4 ++-
 4 files changed, 40 insertions(+), 14 deletions(-)

-- 
2.38.1



* [PATCH RESEND v3 1/3] KVM: x86/pmu: Disable guest PEBS on hybrid cpu due to heterogeneity
  2022-12-06  8:29 [PATCH RESEND v3 0/3] KVM: x86/pmu: Enable guest PEBS for SPR and later models Like Xu
@ 2022-12-06  8:29 ` Like Xu
  2022-12-06  8:29 ` [PATCH RESEND v3 2/3] KVM: x86/pmu: Add PDIR++ and PDist support for SPR and later models Like Xu
  2022-12-06  8:29 ` [PATCH RESEND v3 3/3] perf/x86/intel: Expose EPT-friendly PEBS for SPR and future models Like Xu
  2 siblings, 0 replies; 4+ messages in thread
From: Like Xu @ 2022-12-06  8:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Sean Christopherson, linux-kernel, kvm

From: Like Xu <likexu@tencent.com>

From the vPMU enabling perspective, KVM does not have proper support for
hybrid x86 cores. The reported perf_capabilities value (e.g. the format
of the PEBS record) depends on the type of CPU on which the kvm-intel
module is initialized. When a vCPU migrates from a core with one PEBS
format to a core with another, the guest parses the PEBS records
incorrectly, which can make profiling data analysis extremely problematic.

The safe way to fix this is to disable this part of the support until the
guest recognizes that it is running on a hybrid CPU, which is appropriate
at the moment given that x86 hybrid architectures are not heavily touted
in the data center market.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/vmx/capabilities.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index cd2ac9536c99..ea0498684048 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -392,7 +392,9 @@ static inline bool vmx_pt_mode_is_host_guest(void)
 
 static inline bool vmx_pebs_supported(void)
 {
-	return boot_cpu_has(X86_FEATURE_PEBS) && kvm_pmu_cap.pebs_ept;
+	return boot_cpu_has(X86_FEATURE_PEBS) &&
+	       !boot_cpu_has(X86_FEATURE_HYBRID_CPU) &&
+	       kvm_pmu_cap.pebs_ept;
 }
 
 static inline bool cpu_has_notify_vmexit(void)
-- 
2.38.1
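
As a rough illustration of the predicate this patch changes, here is a
minimal user-space sketch in C. It is not the kernel code: struct host_caps
and model_pebs_supported() are hypothetical stand-ins for the
boot_cpu_has() checks and kvm_pmu_cap.pebs_ept used in
vmx_pebs_supported(), and the example hosts are assumptions made only for
the demonstration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the host facts the patch consults. */
struct host_caps {
	bool has_pebs;   /* models boot_cpu_has(X86_FEATURE_PEBS) */
	bool is_hybrid;  /* models boot_cpu_has(X86_FEATURE_HYBRID_CPU) */
	bool pebs_ept;   /* models kvm_pmu_cap.pebs_ept */
};

/*
 * Guest PEBS is offered only when the host has PEBS, is not a hybrid
 * part, and perf reports EPT-friendly PEBS.
 */
bool model_pebs_supported(const struct host_caps *caps)
{
	return caps->has_pebs && !caps->is_hybrid && caps->pebs_ept;
}
```

With this model, a non-hybrid host that has PEBS and pebs_ept set reports
support, while the same host with the hybrid flag set does not, which is
the behavior change the commit message describes.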



* [PATCH RESEND v3 2/3] KVM: x86/pmu: Add PDIR++ and PDist support for SPR and later models
  2022-12-06  8:29 [PATCH RESEND v3 0/3] KVM: x86/pmu: Enable guest PEBS for SPR and later models Like Xu
  2022-12-06  8:29 ` [PATCH RESEND v3 1/3] KVM: x86/pmu: Disable guest PEBS on hybrid cpu due to heterogeneity Like Xu
@ 2022-12-06  8:29 ` Like Xu
  2022-12-06  8:29 ` [PATCH RESEND v3 3/3] perf/x86/intel: Expose EPT-friendly PEBS for SPR and future models Like Xu
  2 siblings, 0 replies; 4+ messages in thread
From: Like Xu @ 2022-12-06  8:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Sean Christopherson, linux-kernel, kvm

From: Like Xu <likexu@tencent.com>

The pebs capability on the SPR is basically the same as Ice Lake Server
with the exception of two special facilities that have been enhanced and
require special handling.

Upon triggering a PEBS assist, there will be a finite delay between the
time the counter overflows and when the microcode starts to carry out
its data collection obligations. Even if the delay is constant in core
clock space, it invariably manifests as variable "skids" in instruction
address space.

On Ice Lake Server, the Precise Distribution of Instructions Retired
(PDIR) facility mitigates the "skid" problem by providing an early
indication of when the counter is about to overflow. On SPR, the PDIR
counter available (Fixed 0) is unchanged, but the capability is enhanced
to Instruction-Accurate PDIR (PDIR++), where PEBS is taken on the
next instruction after the one that caused the overflow.

SPR also introduces a new Precise Distribution (PDist) facility only on
general programmable counter 0. Per Intel SDM, PDist eliminates any
skid or shadowing effects from PEBS. With PDist, the PEBS record will
be generated precisely upon completion of the instruction or operation
that causes the counter to overflow (there is no "wait for next occurrence"
by default).

In terms of KVM handling, when the guest accesses those special counters,
KVM needs to request the counters with the same index via the perf_event
kernel subsystem to ensure that the guest uses the correct PEBS hardware
counter (PDIR++ or PDist). This is mainly achieved by raising the event
precise level to the maximum. The semantics of this magic number are
defined by the internal software context of perf_event, and it is also
backwards compatible as part of the user space interface.

Opportunistically, refine confusing comments on TNT+, as the only
models that currently support pebs_ept are Ice Lake server and SPR (GLC+).

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/pmu.c | 45 +++++++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 684393c22105..8c8bfd078a3f 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -28,9 +28,18 @@
 struct x86_pmu_capability __read_mostly kvm_pmu_cap;
 EXPORT_SYMBOL_GPL(kvm_pmu_cap);
 
-static const struct x86_cpu_id vmx_icl_pebs_cpu[] = {
+/* Precise Distribution of Instructions Retired (PDIR) */
+static const struct x86_cpu_id vmx_pebs_pdir_cpu[] = {
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D, NULL),
 	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X, NULL),
+	/* Instruction-Accurate PDIR (PDIR++) */
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
+	{}
+};
+
+/* Precise Distribution (PDist) */
+static const struct x86_cpu_id vmx_pebs_pdist_cpu[] = {
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
 	{}
 };
 
@@ -155,6 +164,28 @@ static void kvm_perf_overflow(struct perf_event *perf_event,
 	kvm_make_request(KVM_REQ_PMU, pmc->vcpu);
 }
 
+static u64 pmc_get_pebs_precise_level(struct kvm_pmc *pmc)
+{
+	/*
+	 * For some model-specific PEBS counters with special capabilities
+	 * (PDIR, PDIR++, PDist), KVM needs to raise the event precise
+	 * level to the maximum value (currently 3, backwards compatible)
+	 * so that the perf subsystem assigns the specific hardware counter
+	 * with that capability to the vPMC.
+	 */
+	if ((pmc->idx == 0 && x86_match_cpu(vmx_pebs_pdist_cpu)) ||
+	    (pmc->idx == 32 && x86_match_cpu(vmx_pebs_pdir_cpu)))
+		return 3;
+
+	/*
+	 * A non-zero precision level turns an ordinary guest event into
+	 * a guest PEBS event and triggers the host PEBS PMI handler to
+	 * determine whether the PEBS overflow PMI comes from the host
+	 * counters or the guest.
+	 */
+	return 1;
+}
+
 static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 				 bool exclude_user, bool exclude_kernel,
 				 bool intr)
@@ -186,22 +217,12 @@ static int pmc_reprogram_counter(struct kvm_pmc *pmc, u32 type, u64 config,
 	}
 	if (pebs) {
 		/*
-		 * The non-zero precision level of guest event makes the ordinary
-		 * guest event becomes a guest PEBS event and triggers the host
-		 * PEBS PMI handler to determine whether the PEBS overflow PMI
-		 * comes from the host counters or the guest.
-		 *
 		 * For most PEBS hardware events, the difference in the software
 		 * precision levels of guest and host PEBS events will not affect
 		 * the accuracy of the PEBS profiling result, because the "event IP"
 		 * in the PEBS record is calibrated on the guest side.
-		 *
-		 * On Icelake everything is fine. Other hardware (GLC+, TNT+) that
-		 * could possibly care here is unsupported and needs changes.
 		 */
-		attr.precise_ip = 1;
-		if (x86_match_cpu(vmx_icl_pebs_cpu) && pmc->idx == 32)
-			attr.precise_ip = 3;
+		attr.precise_ip = pmc_get_pebs_precise_level(pmc);
 	}
 
 	event = perf_event_create_kernel_counter(&attr, -1, current,
-- 
2.38.1
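
The counter-to-precise-level mapping introduced by
pmc_get_pebs_precise_level() can be sketched in user space as below. This
is only a model under stated assumptions: enum model_cpu and the
model_*() helpers are hypothetical replacements for the x86_match_cpu()
tables, and index 32 is taken to be fixed counter 0, consistent with the
commit message naming Fixed 0 as the PDIR counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU identifiers standing in for the x86_cpu_id tables. */
enum model_cpu {
	MODEL_ICELAKE_D,
	MODEL_ICELAKE_X,
	MODEL_SAPPHIRERAPIDS_X,
	MODEL_OTHER,
};

#define MODEL_IDX_GP0     0u   /* general-purpose counter 0 (PDist on SPR) */
#define MODEL_IDX_FIXED0  32u  /* assumed index of fixed counter 0 (PDIR) */

/* Models x86_match_cpu(vmx_pebs_pdir_cpu): ICX-D, ICX and SPR. */
bool model_has_pdir(enum model_cpu cpu)
{
	return cpu == MODEL_ICELAKE_D || cpu == MODEL_ICELAKE_X ||
	       cpu == MODEL_SAPPHIRERAPIDS_X;
}

/* Models x86_match_cpu(vmx_pebs_pdist_cpu): SPR only. */
bool model_has_pdist(enum model_cpu cpu)
{
	return cpu == MODEL_SAPPHIRERAPIDS_X;
}

/*
 * Return the precise level requested from perf: 3 for the counters with
 * special capabilities, 1 for every other guest PEBS counter.
 */
uint64_t model_pebs_precise_level(enum model_cpu cpu, unsigned int idx)
{
	if ((idx == MODEL_IDX_GP0 && model_has_pdist(cpu)) ||
	    (idx == MODEL_IDX_FIXED0 && model_has_pdir(cpu)))
		return 3;

	return 1;
}
```

In this model, counter 0 on SPR and fixed counter 0 on Ice Lake server or
SPR get level 3, while counter 0 on Ice Lake server and any counter on an
unlisted model stay at level 1.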



* [PATCH RESEND v3 3/3] perf/x86/intel: Expose EPT-friendly PEBS for SPR and future models
  2022-12-06  8:29 [PATCH RESEND v3 0/3] KVM: x86/pmu: Enable guest PEBS for SPR and later models Like Xu
  2022-12-06  8:29 ` [PATCH RESEND v3 1/3] KVM: x86/pmu: Disable guest PEBS on hybrid cpu due to heterogeneity Like Xu
  2022-12-06  8:29 ` [PATCH RESEND v3 2/3] KVM: x86/pmu: Add PDIR++ and PDist support for SPR and later models Like Xu
@ 2022-12-06  8:29 ` Like Xu
  2 siblings, 0 replies; 4+ messages in thread
From: Like Xu @ 2022-12-06  8:29 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, linux-kernel, kvm, Peter Zijlstra,
	linux-perf-users, Kan Liang

From: Like Xu <likexu@tencent.com>

According to the Intel SDM, EPT-friendly PEBS is supported by all
platforms after ICX, by ADL, and by future platforms with PEBS format 5.

Currently the only in-kernel user of this capability is KVM, which has
very limited support for hybrid core PMUs, so ADL and its successors do
not currently expose this capability. When both hybrid core and PEBS
format 5 are present, KVM makes its own decision on whether to use it.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-perf-users@vger.kernel.org
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <likexu@tencent.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
Nit: This change is proposed to be applied via the KVM tree.

 arch/x86/events/intel/core.c | 1 +
 arch/x86/events/intel/ds.c   | 4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 1b92bf05fd65..e7e31b9d24d3 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6351,6 +6351,7 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_constraints = intel_spr_pebs_event_constraints;
 		x86_pmu.extra_regs = intel_spr_extra_regs;
 		x86_pmu.limit_period = spr_limit_period;
+		x86_pmu.pebs_ept = 1;
 		x86_pmu.pebs_aliases = NULL;
 		x86_pmu.pebs_prec_dist = true;
 		x86_pmu.pebs_block = true;
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 446d2833efa7..7258dca6882f 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -2303,8 +2303,10 @@ void __init intel_ds_init(void)
 			x86_pmu.large_pebs_flags |= PERF_SAMPLE_TIME;
 			break;
 
-		case 4:
 		case 5:
+			x86_pmu.pebs_ept = 1;
+			fallthrough;
+		case 4:
 			x86_pmu.drain_pebs = intel_pmu_drain_pebs_icl;
 			x86_pmu.pebs_record_size = sizeof(struct pebs_basic);
 			if (x86_pmu.intel_cap.pebs_baseline) {
-- 
2.38.1
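
The reordered switch in intel_ds_init() relies on case 5 falling through
into case 4. A small user-space sketch of that control flow follows. It is
not the kernel code: struct model_pmu and model_ds_init() are hypothetical,
uses_icl_drain merely stands for the shared format 4/5 setup, and any
model-specific code that may set pebs_ept elsewhere (as the core.c hunk
does for SPR) is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical subset of the PMU state touched by the hunk. */
struct model_pmu {
	bool pebs_ept;        /* EPT-friendly PEBS advertised */
	bool uses_icl_drain;  /* stands for the shared format 4/5 setup */
};

/* Model of the format switch: format 5 sets pebs_ept, then shares case 4. */
void model_ds_init(struct model_pmu *pmu, int pebs_format)
{
	pmu->pebs_ept = false;
	pmu->uses_icl_drain = false;

	switch (pebs_format) {
	case 5:
		pmu->pebs_ept = true;
		/* fall through */
	case 4:
		pmu->uses_icl_drain = true;
		break;
	default:
		break;
	}
}
```

Format 5 ends up with both flags set, format 4 gets only the shared setup,
and older formats get neither in this model.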

