* [PATCH v4 00/10] Guest LBR Enabling
@ 2018-12-26  9:25 Wei Wang
  2018-12-26  9:25 ` [PATCH v4 01/10] perf/x86: fix the variable type of the LBR MSRs Wei Wang
                   ` (9 more replies)
  0 siblings, 10 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

Last Branch Recording (LBR) is a performance monitoring unit (PMU) feature
on Intel CPUs that captures branch-related information. This patch series
enables the feature for KVM guests.

Here is a summary of the fundamental methods that we use:
1) the LBR feature is enabled per guest via QEMU setting
   KVM_CAP_X86_GUEST_LBR (see the userspace sketch below);
2) the LBR stack is passed through to the guest for direct access after
   the guest's first access to any of the LBR related MSRs;
3) when the guest uses the LBR feature with the user callstack mode, the
   host helps save/restore the LBR stack when the vCPU is scheduled
   out/in.

ChangeLog:
v3->v4:
    - perf/x86:
        - Patch 1: change the lbr msr variable type to "unsigned int";
        - Patch 6: add a "no_counter" boolean attr to let callers
          explicitly tell the perf core to not allocate counters for the
          event;
        - Patch 7: change "cpuc->vcpu_lbr" from "boolean" to "u8", and add
          the PF_VCPU and is_kernel_event checks before setting it;
        - Patch 8: add the LBR_SELECT save/restore on vCPU switching;
        - Patch 9: move lbr_select_user_callstack to .h, instead of
          exporting it;

    - KVM/x86:
        - Patch 3: use "cap->args[0]" to enable/disable the lbr feature
          from userspace;
        - Patch 4: forbid enabling the guest lbr when the host and
          guest see different numbers of lbr stack entries;
        - Patch 10: in guest_access_lbr_msr, pass through the lbr stack
          only when it has not already been passed through (i.e. add the
          !vcpu->arch.lbr_used check).
previous:
https://lkml.org/lkml/2018/9/20/507

Like Xu (1):
  KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr
    stack

Wei Wang (9):
  perf/x86: fix the variable type of the LBR MSRs
  perf/x86: add a function to get the lbr stack
  KVM/x86: KVM_CAP_X86_GUEST_LBR
  KVM/x86: intel_pmu_lbr_enable
  KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  perf/x86: no counter allocation support
  perf/x86: save/restore LBR_SELECT on vCPU switching
  perf/x86: function to check lbr user callstack mode
  KVM/x86/lbr: lazy save the guest lbr stack

 arch/x86/events/core.c            |  12 +++
 arch/x86/events/intel/lbr.c       |  82 ++++++++---------
 arch/x86/events/perf_event.h      |   6 +-
 arch/x86/include/asm/kvm_host.h   |   7 ++
 arch/x86/include/asm/perf_event.h |  64 +++++++++++++
 arch/x86/kvm/cpuid.c              |   2 +-
 arch/x86/kvm/cpuid.h              |   8 ++
 arch/x86/kvm/pmu.c                |   8 ++
 arch/x86/kvm/pmu.h                |  10 +++
 arch/x86/kvm/pmu_intel.c          | 183 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx.c                | 147 ++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c                |  14 +++
 include/linux/perf_event.h        |  12 +++
 include/uapi/linux/kvm.h          |   1 +
 include/uapi/linux/perf_event.h   |   3 +-
 kernel/events/core.c              |   7 --
 16 files changed, 514 insertions(+), 52 deletions(-)

-- 
2.7.4



* [PATCH v4 01/10] perf/x86: fix the variable type of the LBR MSRs
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-26  9:25 ` [PATCH v4 02/10] perf/x86: add a function to get the lbr stack Wei Wang
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

The MSR variable type can be "unsigned int", which uses less memory than
the longer "unsigned long". The LBR entry count (lbr_nr) won't be negative,
so make it "unsigned int" as well.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/perf_event.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b70..1f78d85 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -619,8 +619,8 @@ struct x86_pmu {
 	/*
 	 * Intel LBR
 	 */
-	unsigned long	lbr_tos, lbr_from, lbr_to; /* MSR base regs       */
-	int		lbr_nr;			   /* hardware stack size */
+	unsigned int	lbr_tos, lbr_from, lbr_to,
+			lbr_nr;			   /* lbr stack and size */
 	u64		lbr_sel_mask;		   /* LBR_SELECT valid bits */
 	const int	*lbr_sel_map;		   /* lbr_select mappings */
 	bool		lbr_double_abort;	   /* duplicated lbr aborts */
-- 
2.7.4



* [PATCH v4 02/10] perf/x86: add a function to get the lbr stack
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
  2018-12-26  9:25 ` [PATCH v4 01/10] perf/x86: fix the variable type of the LBR MSRs Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-26  9:25 ` [PATCH v4 03/10] KVM/x86: KVM_CAP_X86_GUEST_LBR Wei Wang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

The LBR stack MSRs are model specific. The perf subsystem has
already assigned the abstracted MSR values based on the CPU architecture.

This patch enables a caller outside the perf subsystem to get the LBR
stack info. This is useful for hypervisors to prepare the lbr feature
for the guest.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/events/intel/lbr.c       | 23 +++++++++++++++++++++++
 arch/x86/include/asm/perf_event.h | 14 ++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index c88ed39..594a91b 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1277,3 +1277,26 @@ void intel_pmu_lbr_init_knl(void)
 	if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_LIP)
 		x86_pmu.intel_cap.lbr_format = LBR_FORMAT_EIP_FLAGS;
 }
+
+/**
+ * x86_perf_get_lbr_stack - get the lbr stack related MSRs
+ *
+ * @stack: the caller's memory to get the lbr stack
+ *
+ * Returns: 0 indicates that the lbr stack has been successfully obtained.
+ */
+int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack)
+{
+	stack->nr = x86_pmu.lbr_nr;
+	stack->tos = x86_pmu.lbr_tos;
+	stack->from = x86_pmu.lbr_from;
+	stack->to = x86_pmu.lbr_to;
+
+	if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
+		stack->info = MSR_LBR_INFO_0;
+	else
+		stack->info = 0;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(x86_perf_get_lbr_stack);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 8bdf749..2f82795 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -275,7 +275,16 @@ struct perf_guest_switch_msr {
 	u64 host, guest;
 };
 
+struct x86_perf_lbr_stack {
+	unsigned int	nr;
+	unsigned int	tos;
+	unsigned int	from;
+	unsigned int	to;
+	unsigned int	info;
+};
+
 extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr);
+extern int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack);
 extern void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap);
 extern void perf_check_microcode(void);
 extern int x86_perf_rdpmc_index(struct perf_event *event);
@@ -286,6 +295,11 @@ static inline struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr)
 	return NULL;
 }
 
+static inline int x86_perf_get_lbr_stack(struct x86_perf_lbr_stack *stack)
+{
+	return -1;
+}
+
 static inline void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
 	memset(cap, 0, sizeof(*cap));
-- 
2.7.4



* [PATCH v4 03/10] KVM/x86: KVM_CAP_X86_GUEST_LBR
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
  2018-12-26  9:25 ` [PATCH v4 01/10] perf/x86: fix the variable type of the LBR MSRs Wei Wang
  2018-12-26  9:25 ` [PATCH v4 02/10] perf/x86: add a function to get the lbr stack Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-26  9:25 ` [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable Wei Wang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

Introduce KVM_CAP_X86_GUEST_LBR to allow per-VM enabling of the guest
lbr feature.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              | 15 +++++++++++++++
 include/uapi/linux/kvm.h        |  1 +
 3 files changed, 18 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fbda5a9..0831ac6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -865,6 +865,7 @@ struct kvm_arch {
 	atomic_t vapics_in_nmi_mode;
 	struct mutex apic_map_lock;
 	struct kvm_apic_map *apic_map;
+	struct x86_perf_lbr_stack lbr_stack;
 
 	bool apic_access_page_done;
 
@@ -873,6 +874,7 @@ struct kvm_arch {
 	bool mwait_in_guest;
 	bool hlt_in_guest;
 	bool pause_in_guest;
+	bool lbr_in_guest;
 
 	unsigned long irq_sources_bitmap;
 	s64 kvmclock_offset;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f049ecf..50efee4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3018,6 +3018,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_GET_MSR_FEATURES:
 	case KVM_CAP_MSR_PLATFORM_INFO:
 	case KVM_CAP_EXCEPTION_PAYLOAD:
+	case KVM_CAP_X86_GUEST_LBR:
 		r = 1;
 		break;
 	case KVM_CAP_SYNC_REGS:
@@ -4502,6 +4503,20 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		kvm->arch.exception_payload_enabled = cap->args[0];
 		r = 0;
 		break;
+	case KVM_CAP_X86_GUEST_LBR:
+		r = -EINVAL;
+		if (cap->args[0] &&
+		    x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) {
+			pr_err("Failed to enable the guest lbr feature\n");
+			break;
+		}
+		if (copy_to_user((void __user *)cap->args[1],
+				 &kvm->arch.lbr_stack,
+				 sizeof(struct x86_perf_lbr_stack)))
+			break;
+		kvm->arch.lbr_in_guest = cap->args[0];
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2b7a652..ffe31b1 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -975,6 +975,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_HYPERV_ENLIGHTENED_VMCS 163
 #define KVM_CAP_EXCEPTION_PAYLOAD 164
 #define KVM_CAP_ARM_VM_IPA_SIZE 165
+#define KVM_CAP_X86_GUEST_LBR 166
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.7.4



* [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
                   ` (2 preceding siblings ...)
  2018-12-26  9:25 ` [PATCH v4 03/10] KVM/x86: KVM_CAP_X86_GUEST_LBR Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2019-01-02 16:33   ` Liang, Kan
  2019-01-02 23:26   ` Jim Mattson
  2018-12-26  9:25 ` [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest Wei Wang
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

The lbr stack is model specific: for example, SKX has 32 lbr stack
entries while HSW has 16, so a HSW guest running on a SKX machine may
not get accurate perf results. For now, we forbid enabling the guest
lbr when the guest and host see different numbers of lbr stack entries.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/kvm/cpuid.h     |   8 ++++
 arch/x86/kvm/pmu.c       |   8 ++++
 arch/x86/kvm/pmu.h       |   2 +
 arch/x86/kvm/pmu_intel.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.c       |   3 +-
 5 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9a327d5..92bdc7d 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -123,6 +123,14 @@ static inline bool guest_cpuid_is_amd(struct kvm_vcpu *vcpu)
 	return best && best->ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx;
 }
 
+static inline bool guest_cpuid_is_intel(struct kvm_vcpu *vcpu)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0, 0);
+	return best && best->ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx;
+}
+
 static inline int guest_cpuid_family(struct kvm_vcpu *vcpu)
 {
 	struct kvm_cpuid_entry2 *best;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 58ead7d..b438ffa 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -299,6 +299,14 @@ int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned idx, u64 *data)
 	return 0;
 }
 
+bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu)
+{
+	if (guest_cpuid_is_intel(vcpu))
+		return kvm_x86_ops->pmu_ops->lbr_enable(vcpu);
+
+	return false;
+}
+
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
 {
 	if (lapic_in_kernel(vcpu))
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index ba8898e..5f3c7a4 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -28,6 +28,7 @@ struct kvm_pmu_ops {
 	struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, unsigned idx);
 	int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx);
 	bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
+	bool (*lbr_enable)(struct kvm_vcpu *vcpu);
 	int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
 	int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
 	void (*refresh)(struct kvm_vcpu *vcpu);
@@ -106,6 +107,7 @@ void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
 void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
 void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
 
+bool kvm_pmu_lbr_enable(struct kvm_vcpu *vcpu);
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu);
 void kvm_pmu_handle_event(struct kvm_vcpu *vcpu);
 int kvm_pmu_rdpmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index 5ab4a36..c04cb6d 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -15,6 +15,7 @@
 #include <linux/kvm_host.h>
 #include <linux/perf_event.h>
 #include <asm/perf_event.h>
+#include <asm/intel-family.h>
 #include "x86.h"
 #include "cpuid.h"
 #include "lapic.h"
@@ -164,6 +165,121 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	return ret;
 }
 
+static bool intel_pmu_lbr_enable(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u8 vcpu_model = guest_cpuid_model(vcpu);
+	unsigned int vcpu_lbr_nr;
+
+	if (x86_perf_get_lbr_stack(&kvm->arch.lbr_stack))
+		return false;
+
+	if (guest_cpuid_family(vcpu) != boot_cpu_data.x86)
+		return false;
+
+	/*
+	 * It is possible that people run vcpus of an older model on
+	 * physical cpus of a newer model, for example a BDW guest on a SKX
+	 * machine (but not the other way around).
+	 * The BDW guest may not get accurate results on a SKX machine as it
+	 * only reads 16 entries of the lbr stack while there are 32 entries
+	 * of recordings. So we currently forbid lbr enabling when the
+	 * vcpu and physical cpu see different numbers of lbr stack entries.
+	 */
+	switch (vcpu_model) {
+	case INTEL_FAM6_CORE2_MEROM:
+	case INTEL_FAM6_CORE2_MEROM_L:
+	case INTEL_FAM6_CORE2_PENRYN:
+	case INTEL_FAM6_CORE2_DUNNINGTON:
+		/* intel_pmu_lbr_init_core() */
+		vcpu_lbr_nr = 4;
+		break;
+	case INTEL_FAM6_NEHALEM:
+	case INTEL_FAM6_NEHALEM_EP:
+	case INTEL_FAM6_NEHALEM_EX:
+		/* intel_pmu_lbr_init_nhm() */
+		vcpu_lbr_nr = 16;
+		break;
+	case INTEL_FAM6_ATOM_BONNELL:
+	case INTEL_FAM6_ATOM_BONNELL_MID:
+	case INTEL_FAM6_ATOM_SALTWELL:
+	case INTEL_FAM6_ATOM_SALTWELL_MID:
+	case INTEL_FAM6_ATOM_SALTWELL_TABLET:
+		/* intel_pmu_lbr_init_atom() */
+		vcpu_lbr_nr = 8;
+		break;
+	case INTEL_FAM6_ATOM_SILVERMONT:
+	case INTEL_FAM6_ATOM_SILVERMONT_X:
+	case INTEL_FAM6_ATOM_SILVERMONT_MID:
+	case INTEL_FAM6_ATOM_AIRMONT:
+	case INTEL_FAM6_ATOM_AIRMONT_MID:
+		/* intel_pmu_lbr_init_slm() */
+		vcpu_lbr_nr = 8;
+		break;
+	case INTEL_FAM6_ATOM_GOLDMONT:
+	case INTEL_FAM6_ATOM_GOLDMONT_X:
+		/* intel_pmu_lbr_init_skl(); */
+		vcpu_lbr_nr = 32;
+		break;
+	case INTEL_FAM6_ATOM_GOLDMONT_PLUS:
+		/* intel_pmu_lbr_init_skl()*/
+		vcpu_lbr_nr = 32;
+		break;
+	case INTEL_FAM6_WESTMERE:
+	case INTEL_FAM6_WESTMERE_EP:
+	case INTEL_FAM6_WESTMERE_EX:
+		/* intel_pmu_lbr_init_nhm() */
+		vcpu_lbr_nr = 16;
+		break;
+	case INTEL_FAM6_SANDYBRIDGE:
+	case INTEL_FAM6_SANDYBRIDGE_X:
+		/* intel_pmu_lbr_init_snb() */
+		vcpu_lbr_nr = 16;
+		break;
+	case INTEL_FAM6_IVYBRIDGE:
+	case INTEL_FAM6_IVYBRIDGE_X:
+		/* intel_pmu_lbr_init_snb() */
+		vcpu_lbr_nr = 16;
+		break;
+	case INTEL_FAM6_HASWELL_CORE:
+	case INTEL_FAM6_HASWELL_X:
+	case INTEL_FAM6_HASWELL_ULT:
+	case INTEL_FAM6_HASWELL_GT3E:
+		/* intel_pmu_lbr_init_hsw() */
+		vcpu_lbr_nr = 16;
+		break;
+	case INTEL_FAM6_BROADWELL_CORE:
+	case INTEL_FAM6_BROADWELL_XEON_D:
+	case INTEL_FAM6_BROADWELL_GT3E:
+	case INTEL_FAM6_BROADWELL_X:
+		/* intel_pmu_lbr_init_hsw() */
+		vcpu_lbr_nr = 16;
+		break;
+	case INTEL_FAM6_XEON_PHI_KNL:
+	case INTEL_FAM6_XEON_PHI_KNM:
+		/* intel_pmu_lbr_init_knl() */
+		vcpu_lbr_nr = 8;
+		break;
+	case INTEL_FAM6_SKYLAKE_MOBILE:
+	case INTEL_FAM6_SKYLAKE_DESKTOP:
+	case INTEL_FAM6_SKYLAKE_X:
+	case INTEL_FAM6_KABYLAKE_MOBILE:
+	case INTEL_FAM6_KABYLAKE_DESKTOP:
+		/* intel_pmu_lbr_init_skl() */
+		vcpu_lbr_nr = 32;
+		break;
+	default:
+		vcpu_lbr_nr = 0;
+		pr_warn("%s: vcpu model not supported %d\n", __func__,
+			vcpu_model);
+	}
+
+	if (vcpu_lbr_nr != kvm->arch.lbr_stack.nr)
+		return false;
+
+	return true;
+}
+
 static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
@@ -350,6 +466,7 @@ struct kvm_pmu_ops intel_pmu_ops = {
 	.msr_idx_to_pmc = intel_msr_idx_to_pmc,
 	.is_valid_msr_idx = intel_is_valid_msr_idx,
 	.is_valid_msr = intel_is_valid_msr,
+	.lbr_enable = intel_pmu_lbr_enable,
 	.get_msr = intel_pmu_get_msr,
 	.set_msr = intel_pmu_set_msr,
 	.refresh = intel_pmu_refresh,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 50efee4..02e29fd 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4505,8 +4505,7 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_X86_GUEST_LBR:
 		r = -EINVAL;
-		if (cap->args[0] &&
-		    x86_perf_get_lbr_stack(&kvm->arch.lbr_stack)) {
+		if (cap->args[0] && !kvm_pmu_lbr_enable(kvm->vcpus[0])) {
 			pr_err("Failed to enable the guest lbr feature\n");
 			break;
 		}
-- 
2.7.4



* [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
                   ` (3 preceding siblings ...)
  2018-12-26  9:25 ` [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2019-01-02 23:40   ` Jim Mattson
  2018-12-26  9:25 ` [PATCH v4 06/10] perf/x86: no counter allocation support Wei Wang
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

Bits [5:0] of MSR_IA32_PERF_CAPABILITIES indicate the format of
the addresses stored in the LBR stack. Expose those bits to the guest
when the guest lbr feature is enabled.
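
As a rough guest-side illustration (a sketch only, not part of the patch:
it assumes Linux guest kernel context with <asm/msr.h>, the helper name is
made up, and the 0x3f mask corresponds to the X86_PERF_CAP_MASK_LBR_FMT
value added below):

    /* Report which LBR record format the hypervisor exposed to this guest. */
    static void report_guest_lbr_format(void)
    {
            u64 caps;

            /* A real guest would first check CPUID.1:ECX.PDCM before this read. */
            rdmsrl(MSR_IA32_PERF_CAPABILITIES, caps);

            /* Bits [5:0] enumerate the LBR format, e.g. LBR_FORMAT_INFO. */
            pr_info("LBR format: %llu\n", (unsigned long long)(caps & 0x3f));
    }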

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/include/asm/perf_event.h | 2 ++
 arch/x86/kvm/cpuid.c              | 2 +-
 arch/x86/kvm/vmx.c                | 9 +++++++++
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 2f82795..eee09b7 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -87,6 +87,8 @@
 #define ARCH_PERFMON_BRANCH_MISSES_RETIRED		6
 #define ARCH_PERFMON_EVENTS_COUNT			7
 
+#define X86_PERF_CAP_MASK_LBR_FMT			0x3f
+
 /*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7bcfa61..3b8a57b 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -365,7 +365,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
 		F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
 		0 /* DS-CPL, VMX, SMX, EST */ |
 		0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
-		F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
+		F(FMA) | F(CX16) | 0 /* xTPR Update*/ | F(PDCM) |
 		F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
 		F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
 		0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8d5d984..ee02967 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4161,6 +4161,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		msr_info->data = vcpu->arch.ia32_xss;
 		break;
+	case MSR_IA32_PERF_CAPABILITIES:
+		if (!boot_cpu_has(X86_FEATURE_PDCM))
+			return 1;
+		msr_info->data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
+		if (vcpu->kvm->arch.lbr_in_guest)
+			msr_info->data &= X86_PERF_CAP_MASK_LBR_FMT;
+		break;
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
@@ -4343,6 +4350,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		else
 			clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
 		break;
+	case MSR_IA32_PERF_CAPABILITIES:
+		return 1; /* RO MSR */
 	case MSR_TSC_AUX:
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
-- 
2.7.4



* [PATCH v4 06/10] perf/x86: no counter allocation support
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
                   ` (4 preceding siblings ...)
  2018-12-26  9:25 ` [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-26  9:25 ` [PATCH v4 07/10] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack Wei Wang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

In some cases, an event may be created without needing a counter
allocation. For example, an lbr event may be created by the host
only to help save/restore the lbr stack on vCPU context switching.

This patch adds a "no_counter" attr boolean to let the callers
explicitly tell the perf core that no counter is needed.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
 arch/x86/events/core.c          | 12 ++++++++++++
 include/linux/perf_event.h      |  5 +++++
 include/uapi/linux/perf_event.h |  3 ++-
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 374a197..c09f03b 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -410,6 +410,9 @@ int x86_setup_perfctr(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	u64 config;
 
+	if (is_no_counter_event(event))
+		return 0;
+
 	if (!is_sampling_event(event)) {
 		hwc->sample_period = x86_pmu.max_period;
 		hwc->last_period = hwc->sample_period;
@@ -1209,6 +1212,12 @@ static int x86_pmu_add(struct perf_event *event, int flags)
 	hwc = &event->hw;
 
 	n0 = cpuc->n_events;
+
+	if (is_no_counter_event(event)) {
+		n = n0;
+		goto done_collect;
+	}
+
 	ret = n = collect_events(cpuc, event, false);
 	if (ret < 0)
 		goto out;
@@ -1387,6 +1396,9 @@ static void x86_pmu_del(struct perf_event *event, int flags)
 	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
 		goto do_del;
 
+	if (is_no_counter_event(event))
+		goto do_del;
+
 	/*
 	 * Not a TXN, therefore cleanup properly.
 	 */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 53c500f..e58a997 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1009,6 +1009,11 @@ static inline bool is_sampling_event(struct perf_event *event)
 	return event->attr.sample_period != 0;
 }
 
+static inline bool is_no_counter_event(struct perf_event *event)
+{
+	return event->attr.no_counter;
+}
+
 /*
  * Return 1 for a software event, 0 for a hardware event
  */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9de8780..ec97a70 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -372,7 +372,8 @@ struct perf_event_attr {
 				context_switch :  1, /* context switch data */
 				write_backward :  1, /* Write ring buffer from end to beginning */
 				namespaces     :  1, /* include namespaces data */
-				__reserved_1   : 35;
+				no_counter     :  1, /* no counter allocation */
+				__reserved_1   : 34;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
-- 
2.7.4



* [PATCH v4 07/10] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
                   ` (5 preceding siblings ...)
  2018-12-26  9:25 ` [PATCH v4 06/10] perf/x86: no counter allocation support Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-26  9:25 ` [PATCH v4 08/10] perf/x86: save/restore LBR_SELECT on vCPU switching Wei Wang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

From: Like Xu <like.xu@intel.com>

This patch adds support to enable/disable the host side save/restore
for the guest lbr stack on vCPU switching. To enable that, the host
creates a perf event for the vCPU, and the event attributes are set
to the user callstack mode lbr so that all the conditions are met in
the host perf subsystem to save the lbr stack on task switching.

The host side lbr perf event is created only for the purpose of saving
and restoring the lbr stack. There is no need to enable the lbr
functionality for this perf event, because the feature is essentially
used by the vCPU. So use "no_counter=true" to have the perf core not
allocate a counter for this event.

The vcpu_lbr field is added to cpuc to indicate whether the lbr perf event
is used by the vCPU only for context switching. When the perf subsystem
handles lbr operations (e.g. lbr enable or reading the lbr stack on PMI)
and finds vcpu_lbr non-zero, it simply returns.

Signed-off-by: Like Xu <like.xu@intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/events/intel/lbr.c     | 14 +++++++--
 arch/x86/events/perf_event.h    |  1 +
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/pmu.h              |  3 ++
 arch/x86/kvm/pmu_intel.c        | 66 +++++++++++++++++++++++++++++++++++++++++
 include/linux/perf_event.h      |  7 +++++
 kernel/events/core.c            |  7 -----
 7 files changed, 89 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 594a91b..1d20051 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -462,6 +462,10 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	if (!x86_pmu.lbr_nr)
 		return;
 
+	if (is_kernel_event(event) && event->attr.exclude_guest &&
+	    (current->flags & PF_VCPU))
+		cpuc->vcpu_lbr = 1;
+
 	cpuc->br_sel = event->hw.branch_reg.reg;
 
 	if (branch_user_callstack(cpuc->br_sel) && event->ctx->task_ctx_data) {
@@ -507,6 +511,10 @@ void intel_pmu_lbr_del(struct perf_event *event)
 		task_ctx->lbr_callstack_users--;
 	}
 
+	if (is_kernel_event(event) && event->attr.exclude_guest &&
+	    (current->flags & PF_VCPU))
+		cpuc->vcpu_lbr = 0;
+
 	cpuc->lbr_users--;
 	WARN_ON_ONCE(cpuc->lbr_users < 0);
 	perf_sched_cb_dec(event->ctx->pmu);
@@ -516,7 +524,7 @@ void intel_pmu_lbr_enable_all(bool pmi)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (cpuc->lbr_users)
+	if (cpuc->lbr_users && !cpuc->vcpu_lbr)
 		__intel_pmu_lbr_enable(pmi);
 }
 
@@ -524,7 +532,7 @@ void intel_pmu_lbr_disable_all(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (cpuc->lbr_users)
+	if (cpuc->lbr_users && !cpuc->vcpu_lbr)
 		__intel_pmu_lbr_disable();
 }
 
@@ -658,7 +666,7 @@ void intel_pmu_lbr_read(void)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 
-	if (!cpuc->lbr_users)
+	if (!cpuc->lbr_users || cpuc->vcpu_lbr)
 		return;
 
 	if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_32)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 1f78d85..bbea559 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -210,6 +210,7 @@ struct cpu_hw_events {
 	/*
 	 * Intel LBR bits
 	 */
+	u8				vcpu_lbr;
 	int				lbr_users;
 	struct perf_branch_stack	lbr_stack;
 	struct perf_branch_entry	lbr_entries[MAX_LBR_ENTRIES];
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0831ac6..fac209b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -469,6 +469,7 @@ struct kvm_pmu {
 	struct kvm_pmc fixed_counters[INTEL_PMC_MAX_FIXED];
 	struct irq_work irq_work;
 	u64 reprogram_pmi;
+	struct perf_event *vcpu_lbr_event;
 };
 
 struct kvm_pmu_ops;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index 5f3c7a4..efd8f16 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -122,6 +122,9 @@ void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
 
 bool is_vmware_backdoor_pmc(u32 pmc_idx);
 
+extern int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu);
+extern void intel_pmu_disable_save_guest_lbr(struct kvm_vcpu *vcpu);
+
 extern struct kvm_pmu_ops intel_pmu_ops;
 extern struct kvm_pmu_ops amd_pmu_ops;
 #endif /* __KVM_X86_PMU_H */
diff --git a/arch/x86/kvm/pmu_intel.c b/arch/x86/kvm/pmu_intel.c
index c04cb6d..6e46354 100644
--- a/arch/x86/kvm/pmu_intel.c
+++ b/arch/x86/kvm/pmu_intel.c
@@ -458,6 +458,72 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
 		pmu->global_ovf_ctrl = 0;
 }
 
+int intel_pmu_enable_save_guest_lbr(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct perf_event *event;
+
+	/*
+	 * The main purpose of this perf event is to have the host perf core
+	 * help save/restore the guest lbr stack on vcpu switching. There is
+	 * no perf counters allocated for the event.
+	 *
+	 * About the attr:
+	 * exclude_guest: set to true to indicate that the event runs on the
+	 *                host only.
+	 * no_counter:    set to true to tell the perf core that this event
+	 *                doesn't need a counter.
+	 * pinned:        set to false, so that the FLEXIBLE events will not
+	 *                be rescheduled for this event which actually doesn't
+	 *                need a perf counter.
+	 * config:        Actually this field won't be used by the perf core
+	 *                as this event doesn't have a perf counter.
+	 * sample_period: Same as above.
+	 * sample_type:   tells the perf core that it is an lbr event.
+	 * branch_sample_type: tells the perf core that the lbr event works in
+	 *                the user callstack mode so that the lbr stack will be
+	 *                saved/restored on vCPU switching.
+	 */
+	struct perf_event_attr attr = {
+		.type = PERF_TYPE_RAW,
+		.size = sizeof(attr),
+		.no_counter = true,
+		.exclude_guest = true,
+		.pinned = false,
+		.config = 0,
+		.sample_period = 0,
+		.sample_type = PERF_SAMPLE_BRANCH_STACK,
+		.branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
+				      PERF_SAMPLE_BRANCH_USER,
+	};
+
+	if (pmu->vcpu_lbr_event)
+		return 0;
+
+	event = perf_event_create_kernel_counter(&attr, -1, current, NULL,
+						 NULL);
+	if (IS_ERR(event)) {
+		pr_err("%s: failed %ld\n", __func__, PTR_ERR(event));
+		return -ENOENT;
+	}
+	pmu->vcpu_lbr_event = event;
+
+	return 0;
+}
+
+void intel_pmu_disable_save_guest_lbr(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+	struct perf_event *event = pmu->vcpu_lbr_event;
+
+	if (!event)
+		return;
+
+	perf_event_release_kernel(event);
+	pmu->vcpu_lbr_event = NULL;
+}
+
+
 struct kvm_pmu_ops intel_pmu_ops = {
 	.find_arch_event = intel_find_arch_event,
 	.find_fixed_event = intel_find_fixed_event,
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e58a997..e2fed1b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1014,6 +1014,13 @@ static inline bool is_no_counter_event(struct perf_event *event)
 	return event->attr.no_counter;
 }
 
+#define TASK_TOMBSTONE ((void *)-1L)
+
+static inline bool is_kernel_event(struct perf_event *event)
+{
+	return READ_ONCE(event->owner) == TASK_TOMBSTONE;
+}
+
 /*
  * Return 1 for a software event, 0 for a hardware event
  */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 84530ab..78b0550 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -165,13 +165,6 @@ static void perf_ctx_unlock(struct perf_cpu_context *cpuctx,
 	raw_spin_unlock(&cpuctx->ctx.lock);
 }
 
-#define TASK_TOMBSTONE ((void *)-1L)
-
-static bool is_kernel_event(struct perf_event *event)
-{
-	return READ_ONCE(event->owner) == TASK_TOMBSTONE;
-}
-
 /*
  * On task ctx scheduling...
  *
-- 
2.7.4



* [PATCH v4 08/10] perf/x86: save/restore LBR_SELECT on vCPU switching
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
                   ` (6 preceding siblings ...)
  2018-12-26  9:25 ` [PATCH v4 07/10] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-26  9:25 ` [PATCH v4 09/10] perf/x86: function to check lbr user callstack mode Wei Wang
  2018-12-26  9:25 ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack Wei Wang
  9 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

The vCPU lbr event relies on the host to save/restore all the lbr
related MSRs. So add the LBR_SELECT save/restore to the related
functions for the vCPU case.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/lbr.c  | 7 +++++++
 arch/x86/events/perf_event.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 1d20051..0225ac9 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -383,6 +383,9 @@ static void __intel_pmu_lbr_restore(struct x86_perf_task_context *task_ctx)
 
 	wrmsrl(x86_pmu.lbr_tos, tos);
 	task_ctx->lbr_stack_state = LBR_NONE;
+
+	if (cpuc->vcpu_lbr)
+		wrmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel);
 }
 
 static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
@@ -409,6 +412,10 @@ static void __intel_pmu_lbr_save(struct x86_perf_task_context *task_ctx)
 		if (x86_pmu.intel_cap.lbr_format == LBR_FORMAT_INFO)
 			rdmsrl(MSR_LBR_INFO_0 + lbr_idx, task_ctx->lbr_info[i]);
 	}
+
+	if (cpuc->vcpu_lbr)
+		rdmsrl(MSR_LBR_SELECT, task_ctx->lbr_sel);
+
 	task_ctx->valid_lbrs = i;
 	task_ctx->tos = tos;
 	task_ctx->lbr_stack_state = LBR_VALID;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index bbea559..ccd0215 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -653,6 +653,7 @@ struct x86_perf_task_context {
 	u64 lbr_from[MAX_LBR_ENTRIES];
 	u64 lbr_to[MAX_LBR_ENTRIES];
 	u64 lbr_info[MAX_LBR_ENTRIES];
+	u64 lbr_sel;
 	int tos;
 	int valid_lbrs;
 	int lbr_callstack_users;
-- 
2.7.4



* [PATCH v4 09/10] perf/x86: function to check lbr user callstack mode
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
                   ` (7 preceding siblings ...)
  2018-12-26  9:25 ` [PATCH v4 08/10] perf/x86: save/restore LBR_SELECT on vCPU switching Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-26  9:25 ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack Wei Wang
  9 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

This patch adds a function to check whether the lbr_sel value has the user
callstack mode set. The related LBR_* macros are moved to
perf_event.h as well.

Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/events/intel/lbr.c       | 38 -------------------------------
 arch/x86/include/asm/perf_event.h | 48 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 48 insertions(+), 38 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 0225ac9..71fd76a 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -27,44 +27,6 @@ static const enum {
 	[LBR_FORMAT_EIP_FLAGS2] = LBR_EIP_FLAGS | LBR_TSX,
 };
 
-/*
- * Intel LBR_SELECT bits
- * Intel Vol3a, April 2011, Section 16.7 Table 16-10
- *
- * Hardware branch filter (not available on all CPUs)
- */
-#define LBR_KERNEL_BIT		0 /* do not capture at ring0 */
-#define LBR_USER_BIT		1 /* do not capture at ring > 0 */
-#define LBR_JCC_BIT		2 /* do not capture conditional branches */
-#define LBR_REL_CALL_BIT	3 /* do not capture relative calls */
-#define LBR_IND_CALL_BIT	4 /* do not capture indirect calls */
-#define LBR_RETURN_BIT		5 /* do not capture near returns */
-#define LBR_IND_JMP_BIT		6 /* do not capture indirect jumps */
-#define LBR_REL_JMP_BIT		7 /* do not capture relative jumps */
-#define LBR_FAR_BIT		8 /* do not capture far branches */
-#define LBR_CALL_STACK_BIT	9 /* enable call stack */
-
-/*
- * Following bit only exists in Linux; we mask it out before writing it to
- * the actual MSR. But it helps the constraint perf code to understand
- * that this is a separate configuration.
- */
-#define LBR_NO_INFO_BIT	       63 /* don't read LBR_INFO. */
-
-#define LBR_KERNEL	(1 << LBR_KERNEL_BIT)
-#define LBR_USER	(1 << LBR_USER_BIT)
-#define LBR_JCC		(1 << LBR_JCC_BIT)
-#define LBR_REL_CALL	(1 << LBR_REL_CALL_BIT)
-#define LBR_IND_CALL	(1 << LBR_IND_CALL_BIT)
-#define LBR_RETURN	(1 << LBR_RETURN_BIT)
-#define LBR_REL_JMP	(1 << LBR_REL_JMP_BIT)
-#define LBR_IND_JMP	(1 << LBR_IND_JMP_BIT)
-#define LBR_FAR		(1 << LBR_FAR_BIT)
-#define LBR_CALL_STACK	(1 << LBR_CALL_STACK_BIT)
-#define LBR_NO_INFO	(1ULL << LBR_NO_INFO_BIT)
-
-#define LBR_PLM (LBR_KERNEL | LBR_USER)
-
 #define LBR_SEL_MASK	0x3ff	/* valid bits in LBR_SELECT */
 #define LBR_NOT_SUPP	-1	/* LBR filter not supported */
 #define LBR_IGN		0	/* ignored */
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index eee09b7..5c1d3ed 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -90,6 +90,45 @@
 #define X86_PERF_CAP_MASK_LBR_FMT			0x3f
 
 /*
+ * Intel LBR_SELECT bits
+ * Intel Vol3a, April 2011, Section 16.7 Table 16-10
+ *
+ * Hardware branch filter (not available on all CPUs)
+ */
+#define LBR_KERNEL_BIT		0 /* do not capture at ring0 */
+#define LBR_USER_BIT		1 /* do not capture at ring > 0 */
+#define LBR_JCC_BIT		2 /* do not capture conditional branches */
+#define LBR_REL_CALL_BIT	3 /* do not capture relative calls */
+#define LBR_IND_CALL_BIT	4 /* do not capture indirect calls */
+#define LBR_RETURN_BIT		5 /* do not capture near returns */
+#define LBR_IND_JMP_BIT		6 /* do not capture indirect jumps */
+#define LBR_REL_JMP_BIT		7 /* do not capture relative jumps */
+#define LBR_FAR_BIT		8 /* do not capture far branches */
+#define LBR_CALL_STACK_BIT	9 /* enable call stack */
+
+/*
+ * Following bit only exists in Linux; we mask it out before writing it to
+ * the actual MSR. But it helps the constraint perf code to understand
+ * that this is a separate configuration.
+ */
+#define LBR_NO_INFO_BIT	       63 /* don't read LBR_INFO. */
+
+#define LBR_KERNEL	(1 << LBR_KERNEL_BIT)
+#define LBR_USER	(1 << LBR_USER_BIT)
+#define LBR_JCC		(1 << LBR_JCC_BIT)
+#define LBR_REL_CALL	(1 << LBR_REL_CALL_BIT)
+#define LBR_IND_CALL	(1 << LBR_IND_CALL_BIT)
+#define LBR_RETURN	(1 << LBR_RETURN_BIT)
+#define LBR_REL_JMP	(1 << LBR_REL_JMP_BIT)
+#define LBR_IND_JMP	(1 << LBR_IND_JMP_BIT)
+#define LBR_FAR		(1 << LBR_FAR_BIT)
+#define LBR_CALL_STACK	(1 << LBR_CALL_STACK_BIT)
+#define LBR_NO_INFO	(1ULL << LBR_NO_INFO_BIT)
+
+#define LBR_PLM (LBR_KERNEL | LBR_USER)
+#define LBR_USER_CALLSTACK (LBR_CALL_STACK | LBR_USER)
+
+/*
  * Intel "Architectural Performance Monitoring" CPUID
  * detection/enumeration details:
  */
@@ -311,6 +350,15 @@ static inline void perf_events_lapic_init(void)	{ }
 static inline void perf_check_microcode(void) { }
 #endif
 
+/*
+ * Returns true if the msr value is configured to the user callstack mode.
+ * Otherwise, false.
+ */
+static inline bool lbr_select_user_callstack(u64 lbr_select)
+{
+	return !!(lbr_select & LBR_USER_CALLSTACK);
+}
+
 #ifdef CONFIG_CPU_SUP_INTEL
  extern void intel_pt_handle_vmx(int on);
 #endif
-- 
2.7.4



* [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack
  2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
                   ` (8 preceding siblings ...)
  2018-12-26  9:25 ` [PATCH v4 09/10] perf/x86: function to check lbr user callstack mode Wei Wang
@ 2018-12-26  9:25 ` Wei Wang
  2018-12-27 20:51   ` Andi Kleen
  2018-12-27 20:52   ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack II Andi Kleen
  9 siblings, 2 replies; 42+ messages in thread
From: Wei Wang @ 2018-12-26  9:25 UTC (permalink / raw)
  To: linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, wei.w.wang, jannh, arei.gonglei

When the vCPU is scheduled in:
- if the lbr feature was used in the last vCPU time slice, set the lbr
  stack to be interceptible, so that the host can capture whether the
  lbr feature will be used in this time slice;
- if the lbr feature wasn't used in the last vCPU time slice, disable
  the vCPU support of the guest lbr switching.

Upon the first access to one of the lbr related MSRs (since the vCPU was
scheduled in):
- record that the guest has used the lbr;
- create a host perf event to help save/restore the guest lbr stack if
  the guest uses the user callstack mode lbr stack;
- pass the stack through to the guest.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/x86/include/asm/kvm_host.h |   4 ++
 arch/x86/kvm/pmu.h              |   5 ++
 arch/x86/kvm/vmx.c              | 138 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 147 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fac209b..7f91eac 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -775,6 +775,10 @@ struct kvm_vcpu_arch {
 
 	/* Flush the L1 Data cache for L1TF mitigation on VMENTER */
 	bool l1tf_flush_l1d;
+	/* Indicate if the guest is using lbr with the user callstack mode */
+	bool lbr_user_callstack;
+	/* Indicate if the lbr msrs were accessed in this vCPU time slice */
+	bool lbr_used;
 };
 
 struct kvm_lpage_info {
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index efd8f16..c1fed24 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -103,6 +103,11 @@ static inline struct kvm_pmc *get_fixed_pmc(struct kvm_pmu *pmu, u32 msr)
 	return NULL;
 }
 
+static inline bool intel_pmu_save_vcpu_lbr_enabled(struct kvm_vcpu *vcpu)
+{
+	return !!vcpu_to_pmu(vcpu)->vcpu_lbr_event;
+}
+
 void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel);
 void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 ctrl, int fixed_idx);
 void reprogram_counter(struct kvm_pmu *pmu, int pmc_idx);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ee02967..80ec3f4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1310,6 +1310,9 @@ static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12,
 static void vmx_update_msr_bitmap(struct kvm_vcpu *vcpu);
 static __always_inline void vmx_disable_intercept_for_msr(unsigned long *msr_bitmap,
 							  u32 msr, int type);
+static void
+__always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr,
+					  int type, bool value);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -4088,6 +4091,121 @@ static int vmx_get_msr_feature(struct kvm_msr_entry *msr)
 	return 0;
 }
 
+static void vmx_set_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
+{
+	unsigned long *msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
+	struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack;
+	int nr = stack->nr;
+	int i;
+
+	vmx_set_intercept_for_msr(msr_bitmap, stack->tos, MSR_TYPE_RW, set);
+	for (i = 0; i < nr; i++) {
+		vmx_set_intercept_for_msr(msr_bitmap, stack->from + i,
+					  MSR_TYPE_RW, set);
+		vmx_set_intercept_for_msr(msr_bitmap, stack->to + i,
+					  MSR_TYPE_RW, set);
+		if (stack->info)
+			vmx_set_intercept_for_msr(msr_bitmap, stack->info + i,
+						  MSR_TYPE_RW, set);
+	}
+}
+
+static inline bool msr_is_lbr_stack(struct kvm_vcpu *vcpu, u32 index)
+{
+	struct x86_perf_lbr_stack *stack = &vcpu->kvm->arch.lbr_stack;
+	int nr = stack->nr;
+
+	return !!(index == stack->tos ||
+		 (index >= stack->from && index < stack->from + nr) ||
+		 (index >= stack->to && index < stack->to + nr) ||
+		 (index >= stack->info && index < stack->info + nr));
+}
+
+static bool guest_get_lbr_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	u32 index = msr_info->index;
+	bool ret = false;
+
+	switch (index) {
+	case MSR_IA32_DEBUGCTLMSR:
+		msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+		ret = true;
+		break;
+	case MSR_LBR_SELECT:
+		ret = true;
+		rdmsrl(index, msr_info->data);
+		break;
+	default:
+		if (msr_is_lbr_stack(vcpu, index)) {
+			ret = true;
+			rdmsrl(index, msr_info->data);
+		}
+	}
+
+	return ret;
+}
+
+static bool guest_set_lbr_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
+{
+	u32 index = msr_info->index;
+	u64 data = msr_info->data;
+	bool ret = false;
+
+	switch (index) {
+	case MSR_IA32_DEBUGCTLMSR:
+		ret = true;
+		/*
+		 * Currently, only FREEZE_LBRS_ON_PMI and DEBUGCTLMSR_LBR are
+		 * supported.
+		 */
+		data &= (DEBUGCTLMSR_FREEZE_LBRS_ON_PMI | DEBUGCTLMSR_LBR);
+		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
+		break;
+	case MSR_LBR_SELECT:
+		ret = true;
+		if (lbr_select_user_callstack(data))
+			vcpu->arch.lbr_user_callstack = true;
+		else
+			vcpu->arch.lbr_user_callstack = false;
+		wrmsrl(index, msr_info->data);
+		break;
+	default:
+		if (msr_is_lbr_stack(vcpu, index)) {
+			ret = true;
+			wrmsrl(index, msr_info->data);
+		}
+	}
+
+	return ret;
+}
+
+static bool guest_access_lbr_msr(struct kvm_vcpu *vcpu,
+				 struct msr_data *msr_info,
+				 bool set)
+{
+	bool ret = false;
+
+	if (!vcpu->kvm->arch.lbr_in_guest)
+		return false;
+
+	if (set)
+		ret = guest_set_lbr_msr(vcpu, msr_info);
+	else
+		ret = guest_get_lbr_msr(vcpu, msr_info);
+
+	if (ret && !vcpu->arch.lbr_used) {
+		vcpu->arch.lbr_used = true;
+		vmx_set_intercept_for_lbr_msrs(vcpu, false);
+		if (vcpu->arch.lbr_user_callstack)
+			intel_pmu_enable_save_guest_lbr(vcpu);
+		else
+			intel_pmu_disable_save_guest_lbr(vcpu);
+	}
+
+	return ret;
+}
+
+
 /*
  * Reads an msr value (of 'msr_index') into 'pdata'.
  * Returns 0 on success, non-0 otherwise.
@@ -4179,6 +4297,8 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			msr_info->data = msr->data;
 			break;
 		}
+		if (guest_access_lbr_msr(vcpu, msr_info, false))
+			break;
 		return kvm_get_msr_common(vcpu, msr_info);
 	}
 
@@ -4375,6 +4495,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			}
 			break;
 		}
+		if (guest_access_lbr_msr(vcpu, msr_info, true))
+			break;
 		ret = kvm_set_msr_common(vcpu, msr_info);
 	}
 
@@ -11515,6 +11637,7 @@ static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 
 	if (enable_pml)
 		vmx_destroy_pml_buffer(vmx);
+	intel_pmu_disable_save_guest_lbr(vcpu);
 	free_vpid(vmx->vpid);
 	leave_guest_mode(vcpu);
 	vmx_free_vcpu_nested(vcpu);
@@ -14422,6 +14545,21 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
 	if (!kvm_pause_in_guest(vcpu->kvm))
 		shrink_ple_window(vcpu);
+
+	if (vcpu->arch.lbr_used) {
+		vcpu->arch.lbr_used = false;
+		vmx_set_intercept_for_lbr_msrs(vcpu, true);
+	} else if (intel_pmu_save_vcpu_lbr_enabled(vcpu)) {
+		u64 guest_debugctl;
+
+		/*
+		 * The lbr feature wasn't used during that last vCPU time
+		 * slice, so it's time to disable the vCPU side save/restore.
+		 */
+		guest_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL);
+		if (!(guest_debugctl & DEBUGCTLMSR_LBR))
+			intel_pmu_disable_save_guest_lbr(vcpu);
+	}
 }
 
 static void vmx_slot_enable_log_dirty(struct kvm *kvm,
-- 
2.7.4



* Re: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack
  2018-12-26  9:25 ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack Wei Wang
@ 2018-12-27 20:51   ` Andi Kleen
  2018-12-28  3:47     ` Wei Wang
  2018-12-27 20:52   ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack II Andi Kleen
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2018-12-27 20:51 UTC (permalink / raw)
  To: Wei Wang
  Cc: linux-kernel, kvm, pbonzini, peterz, kan.liang, mingo, rkrcmar,
	like.xu, jannh, arei.gonglei


Thanks. This looks a lot better than the earlier versions.

Some more comments.

On Wed, Dec 26, 2018 at 05:25:38PM +0800, Wei Wang wrote:
> When the vCPU is scheduled in:
> - if the lbr feature was used in the last vCPU time slice, set the lbr
>   stack to be interceptible, so that the host can capture whether the
>   lbr feature will be used in this time slice;
> - if the lbr feature wasn't used in the last vCPU time slice, disable
>   the vCPU support of the guest lbr switching.

time slice is the time from exit to exit?

This might be rather short in some cases if the workload does a lot of exits
(which I would expect PMU workloads to do) Would be better to use some
explicit time check, or at least N exits.

> 
> Upon the first access to one of the lbr related MSRs (since the vCPU was
> scheduled in):
> - record that the guest has used the lbr;
> - create a host perf event to help save/restore the guest lbr stack if
>   the guest uses the user callstack mode lbr stack;

This is a bit risky. It would be safer (but also more expensive)
to always save even for any guest LBR use independent of callstack.

Otherwise we might get into a situation where
a vCPU context switch inside the guest PMI will clear the LBRs
before they can be read in the PMI, so some LBR samples will be fully
or partially cleared. This would be user visible.

In theory could try to detect if the guest is inside a PMI and
save/restore then, but that would likely be complicated. I would
save/restore for all cases.

> +static void
> +__always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr,
> +					  int type, bool value);

__always_inline should only be used if it's needed for functionality,
or in a header.

The rest looks good.

-Andi



* Re: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack II
  2018-12-26  9:25 ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack Wei Wang
  2018-12-27 20:51   ` Andi Kleen
@ 2018-12-27 20:52   ` Andi Kleen
  2018-12-29  4:25     ` Wang, Wei W
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2018-12-27 20:52 UTC (permalink / raw)
  To: Wei Wang
  Cc: linux-kernel, kvm, pbonzini, peterz, kan.liang, mingo, rkrcmar,
	like.xu, jannh, arei.gonglei


Actually forgot one case.

In Arch Perfmon v4 the LBR freezing is also controlled through a GLOBAL_CTRL bit.
I didn't see any code handling that bit?


-Andi


* Re: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack
  2018-12-27 20:51   ` Andi Kleen
@ 2018-12-28  3:47     ` Wei Wang
  2018-12-28 19:10       ` Andi Kleen
  0 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2018-12-28  3:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, kvm, pbonzini, peterz, kan.liang, mingo, rkrcmar,
	like.xu, jannh, arei.gonglei

On 12/28/2018 04:51 AM, Andi Kleen wrote:
> Thanks. This looks a lot better than the earlier versions.
>
> Some more comments.
>
> On Wed, Dec 26, 2018 at 05:25:38PM +0800, Wei Wang wrote:
>> When the vCPU is scheduled in:
>> - if the lbr feature was used in the last vCPU time slice, set the lbr
>>    stack to be interceptible, so that the host can capture whether the
>>    lbr feature will be used in this time slice;
>> - if the lbr feature wasn't used in the last vCPU time slice, disable
>>    the vCPU support of the guest lbr switching.
> time slice is the time from exit to exit?

It's the vCPU thread time slice (e.g. 100ms).


>
> This might be rather short in some cases if the workload does a lot of exits
> (which I would expect PMU workloads to do) Would be better to use some
> explicit time check, or at least N exits.

Did you mean further increasing the lazy time to multiple host thread
scheduling time slices?
What would be a good value for "N"?


>> Upon the first access to one of the lbr related MSRs (since the vCPU was
>> scheduled in):
>> - record that the guest has used the lbr;
>> - create a host perf event to help save/restore the guest lbr stack if
>>    the guest uses the user callstack mode lbr stack;
> This is a bit risky. It would be safer (but also more expensive)
> to always save even for any guest LBR use independent of callstack.
>
> Otherwise we might get into a situation where
> a vCPU context switch inside the guest PMI will clear the LBRs
> before they can be read in the PMI, so some LBR samples will be fully
> or partially cleared. This would be user visible.
>
> In theory could try to detect if the guest is inside a PMI and
> save/restore then, but that would likely be complicated. I would
> save/restore for all cases.

Yes, it is easier to save for all the cases. But curious: for the
non-callstack mode, it's just point sampling of functions (kind of
speculative to some degree). Would rarely losing a few recordings be
important in that case?


>
>> +static void
>> +__always_inline vmx_set_intercept_for_msr(unsigned long *msr_bitmap, u32 msr,
>> +					  int type, bool value);
> __always_inline should only be used if it's needed for functionality,
> or in a header.

Thanks, will fix it.

Best,
Wei


* Re: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack
  2018-12-28  3:47     ` Wei Wang
@ 2018-12-28 19:10       ` Andi Kleen
  0 siblings, 0 replies; 42+ messages in thread
From: Andi Kleen @ 2018-12-28 19:10 UTC (permalink / raw)
  To: Wei Wang
  Cc: linux-kernel, kvm, pbonzini, peterz, kan.liang, mingo, rkrcmar,
	like.xu, jannh, arei.gonglei

On Fri, Dec 28, 2018 at 11:47:06AM +0800, Wei Wang wrote:
> On 12/28/2018 04:51 AM, Andi Kleen wrote:
> > Thanks. This looks a lot better than the earlier versions.
> > 
> > Some more comments.
> > 
> > On Wed, Dec 26, 2018 at 05:25:38PM +0800, Wei Wang wrote:
> > > When the vCPU is scheduled in:
> > > - if the lbr feature was used in the last vCPU time slice, set the lbr
> > >    stack to be interceptible, so that the host can capture whether the
> > >    lbr feature will be used in this time slice;
> > > - if the lbr feature wasn't used in the last vCPU time slice, disable
> > >    the vCPU support of the guest lbr switching.
> > time slice is the time from exit to exit?
> 
> It's the vCPU thread time slice (e.g. 100ms).

I don't think the time slices are that long, but ok.

> 
> > 
> > This might be rather short in some cases if the workload does a lot of exits
> > (which I would expect PMU workloads to do) Would be better to use some
> > explicit time check, or at least N exits.
> 
> Did you mean further increasing the lazy time to multiple host thread
> scheduling time slices?
> What would be a good value for "N"?

I'm not sure -- I think the goal would be to find a value that optimizes
performance (or rather minimizes overhead). But perhaps if it's, as you say,
the scheduler time slice, it might be good enough as it is.

I guess it could be tuned later based on more experience.

> > or partially cleared. This would be user visible.
> > 
> > In theory could try to detect if the guest is inside a PMI and
> > save/restore then, but that would likely be complicated. I would
> > save/restore for all cases.
> 
> Yes, it is easier to save for all the cases. But just curious: for the
> non-callstack mode, it's just point sampling of functions (speculative to
> some degree). Would rarely losing a few records be important in that case?

In principle no for statistical samples, but I know some tools complain
about bogus samples (e.g. autofdo will). Also with perf report --branch-history
it will definitely be visible. I think it's easier to always save now than to
handle the user complaints about this later.


-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack II
  2018-12-27 20:52   ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack II Andi Kleen
@ 2018-12-29  4:25     ` Wang, Wei W
  0 siblings, 0 replies; 42+ messages in thread
From: Wang, Wei W @ 2018-12-29  4:25 UTC (permalink / raw)
  To: 'Andi Kleen'
  Cc: linux-kernel, kvm, pbonzini, peterz, Liang, Kan, mingo, rkrcmar,
	Xu, Like, jannh, arei.gonglei

On Friday, December 28, 2018 4:53 AM, Andi Kleen wrote:
> Actually forgot one case.
> 
> In Arch Perfmon v4 the LBR freezing is also controlled through a
> GLOBAL_CTRL bit.
> I didn't see any code handling that bit?

That GLOBAL_STATUS.LBR_FRZ bit hasn't been supported yet. I'll add that, thanks.
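(Roughly what I have in mind -- only a sketch, the helper and pmu fields
below are illustrative and not part of this series; per the SDM, the freeze
flag is bit 58 of IA32_PERF_GLOBAL_STATUS on Arch Perfmon v4:)

/* Arch Perfmon v4: LBR freeze is reported in IA32_PERF_GLOBAL_STATUS bit 58 */
#define GUEST_LBR_FRZ_BIT	58
#define GUEST_LBR_FRZ		BIT_ULL(GUEST_LBR_FRZ_BIT)

static u64 intel_pmu_read_global_status(struct kvm_pmu *pmu)
{
	u64 status = pmu->global_status;

	/* reflect the freeze state to the guest when it owns the lbr stack */
	if (pmu->version >= 4 && guest_lbr_is_frozen(pmu))	/* hypothetical helper */
		status |= GUEST_LBR_FRZ;

	return status;
}

(The clearing side would follow the guest's global status reset/ovf_ctrl
write path; details to be worked out in the next version.)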

Best,
Wei



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2018-12-26  9:25 ` [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable Wei Wang
@ 2019-01-02 16:33   ` Liang, Kan
  2019-01-04  9:58     ` Wei Wang
  2019-01-02 23:26   ` Jim Mattson
  1 sibling, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2019-01-02 16:33 UTC (permalink / raw)
  To: Wei Wang, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei



On 12/26/2018 4:25 AM, Wei Wang wrote:
> +
> +	/*
> +	 * It could be possible that people have vcpus of old model run on
> +	 * physical cpus of newer model, for example a BDW guest on a SKX
> +	 * machine (but not possible to be the other way around).
> +	 * The BDW guest may not get accurate results on a SKX machine as it
> +	 * only reads 16 entries of the lbr stack while there are 32 entries
> +	 * of recordings. So we currently forbid the lbr enabling when the
> +	 * vcpu and physical cpu see different lbr stack entries.

I think it's not enough to only check the number of entries. The LBR from/to
MSRs may be different even if the number of entries is the same, e.g. SLM
and KNL.

> +	 */
> +	switch (vcpu_model) {

That's a duplicate of intel_pmu_init(). I think it's better to factor
out the common part if you want to check LBR MSRs and entries. Then we
don't need to add the same code in two different places when enabling
new platforms.

Actually, I think we may just support LBR for the guest if it has the
identical CPU model as the host. That should be good enough for now.

Thanks,
Kan
> +	case INTEL_FAM6_CORE2_MEROM:
> +	case INTEL_FAM6_CORE2_MEROM_L:

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2018-12-26  9:25 ` [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable Wei Wang
  2019-01-02 16:33   ` Liang, Kan
@ 2019-01-02 23:26   ` Jim Mattson
  2019-01-03  7:22     ` Wei Wang
  1 sibling, 1 reply; 42+ messages in thread
From: Jim Mattson @ 2019-01-02 23:26 UTC (permalink / raw)
  To: Wei Wang
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Wed, Dec 26, 2018 at 2:01 AM Wei Wang <wei.w.wang@intel.com> wrote:
>
> The lbr stack is architecturally specific, for example, SKX has 32 lbr
> stack entries while HSW has 16 entries, so a HSW guest running on a SKX
> machine may not get accurate perf results. Currently, we forbid the
> guest lbr enabling when the guest and host see different lbr stack
> entries.

How do you handle live migration?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2018-12-26  9:25 ` [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest Wei Wang
@ 2019-01-02 23:40   ` Jim Mattson
  2019-01-03  8:00     ` Wei Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Jim Mattson @ 2019-01-02 23:40 UTC (permalink / raw)
  To: Wei Wang
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Wed, Dec 26, 2018 at 2:01 AM Wei Wang <wei.w.wang@intel.com> wrote:
>
> Bits [0, 5] of MSR_IA32_PERF_CAPABILITIES tell about the format of
> the addresses stored in the LBR stack. Expose those bits to the guest
> when the guest lbr feature is enabled.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Andi Kleen <ak@linux.intel.com>
> ---
>  arch/x86/include/asm/perf_event.h | 2 ++
>  arch/x86/kvm/cpuid.c              | 2 +-
>  arch/x86/kvm/vmx.c                | 9 +++++++++
>  3 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 2f82795..eee09b7 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -87,6 +87,8 @@
>  #define ARCH_PERFMON_BRANCH_MISSES_RETIRED             6
>  #define ARCH_PERFMON_EVENTS_COUNT                      7
>
> +#define X86_PERF_CAP_MASK_LBR_FMT                      0x3f
> +
>  /*
>   * Intel "Architectural Performance Monitoring" CPUID
>   * detection/enumeration details:
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 7bcfa61..3b8a57b 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -365,7 +365,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>                 F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
>                 0 /* DS-CPL, VMX, SMX, EST */ |
>                 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> -               F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
> +               F(FMA) | F(CX16) | 0 /* xTPR Update*/ | F(PDCM) |
>                 F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
>                 F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
>                 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 8d5d984..ee02967 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -4161,6 +4161,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                         return 1;
>                 msr_info->data = vcpu->arch.ia32_xss;
>                 break;
> +       case MSR_IA32_PERF_CAPABILITIES:
> +               if (!boot_cpu_has(X86_FEATURE_PDCM))
> +                       return 1;
> +               msr_info->data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);

Since this isn't guarded by vcpu->kvm->arch.lbr_in_guest, it breaks
backwards compatibility, doesn't it?

> +               if (vcpu->kvm->arch.lbr_in_guest)
> +                       msr_info->data &= X86_PERF_CAP_MASK_LBR_FMT;
> +               break;
>         case MSR_TSC_AUX:
>                 if (!msr_info->host_initiated &&
>                     !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> @@ -4343,6 +4350,8 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>                 else
>                         clear_atomic_switch_msr(vmx, MSR_IA32_XSS);
>                 break;
> +       case MSR_IA32_PERF_CAPABILITIES:
> +               return 1; /* RO MSR */
>         case MSR_TSC_AUX:
>                 if (!msr_info->host_initiated &&
>                     !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP))
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-02 23:26   ` Jim Mattson
@ 2019-01-03  7:22     ` Wei Wang
  2019-01-03 15:34       ` Jim Mattson
  0 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-03  7:22 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On 01/03/2019 07:26 AM, Jim Mattson wrote:
> On Wed, Dec 26, 2018 at 2:01 AM Wei Wang <wei.w.wang@intel.com> wrote:
>> The lbr stack is architecturally specific, for example, SKX has 32 lbr
>> stack entries while HSW has 16 entries, so a HSW guest running on a SKX
>> machine may not get accurate perf results. Currently, we forbid the
>> guest lbr enabling when the guest and host see different lbr stack
>> entries.
> How do you handle live migration?

This feature is gated by the QEMU "lbr=true" option.
So if the lbr fails to work on the destination machine,
the destination side QEMU wouldn't be able to boot,
and migration will not happen.

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-02 23:40   ` Jim Mattson
@ 2019-01-03  8:00     ` Wei Wang
  2019-01-03 15:25       ` Jim Mattson
  0 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-03  8:00 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On 01/03/2019 07:40 AM, Jim Mattson wrote:
> On Wed, Dec 26, 2018 at 2:01 AM Wei Wang <wei.w.wang@intel.com> wrote:
>> Bits [0, 5] of MSR_IA32_PERF_CAPABILITIES tell about the format of
>> the addresses stored in the LBR stack. Expose those bits to the guest
>> when the guest lbr feature is enabled.
>>
>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Andi Kleen <ak@linux.intel.com>
>> ---
>>   arch/x86/include/asm/perf_event.h | 2 ++
>>   arch/x86/kvm/cpuid.c              | 2 +-
>>   arch/x86/kvm/vmx.c                | 9 +++++++++
>>   3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
>> index 2f82795..eee09b7 100644
>> --- a/arch/x86/include/asm/perf_event.h
>> +++ b/arch/x86/include/asm/perf_event.h
>> @@ -87,6 +87,8 @@
>>   #define ARCH_PERFMON_BRANCH_MISSES_RETIRED             6
>>   #define ARCH_PERFMON_EVENTS_COUNT                      7
>>
>> +#define X86_PERF_CAP_MASK_LBR_FMT                      0x3f
>> +
>>   /*
>>    * Intel "Architectural Performance Monitoring" CPUID
>>    * detection/enumeration details:
>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>> index 7bcfa61..3b8a57b 100644
>> --- a/arch/x86/kvm/cpuid.c
>> +++ b/arch/x86/kvm/cpuid.c
>> @@ -365,7 +365,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
>>                  F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
>>                  0 /* DS-CPL, VMX, SMX, EST */ |
>>                  0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
>> -               F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
>> +               F(FMA) | F(CX16) | 0 /* xTPR Update*/ | F(PDCM) |
>>                  F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
>>                  F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
>>                  0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 8d5d984..ee02967 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -4161,6 +4161,13 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>                          return 1;
>>                  msr_info->data = vcpu->arch.ia32_xss;
>>                  break;
>> +       case MSR_IA32_PERF_CAPABILITIES:
>> +               if (!boot_cpu_has(X86_FEATURE_PDCM))
>> +                       return 1;
>> +               msr_info->data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
> Since this isn't guarded by vcpu->kvm->arch.lbr_in_guest, it breaks
> backwards compatibility, doesn't it?

Right, thanks. Probably better to change it to below:

msr_info->data = 0;
data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
if (vcpu->kvm->arch.lbr_in_guest)
     msr_info->data |= (data & X86_PERF_CAP_MASK_LBR_FMT);

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-03  8:00     ` Wei Wang
@ 2019-01-03 15:25       ` Jim Mattson
  2019-01-07  9:15         ` Wei Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Jim Mattson @ 2019-01-03 15:25 UTC (permalink / raw)
  To: Wei Wang
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Wed, Jan 2, 2019 at 11:55 PM Wei Wang <wei.w.wang@intel.com> wrote:

> Right, thanks. Probably better to change it to below:
>
> msr_info->data = 0;
> data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
> if (vcpu->kvm->arch.lbr_in_guest)
>      msr_info->data |= (data & X86_PERF_CAP_MASK_LBR_FMT);
>

This still breaks backwards compatibility. Returning 0 and raising #GP
are not the same.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-03  7:22     ` Wei Wang
@ 2019-01-03 15:34       ` Jim Mattson
  2019-01-03 17:18         ` Andi Kleen
  2019-01-04 10:09         ` Wei Wang
  0 siblings, 2 replies; 42+ messages in thread
From: Jim Mattson @ 2019-01-03 15:34 UTC (permalink / raw)
  To: Wei Wang
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Wed, Jan 2, 2019 at 11:16 PM Wei Wang <wei.w.wang@intel.com> wrote:
>
> On 01/03/2019 07:26 AM, Jim Mattson wrote:
> > On Wed, Dec 26, 2018 at 2:01 AM Wei Wang <wei.w.wang@intel.com> wrote:
> >> The lbr stack is architecturally specific, for example, SKX has 32 lbr
> >> stack entries while HSW has 16 entries, so a HSW guest running on a SKX
> >> machine may not get accurate perf results. Currently, we forbid the
> >> guest lbr enabling when the guest and host see different lbr stack
> >> entries.
> > How do you handle live migration?
>
> This feature is gated by the QEMU "lbr=true" option.
> So if the lbr fails to work on the destination machine,
> the destination side QEMU wouldn't be able to boot,
> and migration will not happen.

Yes, but then what happens?

Fast forward to, say, 2021. You're decommissioning all Broadwell
servers in your data center. You have to migrate the running VMs off
of those Broadwell systems onto newer hardware. But, with the current
implementation, the migration cannot happen. So, what do you do? I
suppose you just never enable the feature in the first place. Right?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-03 15:34       ` Jim Mattson
@ 2019-01-03 17:18         ` Andi Kleen
  2019-01-04 10:09         ` Wei Wang
  1 sibling, 0 replies; 42+ messages in thread
From: Andi Kleen @ 2019-01-03 17:18 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Wei Wang, LKML, kvm list, Paolo Bonzini, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

> Yes, but then what happens?
> 
> Fast forward to, say, 2021. You're decommissioning all Broadwell
> servers in your data center. You have to migrate the running VMs off
> of those Broadwell systems onto newer hardware. But, with the current
> implementation, the migration cannot happen. So, what do you do? I
> suppose you just never enable the feature in the first place. Right?

There would in theory be ways to handle this.

LBRs are normally not required for running correctly, they're not like
instructions, so it would be actually quite reasonable to just return 0 
and ignore writes for incompatible machines for them. Your profilers might
return some bogus data, but that is normally acceptable.

In some cases it might also be possible to emulate LBRs of
different types, but I don't think it would be worth the effort.
 
Yes, but the patches don't do this, so right now you should only enable it when
all systems in your migration pool have compatible LBRs.
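(To make the fallback concrete -- purely a sketch, the compatibility helper
here is made up and this is not what the series currently does:)

/*
 * Sketch: on an lbr-incompatible host, make the guest's lbr MSRs benign
 * instead of faulting -- reads return 0, writes are silently dropped.
 */
static bool guest_lbr_msr_fallback(struct kvm_vcpu *vcpu,
				   struct msr_data *msr, bool write)
{
	if (guest_lbr_compatible_with_host(vcpu))	/* hypothetical check */
		return false;				/* use the normal path */

	if (!write)
		msr->data = 0;
	return true;					/* handled: ignored */
}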

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-02 16:33   ` Liang, Kan
@ 2019-01-04  9:58     ` Wei Wang
  2019-01-04 15:57       ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-04  9:58 UTC (permalink / raw)
  To: Liang, Kan, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei

On 01/03/2019 12:33 AM, Liang, Kan wrote:
>
>
> On 12/26/2018 4:25 AM, Wei Wang wrote:
>> +
>> +    /*
>> +     * It could be possible that people have vcpus of old model run on
>> +     * physical cpus of newer model, for example a BDW guest on a SKX
>> +     * machine (but not possible to be the other way around).
>> +     * The BDW guest may not get accurate results on a SKX machine 
>> as it
>> +     * only reads 16 entries of the lbr stack while there are 32 
>> entries
>> +     * of recordings. So we currently forbid the lbr enabling when the
>> +     * vcpu and physical cpu see different lbr stack entries.
>
> I think it's not enough to only check number of entries. The LBR 
> from/to MSRs may be different even the number of entries is the same, 
> e.g SLM and KNL.

Yes, we could add the comparison of the FROM msrs.

>
>> +     */
>> +    switch (vcpu_model) {
>
> That's a duplicate of intel_pmu_init(). I think it's better to factor 
> out the common part if you want to check LBR MSRs and entries. Then we 
> don't need to add the same codes in two different places when enabling 
> new platforms.
>


Yes, I thought about this, but intel_pmu_init() does a lot of other things
in each "case xx". Any thoughts on how to factor them out?


> Actually, I think we may just support LBR for guest if it has the 
> identical CPU model as host. It should be good enough for now.
>

I actually tried this in the first place but it failed to work with the
existing QEMU. For example, when we specify the "Broadwell" cpu from qemu,
qemu uses the Broadwell core model, but the physical machine I have is
Broadwell X. This patch supports that case.

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-03 15:34       ` Jim Mattson
  2019-01-03 17:18         ` Andi Kleen
@ 2019-01-04 10:09         ` Wei Wang
  2019-01-04 15:53           ` Jim Mattson
  1 sibling, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-04 10:09 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On 01/03/2019 11:34 PM, Jim Mattson wrote:
> On Wed, Jan 2, 2019 at 11:16 PM Wei Wang <wei.w.wang@intel.com> wrote:
>> On 01/03/2019 07:26 AM, Jim Mattson wrote:
>>> On Wed, Dec 26, 2018 at 2:01 AM Wei Wang <wei.w.wang@intel.com> wrote:
>>>> The lbr stack is architecturally specific, for example, SKX has 32 lbr
>>>> stack entries while HSW has 16 entries, so a HSW guest running on a SKX
>>>> machine may not get accurate perf results. Currently, we forbid the
>>>> guest lbr enabling when the guest and host see different lbr stack
>>>> entries.
>>> How do you handle live migration?
>> This feature is gated by the QEMU "lbr=true" option.
>> So if the lbr fails to work on the destination machine,
>> the destination side QEMU wouldn't be able to boot,
>> and migration will not happen.
> Yes, but then what happens?
>
> Fast forward to, say, 2021. You're decommissioning all Broadwell
> servers in your data center. You have to migrate the running VMs off
> of those Broadwell systems onto newer hardware. But, with the current
> implementation, the migration cannot happen. So, what do you do? I
> suppose you just never enable the feature in the first place. Right?

I'm not sure that's how people would run their data centers.
What would be the point of decommissioning all the BDW machines when
there are important BDW VMs running?

The "lbr=true" option can also be disabled via QMP, which disables the
kvm side lbr support. So if you really want to deal with the above case,
you could first disable the lbr feature on the source side, and then boot
the destination side QEMU without "lbr=true". The lbr feature will then not
be available to the guest by the time you decide to migrate the guest to a
non-compatible physical machine.

The point of this patch is: if we couldn't offer our users accurate
lbr results, we'd better have the feature disabled rather than
offer wrong results that confuse them.


Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-04 10:09         ` Wei Wang
@ 2019-01-04 15:53           ` Jim Mattson
  2019-01-05 10:15             ` Wang, Wei W
  0 siblings, 1 reply; 42+ messages in thread
From: Jim Mattson @ 2019-01-04 15:53 UTC (permalink / raw)
  To: Wei Wang
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Fri, Jan 4, 2019 at 2:03 AM Wei Wang <wei.w.wang@intel.com> wrote:
>
> On 01/03/2019 11:34 PM, Jim Mattson wrote:
> > Fast forward to, say, 2021. You're decommissioning all Broadwell
> > servers in your data center. You have to migrate the running VMs off
> > of those Broadwell systems onto newer hardware. But, with the current
> > implementation, the migration cannot happen. So, what do you do? I
> > suppose you just never enable the feature in the first place. Right?
>
> I'm not sure if that's the way people would do with their data centers.
> What would be the point of decommissioning all the BDW machines when
> there are important BDW VMs running?

TCO increases as hardware ages, while the TCO for each successive
generation of hardware tends to be lower than its predecessor. Thus,
the point of decommissioning all BDW hosts that are beyond their
useful service life is to save money. Our assumption for Google Cloud
is that we will always be able to emulate older Intel processors on
newer Intel processors, so the running BDW VMs should not be affected
by the decommissioning of BDW hardware. Obviously, that means that we
won't offer features that don't have a forward migration story, such
as this one.

Yes, someday Intel will drop support for some feature that we
currently offer (like MPX, perhaps), and that will cause us some
grief.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-04  9:58     ` Wei Wang
@ 2019-01-04 15:57       ` Liang, Kan
  2019-01-05 10:09         ` Wei Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2019-01-04 15:57 UTC (permalink / raw)
  To: Wei Wang, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei



On 1/4/2019 4:58 AM, Wei Wang wrote:
> On 01/03/2019 12:33 AM, Liang, Kan wrote:
>>
>>
>> On 12/26/2018 4:25 AM, Wei Wang wrote:
>>> +
>>> +    /*
>>> +     * It could be possible that people have vcpus of old model run on
>>> +     * physical cpus of newer model, for example a BDW guest on a SKX
>>> +     * machine (but not possible to be the other way around).
>>> +     * The BDW guest may not get accurate results on a SKX machine 
>>> as it
>>> +     * only reads 16 entries of the lbr stack while there are 32 
>>> entries
>>> +     * of recordings. So we currently forbid the lbr enabling when the
>>> +     * vcpu and physical cpu see different lbr stack entries.
>>
>> I think it's not enough to only check number of entries. The LBR 
>> from/to MSRs may be different even the number of entries is the same, 
>> e.g SLM and KNL.
> 
> Yes, we could add the comparison of the FROM msrs.
> 
>>
>>> +     */
>>> +    switch (vcpu_model) {
>>
>> That's a duplicate of intel_pmu_init(). I think it's better to factor 
>> out the common part if you want to check LBR MSRs and entries. Then we 
>> don't need to add the same codes in two different places when enabling 
>> new platforms.
>>
> 
> 
> Yes, I thought about this, but intel_pmu_init() does a lot more things 
> in each "Case xx". Any thought about how to factor them out?
>

I think we may only move the "switch (boot_cpu_data.x86_model) { ... }"
to a new function, e.g. __intel_pmu_init(int model, struct x86_pmu *x86_pmu).

In __intel_pmu_init, if the model != boot_cpu_data.x86_model, you only
need to update x86_pmu.*. Just ignore the global settings, e.g.
hw_cache_event_ids, mem_attr, extra_attr etc.

You may also need to introduce another new function in lbr.c to check if
the LBR is compatible with the guest, e.g.
bool is_lbr_compatible_with_guest(int model).

bool is_lbr_compatible_with_guest(int model) {
	struct x86_pmu fake_x86_pmu;

	if (boot_cpu_data.x86_model == model)
		return true;

	__intel_pmu_init(model, &fake_x86_pmu);

	if ((x86_pmu.lbr_nr == fake_x86_pmu.lbr_nr) &&
	    (x86_pmu.lbr_tos == fake_x86_pmu.lbr_tos) &&
	    (x86_pmu.lbr_from == fake_x86_pmu.lbr_from))
		return true;

	return false;
}
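Then the KVM side would just call it from the lbr enable path; for example
(the surrounding function signature and the cpuid helper are placeholders):

/* Sketch: refuse to enable the guest lbr on an incompatible host */
static int intel_pmu_lbr_enable(struct kvm_vcpu *vcpu)
{
	int guest_model = guest_cpuid_model(vcpu);	/* placeholder helper */

	if (!is_lbr_compatible_with_guest(guest_model))
		return -EINVAL;

	/* ... the rest of the enabling work ... */
	return 0;
}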

> 
>> Actually, I think we may just support LBR for guest if it has the 
>> identical CPU model as host. It should be good enough for now.
>>
> 
> I actually tried this in the first place but it failed to work with the 
> existing QEMU.
> For example, when we specify "Broadwell" cpu from qemu, then qemu uses 
> Broadwell core model,
> but the physical machine I have is Broadwell X. This patch will support 
> this case.

I mean is it good enough if we only support "-cpu host"?

Thanks,
Kan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-04 15:57       ` Liang, Kan
@ 2019-01-05 10:09         ` Wei Wang
  2019-01-07 14:22           ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-05 10:09 UTC (permalink / raw)
  To: Liang, Kan, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei

On 01/04/2019 11:57 PM, Liang, Kan wrote:
>
>
> On 1/4/2019 4:58 AM, Wei Wang wrote:
>> On 01/03/2019 12:33 AM, Liang, Kan wrote:
>>>
>>>
>>> On 12/26/2018 4:25 AM, Wei Wang wrote:
>>>> +
>>>> +    /*
>>>> +     * It could be possible that people have vcpus of old model 
>>>> run on
>>>> +     * physical cpus of newer model, for example a BDW guest on a SKX
>>>> +     * machine (but not possible to be the other way around).
>>>> +     * The BDW guest may not get accurate results on a SKX machine 
>>>> as it
>>>> +     * only reads 16 entries of the lbr stack while there are 32 
>>>> entries
>>>> +     * of recordings. So we currently forbid the lbr enabling when 
>>>> the
>>>> +     * vcpu and physical cpu see different lbr stack entries.
>>>
>>> I think it's not enough to only check number of entries. The LBR 
>>> from/to MSRs may be different even the number of entries is the 
>>> same, e.g SLM and KNL.
>>
>> Yes, we could add the comparison of the FROM msrs.
>>
>>>
>>>> +     */
>>>> +    switch (vcpu_model) {
>>>
>>> That's a duplicate of intel_pmu_init(). I think it's better to 
>>> factor out the common part if you want to check LBR MSRs and 
>>> entries. Then we don't need to add the same codes in two different 
>>> places when enabling new platforms.
>>>
>>
>>
>> Yes, I thought about this, but intel_pmu_init() does a lot more 
>> things in each "Case xx". Any thought about how to factor them out?
>>
>
> I think we may only move the "switch (boot_cpu_data.x86_model) { ... 
> }" to a new function, e.g. __intel_pmu_init(int model, struct x86_pmu 
> *x86_pmu)
>
> In __intel_pmu_init, if the model != boot_cpu_data.x86_model, you only 
> need to update x86_pmu.*. Just ignore global settings, e.g 
> hw_cache_event_ids, mem_attr, extra_attr etc.

Thanks for sharing. I understand the point of maintaining those models
in one place, but this factor-out doesn't seem very elegant to me, like below:

__intel_pmu_init(int model, struct x86_pmu *x86_pmu)
{
...
    switch (model) {
    case INTEL_FAM6_NEHALEM:
    case INTEL_FAM6_NEHALEM_EP:
    case INTEL_FAM6_NEHALEM_EX:
        intel_pmu_lbr_init(x86_pmu);
        if (model != boot_cpu_data.x86_model)
            return;

        /* plus a lot of other init work like below.. */
        memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
               sizeof(hw_cache_event_ids));
        memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
               sizeof(hw_cache_extra_regs));
        x86_pmu.event_constraints = intel_nehalem_event_constraints;
        x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
        x86_pmu.enable_all = intel_pmu_nhm_enable_all;
        x86_pmu.extra_regs = intel_nehalem_extra_regs;
    ...

    case ...
    }
}
We would need to insert "if (model != boot_cpu_data.x86_model)" in every "case xx".

What would be the rationale for only doing lbr_init on "x86_pmu"
when model != boot_cpu_data.x86_model?
(It looks more like a workaround to factor out the function and get what
we want.)

I would prefer having them separated as in this patch for now - it is
logically clearer to me.


>
>>
>>> Actually, I think we may just support LBR for guest if it has the 
>>> identical CPU model as host. It should be good enough for now.
>>>
>>
>> I actually tried this in the first place but it failed to work with 
>> the existing QEMU.
>> For example, when we specify "Broadwell" cpu from qemu, then qemu 
>> uses Broadwell core model,
>> but the physical machine I have is Broadwell X. This patch will 
>> support this case.
>
> I mean is it good enough if we only support "-cpu host"?
>

Not really. AFAIK, people don't use this usually. It is more common to 
specify the CPU type.

Best,
Wei



^ permalink raw reply	[flat|nested] 42+ messages in thread

* RE: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-04 15:53           ` Jim Mattson
@ 2019-01-05 10:15             ` Wang, Wei W
  0 siblings, 0 replies; 42+ messages in thread
From: Wang, Wei W @ 2019-01-05 10:15 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra, Liang,
	Kan, Ingo Molnar, Radim Krcmár, Xu, Like, Jann Horn,
	arei.gonglei

On Friday, January 4, 2019 11:54 PM, Jim Mattson wrote:
> On Fri, Jan 4, 2019 at 2:03 AM Wei Wang <wei.w.wang@intel.com> wrote:
> >
> > On 01/03/2019 11:34 PM, Jim Mattson wrote:
> > > Fast forward to, say, 2021. You're decommissioning all Broadwell
> > > servers in your data center. You have to migrate the running VMs off
> > > of those Broadwell systems onto newer hardware. But, with the
> > > current implementation, the migration cannot happen. So, what do you
> > > do? I suppose you just never enable the feature in the first place. Right?
> >
> > I'm not sure if that's the way people would do with their data centers.
> > What would be the point of decommissioning all the BDW machines when
> > there are important BDW VMs running?
> 
> TCO increases as hardware ages, while the TCO for each successive
> generation of hardware tends to be lower than its predecessor. Thus, the
> point of decommissioning all BDW hosts that are beyond their useful service
> life is to save money. Our assumption for Google Cloud is that we will always
> be able to emulate older Intel processors on newer Intel processors, so the
> running BDW VMs should not be affected by the decommissioning of BDW
> hardware. Obviously, that means that we won't offer features that don't
> have a forward migration story, such as this one.
> 
> Yes, someday Intel will drop support for some feature that we currently offer
> (like MPX, perhaps), and that will cause us some grief.

This sounds like a pretty good opportunity to convince your customers
to upgrade their VMs :)

So the solution I currently have in mind for the above problem is:
- expect QEMU support for upgrading VMs (e.g. BDW to SKL);
- just disable the non-compatible features for non-upgraded VMs
  (this could be in your SLA).

Anyway, I think it would be good to get this feature online first,
and then revisit that future concern.

Best,
Wei
 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-03 15:25       ` Jim Mattson
@ 2019-01-07  9:15         ` Wei Wang
  2019-01-07 18:05           ` Jim Mattson
  0 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-07  9:15 UTC (permalink / raw)
  To: Jim Mattson
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On 01/03/2019 11:25 PM, Jim Mattson wrote:
> On Wed, Jan 2, 2019 at 11:55 PM Wei Wang <wei.w.wang@intel.com> wrote:
>
>> Right, thanks. Probably better to change it to below:
>>
>> msr_info->data = 0;
>> data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
>> if (vcpu->kvm->arch.lbr_in_guest)
>>       msr_info->data |= (data & X86_PERF_CAP_MASK_LBR_FMT);
>>
> This still breaks backwards compatibility. Returning 0 and raising #GP
> are not the same.

I'm not sure about raising #GP in this case.

This PERF_CAP MSR contains more than just the lbr format.
For example, a guest with the lbr=false option could read it to get PEBS_FMT,
which is PERF_CAP[11:8]. We should offer those bits in this case.

When lbr=false, the lbr feature is not usable by the guest,
so I think whatever value (0 or otherwise) of the LBR_FMT bits
we give to the guest might not be important.
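(For reference, the two fields being discussed, as I read them from the
SDM -- worth double-checking against the current manual; the PEBS mask
below is only an example and not from this series:)

/*
 * IA32_PERF_CAPABILITIES, relevant fields:
 *   bits [5:0]  LBR_FMT  - format of the addresses in the lbr stack
 *   bits [11:8] PEBS_FMT - PEBS record format
 */
#define X86_PERF_CAP_MASK_LBR_FMT	0x3f
#define X86_PERF_CAP_MASK_PEBS_FMT	0xf00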

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-05 10:09         ` Wei Wang
@ 2019-01-07 14:22           ` Liang, Kan
  2019-01-08  6:13             ` Wei Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2019-01-07 14:22 UTC (permalink / raw)
  To: Wei Wang, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei



On 1/5/2019 5:09 AM, Wei Wang wrote:
> On 01/04/2019 11:57 PM, Liang, Kan wrote:
>>
>>
>> On 1/4/2019 4:58 AM, Wei Wang wrote:
>>> On 01/03/2019 12:33 AM, Liang, Kan wrote:
>>>>
>>>>
>>>> On 12/26/2018 4:25 AM, Wei Wang wrote:
>>>>> +
>>>>> +    /*
>>>>> +     * It could be possible that people have vcpus of old model 
>>>>> run on
>>>>> +     * physical cpus of newer model, for example a BDW guest on a SKX
>>>>> +     * machine (but not possible to be the other way around).
>>>>> +     * The BDW guest may not get accurate results on a SKX machine 
>>>>> as it
>>>>> +     * only reads 16 entries of the lbr stack while there are 32 
>>>>> entries
>>>>> +     * of recordings. So we currently forbid the lbr enabling when 
>>>>> the
>>>>> +     * vcpu and physical cpu see different lbr stack entries.
>>>>
>>>> I think it's not enough to only check number of entries. The LBR 
>>>> from/to MSRs may be different even the number of entries is the 
>>>> same, e.g SLM and KNL.
>>>
>>> Yes, we could add the comparison of the FROM msrs.
>>>
>>>>
>>>>> +     */
>>>>> +    switch (vcpu_model) {
>>>>
>>>> That's a duplicate of intel_pmu_init(). I think it's better to 
>>>> factor out the common part if you want to check LBR MSRs and 
>>>> entries. Then we don't need to add the same codes in two different 
>>>> places when enabling new platforms.
>>>>
>>>
>>>
>>> Yes, I thought about this, but intel_pmu_init() does a lot more 
>>> things in each "Case xx". Any thought about how to factor them out?
>>>
>>
>> I think we may only move the "switch (boot_cpu_data.x86_model) { ... 
>> }" to a new function, e.g. __intel_pmu_init(int model, struct x86_pmu 
>> *x86_pmu)
>>
>> In __intel_pmu_init, if the model != boot_cpu_data.x86_model, you only 
>> need to update x86_pmu.*. Just ignore global settings, e.g 
>> hw_cache_event_ids, mem_attr, extra_attr etc.
> 
> Thanks for sharing. I understand the point of maintaining those models 
> at one place,
> but this factor-out doesn't seem very elegant to me, like below
> 
> __intel_pmu_init (int model, struct x86_pmu *x86_pmu)
> {
> ...
> switch (model)
> case INTEL_FAM6_NEHALEM:
> case INTEL_FAM6_NEHALEM_EP:
> case INTEL_FAM6_NEHALEM_EX:
>      intel_pmu_lbr_init(x86_pmu);
>      if (model != boot_cpu_data.x86_model)
>          return;
> 
>      /* Other a lot of things init like below..*/
>      memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
>                     sizeof(hw_cache_event_ids));
>      memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
>                     sizeof(hw_cache_extra_regs));
>      x86_pmu.event_constraints = intel_nehalem_event_constraints;
>                  x86_pmu.pebs_constraints = 
> intel_nehalem_pebs_event_constraints;
>                  x86_pmu.enable_all = intel_pmu_nhm_enable_all;
>                  x86_pmu.extra_regs = intel_nehalem_extra_regs;
>   ...
> 
> Case...
> }
> We need insert "if (model != boot_cpu_data.x86_model)" in every "Case xx".
> 
> What would be the rationale that we only do lbr_init for "x86_pmu"
> when model != boot_cpu_data.x86_model?
> (It looks more like a workaround to factor-out the function and get what 
> we want)

I thought the new function could be extended to support a fake pmu as below.
It's not only for lbr. The PMU has many CPU specific features. It could be
used for other features if you want to check compatibility in the
future. But I don't have an example now.

__intel_pmu_init (int model, struct x86_pmu *x86_pmu)
{
bool fake_pmu = (model != boot_cpu_data.x86_model) ? true : false;
...
switch (model)
case INTEL_FAM6_NEHALEM:
case INTEL_FAM6_NEHALEM_EP:
case INTEL_FAM6_NEHALEM_EX:
      intel_pmu_lbr_init(x86_pmu);
      x86_pmu->event_constraints = intel_nehalem_event_constraints;
      x86_pmu->pebs_constraints = intel_nehalem_pebs_event_constraints;
      x86_pmu->enable_all = intel_pmu_nhm_enable_all;
      x86_pmu->extra_regs = intel_nehalem_extra_regs;

      if (fake_pmu)
          return;

      /* Global variables should not be updated for fake PMU */
      memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
                     sizeof(hw_cache_event_ids));
      memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
                     sizeof(hw_cache_extra_regs));


> 
> I would prefer having them separated as this patch for now - it is 
> logically more clear to me.
> 

But it will be a problem for maintenance. Perf developers will probably forget
to update the list in KVM. I think you would have to regularly check the perf
code.

Thanks,
Kan


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-07  9:15         ` Wei Wang
@ 2019-01-07 18:05           ` Jim Mattson
  2019-01-07 18:20             ` Andi Kleen
  0 siblings, 1 reply; 42+ messages in thread
From: Jim Mattson @ 2019-01-07 18:05 UTC (permalink / raw)
  To: Wei Wang
  Cc: LKML, kvm list, Paolo Bonzini, Andi Kleen, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Mon, Jan 7, 2019 at 1:09 AM Wei Wang <wei.w.wang@intel.com> wrote:
>
> On 01/03/2019 11:25 PM, Jim Mattson wrote:
> > On Wed, Jan 2, 2019 at 11:55 PM Wei Wang <wei.w.wang@intel.com> wrote:
> >
> >> Right, thanks. Probably better to change it to below:
> >>
> >> msr_info->data = 0;
> >> data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
> >> if (vcpu->kvm->arch.lbr_in_guest)
> >>       msr_info->data |= (data & X86_PERF_CAP_MASK_LBR_FMT);
> >>
> > This still breaks backwards compatibility. Returning 0 and raising #GP
> > are not the same.
>
> I'm not sure about raising GP# in this case.
>
> This PERF_CAP msr contains more things than the lbr format.
> For example, a guest with lbr=false option could read it to get PEBS_FMT,
> which is PERF_CAP[11:8]. We should offer those bits in this case.
>
> When lbr=false, the lbr feature is not usable by the guest,
> so I think whatever value (0 or other value) of the LBR_FMT bits that
> we give to the guest might not be important.

The issue is compatibility. Prior to your change, reading this MSR
from a VM would raise #GP. After your change, it won't. That means
that if you have a VM migrating between hosts with kernel versions
before and after this change, the results will be inconsistent. In the
forward migration path, RDMSR will first raise #GP, and later will
return 0. In the backward migration path, RDMSR will first return 0,
and later it will raise #GP.

To avoid these inconsistencies, the new MSR behavior should be
explicitly enabled by an opt-in ioctl from userspace (not necessarily
the same ioctl that enables LBR).
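(To make that concrete -- a rough sketch only, the capability name and the
per-VM flag below are invented for illustration:)

/* In kvm_vm_ioctl_enable_cap(): the opt-in from userspace */
case KVM_CAP_X86_PERF_CAPABILITIES:		/* invented name */
	kvm->arch.perf_cap_in_guest = !!cap->args[0];
	r = 0;
	break;

/* In vmx_get_msr(): behavior only changes once userspace has opted in */
case MSR_IA32_PERF_CAPABILITIES:
	if (!vcpu->kvm->arch.perf_cap_in_guest)
		return 1;	/* keep the old #GP behavior */
	/* ... new handling ... */
	break;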

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-07 18:05           ` Jim Mattson
@ 2019-01-07 18:20             ` Andi Kleen
  2019-01-07 18:48               ` Jim Mattson
  0 siblings, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-01-07 18:20 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Wei Wang, LKML, kvm list, Paolo Bonzini, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

> The issue is compatibility. Prior to your change, reading this MSR
> from a VM would raise #GP. After your change, it won't. That means
> that if you have a VM migrating between hosts with kernel versions
> before and after this change, the results will be inconsistent. In the

No it will not be. All Linux kernel uses of this MSR are guarded
by a CPUID check.

-Andi

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-07 18:20             ` Andi Kleen
@ 2019-01-07 18:48               ` Jim Mattson
  2019-01-07 20:14                 ` Andi Kleen
  2019-01-08  7:53                 ` Wei Wang
  0 siblings, 2 replies; 42+ messages in thread
From: Jim Mattson @ 2019-01-07 18:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Wei Wang, LKML, kvm list, Paolo Bonzini, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Mon, Jan 7, 2019 at 10:20 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > The issue is compatibility. Prior to your change, reading this MSR
> > from a VM would raise #GP. After your change, it won't. That means
> > that if you have a VM migrating between hosts with kernel versions
> > before and after this change, the results will be inconsistent. In the
>
> No it will not be. All Linux kernel uses of this MSR are guarded
> by a CPUID check.

Linux usage is irrelevant to the architected behavior of the virtual
CPU. According to volume 4 of the SDM, this MSR is only supported when
CPUID.01H:ECX.PDCM [bit 15] is set. Therefore, kvm should raise #GP
whenever a guest tries to read this MSR and the guest's
CPUID.01H:ECX.PDCM [bit 15] is clear.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-07 18:48               ` Jim Mattson
@ 2019-01-07 20:14                 ` Andi Kleen
  2019-01-07 21:00                   ` Jim Mattson
  2019-01-08  7:53                 ` Wei Wang
  1 sibling, 1 reply; 42+ messages in thread
From: Andi Kleen @ 2019-01-07 20:14 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Wei Wang, LKML, kvm list, Paolo Bonzini, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Mon, Jan 07, 2019 at 10:48:38AM -0800, Jim Mattson wrote:
> On Mon, Jan 7, 2019 at 10:20 AM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > > The issue is compatibility. Prior to your change, reading this MSR
> > > from a VM would raise #GP. After your change, it won't. That means
> > > that if you have a VM migrating between hosts with kernel versions
> > > before and after this change, the results will be inconsistent. In the
> >
> > No it will not be. All Linux kernel uses of this MSR are guarded
> > by a CPUID check.
> 
> Linux usage is irrelevant to the architected behavior of the virtual
> CPU. According to volume 4 of the SDM, this MSR is only supported when
> CPUID.01H:ECX.PDCM [bit 15] is set. Therefore, kvm should raise #GP
> whenever a guest tries to read this MSR and the guest's
> CPUID.01H:ECX.PDCM [bit 15] is clear.

That's not general practice in KVM. There are lots of MSRs
that don't fault even if their CPUID doesn't support them.

It's also not generally true for instructions, where it is even
impossible.

You could argue it should be done for MSRs, but that would
be a much larger patch for lots of MSRs. It seems pointless
to single out this particular one.

In practice I doubt it will make any difference for
real software either way.

-Andi


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-07 20:14                 ` Andi Kleen
@ 2019-01-07 21:00                   ` Jim Mattson
  0 siblings, 0 replies; 42+ messages in thread
From: Jim Mattson @ 2019-01-07 21:00 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Wei Wang, LKML, kvm list, Paolo Bonzini, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Mon, Jan 7, 2019 at 12:14 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Mon, Jan 07, 2019 at 10:48:38AM -0800, Jim Mattson wrote:
> > On Mon, Jan 7, 2019 at 10:20 AM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > > The issue is compatibility. Prior to your change, reading this MSR
> > > > from a VM would raise #GP. After your change, it won't. That means
> > > > that if you have a VM migrating between hosts with kernel versions
> > > > before and after this change, the results will be inconsistent. In the
> > >
> > > No it will not be. All Linux kernel uses of this MSR are guarded
> > > by a CPUID check.
> >
> > Linux usage is irrelevant to the architected behavior of the virtual
> > CPU. According to volume 4 of the SDM, this MSR is only supported when
> > CPUID.01H:ECX.PDCM [bit 15] is set. Therefore, kvm should raise #GP
> > whenever a guest tries to read this MSR and the guest's
> > CPUID.01H:ECX.PDCM [bit 15] is clear.
>
> That's not general practice in KVM. There are lots of MSRs
> that don't fault even if their CPUID doesn't support them.

KVM doesn't really have a "general practice" in this regard. True,
there are lots of MSRs that don't fault even if unsupported. But there
are quite a few that do fault if unsupported. On the Intel side,
MSR_IA32_SPEC_CTRL, MSR_IA32_ARCH_CAPABILITIES, MSR_IA32_BNDCFGS, and
MSR_TSC_AUX all check for the corresponding capabilities in guest
CPUID. In the "common" code, MSR_IA32_TSC_ADJUST,
MSR_AMD64_OSVW_ID_LENGTH, and MSR_AMD64_OSVW_STATUS all check for the
corresponding capabilities in guest CPUID.

> It's also not generally true for instructions, where it is even
> impossible.

Yes, sometimes it is impossible to #UD for instructions that are
supported on the host but not in the guest. However, sometimes it is
possible, and kvm sometimes does enforce this. Some examples that come
to mind are RDTSCP and INVPCID. In fact, kvm goes so far as to require
that the guest CPUID must enumerate PCID support in order to enumerate
INVPCID support, and INVPCID will raise #UD unless both capabilities
are enumerated!

> You could argue it should be done for MSRs, but that would
> be a much larger patch for lots of MSRs. It seems pointless
> to single out this particular one.

Past sloppiness isn't really a compelling argument for continued
sloppiness going forward. And there are existing MSRs that do the
appropriate checks, so I'm not singling this one out.

> In practice I doubt it will make any difference for
> real software either way.

Perhaps, but when the correct behavior is so easy to implement, why
not just do it right in the first place?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-07 14:22           ` Liang, Kan
@ 2019-01-08  6:13             ` Wei Wang
  2019-01-08 14:08               ` Liang, Kan
  0 siblings, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-08  6:13 UTC (permalink / raw)
  To: Liang, Kan, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei

On 01/07/2019 10:22 PM, Liang, Kan wrote:
>
>> Thanks for sharing. I understand the point of maintaining those 
>> models at one place,
>> but this factor-out doesn't seem very elegant to me, like below
>>
>> __intel_pmu_init (int model, struct x86_pmu *x86_pmu)
>> {
>> ...
>> switch (model)
>> case INTEL_FAM6_NEHALEM:
>> case INTEL_FAM6_NEHALEM_EP:
>> case INTEL_FAM6_NEHALEM_EX:
>>      intel_pmu_lbr_init(x86_pmu);
>>      if (model != boot_cpu_data.x86_model)
>>          return;
>>
>>      /* Other a lot of things init like below..*/
>>      memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
>>                     sizeof(hw_cache_event_ids));
>>      memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
>>                     sizeof(hw_cache_extra_regs));
>>      x86_pmu.event_constraints = intel_nehalem_event_constraints;
>>                  x86_pmu.pebs_constraints = 
>> intel_nehalem_pebs_event_constraints;
>>                  x86_pmu.enable_all = intel_pmu_nhm_enable_all;
>>                  x86_pmu.extra_regs = intel_nehalem_extra_regs;
>>   ...
>>
>> Case...
>> }
>> We need insert "if (model != boot_cpu_data.x86_model)" in every "Case 
>> xx".
>>
>> What would be the rationale that we only do lbr_init for "x86_pmu"
>> when model != boot_cpu_data.x86_model?
>> (It looks more like a workaround to factor-out the function and get 
>> what we want)
>
> I thought the new function may be extended to support fake pmu as below.
> It's not only for lbr. PMU has many CPU specific features. It can be 
> used for other features, if you want to check the compatibility in 
> future. But I don't have an example now.
>
> __intel_pmu_init (int model, struct x86_pmu *x86_pmu)
> {
> bool fake_pmu = (model != boot_cpu_data.x86_model) ? true : false;
> ...
> switch (model)
> case INTEL_FAM6_NEHALEM:
> case INTEL_FAM6_NEHALEM_EP:
> case INTEL_FAM6_NEHALEM_EX:
>      intel_pmu_lbr_init(x86_pmu);
>      x86_pmu->event_constraints = intel_nehalem_event_constraints;
>      x86_pmu->pebs_constraints = intel_nehalem_pebs_event_constraints;
>      x86_pmu->enable_all = intel_pmu_nhm_enable_all;
>      x86_pmu->extra_regs = intel_nehalem_extra_regs;
>
>      if (fake_pmu)
>          return;

It looks similar to the one I shared above; the difference is that more
things (e.g. constraints) are assigned to the fake x86_pmu.
I'm not sure about the logic behind it (it still looks like a workaround).



>
>      /* Global variables should not be updated for fake PMU */
>      memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
>                     sizeof(hw_cache_event_ids));
>      memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
>                     sizeof(hw_cache_extra_regs));
>
>
>>
>> I would prefer having them separated as this patch for now - it is 
>> logically more clear to me.
>>
>
> But it will be a problem for maintenance. Perf developer probably 
> forget to update the list in KVM. I think you have to regularly check 
> the perf code.
>

It's been very common in hypervisor development. That's why we have
hypervisor developers here.
When a new platform is added, we will definitely have some work like this
to do.

Best,
Wei


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-07 18:48               ` Jim Mattson
  2019-01-07 20:14                 ` Andi Kleen
@ 2019-01-08  7:53                 ` Wei Wang
  2019-01-08 17:19                   ` Jim Mattson
  1 sibling, 1 reply; 42+ messages in thread
From: Wei Wang @ 2019-01-08  7:53 UTC (permalink / raw)
  To: Jim Mattson, Andi Kleen
  Cc: LKML, kvm list, Paolo Bonzini, Peter Zijlstra, Kan Liang,
	Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On 01/08/2019 02:48 AM, Jim Mattson wrote:
> On Mon, Jan 7, 2019 at 10:20 AM Andi Kleen <ak@linux.intel.com> wrote:
>>> The issue is compatibility. Prior to your change, reading this MSR
>>> from a VM would raise #GP. After your change, it won't. That means
>>> that if you have a VM migrating between hosts with kernel versions
>>> before and after this change, the results will be inconsistent. In the
>> No it will not be. All Linux kernel uses of this MSR are guarded
>> by a CPUID check.
> Linux usage is irrelevant to the architected behavior of the virtual
> CPU. According to volume 4 of the SDM, this MSR is only supported when
> CPUID.01H:ECX.PDCM [bit 15] is set. Therefore, kvm should raise #GP
> whenever a guest tries to read this MSR and the guest's
> CPUID.01H:ECX.PDCM [bit 15] is clear.
>

Probably one more check would be better:

if (!boot_cpu_has(X86_FEATURE_PDCM) ||
    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
        return 1;

(the host isn't expected to read this MSR when PDCM is not supported
by the guest, so no "!msr_info->host_initiated" check is added above)
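(Putting the pieces together, the read path would then look roughly like
this -- still just a sketch of the direction, based on the snippets above:)

case MSR_IA32_PERF_CAPABILITIES: {
	u64 data;

	if (!boot_cpu_has(X86_FEATURE_PDCM) ||
	    !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
		return 1;

	data = native_read_msr(MSR_IA32_PERF_CAPABILITIES);
	msr_info->data = 0;
	/* only expose the lbr format bits, and only when guest lbr is enabled */
	if (vcpu->kvm->arch.lbr_in_guest)
		msr_info->data |= data & X86_PERF_CAP_MASK_LBR_FMT;
	break;
}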

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-08  6:13             ` Wei Wang
@ 2019-01-08 14:08               ` Liang, Kan
  2019-01-09  1:54                 ` Wei Wang
  0 siblings, 1 reply; 42+ messages in thread
From: Liang, Kan @ 2019-01-08 14:08 UTC (permalink / raw)
  To: Wei Wang, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei



On 1/8/2019 1:13 AM, Wei Wang wrote:
> On 01/07/2019 10:22 PM, Liang, Kan wrote:
>>
>>> Thanks for sharing. I understand the point of maintaining those 
>>> models at one place,
>>> but this factor-out doesn't seem very elegant to me, like below
>>>
>>> __intel_pmu_init (int model, struct x86_pmu *x86_pmu)
>>> {
>>> ...
>>> switch (model)
>>> case INTEL_FAM6_NEHALEM:
>>> case INTEL_FAM6_NEHALEM_EP:
>>> case INTEL_FAM6_NEHALEM_EX:
>>>      intel_pmu_lbr_init(x86_pmu);
>>>      if (model != boot_cpu_data.x86_model)
>>>          return;
>>>
>>>      /* Other a lot of things init like below..*/
>>>      memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
>>>                     sizeof(hw_cache_event_ids));
>>>      memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
>>>                     sizeof(hw_cache_extra_regs));
>>>      x86_pmu.event_constraints = intel_nehalem_event_constraints;
>>>      x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
>>>      x86_pmu.enable_all = intel_pmu_nhm_enable_all;
>>>      x86_pmu.extra_regs = intel_nehalem_extra_regs;
>>>   ...
>>>
>>> Case...
>>> }
>>> We need to insert "if (model != boot_cpu_data.x86_model)" in every
>>> "case xx".
>>>
>>> What would be the rationale for only doing lbr_init for "x86_pmu"
>>> when model != boot_cpu_data.x86_model?
>>> (It looks more like a workaround to factor out the function and get
>>> what we want.)
>>
>> I thought the new function could be extended to support a fake pmu, as
>> below. It's not only for lbr: the PMU has many CPU-specific features,
>> and this could be used for other features if you want to check
>> compatibility in the future. But I don't have an example now.
>>
>> __intel_pmu_init (int model, struct x86_pmu *x86_pmu)
>> {
>> bool fake_pmu = (model != boot_cpu_data.x86_model);
>> ...
>> switch (model)
>> case INTEL_FAM6_NEHALEM:
>> case INTEL_FAM6_NEHALEM_EP:
>> case INTEL_FAM6_NEHALEM_EX:
>>      intel_pmu_lbr_init(x86_pmu);
>>      x86_pmu->event_constraints = intel_nehalem_event_constraints;
>>      x86_pmu->pebs_constraints = intel_nehalem_pebs_event_constraints;
>>      x86_pmu->enable_all = intel_pmu_nhm_enable_all;
>>      x86_pmu->extra_regs = intel_nehalem_extra_regs;
>>
>>      if (fake_pmu)
>>          return;
> 
> It looks similar to the one I shared above; the difference is that more
> things (e.g. constraints) are assigned to x86_fake_pmu.
> I'm not sure about the logic behind it (it still looks like a workaround).

The fake x86_pmu will include all the features supported on the host. It
would be useful if you want to check other features in the future.
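
To make the idea concrete, here is a rough sketch of how a caller might use
such a fake x86_pmu to check guest LBR compatibility (the helper name, the
exact __intel_pmu_init() interface and the lbr_nr comparison are assumptions
for illustration, not the actual code):

        /* hypothetical helper; names are illustrative only */
        static bool guest_lbr_compatible(int guest_model)
        {
                struct x86_pmu fake_pmu = {};

                /* fill in only the per-model bits; no global state is touched */
                __intel_pmu_init(guest_model, &fake_pmu);

                /* e.g. require the same LBR stack depth as on the host */
                return fake_pmu.lbr_nr && fake_pmu.lbr_nr == x86_pmu.lbr_nr;
        }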

> 
> 
> 
>>
>>      /* Global variables should not be updated for fake PMU */
>>      memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
>>                     sizeof(hw_cache_event_ids));
>>      memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
>>                     sizeof(hw_cache_extra_regs));
>>
>>
>>>
>>> I would prefer keeping them separate, as in this patch, for now - it is
>>> logically clearer to me.
>>>
>>
>> But it will be a problem for maintenance. Perf developers will probably
>> forget to update the list in KVM. I think you will have to check the
>> perf code regularly.
>>
> 
> This kind of work has been very common in hypervisor development - that's
> why we have hypervisor developers here.
> When a new platform is added, we will definitely have some work like this
> to do.
>

If that's part of your job, I'm OK with it.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest
  2019-01-08  7:53                 ` Wei Wang
@ 2019-01-08 17:19                   ` Jim Mattson
  0 siblings, 0 replies; 42+ messages in thread
From: Jim Mattson @ 2019-01-08 17:19 UTC (permalink / raw)
  To: Wei Wang
  Cc: Andi Kleen, LKML, kvm list, Paolo Bonzini, Peter Zijlstra,
	Kan Liang, Ingo Molnar, Radim Krčmář,
	like.xu, Jann Horn, arei.gonglei

On Mon, Jan 7, 2019 at 11:48 PM Wei Wang <wei.w.wang@intel.com> wrote:
>
> On 01/08/2019 02:48 AM, Jim Mattson wrote:
> > On Mon, Jan 7, 2019 at 10:20 AM Andi Kleen <ak@linux.intel.com> wrote:
> >>> The issue is compatibility. Prior to your change, reading this MSR
> >>> from a VM would raise #GP. After your change, it won't. That means
> >>> that if you have a VM migrating between hosts with kernel versions
> >>> before and after this change, the results will be inconsistent. In the
> >> No it will not be. All Linux kernel uses of this MSR are guarded
> >> by a CPUID check.
> > Linux usage is irrelevant to the architected behavior of the virtual
> > CPU. According to volume 4 of the SDM, this MSR is only supported when
> > CPUID.01H:ECX.PDCM [bit 15] is set. Therefore, kvm should raise #GP
> > whenever a guest tries to read this MSR and the guest's
> > CPUID.01H:ECX.PDCM [bit 15] is clear.
> >
>
> Probably one more check would be better:
>
> if (!boot_cpu_has(X86_FEATURE_PDCM) ||
>      !guest_cpuid_has(vcpu, X86_FEATURE_PDCM))
>      return 1;

Thanks for that change!

> (the host isn't expected to read this MSR when PDCM is not supported
> by the guest, so the "!msr_info->host_initiated" check isn't added above)

In keeping with commit 44883f01fe6ae ("KVM: x86: ensure all MSRs can
always be KVM_GET/SET_MSR'd"), I think the host_initiated check should
be added.
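
Something along these lines, i.e. the condition from the earlier reply with
a host_initiated escape hatch (the switch arm and the pass-through of the
host value are only a sketch of the idea, not the actual patch):

        case MSR_IA32_PERF_CAPABILITIES:
                if (!msr_info->host_initiated &&
                    (!boot_cpu_has(X86_FEATURE_PDCM) ||
                     !guest_cpuid_has(vcpu, X86_FEATURE_PDCM)))
                        return 1;       /* #GP for the guest; host can still read */
                rdmsrl(MSR_IA32_PERF_CAPABILITIES, msr_info->data);
                break;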

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable
  2019-01-08 14:08               ` Liang, Kan
@ 2019-01-09  1:54                 ` Wei Wang
  0 siblings, 0 replies; 42+ messages in thread
From: Wei Wang @ 2019-01-09  1:54 UTC (permalink / raw)
  To: Liang, Kan, linux-kernel, kvm, pbonzini, ak, peterz
  Cc: kan.liang, mingo, rkrcmar, like.xu, jannh, arei.gonglei

On 01/08/2019 10:08 PM, Liang, Kan wrote:
>
>
> On 1/8/2019 1:13 AM, Wei Wang wrote:
>> On 01/07/2019 10:22 PM, Liang, Kan wrote:
>>>
>>>> Thanks for sharing. I understand the point of maintaining those
>>>> models in one place,
>>>> but this factoring-out doesn't seem very elegant to me - see below:
>>>>
>>>> __intel_pmu_init (int model, struct x86_pmu *x86_pmu)
>>>> {
>>>> ...
>>>> switch (model)
>>>> case INTEL_FAM6_NEHALEM:
>>>> case INTEL_FAM6_NEHALEM_EP:
>>>> case INTEL_FAM6_NEHALEM_EX:
>>>>      intel_pmu_lbr_init(x86_pmu);
>>>>      if (model != boot_cpu_data.x86_model)
>>>>          return;
>>>>
>>>>      /* A lot of other init like below.. */
>>>>      memcpy(hw_cache_event_ids, nehalem_hw_cache_event_ids,
>>>>                     sizeof(hw_cache_event_ids));
>>>>      memcpy(hw_cache_extra_regs, nehalem_hw_cache_extra_regs,
>>>>                     sizeof(hw_cache_extra_regs));
>>>>      x86_pmu.event_constraints = intel_nehalem_event_constraints;
>>>>      x86_pmu.pebs_constraints = intel_nehalem_pebs_event_constraints;
>>>>      x86_pmu.enable_all = intel_pmu_nhm_enable_all;
>>>>      x86_pmu.extra_regs = intel_nehalem_extra_regs;
>>>>   ...
>>>>
>>>> Case...
>>>> }
>>>> We need to insert "if (model != boot_cpu_data.x86_model)" in every
>>>> "case xx".
>>>>
>>>> What would be the rationale for only doing lbr_init for "x86_pmu"
>>>> when model != boot_cpu_data.x86_model?
>>>> (It looks more like a workaround to factor out the function and get
>>>> what we want.)
>>>
>>> I thought the new function could be extended to support a fake pmu, as
>>> below. It's not only for lbr: the PMU has many CPU-specific features,
>>> and this could be used for other features if you want to check
>>> compatibility in the future. But I don't have an example now.
>>>
>>> __intel_pmu_init (int model, struct x86_pmu *x86_pmu)
>>> {
>>> bool fake_pmu = (model != boot_cpu_data.x86_model);
>>> ...
>>> switch (model)
>>> case INTEL_FAM6_NEHALEM:
>>> case INTEL_FAM6_NEHALEM_EP:
>>> case INTEL_FAM6_NEHALEM_EX:
>>>      intel_pmu_lbr_init(x86_pmu);
>>>      x86_pmu->event_constraints = intel_nehalem_event_constraints;
>>>      x86_pmu->pebs_constraints = intel_nehalem_pebs_event_constraints;
>>>      x86_pmu->enable_all = intel_pmu_nhm_enable_all;
>>>      x86_pmu->extra_regs = intel_nehalem_extra_regs;
>>>
>>>      if (fake_pmu)
>>>          return;
>>
>> It looks similar to the one I shared above; the difference is that more
>> things (e.g. constraints) are assigned to x86_fake_pmu.
>> I'm not sure about the logic behind it (it still looks like a workaround).
>
> The fake x86_pmu will include all the features supported on the host. It
> would be useful if you want to check other features in the future.
>

OK, I'll think more about whether we can find a cleaner way to factor this out.

Best,
Wei

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2019-01-09  1:49 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-26  9:25 [PATCH v4 00/10] Guest LBR Enabling Wei Wang
2018-12-26  9:25 ` [PATCH v4 01/10] perf/x86: fix the variable type of the LBR MSRs Wei Wang
2018-12-26  9:25 ` [PATCH v4 02/10] perf/x86: add a function to get the lbr stack Wei Wang
2018-12-26  9:25 ` [PATCH v4 03/10] KVM/x86: KVM_CAP_X86_GUEST_LBR Wei Wang
2018-12-26  9:25 ` [PATCH v4 04/10] KVM/x86: intel_pmu_lbr_enable Wei Wang
2019-01-02 16:33   ` Liang, Kan
2019-01-04  9:58     ` Wei Wang
2019-01-04 15:57       ` Liang, Kan
2019-01-05 10:09         ` Wei Wang
2019-01-07 14:22           ` Liang, Kan
2019-01-08  6:13             ` Wei Wang
2019-01-08 14:08               ` Liang, Kan
2019-01-09  1:54                 ` Wei Wang
2019-01-02 23:26   ` Jim Mattson
2019-01-03  7:22     ` Wei Wang
2019-01-03 15:34       ` Jim Mattson
2019-01-03 17:18         ` Andi Kleen
2019-01-04 10:09         ` Wei Wang
2019-01-04 15:53           ` Jim Mattson
2019-01-05 10:15             ` Wang, Wei W
2018-12-26  9:25 ` [PATCH v4 05/10] KVM/x86: expose MSR_IA32_PERF_CAPABILITIES to the guest Wei Wang
2019-01-02 23:40   ` Jim Mattson
2019-01-03  8:00     ` Wei Wang
2019-01-03 15:25       ` Jim Mattson
2019-01-07  9:15         ` Wei Wang
2019-01-07 18:05           ` Jim Mattson
2019-01-07 18:20             ` Andi Kleen
2019-01-07 18:48               ` Jim Mattson
2019-01-07 20:14                 ` Andi Kleen
2019-01-07 21:00                   ` Jim Mattson
2019-01-08  7:53                 ` Wei Wang
2019-01-08 17:19                   ` Jim Mattson
2018-12-26  9:25 ` [PATCH v4 06/10] perf/x86: no counter allocation support Wei Wang
2018-12-26  9:25 ` [PATCH v4 07/10] KVM/x86/vPMU: Add APIs to support host save/restore the guest lbr stack Wei Wang
2018-12-26  9:25 ` [PATCH v4 08/10] perf/x86: save/restore LBR_SELECT on vCPU switching Wei Wang
2018-12-26  9:25 ` [PATCH v4 09/10] perf/x86: function to check lbr user callstack mode Wei Wang
2018-12-26  9:25 ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack Wei Wang
2018-12-27 20:51   ` Andi Kleen
2018-12-28  3:47     ` Wei Wang
2018-12-28 19:10       ` Andi Kleen
2018-12-27 20:52   ` [PATCH v4 10/10] KVM/x86/lbr: lazy save the guest lbr stack II Andi Kleen
2018-12-29  4:25     ` Wang, Wei W
