* [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling
@ 2021-03-03 13:57 Like Xu
  2021-03-03 13:57 ` [PATCH v3 1/9] perf/x86/intel: Fix a comment about guest LBR support Like Xu
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

Hi geniuses,

Please help review this new version of the Arch LBR enabling patch set.

The Architectural Last Branch Records (Arch LBR) feature is published
in the 319433-040 release of the Intel Architecture Instruction
Set Extensions and Future Features Programming Reference [0].

The main advantages of Arch LBR for users are [1]:
- Faster context switching due to XSAVES support and faster reset of
  LBR MSRs via the new DEPTH MSR.
- Faster LBR reads for a non-PEBS event due to XSAVES support, which
  lowers the overhead of the NMI handler.
- The Linux kernel can support the LBR features without knowing the
  model number of the current CPU.

It's based on the kvm/queue tree plus two commits from the kvm/intel tree:
- 'fea4ab260645 ("KVM: x86: Refresh CPUID on writes to MSR_IA32_XSS")'
- '0ccd14126cb2 ("KVM: x86: Report XSS as an MSR to be saved if there are supported features")'

Please check each commit for more details, and feel free to comment.

[0] https://software.intel.com/content/www/us/en/develop/download/
intel-architecture-instruction-set-extensions-and-future-features-programming-reference.html
[1] https://lore.kernel.org/lkml/1593780569-62993-1-git-send-email-kan.liang@linux.intel.com/

---
v2->v3 Changelog:
- Add host patches (0001-0004) to support guest Arch LBR;
- Fix arch_lbr_depth_is_valid() check condition; [Sean]
- Fix usage of KVM_ARCH_LBR_CTL_MASK;
- Fix intel_pmu_legacy_freezing_lbrs_on_pmi();
- Reset GUEST_IA32_LBR_CTL in the vmx_vcpu_reset();
- Refine intel_pmu_lbr_is_compatible();
- Simplify lbr_enable check and its usage;
- Add Arch LBR msrs to is_valid_passthrough_msr();
- Make XSAVE support for Arch LBR a separate patch;

Previous:
https://lore.kernel.org/kvm/20210203135714.318356-1-like.xu@linux.intel.com/

Like Xu (9):
  perf/x86/intel: Fix a comment about guest LBR support
  perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
  perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR
  perf/x86/lbr: Use GFP_ATOMIC for cpuc->lbr_xsave memory allocation
  KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
  KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL emulation for Arch LBR
  KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field
  KVM: x86: Expose Architectural LBR CPUID leaf
  KVM: x86: Add XSAVE Support for Architectural LBRs

 arch/x86/events/intel/core.c    |  5 +-
 arch/x86/events/intel/lbr.c     |  6 +-
 arch/x86/include/asm/vmx.h      |  4 ++
 arch/x86/kvm/cpuid.c            | 25 ++++++++-
 arch/x86/kvm/vmx/capabilities.h | 25 ++++++---
 arch/x86/kvm/vmx/pmu_intel.c    | 99 +++++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/vmx.c          | 22 +++++++-
 arch/x86/kvm/vmx/vmx.h          |  3 +
 arch/x86/kvm/x86.c              |  2 +
 9 files changed, 164 insertions(+), 27 deletions(-)

-- 
2.29.2



* [PATCH v3 1/9] perf/x86/intel: Fix a comment about guest LBR support
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 16:49   ` Sean Christopherson
  2021-03-03 13:57 ` [PATCH v3 2/9] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers Like Xu
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

Starting with v5.12, KVM reports guest LBR and extra_regs
support when the host has the relevant support.

Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d4569bfa83e3..a32acc7733a7 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5565,7 +5565,7 @@ __init int intel_pmu_init(void)
 
 	/*
 	 * Access LBR MSR may cause #GP under certain circumstances.
-	 * E.g. KVM doesn't support LBR MSR
+	 * E.g. KVM doesn't support LBR MSR before v5.12.
 	 * Check all LBT MSR here.
 	 * Disable LBR access if any LBR MSRs can not be accessed.
 	 */
-- 
2.29.2



* [PATCH v3 2/9] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
  2021-03-03 13:57 ` [PATCH v3 1/9] perf/x86/intel: Fix a comment about guest LBR support Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 13:57 ` [PATCH v3 3/9] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR Like Xu
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

If the platform supports the LBR_INFO register, x86_pmu.lbr_info will
be assigned in intel_pmu_?_lbr_init_?(), so it is safe to expose LBR_INFO
in x86_perf_get_lbr() directly instead of relying on the lbr_format check.

Also, Architectural LBR uses IA32_LBR_x_INFO instead of LBR_FORMAT_INFO_x
to hold metadata for the operation, including mispredict, TSX, and
elapsed cycle time information.
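
For reference, a minimal sketch of how a consumer such as KVM can query
the host LBR layout through this interface (error handling elided):

	struct x86_pmu_lbr lbr;

	if (!x86_perf_get_lbr(&lbr))
		pr_info("nr=%u from=0x%x to=0x%x info=0x%x\n",
			lbr.nr, lbr.from, lbr.to, lbr.info);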

Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/lbr.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 21890dacfcfe..355ea70f1879 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -1832,12 +1832,10 @@ void __init intel_pmu_arch_lbr_init(void)
  */
 int x86_perf_get_lbr(struct x86_pmu_lbr *lbr)
 {
-	int lbr_fmt = x86_pmu.intel_cap.lbr_format;
-
 	lbr->nr = x86_pmu.lbr_nr;
 	lbr->from = x86_pmu.lbr_from;
 	lbr->to = x86_pmu.lbr_to;
-	lbr->info = (lbr_fmt == LBR_FORMAT_INFO) ? x86_pmu.lbr_info : 0;
+	lbr->info = x86_pmu.lbr_info;
 
 	return 0;
 }
-- 
2.29.2



* [PATCH v3 3/9] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
  2021-03-03 13:57 ` [PATCH v3 1/9] perf/x86/intel: Fix a comment about guest LBR support Like Xu
  2021-03-03 13:57 ` [PATCH v3 2/9] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 13:57 ` [PATCH v3 4/9] perf/x86/lbr: Use GFP_ATOMIC for cpuc->lbr_xsave memory allocation Like Xu
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

Architectural LBR does not have MSR_LBR_TOS (0x000001c9). KVM will
generate a #GP for accesses to this MSR, thereby preventing the
initialization of the guest LBR.
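
For illustration, this is roughly how a guest kernel would observe the
missing MSR (a sketch; rdmsrl_safe() returns non-zero when the RDMSR
takes a #GP):

	u64 val;

	if (rdmsrl_safe(MSR_LBR_TOS, &val))
		pr_info("MSR_LBR_TOS (0x1c9) is not accessible\n");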

Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 47125db27e47 ("perf/x86/intel/lbr: Support Architectural LBR")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a32acc7733a7..3cf065185fb0 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5569,7 +5569,8 @@ __init int intel_pmu_init(void)
 	 * Check all LBT MSR here.
 	 * Disable LBR access if any LBR MSRs can not be accessed.
 	 */
-	if (x86_pmu.lbr_nr && !check_msr(x86_pmu.lbr_tos, 0x3UL))
+	if (x86_pmu.lbr_nr && !boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
+	    !check_msr(x86_pmu.lbr_tos, 0x3UL))
 		x86_pmu.lbr_nr = 0;
 	for (i = 0; i < x86_pmu.lbr_nr; i++) {
 		if (!(check_msr(x86_pmu.lbr_from + i, 0xffffUL) &&
-- 
2.29.2



* [PATCH v3 4/9] perf/x86/lbr: Use GFP_ATOMIC for cpuc->lbr_xsave memory allocation
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
                   ` (2 preceding siblings ...)
  2021-03-03 13:57 ` [PATCH v3 3/9] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 13:57 ` [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR Like Xu
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

When allocating the cpuc->lbr_xsave memory in the guest Arch LBR driver,
the GFP_KERNEL allocation may be performed in atomic context, which
triggers a splat like the one below:

[   54.283563] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:196
[   54.285218] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 830, name: perf
[   54.286684] INFO: lockdep is turned off.
[   54.287448] irq event stamp: 8644
[   54.288098] hardirqs last  enabled at (8643): [<ffffffff810e2212>] __local_bh_enable_ip+0x82/0xd0
[   54.289806] hardirqs last disabled at (8644): [<ffffffff812a8777>] perf_event_exec+0x1c7/0x3c0
[   54.291418] softirqs last  enabled at (8642): [<ffffffff81033f22>] fpu__clear+0x92/0x190
[   54.292921] softirqs last disabled at (8638): [<ffffffff81033e95>] fpu__clear+0x5/0x190
[   54.294418] CPU: 3 PID: 830 Comm: perf Not tainted 5.11.0-guest+ #1145
[   54.295635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[   54.297136] Call Trace:
[   54.297603]  dump_stack+0x8b/0xb0
[   54.298246]  ___might_sleep.cold+0xb6/0xc6
[   54.299022]  ? intel_pmu_lbr_add+0x147/0x160
[   54.299823]  kmem_cache_alloc+0x26d/0x2f0
[   54.300587]  intel_pmu_lbr_add+0x147/0x160
[   54.301358]  x86_pmu_add+0x85/0xe0
[   54.302009]  ? check_irq_usage+0x147/0x460
[   54.302793]  ? __bfs+0x210/0x210
[   54.303420]  ? stack_trace_save+0x3b/0x50
[   54.304190]  ? check_noncircular+0x66/0xf0
[   54.304978]  ? save_trace+0x3f/0x2f0
[   54.305670]  event_sched_in+0xf5/0x2a0
[   54.306401]  merge_sched_in+0x1a0/0x3b0
[   54.307141]  visit_groups_merge.constprop.0.isra.0+0x16e/0x490
[   54.308255]  ctx_sched_in+0xcc/0x200
[   54.308948]  ctx_resched+0x84/0xe0
[   54.309606]  perf_event_exec+0x2c0/0x3c0
[   54.310370]  begin_new_exec+0x627/0xbc0
[   54.311096]  load_elf_binary+0x734/0x17a0
[   54.311853]  ? lock_acquire+0xbc/0x360
[   54.312562]  ? bprm_execve+0x346/0x860
[   54.313272]  ? kvm_sched_clock_read+0x14/0x30
[   54.314095]  ? sched_clock+0x5/0x10
[   54.314760]  ? sched_clock_cpu+0xc/0xb0
[   54.315492]  bprm_execve+0x337/0x860
[   54.316176]  do_execveat_common+0x164/0x1d0
[   54.316971]  __x64_sys_execve+0x39/0x50
[   54.317698]  do_syscall_64+0x33/0x40
[   54.318390]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fix it by allocating this memory with the GFP_ATOMIC mask.

Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: c085fb8774 ("perf/x86/intel/lbr: Support XSAVES for arch LBR read")
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/events/intel/lbr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index 355ea70f1879..495466b12480 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -700,7 +700,7 @@ void intel_pmu_lbr_add(struct perf_event *event)
 	if (static_cpu_has(X86_FEATURE_ARCH_LBR) &&
 	    kmem_cache && !cpuc->lbr_xsave &&
 	    (cpuc->lbr_users != cpuc->lbr_pebs_users))
-		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_KERNEL);
+		cpuc->lbr_xsave = kmem_cache_alloc(kmem_cache, GFP_ATOMIC);
 }
 
 void release_lbr_buffers(void)
-- 
2.29.2



* [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
                   ` (3 preceding siblings ...)
  2021-03-03 13:57 ` [PATCH v3 4/9] perf/x86/lbr: Use GFP_ATOMIC for cpuc->lbr_xsave memory allocation Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 16:58   ` Sean Christopherson
  2021-03-03 13:57 ` [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL " Like Xu
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

The number of Arch LBR entries available for recording operations
is dictated by the value in MSR_ARCH_LBR_DEPTH.DEPTH. The supported
LBR depth values can be found in CPUID.(EAX=01CH, ECX=0):EAX[7:0]:
for each bit "n" set in this field, the MSR_ARCH_LBR_DEPTH.DEPTH
value "8*(n+1)" is supported.
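
For example, an assumed mask value of 0xc0 in this field would advertise
depths 56 and 64, as this short sketch enumerates:

	u8 mask = 0xc0;	/* assumed CPUID.(EAX=1CH, ECX=0):EAX[7:0] value */
	int n;

	for (n = 0; n < 8; n++)
		if (mask & (1u << n))
			pr_info("LBR depth %d is supported\n", 8 * (n + 1));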

On a guest write to MSR_ARCH_LBR_DEPTH, all LBR entries are reset to 0.
KVM emulates the reset behavior by introducing lbr_desc->arch_lbr_reset.
KVM writes the guest-requested value to the native ARCH_LBR_DEPTH MSR
(this is safe because the two values will be the same) when the Arch LBR
records MSRs are passed through to the guest.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 43 ++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.h       |  3 +++
 2 files changed, 46 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 9efc1a6b8693..25d620685ae7 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -220,6 +220,9 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		ret = pmu->version > 1;
 		break;
+	case MSR_ARCH_LBR_DEPTH:
+		ret = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
+		break;
 	default:
 		ret = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0) ||
 			get_gp_pmc(pmu, msr, MSR_P6_EVNTSEL0) ||
@@ -250,6 +253,7 @@ static inline void intel_pmu_release_guest_lbr_event(struct kvm_vcpu *vcpu)
 	if (lbr_desc->event) {
 		perf_event_release_kernel(lbr_desc->event);
 		lbr_desc->event = NULL;
+		lbr_desc->arch_lbr_reset = false;
 		vcpu_to_pmu(vcpu)->event_count--;
 	}
 }
@@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
 	return true;
 }
 
+/*
+ * Check if the requested depth value is supported
+ * based on the bits [0:7] of the guest cpuid.1c.eax.
+ */
+static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
+{
+	struct kvm_cpuid_entry2 *best;
+
+	best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
+	if (best && depth && !(depth % 8))
+		return (best->eax & 0xff) & (1ULL << (depth / 8 - 1));
+
+	return false;
+}
+
 static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct kvm_pmc *pmc;
+	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
 	u32 msr = msr_info->index;
 
 	switch (msr) {
@@ -367,6 +387,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
 		msr_info->data = pmu->global_ovf_ctrl;
 		return 0;
+	case MSR_ARCH_LBR_DEPTH:
+		msr_info->data = lbr_desc->records.nr;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -393,6 +416,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct kvm_pmc *pmc;
+	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
 
@@ -427,6 +451,12 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 0;
 		}
 		break;
+	case MSR_ARCH_LBR_DEPTH:
+		if (!arch_lbr_depth_is_valid(vcpu, data))
+			return 1;
+		lbr_desc->records.nr = data;
+		lbr_desc->arch_lbr_reset = true;
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -566,6 +596,7 @@ static void intel_pmu_init(struct kvm_vcpu *vcpu)
 	lbr_desc->records.nr = 0;
 	lbr_desc->event = NULL;
 	lbr_desc->msr_passthrough = false;
+	lbr_desc->arch_lbr_reset = false;
 }
 
 static void intel_pmu_reset(struct kvm_vcpu *vcpu)
@@ -623,6 +654,15 @@ static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
 		intel_pmu_legacy_freezing_lbrs_on_pmi(vcpu);
 }
 
+static void intel_pmu_arch_lbr_reset(struct kvm_vcpu *vcpu)
+{
+	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+
+	/* On a software write to IA32_LBR_DEPTH, all LBR entries are reset to 0. */
+	wrmsrl(MSR_ARCH_LBR_DEPTH, lbr_desc->records.nr);
+	lbr_desc->arch_lbr_reset = false;
+}
+
 static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
 {
 	struct x86_pmu_lbr *lbr = vcpu_to_lbr_records(vcpu);
@@ -654,6 +694,9 @@ static inline void vmx_enable_lbr_msrs_passthrough(struct kvm_vcpu *vcpu)
 {
 	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
 
+	if (unlikely(lbr_desc->arch_lbr_reset))
+		intel_pmu_arch_lbr_reset(vcpu);
+
 	if (lbr_desc->msr_passthrough)
 		return;
 
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 89da5e1251f1..a32c0c95983a 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -116,6 +116,9 @@ struct lbr_desc {
 
 	/* True if LBRs are marked as not intercepted in the MSR bitmap */
 	bool msr_passthrough;
+
+	/* Reset all LBR entries on a guest write to MSR_ARCH_LBR_DEPTH */
+	bool arch_lbr_reset;
 };
 
 /*
-- 
2.29.2



* [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL emulation for Arch LBR
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
                   ` (4 preceding siblings ...)
  2021-03-03 13:57 ` [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 17:19   ` Sean Christopherson
  2021-03-03 13:57 ` [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field Like Xu
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

Arch LBR is enabled by setting MSR_ARCH_LBR_CTL.LBREn to 1. A new guest
state field named "Guest IA32_LBR_CTL" is added to enhance guest LBR usage.
When guest Arch LBR is enabled, a guest LBR event will be created, just as
is done for the model-specific LBR.
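
From the guest's point of view, enabling Arch LBR then reduces to a
single MSR write (a sketch; the remaining control/filter bits are left
at zero):

	/* LBREn is bit 0 of MSR_ARCH_LBR_CTL. */
	wrmsrl(MSR_ARCH_LBR_CTL, ARCH_LBR_CTL_LBREN);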

On processors that support Arch LBR, MSR_IA32_DEBUGCTLMSR[bit 0] has no
meaning. It can be written to 0 or 1, but reads will always return 0. In
vmx_vcpu_reset(), GUEST_IA32_LBR_CTL is cleared to 0.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/include/asm/vmx.h   |  2 ++
 arch/x86/kvm/vmx/pmu_intel.c | 27 ++++++++++++++++++++++-----
 arch/x86/kvm/vmx/vmx.c       |  9 +++++++++
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 358707f60d99..8ec7bc24b37a 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -245,6 +245,8 @@ enum vmcs_field {
 	GUEST_BNDCFGS_HIGH              = 0x00002813,
 	GUEST_IA32_RTIT_CTL		= 0x00002814,
 	GUEST_IA32_RTIT_CTL_HIGH	= 0x00002815,
+	GUEST_IA32_LBR_CTL		= 0x00002816,
+	GUEST_IA32_LBR_CTL_HIGH	= 0x00002817,
 	HOST_IA32_PAT			= 0x00002c00,
 	HOST_IA32_PAT_HIGH		= 0x00002c01,
 	HOST_IA32_EFER			= 0x00002c02,
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 25d620685ae7..d14a14eb712d 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -19,6 +19,7 @@
 #include "pmu.h"
 
 #define MSR_PMC_FULL_WIDTH_BIT      (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
+#define KVM_ARCH_LBR_CTL_MASK			0x7f000f
 
 static struct kvm_event_hw_type_mapping intel_arch_events[] = {
 	/* Index must match CPUID 0x0A.EBX bit vector */
@@ -221,6 +222,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 		ret = pmu->version > 1;
 		break;
 	case MSR_ARCH_LBR_DEPTH:
+	case MSR_ARCH_LBR_CTL:
 		ret = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
 		break;
 	default:
@@ -390,6 +392,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_ARCH_LBR_DEPTH:
 		msr_info->data = lbr_desc->records.nr;
 		return 0;
+	case MSR_ARCH_LBR_CTL:
+		msr_info->data = vmcs_read64(GUEST_IA32_LBR_CTL);
+		return 0;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -457,6 +462,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		lbr_desc->records.nr = data;
 		lbr_desc->arch_lbr_reset = true;
 		return 0;
+	case MSR_ARCH_LBR_CTL:
+		if (!(data & ~KVM_ARCH_LBR_CTL_MASK)) {
+			vmcs_write64(GUEST_IA32_LBR_CTL, data);
+			if (intel_pmu_lbr_is_enabled(vcpu) && !lbr_desc->event &&
+				(data & ARCH_LBR_CTL_LBREN))
+				intel_pmu_create_guest_lbr_event(vcpu);
+			return 0;
+		}
+		break;
 	default:
 		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
@@ -635,12 +649,15 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
  */
 static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
 {
-	u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
+	u32 lbr_ctl_field = GUEST_IA32_DEBUGCTL;
 
-	if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
-		data &= ~DEBUGCTLMSR_LBR;
-		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
-	}
+	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI))
+		return;
+
+	if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+		lbr_ctl_field = GUEST_IA32_LBR_CTL;
+
+	vmcs_write64(lbr_ctl_field, vmcs_read64(lbr_ctl_field) & ~BIT(0));
 }
 
 static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6d7e760fdfa0..a0660b9934c6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2036,6 +2036,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 						VM_EXIT_SAVE_DEBUG_CONTROLS)
 			get_vmcs12(vcpu)->guest_ia32_debugctl = data;
 
+		/*
+		 * For Arch LBR, IA32_DEBUGCTL[bit 0] has no meaning.
+		 * It can be written to 0 or 1, but reads will always return 0.
+		 */
+		if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+			data &= ~DEBUGCTLMSR_LBR;
+
 		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
 		if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
 		    (data & DEBUGCTLMSR_LBR))
@@ -4463,6 +4470,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 		vmcs_writel(GUEST_SYSENTER_ESP, 0);
 		vmcs_writel(GUEST_SYSENTER_EIP, 0);
 		vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
+		if (cpu_has_vmx_arch_lbr())
+			vmcs_write64(GUEST_IA32_LBR_CTL, 0);
 	}
 
 	kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
-- 
2.29.2



* [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
                   ` (5 preceding siblings ...)
  2021-03-03 13:57 ` [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL " Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 17:26   ` Sean Christopherson
  2021-03-03 13:57 ` [PATCH v3 8/9] KVM: x86: Expose Architectural LBR CPUID leaf Like Xu
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

New VMX control bits for Arch LBR are added. When bit 21 in vmentry_ctrl
is set, VM entry will write the value from the "Guest IA32_LBR_CTL" guest
state field to IA32_LBR_CTL. When bit 26 in vmexit_ctrl is set, VM exit
will clear IA32_LBR_CTL after the value has been saved to the "Guest
IA32_LBR_CTL" guest state field.

To enable guest Arch LBR, KVM should set both the "Load Guest IA32_LBR_CTL"
entry control and the "Clear IA32_LBR_CTL" exit control bits. If these two
conditions cannot be met, KVM will clear the LBR_FMT bits and will not
expose the Arch LBR feature.

If Arch LBR is exposed by KVM, userspace should set both the ARCH_LBR
CPUID bit and the same LBR_FMT value as the host via
MSR_IA32_PERF_CAPABILITIES to enable guest Arch LBR.
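
A guest-side sketch of detecting Arch LBR via the LBR format field in
IA32_PERF_CAPABILITIES (the value 0x3f indicates Arch LBR):

	u64 perf_cap;

	rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
	if ((perf_cap & 0x3f) == 0x3f)
		pr_info("Arch LBR is available\n");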

KVM will bypass the host/guest x86 CPU model check, and the records MSRs
can still be passed through to the guest as usual, working like a
model-specific LBR. KVM is consistent with the host and does not support
the LER entry.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/include/asm/vmx.h      |  2 ++
 arch/x86/kvm/vmx/capabilities.h | 25 +++++++++++++++++--------
 arch/x86/kvm/vmx/pmu_intel.c    | 27 ++++++++++++++++++++++-----
 arch/x86/kvm/vmx/vmx.c          |  9 +++++++--
 4 files changed, 48 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 8ec7bc24b37a..c8186ec46fca 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -95,6 +95,7 @@
 #define VM_EXIT_CLEAR_BNDCFGS                   0x00800000
 #define VM_EXIT_PT_CONCEAL_PIP			0x01000000
 #define VM_EXIT_CLEAR_IA32_RTIT_CTL		0x02000000
+#define VM_EXIT_CLEAR_IA32_LBR_CTL		0x04000000
 
 #define VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR	0x00036dff
 
@@ -108,6 +109,7 @@
 #define VM_ENTRY_LOAD_BNDCFGS                   0x00010000
 #define VM_ENTRY_PT_CONCEAL_PIP			0x00020000
 #define VM_ENTRY_LOAD_IA32_RTIT_CTL		0x00040000
+#define VM_ENTRY_LOAD_IA32_LBR_CTL			0x00200000
 
 #define VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR	0x000011ff
 
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index d1d77985e889..73fceb534c7c 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -378,20 +378,29 @@ static inline bool vmx_pt_mode_is_host_guest(void)
 	return pt_mode == PT_MODE_HOST_GUEST;
 }
 
-static inline u64 vmx_get_perf_capabilities(void)
+static inline bool cpu_has_vmx_arch_lbr(void)
 {
-	u64 perf_cap = 0;
-
-	if (boot_cpu_has(X86_FEATURE_PDCM))
-		rdmsrl(MSR_IA32_PERF_CAPABILITIES, perf_cap);
-
-	perf_cap &= PMU_CAP_LBR_FMT;
+	return (vmcs_config.vmexit_ctrl & VM_EXIT_CLEAR_IA32_LBR_CTL) &&
+		(vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_LBR_CTL);
+}
 
+static inline u64 vmx_get_perf_capabilities(void)
+{
 	/*
 	 * Since counters are virtualized, KVM would support full
 	 * width counting unconditionally, even if the host lacks it.
 	 */
-	return PMU_CAP_FW_WRITES | perf_cap;
+	u64 perf_cap = PMU_CAP_FW_WRITES;
+	u64 host_perf_cap = 0;
+
+	if (boot_cpu_has(X86_FEATURE_PDCM))
+		rdmsrl(MSR_IA32_PERF_CAPABILITIES, host_perf_cap);
+
+	perf_cap |= host_perf_cap & PMU_CAP_LBR_FMT;
+	if (boot_cpu_has(X86_FEATURE_ARCH_LBR) && !cpu_has_vmx_arch_lbr())
+		perf_cap &= ~PMU_CAP_LBR_FMT;
+
+	return perf_cap;
 }
 
 static inline u64 vmx_supported_debugctl(void)
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index d14a14eb712d..48a817be60ab 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -176,12 +176,16 @@ static inline struct kvm_pmc *get_fw_gp_pmc(struct kvm_pmu *pmu, u32 msr)
 
 bool intel_pmu_lbr_is_compatible(struct kvm_vcpu *vcpu)
 {
+	if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
+		return guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
+
 	/*
 	 * As a first step, a guest could only enable LBR feature if its
 	 * cpu model is the same as the host because the LBR registers
 	 * would be pass-through to the guest and they're model specific.
 	 */
-	return boot_cpu_data.x86_model == guest_cpuid_model(vcpu);
+	return !boot_cpu_has(X86_FEATURE_ARCH_LBR) &&
+		boot_cpu_data.x86_model == guest_cpuid_model(vcpu);
 }
 
 bool intel_pmu_lbr_is_enabled(struct kvm_vcpu *vcpu)
@@ -199,8 +203,11 @@ static bool intel_pmu_is_valid_lbr_msr(struct kvm_vcpu *vcpu, u32 index)
 	if (!intel_pmu_lbr_is_enabled(vcpu))
 		return ret;
 
-	ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS) ||
-		(index >= records->from && index < records->from + records->nr) ||
+	if (!guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+		ret = (index == MSR_LBR_SELECT) || (index == MSR_LBR_TOS);
+
+	if (!ret)
+		ret = (index >= records->from && index < records->from + records->nr) ||
 		(index >= records->to && index < records->to + records->nr);
 
 	if (!ret && records->info)
@@ -692,6 +699,9 @@ static void vmx_update_intercept_for_lbr_msrs(struct kvm_vcpu *vcpu, bool set)
 			vmx_set_intercept_for_msr(vcpu, lbr->info + i, MSR_TYPE_RW, set);
 	}
 
+	if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
+		return;
+
 	vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, set);
 	vmx_set_intercept_for_msr(vcpu, MSR_LBR_TOS, MSR_TYPE_RW, set);
 }
@@ -735,10 +745,13 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
 {
 	struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
 	struct lbr_desc *lbr_desc = vcpu_to_lbr_desc(vcpu);
+	bool lbr_enable = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) ?
+		(vmcs_read64(GUEST_IA32_LBR_CTL) & ARCH_LBR_CTL_LBREN) :
+		(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
 
 	if (!lbr_desc->event) {
 		vmx_disable_lbr_msrs_passthrough(vcpu);
-		if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)
+		if (lbr_enable)
 			goto warn;
 		if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use))
 			goto warn;
@@ -761,7 +774,11 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
 
 static void intel_pmu_cleanup(struct kvm_vcpu *vcpu)
 {
-	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR))
+	bool lbr_enable = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR) ?
+		(vmcs_read64(GUEST_IA32_LBR_CTL) & ARCH_LBR_CTL_LBREN) :
+		(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR);
+
+	if (!lbr_enable)
 		intel_pmu_release_guest_lbr_event(vcpu);
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a0660b9934c6..2f307689a14b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -666,6 +666,9 @@ static bool is_valid_passthrough_msr(u32 msr)
 	case MSR_LBR_NHM_TO ... MSR_LBR_NHM_TO + 31:
 	case MSR_LBR_CORE_FROM ... MSR_LBR_CORE_FROM + 8:
 	case MSR_LBR_CORE_TO ... MSR_LBR_CORE_TO + 8:
+	case MSR_ARCH_LBR_FROM_0 ... MSR_ARCH_LBR_FROM_0 + 31:
+	case MSR_ARCH_LBR_TO_0 ... MSR_ARCH_LBR_TO_0 + 31:
+	case MSR_ARCH_LBR_INFO_0 ... MSR_ARCH_LBR_INFO_0 + 31:
 		/* LBR MSRs. These are handled in vmx_update_intercept_for_lbr_msrs() */
 		return true;
 	}
@@ -2529,7 +2532,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_EXIT_LOAD_IA32_EFER |
 	      VM_EXIT_CLEAR_BNDCFGS |
 	      VM_EXIT_PT_CONCEAL_PIP |
-	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
+	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
+	      VM_EXIT_CLEAR_IA32_LBR_CTL;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
 				&_vmexit_control) < 0)
 		return -EIO;
@@ -2553,7 +2557,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	      VM_ENTRY_LOAD_IA32_EFER |
 	      VM_ENTRY_LOAD_BNDCFGS |
 	      VM_ENTRY_PT_CONCEAL_PIP |
-	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
+	      VM_ENTRY_LOAD_IA32_RTIT_CTL |
+	      VM_ENTRY_LOAD_IA32_LBR_CTL;
 	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
 				&_vmentry_control) < 0)
 		return -EIO;
-- 
2.29.2



* [PATCH v3 8/9] KVM: x86: Expose Architectural LBR CPUID leaf
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
                   ` (6 preceding siblings ...)
  2021-03-03 13:57 ` [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 17:34   ` Sean Christopherson
  2021-03-03 13:57 ` [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs Like Xu
  2021-03-03 13:57 ` [kvm-unit-tests PATCH] x86: Update guest LBR tests for Architectural LBR Like Xu
  9 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

If CPUID.(EAX=07H, ECX=0):EDX[19] is set to 1, then KVM supports Arch
LBR and CPUID leaf 01CH indicates details of the Arch LBR capabilities.
Currently, KVM only supports the current host LBR depth for guests,
which is also the maximum supported depth on the host.
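
For illustration, a guest could derive that single supported depth from
CPUID leaf 01CH as follows (a sketch using kernel helpers):

	unsigned int eax, ebx, ecx, edx;

	cpuid_count(0x1c, 0, &eax, &ebx, &ecx, &edx);
	/* The highest bit "n" set in EAX[7:0] gives depth 8*(n+1). */
	pr_info("max LBR depth: %d\n", 8 * fls(eax & 0xff));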

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/cpuid.c   | 25 ++++++++++++++++++++++++-
 arch/x86/kvm/vmx/vmx.c |  2 ++
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b4247f821277..4473324fe7be 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -450,7 +450,7 @@ void kvm_set_cpu_caps(void)
 		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
 		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
 		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
-		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16)
+		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) | F(ARCH_LBR)
 	);
 
 	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
@@ -805,6 +805,29 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 				goto out;
 		}
 		break;
+	/* Architectural LBR */
+	case 0x1c:
+	{
+		u64 lbr_depth_mask = entry->eax & 0xff;
+
+		if (!lbr_depth_mask || !kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR)) {
+			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+			break;
+		}
+
+		/*
+		 * KVM only exposes the maximum supported depth,
+		 * which is also the fixed value used on the host.
+		 *
+		 * KVM doesn't allow VMM user space to adjust depth
+		 * per guest, because the guest LBR emulation depends
+		 * on the implementation of the host LBR driver.
+		 */
+		lbr_depth_mask = 1UL << (fls(lbr_depth_mask) - 1);
+		entry->eax &= ~0xff;
+		entry->eax |= lbr_depth_mask;
+		break;
+	}
 	case KVM_CPUID_SIGNATURE: {
 		static const char signature[12] = "KVMKVMKVM\0\0";
 		const u32 *sigptr = (const u32 *)signature;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 2f307689a14b..034708a3df20 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7258,6 +7258,8 @@ static __init void vmx_set_cpu_caps(void)
 		kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
 	if (vmx_pt_mode_is_host_guest())
 		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
+	if (cpu_has_vmx_arch_lbr())
+		kvm_cpu_cap_check_and_set(X86_FEATURE_ARCH_LBR);
 
 	if (vmx_umip_emulated())
 		kvm_cpu_cap_set(X86_FEATURE_UMIP);
-- 
2.29.2



* [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
                   ` (7 preceding siblings ...)
  2021-03-03 13:57 ` [PATCH v3 8/9] KVM: x86: Expose Architectural LBR CPUID leaf Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 18:03   ` Sean Christopherson
  2021-03-03 13:57 ` [kvm-unit-tests PATCH] x86: Update guest LBR tests for Architectural LBR Like Xu
  9 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

On processors whose XSAVE feature set supports XSAVES and XRSTORS,
the availability of support for Architectural LBR configuration state save
and restore can be determined from CPUID.(EAX=0DH, ECX=1):EDX:ECX[bit 15].
The detailed leaf for Arch LBRs is enumerated in CPUID.(EAX=0DH, ECX=0FH).

XSAVES provides a faster means than RDMSR for the guest to read all LBRs.
When guest IA32_XSS[bit 15] is set, the Arch LBR state can be saved using
XSAVES and restored by XRSTORS with the appropriate RFBM.
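
A minimal guest-kernel sketch of saving only the Arch LBR state with
XSAVES ('xbuf' is an assumed 64-byte-aligned XSAVE area large enough
for the LBR component):

	u32 lmask = XFEATURE_MASK_LBR, hmask = 0;	/* RFBM in EDX:EAX */

	asm volatile("xsaves %0"
		     : "+m" (*xbuf)
		     : "a" (lmask), "d" (hmask)
		     : "memory");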

If KVM fails to pass the LBR MSRs through to the guest, the LBR MSRs
will be reset to prevent the leakage of host records via XSAVES. In this
case, the guest results may be inaccurate, as with the legacy LBR.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 arch/x86/kvm/vmx/pmu_intel.c | 2 ++
 arch/x86/kvm/vmx/vmx.c       | 2 ++
 arch/x86/kvm/x86.c           | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 48a817be60ab..08114f70c496 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -768,6 +768,8 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu)
 	return;
 
 warn:
+	if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
+		wrmsrl(MSR_ARCH_LBR_DEPTH, lbr_desc->records.nr);
 	pr_warn_ratelimited("kvm: vcpu-%d: fail to passthrough LBR.\n",
 		vcpu->vcpu_id);
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 034708a3df20..ec4593e0ee6d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7268,6 +7268,8 @@ static __init void vmx_set_cpu_caps(void)
 	supported_xss = 0;
 	if (!cpu_has_vmx_xsaves())
 		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
+	else if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
+		supported_xss |= XFEATURE_MASK_LBR;
 
 	/* CPUID 0x80000001 */
 	if (!cpu_has_vmx_rdtscp())
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d773836ceb7a..bca2e318ff24 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10433,6 +10433,8 @@ int kvm_arch_hardware_setup(void *opaque)
 
 	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
 		supported_xss = 0;
+	else
+		supported_xss &= host_xss;
 
 #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
 	cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
-- 
2.29.2



* [kvm-unit-tests PATCH] x86: Update guest LBR tests for Architectural LBR
  2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
                   ` (8 preceding siblings ...)
  2021-03-03 13:57 ` [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs Like Xu
@ 2021-03-03 13:57 ` Like Xu
  2021-03-03 18:05   ` Sean Christopherson
  9 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-03 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, Paolo Bonzini, Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	Kan Liang, Dave Hansen, wei.w.wang, Borislav Petkov, kvm, x86,
	linux-kernel, Like Xu

This unit test is intended to exercise KVM's support for Architectural
LBR, an architectural performance monitoring unit (PMU) feature on
Intel processors.

If the LBREn bit is set to 1 in MSR_ARCH_LBR_CTL, the processor will
record a running trace of the most recent branches taken by the guest
in the LBR entries for the guest to read.

Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
 x86/pmu_lbr.c | 62 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/x86/pmu_lbr.c b/x86/pmu_lbr.c
index 3bd9e9f..588aec8 100644
--- a/x86/pmu_lbr.c
+++ b/x86/pmu_lbr.c
@@ -6,6 +6,7 @@
 #define MAX_NUM_LBR_ENTRY	  32
 #define DEBUGCTLMSR_LBR	  (1UL <<  0)
 #define PMU_CAP_LBR_FMT	  0x3f
+#define KVM_ARCH_LBR_CTL_MASK			0x7f000f
 
 #define MSR_LBR_NHM_FROM	0x00000680
 #define MSR_LBR_NHM_TO		0x000006c0
@@ -13,6 +14,10 @@
 #define MSR_LBR_CORE_TO	0x00000060
 #define MSR_LBR_TOS		0x000001c9
 #define MSR_LBR_SELECT		0x000001c8
+#define MSR_ARCH_LBR_CTL		0x000014ce
+#define MSR_ARCH_LBR_DEPTH		0x000014cf
+#define MSR_ARCH_LBR_FROM_0		0x00001500
+#define MSR_ARCH_LBR_TO_0		0x00001600
 
 volatile int count;
 
@@ -66,6 +71,9 @@ int main(int ac, char **av)
 	struct cpuid id = cpuid(10);
 	u64 perf_cap;
 	int max, i;
+	bool arch_lbr = false;
+	u32 ctl_msr = MSR_IA32_DEBUGCTLMSR;
+	u64 ctl_value = DEBUGCTLMSR_LBR;
 
 	setup_vm();
 	perf_cap = rdmsr(MSR_IA32_PERF_CAPABILITIES);
@@ -80,8 +88,23 @@ int main(int ac, char **av)
 		return report_summary();
 	}
 
+	/*
+	 * On processors that support Architectural LBRs,
+	 * IA32_PERF_CAPABILITIES.LBR_FMT will have the value 03FH.
+	 */
+	if (0x3f == (perf_cap & PMU_CAP_LBR_FMT)) {
+		arch_lbr = true;
+		ctl_msr = MSR_ARCH_LBR_CTL;
+		/* DEPTH defaults to the maximum number of LBR entries. */
+		max = rdmsr(MSR_ARCH_LBR_DEPTH) - 1;
+		ctl_value = KVM_ARCH_LBR_CTL_MASK;
+	}
+
 	printf("PMU version:		 %d\n", eax.split.version_id);
-	printf("LBR version:		 %ld\n", perf_cap & PMU_CAP_LBR_FMT);
+	if (!arch_lbr)
+		printf("LBR version:		 %ld\n", perf_cap & PMU_CAP_LBR_FMT);
+	else
+		printf("Architectural LBR depth:		 %d\n", max + 1);
 
 	/* Look for LBR from and to MSRs */
 	lbr_from = MSR_LBR_CORE_FROM;
@@ -90,27 +113,46 @@ int main(int ac, char **av)
 		lbr_from = MSR_LBR_NHM_FROM;
 		lbr_to = MSR_LBR_NHM_TO;
 	}
+	if (test_init_lbr_from_exception(0)) {
+		lbr_from = MSR_ARCH_LBR_FROM_0;
+		lbr_to = MSR_ARCH_LBR_TO_0;
+	}
 
 	if (test_init_lbr_from_exception(0)) {
 		printf("LBR on this platform is not supported!\n");
 		return report_summary();
 	}
 
-	wrmsr(MSR_LBR_SELECT, 0);
-	wrmsr(MSR_LBR_TOS, 0);
-	for (max = 0; max < MAX_NUM_LBR_ENTRY; max++) {
-		if (test_init_lbr_from_exception(max))
-			break;
+	/* Reset the guest LBR entries. */
+	if (arch_lbr) {
+		/* On a software write to IA32_LBR_DEPTH, all LBR entries are reset to 0.*/
+		wrmsr(MSR_ARCH_LBR_DEPTH, max + 1);
+	} else {
+		wrmsr(MSR_LBR_SELECT, 0);
+		wrmsr(MSR_LBR_TOS, 0);
+		for (max = 0; max < MAX_NUM_LBR_ENTRY; max++) {
+			if (test_init_lbr_from_exception(max))
+				break;
+		}
 	}
-
 	report(max > 0, "The number of guest LBR entries is good.");
 
+	/* Check the guest LBR entries are initialized. */
+	for (i = 0; i < max; ++i) {
+		if (rdmsr(lbr_to + i) || rdmsr(lbr_from + i))
+			break;
+	}
+	report(i == max, "The guest LBR initialized FROM_IP/TO_IP values are good.");
+
 	/* Do some branch instructions. */
-	wrmsr(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR);
+	wrmsr(ctl_msr, ctl_value);
 	lbr_test();
-	wrmsr(MSR_IA32_DEBUGCTLMSR, 0);
+	wrmsr(ctl_msr, 0);
 
-	report(rdmsr(MSR_LBR_TOS) != 0, "The guest LBR MSR_LBR_TOS value is good.");
+	/* Check if the guest LBR has recorded some branches. */
+	if (!arch_lbr) {
+		report(rdmsr(MSR_LBR_TOS) != 0, "The guest LBR MSR_LBR_TOS value is good.");
+	}
 	for (i = 0; i < max; ++i) {
 		if (!rdmsr(lbr_to + i) || !rdmsr(lbr_from + i))
 			break;
-- 
2.29.2



* Re: [PATCH v3 1/9] perf/x86/intel: Fix a comment about guest LBR support
  2021-03-03 13:57 ` [PATCH v3 1/9] perf/x86/intel: Fix a comment about guest LBR support Like Xu
@ 2021-03-03 16:49   ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 16:49 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Like Xu wrote:
> Starting from v5.12, KVM reports guest LBR and extra_regs
> support when the host has relevant support.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/events/intel/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index d4569bfa83e3..a32acc7733a7 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -5565,7 +5565,7 @@ __init int intel_pmu_init(void)
>  
>  	/*
>  	 * Access LBR MSR may cause #GP under certain circumstances.
> -	 * E.g. KVM doesn't support LBR MSR
> +	 * E.g. KVM doesn't support LBR MSR before v5.12.

Just delete this part of the comment.

>  	 * Check all LBT MSR here.
>  	 * Disable LBR access if any LBR MSRs can not be accessed.
>  	 */
> -- 
> 2.29.2
> 


* Re: [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
  2021-03-03 13:57 ` [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR Like Xu
@ 2021-03-03 16:58   ` Sean Christopherson
  2021-03-04  2:30     ` Xu, Like
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 16:58 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Like Xu wrote:
> @@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>  	return true;
>  }
>  
> +/*
> + * Check if the requested depth value is supported
> + * based on the bits [0:7] of the guest cpuid.1c.eax.
> + */
> +static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
> +{
> +	struct kvm_cpuid_entry2 *best;
> +
> +	best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
> +	if (best && depth && !(depth % 8))

This is still wrong, it fails to weed out depth > 64.

Not that this is a hot path, but it's probably worth double checking that the
compiler generates simple code for "depth % 8", e.g. it can be "depth & 7".

> +		return (best->eax & 0xff) & (1ULL << (depth / 8 - 1));
> +
> +	return false;
> +}
> +
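
[ For illustration, a version of the check that also rejects depths
  above 64, per the comments above (a sketch, not from the series): ]

	static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
	{
		struct kvm_cpuid_entry2 *best;

		if (!depth || depth > 64 || (depth & 7))
			return false;

		best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
		return best && ((best->eax & 0xff) & BIT(depth / 8 - 1));
	}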


* Re: [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL emulation for Arch LBR
  2021-03-03 13:57 ` [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL " Like Xu
@ 2021-03-03 17:19   ` Sean Christopherson
  2021-03-04  2:58     ` Xu, Like
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 17:19 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Like Xu wrote:
> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
> index 25d620685ae7..d14a14eb712d 100644
> --- a/arch/x86/kvm/vmx/pmu_intel.c
> +++ b/arch/x86/kvm/vmx/pmu_intel.c
> @@ -19,6 +19,7 @@
>  #include "pmu.h"
>  
>  #define MSR_PMC_FULL_WIDTH_BIT      (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
> +#define KVM_ARCH_LBR_CTL_MASK			0x7f000f

It would nice to build this up with the individual bits instead of tossing in a
magic number. 

>  static struct kvm_event_hw_type_mapping intel_arch_events[] = {
>  	/* Index must match CPUID 0x0A.EBX bit vector */
> @@ -221,6 +222,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>  		ret = pmu->version > 1;
>  		break;
>  	case MSR_ARCH_LBR_DEPTH:
> +	case MSR_ARCH_LBR_CTL:
>  		ret = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
>  		break;
>  	default:
> @@ -390,6 +392,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  	case MSR_ARCH_LBR_DEPTH:
>  		msr_info->data = lbr_desc->records.nr;
>  		return 0;
> +	case MSR_ARCH_LBR_CTL:
> +		msr_info->data = vmcs_read64(GUEST_IA32_LBR_CTL);
> +		return 0;
>  	default:
>  		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>  		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
> @@ -457,6 +462,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		lbr_desc->records.nr = data;
>  		lbr_desc->arch_lbr_reset = true;
>  		return 0;
> +	case MSR_ARCH_LBR_CTL:
> +		if (!(data & ~KVM_ARCH_LBR_CTL_MASK)) {

Maybe invert this to reduce indentation?

		if (data & ...)
			break; (or "return 1;")


> +			vmcs_write64(GUEST_IA32_LBR_CTL, data);
> +			if (intel_pmu_lbr_is_enabled(vcpu) && !lbr_desc->event &&
> +				(data & ARCH_LBR_CTL_LBREN))

Alignment.

> +				intel_pmu_create_guest_lbr_event(vcpu);
> +			return 0;
> +		}
> +		break;
>  	default:
>  		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>  		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
> @@ -635,12 +649,15 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>   */
>  static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
>  {
> -	u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
> +	u32 lbr_ctl_field = GUEST_IA32_DEBUGCTL;
>  
> -	if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
> -		data &= ~DEBUGCTLMSR_LBR;
> -		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
> -	}
> +	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI))
> +		return;
> +
> +	if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
> +		lbr_ctl_field = GUEST_IA32_LBR_CTL;
> +
> +	vmcs_write64(lbr_ctl_field, vmcs_read64(lbr_ctl_field) & ~BIT(0));

Use ARCH_LBR_CTL_LBREN?

>  }
>  
>  static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6d7e760fdfa0..a0660b9934c6 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2036,6 +2036,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  						VM_EXIT_SAVE_DEBUG_CONTROLS)
>  			get_vmcs12(vcpu)->guest_ia32_debugctl = data;
>  
> +		/*
> +		 * For Arch LBR, IA32_DEBUGCTL[bit 0] has no meaning.
> +		 * It can be written to 0 or 1, but reads will always return 0.
> +		 */
> +		if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
> +			data &= ~DEBUGCTLMSR_LBR;
> +
>  		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
>  		if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
>  		    (data & DEBUGCTLMSR_LBR))
> @@ -4463,6 +4470,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>  		vmcs_writel(GUEST_SYSENTER_ESP, 0);
>  		vmcs_writel(GUEST_SYSENTER_EIP, 0);
>  		vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
> +		if (cpu_has_vmx_arch_lbr())
> +			vmcs_write64(GUEST_IA32_LBR_CTL, 0);

Not that any guest is likely to care, but is the MSR cleared on INIT?  The SDM
has specific language for warm reset, but I can't find anything for INIT.

  On a warm reset, all LBR MSRs, including IA32_LBR_DEPTH, have their values
  preserved. However, IA32_LBR_CTL.LBREn is cleared to 0, disabling LBRs. If a
  warm reset is triggered while the processor is in C6, also known as warm init,
  all LBR MSRs will be reset to their initial values.

>  	}
>  
>  	kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
> -- 
> 2.29.2
> 


* Re: [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field
  2021-03-03 13:57 ` [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field Like Xu
@ 2021-03-03 17:26   ` Sean Christopherson
  2021-03-04  3:02     ` Xu, Like
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 17:26 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Like Xu wrote:
> New VMX controls bits for Arch LBR are added. When bit 21 in vmentry_ctrl
> is set, VM entry will write the value from the "Guest IA32_LBR_CTL" guest
> state field to IA32_LBR_CTL. When bit 26 in vmexit_ctrl is set, VM exit
> will clear IA32_LBR_CTL after the value has been saved to the "Guest
> IA32_LBR_CTL" guest state field.

...

> @@ -2529,7 +2532,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>  	      VM_EXIT_LOAD_IA32_EFER |
>  	      VM_EXIT_CLEAR_BNDCFGS |
>  	      VM_EXIT_PT_CONCEAL_PIP |
> -	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
> +	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
> +	      VM_EXIT_CLEAR_IA32_LBR_CTL;

So, how does MSR_ARCH_LBR_CTL get restored on the host?  What if the host wants
to keep _its_ LBR recording active while the guest is running?

>  	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
>  				&_vmexit_control) < 0)
>  		return -EIO;
> @@ -2553,7 +2557,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>  	      VM_ENTRY_LOAD_IA32_EFER |
>  	      VM_ENTRY_LOAD_BNDCFGS |
>  	      VM_ENTRY_PT_CONCEAL_PIP |
> -	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
> +	      VM_ENTRY_LOAD_IA32_RTIT_CTL |
> +	      VM_ENTRY_LOAD_IA32_LBR_CTL;
>  	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
>  				&_vmentry_control) < 0)
>  		return -EIO;
> -- 
> 2.29.2
> 


* Re: [PATCH v3 8/9] KVM: x86: Expose Architectural LBR CPUID leaf
  2021-03-03 13:57 ` [PATCH v3 8/9] KVM: x86: Expose Architectural LBR CPUID leaf Like Xu
@ 2021-03-03 17:34   ` Sean Christopherson
  2021-03-03 18:01     ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 17:34 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Like Xu wrote:
> If CPUID.(EAX=07H, ECX=0):EDX[19] is set to 1, then KVM supports Arch
> LBRs and CPUID leaf 01CH indicates details of the Arch LBRs capabilities.
> Currently, KVM only supports the current host LBR depth for guests,
> which is also the maximum supported depth on the host.
> 
> Signed-off-by: Like Xu <like.xu@linux.intel.com>
> ---
>  arch/x86/kvm/cpuid.c   | 25 ++++++++++++++++++++++++-
>  arch/x86/kvm/vmx/vmx.c |  2 ++
>  2 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index b4247f821277..4473324fe7be 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -450,7 +450,7 @@ void kvm_set_cpu_caps(void)
>  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>  		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
>  		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
> -		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16)
> +		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) | F(ARCH_LBR)
>  	);
>  
>  	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */

...

> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2f307689a14b..034708a3df20 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7258,6 +7258,8 @@ static __init void vmx_set_cpu_caps(void)
>  		kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
>  	if (vmx_pt_mode_is_host_guest())
>  		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
> +	if (cpu_has_vmx_arch_lbr())
> +		kvm_cpu_cap_check_and_set(X86_FEATURE_ARCH_LBR);

Using kvm_cpu_cap_check_and_set(), which queries boot_cpu_has(), is only
necessary if a feature is not exposed by default in kvm_set_cpu_caps().  That's
why INTEL_PT uses it.  ARCH_LBR on the other hand is set in the "enable by
default" mask.

That being said, it's probably a bad idea to advertise ARCH_LBR by default.  In
the unlikely case that AMD adds support for ARCH_LBR, enable-by-default means
guest will be able to use ARCH_LBR on old KVMs that presumably would lack support
for ARCH_LBR on SVM.

TL;DR: omit F(ARCH_LBR) or replace it with "0 /* ARCH_LBR */".

>  	if (vmx_umip_emulated())
>  		kvm_cpu_cap_set(X86_FEATURE_UMIP);
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 8/9] KVM: x86: Expose Architectural LBR CPUID leaf
  2021-03-03 17:34   ` Sean Christopherson
@ 2021-03-03 18:01     ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 18:01 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Sean Christopherson wrote:
> On Wed, Mar 03, 2021, Like Xu wrote:
> > If CPUID.(EAX=07H, ECX=0):EDX[19] is set to 1, then KVM supports Arch
> > LBRs and CPUID leaf 01CH indicates details of the Arch LBR capabilities.
> > Currently, KVM only supports the current host LBR depth for guests,
> > which is also the maximum supported depth on the host.
> > 
> > Signed-off-by: Like Xu <like.xu@linux.intel.com>
> > ---
> >  arch/x86/kvm/cpuid.c   | 25 ++++++++++++++++++++++++-
> >  arch/x86/kvm/vmx/vmx.c |  2 ++
> >  2 files changed, 26 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index b4247f821277..4473324fe7be 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -450,7 +450,7 @@ void kvm_set_cpu_caps(void)
> >  		F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
> >  		F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
> >  		F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
> > -		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16)
> > +		F(SERIALIZE) | F(TSXLDTRK) | F(AVX512_FP16) | F(ARCH_LBR)
> >  	);
> >  
> >  	/* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
> 
> ...
> 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 2f307689a14b..034708a3df20 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -7258,6 +7258,8 @@ static __init void vmx_set_cpu_caps(void)
> >  		kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
> >  	if (vmx_pt_mode_is_host_guest())
> >  		kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
> > +	if (cpu_has_vmx_arch_lbr())
> > +		kvm_cpu_cap_check_and_set(X86_FEATURE_ARCH_LBR);
> 
> Using kvm_cpu_cap_check_and_set(), which queries boot_cpu_has(), is only
> necessary if a feature is not exposed by default in kvm_set_cpu_caps().  That's
> why INTEL_PT uses it.  ARCH_LBR on the other hand is set in the "enable by
> default" mask.
> 
> That being said, it's probably a bad idea to advertise ARCH_LBR by default.  In
> the unlikely case that AMD adds support for ARCH_LBR, enable-by-default means
> guests will be able to use ARCH_LBR on old KVMs that would presumably lack support
> for ARCH_LBR on SVM.
> 
> TL;DR: omit F(ARCH_LBR) or replace it with "0 /* ARCH_LBR */".

Actually, I take that back.  It'll require changing SVM, but due to the XSS
interaction it's probably cleaner to leave F(ARCH_LBR) as is, and do:

	if (!cpu_has_vmx_arch_lbr())
		kvm_cpu_cap_clear(X86_FEATURE_ARCH_LBR);

and then unconditionally clear the cap for SVM.  In a way, that's arguably
better documentation, as it explicitly shows that SVM lacks support.

More thoughts in the next patch...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs
  2021-03-03 13:57 ` [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs Like Xu
@ 2021-03-03 18:03   ` Sean Christopherson
  2021-03-04  3:43     ` Like Xu
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 18:03 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Like Xu wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 034708a3df20..ec4593e0ee6d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7268,6 +7268,8 @@ static __init void vmx_set_cpu_caps(void)
>  	supported_xss = 0;
>  	if (!cpu_has_vmx_xsaves())
>  		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
> +	else if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
> +		supported_xss |= XFEATURE_MASK_LBR;
>  
>  	/* CPUID 0x80000001 */
>  	if (!cpu_has_vmx_rdtscp())
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d773836ceb7a..bca2e318ff24 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10433,6 +10433,8 @@ int kvm_arch_hardware_setup(void *opaque)
>  
>  	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
>  		supported_xss = 0;
> +	else
> +		supported_xss &= host_xss;

Not your fault by any means, but I would prefer to have matching logic for XSS
and XCR0.  The existing clearing of supported_xss here is pointless.  E.g. I'd
prefer something like the following, though Paolo may have a different opinion.

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6d7e760fdfa0..c781034463e5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7244,12 +7244,15 @@ static __init void vmx_set_cpu_caps(void)
                kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
        if (vmx_pt_mode_is_host_guest())
                kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
+       if (!cpu_has_vmx_arch_lbr()) {
+               kvm_cpu_cap_clear(X86_FEATURE_ARCH_LBR);
+               supported_xss &= ~XFEATURE_MASK_LBR;
+       }

        if (vmx_umip_emulated())
                kvm_cpu_cap_set(X86_FEATURE_UMIP);

        /* CPUID 0xD.1 */
-       supported_xss = 0;
        if (!cpu_has_vmx_xsaves())
                kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7b0adebec1ef..5f9eb1f5b840 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -205,6 +205,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
                                | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
                                | XFEATURE_MASK_PKRU)

+#define KVM_SUPPORTED_XSS      XFEATURE_MASK_LBR
+
 u64 __read_mostly host_efer;
 EXPORT_SYMBOL_GPL(host_efer);

@@ -8037,6 +8039,11 @@ int kvm_arch_init(void *opaque)
                supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
        }

+       if (boot_cpu_has(X86_FEATURE_XSAVES))
+               rdmsrl(MSR_IA32_XSS, host_xss);
+               supported_xss = host_xss & KVM_SUPPORTED_XSS;
+       }
+
        if (pi_inject_timer == -1)
                pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);
 #ifdef CONFIG_X86_64
@@ -10412,9 +10419,6 @@ int kvm_arch_hardware_setup(void *opaque)

        rdmsrl_safe(MSR_EFER, &host_efer);

-       if (boot_cpu_has(X86_FEATURE_XSAVES))
-               rdmsrl(MSR_IA32_XSS, host_xss);
-
        r = ops->hardware_setup();
        if (r != 0)
                return r;
@@ -10422,9 +10426,6 @@ int kvm_arch_hardware_setup(void *opaque)
        memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
        kvm_ops_static_call_update();

-       if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
-               supported_xss = 0;
-
 #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
        cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
 #undef __kvm_cpu_cap_has


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [kvm-unit-tests PATCH] x86: Update guest LBR tests for Architectural LBR
  2021-03-03 13:57 ` [kvm-unit-tests PATCH] x86: Update guest LBR tests for Architectural LBR Like Xu
@ 2021-03-03 18:05   ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2021-03-03 18:05 UTC (permalink / raw)
  To: Like Xu
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Wed, Mar 03, 2021, Like Xu wrote:
> This unit test is intended to test KVM's support for
> Architectural LBRs, which is an architectural performance monitoring
> unit (PMU) feature on Intel processors.

These really need negative testing, especially on the MSR values.  IMO, negative
tests should be mandatory for merging arch LBR support in KVM; it shouldn't be
too much additional effort.
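
For instance, something along these lines for the DEPTH MSR (just a
sketch; wrmsr_gp() stands in for whatever exception-catching WRMSR
helper the suite provides, and the MSR constant name is an assumption):

	/*
	 * Negative test: depths that are zero, not a multiple of 8, or
	 * beyond the CPUID.1CH:EAX-advertised maximum must #GP.
	 * wrmsr_gp() is assumed to return true if the write took a #GP.
	 */
	static void test_arch_lbr_depth_invalid(void)
	{
		static const unsigned long bad[] = { 0, 7, 65, 127, 128, 520 };
		int i;

		for (i = 0; i < ARRAY_SIZE(bad); i++)
			report(wrmsr_gp(MSR_ARCH_LBR_DEPTH, bad[i]),
			       "WRMSR(MSR_ARCH_LBR_DEPTH, %lu) -> #GP", bad[i]);
	}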

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
  2021-03-03 16:58   ` Sean Christopherson
@ 2021-03-04  2:30     ` Xu, Like
  2021-03-04 16:12       ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Xu, Like @ 2021-03-04  2:30 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu

Hi Sean,

Thanks for your detailed review on the patch set.

On 2021/3/4 0:58, Sean Christopherson wrote:
> On Wed, Mar 03, 2021, Like Xu wrote:
>> @@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>>   	return true;
>>   }
>>   
>> +/*
>> + * Check if the requested depth value is supported
>> + * based on the bits [0:7] of the guest cpuid.1c.eax.
>> + */
>> +static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
>> +{
>> +	struct kvm_cpuid_entry2 *best;
>> +
>> +	best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
>> +	if (best && depth && !(depth % 8))
> This is still wrong; it fails to weed out depth > 64.

How come? The test cases depth = {65, 127, 128} get #GP as expected.

>
> Not that this is a hot path, but it's probably worth double checking that the
> compiler generates simple code for "depth % 8", e.g. it can be "(depth & 7)".

Emm, the "%" operation is quite normal over kernel code.

if (best && depth && !(depth % 8))
    10659:       48 85 c0                test   rax,rax
    1065c:       74 c7                   je     10625 <intel_pmu_set_msr+0x65>
    1065e:       4d 85 e4                test   r12,r12
    10661:       74 c2                   je     10625 <intel_pmu_set_msr+0x65>
    10663:       41 f6 c4 07             test   r12b,0x7
    10667:       75 bc                   jne    10625 <intel_pmu_set_msr+0x65>

It looks like the compiler does the right thing.
Do you see room for optimization?

>
>> +		return (best->eax & 0xff) & (1ULL << (depth / 8 - 1));
>> +
>> +	return false;
>> +}
>> +


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL emulation for Arch LBR
  2021-03-03 17:19   ` Sean Christopherson
@ 2021-03-04  2:58     ` Xu, Like
  2021-03-04 16:25       ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Xu, Like @ 2021-03-04  2:58 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu

On 2021/3/4 1:19, Sean Christopherson wrote:
> On Wed, Mar 03, 2021, Like Xu wrote:
>> diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
>> index 25d620685ae7..d14a14eb712d 100644
>> --- a/arch/x86/kvm/vmx/pmu_intel.c
>> +++ b/arch/x86/kvm/vmx/pmu_intel.c
>> @@ -19,6 +19,7 @@
>>   #include "pmu.h"
>>   
>>   #define MSR_PMC_FULL_WIDTH_BIT      (MSR_IA32_PMC0 - MSR_IA32_PERFCTR0)
>> +#define KVM_ARCH_LBR_CTL_MASK			0x7f000f
> It would be nice to build this up with the individual bits instead of tossing in a
> magic number.

I will move the mask ARCH_LBR_CTL_MASK from lbr.c to msr-index.h, and thus:
#define KVM_ARCH_LBR_CTL_MASK  (ARCH_LBR_CTL_MASK | ARCH_LBR_CTL_LBREN)

I assume the move would be a separate patch for the host side.
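
Spelled out from individual bits, for reference (the field names here
anticipate the msr-index.h move and should be treated as illustrative):

	#define ARCH_LBR_CTL_LBREN	BIT(0)		/* enable recording */
	#define ARCH_LBR_CTL_CPL	(0x3ull << 1)	/* CPL0 / CPL>0 filtering */
	#define ARCH_LBR_CTL_STACK	(0x1ull << 3)	/* call-stack mode */
	#define ARCH_LBR_CTL_FILTER	(0x7full << 16)	/* branch-type filter */

	/* == 0x7f000f */
	#define KVM_ARCH_LBR_CTL_MASK	(ARCH_LBR_CTL_LBREN | ARCH_LBR_CTL_CPL | \
					 ARCH_LBR_CTL_STACK | ARCH_LBR_CTL_FILTER)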

>
>>   static struct kvm_event_hw_type_mapping intel_arch_events[] = {
>>   	/* Index must match CPUID 0x0A.EBX bit vector */
>> @@ -221,6 +222,7 @@ static bool intel_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
>>   		ret = pmu->version > 1;
>>   		break;
>>   	case MSR_ARCH_LBR_DEPTH:
>> +	case MSR_ARCH_LBR_CTL:
>>   		ret = guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR);
>>   		break;
>>   	default:
>> @@ -390,6 +392,9 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   	case MSR_ARCH_LBR_DEPTH:
>>   		msr_info->data = lbr_desc->records.nr;
>>   		return 0;
>> +	case MSR_ARCH_LBR_CTL:
>> +		msr_info->data = vmcs_read64(GUEST_IA32_LBR_CTL);
>> +		return 0;
>>   	default:
>>   		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>>   		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
>> @@ -457,6 +462,15 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   		lbr_desc->records.nr = data;
>>   		lbr_desc->arch_lbr_reset = true;
>>   		return 0;
>> +	case MSR_ARCH_LBR_CTL:
>> +		if (!(data & ~KVM_ARCH_LBR_CTL_MASK)) {
> Maybe invert this to reduce indentation?
>
> 		if (data & ...)
> 			break; (or "return 1;")

Fine to me.

>
>
>> +			vmcs_write64(GUEST_IA32_LBR_CTL, data);
>> +			if (intel_pmu_lbr_is_enabled(vcpu) && !lbr_desc->event &&
>> +				(data & ARCH_LBR_CTL_LBREN))
> Alignment.

I'll fix it.

>
>> +				intel_pmu_create_guest_lbr_event(vcpu);
>> +			return 0;
>> +		}
>> +		break;
>>   	default:
>>   		if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) ||
>>   		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
>> @@ -635,12 +649,15 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu)
>>    */
>>   static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu)
>>   {
>> -	u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL);
>> +	u32 lbr_ctl_field = GUEST_IA32_DEBUGCTL;
>>   
>> -	if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) {
>> -		data &= ~DEBUGCTLMSR_LBR;
>> -		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
>> -	}
>> +	if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI))
>> +		return;
>> +
>> +	if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
>> +		lbr_ctl_field = GUEST_IA32_LBR_CTL;
>> +
>> +	vmcs_write64(lbr_ctl_field, vmcs_read64(lbr_ctl_field) & ~BIT(0));
> Use ARCH_LBR_CTL_LBREN?

I'm trying to unify the usage of the enable bit for both Arch LBR and legacy LBR:

#define ARCH_LBR_CTL_LBREN        BIT(0)
#define DEBUGCTLMSR_LBR            (1UL <<  0)

Any suggestions?
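
One possibility, as a sketch (not code from the series):

	/* Clear the LBR enable bit in whichever VMCS field applies. */
	static void intel_pmu_disable_lbr_on_pmi(struct kvm_vcpu *vcpu)
	{
		u32 field = GUEST_IA32_DEBUGCTL;
		u64 en_bit = DEBUGCTLMSR_LBR;

		if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR)) {
			field = GUEST_IA32_LBR_CTL;
			en_bit = ARCH_LBR_CTL_LBREN;
		}

		vmcs_write64(field, vmcs_read64(field) & ~en_bit);
	}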

>
>>   }
>>   
>>   static void intel_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 6d7e760fdfa0..a0660b9934c6 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -2036,6 +2036,13 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>>   						VM_EXIT_SAVE_DEBUG_CONTROLS)
>>   			get_vmcs12(vcpu)->guest_ia32_debugctl = data;
>>   
>> +		/*
>> +		 * For Arch LBR, IA32_DEBUGCTL[bit 0] has no meaning.
>> +		 * It can be written to 0 or 1, but reads will always return 0.
>> +		 */
>> +		if (guest_cpuid_has(vcpu, X86_FEATURE_ARCH_LBR))
>> +			data &= ~DEBUGCTLMSR_LBR;
>> +
>>   		vmcs_write64(GUEST_IA32_DEBUGCTL, data);
>>   		if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event &&
>>   		    (data & DEBUGCTLMSR_LBR))
>> @@ -4463,6 +4470,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
>>   		vmcs_writel(GUEST_SYSENTER_ESP, 0);
>>   		vmcs_writel(GUEST_SYSENTER_EIP, 0);
>>   		vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
>> +		if (cpu_has_vmx_arch_lbr())
>> +			vmcs_write64(GUEST_IA32_LBR_CTL, 0);
> Not that any guest is likely to care, but is the MSR cleared on INIT?  The SDM
> has specific language for warm reset, but I can't find anything for INIT.
>
>    On a warm reset, all LBR MSRs, including IA32_LBR_DEPTH, have their values
>    preserved. However, IA32_LBR_CTL.LBREn is cleared to 0, disabling LBRs. If a
>    warm reset is triggered while the processor is in C6, also known as warm init,
>    all LBR MSRs will be reset to their initial values.

I was told that the reset behavior of GUEST_IA32_LBR_CTL
would be the same as that of GUEST_IA32_DEBUGCTL (true for INIT as well).

It looks like we have not strictly distinguished the guest's power states (C*).
Do we have two trap paths for "warm reset" and "warm init"?

>
>>   	}
>>   
>>   	kvm_set_rflags(vcpu, X86_EFLAGS_FIXED);
>> -- 
>> 2.29.2
>>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field
  2021-03-03 17:26   ` Sean Christopherson
@ 2021-03-04  3:02     ` Xu, Like
  2021-03-04 17:23       ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Xu, Like @ 2021-03-04  3:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu

On 2021/3/4 1:26, Sean Christopherson wrote:
> On Wed, Mar 03, 2021, Like Xu wrote:
>> New VMX controls bits for Arch LBR are added. When bit 21 in vmentry_ctrl
>> is set, VM entry will write the value from the "Guest IA32_LBR_CTL" guest
>> state field to IA32_LBR_CTL. When bit 26 in vmexit_ctrl is set, VM exit
>> will clear IA32_LBR_CTL after the value has been saved to the "Guest
>> IA32_LBR_CTL" guest state field.
> ...
>
>> @@ -2529,7 +2532,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>>   	      VM_EXIT_LOAD_IA32_EFER |
>>   	      VM_EXIT_CLEAR_BNDCFGS |
>>   	      VM_EXIT_PT_CONCEAL_PIP |
>> -	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
>> +	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
>> +	      VM_EXIT_CLEAR_IA32_LBR_CTL;
> So, how does MSR_ARCH_LBR_CTL get restored on the host?  What if the host wants
> to keep _its_ LBR recording active while the guest is running?

Thank you!

I will add "host_lbrctlmsr" field to "struct vcpu_vmx" and
repeat the update/get_debugctlmsr() stuff.
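
Roughly along these lines, mirroring the DEBUGCTL pattern (these
helpers are the plan above, not an existing kernel API):

	static inline u64 get_lbrctlmsr(void)
	{
		u64 ctl = 0;

		if (boot_cpu_has(X86_FEATURE_ARCH_LBR))
			rdmsrl(MSR_ARCH_LBR_CTL, ctl);
		return ctl;
	}

	static inline void update_lbrctlmsr(u64 ctl)
	{
		if (boot_cpu_has(X86_FEATURE_ARCH_LBR))
			wrmsrl(MSR_ARCH_LBR_CTL, ctl);
	}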

>>   	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_EXIT_CTLS,
>>   				&_vmexit_control) < 0)
>>   		return -EIO;
>> @@ -2553,7 +2557,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>>   	      VM_ENTRY_LOAD_IA32_EFER |
>>   	      VM_ENTRY_LOAD_BNDCFGS |
>>   	      VM_ENTRY_PT_CONCEAL_PIP |
>> -	      VM_ENTRY_LOAD_IA32_RTIT_CTL;
>> +	      VM_ENTRY_LOAD_IA32_RTIT_CTL |
>> +	      VM_ENTRY_LOAD_IA32_LBR_CTL;
>>   	if (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS,
>>   				&_vmentry_control) < 0)
>>   		return -EIO;
>> -- 
>> 2.29.2
>>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs
  2021-03-03 18:03   ` Sean Christopherson
@ 2021-03-04  3:43     ` Like Xu
  2021-03-04 16:31       ` Sean Christopherson
  0 siblings, 1 reply; 30+ messages in thread
From: Like Xu @ 2021-03-04  3:43 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Peter Zijlstra, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On 2021/3/4 2:03, Sean Christopherson wrote:
> On Wed, Mar 03, 2021, Like Xu wrote:
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 034708a3df20..ec4593e0ee6d 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -7268,6 +7268,8 @@ static __init void vmx_set_cpu_caps(void)
>>   	supported_xss = 0;
>>   	if (!cpu_has_vmx_xsaves())
>>   		kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
>> +	else if (kvm_cpu_cap_has(X86_FEATURE_ARCH_LBR))
>> +		supported_xss |= XFEATURE_MASK_LBR;
>>   
>>   	/* CPUID 0x80000001 */
>>   	if (!cpu_has_vmx_rdtscp())
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index d773836ceb7a..bca2e318ff24 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10433,6 +10433,8 @@ int kvm_arch_hardware_setup(void *opaque)
>>   
>>   	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
>>   		supported_xss = 0;
>> +	else
>> +		supported_xss &= host_xss;
> 
> Not your fault by any means, but I would prefer to have matching logic for XSS
> and XCR0.  The existing clearing of supported_xss here is pointless.  E.g. I'd
> prefer something like the following, though Paolo may have a different opinion.

I have no preference between doing the rdmsrl() in kvm_arch_init()
or in kvm_arch_hardware_setup().

It's true that the assignment of supported_xss in the kvm/intel
tree is redundant, and introducing KVM_SUPPORTED_XSS is also fine with me.


> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6d7e760fdfa0..c781034463e5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7244,12 +7244,15 @@ static __init void vmx_set_cpu_caps(void)
>                  kvm_cpu_cap_clear(X86_FEATURE_INVPCID);
>          if (vmx_pt_mode_is_host_guest())
>                  kvm_cpu_cap_check_and_set(X86_FEATURE_INTEL_PT);
> +       if (!cpu_has_vmx_arch_lbr()) {
> +               kvm_cpu_cap_clear(X86_FEATURE_ARCH_LBR);
> +               supported_xss &= ~XFEATURE_MASK_LBR;
> +       }
> 

I will move the above part to the LBR patch
and leave the remaining part as a pre-patch for Paolo's review.

>          if (vmx_umip_emulated())
>                  kvm_cpu_cap_set(X86_FEATURE_UMIP);
> 
>          /* CPUID 0xD.1 */
> -       supported_xss = 0;
>          if (!cpu_has_vmx_xsaves())
>                  kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

if (!cpu_has_vmx_xsaves())
	supported_xss = 0;
	kvm_cpu_cap_clear(X86_FEATURE_XSAVES);

> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7b0adebec1ef..5f9eb1f5b840 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -205,6 +205,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
>                                  | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
>                                  | XFEATURE_MASK_PKRU)
> 
> +#define KVM_SUPPORTED_XSS      XFEATURE_MASK_LBR
> +
>   u64 __read_mostly host_efer;
>   EXPORT_SYMBOL_GPL(host_efer);
> 
> @@ -8037,6 +8039,11 @@ int kvm_arch_init(void *opaque)
>                  supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
>          }
> 
> +       if (boot_cpu_has(X86_FEATURE_XSAVES))

{

> +               rdmsrl(MSR_IA32_XSS, host_xss);
> +               supported_xss = host_xss & KVM_SUPPORTED_XSS;
> +       }
> +
>          if (pi_inject_timer == -1)
>                  pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);
>   #ifdef CONFIG_X86_64
> @@ -10412,9 +10419,6 @@ int kvm_arch_hardware_setup(void *opaque)
> 
>          rdmsrl_safe(MSR_EFER, &host_efer);
> 
> -       if (boot_cpu_has(X86_FEATURE_XSAVES))
> -               rdmsrl(MSR_IA32_XSS, host_xss);
> -
>          r = ops->hardware_setup();
>          if (r != 0)
>                  return r;
> @@ -10422,9 +10426,6 @@ int kvm_arch_hardware_setup(void *opaque)
>          memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
>          kvm_ops_static_call_update();
> 
> -       if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> -               supported_xss = 0;
> -
>   #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
>          cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
>   #undef __kvm_cpu_cap_has
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
  2021-03-04  2:30     ` Xu, Like
@ 2021-03-04 16:12       ` Sean Christopherson
  2021-03-05  2:33         ` Xu, Like
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-04 16:12 UTC (permalink / raw)
  To: Xu, Like
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu

On Thu, Mar 04, 2021, Xu, Like wrote:
> Hi Sean,
> 
> Thanks for your detailed review on the patch set.
> 
> On 2021/3/4 0:58, Sean Christopherson wrote:
> > On Wed, Mar 03, 2021, Like Xu wrote:
> > > @@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
> > >   	return true;
> > >   }
> > > +/*
> > > + * Check if the requested depth value is supported
> > > + * based on the bits [0:7] of the guest cpuid.1c.eax.
> > > + */
> > > +static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
> > > +{
> > > +	struct kvm_cpuid_entry2 *best;
> > > +
> > > +	best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
> > > +	if (best && depth && !(depth % 8))
> > This is still wrong; it fails to weed out depth > 64.
> 
> How come? The test cases depth = {65, 127, 128} get #GP as expected.

@depth is a u64, throw in a number that is a multiple of 8 and >= 520, and the
"(1ULL << (depth / 8 - 1))" will trigger undefined behavior due to shifting
beyond the capacity of a ULL / u64.

Adding the "< 64" check would also allow dropping the " & 0xff" since the check
would ensure the shift doesn't go beyond bit 7.  I'm not sure the cleverness is
worth shaving a cycle, though.

> > Not that this is a hot path, but it's probably worth double checking that the
> > compiler generates simple code for "depth % 8", e.g. it can be "(depth & 7)".
> 
> Emm, the "%" operation is quite normal over kernel code.

So is "&" :-)  I was just pointing out that the compiler should optimize this,
and it did.

> if (best && depth && !(depth % 8))
>    10659:       48 85 c0                test   rax,rax
>    1065c:       74 c7                   je     10625 <intel_pmu_set_msr+0x65>
>    1065e:       4d 85 e4                test   r12,r12
>    10661:       74 c2                   je     10625 <intel_pmu_set_msr+0x65>
>    10663:       41 f6 c4 07             test   r12b,0x7
>    10667:       75 bc                   jne    10625 <intel_pmu_set_msr+0x65>
> 
> It looks like the compiler does the right thing.
> Do you see room for optimization?
> 
> > 
> > > +		return (best->eax & 0xff) & (1ULL << (depth / 8 - 1));

Actually, looking at this again, I would explicitly use BIT() instead of 1ULL
(or BIT_ULL), since the shift must be 7 or less.
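
Putting the pieces together, the check might end up as (a sketch):

	static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
	{
		struct kvm_cpuid_entry2 *best;

		best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
		if (!best || !depth || depth > 64 || (depth & 7))
			return false;

		/* The shift is at most 7, so BIT() suffices and the
		 * "& 0xff" is unnecessary. */
		return best->eax & BIT(depth / 8 - 1);
	}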

> > > +
> > > +	return false;
> > > +}
> > > +
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL emulation for Arch LBR
  2021-03-04  2:58     ` Xu, Like
@ 2021-03-04 16:25       ` Sean Christopherson
  0 siblings, 0 replies; 30+ messages in thread
From: Sean Christopherson @ 2021-03-04 16:25 UTC (permalink / raw)
  To: Xu, Like
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu

On Thu, Mar 04, 2021, Xu, Like wrote:
> On 2021/3/4 1:19, Sean Christopherson wrote:
> > > @@ -4463,6 +4470,8 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
> > >   		vmcs_writel(GUEST_SYSENTER_ESP, 0);
> > >   		vmcs_writel(GUEST_SYSENTER_EIP, 0);
> > >   		vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
> > > +		if (cpu_has_vmx_arch_lbr())
> > > +			vmcs_write64(GUEST_IA32_LBR_CTL, 0);
> > Not that any guest is likely to care, but is the MSR cleared on INIT?  The SDM
> > has specific language for warm reset, but I can't find anything for INIT.
> > 
> >    On a warm reset, all LBR MSRs, including IA32_LBR_DEPTH, have their values
> >    preserved. However, IA32_LBR_CTL.LBREn is cleared to 0, disabling LBRs. If a
> >    warm reset is triggered while the processor is in C6, also known as warm init,
> >    all LBR MSRs will be reset to their initial values.
> 
> I was told that the reset behavior of GUEST_IA32_LBR_CTL
> would be the same as that of GUEST_IA32_DEBUGCTL (true for INIT as well).

Yes, and DEBUGCTL is preserved on INIT.

	if (!init_event) {
		vmcs_write32(GUEST_SYSENTER_CS, 0);
		vmcs_writel(GUEST_SYSENTER_ESP, 0);
		vmcs_writel(GUEST_SYSENTER_EIP, 0);
		vmcs_write64(GUEST_IA32_DEBUGCTL, 0);
	}

Table 22-10 in the SDM:

	               | Pwr up or Reset: | INIT:
	All Other MSRs | Undefined        | Unchanged


If IA32_LBR_DEPTH is weirdly exempt, it needs to be documented.  I doubt that's
the case though.

> It looks like we have not strictly distinguished the guest's power states (C*).
> Do we have two trap paths for "warm reset" and "warm init"?

No.  Despite the name .vcpu_reset, KVM doesn't even have a RESET path; userspace
is responsible for modelling RESET.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs
  2021-03-04  3:43     ` Like Xu
@ 2021-03-04 16:31       ` Sean Christopherson
  2021-03-05  2:57         ` Xu, Like
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-04 16:31 UTC (permalink / raw)
  To: Like Xu
  Cc: Paolo Bonzini, Peter Zijlstra, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel

On Thu, Mar 04, 2021, Like Xu wrote:
> On 2021/3/4 2:03, Sean Christopherson wrote:
> >          if (vmx_umip_emulated())
> >                  kvm_cpu_cap_set(X86_FEATURE_UMIP);
> > 
> >          /* CPUID 0xD.1 */
> > -       supported_xss = 0;
> >          if (!cpu_has_vmx_xsaves())
> >                  kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
> 
> if (!cpu_has_vmx_xsaves())
> 	supported_xss = 0;

Argh, I forgot XSAVES has a VMCS control.  That's why kvm_arch_hardware_setup()
clears supported_xss if !XSAVES.  I guess just leave that existing code, but
maybe add a comment.
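
Something like this, perhaps (the comment wording is just a sketch):

	/*
	 * XSAVES depends on a VMCS execution control, so whether or not
	 * KVM can actually expose it isn't known until the vendor module
	 * has run ops->hardware_setup().
	 */
	if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
		supported_xss = 0;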

Paolo, any thoughts on how to keep supported_xss aligned with supported_xcr0,
without spreading the logic around too much?

> 	kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
> 
> > 
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 7b0adebec1ef..5f9eb1f5b840 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -205,6 +205,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
> >                                  | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
> >                                  | XFEATURE_MASK_PKRU)
> > 
> > +#define KVM_SUPPORTED_XSS      XFEATURE_MASK_LBR
> > +
> >   u64 __read_mostly host_efer;
> >   EXPORT_SYMBOL_GPL(host_efer);
> > 
> > @@ -8037,6 +8039,11 @@ int kvm_arch_init(void *opaque)
> >                  supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
> >          }
> > 
> > +       if (boot_cpu_has(X86_FEATURE_XSAVES))
> 
> {
> 
> > +               rdmsrl(MSR_IA32_XSS, host_xss);
> > +               supported_xss = host_xss & KVM_SUPPORTED_XSS;
> > +       }
> > +
> >          if (pi_inject_timer == -1)
> >                  pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);
> >   #ifdef CONFIG_X86_64
> > @@ -10412,9 +10419,6 @@ int kvm_arch_hardware_setup(void *opaque)
> > 
> >          rdmsrl_safe(MSR_EFER, &host_efer);
> > 
> > -       if (boot_cpu_has(X86_FEATURE_XSAVES))
> > -               rdmsrl(MSR_IA32_XSS, host_xss);
> > -
> >          r = ops->hardware_setup();
> >          if (r != 0)
> >                  return r;
> > @@ -10422,9 +10426,6 @@ int kvm_arch_hardware_setup(void *opaque)
> >          memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
> >          kvm_ops_static_call_update();
> > 
> > -       if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
> > -               supported_xss = 0;
> > -
> >   #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
> >          cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
> >   #undef __kvm_cpu_cap_has
> > 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field
  2021-03-04  3:02     ` Xu, Like
@ 2021-03-04 17:23       ` Sean Christopherson
  2021-03-05  6:35         ` Xu, Like
  0 siblings, 1 reply; 30+ messages in thread
From: Sean Christopherson @ 2021-03-04 17:23 UTC (permalink / raw)
  To: Xu, Like
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu

On Thu, Mar 04, 2021, Xu, Like wrote:
> On 2021/3/4 1:26, Sean Christopherson wrote:
> > On Wed, Mar 03, 2021, Like Xu wrote:
> > > New VMX controls bits for Arch LBR are added. When bit 21 in vmentry_ctrl
> > > is set, VM entry will write the value from the "Guest IA32_LBR_CTL" guest
> > > state field to IA32_LBR_CTL. When bit 26 in vmexit_ctrl is set, VM exit
> > > will clear IA32_LBR_CTL after the value has been saved to the "Guest
> > > IA32_LBR_CTL" guest state field.
> > ...
> > 
> > > @@ -2529,7 +2532,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
> > >   	      VM_EXIT_LOAD_IA32_EFER |
> > >   	      VM_EXIT_CLEAR_BNDCFGS |
> > >   	      VM_EXIT_PT_CONCEAL_PIP |
> > > -	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
> > > +	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
> > > +	      VM_EXIT_CLEAR_IA32_LBR_CTL;
> > So, how does MSR_ARCH_LBR_CTL get restored on the host?  What if the host wants
> > to keep _its_ LBR recording active while the guest is running?
> 
> Thank you!
> 
> I will add "host_lbrctlmsr" field to "struct vcpu_vmx" and
> repeat the update/get_debugctlmsr() stuff.

I am not remotely confident that tracking LBRCTL via vcpu_vmx is correct, and
I'm far less confident that the existing DEBUGCTL logic is correct.  As Jim
pointed out[*], intel_pmu_handle_irq() can run at any time, and it's not at all
clear to me that the DEBUGCTL coming out of the NMI handler is guaranteed to be
the same value going in.  Ditto for LBRCTL.

Actually, NMIs aside, KVM's DEBUGCTL handling is provably broken since writing
/sys/devices/cpu/freeze_on_smi is propagated to other CPUs via IRQ, and KVM
snapshots DEBUGCTL on vCPU load, i.e. runs with IRQs enabled long after grabbing
the value.

  WARNING: CPU: 5 PID: 0 at arch/x86/events/intel/core.c:4066 flip_smm_bit+0xb/0x30
  RIP: 0010:flip_smm_bit+0xb/0x30
  Call Trace:
   <IRQ>
   flush_smp_call_function_queue+0x118/0x1a0
   __sysvec_call_function+0x2c/0x90
   asm_call_irq_on_stack+0x12/0x20
   </IRQ>


So, rather than pile on more MSR handling that is at best dubious, and at worst
broken, I would like to see KVM properly integrate with perf to ensure KVM
restores the correct, fresh values of all MSRs that are owned by perf.  Or at
least add something that guarantees that intel_pmu_handle_irq() preserves the
MSRs.  As is, it's impossible to review these KVM changes without deep, deep
knowledge of what perf is doing.

https://lkml.kernel.org/r/20210209225653.1393771-1-jmattson@google.com
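
For example, a sketch of snapshotting the perf-owned MSRs with IRQs
disabled, immediately before entry, rather than at vCPU load (the
function and the host_lbrctlmsr field are illustrative, not existing
KVM code):

	/* Called from the VM-entry path with IRQs already disabled. */
	static void vmx_snapshot_perf_owned_msrs(struct vcpu_vmx *vmx)
	{
		lockdep_assert_irqs_disabled();

		vmx->host_debugctlmsr = get_debugctlmsr();
		if (boot_cpu_has(X86_FEATURE_ARCH_LBR))
			rdmsrl(MSR_ARCH_LBR_CTL, vmx->host_lbrctlmsr);
	}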

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR
  2021-03-04 16:12       ` Sean Christopherson
@ 2021-03-05  2:33         ` Xu, Like
  0 siblings, 0 replies; 30+ messages in thread
From: Xu, Like @ 2021-03-05  2:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Jim Mattson, Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu

On 2021/3/5 0:12, Sean Christopherson wrote:
> On Thu, Mar 04, 2021, Xu, Like wrote:
>> Hi Sean,
>>
>> Thanks for your detailed review on the patch set.
>>
>> On 2021/3/4 0:58, Sean Christopherson wrote:
>>> On Wed, Mar 03, 2021, Like Xu wrote:
>>>> @@ -348,10 +352,26 @@ static bool intel_pmu_handle_lbr_msrs_access(struct kvm_vcpu *vcpu,
>>>>    	return true;
>>>>    }
>>>> +/*
>>>> + * Check if the requested depth value is supported
>>>> + * based on the bits [0:7] of the guest cpuid.1c.eax.
>>>> + */
>>>> +static bool arch_lbr_depth_is_valid(struct kvm_vcpu *vcpu, u64 depth)
>>>> +{
>>>> +	struct kvm_cpuid_entry2 *best;
>>>> +
>>>> +	best = kvm_find_cpuid_entry(vcpu, 0x1c, 0);
>>>> +	if (best && depth && !(depth % 8))
>>> This is still wrong; it fails to weed out depth > 64.
>> How come? The test cases depth = {65, 127, 128} get #GP as expected.
> @depth is a u64, throw in a number that is a multiple of 8 and >= 520, and the
> "(1ULL << (depth / 8 - 1))" will trigger undefined behavior due to shifting
> beyond the capacity of a ULL / u64.

An extra question:

When we say "undefined behavior" for shifting beyond the width of a ULL,
do you mean that the actual behavior depends on the machine, the
architecture, or the compiler?

>
> Adding the "< 64" check would also allow dropping the " & 0xff" since the check
> would ensure the shift doesn't go beyond bit 7.  I'm not sure the cleverness is
> worth shaving a cycle, though.

Finally, how about:

     if (best && depth && (depth < 65) && !(depth & 7))
         return best->eax & BIT_ULL(depth / 8 - 1);

     return false;

Do you see room for optimization?

>
>>> Not that this is a hot path, but it's probably worth double checking that the
>>> compiler generates simple code for "depth % 8", e.g. it can be "(depth & 7)".
>> Emm, the "%" operation is quite normal over kernel code.
> So is "&" :-)  I was just pointing out that the compiler should optimize this,
> and it did.
>
>> if (best && depth && !(depth % 8))
>>     10659:       48 85 c0                test   rax,rax
>>     1065c:       74 c7                   je     10625 <intel_pmu_set_msr+0x65>
>>     1065e:       4d 85 e4                test   r12,r12
>>     10661:       74 c2                   je     10625 <intel_pmu_set_msr+0x65>
>>     10663:       41 f6 c4 07             test   r12b,0x7
>>     10667:       75 bc                   jne    10625 <intel_pmu_set_msr+0x65>
>>
>> It looks like the compiler does the right thing.
>> Do you see room for optimization?
>>
>>>> +		return (best->eax & 0xff) & (1ULL << (depth / 8 - 1));
> Actually, looking at this again, I would explicitly use BIT() instead of 1ULL
> (or BIT_ULL), since the shift must be 7 or less.
>
>>>> +
>>>> +	return false;
>>>> +}
>>>> +


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs
  2021-03-04 16:31       ` Sean Christopherson
@ 2021-03-05  2:57         ` Xu, Like
  0 siblings, 0 replies; 30+ messages in thread
From: Xu, Like @ 2021-03-05  2:57 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Peter Zijlstra, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, wei.w.wang, kvm, x86, linux-kernel, Like Xu

On 2021/3/5 0:31, Sean Christopherson wrote:
> Paolo, any thoughts on how to keep supported_xss aligned with supported_xcr0,
> without spreading the logic around too much?

 From 58be4152ced441395dfc439f446c5ad53bd48576 Mon Sep 17 00:00:00 2001
From: Like Xu <like.xu@linux.intel.com>
Date: Thu, 4 Mar 2021 13:21:38 +0800
Subject: [PATCH] KVM: x86: Refine the matching and clearing logic for supported_xss

"The existing clearing of supported_xss here is pointless".
Let's refine the code path in this way: initialize the supported_xss
with the filter of KVM_SUPPORTED_XSS mask and update its value in
a bit clear manner (rather than bit setting).

Before:
(1) kvm_arch_hardware_setup
     if (boot_cpu_has(X86_FEATURE_XSAVES))
         rdmsrl(MSR_IA32_XSS, host_xss);
     if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
         supported_xss = 0;
     else supported_xss &= host_xss;
(2) vmx_set_cpu_caps
     supported_xss = 0;
     if (!cpu_has_vmx_xsaves())
         kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
     else set available bits to supported_xss

After:
(1) kvm_arch_init
     if (boot_cpu_has(X86_FEATURE_XSAVES))
         rdmsrl(MSR_IA32_XSS, host_xss);
         supported_xss = host_xss & KVM_SUPPORTED_XSS;
(2) vmx_set_cpu_caps
     if (!cpu_has_vmx_xsaves())
         kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
         supported_xss = 0;
     else clear unavailable bits from supported_xss

Suggested-by: Sean Christopherson <seanjc@google.com>
Original-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Like Xu <like.xu@linux.intel.com>
---
  arch/x86/kvm/vmx/vmx.c |  5 +++--
  arch/x86/kvm/x86.c     | 13 +++++++------
  2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bc4bb49aaa9..8706323547c4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7288,9 +7288,10 @@ static __init void vmx_set_cpu_caps(void)
          kvm_cpu_cap_set(X86_FEATURE_UMIP);

      /* CPUID 0xD.1 */
-    supported_xss = 0;
-    if (!cpu_has_vmx_xsaves())
+    if (!cpu_has_vmx_xsaves()) {
          kvm_cpu_cap_clear(X86_FEATURE_XSAVES);
+        supported_xss = 0;
+    }

      /* CPUID 0x80000001 */
      if (!cpu_has_vmx_rdtscp())
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d773836ceb7a..99cb62035bb2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -205,6 +205,8 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
                  | XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
                  | XFEATURE_MASK_PKRU)

+#define KVM_SUPPORTED_XSS     0
+
  u64 __read_mostly host_efer;
  EXPORT_SYMBOL_GPL(host_efer);

@@ -8046,6 +8048,11 @@ int kvm_arch_init(void *opaque)
          supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
      }

+    if (boot_cpu_has(X86_FEATURE_XSAVES)) {
+        rdmsrl(MSR_IA32_XSS, host_xss);
+        supported_xss = host_xss & KVM_SUPPORTED_XSS;
+    }
+
      if (pi_inject_timer == -1)
          pi_inject_timer = housekeeping_enabled(HK_FLAG_TIMER);
  #ifdef CONFIG_X86_64
@@ -10421,9 +10428,6 @@ int kvm_arch_hardware_setup(void *opaque)

      rdmsrl_safe(MSR_EFER, &host_efer);

-    if (boot_cpu_has(X86_FEATURE_XSAVES))
-        rdmsrl(MSR_IA32_XSS, host_xss);
-
      r = ops->hardware_setup();
      if (r != 0)
          return r;
@@ -10431,9 +10435,6 @@ int kvm_arch_hardware_setup(void *opaque)
      memcpy(&kvm_x86_ops, ops->runtime_ops, sizeof(kvm_x86_ops));
      kvm_ops_static_call_update();

-    if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES))
-        supported_xss = 0;
-
  #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f)
      cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_);
  #undef __kvm_cpu_cap_has
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field
  2021-03-04 17:23       ` Sean Christopherson
@ 2021-03-05  6:35         ` Xu, Like
  0 siblings, 0 replies; 30+ messages in thread
From: Xu, Like @ 2021-03-05  6:35 UTC (permalink / raw)
  To: Sean Christopherson, Jim Mattson
  Cc: Peter Zijlstra, Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li,
	Joerg Roedel, Kan Liang, Dave Hansen, wei.w.wang,
	Borislav Petkov, kvm, x86, linux-kernel, Like Xu, Kleen, Andi

On 2021/3/5 1:23, Sean Christopherson wrote:
> On Thu, Mar 04, 2021, Xu, Like wrote:
>> On 2021/3/4 1:26, Sean Christopherson wrote:
>>> On Wed, Mar 03, 2021, Like Xu wrote:
>>>> New VMX controls bits for Arch LBR are added. When bit 21 in vmentry_ctrl
>>>> is set, VM entry will write the value from the "Guest IA32_LBR_CTL" guest
>>>> state field to IA32_LBR_CTL. When bit 26 in vmexit_ctrl is set, VM exit
>>>> will clear IA32_LBR_CTL after the value has been saved to the "Guest
>>>> IA32_LBR_CTL" guest state field.
>>> ...
>>>
>>>> @@ -2529,7 +2532,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
>>>>    	      VM_EXIT_LOAD_IA32_EFER |
>>>>    	      VM_EXIT_CLEAR_BNDCFGS |
>>>>    	      VM_EXIT_PT_CONCEAL_PIP |
>>>> -	      VM_EXIT_CLEAR_IA32_RTIT_CTL;
>>>> +	      VM_EXIT_CLEAR_IA32_RTIT_CTL |
>>>> +	      VM_EXIT_CLEAR_IA32_LBR_CTL;
>>> So, how does MSR_ARCH_LBR_CTL get restored on the host?  What if the host wants
>>> to keep _its_ LBR recording active while the guest is running?
>> Thank you!
>>
>> I will add "host_lbrctlmsr" field to "struct vcpu_vmx" and
>> repeat the update/get_debugctlmsr() stuff.
> I am not remotely confident that tracking LBRCTL via vcpu_vmx is correct, and
> I'm far less confident that the existing DEBUGCTL logic is correct.  As Jim
> pointed out[*], intel_pmu_handle_irq() can run at any time, and it's not at all
> clear to me that the DEBUGCTL coming out of the NMI handler is guaranteed to be
> the same value going in.  Ditto for LBRCTL.

It's not true for "Ditto for LBRCTL".

Because the usage of ARCH_LBR_CTL is specified for LBR,
not the shared case of DEBUGCTL. And all LBR events created from
KVM or host perf syscall are all under the control of host perf subsystem.

The irq handler would restore the original value of the ARCH_LBR_CTL
even it's called after the KVM snapshots DEBUCTL on vCPU load.
The change is transparent to the update_lbrctlmsr() and get_lbrctlmsr().

> Actually, NMIs aside, KVM's DEBUGCTL handling is provably broken since writing
> /sys/devices/cpu/freeze_on_smi is propagated to other CPUs via IRQ, and KVM
> snapshots DEBUGCTL on vCPU load, i.e. runs with IRQs enabled long after grabbing
> the value.
>
>    WARNING: CPU: 5 PID: 0 at arch/x86/events/intel/core.c:4066 flip_smm_bit+0xb/0x30
>    RIP: 0010:flip_smm_bit+0xb/0x30
>    Call Trace:
>     <IRQ>
>     flush_smp_call_function_queue+0x118/0x1a0
>     __sysvec_call_function+0x2c/0x90
>     asm_call_irq_on_stack+0x12/0x20
>     </IRQ>

A bug with the keyword "flip_smm_bit" has not appeared on the
mailing list before.
Would you mind sharing test cases or more details about the steps
to reproduce?

>
> So, rather than pile on more MSR handling that is at best dubious, and at worst
> broken, I would like to see KVM properly integrate with perf to ensure KVM
> restores the correct, fresh values of all MSRs that are owned by perf.  Or at
> least add something that guarantees that intel_pmu_handle_irq() preserves the
> MSRs.  As is, it's impossible to review these KVM changes without deep, deep
> knowledge of what perf is doing.

Jim complained more about the inconsistent maintenance of
MSR_IA32_PEBS_ENABLE between KVM and the perf subsystem.

The issue bothers the host due to the subsystem integration,
but the guest's use of PEBS will be safe and reliable.

We could cover more details in the guest PEBS enabling thread.

>
> https://lkml.kernel.org/r/20210209225653.1393771-1-jmattson@google.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread

Thread overview: 30+ messages
2021-03-03 13:57 [PATCH v3 0/9] KVM: x86/pmu: Guest Architectural LBR Enabling Like Xu
2021-03-03 13:57 ` [PATCH v3 1/9] perf/x86/intel: Fix a comment about guest LBR support Like Xu
2021-03-03 16:49   ` Sean Christopherson
2021-03-03 13:57 ` [PATCH v3 2/9] perf/x86/lbr: Simplify the exposure check for the LBR_INFO registers Like Xu
2021-03-03 13:57 ` [PATCH v3 3/9] perf/x86/lbr: Skip checking for the existence of LBR_TOS for Arch LBR Like Xu
2021-03-03 13:57 ` [PATCH v3 4/9] perf/x86/lbr: Use GFP_ATOMIC for cpuc->lbr_xsave memory allocation Like Xu
2021-03-03 13:57 ` [PATCH v3 5/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_DEPTH emulation for Arch LBR Like Xu
2021-03-03 16:58   ` Sean Christopherson
2021-03-04  2:30     ` Xu, Like
2021-03-04 16:12       ` Sean Christopherson
2021-03-05  2:33         ` Xu, Like
2021-03-03 13:57 ` [PATCH v3 6/9] KVM: vmx/pmu: Add MSR_ARCH_LBR_CTL " Like Xu
2021-03-03 17:19   ` Sean Christopherson
2021-03-04  2:58     ` Xu, Like
2021-03-04 16:25       ` Sean Christopherson
2021-03-03 13:57 ` [PATCH v3 7/9] KVM: vmx/pmu: Add Arch LBR emulation and its VMCS field Like Xu
2021-03-03 17:26   ` Sean Christopherson
2021-03-04  3:02     ` Xu, Like
2021-03-04 17:23       ` Sean Christopherson
2021-03-05  6:35         ` Xu, Like
2021-03-03 13:57 ` [PATCH v3 8/9] KVM: x86: Expose Architectural LBR CPUID leaf Like Xu
2021-03-03 17:34   ` Sean Christopherson
2021-03-03 18:01     ` Sean Christopherson
2021-03-03 13:57 ` [PATCH v3 9/9] KVM: x86: Add XSAVE Support for Architectural LBRs Like Xu
2021-03-03 18:03   ` Sean Christopherson
2021-03-04  3:43     ` Like Xu
2021-03-04 16:31       ` Sean Christopherson
2021-03-05  2:57         ` Xu, Like
2021-03-03 13:57 ` [kvm-unit-tests PATCH] x86: Update guest LBR tests for Architectural LBR Like Xu
2021-03-03 18:05   ` Sean Christopherson
