* [PATCH 00/13] Implement support for IBS virtualization
@ 2023-09-04  9:53 Manali Shukla
  2023-09-04  9:53 ` [PATCH 01/13] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

Add support for IBS virtualization (VIBS). The VIBS feature allows the
guest to collect IBS samples without exiting the guest.  There are two
parts to it [1]:
- Virtualizing the IBS register state.
- Ensuring the IBS interrupt is handled in the guest without exiting
  to the hypervisor.

VIBS requires the use of AVIC or NMI virtualization for delivery of
a virtualized interrupt from IBS hardware in the guest [1].

While IBS collects data for the sampled IBS fetch/op interval, the
VIBS hardware signals a virtual NMI (VNMI), but the source of the
VNMI differs depending on whether AVIC is enabled:
- When AVIC is disabled, the virtual NMI is hardware accelerated.
- When AVIC is enabled, the virtual NMI is accelerated via AVIC using
  an extended LVT register.

The local interrupts are extended with additional LVT registers to
allow extra interrupt sources, such as instruction-based sampling [3].

Note that, since the IBS registers are swap type C [2], the hypervisor
is responsible for saving and restoring the host IBS state. It does so
only when IBS is active on the host, to avoid unnecessary
rdmsrs/wrmsrs. The hypervisor needs to disable host IBS before saving
its state and entering the guest. After a guest exit, the hypervisor
needs to restore the host IBS state and re-enable IBS.

[1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

[2]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
     of VMCB, Table B-3 Swap Types.

[3]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 16.4.5
     Extended Interrupts.

Testing done:
- The following tests were executed in the guest:
  sudo perf record -e ibs_op// -c 100000 -a
  sudo perf record -e ibs_op// -c 100000 -C 10
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -a --raw-samples
  sudo perf record -e ibs_op/cnt_ctl=1,l3missonly=1/ -c 100000 -a
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -p 1234
  sudo perf record -e ibs_op/cnt_ctl=1/ -c 100000 -- ls
  sudo ./tools/perf/perf record -e ibs_op// -e ibs_fetch// -a --raw-samples -c 100000
  sudo perf report
  sudo perf script
  sudo perf report -D | grep -P "LdOp 1.*StOp 0" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1" | wc -l
  sudo perf report -D | grep -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | wc -l
  sudo perf report -D | grep -B1 -P "LdOp 1.*StOp 0.*DcMiss 1.*L2Miss 1" | grep -P "DataSrc ([02-9]|1[0-2])=" | wc -l

- The following nested guest combinations were tested manually:
  ----------------------------
  | Collected IBS Samples in |
  ----------------------------
  |   L0   |   L1   |   L2   |
  ----------------------------
  |   Y    |   Y    |   Y    |
  |   Y    |   Y    |   N    |
  |   Y    |   N    |   Y    |
  |   Y    |   N    |   N    |
  |   N    |   Y    |   Y    |
  |   N    |   Y    |   N    |
  |   N    |   N    |   Y    |
  ----------------------------

The QEMU changes can be found at:
https://github.com/Kullu14/qemu/tree/qemu_vibs_branch

QEMU command line to enable IBS virtualization:
qemu-system-x86_64 -enable-kvm -cpu EPYC-Genoa,+svm,+ibs,+extapic,+extlvt,+vibs \ ..

base-commit: 3b2ac85b3d2954b583dc2039825ad76eda8516a9 (kvm_x86 misc)

On top of the base commit, the patch at
https://lore.kernel.org/all/20230717041903.85480-1-manali.shukla@amd.com
is applied, and then the VIBS patch series is applied.

Manali Shukla (8):
  KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for
    extapic
  KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
  KVM: x86: Extend CPUID range to include new leaf
  perf/x86/amd: Add framework to save/restore host IBS state
  x86/cpufeatures: Add CPUID feature bit for VIBS in SEV-ES guest
  KVM: SVM: Add support for IBS virtualization for SEV-ES guests
  KVM: SVM: Enable IBS virtualization on non SEV-ES and SEV-ES guests
  KVM: x86: nSVM: Implement support for nested IBS virtualization

Santosh Shukla (5):
  x86/cpufeatures: Add CPUID feature bit for Extended LVT
  KVM: x86: Add emulation support for Extended LVT registers
  x86/cpufeatures: Add CPUID feature bit for virtualized IBS
  KVM: SVM: Extend VMCB area for virtualized IBS registers
  KVM: SVM: add support for IBS virtualization for non SEV-ES guests

 Documentation/virt/kvm/api.rst     |  23 ++++
 arch/x86/events/amd/Makefile       |   2 +-
 arch/x86/events/amd/ibs.c          |  23 ++++
 arch/x86/events/amd/vibs.c         | 101 ++++++++++++++
 arch/x86/include/asm/apicdef.h     |  14 ++
 arch/x86/include/asm/cpufeatures.h |   3 +
 arch/x86/include/asm/perf_event.h  |  27 ++++
 arch/x86/include/asm/svm.h         |  34 ++++-
 arch/x86/include/uapi/asm/kvm.h    |   5 +
 arch/x86/kvm/cpuid.c               |  11 ++
 arch/x86/kvm/governed_features.h   |   1 +
 arch/x86/kvm/lapic.c               |  78 ++++++++++-
 arch/x86/kvm/lapic.h               |   7 +-
 arch/x86/kvm/reverse_cpuid.h       |  15 +++
 arch/x86/kvm/svm/avic.c            |   4 +
 arch/x86/kvm/svm/nested.c          |  23 ++++
 arch/x86/kvm/svm/sev.c             |  10 ++
 arch/x86/kvm/svm/svm.c             | 207 +++++++++++++++++++++++++++++
 arch/x86/kvm/svm/svm.h             |   5 +-
 arch/x86/kvm/x86.c                 |  24 ++--
 include/uapi/linux/kvm.h           |  10 ++
 21 files changed, 603 insertions(+), 24 deletions(-)
 create mode 100644 arch/x86/events/amd/vibs.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/13] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-12  1:47   ` Chao Gao
  2023-09-04  9:53 ` [PATCH 02/13] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

There are four additional extended LVT registers available in the
extended APIC register space, which can be used for additional
interrupt sources such as instruction-based sampling.

Please refer to the AMD Programmer's Manual Volume 2, Section 16.4.5,
for more details on extapic:
https://bugzilla.kernel.org/attachment.cgi?id=304653

Add two new vcpu-based IOCTLs to save and restore the local APIC
registers with the extended APIC register space for a single vcpu.
They work the same as the KVM_GET_LAPIC and KVM_SET_LAPIC IOCTLs; the
only difference is the size of the APIC page which is copied/restored
by the kernel. For the KVM_GET_LAPIC_W_EXTAPIC and
KVM_SET_LAPIC_W_EXTAPIC IOCTLs, the kernel copies/restores the APIC
page including the extended APIC register space located at APIC
offsets 400h-530h.

The KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC IOCTLs are
used when the extended APIC is enabled in the guest.

Document KVM_GET_LAPIC_W_EXTAPIC, KVM_SET_LAPIC_W_EXTAPIC ioctls.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 Documentation/virt/kvm/api.rst  | 23 +++++++++++++++++++++++
 arch/x86/include/uapi/asm/kvm.h |  5 +++++
 arch/x86/kvm/lapic.c            | 12 +++++++-----
 arch/x86/kvm/lapic.h            |  6 ++++--
 arch/x86/kvm/x86.c              | 24 +++++++++++++-----------
 include/uapi/linux/kvm.h        | 10 ++++++++++
 6 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 73db30cb60fb..7239d4f1ecf3 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1961,6 +1961,18 @@ error.
 Reads the Local APIC registers and copies them into the input argument.  The
 data format and layout are the same as documented in the architecture manual.
 
+::
+
+  #define KVM_APIC_EXT_REG_SIZE 0x540
+  struct kvm_lapic_state_w_extapic {
+        __u8 regs[KVM_APIC_EXT_REG_SIZE];
+  };
+
+Applications should use the KVM_GET_LAPIC_W_EXTAPIC ioctl if the extended APIC
+is enabled. KVM_GET_LAPIC_W_EXTAPIC reads the Local APIC registers, including
+the extended APIC register space located at offsets 400h-530h, and copies them
+into the input argument.
+
 If KVM_X2APIC_API_USE_32BIT_IDS feature of KVM_CAP_X2APIC_API is
 enabled, then the format of APIC_ID register depends on the APIC mode
 (reported by MSR_IA32_APICBASE) of its VCPU.  x2APIC stores APIC ID in
@@ -1992,6 +2004,17 @@ always uses xAPIC format.
 Copies the input argument into the Local APIC registers.  The data format
 and layout are the same as documented in the architecture manual.
 
+::
+
+  #define KVM_APIC_EXT_REG_SIZE 0x540
+  struct kvm_lapic_state_w_extapic {
+        __u8 regs[KVM_APIC_EXT_REG_SIZE];
+  };
+
+Applications should use the KVM_SET_LAPIC_W_EXTAPIC ioctl if the extended APIC
+is enabled. KVM_SET_LAPIC_W_EXTAPIC copies the input argument, including the
+extended APIC register space, into the Local APIC registers.
+
 The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
 regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
 See the note in KVM_GET_LAPIC.
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 1a6a1f987949..d5bed64fd73d 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -123,6 +123,11 @@ struct kvm_lapic_state {
 	char regs[KVM_APIC_REG_SIZE];
 };
 
+#define KVM_APIC_EXT_REG_SIZE 0x540
+struct kvm_lapic_state_w_extapic {
+	__u8 regs[KVM_APIC_EXT_REG_SIZE];
+};
+
 struct kvm_segment {
 	__u64 base;
 	__u32 limit;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dcd60b39e794..7c1bd8594f1b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2921,7 +2921,7 @@ int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu)
 }
 
 static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
-		struct kvm_lapic_state *s, bool set)
+		struct kvm_lapic_state_w_extapic *s, bool set)
 {
 	if (apic_x2apic_mode(vcpu->arch.apic)) {
 		u32 *id = (u32 *)(s->regs + APIC_ID);
@@ -2958,9 +2958,10 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size)
 {
-	memcpy(s->regs, vcpu->arch.apic->regs, sizeof(*s));
+	memcpy(s->regs, vcpu->arch.apic->regs, size);
 
 	/*
 	 * Get calculated timer current count for remaining timer period (if
@@ -2972,7 +2973,8 @@ int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 	return kvm_apic_state_fixup(vcpu, s, false);
 }
 
-int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	int r;
@@ -2986,7 +2988,7 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
 		kvm_recalculate_apic_map(vcpu->kvm);
 		return r;
 	}
-	memcpy(vcpu->arch.apic->regs, s->regs, sizeof(*s));
+	memcpy(vcpu->arch.apic->regs, s->regs, size);
 
 	atomic_set_release(&apic->vcpu->kvm->arch.apic_map_dirty, DIRTY);
 	kvm_recalculate_apic_map(vcpu->kvm);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 0a0ea4b5dd8c..ad6c48938733 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -121,8 +121,10 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high);
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu);
 int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
-int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
-int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s);
+int kvm_apic_get_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size);
+int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state_w_extapic *s,
+		       unsigned int size);
 enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu);
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ddab7d0bb52b..e80a6d598753 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4925,19 +4925,19 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 }
 
 static int kvm_vcpu_ioctl_get_lapic(struct kvm_vcpu *vcpu,
-				    struct kvm_lapic_state *s)
+				    struct kvm_lapic_state_w_extapic *s, unsigned int size)
 {
 	static_call_cond(kvm_x86_sync_pir_to_irr)(vcpu);
 
-	return kvm_apic_get_state(vcpu, s);
+	return kvm_apic_get_state(vcpu, s, size);
 }
 
 static int kvm_vcpu_ioctl_set_lapic(struct kvm_vcpu *vcpu,
-				    struct kvm_lapic_state *s)
+				    struct kvm_lapic_state_w_extapic *s, unsigned int size)
 {
 	int r;
 
-	r = kvm_apic_set_state(vcpu, s);
+	r = kvm_apic_set_state(vcpu, s, size);
 	if (r)
 		return r;
 	update_cr8_intercept(vcpu);
@@ -5636,7 +5636,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 	int r;
 	union {
 		struct kvm_sregs2 *sregs2;
-		struct kvm_lapic_state *lapic;
+		struct kvm_lapic_state_w_extapic *lapic;
 		struct kvm_xsave *xsave;
 		struct kvm_xcrs *xcrs;
 		void *buffer;
@@ -5646,36 +5646,38 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 	u.buffer = NULL;
 	switch (ioctl) {
+	case KVM_GET_LAPIC_W_EXTAPIC:
 	case KVM_GET_LAPIC: {
 		r = -EINVAL;
 		if (!lapic_in_kernel(vcpu))
 			goto out;
-		u.lapic = kzalloc(sizeof(struct kvm_lapic_state),
-				GFP_KERNEL_ACCOUNT);
+		u.lapic = kzalloc(_IOC_SIZE(ioctl), GFP_KERNEL_ACCOUNT);
 
 		r = -ENOMEM;
 		if (!u.lapic)
 			goto out;
-		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic);
+		r = kvm_vcpu_ioctl_get_lapic(vcpu, u.lapic, _IOC_SIZE(ioctl));
 		if (r)
 			goto out;
 		r = -EFAULT;
-		if (copy_to_user(argp, u.lapic, sizeof(struct kvm_lapic_state)))
+		if (copy_to_user(argp, u.lapic, _IOC_SIZE(ioctl)))
 			goto out;
 		r = 0;
 		break;
 	}
+	case KVM_SET_LAPIC_W_EXTAPIC:
 	case KVM_SET_LAPIC: {
 		r = -EINVAL;
 		if (!lapic_in_kernel(vcpu))
 			goto out;
-		u.lapic = memdup_user(argp, sizeof(*u.lapic));
+		u.lapic = memdup_user(argp, _IOC_SIZE(ioctl));
+
 		if (IS_ERR(u.lapic)) {
 			r = PTR_ERR(u.lapic);
 			goto out_nofree;
 		}
 
-		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic);
+		r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic, _IOC_SIZE(ioctl));
 		break;
 	}
 	case KVM_INTERRUPT: {
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 13065dd96132..e1dc04e0bf44 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1591,6 +1591,16 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_FPU               _IOW(KVMIO,  0x8d, struct kvm_fpu)
 #define KVM_GET_LAPIC             _IOR(KVMIO,  0x8e, struct kvm_lapic_state)
 #define KVM_SET_LAPIC             _IOW(KVMIO,  0x8f, struct kvm_lapic_state)
+/*
+ * Added to save/restore local APIC registers with extended APIC (extapic)
+ * register space.
+ *
+ * Qemu emulates extapic logic only when KVM enables extapic functionality via a
+ * KVM capability. If Qemu sets extapic registers but KVM doesn't expose the
+ * extapic capability, Qemu falls back to KVM_GET_LAPIC and KVM_SET_LAPIC.
+ */
+#define KVM_GET_LAPIC_W_EXTAPIC   _IOR(KVMIO,  0x8e, struct kvm_lapic_state_w_extapic)
+#define KVM_SET_LAPIC_W_EXTAPIC   _IOW(KVMIO,  0x8f, struct kvm_lapic_state_w_extapic)
 #define KVM_SET_CPUID2            _IOW(KVMIO,  0x90, struct kvm_cpuid2)
 #define KVM_GET_CPUID2            _IOWR(KVMIO, 0x91, struct kvm_cpuid2)
 /* Available with KVM_CAP_VAPIC */
-- 
2.34.1



* [PATCH 02/13] x86/cpufeatures: Add CPUID feature bit for Extended LVT
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
  2023-09-04  9:53 ` [PATCH 01/13] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-04  9:53 ` [PATCH 03/13] KVM: x86: Add emulation support for Extended LVT registers Manali Shukla
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

From: Santosh Shukla <santosh.shukla@amd.com>

Local interrupts can be extended to include more LVT registers in
order to allow additional interrupt sources, like Instruction Based
Sampling (IBS).

The Extended APIC feature register indicates the number of extended
Local Vector Table (LVT) registers in the local APIC.  Currently,
there are four extended LVT registers available, located at APIC
offsets 500h-530h.

The EXTLVT feature bit changes the behavior associated with reading
and writing an extended LVT register. When the EXTLVT feature is
enabled, a write to an extended LVT register changes from a fault
style #VMEXIT to a trap style #VMEXIT and a read of an extended LVT
register no longer triggers a #VMEXIT.

Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 for
more details on EXTLVT.
https://bugzilla.kernel.org/attachment.cgi?id=304653

Presence of the EXTLVT feature is indicated via CPUID function
0x8000000A_EDX[27].

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 7b4ecbf78d8b..2e4624fa6e4e 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -374,6 +374,7 @@
 #define X86_FEATURE_X2AVIC		(15*32+18) /* Virtual x2apic */
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* Virtual NMI */
+#define X86_FEATURE_EXTLVT		(15*32+27) /* "" EXTLVT */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
-- 
2.34.1



* [PATCH 03/13] KVM: x86: Add emulation support for Extended LVT registers
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
  2023-09-04  9:53 ` [PATCH 01/13] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
  2023-09-04  9:53 ` [PATCH 02/13] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-12  2:36   ` Chao Gao
  2023-09-04  9:53 ` [PATCH 04/13] x86/cpufeatures: Add CPUID feature bit for virtualized IBS Manali Shukla
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

From: Santosh Shukla <santosh.shukla@amd.com>

The local interrupts are extended with additional LVT registers in
order to allow extra interrupt sources, such as Instruction-Based
Sampling (IBS).

Currently there are four additional LVT registers defined, located at
APIC offsets 500h-530h.

The AMD IBS driver is designed to use EXTLVT (extended interrupt
local vector table) registers by default during driver
initialization, so the extended LVT registers must be emulated for
the guest IBS driver to initialize successfully.

Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
https://bugzilla.kernel.org/attachment.cgi?id=304653 for more details
on EXTLVT.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/apicdef.h | 14 ++++++++
 arch/x86/kvm/lapic.c           | 66 ++++++++++++++++++++++++++++++++--
 arch/x86/kvm/lapic.h           |  1 +
 arch/x86/kvm/svm/avic.c        |  4 +++
 4 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 4b125e5b3187..ac50919d10be 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -139,6 +139,20 @@
 #define		APIC_EILVT_MSG_EXT	0x7
 #define		APIC_EILVT_MASKED	(1 << 16)
 
+/*
+ * Initialize extended APIC registers to the default value when guest is started
+ * and EXTAPIC feature is enabled on the guest.
+ *
+ * APIC_EFEAT is a read only Extended APIC feature register, whose default value
+ * is 0x00040007.
+ *
+ * APIC_ECTRL is a read-write Extended APIC control register, whose default value
+ * is 0x0.
+ */
+
+#define		APIC_EFEAT_DEFAULT	0x00040007
+#define		APIC_ECTRL_DEFAULT	0x0
+
 #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
 #define APIC_BASE_MSR		0x800
 #define APIC_X2APIC_ID_MSR	0x802
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 7c1bd8594f1b..88985c481fe8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1599,9 +1599,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
 }
 
 #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
+#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))
 #define APIC_REGS_MASK(first, count) \
 	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
 
+#define APIC_LAST_REG_OFFSET		0x3f0
+#define APIC_EXT_LAST_REG_OFFSET	0x530
+
 u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
 {
 	/* Leave bits '0' for reserved and write-only registers. */
@@ -1643,6 +1647,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
 static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 			      void *data)
 {
+	u64 valid_reg_ext_mask = 0;
+	unsigned int last_reg = APIC_LAST_REG_OFFSET;
 	unsigned char alignment = offset & 0xf;
 	u32 result;
 
@@ -1652,13 +1658,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 	 */
 	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
 
+	/*
+	 * The local interrupts are extended to include LVT registers to allow
+	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
+	 * The extended APIC registers are located at APIC offsets 400h-530h.
+	 */
+	if (guest_cpuid_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
+		valid_reg_ext_mask =
+			APIC_REG_EXT_MASK(APIC_EFEAT) |
+			APIC_REG_EXT_MASK(APIC_ECTRL) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
+			APIC_REG_EXT_MASK(APIC_EILVTn(3));
+		last_reg = APIC_EXT_LAST_REG_OFFSET;
+	}
+
 	if (alignment + len > 4)
 		return 1;
 
-	if (offset > 0x3f0 ||
-	    !(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
+	if (offset > last_reg)
 		return 1;
 
+	switch (offset) {
+	/*
+	 * Section 16.3.2 in the AMD Programmer's Manual Volume 2 states:
+	 * "APIC registers are aligned to 16-byte offsets and must be accessed
+	 * using naturally-aligned DWORD size read and writes."
+	 */
+	case KVM_APIC_REG_SIZE ... KVM_APIC_EXT_REG_SIZE - 16:
+		if (!(valid_reg_ext_mask & APIC_REG_EXT_MASK(offset)))
+			return 1;
+		break;
+	default:
+		if (!(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
+			return 1;
+
+	}
+
 	result = __apic_read(apic, offset & ~0xf);
 
 	trace_kvm_apic_read(offset, result);
@@ -2386,6 +2423,12 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 		else
 			kvm_apic_send_ipi(apic, APIC_DEST_SELF | val, 0);
 		break;
+	case APIC_EILVTn(0):
+	case APIC_EILVTn(1):
+	case APIC_EILVTn(2):
+	case APIC_EILVTn(3):
+		kvm_lapic_set_reg(apic, reg, val);
+		break;
 	default:
 		ret = 1;
 		break;
@@ -2664,6 +2707,25 @@ void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
 	kvm_vcpu_srcu_read_lock(vcpu);
 }
 
+/*
+ * Initialize extended APIC registers to the default value when guest is
+ * started. The extended APIC registers should only be initialized when the
+ * EXTAPIC feature is enabled on the guest.
+ */
+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu)
+{
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	int i;
+
+	if (guest_cpuid_has(vcpu, X86_FEATURE_EXTAPIC)) {
+		kvm_lapic_set_reg(apic, APIC_EFEAT, APIC_EFEAT_DEFAULT);
+		kvm_lapic_set_reg(apic, APIC_ECTRL, APIC_ECTRL_DEFAULT);
+		for (i = 0; i < APIC_EILVT_NR_MAX; i++)
+			kvm_lapic_set_reg(apic, APIC_EILVTn(i), APIC_EILVT_MASKED);
+	}
+}
+EXPORT_SYMBOL_GPL(kvm_apic_init_eilvt_regs);
+
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index ad6c48938733..b0c7393cd6af 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -93,6 +93,7 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
 int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
 int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
 void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu);
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
 void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index cfc8ab773025..081075674b1d 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -679,6 +679,10 @@ static bool is_avic_unaccelerated_access_trap(u32 offset)
 	case APIC_LVTERR:
 	case APIC_TMICT:
 	case APIC_TDCR:
+	case APIC_EILVTn(0):
+	case APIC_EILVTn(1):
+	case APIC_EILVTn(2):
+	case APIC_EILVTn(3):
 		ret = true;
 		break;
 	default:
-- 
2.34.1



* [PATCH 04/13] x86/cpufeatures: Add CPUID feature bit for virtualized IBS
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (2 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 03/13] KVM: x86: Add emulation support for Extended LVT registers Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-04  9:53 ` [PATCH 05/13] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

From: Santosh Shukla <santosh.shukla@amd.com>

The virtualized IBS (VIBS) feature allows the guest to collect IBS
samples without exiting the guest.

Presence of the VIBS feature is indicated via CPUID function
0x8000000A_EDX[26].

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 2e4624fa6e4e..8f92fa6d8319 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -374,6 +374,7 @@
 #define X86_FEATURE_X2AVIC		(15*32+18) /* Virtual x2apic */
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* Virtual NMI */
+#define X86_FEATURE_VIBS		(15*32+26) /* "" Virtual IBS */
 #define X86_FEATURE_EXTLVT		(15*32+27) /* "" EXTLVT */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
 
-- 
2.34.1



* [PATCH 05/13] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (3 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 04/13] x86/cpufeatures: Add CPUID feature bit for virtualized IBS Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-04  9:53 ` [PATCH 06/13] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

Add a KVM-only leaf for AMD's Instruction-Based Sampling (IBS)
capabilities. The IBS capability bits are added to this KVM-only leaf
so that KVM can set them for the guest when the IBS feature bit is
enabled on the guest.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/kvm/reverse_cpuid.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h
index b81650678375..c6386c431fa6 100644
--- a/arch/x86/kvm/reverse_cpuid.h
+++ b/arch/x86/kvm/reverse_cpuid.h
@@ -15,6 +15,7 @@ enum kvm_only_cpuid_leafs {
 	CPUID_12_EAX	 = NCAPINTS,
 	CPUID_7_1_EDX,
 	CPUID_8000_0007_EDX,
+	CPUID_8000_001B_EAX,
 	CPUID_8000_0022_EAX,
 	NR_KVM_CPU_CAPS,
 
@@ -52,6 +53,19 @@ enum kvm_only_cpuid_leafs {
 /* CPUID level 0x80000022 (EAX) */
 #define KVM_X86_FEATURE_PERFMON_V2	KVM_X86_FEATURE(CPUID_8000_0022_EAX, 0)
 
+/* AMD defined Instruction-Based Sampling capabilities. CPUID level 0x8000001B (EAX). */
+#define X86_FEATURE_IBS_AVAIL		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 0)
+#define X86_FEATURE_IBS_FETCHSAM	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 1)
+#define X86_FEATURE_IBS_OPSAM		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 2)
+#define X86_FEATURE_IBS_RDWROPCNT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 3)
+#define X86_FEATURE_IBS_OPCNT		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 4)
+#define X86_FEATURE_IBS_BRNTRGT		KVM_X86_FEATURE(CPUID_8000_001B_EAX, 5)
+#define X86_FEATURE_IBS_OPCNTEXT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 6)
+#define X86_FEATURE_IBS_RIPINVALIDCHK	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 7)
+#define X86_FEATURE_IBS_OPBRNFUSE	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 8)
+#define X86_FEATURE_IBS_FETCHCTLEXTD	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 9)
+#define X86_FEATURE_IBS_ZEN4_EXT	KVM_X86_FEATURE(CPUID_8000_001B_EAX, 11)
+
 struct cpuid_reg {
 	u32 function;
 	u32 index;
@@ -80,6 +94,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0007_EDX] = {0x80000007, 0, CPUID_EDX},
 	[CPUID_8000_0021_EAX] = {0x80000021, 0, CPUID_EAX},
 	[CPUID_8000_0022_EAX] = {0x80000022, 0, CPUID_EAX},
+	[CPUID_8000_001B_EAX] = {0x8000001b, 0, CPUID_EAX},
 };
 
 /*
-- 
2.34.1



* [PATCH 06/13] KVM: x86: Extend CPUID range to include new leaf
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (4 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 05/13] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-12  2:46   ` Chao Gao
  2023-09-04  9:53 ` [PATCH 07/13] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

CPUID leaf 0x8000001b (EAX) provides information about
Instruction-Based Sampling capabilities on AMD platforms. A complete
description of the 0x8000001b CPUID leaf is available in the AMD64
Architecture Programmer's Manual, Volume 3, Appendix E, Section E.4.13.
https://bugzilla.kernel.org/attachment.cgi?id=304655
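The EAX bit layout of this leaf can be sketched as a tiny user-space
decoder. This is purely illustrative: the enum names mirror the
KVM_X86_FEATURE defines added by this series but are not kernel
symbols, and real code would query CPUID on AMD hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID 0x8000001B EAX, matching the feature defines
 * above (bit 10 is reserved; bit 11 is the Zen4 extension bit). */
enum ibs_cap_bit {
	IBS_CAP_AVAIL         = 0,
	IBS_CAP_FETCHSAM      = 1,
	IBS_CAP_OPSAM         = 2,
	IBS_CAP_RDWROPCNT     = 3,
	IBS_CAP_OPCNT         = 4,
	IBS_CAP_BRNTRGT       = 5,
	IBS_CAP_OPCNTEXT      = 6,
	IBS_CAP_RIPINVALIDCHK = 7,
	IBS_CAP_OPBRNFUSE     = 8,
	IBS_CAP_FETCHCTLEXTD  = 9,
	IBS_CAP_ZEN4_EXT      = 11,
};

/* Test a single capability bit in the EAX value returned by
 * CPUID function 0x8000001B. */
static bool ibs_cap_has(uint32_t eax, enum ibs_cap_bit bit)
{
	return (eax >> bit) & 1;
}
```

For example, an EAX value of 0x3FF would report everything through
IBS_CAP_FETCHCTLEXTD but not the Zen4 extensions.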

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/kvm/cpuid.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0544e30b4946..1f4d505fb69d 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -771,6 +771,12 @@ void kvm_set_cpu_caps(void)
 		F(PERFMON_V2)
 	);
 
+	/*
+	 * Hide all IBS related features by default; they will be enabled
+	 * automatically when IBS virtualization is enabled.
+	 */
+	kvm_cpu_cap_init_kvm_defined(CPUID_8000_001B_EAX, 0);
+
 	/*
 	 * Synthesize "LFENCE is serializing" into the AMD-defined entry in
 	 * KVM's supported CPUID if the feature is reported as supported by the
@@ -1252,6 +1258,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 		entry->eax = entry->ebx = entry->ecx = 0;
 		entry->edx = 0; /* reserved */
 		break;
+	/* AMD IBS capability */
+	case 0x8000001B:
+		entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
+		cpuid_entry_override(entry, CPUID_8000_001B_EAX);
+		break;
 	case 0x8000001F:
 		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
-- 
2.34.1



* [PATCH 07/13] KVM: SVM: Extend VMCB area for virtualized IBS registers
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (5 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 06/13] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-12  2:50   ` Chao Gao
  2023-09-04  9:53 ` [PATCH 08/13] perf/x86/amd: Add framework to save/restore host IBS state Manali Shukla
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

From: Santosh Shukla <santosh.shukla@amd.com>

VMCB state save is extended to hold guest values of the fetch and op
IBS registers.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index dee9fa91120b..4096d2f68770 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -346,6 +346,19 @@ struct vmcb_save_area {
 	u64 last_excp_to;
 	u8 reserved_0x298[72];
 	u64 spec_ctrl;		/* Guest version of SPEC_CTRL at 0x2E0 */
+	u8 reserved_0x2e8[904];
+	u8 lbr_stack_from_to[256];
+	u64 lbr_select;
+	u64 ibs_fetch_ctl;
+	u64 ibs_fetch_linear_addr;
+	u64 ibs_op_ctl;
+	u64 ibs_op_rip;
+	u64 ibs_op_data;
+	u64 ibs_op_data2;
+	u64 ibs_op_data3;
+	u64 ibs_dc_linear_addr;
+	u64 ibs_br_target;
+	u64 ibs_fetch_extd_ctl;
 } __packed;
 
 /* Save area definition for SEV-ES and SEV-SNP guests */
@@ -512,7 +525,7 @@ struct ghcb {
 } __packed;
 
 
-#define EXPECTED_VMCB_SAVE_AREA_SIZE		744
+#define EXPECTED_VMCB_SAVE_AREA_SIZE		1992
 #define EXPECTED_GHCB_SAVE_AREA_SIZE		1032
 #define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1648
 #define EXPECTED_VMCB_CONTROL_AREA_SIZE		1024
@@ -537,6 +550,7 @@ static inline void __unused_size_checks(void)
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
+	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x2e8);
 
 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
-- 
2.34.1



* [PATCH 08/13] perf/x86/amd: Add framework to save/restore host IBS state
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (6 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 07/13] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-05 14:54   ` Tom Lendacky
  2023-09-04  9:53 ` [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests Manali Shukla
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

Since the IBS registers fall under swap type C [1], only the guest
state is saved and restored automatically by the hardware. Host state
needs to be saved and restored manually by the hypervisor. Note that
saving and restoring of the host IBS state happens only when IBS is
active on the host, to avoid unnecessary rdmsrs/wrmsrs.

The hypervisor also needs to disable host IBS before VMRUN and
re-enable it after VMEXIT [2]. However, disabling and enabling IBS
leads to subtle races between software and hardware, since the
IBS_*_CTL registers contain both control and result bits in the same
MSR.

Consider the following scenario: the hypervisor reads an IBS control
MSR and finds enable=1 (control bit) and valid=0 (result bit). While
the kernel is clearing the enable bit in its local copy, the IBS
hardware sets the valid bit to 1 in the MSR. Software, unaware of the
change made by the hardware, overwrites the MSR with enable=0 and
valid=0. Note that this situation occurs while NMIs are disabled, so
the CPU will receive the IBS NMI only after STGI. However, the IBS
driver won't handle the NMI because the valid bit is 0, and since the
real source of the NMI was IBS, nobody else will handle it either,
resulting in unknown NMIs.
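The lost-update race described above can be modeled in a few lines of
plain C. This is a simulation only: the bit positions and the
`disable_ibs_racy()` helper are illustrative, not the architectural
IBS MSR layout or any kernel code.

```c
#include <stdint.h>

#define CTL_ENABLE (1ULL << 17)	/* control bit, written by software */
#define CTL_VALID  (1ULL << 18)	/* result bit, set by hardware */

/* The simulated MSR holds control and result bits together, like the
 * real IBS_*_CTL registers. */
static uint64_t msr;

/* Software path that loses the race: it reads the MSR, then hardware
 * sets VALID, then software writes back its stale snapshot, wiping
 * the VALID bit along with ENABLE. */
static uint64_t disable_ibs_racy(void)
{
	uint64_t snapshot = msr;	/* rdmsr: enable=1, valid=0 */

	msr |= CTL_VALID;		/* hardware raises VALID here */
	msr = snapshot & ~CTL_ENABLE;	/* wrmsr: stale copy, VALID lost */

	return msr;
}
```

Starting from `msr = CTL_ENABLE`, the function leaves both bits clear,
so the NMI that later arrives after STGI finds valid=0 and goes
unclaimed, exactly the unknown-NMI symptom described above.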

Handle the above mentioned race by keeping track of different actions
performed by KVM on IBS:

  WINDOW_START: After CLGI and before VMRUN. KVM informs the IBS driver
                about its intention to enable IBS for the guest. Thus
                IBS should be disabled on the host and the host IBS
                register state should be saved.
  WINDOW_STOPPING: After VMEXIT and before STGI. KVM informs the IBS
                driver that it is done using IBS inside the guest, so
                the host IBS state should be restored and IBS then
                re-enabled on the host.
  WINDOW_STOPPED: After STGI. The CPU will receive any pending NMI that
                was raised between CLGI and STGI. The NMI will be
                marked as handled by the IBS driver if the
                WINDOW_STOPPED action has _not_ been performed, the
                valid bit is _not_ set and a valid IBS event exists.
                However, no IBS sample will be generated.
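The three window actions can be modeled as a small state machine
around a single flag. This is a simplified, hypothetical sketch: the
real implementation uses a per-CPU variable and handles the fetch and
op control registers separately.

```c
#include <stdbool.h>

enum vibs_window_state { WINDOW_START, WINDOW_STOPPING, WINDOW_STOPPED };

/* Model of the per-CPU "window active" flag driven by the three
 * actions above. */
static bool window_active;

static bool vibs_window(enum vibs_window_state state, bool host_ibs_running)
{
	switch (state) {
	case WINDOW_START:
		/* Open the window only if host IBS was actually active
		 * and therefore had to be disabled and saved. */
		window_active = host_ibs_running;
		break;
	case WINDOW_STOPPING:
		/* Host IBS state would be restored here. The window
		 * stays open so that a late NMI arriving between now
		 * and STGI is still ignored. */
		break;
	case WINDOW_STOPPED:
		/* After STGI all pending NMIs have been delivered;
		 * close the window. */
		window_active = false;
		break;
	}
	return window_active;
}

/* The NMI handler treats an NMI as handled while the window is open. */
static bool ignore_nmi(void)
{
	return window_active;
}
```

Running the states in order (START with host IBS active, STOPPING,
STOPPED) opens the window, keeps it open across the restore, and
closes it only after STGI.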

[1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
     of VMCB, Table B-3 Swap Types.

[2]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/events/amd/Makefile      |   2 +-
 arch/x86/events/amd/ibs.c         |  23 +++++++
 arch/x86/events/amd/vibs.c        | 101 ++++++++++++++++++++++++++++++
 arch/x86/include/asm/perf_event.h |  27 ++++++++
 4 files changed, 152 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/events/amd/vibs.c

diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
index 527d947eb76b..13c2980db9a7 100644
--- a/arch/x86/events/amd/Makefile
+++ b/arch/x86/events/amd/Makefile
@@ -2,7 +2,7 @@
 obj-$(CONFIG_CPU_SUP_AMD)		+= core.o lbr.o
 obj-$(CONFIG_PERF_EVENTS_AMD_BRS)	+= brs.o
 obj-$(CONFIG_PERF_EVENTS_AMD_POWER)	+= power.o
-obj-$(CONFIG_X86_LOCAL_APIC)		+= ibs.o
+obj-$(CONFIG_X86_LOCAL_APIC)		+= ibs.o vibs.o
 obj-$(CONFIG_PERF_EVENTS_AMD_UNCORE)	+= amd-uncore.o
 amd-uncore-objs				:= uncore.o
 ifdef CONFIG_AMD_IOMMU
diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
index 6911c5399d02..359464f2910d 100644
--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -1039,6 +1039,16 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
 		 */
 		if (test_and_clear_bit(IBS_STOPPED, pcpu->state))
 			return 1;
+		/*
+		 * Catch NMIs generated in an active IBS window: incoming NMIs
+		 * from an active IBS window might have the VALID bit cleared
+		 * when it is supposed to be set. The race exists because the
+		 * ENABLE and VALID bits of MSR_AMD64_IBSFETCHCTL and
+		 * MSR_AMD64_IBSOPCTL live in the same respective MSR.
+		 * Ignore all such NMIs and treat them as handled.
+		 */
+		if (amd_vibs_ignore_nmi())
+			return 1;
 
 		return 0;
 	}
@@ -1542,3 +1552,16 @@ static __init int amd_ibs_init(void)
 
 /* Since we need the pci subsystem to init ibs we can't do this earlier: */
 device_initcall(amd_ibs_init);
+
+static inline bool get_ibs_state(struct perf_ibs *perf_ibs)
+{
+	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
+
+	return test_bit(IBS_STARTED, pcpu->state);
+}
+
+bool is_amd_ibs_started(void)
+{
+	return get_ibs_state(&perf_ibs_fetch) || get_ibs_state(&perf_ibs_op);
+}
+EXPORT_SYMBOL_GPL(is_amd_ibs_started);
diff --git a/arch/x86/events/amd/vibs.c b/arch/x86/events/amd/vibs.c
new file mode 100644
index 000000000000..273a60f1cb7f
--- /dev/null
+++ b/arch/x86/events/amd/vibs.c
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ *  Virtualized Performance events - AMD VIBS
+ *
+ *  Copyright (C) 2023 Advanced Micro Devices, Inc., Manali Shukla
+ *
+ *  For licencing details see kernel-base/COPYING
+ */
+
+#include <linux/perf_event.h>
+
+DEFINE_PER_CPU(bool, vibs_window_active);
+
+static bool amd_disable_ibs_fetch(u64 *ibs_fetch_ctl)
+{
+	*ibs_fetch_ctl = __rdmsr(MSR_AMD64_IBSFETCHCTL);
+	if (!(*ibs_fetch_ctl & IBS_FETCH_ENABLE))
+		return false;
+
+	native_wrmsrl(MSR_AMD64_IBSFETCHCTL, *ibs_fetch_ctl & ~IBS_FETCH_ENABLE);
+
+	return true;
+}
+
+static bool amd_disable_ibs_op(u64 *ibs_op_ctl)
+{
+	*ibs_op_ctl = __rdmsr(MSR_AMD64_IBSOPCTL);
+	if (!(*ibs_op_ctl & IBS_OP_ENABLE))
+		return false;
+
+	native_wrmsrl(MSR_AMD64_IBSOPCTL, *ibs_op_ctl & ~IBS_OP_ENABLE);
+
+	return true;
+}
+
+static void amd_restore_ibs_fetch(u64 ibs_fetch_ctl)
+{
+	native_wrmsrl(MSR_AMD64_IBSFETCHCTL, ibs_fetch_ctl);
+}
+
+static void amd_restore_ibs_op(u64 ibs_op_ctl)
+{
+	native_wrmsrl(MSR_AMD64_IBSOPCTL, ibs_op_ctl);
+}
+
+bool amd_vibs_ignore_nmi(void)
+{
+	return __this_cpu_read(vibs_window_active);
+}
+EXPORT_SYMBOL_GPL(amd_vibs_ignore_nmi);
+
+bool amd_vibs_window(enum amd_vibs_window_state state, u64 *f_ctl,
+		     u64 *o_ctl)
+{
+	bool f_active, o_active;
+
+	switch (state) {
+	case WINDOW_START:
+		if (!f_ctl || !o_ctl)
+			return false;
+
+		if (!is_amd_ibs_started())
+			return false;
+
+		f_active = amd_disable_ibs_fetch(f_ctl);
+		o_active = amd_disable_ibs_op(o_ctl);
+		__this_cpu_write(vibs_window_active, (f_active || o_active));
+		break;
+
+	case WINDOW_STOPPING:
+		if (!f_ctl || !o_ctl)
+			return false;
+
+		if (!__this_cpu_read(vibs_window_active))
+			return false;
+
+		if (*f_ctl & IBS_FETCH_ENABLE)
+			amd_restore_ibs_fetch(*f_ctl);
+		if (*o_ctl & IBS_OP_ENABLE)
+			amd_restore_ibs_op(*o_ctl);
+
+		break;
+
+	case WINDOW_STOPPED:
+		/*
+		 * This state is executed right after STGI (which is executed
+		 * after VMEXIT).  By this time, host IBS states are already
+		 * restored in WINDOW_STOPPING state, so f_ctl and o_ctl will
+		 * be passed as NULL for this state.
+		 */
+		if (__this_cpu_read(vibs_window_active))
+			__this_cpu_write(vibs_window_active, false);
+		break;
+
+	default:
+		return false;
+	}
+
+	return __this_cpu_read(vibs_window_active);
+}
+EXPORT_SYMBOL_GPL(amd_vibs_window);
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 85a9fd5a3ec3..b87c235e0e1e 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -486,6 +486,12 @@ struct pebs_xmm {
 #define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */
 #define IBS_RIP_INVALID		(1ULL<<38)
 
+enum amd_vibs_window_state {
+	WINDOW_START = 0,
+	WINDOW_STOPPING,
+	WINDOW_STOPPED,
+};
+
 #ifdef CONFIG_X86_LOCAL_APIC
 extern u32 get_ibs_caps(void);
 extern int forward_event_to_ibs(struct perf_event *event);
@@ -584,6 +590,27 @@ static inline void intel_pt_handle_vmx(int on)
 }
 #endif
 
+#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_CPU_SUP_AMD)
+extern bool amd_vibs_window(enum amd_vibs_window_state state, u64 *vibs_fetch_ctl,
+			    u64 *vibs_op_ctl);
+extern bool is_amd_ibs_started(void);
+extern bool amd_vibs_ignore_nmi(void);
+#else
+static inline bool amd_vibs_window(enum amd_vibs_window_state state, u64 *vibs_fetch_ctl,
+				  u64 *vibs_op_ctl)
+{
+	return false;
+}
+static inline bool is_amd_ibs_started(void)
+{
+	return false;
+}
+static inline bool amd_vibs_ignore_nmi(void)
+{
+	return false;
+}
+#endif
+
 #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
  extern void amd_pmu_enable_virt(void);
  extern void amd_pmu_disable_virt(void);
-- 
2.34.1



* [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (7 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 08/13] perf/x86/amd: Add framework to save/restore host IBS state Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-05 15:30   ` Tom Lendacky
                     ` (2 more replies)
  2023-09-04  9:53 ` [PATCH 10/13] x86/cpufeatures: Add CPUID feature bit for VIBS in SEV-ES guest Manali Shukla
                   ` (4 subsequent siblings)
  13 siblings, 3 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

From: Santosh Shukla <santosh.shukla@amd.com>

The IBS virtualization (VIBS) feature [1] allows the guest to collect
IBS samples without exiting the guest.

There are two parts to this feature:
- Virtualizing the IBS register state.
- Ensuring the IBS interrupt is handled in the guest without exiting
  to the hypervisor.

IBS virtualization requires the use of AVIC or NMI virtualization for
delivery of a virtualized interrupt from IBS hardware in the guest.
Without the virtualized interrupt delivery, the IBS interrupt
occurring in the guest will not be delivered to either the guest or
the hypervisor.  When AVIC is enabled, IBS LVT entry (Extended
Interrupt 0 LVT) message type should be programmed to INTR or NMI.

When the sampled interval for IBS fetch/op data collection is over,
the VIBS hardware generates a virtual NMI, but the source of the
virtual NMI differs between the AVIC enabled and disabled cases.
1) when AVIC is enabled, Virtual NMI is generated via AVIC using
   extended LVT (EXTLVT).
2) When AVIC is disabled, Virtual NMI is directly generated from
   hardware.

Since the IBS registers fall under swap type C [2], only the guest
state is saved and restored automatically by the hardware. Host state
needs to be saved and restored manually by the hypervisor. Note that
saving and restoring of the host IBS state happens only when IBS is
active on the host, to avoid unnecessary rdmsrs/wrmsrs. The hypervisor
needs to disable host IBS before VMRUN and re-enable it after
VMEXIT [1].
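The save/restore bookkeeping this patch introduces can be sketched as
a bitmask of swap-type-C components: each bit records that a
component's host state was saved before VMRUN and must be restored
after VMEXIT. The helper names below are illustrative, not the actual
KVM functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per swap-type-C component that was saved before VMRUN and
 * therefore must be restored after VMEXIT. */
enum { SWAP_TYPE_C_IBS = 0 };

/* Before VMRUN: save host IBS MSRs only when the guest uses VIBS and
 * IBS is live on the host; the returned mask remembers exactly what
 * needs restoring. */
static uint32_t save_swap_type_c(bool ibs_enabled, bool host_ibs_active)
{
	uint32_t restore_mask = 0;

	if (ibs_enabled && host_ibs_active)
		restore_mask |= 1u << SWAP_TYPE_C_IBS;
	return restore_mask;
}

/* After VMEXIT: restore a component only if its bit was set. */
static bool needs_ibs_restore(uint32_t restore_mask)
{
	return restore_mask & (1u << SWAP_TYPE_C_IBS);
}
```

This pattern avoids any MSR traffic on the fast path when host IBS is
idle, which is the stated goal of the swap-type-C handling.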

The IBS virtualization feature for non SEV-ES guests is not enabled in
this patch. Later patches enable VIBS for non SEV-ES guests.

[1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

[2]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
     of VMCB, Table B-3 Swap Types.

Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/kvm/svm/svm.c | 172 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h |   4 +-
 2 files changed, 173 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 20fe83eb32ee..6f566ed93f4c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -139,6 +139,22 @@ static const struct svm_direct_access_msrs {
 	{ .index = X2APIC_MSR(APIC_TMICT),		.always = false },
 	{ .index = X2APIC_MSR(APIC_TMCCT),		.always = false },
 	{ .index = X2APIC_MSR(APIC_TDCR),		.always = false },
+	{ .index = MSR_AMD64_IBSFETCHCTL,		.always = false },
+	{ .index = MSR_AMD64_IBSFETCHLINAD,		.always = false },
+	{ .index = MSR_AMD64_IBSOPCTL,			.always = false },
+	{ .index = MSR_AMD64_IBSOPRIP,			.always = false },
+	{ .index = MSR_AMD64_IBSOPDATA,			.always = false },
+	{ .index = MSR_AMD64_IBSOPDATA2,		.always = false },
+	{ .index = MSR_AMD64_IBSOPDATA3,		.always = false },
+	{ .index = MSR_AMD64_IBSDCLINAD,		.always = false },
+	{ .index = MSR_AMD64_IBSBRTARGET,		.always = false },
+	{ .index = MSR_AMD64_ICIBSEXTDCTL,		.always = false },
+	{ .index = X2APIC_MSR(APIC_EFEAT),		.always = false },
+	{ .index = X2APIC_MSR(APIC_ECTRL),		.always = false },
+	{ .index = X2APIC_MSR(APIC_EILVTn(0)),		.always = false },
+	{ .index = X2APIC_MSR(APIC_EILVTn(1)),		.always = false },
+	{ .index = X2APIC_MSR(APIC_EILVTn(2)),		.always = false },
+	{ .index = X2APIC_MSR(APIC_EILVTn(3)),		.always = false },
 	{ .index = MSR_INVALID,				.always = false },
 };
 
@@ -217,6 +233,10 @@ module_param(vgif, int, 0444);
 static int lbrv = true;
 module_param(lbrv, int, 0444);
 
+/* enable/disable IBS virtualization */
+static int vibs;
+module_param(vibs, int, 0444);
+
 static int tsc_scaling = true;
 module_param(tsc_scaling, int, 0444);
 
@@ -1050,6 +1070,20 @@ void disable_nmi_singlestep(struct vcpu_svm *svm)
 	}
 }
 
+void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept)
+{
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSFETCHCTL, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSFETCHLINAD, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPCTL, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPRIP, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA2, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA3, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSDCLINAD, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSBRTARGET, !intercept, !intercept);
+	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_ICIBSEXTDCTL, !intercept, !intercept);
+}
+
 static void grow_ple_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1207,6 +1241,29 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 		/* No need to intercept these MSRs */
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
+
+		/*
+		 * If hardware supports VIBS, there is no need to intercept
+		 * IBS MSRs when VIBS is enabled in the guest.
+		 */
+		if (vibs) {
+			if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_IBS)) {
+				svm_ibs_msr_interception(svm, false);
+				svm->ibs_enabled = true;
+
+				/*
+				 * In order to enable VIBS, AVIC/VNMI must be enabled to handle the
+				 * interrupt generated by IBS driver. When AVIC is enabled, once
+				 * data collection for IBS fetch/op block for sampled interval
+				 * provided is done, hardware signals VNMI which is generated via
+				 * AVIC which uses extended LVT registers. That is why extended LVT
+				 * registers are initialized at guest startup.
+				 */
+				kvm_apic_init_eilvt_regs(vcpu);
+			} else {
+				svm->ibs_enabled = false;
+			}
+		}
 	}
 }
 
@@ -2888,6 +2945,11 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_AMD64_DE_CFG:
 		msr_info->data = svm->msr_decfg;
 		break;
+
+	case MSR_AMD64_IBSCTL:
+		rdmsrl(MSR_AMD64_IBSCTL, msr_info->data);
+		break;
+
 	default:
 		return kvm_get_msr_common(vcpu, msr_info);
 	}
@@ -4038,19 +4100,111 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 	return EXIT_FASTPATH_NONE;
 }
 
+/*
+ * Since the IBS state is swap type C, the hypervisor is responsible for saving
+ * its own IBS state before VMRUN.
+ */
+static void svm_save_host_ibs_msrs(struct vmcb_save_area *hostsa)
+{
+	rdmsrl(MSR_AMD64_IBSFETCHLINAD, hostsa->ibs_fetch_linear_addr);
+	rdmsrl(MSR_AMD64_IBSOPRIP, hostsa->ibs_op_rip);
+	rdmsrl(MSR_AMD64_IBSOPDATA, hostsa->ibs_op_data);
+	rdmsrl(MSR_AMD64_IBSOPDATA2, hostsa->ibs_op_data2);
+	rdmsrl(MSR_AMD64_IBSOPDATA3, hostsa->ibs_op_data3);
+	rdmsrl(MSR_AMD64_IBSDCLINAD, hostsa->ibs_dc_linear_addr);
+	rdmsrl(MSR_AMD64_IBSBRTARGET, hostsa->ibs_br_target);
+	rdmsrl(MSR_AMD64_ICIBSEXTDCTL, hostsa->ibs_fetch_extd_ctl);
+}
+
+/*
+ * Since the IBS state is swap type C, the hypervisor is responsible for
+ * restoring its own IBS state after VMEXIT.
+ */
+static void svm_restore_host_ibs_msrs(struct vmcb_save_area *hostsa)
+{
+	wrmsrl(MSR_AMD64_IBSFETCHLINAD, hostsa->ibs_fetch_linear_addr);
+	wrmsrl(MSR_AMD64_IBSOPRIP, hostsa->ibs_op_rip);
+	wrmsrl(MSR_AMD64_IBSOPDATA, hostsa->ibs_op_data);
+	wrmsrl(MSR_AMD64_IBSOPDATA2, hostsa->ibs_op_data2);
+	wrmsrl(MSR_AMD64_IBSOPDATA3, hostsa->ibs_op_data3);
+	wrmsrl(MSR_AMD64_IBSDCLINAD, hostsa->ibs_dc_linear_addr);
+	wrmsrl(MSR_AMD64_IBSBRTARGET, hostsa->ibs_br_target);
+	wrmsrl(MSR_AMD64_ICIBSEXTDCTL, hostsa->ibs_fetch_extd_ctl);
+}
+
+/*
+ * Host state is categorized into three swap types based on how it is
+ * handled by hardware during a world switch.
+ * The enum below represents host state that is categorized as swap type C:
+ *
+ * C: VMRUN:  Host state _NOT_ saved in the host save area.
+ *    VMEXIT: Host state initialized to default values.
+ *
+ * Swap type C state is not loaded on VMEXIT and is not saved on VMRUN.
+ * It needs to be saved/restored manually.
+ */
+enum {
+	SWAP_TYPE_C_IBS = 0,
+	SWAP_TYPE_C_MAX
+};
+
+/*
+ * Since the IBS state is swap type C, the hypervisor needs to disable IBS
+ * and then save the IBS MSRs before VMRUN, and after VMEXIT restore the IBS
+ * MSRs and then re-enable IBS. This order is important; if it is not
+ * followed, software ends up reading inaccurate IBS registers.
+ */
+static noinstr u32 svm_save_swap_type_c(struct kvm_vcpu *vcpu)
+{
+	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
+	struct vmcb_save_area *hostsa;
+	u32 restore_mask = 0;
+
+	hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+
+	if (to_svm(vcpu)->ibs_enabled) {
+		bool en = amd_vibs_window(WINDOW_START, &hostsa->ibs_fetch_ctl, &hostsa->ibs_op_ctl);
+
+		if (en) {
+			svm_save_host_ibs_msrs(hostsa);
+			restore_mask |= 1 << SWAP_TYPE_C_IBS;
+		}
+	}
+	return restore_mask;
+}
+
+static noinstr void svm_restore_swap_type_c(struct kvm_vcpu *vcpu, u32 restore_mask)
+{
+	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
+	struct vmcb_save_area *hostsa;
+
+	hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
+
+	if (restore_mask & (1 << SWAP_TYPE_C_IBS)) {
+		svm_restore_host_ibs_msrs(hostsa);
+		amd_vibs_window(WINDOW_STOPPING, &hostsa->ibs_fetch_ctl, &hostsa->ibs_op_ctl);
+	}
+}
+
 static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
+	u32 restore_mask;
 
 	guest_state_enter_irqoff();
 
 	amd_clear_divider();
 
-	if (sev_es_guest(vcpu->kvm))
+	if (sev_es_guest(vcpu->kvm)) {
 		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
-	else
+	} else {
+		restore_mask = svm_save_swap_type_c(vcpu);
 		__svm_vcpu_run(svm, spec_ctrl_intercepted);
 
+		if (restore_mask)
+			svm_restore_swap_type_c(vcpu, restore_mask);
+	}
+
 	guest_state_exit_irqoff();
 }
 
@@ -4137,6 +4291,13 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	/* Any pending NMI will happen here */
 
+	/*
+	 * Disable the IBS window since any pending IBS NMIs will have been
+	 * handled.
+	 */
+	if (svm->ibs_enabled)
+		amd_vibs_window(WINDOW_STOPPED, NULL, NULL);
+
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_after_interrupt(vcpu);
 
@@ -5225,6 +5386,13 @@ static __init int svm_hardware_setup(void)
 			pr_info("LBR virtualization supported\n");
 	}
 
+	if (vibs) {
+		if ((vnmi || avic) && boot_cpu_has(X86_FEATURE_VIBS))
+			pr_info("IBS virtualization supported\n");
+		else
+			vibs = false;
+	}
+
 	if (!enable_pmu)
 		pr_info("PMU virtualization is disabled\n");
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c7eb82a78127..c2a02629a1d1 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -30,7 +30,7 @@
 #define	IOPM_SIZE PAGE_SIZE * 3
 #define	MSRPM_SIZE PAGE_SIZE * 2
 
-#define MAX_DIRECT_ACCESS_MSRS	46
+#define MAX_DIRECT_ACCESS_MSRS	62
 #define MSRPM_OFFSETS	32
 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 extern bool npt_enabled;
@@ -260,6 +260,7 @@ struct vcpu_svm {
 	unsigned long soft_int_old_rip;
 	unsigned long soft_int_next_rip;
 	bool soft_int_injected;
+	bool ibs_enabled;
 
 	u32 ldr_reg;
 	u32 dfr_reg;
@@ -732,6 +733,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
 void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept);
 
 /* vmenter.S */
 
-- 
2.34.1



* [PATCH 10/13] x86/cpufeatures: Add CPUID feature bit for VIBS in SEV-ES guest
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (8 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-04  9:53 ` [PATCH 11/13] KVM: SVM: Add support for IBS virtualization for SEV-ES guests Manali Shukla
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

The VIBS feature allows the guest to collect IBS samples without
exiting the guest.

Presence of the VIBS feature for SEV-ES guests is indicated via CPUID
function 0x8000001F_EAX[19].

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 8f92fa6d8319..022ccee197e2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -439,6 +439,7 @@
 #define X86_FEATURE_SEV_ES		(19*32+ 3) /* AMD Secure Encrypted Virtualization - Encrypted State */
 #define X86_FEATURE_V_TSC_AUX		(19*32+ 9) /* "" Virtual TSC_AUX */
 #define X86_FEATURE_SME_COHERENT	(19*32+10) /* "" AMD hardware-enforced cache coherency */
+#define X86_FEATURE_SEV_ES_VIBS		(19*32+19) /* "" IBS virtualization for SEV-ES guests */
 
 /* AMD-defined Extended Feature 2 EAX, CPUID level 0x80000021 (EAX), word 20 */
 #define X86_FEATURE_NO_NESTED_DATA_BP	(20*32+ 0) /* "" No Nested Data Breakpoints */
-- 
2.34.1



* [PATCH 11/13] KVM: SVM: Add support for IBS virtualization for SEV-ES guests
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (9 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 10/13] x86/cpufeatures: Add CPUID feature bit for VIBS in SEV-ES guest Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-05 15:43   ` Tom Lendacky
  2023-09-04  9:53 ` [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and " Manali Shukla
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

Since the IBS state is swap type C, the hypervisor is responsible for
saving its own IBS state before VMRUN and restoring it after VMEXIT.
It is also responsible for disabling IBS before VMRUN and re-enabling
it after VMEXIT. For a SEV-ES guest with IBS virtualization enabled,
a VMEXIT_INVALID will happen if IBS is found to be enabled on VMRUN
[1].

The IBS virtualization feature for SEV-ES guests is not enabled in this
patch. Later patches enable IBS virtualization for SEV-ES guests.

[1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
     AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
     Instruction-Based Sampling Virtualization.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h | 14 +++++++++++++-
 arch/x86/kvm/svm/sev.c     |  7 +++++++
 arch/x86/kvm/svm/svm.c     | 11 +++++------
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 4096d2f68770..58b60842a3b7 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -469,6 +469,18 @@ struct sev_es_save_area {
 	u8 fpreg_x87[80];
 	u8 fpreg_xmm[256];
 	u8 fpreg_ymm[256];
+	u8 lbr_stack_from_to[256];
+	u64 lbr_select;
+	u64 ibs_fetch_ctl;
+	u64 ibs_fetch_linear_addr;
+	u64 ibs_op_ctl;
+	u64 ibs_op_rip;
+	u64 ibs_op_data;
+	u64 ibs_op_data2;
+	u64 ibs_op_data3;
+	u64 ibs_dc_linear_addr;
+	u64 ibs_br_target;
+	u64 ibs_fetch_extd_ctl;
 } __packed;
 
 struct ghcb_save_area {
@@ -527,7 +539,7 @@ struct ghcb {
 
 #define EXPECTED_VMCB_SAVE_AREA_SIZE		1992
 #define EXPECTED_GHCB_SAVE_AREA_SIZE		1032
-#define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1648
+#define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1992
 #define EXPECTED_VMCB_CONTROL_AREA_SIZE		1024
 #define EXPECTED_GHCB_SIZE			PAGE_SIZE
 
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d3aec1f2cad2..41706335cedd 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,6 +59,7 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
+static bool sev_es_vibs_enabled;
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -2256,6 +2257,9 @@ void __init sev_hardware_setup(void)
 
 	sev_enabled = sev_supported;
 	sev_es_enabled = sev_es_supported;
+
+	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_SEV_ES_VIBS))
+		sev_es_vibs_enabled = false;
 #endif
 }
 
@@ -2993,6 +2997,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
 		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
 			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
 	}
+
+	if (sev_es_vibs_enabled && svm->ibs_enabled)
+		svm_ibs_msr_interception(svm, false);
 }
 
 void sev_init_vmcb(struct vcpu_svm *svm)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 6f566ed93f4c..0cfe23bb144a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4194,16 +4194,15 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
 	guest_state_enter_irqoff();
 
 	amd_clear_divider();
+	restore_mask = svm_save_swap_type_c(vcpu);
 
-	if (sev_es_guest(vcpu->kvm)) {
+	if (sev_es_guest(vcpu->kvm))
 		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
-	} else {
-		restore_mask = svm_save_swap_type_c(vcpu);
+	else
 		__svm_vcpu_run(svm, spec_ctrl_intercepted);
 
-		if (restore_mask)
-			svm_restore_swap_type_c(vcpu, restore_mask);
-	}
+	if (restore_mask)
+		svm_restore_swap_type_c(vcpu, restore_mask);
 
 	guest_state_exit_irqoff();
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and SEV-ES guests
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (10 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 11/13] KVM: SVM: Add support for IBS virtualization for SEV-ES guests Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-05 16:00   ` Tom Lendacky
  2023-09-12  3:30   ` Chao Gao
  2023-09-04  9:53 ` [PATCH 13/13] KVM: x86: nSVM: Implement support for nested IBS virtualization Manali Shukla
  2023-09-05 15:47 ` [PATCH 00/13] Implement support for " Peter Zijlstra
  13 siblings, 2 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

To enable the IBS virtualization capability for non-SEV-ES guests, set
bit 2 at offset 0xb8 in the VMCB.

To enable the IBS virtualization capability for SEV-ES guests, set bit
12 of SEV_FEATURES in the VMSA.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h |  4 ++++
 arch/x86/kvm/svm/sev.c     |  5 ++++-
 arch/x86/kvm/svm/svm.c     | 26 +++++++++++++++++++++++++-
 3 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 58b60842a3b7..a31bf803b993 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -215,6 +215,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
 #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
 
+#define VIRTUAL_IBS_ENABLE_MASK BIT_ULL(2)
+
 #define SVM_INTERRUPT_SHADOW_MASK	BIT_ULL(0)
 #define SVM_GUEST_INTERRUPT_MASK	BIT_ULL(1)
 
@@ -259,6 +261,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 
 #define VMCB_AVIC_APIC_BAR_MASK				0xFFFFFFFFFF000ULL
 
+#define SVM_SEV_ES_FEAT_VIBS				BIT(12)
+
 #define AVIC_UNACCEL_ACCESS_WRITE_MASK		1
 #define AVIC_UNACCEL_ACCESS_OFFSET_MASK		0xFF0
 #define AVIC_UNACCEL_ACCESS_VECTOR_MASK		0xFFFFFFFF
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 41706335cedd..e0ef3a2323d6 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -59,7 +59,7 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
 #define sev_es_enabled false
 #endif /* CONFIG_KVM_AMD_SEV */
 
-static bool sev_es_vibs_enabled;
+static bool sev_es_vibs_enabled = true;
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
@@ -607,6 +607,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
 	save->xss  = svm->vcpu.arch.ia32_xss;
 	save->dr6  = svm->vcpu.arch.dr6;
 
+	if (svm->ibs_enabled && sev_es_vibs_enabled)
+		save->sev_features |= SVM_SEV_ES_FEAT_VIBS;
+
 	pr_debug("Virtual Machine Save Area (VMSA):\n");
 	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0cfe23bb144a..b85120f0d3ac 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -234,7 +234,7 @@ static int lbrv = true;
 module_param(lbrv, int, 0444);
 
 /* enable/disable IBS virtualization */
-static int vibs;
+static int vibs = true;
 module_param(vibs, int, 0444);
 
 static int tsc_scaling = true;
@@ -1245,10 +1245,13 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 		/*
 		 * If hardware supports VIBS then no need to intercept IBS MSRS
 		 * when VIBS is enabled in guest.
+		 *
+		 * Enable VIBS by setting bit 2 at offset 0xb8 in VMCB.
 		 */
 		if (vibs) {
 			if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_IBS)) {
 				svm_ibs_msr_interception(svm, false);
+				svm->vmcb->control.virt_ext |= VIRTUAL_IBS_ENABLE_MASK;
 				svm->ibs_enabled = true;
 
 				/*
@@ -5166,6 +5169,24 @@ static __init void svm_adjust_mmio_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
 }
 
+static void svm_ibs_set_cpu_caps(void)
+{
+	kvm_cpu_cap_set(X86_FEATURE_IBS);
+	kvm_cpu_cap_set(X86_FEATURE_EXTLVT);
+	kvm_cpu_cap_set(X86_FEATURE_EXTAPIC);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_AVAIL);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_FETCHSAM);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPSAM);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_RDWROPCNT);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPCNT);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_BRNTRGT);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPCNTEXT);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_RIPINVALIDCHK);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPBRNFUSE);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_FETCHCTLEXTD);
+	kvm_cpu_cap_set(X86_FEATURE_IBS_ZEN4_EXT);
+}
+
 static __init void svm_set_cpu_caps(void)
 {
 	kvm_set_cpu_caps();
@@ -5208,6 +5229,9 @@ static __init void svm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
 
+	if (vibs)
+		svm_ibs_set_cpu_caps();
+
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 13/13] KVM: x86: nSVM: Implement support for nested IBS virtualization
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (11 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and " Manali Shukla
@ 2023-09-04  9:53 ` Manali Shukla
  2023-09-05 15:47 ` [PATCH 00/13] Implement support for " Peter Zijlstra
  13 siblings, 0 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-04  9:53 UTC (permalink / raw)
  To: kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj,
	manali.shukla

To handle the case where IBS is enabled for both L1 and L2, IBS MSRs
are copied from vmcb12 to vmcb02 during vmentry and vice versa during
vmexit.

To handle the case where IBS is enabled for L1 but _not_ for L2, IBS
MSRs are copied from vmcb01 to vmcb02 during vmentry and vice versa
during vmexit.

Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/kvm/governed_features.h |  1 +
 arch/x86/kvm/svm/nested.c        | 23 +++++++++++++++++++++++
 arch/x86/kvm/svm/svm.c           | 18 ++++++++++++++++++
 arch/x86/kvm/svm/svm.h           |  1 +
 4 files changed, 43 insertions(+)

diff --git a/arch/x86/kvm/governed_features.h b/arch/x86/kvm/governed_features.h
index 423a73395c10..101c819f3876 100644
--- a/arch/x86/kvm/governed_features.h
+++ b/arch/x86/kvm/governed_features.h
@@ -16,6 +16,7 @@ KVM_GOVERNED_X86_FEATURE(PAUSEFILTER)
 KVM_GOVERNED_X86_FEATURE(PFTHRESHOLD)
 KVM_GOVERNED_X86_FEATURE(VGIF)
 KVM_GOVERNED_X86_FEATURE(VNMI)
+KVM_GOVERNED_X86_FEATURE(VIBS)
 
 #undef KVM_GOVERNED_X86_FEATURE
 #undef KVM_GOVERNED_FEATURE
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dd496c9e5f91..a1bb32779b3e 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -616,6 +616,16 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12
 	} else if (unlikely(vmcb01->control.virt_ext & LBR_CTL_ENABLE_MASK)) {
 		svm_copy_lbrs(vmcb02, vmcb01);
 	}
+
+	if (guest_can_use(vcpu, X86_FEATURE_VIBS) &&
+	    !(vmcb12->control.virt_ext & VIRTUAL_IBS_ENABLE_MASK))
+		vmcb02->control.virt_ext = vmcb12->control.virt_ext & ~VIRTUAL_IBS_ENABLE_MASK;
+
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_VIBS) &&
+		     (svm->nested.ctl.virt_ext & VIRTUAL_IBS_ENABLE_MASK)))
+		svm_copy_ibs(vmcb02, vmcb12);
+	else if (unlikely(vmcb01->control.virt_ext & VIRTUAL_IBS_ENABLE_MASK))
+		svm_copy_ibs(vmcb02, vmcb01);
 }
 
 static inline bool is_evtinj_soft(u32 evtinj)
@@ -741,6 +751,13 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 		vmcb02->control.virt_ext  |=
 			(svm->nested.ctl.virt_ext & LBR_CTL_ENABLE_MASK);
 
+	vmcb02->control.virt_ext            = vmcb01->control.virt_ext & VIRTUAL_IBS_ENABLE_MASK;
+
+	if (guest_can_use(vcpu, X86_FEATURE_VIBS))
+		vmcb02->control.virt_ext  |= (svm->nested.ctl.virt_ext & VIRTUAL_IBS_ENABLE_MASK);
+	else
+		vmcb02->control.virt_ext &= ~VIRTUAL_IBS_ENABLE_MASK;
+
 	if (!nested_vmcb_needs_vls_intercept(svm))
 		vmcb02->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK;
 
@@ -1083,6 +1100,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		svm_update_lbrv(vcpu);
 	}
 
+	if (unlikely(guest_can_use(vcpu, X86_FEATURE_VIBS) &&
+		     (svm->nested.ctl.virt_ext & VIRTUAL_IBS_ENABLE_MASK)))
+		svm_copy_ibs(vmcb12, vmcb02);
+	else if (unlikely(vmcb01->control.virt_ext & VIRTUAL_IBS_ENABLE_MASK))
+		svm_copy_ibs(vmcb01, vmcb02);
+
 	if (vnmi) {
 		if (vmcb02->control.int_ctl & V_NMI_BLOCKING_MASK)
 			vmcb01->control.int_ctl |= V_NMI_BLOCKING_MASK;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b85120f0d3ac..7925bfa0b4ce 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1084,6 +1084,20 @@ void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept)
 	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_ICIBSEXTDCTL, !intercept, !intercept);
 }
 
+void svm_copy_ibs(struct vmcb *to_vmcb, struct vmcb *from_vmcb)
+{
+	to_vmcb->save.ibs_fetch_ctl		= from_vmcb->save.ibs_fetch_ctl;
+	to_vmcb->save.ibs_fetch_linear_addr	= from_vmcb->save.ibs_fetch_linear_addr;
+	to_vmcb->save.ibs_op_ctl		= from_vmcb->save.ibs_op_ctl;
+	to_vmcb->save.ibs_op_rip		= from_vmcb->save.ibs_op_rip;
+	to_vmcb->save.ibs_op_data		= from_vmcb->save.ibs_op_data;
+	to_vmcb->save.ibs_op_data2		= from_vmcb->save.ibs_op_data2;
+	to_vmcb->save.ibs_op_data3		= from_vmcb->save.ibs_op_data3;
+	to_vmcb->save.ibs_dc_linear_addr	= from_vmcb->save.ibs_dc_linear_addr;
+	to_vmcb->save.ibs_br_target		= from_vmcb->save.ibs_br_target;
+	to_vmcb->save.ibs_fetch_extd_ctl	= from_vmcb->save.ibs_fetch_extd_ctl;
+}
+
 static void grow_ple_window(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4441,6 +4455,7 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
 	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VGIF);
 	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VNMI);
+	kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_VIBS);
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
 
@@ -5225,6 +5240,9 @@ static __init void svm_set_cpu_caps(void)
 		if (vnmi)
 			kvm_cpu_cap_set(X86_FEATURE_VNMI);
 
+		if (vibs)
+			kvm_cpu_cap_set(X86_FEATURE_VIBS);
+
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index c2a02629a1d1..f607dc690d94 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -584,6 +584,7 @@ void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm);
 void svm_vcpu_free_msrpm(u32 *msrpm);
 void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 void svm_update_lbrv(struct kvm_vcpu *vcpu);
+void svm_copy_ibs(struct vmcb *to_vmcb, struct vmcb *from_vmcb);
 
 int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
 void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 08/13] perf/x86/amd: Add framework to save/restore host IBS state
  2023-09-04  9:53 ` [PATCH 08/13] perf/x86/amd: Add framework to save/restore host IBS state Manali Shukla
@ 2023-09-05 14:54   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2023-09-05 14:54 UTC (permalink / raw)
  To: Manali Shukla, kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, nikunj

On 9/4/23 04:53, Manali Shukla wrote:
> Since IBS registers falls under swap type C [1], only the guest state

s/falls/fall/

> is saved and restored automatically by the hardware. Host state needs
> to be saved and restored manually by the hypervisor. Note that saving
> and restoring of host IBS state happens only when IBS is active on the
> host, to avoid unnecessary rdmsrs/wrmsrs.
> 
> Also, hypervisor needs to disable host IBS before VMRUN and re-enable
> it after VMEXIT [2]. However, disabling and enabling of IBS leads to
> subtle races between software and hardware since IBS_*_CTL registers
> contain both control and result bits in the same MSR.
> 
> Consider the following scenario: the hypervisor reads the IBS control
> MSR and finds enable=1 (control bit) and valid=0 (result bit). While
> the kernel is clearing the enable bit in its local copy, the IBS
> hardware sets the valid bit to 1 in the MSR. Software, unaware of the
> change made by the hardware, overwrites the IBS MSR with enable=0 and
> valid=0. Note that this situation occurs while NMIs are disabled, so
> the CPU will receive the IBS NMI only after STGI. However, the IBS
> driver won't handle the NMI because the valid bit is 0, and since the
> real source of the NMI was IBS, nobody else will handle it either,
> which results in unknown NMIs.
> 
> Handle the above mentioned race by keeping track of different actions
> performed by KVM on IBS:
> 
>    WINDOW_START: After CLGI and before VMRUN. KVM informs the IBS
>                  driver of its intention to enable IBS for the guest,
>                  so IBS should be disabled on the host and the host
>                  IBS register state saved.
>    WINDOW_STOPPING: After VMEXIT and before STGI. KVM informs the IBS
>                  driver that it is done using IBS inside the guest, so
>                  the host IBS state should be restored and IBS
>                  re-enabled on the host.
>    WINDOW_STOPPED: After STGI. The CPU will receive any pending NMI
>                  raised between CLGI and STGI. The NMI will be marked
>                  as handled by the IBS driver if the WINDOW_STOPPED
>                  action is _not_ performed, the valid bit is _not_
>                  set, and a valid IBS event exists. However, no IBS
>                  sample will be generated.
> 
> [1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
>       AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
>       of VMCB, Table B-3 Swap Types.
> 
> [2]: https://bugzilla.kernel.org/attachment.cgi?id=304653
>       AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
>       Instruction-Based Sampling Virtualization.
> 
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>   arch/x86/events/amd/Makefile      |   2 +-
>   arch/x86/events/amd/ibs.c         |  23 +++++++
>   arch/x86/events/amd/vibs.c        | 101 ++++++++++++++++++++++++++++++
>   arch/x86/include/asm/perf_event.h |  27 ++++++++
>   4 files changed, 152 insertions(+), 1 deletion(-)
>   create mode 100644 arch/x86/events/amd/vibs.c
> 
> diff --git a/arch/x86/events/amd/Makefile b/arch/x86/events/amd/Makefile
> index 527d947eb76b..13c2980db9a7 100644
> --- a/arch/x86/events/amd/Makefile
> +++ b/arch/x86/events/amd/Makefile
> @@ -2,7 +2,7 @@
>   obj-$(CONFIG_CPU_SUP_AMD)		+= core.o lbr.o
>   obj-$(CONFIG_PERF_EVENTS_AMD_BRS)	+= brs.o
>   obj-$(CONFIG_PERF_EVENTS_AMD_POWER)	+= power.o
> -obj-$(CONFIG_X86_LOCAL_APIC)		+= ibs.o
> +obj-$(CONFIG_X86_LOCAL_APIC)		+= ibs.o vibs.o
>   obj-$(CONFIG_PERF_EVENTS_AMD_UNCORE)	+= amd-uncore.o
>   amd-uncore-objs				:= uncore.o
>   ifdef CONFIG_AMD_IOMMU
> diff --git a/arch/x86/events/amd/ibs.c b/arch/x86/events/amd/ibs.c
> index 6911c5399d02..359464f2910d 100644
> --- a/arch/x86/events/amd/ibs.c
> +++ b/arch/x86/events/amd/ibs.c
> @@ -1039,6 +1039,16 @@ static int perf_ibs_handle_irq(struct perf_ibs *perf_ibs, struct pt_regs *iregs)
>   		 */
>   		if (test_and_clear_bit(IBS_STOPPED, pcpu->state))
>   			return 1;
> +		/*
> +		 * Catch NMIs generated in an active IBS window: Incoming NMIs
> +		 * from an active IBS window might have the VALID bit cleared
> +		 * when it is supposed to be set due to a race. The reason for
> +		 * the race is ENABLE and VALID bits for MSR_AMD64_IBSFETCHCTL
> +		 * and MSR_AMD64_IBSOPCTL being in their same respective MSRs.
> +		 * Ignore all such NMIs and treat them as handled.
> +		 */
> +		if (amd_vibs_ignore_nmi())
> +			return 1;
>   
>   		return 0;
>   	}
> @@ -1542,3 +1552,16 @@ static __init int amd_ibs_init(void)
>   
>   /* Since we need the pci subsystem to init ibs we can't do this earlier: */
>   device_initcall(amd_ibs_init);
> +
> +static inline bool get_ibs_state(struct perf_ibs *perf_ibs)
> +{
> +	struct cpu_perf_ibs *pcpu = this_cpu_ptr(perf_ibs->pcpu);
> +
> +	return test_bit(IBS_STARTED, pcpu->state);
> +}
> +
> +bool is_amd_ibs_started(void)
> +{
> +	return get_ibs_state(&perf_ibs_fetch) || get_ibs_state(&perf_ibs_op);
> +}
> +EXPORT_SYMBOL_GPL(is_amd_ibs_started);

If this is only used by the IBS support, it shouldn't have an 
EXPORT_SYMBOL_GPL.

> diff --git a/arch/x86/events/amd/vibs.c b/arch/x86/events/amd/vibs.c
> new file mode 100644
> index 000000000000..273a60f1cb7f
> --- /dev/null
> +++ b/arch/x86/events/amd/vibs.c
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + *  Virtualized Performance events - AMD VIBS
> + *
> + *  Copyright (C) 2023 Advanced Micro Devices, Inc., Manali Shukla
> + *
> + *  For licencing details see kernel-base/COPYING
> + */
> +
> +#include <linux/perf_event.h>
> +
> +DEFINE_PER_CPU(bool, vibs_window_active);
> +
> +static bool amd_disable_ibs_fetch(u64 *ibs_fetch_ctl)
> +{
> +	*ibs_fetch_ctl = __rdmsr(MSR_AMD64_IBSFETCHCTL);
> +	if (!(*ibs_fetch_ctl & IBS_FETCH_ENABLE))
> +		return false;
> +
> +	native_wrmsrl(MSR_AMD64_IBSFETCHCTL, *ibs_fetch_ctl & ~IBS_FETCH_ENABLE);
> +
> +	return true;
> +}
> +
> +static u64 amd_disable_ibs_op(u64 *ibs_op_ctl)
> +{
> +	*ibs_op_ctl = __rdmsr(MSR_AMD64_IBSOPCTL);
> +	if (!(*ibs_op_ctl & IBS_OP_ENABLE))
> +		return false;
> +
> +	native_wrmsrl(MSR_AMD64_IBSOPCTL, *ibs_op_ctl & ~IBS_OP_ENABLE);
> +
> +	return true;
> +}
> +
> +static void amd_restore_ibs_fetch(u64 ibs_fetch_ctl)
> +{
> +	native_wrmsrl(MSR_AMD64_IBSFETCHCTL, ibs_fetch_ctl);
> +}
> +
> +static void amd_restore_ibs_op(u64 ibs_op_ctl)
> +{
> +	native_wrmsrl(MSR_AMD64_IBSOPCTL, ibs_op_ctl);
> +}
> +
> +bool amd_vibs_ignore_nmi(void)
> +{
> +	return __this_cpu_read(vibs_window_active);
> +}
> +EXPORT_SYMBOL_GPL(amd_vibs_ignore_nmi);

If this is only used by the IBS support, it shouldn't have an 
EXPORT_SYMBOL_GPL.

> +
> +bool amd_vibs_window(enum amd_vibs_window_state state, u64 *f_ctl,
> +		     u64 *o_ctl)
> +{
> +	bool f_active, o_active;
> +
> +	switch (state) {
> +	case WINDOW_START:
> +		if (!f_ctl || !o_ctl)
> +			return false;
> +
> +		if (!is_amd_ibs_started())
> +			return false;
> +
> +		f_active = amd_disable_ibs_fetch(f_ctl);
> +		o_active = amd_disable_ibs_op(o_ctl);
> +		__this_cpu_write(vibs_window_active, (f_active || o_active));
> +		break;
> +
> +	case WINDOW_STOPPING:
> +		if (!f_ctl || !o_ctl)
> +			return false;
> +
> +		if (__this_cpu_read(vibs_window_active))

Shouldn't this be if (!__this_cpu_read(vibs_window_active)) ?

> +			return false;
> +
> +		if (*f_ctl & IBS_FETCH_ENABLE)
> +			amd_restore_ibs_fetch(*f_ctl);
> +		if (*o_ctl & IBS_OP_ENABLE)
> +			amd_restore_ibs_op(*o_ctl);

Nit, these if checks could go into the restore functions to make this look 
a bit cleaner, but that's just from my point of view.

> +
> +		break;
> +
> +	case WINDOW_STOPPED:
> +		/*
> +		 * This state is executed right after STGI (which is executed
> +		 * after VMEXIT).  By this time, host IBS states are already
> +		 * restored in WINDOW_STOPPING state, so f_ctl and o_ctl will
> +		 * be passed as NULL for this state.
> +		 */
> +		if (__this_cpu_read(vibs_window_active))
> +			__this_cpu_write(vibs_window_active, false);

Any reason not to just issue __this_cpu_write(vibs_window_active, false) 
without the if check?

> +		break;
> +
> +	default:
> +		return false;
> +	}
> +
> +	return __this_cpu_read(vibs_window_active);

You could just issue a return true or return false (depending on the case) 
instead of having the break statements, e.g.:

For WINDOW_START replace break with return (f_active || o_active)
For WINDOW_STOPPING replace break with return true
For WINDOW_STOPPED replace break with return false

> +}
> +EXPORT_SYMBOL_GPL(amd_vibs_window);
> diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
> index 85a9fd5a3ec3..b87c235e0e1e 100644
> --- a/arch/x86/include/asm/perf_event.h
> +++ b/arch/x86/include/asm/perf_event.h
> @@ -486,6 +486,12 @@ struct pebs_xmm {
>   #define IBS_OP_MAX_CNT_EXT_MASK	(0x7FULL<<20)	/* separate upper 7 bits */
>   #define IBS_RIP_INVALID		(1ULL<<38)
>   
> +enum amd_vibs_window_state {
> +	WINDOW_START = 0,
> +	WINDOW_STOPPING,
> +	WINDOW_STOPPED,
> +};
> +
>   #ifdef CONFIG_X86_LOCAL_APIC
>   extern u32 get_ibs_caps(void);
>   extern int forward_event_to_ibs(struct perf_event *event);
> @@ -584,6 +590,27 @@ static inline void intel_pt_handle_vmx(int on)
>   }
>   #endif
>   
> +#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_CPU_SUP_AMD)
> +extern bool amd_vibs_window(enum amd_vibs_window_state state, u64 *vibs_fetch_ctl,
> +			    u64 *vibs_op_ctl);
> +extern bool is_amd_ibs_started(void);
> +extern bool amd_vibs_ignore_nmi(void);

And if the two above functions aren't EXPORT_SYMBOL_GPL, then they could 
go somewhere else instead of here.

Thanks,
Tom

> +#else
> +static inline bool amd_vibs_window(enum amd_vibs_window_state state, u64 *vibs_fetch_ctl,
> +				  u64 *vibs_op_ctl)
> +{
> +	return false;
> +}
> +static inline bool is_amd_ibs_started(void)
> +{
> +	return false;
> +}
> +static inline bool amd_vibs_ignore_nmi(void)
> +{
> +	return false;
> +}
> +#endif
> +
>   #if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_AMD)
>    extern void amd_pmu_enable_virt(void);
>    extern void amd_pmu_disable_virt(void);

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests
  2023-09-04  9:53 ` [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests Manali Shukla
@ 2023-09-05 15:30   ` Tom Lendacky
  2023-09-06  1:51   ` Alexey Kardashevskiy
  2023-09-12  3:09   ` Chao Gao
  2 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2023-09-05 15:30 UTC (permalink / raw)
  To: Manali Shukla, kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, nikunj

On 9/4/23 04:53, Manali Shukla wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
> 
> IBS virtualization (VIBS) [1] feature allows the guest to collect IBS
> samples without exiting the guest.
> 
> There are 2 parts to this feature:
> - Virtualizing the IBS register state.
> - Ensuring the IBS interrupt is handled in the guest without exiting
>    to the hypervisor.
> 
> IBS virtualization requires the use of AVIC or NMI virtualization for
> delivery of a virtualized interrupt from IBS hardware in the guest.
> Without the virtualized interrupt delivery, the IBS interrupt
> occurring in the guest will not be delivered to either the guest or
> the hypervisor.  When AVIC is enabled, IBS LVT entry (Extended
> Interrupt 0 LVT) message type should be programmed to INTR or NMI.
> 
> So, when the sampled interval for the data collection for IBS fetch/op
> block is over, VIBS hardware is going to generate a Virtual NMI, but
> the source of Virtual NMI is different in both AVIC enabled/disabled
> case.
> 1) when AVIC is enabled, Virtual NMI is generated via AVIC using
>     extended LVT (EXTLVT).
> 2) When AVIC is disabled, Virtual NMI is directly generated from
>     hardware.
> 
> Since IBS registers fall under swap type C [2], only the guest state is
> saved and restored automatically by the hardware. Host state needs to be
> saved and restored manually by the hypervisor. Note that saving and
> restoring of host IBS state happens only when IBS is active on the host,
> to avoid unnecessary rdmsrs/wrmsrs. The hypervisor needs to disable host
> IBS before VMRUN and re-enable it after VMEXIT [1].
> 
> The IBS virtualization feature for non SEV-ES guests is not enabled in
> this patch. Later patches enable VIBS for non SEV-ES guests.
> 
> [1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
>       AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
>       Instruction-Based Sampling Virtualization.
> 
> [2]: https://bugzilla.kernel.org/attachment.cgi?id=304653
>       AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
>       of VMCB, Table B-3 Swap Types.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>   arch/x86/kvm/svm/svm.c | 172 ++++++++++++++++++++++++++++++++++++++++-
>   arch/x86/kvm/svm/svm.h |   4 +-
>   2 files changed, 173 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 20fe83eb32ee..6f566ed93f4c 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -139,6 +139,22 @@ static const struct svm_direct_access_msrs {
>   	{ .index = X2APIC_MSR(APIC_TMICT),		.always = false },
>   	{ .index = X2APIC_MSR(APIC_TMCCT),		.always = false },
>   	{ .index = X2APIC_MSR(APIC_TDCR),		.always = false },
> +	{ .index = MSR_AMD64_IBSFETCHCTL,		.always = false },
> +	{ .index = MSR_AMD64_IBSFETCHLINAD,		.always = false },
> +	{ .index = MSR_AMD64_IBSOPCTL,			.always = false },
> +	{ .index = MSR_AMD64_IBSOPRIP,			.always = false },
> +	{ .index = MSR_AMD64_IBSOPDATA,			.always = false },
> +	{ .index = MSR_AMD64_IBSOPDATA2,		.always = false },
> +	{ .index = MSR_AMD64_IBSOPDATA3,		.always = false },
> +	{ .index = MSR_AMD64_IBSDCLINAD,		.always = false },
> +	{ .index = MSR_AMD64_IBSBRTARGET,		.always = false },
> +	{ .index = MSR_AMD64_ICIBSEXTDCTL,		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EFEAT),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_ECTRL),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(0)),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(1)),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(2)),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(3)),		.always = false },

Why not keep these X2APIC_MSR registers with the other X2APIC_MSR registers?

>   	{ .index = MSR_INVALID,				.always = false },
>   };
>   
> @@ -217,6 +233,10 @@ module_param(vgif, int, 0444);
>   static int lbrv = true;
>   module_param(lbrv, int, 0444);
>   
> +/* enable/disable IBS virtualization */
> +static int vibs;
> +module_param(vibs, int, 0444);
> +
>   static int tsc_scaling = true;
>   module_param(tsc_scaling, int, 0444);
>   
> @@ -1050,6 +1070,20 @@ void disable_nmi_singlestep(struct vcpu_svm *svm)
>   	}
>   }
>   
> +void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept)
> +{
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSFETCHCTL, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSFETCHLINAD, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPCTL, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPRIP, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA2, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA3, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSDCLINAD, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSBRTARGET, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_ICIBSEXTDCTL, !intercept, !intercept);
> +}
> +
>   static void grow_ple_window(struct kvm_vcpu *vcpu)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -1207,6 +1241,29 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		/* No need to intercept these MSRs */
>   		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
>   		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
> +
> +		/*
> +		 * If hardware supports VIBS then no need to intercept IBS MSRS
> +		 * when VIBS is enabled in guest.
> +		 */
> +		if (vibs) {
> +			if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_IBS)) {
> +				svm_ibs_msr_interception(svm, false);
> +				svm->ibs_enabled = true;
> +
> +				/*
> +				 * In order to enable VIBS, AVIC/VNMI must be enabled to handle the
> +				 * interrupt generated by IBS driver. When AVIC is enabled, once
> +				 * data collection for IBS fetch/op block for sampled interval
> +				 * provided is done, hardware signals VNMI which is generated via
> +				 * AVIC which uses extended LVT registers. That is why extended LVT
> +				 * registers are initialized at guest startup.
> +				 */
> +				kvm_apic_init_eilvt_regs(vcpu);
> +			} else {
> +				svm->ibs_enabled = false;

Can svm->ibs_enabled have previously been true? If so, then you would need 
to reset the msr interception. If not, then this can be simplified to

	if (vibs && guest_cpuid_has(&svm->vcpu, X86_FEATURE_IBS)) {
		...
	}

without an else path.

> +			}
> +		}
>   	}
>   }
>   
> @@ -2888,6 +2945,11 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	case MSR_AMD64_DE_CFG:
>   		msr_info->data = svm->msr_decfg;
>   		break;
> +
> +	case MSR_AMD64_IBSCTL:
> +		rdmsrl(MSR_AMD64_IBSCTL, msr_info->data);
> +		break;
> +
>   	default:
>   		return kvm_get_msr_common(vcpu, msr_info);
>   	}
> @@ -4038,19 +4100,111 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
>   	return EXIT_FASTPATH_NONE;
>   }
>   
> +/*
> + * Since the IBS state is swap type C, the hypervisor is responsible for saving
> + * its own IBS state before VMRUN.
> + */
> +static void svm_save_host_ibs_msrs(struct vmcb_save_area *hostsa)
> +{

A comment here (and in the restore function) about how IBSFETCHCTL and 
IBSOPCTL are saved/restored as part of the calls to amd_vibs_window() 
would be nice to have.

> +	rdmsrl(MSR_AMD64_IBSFETCHLINAD, hostsa->ibs_fetch_linear_addr);
> +	rdmsrl(MSR_AMD64_IBSOPRIP, hostsa->ibs_op_rip);
> +	rdmsrl(MSR_AMD64_IBSOPDATA, hostsa->ibs_op_data);
> +	rdmsrl(MSR_AMD64_IBSOPDATA2, hostsa->ibs_op_data2);
> +	rdmsrl(MSR_AMD64_IBSOPDATA3, hostsa->ibs_op_data3);
> +	rdmsrl(MSR_AMD64_IBSDCLINAD, hostsa->ibs_dc_linear_addr);
> +	rdmsrl(MSR_AMD64_IBSBRTARGET, hostsa->ibs_br_target);
> +	rdmsrl(MSR_AMD64_ICIBSEXTDCTL, hostsa->ibs_fetch_extd_ctl);
> +}
> +
> +/*
> + * Since the IBS state is swap type C, the hypervisor is responsible for
> + * restoring its own IBS state after VMEXIT.
> + */
> +static void svm_restore_host_ibs_msrs(struct vmcb_save_area *hostsa)
> +{
> +	wrmsrl(MSR_AMD64_IBSFETCHLINAD, hostsa->ibs_fetch_linear_addr);
> +	wrmsrl(MSR_AMD64_IBSOPRIP, hostsa->ibs_op_rip);
> +	wrmsrl(MSR_AMD64_IBSOPDATA, hostsa->ibs_op_data);
> +	wrmsrl(MSR_AMD64_IBSOPDATA2, hostsa->ibs_op_data2);
> +	wrmsrl(MSR_AMD64_IBSOPDATA3, hostsa->ibs_op_data3);
> +	wrmsrl(MSR_AMD64_IBSDCLINAD, hostsa->ibs_dc_linear_addr);
> +	wrmsrl(MSR_AMD64_IBSBRTARGET, hostsa->ibs_br_target);
> +	wrmsrl(MSR_AMD64_ICIBSEXTDCTL, hostsa->ibs_fetch_extd_ctl);
> +}
> +
> +/*
> + * Host states are categorized into three swap types based on how they are
> + * handled by hardware during a switch.
> + * The below enum represents host states which are categorized as Swap type C
> + *
> + * C: VMRUN:  Host state _NOT_ saved in host save area
> + *    VMEXIT: Host state initialized to default values.
> + *
> + * Swap type C state is not loaded by VMEXIT and is not saved by VMRUN.
> + * It needs to be saved/restored manually.
> + */
> +enum {
> +	SWAP_TYPE_C_IBS = 0,
> +	SWAP_TYPE_C_MAX
> +};
> +
> +/*
> + * Since IBS state is swap type C, hypervisor needs to disable IBS, then save
> + * IBS MSRs before VMRUN and re-enable it, then restore IBS MSRs after VMEXIT.
> + * This order is important, if not followed, software ends up reading inaccurate
> + * IBS registers.
> + */
> +static noinstr u32 svm_save_swap_type_c(struct kvm_vcpu *vcpu)
> +{
> +	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
> +	struct vmcb_save_area *hostsa;
> +	u32 restore_mask = 0;
> +
> +	hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
> +
> +	if (to_svm(vcpu)->ibs_enabled) {
> +		bool en = amd_vibs_window(WINDOW_START, &hostsa->ibs_fetch_ctl, &hostsa->ibs_op_ctl);
> +
> +		if (en) {

Why not just: if (amd_vibs_window(WINDOW_START, ...)) {

no need to define "en" just to use it once.

> +			svm_save_host_ibs_msrs(hostsa);
> +			restore_mask |= 1 << SWAP_TYPE_C_IBS;
> +		}
> +	}
> +	return restore_mask;
> +}
> +
> +static noinstr void svm_restore_swap_type_c(struct kvm_vcpu *vcpu, u32 restore_mask)
> +{
> +	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
> +	struct vmcb_save_area *hostsa;
> +
> +	hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
> +
> +	if (restore_mask & (1 << SWAP_TYPE_C_IBS)) {
> +		svm_restore_host_ibs_msrs(hostsa);
> +		amd_vibs_window(WINDOW_STOPPING, &hostsa->ibs_fetch_ctl, &hostsa->ibs_op_ctl);
> +	}
> +}
> +
>   static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> +	u32 restore_mask;
>   
>   	guest_state_enter_irqoff();
>   
>   	amd_clear_divider();
>   
> -	if (sev_es_guest(vcpu->kvm))
> +	if (sev_es_guest(vcpu->kvm)) {
>   		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
> -	else
> +	} else {
> +		restore_mask = svm_save_swap_type_c(vcpu);
>   		__svm_vcpu_run(svm, spec_ctrl_intercepted);
>   
> +		if (restore_mask)
> +			svm_restore_swap_type_c(vcpu, restore_mask);

Unconditionally calling svm_restore_swap_type_c() and having the if check 
in that function would make this a bit cleaner.

> +	}
> +
>   	guest_state_exit_irqoff();
>   }
>   
> @@ -4137,6 +4291,13 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>   
>   	/* Any pending NMI will happen here */
>   
> +	/*
> +	 * Disable the IBS window since any pending IBS NMIs will have been
> +	 * handled.
> +	 */
> +	if (svm->ibs_enabled)
> +		amd_vibs_window(WINDOW_STOPPED, NULL, NULL);
> +
>   	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
>   		kvm_after_interrupt(vcpu);
>   
> @@ -5225,6 +5386,13 @@ static __init int svm_hardware_setup(void)
>   			pr_info("LBR virtualization supported\n");
>   	}
>   
> +	if (vibs) {
> +		if ((vnmi || avic) && boot_cpu_has(X86_FEATURE_VIBS))
> +			pr_info("IBS virtualization supported\n");
> +		else
> +			vibs = false;
> +	}

How about:
	vibs = boot_cpu_has(X86_FEATURE_VIBS) && (vnmi || avic);
	if (vibs)
		pr_info(...);

> +
>   	if (!enable_pmu)
>   		pr_info("PMU virtualization is disabled\n");
>   
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c7eb82a78127..c2a02629a1d1 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -30,7 +30,7 @@
>   #define	IOPM_SIZE PAGE_SIZE * 3
>   #define	MSRPM_SIZE PAGE_SIZE * 2
>   
> -#define MAX_DIRECT_ACCESS_MSRS	46
> +#define MAX_DIRECT_ACCESS_MSRS	62
>   #define MSRPM_OFFSETS	32
>   extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>   extern bool npt_enabled;
> @@ -260,6 +260,7 @@ struct vcpu_svm {
>   	unsigned long soft_int_old_rip;
>   	unsigned long soft_int_next_rip;
>   	bool soft_int_injected;
> +	bool ibs_enabled;
>   
>   	u32 ldr_reg;
>   	u32 dfr_reg;
> @@ -732,6 +733,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
>   void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>   void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
>   void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept);

This doesn't seem necessary and isn't part of the sev.c file anyway.

Thanks,
Tom

>   
>   /* vmenter.S */
>   

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 11/13] KVM: SVM: Add support for IBS virtualization for SEV-ES guests
  2023-09-04  9:53 ` [PATCH 11/13] KVM: SVM: Add support for IBS virtualization for SEV-ES guests Manali Shukla
@ 2023-09-05 15:43   ` Tom Lendacky
  0 siblings, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2023-09-05 15:43 UTC (permalink / raw)
  To: Manali Shukla, kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, nikunj

On 9/4/23 04:53, Manali Shukla wrote:
> Since the IBS state is swap type C, the hypervisor is responsible for
> saving its own IBS state before VMRUN and restoring it after VMEXIT.
> It is also responsible for disabling IBS before VMRUN and re-enabling
> it after VMEXIT. For a SEV-ES guest with IBS virtualization enabled,
> a VMEXIT_INVALID will happen if IBS is found to be enabled on VMRUN
> [1].
> 
> The IBS virtualization feature for SEV-ES guests is not enabled in this
> patch. Later patches enable IBS virtualization for SEV-ES guests.
> 
> [1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
>       AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
>       Instruction-Based Sampling Virtualization.
> 
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>   arch/x86/include/asm/svm.h | 14 +++++++++++++-
>   arch/x86/kvm/svm/sev.c     |  7 +++++++
>   arch/x86/kvm/svm/svm.c     | 11 +++++------
>   3 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 4096d2f68770..58b60842a3b7 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -469,6 +469,18 @@ struct sev_es_save_area {
>   	u8 fpreg_x87[80];
>   	u8 fpreg_xmm[256];
>   	u8 fpreg_ymm[256];
> +	u8 lbr_stack_from_to[256];
> +	u64 lbr_select;
> +	u64 ibs_fetch_ctl;
> +	u64 ibs_fetch_linear_addr;
> +	u64 ibs_op_ctl;
> +	u64 ibs_op_rip;
> +	u64 ibs_op_data;
> +	u64 ibs_op_data2;
> +	u64 ibs_op_data3;
> +	u64 ibs_dc_linear_addr;
> +	u64 ibs_br_target;
> +	u64 ibs_fetch_extd_ctl;
>   } __packed;
>   
>   struct ghcb_save_area {
> @@ -527,7 +539,7 @@ struct ghcb {
>   
>   #define EXPECTED_VMCB_SAVE_AREA_SIZE		1992
>   #define EXPECTED_GHCB_SAVE_AREA_SIZE		1032
> -#define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1648
> +#define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1992
>   #define EXPECTED_VMCB_CONTROL_AREA_SIZE		1024
>   #define EXPECTED_GHCB_SIZE			PAGE_SIZE
>   
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index d3aec1f2cad2..41706335cedd 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -59,6 +59,7 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
>   #define sev_es_enabled false
>   #endif /* CONFIG_KVM_AMD_SEV */
>   
> +static bool sev_es_vibs_enabled;
>   static u8 sev_enc_bit;
>   static DECLARE_RWSEM(sev_deactivate_lock);
>   static DEFINE_MUTEX(sev_bitmap_lock);
> @@ -2256,6 +2257,9 @@ void __init sev_hardware_setup(void)
>   
>   	sev_enabled = sev_supported;
>   	sev_es_enabled = sev_es_supported;
> +
> +	if (!sev_es_enabled || !cpu_feature_enabled(X86_FEATURE_SEV_ES_VIBS))
> +		sev_es_vibs_enabled = false;

sev_es_vibs_enabled = sev_es_enabled && cpu_feature_enabled(X86_FEATURE_SEV_ES_VIBS);

But won't this require VNMI support, too? So should that also be checked,
along with vibs, from svm.c (since AVIC isn't supported with SEV)?

>   #endif
>   }
>   
> @@ -2993,6 +2997,9 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm)
>   		if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_RDTSCP))
>   			svm_clr_intercept(svm, INTERCEPT_RDTSCP);
>   	}
> +
> +	if (sev_es_vibs_enabled && svm->ibs_enabled)
> +		svm_ibs_msr_interception(svm, false);

I might be missing something here...  if svm->ibs_enabled is true, then
this intercept change will already have been done. Shouldn't this be
doing the reverse?  But, it looks like svm->ibs_enabled is set in the
init_vmcb_after_set_cpuid() function, which is called after
sev_es_init_vmcb() is called...  so it can never be true here, right?

Thanks,
Tom

>   }
>   
>   void sev_init_vmcb(struct vcpu_svm *svm)
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 6f566ed93f4c..0cfe23bb144a 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4194,16 +4194,15 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
>   	guest_state_enter_irqoff();
>   
>   	amd_clear_divider();
> +	restore_mask = svm_save_swap_type_c(vcpu);
>   
> -	if (sev_es_guest(vcpu->kvm)) {
> +	if (sev_es_guest(vcpu->kvm))
>   		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
> -	} else {
> -		restore_mask = svm_save_swap_type_c(vcpu);
> +	else
>   		__svm_vcpu_run(svm, spec_ctrl_intercepted);
>   
> -		if (restore_mask)
> -			svm_restore_swap_type_c(vcpu, restore_mask);
> -	}
> +	if (restore_mask)
> +		svm_restore_swap_type_c(vcpu, restore_mask);
>   
>   	guest_state_exit_irqoff();
>   }


* Re: [PATCH 00/13] Implement support for IBS virtualization
  2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
                   ` (12 preceding siblings ...)
  2023-09-04  9:53 ` [PATCH 13/13] KVM: x86: nSVM: Implement support for nested IBS virtualization Manali Shukla
@ 2023-09-05 15:47 ` Peter Zijlstra
  2023-09-06 15:38   ` Manali Shukla
  13 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2023-09-05 15:47 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:

> Note that, since IBS registers are swap type C [2], the hypervisor is
> responsible for saving and restoring of IBS host state. Hypervisor
> does so only when IBS is active on the host to avoid unnecessary
> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
> state and enter the guest. After a guest exit, the hypervisor needs to
> restore host IBS state and re-enable IBS.

Why do you think it is OK for a guest to disable the host IBS when
entering a guest? Perhaps the host was wanting to profile the guest.

Only when perf_event_attr::exclude_guest is set is this allowed,
otherwise you have to respect the host running IBS and you're not
allowed to touch it.

Host trumps guest etc..


* Re: [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and SEV-ES guests
  2023-09-04  9:53 ` [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and " Manali Shukla
@ 2023-09-05 16:00   ` Tom Lendacky
  2023-09-12  3:30   ` Chao Gao
  1 sibling, 0 replies; 32+ messages in thread
From: Tom Lendacky @ 2023-09-05 16:00 UTC (permalink / raw)
  To: Manali Shukla, kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, nikunj

On 9/4/23 04:53, Manali Shukla wrote:
> To enable IBS virtualization capability on non SEV-ES guests, bit 2
> at offset 0xb8 in VMCB is set to 1 for non SEV-ES guests.

s/for non SEV-ES guests//

> 
> To enable IBS virtualization capability on SEV-ES guests, bit 12 in
> SEV_FEATURES in VMSA is set to 1 for SEV-ES guests.

s/for SEV-ES guests//

> 
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>   arch/x86/include/asm/svm.h |  4 ++++
>   arch/x86/kvm/svm/sev.c     |  5 ++++-
>   arch/x86/kvm/svm/svm.c     | 26 +++++++++++++++++++++++++-
>   3 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> index 58b60842a3b7..a31bf803b993 100644
> --- a/arch/x86/include/asm/svm.h
> +++ b/arch/x86/include/asm/svm.h
> @@ -215,6 +215,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>   #define LBR_CTL_ENABLE_MASK BIT_ULL(0)
>   #define VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK BIT_ULL(1)
>   
> +#define VIRTUAL_IBS_ENABLE_MASK BIT_ULL(2)
> +
>   #define SVM_INTERRUPT_SHADOW_MASK	BIT_ULL(0)
>   #define SVM_GUEST_INTERRUPT_MASK	BIT_ULL(1)
>   
> @@ -259,6 +261,8 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
>   
>   #define VMCB_AVIC_APIC_BAR_MASK				0xFFFFFFFFFF000ULL
>   
> +#define SVM_SEV_ES_FEAT_VIBS				BIT(12)

Based on the SNP series, this shouldn't be added in between all the AVIC 
features, but rather after them, just before the vmcb_seg struct. And 
probably best to just call it SVM_SEV_FEAT_VIBS.

> +
>   #define AVIC_UNACCEL_ACCESS_WRITE_MASK		1
>   #define AVIC_UNACCEL_ACCESS_OFFSET_MASK		0xFF0
>   #define AVIC_UNACCEL_ACCESS_VECTOR_MASK		0xFFFFFFFF
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 41706335cedd..e0ef3a2323d6 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -59,7 +59,7 @@ module_param_named(sev_es, sev_es_enabled, bool, 0444);
>   #define sev_es_enabled false
>   #endif /* CONFIG_KVM_AMD_SEV */
>   
> -static bool sev_es_vibs_enabled;
> +static bool sev_es_vibs_enabled = true;

No need to change this if you follow my comment from the previous patch. 
Also, maybe this should just be called sev_es_vibs, since it's more a 
capability (similar to how in svm.c it is just vibs).

>   static u8 sev_enc_bit;
>   static DECLARE_RWSEM(sev_deactivate_lock);
>   static DEFINE_MUTEX(sev_bitmap_lock);
> @@ -607,6 +607,9 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
>   	save->xss  = svm->vcpu.arch.ia32_xss;
>   	save->dr6  = svm->vcpu.arch.dr6;
>   
> +	if (svm->ibs_enabled && sev_es_vibs_enabled)

This should solely rely on svm->ibs_enabled, which means that 
svm->ibs_enabled should have previously been set to false if 
sev_es_vibs_enabled is false.

> +		save->sev_features |= SVM_SEV_ES_FEAT_VIBS;
> +
>   	pr_debug("Virtual Machine Save Area (VMSA):\n");
>   	print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
>   
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 0cfe23bb144a..b85120f0d3ac 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -234,7 +234,7 @@ static int lbrv = true;
>   module_param(lbrv, int, 0444);
>   
>   /* enable/disable IBS virtualization */
> -static int vibs;
> +static int vibs = true;
>   module_param(vibs, int, 0444);
>   
>   static int tsc_scaling = true;
> @@ -1245,10 +1245,13 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		/*
>   		 * If hardware supports VIBS then no need to intercept IBS MSRS
>   		 * when VIBS is enabled in guest.
> +		 *
> +		 * Enable VIBS by setting bit 2 at offset 0xb8 in VMCB.
>   		 */
>   		if (vibs) {
>   			if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_IBS)) {
>   				svm_ibs_msr_interception(svm, false);
> +				svm->vmcb->control.virt_ext |= VIRTUAL_IBS_ENABLE_MASK;

This appears to do this for any type of guest, is this valid to do for 
SEV-ES guests?

Thanks,
Tom

>   				svm->ibs_enabled = true;
>   
>   				/*
> @@ -5166,6 +5169,24 @@ static __init void svm_adjust_mmio_mask(void)
>   	kvm_mmu_set_mmio_spte_mask(mask, mask, PT_WRITABLE_MASK | PT_USER_MASK);
>   }
>   
> +static void svm_ibs_set_cpu_caps(void)
> +{
> +	kvm_cpu_cap_set(X86_FEATURE_IBS);
> +	kvm_cpu_cap_set(X86_FEATURE_EXTLVT);
> +	kvm_cpu_cap_set(X86_FEATURE_EXTAPIC);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_AVAIL);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_FETCHSAM);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_OPSAM);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_RDWROPCNT);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_OPCNT);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_BRNTRGT);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_OPCNTEXT);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_RIPINVALIDCHK);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_OPBRNFUSE);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_FETCHCTLEXTD);
> +	kvm_cpu_cap_set(X86_FEATURE_IBS_ZEN4_EXT);
> +}
> +
>   static __init void svm_set_cpu_caps(void)
>   {
>   	kvm_set_cpu_caps();
> @@ -5208,6 +5229,9 @@ static __init void svm_set_cpu_caps(void)
>   		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
>   	}
>   
> +	if (vibs)
> +		svm_ibs_set_cpu_caps();
> +
>   	/* CPUID 0x80000008 */
>   	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
>   	    boot_cpu_has(X86_FEATURE_AMD_SSBD))


* Re: [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests
  2023-09-04  9:53 ` [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests Manali Shukla
  2023-09-05 15:30   ` Tom Lendacky
@ 2023-09-06  1:51   ` Alexey Kardashevskiy
  2023-09-12  3:09   ` Chao Gao
  2 siblings, 0 replies; 32+ messages in thread
From: Alexey Kardashevskiy @ 2023-09-06  1:51 UTC (permalink / raw)
  To: Manali Shukla, kvm, seanjc
  Cc: linux-doc, linux-perf-users, x86, pbonzini, peterz, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On 4/9/23 19:53, Manali Shukla wrote:
> From: Santosh Shukla <santosh.shukla@amd.com>
> 
> IBS virtualization (VIBS) [1] feature allows the guest to collect IBS
> samples without exiting the guest.
> 
> There are 2 parts to this feature
> - Virtualizing the IBS register state.
> - Ensuring the IBS interrupt is handled in the guest without exiting
>    to the hypervisor.
> 
> IBS virtualization requires the use of AVIC or NMI virtualization for
> delivery of a virtualized interrupt from IBS hardware in the guest.
> Without the virtualized interrupt delivery, the IBS interrupt
> occurring in the guest will not be delivered to either the guest or
> the hypervisor.  When AVIC is enabled, IBS LVT entry (Extended
> Interrupt 0 LVT) message type should be programmed to INTR or NMI.
> 
> So, when the sampled interval for the data collection for IBS fetch/op
> block is over, VIBS hardware is going to generate a Virtual NMI, but
> the source of Virtual NMI is different in both AVIC enabled/disabled
> case.
> 1) when AVIC is enabled, Virtual NMI is generated via AVIC using
>     extended LVT (EXTLVT).
> 2) When AVIC is disabled, Virtual NMI is directly generated from
>     hardware.
> 
> Since IBS registers falls under swap type C [2], only the guest state is
> saved and restored automatically by the hardware. Host state needs to be
> saved and restored manually by the hypervisor. Note that, saving and
> restoring of host IBS state happens only when IBS is active on the host,
> to avoid unnecessary rdmsrs/wrmsrs. The hypervisor needs to disable host IBS
> before VMRUN and re-enable it after VMEXIT [1].
> 
> The IBS virtualization feature for non SEV-ES guests is not enabled in
> this patch. Later patches enable VIBS for non SEV-ES guests.
> 
> [1]: https://bugzilla.kernel.org/attachment.cgi?id=304653
>       AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38
>       Instruction-Based Sampling Virtualization.
> 
> [2]: https://bugzilla.kernel.org/attachment.cgi?id=304653
>       AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B Layout
>       of VMCB, Table B-3 Swap Types.
> 
> Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
> Co-developed-by: Manali Shukla <manali.shukla@amd.com>
> Signed-off-by: Manali Shukla <manali.shukla@amd.com>
> ---
>   arch/x86/kvm/svm/svm.c | 172 ++++++++++++++++++++++++++++++++++++++++-
>   arch/x86/kvm/svm/svm.h |   4 +-
>   2 files changed, 173 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 20fe83eb32ee..6f566ed93f4c 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -139,6 +139,22 @@ static const struct svm_direct_access_msrs {
>   	{ .index = X2APIC_MSR(APIC_TMICT),		.always = false },
>   	{ .index = X2APIC_MSR(APIC_TMCCT),		.always = false },
>   	{ .index = X2APIC_MSR(APIC_TDCR),		.always = false },
> +	{ .index = MSR_AMD64_IBSFETCHCTL,		.always = false },
> +	{ .index = MSR_AMD64_IBSFETCHLINAD,		.always = false },
> +	{ .index = MSR_AMD64_IBSOPCTL,			.always = false },
> +	{ .index = MSR_AMD64_IBSOPRIP,			.always = false },
> +	{ .index = MSR_AMD64_IBSOPDATA,			.always = false },
> +	{ .index = MSR_AMD64_IBSOPDATA2,		.always = false },
> +	{ .index = MSR_AMD64_IBSOPDATA3,		.always = false },
> +	{ .index = MSR_AMD64_IBSDCLINAD,		.always = false },
> +	{ .index = MSR_AMD64_IBSBRTARGET,		.always = false },
> +	{ .index = MSR_AMD64_ICIBSEXTDCTL,		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EFEAT),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_ECTRL),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(0)),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(1)),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(2)),		.always = false },
> +	{ .index = X2APIC_MSR(APIC_EILVTn(3)),		.always = false },
>   	{ .index = MSR_INVALID,				.always = false },
>   };
>   
> @@ -217,6 +233,10 @@ module_param(vgif, int, 0444);
>   static int lbrv = true;
>   module_param(lbrv, int, 0444);
>   
> +/* enable/disable IBS virtualization */
> +static int vibs;
> +module_param(vibs, int, 0444);
> +
>   static int tsc_scaling = true;
>   module_param(tsc_scaling, int, 0444);
>   
> @@ -1050,6 +1070,20 @@ void disable_nmi_singlestep(struct vcpu_svm *svm)
>   	}
>   }
>   
> +void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept)
> +{
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSFETCHCTL, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSFETCHLINAD, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPCTL, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPRIP, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA2, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSOPDATA3, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSDCLINAD, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_IBSBRTARGET, !intercept, !intercept);
> +	set_msr_interception(&svm->vcpu, svm->msrpm, MSR_AMD64_ICIBSEXTDCTL, !intercept, !intercept);
> +}
> +
>   static void grow_ple_window(struct kvm_vcpu *vcpu)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -1207,6 +1241,29 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
>   		/* No need to intercept these MSRs */
>   		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
>   		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
> +
> +		/*
> +		 * If hardware supports VIBS then no need to intercept IBS MSRS
> +		 * when VIBS is enabled in guest.
> +		 */
> +		if (vibs) {
> +			if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_IBS)) {
> +				svm_ibs_msr_interception(svm, false);
> +				svm->ibs_enabled = true;
> +
> +				/*
> +				 * In order to enable VIBS, AVIC/VNMI must be enabled to handle the
> +				 * interrupt generated by IBS driver. When AVIC is enabled, once
> +				 * data collection for IBS fetch/op block for sampled interval
> +				 * provided is done, hardware signals VNMI which is generated via
> +				 * AVIC which uses extended LVT registers. That is why extended LVT
> +				 * registers are initialized at guest startup.
> +				 */
> +				kvm_apic_init_eilvt_regs(vcpu);
> +			} else {
> +				svm->ibs_enabled = false;
> +			}
> +		}
>   	}
>   }
>   
> @@ -2888,6 +2945,11 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>   	case MSR_AMD64_DE_CFG:
>   		msr_info->data = svm->msr_decfg;
>   		break;
> +
> +	case MSR_AMD64_IBSCTL:
> +		rdmsrl(MSR_AMD64_IBSCTL, msr_info->data);
> +		break;
> +
>   	default:
>   		return kvm_get_msr_common(vcpu, msr_info);
>   	}
> @@ -4038,19 +4100,111 @@ static fastpath_t svm_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
>   	return EXIT_FASTPATH_NONE;
>   }
>   
> +/*
> + * Since the IBS state is swap type C, the hypervisor is responsible for saving
> + * its own IBS state before VMRUN.
> + */
> +static void svm_save_host_ibs_msrs(struct vmcb_save_area *hostsa)
> +{
> +	rdmsrl(MSR_AMD64_IBSFETCHLINAD, hostsa->ibs_fetch_linear_addr);
> +	rdmsrl(MSR_AMD64_IBSOPRIP, hostsa->ibs_op_rip);
> +	rdmsrl(MSR_AMD64_IBSOPDATA, hostsa->ibs_op_data);
> +	rdmsrl(MSR_AMD64_IBSOPDATA2, hostsa->ibs_op_data2);
> +	rdmsrl(MSR_AMD64_IBSOPDATA3, hostsa->ibs_op_data3);
> +	rdmsrl(MSR_AMD64_IBSDCLINAD, hostsa->ibs_dc_linear_addr);
> +	rdmsrl(MSR_AMD64_IBSBRTARGET, hostsa->ibs_br_target);
> +	rdmsrl(MSR_AMD64_ICIBSEXTDCTL, hostsa->ibs_fetch_extd_ctl);
> +}
> +
> +/*
> + * Since the IBS state is swap type C, the hypervisor is responsible for
> + * restoring its own IBS state after VMEXIT.
> + */
> +static void svm_restore_host_ibs_msrs(struct vmcb_save_area *hostsa)
> +{
> +	wrmsrl(MSR_AMD64_IBSFETCHLINAD, hostsa->ibs_fetch_linear_addr);
> +	wrmsrl(MSR_AMD64_IBSOPRIP, hostsa->ibs_op_rip);
> +	wrmsrl(MSR_AMD64_IBSOPDATA, hostsa->ibs_op_data);
> +	wrmsrl(MSR_AMD64_IBSOPDATA2, hostsa->ibs_op_data2);
> +	wrmsrl(MSR_AMD64_IBSOPDATA3, hostsa->ibs_op_data3);
> +	wrmsrl(MSR_AMD64_IBSDCLINAD, hostsa->ibs_dc_linear_addr);
> +	wrmsrl(MSR_AMD64_IBSBRTARGET, hostsa->ibs_br_target);
> +	wrmsrl(MSR_AMD64_ICIBSEXTDCTL, hostsa->ibs_fetch_extd_ctl);
> +}
> +
> +/*
> + * Host states are categorized into three swap types based on how they are
> + * handled by hardware during a switch.
> + * The below enum represents host states which are categorized as Swap type C
> + *
> + * C: VMRUN:  Host state _NOT_ saved in host save area
> + *    VMEXIT: Host state initialized to default values.
> + *
> + * Swap type C state is not loaded by VMEXIT and is not saved by VMRUN.
> + * It needs to be saved/restored manually.
> + */
> +enum {
> +	SWAP_TYPE_C_IBS = 0,
> +	SWAP_TYPE_C_MAX
> +};
> +
> +/*
> + * Since IBS state is swap type C, hypervisor needs to disable IBS, then save
> + * IBS MSRs before VMRUN and re-enable it, then restore IBS MSRs after VMEXIT.
> + * This order is important, if not followed, software ends up reading inaccurate
> + * IBS registers.
> + */
> +static noinstr u32 svm_save_swap_type_c(struct kvm_vcpu *vcpu)
> +{
> +	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
> +	struct vmcb_save_area *hostsa;
> +	u32 restore_mask = 0;
> +
> +	hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
> +
> +	if (to_svm(vcpu)->ibs_enabled) {
> +		bool en = amd_vibs_window(WINDOW_START, &hostsa->ibs_fetch_ctl, &hostsa->ibs_op_ctl);
> +
> +		if (en) {
> +			svm_save_host_ibs_msrs(hostsa);
> +			restore_mask |= 1 << SWAP_TYPE_C_IBS;
> +		}
> +	}
> +	return restore_mask;
> +}
> +
> +static noinstr void svm_restore_swap_type_c(struct kvm_vcpu *vcpu, u32 restore_mask)
> +{
> +	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, vcpu->cpu);
> +	struct vmcb_save_area *hostsa;
> +
> +	hostsa = (struct vmcb_save_area *)(page_address(sd->save_area) + 0x400);
> +
> +	if (restore_mask & (1 << SWAP_TYPE_C_IBS)) {
> +		svm_restore_host_ibs_msrs(hostsa);
> +		amd_vibs_window(WINDOW_STOPPING, &hostsa->ibs_fetch_ctl, &hostsa->ibs_op_ctl);
> +	}
> +}
> +
>   static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_intercepted)
>   {
>   	struct vcpu_svm *svm = to_svm(vcpu);
> +	u32 restore_mask;
>   
>   	guest_state_enter_irqoff();
>   
>   	amd_clear_divider();
>   
> -	if (sev_es_guest(vcpu->kvm))
> +	if (sev_es_guest(vcpu->kvm)) {
>   		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted);
> -	else
> +	} else {
> +		restore_mask = svm_save_swap_type_c(vcpu);
>   		__svm_vcpu_run(svm, spec_ctrl_intercepted);
>   
> +		if (restore_mask)
> +			svm_restore_swap_type_c(vcpu, restore_mask);
> +	}
> +


The SEV-ES branch also needs to save/restore IBS to keep IBS working on
the host, unless the hardware does not reset the IBS state when vIBS was
not enabled for the guest. Is that the case? Thanks,


>   	guest_state_exit_irqoff();
>   }
>   
> @@ -4137,6 +4291,13 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>   
>   	/* Any pending NMI will happen here */
>   
> +	/*
> +	 * Disable the IBS window since any pending IBS NMIs will have been
> +	 * handled.
> +	 */
> +	if (svm->ibs_enabled)
> +		amd_vibs_window(WINDOW_STOPPED, NULL, NULL);
> +
>   	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
>   		kvm_after_interrupt(vcpu);
>   
> @@ -5225,6 +5386,13 @@ static __init int svm_hardware_setup(void)
>   			pr_info("LBR virtualization supported\n");
>   	}
>   
> +	if (vibs) {
> +		if ((vnmi || avic) && boot_cpu_has(X86_FEATURE_VIBS))
> +			pr_info("IBS virtualization supported\n");
> +		else
> +			vibs = false;
> +	}
> +
>   	if (!enable_pmu)
>   		pr_info("PMU virtualization is disabled\n");
>   
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index c7eb82a78127..c2a02629a1d1 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -30,7 +30,7 @@
>   #define	IOPM_SIZE PAGE_SIZE * 3
>   #define	MSRPM_SIZE PAGE_SIZE * 2
>   
> -#define MAX_DIRECT_ACCESS_MSRS	46
> +#define MAX_DIRECT_ACCESS_MSRS	62
>   #define MSRPM_OFFSETS	32
>   extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
>   extern bool npt_enabled;
> @@ -260,6 +260,7 @@ struct vcpu_svm {
>   	unsigned long soft_int_old_rip;
>   	unsigned long soft_int_next_rip;
>   	bool soft_int_injected;
> +	bool ibs_enabled;
>   
>   	u32 ldr_reg;
>   	u32 dfr_reg;
> @@ -732,6 +733,7 @@ void sev_es_vcpu_reset(struct vcpu_svm *svm);
>   void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector);
>   void sev_es_prepare_switch_to_guest(struct sev_es_save_area *hostsa);
>   void sev_es_unmap_ghcb(struct vcpu_svm *svm);
> +void svm_ibs_msr_interception(struct vcpu_svm *svm, bool intercept);
>   
>   /* vmenter.S */
>   

-- 
Alexey



* Re: [PATCH 00/13] Implement support for IBS virtualization
  2023-09-05 15:47 ` [PATCH 00/13] Implement support for " Peter Zijlstra
@ 2023-09-06 15:38   ` Manali Shukla
  2023-09-06 19:56     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-06 15:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

Hi Peter,

Thank you for looking into this.

On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
> On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
> 
>> Note that, since IBS registers are swap type C [2], the hypervisor is
>> responsible for saving and restoring of IBS host state. Hypervisor
>> does so only when IBS is active on the host to avoid unnecessary
>> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
>> state and enter the guest. After a guest exit, the hypervisor needs to
>> restore host IBS state and re-enable IBS.
> 
> Why do you think it is OK for a guest to disable the host IBS when
> entering a guest? Perhaps the host was wanting to profile the guest.
> 

1. Since IBS registers are of swap type C [1], only guest state is saved
and restored by the hardware. Host state needs to be saved and restored by
the hypervisor. In order to save the IBS registers correctly, IBS needs to
be disabled before they are saved.

2. As per APM [2],
"When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
This is enforced by hardware when VIBS is enabled on SEV-ES guests.

3. VIBS is not enabled by default. It can be enabled with an explicit
QEMU command line option "-cpu +ibs". The guest should be started without
this option when the host wants to profile the guest.
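
The save/restore sequence in point 1 can be modeled in plain C roughly as
follows (simulated MSRs and an illustrative enable-bit layout; not the actual
KVM code, all names are made up for the sketch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated MSR file; the real code uses rdmsrl()/wrmsrl(). */
enum { IBS_FETCH_CTL, IBS_OP_CTL, IBS_NR_REGS };
#define IBS_EN (1ull << 17)	/* illustrative enable bit */

static uint64_t msr[IBS_NR_REGS];

struct host_ibs_state {
	uint64_t regs[IBS_NR_REGS];
	int active;
};

/*
 * Before VMRUN: snapshot the host IBS registers and disable IBS, but
 * only when IBS is actually in use, to avoid needless rdmsrs/wrmsrs.
 * Swap type C means hardware swaps only guest state, so the host copy
 * is entirely the hypervisor's responsibility.
 */
static void host_ibs_save(struct host_ibs_state *st)
{
	st->active = !!((msr[IBS_FETCH_CTL] | msr[IBS_OP_CTL]) & IBS_EN);
	if (!st->active)
		return;
	memcpy(st->regs, msr, sizeof(st->regs));
	msr[IBS_FETCH_CTL] &= ~IBS_EN;	/* IBS must be off across VMRUN */
	msr[IBS_OP_CTL] &= ~IBS_EN;
}

/* After #VMEXIT: write the snapshot back, which also re-enables IBS. */
static void host_ibs_restore(const struct host_ibs_state *st)
{
	if (st->active)
		memcpy(msr, st->regs, sizeof(msr));
}
```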

[1] https://bugzilla.kernel.org/attachment.cgi?id=304653
    AMD64 Architecture Programmer’s Manual, Vol 2, Appendix B. Layout
    of VMCB,
    Table B-2. VMCB Layout, State Save Area 
    Table B-4. VMSA Layout, State Save Area for SEV-ES
    
[2] https://bugzilla.kernel.org/attachment.cgi?id=304653
    AMD64 Architecture Programmer’s Manual, Vol 2, Section 15.38,
    Instruction-Based Sampling Virtualization


> Only when perf_event_attr::exclude_guest is set is this allowed,
> otherwise you have to respect the host running IBS and you're not
> allowed to touch it.
> 
> Host trumps guest etc..


- Manali


* Re: [PATCH 00/13] Implement support for IBS virtualization
  2023-09-06 15:38   ` Manali Shukla
@ 2023-09-06 19:56     ` Peter Zijlstra
  2023-09-07 15:49       ` Manali Shukla
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2023-09-06 19:56 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Wed, Sep 06, 2023 at 09:08:25PM +0530, Manali Shukla wrote:
> Hi Peter,
> 
> Thank you for looking into this.
> 
> On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
> > On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
> > 
> >> Note that, since IBS registers are swap type C [2], the hypervisor is
> >> responsible for saving and restoring of IBS host state. Hypervisor
> >> does so only when IBS is active on the host to avoid unnecessary
> >> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
> >> state and enter the guest. After a guest exit, the hypervisor needs to
> >> restore host IBS state and re-enable IBS.
> > 
> > Why do you think it is OK for a guest to disable the host IBS when
> > entering a guest? Perhaps the host was wanting to profile the guest.
> > 
> 
> 1. Since IBS registers are of swap type C [1], only guest state is saved
> and restored by the hardware. Host state needs to be saved and restored by
> hypervisor. In order to save IBS registers correctly, IBS needs to be
> disabled before saving the IBS registers.
> 
> 2. As per APM [2],
> "When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
> IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
> these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
> This is enforced by hardware on SEV-ES guests when VIBS is enabled on SEV-ES
> guests.

I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
the above saying that a host can never IBS profile a guest?

Does the current IBS thing assert perf_event_attr::exclude_guest is set?

I can't quickly find anything :-(


* Re: [PATCH 00/13] Implement support for IBS virtualization
  2023-09-06 19:56     ` Peter Zijlstra
@ 2023-09-07 15:49       ` Manali Shukla
  2023-09-08 13:31         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-07 15:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

Hi Peter,

On 9/7/2023 1:26 AM, Peter Zijlstra wrote:
> On Wed, Sep 06, 2023 at 09:08:25PM +0530, Manali Shukla wrote:
>> Hi Peter,
>>
>> Thank you for looking into this.
>>
>> On 9/5/2023 9:17 PM, Peter Zijlstra wrote:
>>> On Mon, Sep 04, 2023 at 09:53:34AM +0000, Manali Shukla wrote:
>>>
>>>> Note that, since IBS registers are swap type C [2], the hypervisor is
>>>> responsible for saving and restoring of IBS host state. Hypervisor
>>>> does so only when IBS is active on the host to avoid unnecessary
>>>> rdmsrs/wrmsrs. Hypervisor needs to disable host IBS before saving the
>>>> state and enter the guest. After a guest exit, the hypervisor needs to
>>>> restore host IBS state and re-enable IBS.
>>>
>>> Why do you think it is OK for a guest to disable the host IBS when
>>> entering a guest? Perhaps the host was wanting to profile the guest.
>>>
>>
>> 1. Since IBS registers are of swap type C [1], only guest state is saved
>> and restored by the hardware. Host state needs to be saved and restored by
>> hypervisor. In order to save IBS registers correctly, IBS needs to be
>> disabled before saving the IBS registers.
>>
>> 2. As per APM [2],
>> "When a VMRUN is executed to an SEV-ES guest with IBS virtualization enabled, the
>> IbsFetchCtl[IbsFetchEn] and IbsOpCtl[IbsOpEn] MSR bits must be 0. If either of 
>> these bits are not 0, the VMRUN will fail with a VMEXIT_INVALID error code."
>> This is enforced by hardware on SEV-ES guests when VIBS is enabled on SEV-ES
>> guests.
> 
> I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
> the above saying that a host can never IBS profile a guest?

The host can profile a guest with IBS if VIBS is disabled for that guest, which
is the default behavior. The host cannot profile a guest that has VIBS enabled.

> 
> Does the current IBS thing assert perf_event_attr::exclude_guest is set?

Unlike the AMD core PMU, IBS doesn't have a Host/Guest filtering capability, so
perf_event_open() fails if exclude_guest is set for an IBS event.
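
A minimal sketch of that admission rule (illustrative only, not the actual
perf_ibs_init() code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * IBS hardware cannot filter samples by host/guest context, so an
 * event that requests such filtering has to be rejected at open time.
 */
static int ibs_event_admission(bool exclude_guest)
{
	if (exclude_guest)
		return -EINVAL;
	return 0;
}
```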

> 
> I can't quickly find anything :-(

Thank you,
Manali



* Re: [PATCH 00/13] Implement support for IBS virtualization
  2023-09-07 15:49       ` Manali Shukla
@ 2023-09-08 13:31         ` Peter Zijlstra
  2023-09-11 12:32           ` Manali Shukla
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2023-09-08 13:31 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Thu, Sep 07, 2023 at 09:19:51PM +0530, Manali Shukla wrote:

> > I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
> > the above saying that a host can never IBS profile a guest?
> 
> Host can profile a guest with IBS if VIBS is disabled for the guest. This is
> the default behavior. Host can not profile guest if VIBS is enabled for guest.
> 
> > 
> > Does the current IBS thing assert perf_event_attr::exclude_guest is set?
> 
> Unlike AMD core pmu, IBS doesn't have Host/Guest filtering capability, thus
> perf_event_open() fails if exclude_guest is set for an IBS event.

Then you must not allow VIBS if a host cpu-wide IBS counter exists.

Also, VIBS reads like it can be (ab)used as a filter.


* Re: [PATCH 00/13] Implement support for IBS virtualization
  2023-09-08 13:31         ` Peter Zijlstra
@ 2023-09-11 12:32           ` Manali Shukla
  2023-09-28 11:18             ` Manali Shukla
  0 siblings, 1 reply; 32+ messages in thread
From: Manali Shukla @ 2023-09-11 12:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On 9/8/2023 7:01 PM, Peter Zijlstra wrote:
> On Thu, Sep 07, 2023 at 09:19:51PM +0530, Manali Shukla wrote:
> 
>>> I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
>>> the above saying that a host can never IBS profile a guest?
>>
>> Host can profile a guest with IBS if VIBS is disabled for the guest. This is
>> the default behavior. Host can not profile guest if VIBS is enabled for guest.
>>
>>>
>>> Does the current IBS thing assert perf_event_attr::exclude_guest is set?
>>
>> Unlike AMD core pmu, IBS doesn't have Host/Guest filtering capability, thus
>> perf_event_open() fails if exclude_guest is set for an IBS event.
> 
> Then you must not allow VIBS if a host cpu-wide IBS counter exists.
> 
> Also, VIBS reads like it can be (ab)used as a filter.

I think I get your point: if host IBS with exclude_guest=0 doesn't capture
guest samples because of VIBS, that is unintended behavior.

But if a guest cannot use IBS because a host is using it, that is also
unacceptable behavior.

Let me think over it and come back.

- Manali


* Re: [PATCH 01/13] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic
  2023-09-04  9:53 ` [PATCH 01/13] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
@ 2023-09-12  1:47   ` Chao Gao
  0 siblings, 0 replies; 32+ messages in thread
From: Chao Gao @ 2023-09-12  1:47 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, peterz,
	bp, santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Mon, Sep 04, 2023 at 09:53:35AM +0000, Manali Shukla wrote:
>There are four additional extended LVT registers available in extended
>APIC register space which can be used for additional interrupt sources
>like instruction based sampling and many more.
>
>Please refer to the AMD Programmer's Manual Volume 2, Section 16.4.5 for
>more details on extapic.
>https://bugzilla.kernel.org/attachment.cgi?id=304653
>
>Add two new vcpu-based IOCTLs to save and restore the local APIC
>registers with extended APIC register space for a single vcpu. It
>works same as KVM_GET_LAPIC and KVM_SET_LAPIC IOCTLs. The only
>differece is the size of APIC page which is copied/restored by kernel.
>In case of KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC IOCTLs,
>kernel copies/restores the APIC page with extended APIC register space
>located at APIC offsets 400h-530h.
>
>KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC IOCTLs are used
>when extended APIC is enabled in the guest.
>
>Document KVM_GET_LAPIC_W_EXTAPIC, KVM_SET_LAPIC_W_EXTAPIC ioctls.
>
>Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>---
> Documentation/virt/kvm/api.rst  | 23 +++++++++++++++++++++++
> arch/x86/include/uapi/asm/kvm.h |  5 +++++
> arch/x86/kvm/lapic.c            | 12 +++++++-----
> arch/x86/kvm/lapic.h            |  6 ++++--
> arch/x86/kvm/x86.c              | 24 +++++++++++++-----------
> include/uapi/linux/kvm.h        | 10 ++++++++++
> 6 files changed, 62 insertions(+), 18 deletions(-)
>
>diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>index 73db30cb60fb..7239d4f1ecf3 100644
>--- a/Documentation/virt/kvm/api.rst
>+++ b/Documentation/virt/kvm/api.rst
>@@ -1961,6 +1961,18 @@ error.
> Reads the Local APIC registers and copies them into the input argument.  The
> data format and layout are the same as documented in the architecture manual.
> 
>+::
>+
>+  #define KVM_APIC_EXT_REG_SIZE 0x540
>+  struct kvm_lapic_state_w_extapic {
>+        __u8 regs[KVM_APIC_EXT_REG_SIZE];
>+  };

The size of this new structure is also hard-coded. Do you think it is better to
make the new structure extensible so that next time KVM needn't add more uAPIs
for future local APIC extensions?
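
For reference, the hard-coded 0x540 is exactly one 16-byte register slot past
the last extended register at offset 530h; re-declaring the proposed layout
locally makes that easy to check (illustrative copies of the uAPI
definitions, not the kernel headers):

```c
#include <assert.h>
#include <stdint.h>

/* Local, illustrative copies of the proposed uAPI constants. */
#define KVM_APIC_EXT_REG_SIZE 0x540
struct kvm_lapic_state_w_extapic {
	uint8_t regs[KVM_APIC_EXT_REG_SIZE];
};

/* Extended APIC register space: offsets 400h-530h, EILVT0..3 at 500h. */
#define APIC_EILVTn(n) (0x500 + 0x10 * (n))
```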


* Re: [PATCH 03/13] KVM: x86: Add emulation support for Extended LVT registers
  2023-09-04  9:53 ` [PATCH 03/13] KVM: x86: Add emulation support for Extended LVT registers Manali Shukla
@ 2023-09-12  2:36   ` Chao Gao
  0 siblings, 0 replies; 32+ messages in thread
From: Chao Gao @ 2023-09-12  2:36 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, peterz,
	bp, santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Mon, Sep 04, 2023 at 09:53:37AM +0000, Manali Shukla wrote:
>From: Santosh Shukla <santosh.shukla@amd.com>
>
>The local interrupts are extended to include more LVT registers in
>order to allow additional interrupt sources, like Instruction Based
>Sampling (IBS) and many more.
>
>Currently there are four additional LVT registers defined and they are
>located at APIC offsets 500h-530h.
>
>AMD IBS driver is designed to use EXTLVT (Extended interrupt local
>vector table) by default for driver initialization.
>
>Extended LVT registers are required to be emulated to initialize the
>guest IBS driver successfully.
>
>Please refer to Section 16.4.5 in AMD Programmer's Manual Volume 2 at
>https://bugzilla.kernel.org/attachment.cgi?id=304653 for more details
>on EXTLVT.
>
>Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>Co-developed-by: Manali Shukla <manali.shukla@amd.com>
>Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>---
> arch/x86/include/asm/apicdef.h | 14 ++++++++
> arch/x86/kvm/lapic.c           | 66 ++++++++++++++++++++++++++++++++--
> arch/x86/kvm/lapic.h           |  1 +
> arch/x86/kvm/svm/avic.c        |  4 +++
> 4 files changed, 83 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
>index 4b125e5b3187..ac50919d10be 100644
>--- a/arch/x86/include/asm/apicdef.h
>+++ b/arch/x86/include/asm/apicdef.h
>@@ -139,6 +139,20 @@
> #define		APIC_EILVT_MSG_EXT	0x7
> #define		APIC_EILVT_MASKED	(1 << 16)
> 
>+/*
>+ * Initialize extended APIC registers to the default value when guest is started
>+ * and EXTAPIC feature is enabled on the guest.
>+ *
>+ * APIC_EFEAT is a read only Extended APIC feature register, whose default value
>+ * is 0x00040007.

bits 0/1/2 indicate other features. If KVM sets them to 1s, KVM needs to
enumerate the corresponding features.

>+ *
>+ * APIC_ECTRL is a read-write Extended APIC control register, whose default value
>+ * is 0x0.
>+ */
>+
>+#define		APIC_EFEAT_DEFAULT	0x00040007
>+#define		APIC_ECTRL_DEFAULT	0x0
>+
> #define APIC_BASE (fix_to_virt(FIX_APIC_BASE))
> #define APIC_BASE_MSR		0x800
> #define APIC_X2APIC_ID_MSR	0x802
>diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>index 7c1bd8594f1b..88985c481fe8 100644
>--- a/arch/x86/kvm/lapic.c
>+++ b/arch/x86/kvm/lapic.c
>@@ -1599,9 +1599,13 @@ static inline struct kvm_lapic *to_lapic(struct kvm_io_device *dev)
> }
> 
> #define APIC_REG_MASK(reg)	(1ull << ((reg) >> 4))
>+#define APIC_REG_EXT_MASK(reg)	(1ull << (((reg) >> 4) - 0x40))
> #define APIC_REGS_MASK(first, count) \
> 	(APIC_REG_MASK(first) * ((1ull << (count)) - 1))
> 
>+#define APIC_LAST_REG_OFFSET		0x3f0
>+#define APIC_EXT_LAST_REG_OFFSET	0x530
>+
> u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
> {
> 	/* Leave bits '0' for reserved and write-only registers. */
>@@ -1643,6 +1647,8 @@ EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
> static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
> 			      void *data)
> {
>+	u64 valid_reg_ext_mask = 0;
>+	unsigned int last_reg = APIC_LAST_REG_OFFSET;
> 	unsigned char alignment = offset & 0xf;
> 	u32 result;
> 
>@@ -1652,13 +1658,44 @@ static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
> 	 */
> 	WARN_ON_ONCE(apic_x2apic_mode(apic) && offset == APIC_ICR);
> 
>+	/*
>+	 * The local interrupts are extended to include LVT registers to allow
>+	 * additional interrupt sources when the EXTAPIC feature bit is enabled.
>+	 * The Extended Interrupt LVT registers are located at APIC offsets 400-530h.
>+	 */
>+	if (guest_cpuid_has(apic->vcpu, X86_FEATURE_EXTAPIC)) {
>+		valid_reg_ext_mask =
>+			APIC_REG_EXT_MASK(APIC_EFEAT) |
>+			APIC_REG_EXT_MASK(APIC_ECTRL) |
>+			APIC_REG_EXT_MASK(APIC_EILVTn(0)) |
>+			APIC_REG_EXT_MASK(APIC_EILVTn(1)) |
>+			APIC_REG_EXT_MASK(APIC_EILVTn(2)) |
>+			APIC_REG_EXT_MASK(APIC_EILVTn(3));
>+		last_reg = APIC_EXT_LAST_REG_OFFSET;
>+	}
>+
> 	if (alignment + len > 4)
> 		return 1;
> 
>-	if (offset > 0x3f0 ||
>-	    !(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
>+	if (offset > last_reg)
> 		return 1;
> 
>+	switch (offset) {
>+	/*
>+	 * Section 16.3.2 in the AMD Programmer's Manual Volume 2 states:
>+	 * "APIC registers are aligned to 16-byte offsets and must be accessed
>+	 * using naturally-aligned DWORD size read and writes."
>+	 */
>+	case KVM_APIC_REG_SIZE ... KVM_APIC_EXT_REG_SIZE - 16:
>+		if (!(valid_reg_ext_mask & APIC_REG_EXT_MASK(offset)))
>+			return 1;
>+		break;
>+	default:
>+		if (!(kvm_lapic_readable_reg_mask(apic) & APIC_REG_MASK(offset)))
>+			return 1;
>+
>+	}
>+
> 	result = __apic_read(apic, offset & ~0xf);
> 
> 	trace_kvm_apic_read(offset, result);
>@@ -2386,6 +2423,12 @@ static int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
> 		else
> 			kvm_apic_send_ipi(apic, APIC_DEST_SELF | val, 0);
> 		break;
>+	case APIC_EILVTn(0):
>+	case APIC_EILVTn(1):
>+	case APIC_EILVTn(2):
>+	case APIC_EILVTn(3):
>+		kvm_lapic_set_reg(apic, reg, val);
>+		break;

APIC_ECTRL is writable, so, I think it should be handled here.

> 	default:
> 		ret = 1;
> 		break;
>@@ -2664,6 +2707,25 @@ void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
> 	kvm_vcpu_srcu_read_lock(vcpu);
> }
> 
>+/*
>+ * Initialize extended APIC registers to the default value when guest is
>+ * started. The extended APIC registers should only be initialized when the
>+ * EXTAPIC feature is enabled on the guest.
>+ */
>+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu)
>+{
>+	struct kvm_lapic *apic = vcpu->arch.apic;
>+	int i;
>+
>+	if (guest_cpuid_has(vcpu, X86_FEATURE_EXTAPIC)) {

Do you need to check hardware support here?

>+		kvm_lapic_set_reg(apic, APIC_EFEAT, APIC_EFEAT_DEFAULT);
>+		kvm_lapic_set_reg(apic, APIC_ECTRL, APIC_ECTRL_DEFAULT);
>+		for (i = 0; i < APIC_EILVT_NR_MAX; i++)
>+			kvm_lapic_set_reg(apic, APIC_EILVTn(i), APIC_EILVT_MASKED);
>+	}
>+}
>+EXPORT_SYMBOL_GPL(kvm_apic_init_eilvt_regs);

Looks like the extended APIC space is generic to x86. Can you just call this function
from kvm_vcpu_after_set_cpuid()? Then there would be no need to expose this function.

>+
> void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)

The extended APIC space should be re-initialized on reset. Right?

> {
> 	struct kvm_lapic *apic = vcpu->arch.apic;
>diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>index ad6c48938733..b0c7393cd6af 100644
>--- a/arch/x86/kvm/lapic.h
>+++ b/arch/x86/kvm/lapic.h
>@@ -93,6 +93,7 @@ int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
> int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
> int kvm_apic_accept_events(struct kvm_vcpu *vcpu);
> void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event);
>+void kvm_apic_init_eilvt_regs(struct kvm_vcpu *vcpu);
> u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
> void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
> void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu);
>diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>index cfc8ab773025..081075674b1d 100644
>--- a/arch/x86/kvm/svm/avic.c
>+++ b/arch/x86/kvm/svm/avic.c
>@@ -679,6 +679,10 @@ static bool is_avic_unaccelerated_access_trap(u32 offset)
> 	case APIC_LVTERR:
> 	case APIC_TMICT:
> 	case APIC_TDCR:
>+	case APIC_EILVTn(0):
>+	case APIC_EILVTn(1):
>+	case APIC_EILVTn(2):
>+	case APIC_EILVTn(3):
> 		ret = true;
> 		break;
> 	default:
>-- 
>2.34.1
>


* Re: [PATCH 06/13] KVM: x86: Extend CPUID range to include new leaf
  2023-09-04  9:53 ` [PATCH 06/13] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
@ 2023-09-12  2:46   ` Chao Gao
  0 siblings, 0 replies; 32+ messages in thread
From: Chao Gao @ 2023-09-12  2:46 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, peterz,
	bp, santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Mon, Sep 04, 2023 at 09:53:40AM +0000, Manali Shukla wrote:
>CPUID leaf 0x8000001b (EAX) provides information about
>Instruction-Based Sampling capabilities on AMD platforms. A complete
>description of the 0x8000001b CPUID leaf is available in the AMD
>Programmer's Manual Volume 3, Appendix E, Section E.4.13.
>https://bugzilla.kernel.org/attachment.cgi?id=304655
>
>Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>---
> arch/x86/kvm/cpuid.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
>diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>index 0544e30b4946..1f4d505fb69d 100644
>--- a/arch/x86/kvm/cpuid.c
>+++ b/arch/x86/kvm/cpuid.c
>@@ -771,6 +771,12 @@ void kvm_set_cpu_caps(void)
> 		F(PERFMON_V2)
> 	);
> 
>+	/*
>+	 * Hide all IBS related features by default, it will be enabled
>+	 * automatically when IBS virtualization is enabled
>+	 */
>+	kvm_cpu_cap_init_kvm_defined(CPUID_8000_001B_EAX, 0);
>+
> 	/*
> 	 * Synthesize "LFENCE is serializing" into the AMD-defined entry in
> 	 * KVM's supported CPUID if the feature is reported as supported by the
>@@ -1252,6 +1258,11 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> 		entry->eax = entry->ebx = entry->ecx = 0;
> 		entry->edx = 0; /* reserved */
> 		break;
>+	/* AMD IBS capability */
>+	case 0x8000001B:
>+		entry->eax = entry->ebx = entry->ecx = entry->edx = 0;

nit: no need to clear entry->eax to 0 because it will be overwritten right below.

>+		cpuid_entry_override(entry, CPUID_8000_001B_EAX);
>+		break;
> 	case 0x8000001F:
> 		if (!kvm_cpu_cap_has(X86_FEATURE_SEV)) {
> 			entry->eax = entry->ebx = entry->ecx = entry->edx = 0;
>-- 
>2.34.1
>


* Re: [PATCH 07/13] KVM: SVM: Extend VMCB area for virtualized IBS registers
  2023-09-04  9:53 ` [PATCH 07/13] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
@ 2023-09-12  2:50   ` Chao Gao
  0 siblings, 0 replies; 32+ messages in thread
From: Chao Gao @ 2023-09-12  2:50 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, peterz,
	bp, santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Mon, Sep 04, 2023 at 09:53:41AM +0000, Manali Shukla wrote:
>From: Santosh Shukla <santosh.shukla@amd.com>
>
>VMCB state save is extended to hold guest values of the fetch and op
>IBS registers.
>
>Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
>Signed-off-by: Manali Shukla <manali.shukla@amd.com>
>---
> arch/x86/include/asm/svm.h | 16 +++++++++++++++-
> 1 file changed, 15 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
>index dee9fa91120b..4096d2f68770 100644
>--- a/arch/x86/include/asm/svm.h
>+++ b/arch/x86/include/asm/svm.h
>@@ -346,6 +346,19 @@ struct vmcb_save_area {
> 	u64 last_excp_to;
> 	u8 reserved_0x298[72];
> 	u64 spec_ctrl;		/* Guest version of SPEC_CTRL at 0x2E0 */
>+	u8 reserved_0x2e8[904];
>+	u8 lbr_stack_from_to[256];
>+	u64 lbr_select;

Shouldn't these lbr fields be added by a separate patch/series?

>+	u64 ibs_fetch_ctl;
>+	u64 ibs_fetch_linear_addr;
>+	u64 ibs_op_ctl;
>+	u64 ibs_op_rip;
>+	u64 ibs_op_data;
>+	u64 ibs_op_data2;
>+	u64 ibs_op_data3;
>+	u64 ibs_dc_linear_addr;
>+	u64 ibs_br_target;
>+	u64 ibs_fetch_extd_ctl;
> } __packed;
> 
> /* Save area definition for SEV-ES and SEV-SNP guests */
>@@ -512,7 +525,7 @@ struct ghcb {
> } __packed;
> 
> 
>-#define EXPECTED_VMCB_SAVE_AREA_SIZE		744
>+#define EXPECTED_VMCB_SAVE_AREA_SIZE		1992
> #define EXPECTED_GHCB_SAVE_AREA_SIZE		1032
> #define EXPECTED_SEV_ES_SAVE_AREA_SIZE		1648
> #define EXPECTED_VMCB_CONTROL_AREA_SIZE		1024
>@@ -537,6 +550,7 @@ static inline void __unused_size_checks(void)
> 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x180);
> 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x248);
> 	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x298);
>+	BUILD_BUG_RESERVED_OFFSET(vmcb_save_area, 0x2e8);
> 
> 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xc8);
> 	BUILD_BUG_RESERVED_OFFSET(sev_es_save_area, 0xcc);
>-- 
>2.34.1
>


* Re: [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests
  2023-09-04  9:53 ` [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests Manali Shukla
  2023-09-05 15:30   ` Tom Lendacky
  2023-09-06  1:51   ` Alexey Kardashevskiy
@ 2023-09-12  3:09   ` Chao Gao
  2 siblings, 0 replies; 32+ messages in thread
From: Chao Gao @ 2023-09-12  3:09 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, peterz,
	bp, santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On Mon, Sep 04, 2023 at 09:53:43AM +0000, Manali Shukla wrote:
>@@ -1207,6 +1241,29 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
> 		/* No need to intercept these MSRs */
> 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1);
> 		set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1);
>+
>+		/*
>+		 * If hardware supports VIBS then no need to intercept IBS MSRS
>+		 * when VIBS is enabled in guest.
>+		 */
>+		if (vibs) {
>+			if (guest_cpuid_has(&svm->vcpu, X86_FEATURE_IBS)) {
>+				svm_ibs_msr_interception(svm, false);
>+				svm->ibs_enabled = true;
>+
>+				/*
>+				 * In order to enable VIBS, AVIC/VNMI must be enabled to handle the
>+				 * interrupt generated by IBS driver. When AVIC is enabled, once
>+				 * data collection for IBS fetch/op block for sampled interval
>+				 * provided is done, hardware signals VNMI which is generated via
>+				 * AVIC which uses extended LVT registers. That is why extended LVT
>+				 * registers are initialized at guest startup.
>+				 */
>+				kvm_apic_init_eilvt_regs(vcpu);
>+			} else {
>+				svm->ibs_enabled = false;

The interception should be enabled for IBS MSRs in the else branch. see:

https://lore.kernel.org/all/ZJYzPn7ipYfO0fLZ@google.com/

>+			}
>+		}
> 	}
> }
> 
>@@ -2888,6 +2945,11 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> 	case MSR_AMD64_DE_CFG:
> 		msr_info->data = svm->msr_decfg;
> 		break;
>+
>+	case MSR_AMD64_IBSCTL:
>+		rdmsrl(MSR_AMD64_IBSCTL, msr_info->data);
>+		break;

"When AVIC is enabled, IBS LVT entry (Extended Interrupt 0 LVT) message type
should be programmed to INTR or NMI."

It implies that AVIC always uses extended LVT 0 when issuing IBS interrupts if
IBS virtualization is enabled. Right? If yes, KVM should emulate the LvtOffset
in guest's IBS_CTL MSR as 0. Returning the hardware value here is error-prone.
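
The emulation Chao suggests could be sketched like this (assuming LvtOffset in
bits 3:0 and LvtOffsetVal in bit 8 of IBS_CTL per the APM; illustrative code,
not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

#define IBSCTL_LVT_OFFSET_MASK	0x0full		/* bits 3:0 */
#define IBSCTL_LVT_OFFSET_VALID	(1ull << 8)	/* bit 8    */

/*
 * If virtualized IBS interrupts always arrive through extended LVT
 * entry 0, the guest should see LvtOffset == 0 (and marked valid)
 * regardless of what the host programmed, instead of the raw
 * hardware value.
 */
static uint64_t emulated_guest_ibsctl(uint64_t hw_ibsctl)
{
	return (hw_ibsctl & ~IBSCTL_LVT_OFFSET_MASK) | IBSCTL_LVT_OFFSET_VALID;
}
```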


* Re: [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and SEV-ES guests
  2023-09-04  9:53 ` [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and " Manali Shukla
  2023-09-05 16:00   ` Tom Lendacky
@ 2023-09-12  3:30   ` Chao Gao
  1 sibling, 0 replies; 32+ messages in thread
From: Chao Gao @ 2023-09-12  3:30 UTC (permalink / raw)
  To: Manali Shukla
  Cc: kvm, seanjc, linux-doc, linux-perf-users, x86, pbonzini, peterz,
	bp, santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

>+static void svm_ibs_set_cpu_caps(void)
>+{
>+	kvm_cpu_cap_set(X86_FEATURE_IBS);
>+	kvm_cpu_cap_set(X86_FEATURE_EXTLVT);
>+	kvm_cpu_cap_set(X86_FEATURE_EXTAPIC);

EXTLVT is a misnomer to me. It indicates the AVIC change about handling guest's
accesses to externed LVTs rather than the presence of extended LVTs (that's what
EXTAPIC is for).

>+	kvm_cpu_cap_set(X86_FEATURE_IBS_AVAIL);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_FETCHSAM);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPSAM);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_RDWROPCNT);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPCNT);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_BRNTRGT);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPCNTEXT);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_RIPINVALIDCHK);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_OPBRNFUSE);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_FETCHCTLEXTD);
>+	kvm_cpu_cap_set(X86_FEATURE_IBS_ZEN4_EXT);

any reason for not using kvm_cpu_cap_check_and_set(), which takes
hardware capabilities into account?


* Re: [PATCH 00/13] Implement support for IBS virtualization
  2023-09-11 12:32           ` Manali Shukla
@ 2023-09-28 11:18             ` Manali Shukla
  0 siblings, 0 replies; 32+ messages in thread
From: Manali Shukla @ 2023-09-28 11:18 UTC (permalink / raw)
  To: Peter Zijlstra, Sean Christopherson
  Cc: kvm, linux-doc, linux-perf-users, x86, pbonzini, bp,
	santosh.shukla, ravi.bangoria, thomas.lendacky, nikunj

On 9/11/2023 6:02 PM, Manali Shukla wrote:
> On 9/8/2023 7:01 PM, Peter Zijlstra wrote:
>> On Thu, Sep 07, 2023 at 09:19:51PM +0530, Manali Shukla wrote:
>>
>>>> I'm not sure I'm fluent in virt speak (in fact, I'm sure I'm not). Is
>>>> the above saying that a host can never IBS profile a guest?
>>>
>>> Host can profile a guest with IBS if VIBS is disabled for the guest. This is
>>> the default behavior. Host can not profile guest if VIBS is enabled for guest.
>>>
>>>>
>>>> Does the current IBS thing assert perf_event_attr::exclude_guest is set?
>>>
>>> Unlike AMD core pmu, IBS doesn't have Host/Guest filtering capability, thus
>>> perf_event_open() fails if exclude_guest is set for an IBS event.
>>
>> Then you must not allow VIBS if a host cpu-wide IBS counter exists.
>>
>> Also, VIBS reads like it can be (ab)used as a filter.
> 
> I think I get your point: If host IBS with exclude_guest=0 doesn't capture
> guest samples because of VIBS, it is an unintended behavior.
> 
> But if a guest cannot use IBS because a host is using it, that is also
> unacceptable behavior.
> 
> Let me think over it and come back.
> 
> - Manali

Hi Peter,

Apologies for the delay in response. It took me a while to think about
various possible solutions and their feasibility.

The problem with the current design is that the exclude_guest setting of a host
IBS event is not honored. Essentially, guest samples become invisible to host
IBS even with exclude_guest=0 when VIBS is enabled on the guest.

Solution 1:
Enforce exclude_guest=1 for host IBS when hw supports VIBS, i.e.

        if (cpuid[VIBS])
                enforce exclude_guest=1
        else
                enforce exclude_guest=0

Disable/enable host IBS at VM Entry/VM Exit if cpuid[VIBS] is set and an active
IBS event exists on that cpu.

The major downside of this approach is that it will break all currently working
scripts which use perf_event_open() to start an IBS event, since new kernels
will suddenly start failing IBS events with exclude_guest=0. The other issue is
that a host with cpuid[VIBS] set would not be able to profile any guest.

Solution 1.1:
This is an extension to Solution 1. Instead of keying off based on just
cpuid[VIBS] bit, introduce a kvm-amd module parameter and use both:

         if (cpuid[VIBS] && kvm-amd.ko loaded with vibs=1)
                enforce exclude_guest=1
         else
                enforce exclude_guest=0

The kvm-amd vibs module parameter determines whether the guest will be able to
use VIBS. kvm-amd.ko should be loaded with vibs=0 if the host wants to profile
guests, and with vibs=1 if guests need to use VIBS; the two uses are mutually
exclusive.

The issue of digressing from the current exclude_guest behavior remains with
this solution. Other issues are:
1) If host IBS is active with vibs=0, reloading kvm-amd.ko with vibs=1 will
   fail while IBS is running on the host.
2) If host IBS is active with vibs=1, reloading kvm-amd.ko with vibs=0 will
   fail while IBS is running on the host.
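
The Solution 1.1 policy is a pure function of two bits and can be written down
directly (a model of the proposal, with illustrative names):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Solution 1.1: host IBS events are forced to exclude_guest=1 only
 * when the CPU supports VIBS *and* kvm-amd was loaded with vibs=1;
 * otherwise exclude_guest=0 is enforced, as today.
 */
static bool enforce_exclude_guest(bool cpuid_vibs, bool kvm_amd_vibs)
{
	return cpuid_vibs && kvm_amd_vibs;
}
```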

Solution 2:
Dynamically disable/enable VIBS per vCPU basis, i.e. when a host IBS is active,
guest will not be able to use VIBS _for that vCPU_. If the host is not using
IBS, VIBS will be enabled at VM Entry.

Although this solution does not digress from existing exclude_guest behavior, it
has its own limitations:
1) VIBS inside the guest is unreliable because host IBS activity can
   dynamically change VIBS behavior.
2) It works only for SVM and SEV guests, not for SEV-ES and SEV-SNP guests,
   since there is no way to disable VIBS dynamically for them.

Of all the above solutions, we are most inclined to implement Solution 1.1.

-Manali

Thread overview:
2023-09-04  9:53 [PATCH 00/13] Implement support for IBS virtualization Manali Shukla
2023-09-04  9:53 ` [PATCH 01/13] KVM: Add KVM_GET_LAPIC_W_EXTAPIC and KVM_SET_LAPIC_W_EXTAPIC for extapic Manali Shukla
2023-09-12  1:47   ` Chao Gao
2023-09-04  9:53 ` [PATCH 02/13] x86/cpufeatures: Add CPUID feature bit for Extended LVT Manali Shukla
2023-09-04  9:53 ` [PATCH 03/13] KVM: x86: Add emulation support for Extented LVT registers Manali Shukla
2023-09-12  2:36   ` Chao Gao
2023-09-04  9:53 ` [PATCH 04/13] x86/cpufeatures: Add CPUID feature bit for virtualized IBS Manali Shukla
2023-09-04  9:53 ` [PATCH 05/13] KVM: x86/cpuid: Add a KVM-only leaf for IBS capabilities Manali Shukla
2023-09-04  9:53 ` [PATCH 06/13] KVM: x86: Extend CPUID range to include new leaf Manali Shukla
2023-09-12  2:46   ` Chao Gao
2023-09-04  9:53 ` [PATCH 07/13] KVM: SVM: Extend VMCB area for virtualized IBS registers Manali Shukla
2023-09-12  2:50   ` Chao Gao
2023-09-04  9:53 ` [PATCH 08/13] perf/x86/amd: Add framework to save/restore host IBS state Manali Shukla
2023-09-05 14:54   ` Tom Lendacky
2023-09-04  9:53 ` [PATCH 09/13] KVM: SVM: add support for IBS virtualization for non SEV-ES guests Manali Shukla
2023-09-05 15:30   ` Tom Lendacky
2023-09-06  1:51   ` Alexey Kardashevskiy
2023-09-12  3:09   ` Chao Gao
2023-09-04  9:53 ` [PATCH 10/13] x86/cpufeatures: Add CPUID feature bit for VIBS in SEV-ES guest Manali Shukla
2023-09-04  9:53 ` [PATCH 11/13] KVM: SVM: Add support for IBS virtualization for SEV-ES guests Manali Shukla
2023-09-05 15:43   ` Tom Lendacky
2023-09-04  9:53 ` [PATCH 12/13] KVM: SVM: Enable IBS virtualization on non SEV-ES and " Manali Shukla
2023-09-05 16:00   ` Tom Lendacky
2023-09-12  3:30   ` Chao Gao
2023-09-04  9:53 ` [PATCH 13/13] KVM: x86: nSVM: Implement support for nested IBS virtualization Manali Shukla
2023-09-05 15:47 ` [PATCH 00/13] Implement support for " Peter Zijlstra
2023-09-06 15:38   ` Manali Shukla
2023-09-06 19:56     ` Peter Zijlstra
2023-09-07 15:49       ` Manali Shukla
2023-09-08 13:31         ` Peter Zijlstra
2023-09-11 12:32           ` Manali Shukla
2023-09-28 11:18             ` Manali Shukla
