* [PATCH 0/4] KVM: x86: MSR filtering and related fixes
@ 2021-03-16 18:44 Sean Christopherson
  2021-03-16 18:44 ` [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish Sean Christopherson
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-03-16 18:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Alexander Graf, Yuan Yao

Rework the MSR filtering implementation to treat a given filter instance
as an atomic unit, and to properly protect it with SRCU.

Fix two nVMX bugs related to MSR filtering (one directly, one indirectly),
with additional cleanup on top.

Regarding the macro insanity in patch 03, I verified the before and after
binary output for vmx_set_intercept_for_msr() was identical (this required
wrapping "if (msr <= 0x1fff)" with (un)likely in both the before and after
versions; without being forced to favor a specific branch, gcc made
seemingly random choices).
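
(For reference, "wrapping with (un)likely" just means annotating the range
check identically in the before and after versions, along the lines of:

	if (likely(msr <= 0x1fff))
		__set_bit(msr, msr_bitmap + 0x800 / f);
	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
		__set_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);

so that gcc picks the same branch layout in both builds and the binaries
can be compared meaningfully.)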

Sean Christopherson (4):
  KVM: x86: Protect userspace MSR filter with SRCU, and set
    atomically-ish
  KVM: nVMX: Handle dynamic MSR intercept toggling
  KVM: VMX: Macrofy the MSR bitmap getters and setters
  KVM: nVMX: Clean up x2APIC MSR handling for L2

 Documentation/virt/kvm/api.rst  |   6 +-
 arch/x86/include/asm/kvm_host.h |  17 ++--
 arch/x86/kvm/vmx/nested.c       | 161 +++++++++++++-------------------
 arch/x86/kvm/vmx/vmx.c          |  67 +------------
 arch/x86/kvm/vmx/vmx.h          |  32 +++++++
 arch/x86/kvm/x86.c              | 109 ++++++++++++---------
 6 files changed, 176 insertions(+), 216 deletions(-)

-- 
2.31.0.rc2.261.g7f71774620-goog



* [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
  2021-03-16 18:44 [PATCH 0/4] KVM: x86: MSR filtering and related fixes Sean Christopherson
@ 2021-03-16 18:44 ` Sean Christopherson
  2021-03-17 13:15   ` Paolo Bonzini
  2021-03-17 19:29   ` Alexander Graf
  2021-03-16 18:44 ` [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling Sean Christopherson
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-03-16 18:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Alexander Graf, Yuan Yao

Fix a plethora of issues with MSR filtering by installing the resulting
filter as an atomic bundle instead of updating the live filter one range
at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
the hardware MSR bitmaps won't be updated until the next VM-Enter, but
the relevant software struct is atomically updated, which is what KVM
really needs.

Similar to the approach used for modifying memslots, make arch.msr_filter
a SRCU-protected pointer, do all the work configuring the new filter
outside of kvm->lock, and then acquire kvm->lock only when the new filter
has been vetted and created.  That way vCPU readers either see the old
filter or the new filter in their entirety, not some half-baked state.
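
Condensed, the update boils down to the standard SRCU publish/retire
pattern (sketch only; the full code is in the diff below):

	/* writer side, with kvm->lock held */
	rcu_assign_pointer(kvm->arch.msr_filter, new_filter);
	synchronize_srcu(&kvm->srcu);
	kvm_free_msr_filter(old_filter);

	/* reader side, inside srcu_read_lock(&kvm->srcu) */
	msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);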

Yuan Yao pointed out a use-after-free in kvm_msr_allowed() due to a
TOCTOU bug[*], but that's just the tip of the iceberg...

  - Nothing is __rcu annotated, making it nigh impossible to audit the
    code for correctness.
  - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
    coding style aside, the lack of an smp_rmb() anywhere casts all code
    into doubt.
  - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
    count before taking the lock.
  - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.

The entire approach of updating the live filter is also flawed.  While
installing a new filter is inherently racy if vCPUs are running, fixing
the above issues also makes it trivial to ensure certain behavior is
deterministic, e.g. KVM can provide deterministic behavior for MSRs with
identical settings in the old and new filters.  An atomic update of the
filter also prevents KVM from getting into a half-baked state, e.g. if
installing a filter fails, the existing approach would leave the filter
in a half-baked state, having already committed whatever bits of the
filter were already processed.

[*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com

Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 Documentation/virt/kvm/api.rst  |   6 +-
 arch/x86/include/asm/kvm_host.h |  17 ++---
 arch/x86/kvm/x86.c              | 109 +++++++++++++++++++-------------
 3 files changed, 78 insertions(+), 54 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 38e327d4b479..2898d3e86b08 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -4806,8 +4806,10 @@ If an MSR access is not permitted through the filtering, it generates a
 allows user space to deflect and potentially handle various MSR accesses
 into user space.
 
-If a vCPU is in running state while this ioctl is invoked, the vCPU may
-experience inconsistent filtering behavior on MSR accesses.
+Note, invoking this ioctl while a vCPU is running is inherently racy.  However,
+KVM does guarantee that vCPUs will see either the previous filter or the new
+filter, e.g. MSRs with identical settings in both the old and new filter will
+have deterministic behavior.
 
 4.127 KVM_XEN_HVM_SET_ATTR
 --------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a52f973bdff6..84198c403a48 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -931,6 +931,12 @@ enum kvm_irqchip_mode {
 	KVM_IRQCHIP_SPLIT,        /* created with KVM_CAP_SPLIT_IRQCHIP */
 };
 
+struct kvm_x86_msr_filter {
+	u8 count;
+	bool default_allow:1;
+	struct msr_bitmap_range ranges[16];
+};
+
 #define APICV_INHIBIT_REASON_DISABLE    0
 #define APICV_INHIBIT_REASON_HYPERV     1
 #define APICV_INHIBIT_REASON_NESTED     2
@@ -1025,16 +1031,11 @@ struct kvm_arch {
 	bool guest_can_read_msr_platform_info;
 	bool exception_payload_enabled;
 
+	bool bus_lock_detection_enabled;
+
 	/* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
 	u32 user_space_msr_mask;
-
-	struct {
-		u8 count;
-		bool default_allow:1;
-		struct msr_bitmap_range ranges[16];
-	} msr_filter;
-
-	bool bus_lock_detection_enabled;
+	struct kvm_x86_msr_filter __rcu *msr_filter;
 
 	struct kvm_pmu_event_filter __rcu *pmu_event_filter;
 	struct task_struct *nx_lpage_recovery_thread;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a9d95f90a048..c55769620b9a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1529,35 +1529,44 @@ EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
 
 bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
 {
+	struct kvm_x86_msr_filter *msr_filter;
+	struct msr_bitmap_range *ranges;
 	struct kvm *kvm = vcpu->kvm;
-	struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
-	u32 count = kvm->arch.msr_filter.count;
-	u32 i;
-	bool r = kvm->arch.msr_filter.default_allow;
+	bool allowed;
 	int idx;
+	u32 i;
 
-	/* MSR filtering not set up or x2APIC enabled, allow everything */
-	if (!count || (index >= 0x800 && index <= 0x8ff))
+	/* x2APIC MSRs do not support filtering. */
+	if (index >= 0x800 && index <= 0x8ff)
 		return true;
 
-	/* Prevent collision with set_msr_filter */
 	idx = srcu_read_lock(&kvm->srcu);
 
-	for (i = 0; i < count; i++) {
+	msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
+	if (!msr_filter) {
+		allowed = true;
+		goto out;
+	}
+
+	allowed = msr_filter->default_allow;
+	ranges = msr_filter->ranges;
+
+	for (i = 0; i < msr_filter->count; i++) {
 		u32 start = ranges[i].base;
 		u32 end = start + ranges[i].nmsrs;
 		u32 flags = ranges[i].flags;
 		unsigned long *bitmap = ranges[i].bitmap;
 
 		if ((index >= start) && (index < end) && (flags & type)) {
-			r = !!test_bit(index - start, bitmap);
+			allowed = !!test_bit(index - start, bitmap);
 			break;
 		}
 	}
 
+out:
 	srcu_read_unlock(&kvm->srcu, idx);
 
-	return r;
+	return allowed;
 }
 EXPORT_SYMBOL_GPL(kvm_msr_allowed);
 
@@ -5389,25 +5398,34 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	return r;
 }
 
-static void kvm_clear_msr_filter(struct kvm *kvm)
+static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
+{
+	struct kvm_x86_msr_filter *msr_filter;
+
+	msr_filter = kzalloc(sizeof(*msr_filter), GFP_KERNEL_ACCOUNT);
+	if (!msr_filter)
+		return NULL;
+
+	msr_filter->default_allow = default_allow;
+	return msr_filter;
+}
+
+static void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
 {
 	u32 i;
-	u32 count = kvm->arch.msr_filter.count;
-	struct msr_bitmap_range ranges[16];
 
-	mutex_lock(&kvm->lock);
-	kvm->arch.msr_filter.count = 0;
-	memcpy(ranges, kvm->arch.msr_filter.ranges, count * sizeof(ranges[0]));
-	mutex_unlock(&kvm->lock);
-	synchronize_srcu(&kvm->srcu);
+	if (!msr_filter)
+		return;
 
-	for (i = 0; i < count; i++)
-		kfree(ranges[i].bitmap);
+	for (i = 0; i < msr_filter->count; i++)
+		kfree(msr_filter->ranges[i].bitmap);
+
+	kfree(msr_filter);
 }
 
-static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user_range)
+static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
+			      struct kvm_msr_filter_range *user_range)
 {
-	struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
 	struct msr_bitmap_range range;
 	unsigned long *bitmap = NULL;
 	size_t bitmap_size;
@@ -5441,11 +5459,9 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
 		goto err;
 	}
 
-	/* Everything ok, add this range identifier to our global pool */
-	ranges[kvm->arch.msr_filter.count] = range;
-	/* Make sure we filled the array before we tell anyone to walk it */
-	smp_wmb();
-	kvm->arch.msr_filter.count++;
+	/* Everything ok, add this range identifier. */
+	msr_filter->ranges[msr_filter->count] = range;
+	msr_filter->count++;
 
 	return 0;
 err:
@@ -5456,10 +5472,11 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
 static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
 {
 	struct kvm_msr_filter __user *user_msr_filter = argp;
+	struct kvm_x86_msr_filter *new_filter, *old_filter;
 	struct kvm_msr_filter filter;
 	bool default_allow;
-	int r = 0;
 	bool empty = true;
+	int r = 0;
 	u32 i;
 
 	if (copy_from_user(&filter, user_msr_filter, sizeof(filter)))
@@ -5472,25 +5489,32 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
 	if (empty && !default_allow)
 		return -EINVAL;
 
-	kvm_clear_msr_filter(kvm);
+	new_filter = kvm_alloc_msr_filter(default_allow);
+	if (!new_filter)
+		return -ENOMEM;
 
-	kvm->arch.msr_filter.default_allow = default_allow;
-
-	/*
-	 * Protect from concurrent calls to this function that could trigger
-	 * a TOCTOU violation on kvm->arch.msr_filter.count.
-	 */
-	mutex_lock(&kvm->lock);
 	for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) {
-		r = kvm_add_msr_filter(kvm, &filter.ranges[i]);
-		if (r)
-			break;
+		r = kvm_add_msr_filter(new_filter, &filter.ranges[i]);
+		if (r) {
+			kvm_free_msr_filter(new_filter);
+			return r;
+		}
 	}
 
+	mutex_lock(&kvm->lock);
+
+	/* The per-VM filter is protected by kvm->lock... */
+	old_filter = srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1);
+
+	rcu_assign_pointer(kvm->arch.msr_filter, new_filter);
+	synchronize_srcu(&kvm->srcu);
+
+	kvm_free_msr_filter(old_filter);
+
 	kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
 	mutex_unlock(&kvm->lock);
 
-	return r;
+	return 0;
 }
 
 long kvm_arch_vm_ioctl(struct file *filp,
@@ -10693,8 +10717,6 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
 
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
-	u32 i;
-
 	if (current->mm == kvm->mm) {
 		/*
 		 * Free memory regions allocated on behalf of userspace,
@@ -10710,8 +10732,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 		mutex_unlock(&kvm->slots_lock);
 	}
 	static_call_cond(kvm_x86_vm_destroy)(kvm);
-	for (i = 0; i < kvm->arch.msr_filter.count; i++)
-		kfree(kvm->arch.msr_filter.ranges[i].bitmap);
+	kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
 	kvm_pic_destroy(kvm);
 	kvm_ioapic_destroy(kvm);
 	kvm_free_vcpus(kvm);
-- 
2.31.0.rc2.261.g7f71774620-goog



* [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling
  2021-03-16 18:44 [PATCH 0/4] KVM: x86: MSR filtering and related fixes Sean Christopherson
  2021-03-16 18:44 ` [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish Sean Christopherson
@ 2021-03-16 18:44 ` Sean Christopherson
  2021-03-17 13:17   ` Paolo Bonzini
  2021-03-16 18:44 ` [PATCH 3/4] KVM: VMX: Macrofy the MSR bitmap getters and setters Sean Christopherson
  2021-03-16 18:44 ` [PATCH 4/4] KVM: nVMX: Clean up x2APIC MSR handling for L2 Sean Christopherson
  3 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2021-03-16 18:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Alexander Graf, Yuan Yao

Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
and always update the relevant bits in vmcs02.  This fixes two distinct,
but intertwined bugs related to dynamic MSR bitmap modifications.

The first issue is that KVM fails to enable MSR interception in vmcs02
for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
and later enables interception.

The second issue is that KVM fails to honor userspace MSR filtering when
preparing vmcs02.

Fix both issues simultaneously, as fixing only one of the issues (doesn't
matter which) would create a mess that no one should have to bisect.
Fixing only the first bug would exacerbate the MSR filtering issue as
userspace would see inconsistent behavior depending on the whims of L1.
Fixing only the second bug (MSR filtering) effectively requires fixing
the first, as the nVMX code only knows how to transition vmcs02's
bitmap from 1->0.

Move the various accessors/mutators buried in vmx.c into vmx.h so that
they can be shared by the nested code.
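
For each MSR, the merge logic then boils down to (read case shown,
expanded by hand from BUILD_NVMX_MSR_INTERCEPT_HELPER in the diff):

	if (vmx_test_msr_bitmap_read(vmx->vmcs01.msr_bitmap, msr) ||
	    vmx_test_msr_bitmap_read(msr_bitmap_l1, msr))
		vmx_set_msr_bitmap_read(msr_bitmap_l0, msr);
	else
		vmx_clear_msr_bitmap_read(msr_bitmap_l0, msr);

i.e. the vmcs02 intercept is set if either L0 (vmcs01) or L1 wants to
intercept the MSR, and cleared only when both allow it through.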

Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
Fixes: d69129b4e46a ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 108 +++++++++++++++++---------------------
 arch/x86/kvm/vmx/vmx.c    |  67 ++---------------------
 arch/x86/kvm/vmx/vmx.h    |  63 ++++++++++++++++++++++
 3 files changed, 115 insertions(+), 123 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fd334e4aa6db..aff41a432a56 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -475,29 +475,6 @@ static int nested_vmx_check_tpr_shadow_controls(struct kvm_vcpu *vcpu,
 	return 0;
 }
 
-/*
- * Check if MSR is intercepted for L01 MSR bitmap.
- */
-static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
-{
-	unsigned long *msr_bitmap;
-	int f = sizeof(unsigned long);
-
-	if (!cpu_has_vmx_msr_bitmap())
-		return true;
-
-	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
-
-	if (msr <= 0x1fff) {
-		return !!test_bit(msr, msr_bitmap + 0x800 / f);
-	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
-		msr &= 0x1fff;
-		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
-	}
-
-	return true;
-}
-
 /*
  * If a msr is allowed by L0, we should check whether it is allowed by L1.
  * The corresponding bit will be cleared unless both of L0 and L1 allow it.
@@ -551,6 +528,34 @@ static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap)
 	}
 }
 
+#define BUILD_NVMX_MSR_INTERCEPT_HELPER(rw)					\
+static inline									\
+void nested_vmx_set_msr_##rw##_intercept(struct vcpu_vmx *vmx,			\
+					 unsigned long *msr_bitmap_l1,		\
+					 unsigned long *msr_bitmap_l0, u32 msr)	\
+{										\
+	if (vmx_test_msr_bitmap_##rw(vmx->vmcs01.msr_bitmap, msr) ||		\
+	    vmx_test_msr_bitmap_##rw(msr_bitmap_l1, msr))			\
+		vmx_set_msr_bitmap_##rw(msr_bitmap_l0, msr);			\
+	else									\
+		vmx_clear_msr_bitmap_##rw(msr_bitmap_l0, msr);			\
+}
+BUILD_NVMX_MSR_INTERCEPT_HELPER(read)
+BUILD_NVMX_MSR_INTERCEPT_HELPER(write)
+
+static inline void nested_vmx_set_intercept_for_msr(struct vcpu_vmx *vmx,
+						    unsigned long *msr_bitmap_l1,
+						    unsigned long *msr_bitmap_l0,
+						    u32 msr, int types)
+{
+	if (types & MSR_TYPE_R)
+		nested_vmx_set_msr_read_intercept(vmx, msr_bitmap_l1,
+						  msr_bitmap_l0, msr);
+	if (types & MSR_TYPE_W)
+		nested_vmx_set_msr_write_intercept(vmx, msr_bitmap_l1,
+						   msr_bitmap_l0, msr);
+}
+
 /*
  * Merge L0's and L1's MSR bitmap, return false to indicate that
  * we do not use the hardware.
@@ -558,10 +563,11 @@ static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap)
 static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 						 struct vmcs12 *vmcs12)
 {
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
 	int msr;
 	unsigned long *msr_bitmap_l1;
-	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
-	struct kvm_host_map *map = &to_vmx(vcpu)->nested.msr_bitmap_map;
+	unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap;
+	struct kvm_host_map *map = &vmx->nested.msr_bitmap_map;
 
 	/* Nothing to do if the MSR bitmap is not in use.  */
 	if (!cpu_has_vmx_msr_bitmap() ||
@@ -612,42 +618,26 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 		}
 	}
 
-	/* KVM unconditionally exposes the FS/GS base MSRs to L1. */
-	nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
-					     MSR_FS_BASE, MSR_TYPE_RW);
-
-	nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
-					     MSR_GS_BASE, MSR_TYPE_RW);
-
-	nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
-					     MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
-
 	/*
-	 * Checking the L0->L1 bitmap is trying to verify two things:
-	 *
-	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
-	 *    ensures that we do not accidentally generate an L02 MSR bitmap
-	 *    from the L12 MSR bitmap that is too permissive.
-	 * 2. That L1 or L2s have actually used the MSR. This avoids
-	 *    unnecessarily merging of the bitmap if the MSR is unused. This
-	 *    works properly because we only update the L01 MSR bitmap lazily.
-	 *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
-	 *    updated to reflect this when L1 (or its L2s) actually write to
-	 *    the MSR.
+	 * Always check vmcs01's bitmap to honor userspace MSR filters and any
+	 * other runtime changes to vmcs01's bitmap, e.g. dynamic pass-through.
 	 */
-	if (!msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL))
-		nested_vmx_disable_intercept_for_msr(
-					msr_bitmap_l1, msr_bitmap_l0,
-					MSR_IA32_SPEC_CTRL,
-					MSR_TYPE_R | MSR_TYPE_W);
-
-	if (!msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD))
-		nested_vmx_disable_intercept_for_msr(
-					msr_bitmap_l1, msr_bitmap_l0,
-					MSR_IA32_PRED_CMD,
-					MSR_TYPE_W);
-
-	kvm_vcpu_unmap(vcpu, &to_vmx(vcpu)->nested.msr_bitmap_map, false);
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_FS_BASE, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_GS_BASE, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
+
+	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
+					 MSR_IA32_PRED_CMD, MSR_TYPE_W);
+
+	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
 
 	return true;
 }
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c8a4a548e96b..9972e5d1c44e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -879,29 +879,6 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
 	vmcs_write32(EXCEPTION_BITMAP, eb);
 }
 
-/*
- * Check if MSR is intercepted for currently loaded MSR bitmap.
- */
-static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
-{
-	unsigned long *msr_bitmap;
-	int f = sizeof(unsigned long);
-
-	if (!cpu_has_vmx_msr_bitmap())
-		return true;
-
-	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
-
-	if (msr <= 0x1fff) {
-		return !!test_bit(msr, msr_bitmap + 0x800 / f);
-	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
-		msr &= 0x1fff;
-		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
-	}
-
-	return true;
-}
-
 static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
 		unsigned long entry, unsigned long exit)
 {
@@ -3709,46 +3686,6 @@ void free_vpid(int vpid)
 	spin_unlock(&vmx_vpid_lock);
 }
 
-static void vmx_clear_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__clear_bit(msr, msr_bitmap + 0x000 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__clear_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
-}
-
-static void vmx_clear_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__clear_bit(msr, msr_bitmap + 0x800 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__clear_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
-}
-
-static void vmx_set_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__set_bit(msr, msr_bitmap + 0x000 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__set_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
-}
-
-static void vmx_set_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__set_bit(msr, msr_bitmap + 0x800 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__set_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
-}
-
 static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
 							  u32 msr, int type)
 {
@@ -6722,7 +6659,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
 	 * save it.
 	 */
-	if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
+	if (unlikely(cpu_has_vmx_msr_bitmap() &&
+		     vmx_test_msr_bitmap_write(vmx->loaded_vmcs->msr_bitmap,
+					       MSR_IA32_SPEC_CTRL)))
 		vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
 
 	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 0fb3236b0283..a6000c91b897 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -393,6 +393,69 @@ void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
 	u32 msr, int type, bool value);
 void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
 
+static inline bool vmx_test_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
+{
+	int f = sizeof(unsigned long);
+
+	if (msr <= 0x1fff)
+		return test_bit(msr, msr_bitmap + 0x000 / f);
+	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
+		return test_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
+	return true;
+}
+
+static inline bool vmx_test_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
+{
+	int f = sizeof(unsigned long);
+
+	if (msr <= 0x1fff)
+		return test_bit(msr, msr_bitmap + 0x800 / f);
+	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
+		return test_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
+	return true;
+}
+
+static inline void vmx_clear_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
+{
+	int f = sizeof(unsigned long);
+
+	if (msr <= 0x1fff)
+		__clear_bit(msr, msr_bitmap + 0x000 / f);
+	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
+		__clear_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
+}
+
+static inline void vmx_clear_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
+{
+	int f = sizeof(unsigned long);
+
+	if (msr <= 0x1fff)
+		__clear_bit(msr, msr_bitmap + 0x800 / f);
+	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
+		__clear_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
+}
+
+static inline void vmx_set_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
+{
+	int f = sizeof(unsigned long);
+
+	if (msr <= 0x1fff)
+		__set_bit(msr, msr_bitmap + 0x000 / f);
+	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
+		__set_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
+}
+
+static inline void vmx_set_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
+{
+	int f = sizeof(unsigned long);
+
+	if (msr <= 0x1fff)
+		__set_bit(msr, msr_bitmap + 0x800 / f);
+	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
+		__set_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
+}
+
+
 static inline u8 vmx_get_rvi(void)
 {
 	return vmcs_read16(GUEST_INTR_STATUS) & 0xff;
-- 
2.31.0.rc2.261.g7f71774620-goog



* [PATCH 3/4] KVM: VMX: Macrofy the MSR bitmap getters and setters
  2021-03-16 18:44 [PATCH 0/4] KVM: x86: MSR filtering and related fixes Sean Christopherson
  2021-03-16 18:44 ` [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish Sean Christopherson
  2021-03-16 18:44 ` [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling Sean Christopherson
@ 2021-03-16 18:44 ` Sean Christopherson
  2021-03-17 13:15   ` Paolo Bonzini
  2021-03-16 18:44 ` [PATCH 4/4] KVM: nVMX: Clean up x2APIC MSR handling for L2 Sean Christopherson
  3 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2021-03-16 18:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Alexander Graf, Yuan Yao

Add builder macros to generate the MSR bitmap helpers to reduce the
amount of copy-paste code, especially with respect to all the magic
numbers needed to calc the correct bit location.

No functional change intended.
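
For example, BUILD_VMX_MSR_BITMAP_HELPER(bool, test, read) expands to
roughly:

	static inline bool vmx_test_msr_bitmap_read(unsigned long *msr_bitmap,
						    u32 msr)
	{
		int base = 0x0;		/* VMX_MSR_BITMAP_BASE_read */
		int f = sizeof(unsigned long);

		if (msr <= 0x1fff)
			return test_bit(msr, msr_bitmap + base / f);
		else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
			return test_bit(msr & 0x1fff,
					msr_bitmap + (base + 0x400) / f);
		return (bool)true;
	}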

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/vmx.h | 82 ++++++++++++------------------------------
 1 file changed, 22 insertions(+), 60 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index a6000c91b897..aab89e713c8e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -393,68 +393,30 @@ void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
 	u32 msr, int type, bool value);
 void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
 
-static inline bool vmx_test_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		return test_bit(msr, msr_bitmap + 0x000 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		return test_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
-	return true;
-}
-
-static inline bool vmx_test_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		return test_bit(msr, msr_bitmap + 0x800 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		return test_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
-	return true;
-}
-
-static inline void vmx_clear_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__clear_bit(msr, msr_bitmap + 0x000 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__clear_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
-}
-
-static inline void vmx_clear_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__clear_bit(msr, msr_bitmap + 0x800 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__clear_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
-}
-
-static inline void vmx_set_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__set_bit(msr, msr_bitmap + 0x000 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__set_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
-}
-
-static inline void vmx_set_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
-{
-	int f = sizeof(unsigned long);
-
-	if (msr <= 0x1fff)
-		__set_bit(msr, msr_bitmap + 0x800 / f);
-	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
-		__set_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
+#define VMX_MSR_BITMAP_BASE_read	0x0
+#define VMX_MSR_BITMAP_BASE_write	0x800
+
+#define BUILD_VMX_MSR_BITMAP_HELPER(ret, action, type, pre...)		      \
+static inline ret vmx_##action##_msr_bitmap_##type(unsigned long *msr_bitmap, \
+						   u32 msr)		      \
+{									      \
+	int base = VMX_MSR_BITMAP_BASE_##type;				      \
+	int f = sizeof(unsigned long);					      \
+									      \
+	if (msr <= 0x1fff)						      \
+		return pre##action##_bit(msr, msr_bitmap + base / f);	      \
+	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))		      \
+		return pre##action##_bit(msr & 0x1fff,			      \
+					 msr_bitmap + (base + 0x400) / f);    \
+	return (ret)true;						      \
 }
 
+BUILD_VMX_MSR_BITMAP_HELPER(bool, test, read)
+BUILD_VMX_MSR_BITMAP_HELPER(bool, test, write)
+BUILD_VMX_MSR_BITMAP_HELPER(void, clear, read, __)
+BUILD_VMX_MSR_BITMAP_HELPER(void, clear, write, __)
+BUILD_VMX_MSR_BITMAP_HELPER(void, set, read, __)
+BUILD_VMX_MSR_BITMAP_HELPER(void, set, write, __)
 
 static inline u8 vmx_get_rvi(void)
 {
-- 
2.31.0.rc2.261.g7f71774620-goog



* [PATCH 4/4] KVM: nVMX: Clean up x2APIC MSR handling for L2
  2021-03-16 18:44 [PATCH 0/4] KVM: x86: MSR filtering and related fixes Sean Christopherson
                   ` (2 preceding siblings ...)
  2021-03-16 18:44 ` [PATCH 3/4] KVM: VMX: Macrofy the MSR bitmap getters and setters Sean Christopherson
@ 2021-03-16 18:44 ` Sean Christopherson
  3 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-03-16 18:44 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Alexander Graf, Yuan Yao

Clean up the x2APIC MSR bitmap interception code for L2, which is the last
holdout of open coded bitmap manipulations.  Freshen up the SDM/PRM
comment, rename the function to make it abundantly clear the funky
behavior is x2APIC specific, and explain _why_ vmcs01's bitmap is ignored
(the previous comment was flat out wrong for x2APIC behavior).
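
(As a reminder, the x2APIC MSRs are derived from the APIC register
offsets, roughly:

	#define APIC_BASE_MSR	0x800
	#define X2APIC_MSR(r)	(APIC_BASE_MSR + ((r) >> 4))

so e.g. X2APIC_MSR(APIC_TASKPRI) is MSR 0x808, squarely inside the
0x800-0x8ff range that is never subject to userspace MSR filtering.)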

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/vmx/nested.c | 53 +++++++++++----------------------------
 arch/x86/kvm/vmx/vmx.h    |  7 ++++++
 2 files changed, 21 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index aff41a432a56..49eeffb79823 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -476,44 +476,19 @@ static int nested_vmx_check_tpr_shadow_controls(struct kvm_vcpu *vcpu,
 }
 
 /*
- * If a msr is allowed by L0, we should check whether it is allowed by L1.
- * The corresponding bit will be cleared unless both of L0 and L1 allow it.
+ * For x2APIC MSRs, ignore the vmcs01 bitmap.  L1 can enable x2APIC without L1
+ * itself utilizing x2APIC.  All MSRs were previously set to be intercepted,
+ * only the disable intercept case needs to be handled.
  */
-static void nested_vmx_disable_intercept_for_msr(unsigned long *msr_bitmap_l1,
-					       unsigned long *msr_bitmap_nested,
-					       u32 msr, int type)
+static void nested_vmx_disable_intercept_for_x2apic_msr(unsigned long *msr_bitmap_l1,
+							unsigned long *msr_bitmap_l0,
+							u32 msr, int type)
 {
-	int f = sizeof(unsigned long);
+	if (type & MSR_TYPE_R && !vmx_test_msr_bitmap_read(msr_bitmap_l1, msr))
+		vmx_clear_msr_bitmap_read(msr_bitmap_l0, msr);
 
-	/*
-	 * See Intel PRM Vol. 3, 20.6.9 (MSR-Bitmap Address). Early manuals
-	 * have the write-low and read-high bitmap offsets the wrong way round.
-	 * We can control MSRs 0x00000000-0x00001fff and 0xc0000000-0xc0001fff.
-	 */
-	if (msr <= 0x1fff) {
-		if (type & MSR_TYPE_R &&
-		   !test_bit(msr, msr_bitmap_l1 + 0x000 / f))
-			/* read-low */
-			__clear_bit(msr, msr_bitmap_nested + 0x000 / f);
-
-		if (type & MSR_TYPE_W &&
-		   !test_bit(msr, msr_bitmap_l1 + 0x800 / f))
-			/* write-low */
-			__clear_bit(msr, msr_bitmap_nested + 0x800 / f);
-
-	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
-		msr &= 0x1fff;
-		if (type & MSR_TYPE_R &&
-		   !test_bit(msr, msr_bitmap_l1 + 0x400 / f))
-			/* read-high */
-			__clear_bit(msr, msr_bitmap_nested + 0x400 / f);
-
-		if (type & MSR_TYPE_W &&
-		   !test_bit(msr, msr_bitmap_l1 + 0xc00 / f))
-			/* write-high */
-			__clear_bit(msr, msr_bitmap_nested + 0xc00 / f);
-
-	}
+	if (type & MSR_TYPE_W && !vmx_test_msr_bitmap_write(msr_bitmap_l1, msr))
+		vmx_clear_msr_bitmap_write(msr_bitmap_l0, msr);
 }
 
 static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap)
@@ -582,7 +557,7 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 	/*
 	 * To keep the control flow simple, pay eight 8-byte writes (sixteen
 	 * 4-byte writes on 32-bit systems) up front to enable intercepts for
-	 * the x2APIC MSR range and selectively disable them below.
+	 * the x2APIC MSR range and selectively toggle those relevant to L2.
 	 */
 	enable_x2apic_msr_intercepts(msr_bitmap_l0);
 
@@ -601,17 +576,17 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
 			}
 		}
 
-		nested_vmx_disable_intercept_for_msr(
+		nested_vmx_disable_intercept_for_x2apic_msr(
 			msr_bitmap_l1, msr_bitmap_l0,
 			X2APIC_MSR(APIC_TASKPRI),
 			MSR_TYPE_R | MSR_TYPE_W);
 
 		if (nested_cpu_has_vid(vmcs12)) {
-			nested_vmx_disable_intercept_for_msr(
+			nested_vmx_disable_intercept_for_x2apic_msr(
 				msr_bitmap_l1, msr_bitmap_l0,
 				X2APIC_MSR(APIC_EOI),
 				MSR_TYPE_W);
-			nested_vmx_disable_intercept_for_msr(
+			nested_vmx_disable_intercept_for_x2apic_msr(
 				msr_bitmap_l1, msr_bitmap_l0,
 				X2APIC_MSR(APIC_SELF_IPI),
 				MSR_TYPE_W);
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index aab89e713c8e..bc1a186a5bc2 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -393,6 +393,13 @@ void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
 	u32 msr, int type, bool value);
 void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
 
+/*
+ * Note, early Intel manuals have the write-low and read-high bitmap offsets
+ * the wrong way round.  The bitmaps control MSRs 0x00000000-0x00001fff and
+ * 0xc0000000-0xc0001fff.  The former (low) uses bytes 0-0x3ff for reads and
+ * 0x800-0xbff for writes.  The latter (high) uses 0x400-0x7ff for reads and
+ * 0xc00-0xfff for writes.
+ */
 #define VMX_MSR_BITMAP_BASE_read	0x0
 #define VMX_MSR_BITMAP_BASE_write	0x800
 
-- 
2.31.0.rc2.261.g7f71774620-goog



* Re: [PATCH 3/4] KVM: VMX: Macrofy the MSR bitmap getters and setters
  2021-03-16 18:44 ` [PATCH 3/4] KVM: VMX: Macrofy the MSR bitmap getters and setters Sean Christopherson
@ 2021-03-17 13:15   ` Paolo Bonzini
  2021-03-17 16:39     ` Sean Christopherson
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-17 13:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On 16/03/21 19:44, Sean Christopherson wrote:
> +	return (ret)true;						      \

I'm not sure if (void)true is amazing or disgusting, but anyway...

> +BUILD_VMX_MSR_BITMAP_HELPER(bool, test, read)
> +BUILD_VMX_MSR_BITMAP_HELPER(bool, test, write)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, clear, read, __)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, clear, write, __)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, set, read, __)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, set, write, __)

... I guess we have an armed truce where you let me do my bit 
manipulation magic and I let you do your macro magic.

Still, I think gluing the variadic arguments with ## is a bit too much.
This would be slightly less mysterious:

+BUILD_VMX_MSR_BITMAP_HELPER(bool, vmx_test_msr_bitmap_, read, test_bit)
+BUILD_VMX_MSR_BITMAP_HELPER(bool, vmx_test_msr_bitmap_, write, test_bit)
+BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_clear_msr_bitmap_, read, __clear_bit)
+BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_clear_msr_bitmap_, write, __clear_bit)
+BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_set_msr_bitmap_, read, __set_bit)
+BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_set_msr_bitmap_, write, __set_bit)

And I also wonder if we really need to expand all six functions one at a 
time.  You could remove the third argument and VMX_MSR_BITMAP_BASE_*, at 
the cost of expanding the inline functions' body twice in 
BUILD_VMX_MSR_BITMAP_HELPER.
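
Purely to illustrate the idea, a rough (untested) sketch of what dropping
the third argument and the BASE defines could look like, at the cost of
spelling out the body twice:

	#define BUILD_VMX_MSR_BITMAP_HELPER(ret, action, bitop)			\
	static inline ret vmx_##action##_msr_bitmap_read(unsigned long *msr_bitmap, \
							  u32 msr)		\
	{									\
		int f = sizeof(unsigned long);					\
										\
		if (msr <= 0x1fff)						\
			return bitop(msr, msr_bitmap + 0x000 / f);		\
		else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))		\
			return bitop(msr & 0x1fff, msr_bitmap + 0x400 / f);	\
		return (ret)true;						\
	}									\
	static inline ret vmx_##action##_msr_bitmap_write(unsigned long *msr_bitmap, \
							   u32 msr)		\
	{									\
		int f = sizeof(unsigned long);					\
										\
		if (msr <= 0x1fff)						\
			return bitop(msr, msr_bitmap + 0x800 / f);		\
		else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))		\
			return bitop(msr & 0x1fff, msr_bitmap + 0xc00 / f);	\
		return (ret)true;						\
	}

	BUILD_VMX_MSR_BITMAP_HELPER(bool, test, test_bit)
	BUILD_VMX_MSR_BITMAP_HELPER(void, clear, __clear_bit)
	BUILD_VMX_MSR_BITMAP_HELPER(void, set, __set_bit)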

Thanks,

Paolo



* Re: [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
  2021-03-16 18:44 ` [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish Sean Christopherson
@ 2021-03-17 13:15   ` Paolo Bonzini
  2021-03-17 19:29   ` Alexander Graf
  1 sibling, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-17 13:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On 16/03/21 19:44, Sean Christopherson wrote:
> Fix a plethora of issues with MSR filtering by installing the resulting
> filter as an atomic bundle instead of updating the live filter one range
> at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
> the hardware MSR bitmaps won't be updated until the next VM-Enter, but
> the relevant software struct is atomically updated, which is what KVM
> really needs.
> 
> Similar to the approach used for modifying memslots, make arch.msr_filter
> a SRCU-protected pointer, do all the work configuring the new filter
> outside of kvm->lock, and then acquire kvm->lock only when the new filter
> has been vetted and created.  That way vCPU readers either see the old
> filter or the new filter in their entirety, not some half-baked state.
> 
> Yuan Yao pointed out a use-after-free in kvm_msr_allowed() due to a
> TOCTOU bug[*], but that's just the tip of the iceberg...
> 
>    - Nothing is __rcu annotated, making it nigh impossible to audit the
>      code for correctness.
>    - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
>      coding style aside, the lack of an smp_rmb() anywhere casts all code
>      into doubt.
>    - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
>      count before taking the lock.
>    - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.
> 
> The entire approach of updating the live filter is also flawed.  While
> installing a new filter is inherently racy if vCPUs are running, fixing
> the above issues also makes it trivial to ensure certain behavior is
> deterministic, e.g. KVM can provide deterministic behavior for MSRs with
> identical settings in the old and new filters.  An atomic update of the
> filter also prevents KVM from getting into a half-baked state, e.g. if
> installing a filter fails, the existing approach would leave the filter
> in a half-baked state, having already committed whatever bits of the
> filter were already processed.
> 
> [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
> 
> Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
> Cc: stable@vger.kernel.org
> Cc: Alexander Graf <graf@amazon.com>
> Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   Documentation/virt/kvm/api.rst  |   6 +-
>   arch/x86/include/asm/kvm_host.h |  17 ++---
>   arch/x86/kvm/x86.c              | 109 +++++++++++++++++++-------------
>   3 files changed, 78 insertions(+), 54 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 38e327d4b479..2898d3e86b08 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -4806,8 +4806,10 @@ If an MSR access is not permitted through the filtering, it generates a
>   allows user space to deflect and potentially handle various MSR accesses
>   into user space.
>   
> -If a vCPU is in running state while this ioctl is invoked, the vCPU may
> -experience inconsistent filtering behavior on MSR accesses.
> +Note, invoking this ioctl while a vCPU is running is inherently racy.  However,
> +KVM does guarantee that vCPUs will see either the previous filter or the new
> +filter, e.g. MSRs with identical settings in both the old and new filter will
> +have deterministic behavior.
>   
>   4.127 KVM_XEN_HVM_SET_ATTR
>   --------------------------
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index a52f973bdff6..84198c403a48 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -931,6 +931,12 @@ enum kvm_irqchip_mode {
>   	KVM_IRQCHIP_SPLIT,        /* created with KVM_CAP_SPLIT_IRQCHIP */
>   };
>   
> +struct kvm_x86_msr_filter {
> +	u8 count;
> +	bool default_allow:1;
> +	struct msr_bitmap_range ranges[16];
> +};
> +
>   #define APICV_INHIBIT_REASON_DISABLE    0
>   #define APICV_INHIBIT_REASON_HYPERV     1
>   #define APICV_INHIBIT_REASON_NESTED     2
> @@ -1025,16 +1031,11 @@ struct kvm_arch {
>   	bool guest_can_read_msr_platform_info;
>   	bool exception_payload_enabled;
>   
> +	bool bus_lock_detection_enabled;
> +
>   	/* Deflect RDMSR and WRMSR to user space when they trigger a #GP */
>   	u32 user_space_msr_mask;
> -
> -	struct {
> -		u8 count;
> -		bool default_allow:1;
> -		struct msr_bitmap_range ranges[16];
> -	} msr_filter;
> -
> -	bool bus_lock_detection_enabled;
> +	struct kvm_x86_msr_filter __rcu *msr_filter;
>   
>   	struct kvm_pmu_event_filter __rcu *pmu_event_filter;
>   	struct task_struct *nx_lpage_recovery_thread;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a9d95f90a048..c55769620b9a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1529,35 +1529,44 @@ EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
>   
>   bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
>   {
> +	struct kvm_x86_msr_filter *msr_filter;
> +	struct msr_bitmap_range *ranges;
>   	struct kvm *kvm = vcpu->kvm;
> -	struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
> -	u32 count = kvm->arch.msr_filter.count;
> -	u32 i;
> -	bool r = kvm->arch.msr_filter.default_allow;
> +	bool allowed;
>   	int idx;
> +	u32 i;
>   
> -	/* MSR filtering not set up or x2APIC enabled, allow everything */
> -	if (!count || (index >= 0x800 && index <= 0x8ff))
> +	/* x2APIC MSRs do not support filtering. */
> +	if (index >= 0x800 && index <= 0x8ff)
>   		return true;
>   
> -	/* Prevent collision with set_msr_filter */
>   	idx = srcu_read_lock(&kvm->srcu);
>   
> -	for (i = 0; i < count; i++) {
> +	msr_filter = srcu_dereference(kvm->arch.msr_filter, &kvm->srcu);
> +	if (!msr_filter) {
> +		allowed = true;
> +		goto out;
> +	}
> +
> +	allowed = msr_filter->default_allow;
> +	ranges = msr_filter->ranges;
> +
> +	for (i = 0; i < msr_filter->count; i++) {
>   		u32 start = ranges[i].base;
>   		u32 end = start + ranges[i].nmsrs;
>   		u32 flags = ranges[i].flags;
>   		unsigned long *bitmap = ranges[i].bitmap;
>   
>   		if ((index >= start) && (index < end) && (flags & type)) {
> -			r = !!test_bit(index - start, bitmap);
> +			allowed = !!test_bit(index - start, bitmap);
>   			break;
>   		}
>   	}
>   
> +out:
>   	srcu_read_unlock(&kvm->srcu, idx);
>   
> -	return r;
> +	return allowed;
>   }
>   EXPORT_SYMBOL_GPL(kvm_msr_allowed);
>   
> @@ -5389,25 +5398,34 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>   	return r;
>   }
>   
> -static void kvm_clear_msr_filter(struct kvm *kvm)
> +static struct kvm_x86_msr_filter *kvm_alloc_msr_filter(bool default_allow)
> +{
> +	struct kvm_x86_msr_filter *msr_filter;
> +
> +	msr_filter = kzalloc(sizeof(*msr_filter), GFP_KERNEL_ACCOUNT);
> +	if (!msr_filter)
> +		return NULL;
> +
> +	msr_filter->default_allow = default_allow;
> +	return msr_filter;
> +}
> +
> +static void kvm_free_msr_filter(struct kvm_x86_msr_filter *msr_filter)
>   {
>   	u32 i;
> -	u32 count = kvm->arch.msr_filter.count;
> -	struct msr_bitmap_range ranges[16];
>   
> -	mutex_lock(&kvm->lock);
> -	kvm->arch.msr_filter.count = 0;
> -	memcpy(ranges, kvm->arch.msr_filter.ranges, count * sizeof(ranges[0]));
> -	mutex_unlock(&kvm->lock);
> -	synchronize_srcu(&kvm->srcu);
> +	if (!msr_filter)
> +		return;
>   
> -	for (i = 0; i < count; i++)
> -		kfree(ranges[i].bitmap);
> +	for (i = 0; i < msr_filter->count; i++)
> +		kfree(msr_filter->ranges[i].bitmap);
> +
> +	kfree(msr_filter);
>   }
>   
> -static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user_range)
> +static int kvm_add_msr_filter(struct kvm_x86_msr_filter *msr_filter,
> +			      struct kvm_msr_filter_range *user_range)
>   {
> -	struct msr_bitmap_range *ranges = kvm->arch.msr_filter.ranges;
>   	struct msr_bitmap_range range;
>   	unsigned long *bitmap = NULL;
>   	size_t bitmap_size;
> @@ -5441,11 +5459,9 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
>   		goto err;
>   	}
>   
> -	/* Everything ok, add this range identifier to our global pool */
> -	ranges[kvm->arch.msr_filter.count] = range;
> -	/* Make sure we filled the array before we tell anyone to walk it */
> -	smp_wmb();
> -	kvm->arch.msr_filter.count++;
> +	/* Everything ok, add this range identifier. */
> +	msr_filter->ranges[msr_filter->count] = range;
> +	msr_filter->count++;
>   
>   	return 0;
>   err:
> @@ -5456,10 +5472,11 @@ static int kvm_add_msr_filter(struct kvm *kvm, struct kvm_msr_filter_range *user
>   static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
>   {
>   	struct kvm_msr_filter __user *user_msr_filter = argp;
> +	struct kvm_x86_msr_filter *new_filter, *old_filter;
>   	struct kvm_msr_filter filter;
>   	bool default_allow;
> -	int r = 0;
>   	bool empty = true;
> +	int r = 0;
>   	u32 i;
>   
>   	if (copy_from_user(&filter, user_msr_filter, sizeof(filter)))
> @@ -5472,25 +5489,32 @@ static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm, void __user *argp)
>   	if (empty && !default_allow)
>   		return -EINVAL;
>   
> -	kvm_clear_msr_filter(kvm);
> +	new_filter = kvm_alloc_msr_filter(default_allow);
> +	if (!new_filter)
> +		return -ENOMEM;
>   
> -	kvm->arch.msr_filter.default_allow = default_allow;
> -
> -	/*
> -	 * Protect from concurrent calls to this function that could trigger
> -	 * a TOCTOU violation on kvm->arch.msr_filter.count.
> -	 */
> -	mutex_lock(&kvm->lock);
>   	for (i = 0; i < ARRAY_SIZE(filter.ranges); i++) {
> -		r = kvm_add_msr_filter(kvm, &filter.ranges[i]);
> -		if (r)
> -			break;
> +		r = kvm_add_msr_filter(new_filter, &filter.ranges[i]);
> +		if (r) {
> +			kvm_free_msr_filter(new_filter);
> +			return r;
> +		}
>   	}
>   
> +	mutex_lock(&kvm->lock);
> +
> +	/* The per-VM filter is protected by kvm->lock... */
> +	old_filter = srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1);
> +
> +	rcu_assign_pointer(kvm->arch.msr_filter, new_filter);
> +	synchronize_srcu(&kvm->srcu);
> +
> +	kvm_free_msr_filter(old_filter);
> +
>   	kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
>   	mutex_unlock(&kvm->lock);
>   
> -	return r;
> +	return 0;
>   }
>   
>   long kvm_arch_vm_ioctl(struct file *filp,
> @@ -10693,8 +10717,6 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
>   
>   void kvm_arch_destroy_vm(struct kvm *kvm)
>   {
> -	u32 i;
> -
>   	if (current->mm == kvm->mm) {
>   		/*
>   		 * Free memory regions allocated on behalf of userspace,
> @@ -10710,8 +10732,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>   		mutex_unlock(&kvm->slots_lock);
>   	}
>   	static_call_cond(kvm_x86_vm_destroy)(kvm);
> -	for (i = 0; i < kvm->arch.msr_filter.count; i++)
> -		kfree(kvm->arch.msr_filter.ranges[i].bitmap);
> +	kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
>   	kvm_pic_destroy(kvm);
>   	kvm_ioapic_destroy(kvm);
>   	kvm_free_vcpus(kvm);
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>



* Re: [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling
  2021-03-16 18:44 ` [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling Sean Christopherson
@ 2021-03-17 13:17   ` Paolo Bonzini
  2021-03-17 16:50     ` Sean Christopherson
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-17 13:17 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On 16/03/21 19:44, Sean Christopherson wrote:
> Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
> and always update the relevant bits in vmcs02.  This fixes two distinct,
> but intertwined bugs related to dynamic MSR bitmap modifications.
> 
> The first issue is that KVM fails to enable MSR interception in vmcs02
> for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
> and later enables interception.
> 
> The second issue is that KVM fails to honor userspace MSR filtering when
> preparing vmcs02.
> 
> Fix both issues simultaneously, as fixing only one of the issues (doesn't
> matter which) would create a mess that no one should have to bisect.
> Fixing only the first bug would exacerbate the MSR filtering issue as
> userspace would see inconsistent behavior depending on the whims of L1.
> Fixing only the second bug (MSR filtering) effectively requires fixing
> the first, as the nVMX code only knows how to transition vmcs02's
> bitmap from 1->0.
> 
> Move the various accessors/mutators buried in vmx.c into vmx.h so that
> they can be shared by the nested code.
> 
> Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
> Fixes: d69129b4e46a ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
> Cc: stable@vger.kernel.org
> Cc: Alexander Graf <graf@amazon.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/vmx/nested.c | 108 +++++++++++++++++---------------------
>   arch/x86/kvm/vmx/vmx.c    |  67 ++---------------------
>   arch/x86/kvm/vmx/vmx.h    |  63 ++++++++++++++++++++++
>   3 files changed, 115 insertions(+), 123 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index fd334e4aa6db..aff41a432a56 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -475,29 +475,6 @@ static int nested_vmx_check_tpr_shadow_controls(struct kvm_vcpu *vcpu,
>   	return 0;
>   }
>   
> -/*
> - * Check if MSR is intercepted for L01 MSR bitmap.
> - */
> -static bool msr_write_intercepted_l01(struct kvm_vcpu *vcpu, u32 msr)
> -{
> -	unsigned long *msr_bitmap;
> -	int f = sizeof(unsigned long);
> -
> -	if (!cpu_has_vmx_msr_bitmap())
> -		return true;
> -
> -	msr_bitmap = to_vmx(vcpu)->vmcs01.msr_bitmap;
> -
> -	if (msr <= 0x1fff) {
> -		return !!test_bit(msr, msr_bitmap + 0x800 / f);
> -	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
> -		msr &= 0x1fff;
> -		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
> -	}
> -
> -	return true;
> -}
> -
>   /*
>    * If a msr is allowed by L0, we should check whether it is allowed by L1.
>    * The corresponding bit will be cleared unless both of L0 and L1 allow it.
> @@ -551,6 +528,34 @@ static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap)
>   	}
>   }
>   
> +#define BUILD_NVMX_MSR_INTERCEPT_HELPER(rw)					\
> +static inline									\
> +void nested_vmx_set_msr_##rw##_intercept(struct vcpu_vmx *vmx,			\
> +					 unsigned long *msr_bitmap_l1,		\
> +					 unsigned long *msr_bitmap_l0, u32 msr)	\
> +{										\
> +	if (vmx_test_msr_bitmap_##rw(vmx->vmcs01.msr_bitmap, msr) ||		\
> +	    vmx_test_msr_bitmap_##rw(msr_bitmap_l1, msr))			\
> +		vmx_set_msr_bitmap_##rw(msr_bitmap_l0, msr);			\
> +	else									\
> +		vmx_clear_msr_bitmap_##rw(msr_bitmap_l0, msr);			\
> +}
> +BUILD_NVMX_MSR_INTERCEPT_HELPER(read)
> +BUILD_NVMX_MSR_INTERCEPT_HELPER(write)
> +
> +static inline void nested_vmx_set_intercept_for_msr(struct vcpu_vmx *vmx,
> +						    unsigned long *msr_bitmap_l1,
> +						    unsigned long *msr_bitmap_l0,
> +						    u32 msr, int types)
> +{
> +	if (types & MSR_TYPE_R)
> +		nested_vmx_set_msr_read_intercept(vmx, msr_bitmap_l1,
> +						  msr_bitmap_l0, msr);
> +	if (types & MSR_TYPE_W)
> +		nested_vmx_set_msr_write_intercept(vmx, msr_bitmap_l1,
> +						   msr_bitmap_l0, msr);
> +}
> +
>   /*
>    * Merge L0's and L1's MSR bitmap, return false to indicate that
>    * we do not use the hardware.
> @@ -558,10 +563,11 @@ static inline void enable_x2apic_msr_intercepts(unsigned long *msr_bitmap)
>   static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
>   						 struct vmcs12 *vmcs12)
>   {
> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
>   	int msr;
>   	unsigned long *msr_bitmap_l1;
> -	unsigned long *msr_bitmap_l0 = to_vmx(vcpu)->nested.vmcs02.msr_bitmap;
> -	struct kvm_host_map *map = &to_vmx(vcpu)->nested.msr_bitmap_map;
> +	unsigned long *msr_bitmap_l0 = vmx->nested.vmcs02.msr_bitmap;
> +	struct kvm_host_map *map = &vmx->nested.msr_bitmap_map;
>   
>   	/* Nothing to do if the MSR bitmap is not in use.  */
>   	if (!cpu_has_vmx_msr_bitmap() ||
> @@ -612,42 +618,26 @@ static inline bool nested_vmx_prepare_msr_bitmap(struct kvm_vcpu *vcpu,
>   		}
>   	}
>   
> -	/* KVM unconditionally exposes the FS/GS base MSRs to L1. */
> -	nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
> -					     MSR_FS_BASE, MSR_TYPE_RW);
> -
> -	nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
> -					     MSR_GS_BASE, MSR_TYPE_RW);
> -
> -	nested_vmx_disable_intercept_for_msr(msr_bitmap_l1, msr_bitmap_l0,
> -					     MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> -
>   	/*
> -	 * Checking the L0->L1 bitmap is trying to verify two things:
> -	 *
> -	 * 1. L0 gave a permission to L1 to actually passthrough the MSR. This
> -	 *    ensures that we do not accidentally generate an L02 MSR bitmap
> -	 *    from the L12 MSR bitmap that is too permissive.
> -	 * 2. That L1 or L2s have actually used the MSR. This avoids
> -	 *    unnecessarily merging of the bitmap if the MSR is unused. This
> -	 *    works properly because we only update the L01 MSR bitmap lazily.
> -	 *    So even if L0 should pass L1 these MSRs, the L01 bitmap is only
> -	 *    updated to reflect this when L1 (or its L2s) actually write to
> -	 *    the MSR.
> +	 * Always check vmcs01's bitmap to honor userspace MSR filters and any
> +	 * other runtime changes to vmcs01's bitmap, e.g. dynamic pass-through.
>   	 */
> -	if (!msr_write_intercepted_l01(vcpu, MSR_IA32_SPEC_CTRL))
> -		nested_vmx_disable_intercept_for_msr(
> -					msr_bitmap_l1, msr_bitmap_l0,
> -					MSR_IA32_SPEC_CTRL,
> -					MSR_TYPE_R | MSR_TYPE_W);
> -
> -	if (!msr_write_intercepted_l01(vcpu, MSR_IA32_PRED_CMD))
> -		nested_vmx_disable_intercept_for_msr(
> -					msr_bitmap_l1, msr_bitmap_l0,
> -					MSR_IA32_PRED_CMD,
> -					MSR_TYPE_W);
> -
> -	kvm_vcpu_unmap(vcpu, &to_vmx(vcpu)->nested.msr_bitmap_map, false);
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_FS_BASE, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_GS_BASE, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_KERNEL_GS_BASE, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_SPEC_CTRL, MSR_TYPE_RW);
> +
> +	nested_vmx_set_intercept_for_msr(vmx, msr_bitmap_l1, msr_bitmap_l0,
> +					 MSR_IA32_PRED_CMD, MSR_TYPE_W);
> +
> +	kvm_vcpu_unmap(vcpu, &vmx->nested.msr_bitmap_map, false);
>   
>   	return true;
>   }
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c8a4a548e96b..9972e5d1c44e 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -879,29 +879,6 @@ void vmx_update_exception_bitmap(struct kvm_vcpu *vcpu)
>   	vmcs_write32(EXCEPTION_BITMAP, eb);
>   }
>   
> -/*
> - * Check if MSR is intercepted for currently loaded MSR bitmap.
> - */
> -static bool msr_write_intercepted(struct kvm_vcpu *vcpu, u32 msr)
> -{
> -	unsigned long *msr_bitmap;
> -	int f = sizeof(unsigned long);
> -
> -	if (!cpu_has_vmx_msr_bitmap())
> -		return true;
> -
> -	msr_bitmap = to_vmx(vcpu)->loaded_vmcs->msr_bitmap;
> -
> -	if (msr <= 0x1fff) {
> -		return !!test_bit(msr, msr_bitmap + 0x800 / f);
> -	} else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff)) {
> -		msr &= 0x1fff;
> -		return !!test_bit(msr, msr_bitmap + 0xc00 / f);
> -	}
> -
> -	return true;
> -}
> -
>   static void clear_atomic_switch_msr_special(struct vcpu_vmx *vmx,
>   		unsigned long entry, unsigned long exit)
>   {
> @@ -3709,46 +3686,6 @@ void free_vpid(int vpid)
>   	spin_unlock(&vmx_vpid_lock);
>   }
>   
> -static void vmx_clear_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
> -{
> -	int f = sizeof(unsigned long);
> -
> -	if (msr <= 0x1fff)
> -		__clear_bit(msr, msr_bitmap + 0x000 / f);
> -	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> -		__clear_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
> -}
> -
> -static void vmx_clear_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
> -{
> -	int f = sizeof(unsigned long);
> -
> -	if (msr <= 0x1fff)
> -		__clear_bit(msr, msr_bitmap + 0x800 / f);
> -	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> -		__clear_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
> -}
> -
> -static void vmx_set_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
> -{
> -	int f = sizeof(unsigned long);
> -
> -	if (msr <= 0x1fff)
> -		__set_bit(msr, msr_bitmap + 0x000 / f);
> -	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> -		__set_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
> -}
> -
> -static void vmx_set_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
> -{
> -	int f = sizeof(unsigned long);
> -
> -	if (msr <= 0x1fff)
> -		__set_bit(msr, msr_bitmap + 0x800 / f);
> -	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> -		__set_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
> -}
> -
>   static __always_inline void vmx_disable_intercept_for_msr(struct kvm_vcpu *vcpu,
>   							  u32 msr, int type)
>   {
> @@ -6722,7 +6659,9 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>   	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
>   	 * save it.
>   	 */
> -	if (unlikely(!msr_write_intercepted(vcpu, MSR_IA32_SPEC_CTRL)))
> +	if (unlikely(cpu_has_vmx_msr_bitmap() &&
> +		     !vmx_test_msr_bitmap_write(vmx->loaded_vmcs->msr_bitmap,
> +						MSR_IA32_SPEC_CTRL)))
>   		vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
>   
>   	x86_spec_ctrl_restore_host(vmx->spec_ctrl, 0);
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index 0fb3236b0283..a6000c91b897 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -393,6 +393,69 @@ void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu,
>   	u32 msr, int type, bool value);
>   void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
>   
> +static inline bool vmx_test_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
> +{
> +	int f = sizeof(unsigned long);
> +
> +	if (msr <= 0x1fff)
> +		return test_bit(msr, msr_bitmap + 0x000 / f);
> +	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> +		return test_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
> +	return true;
> +}
> +
> +static inline bool vmx_test_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
> +{
> +	int f = sizeof(unsigned long);
> +
> +	if (msr <= 0x1fff)
> +		return test_bit(msr, msr_bitmap + 0x800 / f);
> +	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> +		return test_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
> +	return true;
> +}
> +
> +static inline void vmx_clear_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
> +{
> +	int f = sizeof(unsigned long);
> +
> +	if (msr <= 0x1fff)
> +		__clear_bit(msr, msr_bitmap + 0x000 / f);
> +	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> +		__clear_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
> +}
> +
> +static inline void vmx_clear_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
> +{
> +	int f = sizeof(unsigned long);
> +
> +	if (msr <= 0x1fff)
> +		__clear_bit(msr, msr_bitmap + 0x800 / f);
> +	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> +		__clear_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
> +}
> +
> +static inline void vmx_set_msr_bitmap_read(ulong *msr_bitmap, u32 msr)
> +{
> +	int f = sizeof(unsigned long);
> +
> +	if (msr <= 0x1fff)
> +		__set_bit(msr, msr_bitmap + 0x000 / f);
> +	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> +		__set_bit(msr & 0x1fff, msr_bitmap + 0x400 / f);
> +}
> +
> +static inline void vmx_set_msr_bitmap_write(ulong *msr_bitmap, u32 msr)
> +{
> +	int f = sizeof(unsigned long);
> +
> +	if (msr <= 0x1fff)
> +		__set_bit(msr, msr_bitmap + 0x800 / f);
> +	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))
> +		__set_bit(msr & 0x1fff, msr_bitmap + 0xc00 / f);
> +}
> +
> +
>   static inline u8 vmx_get_rvi(void)
>   {
>   	return vmcs_read16(GUEST_INTR_STATUS) & 0xff;
> 
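For readers following along: nested_vmx_set_intercept_for_msr(), called in the
nested.c hunk above, isn't shown in the quoted context.  A minimal sketch of
its intended semantics (keep the MSR intercepted in vmcs02 whenever either
vmcs01, which reflects L0's dynamic pass-through and userspace MSR filters, or
L1's bitmap intercepts it, and clear the intercept only when both allow direct
access) could look roughly like this; the structure is illustrative, not
necessarily the patch's exact code:

static void nested_vmx_set_intercept_for_msr(struct vcpu_vmx *vmx,
					     unsigned long *msr_bitmap_l1,
					     unsigned long *msr_bitmap_l0,
					     u32 msr, int types)
{
	/* Reads: intercept unless both vmcs01 and L1 pass the MSR through. */
	if (types & MSR_TYPE_R) {
		if (vmx_test_msr_bitmap_read(vmx->vmcs01.msr_bitmap, msr) ||
		    vmx_test_msr_bitmap_read(msr_bitmap_l1, msr))
			vmx_set_msr_bitmap_read(msr_bitmap_l0, msr);
		else
			vmx_clear_msr_bitmap_read(msr_bitmap_l0, msr);
	}

	/* Writes: same logic, using the write halves of the bitmaps. */
	if (types & MSR_TYPE_W) {
		if (vmx_test_msr_bitmap_write(vmx->vmcs01.msr_bitmap, msr) ||
		    vmx_test_msr_bitmap_write(msr_bitmap_l1, msr))
			vmx_set_msr_bitmap_write(msr_bitmap_l0, msr);
		else
			vmx_clear_msr_bitmap_write(msr_bitmap_l0, msr);
	}
}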

Feel free to squash patch 3 in this one or reorder it before; it makes 
sense to make them macros when you go from 4 to 6 functions.

Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 3/4] KVM: VMX: Macrofy the MSR bitmap getters and setters
  2021-03-17 13:15   ` Paolo Bonzini
@ 2021-03-17 16:39     ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-03-17 16:39 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On Wed, Mar 17, 2021, Paolo Bonzini wrote:
> On 16/03/21 19:44, Sean Christopherson wrote:
> > +	return (ret)true;						      \
> 
> I'm not sure if (void)true is amazing or disgusting, but anyway...

Definitely both.

> > +BUILD_VMX_MSR_BITMAP_HELPER(bool, test, read)
> > +BUILD_VMX_MSR_BITMAP_HELPER(bool, test, write)
> > +BUILD_VMX_MSR_BITMAP_HELPER(void, clear, read, __)
> > +BUILD_VMX_MSR_BITMAP_HELPER(void, clear, write, __)
> > +BUILD_VMX_MSR_BITMAP_HELPER(void, set, read, __)
> > +BUILD_VMX_MSR_BITMAP_HELPER(void, set, write, __)
> 
> ... I guess we have an armed truce where you let me do my bit manipulation
> magic and I let you do your macro magic.

Ha, mutually assured destruction.

> Still, I think gluing the variadic arguments with ## is a bit too much.

Heh, I don't disagree at all.  Honestly, I was surprised it worked, and couldn't
resist throwing it in because it's so absurd.

> This would be slightly less mysterious:
> 
> +BUILD_VMX_MSR_BITMAP_HELPER(bool, vmx_test_msr_bitmap_, read, test_bit)
> +BUILD_VMX_MSR_BITMAP_HELPER(bool, vmx_test_msr_bitmap_, write, test_bit)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_clear_msr_bitmap_, read, __clear_bit)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_clear_msr_bitmap_, write,
> __clear_bit)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_set_msr_bitmap_, read, __set_bit)
> +BUILD_VMX_MSR_BITMAP_HELPER(void, vmx_set_msr_bitmap_, write, __set_bit)
> 
> And I also wonder if we really need to expand all six functions one at a
> time.  You could remove the third argument and VMX_MSR_BITMAP_BASE_*, at the
> cost of expanding the inline functions' body twice in
> BUILD_VMX_MSR_BITMAP_HELPER.

I'll play around with the macros to see if I can make them less obnoxious.  I
found it easier to differentiate between the read/write offset and the high/low
offset by building them one at a time.  I'll see if I can find a compromise.
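
One possible compromise along those lines, sketched purely as an illustration
(names and layout are hypothetical, not taken from the posted patch): take the
return type, the action, and the bit op as arguments, and expand both the read
(base 0x0) and write (base 0x800) flavors from a single invocation so callers
never pass an offset.

#define __BUILD_VMX_MSR_BITMAP_HELPER(rtype, action, bitop, access, base)     \
static inline rtype vmx_##action##_msr_bitmap_##access(unsigned long *bitmap, \
							u32 msr)	       \
{									       \
	int f = sizeof(unsigned long);					       \
									       \
	if (msr <= 0x1fff)						       \
		return (rtype)bitop##_bit(msr, bitmap + base / f);	       \
	else if ((msr >= 0xc0000000) && (msr <= 0xc0001fff))		       \
		return (rtype)bitop##_bit(msr & 0x1fff,			       \
					  bitmap + (base + 0x400) / f);	       \
	return (rtype)true;						       \
}

#define BUILD_VMX_MSR_BITMAP_HELPER(rtype, action, bitop)		       \
	__BUILD_VMX_MSR_BITMAP_HELPER(rtype, action, bitop, read,  0x0)       \
	__BUILD_VMX_MSR_BITMAP_HELPER(rtype, action, bitop, write, 0x800)

BUILD_VMX_MSR_BITMAP_HELPER(bool, test, test)
BUILD_VMX_MSR_BITMAP_HELPER(void, clear, __clear)
BUILD_VMX_MSR_BITMAP_HELPER(void, set, __set)

The "(rtype)" cast keeps the trailing "return (rtype)true;" legal for the
void-returning set/clear variants as well, since GNU C accepts returning a
void expression from a void function.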

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling
  2021-03-17 13:17   ` Paolo Bonzini
@ 2021-03-17 16:50     ` Sean Christopherson
  2021-03-17 17:22       ` Paolo Bonzini
  0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2021-03-17 16:50 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On Wed, Mar 17, 2021, Paolo Bonzini wrote:
> On 16/03/21 19:44, Sean Christopherson wrote:
> > Always check vmcs01's MSR bitmap when merging L0 and L1 bitmaps for L2,
> > and always update the relevant bits in vmcs02.  This fixes two distinct,
> > but intertwined bugs related to dynamic MSR bitmap modifications.
> > 
> > The first issue is that KVM fails to enable MSR interception in vmcs02
> > for the FS/GS base MSRs if L1 first runs L2 with interception disabled,
> > and later enables interception.
> > 
> > The second issue is that KVM fails to honor userspace MSR filtering when
> > preparing vmcs02.
> > 
> > Fix both issues simultaneously, as fixing only one of the issues (doesn't
> > matter which) would create a mess that no one should have to bisect.
> > Fixing only the first bug would exacerbate the MSR filtering issue as
> > userspace would see inconsistent behavior depending on the whims of L1.
> > Fixing only the second bug (MSR filtering) effectively requires fixing
> > the first, as the nVMX code only knows how to transition vmcs02's
> > bitmap from 1->0.
> > 
> > Move the various accessor/mutators buried in vmx.c into vmx.h so that
> > they can be shared by the nested code.
> > 
> > Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
> > Fixes: d69129b4e46a ("KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible")
> > Cc: stable@vger.kernel.org
> > Cc: Alexander Graf <graf@amazon.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---

...

> Feel free to squash patch 3 in this one or reorder it before; it makes sense
> to make them macros when you go from 4 to 6 functions.

I put them in a separate patch so that backporting the fix for the older FS/GS
nVMX bug was at least feasible.  Not worth it?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling
  2021-03-17 16:50     ` Sean Christopherson
@ 2021-03-17 17:22       ` Paolo Bonzini
  2021-03-17 17:24         ` Sean Christopherson
  0 siblings, 1 reply; 14+ messages in thread
From: Paolo Bonzini @ 2021-03-17 17:22 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On 17/03/21 17:50, Sean Christopherson wrote:
>> Feel free to squash patch 3 in this one or reorder it before; it makes sense
>> to make them macros when you go from 4 to 6 functions.
> I put them in a separate patch so that backporting the fix for the older FS/GS
> nVMX bug was at least feasible.  Not worth it?

Going all the way back to 5.2 would almost certainly have other 
conflicts, so probably not.

Paolo


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling
  2021-03-17 17:22       ` Paolo Bonzini
@ 2021-03-17 17:24         ` Sean Christopherson
  2021-03-17 20:04           ` Sean Christopherson
  0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2021-03-17 17:24 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On Wed, Mar 17, 2021, Paolo Bonzini wrote:
> On 17/03/21 17:50, Sean Christopherson wrote:
> > > Feel free to squash patch 3 in this one or reorder it before; it makes sense
> > > to make them macros when you go from 4 to 6 functions.
> > I put them in a separate patch so that backporting the fix for the older FS/GS
> > nVMX bug was at least feasible.  Not worth it?
> 
> Going all the way back to 5.2 would almost certainly have other conflicts,
> so probably not.

I'll do a dry run before posting v2; if it's clean I'll leave things as is, if
it's a mess I'll move the macro patch earlier.

Thanks!

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish
  2021-03-16 18:44 ` [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish Sean Christopherson
  2021-03-17 13:15   ` Paolo Bonzini
@ 2021-03-17 19:29   ` Alexander Graf
  1 sibling, 0 replies; 14+ messages in thread
From: Alexander Graf @ 2021-03-17 19:29 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Yuan Yao



On 16.03.21 19:44, Sean Christopherson wrote:
> 
> Fix a plethora of issues with MSR filtering by installing the resulting
> filter as an atomic bundle instead of updating the live filter one range
> at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
> the hardware MSR bitmaps won't be updated until the next VM-Enter, but
> the relevant software struct is atomically updated, which is what KVM
> really needs.
> 
> Similar to the approach used for modifying memslots, make arch.msr_filter
> a SRCU-protected pointer, do all the work configuring the new filter
> outside of kvm->lock, and then acquire kvm->lock only when the new filter
> has been vetted and created.  That way vCPU readers either see the old
> filter or the new filter in their entirety, not some half-baked state.
> 
> Yuan Yao pointed out a use-after-free in ksm_msr_allowed() due to a
> TOCTOU bug, but that's just the tip of the iceberg...
> 
>    - Nothing is __rcu annotated, making it nigh impossible to audit the
>      code for correctness.
>    - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
>      coding style aside, the lack of an smp_rmb() anywhere casts all code
>      into doubt.
>    - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
>      count before taking the lock.
>    - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.
> 
> The entire approach of updating the live filter is also flawed.  While
> installing a new filter is inherently racy if vCPUs are running, fixing
> the above issues also makes it trivial to ensure certain behavior is
> deterministic, e.g. KVM can provide deterministic behavior for MSRs with
> identical settings in the old and new filters.  An atomic update of the
> filter also prevents KVM from getting into a half-baked state, e.g. if
> installing a filter fails, the existing approach would leave the filter
> in a half-baked state, having already committed whatever bits of the
> filter were already processed.
> 
> [*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com
> 
> Fixes: 1a155254ff93 ("KVM: x86: Introduce MSR filtering")
> Cc: stable@vger.kernel.org
> Cc: Alexander Graf <graf@amazon.com>
> Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
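
As a rough illustration of the install-and-swap pattern described above: build
and vet the new filter outside kvm->lock, publish it with a single RCU-style
pointer assignment, and free the old copy only after SRCU readers have drained.
The helper names and exact locking below are simplified stand-ins, not the
patch's code.

static int kvm_vm_ioctl_set_msr_filter(struct kvm *kvm,
				       struct kvm_msr_filter *user_filter)
{
	struct kvm_x86_msr_filter *new_filter, *old_filter;

	/* Build and validate the new filter without holding kvm->lock. */
	new_filter = kvm_alloc_msr_filter(user_filter->flags &
					  KVM_MSR_FILTER_DEFAULT_DENY);
	if (!new_filter)
		return -ENOMEM;
	/* ... copy in and sanity check each range ... */

	mutex_lock(&kvm->lock);
	old_filter = rcu_replace_pointer(kvm->arch.msr_filter, new_filter,
					 mutex_is_locked(&kvm->lock));
	mutex_unlock(&kvm->lock);

	/* Wait for vCPUs still reading the old filter under SRCU. */
	synchronize_srcu(&kvm->srcu);
	kvm_free_msr_filter(old_filter);

	kvm_make_all_cpus_request(kvm, KVM_REQ_MSR_FILTER_CHANGED);
	return 0;
}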

Thanks a lot Sean for cleaning up after me! I was trying to be a bit too 
smart by using the in-band count as a token, unfortunately :)

Reviewed-by: Alexander Graf <graf@amazon.com>


Alex



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling
  2021-03-17 17:24         ` Sean Christopherson
@ 2021-03-17 20:04           ` Sean Christopherson
  0 siblings, 0 replies; 14+ messages in thread
From: Sean Christopherson @ 2021-03-17 20:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Alexander Graf, Yuan Yao

On Wed, Mar 17, 2021, Sean Christopherson wrote:
> On Wed, Mar 17, 2021, Paolo Bonzini wrote:
> > On 17/03/21 17:50, Sean Christopherson wrote:
> > > > Feel free to squash patch 3 in this one or reorder it before; it makes sense
> > > > to make them macros when you go from 4 to 6 functions.
> > > I put them in a separate patch so that backporting the fix for the older FS/GS
> > > nVMX bug was at least feasible.  Not worth it?
> > 
> > Going all the way back to 5.2 would almost certainly have other conflicts,
> > so probably not.
> 
> I'll do a dry run before posting v2; if it's clean I'll leave things as is, if
> it's a mess I'll move the macro patch earlier.

Backports to 5.4 (the oldest relevant LTS) with a single trivial conflict on a
function prototype (the function itself isn't touched, it's just an unfortunate
false positive), and the MSR filtering selftest runs cleanly (in L1).  I'll keep
the original ordering unless you strongly prefer moving the macro patch earlier.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-03-17 20:05 UTC | newest]

Thread overview: 14+ messages
2021-03-16 18:44 [PATCH 0/4] KVM: x86: MSR filtering and related fixes Sean Christopherson
2021-03-16 18:44 ` [PATCH 1/4] KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish Sean Christopherson
2021-03-17 13:15   ` Paolo Bonzini
2021-03-17 19:29   ` Alexander Graf
2021-03-16 18:44 ` [PATCH 2/4] KVM: nVMX: Handle dynamic MSR intercept toggling Sean Christopherson
2021-03-17 13:17   ` Paolo Bonzini
2021-03-17 16:50     ` Sean Christopherson
2021-03-17 17:22       ` Paolo Bonzini
2021-03-17 17:24         ` Sean Christopherson
2021-03-17 20:04           ` Sean Christopherson
2021-03-16 18:44 ` [PATCH 3/4] KVM: VMX: Macrofy the MSR bitmap getters and setters Sean Christopherson
2021-03-17 13:15   ` Paolo Bonzini
2021-03-17 16:39     ` Sean Christopherson
2021-03-16 18:44 ` [PATCH 4/4] KVM: nVMX: Clean up x2APIC MSR handling for L2 Sean Christopherson
