linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access
@ 2019-09-17  8:16 Wanpeng Li
  2019-09-17  8:16 ` [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4 Wanpeng Li
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Wanpeng Li @ 2019-09-17  8:16 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, stable

From: Wanpeng Li <wanpengli@tencent.com>

Reported by syzkaller:

	#PF: supervisor write access in kernel mode
	#PF: error_code(0x0002) - not-present page
	PGD 403c01067 P4D 403c01067 PUD 0
	Oops: 0002 [#1] SMP PTI
	CPU: 1 PID: 12564 Comm: a.out Tainted: G           OE     5.3.0-rc4+ #4
	RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]
	Call Trace:
	 __kvm_io_bus_write+0x91/0xe0 [kvm]
	 kvm_io_bus_write+0x79/0xf0 [kvm]
	 write_mmio+0xae/0x170 [kvm]
	 emulator_read_write_onepage+0x252/0x430 [kvm]
	 emulator_read_write+0xcd/0x180 [kvm]
	 emulator_write_emulated+0x15/0x20 [kvm]
	 segmented_write+0x59/0x80 [kvm]
	 writeback+0x113/0x250 [kvm]
	 x86_emulate_insn+0x78c/0xd80 [kvm]
	 x86_emulate_instruction+0x386/0x7c0 [kvm]
	 kvm_mmu_page_fault+0xf9/0x9e0 [kvm]
	 handle_ept_violation+0x10a/0x220 [kvm_intel]
	 vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
	 vcpu_enter_guest+0x4dc/0x18d0 [kvm]
	 kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
	 kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
	 do_vfs_ioctl+0xa2/0x690
	 ksys_ioctl+0x6d/0x80
	 __x64_sys_ioctl+0x1a/0x20
	 do_syscall_64+0x74/0x720
	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
	RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]

Both of the coalesced_mmio ring buffer indexes, ring->first and ring->last,
are larger than KVM_COALESCED_MMIO_MAX in the testcase, so the
ring->coalesced_mmio[ring->last].phys_addr = addr; assignment triggers an
out-of-bounds array access. This patch fixes it by reducing both indexes
modulo KVM_COALESCED_MMIO_MAX.
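The wrap can be illustrated with a small standalone sketch (a hypothetical
user-space model, not kernel code; the capacity constant is illustrative
rather than the kernel's real KVM_COALESCED_MMIO_MAX, which is derived from
the page size):

```c
#include <stdint.h>

#define COALESCED_MMIO_MAX 64  /* illustrative capacity, not the real value */

/* Clamp a possibly userspace-corrupted ring index back into bounds,
 * mirroring what the patch does with ring->first and ring->last. */
static inline uint32_t wrap_ring_index(uint32_t idx)
{
	return idx % COALESCED_MMIO_MAX;
}
```

In-bounds indexes pass through unchanged; anything at or past the capacity
wraps back into the valid range before it is used to address the array.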

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134b2826a00000

Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 virt/kvm/coalesced_mmio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 5294abb..cff1ec9 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -73,6 +73,8 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
 
 	spin_lock(&dev->kvm->ring_lock);
 
+	ring->first = ring->first % KVM_COALESCED_MMIO_MAX;
+	ring->last = ring->last % KVM_COALESCED_MMIO_MAX;
 	if (!coalesced_mmio_has_room(dev)) {
 		spin_unlock(&dev->kvm->ring_lock);
 		return -EOPNOTSUPP;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4
  2019-09-17  8:16 [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Wanpeng Li
@ 2019-09-17  8:16 ` Wanpeng Li
  2019-09-17 17:32   ` Sean Christopherson
  2019-09-17  8:16 ` [PATCH v5 3/3] KVM: LAPIC: Tune lapic_timer_advance_ns smoothly Wanpeng Li
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Wanpeng Li @ 2019-09-17  8:16 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, stable

From: Wanpeng Li <wanpengli@tencent.com>

Reported by syzkaller:

	WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
	CPU: 0 PID: 6544 Comm: a.out Tainted: G           OE     5.3.0-rc4+ #4
	RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
	Call Trace:
	 vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
	 vcpu_enter_guest+0x4dc/0x18d0 [kvm]
	 kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
	 kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
	 do_vfs_ioctl+0xa2/0x690
	 ksys_ioctl+0x6d/0x80
	 __x64_sys_ioctl+0x1a/0x20
	 do_syscall_64+0x74/0x720
	 entry_SYSCALL_64_after_hwframe+0x49/0xbe

When CR4.UMIP is set, the guest should have the UMIP CPUID flag. The
current kvm set_sregs path performs no such check when userspace supplies
sregs values. SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP in
vmx_set_cr4 even though the guest lacks the UMIP CPUID flag. The testcase
triggers the handle_desc warning when executing the ltr instruction, since
the guest's architectural CR4 does not have UMIP set. This patch fixes it
by adding a check for valid CR4 and CPUID combinations in __set_sregs.
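The shape of that check can be modeled with a standalone sketch
(hypothetical types; guest_caps is a stand-in for the kernel's
guest_cpuid_has() queries; the bit positions follow the architectural CR4
layout, the rest is illustrative):

```c
#include <stdbool.h>

/* Architectural CR4 bit positions. */
#define CR4_UMIP (1ul << 11)
#define CR4_PKE  (1ul << 22)

/* Stand-in for guest_cpuid_has() lookups. */
struct guest_caps { bool has_umip, has_pku; };

/* Reject a CR4 value that sets a feature bit the guest CPUID does not
 * advertise, mirroring the kvm_valid_cr4() checks added by the patch. */
static int valid_cr4(const struct guest_caps *caps, unsigned long cr4)
{
	if (!caps->has_umip && (cr4 & CR4_UMIP))
		return -22;	/* -EINVAL */
	if (!caps->has_pku && (cr4 & CR4_PKE))
		return -22;
	return 0;
}
```

With UMIP absent from the guest CPUID, a CR4 value that sets CR4.UMIP is
rejected instead of silently enabling SECONDARY_EXEC_DESC.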

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000

Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/x86.c | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f7cfd8e..cafb4d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -884,34 +884,42 @@ int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
 }
 EXPORT_SYMBOL_GPL(kvm_set_xcr);
 
-int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
-	unsigned long old_cr4 = kvm_read_cr4(vcpu);
-	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
-				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
-
-	if (cr4 & CR4_RESERVED_BITS)
-		return 1;
-
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && (cr4 & X86_CR4_OSXSAVE))
-		return 1;
+		return -EINVAL;
 
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_SMEP) && (cr4 & X86_CR4_SMEP))
-		return 1;
+		return -EINVAL;
 
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_SMAP) && (cr4 & X86_CR4_SMAP))
-		return 1;
+		return -EINVAL;
 
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_FSGSBASE) && (cr4 & X86_CR4_FSGSBASE))
-		return 1;
+		return -EINVAL;
 
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE))
-		return 1;
+		return -EINVAL;
 
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57))
-		return 1;
+		return -EINVAL;
 
 	if (!guest_cpuid_has(vcpu, X86_FEATURE_UMIP) && (cr4 & X86_CR4_UMIP))
+		return -EINVAL;
+
+	return 0;
+}
+
+int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
+{
+	unsigned long old_cr4 = kvm_read_cr4(vcpu);
+	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
+				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
+
+	if (cr4 & CR4_RESERVED_BITS)
+		return 1;
+
+	if (kvm_valid_cr4(vcpu, cr4))
 		return 1;
 
 	if (is_long_mode(vcpu)) {
@@ -8675,7 +8683,8 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 	struct desc_ptr dt;
 	int ret = -EINVAL;
 
-	if (kvm_valid_sregs(vcpu, sregs))
+	if (kvm_valid_sregs(vcpu, sregs) ||
+		kvm_valid_cr4(vcpu, sregs->cr4))
 		goto out;
 
 	apic_base_msr.data = sregs->apic_base;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5 3/3] KVM: LAPIC: Tune lapic_timer_advance_ns smoothly
  2019-09-17  8:16 [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Wanpeng Li
  2019-09-17  8:16 ` [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4 Wanpeng Li
@ 2019-09-17  8:16 ` Wanpeng Li
  2019-09-17 17:07   ` Paolo Bonzini
  2019-09-17  8:18 ` [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Paolo Bonzini
  2019-09-17 14:58 ` Jim Mattson
  3 siblings, 1 reply; 10+ messages in thread
From: Wanpeng Li @ 2019-09-17  8:16 UTC (permalink / raw)
  To: linux-kernel, kvm
  Cc: Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

From: Wanpeng Li <wanpengli@tencent.com>

Filter out both drastic and random fluctuations and remove
timer_advance_adjust_done altogether, so that the adjustment becomes
continuous.
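The filtered, continuous adjustment can be modeled in a standalone sketch
(assumption: it works directly in nanoseconds, whereas the real code first
converts the TSC-cycle delta via virtual_tsc_khz; the constants mirror the
ones in the patch):

```c
#include <stdint.h>

#define ADJUST_MIN  100		/* ns; ignore tiny jitter */
#define ADJUST_MAX  5000	/* ns; ignore random spikes, clamp runaways */
#define ADJUST_INIT 1000
#define ADJUST_STEP 8

/* Continuous adjustment of the timer advance, mirroring the patch:
 * deltas outside [MIN, MAX] are ignored; otherwise move 1/STEP of the
 * error toward the target; reset if the value ever exceeds MAX.
 * delta_ns < 0 means the timer fired too early. */
static uint32_t adjust_advance(uint32_t advance_ns, int64_t delta_ns)
{
	int64_t mag = delta_ns < 0 ? -delta_ns : delta_ns;

	if (mag > ADJUST_MAX || mag < ADJUST_MIN)
		return advance_ns;
	if (delta_ns < 0)
		advance_ns -= mag / ADJUST_STEP;
	else
		advance_ns += mag / ADJUST_STEP;
	if (advance_ns > ADJUST_MAX)
		advance_ns = ADJUST_INIT;
	return advance_ns;
}
```

Because the function never declares itself "done", a late change in host
conditions keeps being tracked, which is the point of dropping
timer_advance_adjust_done.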

Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
---
 arch/x86/kvm/lapic.c | 28 ++++++++++++++--------------
 arch/x86/kvm/lapic.h |  1 -
 2 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dbbe478..323bdca 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -65,7 +65,9 @@
 #define APIC_BROADCAST			0xFF
 #define X2APIC_BROADCAST		0xFFFFFFFFul
 
-#define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
+static bool dynamically_adjust_timer_advance __read_mostly;
+#define LAPIC_TIMER_ADVANCE_ADJUST_MIN 100
+#define LAPIC_TIMER_ADVANCE_ADJUST_MAX 5000
 #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000
 /* step-by-step approximation to mitigate fluctuation */
 #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
@@ -1485,26 +1487,25 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
 	u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
 	u64 ns;
 
+	/* Do not adjust for tiny fluctuations or large random spikes. */
+	if (abs(advance_expire_delta) > LAPIC_TIMER_ADVANCE_ADJUST_MAX ||
+	    abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_MIN)
+		return;
+
 	/* too early */
 	if (advance_expire_delta < 0) {
 		ns = -advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
-		timer_advance_ns -= min((u32)ns,
-			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+		timer_advance_ns -= ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP;
 	} else {
 	/* too late */
 		ns = advance_expire_delta * 1000000ULL;
 		do_div(ns, vcpu->arch.virtual_tsc_khz);
-		timer_advance_ns += min((u32)ns,
-			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
+		timer_advance_ns += ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP;
 	}
 
-	if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
-		apic->lapic_timer.timer_advance_adjust_done = true;
-	if (unlikely(timer_advance_ns > 5000)) {
+	if (unlikely(timer_advance_ns > LAPIC_TIMER_ADVANCE_ADJUST_MAX))
 		timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
-		apic->lapic_timer.timer_advance_adjust_done = false;
-	}
 	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
 }
 
@@ -1524,7 +1525,7 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	if (guest_tsc < tsc_deadline)
 		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
 
-	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
+	if (dynamically_adjust_timer_advance)
 		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
 }
 
@@ -2302,13 +2303,12 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
 	apic->lapic_timer.timer.function = apic_timer_fn;
 	if (timer_advance_ns == -1) {
 		apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
-		apic->lapic_timer.timer_advance_adjust_done = false;
+		dynamically_adjust_timer_advance = true;
 	} else {
 		apic->lapic_timer.timer_advance_ns = timer_advance_ns;
-		apic->lapic_timer.timer_advance_adjust_done = true;
+		dynamically_adjust_timer_advance = false;
 	}
 
-
 	/*
 	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
 	 * thinking that APIC state has changed.
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 50053d2..2aad7e2 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -35,7 +35,6 @@ struct kvm_timer {
 	s64 advance_expire_delta;
 	atomic_t pending;			/* accumulated triggered timers */
 	bool hv_timer_in_use;
-	bool timer_advance_adjust_done;
 };
 
 struct kvm_lapic {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access
  2019-09-17  8:16 [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Wanpeng Li
  2019-09-17  8:16 ` [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4 Wanpeng Li
  2019-09-17  8:16 ` [PATCH v5 3/3] KVM: LAPIC: Tune lapic_timer_advance_ns smoothly Wanpeng Li
@ 2019-09-17  8:18 ` Paolo Bonzini
  2019-09-17 14:58 ` Jim Mattson
  3 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2019-09-17  8:18 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm; +Cc: P J P, Jim Mattson

I think we should consider the embargo for CVE-2019-14821 to be broken.
 Since your patch is better, I'll push that one instead as soon as I get
confirmation.

Paolo

On 17/09/19 10:16, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Reported by syzkaller:
> 
> 	#PF: supervisor write access in kernel mode
> 	#PF: error_code(0x0002) - not-present page
> 	PGD 403c01067 P4D 403c01067 PUD 0
> 	Oops: 0002 [#1] SMP PTI
> 	CPU: 1 PID: 12564 Comm: a.out Tainted: G           OE     5.3.0-rc4+ #4
> 	RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]
> 	Call Trace:
> 	 __kvm_io_bus_write+0x91/0xe0 [kvm]
> 	 kvm_io_bus_write+0x79/0xf0 [kvm]
> 	 write_mmio+0xae/0x170 [kvm]
> 	 emulator_read_write_onepage+0x252/0x430 [kvm]
> 	 emulator_read_write+0xcd/0x180 [kvm]
> 	 emulator_write_emulated+0x15/0x20 [kvm]
> 	 segmented_write+0x59/0x80 [kvm]
> 	 writeback+0x113/0x250 [kvm]
> 	 x86_emulate_insn+0x78c/0xd80 [kvm]
> 	 x86_emulate_instruction+0x386/0x7c0 [kvm]
> 	 kvm_mmu_page_fault+0xf9/0x9e0 [kvm]
> 	 handle_ept_violation+0x10a/0x220 [kvm_intel]
> 	 vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
> 	 vcpu_enter_guest+0x4dc/0x18d0 [kvm]
> 	 kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
> 	 kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
> 	 do_vfs_ioctl+0xa2/0x690
> 	 ksys_ioctl+0x6d/0x80
> 	 __x64_sys_ioctl+0x1a/0x20
> 	 do_syscall_64+0x74/0x720
> 	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 	RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]
> 
> Both of the coalesced_mmio ring buffer indexes, ring->first and ring->last,
> are larger than KVM_COALESCED_MMIO_MAX in the testcase, so the
> ring->coalesced_mmio[ring->last].phys_addr = addr; assignment triggers an
> out-of-bounds array access. This patch fixes it by reducing both indexes
> modulo KVM_COALESCED_MMIO_MAX.
> 
> syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134b2826a00000
> 
> Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  virt/kvm/coalesced_mmio.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> index 5294abb..cff1ec9 100644
> --- a/virt/kvm/coalesced_mmio.c
> +++ b/virt/kvm/coalesced_mmio.c
> @@ -73,6 +73,8 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
>  
>  	spin_lock(&dev->kvm->ring_lock);
>  
> +	ring->first = ring->first % KVM_COALESCED_MMIO_MAX;
> +	ring->last = ring->last % KVM_COALESCED_MMIO_MAX;
>  	if (!coalesced_mmio_has_room(dev)) {
>  		spin_unlock(&dev->kvm->ring_lock);
>  		return -EOPNOTSUPP;
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access
  2019-09-17  8:16 [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Wanpeng Li
                   ` (2 preceding siblings ...)
  2019-09-17  8:18 ` [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Paolo Bonzini
@ 2019-09-17 14:58 ` Jim Mattson
  2019-09-17 15:18   ` Matt Delco
  3 siblings, 1 reply; 10+ messages in thread
From: Jim Mattson @ 2019-09-17 14:58 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, kvm list, Paolo Bonzini, Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	stable, Matt Delco

On Tue, Sep 17, 2019 at 1:16 AM Wanpeng Li <kernellwp@gmail.com> wrote:
>
> From: Wanpeng Li <wanpengli@tencent.com>
>
> Reported by syzkaller:
>
>         #PF: supervisor write access in kernel mode
>         #PF: error_code(0x0002) - not-present page
>         PGD 403c01067 P4D 403c01067 PUD 0
>         Oops: 0002 [#1] SMP PTI
>         CPU: 1 PID: 12564 Comm: a.out Tainted: G           OE     5.3.0-rc4+ #4
>         RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]
>         Call Trace:
>          __kvm_io_bus_write+0x91/0xe0 [kvm]
>          kvm_io_bus_write+0x79/0xf0 [kvm]
>          write_mmio+0xae/0x170 [kvm]
>          emulator_read_write_onepage+0x252/0x430 [kvm]
>          emulator_read_write+0xcd/0x180 [kvm]
>          emulator_write_emulated+0x15/0x20 [kvm]
>          segmented_write+0x59/0x80 [kvm]
>          writeback+0x113/0x250 [kvm]
>          x86_emulate_insn+0x78c/0xd80 [kvm]
>          x86_emulate_instruction+0x386/0x7c0 [kvm]
>          kvm_mmu_page_fault+0xf9/0x9e0 [kvm]
>          handle_ept_violation+0x10a/0x220 [kvm_intel]
>          vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
>          vcpu_enter_guest+0x4dc/0x18d0 [kvm]
>          kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
>          kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
>          do_vfs_ioctl+0xa2/0x690
>          ksys_ioctl+0x6d/0x80
>          __x64_sys_ioctl+0x1a/0x20
>          do_syscall_64+0x74/0x720
>          entry_SYSCALL_64_after_hwframe+0x49/0xbe
>         RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]
>
> Both of the coalesced_mmio ring buffer indexes, ring->first and ring->last,
> are larger than KVM_COALESCED_MMIO_MAX in the testcase, so the
> ring->coalesced_mmio[ring->last].phys_addr = addr; assignment triggers an
> out-of-bounds array access. This patch fixes it by reducing both indexes
> modulo KVM_COALESCED_MMIO_MAX.
>
> syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134b2826a00000
>
> Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  virt/kvm/coalesced_mmio.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> index 5294abb..cff1ec9 100644
> --- a/virt/kvm/coalesced_mmio.c
> +++ b/virt/kvm/coalesced_mmio.c
> @@ -73,6 +73,8 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
>
>         spin_lock(&dev->kvm->ring_lock);
>
> +       ring->first = ring->first % KVM_COALESCED_MMIO_MAX;
> +       ring->last = ring->last % KVM_COALESCED_MMIO_MAX;

I don't think this is sufficient, since the memory that ring points to
is shared with userspace. Userspace can overwrite your corrected
values with illegal ones before they are used. Not exactly a TOCTTOU
issue, since there isn't technically a 'check' here, but the same
idea.
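One way to close that race, sketched here as a hypothetical user-space
model rather than the actual kernel fix, is to snapshot the shared index
into a local, validate the local, and use only the local afterwards, so a
concurrent userspace write cannot invalidate the check:

```c
#include <stdint.h>

#define COALESCED_MMIO_MAX 64	/* illustrative capacity */

struct mmio_ring {
	volatile uint32_t first, last;	/* shared with userspace */
	uint64_t phys_addr[COALESCED_MMIO_MAX];
};

/* Read the shared index once into a local, reject it if out of bounds,
 * and index the array only through the validated local, so userspace
 * cannot change the value between the check and the use. */
static int ring_push(struct mmio_ring *ring, uint64_t addr)
{
	uint32_t insert = ring->last;	/* single read of shared memory */

	if (insert >= COALESCED_MMIO_MAX)
		return -1;
	ring->phys_addr[insert] = addr;
	ring->last = (insert + 1) % COALESCED_MMIO_MAX;
	return 0;
}
```

In real kernel code the single read would be expressed with READ_ONCE() to
keep the compiler from re-reading the shared field; the sketch only
approximates that with one load into a local.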

>         if (!coalesced_mmio_has_room(dev)) {
>                 spin_unlock(&dev->kvm->ring_lock);
>                 return -EOPNOTSUPP;
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access
  2019-09-17 14:58 ` Jim Mattson
@ 2019-09-17 15:18   ` Matt Delco
  0 siblings, 0 replies; 10+ messages in thread
From: Matt Delco @ 2019-09-17 15:18 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: LKML, Jim Mattson, kvm list, Paolo Bonzini,
	Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel,
	stable

On Tue, Sep 17, 2019 at 7:59 AM Jim Mattson <jmattson@google.com> wrote:
> On Tue, Sep 17, 2019 at 1:16 AM Wanpeng Li <kernellwp@gmail.com> wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Reported by syzkaller:
> >
> >         #PF: supervisor write access in kernel mode
> >         #PF: error_code(0x0002) - not-present page
> >         PGD 403c01067 P4D 403c01067 PUD 0
> >         Oops: 0002 [#1] SMP PTI
> >         CPU: 1 PID: 12564 Comm: a.out Tainted: G           OE     5.3.0-rc4+ #4
> >         RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]
> >         Call Trace:
> >          __kvm_io_bus_write+0x91/0xe0 [kvm]
> >          kvm_io_bus_write+0x79/0xf0 [kvm]
> >          write_mmio+0xae/0x170 [kvm]
> >          emulator_read_write_onepage+0x252/0x430 [kvm]
> >          emulator_read_write+0xcd/0x180 [kvm]
> >          emulator_write_emulated+0x15/0x20 [kvm]
> >          segmented_write+0x59/0x80 [kvm]
> >          writeback+0x113/0x250 [kvm]
> >          x86_emulate_insn+0x78c/0xd80 [kvm]
> >          x86_emulate_instruction+0x386/0x7c0 [kvm]
> >          kvm_mmu_page_fault+0xf9/0x9e0 [kvm]
> >          handle_ept_violation+0x10a/0x220 [kvm_intel]
> >          vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
> >          vcpu_enter_guest+0x4dc/0x18d0 [kvm]
> >          kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
> >          kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
> >          do_vfs_ioctl+0xa2/0x690
> >          ksys_ioctl+0x6d/0x80
> >          __x64_sys_ioctl+0x1a/0x20
> >          do_syscall_64+0x74/0x720
> >          entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >         RIP: 0010:coalesced_mmio_write+0xcc/0x130 [kvm]
> >
> > Both of the coalesced_mmio ring buffer indexes, ring->first and ring->last,
> > are larger than KVM_COALESCED_MMIO_MAX in the testcase, so the
> > ring->coalesced_mmio[ring->last].phys_addr = addr; assignment triggers an
> > out-of-bounds array access. This patch fixes it by reducing both indexes
> > modulo KVM_COALESCED_MMIO_MAX.
> >
> > syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=134b2826a00000
> >
> > Reported-by: syzbot+983c866c3dd6efa3662a@syzkaller.appspotmail.com
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  virt/kvm/coalesced_mmio.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
> > index 5294abb..cff1ec9 100644
> > --- a/virt/kvm/coalesced_mmio.c
> > +++ b/virt/kvm/coalesced_mmio.c
> > @@ -73,6 +73,8 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
> >
> >         spin_lock(&dev->kvm->ring_lock);
> >
> > +       ring->first = ring->first % KVM_COALESCED_MMIO_MAX;

This update to first doesn't provide any worthwhile benefit (first is not
used to compute the address of a write) and likely adds the overhead of a
second divide operation (via the non-power-of-2 modulus). If first is
invalid, the app and/or kernel will be confused about whether the ring is
empty or full, but no serious harm will occur (and since the only writer
of first is the app, the app is only harming itself).

> > +       ring->last = ring->last % KVM_COALESCED_MMIO_MAX;
>
> I don't think this is sufficient, since the memory that ring points to
> is shared with userspace. Userspace can overwrite your corrected
> values with illegal ones before they are used. Not exactly a TOCTTOU
> issue, since there isn't technically a 'check' here, but the same
> idea.
>
> >         if (!coalesced_mmio_has_room(dev)) {
> >                 spin_unlock(&dev->kvm->ring_lock);
> >                 return -EOPNOTSUPP;
> > --
> > 2.7.4
> >

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5 3/3] KVM: LAPIC: Tune lapic_timer_advance_ns smoothly
  2019-09-17  8:16 ` [PATCH v5 3/3] KVM: LAPIC: Tune lapic_timer_advance_ns smoothly Wanpeng Li
@ 2019-09-17 17:07   ` Paolo Bonzini
  0 siblings, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2019-09-17 17:07 UTC (permalink / raw)
  To: Wanpeng Li, linux-kernel, kvm
  Cc: Radim Krčmář,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel

On 17/09/19 10:16, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Filter out both drastic and random fluctuations and remove
> timer_advance_adjust_done altogether, so that the adjustment becomes
> continuous.
> 
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>

Queued, thanks (I renamed the new variable to lapic_timer_advance_dynamic).

Thanks,

Paolo

> ---
>  arch/x86/kvm/lapic.c | 28 ++++++++++++++--------------
>  arch/x86/kvm/lapic.h |  1 -
>  2 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index dbbe478..323bdca 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -65,7 +65,9 @@
>  #define APIC_BROADCAST			0xFF
>  #define X2APIC_BROADCAST		0xFFFFFFFFul
>  
> -#define LAPIC_TIMER_ADVANCE_ADJUST_DONE 100
> +static bool dynamically_adjust_timer_advance __read_mostly;
> +#define LAPIC_TIMER_ADVANCE_ADJUST_MIN 100
> +#define LAPIC_TIMER_ADVANCE_ADJUST_MAX 5000
>  #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000
>  /* step-by-step approximation to mitigate fluctuation */
>  #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8
> @@ -1485,26 +1487,25 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu,
>  	u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns;
>  	u64 ns;
>  
> +	/* Do not adjust for tiny fluctuations or large random spikes. */
> +	if (abs(advance_expire_delta) > LAPIC_TIMER_ADVANCE_ADJUST_MAX ||
> +	    abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_MIN)
> +		return;
> +
>  	/* too early */
>  	if (advance_expire_delta < 0) {
>  		ns = -advance_expire_delta * 1000000ULL;
>  		do_div(ns, vcpu->arch.virtual_tsc_khz);
> -		timer_advance_ns -= min((u32)ns,
> -			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
> +		timer_advance_ns -= ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP;
>  	} else {
>  	/* too late */
>  		ns = advance_expire_delta * 1000000ULL;
>  		do_div(ns, vcpu->arch.virtual_tsc_khz);
> -		timer_advance_ns += min((u32)ns,
> -			timer_advance_ns / LAPIC_TIMER_ADVANCE_ADJUST_STEP);
> +		timer_advance_ns += ns/LAPIC_TIMER_ADVANCE_ADJUST_STEP;
>  	}
>  
> -	if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE)
> -		apic->lapic_timer.timer_advance_adjust_done = true;
> -	if (unlikely(timer_advance_ns > 5000)) {
> +	if (unlikely(timer_advance_ns > LAPIC_TIMER_ADVANCE_ADJUST_MAX))
>  		timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
> -		apic->lapic_timer.timer_advance_adjust_done = false;
> -	}
>  	apic->lapic_timer.timer_advance_ns = timer_advance_ns;
>  }
>  
> @@ -1524,7 +1525,7 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
>  	if (guest_tsc < tsc_deadline)
>  		__wait_lapic_expire(vcpu, tsc_deadline - guest_tsc);
>  
> -	if (unlikely(!apic->lapic_timer.timer_advance_adjust_done))
> +	if (dynamically_adjust_timer_advance)
>  		adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta);
>  }
>  
> @@ -2302,13 +2303,12 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns)
>  	apic->lapic_timer.timer.function = apic_timer_fn;
>  	if (timer_advance_ns == -1) {
>  		apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT;
> -		apic->lapic_timer.timer_advance_adjust_done = false;
> +		dynamically_adjust_timer_advance = true;
>  	} else {
>  		apic->lapic_timer.timer_advance_ns = timer_advance_ns;
> -		apic->lapic_timer.timer_advance_adjust_done = true;
> +		dynamically_adjust_timer_advance = false;
>  	}
>  
> -
>  	/*
>  	 * APIC is created enabled. This will prevent kvm_lapic_set_base from
>  	 * thinking that APIC state has changed.
> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
> index 50053d2..2aad7e2 100644
> --- a/arch/x86/kvm/lapic.h
> +++ b/arch/x86/kvm/lapic.h
> @@ -35,7 +35,6 @@ struct kvm_timer {
>  	s64 advance_expire_delta;
>  	atomic_t pending;			/* accumulated triggered timers */
>  	bool hv_timer_in_use;
> -	bool timer_advance_adjust_done;
>  };
>  
>  struct kvm_lapic {
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4
  2019-09-17  8:16 ` [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4 Wanpeng Li
@ 2019-09-17 17:32   ` Sean Christopherson
  2019-09-18  9:56     ` Wanpeng Li
  2019-09-24 13:55     ` Paolo Bonzini
  0 siblings, 2 replies; 10+ messages in thread
From: Sean Christopherson @ 2019-09-17 17:32 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: linux-kernel, kvm, Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, stable

On Tue, Sep 17, 2019 at 04:16:25PM +0800, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@tencent.com>
> 
> Reported by syzkaller:
> 
> 	WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
> 	CPU: 0 PID: 6544 Comm: a.out Tainted: G           OE     5.3.0-rc4+ #4
> 	RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
> 	Call Trace:
> 	 vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
> 	 vcpu_enter_guest+0x4dc/0x18d0 [kvm]
> 	 kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
> 	 kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
> 	 do_vfs_ioctl+0xa2/0x690
> 	 ksys_ioctl+0x6d/0x80
> 	 __x64_sys_ioctl+0x1a/0x20
> 	 do_syscall_64+0x74/0x720
> 	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> When CR4.UMIP is set, the guest should have the UMIP CPUID flag. The
> current kvm set_sregs path performs no such check when userspace supplies
> sregs values. SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP in
> vmx_set_cr4 even though the guest lacks the UMIP CPUID flag. The testcase
> triggers the handle_desc warning when executing the ltr instruction, since
> the guest's architectural CR4 does not have UMIP set. This patch fixes it
> by adding a check for valid CR4 and CPUID combinations in __set_sregs.

Checking CPUID will fix this specific scenario, but it doesn't resolve
the underlying issue of __set_sregs() ignoring the return of kvm_x86_ops'
set_cr4(), e.g. I think vmx_set_cr4() can still fail if userspace sets a
custom MSR_IA32_VMX_CR4_FIXED0 when nested VMX is on.
 
> syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000
> 
> Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
> Cc: stable@vger.kernel.org
> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> ---
>  arch/x86/kvm/x86.c | 39 ++++++++++++++++++++++++---------------
>  1 file changed, 24 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f7cfd8e..cafb4d4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -884,34 +884,42 @@ int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
>  }
>  EXPORT_SYMBOL_GPL(kvm_set_xcr);
>  
> -int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> +static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
>  {
> -	unsigned long old_cr4 = kvm_read_cr4(vcpu);
> -	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
> -				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
> -
> -	if (cr4 & CR4_RESERVED_BITS)
> -		return 1;
> -
>  	if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && (cr4 & X86_CR4_OSXSAVE))
> -		return 1;
> +		return -EINVAL;
>  
>  	if (!guest_cpuid_has(vcpu, X86_FEATURE_SMEP) && (cr4 & X86_CR4_SMEP))
> -		return 1;
> +		return -EINVAL;
>  
>  	if (!guest_cpuid_has(vcpu, X86_FEATURE_SMAP) && (cr4 & X86_CR4_SMAP))
> -		return 1;
> +		return -EINVAL;
>  
>  	if (!guest_cpuid_has(vcpu, X86_FEATURE_FSGSBASE) && (cr4 & X86_CR4_FSGSBASE))
> -		return 1;
> +		return -EINVAL;
>  
>  	if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE))
> -		return 1;
> +		return -EINVAL;
>  
>  	if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57))
> -		return 1;
> +		return -EINVAL;
>  
>  	if (!guest_cpuid_has(vcpu, X86_FEATURE_UMIP) && (cr4 & X86_CR4_UMIP))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> +{
> +	unsigned long old_cr4 = kvm_read_cr4(vcpu);
> +	unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
> +				   X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
> +
> +	if (cr4 & CR4_RESERVED_BITS)
> +		return 1;

Checking CPUID bits but allowing unconditionally reserved bits to be set
feels wrong.

Paolo, can you provide an "official" ruling on how KVM_SET_SREGS should
interact with reserved bits?  It's not at all clear from the git history
if skipping the checks was intentional or an oversight.

The CR4_RESERVED_BITS check has been in kvm_set_cr4() since the beginning
of time (commit 6aa8b732ca01, "[PATCH] kvm: userspace interface").

The first CPUID check came later, in commit 2acf923e38fb ("KVM: VMX:
Enable XSAVE/XRSTOR for guest"), but its changelog is decidedly unhelpful.

> +
> +	if (kvm_valid_cr4(vcpu, cr4))
>  		return 1;
>  
>  	if (is_long_mode(vcpu)) {
> @@ -8675,7 +8683,8 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
>  	struct desc_ptr dt;
>  	int ret = -EINVAL;
>  
> -	if (kvm_valid_sregs(vcpu, sregs))
> +	if (kvm_valid_sregs(vcpu, sregs) ||

No need for a line break.  Even better, call kvm_valid_cr4() from
kvm_valid_sregs(), e.g. the X86_FEATURE_XSAVE check in kvm_valid_sregs()
is now redundant and can be dropped, and "return kvm_valid_cr4(...)" from
kvm_valid_sregs() can likely be optimized into a tail call.

> +		kvm_valid_cr4(vcpu, sregs->cr4))
>  		goto out;
>  
>  	apic_base_msr.data = sregs->apic_base;
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread
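[Editor's note: Sean's suggested refactor above — dropping the now-redundant X86_FEATURE_XSAVE check from kvm_valid_sregs() and tail-calling kvm_valid_cr4() — can be sketched as below. This is a self-contained illustration only, not kernel code: the vcpu structure, the feature bitmap, and guest_has() are hypothetical stand-ins for struct kvm_vcpu and guest_cpuid_has().]

```c
/* Illustrative CR4 bit positions (values match x86, shown for demonstration) */
#define X86_CR4_UMIP    (1UL << 11)
#define X86_CR4_OSXSAVE (1UL << 18)
#define X86_CR4_SMEP    (1UL << 20)

/* Hypothetical stand-in for struct kvm_vcpu: a bitmap of guest CPUID features */
struct vcpu_stub {
	unsigned long features;
};

#define FEAT_XSAVE (1UL << 0)
#define FEAT_SMEP  (1UL << 1)
#define FEAT_UMIP  (1UL << 2)

static int guest_has(const struct vcpu_stub *v, unsigned long feat)
{
	return (v->features & feat) != 0;
}

/* Mirrors the shape of kvm_valid_cr4(): reject any CR4 bit whose backing
 * CPUID feature is not exposed to the guest. */
static int valid_cr4(const struct vcpu_stub *v, unsigned long cr4)
{
	if (!guest_has(v, FEAT_XSAVE) && (cr4 & X86_CR4_OSXSAVE))
		return -22; /* -EINVAL */
	if (!guest_has(v, FEAT_SMEP) && (cr4 & X86_CR4_SMEP))
		return -22;
	if (!guest_has(v, FEAT_UMIP) && (cr4 & X86_CR4_UMIP))
		return -22;
	return 0;
}

/* Per the suggestion: kvm_valid_sregs() drops its own XSAVE check and
 * finishes with a tail call into the shared CR4 validator. */
static int valid_sregs(const struct vcpu_stub *v, unsigned long sregs_cr4)
{
	/* ...other sregs consistency checks (EFER/LMA etc.) would go here... */
	return valid_cr4(v, sregs_cr4);
}
```

With this shape, both the emulated CR4 write path and the KVM_SET_SREGS path validate CPUID/CR4 combinations through one helper, which is what closes the UMIP hole the syzkaller reproducer hit.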

* Re: [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4
  2019-09-17 17:32   ` Sean Christopherson
@ 2019-09-18  9:56     ` Wanpeng Li
  2019-09-24 13:55     ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Wanpeng Li @ 2019-09-18  9:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: LKML, kvm, Paolo Bonzini, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel,
	# v3.10+

On Wed, 18 Sep 2019 at 01:32, Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Tue, Sep 17, 2019 at 04:16:25PM +0800, Wanpeng Li wrote:
> > From: Wanpeng Li <wanpengli@tencent.com>
> >
> > Reported by syzkaller:
> >
> >       WARNING: CPU: 0 PID: 6544 at /home/kernel/data/kvm/arch/x86/kvm//vmx/vmx.c:4689 handle_desc+0x37/0x40 [kvm_intel]
> >       CPU: 0 PID: 6544 Comm: a.out Tainted: G           OE     5.3.0-rc4+ #4
> >       RIP: 0010:handle_desc+0x37/0x40 [kvm_intel]
> >       Call Trace:
> >        vmx_handle_exit+0xbe/0x6b0 [kvm_intel]
> >        vcpu_enter_guest+0x4dc/0x18d0 [kvm]
> >        kvm_arch_vcpu_ioctl_run+0x407/0x660 [kvm]
> >        kvm_vcpu_ioctl+0x3ad/0x690 [kvm]
> >        do_vfs_ioctl+0xa2/0x690
> >        ksys_ioctl+0x6d/0x80
> >        __x64_sys_ioctl+0x1a/0x20
> >        do_syscall_64+0x74/0x720
> >        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> >
> > When CR4.UMIP is set, the guest should have the UMIP CPUID flag. The
> > current KVM set_sregs path has no such check when userspace supplies
> > sregs values, so SECONDARY_EXEC_DESC is enabled on writes to CR4.UMIP
> > in vmx_set_cr4() even though the guest lacks the UMIP CPUID flag. The
> > testcase then triggers the handle_desc() warning when executing the
> > ltr instruction, since the guest's architectural CR4 does not have
> > UMIP set. This patch fixes it by validating the CR4 and CPUID
> > combination in __set_sregs.
>
> Checking CPUID will fix this specific scenario, but it doesn't resolve
> the underlying issue of __set_sregs() ignoring the return of kvm_x86_ops'
> set_cr4(), e.g. I think vmx_set_cr4() can still fail if userspace sets a
> custom MSR_IA32_VMX_CR4_FIXED0 when nested VMX is on.
>
> > syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=138efb99600000
> >
> > Reported-by: syzbot+0f1819555fbdce992df9@syzkaller.appspotmail.com
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > ---
> >  arch/x86/kvm/x86.c | 39 ++++++++++++++++++++++++---------------
> >  1 file changed, 24 insertions(+), 15 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index f7cfd8e..cafb4d4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -884,34 +884,42 @@ int kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr)
> >  }
> >  EXPORT_SYMBOL_GPL(kvm_set_xcr);
> >
> > -int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > +static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> >  {
> > -     unsigned long old_cr4 = kvm_read_cr4(vcpu);
> > -     unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
> > -                                X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
> > -
> > -     if (cr4 & CR4_RESERVED_BITS)
> > -             return 1;
> > -
> >       if (!guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) && (cr4 & X86_CR4_OSXSAVE))
> > -             return 1;
> > +             return -EINVAL;
> >
> >       if (!guest_cpuid_has(vcpu, X86_FEATURE_SMEP) && (cr4 & X86_CR4_SMEP))
> > -             return 1;
> > +             return -EINVAL;
> >
> >       if (!guest_cpuid_has(vcpu, X86_FEATURE_SMAP) && (cr4 & X86_CR4_SMAP))
> > -             return 1;
> > +             return -EINVAL;
> >
> >       if (!guest_cpuid_has(vcpu, X86_FEATURE_FSGSBASE) && (cr4 & X86_CR4_FSGSBASE))
> > -             return 1;
> > +             return -EINVAL;
> >
> >       if (!guest_cpuid_has(vcpu, X86_FEATURE_PKU) && (cr4 & X86_CR4_PKE))
> > -             return 1;
> > +             return -EINVAL;
> >
> >       if (!guest_cpuid_has(vcpu, X86_FEATURE_LA57) && (cr4 & X86_CR4_LA57))
> > -             return 1;
> > +             return -EINVAL;
> >
> >       if (!guest_cpuid_has(vcpu, X86_FEATURE_UMIP) && (cr4 & X86_CR4_UMIP))
> > +             return -EINVAL;
> > +
> > +     return 0;
> > +}
> > +
> > +int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
> > +{
> > +     unsigned long old_cr4 = kvm_read_cr4(vcpu);
> > +     unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE |
> > +                                X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE;
> > +
> > +     if (cr4 & CR4_RESERVED_BITS)
> > +             return 1;
>
> Checking CPUID bits but allowing unconditionally reserved bits to be set
> feels wrong.
>
> Paolo, can you provide an "official" ruling on how KVM_SET_SREGS should
> interact with reserved bits?  It's not at all clear from the git history
> if skipping the checks was intentional or an oversight.
>
> The CR4_RESERVED_BITS check has been in kvm_set_cr4() since the beginning
> of time (commit 6aa8b732ca01, "[PATCH] kvm: userspace interface").
>
> The first CPUID check came later, in commit 2acf923e38fb ("KVM: VMX:
> Enable XSAVE/XRSTOR for guest"), but its changelog is decidedly unhelpful.
>
> > +
> > +     if (kvm_valid_cr4(vcpu, cr4))
> >               return 1;
> >
> >       if (is_long_mode(vcpu)) {
> > @@ -8675,7 +8683,8 @@ static int __set_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
> >       struct desc_ptr dt;
> >       int ret = -EINVAL;
> >
> > -     if (kvm_valid_sregs(vcpu, sregs))
> > +     if (kvm_valid_sregs(vcpu, sregs) ||
>
> No need for a line break.  Even better, call kvm_valid_cr4() from
> kvm_valid_sregs(), e.g. the X86_FEATURE_XSAVE check in kvm_valid_sregs()
> is now redundant and can be dropped, and "return kvm_valid_cr4(...)" from
> kvm_valid_sregs() can likely be optimized into a tail call.

Will handle it in the new version.

    Wanpeng

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4
  2019-09-17 17:32   ` Sean Christopherson
  2019-09-18  9:56     ` Wanpeng Li
@ 2019-09-24 13:55     ` Paolo Bonzini
  1 sibling, 0 replies; 10+ messages in thread
From: Paolo Bonzini @ 2019-09-24 13:55 UTC (permalink / raw)
  To: Sean Christopherson, Wanpeng Li
  Cc: linux-kernel, kvm, Radim Krčmář,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, stable

On 17/09/19 19:32, Sean Christopherson wrote:
> 
> Paolo, can you provide an "official" ruling on how KVM_SET_SREGS should
> interact with reserved bits?  It's not at all clear from the git history
> if skipping the checks was intentional or an oversight.

It's okay to make it fail as long as KVM already checks the value of the
reserved bits on vmexit.  If not, some care might be required.

Paolo

^ permalink raw reply	[flat|nested] 10+ messages in thread
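[Editor's note: following Paolo's ruling, one way to make KVM_SET_SREGS reject reserved bits as well is to fold the CR4_RESERVED_BITS check into kvm_valid_cr4() itself, so that kvm_set_cr4() and __set_sregs() share it. The sketch below is self-contained and illustrative only — the bit set and vcpu type are stand-ins, not the kernel's definitions.]

```c
/* Illustrative subset of architectural CR4 bits */
#define CR4_UMIP    (1UL << 11)
#define CR4_OSXSAVE (1UL << 18)

/* Anything outside the bits this sketch models is treated as reserved;
 * the kernel derives CR4_RESERVED_BITS from the full architectural set. */
#define CR4_KNOWN_BITS    (CR4_UMIP | CR4_OSXSAVE)
#define CR4_RESERVED_BITS (~CR4_KNOWN_BITS)

/* Hypothetical stand-in for struct kvm_vcpu */
struct vcpu_stub {
	int has_xsave;
	int has_umip;
};

/* With the reserved-bits check folded in, every caller of the validator —
 * the emulated CR4 write path and the set-sregs ioctl alike — rejects
 * reserved bits through the same helper. */
static int valid_cr4(const struct vcpu_stub *v, unsigned long cr4)
{
	if (cr4 & CR4_RESERVED_BITS)
		return -22; /* -EINVAL */
	if (!v->has_xsave && (cr4 & CR4_OSXSAVE))
		return -22;
	if (!v->has_umip && (cr4 & CR4_UMIP))
		return -22;
	return 0;
}
```

This matches Paolo's condition: the check fails up front in the ioctl, rather than relying on the CPU to reject the invalid state at VM-entry or vmexit time.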

end of thread, other threads:[~2019-09-24 13:55 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-17  8:16 [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Wanpeng Li
2019-09-17  8:16 ` [PATCH v2 2/3] KVM: X86: Fix userspace set broken combinations of CPUID and CR4 Wanpeng Li
2019-09-17 17:32   ` Sean Christopherson
2019-09-18  9:56     ` Wanpeng Li
2019-09-24 13:55     ` Paolo Bonzini
2019-09-17  8:16 ` [PATCH v5 3/3] KVM: LAPIC: Tune lapic_timer_advance_ns smoothly Wanpeng Li
2019-09-17 17:07   ` Paolo Bonzini
2019-09-17  8:18 ` [PATCH 1/3] KVM: Fix coalesced mmio ring buffer out-of-bounds access Paolo Bonzini
2019-09-17 14:58 ` Jim Mattson
2019-09-17 15:18   ` Matt Delco
