All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com,
	Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Tadeusz Struk <tadeusz.struk@linaro.org>
Subject: [PATCH 5.10 06/25] KVM: x86: Forcibly leave nested virt when SMM state is toggled
Date: Fri,  4 Feb 2022 10:20:13 +0100	[thread overview]
Message-ID: <20220204091914.497446272@linuxfoundation.org> (raw)
In-Reply-To: <20220204091914.280602669@linuxfoundation.org>

From: Sean Christopherson <seanjc@google.com>

commit f7e570780efc5cec9b2ed1e0472a7da14e864fdb upstream.

Forcibly leave nested virtualization operation if userspace toggles SMM
state via KVM_SET_VCPU_EVENTS or KVM_SYNC_X86_EVENTS.  If userspace
forces the vCPU out of SMM while it's post-VMXON and then injects an SMI,
vmx_enter_smm() will overwrite vmx->nested.smm.vmxon and end up with both
vmxon=false and smm.vmxon=false, but all other nVMX state allocated.

Don't attempt to gracefully handle the transition as (a) most transitions
are nonsencial, e.g. forcing SMM while L2 is running, (b) there isn't
sufficient information to handle all transitions, e.g. SVM wants access
to the SMRAM save state, and (c) KVM_SET_VCPU_EVENTS must precede
KVM_SET_NESTED_STATE during state restore as the latter disallows putting
the vCPU into L2 if SMM is active, and disallows tagging the vCPU as
being post-VMXON in SMM if SMM is not active.

Abuse of KVM_SET_VCPU_EVENTS manifests as a WARN and memory leak in nVMX
due to failure to free vmcs01's shadow VMCS, but the bug goes far beyond
just a memory leak, e.g. toggling SMM on while L2 is active puts the vCPU
in an architecturally impossible state.

  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  WARNING: CPU: 0 PID: 3606 at free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Modules linked in:
  CPU: 1 PID: 3606 Comm: syz-executor725 Not tainted 5.17.0-rc1-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:free_loaded_vmcs arch/x86/kvm/vmx/vmx.c:2665 [inline]
  RIP: 0010:free_loaded_vmcs+0x158/0x1a0 arch/x86/kvm/vmx/vmx.c:2656
  Code: <0f> 0b eb b3 e8 8f 4d 9f 00 e9 f7 fe ff ff 48 89 df e8 92 4d 9f 00
  Call Trace:
   <TASK>
   kvm_arch_vcpu_destroy+0x72/0x2f0 arch/x86/kvm/x86.c:11123
   kvm_vcpu_destroy arch/x86/kvm/../../../virt/kvm/kvm_main.c:441 [inline]
   kvm_destroy_vcpus+0x11f/0x290 arch/x86/kvm/../../../virt/kvm/kvm_main.c:460
   kvm_free_vcpus arch/x86/kvm/x86.c:11564 [inline]
   kvm_arch_destroy_vm+0x2e8/0x470 arch/x86/kvm/x86.c:11676
   kvm_destroy_vm arch/x86/kvm/../../../virt/kvm/kvm_main.c:1217 [inline]
   kvm_put_kvm+0x4fa/0xb00 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1250
   kvm_vm_release+0x3f/0x50 arch/x86/kvm/../../../virt/kvm/kvm_main.c:1273
   __fput+0x286/0x9f0 fs/file_table.c:311
   task_work_run+0xdd/0x1a0 kernel/task_work.c:164
   exit_task_work include/linux/task_work.h:32 [inline]
   do_exit+0xb29/0x2a30 kernel/exit.c:806
   do_group_exit+0xd2/0x2f0 kernel/exit.c:935
   get_signal+0x4b0/0x28c0 kernel/signal.c:2862
   arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:868
   handle_signal_work kernel/entry/common.c:148 [inline]
   exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
   exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:207
   __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
   syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:300
   do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
   entry_SYSCALL_64_after_hwframe+0x44/0xae
   </TASK>

Cc: stable@vger.kernel.org
Reported-by: syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220125220358.2091737-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Backported-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/svm/nested.c       |   10 ++++++++--
 arch/x86/kvm/svm/svm.c          |    2 +-
 arch/x86/kvm/svm/svm.h          |    2 +-
 arch/x86/kvm/vmx/nested.c       |    1 +
 arch/x86/kvm/x86.c              |    2 ++
 6 files changed, 14 insertions(+), 4 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1285,6 +1285,7 @@ struct kvm_x86_ops {
 };
 
 struct kvm_x86_nested_ops {
+	void (*leave_nested)(struct kvm_vcpu *vcpu);
 	int (*check_events)(struct kvm_vcpu *vcpu);
 	bool (*hv_timer_pending)(struct kvm_vcpu *vcpu);
 	int (*get_state)(struct kvm_vcpu *vcpu,
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -783,8 +783,10 @@ void svm_free_nested(struct vcpu_svm *sv
 /*
  * Forcibly leave nested mode in order to be able to reset the VCPU later on.
  */
-void svm_leave_nested(struct vcpu_svm *svm)
+void svm_leave_nested(struct kvm_vcpu *vcpu)
 {
+	struct vcpu_svm *svm = to_svm(vcpu);
+
 	if (is_guest_mode(&svm->vcpu)) {
 		struct vmcb *hsave = svm->nested.hsave;
 		struct vmcb *vmcb = svm->vmcb;
@@ -1185,7 +1187,7 @@ static int svm_set_nested_state(struct k
 		return -EINVAL;
 
 	if (!(kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE)) {
-		svm_leave_nested(svm);
+		svm_leave_nested(vcpu);
 		svm_set_gif(svm, !!(kvm_state->flags & KVM_STATE_NESTED_GIF_SET));
 		return 0;
 	}
@@ -1238,6 +1240,9 @@ static int svm_set_nested_state(struct k
 	copy_vmcb_control_area(&hsave->control, &svm->vmcb->control);
 	hsave->save = *save;
 
+	if (is_guest_mode(vcpu))
+		svm_leave_nested(vcpu);
+
 	svm->nested.vmcb12_gpa = kvm_state->hdr.svm.vmcb_pa;
 	load_nested_vmcb_control(svm, ctl);
 	nested_prepare_vmcb_control(svm);
@@ -1252,6 +1257,7 @@ out_free:
 }
 
 struct kvm_x86_nested_ops svm_nested_ops = {
+	.leave_nested = svm_leave_nested,
 	.check_events = svm_check_nested_events,
 	.get_nested_state_pages = svm_get_nested_state_pages,
 	.get_state = svm_get_nested_state,
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -279,7 +279,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu,
 
 	if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) {
 		if (!(efer & EFER_SVME)) {
-			svm_leave_nested(svm);
+			svm_leave_nested(vcpu);
 			svm_set_gif(svm, true);
 
 			/*
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -393,7 +393,7 @@ static inline bool nested_exit_on_nmi(st
 
 int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
 			 struct vmcb *nested_vmcb);
-void svm_leave_nested(struct vcpu_svm *svm);
+void svm_leave_nested(struct kvm_vcpu *vcpu);
 void svm_free_nested(struct vcpu_svm *svm);
 int svm_allocate_nested(struct vcpu_svm *svm);
 int nested_svm_vmrun(struct vcpu_svm *svm);
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6628,6 +6628,7 @@ __init int nested_vmx_hardware_setup(int
 }
 
 struct kvm_x86_nested_ops vmx_nested_ops = {
+	.leave_nested = vmx_leave_nested,
 	.check_events = vmx_check_nested_events,
 	.hv_timer_pending = nested_vmx_preemption_timer_pending,
 	.get_state = vmx_get_nested_state,
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4391,6 +4391,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_e
 				vcpu->arch.hflags |= HF_SMM_MASK;
 			else
 				vcpu->arch.hflags &= ~HF_SMM_MASK;
+
+			kvm_x86_ops.nested_ops->leave_nested(vcpu);
 			kvm_smm_changed(vcpu);
 		}
 



  parent reply	other threads:[~2022-02-04  9:23 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-04  9:20 [PATCH 5.10 00/25] 5.10.97-rc1 review Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 01/25] PCI: pciehp: Fix infinite loop in IRQ handler upon power fault Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 02/25] net: ipa: fix atomic update in ipa_endpoint_replenish() Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 03/25] net: ipa: use a bitmap for endpoint replenish_enabled Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 04/25] net: ipa: prevent concurrent replenish Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 05/25] Revert "drivers: bus: simple-pm-bus: Add support for probing simple bus only devices" Greg Kroah-Hartman
2022-02-04  9:20 ` Greg Kroah-Hartman [this message]
2022-02-04  9:20 ` [PATCH 5.10 07/25] psi: Fix uaf issue when psi trigger is destroyed while being polled Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 08/25] perf: Rework perf_event_exit_event() Greg Kroah-Hartman
2022-02-04  9:37   ` Pavel Machek
2022-02-04  9:40     ` Greg Kroah-Hartman
2022-02-04 10:12       ` Pavel Machek
2022-02-05 10:25         ` Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 09/25] perf/core: Fix cgroup event list management Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 10/25] x86/mce: Add Xeon Sapphire Rapids to list of CPUs that support PPIN Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 11/25] x86/cpu: Add Xeon Icelake-D " Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 12/25] drm/vc4: hdmi: Make sure the device is powered with CEC Greg Kroah-Hartman
2022-02-05 11:40   ` Alexey Khoroshilov
2022-02-05 11:53     ` Greg Kroah-Hartman
2022-02-05 12:04       ` Alexey Khoroshilov
2022-02-04  9:20 ` [PATCH 5.10 13/25] cgroup-v1: Require capabilities to set release_agent Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 14/25] net/mlx5e: Fix handling of wrong devices during bond netevent Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 15/25] net/mlx5: Use del_timer_sync in fw reset flow of halting poll Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 16/25] net/mlx5: E-Switch, Fix uninitialized variable modact Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 17/25] ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 18/25] net: amd-xgbe: ensure to reset the tx_timer_active flag Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 19/25] net: amd-xgbe: Fix skb data length underflow Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 20/25] fanotify: Fix stale file descriptor in copy_event_to_user() Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 21/25] net: sched: fix use-after-free in tc_new_tfilter() Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 22/25] rtnetlink: make sure to refresh master_dev/m_ops in __rtnl_newlink() Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 23/25] cpuset: Fix the bug that subpart_cpus updated wrongly in update_cpumask() Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 24/25] af_packet: fix data-race in packet_setsockopt / packet_setsockopt Greg Kroah-Hartman
2022-02-04  9:20 ` [PATCH 5.10 25/25] tcp: add missing tcp_skb_can_collapse() test in tcp_shift_skb_data() Greg Kroah-Hartman
2022-02-04 11:31 ` [PATCH 5.10 00/25] 5.10.97-rc1 review Pavel Machek
2022-02-04 15:20 ` Jon Hunter
2022-02-04 17:33 ` Florian Fainelli
2022-02-04 19:11 ` Fox Chen
2022-02-04 20:32 ` Shuah Khan
2022-02-04 21:08 ` Guenter Roeck
2022-02-04 23:30 ` Slade Watkins
2022-02-05  7:01 ` Naresh Kamboju
2022-02-05 14:30 ` Sudip Mukherjee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220204091914.497446272@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=stable@vger.kernel.org \
    --cc=syzbot+8112db3ab20e70d50c31@syzkaller.appspotmail.com \
    --cc=tadeusz.struk@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.