linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Ladi Prosek <lprosek@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH 4.11 81/84] KVM: x86: fix emulation of RSM and IRET instructions
Date: Mon,  3 Jul 2017 15:36:01 +0200	[thread overview]
Message-ID: <20170703133408.260886797@linuxfoundation.org> (raw)
In-Reply-To: <20170703133402.874816941@linuxfoundation.org>

4.11-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Ladi Prosek <lprosek@redhat.com>

commit 6ed071f051e12cf7baa1b69d3becb8f232fdfb7b upstream.

On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm
on hflags is reverted later on in x86_emulate_instruction where hflags are
overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests
as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu.

Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after
an instruction is emulated, this commit deletes emul_flags altogether and
makes the emulator access vcpu->arch.hflags using two new accessors. This
way all changes, on the emulator side as well as in functions called from
the emulator and accessing vcpu state with emul_to_vcpu, are preserved.

More details on the bug and its manifestation with Windows and OVMF:

  It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD.
  I believe that the SMM part explains why we started seeing this only with
  OVMF.

  KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates
  the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because
  later on in x86_emulate_instruction we overwrite arch.hflags with
  ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call.
  The AMD-specific hflag of interest here is HF_NMI_MASK.

  When rebooting the system, Windows sends an NMI IPI to all but the current
  cpu to shut them down. Only after all of them are parked in HLT will the
  initiating cpu finish the restart. If NMI is masked, other cpus never get
  the memo and the initiating cpu spins forever, waiting for
  hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe.

Fixes: a584539b24b8 ("KVM: x86: pass the whole hflags field to emulator and back")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/kvm_emulate.h |    4 +++-
 arch/x86/kvm/emulate.c             |   16 +++++++++-------
 arch/x86/kvm/x86.c                 |   15 ++++++++++++---
 3 files changed, 24 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -221,6 +221,9 @@ struct x86_emulate_ops {
 	void (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
 			  u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
 	void (*set_nmi_mask)(struct x86_emulate_ctxt *ctxt, bool masked);
+
+	unsigned (*get_hflags)(struct x86_emulate_ctxt *ctxt);
+	void (*set_hflags)(struct x86_emulate_ctxt *ctxt, unsigned hflags);
 };
 
 typedef u32 __attribute__((vector_size(16))) sse128_t;
@@ -290,7 +293,6 @@ struct x86_emulate_ctxt {
 
 	/* interruptibility state, as a result of execution of STI or MOV SS */
 	int interruptibility;
-	int emul_flags;
 
 	bool perm_ok; /* do not check permissions if true */
 	bool ud;	/* inject an #UD if host doesn't support insn */
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2547,7 +2547,7 @@ static int em_rsm(struct x86_emulate_ctx
 	u64 smbase;
 	int ret;
 
-	if ((ctxt->emul_flags & X86EMUL_SMM_MASK) == 0)
+	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_MASK) == 0)
 		return emulate_ud(ctxt);
 
 	/*
@@ -2596,11 +2596,11 @@ static int em_rsm(struct x86_emulate_ctx
 		return X86EMUL_UNHANDLEABLE;
 	}
 
-	if ((ctxt->emul_flags & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
+	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
 		ctxt->ops->set_nmi_mask(ctxt, false);
 
-	ctxt->emul_flags &= ~X86EMUL_SMM_INSIDE_NMI_MASK;
-	ctxt->emul_flags &= ~X86EMUL_SMM_MASK;
+	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
+		~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));
 	return X86EMUL_CONTINUE;
 }
 
@@ -5317,6 +5317,7 @@ int x86_emulate_insn(struct x86_emulate_
 	const struct x86_emulate_ops *ops = ctxt->ops;
 	int rc = X86EMUL_CONTINUE;
 	int saved_dst_type = ctxt->dst.type;
+	unsigned emul_flags;
 
 	ctxt->mem_read.pos = 0;
 
@@ -5331,6 +5332,7 @@ int x86_emulate_insn(struct x86_emulate_
 		goto done;
 	}
 
+	emul_flags = ctxt->ops->get_hflags(ctxt);
 	if (unlikely(ctxt->d &
 		     (No64|Undefined|Sse|Mmx|Intercept|CheckPerm|Priv|Prot|String))) {
 		if ((ctxt->mode == X86EMUL_MODE_PROT64 && (ctxt->d & No64)) ||
@@ -5364,7 +5366,7 @@ int x86_emulate_insn(struct x86_emulate_
 				fetch_possible_mmx_operand(ctxt, &ctxt->dst);
 		}
 
-		if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && ctxt->intercept) {
+		if (unlikely(emul_flags & X86EMUL_GUEST_MASK) && ctxt->intercept) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_PRE_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
@@ -5393,7 +5395,7 @@ int x86_emulate_insn(struct x86_emulate_
 				goto done;
 		}
 
-		if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
+		if (unlikely(emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
 			rc = emulator_check_intercept(ctxt, ctxt->intercept,
 						      X86_ICPT_POST_EXCEPT);
 			if (rc != X86EMUL_CONTINUE)
@@ -5447,7 +5449,7 @@ int x86_emulate_insn(struct x86_emulate_
 
 special_insn:
 
-	if (unlikely(ctxt->emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
+	if (unlikely(emul_flags & X86EMUL_GUEST_MASK) && (ctxt->d & Intercept)) {
 		rc = emulator_check_intercept(ctxt, ctxt->intercept,
 					      X86_ICPT_POST_MEMACCESS);
 		if (rc != X86EMUL_CONTINUE)
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5248,6 +5248,16 @@ static void emulator_set_nmi_mask(struct
 	kvm_x86_ops->set_nmi_mask(emul_to_vcpu(ctxt), masked);
 }
 
+static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
+{
+	return emul_to_vcpu(ctxt)->arch.hflags;
+}
+
+static void emulator_set_hflags(struct x86_emulate_ctxt *ctxt, unsigned emul_flags)
+{
+	kvm_set_hflags(emul_to_vcpu(ctxt), emul_flags);
+}
+
 static const struct x86_emulate_ops emulate_ops = {
 	.read_gpr            = emulator_read_gpr,
 	.write_gpr           = emulator_write_gpr,
@@ -5287,6 +5297,8 @@ static const struct x86_emulate_ops emul
 	.intercept           = emulator_intercept,
 	.get_cpuid           = emulator_get_cpuid,
 	.set_nmi_mask        = emulator_set_nmi_mask,
+	.get_hflags          = emulator_get_hflags,
+	.set_hflags          = emulator_set_hflags,
 };
 
 static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
@@ -5341,7 +5353,6 @@ static void init_emulate_ctxt(struct kvm
 	BUILD_BUG_ON(HF_GUEST_MASK != X86EMUL_GUEST_MASK);
 	BUILD_BUG_ON(HF_SMM_MASK != X86EMUL_SMM_MASK);
 	BUILD_BUG_ON(HF_SMM_INSIDE_NMI_MASK != X86EMUL_SMM_INSIDE_NMI_MASK);
-	ctxt->emul_flags = vcpu->arch.hflags;
 
 	init_decode_cache(ctxt);
 	vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
@@ -5744,8 +5755,6 @@ restart:
 		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
 		toggle_interruptibility(vcpu, ctxt->interruptibility);
 		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
-		if (vcpu->arch.hflags != ctxt->emul_flags)
-			kvm_set_hflags(vcpu, ctxt->emul_flags);
 		kvm_rip_write(vcpu, ctxt->eip);
 		if (r == EMULATE_DONE &&
 		    (ctxt->tf || (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)))

  parent reply	other threads:[~2017-07-03 13:50 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-03 13:34 [PATCH 4.11 00/84] 4.11.9-stable review Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 01/84] net: dont call strlen on non-terminated string in dev_set_alias() Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 02/84] net: Fix inconsistent teardown and release of private netdev state Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 03/84] net: s390: fix up for "Fix inconsistent teardown and release of private netdev state" Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 04/84] mac80211: free netdev on dev_alloc_name() error Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 05/84] decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 06/84] net: Zero ifla_vf_info in rtnl_fill_vfinfo() Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 07/84] net: ipv6: Release route when device is unregistering Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 08/84] net: vrf: Make add_fib_rules per network namespace flag Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 09/84] af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 10/84] Fix an intermittent pr_emerg warning about lo becoming free Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 11/84] sctp: disable BH in sctp_for_each_endpoint Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 12/84] net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 13/84] net: tipc: Fix a sleep-in-atomic bug in tipc_msg_reverse Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 14/84] net/mlx5: Remove several module events out of ethtool stats Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 15/84] net/mlx5e: Added BW check for DIM decision mechanism Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 16/84] net/mlx5e: Fix wrong indications in DIM due to counter wraparound Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 17/84] net/mlx5: Enable 4K UAR only when page size is bigger than 4K Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 18/84] proc: snmp6: Use correct type in memset Greg Kroah-Hartman
2017-07-03 13:34 ` [PATCH 4.11 19/84] igmp: acquire pmc lock for ip_mc_clear_src() Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 20/84] igmp: add a missing spin_lock_init() Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 22/84] net: dont global ICMP rate limit packets originating from loopback Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 23/84] ipv6: fix calling in6_ifa_hold incorrectly for dad work Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 24/84] sctp: return next obj by passing pos + 1 into sctp_transport_get_idx Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 25/84] net/mlx5e: Fix min inline value for VF rep SQs Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 26/84] net/mlx5e: Avoid doing a cleanup call if the profile doesnt have it Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 27/84] net/mlx5: Wait for FW readiness before initializing command interface Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 28/84] net/mlx5e: Fix timestamping capabilities reporting Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 29/84] decnet: always not take dst->__refcnt when inserting dst into hash table Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 30/84] net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 31/84] ipv6: Do not leak throw route references Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 32/84] rtnetlink: add IFLA_GROUP to ifla_policy Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 33/84] netfilter: synproxy: fix conntrackd interaction Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 34/84] NFSv4.x/callback: Create the callback service through svc_create_pooled Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 36/84] MIPS: head: Reorder instructions missing a delay slot Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 37/84] MIPS: Avoid accidental raw backtrace Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 38/84] MIPS: pm-cps: Drop manual cache-line alignment of ready_count Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 39/84] MIPS: Fix IRQ tracing & lockdep when rescheduling Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 40/84] ALSA: hda - Fix endless loop of codec configure Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 41/84] ALSA: hda - set input_path bitmap to zero after moving it to new place Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 42/84] NFSv4.2: Dont send mode again in post-EXCLUSIVE4_1 SETATTR with umask Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 43/84] NFSv4.1: Fix a race in nfs4_proc_layoutget Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 44/84] Revert "NFS: nfs_rename() handle -ERESTARTSYS dentry left behind" Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 45/84] ovl: copy-up: dont unlock between lookup and link Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 46/84] gpiolib: fix filtering out unwanted events Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 47/84] x86/intel_rdt: Fix memory leak on mount failure Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 48/84] perf/x86/intel/uncore: Fix wrong box pointer check Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 49/84] drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 50/84] dm thin: do not queue freed thin mapping for next stage processing Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 51/84] x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 52/84] pinctrl/amd: Use regular interrupt instead of chained Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 53/84] mm/vmalloc.c: huge-vmap: fail gracefully on unexpected huge vmap mappings Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 55/84] xfrm6: Fix IPv6 payload_len in xfrm6_transport_finish Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 56/84] xfrm: move xfrm_garbage_collect out of xfrm_policy_flush Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 57/84] xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 58/84] xfrm: NULL dereference on allocation failure Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 59/84] xfrm: Oops on error in pfkey_msg2xfrm_state() Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 60/84] watchdog: bcm281xx: Fix use of uninitialized spinlock Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 61/84] ARM64: PCI: Fix struct acpi_pci_root_ops allocation failure path Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 62/84] ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 63/84] ARM: 8685/1: ensure memblock-limit is pmd-aligned Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 64/84] ARM: davinci: PM: Free resources in error handling path in davinci_pm_init Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 65/84] ARM: davinci: PM: Do not free useful resources in normal " Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 66/84] tools arch: Sync arch/x86/lib/memcpy_64.S with the kernel Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 67/84] Revert "x86/entry: Fix the end of the stack for newly forked tasks" Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 68/84] x86/mshyperv: Remove excess #includes from mshyperv.h Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 69/84] x86/boot/KASLR: Fix kexec crash due to virt_addr calculation bug Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 70/84] perf/x86: Fix spurious NMI with PEBS Load Latency event Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 71/84] x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 72/84] x86/mm: Fix flush_tlb_page() on Xen Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 73/84] ocfs2: o2hb: revert hb threshold to keep compatible Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 74/84] ocfs2: fix deadlock caused by recursive locking in xattr Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 75/84] iommu/dma: Dont reserve PCI I/O windows Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 76/84] iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid() Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 77/84] iommu/amd: Fix interrupt remapping when disable guest_mode Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 78/84] infiniband: hns: avoid gcc-7.0.1 warning for uninitialized data Greg Kroah-Hartman
2017-07-03 13:35 ` [PATCH 4.11 79/84] mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program Greg Kroah-Hartman
2017-07-03 13:36 ` [PATCH 4.11 80/84] mtd: nand: fsmc: fix NAND width handling Greg Kroah-Hartman
2017-07-03 13:36 ` Greg Kroah-Hartman [this message]
2017-07-03 19:55 ` [PATCH 4.11 00/84] 4.11.9-stable review Guenter Roeck
2017-07-04  7:59   ` Greg Kroah-Hartman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170703133408.260886797@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lprosek@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).