linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, David Woodhouse <dwmw@amazon.co.uk>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Nikos Tsironis <ntsironis@arrikto.com>
Subject: [PATCH 5.4 17/98] KVM: x86: Fix split-irqchip vs interrupt injection window request
Date: Tue,  1 Dec 2020 09:52:54 +0100	[thread overview]
Message-ID: <20201201084654.920393515@linuxfoundation.org> (raw)
In-Reply-To: <20201201084652.827177826@linuxfoundation.org>

From: Paolo Bonzini <pbonzini@redhat.com>

commit 71cc849b7093bb83af966c0e60cb11b7f35cd746 upstream.

kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
a hodge-podge of conditions, hacked together to get something that
more or less works.  But what is actually needed is much simpler;
in both cases the fundamental question is, do we have a place to stash
an interrupt if userspace does KVM_INTERRUPT?

In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
unnecessarily restrictive.

In split irqchip mode it's a bit more complicated, we need to check
kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
cycle and thus requires ExtINTs not to be masked) as well as
!pending_userspace_extint(vcpu).  However, there is no need to
check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
pending ExtINT state separate from event injection state, and checking
kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
priority than APIC interrupts.  In fact the latter fixes a bug:
when userspace requests an IRQ window vmexit, an interrupt in the
local APIC can cause kvm_cpu_has_interrupt() to be true and thus
kvm_vcpu_ready_for_interrupt_injection() to return false.  When this
happens, vcpu_run does not exit to userspace but the interrupt window
vmexits keep occurring.  The VM loops without any hope of making progress.

Once we try to fix these with something like

     return kvm_arch_interrupt_allowed(vcpu) &&
-        !kvm_cpu_has_interrupt(vcpu) &&
-        !kvm_event_needs_reinjection(vcpu) &&
-        kvm_cpu_accept_dm_intr(vcpu);
+        (!lapic_in_kernel(vcpu)
+         ? !vcpu->arch.interrupt.injected
+         : (kvm_apic_accept_pic_intr(vcpu)
+            && !pending_userspace_extint(v)));

we realize two things.  First, thanks to the previous patch the complex
conditional can reuse !kvm_cpu_has_extint(vcpu).  Second, the interrupt
window request in vcpu_enter_guest()

        bool req_int_win =
                dm_request_for_irq_injection(vcpu) &&
                kvm_cpu_accept_dm_intr(vcpu);

should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
it is unnecessary to ask the processor for an interrupt window
if we would not be able to return to userspace.  Therefore,
kvm_cpu_accept_dm_intr(vcpu) is basically !kvm_cpu_has_extint(vcpu)
ANDed with the existing check for masked ExtINT.  It all makes sense:

- we can accept an interrupt from userspace if there is a place
  to stash it (and, for irqchip split, ExtINTs are not masked).
  Interrupts from userspace _can_ be accepted even if right now
  EFLAGS.IF=0.

- in order to tell userspace we will inject its interrupt ("IRQ
  window open" i.e. kvm_vcpu_ready_for_interrupt_injection), both
  KVM and the vCPU need to be ready to accept the interrupt.

... and this is what the patch implements.

Reported-by: David Woodhouse <dwmw@amazon.co.uk>
Analyzed-by: David Woodhouse <dwmw@amazon.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Nikos Tsironis <ntsironis@arrikto.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/irq.c              |    2 +-
 arch/x86/kvm/x86.c              |   18 ++++++++++--------
 3 files changed, 12 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1560,6 +1560,7 @@ int kvm_test_age_hva(struct kvm *kvm, un
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v);
 int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu);
+int kvm_cpu_has_extint(struct kvm_vcpu *v);
 int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
 int kvm_cpu_get_interrupt(struct kvm_vcpu *v);
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event);
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -40,7 +40,7 @@ static int pending_userspace_extint(stru
  * check if there is pending interrupt from
  * non-APIC source without intack.
  */
-static int kvm_cpu_has_extint(struct kvm_vcpu *v)
+int kvm_cpu_has_extint(struct kvm_vcpu *v)
 {
 	/*
 	 * FIXME: interrupt.injected represents an interrupt whose
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3624,21 +3624,23 @@ static int kvm_vcpu_ioctl_set_lapic(stru
 
 static int kvm_cpu_accept_dm_intr(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * We can accept userspace's request for interrupt injection
+	 * as long as we have a place to store the interrupt number.
+	 * The actual injection will happen when the CPU is able to
+	 * deliver the interrupt.
+	 */
+	if (kvm_cpu_has_extint(vcpu))
+		return false;
+
+	/* Acknowledging ExtINT does not happen if LINT0 is masked.  */
 	return (!lapic_in_kernel(vcpu) ||
 		kvm_apic_accept_pic_intr(vcpu));
 }
 
-/*
- * if userspace requested an interrupt window, check that the
- * interrupt window is open.
- *
- * No need to exit to userspace if we already have an interrupt queued.
- */
 static int kvm_vcpu_ready_for_interrupt_injection(struct kvm_vcpu *vcpu)
 {
 	return kvm_arch_interrupt_allowed(vcpu) &&
-		!kvm_cpu_has_interrupt(vcpu) &&
-		!kvm_event_needs_reinjection(vcpu) &&
 		kvm_cpu_accept_dm_intr(vcpu);
 }
 



  parent reply	other threads:[~2020-12-01  9:05 UTC|newest]

Thread overview: 101+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-01  8:52 [PATCH 5.4 00/98] 5.4.81-rc1 review Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 01/98] spi: bcm-qspi: Fix use-after-free on unbind Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 02/98] spi: bcm2835: " Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 03/98] ipv4: use IS_ENABLED instead of ifdef Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 04/98] netfilter: clear skb->next in NF_HOOK_LIST() Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 05/98] btrfs: tree-checker: add missing return after error in root_item Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 06/98] btrfs: tree-checker: add missing returns after data_ref alignment checks Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 07/98] btrfs: dont access possibly stale fs_info data for printing duplicate device Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 08/98] btrfs: fix lockdep splat when reading qgroup config on mount Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 09/98] wireless: Use linux/stddef.h instead of stddef.h Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 10/98] smb3: Call cifs reconnect from demultiplex thread Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 11/98] smb3: Avoid Mid pending list corruption Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 12/98] smb3: Handle error case during offload read path Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 13/98] cifs: fix a memleak with modefromsid Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 14/98] KVM: PPC: Book3S HV: XIVE: Fix possible oops when accessing ESB page Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 15/98] KVM: arm64: vgic-v3: Drop the reporting of GICR_TYPER.Last for userspace Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 16/98] KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint Greg Kroah-Hartman
2020-12-01  8:52 ` Greg Kroah-Hartman [this message]
2020-12-01  8:52 ` [PATCH 5.4 18/98] trace: fix potenial dangerous pointer Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 19/98] arm64: pgtable: Fix pte_accessible() Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 20/98] arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect() Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 21/98] HID: uclogic: Add ID for Trust Flex Design Tablet Greg Kroah-Hartman
2020-12-01  8:52 ` [PATCH 5.4 22/98] HID: ite: Replace ABS_MISC 120/121 events with touchpad on/off keypresses Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 23/98] HID: cypress: Support Varmilo Keyboards media hotkeys Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 24/98] HID: add support for Sega Saturn Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 25/98] Input: i8042 - allow insmod to succeed on devices without an i8042 controller Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 26/98] HID: hid-sensor-hub: Fix issue with devices with no report ID Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 27/98] staging: ralink-gdma: fix kconfig dependency bug for DMA_RALINK Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 28/98] HID: add HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE for Gamevice devices Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 29/98] dmaengine: xilinx_dma: use readl_poll_timeout_atomic variant Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 30/98] x86/xen: dont unbind uninitialized lock_kicker_irq Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 31/98] HID: logitech-hidpp: Add HIDPP_CONSUMER_VENDOR_KEYS quirk for the Dinovo Edge Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 32/98] HID: Add Logitech Dinovo Edge battery quirk Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 33/98] proc: dont allow async path resolution of /proc/self components Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 34/98] nvme: free sq/cq dbbuf pointers when dbbuf set fails Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 35/98] vhost scsi: fix cmd completion race Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 36/98] dmaengine: pl330: _prep_dma_memcpy: Fix wrong burst size Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 37/98] scsi: libiscsi: Fix NOP race condition Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 38/98] scsi: target: iscsi: Fix cmd abort fabric stop race Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 39/98] perf/x86: fix sysfs type mismatches Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 40/98] xtensa: uaccess: Add missing __user to strncpy_from_user() prototype Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 41/98] net: dsa: mv88e6xxx: Wait for EEPROM done after HW reset Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 42/98] bus: ti-sysc: Fix bogus resetdone warning on enable for cpsw Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 43/98] ARM: OMAP2+: Manage MPU state properly for omap_enter_idle_coupled() Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 44/98] phy: tegra: xusb: Fix dangling pointer on probe failure Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 45/98] iwlwifi: mvm: write queue_sync_state only for sync Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 46/98] batman-adv: set .owner to THIS_MODULE Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 47/98] arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 48/98] ARM: dts: dra76x: m_can: fix order of clocks Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 49/98] scsi: ufs: Fix race between shutdown and runtime resume flow Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 50/98] bnxt_en: fix error return code in bnxt_init_one() Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 51/98] bnxt_en: fix error return code in bnxt_init_board() Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 52/98] video: hyperv_fb: Fix the cache type when mapping the VRAM Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 53/98] bnxt_en: Release PCI regions when DMA mask setup fails during probe Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 54/98] cxgb4: fix the panic caused by non smac rewrite Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 55/98] s390/qeth: make af_iucv TX notification call more robust Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 56/98] s390/qeth: fix af_iucv notification race Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 57/98] s390/qeth: fix tear down of async TX buffers Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 58/98] ibmvnic: fix call_netdevice_notifiers in do_reset Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 59/98] ibmvnic: notify peers when failover and migration happen Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 60/98] powerpc/64s: Fix allnoconfig build since uaccess flush Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 61/98] IB/mthca: fix return value of error branch in mthca_init_cq() Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 62/98] i40e: Fix removing driver while bare-metal VFs pass traffic Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 63/98] nfc: s3fwrn5: use signed integer for parsing GPIO numbers Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 64/98] net: ena: set initial DMA width to avoid intel iommu issue Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 65/98] ibmvnic: fix NULL pointer dereference in reset_sub_crq_queues Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 66/98] ibmvnic: fix NULL pointer dereference in ibmvic_reset_crq Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 67/98] optee: add writeback to valid memory type Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 68/98] arm64: tegra: Wrong AON HSP reg property size Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 69/98] efivarfs: revert "fix memory leak in efivarfs_create()" Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 70/98] efi: EFI_EARLYCON should depend on EFI Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 71/98] can: gs_usb: fix endianess problem with candleLight firmware Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 72/98] platform/x86: thinkpad_acpi: Send tablet mode switch at wakeup time Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 73/98] platform/x86: toshiba_acpi: Fix the wrong variable assignment Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 74/98] RDMA/hns: Fix retry_cnt and rnr_cnt when querying QP Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 75/98] RDMA/hns: Bugfix for memory window mtpt configuration Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 76/98] can: m_can: m_can_open(): remove IRQF_TRIGGER_FALLING from request_threaded_irq()s flags Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 77/98] can: m_can: fix nominal bitiming tseg2 min for version >= 3.1 Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 78/98] perf stat: Use proper cpu for shadow stats Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 79/98] perf probe: Fix to die_entrypc() returns error correctly Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 80/98] spi: bcm2835aux: Restore err assignment in bcm2835aux_spi_probe Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 81/98] USB: core: Change %pK for __user pointers to %px Greg Kroah-Hartman
2020-12-01  8:53 ` [PATCH 5.4 82/98] usb: gadget: f_midi: Fix memleak in f_midi_alloc Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 83/98] USB: quirks: Add USB_QUIRK_DISCONNECT_SUSPEND quirk for Lenovo A630Z TIO built-in usb-audio card Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 84/98] usb: gadget: Fix memleak in gadgetfs_fill_super Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 85/98] irqchip/exiu: Fix the index of fwspec for IRQ type Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 86/98] x86/mce: Do not overwrite no_way_out if mce_end() fails Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 87/98] x86/speculation: Fix prctl() when spectre_v2_user={seccomp,prctl},ibpb Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 88/98] x86/resctrl: Remove superfluous kernfs_get() calls to prevent refcount leak Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 89/98] x86/resctrl: Add necessary kernfs_put() " Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 90/98] USB: core: Fix regression in Hercules audio card Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 91/98] ASoC: Intel: Skylake: Remove superfluous chip initialization Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 92/98] ASoC: Intel: Skylake: Select hda configuration permissively Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 93/98] ASoC: Intel: Skylake: Enable codec wakeup during chip init Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 94/98] ASoC: Intel: Skylake: Shield against no-NHLT configurations Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 95/98] ASoC: Intel: Allow for ROM init retry on CNL platforms Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 96/98] ASoC: Intel: Skylake: Await purge request ack on CNL Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 97/98] ASoC: Intel: Multiple I/O PCM format support for pipe Greg Kroah-Hartman
2020-12-01  8:54 ` [PATCH 5.4 98/98] ASoC: Intel: Skylake: Automatic DMIC format configuration according to information from NHLT Greg Kroah-Hartman
2020-12-02  4:58 ` [PATCH 5.4 00/98] 5.4.81-rc1 review Naresh Kamboju
2020-12-02 16:57 ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201201084654.920393515@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=dwmw@amazon.co.uk \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ntsironis@arrikto.com \
    --cc=pbonzini@redhat.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).