All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dmytro Maluka <dmy@semihalf.com>
To: Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	kvm@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	linux-kernel@vger.kernel.org, Eric Auger <eric.auger@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Rong L Liu <rong.l.liu@intel.com>,
	Zhenyu Wang <zhenyuw@linux.intel.com>,
	Tomasz Nowicki <tn@semihalf.com>,
	Grzegorz Jaszczyk <jaz@semihalf.com>,
	Dmitry Torokhov <dtor@google.com>
Subject: Re: [PATCH 3/3] KVM: irqfd: Postpone resamplefd notify for oneshot interrupts
Date: Tue, 2 Aug 2022 20:47:01 +0200	[thread overview]
Message-ID: <c7b7860e-ae3a-7b98-e97e-28a62470c470@semihalf.com> (raw)
In-Reply-To: <20220715155928.26362-4-dmy@semihalf.com>

On 7/15/22 17:59, Dmytro Maluka wrote:
> The existing KVM mechanism for forwarding of level-triggered interrupts
> using resample eventfd doesn't work quite correctly in the case of
> interrupts that are handled in a Linux guest as oneshot interrupts
> (IRQF_ONESHOT). Such an interrupt is acked to the device in its
> threaded irq handler, i.e. later than it is acked to the interrupt
> controller (EOI at the end of hardirq), not earlier.
> 
> Linux keeps such interrupt masked until its threaded handler finishes,
> to prevent the EOI from re-asserting an unacknowledged interrupt.
> However, with KVM + vfio (or whatever is listening on the resamplefd)
> we don't check that the interrupt is still masked in the guest at the
> moment of EOI. Resamplefd is notified regardless, so vfio prematurely
> unmasks the host physical IRQ, thus a new (unwanted) physical interrupt
> is generated in the host and queued for injection to the guest.
> 
> The fact that the virtual IRQ is still masked doesn't prevent this new
> physical IRQ from being propagated to the guest, because:
> 
> 1. It is not guaranteed that the vIRQ will remain masked by the time
>    when vfio signals the trigger eventfd.
> 2. KVM marks this IRQ as pending (e.g. setting its bit in the virtual
>    IRR register of IOAPIC on x86), so after the vIRQ is unmasked, this
>    new pending interrupt is injected by KVM to the guest anyway.
> 
> There are observed at least 2 user-visible issues caused by those
> extra erroneous pending interrupts for oneshot irq in the guest:
> 
> 1. System suspend aborted due to a pending wakeup interrupt from
>    ChromeOS EC (drivers/platform/chrome/cros_ec.c).
> 2. Annoying "invalid report id data" errors from ELAN0000 touchpad
>    (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
>    every time the touchpad is touched.
> 
> This patch fixes the issue on x86 by checking if the interrupt is
> unmasked when we receive irq ack (EOI) and, in case if it's masked,
> postponing resamplefd notify until the guest unmasks it.
> 
> Important notes:
> 
> 1. It doesn't fix the issue for other archs yet, due to some missing
>    KVM functionality needed by this patch:
>      - calling mask notifiers is implemented for x86 only
>      - irqchip ->is_masked() is implemented for x86 only
> 
> 2. It introduces an additional spinlock locking in the resample notify
>    path, since we are no longer just traversing an RCU list of irqfds
>    but also updating the resampler state. Hopefully this locking won't
>    noticeably slow down anything for anyone.
> 
> Regarding #2, there may be an alternative solution worth considering:
> extend KVM irqfd (userspace) API to send mask and unmask notifications
> directly to vfio/whatever, in addition to resample notifications, to
> let vfio check the irq state on its own. There is already locking on
> vfio side (see e.g. vfio_platform_unmask()), so this way we would avoid
> introducing any additional locking. Also such mask/unmask notifications
> could be useful for other cases.
> 
> Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
> Suggested-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
> ---
>  include/linux/kvm_irqfd.h | 14 ++++++++++++
>  virt/kvm/eventfd.c        | 45 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
> index dac047abdba7..01754a1abb9e 100644
> --- a/include/linux/kvm_irqfd.h
> +++ b/include/linux/kvm_irqfd.h
> @@ -19,6 +19,16 @@
>   * resamplefd.  All resamplers on the same gsi are de-asserted
>   * together, so we don't need to track the state of each individual
>   * user.  We can also therefore share the same irq source ID.
> + *
> + * A special case is when the interrupt is still masked at the moment
> + * an irq ack is received. That likely means that the interrupt has
> + * been acknowledged to the interrupt controller but not acknowledged
> + * to the device yet, e.g. it might be a Linux guest's threaded
> + * oneshot interrupt (IRQF_ONESHOT). In this case notifying through
> + * resamplefd is postponed until the guest unmasks the interrupt,
> + * which is detected through the irq mask notifier. This prevents
> + * erroneous extra interrupts caused by premature re-assert of an
> + * unacknowledged interrupt by the resamplefd listener.
>   */
>  struct kvm_kernel_irqfd_resampler {
>  	struct kvm *kvm;
> @@ -28,6 +38,10 @@ struct kvm_kernel_irqfd_resampler {
>  	 */
>  	struct list_head list;
>  	struct kvm_irq_ack_notifier notifier;
> +	struct kvm_irq_mask_notifier mask_notifier;
> +	bool masked;
> +	bool pending;
> +	spinlock_t lock;
>  	/*
>  	 * Entry in list of kvm->irqfd.resampler_list.  Use for sharing
>  	 * resamplers among irqfds on the same gsi.
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 50ddb1d1a7f0..9ff47ac33790 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -75,6 +75,44 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
>  	kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
>  		    resampler->notifier.gsi, 0, false);
>  
> +	spin_lock(&resampler->lock);
> +	if (resampler->masked) {
> +		resampler->pending = true;
> +		spin_unlock(&resampler->lock);
> +		return;
> +	}
> +	spin_unlock(&resampler->lock);
> +
> +	idx = srcu_read_lock(&kvm->irq_srcu);
> +
> +	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
> +	    srcu_read_lock_held(&kvm->irq_srcu))
> +		eventfd_signal(irqfd->resamplefd, 1);
> +
> +	srcu_read_unlock(&kvm->irq_srcu, idx);
> +}
> +
> +static void
> +irqfd_resampler_mask(struct kvm_irq_mask_notifier *kimn, bool masked)
> +{
> +	struct kvm_kernel_irqfd_resampler *resampler;
> +	struct kvm *kvm;
> +	struct kvm_kernel_irqfd *irqfd;
> +	int idx;
> +
> +	resampler = container_of(kimn,
> +			struct kvm_kernel_irqfd_resampler, mask_notifier);
> +	kvm = resampler->kvm;
> +
> +	spin_lock(&resampler->lock);
> +	resampler->masked = masked;
> +	if (masked || !resampler->pending) {
> +		spin_unlock(&resampler->lock);
> +		return;
> +	}
> +	resampler->pending = false;
> +	spin_unlock(&resampler->lock);
> +
>  	idx = srcu_read_lock(&kvm->irq_srcu);
>  
>  	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
> @@ -98,6 +136,8 @@ irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
>  	if (list_empty(&resampler->list)) {
>  		list_del(&resampler->link);
>  		kvm_unregister_irq_ack_notifier(kvm, &resampler->notifier);
> +		kvm_unregister_irq_mask_notifier(kvm, resampler->mask_notifier.irq,
> +						 &resampler->mask_notifier);
>  		kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
>  			    resampler->notifier.gsi, 0, false);
>  		kfree(resampler);
> @@ -367,11 +407,16 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
>  			INIT_LIST_HEAD(&resampler->list);
>  			resampler->notifier.gsi = irqfd->gsi;
>  			resampler->notifier.irq_acked = irqfd_resampler_ack;
> +			resampler->mask_notifier.func = irqfd_resampler_mask;
> +			kvm_irq_is_masked(kvm, irqfd->gsi, &resampler->masked);
> +			spin_lock_init(&resampler->lock);
>  			INIT_LIST_HEAD(&resampler->link);
>  
>  			list_add(&resampler->link, &kvm->irqfds.resampler_list);
>  			kvm_register_irq_ack_notifier(kvm,
>  						      &resampler->notifier);
> +			kvm_register_irq_mask_notifier(kvm, irqfd->gsi,
> +						       &resampler->mask_notifier);

I realized this is a bit racy: we may miss a mask or unmask event just
before kvm_register_irq_ack_notifier(), so irqfd_resampler_ack() may
see an outdated irq mask state.

Moving kvm_register_irq_mask_notifier() before
kvm_register_irq_ack_notifier() isn't enough, since a mask or unmask
may still happen just before kvm_register_irq_mask_notifier().

This race can be avoided by moving also resampler->masked initialization
(kvm_irq_is_masked()) after kvm_register_irq_mask_notifier(). But then
kvm_irq_is_masked() would need to be called under resampler->lock,
which could cause a deadlock, since kvm_irq_is_masked() locks
ioapic->lock while irqfd_resampler_mask() is called under ioapic->lock
(or correspondingly pic->lock).

So for v2 I'm considering replacing kvm_irq_is_masked() with a function
like the following, for registering & initializing mask notifier in a
presumably race-free deadlock-free way. (The below is just a sketch
showing the locking order; the actual code would be more complicated
due to irq -> pin mapping etc.)

void kvm_register_and_fire_irq_mask_notifier(struct kvm *kvm, int irq,
                                             struct kvm_irq_mask_notifier *kimn)
{
        struct kvm_pic *pic = kvm->arch.vpic;
        struct kvm_ioapic *ioapic = kvm->arch.vioapic;
        bool masked;

        mutex_lock(&kvm->irq_lock);
        spin_lock(&pic->lock);
        spin_lock(&ioapic->lock);

        kimn->irq = irq;
        hlist_add_head_rcu(&kimn->link, &kvm->arch.mask_notifier_list);

        masked = kvm_pic_irq_is_masked(pic, irq) &&
                 kvm_ioapic_irq_is_masked(ioapic, irq);
        kimn->func(kimn, masked);

        spin_unlock(&ioapic->lock);
        spin_unlock(&pic->lock);
        mutex_unlock(&kvm->irq_lock);
}


This implies that I'll probably go with moving the mask notifiers stuff
back to x86-specific code, and just add a weak no-op version for other
architectures.

Thanks,
Dmytro

>  			irqfd->resampler = resampler;
>  		}
>  

      parent reply	other threads:[~2022-08-02 18:47 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-15 15:59 [PATCH 0/3] KVM: Fix oneshot interrupts forwarding Dmytro Maluka
2022-07-15 15:59 ` [PATCH 1/3] KVM: x86: Move kvm_(un)register_irq_mask_notifier() to generic KVM Dmytro Maluka
2022-07-28 18:46   ` Sean Christopherson
2022-07-29 11:09     ` Dmytro Maluka
2022-08-02 21:43       ` Sean Christopherson
2022-07-15 15:59 ` [PATCH 2/3] KVM: x86: Add kvm_irq_is_masked() Dmytro Maluka
2022-08-04 17:14   ` Eric Auger
2022-08-04 19:31     ` Dmytro Maluka
2022-07-15 15:59 ` [PATCH 3/3] KVM: irqfd: Postpone resamplefd notify for oneshot interrupts Dmytro Maluka
2022-07-25 23:44   ` Liu, Rong L
2022-07-26 14:07     ` Dmytro Maluka
2022-07-29 20:48       ` Liu, Rong L
2022-07-30 14:34         ` Dmytro Maluka
2022-08-09 22:02           ` Liu, Rong L
2022-08-10  0:56             ` Dmytro Maluka
2022-08-11 16:12               ` Liu, Rong L
2022-08-13 14:33                 ` Dmytro Maluka
2022-08-10 17:34             ` Liu, Rong L
2022-07-28 18:55   ` Sean Christopherson
2022-07-29 11:19     ` Dmytro Maluka
2022-07-29 21:21   ` Liu, Rong L
2022-07-30 15:30     ` Dmytro Maluka
2022-08-02 18:47   ` Dmytro Maluka [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c7b7860e-ae3a-7b98-e97e-28a62470c470@semihalf.com \
    --to=dmy@semihalf.com \
    --cc=alex.williamson@redhat.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dtor@google.com \
    --cc=eric.auger@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jaz@semihalf.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=rong.l.liu@intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=tn@semihalf.com \
    --cc=x86@kernel.org \
    --cc=zhenyuw@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.