RE: Add vfio-platform support for ONESHOT irq forwarding?

From: "Liu, Rong L" <rong.l.liu@intel.com>
To: Dmytro Maluka <dmy@semihalf.com>, "Christopherson,,
	Sean" <seanjc@google.com>
Cc: Auger Eric <eric.auger@redhat.com>,
	Micah Morton <mortonm@chromium.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Tomasz Nowicki <tn@semihalf.com>,
	Grzegorz Jaszczyk <jaz@semihalf.com>,
	Dmitry Torokhov <dtor@google.com>,
	"Liu, Rong L" <rong.l.liu@intel.com>
Subject: RE: Add vfio-platform support for ONESHOT irq forwarding?
Date: Tue, 12 Jul 2022 16:02:41 +0000	[thread overview]
Message-ID: <MW3PR11MB45544BF267FD3926F90A2158C7869@MW3PR11MB4554.namprd11.prod.outlook.com> (raw)
In-Reply-To: <aaec2781-2983-6195-2216-bf4aeeb86d7f@semihalf.com>

Hi Sean and Dmytro,

> -----Original Message-----
> From: Dmytro Maluka <dmy@semihalf.com>
> Sent: Thursday, July 7, 2022 10:39 AM
> To: Christopherson,, Sean <seanjc@google.com>
> Cc: Auger Eric <eric.auger@redhat.com>; Micah Morton
> <mortonm@chromium.org>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Paolo Bonzini
> <pbonzini@redhat.com>; Liu, Rong L <rong.l.liu@intel.com>; Tomasz
> Nowicki <tn@semihalf.com>; Grzegorz Jaszczyk <jaz@semihalf.com>;
> Dmitry Torokhov <dtor@google.com>
> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
> 
> On 7/7/22 17:00, Sean Christopherson wrote:
> > On Thu, Jul 07, 2022, Dmytro Maluka wrote:
> >> Hi Sean,
> >>
> >> On 7/6/22 10:39 PM, Sean Christopherson wrote:
> >>> On Wed, Jul 06, 2022, Dmytro Maluka wrote:
> >>>> This is not a problem on native, since for oneshot irq we keep the
> >>>> interrupt masked until the thread exits, so that the EOI at the end
> >>>> of hardirq doesn't result in immediate re-assert. In vfio + KVM
> >>>> case, however, the host doesn't check that the interrupt is still
> >>>> masked in the guest, so
> >>>> vfio_platform_unmask() is called regardless.
> >>>
> >>> Isn't not checking that an interrupt is unmasked the real bug?
> >>> Fudging around vfio (or whatever is doing the premature unmasking)
> >>> bugs by delaying an ack notification in KVM is a hack, no?
> >>
> >> Yes, not checking that an interrupt is unmasked is IMO a bug, and my
> >> patch actually adds this missing checking, only that it adds it in
> >> KVM, not in VFIO. :)
> >>
> >> Arguably it's not a bug that VFIO is not checking the guest interrupt
> >> state on its own, provided that the resample notification it receives
> >> is always a notification that the interrupt has been actually acked.
> >> That is the motivation behind postponing ack notification in KVM in
> >> my patch: it is to ensure that KVM "ack notifications" are always
> >> actual ack notifications (as the name suggests), not just "eoi
> notifications".
> >
> > But EOI is an ACK.  It's software saying "this interrupt has been
> consumed".
> 
> Ok, I see we've had some mutual misunderstanding of the term "ack"
> here.
> EOI is an ACK in the interrupt controller sense, while I was talking about
> an ACK in the device sense, i.e. a device-specific action, done in a device
> driver's IRQ handler, which makes the device de-assert the IRQ line (so
> that the IRQ will not get re-asserted after an EOI or unmask).
> 
> So the problem I'm trying to fix stems from the peculiarity of "oneshot"
> interrupts: an ACK in the device sense is done in a threaded handler, i.e.
> after an ACK in the interrupt controller sense, not before it.
> 
> In the meantime I've realized one more reason why my patch is wrong.
> kvm_notify_acked_irq() is an internal KVM thing which is supposed to
> notify interested parts of KVM about an ACK rather in the interrupt
> controller sense, i.e. about an EOI as it is doing already.
> 
> VFIO, on the other hand, rather expects a notification about an ACK in the
> device sense. So it still seems a good idea to me to postpone sending
> notifications to VFIO until an IRQ gets unmasked, but this postponing
> should be done not for the entire kvm_notify_acked_irq() but only for
> eventfd_signal() on resamplefd in irqfd_resampler_ack().
> 
> Thanks for making me think about that.
> 
> >
> >> That said, your idea of checking the guest interrupt status in VFIO
> >> (or whatever is listening on the resample eventfd) makes sense to me
> >> too. The problem, though, is that it's KVM that knows the guest
> >> interrupt status, so KVM would need to let VFIO/whatever know it
> >> somehow. (I'm assuming we are focusing on the case of KVM kernel
> >> irqchip, not userspace or split irqchip.) So do you have in mind
> >> adding something like "maskfd" and "unmaskfd" to KVM IRQFD
> interface,
> >> in addition to resamplefd? If so, I'm actually in favor of such an
> >> idea, as I think it would be also useful for other purposes, regardless
> of oneshot interrupts.
> >
> > Unless I'm misreading things, KVM already provides a mask notifier,
> > irqfd just needs to be wired up to use
> kvm_(un)register_irq_mask_notifier().
> 

Interesting...  I initially thought that kvm doesn't "trap" on ioapic's mmio
write.  However, I just traced kvm/ioapic.c and it turns out
ioapic_write_indirect() was called many times.   Does trapping on ioapic's mmio
write cause vmexit on edge-triggered interrupt exit?  It seems the case because
IOREGSEL and IOWIN of IOAPIC are memory mapped but not the redirection entry
register for each IRQ (that is why the name indirect_write), in order to unmask
redirection entry register on the exit of each interrupt (edge-triggered or
level-triggered), kernel needs to write to IORESEL, which means vmexit if kvm
traps on ioapic's mmio write.  However, for pass-thru device which uses
edge-triggered interrupt (handled by vfio or something similar),  interrupt
(pIRQ) is enabled by vfio and it seems unnecessary to cause a vmexit when guest
updates virtual ioapic.  I think the situation is similar for level-triggered
interrupt.  So 2 vmexits for each level-triggered interrupt completion, one for
EOI on lapic and another for unmask of IOAPIC register.  Does this sound right? 
I thought with vfio (or similar architecture), there is no vmexit necessary on
edge-triggered interrupt completion and only one vmexit for level triggered
interrupt completion, except the caveats of oneshot interrupt.  Maybe I
misunderstand something?

> Thanks for the tip. I'll take a look into implementing this idea.
> 
> It seems you agree that fixing this issue via a change in KVM (in irqfd, not
> in ioapic) seems to be the way to go.
> 

> An immediate problem I see with kvm_(un)register_irq_mask_notifier()
> is that it is currently available for x86 only. Also, mask notifiers are called
> under ioapic->lock (I'm not sure yet if that is good or bad news for us).
> 
> Thanks,
> Dmytro