On Tue, 2020-10-27 at 09:01 +0100, Paolo Bonzini wrote:
> On 26/10/20 18:53, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > As far as I can tell, when we use posted interrupts we silently cut off
> > the events from userspace, if it's listening on the same eventfd that
> > feeds the irqfd.
> > 
> > I like that behaviour. Let's do it all the time, even without posted
> > interrupts. It makes it much easier to handle IRQ remapping invalidation
> > without having to constantly add/remove the fd from the userspace poll
> > set. We can just leave userspace polling on it, and the bypass will...
> > well... bypass it.
> 
> This looks good, though of course it depends on the somewhat hackish
> patch 1.

I thought it was quite neat :)

>  However don't you need to read the eventfd as well, since
> userspace will never be able to do so?

Yes. Although that's a separate cleanup as it was already true before
my patch. Right now, userspace needs to explicitly stop polling on the
VFIO eventfd while it's assigned as KVM IRQFD (to avoid injecting
duplicate interrupts when the kernel isn't using PI and allows events
to leak). So it isn't going to consume the events in that case either.
Nothing's really changed.

The VFIO virqfd is just the same. The count just builds up when the
kernel handles the events, and is eventually cleared by
eventfd_ctx_remove_wait_queue().

In both cases, that actually works fine because in practice the events
are raised by eventfd_signal() in the kernel, and that works even if
the count reaches ULLONG_MAX. It's just that sending further events
from *userspace* would block in that case.
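
To illustrate the userspace side of that, here is a minimal sketch. It
assumes a plain eventfd opened with EFD_NONBLOCK, so that "would block"
shows up as EAGAIN instead of hanging. The function name
demo_eventfd_saturation() is made up for illustration, and this is not
the KVM or VFIO code path; it only shows that a full counter stops
further userspace writes until something consumes the count.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not kernel code): returns 0 if the eventfd
 * behaves as described above, -1 otherwise.
 */
int demo_eventfd_saturation(void)
{
	uint64_t val = UINT64_MAX - 1;	/* largest count userspace can store */
	uint64_t one = 1;
	uint64_t seen = 0;
	int ret = -1;
	int fd;

	/* EFD_NONBLOCK turns "would block" into EAGAIN so the demo cannot hang. */
	fd = eventfd(0, EFD_NONBLOCK);
	if (fd < 0)
		return -1;

	/* Fill the counter, standing in for events nobody ever read. */
	if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val))
		goto out;

	/* Counter is full: a further userspace write cannot complete. */
	if (write(fd, &one, sizeof(one)) != -1 || errno != EAGAIN)
		goto out;

	/* read() returns the accumulated count and resets it to zero. */
	if (read(fd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen) || seen != val)
		goto out;

	/* With the count consumed, userspace can signal again. */
	if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one))
		goto out;

	ret = 0;
out:
	close(fd);
	return ret;
}
```

Without EFD_NONBLOCK the second write() would simply sleep until a
reader drained the counter, which is the blocking case described above.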

Both of them theoretically want fixing, regardless of the priority
patch.

Since the wq lock is held while the wakeup functions (virqfd_wakeup and
irqfd_wakeup for VFIO and KVM respectively) run, all they really need to do
is call eventfd_ctx_do_read() to consume the events. I'll look at
whether I can find a nicer option than just exporting that.