kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Add vfio-platform support for ONESHOT irq forwarding?
@ 2021-01-25 15:46 Micah Morton
  2021-01-25 17:31 ` Auger Eric
  2021-01-25 20:36 ` Alex Williamson
  0 siblings, 2 replies; 26+ messages in thread
From: Micah Morton @ 2021-01-25 15:46 UTC (permalink / raw)
  To: Auger Eric, Alex Williamson, kvm

Hi Eric,

I was recently looking into some vfio-platform passthrough stuff and
came across a device I wanted to assign to a guest that uses a ONESHOT
type interrupt (these type of interrupts seem to be quite common, on
ARM at least). The semantics for ONESHOT interrupts are a bit
different from regular level triggered interrupts as I'll describe
here:

The normal generic code flow for level-triggered interrupts is as follows:

- regular type[1]: mask[2] the irq, then run the handler, then
unmask[3] the irq and done

- fasteoi type[4]: run the handler, then eoi[5] the irq and done

Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
because that is assumed to be handled transparently by "modern forms
of interrupt handlers, which handle the flow details in hardware"

ONESHOT type interrupts are a special case of the fasteoi type
described above. They rely on the driver registering a threaded
handler for the interrupt and assume the irq line will remain masked
until the threaded handler completes, at which time the line will be
unmasked. TL;DR:

- mask[6] the irq, run the handler, and potentially eoi[7] the irq,
then unmask[8] later when the threaded handler has finished running.

For vfio-platform irq forwarding, there is no existing function in
drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
for registering as the threaded handler for a ONESHOT interrupt in the
case we want to request the ONESHOT irq with
request_threaded_irq()[10]. Moreover, we can't just register a
threaded function that simply immediately returns IRQ_HANDLED (as is
done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
since that would cause the IRQ to be unmasked[13] immediately, before
the userspace/guest driver has had any chance to service the
interrupt.

The most obvious way I see to handle this is to add a threaded handler
to vfio_platform_irq.c that waits until the userspace/guest driver has
serviced the interrupt and the unmask_handler[14] has been called, at
which point it returns IRQ_HANDLED so the generic IRQ code in the host
can finally unmask the interrupt.

Does this sound like a reasonable approach and something you would be
fine with adding to vfio-platform? If so I could get started looking
at the implementation for how to sleep in the threaded handler in
vfio-platform until the unmask_handler is called. The most tricky/ugly
part of this is that DT has no knowledge of irq ONESHOT-ness, as it
only contains info regarding active-low vs active-high and edge vs
level trigger. That means that vfio-platform can't figure out that a
device uses a ONESHOT irq in a similar way to how it queries[15] the
trigger type, and by extension QEMU can't learn this information
through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
of knowing (i.e. command line option to QEMU).

I guess potentially another option would be to treat ONESHOT
interrupts like regular level triggered interrupts from the
perspective of vfio-platform, but somehow ensure the interrupt stays
masked during injection to the guest, rather than just disabled. I'm
not sure whether this could cause legitimate interrupts coming from
devices to be missed while the injection for an existing interrupt is
underway, but maybe this is a rare enough scenario that we wouldn't
care. The main issue with this approach is that handle_level_irq()[16]
will try to unmask the irq out from under us after we start the
injection (as it is already masked before
vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
the irqchip level supports nesting or not.

Let me know if you think either of these are viable options for adding
ONESHOT interrupt forwarding support to vfio-platform?

Thanks,
Micah




Additional note about level triggered vs ONESHOT irq forwarding:
For the regular type of level triggered interrupt described above, the
vfio handler will call disable_irq_nosync()[18] before the
handle_level_irq() function unmasks the irq and returns. This ensures
if new interrupts come in on the line while the existing one is being
handled by the guest (and the irq is therefore disabled), that the
vfio_automasked_irq_handler() isn’t triggered again until the
vfio_platform_unmask_handler() function has been triggered by the
guest (causing the irq to be re-enabled[19]). In other words, the
purpose of the irq enable/disable that already exists in vfio-platform
is a higher level concept that delays handling of additional
level-triggered interrupts in the host until the current one has been
handled in the guest.

This means that the existing level triggered interrupt forwarding
logic in vfio/vfio-platform is not sufficient for handling ONESHOT
interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
regular level triggered interrupt in the host and use the existing
vfio forwarding code). The masking that needs to happen for ONESHOT
interrupts is at the lower level of the irqchip mask/unmask in that
the ONESHOT irq needs to remain masked (not just disabled) until the
driver’s threaded handler has completed.




[1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
[2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414
[3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619
[4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702
[5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
[6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724
[7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
[8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
[9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c
[10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038
[11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167
[12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
[13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
[14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94
[15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310
[16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
[17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
[18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154
[19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-25 15:46 Add vfio-platform support for ONESHOT irq forwarding? Micah Morton
@ 2021-01-25 17:31 ` Auger Eric
  2021-01-25 18:32   ` Micah Morton
  2021-01-25 20:36 ` Alex Williamson
  1 sibling, 1 reply; 26+ messages in thread
From: Auger Eric @ 2021-01-25 17:31 UTC (permalink / raw)
  To: Micah Morton, Alex Williamson, kvm

Hi Micah,

On 1/25/21 4:46 PM, Micah Morton wrote:
> Hi Eric,
> 
> I was recently looking into some vfio-platform passthrough stuff and
> came across a device I wanted to assign to a guest that uses a ONESHOT
> type interrupt (these type of interrupts seem to be quite common, on
> ARM at least). The semantics for ONESHOT interrupts are a bit
> different from regular level triggered interrupts as I'll describe
> here:
> 
> The normal generic code flow for level-triggered interrupts is as follows:
> 
> - regular type[1]: mask[2] the irq, then run the handler, then
> unmask[3] the irq and done

VFIO level sensitive interrupts are "automasked". See slide 10 of
https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf if this can help.

When the guest deactivates the virtual IRQ, this causes a maintenance
interrupt on host. This occurence causes kvm_notify_acked_irq() to be
called and this latter unmasks the physical IRQ.
> 
> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
> 
> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
> because that is assumed to be handled transparently by "modern forms
> of interrupt handlers, which handle the flow details in hardware"
> 
> ONESHOT type interrupts are a special case of the fasteoi type
> described above. They rely on the driver registering a threaded
> handler for the interrupt and assume the irq line will remain masked
> until the threaded handler completes, at which time the line will be
> unmasked. TL;DR:
> 
> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
> then unmask[8] later when the threaded handler has finished running.

could you point me to the exact linux handler (is it
handle_fasteoi_irq?)  or Why do you say "potentially". Given the details
above, the guest EOI would unmask the IRQ at physical level. We do not
have any hook KVM/VFIO on the guest unmap.
> 
> For vfio-platform irq forwarding, there is no existing function in
> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
> for registering as the threaded handler for a ONESHOT interrupt in the
> case we want to request the ONESHOT irq with
> request_threaded_irq()[10]. Moreover, we can't just register a
> threaded function that simply immediately returns IRQ_HANDLED (as is
> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
> since that would cause the IRQ to be unmasked[13] immediately, before
sorry I know you reworked that several times for style issue but [13]
does not match unmask().
> the userspace/guest driver has had any chance to service the
> interrupt.
> 
> The most obvious way I see to handle this is to add a threaded handler
> to vfio_platform_irq.c that waits until the userspace/guest driver has
> serviced the interrupt and the unmask_handler[14] has been called, at
> which point it returns IRQ_HANDLED so the generic IRQ code in the host
> can finally unmask the interrupt.
this is done on guest EOI at the moment.
> 
> Does this sound like a reasonable approach and something you would be
> fine with adding to vfio-platform?
Well I think it is interesting to do a pre-study and make sure we agree
on the terminology, IRQ flow and problems that would need to be solved.
Then we need to determine if it is worth the candle, if this support
would speed up the vfio-platform usage (I am not sure at this point as I
think there are more important drags like the lack of specification/dt
integration for instance).

Besides we need to make sure we are not launching into a huge effort for
attempting to assign a device that does not fit well into the vfio
platform scope (dma capable, reset, device tree).

 If so I could get started looking
> at the implementation for how to sleep in the threaded handler in
> vfio-platform until the unmask_handler is called. The most tricky/ugly
> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
> only contains info regarding active-low vs active-high and edge vs
> level trigger. That means that vfio-platform can't figure out that a
> device uses a ONESHOT irq in a similar way to how it queries[15] the
> trigger type, and by extension QEMU can't learn this information
> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
> of knowing (i.e. command line option to QEMU).
Indeed that's not really appealing.
> 
> I guess potentially another option would be to treat ONESHOT
> interrupts like regular level triggered interrupts from the
> perspective of vfio-platform, but somehow ensure the interrupt stays
> masked during injection to the guest, rather than just disabled.
I need this to be clarified actually. I am confused by the automasked
terminology now.

Thanks

Eric

 I'm
> not sure whether this could cause legitimate interrupts coming from
> devices to be missed while the injection for an existing interrupt is
> underway, but maybe this is a rare enough scenario that we wouldn't
> care. The main issue with this approach is that handle_level_irq()[16]
> will try to unmask the irq out from under us after we start the
> injection (as it is already masked before
> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
> the irqchip level supports nesting or not.
> 
> Let me know if you think either of these are viable options for adding
> ONESHOT interrupt forwarding support to vfio-platform?
> 
> Thanks,
> Micah
> 
> 
> 
> 
> Additional note about level triggered vs ONESHOT irq forwarding:
> For the regular type of level triggered interrupt described above, the
> vfio handler will call disable_irq_nosync()[18] before the
> handle_level_irq() function unmasks the irq and returns. This ensures
> if new interrupts come in on the line while the existing one is being
> handled by the guest (and the irq is therefore disabled), that the
> vfio_automasked_irq_handler() isn’t triggered again until the
> vfio_platform_unmask_handler() function has been triggered by the
> guest (causing the irq to be re-enabled[19]). In other words, the
> purpose of the irq enable/disable that already exists in vfio-platform
> is a higher level concept that delays handling of additional
> level-triggered interrupts in the host until the current one has been
> handled in the guest.
> 
> This means that the existing level triggered interrupt forwarding
> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
> regular level triggered interrupt in the host and use the existing
> vfio forwarding code). The masking that needs to happen for ONESHOT
> interrupts is at the lower level of the irqchip mask/unmask in that
> the ONESHOT irq needs to remain masked (not just disabled) until the
> driver’s threaded handler has completed.
> 
> 
> 
> 
> [1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
> [2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414
> [3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619
> [4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702
> [5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
> [6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724
> [7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
> [8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
> [9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c
> [10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038
> [11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167
> [12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
> [13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
> [14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94
> [15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310
> [16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
> [17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
> [18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154
> [19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-25 17:31 ` Auger Eric
@ 2021-01-25 18:32   ` Micah Morton
  2021-01-26  8:48     ` Auger Eric
  0 siblings, 1 reply; 26+ messages in thread
From: Micah Morton @ 2021-01-25 18:32 UTC (permalink / raw)
  To: Auger Eric; +Cc: Alex Williamson, kvm

On Mon, Jan 25, 2021 at 12:31 PM Auger Eric <eric.auger@redhat.com> wrote:
>
> Hi Micah,
>
> On 1/25/21 4:46 PM, Micah Morton wrote:
> > Hi Eric,
> >
> > I was recently looking into some vfio-platform passthrough stuff and
> > came across a device I wanted to assign to a guest that uses a ONESHOT
> > type interrupt (these type of interrupts seem to be quite common, on
> > ARM at least). The semantics for ONESHOT interrupts are a bit
> > different from regular level triggered interrupts as I'll describe
> > here:
> >
> > The normal generic code flow for level-triggered interrupts is as follows:
> >
> > - regular type[1]: mask[2] the irq, then run the handler, then
> > unmask[3] the irq and done
>
> VFIO level sensitive interrupts are "automasked". See slide 10 of
> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf if this can help.

When you say "automasked" / "mask and deactivate" are you referring to
the disable_irq_nosync() call in vfio_platform
(https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154)?

>
> When the guest deactivates the virtual IRQ, this causes a maintenance
> interrupt on host. This occurence causes kvm_notify_acked_irq() to be
> called and this latter unmasks the physical IRQ.

Are you saying kvm_notify_acked_irq() causes
vfio_platform_unmask_handler()
(https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94)
to be called? Makes sense if so.

> >
> > - fasteoi type[4]: run the handler, then eoi[5] the irq and done
> >
> > Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
> > because that is assumed to be handled transparently by "modern forms
> > of interrupt handlers, which handle the flow details in hardware"
> >
> > ONESHOT type interrupts are a special case of the fasteoi type
> > described above. They rely on the driver registering a threaded
> > handler for the interrupt and assume the irq line will remain masked
> > until the threaded handler completes, at which time the line will be
> > unmasked. TL;DR:
> >
> > - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
> > then unmask[8] later when the threaded handler has finished running.
>
> could you point me to the exact linux handler (is it
> handle_fasteoi_irq?)  or Why do you say "potentially". Given the details
> above, the guest EOI would unmask the IRQ at physical level. We do not
> have any hook KVM/VFIO on the guest unmap.

Yep. handle_fasteoi_irq(). By "potentially" I just meant the eoi is
only called for ONESHOT irqs if this check passes: !(chip->flags &
IRQCHIP_EOI_THREADED). Not actually sure what this check is about
since I didn't look into it.

So far I was just talking about irq operations that happen in the
host. Not sure I quite understand your last two sentences about guest
EOI and guest unmap.

> >
> > For vfio-platform irq forwarding, there is no existing function in
> > drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
> > for registering as the threaded handler for a ONESHOT interrupt in the
> > case we want to request the ONESHOT irq with
> > request_threaded_irq()[10]. Moreover, we can't just register a
> > threaded function that simply immediately returns IRQ_HANDLED (as is
> > done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
> > since that would cause the IRQ to be unmasked[13] immediately, before
> sorry I know you reworked that several times for style issue but [13]
> does not match unmask().

That's actually the link I intended, irq_finalize_oneshot() will be
called after the threaded interrupt handler returns, and that's when
the IRQ will be unmasked.

> > the userspace/guest driver has had any chance to service the
> > interrupt.
> >
> > The most obvious way I see to handle this is to add a threaded handler
> > to vfio_platform_irq.c that waits until the userspace/guest driver has
> > serviced the interrupt and the unmask_handler[14] has been called, at
> > which point it returns IRQ_HANDLED so the generic IRQ code in the host
> > can finally unmask the interrupt.
> this is done on guest EOI at the moment.

By "this", I think you mean enable_irq() in vfio_platform_irq.c?

> >
> > Does this sound like a reasonable approach and something you would be
> > fine with adding to vfio-platform?
> Well I think it is interesting to do a pre-study and make sure we agree
> on the terminology, IRQ flow and problems that would need to be solved.
> Then we need to determine if it is worth the candle, if this support
> would speed up the vfio-platform usage (I am not sure at this point as I
> think there are more important drags like the lack of specification/dt
> integration for instance).
>
> Besides we need to make sure we are not launching into a huge effort for
> attempting to assign a device that does not fit well into the vfio
> platform scope (dma capable, reset, device tree).

Yes, agreed.

>
>  If so I could get started looking
> > at the implementation for how to sleep in the threaded handler in
> > vfio-platform until the unmask_handler is called. The most tricky/ugly
> > part of this is that DT has no knowledge of irq ONESHOT-ness, as it
> > only contains info regarding active-low vs active-high and edge vs
> > level trigger. That means that vfio-platform can't figure out that a
> > device uses a ONESHOT irq in a similar way to how it queries[15] the
> > trigger type, and by extension QEMU can't learn this information
> > through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
> > of knowing (i.e. command line option to QEMU).
> Indeed that's not really appealing.
> >
> > I guess potentially another option would be to treat ONESHOT
> > interrupts like regular level triggered interrupts from the
> > perspective of vfio-platform, but somehow ensure the interrupt stays
> > masked during injection to the guest, rather than just disabled.
> I need this to be clarified actually. I am confused by the automasked
> terminology now.

My note below tries to explain a bit but AFAICT there's some confusion
in the VFIO code (both platform and PCI) mixing the IRQ terms
mask/unmask with disable/enable, which I think are not quite
interchangeable concepts. IIUC irq enable/disable is a high(er) level
concept that means that even if the irqchip HW sees interrupts coming
in, the kernel refrains from calling the interrupt handler for the IRQ
-- but the kernel keeps track of interrupts that come in while an IRQ
is disabled, in order to know (once the IRQ gets enabled again by the
driver) that there are pending interrupts from the HW. On the other
hand, IRQ masking is a lower level irqchip operation that truly masks
the IRQs at the hardware level, so the kernel doesn't even know or
care if the line is being activated/deactivated while the IRQ is
masked.

I don't have a great explanation for why ONESHOT interrupts are even a
thing in the first place, maybe the HW design means you're going to
get a bunch of garbage/meaningless ups and downs on the interrupt line
while the interrupt is being serviced and they should never be taken
into consideration in any way by the kernel until the IRQ is unmasked
again?

>
> Thanks
>
> Eric
>
>  I'm
> > not sure whether this could cause legitimate interrupts coming from
> > devices to be missed while the injection for an existing interrupt is
> > underway, but maybe this is a rare enough scenario that we wouldn't
> > care. The main issue with this approach is that handle_level_irq()[16]
> > will try to unmask the irq out from under us after we start the
> > injection (as it is already masked before
> > vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
> > the irqchip level supports nesting or not.
> >
> > Let me know if you think either of these are viable options for adding
> > ONESHOT interrupt forwarding support to vfio-platform?
> >
> > Thanks,
> > Micah
> >
> >
> >
> >
> > Additional note about level triggered vs ONESHOT irq forwarding:
> > For the regular type of level triggered interrupt described above, the
> > vfio handler will call disable_irq_nosync()[18] before the
> > handle_level_irq() function unmasks the irq and returns. This ensures
> > if new interrupts come in on the line while the existing one is being
> > handled by the guest (and the irq is therefore disabled), that the
> > vfio_automasked_irq_handler() isn’t triggered again until the
> > vfio_platform_unmask_handler() function has been triggered by the
> > guest (causing the irq to be re-enabled[19]). In other words, the
> > purpose of the irq enable/disable that already exists in vfio-platform
> > is a higher level concept that delays handling of additional
> > level-triggered interrupts in the host until the current one has been
> > handled in the guest.
> >
> > This means that the existing level triggered interrupt forwarding
> > logic in vfio/vfio-platform is not sufficient for handling ONESHOT
> > interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
> > regular level triggered interrupt in the host and use the existing
> > vfio forwarding code). The masking that needs to happen for ONESHOT
> > interrupts is at the lower level of the irqchip mask/unmask in that
> > the ONESHOT irq needs to remain masked (not just disabled) until the
> > driver’s threaded handler has completed.
> >
> >
> >
> >
> > [1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
> > [2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414
> > [3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619
> > [4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702
> > [5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
> > [6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724
> > [7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
> > [8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
> > [9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c
> > [10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038
> > [11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167
> > [12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
> > [13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
> > [14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94
> > [15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310
> > [16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
> > [17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
> > [18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154
> > [19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87
> >
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-25 15:46 Add vfio-platform support for ONESHOT irq forwarding? Micah Morton
  2021-01-25 17:31 ` Auger Eric
@ 2021-01-25 20:36 ` Alex Williamson
  2021-01-26  8:53   ` Auger Eric
  1 sibling, 1 reply; 26+ messages in thread
From: Alex Williamson @ 2021-01-25 20:36 UTC (permalink / raw)
  To: Micah Morton; +Cc: Auger Eric, kvm

On Mon, 25 Jan 2021 10:46:47 -0500
Micah Morton <mortonm@chromium.org> wrote:

> Hi Eric,
> 
> I was recently looking into some vfio-platform passthrough stuff and
> came across a device I wanted to assign to a guest that uses a ONESHOT
> type interrupt (these type of interrupts seem to be quite common, on
> ARM at least). The semantics for ONESHOT interrupts are a bit
> different from regular level triggered interrupts as I'll describe
> here:
> 
> The normal generic code flow for level-triggered interrupts is as follows:
> 
> - regular type[1]: mask[2] the irq, then run the handler, then
> unmask[3] the irq and done
> 
> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
> 
> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
> because that is assumed to be handled transparently by "modern forms
> of interrupt handlers, which handle the flow details in hardware"
> 
> ONESHOT type interrupts are a special case of the fasteoi type
> described above. They rely on the driver registering a threaded
> handler for the interrupt and assume the irq line will remain masked
> until the threaded handler completes, at which time the line will be
> unmasked. TL;DR:
> 
> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
> then unmask[8] later when the threaded handler has finished running.

This doesn't seem quite correct to me, it skips the discussion of the
hard vs threaded handler, where the "regular" type would expect the
device interrupt to be masked in the hard handler, such that the
controller line can be unmasked during execution of the threaded handler
(if it exists).  It seems fasteoi is more transactional, ie. rather
than masking around quiescing the device interrupt, we only need to
send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
means of deferring all device handling to the threaded interrupt,
specifically for cases such as an i2c device where the bus interaction
necessitates non-IRQ-context handling.  Sound right?

> 
> For vfio-platform irq forwarding, there is no existing function in
> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
> for registering as the threaded handler for a ONESHOT interrupt in the
> case we want to request the ONESHOT irq with
> request_threaded_irq()[10]. Moreover, we can't just register a
> threaded function that simply immediately returns IRQ_HANDLED (as is
> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
> since that would cause the IRQ to be unmasked[13] immediately, before
> the userspace/guest driver has had any chance to service the
> interrupt.

Are you proposing servicing the device interrupt before it's sent to
userspace?  A ONESHOT irq is going to eoi the interrupt when the thread
exits, before userspace services the interrupt, so this seems like a
case where we'd need to mask the irq regardless of fasteoi handling so
that it cannot re-assert before userspace manages the device.  Our
existing autmasked for level triggered interrupts should handle this.

> The most obvious way I see to handle this is to add a threaded handler
> to vfio_platform_irq.c that waits until the userspace/guest driver has
> serviced the interrupt and the unmask_handler[14] has been called, at
> which point it returns IRQ_HANDLED so the generic IRQ code in the host
> can finally unmask the interrupt.

An interrupt thread with an indeterminate, user controlled runtime
seems bad.  The fact that fasteoi will send an eoi doesn't also mean
that it can't be masked.

> Does this sound like a reasonable approach and something you would be
> fine with adding to vfio-platform? If so I could get started looking
> at the implementation for how to sleep in the threaded handler in
> vfio-platform until the unmask_handler is called. The most tricky/ugly
> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
> only contains info regarding active-low vs active-high and edge vs
> level trigger. That means that vfio-platform can't figure out that a
> device uses a ONESHOT irq in a similar way to how it queries[15] the
> trigger type, and by extension QEMU can't learn this information
> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
> of knowing (i.e. command line option to QEMU).

Seems like existing level handling w/ masking should handle this, imo.

> I guess potentially another option would be to treat ONESHOT
> interrupts like regular level triggered interrupts from the
> perspective of vfio-platform, but somehow ensure the interrupt stays
> masked during injection to the guest, rather than just disabled. I'm
> not sure whether this could cause legitimate interrupts coming from
> devices to be missed while the injection for an existing interrupt is
> underway, but maybe this is a rare enough scenario that we wouldn't
> care. The main issue with this approach is that handle_level_irq()[16]
> will try to unmask the irq out from under us after we start the
> injection (as it is already masked before
> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
> the irqchip level supports nesting or not.

I'd expect either an unmask at the controller or eoi to re-evaluate the
interrupt condition and re-assert as needed.  The interrupt will need to
be exclusive to the device so as not to starve other devices.

> Let me know if you think either of these are viable options for adding
> ONESHOT interrupt forwarding support to vfio-platform?
> 
> Thanks,
> Micah
> 
> 
> 
> 
> Additional note about level triggered vs ONESHOT irq forwarding:
> For the regular type of level triggered interrupt described above, the
> vfio handler will call disable_irq_nosync()[18] before the
> handle_level_irq() function unmasks the irq and returns. This ensures
> if new interrupts come in on the line while the existing one is being
> handled by the guest (and the irq is therefore disabled), that the
> vfio_automasked_irq_handler() isn’t triggered again until the
> vfio_platform_unmask_handler() function has been triggered by the
> guest (causing the irq to be re-enabled[19]). In other words, the
> purpose of the irq enable/disable that already exists in vfio-platform
> is a higher level concept that delays handling of additional
> level-triggered interrupts in the host until the current one has been
> handled in the guest.

I wouldn't say "delays", the interrupt condition is not re-evaluated
until the interrupt is unmasked, by which point the user has had an
opportunity to service the device, which could de-assert the interrupt
such that there is no pending interrupt on unmask.  It therefore blocks
further interrupts until user serviced and unmasked.
 
> This means that the existing level triggered interrupt forwarding
> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
> regular level triggered interrupt in the host and use the existing
> vfio forwarding code). The masking that needs to happen for ONESHOT
> interrupts is at the lower level of the irqchip mask/unmask in that
> the ONESHOT irq needs to remain masked (not just disabled) until the
> driver’s threaded handler has completed.

I don't see that this is true, unmasking the irq should cause the
controller to re-evaluate the irq condition on the device end and issue
a new interrupt as necessary.  Right?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-25 18:32   ` Micah Morton
@ 2021-01-26  8:48     ` Auger Eric
  2021-01-26 15:31       ` Micah Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Auger Eric @ 2021-01-26  8:48 UTC (permalink / raw)
  To: Micah Morton; +Cc: Alex Williamson, kvm

Hi Micah,

On 1/25/21 7:32 PM, Micah Morton wrote:
> On Mon, Jan 25, 2021 at 12:31 PM Auger Eric <eric.auger@redhat.com> wrote:
>>
>> Hi Micah,
>>
>> On 1/25/21 4:46 PM, Micah Morton wrote:
>>> Hi Eric,
>>>
>>> I was recently looking into some vfio-platform passthrough stuff and
>>> came across a device I wanted to assign to a guest that uses a ONESHOT
>>> type interrupt (these type of interrupts seem to be quite common, on
>>> ARM at least). The semantics for ONESHOT interrupts are a bit
>>> different from regular level triggered interrupts as I'll describe
>>> here:
>>>
>>> The normal generic code flow for level-triggered interrupts is as follows:
>>>
>>> - regular type[1]: mask[2] the irq, then run the handler, then
>>> unmask[3] the irq and done
>>
>> VFIO level sensitive interrupts are "automasked". See slide 10 of
>> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf if this can help.
> 
> When you say "automasked" / "mask and deactivate" are you referring to
> the disable_irq_nosync() call in vfio_platform
> (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154)?
Yes that's what I meant
> 
>>
>> When the guest deactivates the virtual IRQ, this causes a maintenance
>> interrupt on host. This occurence causes kvm_notify_acked_irq() to be
>> called and this latter unmasks the physical IRQ.
> 
> Are you saying kvm_notify_acked_irq() causes
> vfio_platform_unmask_handler()
> (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94)
> to be called? Makes sense if so.
Yes this ends up calling irqfd_resampler_ack which toggles the virtual
irq line down and calls the unmask eventfd
> 
>>>
>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>>
>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>>> because that is assumed to be handled transparently by "modern forms
>>> of interrupt handlers, which handle the flow details in hardware"
>>>
>>> ONESHOT type interrupts are a special case of the fasteoi type
>>> described above. They rely on the driver registering a threaded
>>> handler for the interrupt and assume the irq line will remain masked
>>> until the threaded handler completes, at which time the line will be
>>> unmasked. TL;DR:
>>>
>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
>>> then unmask[8] later when the threaded handler has finished running.
>>
>> could you point me to the exact linux handler (is it
>> handle_fasteoi_irq?)  or Why do you say "potentially". Given the details
>> above, the guest EOI would unmask the IRQ at physical level. We do not
>> have any hook KVM/VFIO on the guest unmap.
> 
> Yep. handle_fasteoi_irq(). By "potentially" I just meant the eoi is
> only called for ONESHOT irqs if this check passes: !(chip->flags &
> IRQCHIP_EOI_THREADED). Not actually sure what this check is about
> since I didn't look into it.
I checked IRQCHIP_EOI_THREADED and it does not seem to be supported by
GIC irqchip. so this would mean that the EOI is not called in our case?

> 
> So far I was just talking about irq operations that happen in the
> host. Not sure I quite understand your last two sentences about guest
> EOI and guest unmap.
sorry I meant unmask instead of unmap :-( The closure of the physical
IRQ is triggered by the deactivation of the guest virtual IRQ.
Let's first assume you keep the existing "auto-masked" level sensitive
IRQ on host side (vfio platform driver), on the guest driver side you
are going to have a oneshot IRQ. What needs to be understood is when
does the deactivation happen on guest side.
> 
>>>
>>> For vfio-platform irq forwarding, there is no existing function in
>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
>>> for registering as the threaded handler for a ONESHOT interrupt in the
>>> case we want to request the ONESHOT irq with
>>> request_threaded_irq()[10]. Moreover, we can't just register a
>>> threaded function that simply immediately returns IRQ_HANDLED (as is
>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
>>> since that would cause the IRQ to be unmasked[13] immediately, before
>> sorry I know you reworked that several times for style issue but [13]
>> does not match unmask().
> 
> That's actually the link I intended, irq_finalize_oneshot() will be
> called after the threaded interrupt handler returns, and that's when
> the IRQ will be unmasked.
Hum this points to something within irq_thread_check_affinity(). Anyway.
> 
>>> the userspace/guest driver has had any chance to service the
>>> interrupt.
>>>
>>> The most obvious way I see to handle this is to add a threaded handler
>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
>>> serviced the interrupt and the unmask_handler[14] has been called, at
>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
>>> can finally unmask the interrupt.
>> this is done on guest EOI at the moment.
> 
> By "this", I think you mean enable_irq() in vfio_platform_irq.c?
right
> 
>>>
>>> Does this sound like a reasonable approach and something you would be
>>> fine with adding to vfio-platform?
>> Well I think it is interesting to do a pre-study and make sure we agree
>> on the terminology, IRQ flow and problems that would need to be solved.
>> Then we need to determine if it is worth the candle, if this support
>> would speed up the vfio-platform usage (I am not sure at this point as I
>> think there are more important drags like the lack of specification/dt
>> integration for instance).
>>
>> Besides we need to make sure we are not launching into a huge effort for
>> attempting to assign a device that does not fit well into the vfio
>> platform scope (dma capable, reset, device tree).
> 
> Yes, agreed.
> 
>>
>>  If so I could get started looking
>>> at the implementation for how to sleep in the threaded handler in
>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
>>> only contains info regarding active-low vs active-high and edge vs
>>> level trigger. That means that vfio-platform can't figure out that a
>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
>>> trigger type, and by extension QEMU can't learn this information
>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
>>> of knowing (i.e. command line option to QEMU).
>> Indeed that's not really appealing.
>>>
>>> I guess potentially another option would be to treat ONESHOT
>>> interrupts like regular level triggered interrupts from the
>>> perspective of vfio-platform, but somehow ensure the interrupt stays
>>> masked during injection to the guest, rather than just disabled.
>> I need this to be clarified actually. I am confused by the automasked
>> terminology now.
> 
> My note below tries to explain a bit but AFAICT there's some confusion
> in the VFIO code (both platform and PCI) mixing the IRQ terms
> mask/unmask with disable/enable, which I think are not quite
> interchangeable concepts. IIUC irq enable/disable is a high(er) level
> concept that means that even if the irqchip HW sees interrupts coming
> in, the kernel refrains from calling the interrupt handler for the IRQ
> -- but the kernel keeps track of interrupts that come in while an IRQ
> is disabled, in order to know (once the IRQ gets enabled again by the
> driver) that there are pending interrupts from the HW. On the other
> hand, IRQ masking is a lower level irqchip operation that truly masks
> the IRQs at the hardware level, so the kernel doesn't even know or
> care if the line is being activated/deactivated while the IRQ is
> masked.

Looking again at the genirq code, irq/chip.c __irq_disable, actual
implementation at HW level depends on the irqchip caps. GIC does not
implement the irq_enable() cb so this ends up calling mask_irq which
eventually calls gic_mask_irq and toggles the enable state for the given
IRQ.

Thanks

Eric
> 
> I don't have a great explanation for why ONESHOT interrupts are even a
> thing in the first place, maybe the HW design means you're going to
> get a bunch of garbage/meaningless ups and downs on the interrupt line
> while the interrupt is being serviced and they should never be taken
> into consideration in any way by the kernel until the IRQ is unmasked
> again?
> 
>>
>> Thanks
>>
>> Eric
>>
>>  I'm
>>> not sure whether this could cause legitimate interrupts coming from
>>> devices to be missed while the injection for an existing interrupt is
>>> underway, but maybe this is a rare enough scenario that we wouldn't
>>> care. The main issue with this approach is that handle_level_irq()[16]
>>> will try to unmask the irq out from under us after we start the
>>> injection (as it is already masked before
>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
>>> the irqchip level supports nesting or not.
>>>
>>> Let me know if you think either of these are viable options for adding
>>> ONESHOT interrupt forwarding support to vfio-platform?
>>>
>>> Thanks,
>>> Micah
>>>
>>>
>>>
>>>
>>> Additional note about level triggered vs ONESHOT irq forwarding:
>>> For the regular type of level triggered interrupt described above, the
>>> vfio handler will call disable_irq_nosync()[18] before the
>>> handle_level_irq() function unmasks the irq and returns. This ensures
>>> if new interrupts come in on the line while the existing one is being
>>> handled by the guest (and the irq is therefore disabled), that the
>>> vfio_automasked_irq_handler() isn’t triggered again until the
>>> vfio_platform_unmask_handler() function has been triggered by the
>>> guest (causing the irq to be re-enabled[19]). In other words, the
>>> purpose of the irq enable/disable that already exists in vfio-platform
>>> is a higher level concept that delays handling of additional
>>> level-triggered interrupts in the host until the current one has been
>>> handled in the guest.
>>>
>>> This means that the existing level triggered interrupt forwarding
>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
>>> regular level triggered interrupt in the host and use the existing
>>> vfio forwarding code). The masking that needs to happen for ONESHOT
>>> interrupts is at the lower level of the irqchip mask/unmask in that
>>> the ONESHOT irq needs to remain masked (not just disabled) until the
>>> driver’s threaded handler has completed.
>>>
>>>
>>>
>>>
>>> [1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
>>> [2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414
>>> [3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619
>>> [4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702
>>> [5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
>>> [6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724
>>> [7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
>>> [8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
>>> [9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c
>>> [10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038
>>> [11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167
>>> [12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
>>> [13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
>>> [14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94
>>> [15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310
>>> [16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
>>> [17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
>>> [18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154
>>> [19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87
>>>
>>
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-25 20:36 ` Alex Williamson
@ 2021-01-26  8:53   ` Auger Eric
  2021-01-26 15:15     ` Micah Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Auger Eric @ 2021-01-26  8:53 UTC (permalink / raw)
  To: Alex Williamson, Micah Morton; +Cc: kvm

Hi,

On 1/25/21 9:36 PM, Alex Williamson wrote:
> On Mon, 25 Jan 2021 10:46:47 -0500
> Micah Morton <mortonm@chromium.org> wrote:
> 
>> Hi Eric,
>>
>> I was recently looking into some vfio-platform passthrough stuff and
>> came across a device I wanted to assign to a guest that uses a ONESHOT
>> type interrupt (these type of interrupts seem to be quite common, on
>> ARM at least). The semantics for ONESHOT interrupts are a bit
>> different from regular level triggered interrupts as I'll describe
>> here:
>>
>> The normal generic code flow for level-triggered interrupts is as follows:
>>
>> - regular type[1]: mask[2] the irq, then run the handler, then
>> unmask[3] the irq and done
>>
>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>
>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>> because that is assumed to be handled transparently by "modern forms
>> of interrupt handlers, which handle the flow details in hardware"
>>
>> ONESHOT type interrupts are a special case of the fasteoi type
>> described above. They rely on the driver registering a threaded
>> handler for the interrupt and assume the irq line will remain masked
>> until the threaded handler completes, at which time the line will be
>> unmasked. TL;DR:
>>
>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
>> then unmask[8] later when the threaded handler has finished running.
> 
> This doesn't seem quite correct to me, it skips the discussion of the
> hard vs threaded handler, where the "regular" type would expect the
> device interrupt to be masked in the hard handler, such that the
> controller line can be unmasked during execution of the threaded handler
> (if it exists).  It seems fasteoi is more transactional, ie. rather
> than masking around quiescing the device interrupt, we only need to
> send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
> means of deferring all device handling to the threaded interrupt,
> specifically for cases such as an i2c device where the bus interaction
> necessitates non-IRQ-context handling.  Sound right?
> 
>>
>> For vfio-platform irq forwarding, there is no existing function in
>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
>> for registering as the threaded handler for a ONESHOT interrupt in the
>> case we want to request the ONESHOT irq with
>> request_threaded_irq()[10]. Moreover, we can't just register a
>> threaded function that simply immediately returns IRQ_HANDLED (as is
>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
>> since that would cause the IRQ to be unmasked[13] immediately, before
>> the userspace/guest driver has had any chance to service the
>> interrupt.
> 
> Are you proposing servicing the device interrupt before it's sent to
> userspace?  A ONESHOT irq is going to eoi the interrupt when the thread
> exits, before userspace services the interrupt, so this seems like a
> case where we'd need to mask the irq regardless of fasteoi handling so
> that it cannot re-assert before userspace manages the device.  Our
> existing autmasked for level triggered interrupts should handle this.

that's my understanding too. If we keep the current automasked level
sensitive vfio driver scheme and if the guest deactivates the vIRQ when
the guest irq thread has completed its job we should be good.
> 
>> The most obvious way I see to handle this is to add a threaded handler
>> to vfio_platform_irq.c that waits until the userspace/guest driver has
>> serviced the interrupt and the unmask_handler[14] has been called, at
>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
>> can finally unmask the interrupt.
> 
> An interrupt thread with an indeterminate, user controlled runtime
> seems bad.  The fact that fasteoi will send an eoi doesn't also mean
> that it can't be masked.
> 
>> Does this sound like a reasonable approach and something you would be
>> fine with adding to vfio-platform? If so I could get started looking
>> at the implementation for how to sleep in the threaded handler in
>> vfio-platform until the unmask_handler is called. The most tricky/ugly
>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
>> only contains info regarding active-low vs active-high and edge vs
>> level trigger. That means that vfio-platform can't figure out that a
>> device uses a ONESHOT irq in a similar way to how it queries[15] the
>> trigger type, and by extension QEMU can't learn this information
>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
>> of knowing (i.e. command line option to QEMU).
> 
> Seems like existing level handling w/ masking should handle this, imo.
> 
>> I guess potentially another option would be to treat ONESHOT
>> interrupts like regular level triggered interrupts from the
>> perspective of vfio-platform, but somehow ensure the interrupt stays
>> masked during injection to the guest, rather than just disabled. I'm
>> not sure whether this could cause legitimate interrupts coming from
>> devices to be missed while the injection for an existing interrupt is
>> underway, but maybe this is a rare enough scenario that we wouldn't
>> care. The main issue with this approach is that handle_level_irq()[16]
>> will try to unmask the irq out from under us after we start the
>> injection (as it is already masked before
>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
>> the irqchip level supports nesting or not.
> 
> I'd expect either an unmask at the controller or eoi to re-evaluate the
> interrupt condition and re-assert as needed.  The interrupt will need to
> be exclusive to the device so as not to starve other devices.
To me the physical IRQ pending state depends on the line level.
Depending on this line level, on the unmask the irqchip reevaluates
whether it should be fired.

Thanks

Eric
> 
>> Let me know if you think either of these are viable options for adding
>> ONESHOT interrupt forwarding support to vfio-platform?
>>
>> Thanks,
>> Micah
>>
>>
>>
>>
>> Additional note about level triggered vs ONESHOT irq forwarding:
>> For the regular type of level triggered interrupt described above, the
>> vfio handler will call disable_irq_nosync()[18] before the
>> handle_level_irq() function unmasks the irq and returns. This ensures
>> if new interrupts come in on the line while the existing one is being
>> handled by the guest (and the irq is therefore disabled), that the
>> vfio_automasked_irq_handler() isn’t triggered again until the
>> vfio_platform_unmask_handler() function has been triggered by the
>> guest (causing the irq to be re-enabled[19]). In other words, the
>> purpose of the irq enable/disable that already exists in vfio-platform
>> is a higher level concept that delays handling of additional
>> level-triggered interrupts in the host until the current one has been
>> handled in the guest.
> 
> I wouldn't say "delays", the interrupt condition is not re-evaluated
> until the interrupt is unmasked, by which point the user has had an
> opportunity to service the device, which could de-assert the interrupt
> such that there is no pending interrupt on unmask.  It therefore blocks
> further interrupts until user serviced and unmasked.
>  
>> This means that the existing level triggered interrupt forwarding
>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
>> regular level triggered interrupt in the host and use the existing
>> vfio forwarding code). The masking that needs to happen for ONESHOT
>> interrupts is at the lower level of the irqchip mask/unmask in that
>> the ONESHOT irq needs to remain masked (not just disabled) until the
>> driver’s threaded handler has completed.
> 
> I don't see that this is true, unmasking the irq should cause the
> controller to re-evaluate the irq condition on the device end and issue
> a new interrupt as necessary.  Right?  Thanks,
if the line is asserted,
> 
> Alex
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-26  8:53   ` Auger Eric
@ 2021-01-26 15:15     ` Micah Morton
  2021-01-26 16:19       ` Auger Eric
  0 siblings, 1 reply; 26+ messages in thread
From: Micah Morton @ 2021-01-26 15:15 UTC (permalink / raw)
  To: Auger Eric; +Cc: Alex Williamson, kvm

On Tue, Jan 26, 2021 at 3:54 AM Auger Eric <eric.auger@redhat.com> wrote:
>
> Hi,
>
> On 1/25/21 9:36 PM, Alex Williamson wrote:
> > On Mon, 25 Jan 2021 10:46:47 -0500
> > Micah Morton <mortonm@chromium.org> wrote:
> >
> >> Hi Eric,
> >>
> >> I was recently looking into some vfio-platform passthrough stuff and
> >> came across a device I wanted to assign to a guest that uses a ONESHOT
> >> type interrupt (these type of interrupts seem to be quite common, on
> >> ARM at least). The semantics for ONESHOT interrupts are a bit
> >> different from regular level triggered interrupts as I'll describe
> >> here:
> >>
> >> The normal generic code flow for level-triggered interrupts is as follows:
> >>
> >> - regular type[1]: mask[2] the irq, then run the handler, then
> >> unmask[3] the irq and done
> >>
> >> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
> >>
> >> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
> >> because that is assumed to be handled transparently by "modern forms
> >> of interrupt handlers, which handle the flow details in hardware"
> >>
> >> ONESHOT type interrupts are a special case of the fasteoi type
> >> described above. They rely on the driver registering a threaded
> >> handler for the interrupt and assume the irq line will remain masked
> >> until the threaded handler completes, at which time the line will be
> >> unmasked. TL;DR:
> >>
> >> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
> >> then unmask[8] later when the threaded handler has finished running.
> >
> > This doesn't seem quite correct to me, it skips the discussion of the
> > hard vs threaded handler, where the "regular" type would expect the
> > device interrupt to be masked in the hard handler, such that the
> > controller line can be unmasked during execution of the threaded handler
> > (if it exists).  It seems fasteoi is more transactional, ie. rather

handle_irq_event() only waits for the hard handler to run, not the
threaded handler. And then this comment
(https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L622)
implies that the "regular" type IRQs are not normally unmasked by the
threaded handler but rather before that as part of handle_level_irq()
after handle_irq_event() has returned (since cond_unmask_irq() is
always called there and handle_irq_event() doesn't wait for the
threaded handler to run before returning). I don't actually have first
hand knowledge one way or another whether threaded handlers normally
unmask the IRQ themselves -- just reading the generic IRQ code. Let me
know if I'm missing something here.

> > than masking around quiescing the device interrupt, we only need to
> > send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
> > means of deferring all device handling to the threaded interrupt,
> > specifically for cases such as an i2c device where the bus interaction
> > necessitates non-IRQ-context handling.  Sound right?

The rest of this matches my understanding.

> >
> >>
> >> For vfio-platform irq forwarding, there is no existing function in
> >> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
> >> for registering as the threaded handler for a ONESHOT interrupt in the
> >> case we want to request the ONESHOT irq with
> >> request_threaded_irq()[10]. Moreover, we can't just register a
> >> threaded function that simply immediately returns IRQ_HANDLED (as is
> >> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
> >> since that would cause the IRQ to be unmasked[13] immediately, before
> >> the userspace/guest driver has had any chance to service the
> >> interrupt.
> >
> > Are you proposing servicing the device interrupt before it's sent to
> > userspace?  A ONESHOT irq is going to eoi the interrupt when the thread

No I wasn't thinking of doing any servicing before userspace gets it

> > exits, before userspace services the interrupt, so this seems like a

Yeah, although I would say unmask rather than eoi, since as Eric just
pointed out IRQCHIP_EOI_THREADED is not supported by GIC. IOW, the GIC
irqchip does not "require eoi() on unmask in threaded mode"
(https://elixir.bootlin.com/linux/v5.10.7/source/include/linux/irq.h#L575).
The unmask will happen after the thread exits, due to
irq_finalize_oneshot() being called from irq_thread_fn()

> > case where we'd need to mask the irq regardless of fasteoi handling so
> > that it cannot re-assert before userspace manages the device.  Our
> > existing autmasked for level triggered interrupts should handle this.

Yeah that's what I want as well. The thing that was tripping me up is
this description of irq_disable (disable_irq_nosync() as used in VFIO
is a wrapper for irq_disable):
https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371
. This makes it seem like depending on irqchip internals (whether chip
implements irq_disable() callback or not, and what that callback
actually does), we may not be actually masking the irq at the irqchip
level during the disable (although if the IRQ_DISABLE_UNLAZY flag is
set then irq_disable() will lead to mask() being called in the case
the irqchip doesn't implement the disable() callback. But in the
normal case I think that flag is not set). So seemed to me with
ONESHOT we would run the risk of an extra pending interrupt that was
never intended by the hardware (i.e. "an interrupt happens, then the
interrupt flow handler masks the line at the hardware level and marks
it pending"). I guess if we know we are going to ignore this pending
interrupt down the line after guest/userspace has finished with the
interrupt injection then this isn't an issue.

>
> that's my understanding too. If we keep the current automasked level
> sensitive vfio driver scheme and if the guest deactivates the vIRQ when
> the guest irq thread has completed its job we should be good.

So you're saying even if we get a pending interrupt VFIO will
correctly ignore it? Or you're saying we won't get one since the IRQ
will be masked (by some guarantee I don't yet understand)?

> >
> >> The most obvious way I see to handle this is to add a threaded handler
> >> to vfio_platform_irq.c that waits until the userspace/guest driver has
> >> serviced the interrupt and the unmask_handler[14] has been called, at
> >> which point it returns IRQ_HANDLED so the generic IRQ code in the host
> >> can finally unmask the interrupt.
> >
> > An interrupt thread with an indeterminate, user controlled runtime
> > seems bad.  The fact that fasteoi will send an eoi doesn't also mean
> > that it can't be masked.

Yeah ok. And as mentioned above doesn't look like we're doing the eoi
on ARM GIC anyway.

> >
> >> Does this sound like a reasonable approach and something you would be
> >> fine with adding to vfio-platform? If so I could get started looking
> >> at the implementation for how to sleep in the threaded handler in
> >> vfio-platform until the unmask_handler is called. The most tricky/ugly
> >> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
> >> only contains info regarding active-low vs active-high and edge vs
> >> level trigger. That means that vfio-platform can't figure out that a
> >> device uses a ONESHOT irq in a similar way to how it queries[15] the
> >> trigger type, and by extension QEMU can't learn this information
> >> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
> >> of knowing (i.e. command line option to QEMU).
> >
> > Seems like existing level handling w/ masking should handle this, imo.
> >
> >> I guess potentially another option would be to treat ONESHOT
> >> interrupts like regular level triggered interrupts from the
> >> perspective of vfio-platform, but somehow ensure the interrupt stays
> >> masked during injection to the guest, rather than just disabled. I'm
> >> not sure whether this could cause legitimate interrupts coming from
> >> devices to be missed while the injection for an existing interrupt is
> >> underway, but maybe this is a rare enough scenario that we wouldn't
> >> care. The main issue with this approach is that handle_level_irq()[16]
> >> will try to unmask the irq out from under us after we start the
> >> injection (as it is already masked before
> >> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
> >> the irqchip level supports nesting or not.
> >
> > I'd expect either an unmask at the controller or eoi to re-evaluate the
> > interrupt condition and re-assert as needed.  The interrupt will need to
> > be exclusive to the device so as not to starve other devices.
> To me the physical IRQ pending state depends on the line level.
> Depending on this line level, on the unmask the irqchip reevaluates
> whether it should be fired.
>
> Thanks
>
> Eric
> >
> >> Let me know if you think either of these are viable options for adding
> >> ONESHOT interrupt forwarding support to vfio-platform?
> >>
> >> Thanks,
> >> Micah
> >>
> >>
> >>
> >>
> >> Additional note about level triggered vs ONESHOT irq forwarding:
> >> For the regular type of level triggered interrupt described above, the
> >> vfio handler will call disable_irq_nosync()[18] before the
> >> handle_level_irq() function unmasks the irq and returns. This ensures
> >> if new interrupts come in on the line while the existing one is being
> >> handled by the guest (and the irq is therefore disabled), that the
> >> vfio_automasked_irq_handler() isn’t triggered again until the
> >> vfio_platform_unmask_handler() function has been triggered by the
> >> guest (causing the irq to be re-enabled[19]). In other words, the
> >> purpose of the irq enable/disable that already exists in vfio-platform
> >> is a higher level concept that delays handling of additional
> >> level-triggered interrupts in the host until the current one has been
> >> handled in the guest.
> >
> > I wouldn't say "delays", the interrupt condition is not re-evaluated
> > until the interrupt is unmasked, by which point the user has had an
> > opportunity to service the device, which could de-assert the interrupt
> > such that there is no pending interrupt on unmask.  It therefore blocks

No pending interrupt on unmask definitely seems like what we want for
ONESHOT. You're saying we have that?

> > further interrupts until user serviced and unmasked.
> >
> >> This means that the existing level triggered interrupt forwarding
> >> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
> >> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
> >> regular level triggered interrupt in the host and use the existing
> >> vfio forwarding code). The masking that needs to happen for ONESHOT
> >> interrupts is at the lower level of the irqchip mask/unmask in that
> >> the ONESHOT irq needs to remain masked (not just disabled) until the
> >> driver’s threaded handler has completed.
> >
> > I don't see that this is true, unmasking the irq should cause the
> > controller to re-evaluate the irq condition on the device end and issue
> > a new interrupt as necessary.  Right?  Thanks,
> if the line is asserted,

Yeah, my concern was that the re-evaluation would come to the wrong
conclusion if the ONESHOT irq was lazily disabled without doing the
mask at the irqchip HW level.

> >
> > Alex
> >
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-26  8:48     ` Auger Eric
@ 2021-01-26 15:31       ` Micah Morton
  2021-01-26 16:05         ` Auger Eric
  0 siblings, 1 reply; 26+ messages in thread
From: Micah Morton @ 2021-01-26 15:31 UTC (permalink / raw)
  To: Auger Eric; +Cc: Alex Williamson, kvm

On Tue, Jan 26, 2021 at 3:48 AM Auger Eric <eric.auger@redhat.com> wrote:
>
> Hi Micah,
>
> On 1/25/21 7:32 PM, Micah Morton wrote:
> > On Mon, Jan 25, 2021 at 12:31 PM Auger Eric <eric.auger@redhat.com> wrote:
> >>
> >> Hi Micah,
> >>
> >> On 1/25/21 4:46 PM, Micah Morton wrote:
> >>> Hi Eric,
> >>>
> >>> I was recently looking into some vfio-platform passthrough stuff and
> >>> came across a device I wanted to assign to a guest that uses a ONESHOT
> >>> type interrupt (these type of interrupts seem to be quite common, on
> >>> ARM at least). The semantics for ONESHOT interrupts are a bit
> >>> different from regular level triggered interrupts as I'll describe
> >>> here:
> >>>
> >>> The normal generic code flow for level-triggered interrupts is as follows:
> >>>
> >>> - regular type[1]: mask[2] the irq, then run the handler, then
> >>> unmask[3] the irq and done
> >>
> >> VFIO level sensitive interrupts are "automasked". See slide 10 of
> >> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf if this can help.
> >
> > When you say "automasked" / "mask and deactivate" are you referring to
> > the disable_irq_nosync() call in vfio_platform
> > (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154)?
> Yes that's what I meant
> >
> >>
> >> When the guest deactivates the virtual IRQ, this causes a maintenance
> >> interrupt on host. This occurence causes kvm_notify_acked_irq() to be
> >> called and this latter unmasks the physical IRQ.
> >
> > Are you saying kvm_notify_acked_irq() causes
> > vfio_platform_unmask_handler()
> > (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94)
> > to be called? Makes sense if so.
> Yes this ends up calling irqfd_resampler_ack which toggles the virtual
> irq line down and calls the unmask eventfd
> >
> >>>
> >>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
> >>>
> >>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
> >>> because that is assumed to be handled transparently by "modern forms
> >>> of interrupt handlers, which handle the flow details in hardware"
> >>>
> >>> ONESHOT type interrupts are a special case of the fasteoi type
> >>> described above. They rely on the driver registering a threaded
> >>> handler for the interrupt and assume the irq line will remain masked
> >>> until the threaded handler completes, at which time the line will be
> >>> unmasked. TL;DR:
> >>>
> >>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
> >>> then unmask[8] later when the threaded handler has finished running.
> >>
> >> could you point me to the exact linux handler (is it
> >> handle_fasteoi_irq?)  or Why do you say "potentially". Given the details
> >> above, the guest EOI would unmask the IRQ at physical level. We do not
> >> have any hook KVM/VFIO on the guest unmap.
> >
> > Yep. handle_fasteoi_irq(). By "potentially" I just meant the eoi is
> > only called for ONESHOT irqs if this check passes: !(chip->flags &
> > IRQCHIP_EOI_THREADED). Not actually sure what this check is about
> > since I didn't look into it.
> I checked IRQCHIP_EOI_THREADED and it does not seem to be supported by
> GIC irqchip. so this would mean that the EOI is not called in our case?

Sounds like it, thanks for looking into that.

>
> >
> > So far I was just talking about irq operations that happen in the
> > host. Not sure I quite understand your last two sentences about guest
> > EOI and guest unmap.
> sorry I meant unmask instead of unmap :-( The closure of the physical
> IRQ is triggered by the deactivation of the guest virtual IRQ.
> Let's first assume you keep the existing "auto-masked" level sensitive
> IRQ on host side (vfio platform driver), on the guest driver side you
> are going to have a oneshot IRQ. What needs to be understood is when
> does the deactivation happen on guest side.

That's a good question, I need to go back and look at that.

> >
> >>>
> >>> For vfio-platform irq forwarding, there is no existing function in
> >>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
> >>> for registering as the threaded handler for a ONESHOT interrupt in the
> >>> case we want to request the ONESHOT irq with
> >>> request_threaded_irq()[10]. Moreover, we can't just register a
> >>> threaded function that simply immediately returns IRQ_HANDLED (as is
> >>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
> >>> since that would cause the IRQ to be unmasked[13] immediately, before
> >> sorry I know you reworked that several times for style issue but [13]
> >> does not match unmask().
> >
> > That's actually the link I intended, irq_finalize_oneshot() will be
> > called after the threaded interrupt handler returns, and that's when
> > the IRQ will be unmasked.
> Hum this points to something within irq_thread_check_affinity(). Anyway.

Sorry I was pointing the URL at a specific line in
irq_finalize_oneshot() but it was probably confusing.
https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L984
is the function I'm talking about.

> >
> >>> the userspace/guest driver has had any chance to service the
> >>> interrupt.
> >>>
> >>> The most obvious way I see to handle this is to add a threaded handler
> >>> to vfio_platform_irq.c that waits until the userspace/guest driver has
> >>> serviced the interrupt and the unmask_handler[14] has been called, at
> >>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
> >>> can finally unmask the interrupt.
> >> this is done on guest EOI at the moment.
> >
> > By "this", I think you mean enable_irq() in vfio_platform_irq.c?
> right
> >
> >>>
> >>> Does this sound like a reasonable approach and something you would be
> >>> fine with adding to vfio-platform?
> >> Well I think it is interesting to do a pre-study and make sure we agree
> >> on the terminology, IRQ flow and problems that would need to be solved.
> >> Then we need to determine if it is worth the candle, if this support
> >> would speed up the vfio-platform usage (I am not sure at this point as I
> >> think there are more important drags like the lack of specification/dt
> >> integration for instance).
> >>
> >> Besides we need to make sure we are not launching into a huge effort for
> >> attempting to assign a device that does not fit well into the vfio
> >> platform scope (dma capable, reset, device tree).
> >
> > Yes, agreed.
> >
> >>
> >>  If so I could get started looking
> >>> at the implementation for how to sleep in the threaded handler in
> >>> vfio-platform until the unmask_handler is called. The most tricky/ugly
> >>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
> >>> only contains info regarding active-low vs active-high and edge vs
> >>> level trigger. That means that vfio-platform can't figure out that a
> >>> device uses a ONESHOT irq in a similar way to how it queries[15] the
> >>> trigger type, and by extension QEMU can't learn this information
> >>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
> >>> of knowing (i.e. command line option to QEMU).
> >> Indeed that's not really appealing.
> >>>
> >>> I guess potentially another option would be to treat ONESHOT
> >>> interrupts like regular level triggered interrupts from the
> >>> perspective of vfio-platform, but somehow ensure the interrupt stays
> >>> masked during injection to the guest, rather than just disabled.
> >> I need this to be clarified actually. I am confused by the automasked
> >> terminology now.
> >
> > My note below tries to explain a bit but AFAICT there's some confusion
> > in the VFIO code (both platform and PCI) mixing the IRQ terms
> > mask/unmask with disable/enable, which I think are not quite
> > interchangeable concepts. IIUC irq enable/disable is a high(er) level
> > concept that means that even if the irqchip HW sees interrupts coming
> > in, the kernel refrains from calling the interrupt handler for the IRQ
> > -- but the kernel keeps track of interrupts that come in while an IRQ
> > is disabled, in order to know (once the IRQ gets enabled again by the
> > driver) that there are pending interrupts from the HW. On the other
> > hand, IRQ masking is a lower level irqchip operation that truly masks
> > the IRQs at the hardware level, so the kernel doesn't even know or
> > care if the line is being activated/deactivated while the IRQ is
> > masked.
>
> Looking again at the genirq code, irq/chip.c __irq_disable, actual
> implementation at HW level depends on the irqchip caps. GIC does not
> implement the irq_enable() cb so this ends up calling mask_irq which
> eventually calls gic_mask_irq and toggles the enable state for the given
> IRQ.

Are you talking about the lazy disable approach
(https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371)?
Sounds like you're implying IRQ_DISABLE_UNLAZY will be true for the
irq when we're doing VFIO forwarding? Or else we would see an extra
pending interrupt on the line during IRQ forwarding.

Maybe setting the IRQ_DISABLE_UNLAZY flag is the solution for ONESHOT
irq forwarding actually, so we guarantee the IRQ is masked immediately
and avoid the lazy approach that could lead to an erroneous pending
interrupt?

>
> Thanks
>
> Eric
> >
> > I don't have a great explanation for why ONESHOT interrupts are even a
> > thing in the first place, maybe the HW design means you're going to
> > get a bunch of garbage/meaningless ups and downs on the interrupt line
> > while the interrupt is being serviced and they should never be taken
> > into consideration in any way by the kernel until the IRQ is unmasked
> > again?
> >
> >>
> >> Thanks
> >>
> >> Eric
> >>
> >>  I'm
> >>> not sure whether this could cause legitimate interrupts coming from
> >>> devices to be missed while the injection for an existing interrupt is
> >>> underway, but maybe this is a rare enough scenario that we wouldn't
> >>> care. The main issue with this approach is that handle_level_irq()[16]
> >>> will try to unmask the irq out from under us after we start the
> >>> injection (as it is already masked before
> >>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
> >>> the irqchip level supports nesting or not.
> >>>
> >>> Let me know if you think either of these are viable options for adding
> >>> ONESHOT interrupt forwarding support to vfio-platform?
> >>>
> >>> Thanks,
> >>> Micah
> >>>
> >>>
> >>>
> >>>
> >>> Additional note about level triggered vs ONESHOT irq forwarding:
> >>> For the regular type of level triggered interrupt described above, the
> >>> vfio handler will call disable_irq_nosync()[18] before the
> >>> handle_level_irq() function unmasks the irq and returns. This ensures
> >>> if new interrupts come in on the line while the existing one is being
> >>> handled by the guest (and the irq is therefore disabled), that the
> >>> vfio_automasked_irq_handler() isn’t triggered again until the
> >>> vfio_platform_unmask_handler() function has been triggered by the
> >>> guest (causing the irq to be re-enabled[19]). In other words, the
> >>> purpose of the irq enable/disable that already exists in vfio-platform
> >>> is a higher level concept that delays handling of additional
> >>> level-triggered interrupts in the host until the current one has been
> >>> handled in the guest.
> >>>
> >>> This means that the existing level triggered interrupt forwarding
> >>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
> >>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
> >>> regular level triggered interrupt in the host and use the existing
> >>> vfio forwarding code). The masking that needs to happen for ONESHOT
> >>> interrupts is at the lower level of the irqchip mask/unmask in that
> >>> the ONESHOT irq needs to remain masked (not just disabled) until the
> >>> driver’s threaded handler has completed.
> >>>
> >>>
> >>>
> >>>
> >>> [1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
> >>> [2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414
> >>> [3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619
> >>> [4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702
> >>> [5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
> >>> [6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724
> >>> [7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
> >>> [8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
> >>> [9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c
> >>> [10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038
> >>> [11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167
> >>> [12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
> >>> [13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
> >>> [14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94
> >>> [15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310
> >>> [16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
> >>> [17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
> >>> [18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154
> >>> [19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87
> >>>
> >>
> >
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-26 15:31       ` Micah Morton
@ 2021-01-26 16:05         ` Auger Eric
  0 siblings, 0 replies; 26+ messages in thread
From: Auger Eric @ 2021-01-26 16:05 UTC (permalink / raw)
  To: Micah Morton; +Cc: Alex Williamson, kvm

Hi Micah,

On 1/26/21 4:31 PM, Micah Morton wrote:
> On Tue, Jan 26, 2021 at 3:48 AM Auger Eric <eric.auger@redhat.com> wrote:
>>
>> Hi Micah,
>>
>> On 1/25/21 7:32 PM, Micah Morton wrote:
>>> On Mon, Jan 25, 2021 at 12:31 PM Auger Eric <eric.auger@redhat.com> wrote:
>>>>
>>>> Hi Micah,
>>>>
>>>> On 1/25/21 4:46 PM, Micah Morton wrote:
>>>>> Hi Eric,
>>>>>
>>>>> I was recently looking into some vfio-platform passthrough stuff and
>>>>> came across a device I wanted to assign to a guest that uses a ONESHOT
>>>>> type interrupt (these type of interrupts seem to be quite common, on
>>>>> ARM at least). The semantics for ONESHOT interrupts are a bit
>>>>> different from regular level triggered interrupts as I'll describe
>>>>> here:
>>>>>
>>>>> The normal generic code flow for level-triggered interrupts is as follows:
>>>>>
>>>>> - regular type[1]: mask[2] the irq, then run the handler, then
>>>>> unmask[3] the irq and done
>>>>
>>>> VFIO level sensitive interrupts are "automasked". See slide 10 of
>>>> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf if this can help.
>>>
>>> When you say "automasked" / "mask and deactivate" are you referring to
>>> the disable_irq_nosync() call in vfio_platform
>>> (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154)?
>> Yes that's what I meant
>>>
>>>>
>>>> When the guest deactivates the virtual IRQ, this causes a maintenance
>>>> interrupt on host. This occurence causes kvm_notify_acked_irq() to be
>>>> called and this latter unmasks the physical IRQ.
>>>
>>> Are you saying kvm_notify_acked_irq() causes
>>> vfio_platform_unmask_handler()
>>> (https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94)
>>> to be called? Makes sense if so.
>> Yes this ends up calling irqfd_resampler_ack which toggles the virtual
>> irq line down and calls the unmask eventfd
>>>
>>>>>
>>>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>>>>
>>>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>>>>> because that is assumed to be handled transparently by "modern forms
>>>>> of interrupt handlers, which handle the flow details in hardware"
>>>>>
>>>>> ONESHOT type interrupts are a special case of the fasteoi type
>>>>> described above. They rely on the driver registering a threaded
>>>>> handler for the interrupt and assume the irq line will remain masked
>>>>> until the threaded handler completes, at which time the line will be
>>>>> unmasked. TL;DR:
>>>>>
>>>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
>>>>> then unmask[8] later when the threaded handler has finished running.
>>>>
>>>> could you point me to the exact linux handler (is it
>>>> handle_fasteoi_irq?)  or Why do you say "potentially". Given the details
>>>> above, the guest EOI would unmask the IRQ at physical level. We do not
>>>> have any hook KVM/VFIO on the guest unmap.
>>>
>>> Yep. handle_fasteoi_irq(). By "potentially" I just meant the eoi is
>>> only called for ONESHOT irqs if this check passes: !(chip->flags &
>>> IRQCHIP_EOI_THREADED). Not actually sure what this check is about
>>> since I didn't look into it.
>> I checked IRQCHIP_EOI_THREADED and it does not seem to be supported by
>> GIC irqchip. so this would mean that the EOI is not called in our case?
> 
> Sounds like it, thanks for looking into that.
> 
>>
>>>
>>> So far I was just talking about irq operations that happen in the
>>> host. Not sure I quite understand your last two sentences about guest
>>> EOI and guest unmap.
>> sorry I meant unmask instead of unmap :-( The closure of the physical
>> IRQ is triggered by the deactivation of the guest virtual IRQ.
>> Let's first assume you keep the existing "auto-masked" level sensitive
>> IRQ on host side (vfio platform driver), on the guest driver side you
>> are going to have a oneshot IRQ. What needs to be understood is when
>> does the deactivation happen on guest side.
> 
> That's a good question, I need to go back and look at that.
> 
>>>
>>>>>
>>>>> For vfio-platform irq forwarding, there is no existing function in
>>>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
>>>>> for registering as the threaded handler for a ONESHOT interrupt in the
>>>>> case we want to request the ONESHOT irq with
>>>>> request_threaded_irq()[10]. Moreover, we can't just register a
>>>>> threaded function that simply immediately returns IRQ_HANDLED (as is
>>>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
>>>>> since that would cause the IRQ to be unmasked[13] immediately, before
>>>> sorry I know you reworked that several times for style issue but [13]
>>>> does not match unmask().
>>>
>>> That's actually the link I intended, irq_finalize_oneshot() will be
>>> called after the threaded interrupt handler returns, and that's when
>>> the IRQ will be unmasked.
>> Hum this points to something within irq_thread_check_affinity(). Anyway.
> 
> Sorry I was pointing the URL at a specific line in
> irq_finalize_oneshot() but it was probably confusing.
> https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L984
> is the function I'm talking about.
> 
>>>
>>>>> the userspace/guest driver has had any chance to service the
>>>>> interrupt.
>>>>>
>>>>> The most obvious way I see to handle this is to add a threaded handler
>>>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
>>>>> serviced the interrupt and the unmask_handler[14] has been called, at
>>>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
>>>>> can finally unmask the interrupt.
>>>> this is done on guest EOI at the moment.
>>>
>>> By "this", I think you mean enable_irq() in vfio_platform_irq.c?
>> right
>>>
>>>>>
>>>>> Does this sound like a reasonable approach and something you would be
>>>>> fine with adding to vfio-platform?
>>>> Well I think it is interesting to do a pre-study and make sure we agree
>>>> on the terminology, IRQ flow and problems that would need to be solved.
>>>> Then we need to determine if it is worth the candle, if this support
>>>> would speed up the vfio-platform usage (I am not sure at this point as I
>>>> think there are more important drags like the lack of specification/dt
>>>> integration for instance).
>>>>
>>>> Besides we need to make sure we are not launching into a huge effort for
>>>> attempting to assign a device that does not fit well into the vfio
>>>> platform scope (dma capable, reset, device tree).
>>>
>>> Yes, agreed.
>>>
>>>>
>>>>  If so I could get started looking
>>>>> at the implementation for how to sleep in the threaded handler in
>>>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
>>>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
>>>>> only contains info regarding active-low vs active-high and edge vs
>>>>> level trigger. That means that vfio-platform can't figure out that a
>>>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
>>>>> trigger type, and by extension QEMU can't learn this information
>>>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
>>>>> of knowing (i.e. command line option to QEMU).
>>>> Indeed that's not really appealing.
>>>>>
>>>>> I guess potentially another option would be to treat ONESHOT
>>>>> interrupts like regular level triggered interrupts from the
>>>>> perspective of vfio-platform, but somehow ensure the interrupt stays
>>>>> masked during injection to the guest, rather than just disabled.
>>>> I need this to be clarified actually. I am confused by the automasked
>>>> terminology now.
>>>
>>> My note below tries to explain a bit but AFAICT there's some confusion
>>> in the VFIO code (both platform and PCI) mixing the IRQ terms
>>> mask/unmask with disable/enable, which I think are not quite
>>> interchangeable concepts. IIUC irq enable/disable is a high(er) level
>>> concept that means that even if the irqchip HW sees interrupts coming
>>> in, the kernel refrains from calling the interrupt handler for the IRQ
>>> -- but the kernel keeps track of interrupts that come in while an IRQ
>>> is disabled, in order to know (once the IRQ gets enabled again by the
>>> driver) that there are pending interrupts from the HW. On the other
>>> hand, IRQ masking is a lower level irqchip operation that truly masks
>>> the IRQs at the hardware level, so the kernel doesn't even know or
>>> care if the line is being activated/deactivated while the IRQ is
>>> masked.
>>
>> Looking again at the genirq code, irq/chip.c __irq_disable, actual
>> implementation at HW level depends on the irqchip caps. GIC does not
>> implement the irq_enable() cb so this ends up calling mask_irq which
>> eventually calls gic_mask_irq and toggles the enable state for the given
>> IRQ.
> 
> Are you talking about the lazy disable approach
> (https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371)?
> Sounds like you're implying IRQ_DISABLE_UNLAZY will be true for the
> irq when we're doing VFIO forwarding? Or else we would see an extra
> pending interrupt on the line during IRQ forwarding.

I tried to remember the path from __irq_disable SW flow downto the GIC
SPI programming. I was not implying the IRQ_DISABLE_UNLAZY flag.

As fas as I remember lazy disable approach was in use when I did the
integration. But that's just an optimization that avoids to access the
HW unnecessarily (ie. the handler may be invoked even if the irq is
disabled).

Thanks

Eric
> 
> Maybe setting the IRQ_DISABLE_UNLAZY flag is the solution for ONESHOT
> irq forwarding actually, so we guarantee the IRQ is masked immediately
> and avoid the lazy approach that could lead to an erroneous pending
> interrupt?
> 
>>
>> Thanks
>>
>> Eric
>>>
>>> I don't have a great explanation for why ONESHOT interrupts are even a
>>> thing in the first place, maybe the HW design means you're going to
>>> get a bunch of garbage/meaningless ups and downs on the interrupt line
>>> while the interrupt is being serviced and they should never be taken
>>> into consideration in any way by the kernel until the IRQ is unmasked
>>> again?
>>>
>>>>
>>>> Thanks
>>>>
>>>> Eric
>>>>
>>>>  I'm
>>>>> not sure whether this could cause legitimate interrupts coming from
>>>>> devices to be missed while the injection for an existing interrupt is
>>>>> underway, but maybe this is a rare enough scenario that we wouldn't
>>>>> care. The main issue with this approach is that handle_level_irq()[16]
>>>>> will try to unmask the irq out from under us after we start the
>>>>> injection (as it is already masked before
>>>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
>>>>> the irqchip level supports nesting or not.
>>>>>
>>>>> Let me know if you think either of these are viable options for adding
>>>>> ONESHOT interrupt forwarding support to vfio-platform?
>>>>>
>>>>> Thanks,
>>>>> Micah
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Additional note about level triggered vs ONESHOT irq forwarding:
>>>>> For the regular type of level triggered interrupt described above, the
>>>>> vfio handler will call disable_irq_nosync()[18] before the
>>>>> handle_level_irq() function unmasks the irq and returns. This ensures
>>>>> if new interrupts come in on the line while the existing one is being
>>>>> handled by the guest (and the irq is therefore disabled), that the
>>>>> vfio_automasked_irq_handler() isn’t triggered again until the
>>>>> vfio_platform_unmask_handler() function has been triggered by the
>>>>> guest (causing the irq to be re-enabled[19]). In other words, the
>>>>> purpose of the irq enable/disable that already exists in vfio-platform
>>>>> is a higher level concept that delays handling of additional
>>>>> level-triggered interrupts in the host until the current one has been
>>>>> handled in the guest.
>>>>>
>>>>> This means that the existing level triggered interrupt forwarding
>>>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
>>>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
>>>>> regular level triggered interrupt in the host and use the existing
>>>>> vfio forwarding code). The masking that needs to happen for ONESHOT
>>>>> interrupts is at the lower level of the irqchip mask/unmask in that
>>>>> the ONESHOT irq needs to remain masked (not just disabled) until the
>>>>> driver’s threaded handler has completed.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [1] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
>>>>> [2] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L414
>>>>> [3] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L619
>>>>> [4] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L702
>>>>> [5] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
>>>>> [6] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L724
>>>>> [7] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L688
>>>>> [8] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
>>>>> [9] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c
>>>>> [10] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L2038
>>>>> [11] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L167
>>>>> [12] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
>>>>> [13] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/manage.c#L1028
>>>>> [14] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L94
>>>>> [15] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L310
>>>>> [16] https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L642
>>>>> [17] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L142
>>>>> [18] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L154
>>>>> [19] https://elixir.bootlin.com/linux/v5.10.7/source/drivers/vfio/platform/vfio_platform_irq.c#L87
>>>>>
>>>>
>>>
>>
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-26 15:15     ` Micah Morton
@ 2021-01-26 16:19       ` Auger Eric
  2021-01-27 15:58         ` Micah Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Auger Eric @ 2021-01-26 16:19 UTC (permalink / raw)
  To: Micah Morton; +Cc: Alex Williamson, kvm

Hi Micah,

On 1/26/21 4:15 PM, Micah Morton wrote:
> On Tue, Jan 26, 2021 at 3:54 AM Auger Eric <eric.auger@redhat.com> wrote:
>>
>> Hi,
>>
>> On 1/25/21 9:36 PM, Alex Williamson wrote:
>>> On Mon, 25 Jan 2021 10:46:47 -0500
>>> Micah Morton <mortonm@chromium.org> wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> I was recently looking into some vfio-platform passthrough stuff and
>>>> came across a device I wanted to assign to a guest that uses a ONESHOT
>>>> type interrupt (these type of interrupts seem to be quite common, on
>>>> ARM at least). The semantics for ONESHOT interrupts are a bit
>>>> different from regular level triggered interrupts as I'll describe
>>>> here:
>>>>
>>>> The normal generic code flow for level-triggered interrupts is as follows:
>>>>
>>>> - regular type[1]: mask[2] the irq, then run the handler, then
>>>> unmask[3] the irq and done
>>>>
>>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>>>
>>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>>>> because that is assumed to be handled transparently by "modern forms
>>>> of interrupt handlers, which handle the flow details in hardware"
>>>>
>>>> ONESHOT type interrupts are a special case of the fasteoi type
>>>> described above. They rely on the driver registering a threaded
>>>> handler for the interrupt and assume the irq line will remain masked
>>>> until the threaded handler completes, at which time the line will be
>>>> unmasked. TL;DR:
>>>>
>>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
>>>> then unmask[8] later when the threaded handler has finished running.
>>>
>>> This doesn't seem quite correct to me, it skips the discussion of the
>>> hard vs threaded handler, where the "regular" type would expect the
>>> device interrupt to be masked in the hard handler, such that the
>>> controller line can be unmasked during execution of the threaded handler
>>> (if it exists).  It seems fasteoi is more transactional, ie. rather
> 
> handle_irq_event() only waits for the hard handler to run, not the
> threaded handler. And then this comment
> (https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L622)
> implies that the "regular" type IRQs are not normally unmasked by the
> threaded handler but rather before that as part of handle_level_irq()
> after handle_irq_event() has returned (since cond_unmask_irq() is
> always called there and handle_irq_event() doesn't wait for the
> threaded handler to run before returning). I don't actually have first
> hand knowledge one way or another whether threaded handlers normally
> unmask the IRQ themselves -- just reading the generic IRQ code. Let me
> know if I'm missing something here.
> 
>>> than masking around quiescing the device interrupt, we only need to
>>> send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
>>> means of deferring all device handling to the threaded interrupt,
>>> specifically for cases such as an i2c device where the bus interaction
>>> necessitates non-IRQ-context handling.  Sound right?
> 
> The rest of this matches my understanding.
> 
>>>
>>>>
>>>> For vfio-platform irq forwarding, there is no existing function in
>>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
>>>> for registering as the threaded handler for a ONESHOT interrupt in the
>>>> case we want to request the ONESHOT irq with
>>>> request_threaded_irq()[10]. Moreover, we can't just register a
>>>> threaded function that simply immediately returns IRQ_HANDLED (as is
>>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
>>>> since that would cause the IRQ to be unmasked[13] immediately, before
>>>> the userspace/guest driver has had any chance to service the
>>>> interrupt.
>>>
>>> Are you proposing servicing the device interrupt before it's sent to
>>> userspace?  A ONESHOT irq is going to eoi the interrupt when the thread
> 
> No I wasn't thinking of doing any servicing before userspace gets it
> 
>>> exits, before userspace services the interrupt, so this seems like a
> 
> Yeah, although I would say unmask rather than eoi, since as Eric just
> pointed out IRQCHIP_EOI_THREADED is not supported by GIC. IOW, the GIC
> irqchip does not "require eoi() on unmask in threaded mode"
> (https://elixir.bootlin.com/linux/v5.10.7/source/include/linux/irq.h#L575).
> The unmask will happen after the thread exits, due to
> irq_finalize_oneshot() being called from irq_thread_fn()
> 
>>> case where we'd need to mask the irq regardless of fasteoi handling so
>>> that it cannot re-assert before userspace manages the device.  Our
>>> existing autmasked for level triggered interrupts should handle this.
> 
> Yeah that's what I want as well. The thing that was tripping me up is
> this description of irq_disable (disable_irq_nosync() as used in VFIO
> is a wrapper for irq_disable):
> https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371
> . This makes it seem like depending on irqchip internals (whether chip
> implements irq_disable() callback or not, and what that callback
> actually does), we may not be actually masking the irq at the irqchip
> level during the disable (although if the IRQ_DISABLE_UNLAZY flag is
> set then irq_disable() will lead to mask() being called in the case
> the irqchip doesn't implement the disable() callback. But in the
> normal case I think that flag is not set). So seemed to me with
> ONESHOT we would run the risk of an extra pending interrupt that was
> never intended by the hardware (i.e. "an interrupt happens, then the
> interrupt flow handler masks the line at the hardware level and marks
> it pending"). I guess if we know we are going to ignore this pending
> interrupt down the line after guest/userspace has finished with the
> interrupt injection then this isn't an issue.

Here is my understanding.

if the IRQ has been automasked by VFIO on entry in
vfio_automasked_irq_handler(), the corresponding SPI (shared peripheral
interrupt) has been disabled and cannot be triggered by the GIC anymore
(until corresponding unmask). The physical IRQ is deactivated (EOI)
after the host ISR by the host genirq code. So the active state of the
SPI is removed at this point. When the guest deactivates the vIRQ (so I
understand after the completion of the guest vIRQ thread), VFIO unmask
handler gets called, the physical SPI is re-enabled. At this point, the
GIC scans the physical line again and depending on its state triggers
the physical IRQ again.

https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
slide 7 for the SPI states and slide 7 for the KVM/ARM forwarding flow

Eric
> 
>>
>> that's my understanding too. If we keep the current automasked level
>> sensitive vfio driver scheme and if the guest deactivates the vIRQ when
>> the guest irq thread has completed its job we should be good.
> 
> So you're saying even if we get a pending interrupt VFIO will
> correctly ignore it? Or you're saying we won't get one since the IRQ
> will be masked (by some guarantee I don't yet understand)?
> 
>>>
>>>> The most obvious way I see to handle this is to add a threaded handler
>>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
>>>> serviced the interrupt and the unmask_handler[14] has been called, at
>>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
>>>> can finally unmask the interrupt.
>>>
>>> An interrupt thread with an indeterminate, user controlled runtime
>>> seems bad.  The fact that fasteoi will send an eoi doesn't also mean
>>> that it can't be masked.
> 
> Yeah ok. And as mentioned above doesn't look like we're doing the eoi
> on ARM GIC anyway.
> 
>>>
>>>> Does this sound like a reasonable approach and something you would be
>>>> fine with adding to vfio-platform? If so I could get started looking
>>>> at the implementation for how to sleep in the threaded handler in
>>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
>>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
>>>> only contains info regarding active-low vs active-high and edge vs
>>>> level trigger. That means that vfio-platform can't figure out that a
>>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
>>>> trigger type, and by extension QEMU can't learn this information
>>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
>>>> of knowing (i.e. command line option to QEMU).
>>>
>>> Seems like existing level handling w/ masking should handle this, imo.
>>>
>>>> I guess potentially another option would be to treat ONESHOT
>>>> interrupts like regular level triggered interrupts from the
>>>> perspective of vfio-platform, but somehow ensure the interrupt stays
>>>> masked during injection to the guest, rather than just disabled. I'm
>>>> not sure whether this could cause legitimate interrupts coming from
>>>> devices to be missed while the injection for an existing interrupt is
>>>> underway, but maybe this is a rare enough scenario that we wouldn't
>>>> care. The main issue with this approach is that handle_level_irq()[16]
>>>> will try to unmask the irq out from under us after we start the
>>>> injection (as it is already masked before
>>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
>>>> the irqchip level supports nesting or not.
>>>
>>> I'd expect either an unmask at the controller or eoi to re-evaluate the
>>> interrupt condition and re-assert as needed.  The interrupt will need to
>>> be exclusive to the device so as not to starve other devices.
>> To me the physical IRQ pending state depends on the line level.
>> Depending on this line level, on the unmask the irqchip reevaluates
>> whether it should be fired.
>>
>> Thanks
>>
>> Eric
>>>
>>>> Let me know if you think either of these are viable options for adding
>>>> ONESHOT interrupt forwarding support to vfio-platform?
>>>>
>>>> Thanks,
>>>> Micah
>>>>
>>>>
>>>>
>>>>
>>>> Additional note about level triggered vs ONESHOT irq forwarding:
>>>> For the regular type of level triggered interrupt described above, the
>>>> vfio handler will call disable_irq_nosync()[18] before the
>>>> handle_level_irq() function unmasks the irq and returns. This ensures
>>>> if new interrupts come in on the line while the existing one is being
>>>> handled by the guest (and the irq is therefore disabled), that the
>>>> vfio_automasked_irq_handler() isn’t triggered again until the
>>>> vfio_platform_unmask_handler() function has been triggered by the
>>>> guest (causing the irq to be re-enabled[19]). In other words, the
>>>> purpose of the irq enable/disable that already exists in vfio-platform
>>>> is a higher level concept that delays handling of additional
>>>> level-triggered interrupts in the host until the current one has been
>>>> handled in the guest.
>>>
>>> I wouldn't say "delays", the interrupt condition is not re-evaluated
>>> until the interrupt is unmasked, by which point the user has had an
>>> opportunity to service the device, which could de-assert the interrupt
>>> such that there is no pending interrupt on unmask.  It therefore blocks
> 
> No pending interrupt on unmask definitely seems like what we want for
> ONESHOT. You're saying we have that?
> 
>>> further interrupts until user serviced and unmasked.
>>>
>>>> This means that the existing level triggered interrupt forwarding
>>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
>>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
>>>> regular level triggered interrupt in the host and use the existing
>>>> vfio forwarding code). The masking that needs to happen for ONESHOT
>>>> interrupts is at the lower level of the irqchip mask/unmask in that
>>>> the ONESHOT irq needs to remain masked (not just disabled) until the
>>>> driver’s threaded handler has completed.
>>>
>>> I don't see that this is true, unmasking the irq should cause the
>>> controller to re-evaluate the irq condition on the device end and issue
>>> a new interrupt as necessary.  Right?  Thanks,
>> if the line is asserted,
> 
> Yeah, my concern was that the re-evaluation would come to the wrong
> conclusion if the ONESHOT irq was lazily disabled without doing the
> mask at the irqchip HW level.
> 
>>>
>>> Alex
>>>
>>
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-26 16:19       ` Auger Eric
@ 2021-01-27 15:58         ` Micah Morton
  2021-01-29 19:57           ` Micah Morton
  0 siblings, 1 reply; 26+ messages in thread
From: Micah Morton @ 2021-01-27 15:58 UTC (permalink / raw)
  To: Auger Eric; +Cc: Alex Williamson, kvm

On Tue, Jan 26, 2021 at 11:20 AM Auger Eric <eric.auger@redhat.com> wrote:
>
> Hi Micah,
>
> On 1/26/21 4:15 PM, Micah Morton wrote:
> > On Tue, Jan 26, 2021 at 3:54 AM Auger Eric <eric.auger@redhat.com> wrote:
> >>
> >> Hi,
> >>
> >> On 1/25/21 9:36 PM, Alex Williamson wrote:
> >>> On Mon, 25 Jan 2021 10:46:47 -0500
> >>> Micah Morton <mortonm@chromium.org> wrote:
> >>>
> >>>> Hi Eric,
> >>>>
> >>>> I was recently looking into some vfio-platform passthrough stuff and
> >>>> came across a device I wanted to assign to a guest that uses a ONESHOT
> >>>> type interrupt (these type of interrupts seem to be quite common, on
> >>>> ARM at least). The semantics for ONESHOT interrupts are a bit
> >>>> different from regular level triggered interrupts as I'll describe
> >>>> here:
> >>>>
> >>>> The normal generic code flow for level-triggered interrupts is as follows:
> >>>>
> >>>> - regular type[1]: mask[2] the irq, then run the handler, then
> >>>> unmask[3] the irq and done
> >>>>
> >>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
> >>>>
> >>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
> >>>> because that is assumed to be handled transparently by "modern forms
> >>>> of interrupt handlers, which handle the flow details in hardware"
> >>>>
> >>>> ONESHOT type interrupts are a special case of the fasteoi type
> >>>> described above. They rely on the driver registering a threaded
> >>>> handler for the interrupt and assume the irq line will remain masked
> >>>> until the threaded handler completes, at which time the line will be
> >>>> unmasked. TL;DR:
> >>>>
> >>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
> >>>> then unmask[8] later when the threaded handler has finished running.
> >>>
> >>> This doesn't seem quite correct to me, it skips the discussion of the
> >>> hard vs threaded handler, where the "regular" type would expect the
> >>> device interrupt to be masked in the hard handler, such that the
> >>> controller line can be unmasked during execution of the threaded handler
> >>> (if it exists).  It seems fasteoi is more transactional, ie. rather
> >
> > handle_irq_event() only waits for the hard handler to run, not the
> > threaded handler. And then this comment
> > (https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L622)
> > implies that the "regular" type IRQs are not normally unmasked by the
> > threaded handler but rather before that as part of handle_level_irq()
> > after handle_irq_event() has returned (since cond_unmask_irq() is
> > always called there and handle_irq_event() doesn't wait for the
> > threaded handler to run before returning). I don't actually have first
> > hand knowledge one way or another whether threaded handlers normally
> > unmask the IRQ themselves -- just reading the generic IRQ code. Let me
> > know if I'm missing something here.
> >
> >>> than masking around quiescing the device interrupt, we only need to
> >>> send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
> >>> means of deferring all device handling to the threaded interrupt,
> >>> specifically for cases such as an i2c device where the bus interaction
> >>> necessitates non-IRQ-context handling.  Sound right?
> >
> > The rest of this matches my understanding.
> >
> >>>
> >>>>
> >>>> For vfio-platform irq forwarding, there is no existing function in
> >>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
> >>>> for registering as the threaded handler for a ONESHOT interrupt in the
> >>>> case we want to request the ONESHOT irq with
> >>>> request_threaded_irq()[10]. Moreover, we can't just register a
> >>>> threaded function that simply immediately returns IRQ_HANDLED (as is
> >>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
> >>>> since that would cause the IRQ to be unmasked[13] immediately, before
> >>>> the userspace/guest driver has had any chance to service the
> >>>> interrupt.
> >>>
> >>> Are you proposing servicing the device interrupt before it's sent to
> >>> userspace?  A ONESHOT irq is going to eoi the interrupt when the thread
> >
> > No I wasn't thinking of doing any servicing before userspace gets it
> >
> >>> exits, before userspace services the interrupt, so this seems like a
> >
> > Yeah, although I would say unmask rather than eoi, since as Eric just
> > pointed out IRQCHIP_EOI_THREADED is not supported by GIC. IOW, the GIC
> > irqchip does not "require eoi() on unmask in threaded mode"
> > (https://elixir.bootlin.com/linux/v5.10.7/source/include/linux/irq.h#L575).
> > The unmask will happen after the thread exits, due to
> > irq_finalize_oneshot() being called from irq_thread_fn()
> >
> >>> case where we'd need to mask the irq regardless of fasteoi handling so
> >>> that it cannot re-assert before userspace manages the device.  Our
> >>> existing autmasked for level triggered interrupts should handle this.
> >
> > Yeah that's what I want as well. The thing that was tripping me up is
> > this description of irq_disable (disable_irq_nosync() as used in VFIO
> > is a wrapper for irq_disable):
> > https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371
> > . This makes it seem like depending on irqchip internals (whether chip
> > implements irq_disable() callback or not, and what that callback
> > actually does), we may not be actually masking the irq at the irqchip
> > level during the disable (although if the IRQ_DISABLE_UNLAZY flag is
> > set then irq_disable() will lead to mask() being called in the case
> > the irqchip doesn't implement the disable() callback. But in the
> > normal case I think that flag is not set). So seemed to me with
> > ONESHOT we would run the risk of an extra pending interrupt that was
> > never intended by the hardware (i.e. "an interrupt happens, then the
> > interrupt flow handler masks the line at the hardware level and marks
> > it pending"). I guess if we know we are going to ignore this pending
> > interrupt down the line after guest/userspace has finished with the
> > interrupt injection then this isn't an issue.
>
> Here is my understanding.
>
> if the IRQ has been automasked by VFIO on entry in
> vfio_automasked_irq_handler(), the corresponding SPI (shared peripheral
> interrupt) has been disabled and cannot be triggered by the GIC anymore
> (until corresponding unmask). The physical IRQ is deactivated (EOI)
> after the host ISR by the host genirq code. So the active state of the
> SPI is removed at this point. When the guest deactivates the vIRQ (so I
> understand after the completion of the guest vIRQ thread), VFIO unmask
> handler gets called, the physical SPI is re-enabled. At this point, the
> GIC scans the physical line again and depending on its state triggers
> the physical IRQ again.
>
> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
> slide 7 for the SPI states and slide 7 for the KVM/ARM forwarding flow

Ok thanks for the explanation. Sounds like you and Alex are in
agreement that there shouldn't be a problem with seeing extra
erroneous pending interrupts upon calling vfio_platform_unmask() in
the host when doing ONESHOT interrupt forwarding. I have some HW
devices that use ONESHOT interrupts that I have working in a guest VM,
although I had used some of the hacks described in my original email
to get things working. I'll go back and see if everything works well
with none of these modifications to vfio-platform / guest driver.

>
> Eric
> >
> >>
> >> that's my understanding too. If we keep the current automasked level
> >> sensitive vfio driver scheme and if the guest deactivates the vIRQ when
> >> the guest irq thread has completed its job we should be good.
> >
> > So you're saying even if we get a pending interrupt VFIO will
> > correctly ignore it? Or you're saying we won't get one since the IRQ
> > will be masked (by some guarantee I don't yet understand)?
> >
> >>>
> >>>> The most obvious way I see to handle this is to add a threaded handler
> >>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
> >>>> serviced the interrupt and the unmask_handler[14] has been called, at
> >>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
> >>>> can finally unmask the interrupt.
> >>>
> >>> An interrupt thread with an indeterminate, user controlled runtime
> >>> seems bad.  The fact that fasteoi will send an eoi doesn't also mean
> >>> that it can't be masked.
> >
> > Yeah ok. And as mentioned above doesn't look like we're doing the eoi
> > on ARM GIC anyway.
> >
> >>>
> >>>> Does this sound like a reasonable approach and something you would be
> >>>> fine with adding to vfio-platform? If so I could get started looking
> >>>> at the implementation for how to sleep in the threaded handler in
> >>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
> >>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
> >>>> only contains info regarding active-low vs active-high and edge vs
> >>>> level trigger. That means that vfio-platform can't figure out that a
> >>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
> >>>> trigger type, and by extension QEMU can't learn this information
> >>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
> >>>> of knowing (i.e. command line option to QEMU).
> >>>
> >>> Seems like existing level handling w/ masking should handle this, imo.
> >>>
> >>>> I guess potentially another option would be to treat ONESHOT
> >>>> interrupts like regular level triggered interrupts from the
> >>>> perspective of vfio-platform, but somehow ensure the interrupt stays
> >>>> masked during injection to the guest, rather than just disabled. I'm
> >>>> not sure whether this could cause legitimate interrupts coming from
> >>>> devices to be missed while the injection for an existing interrupt is
> >>>> underway, but maybe this is a rare enough scenario that we wouldn't
> >>>> care. The main issue with this approach is that handle_level_irq()[16]
> >>>> will try to unmask the irq out from under us after we start the
> >>>> injection (as it is already masked before
> >>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
> >>>> the irqchip level supports nesting or not.
> >>>
> >>> I'd expect either an unmask at the controller or eoi to re-evaluate the
> >>> interrupt condition and re-assert as needed.  The interrupt will need to
> >>> be exclusive to the device so as not to starve other devices.
> >> To me the physical IRQ pending state depends on the line level.
> >> Depending on this line level, on the unmask the irqchip reevaluates
> >> whether it should be fired.
> >>
> >> Thanks
> >>
> >> Eric
> >>>
> >>>> Let me know if you think either of these are viable options for adding
> >>>> ONESHOT interrupt forwarding support to vfio-platform?
> >>>>
> >>>> Thanks,
> >>>> Micah
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Additional note about level triggered vs ONESHOT irq forwarding:
> >>>> For the regular type of level triggered interrupt described above, the
> >>>> vfio handler will call disable_irq_nosync()[18] before the
> >>>> handle_level_irq() function unmasks the irq and returns. This ensures
> >>>> if new interrupts come in on the line while the existing one is being
> >>>> handled by the guest (and the irq is therefore disabled), that the
> >>>> vfio_automasked_irq_handler() isn’t triggered again until the
> >>>> vfio_platform_unmask_handler() function has been triggered by the
> >>>> guest (causing the irq to be re-enabled[19]). In other words, the
> >>>> purpose of the irq enable/disable that already exists in vfio-platform
> >>>> is a higher level concept that delays handling of additional
> >>>> level-triggered interrupts in the host until the current one has been
> >>>> handled in the guest.
> >>>
> >>> I wouldn't say "delays", the interrupt condition is not re-evaluated
> >>> until the interrupt is unmasked, by which point the user has had an
> >>> opportunity to service the device, which could de-assert the interrupt
> >>> such that there is no pending interrupt on unmask.  It therefore blocks
> >
> > No pending interrupt on unmask definitely seems like what we want for
> > ONESHOT. You're saying we have that?
> >
> >>> further interrupts until user serviced and unmasked.
> >>>
> >>>> This means that the existing level triggered interrupt forwarding
> >>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
> >>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
> >>>> regular level triggered interrupt in the host and use the existing
> >>>> vfio forwarding code). The masking that needs to happen for ONESHOT
> >>>> interrupts is at the lower level of the irqchip mask/unmask in that
> >>>> the ONESHOT irq needs to remain masked (not just disabled) until the
> >>>> driver’s threaded handler has completed.
> >>>
> >>> I don't see that this is true, unmasking the irq should cause the
> >>> controller to re-evaluate the irq condition on the device end and issue
> >>> a new interrupt as necessary.  Right?  Thanks,
> >> if the line is asserted,
> >
> > Yeah, my concern was that the re-evaluation would come to the wrong
> > conclusion if the ONESHOT irq was lazily disabled without doing the
> > mask at the irqchip HW level.
> >
> >>>
> >>> Alex
> >>>
> >>
> >
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-27 15:58         ` Micah Morton
@ 2021-01-29 19:57           ` Micah Morton
  2021-01-30 16:38             ` Auger Eric
  0 siblings, 1 reply; 26+ messages in thread
From: Micah Morton @ 2021-01-29 19:57 UTC (permalink / raw)
  To: Auger Eric; +Cc: Alex Williamson, kvm

On Wed, Jan 27, 2021 at 10:58 AM Micah Morton <mortonm@chromium.org> wrote:
>
> On Tue, Jan 26, 2021 at 11:20 AM Auger Eric <eric.auger@redhat.com> wrote:
> >
> > Hi Micah,
> >
> > On 1/26/21 4:15 PM, Micah Morton wrote:
> > > On Tue, Jan 26, 2021 at 3:54 AM Auger Eric <eric.auger@redhat.com> wrote:
> > >>
> > >> Hi,
> > >>
> > >> On 1/25/21 9:36 PM, Alex Williamson wrote:
> > >>> On Mon, 25 Jan 2021 10:46:47 -0500
> > >>> Micah Morton <mortonm@chromium.org> wrote:
> > >>>
> > >>>> Hi Eric,
> > >>>>
> > >>>> I was recently looking into some vfio-platform passthrough stuff and
> > >>>> came across a device I wanted to assign to a guest that uses a ONESHOT
> > >>>> type interrupt (these type of interrupts seem to be quite common, on
> > >>>> ARM at least). The semantics for ONESHOT interrupts are a bit
> > >>>> different from regular level triggered interrupts as I'll describe
> > >>>> here:
> > >>>>
> > >>>> The normal generic code flow for level-triggered interrupts is as follows:
> > >>>>
> > >>>> - regular type[1]: mask[2] the irq, then run the handler, then
> > >>>> unmask[3] the irq and done
> > >>>>
> > >>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
> > >>>>
> > >>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
> > >>>> because that is assumed to be handled transparently by "modern forms
> > >>>> of interrupt handlers, which handle the flow details in hardware"
> > >>>>
> > >>>> ONESHOT type interrupts are a special case of the fasteoi type
> > >>>> described above. They rely on the driver registering a threaded
> > >>>> handler for the interrupt and assume the irq line will remain masked
> > >>>> until the threaded handler completes, at which time the line will be
> > >>>> unmasked. TL;DR:
> > >>>>
> > >>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
> > >>>> then unmask[8] later when the threaded handler has finished running.
> > >>>
> > >>> This doesn't seem quite correct to me, it skips the discussion of the
> > >>> hard vs threaded handler, where the "regular" type would expect the
> > >>> device interrupt to be masked in the hard handler, such that the
> > >>> controller line can be unmasked during execution of the threaded handler
> > >>> (if it exists).  It seems fasteoi is more transactional, ie. rather
> > >
> > > handle_irq_event() only waits for the hard handler to run, not the
> > > threaded handler. And then this comment
> > > (https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L622)
> > > implies that the "regular" type IRQs are not normally unmasked by the
> > > threaded handler but rather before that as part of handle_level_irq()
> > > after handle_irq_event() has returned (since cond_unmask_irq() is
> > > always called there and handle_irq_event() doesn't wait for the
> > > threaded handler to run before returning). I don't actually have first
> > > hand knowledge one way or another whether threaded handlers normally
> > > unmask the IRQ themselves -- just reading the generic IRQ code. Let me
> > > know if I'm missing something here.
> > >
> > >>> than masking around quiescing the device interrupt, we only need to
> > >>> send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
> > >>> means of deferring all device handling to the threaded interrupt,
> > >>> specifically for cases such as an i2c device where the bus interaction
> > >>> necessitates non-IRQ-context handling.  Sound right?
> > >
> > > The rest of this matches my understanding.
> > >
> > >>>
> > >>>>
> > >>>> For vfio-platform irq forwarding, there is no existing function in
> > >>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
> > >>>> for registering as the threaded handler for a ONESHOT interrupt in the
> > >>>> case we want to request the ONESHOT irq with
> > >>>> request_threaded_irq()[10]. Moreover, we can't just register a
> > >>>> threaded function that simply immediately returns IRQ_HANDLED (as is
> > >>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
> > >>>> since that would cause the IRQ to be unmasked[13] immediately, before
> > >>>> the userspace/guest driver has had any chance to service the
> > >>>> interrupt.
> > >>>
> > >>> Are you proposing servicing the device interrupt before it's sent to
> > >>> userspace?  A ONESHOT irq is going to eoi the interrupt when the thread
> > >
> > > No I wasn't thinking of doing any servicing before userspace gets it
> > >
> > >>> exits, before userspace services the interrupt, so this seems like a
> > >
> > > Yeah, although I would say unmask rather than eoi, since as Eric just
> > > pointed out IRQCHIP_EOI_THREADED is not supported by GIC. IOW, the GIC
> > > irqchip does not "require eoi() on unmask in threaded mode"
> > > (https://elixir.bootlin.com/linux/v5.10.7/source/include/linux/irq.h#L575).
> > > The unmask will happen after the thread exits, due to
> > > irq_finalize_oneshot() being called from irq_thread_fn()
> > >
> > >>> case where we'd need to mask the irq regardless of fasteoi handling so
> > >>> that it cannot re-assert before userspace manages the device.  Our
> > >>> existing autmasked for level triggered interrupts should handle this.
> > >
> > > Yeah that's what I want as well. The thing that was tripping me up is
> > > this description of irq_disable (disable_irq_nosync() as used in VFIO
> > > is a wrapper for irq_disable):
> > > https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371
> > > . This makes it seem like depending on irqchip internals (whether chip
> > > implements irq_disable() callback or not, and what that callback
> > > actually does), we may not be actually masking the irq at the irqchip
> > > level during the disable (although if the IRQ_DISABLE_UNLAZY flag is
> > > set then irq_disable() will lead to mask() being called in the case
> > > the irqchip doesn't implement the disable() callback. But in the
> > > normal case I think that flag is not set). So seemed to me with
> > > ONESHOT we would run the risk of an extra pending interrupt that was
> > > never intended by the hardware (i.e. "an interrupt happens, then the
> > > interrupt flow handler masks the line at the hardware level and marks
> > > it pending"). I guess if we know we are going to ignore this pending
> > > interrupt down the line after guest/userspace has finished with the
> > > interrupt injection then this isn't an issue.
> >
> > Here is my understanding.
> >
> > if the IRQ has been automasked by VFIO on entry in
> > vfio_automasked_irq_handler(), the corresponding SPI (shared peripheral
> > interrupt) has been disabled and cannot be triggered by the GIC anymore
> > (until corresponding unmask). The physical IRQ is deactivated (EOI)
> > after the host ISR by the host genirq code. So the active state of the
> > SPI is removed at this point. When the guest deactivates the vIRQ (so I
> > understand after the completion of the guest vIRQ thread), VFIO unmask
> > handler gets called, the physical SPI is re-enabled. At this point, the
> > GIC scans the physical line again and depending on its state triggers
> > the physical IRQ again.
> >
> > https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
> > slide 7 for the SPI states and slide 7 for the KVM/ARM forwarding flow
>
> Ok thanks for the explanation. Sounds like you and Alex are in
> agreement that there shouldn't be a problem with seeing extra
> erroneous pending interrupts upon calling vfio_platform_unmask() in
> the host when doing ONESHOT interrupt forwarding. I have some HW
> devices that use ONESHOT interrupts that I have working in a guest VM,
> although I had used some of the hacks described in my original email
> to get things working. I'll go back and see if everything works well
> with none of these modifications to vfio-platform / guest driver.

Looks like the HW I have that uses ONESHOT interrupts works fine with
vfio-platform IRQ forwarding out of the box. There were some bugs on
my end with kernel modules not being loaded that caused an interrupt
storm and for some reason I need to artificially force a call to
vfio_platform_unmask() for the unmask eventfd at the start of VM
execution since the first interrupt gets missed by the VM or
something. After that everything works normally though AFAICT.

Thanks for the help.

>
> >
> > Eric
> > >
> > >>
> > >> that's my understanding too. If we keep the current automasked level
> > >> sensitive vfio driver scheme and if the guest deactivates the vIRQ when
> > >> the guest irq thread has completed its job we should be good.
> > >
> > > So you're saying even if we get a pending interrupt VFIO will
> > > correctly ignore it? Or you're saying we won't get one since the IRQ
> > > will be masked (by some guarantee I don't yet understand)?
> > >
> > >>>
> > >>>> The most obvious way I see to handle this is to add a threaded handler
> > >>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
> > >>>> serviced the interrupt and the unmask_handler[14] has been called, at
> > >>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
> > >>>> can finally unmask the interrupt.
> > >>>
> > >>> An interrupt thread with an indeterminate, user controlled runtime
> > >>> seems bad.  The fact that fasteoi will send an eoi doesn't also mean
> > >>> that it can't be masked.
> > >
> > > Yeah ok. And as mentioned above doesn't look like we're doing the eoi
> > > on ARM GIC anyway.
> > >
> > >>>
> > >>>> Does this sound like a reasonable approach and something you would be
> > >>>> fine with adding to vfio-platform? If so I could get started looking
> > >>>> at the implementation for how to sleep in the threaded handler in
> > >>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
> > >>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
> > >>>> only contains info regarding active-low vs active-high and edge vs
> > >>>> level trigger. That means that vfio-platform can't figure out that a
> > >>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
> > >>>> trigger type, and by extension QEMU can't learn this information
> > >>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
> > >>>> of knowing (i.e. command line option to QEMU).
> > >>>
> > >>> Seems like existing level handling w/ masking should handle this, imo.
> > >>>
> > >>>> I guess potentially another option would be to treat ONESHOT
> > >>>> interrupts like regular level triggered interrupts from the
> > >>>> perspective of vfio-platform, but somehow ensure the interrupt stays
> > >>>> masked during injection to the guest, rather than just disabled. I'm
> > >>>> not sure whether this could cause legitimate interrupts coming from
> > >>>> devices to be missed while the injection for an existing interrupt is
> > >>>> underway, but maybe this is a rare enough scenario that we wouldn't
> > >>>> care. The main issue with this approach is that handle_level_irq()[16]
> > >>>> will try to unmask the irq out from under us after we start the
> > >>>> injection (as it is already masked before
> > >>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
> > >>>> the irqchip level supports nesting or not.
> > >>>
> > >>> I'd expect either an unmask at the controller or eoi to re-evaluate the
> > >>> interrupt condition and re-assert as needed.  The interrupt will need to
> > >>> be exclusive to the device so as not to starve other devices.
> > >> To me the physical IRQ pending state depends on the line level.
> > >> Depending on this line level, on the unmask the irqchip reevaluates
> > >> whether it should be fired.
> > >>
> > >> Thanks
> > >>
> > >> Eric
> > >>>
> > >>>> Let me know if you think either of these are viable options for adding
> > >>>> ONESHOT interrupt forwarding support to vfio-platform?
> > >>>>
> > >>>> Thanks,
> > >>>> Micah
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> Additional note about level triggered vs ONESHOT irq forwarding:
> > >>>> For the regular type of level triggered interrupt described above, the
> > >>>> vfio handler will call disable_irq_nosync()[18] before the
> > >>>> handle_level_irq() function unmasks the irq and returns. This ensures
> > >>>> if new interrupts come in on the line while the existing one is being
> > >>>> handled by the guest (and the irq is therefore disabled), that the
> > >>>> vfio_automasked_irq_handler() isn’t triggered again until the
> > >>>> vfio_platform_unmask_handler() function has been triggered by the
> > >>>> guest (causing the irq to be re-enabled[19]). In other words, the
> > >>>> purpose of the irq enable/disable that already exists in vfio-platform
> > >>>> is a higher level concept that delays handling of additional
> > >>>> level-triggered interrupts in the host until the current one has been
> > >>>> handled in the guest.
> > >>>
> > >>> I wouldn't say "delays", the interrupt condition is not re-evaluated
> > >>> until the interrupt is unmasked, by which point the user has had an
> > >>> opportunity to service the device, which could de-assert the interrupt
> > >>> such that there is no pending interrupt on unmask.  It therefore blocks
> > >
> > > No pending interrupt on unmask definitely seems like what we want for
> > > ONESHOT. You're saying we have that?
> > >
> > >>> further interrupts until user serviced and unmasked.
> > >>>
> > >>>> This means that the existing level triggered interrupt forwarding
> > >>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
> > >>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
> > >>>> regular level triggered interrupt in the host and use the existing
> > >>>> vfio forwarding code). The masking that needs to happen for ONESHOT
> > >>>> interrupts is at the lower level of the irqchip mask/unmask in that
> > >>>> the ONESHOT irq needs to remain masked (not just disabled) until the
> > >>>> driver’s threaded handler has completed.
> > >>>
> > >>> I don't see that this is true, unmasking the irq should cause the
> > >>> controller to re-evaluate the irq condition on the device end and issue
> > >>> a new interrupt as necessary.  Right?  Thanks,
> > >> if the line is asserted,
> > >
> > > Yeah, my concern was that the re-evaluation would come to the wrong
> > > conclusion if the ONESHOT irq was lazily disabled without doing the
> > > mask at the irqchip HW level.
> > >
> > >>>
> > >>> Alex
> > >>>
> > >>
> > >
> >

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-29 19:57           ` Micah Morton
@ 2021-01-30 16:38             ` Auger Eric
  2022-07-06 15:25               ` Dmytro Maluka
  0 siblings, 1 reply; 26+ messages in thread
From: Auger Eric @ 2021-01-30 16:38 UTC (permalink / raw)
  To: Micah Morton; +Cc: Alex Williamson, kvm

Hi Micah,

On 1/29/21 8:57 PM, Micah Morton wrote:
> On Wed, Jan 27, 2021 at 10:58 AM Micah Morton <mortonm@chromium.org> wrote:
>>
>> On Tue, Jan 26, 2021 at 11:20 AM Auger Eric <eric.auger@redhat.com> wrote:
>>>
>>> Hi Micah,
>>>
>>> On 1/26/21 4:15 PM, Micah Morton wrote:
>>>> On Tue, Jan 26, 2021 at 3:54 AM Auger Eric <eric.auger@redhat.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> On 1/25/21 9:36 PM, Alex Williamson wrote:
>>>>>> On Mon, 25 Jan 2021 10:46:47 -0500
>>>>>> Micah Morton <mortonm@chromium.org> wrote:
>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> I was recently looking into some vfio-platform passthrough stuff and
>>>>>>> came across a device I wanted to assign to a guest that uses a ONESHOT
>>>>>>> type interrupt (these type of interrupts seem to be quite common, on
>>>>>>> ARM at least). The semantics for ONESHOT interrupts are a bit
>>>>>>> different from regular level triggered interrupts as I'll describe
>>>>>>> here:
>>>>>>>
>>>>>>> The normal generic code flow for level-triggered interrupts is as follows:
>>>>>>>
>>>>>>> - regular type[1]: mask[2] the irq, then run the handler, then
>>>>>>> unmask[3] the irq and done
>>>>>>>
>>>>>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>>>>>>
>>>>>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>>>>>>> because that is assumed to be handled transparently by "modern forms
>>>>>>> of interrupt handlers, which handle the flow details in hardware"
>>>>>>>
>>>>>>> ONESHOT type interrupts are a special case of the fasteoi type
>>>>>>> described above. They rely on the driver registering a threaded
>>>>>>> handler for the interrupt and assume the irq line will remain masked
>>>>>>> until the threaded handler completes, at which time the line will be
>>>>>>> unmasked. TL;DR:
>>>>>>>
>>>>>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
>>>>>>> then unmask[8] later when the threaded handler has finished running.
>>>>>>
>>>>>> This doesn't seem quite correct to me, it skips the discussion of the
>>>>>> hard vs threaded handler, where the "regular" type would expect the
>>>>>> device interrupt to be masked in the hard handler, such that the
>>>>>> controller line can be unmasked during execution of the threaded handler
>>>>>> (if it exists).  It seems fasteoi is more transactional, ie. rather
>>>>
>>>> handle_irq_event() only waits for the hard handler to run, not the
>>>> threaded handler. And then this comment
>>>> (https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L622)
>>>> implies that the "regular" type IRQs are not normally unmasked by the
>>>> threaded handler but rather before that as part of handle_level_irq()
>>>> after handle_irq_event() has returned (since cond_unmask_irq() is
>>>> always called there and handle_irq_event() doesn't wait for the
>>>> threaded handler to run before returning). I don't actually have first
>>>> hand knowledge one way or another whether threaded handlers normally
>>>> unmask the IRQ themselves -- just reading the generic IRQ code. Let me
>>>> know if I'm missing something here.
>>>>
>>>>>> than masking around quiescing the device interrupt, we only need to
>>>>>> send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
>>>>>> means of deferring all device handling to the threaded interrupt,
>>>>>> specifically for cases such as an i2c device where the bus interaction
>>>>>> necessitates non-IRQ-context handling.  Sound right?
>>>>
>>>> The rest of this matches my understanding.
>>>>
>>>>>>
>>>>>>>
>>>>>>> For vfio-platform irq forwarding, there is no existing function in
>>>>>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
>>>>>>> for registering as the threaded handler for a ONESHOT interrupt in the
>>>>>>> case we want to request the ONESHOT irq with
>>>>>>> request_threaded_irq()[10]. Moreover, we can't just register a
>>>>>>> threaded function that simply immediately returns IRQ_HANDLED (as is
>>>>>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
>>>>>>> since that would cause the IRQ to be unmasked[13] immediately, before
>>>>>>> the userspace/guest driver has had any chance to service the
>>>>>>> interrupt.
>>>>>>
>>>>>> Are you proposing servicing the device interrupt before it's sent to
>>>>>> userspace?  A ONESHOT irq is going to eoi the interrupt when the thread
>>>>
>>>> No I wasn't thinking of doing any servicing before userspace gets it
>>>>
>>>>>> exits, before userspace services the interrupt, so this seems like a
>>>>
>>>> Yeah, although I would say unmask rather than eoi, since as Eric just
>>>> pointed out IRQCHIP_EOI_THREADED is not supported by GIC. IOW, the GIC
>>>> irqchip does not "require eoi() on unmask in threaded mode"
>>>> (https://elixir.bootlin.com/linux/v5.10.7/source/include/linux/irq.h#L575).
>>>> The unmask will happen after the thread exits, due to
>>>> irq_finalize_oneshot() being called from irq_thread_fn()
>>>>
>>>>>> case where we'd need to mask the irq regardless of fasteoi handling so
>>>>>> that it cannot re-assert before userspace manages the device.  Our
>>>>>> existing autmasked for level triggered interrupts should handle this.
>>>>
>>>> Yeah that's what I want as well. The thing that was tripping me up is
>>>> this description of irq_disable (disable_irq_nosync() as used in VFIO
>>>> is a wrapper for irq_disable):
>>>> https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371
>>>> . This makes it seem like depending on irqchip internals (whether chip
>>>> implements irq_disable() callback or not, and what that callback
>>>> actually does), we may not be actually masking the irq at the irqchip
>>>> level during the disable (although if the IRQ_DISABLE_UNLAZY flag is
>>>> set then irq_disable() will lead to mask() being called in the case
>>>> the irqchip doesn't implement the disable() callback. But in the
>>>> normal case I think that flag is not set). So seemed to me with
>>>> ONESHOT we would run the risk of an extra pending interrupt that was
>>>> never intended by the hardware (i.e. "an interrupt happens, then the
>>>> interrupt flow handler masks the line at the hardware level and marks
>>>> it pending"). I guess if we know we are going to ignore this pending
>>>> interrupt down the line after guest/userspace has finished with the
>>>> interrupt injection then this isn't an issue.
>>>
>>> Here is my understanding.
>>>
>>> if the IRQ has been automasked by VFIO on entry in
>>> vfio_automasked_irq_handler(), the corresponding SPI (shared peripheral
>>> interrupt) has been disabled and cannot be triggered by the GIC anymore
>>> (until corresponding unmask). The physical IRQ is deactivated (EOI)
>>> after the host ISR by the host genirq code. So the active state of the
>>> SPI is removed at this point. When the guest deactivates the vIRQ (so I
>>> understand after the completion of the guest vIRQ thread), VFIO unmask
>>> handler gets called, the physical SPI is re-enabled. At this point, the
>>> GIC scans the physical line again and depending on its state triggers
>>> the physical IRQ again.
>>>
>>> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
>>> slide 7 for the SPI states and slide 7 for the KVM/ARM forwarding flow
>>
>> Ok thanks for the explanation. Sounds like you and Alex are in
>> agreement that there shouldn't be a problem with seeing extra
>> erroneous pending interrupts upon calling vfio_platform_unmask() in
>> the host when doing ONESHOT interrupt forwarding. I have some HW
>> devices that use ONESHOT interrupts that I have working in a guest VM,
>> although I had used some of the hacks described in my original email
>> to get things working. I'll go back and see if everything works well
>> with none of these modifications to vfio-platform / guest driver.
> 
> Looks like the HW I have that uses ONESHOT interrupts works fine with
> vfio-platform IRQ forwarding out of the box. There were some bugs on
> my end with kernel modules not being loaded that caused an interrupt
> storm and for some reason I need to artificially force a call to
> vfio_platform_unmask() for the unmask eventfd at the start of VM
> execution since the first interrupt gets missed by the VM or
> something. After that everything works normally though AFAICT.
> 
> Thanks for the help.

You're welcome. Nice to read that ;-)

Thanks

Eric
> 
>>
>>>
>>> Eric
>>>>
>>>>>
>>>>> that's my understanding too. If we keep the current automasked level
>>>>> sensitive vfio driver scheme and if the guest deactivates the vIRQ when
>>>>> the guest irq thread has completed its job we should be good.
>>>>
>>>> So you're saying even if we get a pending interrupt VFIO will
>>>> correctly ignore it? Or you're saying we won't get one since the IRQ
>>>> will be masked (by some guarantee I don't yet understand)?
>>>>
>>>>>>
>>>>>>> The most obvious way I see to handle this is to add a threaded handler
>>>>>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
>>>>>>> serviced the interrupt and the unmask_handler[14] has been called, at
>>>>>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
>>>>>>> can finally unmask the interrupt.
>>>>>>
>>>>>> An interrupt thread with an indeterminate, user controlled runtime
>>>>>> seems bad.  The fact that fasteoi will send an eoi doesn't also mean
>>>>>> that it can't be masked.
>>>>
>>>> Yeah ok. And as mentioned above doesn't look like we're doing the eoi
>>>> on ARM GIC anyway.
>>>>
>>>>>>
>>>>>>> Does this sound like a reasonable approach and something you would be
>>>>>>> fine with adding to vfio-platform? If so I could get started looking
>>>>>>> at the implementation for how to sleep in the threaded handler in
>>>>>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
>>>>>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
>>>>>>> only contains info regarding active-low vs active-high and edge vs
>>>>>>> level trigger. That means that vfio-platform can't figure out that a
>>>>>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
>>>>>>> trigger type, and by extension QEMU can't learn this information
>>>>>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
>>>>>>> of knowing (i.e. command line option to QEMU).
>>>>>>
>>>>>> Seems like existing level handling w/ masking should handle this, imo.
>>>>>>
>>>>>>> I guess potentially another option would be to treat ONESHOT
>>>>>>> interrupts like regular level triggered interrupts from the
>>>>>>> perspective of vfio-platform, but somehow ensure the interrupt stays
>>>>>>> masked during injection to the guest, rather than just disabled. I'm
>>>>>>> not sure whether this could cause legitimate interrupts coming from
>>>>>>> devices to be missed while the injection for an existing interrupt is
>>>>>>> underway, but maybe this is a rare enough scenario that we wouldn't
>>>>>>> care. The main issue with this approach is that handle_level_irq()[16]
>>>>>>> will try to unmask the irq out from under us after we start the
>>>>>>> injection (as it is already masked before
>>>>>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
>>>>>>> the irqchip level supports nesting or not.
>>>>>>
>>>>>> I'd expect either an unmask at the controller or eoi to re-evaluate the
>>>>>> interrupt condition and re-assert as needed.  The interrupt will need to
>>>>>> be exclusive to the device so as not to starve other devices.
>>>>> To me the physical IRQ pending state depends on the line level.
>>>>> Depending on this line level, on the unmask the irqchip reevaluates
>>>>> whether it should be fired.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Eric
>>>>>>
>>>>>>> Let me know if you think either of these are viable options for adding
>>>>>>> ONESHOT interrupt forwarding support to vfio-platform?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Micah
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Additional note about level triggered vs ONESHOT irq forwarding:
>>>>>>> For the regular type of level triggered interrupt described above, the
>>>>>>> vfio handler will call disable_irq_nosync()[18] before the
>>>>>>> handle_level_irq() function unmasks the irq and returns. This ensures
>>>>>>> if new interrupts come in on the line while the existing one is being
>>>>>>> handled by the guest (and the irq is therefore disabled), that the
>>>>>>> vfio_automasked_irq_handler() isn’t triggered again until the
>>>>>>> vfio_platform_unmask_handler() function has been triggered by the
>>>>>>> guest (causing the irq to be re-enabled[19]). In other words, the
>>>>>>> purpose of the irq enable/disable that already exists in vfio-platform
>>>>>>> is a higher level concept that delays handling of additional
>>>>>>> level-triggered interrupts in the host until the current one has been
>>>>>>> handled in the guest.
>>>>>>
>>>>>> I wouldn't say "delays", the interrupt condition is not re-evaluated
>>>>>> until the interrupt is unmasked, by which point the user has had an
>>>>>> opportunity to service the device, which could de-assert the interrupt
>>>>>> such that there is no pending interrupt on unmask.  It therefore blocks
>>>>
>>>> No pending interrupt on unmask definitely seems like what we want for
>>>> ONESHOT. You're saying we have that?
>>>>
>>>>>> further interrupts until user serviced and unmasked.
>>>>>>
>>>>>>> This means that the existing level triggered interrupt forwarding
>>>>>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
>>>>>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
>>>>>>> regular level triggered interrupt in the host and use the existing
>>>>>>> vfio forwarding code). The masking that needs to happen for ONESHOT
>>>>>>> interrupts is at the lower level of the irqchip mask/unmask in that
>>>>>>> the ONESHOT irq needs to remain masked (not just disabled) until the
>>>>>>> driver’s threaded handler has completed.
>>>>>>
>>>>>> I don't see that this is true, unmasking the irq should cause the
>>>>>> controller to re-evaluate the irq condition on the device end and issue
>>>>>> a new interrupt as necessary.  Right?  Thanks,
>>>>> if the line is asserted,
>>>>
>>>> Yeah, my concern was that the re-evaluation would come to the wrong
>>>> conclusion if the ONESHOT irq was lazily disabled without doing the
>>>> mask at the irqchip HW level.
>>>>
>>>>>>
>>>>>> Alex
>>>>>>
>>>>>
>>>>
>>>
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2021-01-30 16:38             ` Auger Eric
@ 2022-07-06 15:25               ` Dmytro Maluka
  2022-07-06 20:39                 ` Sean Christopherson
  2022-07-07  8:25                 ` Eric Auger
  0 siblings, 2 replies; 26+ messages in thread
From: Dmytro Maluka @ 2022-07-06 15:25 UTC (permalink / raw)
  To: Auger Eric, Micah Morton
  Cc: Alex Williamson, kvm, Sean Christopherson, Paolo Bonzini,
	Rong L Liu, Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

Hi,

Let me revive this discussion.

It seems last time the conclusion of this discussion was that the 
existing VFIO + KVM mechanism for interrupts forwarding should work just 
as fine for "oneshot" interrupts as for normal level-triggered 
interrupts. It seems, however, that in fact it doesn't work quite 
correctly for "oneshot" interrupts, and now we are observing some real 
problems caused by it (this time on x86, but I suspect that ARM is also 
subject to the same issue).

In short, KVM always sends ack notification at the moment of guest EOI, 
which is too early in the case of oneshot interrupts. Please see my 
comments below.

On 1/30/21 17:38, Auger Eric wrote:
> Hi Micah,
> 
> On 1/29/21 8:57 PM, Micah Morton wrote:
>> On Wed, Jan 27, 2021 at 10:58 AM Micah Morton <mortonm@chromium.org> wrote:
>>>
>>> On Tue, Jan 26, 2021 at 11:20 AM Auger Eric <eric.auger@redhat.com> wrote:
>>>>
>>>> Hi Micah,
>>>>
>>>> On 1/26/21 4:15 PM, Micah Morton wrote:
>>>>> On Tue, Jan 26, 2021 at 3:54 AM Auger Eric <eric.auger@redhat.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 1/25/21 9:36 PM, Alex Williamson wrote:
>>>>>>> On Mon, 25 Jan 2021 10:46:47 -0500
>>>>>>> Micah Morton <mortonm@chromium.org> wrote:
>>>>>>>
>>>>>>>> Hi Eric,
>>>>>>>>
>>>>>>>> I was recently looking into some vfio-platform passthrough stuff and
>>>>>>>> came across a device I wanted to assign to a guest that uses a ONESHOT
>>>>>>>> type interrupt (these type of interrupts seem to be quite common, on
>>>>>>>> ARM at least). The semantics for ONESHOT interrupts are a bit
>>>>>>>> different from regular level triggered interrupts as I'll describe
>>>>>>>> here:
>>>>>>>>
>>>>>>>> The normal generic code flow for level-triggered interrupts is as follows:
>>>>>>>>
>>>>>>>> - regular type[1]: mask[2] the irq, then run the handler, then
>>>>>>>> unmask[3] the irq and done
>>>>>>>>
>>>>>>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>>>>>>>
>>>>>>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>>>>>>>> because that is assumed to be handled transparently by "modern forms
>>>>>>>> of interrupt handlers, which handle the flow details in hardware"
>>>>>>>>
>>>>>>>> ONESHOT type interrupts are a special case of the fasteoi type
>>>>>>>> described above. They rely on the driver registering a threaded
>>>>>>>> handler for the interrupt and assume the irq line will remain masked
>>>>>>>> until the threaded handler completes, at which time the line will be
>>>>>>>> unmasked. TL;DR:
>>>>>>>>
>>>>>>>> - mask[6] the irq, run the handler, and potentially eoi[7] the irq,
>>>>>>>> then unmask[8] later when the threaded handler has finished running.
>>>>>>>
>>>>>>> This doesn't seem quite correct to me, it skips the discussion of the
>>>>>>> hard vs threaded handler, where the "regular" type would expect the
>>>>>>> device interrupt to be masked in the hard handler, such that the
>>>>>>> controller line can be unmasked during execution of the threaded handler
>>>>>>> (if it exists).  It seems fasteoi is more transactional, ie. rather
>>>>>
>>>>> handle_irq_event() only waits for the hard handler to run, not the
>>>>> threaded handler. And then this comment
>>>>> (https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L622)
>>>>> implies that the "regular" type IRQs are not normally unmasked by the
>>>>> threaded handler but rather before that as part of handle_level_irq()
>>>>> after handle_irq_event() has returned (since cond_unmask_irq() is
>>>>> always called there and handle_irq_event() doesn't wait for the
>>>>> threaded handler to run before returning). I don't actually have first
>>>>> hand knowledge one way or another whether threaded handlers normally
>>>>> unmask the IRQ themselves -- just reading the generic IRQ code. Let me
>>>>> know if I'm missing something here.
>>>>>
>>>>>>> than masking around quiescing the device interrupt, we only need to
>>>>>>> send an eoi once we're ready for a new interrupt.  ONESHOT, OTOH, is a
>>>>>>> means of deferring all device handling to the threaded interrupt,
>>>>>>> specifically for cases such as an i2c device where the bus interaction
>>>>>>> necessitates non-IRQ-context handling.  Sound right?
>>>>>
>>>>> The rest of this matches my understanding.
>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> For vfio-platform irq forwarding, there is no existing function in
>>>>>>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good candidate
>>>>>>>> for registering as the threaded handler for a ONESHOT interrupt in the
>>>>>>>> case we want to request the ONESHOT irq with
>>>>>>>> request_threaded_irq()[10]. Moreover, we can't just register a
>>>>>>>> threaded function that simply immediately returns IRQ_HANDLED (as is
>>>>>>>> done in vfio_irq_handler()[11] and vfio_automasked_irq_handler()[12]),
>>>>>>>> since that would cause the IRQ to be unmasked[13] immediately, before
>>>>>>>> the userspace/guest driver has had any chance to service the
>>>>>>>> interrupt.
>>>>>>>
>>>>>>> Are you proposing servicing the device interrupt before it's sent to
>>>>>>> userspace?  A ONESHOT irq is going to eoi the interrupt when the thread
>>>>>
>>>>> No I wasn't thinking of doing any servicing before userspace gets it
>>>>>
>>>>>>> exits, before userspace services the interrupt, so this seems like a
>>>>>
>>>>> Yeah, although I would say unmask rather than eoi, since as Eric just
>>>>> pointed out IRQCHIP_EOI_THREADED is not supported by GIC. IOW, the GIC
>>>>> irqchip does not "require eoi() on unmask in threaded mode"
>>>>> (https://elixir.bootlin.com/linux/v5.10.7/source/include/linux/irq.h#L575).
>>>>> The unmask will happen after the thread exits, due to
>>>>> irq_finalize_oneshot() being called from irq_thread_fn()
>>>>>
>>>>>>> case where we'd need to mask the irq regardless of fasteoi handling so
>>>>>>> that it cannot re-assert before userspace manages the device.  Our
>>>>>>> existing autmasked for level triggered interrupts should handle this.

To Alex:
This doesn't seem true. For oneshot fasteoi irq we EOI the interrupt 
when the hardirq handler exits, just like for regular fasteoi irq. (Only 
if IRQCHIP_EOI_THREADED flag is set, EOI is postponed until the threaded 
handler exits. But IRQCHIP_EOI_THREADED is set only for a few exotic 
interrupt controllers, not for generic IO-APIC or GIC.)

This is not a problem on native, since for oneshot irq we keep the 
interrupt masked until the thread exits, so that the EOI at the end of 
hardirq doesn't result in immediate re-assert. In vfio + KVM case, 
however, the host doesn't check that the interrupt is still masked in 
the guest, so vfio_platform_unmask() is called regardless. Therefore, 
since the interrupt has not yet been acked in the guest's threaded 
handler, a new (unwanted) physical interrupt is generated in the host 
and queued for injection to the guest in vfio_automasked_irq_handler().

>>>>>
>>>>> Yeah that's what I want as well. The thing that was tripping me up is
>>>>> this description of irq_disable (disable_irq_nosync() as used in VFIO
>>>>> is a wrapper for irq_disable):
>>>>> https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371
>>>>> . This makes it seem like depending on irqchip internals (whether chip
>>>>> implements irq_disable() callback or not, and what that callback
>>>>> actually does), we may not be actually masking the irq at the irqchip
>>>>> level during the disable (although if the IRQ_DISABLE_UNLAZY flag is
>>>>> set then irq_disable() will lead to mask() being called in the case
>>>>> the irqchip doesn't implement the disable() callback. But in the
>>>>> normal case I think that flag is not set). So seemed to me with
>>>>> ONESHOT we would run the risk of an extra pending interrupt that was
>>>>> never intended by the hardware (i.e. "an interrupt happens, then the
>>>>> interrupt flow handler masks the line at the hardware level and marks
>>>>> it pending"). I guess if we know we are going to ignore this pending
>>>>> interrupt down the line after guest/userspace has finished with the
>>>>> interrupt injection then this isn't an issue.
>>>>
>>>> Here is my understanding.
>>>>
>>>> if the IRQ has been automasked by VFIO on entry in
>>>> vfio_automasked_irq_handler(), the corresponding SPI (shared peripheral
>>>> interrupt) has been disabled and cannot be triggered by the GIC anymore
>>>> (until corresponding unmask). The physical IRQ is deactivated (EOI)
>>>> after the host ISR by the host genirq code. So the active state of the
>>>> SPI is removed at this point. When the guest deactivates the vIRQ (so I
>>>> understand after the completion of the guest vIRQ thread), VFIO unmask
>>>> handler gets called, the physical SPI is re-enabled. At this point, the
>>>> GIC scans the physical line again and depending on its state triggers
>>>> the physical IRQ again.

To Eric:
Again, this doesn't seem to be true. Just as explained in my above reply 
to Alex, the guest deactivates (EOI) the vIRQ already after the 
completion of the vIRQ hardirq handler, not the vIRQ thread.

So VFIO unmask handler gets called too early, before the interrupt gets 
serviced and acked in the vIRQ thread.

>>>>
>>>> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
>>>> slide 7 for the SPI states and slide 7 for the KVM/ARM forwarding flow
>>>
>>> Ok thanks for the explanation. Sounds like you and Alex are in
>>> agreement that there shouldn't be a problem with seeing extra
>>> erroneous pending interrupts upon calling vfio_platform_unmask() in
>>> the host when doing ONESHOT interrupt forwarding. I have some HW

To Micah:
It seems this is not what Eric and Alex were implying. Anyway, even if 
we are fine with those erroneous pending interrupts firing in the host, 
I believe we are not fine with propagating them to the guest.

There are at least 2 user-visible issues caused by those extra erroneous 
pending interrupts for oneshot irq in the guest:

1. System suspend aborted due to a pending wakeup interrupt from
   ChromeOS EC (drivers/platform/chrome/cros_ec.c):

     Abort: Last active Wakeup Source: GOOG0004:00
     PM: suspend exit

2. Annoying "invalid report id data" errors from ELAN0000 touchpad
   (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
   every time the touchpad is touched.

It seems the obvious fix is to postpone sending irq ack notifications in 
KVM from EOI to unmask (for oneshot interrupts only). Luckily, we don't 
need to provide KVM with the info that the given interrupt is oneshot. 
KVM can just find it out from the fact that the interrupt is masked at 
the time of EOI.

Actually I already have a patch implementing such a fix for KVM IO-APIC 
(see below). It makes both the above issues go away. I realize this 
patch is probably racy, since it assumes that dropping ioapic->lock in 
ioapic_write_indirect() is just as fine as doing it in 
kvm_ioapic_update_eoi_one(). I just haven't figured out yet how to 
rework it in a race-free way (BTW, I'd appreciate any suggestions).

Also I guess the same issue exists in KVM on non-x86 as well (since it 
looks like on arm and powerpc kvm_notify_acked_irq() is also always 
called at EOI), as well as in user-space IO-APIC implementations in QEMU 
and other VMMs.


Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
---
  arch/x86/kvm/ioapic.c | 43 +++++++++++++++++++++++++++++++------------
  arch/x86/kvm/ioapic.h |  1 +
  2 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 765943d7cfa5..f008cb5a2bcb 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -365,8 +365,16 @@ static void ioapic_write_indirect(struct kvm_ioapic 
*ioapic, u32 val)
  			e->fields.remote_irr = 0;

  		mask_after = e->fields.mask;
-		if (mask_before != mask_after)
+		if (mask_before != mask_after) {
  			kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, 
mask_after);
+			if (!mask_after && ioapic->ack_pending & (1 << index)) {
+				spin_unlock(&ioapic->lock);
+				kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index);
+				spin_lock(&ioapic->lock);
+
+				ioapic->ack_pending &= ~(1 << index);
+			}
+		}
  		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
  		    && ioapic->irr & (1 << index))
  			ioapic_service(ioapic, index, false);
@@ -505,17 +513,28 @@ static void kvm_ioapic_update_eoi_one(struct 
kvm_vcpu *vcpu,
  	struct kvm_lapic *apic = vcpu->arch.apic;
  	union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[pin];

-	/*
-	 * We are dropping lock while calling ack notifiers because ack
-	 * notifier callbacks for assigned devices call into IOAPIC
-	 * recursively. Since remote_irr is cleared only after call
-	 * to notifiers if the same vector will be delivered while lock
-	 * is dropped it will be put into irr and will be delivered
-	 * after ack notifier returns.
-	 */
-	spin_unlock(&ioapic->lock);
-	kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin);
-	spin_lock(&ioapic->lock);
+	if (!ent->fields.mask) {
+		/*
+		 * We are dropping lock while calling ack notifiers because ack
+		 * notifier callbacks for assigned devices call into IOAPIC
+		 * recursively. Since remote_irr is cleared only after call
+		 * to notifiers if the same vector will be delivered while lock
+		 * is dropped it will be put into irr and will be delivered
+		 * after ack notifier returns.
+		 */
+		spin_unlock(&ioapic->lock);
+		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin);
+		spin_lock(&ioapic->lock);
+	} else {
+		/*
+		 * If we get an EOI while the interrupt is masked, this is likely
+		 * a Linux guest's threaded oneshot interrupt which is not really
+		 * acked yet at the moment of EOI. In any case, EOI should not
+		 * trigger re-assertion as long as the interrupt is masked,
+		 * so postpone calling ack notifiers until the guest unmasks it.
+		 */
+		ioapic->ack_pending |= 1 << pin;
+	}

  	if (trigger_mode != IOAPIC_LEVEL_TRIG ||
  	    kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 539333ac4b38..04a3e2e81b32 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -86,6 +86,7 @@ struct kvm_ioapic {
  	struct delayed_work eoi_inject;
  	u32 irq_eoi[IOAPIC_NUM_PINS];
  	u32 irr_delivered;
+	u32 ack_pending;
  };

  #ifdef DEBUG
-- 
2.37.0.rc0.161.g10f37bed90-goog


Thanks,
Dmytro

>>> devices that use ONESHOT interrupts that I have working in a guest VM,
>>> although I had used some of the hacks described in my original email
>>> to get things working. I'll go back and see if everything works well
>>> with none of these modifications to vfio-platform / guest driver.
>>
>> Looks like the HW I have that uses ONESHOT interrupts works fine with
>> vfio-platform IRQ forwarding out of the box. There were some bugs on
>> my end with kernel modules not being loaded that caused an interrupt
>> storm and for some reason I need to artificially force a call to
>> vfio_platform_unmask() for the unmask eventfd at the start of VM
>> execution since the first interrupt gets missed by the VM or
>> something. After that everything works normally though AFAICT.
>>
>> Thanks for the help.
> 
> You're welcome. Nice to read that ;-)
> 
> Thanks
> 
> Eric
>>
>>>
>>>>
>>>> Eric
>>>>>
>>>>>>
>>>>>> that's my understanding too. If we keep the current automasked level
>>>>>> sensitive vfio driver scheme and if the guest deactivates the vIRQ when
>>>>>> the guest irq thread has completed its job we should be good.
>>>>>
>>>>> So you're saying even if we get a pending interrupt VFIO will
>>>>> correctly ignore it? Or you're saying we won't get one since the IRQ
>>>>> will be masked (by some guarantee I don't yet understand)?
>>>>>
>>>>>>>
>>>>>>>> The most obvious way I see to handle this is to add a threaded handler
>>>>>>>> to vfio_platform_irq.c that waits until the userspace/guest driver has
>>>>>>>> serviced the interrupt and the unmask_handler[14] has been called, at
>>>>>>>> which point it returns IRQ_HANDLED so the generic IRQ code in the host
>>>>>>>> can finally unmask the interrupt.
>>>>>>>
>>>>>>> An interrupt thread with an indeterminate, user controlled runtime
>>>>>>> seems bad.  The fact that fasteoi will send an eoi doesn't also mean
>>>>>>> that it can't be masked.
>>>>>
>>>>> Yeah ok. And as mentioned above doesn't look like we're doing the eoi
>>>>> on ARM GIC anyway.
>>>>>
>>>>>>>
>>>>>>>> Does this sound like a reasonable approach and something you would be
>>>>>>>> fine with adding to vfio-platform? If so I could get started looking
>>>>>>>> at the implementation for how to sleep in the threaded handler in
>>>>>>>> vfio-platform until the unmask_handler is called. The most tricky/ugly
>>>>>>>> part of this is that DT has no knowledge of irq ONESHOT-ness, as it
>>>>>>>> only contains info regarding active-low vs active-high and edge vs
>>>>>>>> level trigger. That means that vfio-platform can't figure out that a
>>>>>>>> device uses a ONESHOT irq in a similar way to how it queries[15] the
>>>>>>>> trigger type, and by extension QEMU can't learn this information
>>>>>>>> through the VFIO_DEVICE_GET_IRQ_INFO ioctl, but must have another way
>>>>>>>> of knowing (i.e. command line option to QEMU).
>>>>>>>
>>>>>>> Seems like existing level handling w/ masking should handle this, imo.
>>>>>>>
>>>>>>>> I guess potentially another option would be to treat ONESHOT
>>>>>>>> interrupts like regular level triggered interrupts from the
>>>>>>>> perspective of vfio-platform, but somehow ensure the interrupt stays
>>>>>>>> masked during injection to the guest, rather than just disabled. I'm
>>>>>>>> not sure whether this could cause legitimate interrupts coming from
>>>>>>>> devices to be missed while the injection for an existing interrupt is
>>>>>>>> underway, but maybe this is a rare enough scenario that we wouldn't
>>>>>>>> care. The main issue with this approach is that handle_level_irq()[16]
>>>>>>>> will try to unmask the irq out from under us after we start the
>>>>>>>> injection (as it is already masked before
>>>>>>>> vfio_automasked_irq_handler[17] runs anyway). Not sure if masking at
>>>>>>>> the irqchip level supports nesting or not.
>>>>>>>
>>>>>>> I'd expect either an unmask at the controller or eoi to re-evaluate the
>>>>>>> interrupt condition and re-assert as needed.  The interrupt will need to
>>>>>>> be exclusive to the device so as not to starve other devices.
>>>>>> To me the physical IRQ pending state depends on the line level.
>>>>>> Depending on this line level, on the unmask the irqchip reevaluates
>>>>>> whether it should be fired.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Eric
>>>>>>>
>>>>>>>> Let me know if you think either of these are viable options for adding
>>>>>>>> ONESHOT interrupt forwarding support to vfio-platform?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Micah
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Additional note about level triggered vs ONESHOT irq forwarding:
>>>>>>>> For the regular type of level triggered interrupt described above, the
>>>>>>>> vfio handler will call disable_irq_nosync()[18] before the
>>>>>>>> handle_level_irq() function unmasks the irq and returns. This ensures
>>>>>>>> if new interrupts come in on the line while the existing one is being
>>>>>>>> handled by the guest (and the irq is therefore disabled), that the
>>>>>>>> vfio_automasked_irq_handler() isn’t triggered again until the
>>>>>>>> vfio_platform_unmask_handler() function has been triggered by the
>>>>>>>> guest (causing the irq to be re-enabled[19]). In other words, the
>>>>>>>> purpose of the irq enable/disable that already exists in vfio-platform
>>>>>>>> is a higher level concept that delays handling of additional
>>>>>>>> level-triggered interrupts in the host until the current one has been
>>>>>>>> handled in the guest.
>>>>>>>
>>>>>>> I wouldn't say "delays", the interrupt condition is not re-evaluated
>>>>>>> until the interrupt is unmasked, by which point the user has had an
>>>>>>> opportunity to service the device, which could de-assert the interrupt
>>>>>>> such that there is no pending interrupt on unmask.  It therefore blocks
>>>>>
>>>>> No pending interrupt on unmask definitely seems like what we want for
>>>>> ONESHOT. You're saying we have that?
>>>>>
>>>>>>> further interrupts until user serviced and unmasked.
>>>>>>>
>>>>>>>> This means that the existing level triggered interrupt forwarding
>>>>>>>> logic in vfio/vfio-platform is not sufficient for handling ONESHOT
>>>>>>>> interrupts (i.e. we can’t just treat a ONESHOT interrupt like a
>>>>>>>> regular level triggered interrupt in the host and use the existing
>>>>>>>> vfio forwarding code). The masking that needs to happen for ONESHOT
>>>>>>>> interrupts is at the lower level of the irqchip mask/unmask in that
>>>>>>>> the ONESHOT irq needs to remain masked (not just disabled) until the
>>>>>>>> driver’s threaded handler has completed.
>>>>>>>
>>>>>>> I don't see that this is true, unmasking the irq should cause the
>>>>>>> controller to re-evaluate the irq condition on the device end and issue
>>>>>>> a new interrupt as necessary.  Right?  Thanks,
>>>>>> if the line is asserted,
>>>>>
>>>>> Yeah, my concern was that the re-evaluation would come to the wrong
>>>>> conclusion if the ONESHOT irq was lazily disabled without doing the
>>>>> mask at the irqchip HW level.
>>>>>
>>>>>>>
>>>>>>> Alex
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
> 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-06 15:25               ` Dmytro Maluka
@ 2022-07-06 20:39                 ` Sean Christopherson
  2022-07-07  7:38                   ` Dmytro Maluka
  2022-07-07  8:25                 ` Eric Auger
  1 sibling, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-07-06 20:39 UTC (permalink / raw)
  To: Dmytro Maluka
  Cc: Auger Eric, Micah Morton, Alex Williamson, kvm, Paolo Bonzini,
	Rong L Liu, Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

On Wed, Jul 06, 2022, Dmytro Maluka wrote:
> This is not a problem on native, since for oneshot irq we keep the interrupt
> masked until the thread exits, so that the EOI at the end of hardirq doesn't
> result in immediate re-assert. In vfio + KVM case, however, the host doesn't
> check that the interrupt is still masked in the guest, so
> vfio_platform_unmask() is called regardless.

Isn't not checking that an interrupt is unmasked the real bug?  Fudging around vfio
(or whatever is doing the premature unmasking) bugs by delaying an ack notification
in KVM is a hack, no?

> Therefore, since the interrupt has not yet been acked in the guest's threaded
> handler, a new (unwanted) physical interrupt is generated in the host and
> queued for injection to the guest in vfio_automasked_irq_handler().

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-06 20:39                 ` Sean Christopherson
@ 2022-07-07  7:38                   ` Dmytro Maluka
  2022-07-07 15:00                     ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Dmytro Maluka @ 2022-07-07  7:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Auger Eric, Micah Morton, Alex Williamson, kvm, Paolo Bonzini,
	Rong L Liu, Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

Hi Sean,

On 7/6/22 10:39 PM, Sean Christopherson wrote:
> On Wed, Jul 06, 2022, Dmytro Maluka wrote:
>> This is not a problem on native, since for oneshot irq we keep the interrupt
>> masked until the thread exits, so that the EOI at the end of hardirq doesn't
>> result in immediate re-assert. In vfio + KVM case, however, the host doesn't
>> check that the interrupt is still masked in the guest, so
>> vfio_platform_unmask() is called regardless.
> 
> Isn't not checking that an interrupt is unmasked the real bug?  Fudging around vfio
> (or whatever is doing the premature unmasking) bugs by delaying an ack notification
> in KVM is a hack, no?

Yes, not checking that an interrupt is unmasked is IMO a bug, and my 
patch actually adds this missing checking, only that it adds it in KVM, 
not in VFIO. :)

Arguably it's not a bug that VFIO is not checking the guest interrupt 
state on its own, provided that the resample notification it receives is 
always a notification that the interrupt has been actually acked. That 
is the motivation behind postponing ack notification in KVM in my patch: 
it is to ensure that KVM "ack notifications" are always actual ack 
notifications (as the name suggests), not just "eoi notifications".

That said, your idea of checking the guest interrupt status in VFIO (or 
whatever is listening on the resample eventfd) makes sense to me too. 
The problem, though, is that it's KVM that knows the guest interrupt 
status, so KVM would need to let VFIO/whatever know it somehow. (I'm 
assuming we are focusing on the case of KVM kernel irqchip, not 
userspace or split irqchip.) So do you have in mind adding something 
like "maskfd" and "unmaskfd" to KVM IRQFD interface, in addition to 
resamplefd? If so, I'm actually in favor of such an idea, as I think it 
would be also useful for other purposes, regardless of oneshot interrupts.

VFIO seems to have an assumption that once a device is initialized, its 
interrupt stays unmasked all the time. I agree it might make sense to 
revisit this assumption.

Thanks,
Dmytro

> 
>> Therefore, since the interrupt has not yet been acked in the guest's threaded
>> handler, a new (unwanted) physical interrupt is generated in the host and
>> queued for injection to the guest in vfio_automasked_irq_handler().

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-06 15:25               ` Dmytro Maluka
  2022-07-06 20:39                 ` Sean Christopherson
@ 2022-07-07  8:25                 ` Eric Auger
  2022-07-07  9:15                   ` Dmytro Maluka
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Auger @ 2022-07-07  8:25 UTC (permalink / raw)
  To: Dmytro Maluka, Micah Morton
  Cc: Alex Williamson, kvm, Sean Christopherson, Paolo Bonzini,
	Rong L Liu, Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

Hi Dmytro,
On 7/6/22 17:25, Dmytro Maluka wrote:
> Hi,
>
> Let me revive this discussion.
>
> It seems last time the conclusion of this discussion was that the
> existing VFIO + KVM mechanism for interrupts forwarding should work
> just as fine for "oneshot" interrupts as for normal level-triggered
> interrupts. It seems, however, that in fact it doesn't work quite
> correctly for "oneshot" interrupts, and now we are observing some real
> problems caused by it (this time on x86, but I suspect that ARM is
> also subject to the same issue).
>
> In short, KVM always sends ack notification at the moment of guest
> EOI, which is too early in the case of oneshot interrupts. Please see
> my comments below.
>
> On 1/30/21 17:38, Auger Eric wrote:
>> Hi Micah,
>>
>> On 1/29/21 8:57 PM, Micah Morton wrote:
>>> On Wed, Jan 27, 2021 at 10:58 AM Micah Morton <mortonm@chromium.org>
>>> wrote:
>>>>
>>>> On Tue, Jan 26, 2021 at 11:20 AM Auger Eric <eric.auger@redhat.com>
>>>> wrote:
>>>>>
>>>>> Hi Micah,
>>>>>
>>>>> On 1/26/21 4:15 PM, Micah Morton wrote:
>>>>>> On Tue, Jan 26, 2021 at 3:54 AM Auger Eric
>>>>>> <eric.auger@redhat.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 1/25/21 9:36 PM, Alex Williamson wrote:
>>>>>>>> On Mon, 25 Jan 2021 10:46:47 -0500
>>>>>>>> Micah Morton <mortonm@chromium.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Eric,
>>>>>>>>>
>>>>>>>>> I was recently looking into some vfio-platform passthrough
>>>>>>>>> stuff and
>>>>>>>>> came across a device I wanted to assign to a guest that uses a
>>>>>>>>> ONESHOT
>>>>>>>>> type interrupt (these type of interrupts seem to be quite
>>>>>>>>> common, on
>>>>>>>>> ARM at least). The semantics for ONESHOT interrupts are a bit
>>>>>>>>> different from regular level triggered interrupts as I'll
>>>>>>>>> describe
>>>>>>>>> here:
>>>>>>>>>
>>>>>>>>> The normal generic code flow for level-triggered interrupts is
>>>>>>>>> as follows:
>>>>>>>>>
>>>>>>>>> - regular type[1]: mask[2] the irq, then run the handler, then
>>>>>>>>> unmask[3] the irq and done
>>>>>>>>>
>>>>>>>>> - fasteoi type[4]: run the handler, then eoi[5] the irq and done
>>>>>>>>>
>>>>>>>>> Note: IIUC the fasteoi type doesn't do any irq masking/unmasking
>>>>>>>>> because that is assumed to be handled transparently by "modern
>>>>>>>>> forms
>>>>>>>>> of interrupt handlers, which handle the flow details in hardware"
>>>>>>>>>
>>>>>>>>> ONESHOT type interrupts are a special case of the fasteoi type
>>>>>>>>> described above. They rely on the driver registering a threaded
>>>>>>>>> handler for the interrupt and assume the irq line will remain
>>>>>>>>> masked
>>>>>>>>> until the threaded handler completes, at which time the line
>>>>>>>>> will be
>>>>>>>>> unmasked. TL;DR:
>>>>>>>>>
>>>>>>>>> - mask[6] the irq, run the handler, and potentially eoi[7] the
>>>>>>>>> irq,
>>>>>>>>> then unmask[8] later when the threaded handler has finished
>>>>>>>>> running.
>>>>>>>>
>>>>>>>> This doesn't seem quite correct to me, it skips the discussion
>>>>>>>> of the
>>>>>>>> hard vs threaded handler, where the "regular" type would expect
>>>>>>>> the
>>>>>>>> device interrupt to be masked in the hard handler, such that the
>>>>>>>> controller line can be unmasked during execution of the
>>>>>>>> threaded handler
>>>>>>>> (if it exists).  It seems fasteoi is more transactional, ie.
>>>>>>>> rather
>>>>>>
>>>>>> handle_irq_event() only waits for the hard handler to run, not the
>>>>>> threaded handler. And then this comment
>>>>>> (https://elixir.bootlin.com/linux/v5.10.7/source/kernel/irq/chip.c#L622)
>>>>>>
>>>>>> implies that the "regular" type IRQs are not normally unmasked by
>>>>>> the
>>>>>> threaded handler but rather before that as part of
>>>>>> handle_level_irq()
>>>>>> after handle_irq_event() has returned (since cond_unmask_irq() is
>>>>>> always called there and handle_irq_event() doesn't wait for the
>>>>>> threaded handler to run before returning). I don't actually have
>>>>>> first
>>>>>> hand knowledge one way or another whether threaded handlers normally
>>>>>> unmask the IRQ themselves -- just reading the generic IRQ code.
>>>>>> Let me
>>>>>> know if I'm missing something here.
>>>>>>
>>>>>>>> than masking around quiescing the device interrupt, we only
>>>>>>>> need to
>>>>>>>> send an eoi once we're ready for a new interrupt.  ONESHOT,
>>>>>>>> OTOH, is a
>>>>>>>> means of deferring all device handling to the threaded interrupt,
>>>>>>>> specifically for cases such as an i2c device where the bus
>>>>>>>> interaction
>>>>>>>> necessitates non-IRQ-context handling.  Sound right?
>>>>>>
>>>>>> The rest of this matches my understanding.
>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> For vfio-platform irq forwarding, there is no existing
>>>>>>>>> function in
>>>>>>>>> drivers/vfio/platform/vfio_platform_irq.c[9] that is a good
>>>>>>>>> candidate
>>>>>>>>> for registering as the threaded handler for a ONESHOT
>>>>>>>>> interrupt in the
>>>>>>>>> case we want to request the ONESHOT irq with
>>>>>>>>> request_threaded_irq()[10]. Moreover, we can't just register a
>>>>>>>>> threaded function that simply immediately returns IRQ_HANDLED
>>>>>>>>> (as is
>>>>>>>>> done in vfio_irq_handler()[11] and
>>>>>>>>> vfio_automasked_irq_handler()[12]),
>>>>>>>>> since that would cause the IRQ to be unmasked[13] immediately,
>>>>>>>>> before
>>>>>>>>> the userspace/guest driver has had any chance to service the
>>>>>>>>> interrupt.
>>>>>>>>
>>>>>>>> Are you proposing servicing the device interrupt before it's
>>>>>>>> sent to
>>>>>>>> userspace?  A ONESHOT irq is going to eoi the interrupt when
>>>>>>>> the thread
>>>>>>
>>>>>> No I wasn't thinking of doing any servicing before userspace gets it
>>>>>>
>>>>>>>> exits, before userspace services the interrupt, so this seems
>>>>>>>> like a
>>>>>>
>>>>>> Yeah, although I would say unmask rather than eoi, since as Eric
>>>>>> just
>>>>>> pointed out IRQCHIP_EOI_THREADED is not supported by GIC. IOW,
>>>>>> the GIC
>>>>>> irqchip does not "require eoi() on unmask in threaded mode"
>>>>>> (https://elixir.bootlin.com/linux/v5.10.7/source/include/linux/irq.h#L575).
>>>>>>
>>>>>> The unmask will happen after the thread exits, due to
>>>>>> irq_finalize_oneshot() being called from irq_thread_fn()
>>>>>>
>>>>>>>> case where we'd need to mask the irq regardless of fasteoi
>>>>>>>> handling so
>>>>>>>> that it cannot re-assert before userspace manages the device.  Our
>>>>>>>> existing autmasked for level triggered interrupts should handle
>>>>>>>> this.
>
> To Alex:
> This doesn't seem true. For oneshot fasteoi irq we EOI the interrupt
> when the hardirq handler exits, just like for regular fasteoi irq.
> (Only if IRQCHIP_EOI_THREADED flag is set, EOI is postponed until the
> threaded handler exits. But IRQCHIP_EOI_THREADED is set only for a few
> exotic interrupt controllers, not for generic IO-APIC or GIC.)
>
> This is not a problem on native, since for oneshot irq we keep the
> interrupt masked until the thread exits, so that the EOI at the end of
> hardirq doesn't result in immediate re-assert. In vfio + KVM case,
> however, the host doesn't check that the interrupt is still masked in
> the guest, so vfio_platform_unmask() is called regardless. Therefore,
> since the interrupt has not yet been acked in the guest's threaded
> handler, a new (unwanted) physical interrupt is generated in the host
> and queued for injection to the guest in vfio_automasked_irq_handler().
>
>>>>>>
>>>>>> Yeah that's what I want as well. The thing that was tripping me
>>>>>> up is
>>>>>> this description of irq_disable (disable_irq_nosync() as used in
>>>>>> VFIO
>>>>>> is a wrapper for irq_disable):
>>>>>> https://elixir.bootlin.com/linux/v5.10.10/source/kernel/irq/chip.c#L371
>>>>>>
>>>>>> . This makes it seem like depending on irqchip internals (whether
>>>>>> chip
>>>>>> implements irq_disable() callback or not, and what that callback
>>>>>> actually does), we may not be actually masking the irq at the
>>>>>> irqchip
>>>>>> level during the disable (although if the IRQ_DISABLE_UNLAZY flag is
>>>>>> set then irq_disable() will lead to mask() being called in the case
>>>>>> the irqchip doesn't implement the disable() callback. But in the
>>>>>> normal case I think that flag is not set). So seemed to me with
>>>>>> ONESHOT we would run the risk of an extra pending interrupt that was
>>>>>> never intended by the hardware (i.e. "an interrupt happens, then the
>>>>>> interrupt flow handler masks the line at the hardware level and
>>>>>> marks
>>>>>> it pending"). I guess if we know we are going to ignore this pending
>>>>>> interrupt down the line after guest/userspace has finished with the
>>>>>> interrupt injection then this isn't an issue.
>>>>>
>>>>> Here is my understanding.
>>>>>
>>>>> if the IRQ has been automasked by VFIO on entry in
>>>>> vfio_automasked_irq_handler(), the corresponding SPI (shared
>>>>> peripheral
>>>>> interrupt) has been disabled and cannot be triggered by the GIC
>>>>> anymore
>>>>> (until corresponding unmask). The physical IRQ is deactivated (EOI)
>>>>> after the host ISR by the host genirq code. So the active state of
>>>>> the
>>>>> SPI is removed at this point. When the guest deactivates the vIRQ
>>>>> (so I
>>>>> understand after the completion of the guest vIRQ thread), VFIO
>>>>> unmask
>>>>> handler gets called, the physical SPI is re-enabled. At this
>>>>> point, the
>>>>> GIC scans the physical line again and depending on its state triggers
>>>>> the physical IRQ again.
>
> To Eric:
> Again, this doesn't seem to be true. Just as explained in my above
> reply to Alex, the guest deactivates (EOI) the vIRQ already after the
> completion of the vIRQ hardirq handler, not the vIRQ thread.
>
> So VFIO unmask handler gets called too early, before the interrupt
> gets serviced and acked in the vIRQ thread.
Fair enough, on vIRQ hardirq handler the physical IRQ gets unmasked.
This event occurs on guest EOI, which triggers the resamplefd. But what
is the state of the vIRQ? Isn't it stil masked until the vIRQ thread
completes, preventing the physical IRQ from being propagated to the guest?
>
>>>>>
>>>>> https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf
>>>>> slide 7 for the SPI states and slide 7 for the KVM/ARM forwarding
>>>>> flow
>>>>
>>>> Ok thanks for the explanation. Sounds like you and Alex are in
>>>> agreement that there shouldn't be a problem with seeing extra
>>>> erroneous pending interrupts upon calling vfio_platform_unmask() in
>>>> the host when doing ONESHOT interrupt forwarding. I have some HW
>
> To Micah:
> It seems this is not what Eric and Alex were implying. Anyway, even if
> we are fine with those erroneous pending interrupts firing in the
> host, I believe we are not fine with propagating them to the guest.
>
> There are at least 2 user-visible issues caused by those extra
> erroneous pending interrupts for oneshot irq in the guest:
>
> 1. System suspend aborted due to a pending wakeup interrupt from
>   ChromeOS EC (drivers/platform/chrome/cros_ec.c):
>
>     Abort: Last active Wakeup Source: GOOG0004:00
>     PM: suspend exit
>
> 2. Annoying "invalid report id data" errors from ELAN0000 touchpad
>   (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
>   every time the touchpad is touched.
>
> It seems the obvious fix is to postpone sending irq ack notifications
> in KVM from EOI to unmask (for oneshot interrupts only). Luckily, we
> don't need to provide KVM with the info that the given interrupt is
> oneshot. KVM can just find it out from the fact that the interrupt is
> masked at the time of EOI.
you mean the vIRQ right?
>
> Actually I already have a patch implementing such a fix for KVM
> IO-APIC (see below). It makes both the above issues go away. I realize
> this patch is probably racy, since it assumes that dropping
> ioapic->lock in ioapic_write_indirect() is just as fine as doing it in
> kvm_ioapic_update_eoi_one(). I just haven't figured out yet how to
> rework it in a race-free way (BTW, I'd appreciate any suggestions).
>
> Also I guess the same issue exists in KVM on non-x86 as well (since it
> looks like on arm and powerpc kvm_notify_acked_irq() is also always
> called at EOI), as well as in user-space IO-APIC implementations in
> QEMU and other VMMs.
Before going further and we invest more time in that thread, please
could you give us additional context info and confidence
in/understanding of the stakes. This thread is from Jan 2021 and was
discontinued for a while. vfio-platform currently only is enabled on ARM
and maintained for very few devices which properly implement reset
callbacks and duly use an underlying IOMMU.

Thanks

Eric
>
>
> Signed-off-by: Dmytro Maluka <dmy@semihalf.com>
> ---
>  arch/x86/kvm/ioapic.c | 43 +++++++++++++++++++++++++++++++------------
>  arch/x86/kvm/ioapic.h |  1 +
>  2 files changed, 32 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 765943d7cfa5..f008cb5a2bcb 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -365,8 +365,16 @@ static void ioapic_write_indirect(struct
> kvm_ioapic *ioapic, u32 val)
>              e->fields.remote_irr = 0;
>
>          mask_after = e->fields.mask;
> -        if (mask_before != mask_after)
> +        if (mask_before != mask_after) {
>              kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
> index, mask_after);
> +            if (!mask_after && ioapic->ack_pending & (1 << index)) {
> +                spin_unlock(&ioapic->lock);
> +                kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC,
> index);
> +                spin_lock(&ioapic->lock);
> +
> +                ioapic->ack_pending &= ~(1 << index);
> +            }
> +        }
>          if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
>              && ioapic->irr & (1 << index))
>              ioapic_service(ioapic, index, false);
> @@ -505,17 +513,28 @@ static void kvm_ioapic_update_eoi_one(struct
> kvm_vcpu *vcpu,
>      struct kvm_lapic *apic = vcpu->arch.apic;
>      union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[pin];
>
> -    /*
> -     * We are dropping lock while calling ack notifiers because ack
> -     * notifier callbacks for assigned devices call into IOAPIC
> -     * recursively. Since remote_irr is cleared only after call
> -     * to notifiers if the same vector will be delivered while lock
> -     * is dropped it will be put into irr and will be delivered
> -     * after ack notifier returns.
> -     */
> -    spin_unlock(&ioapic->lock);
> -    kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin);
> -    spin_lock(&ioapic->lock);
> +    if (!ent->fields.mask) {
> +        /*
> +         * We are dropping lock while calling ack notifiers because ack
> +         * notifier callbacks for assigned devices call into IOAPIC
> +         * recursively. Since remote_irr is cleared only after call
> +         * to notifiers if the same vector will be delivered while lock
> +         * is dropped it will be put into irr and will be delivered
> +         * after ack notifier returns.
> +         */
> +        spin_unlock(&ioapic->lock);
> +        kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, pin);
> +        spin_lock(&ioapic->lock);
> +    } else {
> +        /*
> +         * If we get an EOI while the interrupt is masked, this is
> likely
> +         * a Linux guest's threaded oneshot interrupt which is not
> really
> +         * acked yet at the moment of EOI. In any case, EOI should not
> +         * trigger re-assertion as long as the interrupt is masked,
> +         * so postpone calling ack notifiers until the guest unmasks it.
> +         */
> +        ioapic->ack_pending |= 1 << pin;
> +    }
>
>      if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>          kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index 539333ac4b38..04a3e2e81b32 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -86,6 +86,7 @@ struct kvm_ioapic {
>      struct delayed_work eoi_inject;
>      u32 irq_eoi[IOAPIC_NUM_PINS];
>      u32 irr_delivered;
> +    u32 ack_pending;
>  };
>
>  #ifdef DEBUG


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-07  8:25                 ` Eric Auger
@ 2022-07-07  9:15                   ` Dmytro Maluka
  2022-07-25 22:03                     ` Liu, Rong L
  0 siblings, 1 reply; 26+ messages in thread
From: Dmytro Maluka @ 2022-07-07  9:15 UTC (permalink / raw)
  To: eric.auger, Micah Morton
  Cc: Alex Williamson, kvm, Sean Christopherson, Paolo Bonzini,
	Rong L Liu, Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

Hi Eric,

On 7/7/22 10:25 AM, Eric Auger wrote:
>> Again, this doesn't seem to be true. Just as explained in my above
>> reply to Alex, the guest deactivates (EOI) the vIRQ already after the
>> completion of the vIRQ hardirq handler, not the vIRQ thread.
>>
>> So VFIO unmask handler gets called too early, before the interrupt
>> gets serviced and acked in the vIRQ thread.
> Fair enough, on vIRQ hardirq handler the physical IRQ gets unmasked.
> This event occurs on guest EOI, which triggers the resamplefd. But what
> is the state of the vIRQ? Isn't it stil masked until the vIRQ thread
> completes, preventing the physical IRQ from being propagated to the guest?

Even if vIRQ is still masked by the time when 
vfio_automasked_irq_handler() signals the eventfd (which in itself is 
not guaranteed, I guess), I believe KVM is buffering this event, so 
after the vIRQ is unmasked, this new IRQ will be injected to the guest 
anyway.

>> It seems the obvious fix is to postpone sending irq ack notifications
>> in KVM from EOI to unmask (for oneshot interrupts only). Luckily, we
>> don't need to provide KVM with the info that the given interrupt is
>> oneshot. KVM can just find it out from the fact that the interrupt is
>> masked at the time of EOI.
> you mean the vIRQ right?

Right.

> Before going further and we invest more time in that thread, please
> could you give us additional context info and confidence
> in/understanding of the stakes. This thread is from Jan 2021 and was
> discontinued for a while. vfio-platform currently only is enabled on ARM
> and maintained for very few devices which properly implement reset
> callbacks and duly use an underlying IOMMU.

Sure. We are not really using vfio-platform for the devices we are 
concerned with, since those are not DMA capable devices, and some of 
them are not really platform devices but I2C or SPI devices. Instead we 
are using (hopefully temporarily) Micah's module for forwarding 
arbitrary IRQs [1][2] which mostly reimplements the VFIO irq forwarding 
mechanism.

Also with a few simple hacks I managed to use vfio-platform for the same 
thing (just as a PoC) and confirmed, unsurprisingly, that the problems 
with oneshot interrupts are observed with vfio-platform as well.

[1] 
https://chromium.googlesource.com/chromiumos/third_party/kernel/+/refs/heads/chromeos-5.10-manatee/virt/lib/platirqforward.c

[2] 
https://lkml.kernel.org/kvm/CAJ-EccPU8KpU96PM2PtroLjdNVDbvnxwKwWJr2B+RBKuXEr7Vw@mail.gmail.com/T/

Thanks,
Dmytro

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-07  7:38                   ` Dmytro Maluka
@ 2022-07-07 15:00                     ` Sean Christopherson
  2022-07-07 17:38                       ` Dmytro Maluka
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-07-07 15:00 UTC (permalink / raw)
  To: Dmytro Maluka
  Cc: Auger Eric, Micah Morton, Alex Williamson, kvm, Paolo Bonzini,
	Rong L Liu, Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

On Thu, Jul 07, 2022, Dmytro Maluka wrote:
> Hi Sean,
> 
> On 7/6/22 10:39 PM, Sean Christopherson wrote:
> > On Wed, Jul 06, 2022, Dmytro Maluka wrote:
> > > This is not a problem on native, since for oneshot irq we keep the interrupt
> > > masked until the thread exits, so that the EOI at the end of hardirq doesn't
> > > result in immediate re-assert. In vfio + KVM case, however, the host doesn't
> > > check that the interrupt is still masked in the guest, so
> > > vfio_platform_unmask() is called regardless.
> > 
> > Isn't not checking that an interrupt is unmasked the real bug?  Fudging around vfio
> > (or whatever is doing the premature unmasking) bugs by delaying an ack notification
> > in KVM is a hack, no?
> 
> Yes, not checking that an interrupt is unmasked is IMO a bug, and my patch
> actually adds this missing checking, only that it adds it in KVM, not in
> VFIO. :)
> 
> Arguably it's not a bug that VFIO is not checking the guest interrupt state
> on its own, provided that the resample notification it receives is always a
> notification that the interrupt has been actually acked. That is the
> motivation behind postponing ack notification in KVM in my patch: it is to
> ensure that KVM "ack notifications" are always actual ack notifications (as
> the name suggests), not just "eoi notifications".

But EOI is an ACK.  It's software saying "this interrupt has been consumed".
 
> That said, your idea of checking the guest interrupt status in VFIO (or
> whatever is listening on the resample eventfd) makes sense to me too. The
> problem, though, is that it's KVM that knows the guest interrupt status, so
> KVM would need to let VFIO/whatever know it somehow. (I'm assuming we are
> focusing on the case of KVM kernel irqchip, not userspace or split irqchip.)
> So do you have in mind adding something like "maskfd" and "unmaskfd" to KVM
> IRQFD interface, in addition to resamplefd? If so, I'm actually in favor of
> such an idea, as I think it would be also useful for other purposes,
> regardless of oneshot interrupts.

Unless I'm misreading things, KVM already provides a mask notifier, irqfd just
needs to be wired up to use kvm_(un)register_irq_mask_notifier().

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-07 15:00                     ` Sean Christopherson
@ 2022-07-07 17:38                       ` Dmytro Maluka
  2022-07-12 16:02                         ` Liu, Rong L
  0 siblings, 1 reply; 26+ messages in thread
From: Dmytro Maluka @ 2022-07-07 17:38 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Auger Eric, Micah Morton, Alex Williamson, kvm, Paolo Bonzini,
	Rong L Liu, Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

On 7/7/22 17:00, Sean Christopherson wrote:
> On Thu, Jul 07, 2022, Dmytro Maluka wrote:
>> Hi Sean,
>>
>> On 7/6/22 10:39 PM, Sean Christopherson wrote:
>>> On Wed, Jul 06, 2022, Dmytro Maluka wrote:
>>>> This is not a problem on native, since for oneshot irq we keep the interrupt
>>>> masked until the thread exits, so that the EOI at the end of hardirq doesn't
>>>> result in immediate re-assert. In vfio + KVM case, however, the host doesn't
>>>> check that the interrupt is still masked in the guest, so
>>>> vfio_platform_unmask() is called regardless.
>>>
>>> Isn't not checking that an interrupt is unmasked the real bug?  Fudging around vfio
>>> (or whatever is doing the premature unmasking) bugs by delaying an ack notification
>>> in KVM is a hack, no?
>>
>> Yes, not checking that an interrupt is unmasked is IMO a bug, and my patch
>> actually adds this missing checking, only that it adds it in KVM, not in
>> VFIO. :)
>>
>> Arguably it's not a bug that VFIO is not checking the guest interrupt state
>> on its own, provided that the resample notification it receives is always a
>> notification that the interrupt has been actually acked. That is the
>> motivation behind postponing ack notification in KVM in my patch: it is to
>> ensure that KVM "ack notifications" are always actual ack notifications (as
>> the name suggests), not just "eoi notifications".
> 
> But EOI is an ACK.  It's software saying "this interrupt has been consumed".

Ok, I see we've had some mutual misunderstanding of the term "ack" here. 
EOI is an ACK in the interrupt controller sense, while I was talking 
about an ACK in the device sense, i.e. a device-specific action, done in 
a device driver's IRQ handler, which makes the device de-assert the IRQ 
line (so that the IRQ will not get re-asserted after an EOI or unmask).

So the problem I'm trying to fix stems from the peculiarity of "oneshot" 
interrupts: an ACK in the device sense is done in a threaded handler, 
i.e. after an ACK in the interrupt controller sense, not before it.

In the meantime I've realized one more reason why my patch is wrong. 
kvm_notify_acked_irq() is an internal KVM thing which is supposed to 
notify interested parts of KVM about an ACK rather in the interrupt 
controller sense, i.e. about an EOI as it is doing already.

VFIO, on the other hand, rather expects a notification about an ACK in 
the device sense. So it still seems a good idea to me to postpone 
sending notifications to VFIO until an IRQ gets unmasked, but this 
postponing should be done not for the entire kvm_notify_acked_irq() but 
only for eventfd_signal() on resamplefd in irqfd_resampler_ack().

Thanks for making me think about that.

>   
>> That said, your idea of checking the guest interrupt status in VFIO (or
>> whatever is listening on the resample eventfd) makes sense to me too. The
>> problem, though, is that it's KVM that knows the guest interrupt status, so
>> KVM would need to let VFIO/whatever know it somehow. (I'm assuming we are
>> focusing on the case of KVM kernel irqchip, not userspace or split irqchip.)
>> So do you have in mind adding something like "maskfd" and "unmaskfd" to KVM
>> IRQFD interface, in addition to resamplefd? If so, I'm actually in favor of
>> such an idea, as I think it would be also useful for other purposes,
>> regardless of oneshot interrupts.
> 
> Unless I'm misreading things, KVM already provides a mask notifier, irqfd just
> needs to be wired up to use kvm_(un)register_irq_mask_notifier().

Thanks for the tip. I'll take a look into implementing this idea.

It seems you agree that fixing this issue via a change in KVM (in irqfd, 
not in ioapic) seems to be the way to go.

An immediate problem I see with kvm_(un)register_irq_mask_notifier() is 
that it is currently available for x86 only. Also, mask notifiers are 
called under ioapic->lock (I'm not sure yet if that is good or bad news 
for us).

Thanks,
Dmytro


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-07 17:38                       ` Dmytro Maluka
@ 2022-07-12 16:02                         ` Liu, Rong L
  2022-08-04 14:44                           ` Eric Auger
  0 siblings, 1 reply; 26+ messages in thread
From: Liu, Rong L @ 2022-07-12 16:02 UTC (permalink / raw)
  To: Dmytro Maluka, Christopherson,, Sean
  Cc: Auger Eric, Micah Morton, Alex Williamson, kvm, Paolo Bonzini,
	Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov, Liu, Rong L

Hi Sean and Dmytro,

> -----Original Message-----
> From: Dmytro Maluka <dmy@semihalf.com>
> Sent: Thursday, July 7, 2022 10:39 AM
> To: Christopherson,, Sean <seanjc@google.com>
> Cc: Auger Eric <eric.auger@redhat.com>; Micah Morton
> <mortonm@chromium.org>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Paolo Bonzini
> <pbonzini@redhat.com>; Liu, Rong L <rong.l.liu@intel.com>; Tomasz
> Nowicki <tn@semihalf.com>; Grzegorz Jaszczyk <jaz@semihalf.com>;
> Dmitry Torokhov <dtor@google.com>
> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
> 
> On 7/7/22 17:00, Sean Christopherson wrote:
> > On Thu, Jul 07, 2022, Dmytro Maluka wrote:
> >> Hi Sean,
> >>
> >> On 7/6/22 10:39 PM, Sean Christopherson wrote:
> >>> On Wed, Jul 06, 2022, Dmytro Maluka wrote:
> >>>> This is not a problem on native, since for oneshot irq we keep the
> >>>> interrupt masked until the thread exits, so that the EOI at the end
> >>>> of hardirq doesn't result in immediate re-assert. In vfio + KVM
> >>>> case, however, the host doesn't check that the interrupt is still
> >>>> masked in the guest, so
> >>>> vfio_platform_unmask() is called regardless.
> >>>
> >>> Isn't not checking that an interrupt is unmasked the real bug?
> >>> Fudging around vfio (or whatever is doing the premature unmasking)
> >>> bugs by delaying an ack notification in KVM is a hack, no?
> >>
> >> Yes, not checking that an interrupt is unmasked is IMO a bug, and my
> >> patch actually adds this missing checking, only that it adds it in
> >> KVM, not in VFIO. :)
> >>
> >> Arguably it's not a bug that VFIO is not checking the guest interrupt
> >> state on its own, provided that the resample notification it receives
> >> is always a notification that the interrupt has been actually acked.
> >> That is the motivation behind postponing ack notification in KVM in
> >> my patch: it is to ensure that KVM "ack notifications" are always
> >> actual ack notifications (as the name suggests), not just "eoi
> notifications".
> >
> > But EOI is an ACK.  It's software saying "this interrupt has been
> consumed".
> 
> Ok, I see we've had some mutual misunderstanding of the term "ack"
> here.
> EOI is an ACK in the interrupt controller sense, while I was talking about
> an ACK in the device sense, i.e. a device-specific action, done in a device
> driver's IRQ handler, which makes the device de-assert the IRQ line (so
> that the IRQ will not get re-asserted after an EOI or unmask).
> 
> So the problem I'm trying to fix stems from the peculiarity of "oneshot"
> interrupts: an ACK in the device sense is done in a threaded handler, i.e.
> after an ACK in the interrupt controller sense, not before it.
> 
> In the meantime I've realized one more reason why my patch is wrong.
> kvm_notify_acked_irq() is an internal KVM thing which is supposed to
> notify interested parts of KVM about an ACK rather in the interrupt
> controller sense, i.e. about an EOI as it is doing already.
> 
> VFIO, on the other hand, rather expects a notification about an ACK in the
> device sense. So it still seems a good idea to me to postpone sending
> notifications to VFIO until an IRQ gets unmasked, but this postponing
> should be done not for the entire kvm_notify_acked_irq() but only for
> eventfd_signal() on resamplefd in irqfd_resampler_ack().
> 
> Thanks for making me think about that.
> 
> >
> >> That said, your idea of checking the guest interrupt status in VFIO
> >> (or whatever is listening on the resample eventfd) makes sense to me
> >> too. The problem, though, is that it's KVM that knows the guest
> >> interrupt status, so KVM would need to let VFIO/whatever know it
> >> somehow. (I'm assuming we are focusing on the case of KVM kernel
> >> irqchip, not userspace or split irqchip.) So do you have in mind
> >> adding something like "maskfd" and "unmaskfd" to KVM IRQFD
> interface,
> >> in addition to resamplefd? If so, I'm actually in favor of such an
> >> idea, as I think it would be also useful for other purposes, regardless
> of oneshot interrupts.
> >
> > Unless I'm misreading things, KVM already provides a mask notifier,
> > irqfd just needs to be wired up to use
> kvm_(un)register_irq_mask_notifier().
> 

Interesting...  I initially thought that kvm doesn't "trap" on ioapic's mmio
write.  However, I just traced kvm/ioapic.c and it turns out
ioapic_write_indirect() was called many times.   Does trapping on ioapic's mmio
write cause vmexit on edge-triggered interrupt exit?  It seems the case because
IOREGSEL and IOWIN of IOAPIC are memory mapped but not the redirection entry
register for each IRQ (that is why the name indirect_write), in order to unmask
redirection entry register on the exit of each interrupt (edge-triggered or
level-triggered), kernel needs to write to IORESEL, which means vmexit if kvm
traps on ioapic's mmio write.  However, for pass-thru device which uses
edge-triggered interrupt (handled by vfio or something similar),  interrupt
(pIRQ) is enabled by vfio and it seems unnecessary to cause a vmexit when guest
updates virtual ioapic.  I think the situation is similar for level-triggered
interrupt.  So 2 vmexits for each level-triggered interrupt completion, one for
EOI on lapic and another for unmask of IOAPIC register.  Does this sound right? 
I thought with vfio (or similar architecture), there is no vmexit necessary on
edge-triggered interrupt completion and only one vmexit for level triggered
interrupt completion, except the caveats of oneshot interrupt.  Maybe I
misunderstand something?

> Thanks for the tip. I'll take a look into implementing this idea.
> 
> It seems you agree that fixing this issue via a change in KVM (in irqfd, not
> in ioapic) seems to be the way to go.
> 

> An immediate problem I see with kvm_(un)register_irq_mask_notifier()
> is that it is currently available for x86 only. Also, mask notifiers are called
> under ioapic->lock (I'm not sure yet if that is good or bad news for us).
> 
> Thanks,
> Dmytro


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-07  9:15                   ` Dmytro Maluka
@ 2022-07-25 22:03                     ` Liu, Rong L
  2022-08-04 14:39                       ` Eric Auger
  0 siblings, 1 reply; 26+ messages in thread
From: Liu, Rong L @ 2022-07-25 22:03 UTC (permalink / raw)
  To: Dmytro Maluka, eric.auger, Micah Morton
  Cc: Alex Williamson, kvm, Christopherson,,
	Sean, Paolo Bonzini, Tomasz Nowicki, Grzegorz Jaszczyk,
	Dmitry Torokhov

Hi Eric,

> -----Original Message-----
> From: Dmytro Maluka <dmy@semihalf.com>
> Sent: Thursday, July 7, 2022 2:16 AM
> To: eric.auger@redhat.com; Micah Morton <mortonm@chromium.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>;
> kvm@vger.kernel.org; Christopherson,, Sean <seanjc@google.com>;
> Paolo Bonzini <pbonzini@redhat.com>; Liu, Rong L
> <rong.l.liu@intel.com>; Tomasz Nowicki <tn@semihalf.com>; Grzegorz
> Jaszczyk <jaz@semihalf.com>; Dmitry Torokhov <dtor@google.com>
> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
> 
> Hi Eric,
> 
> On 7/7/22 10:25 AM, Eric Auger wrote:
> >> Again, this doesn't seem to be true. Just as explained in my above
> >> reply to Alex, the guest deactivates (EOI) the vIRQ already after the
> >> completion of the vIRQ hardirq handler, not the vIRQ thread.
> >>
> >> So VFIO unmask handler gets called too early, before the interrupt
> >> gets serviced and acked in the vIRQ thread.
> > Fair enough, on vIRQ hardirq handler the physical IRQ gets unmasked.
> > This event occurs on guest EOI, which triggers the resamplefd. But what
> > is the state of the vIRQ? Isn't it stil masked until the vIRQ thread
> > completes, preventing the physical IRQ from being propagated to the
> guest?
> 
> Even if vIRQ is still masked by the time when
> vfio_automasked_irq_handler() signals the eventfd (which in itself is
> not guaranteed, I guess), I believe KVM is buffering this event, so
> after the vIRQ is unmasked, this new IRQ will be injected to the guest
> anyway.
> 
> >> It seems the obvious fix is to postpone sending irq ack notifications
> >> in KVM from EOI to unmask (for oneshot interrupts only). Luckily, we
> >> don't need to provide KVM with the info that the given interrupt is
> >> oneshot. KVM can just find it out from the fact that the interrupt is
> >> masked at the time of EOI.
> > you mean the vIRQ right?
> 
> Right.
> 
> > Before going further and we invest more time in that thread, please
> > could you give us additional context info and confidence
> > in/understanding of the stakes. This thread is from Jan 2021 and was
> > discontinued for a while. vfio-platform currently only is enabled on
> ARM
> > and maintained for very few devices which properly implement reset
> > callbacks and duly use an underlying IOMMU.

Do you have more questions about this issue after following info and POC from
Dmytro?
I agree that we tried to extend the vfio infrastructure to x86 and a few more
devices which may not "traditionally" supported by current vfio implementation. 
However if we view vfio as a general infrastructure to be used for pass-thru
devices (this is what we intend to do, implementation may vary),   Oneshot
interrupt is not properly handled. 

From this discussion when oneshot interrupt is first upstreamed:
https://lkml.iu.edu/hypermail/linux/kernel/0908.1/02114.html it says: "... we
need to keep the interrupt line masked until the threaded handler has executed. 
... The interrupt line is unmasked after the thread handler function has been
executed." using today's vfio architecture, (physical) interrupt line is
unmasked by vfio after EOI introduced vmexit, instead after the threaded
function has been executed (or in x86 world, when virtual interrupt is
unmasked): this totally violates how oneshot irq should be used.   We have a few
internal discussions and we couldn't find a solution which are both correct and
efficient.  But at least we can target a "correct" solution first and that will
help us resolve bugs we have on our products now.

> Sure. We are not really using vfio-platform for the devices we are
> concerned with, since those are not DMA capable devices, and some of
> them are not really platform devices but I2C or SPI devices. Instead we
> are using (hopefully temporarily) Micah's module for forwarding
> arbitrary IRQs [1][2] which mostly reimplements the VFIO irq forwarding
> mechanism.
> 
> Also with a few simple hacks I managed to use vfio-platform for the same
> thing (just as a PoC) and confirmed, unsurprisingly, that the problems
> with oneshot interrupts are observed with vfio-platform as well.
> 
> [1]
> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/
> refs/heads/chromeos-5.10-manatee/virt/lib/platirqforward.c
> 
> [2]
> https://lkml.kernel.org/kvm/CAJ-
> EccPU8KpU96PM2PtroLjdNVDbvnxwKwWJr2B+RBKuXEr7Vw@mail.gmail
> .com/T/
> 
> Thanks,
> Dmytro

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-25 22:03                     ` Liu, Rong L
@ 2022-08-04 14:39                       ` Eric Auger
  2022-08-08 17:34                         ` Liu, Rong L
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Auger @ 2022-08-04 14:39 UTC (permalink / raw)
  To: Liu, Rong L, Dmytro Maluka, Micah Morton
  Cc: Alex Williamson, kvm, Christopherson,,
	Sean, Paolo Bonzini, Tomasz Nowicki, Grzegorz Jaszczyk,
	Dmitry Torokhov

H Liu,

On 7/26/22 00:03, Liu, Rong L wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Dmytro Maluka <dmy@semihalf.com>
>> Sent: Thursday, July 7, 2022 2:16 AM
>> To: eric.auger@redhat.com; Micah Morton <mortonm@chromium.org>
>> Cc: Alex Williamson <alex.williamson@redhat.com>;
>> kvm@vger.kernel.org; Christopherson,, Sean <seanjc@google.com>;
>> Paolo Bonzini <pbonzini@redhat.com>; Liu, Rong L
>> <rong.l.liu@intel.com>; Tomasz Nowicki <tn@semihalf.com>; Grzegorz
>> Jaszczyk <jaz@semihalf.com>; Dmitry Torokhov <dtor@google.com>
>> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
>>
>> Hi Eric,
>>
>> On 7/7/22 10:25 AM, Eric Auger wrote:
>>>> Again, this doesn't seem to be true. Just as explained in my above
>>>> reply to Alex, the guest deactivates (EOI) the vIRQ already after the
>>>> completion of the vIRQ hardirq handler, not the vIRQ thread.
>>>>
>>>> So VFIO unmask handler gets called too early, before the interrupt
>>>> gets serviced and acked in the vIRQ thread.
>>> Fair enough, on vIRQ hardirq handler the physical IRQ gets unmasked.
>>> This event occurs on guest EOI, which triggers the resamplefd. But what
>>> is the state of the vIRQ? Isn't it stil masked until the vIRQ thread
>>> completes, preventing the physical IRQ from being propagated to the
>> guest?
>>
>> Even if vIRQ is still masked by the time when
>> vfio_automasked_irq_handler() signals the eventfd (which in itself is
>> not guaranteed, I guess), I believe KVM is buffering this event, so
>> after the vIRQ is unmasked, this new IRQ will be injected to the guest
>> anyway.
>>
>>>> It seems the obvious fix is to postpone sending irq ack notifications
>>>> in KVM from EOI to unmask (for oneshot interrupts only). Luckily, we
>>>> don't need to provide KVM with the info that the given interrupt is
>>>> oneshot. KVM can just find it out from the fact that the interrupt is
>>>> masked at the time of EOI.
>>> you mean the vIRQ right?
>> Right.
>>
>>> Before going further and we invest more time in that thread, please
>>> could you give us additional context info and confidence
>>> in/understanding of the stakes. This thread is from Jan 2021 and was
>>> discontinued for a while. vfio-platform currently only is enabled on
>> ARM
>>> and maintained for very few devices which properly implement reset
>>> callbacks and duly use an underlying IOMMU.
> Do you have more questions about this issue after following info and POC from
> Dmytro?
> I agree that we tried to extend the vfio infrastructure to x86 and a few more
> devices which may not "traditionally" supported by current vfio implementation. 
> However if we view vfio as a general infrastructure to be used for pass-thru
> devices (this is what we intend to do, implementation may vary),   Oneshot
> interrupt is not properly handled. 

sorry for the delay, I was out of office and it took me some time to
catch up.

Yes the problem and context is clearer now after the last emails. I now
understand the vEOI (inducing the VFIO pIRQ unmask) is done before the
device interrupt line is deasserted by the threaded handler and the vIRQ
unmask is done, causing spurious hits of the same oneshot IRQ.

Thanks

Eric
>
> From this discussion when oneshot interrupt is first upstreamed:
> https://lkml.iu.edu/hypermail/linux/kernel/0908.1/02114.html it says: "... we
> need to keep the interrupt line masked until the threaded handler has executed. 
> ... The interrupt line is unmasked after the thread handler function has been
> executed." using today's vfio architecture, (physical) interrupt line is
> unmasked by vfio after EOI introduced vmexit, instead after the threaded
> function has been executed (or in x86 world, when virtual interrupt is
> unmasked): this totally violates how oneshot irq should be used.   We have a few
> internal discussions and we couldn't find a solution which are both correct and
> efficient.  But at least we can target a "correct" solution first and that will
> help us resolve bugs we have on our products now.
>
>> Sure. We are not really using vfio-platform for the devices we are
>> concerned with, since those are not DMA capable devices, and some of
>> them are not really platform devices but I2C or SPI devices. Instead we
>> are using (hopefully temporarily) Micah's module for forwarding
>> arbitrary IRQs [1][2] which mostly reimplements the VFIO irq forwarding
>> mechanism.
>>
>> Also with a few simple hacks I managed to use vfio-platform for the same
>> thing (just as a PoC) and confirmed, unsurprisingly, that the problems
>> with oneshot interrupts are observed with vfio-platform as well.
>>
>> [1]
>> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/
>> refs/heads/chromeos-5.10-manatee/virt/lib/platirqforward.c
>>
>> [2]
>> https://lkml.kernel.org/kvm/CAJ-
>> EccPU8KpU96PM2PtroLjdNVDbvnxwKwWJr2B+RBKuXEr7Vw@mail.gmail
>> .com/T/
>>
>> Thanks,
>> Dmytro


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: Add vfio-platform support for ONESHOT irq forwarding?
  2022-07-12 16:02                         ` Liu, Rong L
@ 2022-08-04 14:44                           ` Eric Auger
  2022-08-08 21:15                             ` Liu, Rong L
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Auger @ 2022-08-04 14:44 UTC (permalink / raw)
  To: Liu, Rong L, Dmytro Maluka, Christopherson,, Sean
  Cc: Micah Morton, Alex Williamson, kvm, Paolo Bonzini,
	Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

Hi,
On 7/12/22 18:02, Liu, Rong L wrote:
> Hi Sean and Dmytro,
>
>> -----Original Message-----
>> From: Dmytro Maluka <dmy@semihalf.com>
>> Sent: Thursday, July 7, 2022 10:39 AM
>> To: Christopherson,, Sean <seanjc@google.com>
>> Cc: Auger Eric <eric.auger@redhat.com>; Micah Morton
>> <mortonm@chromium.org>; Alex Williamson
>> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Paolo Bonzini
>> <pbonzini@redhat.com>; Liu, Rong L <rong.l.liu@intel.com>; Tomasz
>> Nowicki <tn@semihalf.com>; Grzegorz Jaszczyk <jaz@semihalf.com>;
>> Dmitry Torokhov <dtor@google.com>
>> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
>>
>> On 7/7/22 17:00, Sean Christopherson wrote:
>>> On Thu, Jul 07, 2022, Dmytro Maluka wrote:
>>>> Hi Sean,
>>>>
>>>> On 7/6/22 10:39 PM, Sean Christopherson wrote:
>>>>> On Wed, Jul 06, 2022, Dmytro Maluka wrote:
>>>>>> This is not a problem on native, since for oneshot irq we keep the
>>>>>> interrupt masked until the thread exits, so that the EOI at the end
>>>>>> of hardirq doesn't result in immediate re-assert. In vfio + KVM
>>>>>> case, however, the host doesn't check that the interrupt is still
>>>>>> masked in the guest, so
>>>>>> vfio_platform_unmask() is called regardless.
>>>>> Isn't not checking that an interrupt is unmasked the real bug?
>>>>> Fudging around vfio (or whatever is doing the premature unmasking)
>>>>> bugs by delaying an ack notification in KVM is a hack, no?
>>>> Yes, not checking that an interrupt is unmasked is IMO a bug, and my
>>>> patch actually adds this missing checking, only that it adds it in
>>>> KVM, not in VFIO. :)
>>>>
>>>> Arguably it's not a bug that VFIO is not checking the guest interrupt
>>>> state on its own, provided that the resample notification it receives
>>>> is always a notification that the interrupt has been actually acked.
>>>> That is the motivation behind postponing ack notification in KVM in
>>>> my patch: it is to ensure that KVM "ack notifications" are always
>>>> actual ack notifications (as the name suggests), not just "eoi
>> notifications".
>>> But EOI is an ACK.  It's software saying "this interrupt has been
>> consumed".
>>
>> Ok, I see we've had some mutual misunderstanding of the term "ack"
>> here.
>> EOI is an ACK in the interrupt controller sense, while I was talking about
>> an ACK in the device sense, i.e. a device-specific action, done in a device
>> driver's IRQ handler, which makes the device de-assert the IRQ line (so
>> that the IRQ will not get re-asserted after an EOI or unmask).
>>
>> So the problem I'm trying to fix stems from the peculiarity of "oneshot"
>> interrupts: an ACK in the device sense is done in a threaded handler, i.e.
>> after an ACK in the interrupt controller sense, not before it.
>>
>> In the meantime I've realized one more reason why my patch is wrong.
>> kvm_notify_acked_irq() is an internal KVM thing which is supposed to
>> notify interested parts of KVM about an ACK rather in the interrupt
>> controller sense, i.e. about an EOI as it is doing already.
>>
>> VFIO, on the other hand, rather expects a notification about an ACK in the
>> device sense. So it still seems a good idea to me to postpone sending
>> notifications to VFIO until an IRQ gets unmasked, but this postponing
>> should be done not for the entire kvm_notify_acked_irq() but only for
>> eventfd_signal() on resamplefd in irqfd_resampler_ack().
>>
>> Thanks for making me think about that.
>>
>>>> That said, your idea of checking the guest interrupt status in VFIO
>>>> (or whatever is listening on the resample eventfd) makes sense to me
>>>> too. The problem, though, is that it's KVM that knows the guest
>>>> interrupt status, so KVM would need to let VFIO/whatever know it
>>>> somehow. (I'm assuming we are focusing on the case of KVM kernel
>>>> irqchip, not userspace or split irqchip.) So do you have in mind
>>>> adding something like "maskfd" and "unmaskfd" to KVM IRQFD
>> interface,
>>>> in addition to resamplefd? If so, I'm actually in favor of such an
>>>> idea, as I think it would be also useful for other purposes, regardless
>> of oneshot interrupts.
>>> Unless I'm misreading things, KVM already provides a mask notifier,
>>> irqfd just needs to be wired up to use
>> kvm_(un)register_irq_mask_notifier().
>>
> Interesting...  I initially thought that kvm doesn't "trap" on ioapic's mmio
> write.  However, I just traced kvm/ioapic.c and it turns out
> ioapic_write_indirect() was called many times.   Does trapping on ioapic's mmio
> write cause vmexit on edge-triggered interrupt exit?  It seems the case because
> IOREGSEL and IOWIN of IOAPIC are memory mapped but not the redirection entry
> register for each IRQ (that is why the name indirect_write), in order to unmask
> redirection entry register on the exit of each interrupt (edge-triggered or
> level-triggered), kernel needs to write to IORESEL, which means vmexit if kvm
> traps on ioapic's mmio write.  However, for pass-thru device which uses
> edge-triggered interrupt (handled by vfio or something similar),  interrupt
> (pIRQ) is enabled by vfio and it seems unnecessary to cause a vmexit when guest
> updates virtual ioapic.  I think the situation is similar for level-triggered
> interrupt.  So 2 vmexits for each level-triggered interrupt completion, one for
> EOI on lapic and another for unmask of IOAPIC register.  Does this sound right? 
> I thought with vfio (or similar architecture), there is no vmexit necessary on
> edge-triggered interrupt completion and only one vmexit for level triggered
> interrupt completion, except the caveats of oneshot interrupt.  Maybe I
> misunderstand something?
Currently, no vmexit for edge-sensitive and 1 vmexit for level-sensitive
is what happens on ARM shared peripheral interrupts at least.
Note there is one setup that could remove the need for the vmexit on
vEOI: irq_bypass mode
(https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf slide 12-14):
on GIC you have a mode that allows automatic completion of the physical
IRQ when the corresponding vIRQ is completed. This mode would not be
compatible with oneshort_irq.
At some point we worked on this enablement but given the lack of
vfio-platform customers this work was paused so we still have the
mask/unmask vfio dance.

Thanks

Eric

>
>> Thanks for the tip. I'll take a look into implementing this idea.
>>
>> It seems you agree that fixing this issue via a change in KVM (in irqfd, not
>> in ioapic) seems to be the way to go.
>>
>> An immediate problem I see with kvm_(un)register_irq_mask_notifier()
>> is that it is currently available for x86 only. Also, mask notifiers are called
>> under ioapic->lock (I'm not sure yet if that is good or bad news for us).
>>
>> Thanks,
>> Dmytro


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Add vfio-platform support for ONESHOT irq forwarding?
  2022-08-04 14:39                       ` Eric Auger
@ 2022-08-08 17:34                         ` Liu, Rong L
  0 siblings, 0 replies; 26+ messages in thread
From: Liu, Rong L @ 2022-08-08 17:34 UTC (permalink / raw)
  To: eric.auger, Dmytro Maluka, Micah Morton
  Cc: Alex Williamson, kvm, Christopherson,,
	Sean, Paolo Bonzini, Tomasz Nowicki, Grzegorz Jaszczyk,
	Dmitry Torokhov

Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, August 4, 2022 7:39 AM
> To: Liu, Rong L <rong.l.liu@intel.com>; Dmytro Maluka
> <dmy@semihalf.com>; Micah Morton <mortonm@chromium.org>
> Cc: Alex Williamson <alex.williamson@redhat.com>;
> kvm@vger.kernel.org; Christopherson,, Sean <seanjc@google.com>;
> Paolo Bonzini <pbonzini@redhat.com>; Tomasz Nowicki
> <tn@semihalf.com>; Grzegorz Jaszczyk <jaz@semihalf.com>; Dmitry
> Torokhov <dtor@google.com>
> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
> 
> H Liu,
> 
> On 7/26/22 00:03, Liu, Rong L wrote:
> > Hi Eric,
> >
> >> -----Original Message-----
> >> From: Dmytro Maluka <dmy@semihalf.com>
> >> Sent: Thursday, July 7, 2022 2:16 AM
> >> To: eric.auger@redhat.com; Micah Morton
> <mortonm@chromium.org>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>;
> >> kvm@vger.kernel.org; Christopherson,, Sean <seanjc@google.com>;
> >> Paolo Bonzini <pbonzini@redhat.com>; Liu, Rong L
> >> <rong.l.liu@intel.com>; Tomasz Nowicki <tn@semihalf.com>;
> Grzegorz
> >> Jaszczyk <jaz@semihalf.com>; Dmitry Torokhov <dtor@google.com>
> >> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
> >>
> >> Hi Eric,
> >>
> >> On 7/7/22 10:25 AM, Eric Auger wrote:
> >>>> Again, this doesn't seem to be true. Just as explained in my above
> >>>> reply to Alex, the guest deactivates (EOI) the vIRQ already after the
> >>>> completion of the vIRQ hardirq handler, not the vIRQ thread.
> >>>>
> >>>> So VFIO unmask handler gets called too early, before the interrupt
> >>>> gets serviced and acked in the vIRQ thread.
> >>> Fair enough, on vIRQ hardirq handler the physical IRQ gets unmasked.
> >>> This event occurs on guest EOI, which triggers the resamplefd. But
> what
> >>> is the state of the vIRQ? Isn't it stil masked until the vIRQ thread
> >>> completes, preventing the physical IRQ from being propagated to the
> >> guest?
> >>
> >> Even if vIRQ is still masked by the time when
> >> vfio_automasked_irq_handler() signals the eventfd (which in itself is
> >> not guaranteed, I guess), I believe KVM is buffering this event, so
> >> after the vIRQ is unmasked, this new IRQ will be injected to the guest
> >> anyway.
> >>
> >>>> It seems the obvious fix is to postpone sending irq ack notifications
> >>>> in KVM from EOI to unmask (for oneshot interrupts only). Luckily,
> we
> >>>> don't need to provide KVM with the info that the given interrupt is
> >>>> oneshot. KVM can just find it out from the fact that the interrupt is
> >>>> masked at the time of EOI.
> >>> you mean the vIRQ right?
> >> Right.
> >>
> >>> Before going further and we invest more time in that thread, please
> >>> could you give us additional context info and confidence
> >>> in/understanding of the stakes. This thread is from Jan 2021 and was
> >>> discontinued for a while. vfio-platform currently only is enabled on
> >> ARM
> >>> and maintained for very few devices which properly implement reset
> >>> callbacks and duly use an underlying IOMMU.
> > Do you have more questions about this issue after following info and
> POC from
> > Dmytro?
> > I agree that we tried to extend the vfio infrastructure to x86 and a few
> more
> > devices which may not "traditionally" supported by current vfio
> implementation.
> > However if we view vfio as a general infrastructure to be used for pass-
> thru
> > devices (this is what we intend to do, implementation may vary),
> Oneshot
> > interrupt is not properly handled.
> 
> sorry for the delay, I was out of office and it took me some time to
> catch up.
> 
> Yes the problem and context is clearer now after the last emails. I now
> understand the vEOI (inducing the VFIO pIRQ unmask) is done before the
> device interrupt line is deasserted by the threaded handler and the vIRQ
> unmask is done, causing spurious hits of the same oneshot IRQ.
> 
> Thanks
> 
> Eric

No problem.  Good summary of the problem:-)  And thanks for confirming that
you agree oneshot interrupt handling is an issue in virtualized environment.

Thanks,

Rong
> >
> > From this discussion when oneshot interrupt is first upstreamed:
> > https://lkml.iu.edu/hypermail/linux/kernel/0908.1/02114.html it says:
> "... we
> > need to keep the interrupt line masked until the threaded handler has
> executed.
> > ... The interrupt line is unmasked after the thread handler function has
> been
> > executed." using today's vfio architecture, (physical) interrupt line is
> > unmasked by vfio after EOI introduced vmexit, instead after the
> threaded
> > function has been executed (or in x86 world, when virtual interrupt is
> > unmasked): this totally violates how oneshot irq should be used.   We
> have a few
> > internal discussions and we couldn't find a solution which are both
> correct and
> > efficient.  But at least we can target a "correct" solution first and that
> will
> > help us resolve bugs we have on our products now.
> >
> >> Sure. We are not really using vfio-platform for the devices we are
> >> concerned with, since those are not DMA capable devices, and some
> of
> >> them are not really platform devices but I2C or SPI devices. Instead we
> >> are using (hopefully temporarily) Micah's module for forwarding
> >> arbitrary IRQs [1][2] which mostly reimplements the VFIO irq
> forwarding
> >> mechanism.
> >>
> >> Also with a few simple hacks I managed to use vfio-platform for the
> same
> >> thing (just as a PoC) and confirmed, unsurprisingly, that the problems
> >> with oneshot interrupts are observed with vfio-platform as well.
> >>
> >> [1]
> >>
> https://chromium.googlesource.com/chromiumos/third_party/kernel/+/
> >> refs/heads/chromeos-5.10-manatee/virt/lib/platirqforward.c
> >>
> >> [2]
> >> https://lkml.kernel.org/kvm/CAJ-
> >>
> EccPU8KpU96PM2PtroLjdNVDbvnxwKwWJr2B+RBKuXEr7Vw@mail.gmail
> >> .com/T/
> >>
> >> Thanks,
> >> Dmytro


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: Add vfio-platform support for ONESHOT irq forwarding?
  2022-08-04 14:44                           ` Eric Auger
@ 2022-08-08 21:15                             ` Liu, Rong L
  0 siblings, 0 replies; 26+ messages in thread
From: Liu, Rong L @ 2022-08-08 21:15 UTC (permalink / raw)
  To: eric.auger, Dmytro Maluka, Christopherson,, Sean
  Cc: Micah Morton, Alex Williamson, kvm, Paolo Bonzini,
	Tomasz Nowicki, Grzegorz Jaszczyk, Dmitry Torokhov

Hi Eric,

> -----Original Message-----
> From: Eric Auger <eric.auger@redhat.com>
> Sent: Thursday, August 4, 2022 7:45 AM
> To: Liu, Rong L <rong.l.liu@intel.com>; Dmytro Maluka
> <dmy@semihalf.com>; Christopherson,, Sean <seanjc@google.com>
> Cc: Micah Morton <mortonm@chromium.org>; Alex Williamson
> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Paolo Bonzini
> <pbonzini@redhat.com>; Tomasz Nowicki <tn@semihalf.com>; Grzegorz
> Jaszczyk <jaz@semihalf.com>; Dmitry Torokhov <dtor@google.com>
> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
> 
> Hi,
> On 7/12/22 18:02, Liu, Rong L wrote:
> > Hi Sean and Dmytro,
> >
> >> -----Original Message-----
> >> From: Dmytro Maluka <dmy@semihalf.com>
> >> Sent: Thursday, July 7, 2022 10:39 AM
> >> To: Christopherson,, Sean <seanjc@google.com>
> >> Cc: Auger Eric <eric.auger@redhat.com>; Micah Morton
> >> <mortonm@chromium.org>; Alex Williamson
> >> <alex.williamson@redhat.com>; kvm@vger.kernel.org; Paolo Bonzini
> >> <pbonzini@redhat.com>; Liu, Rong L <rong.l.liu@intel.com>; Tomasz
> >> Nowicki <tn@semihalf.com>; Grzegorz Jaszczyk <jaz@semihalf.com>;
> >> Dmitry Torokhov <dtor@google.com>
> >> Subject: Re: Add vfio-platform support for ONESHOT irq forwarding?
> >>
> >> On 7/7/22 17:00, Sean Christopherson wrote:
> >>> On Thu, Jul 07, 2022, Dmytro Maluka wrote:
> >>>> Hi Sean,
> >>>>
> >>>> On 7/6/22 10:39 PM, Sean Christopherson wrote:
> >>>>> On Wed, Jul 06, 2022, Dmytro Maluka wrote:
> >>>>>> This is not a problem on native, since for oneshot irq we keep the
> >>>>>> interrupt masked until the thread exits, so that the EOI at the end
> >>>>>> of hardirq doesn't result in immediate re-assert. In vfio + KVM
> >>>>>> case, however, the host doesn't check that the interrupt is still
> >>>>>> masked in the guest, so
> >>>>>> vfio_platform_unmask() is called regardless.
> >>>>> Isn't not checking that an interrupt is unmasked the real bug?
> >>>>> Fudging around vfio (or whatever is doing the premature
> unmasking)
> >>>>> bugs by delaying an ack notification in KVM is a hack, no?
> >>>> Yes, not checking that an interrupt is unmasked is IMO a bug, and
> my
> >>>> patch actually adds this missing checking, only that it adds it in
> >>>> KVM, not in VFIO. :)
> >>>>
> >>>> Arguably it's not a bug that VFIO is not checking the guest interrupt
> >>>> state on its own, provided that the resample notification it receives
> >>>> is always a notification that the interrupt has been actually acked.
> >>>> That is the motivation behind postponing ack notification in KVM in
> >>>> my patch: it is to ensure that KVM "ack notifications" are always
> >>>> actual ack notifications (as the name suggests), not just "eoi
> >> notifications".
> >>> But EOI is an ACK.  It's software saying "this interrupt has been
> >> consumed".
> >>
> >> Ok, I see we've had some mutual misunderstanding of the term "ack"
> >> here.
> >> EOI is an ACK in the interrupt controller sense, while I was talking
> about
> >> an ACK in the device sense, i.e. a device-specific action, done in a
> device
> >> driver's IRQ handler, which makes the device de-assert the IRQ line (so
> >> that the IRQ will not get re-asserted after an EOI or unmask).
> >>
> >> So the problem I'm trying to fix stems from the peculiarity of
> "oneshot"
> >> interrupts: an ACK in the device sense is done in a threaded handler,
> i.e.
> >> after an ACK in the interrupt controller sense, not before it.
> >>
> >> In the meantime I've realized one more reason why my patch is wrong.
> >> kvm_notify_acked_irq() is an internal KVM thing which is supposed to
> >> notify interested parts of KVM about an ACK rather in the interrupt
> >> controller sense, i.e. about an EOI as it is doing already.
> >>
> >> VFIO, on the other hand, rather expects a notification about an ACK in
> the
> >> device sense. So it still seems a good idea to me to postpone sending
> >> notifications to VFIO until an IRQ gets unmasked, but this postponing
> >> should be done not for the entire kvm_notify_acked_irq() but only for
> >> eventfd_signal() on resamplefd in irqfd_resampler_ack().
> >>
> >> Thanks for making me think about that.
> >>
> >>>> That said, your idea of checking the guest interrupt status in VFIO
> >>>> (or whatever is listening on the resample eventfd) makes sense to
> me
> >>>> too. The problem, though, is that it's KVM that knows the guest
> >>>> interrupt status, so KVM would need to let VFIO/whatever know it
> >>>> somehow. (I'm assuming we are focusing on the case of KVM kernel
> >>>> irqchip, not userspace or split irqchip.) So do you have in mind
> >>>> adding something like "maskfd" and "unmaskfd" to KVM IRQFD
> >> interface,
> >>>> in addition to resamplefd? If so, I'm actually in favor of such an
> >>>> idea, as I think it would be also useful for other purposes, regardless
> >> of oneshot interrupts.
> >>> Unless I'm misreading things, KVM already provides a mask notifier,
> >>> irqfd just needs to be wired up to use
> >> kvm_(un)register_irq_mask_notifier().
> >>
> > Interesting...  I initially thought that kvm doesn't "trap" on ioapic's
> mmio
> > write.  However, I just traced kvm/ioapic.c and it turns out
> > ioapic_write_indirect() was called many times.   Does trapping on
> ioapic's mmio
> > write cause vmexit on edge-triggered interrupt exit?  It seems the case
> because
> > IOREGSEL and IOWIN of IOAPIC are memory mapped but not the
> redirection entry
> > register for each IRQ (that is why the name indirect_write), in order to
> unmask
> > redirection entry register on the exit of each interrupt (edge-triggered
> or
> > level-triggered), kernel needs to write to IORESEL, which means vmexit
> if kvm
> > traps on ioapic's mmio write.  However, for pass-thru device which
> uses
> > edge-triggered interrupt (handled by vfio or something similar),
> interrupt
> > (pIRQ) is enabled by vfio and it seems unnecessary to cause a vmexit
> when guest
> > updates virtual ioapic.  I think the situation is similar for level-triggered
> > interrupt.  So 2 vmexits for each level-triggered interrupt completion,
> one for
> > EOI on lapic and another for unmask of IOAPIC register.  Does this
> sound right?
> > I thought with vfio (or similar architecture), there is no vmexit
> necessary on
> > edge-triggered interrupt completion and only one vmexit for level
> triggered
> > interrupt completion, except the caveats of oneshot interrupt.  Maybe I
> > misunderstand something?
> Currently, no vmexit for edge-sensitive and 1 vmexit for level-sensitive
> is what happens on ARM shared peripheral interrupts at least.
> Note there is one setup that could remove the need for the vmexit on
> vEOI: irq_bypass mode
> (https://www.linux-kvm.org/images/a/a8/01x04-ARMdevice.pdf slide
> 12-14):
> on GIC you have a mode that allows automatic completion of the
> physical
> IRQ when the corresponding vIRQ is completed. This mode would not be
> compatible with oneshort_irq.
> At some point we worked on this enablement but given the lack of
> vfio-platform customers this work was paused so we still have the
> mask/unmask vfio dance.
> 
> Thanks
> 
> Eric
> 

Thanks for the info.  I am not familiar with ARM but it is interesting to know
the difference between 2 architectures.

> >
> >> Thanks for the tip. I'll take a look into implementing this idea.
> >>
> >> It seems you agree that fixing this issue via a change in KVM (in irqfd,
> not
> >> in ioapic) seems to be the way to go.
> >>
> >> An immediate problem I see with
> kvm_(un)register_irq_mask_notifier()
> >> is that it is currently available for x86 only. Also, mask notifiers are
> called
> >> under ioapic->lock (I'm not sure yet if that is good or bad news for us).
> >>
> >> Thanks,
> >> Dmytro


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-08-08 21:15 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-25 15:46 Add vfio-platform support for ONESHOT irq forwarding? Micah Morton
2021-01-25 17:31 ` Auger Eric
2021-01-25 18:32   ` Micah Morton
2021-01-26  8:48     ` Auger Eric
2021-01-26 15:31       ` Micah Morton
2021-01-26 16:05         ` Auger Eric
2021-01-25 20:36 ` Alex Williamson
2021-01-26  8:53   ` Auger Eric
2021-01-26 15:15     ` Micah Morton
2021-01-26 16:19       ` Auger Eric
2021-01-27 15:58         ` Micah Morton
2021-01-29 19:57           ` Micah Morton
2021-01-30 16:38             ` Auger Eric
2022-07-06 15:25               ` Dmytro Maluka
2022-07-06 20:39                 ` Sean Christopherson
2022-07-07  7:38                   ` Dmytro Maluka
2022-07-07 15:00                     ` Sean Christopherson
2022-07-07 17:38                       ` Dmytro Maluka
2022-07-12 16:02                         ` Liu, Rong L
2022-08-04 14:44                           ` Eric Auger
2022-08-08 21:15                             ` Liu, Rong L
2022-07-07  8:25                 ` Eric Auger
2022-07-07  9:15                   ` Dmytro Maluka
2022-07-25 22:03                     ` Liu, Rong L
2022-08-04 14:39                       ` Eric Auger
2022-08-08 17:34                         ` Liu, Rong L

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).