On May 31, 2022, at 5:45 PM, Alex Williamson <alex.williamson@redhat.com> wrote:

On Tue, 31 May 2022 22:03:14 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

On Tue, 31 May 2022 at 21:11, Alex Williamson
<alex.williamson@redhat.com> wrote:

On Tue, 31 May 2022 15:01:57 +0000
Jag Raman <jag.raman@oracle.com> wrote:

On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:

On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:
Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
include/hw/pci/pci.h              |  13 ++++
include/hw/remote/vfio-user-obj.h |   6 ++
hw/pci/msi.c                      |  16 ++--
hw/pci/msix.c                     |  10 ++-
hw/pci/pci.c                      |  13 ++++
hw/remote/machine.c               |  14 +++-
hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
stubs/vfio-user-obj.c             |   6 ++
MAINTAINERS                       |   1 +
hw/remote/trace-events            |   1 +
stubs/meson.build                 |   1 +
11 files changed, 193 insertions(+), 11 deletions(-)
create mode 100644 include/hw/remote/vfio-user-obj.h
create mode 100644 stubs/vfio-user-obj.c

It would be great if Michael Tsirkin and Alex Williamson would review
this.

Hi Michael and Alex,

Do you have any thoughts on this patch?

Ultimately this is just a question of how to insert callbacks to replace
the default MSI/X triggers so you can send a vector# over the wire for a
remote machine, right?  I'll let the code owners, Michael and Marcel,
comment if they have a grand vision for how to architect this differently.
Thanks,

An earlier version of the patch intercepted MSI-X at the msix_notify()
level, replacing the entire function. This patch replaces
msix_get_message() and msi_send_message(), leaving the masking logic
in place.

I haven't seen the latest vfio-user client implementation for QEMU,
but if the idea is to allow the guest to directly control the
vfio-user device's MSI-X table's mask bits, then I think this is a
different design from VFIO kernel where masking is emulated by QEMU
and not passed through to the PCI device.

Essentially what's happening here is an implementation of an interrupt
handler callback in the remote QEMU instance.  The default handler
simply writes the MSI message data to the MSI message address of the
vCPU; vfio-user replaces that by hijacking the MSI message itself to
simply report the vector# so that the "handler", i.e. the trigger, can
forward it to the client.  That's very analogous to the kernel
implementation.
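
To make the callback replacement concrete, here is a minimal sketch in
plain C. It is not QEMU code: Msg, Dev, dev_init(), dev_notify() and the
two pairs of hooks are hypothetical names used only for illustration. The
default pair "writes" the guest-programmed data to the guest-programmed
address, while the remote pair encodes the vector# in the message data and
forwards it instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_VECTORS 4

/* Hypothetical stand-in for an MSI message: an address/data pair. */
typedef struct {
    uint64_t address;
    uint32_t data;
} Msg;

typedef struct Dev Dev;
struct Dev {
    Msg table[NR_VECTORS];                       /* guest-programmed entries */
    Msg (*get_message)(Dev *d, unsigned vector); /* replaceable hook */
    void (*send_message)(Dev *d, Msg msg);       /* replaceable hook */
    uint64_t last_addr;   /* what the default path last "wrote", and where */
    uint32_t last_data;
    int forwarded_vector; /* what the remote path last forwarded, -1 if none */
};

/* Default behaviour: use the table entry and write data to its address. */
Msg default_get(Dev *d, unsigned vector)
{
    assert(vector < NR_VECTORS);
    return d->table[vector];
}

void default_send(Dev *d, Msg msg)
{
    d->last_addr = msg.address;
    d->last_data = msg.data;
}

/* Remote behaviour: ignore the table and carry only the vector number. */
Msg remote_get(Dev *d, unsigned vector)
{
    Msg msg = { 0, vector };
    (void)d;
    assert(vector < NR_VECTORS);
    return msg;
}

void remote_send(Dev *d, Msg msg)
{
    /* A real server would send this over the socket to the client. */
    d->forwarded_vector = (int)msg.data;
}

void dev_init(Dev *d, int remote)
{
    memset(d, 0, sizeof(*d));
    d->forwarded_vector = -1;
    d->get_message = remote ? remote_get : default_get;
    d->send_message = remote ? remote_send : default_send;
}

/* The notify path is unchanged; only the two hooks differ. */
void dev_notify(Dev *d, unsigned vector)
{
    d->send_message(d, d->get_message(d, vector));
}
```

Because dev_notify() itself stays the same, any masking check placed in
front of the two hook calls would keep working regardless of which pair of
hooks is installed.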

The equivalent masking we have today with vfio kernel would happen on
the client side, where the MSI/X code might instead set a pending bit
if the vector is masked on the client.  Likewise the possibility
remains, just as it does on the kernel side, that the guest masking a
vector could be relayed over ioctl/socket to set the equivalent mask on
the host/remote.
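
The client-side behaviour described above can be sketched as follows,
assuming a hypothetical ClientIrqState that tracks one mask bit and one
pending bit per vector. None of these names come from QEMU; the delivered[]
counter merely stands in for injecting the interrupt into the guest.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_VECTORS 4

/* Hypothetical per-device interrupt state kept by the client. */
typedef struct {
    bool masked[NR_VECTORS];
    bool pending[NR_VECTORS];
    unsigned delivered[NR_VECTORS]; /* stand-in for guest injection */
} ClientIrqState;

/* Called when the remote reports that a vector fired. */
void client_on_remote_vector(ClientIrqState *s, unsigned vector)
{
    assert(vector < NR_VECTORS);
    if (s->masked[vector]) {
        s->pending[vector] = true;  /* latch it, do not inject */
    } else {
        s->delivered[vector]++;
    }
}

/* Called when the guest writes the mask bit of a vector. */
void client_set_mask(ClientIrqState *s, unsigned vector, bool mask)
{
    assert(vector < NR_VECTORS);
    s->masked[vector] = mask;
    if (!mask && s->pending[vector]) {
        /* A latched interrupt is injected once on unmask. */
        s->pending[vector] = false;
        s->delivered[vector]++;
    }
}
```

In this sketch the remote never learns about the mask; relaying
client_set_mask() over the socket would be the optional extra step
mentioned above.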

Hi Alex,

Just to add some more detail, the emulated PCI device in QEMU presently
maintains an MSI-X table (PCIDevice->msix_table) and a Pending Bit Array.
In the present VFIO PCI device implementation, QEMU leverages the same
MSI-X table for interrupt masking/unmasking. The backend PCI device (such
as the passthrough device) always thinks that the interrupt is unmasked
and lets QEMU manage masking.

In the vfio-user case, by contrast, the client additionally pushes a copy
of the emulated PCI device’s MSI-X table downstream to the remote device.
We did this to allow a small set of devices (such as e1000e) to clear the
PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
MSI-X table to determine whether an interrupt should be triggered. This
prevents an interrupt from being sent to the client unnecessarily if it is
masked.

We are wondering whether pushing the MSI-X table to the remote device and
reading the PBA from it would diverge from the VFIO protocol specification.

From your comment, I understand it’s similar to the VFIO protocol because
VFIO clients could mask an interrupt using the VFIO_DEVICE_SET_IRQS ioctl
with the VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU
presently does not use this approach and the kernel does not support it
for MSI.
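
For reference, a rough sketch of what such a mask/unmask request looks like
with the kernel UAPI in <linux/vfio.h>. The helper names
fill_mask_request() and set_vector_mask() are made up for this example;
device_fd is assumed to be an already opened VFIO device fd. Given the
observation above about MSI support in the kernel, the ioctl should be
expected to fail for those indexes, so this only illustrates the request
layout.

```c
#include <assert.h>
#include <linux/vfio.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build a request that masks or unmasks exactly one vector. */
void fill_mask_request(struct vfio_irq_set *req, unsigned index,
                       unsigned vector, int mask)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->argsz = sizeof(*req);
    /* DATA_NONE: no payload, the action applies to [start, start+count). */
    req->flags = VFIO_IRQ_SET_DATA_NONE |
                 (mask ? VFIO_IRQ_SET_ACTION_MASK
                       : VFIO_IRQ_SET_ACTION_UNMASK);
    req->index = index;
    req->start = vector;
    req->count = 1;
}

/* Returns the ioctl result: 0 on success, -1 with errno set on failure. */
int set_vector_mask(int device_fd, unsigned vector, int mask)
{
    struct vfio_irq_set req;

    fill_mask_request(&req, VFIO_PCI_MSIX_IRQ_INDEX, vector, mask);
    return ioctl(device_fd, VFIO_DEVICE_SET_IRQS, &req);
}
```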

Thank you!
--
Jag