All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] Xen PIRQ support
@ 2023-01-14  0:39 David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 1/5] i386/xen: Implement HYPERVISOR_physdev_op David Woodhouse
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: David Woodhouse @ 2023-01-14  0:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Paul Durrant, Joao Martins, Ankur Arora,
	Philippe Mathieu-Daudé,
	Thomas Huth, Alex Bennée, Juan Quintela,
	Dr . David Alan Gilbert, Claudio Fontana, Julien Grall,
	Michael S. Tsirkin, arcel Apfelbaum

This continues to build on the basic Xen on KVM platform support from 
https://lore.kernel.org/qemu-devel/20230110122042.1562155-1-dwmw2@infradead.org/

We're working on hooking up the PV backend devices, and the biggest 
remaining noticeably missing part was PIRQ support. This allows a Xen 
guest to route GSI and MSI interrupts to event channels instead of being 
delivered via the emulated I/OAPIC or local APIC respectively.

It starts relatively simple, with the basic hypercalls and infrastructure
for tracking/migrating the PIRQ table (and as I type this I've just
remembered I forgot to write the post_load function to reconstitute the
data structures which explicitly *state* that they need to be rebuilt).

I'm particularly interested in opinions on the hook in gsi_handler() 
which lets the Xen emulation 'eat' the event instead of passing it to 
the I/OAPIC.

I did ponder replacing the qemu_irq in gsi_state->ioapic_irq[n] when
GSI#n is redirected to a PIRQ, but I figured that was worse.

I definitely need to rethink the locking a little bit to avoid the 
potential for deadlock when gsi_handler calls back into the evtchn code 
to translate the event channel GSI. It's non-trivial to drop the lock 
before sending the IRQ; maybe just a different lock with a smaller 
scope. A previous implementation of event channels was a bit more 
lockless, with atomic updates of the port table (the port_info fits in a 
uint64_t). But now we have all the interesting fast paths accelerated in 
the kernel that didn't seem worth it, so I went with simple locking... 
too simple, it seems.

There's a similar recursive locking issue when pirq_bind_port() wants to 
call kvm_update_msi_routes_all(), but is already holding the lock that 
we'd take again when called to redo a translation. (And I still don't 
much like the way that kvm_update_msi_routes_all() has to have a list of 
PCI devices and actually recalculates the routes at all, instead of just 
detaching the IRQFD and letting them be recalculated on demand. But I 
was trying to avoid actually fixing that this week).

David Woodhouse (5):
      i386/xen: Implement HYPERVISOR_physdev_op
      hw/xen: Implement emulated PIRQ hypercall support
      hw/xen: Support GSI mapping to PIRQ
      hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi()
      hw/xen: Support MSI mapping to PIRQ

 hw/i386/kvm/trace-events     |   4 ++
 hw/i386/kvm/trace.h          |   1 +
 hw/i386/kvm/xen-stubs.c      |  11 ++++
 hw/i386/kvm/xen_evtchn.c     | 461 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/i386/kvm/xen_evtchn.h     |  22 +++++++
 hw/i386/x86.c                |  15 +++++
 hw/pci/msi.c                 |  13 ++++
 hw/pci/msix.c                |   7 ++-
 hw/pci/pci.c                 |  14 +++++
 meson.build                  |   1 +
 target/i386/kvm/kvm.c        |  12 +++-
 target/i386/kvm/kvm_i386.h   |   2 +
 target/i386/kvm/xen-compat.h |  19 ++++++
 target/i386/kvm/xen-emu.c    | 136 +++++++++++++++++++++++++++++++++++++++++-
 14 files changed, 712 insertions(+), 6 deletions(-)







^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH 1/5] i386/xen: Implement HYPERVISOR_physdev_op
  2023-01-14  0:39 [RFC PATCH 0/5] Xen PIRQ support David Woodhouse
@ 2023-01-14  0:39 ` David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 2/5] hw/xen: Implement emulated PIRQ hypercall support David Woodhouse
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2023-01-14  0:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Paul Durrant, Joao Martins, Ankur Arora,
	Philippe Mathieu-Daudé,
	Thomas Huth, Alex Bennée, Juan Quintela,
	Dr . David Alan Gilbert, Claudio Fontana, Julien Grall,
	Michael S. Tsirkin, arcel Apfelbaum

From: David Woodhouse <dwmw@amazon.co.uk>

Just hook up the basic hypercalls to stubs in xen_evtchn.c for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/i386/kvm/xen_evtchn.c     |  25 ++++++++
 hw/i386/kvm/xen_evtchn.h     |  11 ++++
 target/i386/kvm/xen-compat.h |  19 ++++++
 target/i386/kvm/xen-emu.c    | 118 +++++++++++++++++++++++++++++++++++
 4 files changed, 173 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 084249c56d..fd83d052f7 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1301,6 +1301,31 @@ int xen_evtchn_set_port(uint16_t port)
     return ret;
 }
 
+int xen_physdev_map_pirq(struct physdev_map_pirq *map)
+{
+    return -ENOTSUP;
+}
+
+int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
+{
+    return -ENOTSUP;
+}
+
+int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
+{
+    return -ENOTSUP;
+}
+
+int xen_physdev_query_pirq(struct physdev_irq_status_query *query)
+{
+    return -ENOTSUP;
+}
+
+int xen_physdev_get_free_pirq(struct physdev_get_free_pirq *get)
+{
+    return -ENOTSUP;
+}
+
 struct xenevtchn_handle *xen_be_evtchn_open(void)
 {
     struct xenevtchn_handle *xc = g_new0(struct xenevtchn_handle, 1);
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index b7b6f4e592..ccf58aa796 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -65,4 +65,15 @@ int xen_evtchn_bind_interdomain_op(struct evtchn_bind_interdomain *interdomain);
 int xen_evtchn_bind_vcpu_op(struct evtchn_bind_vcpu *vcpu);
 int xen_evtchn_reset_op(struct evtchn_reset *reset);
 
+struct physdev_map_pirq;
+struct physdev_unmap_pirq;
+struct physdev_eoi;
+struct physdev_irq_status_query;
+struct physdev_get_free_pirq;
+int xen_physdev_map_pirq(struct physdev_map_pirq *map);
+int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap);
+int xen_physdev_eoi_pirq(struct physdev_eoi *eoi);
+int xen_physdev_query_pirq(struct physdev_irq_status_query *query);
+int xen_physdev_get_free_pirq(struct physdev_get_free_pirq *get);
+
 #endif /* QEMU_XEN_EVTCHN_H */
diff --git a/target/i386/kvm/xen-compat.h b/target/i386/kvm/xen-compat.h
index ff5d20e901..e86ffc7644 100644
--- a/target/i386/kvm/xen-compat.h
+++ b/target/i386/kvm/xen-compat.h
@@ -48,4 +48,23 @@ struct compat_xen_add_to_physmap_batch {
     COMPAT_HANDLE(int) errs;
 };
 
+struct compat_physdev_map_pirq {
+    domid_t domid;
+    uint16_t pad;
+    /* IN */
+    int type;
+    /* IN (ignored for ..._MULTI_MSI) */
+    int index;
+    /* IN or OUT */
+    int pirq;
+    /* IN - high 16 bits hold segment for ..._MSI_SEG and ..._MULTI_MSI */
+    int bus;
+    /* IN */
+    int devfn;
+    /* IN (also OUT for ..._MULTI_MSI) */
+    int entry_nr;
+    /* IN */
+    uint64_t table_base;
+} __attribute__((packed));
+
 #endif /* QEMU_I386_XEN_COMPAT_H */
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 273200bc70..3fa58e33bd 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -1480,6 +1480,121 @@ static bool kvm_xen_hcall_gnttab_op(struct kvm_xen_exit *exit, X86CPU *cpu,
     return true;
 }
 
+static bool kvm_xen_hcall_physdev_op(struct kvm_xen_exit *exit, X86CPU *cpu,
+                                     int cmd, uint64_t arg)
+{
+    CPUState *cs = CPU(cpu);
+    int err;
+
+    switch (cmd) {
+    case PHYSDEVOP_map_pirq: {
+        struct physdev_map_pirq map;
+
+        if (hypercall_compat32(exit->u.hcall.longmode)) {
+            struct compat_physdev_map_pirq *map32 = (void *)&map;
+
+            if (kvm_copy_from_gva(cs, arg, map32, sizeof(*map32))) {
+                return -EFAULT;
+            }
+
+            /*
+             * The only thing that's different is the alignment of the
+             * uint64_t table_base at the end, which gets padding to make
+             * it 64-bit aligned in the 64-bit version.
+             */
+            qemu_build_assert(sizeof(*map32) == 36);
+            qemu_build_assert(offsetof(struct physdev_map_pirq, entry_nr) ==
+                              offsetof(struct compat_physdev_map_pirq, entry_nr));
+            memmove(&map.table_base, &map32->table_base, sizeof(map.table_base));
+        } else {
+            if (kvm_copy_from_gva(cs, arg, &map, sizeof(map))) {
+                err = -EFAULT;
+                break;
+            }
+        }
+        err = xen_physdev_map_pirq(&map);
+        /*
+         * Since table_base is an IN parameter and won't be changed, just
+         * copy the size of the compat structure back to the guest.
+         */
+        if (!err && kvm_copy_to_gva(cs, arg, &map,
+                                    sizeof(struct compat_physdev_map_pirq))) {
+            err = -EFAULT;
+        }
+        break;
+    }
+    case PHYSDEVOP_unmap_pirq: {
+        struct physdev_unmap_pirq unmap;
+
+        qemu_build_assert(sizeof(unmap) == 8);
+        if (kvm_copy_from_gva(cs, arg, &unmap, sizeof(unmap))) {
+            err = -EFAULT;
+            break;
+        }
+
+        err = xen_physdev_unmap_pirq(&unmap);
+        if (!err && kvm_copy_to_gva(cs, arg, &unmap, sizeof(unmap))) {
+            err = -EFAULT;
+        }
+        break;
+    }
+    case PHYSDEVOP_eoi: {
+        struct physdev_eoi eoi;
+
+        qemu_build_assert(sizeof(eoi) == 4);
+        if (kvm_copy_from_gva(cs, arg, &eoi, sizeof(eoi))) {
+            err = -EFAULT;
+            break;
+        }
+
+        err = xen_physdev_eoi_pirq(&eoi);
+        if (!err && kvm_copy_to_gva(cs, arg, &eoi, sizeof(eoi))) {
+            err = -EFAULT;
+        }
+        break;
+    }
+    case PHYSDEVOP_irq_status_query: {
+        struct physdev_irq_status_query query;
+
+        qemu_build_assert(sizeof(query) == 8);
+        if (kvm_copy_from_gva(cs, arg, &query, sizeof(query))) {
+            err = -EFAULT;
+            break;
+        }
+
+        err = xen_physdev_query_pirq(&query);
+        if (!err && kvm_copy_to_gva(cs, arg, &query, sizeof(query))) {
+            err = -EFAULT;
+        }
+        break;
+    }
+    case PHYSDEVOP_get_free_pirq: {
+        struct physdev_get_free_pirq get;
+
+        qemu_build_assert(sizeof(get) == 8);
+        if (kvm_copy_from_gva(cs, arg, &get, sizeof(get))) {
+            err = -EFAULT;
+            break;
+        }
+
+        err = xen_physdev_get_free_pirq(&get);
+        if (!err && kvm_copy_to_gva(cs, arg, &get, sizeof(get))) {
+            err = -EFAULT;
+        }
+        break;
+    }
+    case PHYSDEVOP_pirq_eoi_gmfn_v2: // FreeBSD 13 makes this hypercall
+        err = -ENOSYS;
+        break;
+
+    default:
+        return false;
+    }
+
+    exit->u.hcall.result = err;
+    return true;
+}
+
 static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
 {
     uint16_t code = exit->u.hcall.input;
@@ -1514,6 +1629,9 @@ static bool do_kvm_xen_handle_exit(X86CPU *cpu, struct kvm_xen_exit *exit)
     case __HYPERVISOR_memory_op:
         return kvm_xen_hcall_memory_op(exit, cpu, exit->u.hcall.params[0],
                                        exit->u.hcall.params[1]);
+    case __HYPERVISOR_physdev_op:
+        return kvm_xen_hcall_physdev_op(exit, cpu, exit->u.hcall.params[0],
+                                        exit->u.hcall.params[1]);
     case __HYPERVISOR_xen_version:
         return kvm_xen_hcall_xen_version(exit, cpu, exit->u.hcall.params[0],
                                          exit->u.hcall.params[1]);
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 2/5] hw/xen: Implement emulated PIRQ hypercall support
  2023-01-14  0:39 [RFC PATCH 0/5] Xen PIRQ support David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 1/5] i386/xen: Implement HYPERVISOR_physdev_op David Woodhouse
@ 2023-01-14  0:39 ` David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 3/5] hw/xen: Support GSI mapping to PIRQ David Woodhouse
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2023-01-14  0:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Paul Durrant, Joao Martins, Ankur Arora,
	Philippe Mathieu-Daudé,
	Thomas Huth, Alex Bennée, Juan Quintela,
	Dr . David Alan Gilbert, Claudio Fontana, Julien Grall,
	Michael S. Tsirkin, arcel Apfelbaum

From: David Woodhouse <dwmw@amazon.co.uk>

This wires up the basic infrastructure but the actual interrupts aren't
there yet, so don't advertise it to the guest.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/i386/kvm/trace-events  |   4 +
 hw/i386/kvm/trace.h       |   1 +
 hw/i386/kvm/xen_evtchn.c  | 265 +++++++++++++++++++++++++++++++++++++-
 hw/i386/kvm/xen_evtchn.h  |   2 +
 meson.build               |   1 +
 target/i386/kvm/xen-emu.c |  15 +++
 6 files changed, 283 insertions(+), 5 deletions(-)
 create mode 100644 hw/i386/kvm/trace-events
 create mode 100644 hw/i386/kvm/trace.h

diff --git a/hw/i386/kvm/trace-events b/hw/i386/kvm/trace-events
new file mode 100644
index 0000000000..04e60c5bb8
--- /dev/null
+++ b/hw/i386/kvm/trace-events
@@ -0,0 +1,4 @@
+kvm_xen_map_pirq(int pirq, int gsi) "pirq %d gsi %d"
+kvm_xen_unmap_pirq(int pirq, int gsi) "pirq %d gsi %d"
+kvm_xen_get_free_pirq(int pirq, int type) "pirq %d type %d"
+kvm_xen_bind_pirq(int pirq, int port) "pirq %d port %d"
diff --git a/hw/i386/kvm/trace.h b/hw/i386/kvm/trace.h
new file mode 100644
index 0000000000..e55d0812fd
--- /dev/null
+++ b/hw/i386/kvm/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_i386_kvm.h"
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index fd83d052f7..82250daecb 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -23,6 +23,7 @@
 #include "exec/target_page.h"
 #include "exec/address-spaces.h"
 #include "migration/vmstate.h"
+#include "trace.h"
 
 #include "hw/sysbus.h"
 #include "hw/xen/xen.h"
@@ -105,6 +106,23 @@ struct xenevtchn_handle {
 #define PORT_INFO_TYPEVAL_REMOTE_QEMU           0x8000
 #define PORT_INFO_TYPEVAL_REMOTE_PORT_MASK      0x7FFF
 
+#define MAX_XEN_PIRQ 0x1048 /* Empirically */
+
+/*
+ * These 'emuirq' values are used by Xen in the LM stream... and yes, I am
+ * insane enough to think about guest-transparent live migration from actual
+ * Xen to QEMU, and ensuring that we can convert/consume the stream.
+ */
+#define IRQ_UNBOUND -1
+#define IRQ_PT -2
+#define IRQ_MSI_EMU -3
+
+
+struct pirq_info {
+    int gsi;
+    uint16_t port;
+};
+
 struct XenEvtchnState {
     /*< private >*/
     SysBusDevice busdev;
@@ -120,6 +138,14 @@ struct XenEvtchnState {
     qemu_irq gsis[GSI_NUM_PINS];
 
     struct xenevtchn_handle *be_handles[EVTCHN_2L_NR_CHANNELS];
+
+    /* GSI → PIRQ mapping (serialized) */
+    uint16_t gsi_pirq[GSI_NUM_PINS];
+    /* Bitmap of allocated PIRQs (serialized) */
+    uint64_t pirq_inuse[DIV_ROUND_UP(MAX_XEN_PIRQ, 64)];
+
+    /* Per-PIRQ information (rebuilt on migration) */
+    struct pirq_info pirq[MAX_XEN_PIRQ];
 };
 
 struct XenEvtchnState *xen_evtchn_singleton;
@@ -179,6 +205,9 @@ static const VMStateDescription xen_evtchn_vmstate = {
         VMSTATE_UINT32(nr_ports, XenEvtchnState),
         VMSTATE_STRUCT_VARRAY_UINT32(port_table, XenEvtchnState, nr_ports, 1,
                                      xen_evtchn_port_vmstate, XenEvtchnPort),
+        VMSTATE_UINT16_ARRAY(gsi_pirq, XenEvtchnState, GSI_NUM_PINS),
+        VMSTATE_UINT64_ARRAY(pirq_inuse, XenEvtchnState,
+                             DIV_ROUND_UP(MAX_XEN_PIRQ, 64)),
         VMSTATE_END_OF_LIST()
     }
 };
@@ -247,6 +276,21 @@ static void xen_evtchn_register_types(void)
 
 type_init(xen_evtchn_register_types)
 
+static int pirq_bind_port(XenEvtchnState *s, int pirq, uint16_t port)
+{
+    assert(pirq < MAX_XEN_PIRQ);
+
+    if (port && s->pirq[pirq].port) {
+        return -EBUSY;
+    }
+
+    s->pirq[pirq].port = port;
+    trace_kvm_xen_bind_pirq(pirq, port);
+
+    /* XX: We need to unmask MSI here, when we get to that */
+    return 0;
+}
+
 static int set_callback_pci_intx(XenEvtchnState *s, uint64_t param)
 {
     PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
@@ -881,6 +925,10 @@ static int close_port(XenEvtchnState *s, evtchn_port_t port)
     case EVTCHNSTAT_closed:
         return -ENOENT;
 
+    case EVTCHNSTAT_pirq:
+        pirq_bind_port(s, p->type_val, 0);
+        break;
+
     case EVTCHNSTAT_virq:
         kvm_xen_set_vcpu_virq(virq_is_global(p->type_val) ? 0 : p->vcpu,
                               p->type_val, 0);
@@ -1075,6 +1123,35 @@ int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq)
     return ret;
 }
 
+int xen_evtchn_bind_pirq_op(struct evtchn_bind_pirq *pirq)
+{
+    XenEvtchnState *s = xen_evtchn_singleton;
+    int ret;
+
+    if (!s) {
+        return -ENOTSUP;
+    }
+
+    if (pirq->pirq >= MAX_XEN_PIRQ) {
+        return -EINVAL;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    ret = allocate_port(s, 0, EVTCHNSTAT_pirq, pirq->pirq,
+                        &pirq->port);
+    if (ret) {
+        return ret;
+    }
+
+    ret = pirq_bind_port(s, pirq->pirq, pirq->port);
+    if (ret) {
+        free_port(s, pirq->port);
+        pirq->port = 0;
+    }
+    return ret;
+}
+
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi)
 {
     XenEvtchnState *s = xen_evtchn_singleton;
@@ -1301,29 +1378,207 @@ int xen_evtchn_set_port(uint16_t port)
     return ret;
 }
 
+#define pirq_inuse_word(s, pirq) (s->pirq_inuse[((pirq) / 64)])
+#define pirq_inuse_bit(pirq) (1ULL << ((pirq) & 63))
+
+#define pirq_inuse(s, pirq) (pirq_inuse_word(s, pirq) & pirq_inuse_bit(pirq))
+
+static int allocate_pirq(XenEvtchnState *s, int type, int gsi)
+{
+    uint16_t pirq;
+
+    /* Preserve the allocation strategy that Xen has. It looks like
+     * we *never* give out PIRQ 0-15, we give out 16-nr_irqs_gsi only
+     * to GSIs (counting up from 16), and then we count backwards from
+     * the top for MSIs or when the GSI space is exhausted. */
+    if (type == MAP_PIRQ_TYPE_GSI) {
+        for (pirq = 16 ; pirq < GSI_NUM_PINS; pirq++) {
+            if (pirq_inuse(s, pirq)) {
+                continue;
+            }
+
+            /* Found it */
+            goto found;
+        }
+    }
+    for (pirq = MAX_XEN_PIRQ - 1; pirq >= GSI_NUM_PINS; pirq--) {
+        /* Skip whole words at a time when they're full */
+        if (pirq_inuse_word(s, pirq) == UINT64_MAX) {
+            pirq &= ~63ULL;
+            continue;
+        }
+        if (pirq_inuse(s, pirq)) {
+            continue;
+        }
+
+        goto found;
+    }
+    return -ENOSPC;
+
+ found:
+    pirq_inuse_word(s, pirq) |= pirq_inuse_bit(pirq);
+    if (gsi >= 0) {
+        assert(gsi <= GSI_NUM_PINS);
+        s->gsi_pirq[gsi] = pirq;
+    }
+    s->pirq[pirq].gsi = gsi;
+    return pirq;
+}
+
 int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 {
-    return -ENOTSUP;
+    XenEvtchnState *s = xen_evtchn_singleton;
+    int pirq = map->pirq;
+    int gsi = map->index;
+
+    if (!s) {
+        return -ENOTSUP;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    if (map->domid != DOMID_SELF && map->domid != xen_domid) {
+        return -EPERM;
+    }
+    if (map->type != MAP_PIRQ_TYPE_GSI) {
+        return -EINVAL;
+    }
+    if (gsi < 0 || gsi >= GSI_NUM_PINS) {
+        return -EINVAL;
+    }
+
+    if (pirq < 0) {
+        pirq = allocate_pirq(s, map->type, gsi);
+        if (pirq < 0) {
+            return pirq;
+        }
+        map->pirq = pirq;
+    } else if (pirq > MAX_XEN_PIRQ) {
+        return -EINVAL;
+    } else {
+        /* User specified a valid-looking PIRQ#. Allow it if it is
+         * allocated and not yet bound, or if it is unallocated */
+        if (pirq_inuse(s, pirq)) {
+            if (s->pirq[pirq].gsi != IRQ_UNBOUND) {
+                return -EBUSY;
+            }
+        } else {
+            /* If it was unused, mark it used now. */
+            pirq_inuse_word(s, pirq) |= pirq_inuse_bit(pirq);
+        }
+        /* Set the mapping in both directions. */
+        s->pirq[pirq].gsi = gsi;
+        s->gsi_pirq[gsi] = pirq;
+    }
+
+    trace_kvm_xen_map_pirq(pirq, gsi);
+    return 0;
 }
 
 int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
 {
-    return -ENOTSUP;
+    XenEvtchnState *s = xen_evtchn_singleton;
+    int pirq = unmap->pirq;
+    int gsi;
+
+    if (!s) {
+        return -ENOTSUP;
+    }
+
+    if (unmap->domid != DOMID_SELF && unmap->domid != xen_domid)
+        return -EPERM;
+    if (pirq < 0 || pirq >= MAX_XEN_PIRQ)
+        return -EINVAL;
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    if (!pirq_inuse(s, pirq)) {
+        return -ENOENT;
+    }
+
+    gsi = s->pirq[pirq].gsi;
+
+    /* We can only unmap GSI PIRQs */
+    if (gsi < 0) {
+        return -EINVAL;
+    }
+
+    s->gsi_pirq[gsi] = 0;
+    s->pirq[pirq].gsi = IRQ_UNBOUND; /* Doesn't actually matter because: */
+    pirq_inuse_word(s, pirq) &= ~pirq_inuse_bit(pirq);
+
+    trace_kvm_xen_unmap_pirq(pirq, gsi);
+    return 0;
 }
 
 int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
 {
-    return -ENOTSUP;
+    XenEvtchnState *s = xen_evtchn_singleton;
+    int pirq = eoi->irq;
+    int gsi;
+
+    if (!s) {
+        return -ENOTSUP;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    if (!pirq_inuse(s, pirq)) {
+        return -ENOENT;
+    }
+
+    gsi = s->pirq[pirq].gsi;
+    if (gsi < 0) {
+        return -EINVAL;
+    }
+
+    // XX: Reassert a level IRQ if needed */
+    return 0;
 }
 
 int xen_physdev_query_pirq(struct physdev_irq_status_query *query)
 {
-    return -ENOTSUP;
+    XenEvtchnState *s = xen_evtchn_singleton;
+    int pirq = query->irq;
+
+    if (!s) {
+        return -ENOTSUP;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    if (!pirq_inuse(s, pirq)) {
+        return -ENOENT;
+    }
+
+    if (s->pirq[pirq].gsi >= 0) {
+        query->flags = XENIRQSTAT_needs_eoi;
+    } else {
+        query->flags = 0;
+    }
+
+    return 0;
 }
 
 int xen_physdev_get_free_pirq(struct physdev_get_free_pirq *get)
 {
-    return -ENOTSUP;
+    XenEvtchnState *s = xen_evtchn_singleton;
+    int pirq;
+
+    if (!s) {
+        return -ENOTSUP;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    pirq = allocate_pirq(s, get->type, IRQ_UNBOUND);
+    if (pirq < 0) {
+        return pirq;
+    }
+
+    get->pirq = pirq;
+    trace_kvm_xen_get_free_pirq(pirq, get->type);
+    return 0;
 }
 
 struct xenevtchn_handle *xen_be_evtchn_open(void)
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index ccf58aa796..2c12506cc2 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -48,6 +48,7 @@ struct evtchn_status;
 struct evtchn_close;
 struct evtchn_unmask;
 struct evtchn_bind_virq;
+struct evtchn_bind_pirq;
 struct evtchn_bind_ipi;
 struct evtchn_send;
 struct evtchn_alloc_unbound;
@@ -58,6 +59,7 @@ int xen_evtchn_status_op(struct evtchn_status *status);
 int xen_evtchn_close_op(struct evtchn_close *close);
 int xen_evtchn_unmask_op(struct evtchn_unmask *unmask);
 int xen_evtchn_bind_virq_op(struct evtchn_bind_virq *virq);
+int xen_evtchn_bind_pirq_op(struct evtchn_bind_pirq *pirq);
 int xen_evtchn_bind_ipi_op(struct evtchn_bind_ipi *ipi);
 int xen_evtchn_send_op(struct evtchn_send *send);
 int xen_evtchn_alloc_unbound_op(struct evtchn_alloc_unbound *alloc);
diff --git a/meson.build b/meson.build
index 72eec9c68a..8a58c381eb 100644
--- a/meson.build
+++ b/meson.build
@@ -2943,6 +2943,7 @@ if have_system
     'hw/i2c',
     'hw/i386',
     'hw/i386/xen',
+    'hw/i386/kvm',
     'hw/ide',
     'hw/input',
     'hw/intc',
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index 3fa58e33bd..c956390e2c 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -1215,6 +1215,21 @@ static bool kvm_xen_hcall_evtchn_op(struct kvm_xen_exit *exit, X86CPU *cpu,
         }
         break;
     }
+    case EVTCHNOP_bind_pirq: {
+        struct evtchn_bind_pirq pirq;
+
+        qemu_build_assert(sizeof(pirq) == 12);
+        if (kvm_copy_from_gva(cs, arg, &pirq, sizeof(pirq))) {
+            err = -EFAULT;
+            break;
+        }
+
+        err = xen_evtchn_bind_pirq_op(&pirq);
+        if (!err && kvm_copy_to_gva(cs, arg, &pirq, sizeof(pirq))) {
+            err = -EFAULT;
+        }
+        break;
+    }
     case EVTCHNOP_bind_ipi: {
         struct evtchn_bind_ipi ipi;
 
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 3/5] hw/xen: Support GSI mapping to PIRQ
  2023-01-14  0:39 [RFC PATCH 0/5] Xen PIRQ support David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 1/5] i386/xen: Implement HYPERVISOR_physdev_op David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 2/5] hw/xen: Implement emulated PIRQ hypercall support David Woodhouse
@ 2023-01-14  0:39 ` David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi() David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 5/5] hw/xen: Support MSI mapping to PIRQ David Woodhouse
  4 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2023-01-14  0:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Paul Durrant, Joao Martins, Ankur Arora,
	Philippe Mathieu-Daudé,
	Thomas Huth, Alex Bennée, Juan Quintela,
	Dr . David Alan Gilbert, Claudio Fontana, Julien Grall,
	Michael S. Tsirkin, arcel Apfelbaum

From: David Woodhouse <dwmw@amazon.co.uk>

If I advertise XENFEAT_hvm_pirqs then a guest now boots successfully as
long as I tell it 'pci=nomsi'.

[root@localhost ~]# cat /proc/interrupts
           CPU0
  0:         52   IO-APIC   2-edge      timer
  1:         16  xen-pirq   1-ioapic-edge  i8042
  4:       1534  xen-pirq   4-ioapic-edge  ttyS0
  8:          1  xen-pirq   8-ioapic-edge  rtc0
  9:          0  xen-pirq   9-ioapic-level  acpi
 11:       5648  xen-pirq  11-ioapic-level  ahci[0000:00:04.0]
 12:        257  xen-pirq  12-ioapic-edge  i8042
...

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/i386/kvm/xen_evtchn.c | 38 +++++++++++++++++++++++++++++++++++++-
 hw/i386/kvm/xen_evtchn.h |  2 ++
 hw/i386/x86.c            | 15 +++++++++++++++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 82250daecb..18c88229ab 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -143,6 +143,7 @@ struct XenEvtchnState {
     uint16_t gsi_pirq[GSI_NUM_PINS];
     /* Bitmap of allocated PIRQs (serialized) */
     uint64_t pirq_inuse[DIV_ROUND_UP(MAX_XEN_PIRQ, 64)];
+    uint32_t pirq_gsi_set;
 
     /* Per-PIRQ information (rebuilt on migration) */
     struct pirq_info pirq[MAX_XEN_PIRQ];
@@ -208,6 +209,7 @@ static const VMStateDescription xen_evtchn_vmstate = {
         VMSTATE_UINT16_ARRAY(gsi_pirq, XenEvtchnState, GSI_NUM_PINS),
         VMSTATE_UINT64_ARRAY(pirq_inuse, XenEvtchnState,
                              DIV_ROUND_UP(MAX_XEN_PIRQ, 64)),
+        VMSTATE_UINT32(pirq_gsi_set, XenEvtchnState),
         VMSTATE_END_OF_LIST()
     }
 };
@@ -1425,6 +1427,35 @@ static int allocate_pirq(XenEvtchnState *s, int type, int gsi)
     return pirq;
 }
 
+bool xen_evtchn_set_gsi(int gsi, int level)
+{
+    XenEvtchnState *s = xen_evtchn_singleton;
+    int pirq;
+
+    if (!s || gsi < 0 || gsi > GSI_NUM_PINS) {
+        return false;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    pirq = s->gsi_pirq[gsi];
+    if (!pirq) {
+        return false;
+    }
+
+    if (level) {
+        int port = s->pirq[pirq].port;
+
+        s->pirq_gsi_set |= (1U << gsi);
+        if (port) {
+            set_port_pending(s, port);
+        }
+    } else {
+        s->pirq_gsi_set &= ~(1U << gsi);
+    }
+    return true;
+}
+
 int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 {
     XenEvtchnState *s = xen_evtchn_singleton;
@@ -1531,8 +1562,13 @@ int xen_physdev_eoi_pirq(struct physdev_eoi *eoi)
     if (gsi < 0) {
         return -EINVAL;
     }
+    if (s->pirq_gsi_set & (1U << gsi)) {
+        int port = s->pirq[pirq].port;
+        if (port) {
+            set_port_pending(s, port);
+        }
+    }
 
-    // XX: Reassert a level IRQ if needed */
     return 0;
 }
 
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index 2c12506cc2..dba9d6b021 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -24,6 +24,8 @@ void xen_evtchn_set_callback_level(int level);
 
 int xen_evtchn_set_port(uint16_t port);
 
+bool xen_evtchn_set_gsi(int gsi, int level);
+
 /*
  * These functions mirror the libxenevtchn library API, providing the QEMU
  * backend side of "interdomain" event channels.
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 78cc131926..201fd5626c 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -61,6 +61,10 @@
 #include CONFIG_DEVICES
 #include "kvm/kvm_i386.h"
 
+#ifdef CONFIG_XEN_EMU
+#include "hw/i386/kvm/xen_evtchn.h"
+#endif
+
 /* Physical Address of PVH entry point read from kernel ELF NOTE */
 static size_t pvh_start_addr;
 
@@ -608,6 +612,17 @@ void gsi_handler(void *opaque, int n, int level)
         }
         /* fall through */
     case ISA_NUM_IRQS ... IOAPIC_NUM_PINS - 1:
+#ifdef CONFIG_XEN_EMU
+        /*
+         * Xen delivers the GSI to the Legacy PIC (not that Legacy PIC
+         * routing actually works properly under Xen). And then to
+         * *either* the PIRQ handling or the I/OAPIC depending on
+         * whether the former wants it.
+         */
+        if (xen_evtchn_set_gsi(n, level)) {
+            break;
+        }
+#endif
         qemu_set_irq(s->ioapic_irq[n], level);
         break;
     case IO_APIC_SECONDARY_IRQBASE
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi()
  2023-01-14  0:39 [RFC PATCH 0/5] Xen PIRQ support David Woodhouse
                   ` (2 preceding siblings ...)
  2023-01-14  0:39 ` [RFC PATCH 3/5] hw/xen: Support GSI mapping to PIRQ David Woodhouse
@ 2023-01-14  0:39 ` David Woodhouse
  2023-01-16 11:27   ` David Woodhouse
  2023-01-14  0:39 ` [RFC PATCH 5/5] hw/xen: Support MSI mapping to PIRQ David Woodhouse
  4 siblings, 1 reply; 7+ messages in thread
From: David Woodhouse @ 2023-01-14  0:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Paul Durrant, Joao Martins, Ankur Arora,
	Philippe Mathieu-Daudé,
	Thomas Huth, Alex Bennée, Juan Quintela,
	Dr . David Alan Gilbert, Claudio Fontana, Julien Grall,
	Michael S. Tsirkin, arcel Apfelbaum

From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/i386/kvm/xen_evtchn.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 18c88229ab..c4103ee98b 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1436,6 +1436,14 @@ bool xen_evtchn_set_gsi(int gsi, int level)
         return false;
     }
 
+    /*
+     * XXX: We access this without locking. Because we'd deadlock
+     * if it was the callback_gsi. Do something cleverer.
+     */
+    if (gsi && gsi == s->callback_gsi) {
+        return false;
+    }
+
     QEMU_LOCK_GUARD(&s->port_lock);
 
     pirq = s->gsi_pirq[gsi];
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH 5/5] hw/xen: Support MSI mapping to PIRQ
  2023-01-14  0:39 [RFC PATCH 0/5] Xen PIRQ support David Woodhouse
                   ` (3 preceding siblings ...)
  2023-01-14  0:39 ` [RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi() David Woodhouse
@ 2023-01-14  0:39 ` David Woodhouse
  4 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2023-01-14  0:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Paul Durrant, Joao Martins, Ankur Arora,
	Philippe Mathieu-Daudé,
	Thomas Huth, Alex Bennée, Juan Quintela,
	Dr . David Alan Gilbert, Claudio Fontana, Julien Grall,
	Michael S. Tsirkin, arcel Apfelbaum

From: David Woodhouse <dwmw@amazon.co.uk>

The way that Xen handles MSI PIRQs is kind of awful.

There is a special MSI message which targets a PIRQ. The vector in the
low bits of data must be zero. The low 8 bits of the PIRQ# are in the
destination ID field, the extended destination ID field is unused, and
instead the high bits of the PIRQ# are in the high 32 bits of the address.

Using the high bits of the address means that we can't intercept and
translate these messages in kvm_send_msi(), because they won't be caught
by the APIC — addresses like 0x1000fee46000 aren't in the APIC's range.

So we catch them in pci_msi_trigger() instead, and deliver the event
channel directly.

That isn't even the worst part. The worst part is that Xen snoops on
writes to devices' MSI vectors while they are *masked*. When a MSI
message is written which looks like it targets a PIRQ, it remembers
the device and vector for later.

When the guest makes a hypercall to bind that PIRQ# (snooped from a
marked MSI vector) to an event channel port, Xen *unmasks* that MSI
vector on the device. Xen guests using PIRQ delivery of MSI don't
ever actually unmask the MSI for themselves.

Now that this is working we can finally enable XENFEAT_hvm_pirqs and
let the guest use it all.

  0:         36          0   IO-APIC   2-edge      timer
  1:          0         16  xen-pirq   1-ioapic-edge  i8042
  4:          0        960  xen-pirq   4-ioapic-edge  ttyS0
  8:          1          0  xen-pirq   8-ioapic-edge  rtc0
  9:          0          0  xen-pirq   9-ioapic-level  acpi
 12:        257          0  xen-pirq  12-ioapic-edge  i8042
 14:          0          0   IO-APIC  14-edge      ata_piix
 15:          0          0   IO-APIC  15-edge      ata_piix
 24:      11667          0  xen-percpu    -virq      timer0
 25:       3074          0  xen-percpu    -ipi       resched0
 26:          0          0  xen-percpu    -ipi       callfunc0
 27:          0          0  xen-percpu    -virq      debug0
 28:        488          0  xen-percpu    -ipi       callfuncsingle0
 29:          0          0  xen-percpu    -ipi       spinlock0
 30:          0      14209  xen-percpu    -virq      timer1
 31:          0      12386  xen-percpu    -ipi       resched1
 32:          0          0  xen-percpu    -ipi       callfunc1
 33:          0          0  xen-percpu    -virq      debug1
 34:          0        401  xen-percpu    -ipi       callfuncsingle1
 35:          0          0  xen-percpu    -ipi       spinlock1
 36:          8          0   xen-dyn    -event     xenbus
 37:          0       5693  xen-pirq    -msi       ahci[0000:00:04.0]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/i386/kvm/xen-stubs.c    |  11 +++
 hw/i386/kvm/xen_evtchn.c   | 141 ++++++++++++++++++++++++++++++++++++-
 hw/i386/kvm/xen_evtchn.h   |   7 ++
 hw/pci/msi.c               |  13 ++++
 hw/pci/msix.c              |   7 +-
 hw/pci/pci.c               |  14 ++++
 target/i386/kvm/kvm.c      |  12 +++-
 target/i386/kvm/kvm_i386.h |   2 +
 target/i386/kvm/xen-emu.c  |   3 +-
 9 files changed, 202 insertions(+), 8 deletions(-)

diff --git a/hw/i386/kvm/xen-stubs.c b/hw/i386/kvm/xen-stubs.c
index 6f433dc995..8b64fcf6c5 100644
--- a/hw/i386/kvm/xen-stubs.c
+++ b/hw/i386/kvm/xen-stubs.c
@@ -12,6 +12,7 @@
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc.h"
+#include "xen_evtchn.h"
 
 EvtchnInfoList *qmp_xen_event_list(Error **errp)
 {
@@ -23,3 +24,13 @@ void qmp_xen_event_inject(uint32_t port, Error **errp)
 {
     error_setg(errp, "Xen event channel emulation not enabled");
 }
+
+void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
+                          uint64_t addr, uint32_t data, bool is_masked)
+{
+}
+
+bool xen_evtchn_deliver_pirq_msi(uint64_t address, uint32_t data)
+{
+    return false;
+}
diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index c4103ee98b..8e128ff8bc 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -30,9 +30,10 @@
 #include "hw/i386/x86.h"
 #include "hw/i386/pc.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
 #include "hw/irq.h"
 #include "hw/xen/xen_backend_ops.h"
-
 #include "xen_evtchn.h"
 #include "xen_overlay.h"
 #include "xen_xenstore.h"
@@ -45,6 +46,9 @@
 #include "standard-headers/xen/memory.h"
 #include "standard-headers/xen/hvm/params.h"
 
+/* XX: For kvm_update_msi_routes_all() */
+#include "target/i386/kvm/kvm_i386.h"
+
 #define TYPE_XEN_EVTCHN "xen-evtchn"
 OBJECT_DECLARE_SIMPLE_TYPE(XenEvtchnState, XEN_EVTCHN)
 
@@ -121,6 +125,10 @@ struct xenevtchn_handle {
 struct pirq_info {
     int gsi;
     uint16_t port;
+    PCIDevice *dev;
+    int vector;
+    bool is_msix;
+    bool is_masked;
 };
 
 struct XenEvtchnState {
@@ -289,7 +297,18 @@ static int pirq_bind_port(XenEvtchnState *s, int pirq, uint16_t port)
     s->pirq[pirq].port = port;
     trace_kvm_xen_bind_pirq(pirq, port);
 
-    /* XX: We need to unmask MSI here, when we get to that */
+    if (port && s->pirq[pirq].gsi == IRQ_MSI_EMU) {
+         if (s->pirq[pirq].is_msix) {
+            msix_set_mask(s->pirq[pirq].dev, s->pirq[pirq].vector,
+                          false);
+        } else {
+            msi_set_mask(s->pirq[pirq].dev, s->pirq[pirq].vector,
+                         false, &error_fatal);
+        }
+    }
+    if (!port) {
+        /* XX: Want to invalidate MSI routing here but we'd deadlock */
+    }
     return 0;
 }
 
@@ -1464,6 +1483,115 @@ bool xen_evtchn_set_gsi(int gsi, int level)
     return true;
 }
 
+static uint32_t msi_pirq_target(uint64_t addr, uint32_t data)
+{
+    /* The vector (in low 8 bits of data) must be zero */
+    if (data & 0xff)
+        return 0;
+
+    uint32_t pirq = (addr & 0xff000) >> 12;
+    pirq |= (addr >> 32) & 0xffffff00;
+
+    return pirq;
+}
+
+void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
+                          uint64_t addr, uint32_t data, bool is_masked)
+{
+    XenEvtchnState *s = xen_evtchn_singleton;
+    uint32_t pirq;
+
+    if (!s) {
+        return;
+    }
+
+    pirq = msi_pirq_target(addr, data);
+    if (!pirq || pirq >= MAX_XEN_PIRQ) {
+        return;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    if (s->pirq[pirq].gsi != IRQ_UNBOUND && s->pirq[pirq].gsi != IRQ_MSI_EMU) {
+        return;
+    }
+
+    if (s->pirq[pirq].dev != dev) {
+        if (s->pirq[pirq].dev) {
+            object_unref(OBJECT(s->pirq[pirq].dev));
+        }
+        object_ref(OBJECT(dev));
+        s->pirq[pirq].dev = dev;
+    }
+
+    s->pirq[pirq].gsi = IRQ_MSI_EMU;
+    s->pirq[pirq].is_msix = is_msix;
+    s->pirq[pirq].vector = vector;
+    s->pirq[pirq].is_masked = is_masked;
+
+}
+
+bool xen_evtchn_translate_pirq_msi(struct kvm_irq_routing_entry *route,
+                                   uint64_t address, uint32_t data)
+{
+    XenEvtchnState *s = xen_evtchn_singleton;
+    uint32_t pirq, port;
+    CPUState *cpu;
+
+    if (!s) {
+        return false;
+    }
+
+    pirq = msi_pirq_target(address, data);
+    if (!pirq || pirq >= MAX_XEN_PIRQ) {
+        return false;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    port = s->pirq[pirq].port;
+    if (!valid_port(port)) {
+        return false;
+    }
+
+    cpu = qemu_get_cpu(s->port_table[port].vcpu);
+    if (!cpu) {
+        return false;
+    }
+
+    route->type = KVM_IRQ_ROUTING_XEN_EVTCHN;
+    route->u.xen_evtchn.port = port;
+    route->u.xen_evtchn.vcpu = kvm_arch_vcpu_id(cpu);
+    route->u.xen_evtchn.priority = KVM_IRQ_ROUTING_XEN_EVTCHN_PRIO_2LEVEL;
+
+    return true;
+}
+
+bool xen_evtchn_deliver_pirq_msi(uint64_t address, uint32_t data)
+{
+    XenEvtchnState *s = xen_evtchn_singleton;
+    uint32_t pirq, port;
+
+    if (!s) {
+        return false;
+    }
+
+    pirq = msi_pirq_target(address, data);
+    if (!pirq || pirq >= MAX_XEN_PIRQ) {
+        return false;
+    }
+
+    QEMU_LOCK_GUARD(&s->port_lock);
+
+    port = s->pirq[pirq].port;
+    if (!valid_port(port)) {
+        return false;
+    }
+
+    set_port_pending(s, port);
+    return true;
+}
+
 int xen_physdev_map_pirq(struct physdev_map_pirq *map)
 {
     XenEvtchnState *s = xen_evtchn_singleton;
@@ -1529,9 +1657,10 @@ int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
     if (pirq < 0 || pirq >= MAX_XEN_PIRQ)
         return -EINVAL;
 
-    QEMU_LOCK_GUARD(&s->port_lock);
+    qemu_mutex_lock(&s->port_lock);
 
     if (!pirq_inuse(s, pirq)) {
+        qemu_mutex_unlock(&s->port_lock);
         return -ENOENT;
     }
 
@@ -1539,6 +1668,7 @@ int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
 
     /* We can only unmap GSI PIRQs */
     if (gsi < 0) {
+        qemu_mutex_unlock(&s->port_lock);
         return -EINVAL;
     }
 
@@ -1547,6 +1677,11 @@ int xen_physdev_unmap_pirq(struct physdev_unmap_pirq *unmap)
     pirq_inuse_word(s, pirq) &= ~pirq_inuse_bit(pirq);
 
     trace_kvm_xen_unmap_pirq(pirq, gsi);
+    qemu_mutex_unlock(&s->port_lock);
+
+    if (gsi == IRQ_MSI_EMU)
+        kvm_update_msi_routes_all(NULL, true, 0, 0);
+
     return 0;
 }
 
diff --git a/hw/i386/kvm/xen_evtchn.h b/hw/i386/kvm/xen_evtchn.h
index dba9d6b021..3f8b757194 100644
--- a/hw/i386/kvm/xen_evtchn.h
+++ b/hw/i386/kvm/xen_evtchn.h
@@ -25,6 +25,13 @@ void xen_evtchn_set_callback_level(int level);
 int xen_evtchn_set_port(uint16_t port);
 
 bool xen_evtchn_set_gsi(int gsi, int level);
+void xen_evtchn_snoop_msi(PCIDevice *dev, bool is_msix, unsigned int vector,
+                          uint64_t addr, uint32_t data, bool is_masked);
+struct kvm_irq_routing_entry;
+bool xen_evtchn_translate_pirq_msi(struct kvm_irq_routing_entry *route,
+                                   uint64_t address, uint32_t data);
+bool xen_evtchn_deliver_pirq_msi(uint64_t address, uint32_t data);
+
 
 /*
  * These functions mirror the libxenevtchn library API, providing the QEMU
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 1cadf150bc..f09b7f3fca 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -24,6 +24,8 @@
 #include "qemu/range.h"
 #include "qapi/error.h"
 
+#include "hw/i386/kvm/xen_evtchn.h"
+
 /* PCI_MSI_ADDRESS_LO */
 #define PCI_MSI_ADDRESS_LO_MASK         (~0x3)
 
@@ -431,6 +433,17 @@ void msi_write_config(PCIDevice *dev, uint32_t addr, uint32_t val, int len)
      */
     pci_device_deassert_intx(dev);
 
+    if (xen_mode == XEN_EMULATE) {
+        for (vector = 0; vector < msi_nr_vectors(flags); vector++) {
+            MSIMessage msg = msi_prepare_message(dev, vector);
+
+            xen_evtchn_snoop_msi(dev, false, vector, msg.address, msg.data,
+                                 msi_is_masked(dev, vector));
+        }
+    }
+
+
+
     /*
      * nr_vectors might be set bigger than capable. So clamp it.
      * This is not legal by spec, so we can do anything we like,
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 9e70fcd6fa..afa77b8d71 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -26,6 +26,8 @@
 #include "qapi/error.h"
 #include "trace.h"
 
+#include "hw/i386/kvm/xen_evtchn.h"
+
 /* MSI enable bit and maskall bit are in byte 1 in FLAGS register */
 #define MSIX_CONTROL_OFFSET (PCI_MSIX_FLAGS + 1)
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
@@ -105,16 +107,17 @@ bool msix_is_masked(PCIDevice *dev, unsigned int vector)
 static void msix_fire_vector_notifier(PCIDevice *dev,
                                       unsigned int vector, bool is_masked)
 {
-    MSIMessage msg;
+    MSIMessage msg = msi_get_message(dev, vector);
     int ret;
 
+    xen_evtchn_snoop_msi(dev, true, vector, msg.address, msg.data, is_masked);
+
     if (!dev->msix_vector_use_notifier) {
         return;
     }
     if (is_masked) {
         dev->msix_vector_release_notifier(dev, vector);
     } else {
-        msg = msix_get_message(dev, vector);
         ret = dev->msix_vector_use_notifier(dev, vector, msg);
         assert(ret >= 0);
     }
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c2fb88f9a3..fdb8d4abc9 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -49,6 +49,9 @@
 #include "qemu/cutils.h"
 #include "pci-internal.h"
 
+#include "hw/xen/xen.h"
+#include "hw/i386/kvm/xen_evtchn.h"
+
 //#define DEBUG_PCI
 #ifdef DEBUG_PCI
 # define PCI_DPRINTF(format, ...)       printf(format, ## __VA_ARGS__)
@@ -318,6 +321,17 @@ static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
 {
     MemTxAttrs attrs = {};
 
+    /*
+     * Xen uses the high bits of the address to contain some of the bits
+     * of the PIRQ#. Therefore we can't just send the write cycle and
+     * trust that it's caught by the APIC at 0xfee00000 because the
+     * target of the write might be e.g. 0x0x1000fee46000 for PIRQ#4166.
+     * So we intercept the delivery here instead of in kvm_send_msi().
+     */
+    if (xen_mode == XEN_EMULATE &&
+        xen_evtchn_deliver_pirq_msi(msg.address, msg.data)) {
+        return;
+    }
     attrs.requester_id = pci_requester_id(dev);
     address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
                          attrs, NULL);
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 76bdd9d7ea..6d307cdcad 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -43,6 +43,7 @@
 #include "qemu/error-report.h"
 #include "qemu/memalign.h"
 #include "hw/i386/x86.h"
+#include "hw/i386/kvm/xen_evtchn.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic.h"
 #include "hw/i386/apic_internal.h"
@@ -5640,6 +5641,13 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
         }
     }
 
+#ifdef CONFIG_XEN_EMU
+    if (xen_mode == XEN_EMULATE &&
+        xen_evtchn_translate_pirq_msi(route, address, data)) {
+        return 0;
+    }
+#endif
+
     address = kvm_swizzle_msi_ext_dest_id(address);
     route->u.msi.address_hi = address >> VTD_MSI_ADDR_HI_SHIFT;
     route->u.msi.address_lo = address & VTD_MSI_ADDR_LO_MASK;
@@ -5659,8 +5667,8 @@ struct MSIRouteEntry {
 static QLIST_HEAD(, MSIRouteEntry) msi_route_list = \
     QLIST_HEAD_INITIALIZER(msi_route_list);
 
-static void kvm_update_msi_routes_all(void *private, bool global,
-                                      uint32_t index, uint32_t mask)
+void kvm_update_msi_routes_all(void *private, bool global,
+                               uint32_t index, uint32_t mask)
 {
     int cnt = 0, vector;
     MSIRouteEntry *entry;
diff --git a/target/i386/kvm/kvm_i386.h b/target/i386/kvm/kvm_i386.h
index 6a5c24e3dc..e24753abfe 100644
--- a/target/i386/kvm/kvm_i386.h
+++ b/target/i386/kvm/kvm_i386.h
@@ -51,6 +51,8 @@ bool kvm_hv_vpindex_settable(void);
 bool kvm_hyperv_expand_features(X86CPU *cpu, Error **errp);
 
 uint64_t kvm_swizzle_msi_ext_dest_id(uint64_t address);
+void kvm_update_msi_routes_all(void *private, bool global,
+                               uint32_t index, uint32_t mask);
 
 bool kvm_enable_sgx_provisioning(KVMState *s);
 void kvm_request_xsave_components(X86CPU *cpu, uint64_t mask);
diff --git a/target/i386/kvm/xen-emu.c b/target/i386/kvm/xen-emu.c
index c956390e2c..0f1a7c8edf 100644
--- a/target/i386/kvm/xen-emu.c
+++ b/target/i386/kvm/xen-emu.c
@@ -266,7 +266,8 @@ static bool kvm_xen_hcall_xen_version(struct kvm_xen_exit *exit, X86CPU *cpu,
                          1 << XENFEAT_auto_translated_physmap |
                          1 << XENFEAT_supervisor_mode_kernel |
                          1 << XENFEAT_hvm_callback_vector |
-                         1 << XENFEAT_hvm_safe_pvclock;
+                         1 << XENFEAT_hvm_safe_pvclock |
+                         1 << XENFEAT_hvm_pirqs;
         }
 
         err = kvm_copy_to_gva(CPU(cpu), arg, &fi, sizeof(fi));
-- 
2.35.3



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi()
  2023-01-14  0:39 ` [RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi() David Woodhouse
@ 2023-01-16 11:27   ` David Woodhouse
  0 siblings, 0 replies; 7+ messages in thread
From: David Woodhouse @ 2023-01-16 11:27 UTC (permalink / raw)
  To: qemu-devel
  Cc: Paolo Bonzini, Paul Durrant, Joao Martins, Ankur Arora,
	Philippe Mathieu-Daudé,
	Thomas Huth, Alex Bennée, Juan Quintela,
	Dr . David Alan Gilbert, Claudio Fontana, Julien Grall,
	Michael S. Tsirkin, arcel Apfelbaum

[-- Attachment #1: Type: text/plain, Size: 2166 bytes --]

FWIW, with this support in QEMU I've just found and fixed two guest
kernel regressions in MSI-X handling under Xen. And having done that, I
can get back to frowning at the locking on the qemu side...

On Sat, 2023-01-14 at 00:39 +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  hw/i386/kvm/xen_evtchn.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
> index 18c88229ab..c4103ee98b 100644
> --- a/hw/i386/kvm/xen_evtchn.c
> +++ b/hw/i386/kvm/xen_evtchn.c
> @@ -1436,6 +1436,14 @@ bool xen_evtchn_set_gsi(int gsi, int level)
>          return false;
>      }
>  
> +    /*
> +     * XXX: We access this without locking. Because we'd deadlock
> +     * if it was the callback_gsi. Do something cleverer.
> +     */
> +    if (gsi && gsi == s->callback_gsi) {
> +        return false;
> +    }
> +
>      QEMU_LOCK_GUARD(&s->port_lock);
>  
>      pirq = s->gsi_pirq[gsi];

I tried adding a QemuRecMutex just around s->callback_gsi which kind of
works, but then I realised that I'm not supposed to be calling
qemu_set_irq() without holding the BQL, am I?

The code which does so (in xen_evtchn_set_callback_level()) sometimes
gets triggered from a vCPU making a hypercall (which doesn't hold the
BQL), and sometimes from a PV backend (which *will* hold the BQL). In
fact the latter is much more common in the case where GSI delivery is
needed.

Because it's called from the PV backends, I don't think I can avoid the
rule that the BQL is taken first, and the evtchn port_lock taken inside
it.

So I must never block on the BQL from the hypercall code paths. I think
that means I should refactor xen_evtchn_set_callback_level() to:

  • Take an argument telling it whether it is running from a context 
    with the BQL already held.
  • Do a *trylock* on the BQL if isn't already held.
  • If the trylock fails, invoke itself from a BH on the main I/O
    thread.

Does that seem reasonable?


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-01-16 11:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-14  0:39 [RFC PATCH 0/5] Xen PIRQ support David Woodhouse
2023-01-14  0:39 ` [RFC PATCH 1/5] i386/xen: Implement HYPERVISOR_physdev_op David Woodhouse
2023-01-14  0:39 ` [RFC PATCH 2/5] hw/xen: Implement emulated PIRQ hypercall support David Woodhouse
2023-01-14  0:39 ` [RFC PATCH 3/5] hw/xen: Support GSI mapping to PIRQ David Woodhouse
2023-01-14  0:39 ` [RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi() David Woodhouse
2023-01-16 11:27   ` David Woodhouse
2023-01-14  0:39 ` [RFC PATCH 5/5] hw/xen: Support MSI mapping to PIRQ David Woodhouse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.