qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/5] Handle PAPR irq chip changes for VFIO devices
@ 2019-10-17  5:42 David Gibson
  2019-10-17  5:42 ` [RFC 1/5] kvm: Introduce KVM irqchip change notifier David Gibson
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: David Gibson @ 2019-10-17  5:42 UTC (permalink / raw)
  To: alex.williamson, clg, groug; +Cc: aik, qemu-ppc, qemu-devel, David Gibson

The pseries machine type has the odd property that it's root irq chip
can be completely changed at runtime.  This comes about because PAPR
feature negotiation lets the guest choose between the old XICS style
or new XIVE style PIC.  It's possible, because both PICs are
paravirtualized via hypercalls.

VFIO needs to wire up device interrupts directly to the kernel irqchip
to accelerate delivery, and that's broken by the irq chip change.
This series introduces a new notifier to get this correctly updated
when PAPR switchs irq chip.

Caveats:
 * I'm not sure I've sufficiently pinned down the semantics of when
   exactly the new notifier should be called yet
   
 * It would kind of be niced to automatically fire the notifier from
   somewhere in the irq chip update routines, rather than at the PAPR
   level.  I haven't seen a good way to do that (at least not without
   double firing it on every transition).

 * Patch 5/5 to work around spurious warnings is working a bit *too*
   well.  On a Boston machine which allows in-kernel XICS, but not
   in-kernel XIVE, I (correctly) no longer get the spurious warning at
   initial start up (in XICS mode).  However we incorrectly *don't*
   get the "failed to setup resample irqfd" warning after we negotiate
   features and switch to XIVE mode.  I haven't had a chance to
   investigate, but I suspect a kernel bug where it's responding to
   KVM_IRQFD based on stale information about the kernel irqchip.

David Gibson (5):
  kvm: Introduce KVM irqchip change notifier
  vfio/pci: Split vfio_intx_update()
  vfio/pci: Respond to KVM irqchip change notifier
  spapr: Handle irq backend changes with VFIO PCI devices
  spapr: Work around spurious warnings from vfio INTx initialization

 accel/kvm/kvm-all.c    | 18 +++++++++++++++
 accel/stubs/kvm-stub.c | 12 ++++++++++
 hw/ppc/spapr_irq.c     | 17 +++++++++++++-
 hw/vfio/pci.c          | 51 ++++++++++++++++++++++++++++--------------
 hw/vfio/pci.h          |  2 ++
 include/sysemu/kvm.h   |  5 +++++
 6 files changed, 87 insertions(+), 18 deletions(-)

-- 
2.21.0



^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC 1/5] kvm: Introduce KVM irqchip change notifier
  2019-10-17  5:42 [RFC 0/5] Handle PAPR irq chip changes for VFIO devices David Gibson
@ 2019-10-17  5:42 ` David Gibson
  2019-10-17  5:42 ` [RFC 2/5] vfio/pci: Split vfio_intx_update() David Gibson
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2019-10-17  5:42 UTC (permalink / raw)
  To: alex.williamson, clg, groug; +Cc: aik, qemu-ppc, qemu-devel, David Gibson

Awareness of an in kernel irqchip is usually local to the machine and its
top-level interrupt controller.  However, in a few cases other things need
to know about it.  In particular vfio devices need this in order to
accelerate interrupt delivery.

If interrupt routing is changed, such devices may need to readjust their
connection to the KVM irqchip.  pci_bus_fire_intx_routing_notifier() exists
to do just this.

However, for the pseries machine type we have a situation where the routing
remains constant but the top-level irq chip itself is changed.  This occurs
because of PAPR feature negotiation which allows the guest to decide
between the older XICS and newer XIVE irq chip models (both of which are
paravirtualized).

To allow devices like vfio to adjust to this change, introduce a new
notifier for the purpose kvm_irqchip_change_notify().

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 accel/kvm/kvm-all.c    | 18 ++++++++++++++++++
 accel/stubs/kvm-stub.c | 12 ++++++++++++
 include/sysemu/kvm.h   |  5 +++++
 3 files changed, 35 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index d2d96d73e8..44df1908dd 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -149,6 +149,9 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
     KVM_CAP_LAST_INFO
 };
 
+static NotifierList kvm_irqchip_change_notifiers =
+    NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
+
 #define kvm_slots_lock(kml)      qemu_mutex_lock(&(kml)->slots_lock)
 #define kvm_slots_unlock(kml)    qemu_mutex_unlock(&(kml)->slots_lock)
 
@@ -1396,6 +1399,21 @@ void kvm_irqchip_release_virq(KVMState *s, int virq)
     trace_kvm_irqchip_release_virq(virq);
 }
 
+void kvm_irqchip_add_change_notifier(Notifier *n)
+{
+    notifier_list_add(&kvm_irqchip_change_notifiers, n);
+}
+
+void kvm_irqchip_remove_change_notifier(Notifier *n)
+{
+    notifier_remove(n);
+}
+
+void kvm_irqchip_change_notify(void)
+{
+    notifier_list_notify(&kvm_irqchip_change_notifiers, NULL);
+}
+
 static unsigned int kvm_hash_msi(uint32_t data)
 {
     /* This is optimized for IA32 MSI layout. However, no other arch shall
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 6feb66ed80..82f118d2df 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -138,6 +138,18 @@ void kvm_irqchip_commit_routes(KVMState *s)
 {
 }
 
+void kvm_irqchip_add_change_notifier(Notifier *n)
+{
+}
+
+void kvm_irqchip_remove_change_notifier(Notifier *n)
+{
+}
+
+void kvm_irqchip_change_notify(void)
+{
+}
+
 int kvm_irqchip_add_adapter_route(KVMState *s, AdapterInfo *adapter)
 {
     return -ENOSYS;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9d143282bc..9fe233b9bf 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -201,6 +201,7 @@ typedef struct KVMCapabilityInfo {
 struct KVMState;
 typedef struct KVMState KVMState;
 extern KVMState *kvm_state;
+typedef struct Notifier Notifier;
 
 /* external API */
 
@@ -401,6 +402,10 @@ int kvm_irqchip_send_msi(KVMState *s, MSIMessage msg);
 
 void kvm_irqchip_add_irq_route(KVMState *s, int gsi, int irqchip, int pin);
 
+void kvm_irqchip_add_change_notifier(Notifier *n);
+void kvm_irqchip_remove_change_notifier(Notifier *n);
+void kvm_irqchip_change_notify(void);
+
 void kvm_get_apic_state(DeviceState *d, struct kvm_lapic_state *kapic);
 
 struct kvm_guest_debug;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 2/5] vfio/pci: Split vfio_intx_update()
  2019-10-17  5:42 [RFC 0/5] Handle PAPR irq chip changes for VFIO devices David Gibson
  2019-10-17  5:42 ` [RFC 1/5] kvm: Introduce KVM irqchip change notifier David Gibson
@ 2019-10-17  5:42 ` David Gibson
  2019-10-17  5:42 ` [RFC 3/5] vfio/pci: Respond to KVM irqchip change notifier David Gibson
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2019-10-17  5:42 UTC (permalink / raw)
  To: alex.williamson, clg, groug; +Cc: aik, qemu-ppc, qemu-devel, David Gibson

This splits the vfio_intx_update() function into one part doing the actual
reconnection with the KVM irqchip (vfio_intx_update(), now taking an
argument with the new routing) and vfio_intx_routing_notifier() which
handles calls to the pci device intx routing notifier and calling
vfio_intx_update() when necessary.  This will make adding support for the
irqchip change notifier easier.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/pci.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 12fac39804..529ad13908 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -215,30 +215,18 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
 #endif
 }
 
-static void vfio_intx_update(PCIDevice *pdev)
+static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
 {
-    VFIOPCIDevice *vdev = PCI_VFIO(pdev);
-    PCIINTxRoute route;
     Error *err = NULL;
 
-    if (vdev->interrupt != VFIO_INT_INTx) {
-        return;
-    }
-
-    route = pci_device_route_intx_to_irq(&vdev->pdev, vdev->intx.pin);
-
-    if (!pci_intx_route_changed(&vdev->intx.route, &route)) {
-        return; /* Nothing changed */
-    }
-
     trace_vfio_intx_update(vdev->vbasedev.name,
-                           vdev->intx.route.irq, route.irq);
+                           vdev->intx.route.irq, route->irq);
 
     vfio_intx_disable_kvm(vdev);
 
-    vdev->intx.route = route;
+    vdev->intx.route = *route;
 
-    if (route.mode != PCI_INTX_ENABLED) {
+    if (route->mode != PCI_INTX_ENABLED) {
         return;
     }
 
@@ -251,6 +239,22 @@ static void vfio_intx_update(PCIDevice *pdev)
     vfio_intx_eoi(&vdev->vbasedev);
 }
 
+static void vfio_intx_routing_notifier(PCIDevice *pdev)
+{
+    VFIOPCIDevice *vdev = PCI_VFIO(pdev);
+    PCIINTxRoute route;
+
+    if (vdev->interrupt != VFIO_INT_INTx) {
+        return;
+    }
+
+    route = pci_device_route_intx_to_irq(&vdev->pdev, vdev->intx.pin);
+
+    if (pci_intx_route_changed(&vdev->intx.route, &route)) {
+        vfio_intx_update(vdev, &route);
+    }
+}
+
 static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 {
     uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
@@ -2954,7 +2958,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     if (vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1)) {
         vdev->intx.mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
                                                   vfio_intx_mmap_enable, vdev);
-        pci_device_set_intx_routing_notifier(&vdev->pdev, vfio_intx_update);
+        pci_device_set_intx_routing_notifier(&vdev->pdev,
+                                             vfio_intx_routing_notifier);
         ret = vfio_intx_enable(vdev, errp);
         if (ret) {
             goto out_teardown;
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 3/5] vfio/pci: Respond to KVM irqchip change notifier
  2019-10-17  5:42 [RFC 0/5] Handle PAPR irq chip changes for VFIO devices David Gibson
  2019-10-17  5:42 ` [RFC 1/5] kvm: Introduce KVM irqchip change notifier David Gibson
  2019-10-17  5:42 ` [RFC 2/5] vfio/pci: Split vfio_intx_update() David Gibson
@ 2019-10-17  5:42 ` David Gibson
  2019-10-17 20:28   ` Alex Williamson
  2019-10-17  5:42 ` [RFC 4/5] spapr: Handle irq backend changes with VFIO PCI devices David Gibson
  2019-10-17  5:42 ` [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization David Gibson
  4 siblings, 1 reply; 13+ messages in thread
From: David Gibson @ 2019-10-17  5:42 UTC (permalink / raw)
  To: alex.williamson, clg, groug; +Cc: aik, qemu-ppc, qemu-devel, David Gibson

VFIO PCI devices already respond to the pci intx routing notifier, in order
to update kernel irqchip mappings when routing is updated.  However this
won't handle the case where the irqchip itself is replaced by a different
model while retaining the same routing.  This case can happen on
the pseries machine type due to PAPR feature negotiation.

To handle that case, add a handler for the irqchip change notifier, which
does much the same thing as the routing notifier, but is unconditional,
rather than being a no-op when the routing hasn't changed.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/vfio/pci.c | 12 ++++++++++++
 hw/vfio/pci.h |  2 ++
 2 files changed, 14 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 529ad13908..6aa806baff 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -255,6 +255,14 @@ static void vfio_intx_routing_notifier(PCIDevice *pdev)
     }
 }
 
+static void vfio_irqchip_change(Notifier *notify, void *data)
+{
+    VFIOPCIDevice *vdev = container_of(notify, VFIOPCIDevice,
+                                       irqchip_change_notifier);
+
+    vfio_intx_update(vdev, &vdev->intx.route);
+}
+
 static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
 {
     uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
@@ -2960,6 +2968,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                                                   vfio_intx_mmap_enable, vdev);
         pci_device_set_intx_routing_notifier(&vdev->pdev,
                                              vfio_intx_routing_notifier);
+        vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
+        kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
         ret = vfio_intx_enable(vdev, errp);
         if (ret) {
             goto out_teardown;
@@ -3009,6 +3019,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 
 out_teardown:
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
+    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
     vfio_teardown_msi(vdev);
     vfio_bars_exit(vdev);
 error:
@@ -3042,6 +3053,7 @@ static void vfio_exitfn(PCIDevice *pdev)
     vfio_unregister_req_notifier(vdev);
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
+    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
     vfio_disable_interrupts(vdev);
     if (vdev->intx.mmap_timer) {
         timer_free(vdev->intx.mmap_timer);
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 834a90d646..11324f28ce 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,6 +168,8 @@ typedef struct VFIOPCIDevice {
     bool no_vfio_ioeventfd;
     bool enable_ramfb;
     VFIODisplay *dpy;
+
+    Notifier irqchip_change_notifier;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 4/5] spapr: Handle irq backend changes with VFIO PCI devices
  2019-10-17  5:42 [RFC 0/5] Handle PAPR irq chip changes for VFIO devices David Gibson
                   ` (2 preceding siblings ...)
  2019-10-17  5:42 ` [RFC 3/5] vfio/pci: Respond to KVM irqchip change notifier David Gibson
@ 2019-10-17  5:42 ` David Gibson
  2019-10-17  5:42 ` [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization David Gibson
  4 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2019-10-17  5:42 UTC (permalink / raw)
  To: alex.williamson, clg, groug; +Cc: aik, qemu-ppc, qemu-devel, David Gibson

pseries machine type can have one of two different interrupt controllers in
use depending on feature negotiation with the guest.  Usually this is
invisible to devices, because they route to a common set of qemu_irqs which
in turn dispatch to the correct back end.

VFIO passthrough devices, however, wire themselves up directly to the KVM
irqchip for performance, which means they are affected by this change in
interrupt controller.  To get them to adjust correctly for the change in
irqchip, we need to fire the kvm irqchip change notifier.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_irq.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 234d1073e5..45544b8976 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -480,6 +480,12 @@ static void set_active_intc(SpaprMachineState *spapr,
     }
 
     spapr->active_intc = new_intc;
+
+    /*
+     * We've changed the kernel irqchip, let VFIO devices know they
+     * need to readjust.
+     */
+    kvm_irqchip_change_notify();
 }
 
 void spapr_irq_update_active_intc(SpaprMachineState *spapr)
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization
  2019-10-17  5:42 [RFC 0/5] Handle PAPR irq chip changes for VFIO devices David Gibson
                   ` (3 preceding siblings ...)
  2019-10-17  5:42 ` [RFC 4/5] spapr: Handle irq backend changes with VFIO PCI devices David Gibson
@ 2019-10-17  5:42 ` David Gibson
  2019-10-17  8:43   ` Cédric Le Goater
  4 siblings, 1 reply; 13+ messages in thread
From: David Gibson @ 2019-10-17  5:42 UTC (permalink / raw)
  To: alex.williamson, clg, groug; +Cc: aik, qemu-ppc, qemu-devel, David Gibson

Traditional PCI INTx for vfio devices can only perform well if using
an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
if an in kernel irqchip is not available.

We usually do have an in-kernel irqchip available for pseries machines
on POWER hosts.  However, because the platform allows feature
negotiation of what interrupt controller model to use, we don't
currently initialize it until machine reset.  vfio_intx_update() is
called (first) from vfio_realize() before that, so it can issue a
spurious warning, even if we will have an in kernel irqchip by the
time we need it.

To workaround this, make a call to spapr_irq_update_active_intc() from
spapr_irq_init() which is called at machine realize time, before the
vfio realize.  This call will be pretty much obsoleted by the later
call at reset time, but it serves to suppress the spurious warning
from VFIO.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr_irq.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 45544b8976..bb91c61fa0 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
 
     spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
                                       smc->nr_xirqs + SPAPR_XIRQ_BASE);
+
+    /*
+     * Mostly we don't actually need this until reset, except that not
+     * having this set up can cause VFIO devices to issue a
+     * false-positive warning during realize(), because they don't yet
+     * have an in-kernel irq chip.
+     */
+    spapr_irq_update_active_intc(spapr);
 }
 
 int spapr_irq_claim(SpaprMachineState *spapr, int irq, bool lsi, Error **errp)
@@ -500,7 +508,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr)
          * this.
          */
         new_intc = SPAPR_INTC(spapr->xive);
-    } else if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+    } else if (spapr->ov5_cas
+               && spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
         new_intc = SPAPR_INTC(spapr->xive);
     } else {
         new_intc = SPAPR_INTC(spapr->ics);
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization
  2019-10-17  5:42 ` [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization David Gibson
@ 2019-10-17  8:43   ` Cédric Le Goater
  2019-10-17  9:00     ` Greg Kurz
  2019-10-17 13:42     ` Cédric Le Goater
  0 siblings, 2 replies; 13+ messages in thread
From: Cédric Le Goater @ 2019-10-17  8:43 UTC (permalink / raw)
  To: David Gibson, alex.williamson, groug; +Cc: aik, qemu-ppc, qemu-devel

On 17/10/2019 07:42, David Gibson wrote:
> Traditional PCI INTx for vfio devices can only perform well if using
> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
> if an in kernel irqchip is not available.
> 
> We usually do have an in-kernel irqchip available for pseries machines
> on POWER hosts.  However, because the platform allows feature
> negotiation of what interrupt controller model to use, we don't
> currently initialize it until machine reset.  vfio_intx_update() is
> called (first) from vfio_realize() before that, so it can issue a
> spurious warning, even if we will have an in kernel irqchip by the
> time we need it.
> 
> To workaround this, make a call to spapr_irq_update_active_intc() from
> spapr_irq_init() which is called at machine realize time, before the
> vfio realize.  This call will be pretty much obsoleted by the later
> call at reset time, but it serves to suppress the spurious warning
> from VFIO.
> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr_irq.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 45544b8976..bb91c61fa0 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>  
>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
> +
> +    /*
> +     * Mostly we don't actually need this until reset, except that not
> +     * having this set up can cause VFIO devices to issue a
> +     * false-positive warning during realize(), because they don't yet
> +     * have an in-kernel irq chip.
> +     */
> +    spapr_irq_update_active_intc(spapr);

This will call the de/activate hooks of the irq chip very early 
before reset and CAS, and won't call them at reset if the intc
pointers are the same (xive only for instance). It breaks very 
obviously the emulated XIVE device which won't reset the OS CAM 
line with the CPU id values and CPU notification will be broken.

We have to find something else.


>  }
>  
>  int spapr_irq_claim(SpaprMachineState *spapr, int irq, bool lsi, Error **errp)
> @@ -500,7 +508,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr)
>           * this.
>           */
>          new_intc = SPAPR_INTC(spapr->xive);
> -    } else if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +    } else if (spapr->ov5_cas
> +               && spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {

This change can go in another patch.

C. 

>          new_intc = SPAPR_INTC(spapr->xive);
>      } else {
>          new_intc = SPAPR_INTC(spapr->ics);
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization
  2019-10-17  8:43   ` Cédric Le Goater
@ 2019-10-17  9:00     ` Greg Kurz
  2019-10-17 13:42     ` Cédric Le Goater
  1 sibling, 0 replies; 13+ messages in thread
From: Greg Kurz @ 2019-10-17  9:00 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: aik, alex.williamson, qemu-ppc, qemu-devel, David Gibson

On Thu, 17 Oct 2019 10:43:11 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> On 17/10/2019 07:42, David Gibson wrote:
> > Traditional PCI INTx for vfio devices can only perform well if using
> > an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
> > if an in kernel irqchip is not available.
> > 
> > We usually do have an in-kernel irqchip available for pseries machines
> > on POWER hosts.  However, because the platform allows feature
> > negotiation of what interrupt controller model to use, we don't
> > currently initialize it until machine reset.  vfio_intx_update() is
> > called (first) from vfio_realize() before that, so it can issue a
> > spurious warning, even if we will have an in kernel irqchip by the
> > time we need it.
> > 
> > To workaround this, make a call to spapr_irq_update_active_intc() from
> > spapr_irq_init() which is called at machine realize time, before the
> > vfio realize.  This call will be pretty much obsoleted by the later
> > call at reset time, but it serves to suppress the spurious warning
> > from VFIO.
> > 
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/ppc/spapr_irq.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > index 45544b8976..bb91c61fa0 100644
> > --- a/hw/ppc/spapr_irq.c
> > +++ b/hw/ppc/spapr_irq.c
> > @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
> >  
> >      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
> >                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
> > +
> > +    /*
> > +     * Mostly we don't actually need this until reset, except that not
> > +     * having this set up can cause VFIO devices to issue a
> > +     * false-positive warning during realize(), because they don't yet
> > +     * have an in-kernel irq chip.
> > +     */
> > +    spapr_irq_update_active_intc(spapr);
> 
> This will call the de/activate hooks of the irq chip very early 
> before reset and CAS, and won't call them at reset if the intc
> pointers are the same (xive only for instance). It breaks very 
> obviously the emulated XIVE device which won't reset the OS CAM 
> line with the CPU id values and CPU notification will be broken.
> 

Yes. The problem is that we now have a new path:

spapr_irq_init()
 spapr_irq_update_active_intc()
  set_active_intc()
   spapr_xive_activate()

and we go there before the CPUs are realized:

(gdb) p cpus
$6 = {tqh_first = 0x0, tqh_circ = {tql_next = 0x0, 
    tql_prev = 0x100f203a8 <cpus>}}

spapr_xive_activate() thus doesn't reset the OS CAM line.

When the initial machine reset happens later, CPUs are
present but we don't go down the activate path since XIVE
is already active.

Commenting away the following check in set_active_intc() is enough
to fix:

    if (new_intc == spapr->active_intc) {
        /* Nothing to do */
        return;
    }

but this also reveals that handling the OS CAM line reset in
spapr_xive_activate() only may be a bit fragile since it obviously
requires CPUs to be present... Maybe this should also be done by
the CPUs on their realize path ?

> We have to find something else.
> 
> 
> >  }
> >  
> >  int spapr_irq_claim(SpaprMachineState *spapr, int irq, bool lsi, Error **errp)
> > @@ -500,7 +508,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr)
> >           * this.
> >           */
> >          new_intc = SPAPR_INTC(spapr->xive);
> > -    } else if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> > +    } else if (spapr->ov5_cas
> > +               && spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> 
> This change can go in another patch.
> 
> C. 
> 
> >          new_intc = SPAPR_INTC(spapr->xive);
> >      } else {
> >          new_intc = SPAPR_INTC(spapr->ics);
> > 
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization
  2019-10-17  8:43   ` Cédric Le Goater
  2019-10-17  9:00     ` Greg Kurz
@ 2019-10-17 13:42     ` Cédric Le Goater
  2019-11-20  4:17       ` David Gibson
  1 sibling, 1 reply; 13+ messages in thread
From: Cédric Le Goater @ 2019-10-17 13:42 UTC (permalink / raw)
  To: David Gibson, alex.williamson, groug; +Cc: aik, qemu-ppc, qemu-devel

On 17/10/2019 10:43, Cédric Le Goater wrote:
> On 17/10/2019 07:42, David Gibson wrote:
>> Traditional PCI INTx for vfio devices can only perform well if using
>> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
>> if an in kernel irqchip is not available.
>>
>> We usually do have an in-kernel irqchip available for pseries machines
>> on POWER hosts.  However, because the platform allows feature
>> negotiation of what interrupt controller model to use, we don't
>> currently initialize it until machine reset.  vfio_intx_update() is
>> called (first) from vfio_realize() before that, so it can issue a
>> spurious warning, even if we will have an in kernel irqchip by the
>> time we need it.
>>
>> To workaround this, make a call to spapr_irq_update_active_intc() from
>> spapr_irq_init() which is called at machine realize time, before the
>> vfio realize.  This call will be pretty much obsoleted by the later
>> call at reset time, but it serves to suppress the spurious warning
>> from VFIO.
>>
>> Cc: Alex Williamson <alex.williamson@redhat.com>
>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> ---
>>  hw/ppc/spapr_irq.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 45544b8976..bb91c61fa0 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>>  
>>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
>>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
>> +
>> +    /*
>> +     * Mostly we don't actually need this until reset, except that not
>> +     * having this set up can cause VFIO devices to issue a
>> +     * false-positive warning during realize(), because they don't yet
>> +     * have an in-kernel irq chip.
>> +     */
>> +    spapr_irq_update_active_intc(spapr);
> 
> This will call the de/activate hooks of the irq chip very early 
> before reset and CAS, and won't call them at reset if the intc
> pointers are the same (xive only for instance). It breaks very 
> obviously the emulated XIVE device which won't reset the OS CAM 
> line with the CPU id values and CPU notification will be broken.
> 
> We have to find something else.

Here is a possible fix for the (re)setting of the OS CAM line. 

Removing spapr_xive_set_tctx_os_cam() is a good thing but this solution
shows some issues in our modeling of hot-plugged CPUS with a reset() 
being called at realize().


Thanks,

C.


From eb6086645b1db13a8e021e8a5b8a2c6faba875b3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org>
Date: Thu, 17 Oct 2019 15:35:36 +0200
Subject: [PATCH] spapr/xive: Set the OS CAM line at reset
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When a Virtual Processor is scheduled to run on a HW thread, the
hypervisor pushes its identifier in the OS CAM line. When running in
TCG or kernel_irqchip=off, QEMU needs to emulate the same behavior.

Introduce a 'os-cam' property which will be used to set the OS CAM
line at reset.

Complexity arise with hotplug CPUs which need an extra reset of the
intc presenter of the interrupt controller selected by CAS. The sPAPR
IRQ backend is extended with new handler cpu_intc_reset()

spapr_realize_vcpu() is also modified to call the CPU reset after the
creation of the intc presenter.

This is a combo patch which needs to be splitted.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_irq.h  |  4 ++++
 include/hw/ppc/spapr_xive.h |  1 -
 include/hw/ppc/xics.h       |  1 +
 include/hw/ppc/xive.h       |  5 ++++-
 hw/intc/spapr_xive.c        | 37 ++++++++++++-------------------------
 hw/intc/xics.c              |  5 +++++
 hw/intc/xics_spapr.c        |  8 ++++++++
 hw/intc/xive.c              | 33 +++++++++++++++++++++++++++++----
 hw/ppc/pnv.c                |  3 ++-
 hw/ppc/spapr_cpu_core.c     |  8 ++++++--
 hw/ppc/spapr_irq.c          | 21 +++++++++++++++++++++
 11 files changed, 92 insertions(+), 34 deletions(-)

diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index 5e150a667902..78327496c102 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -52,6 +52,8 @@ typedef struct SpaprInterruptControllerClass {
      */
     int (*cpu_intc_create)(SpaprInterruptController *intc,
                             PowerPCCPU *cpu, Error **errp);
+    int (*cpu_intc_reset)(SpaprInterruptController *intc, PowerPCCPU *cpu,
+                          Error **errp);
     int (*claim_irq)(SpaprInterruptController *intc, int irq, bool lsi,
                      Error **errp);
     void (*free_irq)(SpaprInterruptController *intc, int irq);
@@ -68,6 +70,8 @@ void spapr_irq_update_active_intc(SpaprMachineState *spapr);
 
 int spapr_irq_cpu_intc_create(SpaprMachineState *spapr,
                               PowerPCCPU *cpu, Error **errp);
+int spapr_irq_cpu_intc_reset(SpaprMachineState *spapr,
+                             PowerPCCPU *cpu, Error **errp);
 void spapr_irq_print_info(SpaprMachineState *spapr, Monitor *mon);
 void spapr_irq_dt(SpaprMachineState *spapr, uint32_t nr_servers,
                   void *fdt, uint32_t phandle);
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index d84bd5c229f0..742b7e834f2a 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -57,7 +57,6 @@ typedef struct SpaprXive {
 void spapr_xive_pic_print_info(SpaprXive *xive, Monitor *mon);
 
 void spapr_xive_hcall_init(SpaprMachineState *spapr);
-void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
 void spapr_xive_mmio_set_enabled(SpaprXive *xive, bool enable);
 void spapr_xive_map_mmio(SpaprXive *xive);
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 1e6a9300eb2b..602173c12250 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -161,6 +161,7 @@ void icp_set_mfrr(ICPState *icp, uint8_t mfrr);
 uint32_t icp_accept(ICPState *ss);
 uint32_t icp_ipoll(ICPState *ss, uint32_t *mfrr);
 void icp_eoi(ICPState *icp, uint32_t xirr);
+void icp_reset(ICPState *icp);
 
 void ics_write_xive(ICSState *ics, int nr, int server,
                     uint8_t priority, uint8_t saved_priority);
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index fd3319bd3202..e273069c25a9 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -319,6 +319,7 @@ typedef struct XiveTCTX {
     qemu_irq    os_output;
 
     uint8_t     regs[XIVE_TM_RING_COUNT * XIVE_TM_RING_SIZE];
+    uint32_t    os_cam;
 } XiveTCTX;
 
 /*
@@ -414,7 +415,9 @@ void xive_tctx_tm_write(XiveTCTX *tctx, hwaddr offset, uint64_t value,
 uint64_t xive_tctx_tm_read(XiveTCTX *tctx, hwaddr offset, unsigned size);
 
 void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
-Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp);
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, uint32_t os_cam,
+                         Error **errp);
+void xive_tctx_reset(XiveTCTX *tctx);
 
 static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
 {
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index ba32d2cc5b0f..71f138512a1c 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -205,21 +205,13 @@ void spapr_xive_mmio_set_enabled(SpaprXive *xive, bool enable)
     memory_region_set_enabled(&xive->end_source.esb_mmio, false);
 }
 
-/*
- * When a Virtual Processor is scheduled to run on a HW thread, the
- * hypervisor pushes its identifier in the OS CAM line. Emulate the
- * same behavior under QEMU.
- */
-void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx)
+static uint32_t spapr_xive_get_os_cam(PowerPCCPU *cpu)
 {
     uint8_t  nvt_blk;
     uint32_t nvt_idx;
-    uint32_t nvt_cam;
 
-    spapr_xive_cpu_to_nvt(POWERPC_CPU(tctx->cs), &nvt_blk, &nvt_idx);
-
-    nvt_cam = cpu_to_be32(TM_QW1W2_VO | xive_nvt_cam_line(nvt_blk, nvt_idx));
-    memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4);
+    spapr_xive_cpu_to_nvt(cpu, &nvt_blk, &nvt_idx);
+    return xive_nvt_cam_line(nvt_blk, nvt_idx);
 }
 
 static void spapr_xive_end_reset(XiveEND *end)
@@ -537,19 +529,21 @@ static int spapr_xive_cpu_intc_create(SpaprInterruptController *intc,
     SpaprXive *xive = SPAPR_XIVE(intc);
     Object *obj;
     SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
+    uint32_t os_cam = spapr_xive_get_os_cam(cpu);
 
-    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(xive), errp);
+    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(xive), os_cam, errp);
     if (!obj) {
         return -1;
     }
 
     spapr_cpu->tctx = XIVE_TCTX(obj);
+    return 0;
+}
 
-    /*
-     * (TCG) Early setting the OS CAM line for hotplugged CPUs as they
-     * don't beneficiate from the reset of the XIVE IRQ backend
-     */
-    spapr_xive_set_tctx_os_cam(spapr_cpu->tctx);
+static int spapr_xive_cpu_intc_reset(SpaprInterruptController *intc,
+                                     PowerPCCPU *cpu, Error **errp)
+{
+    xive_tctx_reset(spapr_cpu_state(cpu)->tctx);
     return 0;
 }
 
@@ -643,14 +637,6 @@ static void spapr_xive_dt(SpaprInterruptController *intc, uint32_t nr_servers,
 static int spapr_xive_activate(SpaprInterruptController *intc, Error **errp)
 {
     SpaprXive *xive = SPAPR_XIVE(intc);
-    CPUState *cs;
-
-    CPU_FOREACH(cs) {
-        PowerPCCPU *cpu = POWERPC_CPU(cs);
-
-        /* (TCG) Set the OS CAM line of the thread interrupt context. */
-        spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
-    }
 
     if (kvm_enabled()) {
         int rc = spapr_irq_init_kvm(kvmppc_xive_connect, intc, errp);
@@ -697,6 +683,7 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data)
     sicc->activate = spapr_xive_activate;
     sicc->deactivate = spapr_xive_deactivate;
     sicc->cpu_intc_create = spapr_xive_cpu_intc_create;
+    sicc->cpu_intc_reset = spapr_xive_cpu_intc_reset;
     sicc->claim_irq = spapr_xive_claim_irq;
     sicc->free_irq = spapr_xive_free_irq;
     sicc->set_irq = spapr_xive_set_irq;
diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index b5ac408f7b74..652771d6a5a5 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -295,6 +295,11 @@ static void icp_reset_handler(void *dev)
     }
 }
 
+void icp_reset(ICPState *icp)
+{
+    icp_reset_handler(icp);
+}
+
 static void icp_realize(DeviceState *dev, Error **errp)
 {
     ICPState *icp = ICP(dev);
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index 4f64b9a9fc66..c0b2a576effe 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -346,6 +346,13 @@ static int xics_spapr_cpu_intc_create(SpaprInterruptController *intc,
     return 0;
 }
 
+static int xics_spapr_cpu_intc_reset(SpaprInterruptController *intc,
+                                     PowerPCCPU *cpu, Error **errp)
+{
+    icp_reset(spapr_cpu_state(cpu)->icp);
+    return 0;
+}
+
 static int xics_spapr_claim_irq(SpaprInterruptController *intc, int irq,
                                 bool lsi, Error **errp)
 {
@@ -433,6 +440,7 @@ static void ics_spapr_class_init(ObjectClass *klass, void *data)
     sicc->activate = xics_spapr_activate;
     sicc->deactivate = xics_spapr_deactivate;
     sicc->cpu_intc_create = xics_spapr_cpu_intc_create;
+    sicc->cpu_intc_reset = xics_spapr_cpu_intc_reset;
     sicc->claim_irq = xics_spapr_claim_irq;
     sicc->free_irq = xics_spapr_free_irq;
     sicc->set_irq = xics_spapr_set_irq;
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index d420c6571e14..be4f2c974178 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -547,7 +547,7 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
     }
 }
 
-static void xive_tctx_reset(void *dev)
+static void xive_tctx_reset_handler(void *dev)
 {
     XiveTCTX *tctx = XIVE_TCTX(dev);
 
@@ -566,6 +566,23 @@ static void xive_tctx_reset(void *dev)
         ipb_to_pipr(tctx->regs[TM_QW1_OS + TM_IPB]);
     tctx->regs[TM_QW3_HV_PHYS + TM_PIPR] =
         ipb_to_pipr(tctx->regs[TM_QW3_HV_PHYS + TM_IPB]);
+
+    /*
+     * (TCG) Set the OS CAM line of the thread interrupt context.
+     *
+     * When a Virtual Processor is scheduled to run on a HW thread,
+     * the hypervisor pushes its identifier in the OS CAM line.
+     * Emulate the same behavior under QEMU.
+     */
+    if (tctx->os_cam) {
+        uint32_t qw1w2 = cpu_to_be32(TM_QW1W2_VO | tctx->os_cam);
+        memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &qw1w2, 4);
+    }
+}
+
+void xive_tctx_reset(XiveTCTX *tctx)
+{
+    xive_tctx_reset_handler(tctx);
 }
 
 static void xive_tctx_realize(DeviceState *dev, Error **errp)
@@ -608,12 +625,12 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
         }
     }
 
-    qemu_register_reset(xive_tctx_reset, dev);
+    qemu_register_reset(xive_tctx_reset_handler, dev);
 }
 
 static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
 {
-    qemu_unregister_reset(xive_tctx_reset, dev);
+    qemu_unregister_reset(xive_tctx_reset_handler, dev);
 }
 
 static int vmstate_xive_tctx_pre_save(void *opaque)
@@ -662,11 +679,17 @@ static const VMStateDescription vmstate_xive_tctx = {
     },
 };
 
+static Property  xive_tctx_properties[] = {
+    DEFINE_PROP_UINT32("os-cam", XiveTCTX, os_cam, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void xive_tctx_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->desc = "XIVE Interrupt Thread Context";
+    dc->props = xive_tctx_properties;
     dc->realize = xive_tctx_realize;
     dc->unrealize = xive_tctx_unrealize;
     dc->vmsd = &vmstate_xive_tctx;
@@ -684,7 +707,8 @@ static const TypeInfo xive_tctx_info = {
     .class_init    = xive_tctx_class_init,
 };
 
-Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
+Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, uint32_t os_cam,
+                         Error **errp)
 {
     Error *local_err = NULL;
     Object *obj;
@@ -693,6 +717,7 @@ Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp)
     object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, &error_abort);
     object_unref(obj);
     object_property_add_const_link(obj, "cpu", cpu, &error_abort);
+    object_property_set_int(obj, os_cam, "os-cam", &local_err);
     object_property_set_bool(obj, true, "realized", &local_err);
     if (local_err) {
         goto error;
diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 7cf64b6d2533..99c06842573e 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -806,7 +806,8 @@ static void pnv_chip_power9_intc_create(PnvChip *chip, PowerPCCPU *cpu,
      * controller object is initialized afterwards. Hopefully, it's
      * only used at runtime.
      */
-    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(&chip9->xive), &local_err);
+    obj = xive_tctx_create(OBJECT(cpu), XIVE_ROUTER(&chip9->xive), 0,
+                           &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c
index 3e4302c7d596..416aa75e5fba 100644
--- a/hw/ppc/spapr_cpu_core.c
+++ b/hw/ppc/spapr_cpu_core.c
@@ -33,6 +33,7 @@ static void spapr_cpu_reset(void *opaque)
     PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cpu);
     SpaprCpuState *spapr_cpu = spapr_cpu_state(cpu);
     target_ulong lpcr;
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
 
     cpu_reset(cs);
 
@@ -77,9 +78,11 @@ static void spapr_cpu_reset(void *opaque)
     spapr_cpu->dtl_addr = 0;
     spapr_cpu->dtl_size = 0;
 
-    spapr_caps_cpu_apply(SPAPR_MACHINE(qdev_get_machine()), cpu);
+    spapr_caps_cpu_apply(spapr, cpu);
 
     kvm_check_mmu(cpu, &error_fatal);
+
+    spapr_irq_cpu_intc_reset(spapr, cpu, &error_fatal);
 }
 
 void spapr_cpu_set_entry_state(PowerPCCPU *cpu, target_ulong nip, target_ulong r3)
@@ -235,12 +238,13 @@ static void spapr_realize_vcpu(PowerPCCPU *cpu, SpaprMachineState *spapr,
     kvmppc_set_papr(cpu);
 
     qemu_register_reset(spapr_cpu_reset, cpu);
-    spapr_cpu_reset(cpu);
 
     if (spapr_irq_cpu_intc_create(spapr, cpu, &local_err) < 0) {
         goto error_unregister;
     }
 
+    spapr_cpu_reset(cpu);
+
     if (!sc->pre_3_0_migration) {
         vmstate_register(NULL, cs->cpu_index, &vmstate_spapr_cpu_state,
                          cpu->machine_data);
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index bb91c61fa000..5d2b64029cd5 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -220,6 +220,27 @@ int spapr_irq_cpu_intc_create(SpaprMachineState *spapr,
     return 0;
 }
 
+int spapr_irq_cpu_intc_reset(SpaprMachineState *spapr,
+                             PowerPCCPU *cpu, Error **errp)
+{
+    SpaprInterruptController *intcs[] = ALL_INTCS(spapr);
+    int i;
+    int rc;
+
+    for (i = 0; i < ARRAY_SIZE(intcs); i++) {
+        SpaprInterruptController *intc = intcs[i];
+        if (intc) {
+            SpaprInterruptControllerClass *sicc = SPAPR_INTC_GET_CLASS(intc);
+            rc = sicc->cpu_intc_reset(intc, cpu, errp);
+            if (rc < 0) {
+                return rc;
+            }
+        }
+    }
+
+    return 0;
+}
+
 static void spapr_set_irq(void *opaque, int irq, int level)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(opaque);
-- 
2.21.0




^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC 3/5] vfio/pci: Respond to KVM irqchip change notifier
  2019-10-17  5:42 ` [RFC 3/5] vfio/pci: Respond to KVM irqchip change notifier David Gibson
@ 2019-10-17 20:28   ` Alex Williamson
  2019-11-20  3:34     ` David Gibson
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2019-10-17 20:28 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, qemu-devel, qemu-ppc, clg, groug

On Thu, 17 Oct 2019 16:42:16 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> VFIO PCI devices already respond to the pci intx routing notifier, in order
> to update kernel irqchip mappings when routing is updated.  However this
> won't handle the case where the irqchip itself is replaced by a different
> model while retaining the same routing.  This case can happen on
> the pseries machine type due to PAPR feature negotiation.
> 
> To handle that case, add a handler for the irqchip change notifier, which
> does much the same thing as the routing notifier, but is unconditional,
> rather than being a no-op when the routing hasn't changed.
> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/vfio/pci.c | 12 ++++++++++++
>  hw/vfio/pci.h |  2 ++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 529ad13908..6aa806baff 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -255,6 +255,14 @@ static void vfio_intx_routing_notifier(PCIDevice *pdev)
>      }
>  }
>  
> +static void vfio_irqchip_change(Notifier *notify, void *data)
> +{
> +    VFIOPCIDevice *vdev = container_of(notify, VFIOPCIDevice,
> +                                       irqchip_change_notifier);
> +
> +    vfio_intx_update(vdev, &vdev->intx.route);
> +}
> +
>  static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
>  {
>      uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
> @@ -2960,6 +2968,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>                                                    vfio_intx_mmap_enable, vdev);
>          pci_device_set_intx_routing_notifier(&vdev->pdev,
>                                               vfio_intx_routing_notifier);
> +        vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
> +        kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
>          ret = vfio_intx_enable(vdev, errp);
>          if (ret) {
>              goto out_teardown;
> @@ -3009,6 +3019,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>  
>  out_teardown:
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> +    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);

We can get here without having added the irqchip notifier and I don't
think QLIST_REMOVE is very friendly with gratuitous calls.

>      vfio_teardown_msi(vdev);
>      vfio_bars_exit(vdev);
>  error:
> @@ -3042,6 +3053,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>      vfio_unregister_req_notifier(vdev);
>      vfio_unregister_err_notifier(vdev);
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> +    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);

Same here.  Otherwise the approach looks sane to me.  Thanks,

Alex

>      vfio_disable_interrupts(vdev);
>      if (vdev->intx.mmap_timer) {
>          timer_free(vdev->intx.mmap_timer);
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 834a90d646..11324f28ce 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -168,6 +168,8 @@ typedef struct VFIOPCIDevice {
>      bool no_vfio_ioeventfd;
>      bool enable_ramfb;
>      VFIODisplay *dpy;
> +
> +    Notifier irqchip_change_notifier;
>  } VFIOPCIDevice;
>  
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 3/5] vfio/pci: Respond to KVM irqchip change notifier
  2019-10-17 20:28   ` Alex Williamson
@ 2019-11-20  3:34     ` David Gibson
  0 siblings, 0 replies; 13+ messages in thread
From: David Gibson @ 2019-11-20  3:34 UTC (permalink / raw)
  To: Alex Williamson; +Cc: aik, qemu-devel, qemu-ppc, clg, groug

[-- Attachment #1: Type: text/plain, Size: 3995 bytes --]

On Thu, Oct 17, 2019 at 02:28:45PM -0600, Alex Williamson wrote:
> On Thu, 17 Oct 2019 16:42:16 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > VFIO PCI devices already respond to the pci intx routing notifier, in order
> > to update kernel irqchip mappings when routing is updated.  However this
> > won't handle the case where the irqchip itself is replaced by a different
> > model while retaining the same routing.  This case can happen on
> > the pseries machine type due to PAPR feature negotiation.
> > 
> > To handle that case, add a handler for the irqchip change notifier, which
> > does much the same thing as the routing notifier, but is unconditional,
> > rather than being a no-op when the routing hasn't changed.
> > 
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> > 
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/vfio/pci.c | 12 ++++++++++++
> >  hw/vfio/pci.h |  2 ++
> >  2 files changed, 14 insertions(+)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index 529ad13908..6aa806baff 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -255,6 +255,14 @@ static void vfio_intx_routing_notifier(PCIDevice *pdev)
> >      }
> >  }
> >  
> > +static void vfio_irqchip_change(Notifier *notify, void *data)
> > +{
> > +    VFIOPCIDevice *vdev = container_of(notify, VFIOPCIDevice,
> > +                                       irqchip_change_notifier);
> > +
> > +    vfio_intx_update(vdev, &vdev->intx.route);
> > +}
> > +
> >  static int vfio_intx_enable(VFIOPCIDevice *vdev, Error **errp)
> >  {
> >      uint8_t pin = vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1);
> > @@ -2960,6 +2968,8 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >                                                    vfio_intx_mmap_enable, vdev);
> >          pci_device_set_intx_routing_notifier(&vdev->pdev,
> >                                               vfio_intx_routing_notifier);
> > +        vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
> > +        kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
> >          ret = vfio_intx_enable(vdev, errp);
> >          if (ret) {
> >              goto out_teardown;
> > @@ -3009,6 +3019,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  
> >  out_teardown:
> >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > +    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> 
> We can get here without having added the irqchip notifier and I don't
> think QLIST_REMOVE is very friendly with gratuitous calls.

Good point, adjusted.


> >      vfio_teardown_msi(vdev);
> >      vfio_bars_exit(vdev);
> >  error:
> > @@ -3042,6 +3053,7 @@ static void vfio_exitfn(PCIDevice *pdev)
> >      vfio_unregister_req_notifier(vdev);
> >      vfio_unregister_err_notifier(vdev);
> >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > +    kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
> 
> Same here.  Otherwise the approach looks sane to me.  Thanks,
> 
> Alex
> 
> >      vfio_disable_interrupts(vdev);
> >      if (vdev->intx.mmap_timer) {
> >          timer_free(vdev->intx.mmap_timer);
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > index 834a90d646..11324f28ce 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -168,6 +168,8 @@ typedef struct VFIOPCIDevice {
> >      bool no_vfio_ioeventfd;
> >      bool enable_ramfb;
> >      VFIODisplay *dpy;
> > +
> > +    Notifier irqchip_change_notifier;
> >  } VFIOPCIDevice;
> >  
> >  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization
  2019-10-17 13:42     ` Cédric Le Goater
@ 2019-11-20  4:17       ` David Gibson
  2019-11-20  6:50         ` Cédric Le Goater
  0 siblings, 1 reply; 13+ messages in thread
From: David Gibson @ 2019-11-20  4:17 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: aik, alex.williamson, qemu-ppc, groug, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3132 bytes --]

On Thu, Oct 17, 2019 at 03:42:06PM +0200, Cédric Le Goater wrote:
> On 17/10/2019 10:43, Cédric Le Goater wrote:
> > On 17/10/2019 07:42, David Gibson wrote:
> >> Traditional PCI INTx for vfio devices can only perform well if using
> >> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
> >> if an in kernel irqchip is not available.
> >>
> >> We usually do have an in-kernel irqchip available for pseries machines
> >> on POWER hosts.  However, because the platform allows feature
> >> negotiation of what interrupt controller model to use, we don't
> >> currently initialize it until machine reset.  vfio_intx_update() is
> >> called (first) from vfio_realize() before that, so it can issue a
> >> spurious warning, even if we will have an in kernel irqchip by the
> >> time we need it.
> >>
> >> To workaround this, make a call to spapr_irq_update_active_intc() from
> >> spapr_irq_init() which is called at machine realize time, before the
> >> vfio realize.  This call will be pretty much obsoleted by the later
> >> call at reset time, but it serves to suppress the spurious warning
> >> from VFIO.
> >>
> >> Cc: Alex Williamson <alex.williamson@redhat.com>
> >> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>
> >> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> ---
> >>  hw/ppc/spapr_irq.c | 11 ++++++++++-
> >>  1 file changed, 10 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> >> index 45544b8976..bb91c61fa0 100644
> >> --- a/hw/ppc/spapr_irq.c
> >> +++ b/hw/ppc/spapr_irq.c
> >> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
> >>  
> >>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
> >>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
> >> +
> >> +    /*
> >> +     * Mostly we don't actually need this until reset, except that not
> >> +     * having this set up can cause VFIO devices to issue a
> >> +     * false-positive warning during realize(), because they don't yet
> >> +     * have an in-kernel irq chip.
> >> +     */
> >> +    spapr_irq_update_active_intc(spapr);
> > 
> > This will call the de/activate hooks of the irq chip very early 
> > before reset and CAS, and won't call them at reset if the intc
> > pointers are the same (xive only for instance). It breaks very 
> > obviously the emulated XIVE device which won't reset the OS CAM 
> > line with the CPU id values and CPU notification will be broken.
> > 
> > We have to find something else.
> 
> Here is a possible fix for the (re)setting of the OS CAM line. 
> 
> Removing spapr_xive_set_tctx_os_cam() is a good thing but this solution
> shows some issues in our modeling of hot-plugged CPUS with a reset() 
> being called at realize().

Ok, I've applied the patch below now.  Does that mean that my 5/5
patch should be good now?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization
  2019-11-20  4:17       ` David Gibson
@ 2019-11-20  6:50         ` Cédric Le Goater
  0 siblings, 0 replies; 13+ messages in thread
From: Cédric Le Goater @ 2019-11-20  6:50 UTC (permalink / raw)
  To: David Gibson; +Cc: aik, alex.williamson, qemu-ppc, groug, qemu-devel

On 20/11/2019 05:17, David Gibson wrote:
> On Thu, Oct 17, 2019 at 03:42:06PM +0200, Cédric Le Goater wrote:
>> On 17/10/2019 10:43, Cédric Le Goater wrote:
>>> On 17/10/2019 07:42, David Gibson wrote:
>>>> Traditional PCI INTx for vfio devices can only perform well if using
>>>> an in-kernel irqchip.  Therefore, vfio_intx_update() issues a warning
>>>> if an in kernel irqchip is not available.
>>>>
>>>> We usually do have an in-kernel irqchip available for pseries machines
>>>> on POWER hosts.  However, because the platform allows feature
>>>> negotiation of what interrupt controller model to use, we don't
>>>> currently initialize it until machine reset.  vfio_intx_update() is
>>>> called (first) from vfio_realize() before that, so it can issue a
>>>> spurious warning, even if we will have an in kernel irqchip by the
>>>> time we need it.
>>>>
>>>> To workaround this, make a call to spapr_irq_update_active_intc() from
>>>> spapr_irq_init() which is called at machine realize time, before the
>>>> vfio realize.  This call will be pretty much obsoleted by the later
>>>> call at reset time, but it serves to suppress the spurious warning
>>>> from VFIO.
>>>>
>>>> Cc: Alex Williamson <alex.williamson@redhat.com>
>>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>
>>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>>> ---
>>>>  hw/ppc/spapr_irq.c | 11 ++++++++++-
>>>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>>>> index 45544b8976..bb91c61fa0 100644
>>>> --- a/hw/ppc/spapr_irq.c
>>>> +++ b/hw/ppc/spapr_irq.c
>>>> @@ -345,6 +345,14 @@ void spapr_irq_init(SpaprMachineState *spapr, Error **errp)
>>>>  
>>>>      spapr->qirqs = qemu_allocate_irqs(spapr_set_irq, spapr,
>>>>                                        smc->nr_xirqs + SPAPR_XIRQ_BASE);
>>>> +
>>>> +    /*
>>>> +     * Mostly we don't actually need this until reset, except that not
>>>> +     * having this set up can cause VFIO devices to issue a
>>>> +     * false-positive warning during realize(), because they don't yet
>>>> +     * have an in-kernel irq chip.
>>>> +     */
>>>> +    spapr_irq_update_active_intc(spapr);
>>>
>>> This will call the de/activate hooks of the irq chip very early 
>>> before reset and CAS, and won't call them at reset if the intc
>>> pointers are the same (xive only for instance). It breaks very 
>>> obviously the emulated XIVE device which won't reset the OS CAM 
>>> line with the CPU id values and CPU notification will be broken.
>>>
>>> We have to find something else.
>>
>> Here is a possible fix for the (re)setting of the OS CAM line. 
>>
>> Removing spapr_xive_set_tctx_os_cam() is a good thing but this solution
>> shows some issues in our modeling of hot-plugged CPUS with a reset() 
>> being called at realize().
> 
> Ok, I've applied the patch below now. 

Yes. this was replaced by :

  http://patchwork.ozlabs.org/cover/1181522/

> Does that mean that my 5/5 patch should be good now?

We should.

C.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-11-20  6:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-17  5:42 [RFC 0/5] Handle PAPR irq chip changes for VFIO devices David Gibson
2019-10-17  5:42 ` [RFC 1/5] kvm: Introduce KVM irqchip change notifier David Gibson
2019-10-17  5:42 ` [RFC 2/5] vfio/pci: Split vfio_intx_update() David Gibson
2019-10-17  5:42 ` [RFC 3/5] vfio/pci: Respond to KVM irqchip change notifier David Gibson
2019-10-17 20:28   ` Alex Williamson
2019-11-20  3:34     ` David Gibson
2019-10-17  5:42 ` [RFC 4/5] spapr: Handle irq backend changes with VFIO PCI devices David Gibson
2019-10-17  5:42 ` [RFC 5/5] spapr: Work around spurious warnings from vfio INTx initialization David Gibson
2019-10-17  8:43   ` Cédric Le Goater
2019-10-17  9:00     ` Greg Kurz
2019-10-17 13:42     ` Cédric Le Goater
2019-11-20  4:17       ` David Gibson
2019-11-20  6:50         ` Cédric Le Goater

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).