* [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode
@ 2019-02-22 13:13 Cédric Le Goater
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support Cédric Le Goater
                   ` (12 more replies)
  0 siblings, 13 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

Hello,

This is v2 of the QEMU/KVM patchset. It takes into account the
remarks on the interface with Linux/KVM.

The first patches introduce the XIVE KVM device, state synchronization
and migration support under KVM. The second part of the patchset
modifies the XICS and XIVE interrupt models to add KVM support to the
'dual' IRQ backend.

GitHub trees are available here:
 
QEMU sPAPR:

  https://github.com/legoater/qemu/commits/xive-next
  
Linux/KVM:

  https://github.com/legoater/linux/commits/xive-5.0

OPAL:

  https://github.com/legoater/skiboot/commits/xive

Thanks,

C.

Changes since v1:

 - Reworked most of the KVM interface.
 - Reworked *all* hcalls, which are now handled at the QEMU level,
   possibly extended with a KVM device ioctl when required.
 - The TIMA and ESB special mappings are now done on the KVM device fd.
 - Tested on nested guests.
 - Implemented the device fallback mode, used when a kernel_irqchip is
   not available and not required. This is useful to run XIVE in a
   nested guest.
 - Fixed device hotplug when the VM is stopped (is this necessary?).
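
The fallback mode listed above follows the decision made in
spapr_xive_realize() in patch 01. A minimal sketch of that decision,
assuming a hypothetical connect_fn callback standing in for
kvmppc_xive_connect(); the irqchip_mode names are also hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical outcome of the interrupt controller setup */
typedef enum {
    IRQCHIP_EMULATED,   /* XIVE emulated by QEMU (TCG or fallback) */
    IRQCHIP_IN_KERNEL,  /* KVM XIVE device in use */
    IRQCHIP_ERROR       /* kernel_irqchip required but unavailable */
} irqchip_mode;

/* Hypothetical stand-in for kvmppc_xive_connect(): true on success */
typedef bool (*connect_fn)(void);

static irqchip_mode choose_irqchip(bool kvm_enabled, bool allowed,
                                   bool required, connect_fn try_connect)
{
    /* kernel_irqchip=on implies that the in-kernel irqchip is allowed */
    assert(!required || allowed);

    if (!kvm_enabled || !allowed) {
        return IRQCHIP_EMULATED;
    }
    if (try_connect()) {
        return IRQCHIP_IN_KERNEL;
    }
    if (required) {
        fprintf(stderr, "kernel_irqchip requested but unavailable\n");
        return IRQCHIP_ERROR;
    }
    /* Allowed but not required: report it and fall back to emulation */
    fprintf(stderr, "kernel_irqchip allowed but unavailable\n");
    return IRQCHIP_EMULATED;
}

/* Example callbacks simulating a successful or failed KVM connection */
static bool connect_ok(void)   { return true; }
static bool connect_fail(void) { return false; }
```

The connection is only attempted when KVM is enabled and the
in-kernel irqchip is allowed; a failure is fatal only when it was
explicitly required.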

Cédric Le Goater (13):
  spapr/xive: add KVM support
  spapr/xive: add hcall support when under KVM
  spapr/xive: activate KVM support
  spapr/xive: add state synchronization with KVM
  spapr/xive: introduce a VM state change handler
  spapr/xive: add migration support for KVM
  spapr/xive: fix migration of the XiveTCTX under TCG
  spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
  sysbus: add a sysbus_mmio_unmap() helper
  spapr: introduce routines to delete the KVM IRQ device
  spapr: check for the activation of the KVM IRQ device
  spapr: add KVM support to the 'dual' machine
  spapr/xive: fix device hotplug when VM is stopped

 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/spapr.h            |   4 +
 include/hw/ppc/spapr_xive.h       |  39 ++
 include/hw/ppc/xics_spapr.h       |   1 +
 include/hw/ppc/xive.h             |  15 +
 include/hw/sysbus.h               |   1 +
 target/ppc/kvm_ppc.h              |   6 +
 hw/core/sysbus.c                  |  10 +
 hw/intc/spapr_xive.c              | 193 ++++++-
 hw/intc/spapr_xive_kvm.c          | 807 ++++++++++++++++++++++++++++++
 hw/intc/xics_kvm.c                | 108 +++-
 hw/intc/xive.c                    |  44 +-
 hw/ppc/spapr_irq.c                | 136 +++--
 hw/ppc/spapr_rtas.c               |   2 +-
 target/ppc/kvm.c                  |   7 +
 hw/intc/Makefile.objs             |   1 +
 16 files changed, 1315 insertions(+), 60 deletions(-)
 create mode 100644 hw/intc/spapr_xive_kvm.c

-- 
2.20.1


* [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-25  5:55   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM Cédric Le Goater
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

This introduces a set of helpers, used when KVM is enabled, which
create the KVM XIVE device, initialize the interrupt sources at the
KVM level and connect the interrupt presenters to the vCPUs.
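
At reset, the KVM device only needs to know the type of each source.
A minimal sketch of the 64-bit source state that
kvmppc_xive_source_reset_one() passes with KVM_DEV_XIVE_GRP_SOURCE,
assuming bit 0 flags a level-sensitive source and bit 1 an asserted
level. The SKETCH_* names and bit positions are hypothetical
stand-ins for the KVM_XIVE_LEVEL_* constants of the Linux uAPI
headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, standing in for KVM_XIVE_LEVEL_* */
#define SKETCH_LEVEL_SENSITIVE  (1ULL << 0)
#define SKETCH_LEVEL_ASSERTED   (1ULL << 1)

/*
 * Build the source state passed to the KVM device at reset. MSIs
 * carry no flag; an LSI is flagged level-sensitive and, if its line
 * is currently asserted, asserted as well.
 */
static uint64_t source_reset_state(bool is_lsi, bool asserted)
{
    uint64_t state = 0;

    if (is_lsi) {
        state |= SKETCH_LEVEL_SENSITIVE;
        if (asserted) {
            state |= SKETCH_LEVEL_ASSERTED;
        }
    }
    return state;
}
```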

They also handle the initialization of the TIMA and of the source ESB
memory regions of the controller. These have a different type under
KVM: they are 'ram device' memory mappings, similar to VFIO, exposed
to the guest, and the associated VMAs on the host are populated
dynamically with the appropriate pages by a fault handler.
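
The ESB and TIMA regions are obtained by mmap()ing the KVM device fd
at fixed page offsets, as kvmppc_xive_mmap() does. The sketch below
shows the same pattern; it is meant to be exercised on an ordinary
temporary file standing in for the device fd, so it runs without
KVM. The helper name is hypothetical. It widens the page offset to
off_t before shifting, since in general shifting an int page offset
can overflow for large offsets:

```c
#define _XOPEN_SOURCE 700   /* exposes mkstemp(), ftruncate(), pread() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper modelled on kvmppc_xive_mmap(): map 'len' bytes
 * of 'fd' starting at page 'pgoff', a page being (1 << page_shift)
 * bytes. Returns NULL on failure, with errno set by mmap().
 */
static void *map_fd_pages(int fd, unsigned int pgoff, size_t len,
                          unsigned int page_shift)
{
    /* Widen to off_t before shifting so large page offsets fit */
    off_t offset = (off_t)pgoff << page_shift;
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                      fd, offset);

    return addr == MAP_FAILED ? NULL : addr;
}
```

For example, creating a temporary file with mkstemp(), growing it to
two 64K pages with ftruncate() and mapping page 1 with a page shift of
16 gives a pointer whose stores are visible to pread() at offset 64K,
because the mapping is MAP_SHARED.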

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 default-configs/ppc64-softmmu.mak |   1 +
 include/hw/ppc/spapr_xive.h       |  10 ++
 include/hw/ppc/xive.h             |  13 ++
 target/ppc/kvm_ppc.h              |   6 +
 hw/intc/spapr_xive.c              |  48 +++++-
 hw/intc/spapr_xive_kvm.c          | 237 ++++++++++++++++++++++++++++++
 hw/intc/xive.c                    |  21 ++-
 hw/ppc/spapr_irq.c                |   6 +-
 target/ppc/kvm.c                  |   7 +
 hw/intc/Makefile.objs             |   1 +
 10 files changed, 340 insertions(+), 10 deletions(-)
 create mode 100644 hw/intc/spapr_xive_kvm.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 7f34ad0528ed..c1bf5cd951f5 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -18,6 +18,7 @@ CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
 CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_XIVE=$(CONFIG_PSERIES)
 CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
+CONFIG_XIVE_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_MEM_DEVICE=y
 CONFIG_DIMM=y
 CONFIG_SPAPR_RNG=y
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 2d31f24e3bfe..ab6732b14a02 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -38,6 +38,10 @@ typedef struct sPAPRXive {
     /* TIMA mapping address */
     hwaddr        tm_base;
     MemoryRegion  tm_mmio;
+
+    /* KVM support */
+    int           fd;
+    void          *tm_mmap;
 } sPAPRXive;
 
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
@@ -49,5 +53,11 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
 void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
 void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
+void spapr_xive_map_mmio(sPAPRXive *xive);
+
+/*
+ * KVM XIVE device helpers
+ */
+void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 13a487527b11..061d43fea24d 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -140,6 +140,7 @@
 #ifndef PPC_XIVE_H
 #define PPC_XIVE_H
 
+#include "sysemu/kvm.h"
 #include "hw/qdev-core.h"
 #include "hw/sysbus.h"
 #include "hw/ppc/xive_regs.h"
@@ -194,6 +195,9 @@ typedef struct XiveSource {
     uint32_t        esb_shift;
     MemoryRegion    esb_mmio;
 
+    /* KVM support */
+    void            *esb_mmap;
+
     XiveNotifier    *xive;
 } XiveSource;
 
@@ -419,4 +423,13 @@ static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
     return (nvt_blk << 19) | nvt_idx;
 }
 
+/*
+ * KVM XIVE device helpers
+ */
+
+void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp);
+void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
+void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
+void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
+
 #endif /* PPC_XIVE_H */
diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
index bdfaa4e70a83..d2159660f9f2 100644
--- a/target/ppc/kvm_ppc.h
+++ b/target/ppc/kvm_ppc.h
@@ -59,6 +59,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
 bool kvmppc_has_cap_htm(void);
 bool kvmppc_has_cap_mmu_radix(void);
 bool kvmppc_has_cap_mmu_hash_v3(void);
+bool kvmppc_has_cap_xive(void);
 int kvmppc_get_cap_safe_cache(void);
 int kvmppc_get_cap_safe_bounds_check(void);
 int kvmppc_get_cap_safe_indirect_branch(void);
@@ -307,6 +308,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
     return false;
 }
 
+static inline bool kvmppc_has_cap_xive(void)
+{
+    return false;
+}
+
 static inline int kvmppc_get_cap_safe_cache(void)
 {
     return 0;
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 06e3c9fdbfeb..c24d649e3668 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -173,7 +173,7 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
     }
 }
 
-static void spapr_xive_map_mmio(sPAPRXive *xive)
+void spapr_xive_map_mmio(sPAPRXive *xive)
 {
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
     sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
@@ -251,6 +251,9 @@ static void spapr_xive_instance_init(Object *obj)
                       TYPE_XIVE_END_SOURCE);
     object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
                               NULL);
+
+    /* Not connected to the KVM XIVE device */
+    xive->fd = -1;
 }
 
 static void spapr_xive_realize(DeviceState *dev, Error **errp)
@@ -259,6 +262,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     XiveSource *xsrc = &xive->source;
     XiveENDSource *end_xsrc = &xive->end_source;
     Error *local_err = NULL;
+    MachineState *machine = MACHINE(qdev_get_machine());
 
     if (!xive->nr_irqs) {
         error_setg(errp, "Number of interrupt needs to be greater 0");
@@ -305,6 +309,32 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     xive->eat = g_new0(XiveEAS, xive->nr_irqs);
     xive->endt = g_new0(XiveEND, xive->nr_ends);
 
+    xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
+                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
+
+    qemu_register_reset(spapr_xive_reset, dev);
+
+    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
+        kvmppc_xive_connect(xive, &local_err);
+        if (local_err && machine_kernel_irqchip_required(machine)) {
+            error_prepend(&local_err,
+                          "kernel_irqchip requested but unavailable: ");
+            error_propagate(errp, local_err);
+            return;
+        }
+
+        if (!local_err) {
+            return;
+        }
+
+        /*
+         * We failed to initialize the XIVE KVM device, fall back to
+         * emulated mode
+         */
+        error_prepend(&local_err, "kernel_irqchip allowed but unavailable: ");
+        error_report_err(local_err);
+    }
+
     /* TIMA initialization */
     memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
                           "xive.tima", 4ull << TM_SHIFT);
@@ -316,11 +346,6 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
 
     /* Map all regions */
     spapr_xive_map_mmio(xive);
-
-    xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
-                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
-
-    qemu_register_reset(spapr_xive_reset, dev);
 }
 
 static int spapr_xive_get_eas(XiveRouter *xrtr, uint8_t eas_blk,
@@ -495,6 +520,17 @@ bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
     if (lsi) {
         xive_source_irq_set_lsi(xsrc, lisn);
     }
+
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_source_reset_one(xsrc, lisn, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return false;
+        }
+    }
+
     return true;
 }
 
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
new file mode 100644
index 000000000000..623fbf74f23e
--- /dev/null
+++ b/hw/intc/spapr_xive_kvm.c
@@ -0,0 +1,237 @@
+/*
+ * QEMU PowerPC sPAPR XIVE interrupt controller model
+ *
+ * Copyright (c) 2017-2019, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "target/ppc/cpu.h"
+#include "sysemu/cpus.h"
+#include "sysemu/kvm.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
+#include "hw/ppc/xive.h"
+#include "kvm_ppc.h"
+
+#include <sys/ioctl.h>
+
+/*
+ * Helpers for CPU hotplug
+ *
+ * TODO: make a common KVMEnabledCPU layer for XICS and XIVE
+ */
+typedef struct KVMEnabledCPU {
+    unsigned long vcpu_id;
+    QLIST_ENTRY(KVMEnabledCPU) node;
+} KVMEnabledCPU;
+
+static QLIST_HEAD(, KVMEnabledCPU)
+    kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus);
+
+static bool kvm_cpu_is_enabled(CPUState *cs)
+{
+    KVMEnabledCPU *enabled_cpu;
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
+
+    QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) {
+        if (enabled_cpu->vcpu_id == vcpu_id) {
+            return true;
+        }
+    }
+    return false;
+}
+
+static void kvm_cpu_enable(CPUState *cs)
+{
+    KVMEnabledCPU *enabled_cpu;
+    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
+
+    enabled_cpu = g_malloc(sizeof(*enabled_cpu));
+    enabled_cpu->vcpu_id = vcpu_id;
+    QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
+}
+
+/*
+ * XIVE Thread Interrupt Management context (KVM)
+ */
+
+void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
+    unsigned long vcpu_id;
+    int ret;
+
+    /* Check if CPU was hot unplugged and replugged. */
+    if (kvm_cpu_is_enabled(tctx->cs)) {
+        return;
+    }
+
+    vcpu_id = kvm_arch_vcpu_id(tctx->cs);
+
+    ret = kvm_vcpu_enable_cap(tctx->cs, KVM_CAP_PPC_IRQ_XIVE, 0, xive->fd,
+                              vcpu_id, 0);
+    if (ret < 0) {
+        error_setg(errp, "XIVE: unable to connect CPU%lu to KVM device: %s",
+                   vcpu_id, strerror(errno));
+        return;
+    }
+
+    kvm_cpu_enable(tctx->cs);
+}
+
+/*
+ * XIVE Interrupt Source (KVM)
+ */
+
+/*
+ * At reset, the interrupt sources are simply created and MASKED. We
+ * only need to inform the KVM XIVE device about their type: LSI or
+ * MSI.
+ */
+void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
+{
+    sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
+    uint64_t state = 0;
+
+    if (xive_source_irq_is_lsi(xsrc, srcno)) {
+        state |= KVM_XIVE_LEVEL_SENSITIVE;
+        if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
+            state |= KVM_XIVE_LEVEL_ASSERTED;
+        }
+    }
+
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE, srcno, &state,
+                      true, errp);
+}
+
+void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
+{
+    int i;
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_source_reset_one(xsrc, i, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
+{
+    XiveSource *xsrc = opaque;
+    struct kvm_irq_level args;
+    int rc;
+
+    args.irq = srcno;
+    if (!xive_source_irq_is_lsi(xsrc, srcno)) {
+        if (!val) {
+            return;
+        }
+        args.level = KVM_INTERRUPT_SET;
+    } else {
+        if (val) {
+            xsrc->status[srcno] |= XIVE_STATUS_ASSERTED;
+            args.level = KVM_INTERRUPT_SET_LEVEL;
+        } else {
+            xsrc->status[srcno] &= ~XIVE_STATUS_ASSERTED;
+            args.level = KVM_INTERRUPT_UNSET;
+        }
+    }
+    rc = kvm_vm_ioctl(kvm_state, KVM_IRQ_LINE, &args);
+    if (rc < 0) {
+        error_report("XIVE: kvm_irq_line() failed : %s", strerror(errno));
+    }
+}
+
+/*
+ * sPAPR XIVE interrupt controller (KVM)
+ */
+
+static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
+                              Error **errp)
+{
+    void *addr;
+    uint32_t page_shift = 16; /* TODO: fix page_shift */
+
+    addr = mmap(NULL, len, PROT_WRITE | PROT_READ, MAP_SHARED, xive->fd,
+                pgoff << page_shift);
+    if (addr == MAP_FAILED) {
+        error_setg_errno(errp, errno, "XIVE: unable to set memory mapping");
+        return NULL;
+    }
+
+    return addr;
+}
+
+/*
+ * All the XIVE memory regions are now backed by mappings from the KVM
+ * XIVE device.
+ */
+void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
+{
+    XiveSource *xsrc = &xive->source;
+    XiveENDSource *end_xsrc = &xive->end_source;
+    Error *local_err = NULL;
+    size_t esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
+    size_t tima_len = 4ull << TM_SHIFT;
+
+    if (!kvmppc_has_cap_xive()) {
+        error_setg(errp, "IRQ_XIVE capability must be present for KVM");
+        return;
+    }
+
+    /* First, create the KVM XIVE device */
+    xive->fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_XIVE, false);
+    if (xive->fd < 0) {
+        error_setg_errno(errp, -xive->fd, "XIVE: error creating KVM device");
+        return;
+    }
+
+    /*
+     * 1. Source ESB pages - KVM mapping
+     */
+    xsrc->esb_mmap = kvmppc_xive_mmap(xive, KVM_XIVE_ESB_PAGE_OFFSET, esb_len,
+                                      &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    memory_region_init_ram_device_ptr(&xsrc->esb_mmio, OBJECT(xsrc),
+                                      "xive.esb", esb_len, xsrc->esb_mmap);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
+
+    /*
+     * 2. END ESB pages (No KVM support yet)
+     */
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
+
+    /*
+     * 3. TIMA pages - KVM mapping
+     */
+    xive->tm_mmap = kvmppc_xive_mmap(xive, KVM_XIVE_TIMA_PAGE_OFFSET, tima_len,
+                                     &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    memory_region_init_ram_device_ptr(&xive->tm_mmio, OBJECT(xive),
+                                      "xive.tima", tima_len, xive->tm_mmap);
+    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
+
+    kvm_kernel_irqchip = true;
+    kvm_msi_via_irqfd_allowed = true;
+    kvm_gsi_direct_mapping = true;
+
+    /* Map all regions */
+    spapr_xive_map_mmio(xive);
+}
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index daa7badc8492..0284b5803551 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -491,6 +491,15 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
         return;
     }
 
+    /* Connect the presenter to the VCPU (required for CPU hotplug) */
+    if (kvm_irqchip_in_kernel()) {
+        kvmppc_xive_cpu_connect(tctx, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
     qemu_register_reset(xive_tctx_reset, dev);
 }
 
@@ -893,6 +902,10 @@ static void xive_source_reset(void *dev)
 
     /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
     memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
+
+    if (kvm_irqchip_in_kernel()) {
+        kvmppc_xive_source_reset(xsrc, &error_fatal);
+    }
 }
 
 static void xive_source_realize(DeviceState *dev, Error **errp)
@@ -926,9 +939,11 @@ static void xive_source_realize(DeviceState *dev, Error **errp)
     xsrc->status = g_malloc0(xsrc->nr_irqs);
     xsrc->lsi_map = bitmap_new(xsrc->nr_irqs);
 
-    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
-                          &xive_source_esb_ops, xsrc, "xive.esb",
-                          (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
+    if (!kvm_irqchip_in_kernel()) {
+        memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
+                              &xive_source_esb_ops, xsrc, "xive.esb",
+                              (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
+    }
 
     qemu_register_reset(xive_source_reset, dev);
 }
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 4145079d7fa5..6e1c36dc62ca 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -387,7 +387,11 @@ static void spapr_irq_set_irq_xive(void *opaque, int srcno, int val)
 {
     sPAPRMachineState *spapr = opaque;
 
-    xive_source_set_irq(&spapr->xive->source, srcno, val);
+    if (kvm_irqchip_in_kernel()) {
+        kvmppc_xive_source_set_irq(&spapr->xive->source, srcno, val);
+    } else {
+        xive_source_set_irq(&spapr->xive->source, srcno, val);
+    }
 }
 
 static const char *spapr_irq_get_nodename_xive(sPAPRMachineState *spapr)
diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
index d01852fe3112..43e42e3c2af9 100644
--- a/target/ppc/kvm.c
+++ b/target/ppc/kvm.c
@@ -85,6 +85,7 @@ static int cap_fixup_hcalls;
 static int cap_htm;             /* Hardware transactional memory support */
 static int cap_mmu_radix;
 static int cap_mmu_hash_v3;
+static int cap_xive;
 static int cap_resize_hpt;
 static int cap_ppc_pvr_compat;
 static int cap_ppc_safe_cache;
@@ -148,6 +149,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     cap_htm = kvm_vm_check_extension(s, KVM_CAP_PPC_HTM);
     cap_mmu_radix = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_RADIX);
     cap_mmu_hash_v3 = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_HASH_V3);
+    cap_xive = kvm_vm_check_extension(s, KVM_CAP_PPC_IRQ_XIVE);
     cap_resize_hpt = kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);
     kvmppc_get_cpu_characteristics(s);
     cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
@@ -2388,6 +2390,11 @@ static int parse_cap_ppc_safe_indirect_branch(struct kvm_ppc_cpu_char c)
     return 0;
 }
 
+bool kvmppc_has_cap_xive(void)
+{
+    return cap_xive;
+}
+
 static void kvmppc_get_cpu_characteristics(KVMState *s)
 {
     struct kvm_ppc_cpu_char c;
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 301a8e972d91..23126c199178 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -39,6 +39,7 @@ obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
 obj-$(CONFIG_XICS_KVM) += xics_kvm.o
 obj-$(CONFIG_XIVE) += xive.o
 obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
+obj-$(CONFIG_XIVE_KVM) += spapr_xive_kvm.o
 obj-$(CONFIG_POWERNV) += xics_pnv.o
 obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
 obj-$(CONFIG_S390_FLIC) += s390_flic.o
-- 
2.20.1


* [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-25 23:22   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support Cédric Le Goater
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

All XIVE hcalls are redirected to QEMU, as none of them is on a fast
path. When necessary, QEMU invokes KVM through specific ioctls to
perform host operations. QEMU is expected to have done the necessary
checks before calling KVM; if the ioctl still fails, H_HARDWARE is
simply returned.
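
For H_INT_SET_SOURCE_CONFIG, QEMU turns the EAS into the (server,
priority, EISN) tuple expected by the KVM device. A minimal sketch:
the END index is decoded with the same mapping as
spapr_xive_end_to_target() (8 priorities per vCPU), then the fields
are packed into one 64-bit attribute value in the order used by the
patch. The SK_* shifts and masks are assumptions for illustration,
not the KVM uAPI constants, and the function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the 64-bit source configuration word */
#define SK_PRIO_SHIFT    0
#define SK_PRIO_MASK     0x7ULL
#define SK_SERVER_SHIFT  3
#define SK_SERVER_MASK   0xfffffff8ULL
#define SK_EISN_SHIFT    33
#define SK_EISN_MASK     0xfffffffe00000000ULL

/* sPAPR END indexing: vcpu_id * 8 + priority */
static void end_to_target(uint32_t end_idx, uint32_t *server,
                          uint8_t *prio)
{
    *server = end_idx >> 3;
    *prio = end_idx & 0x7;
}

static uint64_t encode_source_config(uint32_t end_idx, uint32_t eisn)
{
    uint32_t server;
    uint8_t prio;

    end_to_target(end_idx, &server, &prio);

    /* Widen every field to 64 bits before shifting it into place */
    return (((uint64_t)prio << SK_PRIO_SHIFT) & SK_PRIO_MASK) |
           (((uint64_t)server << SK_SERVER_SHIFT) & SK_SERVER_MASK) |
           (((uint64_t)eisn << SK_EISN_SHIFT) & SK_EISN_MASK);
}
```

For instance, END index 46 decodes to server 5 and priority 6.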

H_INT_ESB is a special case that could have been handled under KVM,
but handling it in QEMU had little impact on performance. Here are
some figures:

    kernel irqchip               OFF        ON        ON
    H_INT_ESB handled in                   KVM      QEMU

    rtl8139 (LSI)               1.19      1.24      1.23  Gbits/sec
    virtio                     31.80     42.30        --  Gbits/sec
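
Under KVM, kvmppc_xive_esb_rw() special-cases the Load EOI of an LSI:
PQ is reset to 00 and the source is re-triggered if the level is
still asserted. A simplified model of the PQ bits, assuming the
encoding noted in xive_source_reset() (P is bit 1, Q is bit 0, 0b01
means "off"); this is an illustration with hypothetical names, not
the QEMU implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PQ encodings: P is bit 1, Q is bit 0 */
enum { PQ_RESET = 0x0, PQ_OFF = 0x1, PQ_PENDING = 0x2, PQ_QUEUED = 0x3 };

typedef struct {
    uint8_t pq;
    bool lsi;
    bool asserted;      /* current level of an LSI line */
    unsigned notified;  /* events forwarded to the router */
} esb_model;

/* Trigger: forward only from 00, coalesce into Q while pending */
static void esb_trigger(esb_model *s)
{
    switch (s->pq) {
    case PQ_RESET:
        s->pq = PQ_PENDING;
        s->notified++;
        break;
    case PQ_PENDING:
    case PQ_QUEUED:
        s->pq = PQ_QUEUED;
        break;
    default: /* PQ_OFF: source is off, the event is dropped */
        break;
    }
}

/* MSI Load EOI: clear P; a queued event is replayed */
static void esb_eoi_msi(esb_model *s)
{
    if (s->pq == PQ_QUEUED) {
        s->pq = PQ_PENDING;
        s->notified++;
    } else if (s->pq == PQ_PENDING) {
        s->pq = PQ_RESET;
    }
}

/* LSI Load EOI: Q is never used, so set PQ to 00 and re-trigger */
static void esb_eoi_lsi(esb_model *s)
{
    s->pq = PQ_RESET;
    if (s->asserted) {
        esb_trigger(s);
    }
}

static void esb_load_eoi(esb_model *s)
{
    if (s->lsi) {
        esb_eoi_lsi(s);
    } else {
        esb_eoi_msi(s);
    }
}
```

The LSI path relies on the line level rather than on Q, which is why
the patch tracks XIVE_STATUS_ASSERTED in QEMU.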

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |  15 +++
 hw/intc/spapr_xive.c        |  87 +++++++++++++++--
 hw/intc/spapr_xive_kvm.c    | 184 ++++++++++++++++++++++++++++++++++++
 3 files changed, 278 insertions(+), 8 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index ab6732b14a02..749c6cbc2c56 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -55,9 +55,24 @@ void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
 void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
 void spapr_xive_map_mmio(sPAPRXive *xive);
 
+int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
+                             uint32_t *out_server, uint8_t *out_prio);
+
 /*
  * KVM XIVE device helpers
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
+void kvmppc_xive_reset(sPAPRXive *xive, Error **errp);
+void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
+                                   Error **errp);
+void kvmppc_xive_sync_source(sPAPRXive *xive, uint32_t lisn, Error **errp);
+uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
+                            uint64_t data, bool write);
+void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
+                                 uint32_t end_idx, XiveEND *end,
+                                 Error **errp);
+void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
+                                 uint32_t end_idx, XiveEND *end,
+                                 Error **errp);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index c24d649e3668..3db24391e31c 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -86,6 +86,19 @@ static int spapr_xive_target_to_nvt(uint32_t target,
  * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
  * priorities per CPU
  */
+int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
+                             uint32_t *out_server, uint8_t *out_prio)
+{
+    if (out_server) {
+        *out_server = end_idx >> 3;
+    }
+
+    if (out_prio) {
+        *out_prio = end_idx & 0x7;
+    }
+    return 0;
+}
+
 static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
                                   uint8_t *out_end_blk, uint32_t *out_end_idx)
 {
@@ -792,6 +805,16 @@ static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
         new_eas.w = xive_set_field64(EAS_END_DATA, new_eas.w, eisn);
     }
 
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return H_HARDWARE;
+        }
+    }
+
 out:
     xive->eat[lisn] = new_eas;
     return H_SUCCESS;
@@ -1097,6 +1120,16 @@ static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
      */
 
 out:
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_set_queue_config(xive, end_blk, end_idx, &end, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return H_HARDWARE;
+        }
+    }
+
     /* Update END */
     memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
     return H_SUCCESS;
@@ -1189,6 +1222,16 @@ static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
         args[2] = 0;
     }
 
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_get_queue_config(xive, end_blk, end_idx, end, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return H_HARDWARE;
+        }
+    }
+
     /* TODO: do we need any locking on the END ? */
     if (flags & SPAPR_XIVE_END_DEBUG) {
         /* Load the event queue generation number into the return flags */
@@ -1341,15 +1384,20 @@ static target_ulong h_int_esb(PowerPCCPU *cpu,
         return H_P3;
     }
 
-    mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
+    if (kvm_irqchip_in_kernel()) {
+        args[0] = kvmppc_xive_esb_rw(xsrc, lisn, offset, data,
+                                     flags & SPAPR_XIVE_ESB_STORE);
+    } else {
+        mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
 
-    if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
-                      (flags & SPAPR_XIVE_ESB_STORE))) {
-        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
-                      HWADDR_PRIx "\n", mmio_addr);
-        return H_HARDWARE;
+        if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
+                          (flags & SPAPR_XIVE_ESB_STORE))) {
+            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
+                          HWADDR_PRIx "\n", mmio_addr);
+            return H_HARDWARE;
+        }
+        args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
     }
-    args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
     return H_SUCCESS;
 }
 
@@ -1406,7 +1454,20 @@ static target_ulong h_int_sync(PowerPCCPU *cpu,
      * This is not needed when running the emulation under QEMU
      */
 
-    /* This is not real hardware. Nothing to be done */
+    /*
+     * This is not real hardware. Nothing to be done unless running
+     * under KVM
+     */
+
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_sync_source(xive, lisn, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return H_HARDWARE;
+        }
+    }
     return H_SUCCESS;
 }
 
@@ -1441,6 +1502,16 @@ static target_ulong h_int_reset(PowerPCCPU *cpu,
     }
 
     device_reset(DEVICE(xive));
+
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_reset(xive, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return H_HARDWARE;
+        }
+    }
     return H_SUCCESS;
 }
 
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 623fbf74f23e..6b50451b4f85 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -89,6 +89,52 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
  * XIVE Interrupt Source (KVM)
  */
 
+void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
+                                   Error **errp)
+{
+    uint32_t end_idx;
+    uint32_t end_blk;
+    uint32_t eisn;
+    uint8_t priority;
+    uint32_t server;
+    uint64_t kvm_src;
+    Error *local_err = NULL;
+
+    /*
+     * No need to set a MASKED source, this is the default state after
+     * reset.
+     */
+    if (!xive_eas_is_valid(eas) || xive_eas_is_masked(eas)) {
+        return;
+    }
+
+    end_idx = xive_get_field64(EAS_END_INDEX, eas->w);
+    end_blk = xive_get_field64(EAS_END_BLOCK, eas->w);
+    eisn = xive_get_field64(EAS_END_DATA, eas->w);
+
+    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
+
+    kvm_src = priority << KVM_XIVE_SOURCE_PRIORITY_SHIFT &
+        KVM_XIVE_SOURCE_PRIORITY_MASK;
+    kvm_src |= server << KVM_XIVE_SOURCE_SERVER_SHIFT &
+        KVM_XIVE_SOURCE_SERVER_MASK;
+    kvm_src |= ((uint64_t)eisn << KVM_XIVE_SOURCE_EISN_SHIFT) &
+        KVM_XIVE_SOURCE_EISN_MASK;
+
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE_CONFIG, lisn,
+                      &kvm_src, true, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+void kvmppc_xive_sync_source(sPAPRXive *xive, uint32_t lisn, Error **errp)
+{
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE_SYNC, lisn,
+                      NULL, true, errp);
+}
+
 /*
  * At reset, the interrupt sources are simply created and MASKED. We
  * only need to inform the KVM XIVE device about their type: LSI or
@@ -125,6 +171,64 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
     }
 }
 
+/*
+ * This is used to perform the magic loads on the ESB pages, described
+ * in xive.h.
+ */
+static uint64_t xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
+                            uint64_t data, bool write)
+{
+    unsigned long addr = (unsigned long) xsrc->esb_mmap +
+        xive_source_esb_mgmt(xsrc, srcno) + offset;
+
+    if (write) {
+        *((uint64_t *) addr) = data;
+        return -1;
+    } else {
+        return *((uint64_t *) addr);
+    }
+}
+
+static uint8_t xive_esb_read(XiveSource *xsrc, int srcno, uint32_t offset)
+{
+    /* Prevent the compiler from optimizing away the load */
+    volatile uint64_t value = xive_esb_rw(xsrc, srcno, offset, 0, 0);
+
+    return be64_to_cpu(value) & 0x3;
+}
+
+static void xive_esb_trigger(XiveSource *xsrc, int srcno)
+{
+    unsigned long addr = (unsigned long) xsrc->esb_mmap +
+        xive_source_esb_page(xsrc, srcno);
+
+    *((uint64_t *) addr) = 0x0;
+}
+
+uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
+                            uint64_t data, bool write)
+{
+    if (write) {
+        return xive_esb_rw(xsrc, srcno, offset, data, 1);
+    }
+
+    /*
+     * Special Load EOI handling for LSI sources. Q bit is never set
+     * and the interrupt should be re-triggered if the level is still
+     * asserted.
+     */
+    if (xive_source_irq_is_lsi(xsrc, srcno) &&
+        offset == XIVE_ESB_LOAD_EOI) {
+        xive_esb_read(xsrc, srcno, XIVE_ESB_SET_PQ_00);
+        if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
+            xive_esb_trigger(xsrc, srcno);
+        }
+        return 0;
+    } else {
+        return xive_esb_rw(xsrc, srcno, offset, 0, 0);
+    }
+}
+
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 {
     XiveSource *xsrc = opaque;
@@ -155,6 +259,86 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 /*
  * sPAPR XIVE interrupt controller (KVM)
  */
+void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
+                                  uint32_t end_idx, XiveEND *end,
+                                  Error **errp)
+{
+    struct kvm_ppc_xive_eq kvm_eq = { 0 };
+    uint64_t kvm_eq_idx;
+    uint8_t priority;
+    uint32_t server;
+    Error *local_err = NULL;
+
+    if (!xive_end_is_valid(end)) {
+        return;
+    }
+
+    /* Encode the tuple (server, prio) as a KVM EQ index */
+    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
+
+    kvm_eq_idx = priority << KVM_XIVE_EQ_PRIORITY_SHIFT &
+            KVM_XIVE_EQ_PRIORITY_MASK;
+    kvm_eq_idx |= server << KVM_XIVE_EQ_SERVER_SHIFT &
+        KVM_XIVE_EQ_SERVER_MASK;
+
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ_CONFIG, kvm_eq_idx,
+                      &kvm_eq, false, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /*
+     * The EQ index and toggle bit are updated by HW. These are the
+     * only fields we want to return.
+     */
+    end->w1 = xive_set_field32(END_W1_GENERATION, 0ul, kvm_eq.qtoggle) |
+        xive_set_field32(END_W1_PAGE_OFF, 0ul, kvm_eq.qindex);
+}
+
+void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
+                                  uint32_t end_idx, XiveEND *end,
+                                  Error **errp)
+{
+    struct kvm_ppc_xive_eq kvm_eq = { 0 };
+    uint64_t kvm_eq_idx;
+    uint8_t priority;
+    uint32_t server;
+    Error *local_err = NULL;
+
+    if (!xive_end_is_valid(end)) {
+        return;
+    }
+
+    /* Build the KVM state from the local END structure */
+    kvm_eq.flags   = KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY;
+    kvm_eq.qsize   = xive_get_field32(END_W0_QSIZE, end->w0) + 12;
+    kvm_eq.qpage   = (uint64_t) (be32_to_cpu(end->w2) & 0x0fffffff) << 32 |
+        be32_to_cpu(end->w3);
+    kvm_eq.qtoggle = xive_get_field32(END_W1_GENERATION, end->w1);
+    kvm_eq.qindex  = xive_get_field32(END_W1_PAGE_OFF, end->w1);
+
+    /* Encode the tuple (server, prio) as a KVM EQ index */
+    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
+
+    kvm_eq_idx = priority << KVM_XIVE_EQ_PRIORITY_SHIFT &
+            KVM_XIVE_EQ_PRIORITY_MASK;
+    kvm_eq_idx |= server << KVM_XIVE_EQ_SERVER_SHIFT &
+        KVM_XIVE_EQ_SERVER_MASK;
+
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ_CONFIG, kvm_eq_idx,
+                      &kvm_eq, true, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+}
+
+void kvmppc_xive_reset(sPAPRXive *xive, Error **errp)
+{
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL, KVM_DEV_XIVE_RESET,
+                      NULL, true, errp);
+}
 
 static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
                               Error **errp)
-- 
2.20.1


* [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support Cédric Le Goater
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-25 23:49   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM Cédric Le Goater
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

Everything is now in place for KVM. State synchronization and migration
support will come next.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr_irq.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 6e1c36dc62ca..1ad57582a403 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -263,19 +263,10 @@ sPAPRIrq spapr_irq_xics = {
 static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
                                 Error **errp)
 {
-    MachineState *machine = MACHINE(spapr);
     uint32_t nr_servers = spapr_max_server_number(spapr);
     DeviceState *dev;
     int i;
 
-    /* KVM XIVE device not yet available */
-    if (kvm_enabled()) {
-        if (machine_kernel_irqchip_required(machine)) {
-            error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
-            return;
-        }
-    }
-
     dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
     qdev_prop_set_uint32(dev, "nr-irqs", nr_irqs);
     /*
-- 
2.20.1


* [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (2 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-26  0:01   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 05/13] spapr/xive: introduce a VM state change handler Cédric Le Goater
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

This extends the KVM XIVE device backend with 'synchronize_state'
methods used to retrieve the state from KVM. The HW states of the
sources, the KVM device and the thread interrupt contexts are
collected for use by the QEMU monitor and, later, by migration.

These get operations rely on their KVM counterparts in the host
kernel, which act as a proxy for OPAL, the host firmware. The set
operations will be added later for migration support.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |  8 ++++
 include/hw/ppc/xive.h       |  1 +
 hw/intc/spapr_xive.c        | 17 ++++---
 hw/intc/spapr_xive_kvm.c    | 89 +++++++++++++++++++++++++++++++++++++
 hw/intc/xive.c              | 10 +++++
 5 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 749c6cbc2c56..ebd65e7fe36b 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -44,6 +44,13 @@ typedef struct sPAPRXive {
     void          *tm_mmap;
 } sPAPRXive;
 
+/*
+ * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
+ * to the controller block id value. It can nevertheless be changed
+ * for testing purposes.
+ */
+#define SPAPR_XIVE_BLOCK_ID 0x0
+
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
 bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
@@ -74,5 +81,6 @@ void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
 void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
                                  uint32_t end_idx, XiveEND *end,
                                  Error **errp);
+void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 061d43fea24d..f3766fd881a2 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -431,5 +431,6 @@ void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp);
 void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
+void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
 
 #endif /* PPC_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 3db24391e31c..9f07567f4d78 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -40,13 +40,6 @@
 
 #define SPAPR_XIVE_NVT_BASE 0x400
 
-/*
- * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
- * to the controller block id value. It can nevertheless be changed
- * for testing purpose.
- */
-#define SPAPR_XIVE_BLOCK_ID 0x0
-
 /*
  * sPAPR NVT and END indexing helpers
  */
@@ -153,6 +146,16 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
     XiveSource *xsrc = &xive->source;
     int i;
 
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_synchronize_state(xive, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return;
+        }
+    }
+
     monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
 
     for (i = 0; i < xive->nr_irqs; i++) {
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 6b50451b4f85..4b1ffb9835f9 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -60,6 +60,57 @@ static void kvm_cpu_enable(CPUState *cs)
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
+static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
+{
+    uint64_t state[4] = { 0 };
+    int ret;
+
+    ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
+    if (ret != 0) {
+        error_setg_errno(errp, errno,
+                         "XIVE: could not capture KVM state of CPU %lu",
+                         kvm_arch_vcpu_id(tctx->cs));
+        return;
+    }
+
+    /* word0 and word1 of the OS ring. */
+    *((uint64_t *) &tctx->regs[TM_QW1_OS]) = state[0];
+
+    /*
+     * KVM also returns word2, which contains the OS CAM line and is
+     * useful to print out in the QEMU monitor.
+     */
+    *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]) = state[1];
+}
+
+typedef struct {
+    XiveTCTX *tctx;
+    Error *err;
+} XiveCpuGetState;
+
+static void kvmppc_xive_cpu_do_synchronize_state(CPUState *cpu,
+                                                 run_on_cpu_data arg)
+{
+    XiveCpuGetState *s = arg.host_ptr;
+
+    kvmppc_xive_cpu_get_state(s->tctx, &s->err);
+}
+
+void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp)
+{
+    XiveCpuGetState s = {
+        .tctx = tctx,
+        .err = NULL,
+    };
+
+    run_on_cpu(tctx->cs, kvmppc_xive_cpu_do_synchronize_state,
+               RUN_ON_CPU_HOST_PTR(&s));
+
+    if (s.err) {
+        error_propagate(errp, s.err);
+        return;
+    }
+}
 
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
 {
@@ -229,6 +280,19 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
     }
 }
 
+static void kvmppc_xive_source_get_state(XiveSource *xsrc)
+{
+    int i;
+
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        /* Perform a load without side effects to retrieve the PQ bits */
+        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
+
+        /* and save PQ locally */
+        xive_source_esb_set(xsrc, i, pq);
+    }
+}
+
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 {
     XiveSource *xsrc = opaque;
@@ -340,6 +404,31 @@ void kvmppc_xive_reset(sPAPRXive *xive, Error **errp)
                       NULL, true, errp);
 }
 
+static void kvmppc_xive_get_queues(sPAPRXive *xive, Error **errp)
+{
+    Error *local_err = NULL;
+    int i;
+
+    for (i = 0; i < xive->nr_ends; i++) {
+        kvmppc_xive_get_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
+                                     &xive->endt[i], &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+}
+
+void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
+{
+    kvmppc_xive_source_get_state(&xive->source);
+
+    /* EAT: there is no extra state to query from KVM */
+
+    /* ENDT */
+    kvmppc_xive_get_queues(xive, errp);
+}
+
 static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
                               Error **errp)
 {
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 0284b5803551..f478c52ab2a0 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -431,6 +431,16 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
     int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
     int i;
 
+    if (kvm_irqchip_in_kernel()) {
+        Error *local_err = NULL;
+
+        kvmppc_xive_cpu_synchronize_state(tctx, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return;
+        }
+    }
+
     monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
                    "  W2\n", cpu_index);
 
-- 
2.20.1


* [Qemu-devel] [PATCH v2 05/13] spapr/xive: introduce a VM state change handler
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (3 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-26  0:39   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 06/13] spapr/xive: add migration support for KVM Cédric Le Goater
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

This handler is in charge of stabilizing the flow of event
notifications in the XIVE controller before a guest is migrated. This
is required before the guest EQ pages can be transferred to the
destination.

When the VM is stopped, the handler masks the sources (PQ=01) to stop
the flow of events and saves their previous state. The XIVE controller
is then synced through KVM to flush any in-flight event notifications
and to stabilize the EQs. At this stage, the EQ pages are marked dirty
so that they are transferred if a migration sequence is in progress.

The previous configuration of the sources is restored when the VM
resumes, after a migration or a stop.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |  1 +
 hw/intc/spapr_xive_kvm.c    | 77 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index ebd65e7fe36b..298d204d54ef 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -42,6 +42,7 @@ typedef struct sPAPRXive {
     /* KVM support */
     int           fd;
     void          *tm_mmap;
+    VMChangeStateEntry *change;
 } sPAPRXive;
 
 /*
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 4b1ffb9835f9..44d80175b1b5 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -419,9 +419,81 @@ static void kvmppc_xive_get_queues(sPAPRXive *xive, Error **errp)
     }
 }
 
+/*
+ * The primary goal of the XIVE VM change handler is to mark the EQ
+ * pages dirty when all XIVE event notifications have stopped.
+ *
+ * Whenever the VM is stopped, the VM change handler masks the sources
+ * (PQ=01) to stop the flow of events and saves the previous state in
+ * anticipation of a migration. The XIVE controller is then synced
+ * through KVM to flush any in-flight event notifications and stabilize
+ * the EQs.
+ *
+ * At this stage, we can mark the EQ pages dirty and let a migration
+ * sequence, which starts right after the VM stops, transfer them to
+ * the destination.
+ *
+ * The previous configuration of the sources is restored when the VM
+ * runs again.
+ */
+static void kvmppc_xive_change_state_handler(void *opaque, int running,
+                                             RunState state)
+{
+    sPAPRXive *xive = opaque;
+    XiveSource *xsrc = &xive->source;
+    Error *local_err = NULL;
+    int i;
+
+    /*
+     * Restore the sources to the PQ state saved in the XiveSource. This
+     * is called when the VM resumes after a stop or a migration.
+     */
+    if (running) {
+        for (i = 0; i < xsrc->nr_irqs; i++) {
+            uint8_t pq = xive_source_esb_get(xsrc, i);
+            if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
+                error_report("XIVE: IRQ %d has an invalid state", i);
+            }
+        }
+
+        return;
+    }
+
+    /*
+     * Mask the sources, to stop the flow of event notifications, and
+     * save the PQs locally in the XiveSource object. The XiveSource
+     * state will be collected later on by its vmstate handler if a
+     * migration is in progress.
+     */
+    for (i = 0; i < xsrc->nr_irqs; i++) {
+        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
+        xive_source_esb_set(xsrc, i, pq);
+    }
+
+    /*
+     * Sync the XIVE controller in KVM to flush in-flight event
+     * notifications that should be enqueued in the EQs, and mark the
+     * XIVE EQ pages dirty to collect all updates.
+     */
+    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL,
+                      KVM_DEV_XIVE_EQ_SYNC, NULL, true, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return;
+    }
+}
+
 void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
 {
-    kvmppc_xive_source_get_state(&xive->source);
+    /*
+     * When the VM is stopped, the sources are masked and their previous
+     * state is saved in anticipation of a migration. We must not
+     * synchronize the source state in that case, otherwise we would
+     * overwrite the saved state.
+     */
+    if (runstate_is_running()) {
+        kvmppc_xive_source_get_state(&xive->source);
+    }
 
     /* EAT: there is no extra state to query from KVM */
 
@@ -501,6 +573,9 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
                                       "xive.tima", tima_len, xive->tm_mmap);
     sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
 
+    xive->change = qemu_add_vm_change_state_handler(
+        kvmppc_xive_change_state_handler, xive);
+
     kvm_kernel_irqchip = true;
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_direct_mapping = true;
-- 
2.20.1


* [Qemu-devel] [PATCH v2 06/13] spapr/xive: add migration support for KVM
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (4 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 05/13] spapr/xive: introduce a VM state change handler Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-26  0:58   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG Cédric Le Goater
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

When the VM is stopped, the VM state change handler stabilizes the
XIVE IC and marks the EQ pages dirty. These are then transferred to
the destination before the transfer of the device vmstates starts.

The sPAPRXive interrupt controller model captures the XIVE internal
tables, EAT and ENDT, and the XiveTCTX model does the same for the
thread interrupt context registers.

On the destination, the sPAPRXive 'post_load' method restores all the
XIVE states. It is called by the sPAPR machine 'post_load' method,
once all XIVE states have been transferred and loaded.

Finally, the source states are restored in the VM change state handler
when the machine reaches the running state.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |  3 ++
 include/hw/ppc/xive.h       |  1 +
 hw/intc/spapr_xive.c        | 24 ++++++++++
 hw/intc/spapr_xive_kvm.c    | 93 ++++++++++++++++++++++++++++++++++++-
 hw/intc/xive.c              | 17 +++++++
 hw/ppc/spapr_irq.c          |  2 +-
 6 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 298d204d54ef..22d70650b51f 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -55,6 +55,7 @@ typedef struct sPAPRXive {
 bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
 bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
 void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
+int spapr_xive_post_load(sPAPRXive *xive, int version_id);
 
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
@@ -83,5 +84,7 @@ void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
                                  uint32_t end_idx, XiveEND *end,
                                  Error **errp);
 void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
+int kvmppc_xive_pre_save(sPAPRXive *xive);
+int kvmppc_xive_post_load(sPAPRXive *xive, int version_id);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index f3766fd881a2..3b1baa783975 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -432,5 +432,6 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
 void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
 void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
+void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp);
 
 #endif /* PPC_XIVE_H */
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 9f07567f4d78..21fe5e1aa39f 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -469,10 +469,34 @@ static const VMStateDescription vmstate_spapr_xive_eas = {
     },
 };
 
+static int vmstate_spapr_xive_pre_save(void *opaque)
+{
+    if (kvm_irqchip_in_kernel()) {
+        return kvmppc_xive_pre_save(SPAPR_XIVE(opaque));
+    }
+
+    return 0;
+}
+
+/*
+ * Called by the sPAPR IRQ backend 'post_load' method at the machine
+ * level.
+ */
+int spapr_xive_post_load(sPAPRXive *xive, int version_id)
+{
+    if (kvm_irqchip_in_kernel()) {
+        return kvmppc_xive_post_load(xive, version_id);
+    }
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_spapr_xive = {
     .name = TYPE_SPAPR_XIVE,
     .version_id = 1,
     .minimum_version_id = 1,
+    .pre_save = vmstate_spapr_xive_pre_save,
+    .post_load = NULL, /* handled at the machine level */
     .fields = (VMStateField[]) {
         VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
         VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 44d80175b1b5..119fd59fc9ae 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -15,6 +15,7 @@
 #include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xive.h"
 #include "kvm_ppc.h"
@@ -60,7 +61,30 @@ static void kvm_cpu_enable(CPUState *cs)
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
-static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
+
+static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
+{
+    uint64_t state[4] = { 0 };
+    int ret;
+
+    /* word0 and word1 of the OS ring. */
+    state[0] = *((uint64_t *) &tctx->regs[TM_QW1_OS]);
+
+    /*
+     * OS CAM line. Used by KVM to print out the VP identifier. This
+     * is for debug only.
+     */
+    state[1] = *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]);
+
+    ret = kvm_set_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
+    if (ret != 0) {
+        error_setg_errno(errp, errno,
+                         "XIVE: could not restore KVM state of CPU %lu",
+                         kvm_arch_vcpu_id(tctx->cs));
+    }
+}
+
+void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
 {
     uint64_t state[4] = { 0 };
     int ret;
@@ -501,6 +525,73 @@ void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
     kvmppc_xive_get_queues(xive, errp);
 }
 
+/*
+ * The sPAPRXive 'pre_save' method is called by the vmstate handler of
+ * the sPAPRXive model, after the XIVE controller is synced in the VM
+ * change handler.
+ */
+int kvmppc_xive_pre_save(sPAPRXive *xive)
+{
+    Error *local_err = NULL;
+
+    /* EAT: there is no extra state to query from KVM */
+
+    /* ENDT */
+    kvmppc_xive_get_queues(xive, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -1;
+    }
+
+    return 0;
+}
+
+/*
+ * The sPAPRXive 'post_load' method is not called by a vmstate
+ * handler. It is called at the sPAPR machine level at the end of the
+ * migration sequence by the sPAPR IRQ backend 'post_load' method,
+ * when all XIVE states have been transferred and loaded.
+ */
+int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
+{
+    Error *local_err = NULL;
+    CPUState *cs;
+    int i;
+
+    /* Restore the ENDT first. The targeting depends on it. */
+    for (i = 0; i < xive->nr_ends; i++) {
+        kvmppc_xive_set_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
+                                     &xive->endt[i], &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    /* Restore the EAT */
+    for (i = 0; i < xive->nr_irqs; i++) {
+        kvmppc_xive_set_source_config(xive, i, &xive->eat[i], &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    /* Restore the thread interrupt contexts */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        kvmppc_xive_cpu_set_state(spapr_cpu_state(cpu)->tctx, &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    /* The source states will be restored when the machine starts running */
+    return 0;
+}
+
 static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
                               Error **errp)
 {
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index f478c52ab2a0..1f8e923ca654 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -518,10 +518,27 @@ static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
     qemu_unregister_reset(xive_tctx_reset, dev);
 }
 
+static int vmstate_xive_tctx_pre_save(void *opaque)
+{
+    Error *local_err = NULL;
+
+    if (kvm_irqchip_in_kernel()) {
+        kvmppc_xive_cpu_get_state(XIVE_TCTX(opaque), &local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_xive_tctx = {
     .name = TYPE_XIVE_TCTX,
     .version_id = 1,
     .minimum_version_id = 1,
+    .pre_save = vmstate_xive_tctx_pre_save,
+    .post_load = NULL, /* handled by the sPAPRXive model */
     .fields = (VMStateField[]) {
         VMSTATE_BUFFER(regs, XiveTCTX),
         VMSTATE_END_OF_LIST()
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 1ad57582a403..12ecca6264f3 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -356,7 +356,7 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
 
 static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 {
-    return 0;
+    return spapr_xive_post_load(spapr->xive, version_id);
 }
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
-- 
2.20.1


* [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (5 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 06/13] spapr/xive: add migration support for KVM Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-26  1:02   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 08/13] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers Cédric Le Goater
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

When the thread interrupt management state is retrieved from the KVM
vCPU, word2 is saved in the QEMU XIVE thread context so that the OS
CAM line can be printed in the QEMU monitor.

This breaks the migration of a TCG guest (and of a KVM guest with
kernel_irqchip=off) because the matching algorithm of the presenter
relies on the OS CAM value. Fix this by setting the OS CAM line of
each thread context back to its expected value after the XIVE states
are loaded.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 12ecca6264f3..3176098b9f7c 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -356,7 +356,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
 
 static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 {
-    return spapr_xive_post_load(spapr->xive, version_id);
+    CPUState *cs;
+    int ret;
+
+    ret = spapr_xive_post_load(spapr->xive, version_id);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * When the states are collected from the KVM XIVE device, word2
+     * of the XiveTCTX is set so that the OS CAM line can be printed
+     * in the QEMU monitor.
+     *
+     * This breaks the migration of a TCG guest (or of a KVM guest with
+     * kernel_irqchip=off) because the matching algorithm of the
+     * presenter relies on the OS CAM value. Fix this by setting the OS
+     * CAM line of each thread context back to its expected value.
+     */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        /* (TCG) Set the OS CAM line of the thread interrupt context. */
+        spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
+    }
+    return 0;
 }
 
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
-- 
2.20.1


* [Qemu-devel] [PATCH v2 08/13] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (6 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 09/13] sysbus: add a sysbus_mmio_unmap() helper Cédric Le Goater
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

Removing RTAS handlers will become necessary when the new pseries
machine supporting multiple interrupt modes is introduced.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr.h | 4 ++++
 hw/ppc/spapr_rtas.c    | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 59073a757900..dd346d921428 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -660,6 +660,10 @@ typedef void (*spapr_rtas_fn)(PowerPCCPU *cpu, sPAPRMachineState *sm,
                               uint32_t nargs, target_ulong args,
                               uint32_t nret, target_ulong rets);
 void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn);
+static inline void spapr_rtas_unregister(int token)
+{
+    spapr_rtas_register(token, NULL, NULL);
+}
 target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPRMachineState *sm,
                              uint32_t token, uint32_t nargs, target_ulong args,
                              uint32_t nret, target_ulong rets);
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 7a2cb786a36a..d09c5f463d22 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -404,7 +404,7 @@ void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
 
     token -= RTAS_TOKEN_BASE;
 
-    assert(!rtas_table[token].name);
+    assert(!name || !rtas_table[token].name);
 
     rtas_table[token].name = name;
     rtas_table[token].fn = fn;
-- 
2.20.1

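The relaxed assertion turns "register NULL" into an unregister operation while still catching a double registration on the same token. A minimal sketch of that pattern on a hypothetical token table follows; the SKETCH_TOKEN_* bounds, the SketchEntry type and the sketch_* functions are illustrative assumptions, not QEMU's RTAS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token range; the values are illustrative only. */
#define SKETCH_TOKEN_BASE 0x2000
#define SKETCH_TOKEN_MAX  0x2010

typedef int (*sketch_fn)(int arg);

typedef struct {
    const char *name;
    sketch_fn fn;
} SketchEntry;

static SketchEntry sketch_table[SKETCH_TOKEN_MAX - SKETCH_TOKEN_BASE];

/* Installing a handler on a token that is already taken is still a bug,
 * but installing NULL (unregistering) is always allowed. */
static void sketch_register(int token, const char *name, sketch_fn fn)
{
    assert(token >= SKETCH_TOKEN_BASE && token < SKETCH_TOKEN_MAX);
    token -= SKETCH_TOKEN_BASE;
    assert(!name || !sketch_table[token].name);
    sketch_table[token].name = name;
    sketch_table[token].fn = fn;
}

static void sketch_unregister(int token)
{
    sketch_register(token, NULL, NULL);
}

/* Dispatch returns -1 when no handler is installed for the token. */
static int sketch_call(int token, int arg)
{
    assert(token >= SKETCH_TOKEN_BASE && token < SKETCH_TOKEN_MAX);
    SketchEntry *e = &sketch_table[token - SKETCH_TOKEN_BASE];

    return e->fn ? e->fn(arg) : -1;
}

static int sketch_double(int arg) { return 2 * arg; }
static int sketch_negate(int arg) { return -arg; }
```

Registering a second handler on token 0x2001 without unregistering first would trip the assertion; after sketch_unregister(0x2001) the slot can be reused by another handler.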
* [Qemu-devel] [PATCH v2 09/13] sysbus: add a sysbus_mmio_unmap() helper
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (7 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 08/13] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 10/13] spapr: introduce routines to delete the KVM IRQ device Cédric Le Goater
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

This will be used to remove the MMIO regions of the POWER9 XIVE
interrupt controller when the sPAPR machine is reset.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 include/hw/sysbus.h |  1 +
 hw/core/sysbus.c    | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/include/hw/sysbus.h b/include/hw/sysbus.h
index 1aedcf05c92b..4c668fbbdc60 100644
--- a/include/hw/sysbus.h
+++ b/include/hw/sysbus.h
@@ -89,6 +89,7 @@ qemu_irq sysbus_get_connected_irq(SysBusDevice *dev, int n);
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr);
 void sysbus_mmio_map_overlap(SysBusDevice *dev, int n, hwaddr addr,
                              int priority);
+void sysbus_mmio_unmap(SysBusDevice *dev, int n);
 void sysbus_add_io(SysBusDevice *dev, hwaddr addr,
                    MemoryRegion *mem);
 MemoryRegion *sysbus_address_space(SysBusDevice *dev);
diff --git a/hw/core/sysbus.c b/hw/core/sysbus.c
index 9f9edbcab96f..f90d87b058c3 100644
--- a/hw/core/sysbus.c
+++ b/hw/core/sysbus.c
@@ -153,6 +153,16 @@ static void sysbus_mmio_map_common(SysBusDevice *dev, int n, hwaddr addr,
     }
 }
 
+void sysbus_mmio_unmap(SysBusDevice *dev, int n)
+{
+    assert(n >= 0 && n < dev->num_mmio);
+
+    if (dev->mmio[n].addr != (hwaddr)-1) {
+        memory_region_del_subregion(get_system_memory(), dev->mmio[n].memory);
+        dev->mmio[n].addr = (hwaddr)-1;
+    }
+}
+
 void sysbus_mmio_map(SysBusDevice *dev, int n, hwaddr addr)
 {
     sysbus_mmio_map_common(dev, n, addr, false, 0);
-- 
2.20.1

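sysbus_mmio_unmap() relies on the (hwaddr)-1 sentinel that marks an unmapped region, which makes it safe to call on every reset whether or not the region is currently mapped. The sketch below models that bookkeeping with hypothetical types (SketchDev, SketchMmio) and a plain counter standing in for the system memory subregions; it does not use the QEMU memory API.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sketch_hwaddr;
#define SKETCH_UNMAPPED ((sketch_hwaddr)-1)

/* Hypothetical MMIO slot: the address doubles as the "is mapped" flag. */
typedef struct {
    sketch_hwaddr addr;
} SketchMmio;

typedef struct {
    int num_mmio;
    SketchMmio mmio[4];
    int subregions;  /* stands in for the system memory subregion count */
} SketchDev;

static void sketch_mmio_map(SketchDev *dev, int n, sketch_hwaddr addr)
{
    assert(n >= 0 && n < dev->num_mmio);
    if (dev->mmio[n].addr != SKETCH_UNMAPPED) {
        dev->subregions--;      /* remap: drop the previous mapping first */
    }
    dev->mmio[n].addr = addr;
    dev->subregions++;
}

static void sketch_mmio_unmap(SketchDev *dev, int n)
{
    assert(n >= 0 && n < dev->num_mmio);
    /* Unmapping an unmapped region is a no-op, so callers need no
     * extra state to know whether a previous mapping exists. */
    if (dev->mmio[n].addr != SKETCH_UNMAPPED) {
        dev->subregions--;
        dev->mmio[n].addr = SKETCH_UNMAPPED;
    }
}
```

The design choice worth noting is that the address field doubles as the "mapped" flag, so making unmap idempotent costs no additional field.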
* [Qemu-devel] [PATCH v2 10/13] spapr: introduce routines to delete the KVM IRQ device
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (8 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 09/13] sysbus: add a sysbus_mmio_unmap() helper Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-26  1:10   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 11/13] spapr: check for the activation of " Cédric Le Goater
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

If a new interrupt mode is chosen by CAS, the machine generates a
reset to reconfigure. At this point, the connection with the previous
KVM device needs to be closed and a new connection needs to be opened
with the KVM device operating the chosen interrupt mode.

New routines are introduced to destroy the XICS and the XIVE KVM
devices. They make use of a new KVM device ioctl which destroys the
device and also disconnects the IRQ presenters from the vCPUs.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |  1 +
 include/hw/ppc/xics_spapr.h |  1 +
 hw/intc/spapr_xive_kvm.c    | 60 +++++++++++++++++++++++++++++++++++++
 hw/intc/xics_kvm.c          | 56 ++++++++++++++++++++++++++++++++++
 4 files changed, 118 insertions(+)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 22d70650b51f..a7c4c275a747 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -71,6 +71,7 @@ int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
  * KVM XIVE device helpers
  */
 void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
+void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp);
 void kvmppc_xive_reset(sPAPRXive *xive, Error **errp);
 void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
                                    Error **errp);
diff --git a/include/hw/ppc/xics_spapr.h b/include/hw/ppc/xics_spapr.h
index b8d924baf437..bddf09821cb0 100644
--- a/include/hw/ppc/xics_spapr.h
+++ b/include/hw/ppc/xics_spapr.h
@@ -34,6 +34,7 @@
 void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
+int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp);
 void xics_spapr_init(sPAPRMachineState *spapr);
 
 #endif /* XICS_SPAPR_H */
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 119fd59fc9ae..e31035c90260 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -58,6 +58,16 @@ static void kvm_cpu_enable(CPUState *cs)
     QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
 }
 
+static void kvm_cpu_disable_all(void)
+{
+    KVMEnabledCPU *enabled_cpu, *next;
+
+    QLIST_FOREACH_SAFE(enabled_cpu, &kvm_enabled_cpus, node, next) {
+        QLIST_REMOVE(enabled_cpu, node);
+        g_free(enabled_cpu);
+    }
+}
+
 /*
  * XIVE Thread Interrupt Management context (KVM)
  */
@@ -674,3 +684,53 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
     /* Map all regions */
     spapr_xive_map_mmio(xive);
 }
+
+void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp)
+{
+    XiveSource *xsrc;
+    struct kvm_destroy_device xive_destroy_device;
+    size_t esb_len;
+    int rc;
+
+    /* The KVM XIVE device is not in use */
+    if (!xive || xive->fd == -1) {
+        return;
+    }
+
+    if (!kvmppc_has_cap_xive()) {
+        error_setg(errp, "IRQ_XIVE capability must be present for KVM");
+        return;
+    }
+
+    /* Clear the KVM mapping */
+    xsrc = &xive->source;
+    esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
+
+    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 0);
+    munmap(xsrc->esb_mmap, esb_len);
+
+    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 1);
+
+    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 2);
+    munmap(xive->tm_mmap, 4ull << TM_SHIFT);
+
+    /* Destroy the KVM device. This also clears the VCPU presenters */
+    xive_destroy_device.fd = xive->fd;
+    xive_destroy_device.flags = 0;
+    rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xive_destroy_device);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XIVE");
+    }
+    close(xive->fd);
+    xive->fd = -1;
+
+    kvm_kernel_irqchip = false;
+    kvm_msi_via_irqfd_allowed = false;
+    kvm_gsi_direct_mapping = false;
+
+    /* Clear the local list of presenter (hotplug) */
+    kvm_cpu_disable_all();
+
+    /* VM Change state handler is not needed anymore */
+    qemu_del_vm_change_state_handler(xive->change);
+}
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index c6e1b630a404..373de3155f6b 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -51,6 +51,16 @@ typedef struct KVMEnabledICP {
 static QLIST_HEAD(, KVMEnabledICP)
     kvm_enabled_icps = QLIST_HEAD_INITIALIZER(&kvm_enabled_icps);
 
+static void kvm_disable_icps(void)
+{
+    KVMEnabledICP *enabled_icp, *next;
+
+    QLIST_FOREACH_SAFE(enabled_icp, &kvm_enabled_icps, node, next) {
+        QLIST_REMOVE(enabled_icp, node);
+        g_free(enabled_icp);
+    }
+}
+
 /*
  * ICP-KVM
  */
@@ -360,3 +370,49 @@ fail:
     kvmppc_define_rtas_kernel_token(0, "ibm,int-off");
     return -1;
 }
+
+int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp)
+{
+    int rc;
+    struct kvm_destroy_device xics_destroy_device = {
+        .fd = kernel_xics_fd,
+        .flags = 0,
+    };
+
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return 0;
+    }
+
+    if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
+        error_setg(errp,
+                   "KVM and IRQ_XICS capability must be present for KVM XICS device");
+        return -1;
+    }
+
+    rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xics_destroy_device);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XICS");
+    }
+    close(kernel_xics_fd);
+    kernel_xics_fd = -1;
+
+    spapr_rtas_unregister(RTAS_IBM_SET_XIVE);
+    spapr_rtas_unregister(RTAS_IBM_GET_XIVE);
+    spapr_rtas_unregister(RTAS_IBM_INT_OFF);
+    spapr_rtas_unregister(RTAS_IBM_INT_ON);
+
+    kvmppc_define_rtas_kernel_token(0, "ibm,set-xive");
+    kvmppc_define_rtas_kernel_token(0, "ibm,get-xive");
+    kvmppc_define_rtas_kernel_token(0, "ibm,int-on");
+    kvmppc_define_rtas_kernel_token(0, "ibm,int-off");
+
+    kvm_kernel_irqchip = false;
+    kvm_msi_via_irqfd_allowed = false;
+    kvm_gsi_direct_mapping = false;
+
+    /* Clear the presenter from the VCPUs */
+    kvm_disable_icps();
+
+    return rc;
+}
-- 
2.20.1

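Both disconnect routines follow the same shape: return early when no KVM device is in use, destroy and close the device, reset the fd to -1, clear the global KVM interrupt flags, then free the list of connected presenters with a removal-safe iteration (QLIST_FOREACH_SAFE in the patch). Below is a hypothetical plain C sketch of that sequence, with a hand-written singly linked list in place of QLIST and a comment in place of the KVM_DESTROY_DEVICE ioctl; SketchIrqDev, SketchCpu and the sketch_* functions are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical presenter list node, standing in for KVMEnabledCPU. */
typedef struct SketchCpu {
    unsigned long vcpu_id;
    struct SketchCpu *next;
} SketchCpu;

typedef struct {
    int fd;                  /* -1 when no KVM device is in use */
    SketchCpu *cpus;         /* vCPUs already connected to the device */
    bool irqchip_in_kernel;  /* stands in for the global KVM flags */
} SketchIrqDev;

static void sketch_cpu_enable(SketchIrqDev *d, unsigned long vcpu_id)
{
    SketchCpu *c = malloc(sizeof(*c));

    assert(c);
    c->vcpu_id = vcpu_id;
    c->next = d->cpus;
    d->cpus = c;
}

/* Removal-safe walk: read the next pointer before freeing the node. */
static int sketch_cpu_disable_all(SketchIrqDev *d)
{
    SketchCpu *c = d->cpus, *next;
    int freed = 0;

    while (c) {
        next = c->next;
        free(c);
        freed++;
        c = next;
    }
    d->cpus = NULL;
    return freed;
}

/* Returns the number of presenters released, 0 if nothing was in use. */
static int sketch_disconnect(SketchIrqDev *d)
{
    if (d->fd == -1) {
        return 0;            /* device not in use: safe to call again */
    }
    /* KVM_DESTROY_DEVICE and close(d->fd) would happen here. */
    d->fd = -1;
    d->irqchip_in_kernel = false;
    return sketch_cpu_disable_all(d);
}
```

Because the fd is reset to -1, a second call returns immediately, which is what allows a reset path to call both the XICS and the XIVE disconnect routines even though only one device is active.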
* [Qemu-devel] [PATCH v2 11/13] spapr: check for the activation of the KVM IRQ device
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (9 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 10/13] spapr: introduce routines to delete the KVM IRQ device Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-26  1:27   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 12/13] spapr: add KVM support to the 'dual' machine Cédric Le Goater
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped Cédric Le Goater
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

The activation of the KVM IRQ device depends on the interrupt mode
chosen by the machine at CAS time, so some methods used at reset or
during migration need to check that the KVM device is in use.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive_kvm.c | 28 ++++++++++++++++++++++++++++
 hw/intc/xics_kvm.c       | 26 +++++++++++++++++++++++++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index e31035c90260..cd81cdb23a5e 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -96,9 +96,15 @@ static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
 
 void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
 {
+    sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
     uint64_t state[4] = { 0 };
     int ret;
 
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return;
+    }
+
     ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
     if (ret != 0) {
         error_setg_errno(errp, errno,
@@ -152,6 +158,11 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
     unsigned long vcpu_id;
     int ret;
 
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return;
+    }
+
     /* Check if CPU was hot unplugged and replugged. */
     if (kvm_cpu_is_enabled(tctx->cs)) {
         return;
@@ -330,9 +341,13 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
 void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
 {
     XiveSource *xsrc = opaque;
+    sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
     struct kvm_irq_level args;
     int rc;
 
+    /* The KVM XIVE device should be in use */
+    assert(xive->fd != -1);
+
     args.irq = srcno;
     if (!xive_source_irq_is_lsi(xsrc, srcno)) {
         if (!val) {
@@ -519,6 +534,11 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
 
 void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
 {
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return;
+    }
+
     /*
      * When the VM is stopped, the sources are masked and the previous
      * state is saved in anticipation of a migration. We should not
@@ -544,6 +564,11 @@ int kvmppc_xive_pre_save(sPAPRXive *xive)
 {
     Error *local_err = NULL;
 
+    /* The KVM XIVE device is not in use */
+    if (xive->fd == -1) {
+        return 0;
+    }
+
     /* EAT: there is no extra state to query from KVM */
 
     /* ENDT */
@@ -568,6 +593,9 @@ int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
     CPUState *cs;
     int i;
 
+    /* The KVM XIVE device should be in use */
+    assert(xive->fd != -1);
+
     /* Restore the ENDT first. The targetting depends on it. */
     for (i = 0; i < xive->nr_ends; i++) {
         kvmppc_xive_set_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 373de3155f6b..9855316e4831 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -69,6 +69,11 @@ void icp_get_kvm_state(ICPState *icp)
     uint64_t state;
     int ret;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return;
+    }
+
     /* ICP for this CPU thread is not in use, exiting */
     if (!icp->cs) {
         return;
@@ -105,6 +110,11 @@ int icp_set_kvm_state(ICPState *icp)
     uint64_t state;
     int ret;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return 0;
+    }
+
     /* ICP for this CPU thread is not in use, exiting */
     if (!icp->cs) {
         return 0;
@@ -133,8 +143,9 @@ void icp_kvm_realize(DeviceState *dev, Error **errp)
     unsigned long vcpu_id;
     int ret;
 
+    /* The KVM XICS device is not in use */
     if (kernel_xics_fd == -1) {
-        abort();
+        return;
     }
 
     cs = icp->cs;
@@ -170,6 +181,11 @@ void ics_get_kvm_state(ICSState *ics)
     uint64_t state;
     int i;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return;
+    }
+
     for (i = 0; i < ics->nr_irqs; i++) {
         ICSIRQState *irq = &ics->irqs[i];
 
@@ -269,6 +285,11 @@ int ics_set_kvm_state(ICSState *ics)
 {
     int i;
 
+    /* The KVM XICS device is not in use */
+    if (kernel_xics_fd == -1) {
+        return 0;
+    }
+
     for (i = 0; i < ics->nr_irqs; i++) {
         int ret;
 
@@ -286,6 +307,9 @@ void ics_kvm_set_irq(ICSState *ics, int srcno, int val)
     struct kvm_irq_level args;
     int rc;
 
+    /* The KVM XICS device should be in use */
+    assert(kernel_xics_fd != -1);
+
     args.irq = srcno + ics->offset;
     if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
         if (!val) {
-- 
2.20.1

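The guards added here use the device fd as the "in use" flag: state accessors that may run at reset or during migration return early when the fd is -1, while the interrupt injection path asserts, since it should never be reached without an active KVM device. A hypothetical sketch of that split follows; SketchKvmIntc and the sketch_* functions are illustrative, and plain assignments stand in for the KVM register accessors and the KVM_IRQ_LINE ioctl.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int fd;                /* -1: KVM device not activated for this mode */
    uint64_t kvm_state;    /* value the kernel would report */
    uint64_t saved_state;  /* QEMU-side copy used by migration */
} SketchKvmIntc;

/* May run whatever mode was selected at CAS time: return early and leave
 * the QEMU-side state untouched when the device is not in use. */
static void sketch_get_state(SketchKvmIntc *intc)
{
    if (intc->fd == -1) {
        return;
    }
    intc->saved_state = intc->kvm_state;   /* stands in for a GET_ONE_REG */
}

static int sketch_set_state(SketchKvmIntc *intc)
{
    if (intc->fd == -1) {
        return 0;
    }
    intc->kvm_state = intc->saved_state;   /* stands in for a SET_ONE_REG */
    return 0;
}

/* Injection is only routed here when the device is active, so reaching
 * it with fd == -1 is a programming error rather than a runtime case. */
static int sketch_set_irq(const SketchKvmIntc *intc, int srcno)
{
    assert(intc->fd != -1);
    return srcno;                          /* stands in for KVM_IRQ_LINE */
}
```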
* [Qemu-devel] [PATCH v2 12/13] spapr: add KVM support to the 'dual' machine
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (10 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 11/13] spapr: check for the activation of " Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-28  5:15   ` David Gibson
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped Cédric Le Goater
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

The interrupt mode is chosen by the CAS negotiation process and
activated after a reset to take into account the required changes in
the machine. This brings new constraints on how the associated KVM IRQ
device is initialized.

Currently, each model takes care of the initialization of the KVM
device in its realize method, but this is no longer possible as the
initialization needs to be done globally when the interrupt mode is
known, i.e. when the machine is reset. It also means that we need a way
to delete a KVM device when another mode is chosen.

Also, to support migration, the QEMU objects holding the state to
transfer should always be available but not necessarily activated.

The overall approach of this proposal is to initialize both interrupt
modes at the QEMU level and keep the IRQ number space in sync to allow
switching from one mode to another. For the KVM side of things, the
whole initialization of the KVM device, sources and presenters, is
grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
handlers are modified accordingly to handle the init and the delete
sequences of the KVM device.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 include/hw/ppc/spapr_xive.h |  1 +
 hw/intc/spapr_xive.c        | 19 +++++++-
 hw/intc/spapr_xive_kvm.c    | 27 +++++++++++
 hw/intc/xics_kvm.c          | 26 ++++++++++
 hw/intc/xive.c              |  4 --
 hw/ppc/spapr_irq.c          | 97 ++++++++++++++++++++++++++++---------
 6 files changed, 145 insertions(+), 29 deletions(-)

diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index a7c4c275a747..a1593ac2fcf0 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -66,6 +66,7 @@ void spapr_xive_map_mmio(sPAPRXive *xive);
 
 int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
                              uint32_t *out_server, uint8_t *out_prio);
+void spapr_xive_late_realize(sPAPRXive *xive, Error **errp);
 
 /*
  * KVM XIVE device helpers
diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 21fe5e1aa39f..b0cbc2fe21ee 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -278,7 +278,6 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
     XiveSource *xsrc = &xive->source;
     XiveENDSource *end_xsrc = &xive->end_source;
     Error *local_err = NULL;
-    MachineState *machine = MACHINE(qdev_get_machine());
 
     if (!xive->nr_irqs) {
         error_setg(errp, "Number of interrupt needs to be greater 0");
@@ -329,6 +328,15 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
                            xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
 
     qemu_register_reset(spapr_xive_reset, dev);
+}
+
+void spapr_xive_late_realize(sPAPRXive *xive, Error **errp)
+{
+    Error *local_err = NULL;
+    MachineState *machine = MACHINE(qdev_get_machine());
+    XiveSource *xsrc = &xive->source;
+    XiveENDSource *end_xsrc = &xive->end_source;
+    static bool once;
 
     if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
         kvmppc_xive_connect(xive, &local_err);
@@ -351,6 +359,15 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
         error_report_err(local_err);
     }
 
+    /*
+     * TODO: Emulated mode can only be initialized once. Should we
+     * store the information under the device model for later usage ?
+     */
+    if (once) {
+        return;
+    }
+    once = true;
+
     /* TIMA initialization */
     memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
                           "xive.tima", 4ull << TM_SHIFT);
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index cd81cdb23a5e..99a829fb3f60 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -657,6 +657,15 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
     Error *local_err = NULL;
     size_t esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
     size_t tima_len = 4ull << TM_SHIFT;
+    CPUState *cs;
+
+    /*
+     * The KVM XIVE device is already in use. This is the case when
+     * rebooting XIVE -> XIVE
+     */
+    if (xive->fd != -1) {
+        return;
+    }
 
     if (!kvmppc_has_cap_xive()) {
         error_setg(errp, "IRQ_XIVE capability must be present for KVM");
@@ -705,6 +714,24 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
     xive->change = qemu_add_vm_change_state_handler(
         kvmppc_xive_change_state_handler, xive);
 
+    /* Connect the presenters to the initial VCPUs of the machine */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        kvmppc_xive_cpu_connect(spapr_cpu_state(cpu)->tctx, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
+    /* Update the KVM sources */
+    kvmppc_xive_source_reset(xsrc, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     kvm_kernel_irqchip = true;
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_direct_mapping = true;
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 9855316e4831..8ffd4c7a36f8 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -33,6 +33,7 @@
 #include "trace.h"
 #include "sysemu/kvm.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_cpu_core.h"
 #include "hw/ppc/xics.h"
 #include "hw/ppc/xics_spapr.h"
 #include "kvm_ppc.h"
@@ -337,6 +338,16 @@ static void rtas_dummy(PowerPCCPU *cpu, sPAPRMachineState *spapr,
 int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
 {
     int rc;
+    CPUState *cs;
+    Error *local_err = NULL;
+
+    /*
+     * The KVM XICS device is already in use. This is the case when
+     * rebooting XICS -> XICS
+     */
+    if (kernel_xics_fd != -1) {
+        return 0;
+    }
 
     if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
         error_setg(errp,
@@ -385,6 +396,21 @@ int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
     kvm_msi_via_irqfd_allowed = true;
     kvm_gsi_direct_mapping = true;
 
+    /* Connect the presenters to the initial VCPUs of the machine */
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        icp_kvm_realize(DEVICE(spapr_cpu_state(cpu)->icp), &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            goto fail;
+        }
+        icp_set_kvm_state(spapr_cpu_state(cpu)->icp);
+    }
+
+    /* Update the KVM sources */
+    ics_set_kvm_state(spapr->ics);
+
     return 0;
 
 fail:
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 1f8e923ca654..715d5a7e65ed 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -929,10 +929,6 @@ static void xive_source_reset(void *dev)
 
     /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
     memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
-
-    if (kvm_irqchip_in_kernel()) {
-        kvmppc_xive_source_reset(xsrc, &error_fatal);
-    }
 }
 
 static void xive_source_realize(DeviceState *dev, Error **errp)
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 3176098b9f7c..f8260c14aecd 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -92,35 +92,55 @@ error:
     return NULL;
 }
 
-static void spapr_irq_init_xics(sPAPRMachineState *spapr, int nr_irqs,
-                                Error **errp)
+static void spapr_ics_late_realize(sPAPRMachineState *spapr, Error **errp)
 {
     MachineState *machine = MACHINE(spapr);
     Error *local_err = NULL;
-    bool xics_kvm = false;
+    static bool once;
 
-    if (kvm_enabled()) {
-        if (machine_kernel_irqchip_allowed(machine) &&
-            !xics_kvm_init(spapr, &local_err)) {
-            xics_kvm = true;
-        }
-        if (machine_kernel_irqchip_required(machine) && !xics_kvm) {
+    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
+        xics_kvm_init(spapr, &local_err);
+        if (local_err && machine_kernel_irqchip_required(machine)) {
             error_prepend(&local_err,
                           "kernel_irqchip requested but unavailable: ");
-            goto error;
+            error_propagate(errp, local_err);
+            return;
         }
-        error_free(local_err);
-        local_err = NULL;
+
+        if (!local_err) {
+            return;
+        }
+
+        /*
+         * We failed to initialize the XICS KVM device, fall back to
+         * emulated mode
+         */
+        error_prepend(&local_err, "kernel_irqchip allowed but unavailable: ");
+        error_report_err(local_err);
     }
 
-    if (!xics_kvm) {
-        xics_spapr_init(spapr);
+    /*
+     * TODO: Emulated mode can only be initialized once. Should we
+     * store the information under the device model for later usage ?
+     */
+    if (once) {
+        return;
     }
+    once = true;
 
-    spapr->ics = spapr_ics_create(spapr, nr_irqs, &local_err);
+    xics_spapr_init(spapr);
+}
 
-error:
-    error_propagate(errp, local_err);
+static void spapr_irq_init_xics(sPAPRMachineState *spapr, int nr_irqs,
+                                Error **errp)
+{
+    Error *local_err = NULL;
+
+    spapr->ics = spapr_ics_create(spapr, nr_irqs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
 }
 
 #define ICS_IRQ_FREE(ics, srcno)   \
@@ -227,7 +247,13 @@ static void spapr_irq_set_irq_xics(void *opaque, int srcno, int val)
 
 static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
 {
-    /* TODO: create the KVM XICS device */
+    Error *local_err = NULL;
+
+    spapr_ics_late_realize(spapr, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
 }
 
 static const char *spapr_irq_get_nodename_xics(sPAPRMachineState *spapr)
@@ -386,6 +412,7 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
 static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
 {
     CPUState *cs;
+    Error *local_err = NULL;
 
     CPU_FOREACH(cs) {
         PowerPCCPU *cpu = POWERPC_CPU(cs);
@@ -394,6 +421,12 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
         spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
     }
 
+    spapr_xive_late_realize(spapr->xive, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     /* Activate the XIVE MMIOs */
     spapr_xive_mmio_set_enabled(spapr->xive, true);
 }
@@ -462,14 +495,8 @@ static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
 static void spapr_irq_init_dual(sPAPRMachineState *spapr, int nr_irqs,
                                 Error **errp)
 {
-    MachineState *machine = MACHINE(spapr);
     Error *local_err = NULL;
 
-    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
-        error_setg(errp, "No KVM support for the 'dual' machine");
-        return;
-    }
-
     spapr_irq_xics.init(spapr, spapr_irq_xics.nr_irqs, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
@@ -548,6 +575,9 @@ static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
      * defaults to XICS at startup.
      */
     if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
+        if (kvm_irqchip_in_kernel()) {
+            xics_kvm_disconnect(spapr, &error_fatal);
+        }
         spapr_irq_xive.reset(spapr, &error_fatal);
     }
 
@@ -556,12 +586,30 @@ static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
 
 static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
 {
+    Error *local_err = NULL;
+
     /*
      * Deactivate the XIVE MMIOs. The XIVE backend will reenable them
      * if selected.
      */
     spapr_xive_mmio_set_enabled(spapr->xive, false);
 
+    /* Destroy all KVM devices */
+    if (kvm_irqchip_in_kernel()) {
+        xics_kvm_disconnect(spapr, &local_err);
+        if (local_err) {
+            error_prepend(&local_err, "KVM XICS disconnect failed: ");
+            error_propagate(errp, local_err);
+            return;
+        }
+        kvmppc_xive_disconnect(spapr->xive, &local_err);
+        if (local_err) {
+            error_prepend(&local_err, "KVM XIVE disconnect failed: ");
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
     spapr_irq_current(spapr)->reset(spapr, errp);
 }
 
@@ -748,6 +796,7 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .dt_populate = spapr_dt_xics,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
     .post_load   = spapr_irq_post_load_xics,
+    .reset       = spapr_irq_reset_xics,
     .set_irq     = spapr_irq_set_irq_xics,
     .get_nodename = spapr_irq_get_nodename_xics,
 };
-- 
2.20.1

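The reset flow this patch gives the 'dual' machine can be summarized as follows. First tear down any KVM IRQ device, then late-realize the mode negotiated at CAS time: connect to KVM when possible, fall back to the emulated backend (set up only once) when kernel_irqchip is allowed but unavailable, and fail when it is required. Below is a simplified model of that decision logic. SketchMachine, its fake fd allocator and all sketch_* helpers are hypothetical, and no KVM call is made.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SKETCH_XICS = 0, SKETCH_XIVE = 1 } SketchMode;

/* Hypothetical machine state; nothing here is QEMU API. */
typedef struct {
    int fd[2];              /* per-mode KVM device fd, -1 when not in use */
    bool kvm_available;     /* whether creating the KVM device succeeds */
    bool irqchip_required;  /* kernel_irqchip=on */
    bool emulated_init[2];  /* per-mode "once" flag for emulated setup */
    int emulated_setups;    /* counts one-time emulated initializations */
    int next_fd;            /* fake fd allocator */
} SketchMachine;

static void sketch_disconnect(SketchMachine *m, SketchMode mode)
{
    m->fd[mode] = -1;       /* destroy + close would happen here */
}

/* Connect KVM if possible, otherwise fall back to emulation unless the
 * kernel irqchip is required. Returns false on a fatal configuration. */
static bool sketch_late_realize(SketchMachine *m, SketchMode mode)
{
    if (m->fd[mode] != -1) {
        return true;        /* already connected (same-mode reboot) */
    }
    if (m->kvm_available) {
        m->fd[mode] = m->next_fd++;
        return true;
    }
    if (m->irqchip_required) {
        return false;       /* "kernel_irqchip requested but unavailable" */
    }
    if (!m->emulated_init[mode]) {
        m->emulated_init[mode] = true;
        m->emulated_setups++;   /* emulated backend is set up only once */
    }
    return true;
}

/* 'dual' reset: drop every KVM device, then bring up the CAS mode. */
static bool sketch_reset_dual(SketchMachine *m, SketchMode cas_mode)
{
    sketch_disconnect(m, SKETCH_XICS);
    sketch_disconnect(m, SKETCH_XIVE);
    return sketch_late_realize(m, cas_mode);
}
```

Unlike the patch, which only disconnects when kvm_irqchip_in_kernel() is true, the sketch disconnects unconditionally; that is harmless here because disconnecting an unused mode only writes -1.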
* [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped
  2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
                   ` (11 preceding siblings ...)
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 12/13] spapr: add KVM support to the 'dual' machine Cédric Le Goater
@ 2019-02-22 13:13 ` Cédric Le Goater
  2019-02-26  4:17   ` David Gibson
  12 siblings, 1 reply; 32+ messages in thread
From: Cédric Le Goater @ 2019-02-22 13:13 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel, Cédric Le Goater

Instead of switching off the sources, set their state to PENDING to
possibly catch a hotplug event occurring while the VM is stopped. At
resume, check the previous state and if an interrupt was queued,
generate a trigger.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 hw/intc/spapr_xive_kvm.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index 99a829fb3f60..64d160babb26 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -500,8 +500,16 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
     if (running) {
         for (i = 0; i < xsrc->nr_irqs; i++) {
             uint8_t pq = xive_source_esb_get(xsrc, i);
-            if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
-                error_report("XIVE: IRQ %d has an invalid state", i);
+            uint8_t old_pq;
+
+            old_pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8));
+
+            /*
+             * If an interrupt was queued (hotplug event) while the VM was
+             * stopped, generate a trigger.
+             */
+            if (pq == XIVE_ESB_RESET && old_pq == XIVE_ESB_QUEUED) {
+                xive_esb_trigger(xsrc, i);
             }
         }
 
@@ -515,7 +523,15 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
      * migration is in progress.
      */
     for (i = 0; i < xsrc->nr_irqs; i++) {
-        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
+        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
+
+        /*
+         * PQ is set to PENDING to possibly catch a hotplug event
+         * occurring while the VM is stopped.
+         */
+        if (pq != XIVE_ESB_OFF) {
+            pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_10);
+        }
         xive_source_esb_set(xsrc, i, pq);
     }
 
-- 
2.20.1

* Re: [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support Cédric Le Goater
@ 2019-02-25  5:55   ` David Gibson
  2019-03-11 15:53     ` Cédric Le Goater
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2019-02-25  5:55 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:10PM +0100, Cédric Le Goater wrote:
> This introduces a set of helpers, used when KVM is enabled, which
> create the KVM XIVE device, initialize the interrupt sources at the KVM
> level and connect the interrupt presenters to the vCPUs.
> 
> They also handle the initialization of the TIMA and the source ESB
> memory regions of the controller. These have a different type under
> KVM: they are 'ram device' memory mappings, similar to VFIO, exposed
> to the guest, and the associated VMAs on the host are populated
> dynamically with the appropriate pages by a fault handler.
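As a rough illustration of how such a mapping is obtained, the sketch below maps a window of a file descriptor at a page offset, shared with its owner. A temporary file stands in for the KVM device fd, so the pages here come from the page cache rather than from a fault handler; the fixed 64K page shift mirrors the `TODO` in `kvmppc_xive_mmap()`, and `sketch_esb_len()`/`sketch_map_region()` are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define SKETCH_PAGE_SHIFT 16   /* assumed 64K pages, as in the patch's TODO */

/* Each source owns 2^esb_shift bytes of ESB space (one page or a pair). */
static size_t sketch_esb_len(unsigned esb_shift, unsigned nr_irqs)
{
    return ((size_t)1 << esb_shift) * nr_irqs;
}

/* Map len bytes of fd starting at page pgoff, shared with the fd owner. */
static void *sketch_map_region(int fd, unsigned long pgoff, size_t len)
{
    void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                      (off_t)pgoff << SKETCH_PAGE_SHIFT);

    return addr == MAP_FAILED ? NULL : addr;
}
```

Because the mapping is `MAP_SHARED`, a store through the returned pointer is visible to the owner of the fd; with the KVM device that owner is the kernel, which backs the range with the real ESB or TIMA pages.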
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  default-configs/ppc64-softmmu.mak |   1 +
>  include/hw/ppc/spapr_xive.h       |  10 ++
>  include/hw/ppc/xive.h             |  13 ++
>  target/ppc/kvm_ppc.h              |   6 +
>  hw/intc/spapr_xive.c              |  48 +++++-
>  hw/intc/spapr_xive_kvm.c          | 237 ++++++++++++++++++++++++++++++
>  hw/intc/xive.c                    |  21 ++-
>  hw/ppc/spapr_irq.c                |   6 +-
>  target/ppc/kvm.c                  |   7 +
>  hw/intc/Makefile.objs             |   1 +
>  10 files changed, 340 insertions(+), 10 deletions(-)
>  create mode 100644 hw/intc/spapr_xive_kvm.c
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 7f34ad0528ed..c1bf5cd951f5 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -18,6 +18,7 @@ CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>  CONFIG_XIVE=$(CONFIG_PSERIES)
>  CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
> +CONFIG_XIVE_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>  CONFIG_MEM_DEVICE=y
>  CONFIG_DIMM=y
>  CONFIG_SPAPR_RNG=y
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 2d31f24e3bfe..ab6732b14a02 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -38,6 +38,10 @@ typedef struct sPAPRXive {
>      /* TIMA mapping address */
>      hwaddr        tm_base;
>      MemoryRegion  tm_mmio;
> +
> +    /* KVM support */
> +    int           fd;
> +    void          *tm_mmap;
>  } sPAPRXive;
>  
>  bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
> @@ -49,5 +53,11 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>                     uint32_t phandle);
>  void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
>  void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
> +void spapr_xive_map_mmio(sPAPRXive *xive);
> +
> +/*
> + * KVM XIVE device helpers
> + */
> +void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 13a487527b11..061d43fea24d 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -140,6 +140,7 @@
>  #ifndef PPC_XIVE_H
>  #define PPC_XIVE_H
>  
> +#include "sysemu/kvm.h"
>  #include "hw/qdev-core.h"
>  #include "hw/sysbus.h"
>  #include "hw/ppc/xive_regs.h"
> @@ -194,6 +195,9 @@ typedef struct XiveSource {
>      uint32_t        esb_shift;
>      MemoryRegion    esb_mmio;
>  
> +    /* KVM support */
> +    void            *esb_mmap;
> +
>      XiveNotifier    *xive;
>  } XiveSource;
>  
> @@ -419,4 +423,13 @@ static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
>      return (nvt_blk << 19) | nvt_idx;
>  }
>  
> +/*
> + * KVM XIVE device helpers
> + */
> +
> +void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp);
> +void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
> +void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
> +void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
> +
>  #endif /* PPC_XIVE_H */
> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
> index bdfaa4e70a83..d2159660f9f2 100644
> --- a/target/ppc/kvm_ppc.h
> +++ b/target/ppc/kvm_ppc.h
> @@ -59,6 +59,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
>  bool kvmppc_has_cap_htm(void);
>  bool kvmppc_has_cap_mmu_radix(void);
>  bool kvmppc_has_cap_mmu_hash_v3(void);
> +bool kvmppc_has_cap_xive(void);
>  int kvmppc_get_cap_safe_cache(void);
>  int kvmppc_get_cap_safe_bounds_check(void);
>  int kvmppc_get_cap_safe_indirect_branch(void);
> @@ -307,6 +308,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
>      return false;
>  }
>  
> +static inline bool kvmppc_has_cap_xive(void)
> +{
> +    return false;
> +}
> +
>  static inline int kvmppc_get_cap_safe_cache(void)
>  {
>      return 0;
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 06e3c9fdbfeb..c24d649e3668 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -173,7 +173,7 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>      }
>  }
>  
> -static void spapr_xive_map_mmio(sPAPRXive *xive)
> +void spapr_xive_map_mmio(sPAPRXive *xive)
>  {
>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
> @@ -251,6 +251,9 @@ static void spapr_xive_instance_init(Object *obj)
>                        TYPE_XIVE_END_SOURCE);
>      object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
>                                NULL);
> +
> +    /* Not connected to the KVM XIVE device */
> +    xive->fd = -1;
>  }
>  
>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
> @@ -259,6 +262,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      XiveSource *xsrc = &xive->source;
>      XiveENDSource *end_xsrc = &xive->end_source;
>      Error *local_err = NULL;
> +    MachineState *machine = MACHINE(qdev_get_machine());
>  
>      if (!xive->nr_irqs) {
>          error_setg(errp, "Number of interrupt needs to be greater 0");
> @@ -305,6 +309,32 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      xive->eat = g_new0(XiveEAS, xive->nr_irqs);
>      xive->endt = g_new0(XiveEND, xive->nr_ends);
>  
> +    xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
> +                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
> +
> +    qemu_register_reset(spapr_xive_reset, dev);
> +
> +    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
> +        kvmppc_xive_connect(xive, &local_err);
> +        if (local_err && machine_kernel_irqchip_required(machine)) {
> +            error_prepend(&local_err,
> +                          "kernel_irqchip requested but unavailable: ");
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +
> +        if (!local_err) {
> +            return;
> +        }
> +
> +        /*
> +         * We failed to initialize the XIVE KVM device, fall back to
> +         * emulated mode
> +         */
> +        error_prepend(&local_err, "kernel_irqchip allowed but unavailable: ");
> +        error_report_err(local_err);

Since we can fall back, this should probably just be
warn_report_err().  Maybe not even that, for the case where the host
kernel doesn't support KVM XIVE at all.
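One way to express the fallback policy suggested here is a small decision function. This is a hypothetical sketch, not QEMU code: `allowed` and `required` stand for `machine_kernel_irqchip_allowed()` and `machine_kernel_irqchip_required()`, and `no_kvm_xive` models a host kernel without the KVM XIVE capability, in which case nothing is reported.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    IRQCHIP_EMULATED,        /* kernel_irqchip not allowed: emulate silently */
    IRQCHIP_KVM,             /* KVM XIVE device connected */
    IRQCHIP_FALLBACK_SILENT, /* host has no KVM XIVE: emulate, say nothing */
    IRQCHIP_FALLBACK_WARN,   /* connect failed: warn_report_err(), emulate */
    IRQCHIP_FATAL,           /* kernel_irqchip required but unavailable */
} IrqchipOutcome;

static IrqchipOutcome sketch_irqchip_policy(bool allowed, bool required,
                                            bool connect_ok, bool no_kvm_xive)
{
    if (!allowed) {
        return IRQCHIP_EMULATED;
    }
    if (connect_ok) {
        return IRQCHIP_KVM;
    }
    if (required) {
        return IRQCHIP_FATAL;
    }
    /* Falling back is legitimate, so at most a warning, never an error. */
    return no_kvm_xive ? IRQCHIP_FALLBACK_SILENT : IRQCHIP_FALLBACK_WARN;
}
```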

> +    }
> +
>      /* TIMA initialization */
>      memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
>                            "xive.tima", 4ull << TM_SHIFT);
> @@ -316,11 +346,6 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>  
>      /* Map all regions */
>      spapr_xive_map_mmio(xive);
> -
> -    xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
> -                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
> -
> -    qemu_register_reset(spapr_xive_reset, dev);
>  }
>  
>  static int spapr_xive_get_eas(XiveRouter *xrtr, uint8_t eas_blk,
> @@ -495,6 +520,17 @@ bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
>      if (lsi) {
>          xive_source_irq_set_lsi(xsrc, lisn);
>      }
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_source_reset_one(xsrc, lisn, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return false;
> +        }
> +    }
> +
>      return true;
>  }
>  
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> new file mode 100644
> index 000000000000..623fbf74f23e
> --- /dev/null
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -0,0 +1,237 @@
> +/*
> + * QEMU PowerPC sPAPR XIVE interrupt controller model
> + *
> + * Copyright (c) 2017-2019, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/log.h"
> +#include "qemu/error-report.h"
> +#include "qapi/error.h"
> +#include "target/ppc/cpu.h"
> +#include "sysemu/cpus.h"
> +#include "sysemu/kvm.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_xive.h"
> +#include "hw/ppc/xive.h"
> +#include "kvm_ppc.h"
> +
> +#include <sys/ioctl.h>
> +
> +/*
> + * Helpers for CPU hotplug
> + *
> + * TODO: make a common KVMEnabledCPU layer for XICS and XIVE
> + */
> +typedef struct KVMEnabledCPU {
> +    unsigned long vcpu_id;
> +    QLIST_ENTRY(KVMEnabledCPU) node;
> +} KVMEnabledCPU;
> +
> +static QLIST_HEAD(, KVMEnabledCPU)
> +    kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus);
> +
> +static bool kvm_cpu_is_enabled(CPUState *cs)
> +{
> +    KVMEnabledCPU *enabled_cpu;
> +    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
> +
> +    QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) {
> +        if (enabled_cpu->vcpu_id == vcpu_id) {
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +static void kvm_cpu_enable(CPUState *cs)
> +{
> +    KVMEnabledCPU *enabled_cpu;
> +    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
> +
> +    enabled_cpu = g_malloc(sizeof(*enabled_cpu));
> +    enabled_cpu->vcpu_id = vcpu_id;
> +    QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
> +}
> +
> +/*
> + * XIVE Thread Interrupt Management context (KVM)
> + */
> +
> +void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
> +{
> +    sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
> +    unsigned long vcpu_id;
> +    int ret;
> +
> +    /* Check if CPU was hot unplugged and replugged. */
> +    if (kvm_cpu_is_enabled(tctx->cs)) {
> +        return;
> +    }
> +
> +    vcpu_id = kvm_arch_vcpu_id(tctx->cs);
> +
> +    ret = kvm_vcpu_enable_cap(tctx->cs, KVM_CAP_PPC_IRQ_XIVE, 0, xive->fd,
> +                              vcpu_id, 0);
> +    if (ret < 0) {
> +        error_setg(errp, "XIVE: unable to connect CPU%lu to KVM device: %s",
> +                   vcpu_id, strerror(errno));
> +        return;
> +    }
> +
> +    kvm_cpu_enable(tctx->cs);
> +}
> +
> +/*
> + * XIVE Interrupt Source (KVM)
> + */
> +
> +/*
> + * At reset, the interrupt sources are simply created and MASKED. We
> + * only need to inform the KVM XIVE device about their type: LSI or
> + * MSI.
> + */
> +void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
> +{
> +    sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
> +    uint64_t state = 0;
> +
> +    if (xive_source_irq_is_lsi(xsrc, srcno)) {
> +        state |= KVM_XIVE_LEVEL_SENSITIVE;
> +        if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
> +            state |= KVM_XIVE_LEVEL_ASSERTED;
> +        }
> +    }
> +
> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE, srcno, &state,
> +                      true, errp);
> +}
> +
> +void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
> +{
> +    int i;
> +
> +    for (i = 0; i < xsrc->nr_irqs; i++) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_source_reset_one(xsrc, i, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +}
> +
> +void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
> +{
> +    XiveSource *xsrc = opaque;
> +    struct kvm_irq_level args;
> +    int rc;
> +
> +    args.irq = srcno;
> +    if (!xive_source_irq_is_lsi(xsrc, srcno)) {
> +        if (!val) {
> +            return;
> +        }
> +        args.level = KVM_INTERRUPT_SET;
> +    } else {
> +        if (val) {
> +            xsrc->status[srcno] |= XIVE_STATUS_ASSERTED;
> +            args.level = KVM_INTERRUPT_SET_LEVEL;
> +        } else {
> +            xsrc->status[srcno] &= ~XIVE_STATUS_ASSERTED;
> +            args.level = KVM_INTERRUPT_UNSET;
> +        }
> +    }
> +    rc = kvm_vm_ioctl(kvm_state, KVM_IRQ_LINE, &args);
> +    if (rc < 0) {
> +        error_report("XIVE: kvm_irq_line() failed: %s", strerror(errno));
> +    }
> +}
> +
> +/*
> + * sPAPR XIVE interrupt controller (KVM)
> + */
> +
> +static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
> +                              Error **errp)
> +{
> +    void *addr;
> +    uint32_t page_shift = 16; /* TODO: fix page_shift */
> +
> +    addr = mmap(NULL, len, PROT_WRITE | PROT_READ, MAP_SHARED, xive->fd,
> +                pgoff << page_shift);
> +    if (addr == MAP_FAILED) {
> +        error_setg_errno(errp, errno, "XIVE: unable to set memory mapping");
> +        return NULL;
> +    }
> +
> +    return addr;
> +}
> +
> +/*
> + * All the XIVE memory regions are now backed by mappings from the KVM
> + * XIVE device.
> + */
> +void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
> +{
> +    XiveSource *xsrc = &xive->source;
> +    XiveENDSource *end_xsrc = &xive->end_source;
> +    Error *local_err = NULL;
> +    size_t esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
> +    size_t tima_len = 4ull << TM_SHIFT;
> +
> +    if (!kvmppc_has_cap_xive()) {
> +        error_setg(errp, "IRQ_XIVE capability must be present for KVM");
> +        return;
> +    }
> +
> +    /* First, create the KVM XIVE device */
> +    xive->fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_XIVE, false);
> +    if (xive->fd < 0) {
> +        error_setg_errno(errp, -xive->fd, "XIVE: error creating KVM device");
> +        return;
> +    }
> +
> +    /*
> +     * 1. Source ESB pages - KVM mapping
> +     */
> +    xsrc->esb_mmap = kvmppc_xive_mmap(xive, KVM_XIVE_ESB_PAGE_OFFSET, esb_len,
> +                                      &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    memory_region_init_ram_device_ptr(&xsrc->esb_mmio, OBJECT(xsrc),
> +                                      "xive.esb", esb_len, xsrc->esb_mmap);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
> +
> +    /*
> +     * 2. END ESB pages (No KVM support yet)
> +     */
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
> +
> +    /*
> +     * 3. TIMA pages - KVM mapping
> +     */
> +    xive->tm_mmap = kvmppc_xive_mmap(xive, KVM_XIVE_TIMA_PAGE_OFFSET, tima_len,
> +                                     &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +    memory_region_init_ram_device_ptr(&xive->tm_mmio, OBJECT(xive),
> +                                      "xive.tima", tima_len, xive->tm_mmap);
> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
> +
> +    kvm_kernel_irqchip = true;
> +    kvm_msi_via_irqfd_allowed = true;
> +    kvm_gsi_direct_mapping = true;
> +
> +    /* Map all regions */
> +    spapr_xive_map_mmio(xive);
> +}
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index daa7badc8492..0284b5803551 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -491,6 +491,15 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>  
> +    /* Connect the presenter to the VCPU (required for CPU hotplug) */
> +    if (kvm_irqchip_in_kernel()) {
> +        kvmppc_xive_cpu_connect(tctx, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
>      qemu_register_reset(xive_tctx_reset, dev);
>  }
>  
> @@ -893,6 +902,10 @@ static void xive_source_reset(void *dev)
>  
>      /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
>      memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        kvmppc_xive_source_reset(xsrc, &error_fatal);
> +    }
>  }
>  
>  static void xive_source_realize(DeviceState *dev, Error **errp)
> @@ -926,9 +939,11 @@ static void xive_source_realize(DeviceState *dev, Error **errp)
>      xsrc->status = g_malloc0(xsrc->nr_irqs);
>      xsrc->lsi_map = bitmap_new(xsrc->nr_irqs);
>  
> -    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
> -                          &xive_source_esb_ops, xsrc, "xive.esb",
> -                          (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
> +    if (!kvm_irqchip_in_kernel()) {
> +        memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
> +                              &xive_source_esb_ops, xsrc, "xive.esb",
> +                              (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
> +    }
>  
>      qemu_register_reset(xive_source_reset, dev);
>  }
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 4145079d7fa5..6e1c36dc62ca 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -387,7 +387,11 @@ static void spapr_irq_set_irq_xive(void *opaque, int srcno, int val)
>  {
>      sPAPRMachineState *spapr = opaque;
>  
> -    xive_source_set_irq(&spapr->xive->source, srcno, val);
> +    if (kvm_irqchip_in_kernel()) {
> +        kvmppc_xive_source_set_irq(&spapr->xive->source, srcno, val);
> +    } else {
> +        xive_source_set_irq(&spapr->xive->source, srcno, val);
> +    }
>  }
>  
>  static const char *spapr_irq_get_nodename_xive(sPAPRMachineState *spapr)
> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
> index d01852fe3112..43e42e3c2af9 100644
> --- a/target/ppc/kvm.c
> +++ b/target/ppc/kvm.c
> @@ -85,6 +85,7 @@ static int cap_fixup_hcalls;
>  static int cap_htm;             /* Hardware transactional memory support */
>  static int cap_mmu_radix;
>  static int cap_mmu_hash_v3;
> +static int cap_xive;
>  static int cap_resize_hpt;
>  static int cap_ppc_pvr_compat;
>  static int cap_ppc_safe_cache;
> @@ -148,6 +149,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>      cap_htm = kvm_vm_check_extension(s, KVM_CAP_PPC_HTM);
>      cap_mmu_radix = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_RADIX);
>      cap_mmu_hash_v3 = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_HASH_V3);
> +    cap_xive = kvm_vm_check_extension(s, KVM_CAP_PPC_IRQ_XIVE);
>      cap_resize_hpt = kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);
>      kvmppc_get_cpu_characteristics(s);
>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
> @@ -2388,6 +2390,11 @@ static int parse_cap_ppc_safe_indirect_branch(struct kvm_ppc_cpu_char c)
>      return 0;
>  }
>  
> +bool kvmppc_has_cap_xive(void)
> +{
> +    return cap_xive;
> +}
> +
>  static void kvmppc_get_cpu_characteristics(KVMState *s)
>  {
>      struct kvm_ppc_cpu_char c;
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 301a8e972d91..23126c199178 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -39,6 +39,7 @@ obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>  obj-$(CONFIG_XIVE) += xive.o
>  obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
> +obj-$(CONFIG_XIVE_KVM) += spapr_xive_kvm.o
>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>  obj-$(CONFIG_S390_FLIC) += s390_flic.o

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM Cédric Le Goater
@ 2019-02-25 23:22   ` David Gibson
  2019-03-11 17:32     ` Cédric Le Goater
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2019-02-25 23:22 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:11PM +0100, Cédric Le Goater wrote:
> XIVE hcalls are all redirected to QEMU, as none of them are on a fast
> path. When necessary, QEMU invokes KVM through specific ioctls to
> perform host operations. QEMU is expected to have done the necessary
> checks before calling KVM and, in case of failure, simply returns
> H_HARDWARE.
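The hcall flow described above, where QEMU validates the arguments, forwards to KVM only when the irqchip is in the kernel, and maps any KVM failure to H_HARDWARE, can be sketched as follows. The stub KVM helpers and `sketch_hcall()` are hypothetical; the return code values (H_SUCCESS 0, H_HARDWARE -1, H_PARAMETER -4) are assumed from PAPR as used by QEMU's spapr.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* PAPR hcall return codes, values as assumed for this sketch. */
#define H_SUCCESS    0
#define H_HARDWARE  (-1)
#define H_PARAMETER (-4)

/* Hypothetical KVM helper: 0 on success, negative errno on failure. */
typedef int (*SketchKvmOp)(unsigned lisn);

static long sketch_hcall(bool irqchip_in_kernel, SketchKvmOp kvm_op,
                         unsigned lisn, unsigned nr_irqs, unsigned *qemu_state)
{
    /* 1. QEMU checks the arguments itself, in both modes. */
    if (lisn >= nr_irqs) {
        return H_PARAMETER;
    }

    /* 2. KVM is only involved when the irqchip is in the kernel. */
    if (irqchip_in_kernel) {
        int rc = kvm_op(lisn);
        if (rc < 0) {
            fprintf(stderr, "XIVE: KVM call for LISN %u failed: %d\n", lisn, rc);
            return H_HARDWARE;   /* QEMU state left untouched */
        }
    }

    /* 3. QEMU's copy of the state is updated last. */
    *qemu_state = lisn;
    return H_SUCCESS;
}

/* Stub KVM helpers for the usage example. */
static int sketch_kvm_ok(unsigned lisn)   { (void)lisn; return 0; }
static int sketch_kvm_fail(unsigned lisn) { (void)lisn; return -5; /* -EIO */ }
```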
> 
> H_INT_ESB is a special case that could have been handled under KVM,
> but handling it in QEMU has only a small impact on performance. Here
> are some figures:
> 
>     kernel irqchip      OFF          ON
>     H_INT_ESB                    KVM   QEMU
> 
>     rtl8139 (LSI)       1.19     1.24  1.23  Gbits/sec
>     virtio             31.80    42.30   --   Gbits/sec
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/spapr_xive.h |  15 +++
>  hw/intc/spapr_xive.c        |  87 +++++++++++++++--
>  hw/intc/spapr_xive_kvm.c    | 184 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 278 insertions(+), 8 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index ab6732b14a02..749c6cbc2c56 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -55,9 +55,24 @@ void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
>  void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
>  void spapr_xive_map_mmio(sPAPRXive *xive);
>  
> +int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
> +                             uint32_t *out_server, uint8_t *out_prio);
> +
>  /*
>   * KVM XIVE device helpers
>   */
>  void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
> +void kvmppc_xive_reset(sPAPRXive *xive, Error **errp);
> +void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
> +                                   Error **errp);
> +void kvmppc_xive_sync_source(sPAPRXive *xive, uint32_t lisn, Error **errp);
> +uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
> +                            uint64_t data, bool write);
> +void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
> +                                 uint32_t end_idx, XiveEND *end,
> +                                 Error **errp);
> +void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
> +                                 uint32_t end_idx, XiveEND *end,
> +                                 Error **errp);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index c24d649e3668..3db24391e31c 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -86,6 +86,19 @@ static int spapr_xive_target_to_nvt(uint32_t target,
>   * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
>   * priorities per CPU
>   */
> +int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
> +                             uint32_t *out_server, uint8_t *out_prio)
> +{

Since you don't support irq blocks as yet, should this error out
rather than ignoring if end_blk != 0?
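A sketch of the stricter variant suggested here, which rejects non-zero END blocks rather than ignoring them. `sketch_end_to_target()` is a hypothetical rename; the mapping itself (8 priorities per vCPU) is the one in the patch. Callers such as `kvmppc_xive_set_source_config()` would then need to check the return value.

```c
#include <assert.h>
#include <stdint.h>

static int sketch_end_to_target(uint8_t end_blk, uint32_t end_idx,
                                uint32_t *out_server, uint8_t *out_prio)
{
    /* Only block 0 exists for now: refuse anything else explicitly. */
    if (end_blk != 0) {
        return -1;
    }
    if (out_server) {
        *out_server = end_idx >> 3;   /* 8 ENDs (priorities) per vCPU */
    }
    if (out_prio) {
        *out_prio = end_idx & 0x7;
    }
    return 0;
}
```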

> +    if (out_server) {
> +        *out_server = end_idx >> 3;
> +    }
> +
> +    if (out_prio) {
> +        *out_prio = end_idx & 0x7;
> +    }
> +    return 0;
> +}
> +
>  static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
>                                    uint8_t *out_end_blk, uint32_t *out_end_idx)
>  {
> @@ -792,6 +805,16 @@ static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
>          new_eas.w = xive_set_field64(EAS_END_DATA, new_eas.w, eisn);
>      }
>  
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return H_HARDWARE;
> +        }
> +    }
> +
>  out:
>      xive->eat[lisn] = new_eas;
>      return H_SUCCESS;
> @@ -1097,6 +1120,16 @@ static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
>       */
>  
>  out:
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_set_queue_config(xive, end_blk, end_idx, &end, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return H_HARDWARE;
> +        }
> +    }
> +
>      /* Update END */
>      memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
>      return H_SUCCESS;
> @@ -1189,6 +1222,16 @@ static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
>          args[2] = 0;
>      }
>  
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_get_queue_config(xive, end_blk, end_idx, end, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return H_HARDWARE;
> +        }
> +    }
> +
>      /* TODO: do we need any locking on the END ? */
>      if (flags & SPAPR_XIVE_END_DEBUG) {
>          /* Load the event queue generation number into the return flags */
> @@ -1341,15 +1384,20 @@ static target_ulong h_int_esb(PowerPCCPU *cpu,
>          return H_P3;
>      }
>  
> -    mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
> +    if (kvm_irqchip_in_kernel()) {
> +        args[0] = kvmppc_xive_esb_rw(xsrc, lisn, offset, data,
> +                                     flags & SPAPR_XIVE_ESB_STORE);
> +    } else {
> +        mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
>  
> -    if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
> -                      (flags & SPAPR_XIVE_ESB_STORE))) {
> -        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
> -                      HWADDR_PRIx "\n", mmio_addr);
> -        return H_HARDWARE;
> +        if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
> +                          (flags & SPAPR_XIVE_ESB_STORE))) {
> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
> +                          HWADDR_PRIx "\n", mmio_addr);
> +            return H_HARDWARE;
> +        }
> +        args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
>      }
> -    args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
>      return H_SUCCESS;
>  }
>  
> @@ -1406,7 +1454,20 @@ static target_ulong h_int_sync(PowerPCCPU *cpu,
>       * This is not needed when running the emulation under QEMU
>       */
>  
> -    /* This is not real hardware. Nothing to be done */
> +    /*
> +     * This is not real hardware. Nothing to be done unless running
> +     * under KVM
> +     */
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_sync_source(xive, lisn, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return H_HARDWARE;
> +        }
> +    }
>      return H_SUCCESS;
>  }
>  
> @@ -1441,6 +1502,16 @@ static target_ulong h_int_reset(PowerPCCPU *cpu,
>      }
>  
>      device_reset(DEVICE(xive));
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_reset(xive, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return H_HARDWARE;
> +        }
> +    }
>      return H_SUCCESS;
>  }
>  
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 623fbf74f23e..6b50451b4f85 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -89,6 +89,52 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
>   * XIVE Interrupt Source (KVM)
>   */
>  
> +void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
> +                                   Error **errp)
> +{
> +    uint32_t end_idx;
> +    uint32_t end_blk;
> +    uint32_t eisn;
> +    uint8_t priority;
> +    uint32_t server;
> +    uint64_t kvm_src;
> +    Error *local_err = NULL;
> +
> +    /*
> +     * No need to set a MASKED source: this is the default state
> +     * after reset.

I don't quite follow this comment: why is there no need to set a
MASKED source?

> +     */
> +    if (!xive_eas_is_valid(eas) || xive_eas_is_masked(eas)) {
> +        return;
> +    }
> +
> +    end_idx = xive_get_field64(EAS_END_INDEX, eas->w);
> +    end_blk = xive_get_field64(EAS_END_BLOCK, eas->w);
> +    eisn = xive_get_field64(EAS_END_DATA, eas->w);
> +
> +    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
> +
> +    kvm_src = priority << KVM_XIVE_SOURCE_PRIORITY_SHIFT &
> +        KVM_XIVE_SOURCE_PRIORITY_MASK;
> +    kvm_src |= server << KVM_XIVE_SOURCE_SERVER_SHIFT &
> +        KVM_XIVE_SOURCE_SERVER_MASK;
> +    kvm_src |= ((uint64_t)eisn << KVM_XIVE_SOURCE_EISN_SHIFT) &
> +        KVM_XIVE_SOURCE_EISN_MASK;
> +
> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE_CONFIG, lisn,
> +                      &kvm_src, true, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
> +
> +void kvmppc_xive_sync_source(sPAPRXive *xive, uint32_t lisn, Error **errp)
> +{
> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE_SYNC, lisn,
> +                      NULL, true, errp);
> +}
> +
>  /*
>   * At reset, the interrupt sources are simply created and MASKED. We
>   * only need to inform the KVM XIVE device about their type: LSI or
> @@ -125,6 +171,64 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
>      }
>  }
>  
> +/*
> + * This is used to perform the magic loads on the ESB pages, described
> + * in xive.h.
> + */
> +static uint64_t xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
> +                            uint64_t data, bool write)
> +{
> +    unsigned long addr = (unsigned long) xsrc->esb_mmap +
> +        xive_source_esb_mgmt(xsrc, srcno) + offset;

Casting esb_mmap to unsigned long and then back to a pointer looks
unnecessary.  You should be able to do this with pointer arithmetic.

> +    if (write) {
> +        *((uint64_t *) addr) = data;
> +        return -1;
> +    } else {
> +        return *((uint64_t *) addr);
> +    }

Since this is always dealing with 64-bit values, couldn't you put the
byteswaps in here rather than in all the callers?

> +}
> +
> +static uint8_t xive_esb_read(XiveSource *xsrc, int srcno, uint32_t offset)
> +{
> +    /* Prevent the compiler from optimizing away the load */
> +    volatile uint64_t value = xive_esb_rw(xsrc, srcno, offset, 0, 0);

Wouldn't the volatile magic be better inside xive_esb_rw()?
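Taken together, the three suggestions above (byte-pointer arithmetic instead of unsigned long casts, the byteswap inside the helper, and the volatile access inside the helper) could look like the sketch below. It is a standalone approximation only. EsbSource and its one-page-per-source layout are hypothetical stand-ins for XiveSource and xive_source_esb_mgmt(), and the byte swap relies on GCC/Clang builtins.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for XiveSource: only what the sketch needs. */
typedef struct {
    void     *esb_mmap;   /* base of the mmapped ESB pages */
    unsigned  esb_shift;  /* log2 of the per-source page size (assumed) */
} EsbSource;

/* Big-endian <-> host conversion; a byte swap is its own inverse. */
static uint64_t be64_to_host(uint64_t v)
{
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return v;
#else
    return __builtin_bswap64(v);
#endif
}

/*
 * Single helper for all ESB accesses: pointer arithmetic on a byte
 * pointer, a volatile access so the compiler cannot drop the "magic"
 * load, and the endianness conversion done once, here.
 */
static uint64_t esb_rw(EsbSource *s, int srcno, uint32_t offset,
                       uint64_t data, int write)
{
    volatile uint64_t *addr = (volatile uint64_t *)
        ((uint8_t *)s->esb_mmap + ((size_t)srcno << s->esb_shift) + offset);

    if (write) {
        *addr = be64_to_host(data);
        return (uint64_t)-1;
    }
    return be64_to_host(*addr);
}
```

With this shape, a reader such as xive_esb_read() reduces to masking the returned value with 0x3.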

> +    return be64_to_cpu(value) & 0x3;
> +}
> +
> +static void xive_esb_trigger(XiveSource *xsrc, int srcno)
> +{
> +    unsigned long addr = (unsigned long) xsrc->esb_mmap +
> +        xive_source_esb_page(xsrc, srcno);
> +
> +    *((uint64_t *) addr) = 0x0;
> +}

Also, aren't some of these register accesses likely to need memory
barriers?
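Whether explicit barriers are needed depends on how the host orders accesses to the cache-inhibited ESB mapping, which is the open question here; this sketch does not settle it. As an illustration only, assuming C11 fences are an acceptable stand-in for QEMU's own barrier macros, this is where fences would sit in the LSI load-EOI path quoted below. lsi_load_eoi() is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical LSI load-EOI helper: a PQ=00 "magic" load, then a
 * re-trigger store if the line is still asserted. The fences are meant
 * to keep the store from being reordered ahead of the load (assumes a
 * seq_cst fence maps to a full hardware barrier, typically sync on POWER).
 */
static void lsi_load_eoi(volatile uint64_t *set_pq_00_page,
                         volatile uint64_t *trigger_page, bool asserted)
{
    uint64_t old_pq = *set_pq_00_page;          /* magic load: PQ <- 00 */

    (void)old_pq;
    atomic_thread_fence(memory_order_seq_cst);  /* load before store */
    if (asserted) {
        *trigger_page = 0;                      /* re-trigger the source */
        atomic_thread_fence(memory_order_seq_cst);
    }
}
```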

> +
> +uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
> +                            uint64_t data, bool write)
> +{
> +    if (write) {
> +        return xive_esb_rw(xsrc, srcno, offset, data, 1);
> +    }
> +
> +    /*
> +     * Special Load EOI handling for LSI sources. Q bit is never set
> +     * and the interrupt should be re-triggered if the level is still
> +     * asserted.
> +     */
> +    if (xive_source_irq_is_lsi(xsrc, srcno) &&
> +        offset == XIVE_ESB_LOAD_EOI) {
> +        xive_esb_read(xsrc, srcno, XIVE_ESB_SET_PQ_00);
> +        if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
> +            xive_esb_trigger(xsrc, srcno);
> +        }
> +        return 0;
> +    } else {
> +        return xive_esb_rw(xsrc, srcno, offset, 0, 0);
> +    }
> +}
> +
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>  {
>      XiveSource *xsrc = opaque;
> @@ -155,6 +259,86 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>  /*
>   * sPAPR XIVE interrupt controller (KVM)
>   */
> +void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
> +                                  uint32_t end_idx, XiveEND *end,
> +                                  Error **errp)
> +{
> +    struct kvm_ppc_xive_eq kvm_eq = { 0 };
> +    uint64_t kvm_eq_idx;
> +    uint8_t priority;
> +    uint32_t server;
> +    Error *local_err = NULL;
> +
> +    if (!xive_end_is_valid(end)) {

This should set an error, shouldn't it?

> +        return;
> +    }
> +
> +    /* Encode the tuple (server, prio) as a KVM EQ index */
> +    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
> +
> +    kvm_eq_idx = priority << KVM_XIVE_EQ_PRIORITY_SHIFT &
> +            KVM_XIVE_EQ_PRIORITY_MASK;
> +    kvm_eq_idx |= server << KVM_XIVE_EQ_SERVER_SHIFT &
> +        KVM_XIVE_EQ_SERVER_MASK;
> +
> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ_CONFIG, kvm_eq_idx,
> +                      &kvm_eq, false, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /*
> +     * The EQ index and toggle bit are updated by HW. These are the
> +     * only fields we want to return.
> +     */
> +    end->w1 = xive_set_field32(END_W1_GENERATION, 0ul, kvm_eq.qtoggle) |
> +        xive_set_field32(END_W1_PAGE_OFF, 0ul, kvm_eq.qindex);
> +}
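The two xive_set_field32() calls above build END word 1 from the toggle bit and the queue index. Below is a minimal host-order sketch of mask-based field helpers. It assumes the END_W1 masks shown (IBM bit numbering, where bit 0 is the MSB). The QEMU helpers also handle the big-endian storage of the END words, which is left out here. The helper names are hypothetical.

```c
#include <stdint.h>

/* Assumed END word 1 masks (IBM numbering: bit 0 is the word's MSB). */
#define END_W1_GENERATION 0x00400000u  /* bit 9 */
#define END_W1_PAGE_OFF   0x003fffffu  /* bits 10..31 */

/* Extract the field selected by a non-zero 'mask', right-aligned. */
static uint32_t get_field32(uint32_t mask, uint32_t word)
{
    return (word & mask) >> __builtin_ctz(mask);
}

/* Return 'word' with the field selected by 'mask' replaced by 'value'. */
static uint32_t set_field32(uint32_t mask, uint32_t word, uint32_t value)
{
    return (word & ~mask) | ((value << __builtin_ctz(mask)) & mask);
}
```

For example, a toggle of 1 and an index of 0x123 combine into 0x00400123.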
> +
> +void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
> +                                  uint32_t end_idx, XiveEND *end,
> +                                  Error **errp)
> +{
> +    struct kvm_ppc_xive_eq kvm_eq = { 0 };
> +    uint64_t kvm_eq_idx;
> +    uint8_t priority;
> +    uint32_t server;
> +    Error *local_err = NULL;
> +
> +    if (!xive_end_is_valid(end)) {
> +        return;
> +    }
> +
> +    /* Build the KVM state from the local END structure */
> +    kvm_eq.flags   = KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY;
> +    kvm_eq.qsize   = xive_get_field32(END_W0_QSIZE, end->w0) + 12;
> +    kvm_eq.qpage   = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32 |
> +        be32_to_cpu(end->w3);
> +    kvm_eq.qtoggle = xive_get_field32(END_W1_GENERATION, end->w1);
> +    kvm_eq.qindex  = xive_get_field32(END_W1_PAGE_OFF, end->w1);
> +
> +    /* Encode the tuple (server, prio) as a KVM EQ index */
> +    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
> +
> +    kvm_eq_idx = priority << KVM_XIVE_EQ_PRIORITY_SHIFT &
> +            KVM_XIVE_EQ_PRIORITY_MASK;
> +    kvm_eq_idx |= server << KVM_XIVE_EQ_SERVER_SHIFT &
> +        KVM_XIVE_EQ_SERVER_MASK;
> +
> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ_CONFIG, kvm_eq_idx,
> +                      &kvm_eq, true, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +}
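In the qpage computation above, the 0x0fffffff mask is applied to end->w2 before be32_to_cpu(). If the END words are stored big-endian, as the use of be32_to_cpu() suggests, then on a little-endian host that mask clears bits in the wrong byte. Below is a minimal byte-level sketch that converts first and masks second. end_qaddr() and load_be32() are hypothetical names, and the upper four bits of w2 are assumed to belong to another field.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value from memory, whatever the host order. */
static uint32_t load_be32(const uint8_t p[4])
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Queue address: the low 28 bits of w2 are the high part, w3 is the
 * low part. The mask is applied after the endianness conversion.
 */
static uint64_t end_qaddr(const uint8_t w2[4], const uint8_t w3[4])
{
    return (uint64_t)(load_be32(w2) & 0x0fffffffu) << 32 | load_be32(w3);
}
```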
> +
> +void kvmppc_xive_reset(sPAPRXive *xive, Error **errp)
> +{
> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL, KVM_DEV_XIVE_RESET,
> +                      NULL, true, errp);
> +}
>  
>  static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
>                                Error **errp)

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support Cédric Le Goater
@ 2019-02-25 23:49   ` David Gibson
  2019-02-25 23:49     ` David Gibson
  2019-03-11 20:44     ` Cédric Le Goater
  0 siblings, 2 replies; 32+ messages in thread
From: David Gibson @ 2019-02-25 23:49 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:12PM +0100, Cédric Le Goater wrote:
> All is in place for KVM now. State synchronization and migration will
> come next.

As with the kernel side capability, this should be moved later in the
series to avoid breaking bisections.

> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/ppc/spapr_irq.c | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 6e1c36dc62ca..1ad57582a403 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -263,19 +263,10 @@ sPAPRIrq spapr_irq_xics = {
>  static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
>                                  Error **errp)
>  {
> -    MachineState *machine = MACHINE(spapr);
>      uint32_t nr_servers = spapr_max_server_number(spapr);
>      DeviceState *dev;
>      int i;
>  
> -    /* KVM XIVE device not yet available */
> -    if (kvm_enabled()) {
> -        if (machine_kernel_irqchip_required(machine)) {
> -            error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
> -            return;
> -        }
> -    }
> -
>      dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
>      qdev_prop_set_uint32(dev, "nr-irqs", nr_irqs);
>      /*

* Re: [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support
  2019-02-25 23:49   ` David Gibson
@ 2019-02-25 23:49     ` David Gibson
  2019-03-11 20:44     ` Cédric Le Goater
  1 sibling, 0 replies; 32+ messages in thread
From: David Gibson @ 2019-02-25 23:49 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Tue, Feb 26, 2019 at 10:49:27AM +1100, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:12PM +0100, Cédric Le Goater wrote:
> > All is in place for KVM now. State synchronization and migration will
> > come next.
> 
> As with the kernel side capability, this should be moved later in the
> series to avoid breaking bisections.

Apart from that,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> 
> > 
> > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> > ---
> >  hw/ppc/spapr_irq.c | 9 ---------
> >  1 file changed, 9 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> > index 6e1c36dc62ca..1ad57582a403 100644
> > --- a/hw/ppc/spapr_irq.c
> > +++ b/hw/ppc/spapr_irq.c
> > @@ -263,19 +263,10 @@ sPAPRIrq spapr_irq_xics = {
> >  static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
> >                                  Error **errp)
> >  {
> > -    MachineState *machine = MACHINE(spapr);
> >      uint32_t nr_servers = spapr_max_server_number(spapr);
> >      DeviceState *dev;
> >      int i;
> >  
> > -    /* KVM XIVE device not yet available */
> > -    if (kvm_enabled()) {
> > -        if (machine_kernel_irqchip_required(machine)) {
> > -            error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
> > -            return;
> > -        }
> > -    }
> > -
> >      dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
> >      qdev_prop_set_uint32(dev, "nr-irqs", nr_irqs);
> >      /*
> 



* Re: [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM Cédric Le Goater
@ 2019-02-26  0:01   ` David Gibson
  2019-03-11 20:41     ` Cédric Le Goater
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2019-02-26  0:01 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:13PM +0100, Cédric Le Goater wrote:
> This extends the KVM XIVE device backend with 'synchronize_state'
> methods used to retrieve the state from KVM. The HW state of the
> sources, the KVM device and the thread interrupt contexts is
> collected for use by the monitor and for migration.
> 
> These get operations rely on their KVM counterpart in the host kernel
> which acts as a proxy for OPAL, the host firmware. The set operations
> will be added for migration support later.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/spapr_xive.h |  8 ++++
>  include/hw/ppc/xive.h       |  1 +
>  hw/intc/spapr_xive.c        | 17 ++++---
>  hw/intc/spapr_xive_kvm.c    | 89 +++++++++++++++++++++++++++++++++++++
>  hw/intc/xive.c              | 10 +++++
>  5 files changed, 118 insertions(+), 7 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 749c6cbc2c56..ebd65e7fe36b 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -44,6 +44,13 @@ typedef struct sPAPRXive {
>      void          *tm_mmap;
>  } sPAPRXive;
>  
> +/*
> + * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
> + * to the controller block id value. It can nevertheless be changed
> + * for testing purpose.
> + */
> +#define SPAPR_XIVE_BLOCK_ID 0x0
> +
>  bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
>  bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> @@ -74,5 +81,6 @@ void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
>  void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
>                                   uint32_t end_idx, XiveEND *end,
>                                   Error **errp);
> +void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 061d43fea24d..f3766fd881a2 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -431,5 +431,6 @@ void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp);
>  void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
>  void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
> +void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
>  
>  #endif /* PPC_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 3db24391e31c..9f07567f4d78 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -40,13 +40,6 @@
>  
>  #define SPAPR_XIVE_NVT_BASE 0x400
>  
> -/*
> - * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
> - * to the controller block id value. It can nevertheless be changed
> - * for testing purpose.
> - */
> -#define SPAPR_XIVE_BLOCK_ID 0x0
> -
>  /*
>   * sPAPR NVT and END indexing helpers
>   */
> @@ -153,6 +146,16 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>      XiveSource *xsrc = &xive->source;
>      int i;
>  
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_synchronize_state(xive, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +    }
> +
>      monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
>  
>      for (i = 0; i < xive->nr_irqs; i++) {
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 6b50451b4f85..4b1ffb9835f9 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -60,6 +60,57 @@ static void kvm_cpu_enable(CPUState *cs)
>  /*
>   * XIVE Thread Interrupt Management context (KVM)
>   */
> +static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
> +{
> +    uint64_t state[4] = { 0 };
> +    int ret;
> +
> +    ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
> +    if (ret != 0) {
> +        error_setg_errno(errp, errno,
> +                         "XIVE: could not capture KVM state of CPU %ld",
> +                         kvm_arch_vcpu_id(tctx->cs));
> +        return;
> +    }
> +
> +    /* word0 and word1 of the OS ring. */
> +    *((uint64_t *) &tctx->regs[TM_QW1_OS]) = state[0];
> +
> +    /*
> +     * KVM also returns word2 containing the OS CAM line which is
> +     * interesting to print out in the QEMU monitor.
> +     */
> +    *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]) = state[1];

As mentioned elsewhere, it is interesting for debugging, but doesn't
seem to really match the guest-visible CAM state, so I'm not convinced
it's a good idea to put it into the regs[] structure.

> +}
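The casts above store KVM's doublewords into the byte array regs[] through a uint64_t pointer, which assumes regs[] is suitably aligned for 64-bit access. A memcpy() performs the same host-order byte copy without that assumption. This is a minimal sketch: the Tctx type and the helper names are hypothetical, and the TM_* offsets are assumed values for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Assumed layout: four 16-byte rings, OS ring at 0x10, word2 at +0x8. */
#define TM_RING_SIZE 0x10
#define TM_QW1_OS    0x10
#define TM_WORD2     0x8

/* Hypothetical stand-in for XiveTCTX. */
typedef struct {
    uint8_t regs[4 * TM_RING_SIZE];
} Tctx;

/* Copy KVM's two doublewords into regs[] without a uint64_t * cast. */
static void tctx_store_os_state(Tctx *t, const uint64_t state[2])
{
    memcpy(&t->regs[TM_QW1_OS], &state[0], sizeof(state[0]));
    memcpy(&t->regs[TM_QW1_OS + TM_WORD2], &state[1], sizeof(state[1]));
}

/* The reverse copy, as the set_state path would need. */
static void tctx_load_os_state(const Tctx *t, uint64_t state[2])
{
    memcpy(&state[0], &t->regs[TM_QW1_OS], sizeof(state[0]));
    memcpy(&state[1], &t->regs[TM_QW1_OS + TM_WORD2], sizeof(state[1]));
}
```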
> +
> +typedef struct {
> +    XiveTCTX *tctx;
> +    Error *err;
> +} XiveCpuGetState;
> +
> +static void kvmppc_xive_cpu_do_synchronize_state(CPUState *cpu,
> +                                                 run_on_cpu_data arg)
> +{
> +    XiveCpuGetState *s = arg.host_ptr;
> +
> +    kvmppc_xive_cpu_get_state(s->tctx, &s->err);
> +}
> +
> +void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp)
> +{
> +    XiveCpuGetState s = {
> +        .tctx = tctx,
> +        .err = NULL,
> +    };
> +
> +    run_on_cpu(tctx->cs, kvmppc_xive_cpu_do_synchronize_state,
> +               RUN_ON_CPU_HOST_PTR(&s));

Why does this need a run_on_cpu()?  The KVM call which is getting the
actual info takes a cpu parameter.

> +
> +    if (s.err) {
> +        error_propagate(errp, s.err);
> +        return;
> +    }
> +}
>  
>  void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
>  {
> @@ -229,6 +280,19 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
>      }
>  }
>  
> +static void kvmppc_xive_source_get_state(XiveSource *xsrc)
> +{
> +    int i;
> +
> +    for (i = 0; i < xsrc->nr_irqs; i++) {
> +        /* Perform a load without side effect to retrieve the PQ bits */
> +        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
> +
> +        /* and save PQ locally */
> +        xive_source_esb_set(xsrc, i, pq);
> +    }
> +}
> +
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>  {
>      XiveSource *xsrc = opaque;
> @@ -340,6 +404,31 @@ void kvmppc_xive_reset(sPAPRXive *xive, Error **errp)
>                        NULL, true, errp);
>  }
>  
> +static void kvmppc_xive_get_queues(sPAPRXive *xive, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    int i;
> +
> +    for (i = 0; i < xive->nr_ends; i++) {
> +        kvmppc_xive_get_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
> +                                     &xive->endt[i], &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +}
> +
> +void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
> +{
> +    kvmppc_xive_source_get_state(&xive->source);
> +
> +    /* EAT: there is no extra state to query from KVM */
> +
> +    /* ENDT */
> +    kvmppc_xive_get_queues(xive, errp);
> +}
> +
>  static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
>                                Error **errp)
>  {
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 0284b5803551..f478c52ab2a0 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -431,6 +431,16 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
>      int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
>      int i;
>  
> +    if (kvm_irqchip_in_kernel()) {
> +        Error *local_err = NULL;
> +
> +        kvmppc_xive_cpu_synchronize_state(tctx, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +    }
> +
>      monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
>                     "  W2\n", cpu_index);
>  

* Re: [Qemu-devel] [PATCH v2 05/13] spapr/xive: introduce a VM state change handler
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 05/13] spapr/xive: introduce a VM state change handler Cédric Le Goater
@ 2019-02-26  0:39   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2019-02-26  0:39 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:14PM +0100, Cédric Le Goater wrote:
> This handler is in charge of stabilizing the flow of event notifications
> in the XIVE controller before migrating a guest. This is a requirement
> before transferring the guest EQ pages to a destination.
> 
> When the VM is stopped, the handler masks the sources (PQ=01) to stop
> the flow of events and saves their previous state. The XIVE controller
> is then synced through KVM to flush any in-flight event notification
> and to stabilize the EQs. At this stage, the EQ pages are marked dirty
> to make sure the EQ pages are transferred if a migration sequence is
> in progress.
> 
> The previous configuration of the sources is restored when the VM
> resumes, after a migration or a stop.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/hw/ppc/spapr_xive.h |  1 +
>  hw/intc/spapr_xive_kvm.c    | 77 ++++++++++++++++++++++++++++++++++++-
>  2 files changed, 77 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index ebd65e7fe36b..298d204d54ef 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -42,6 +42,7 @@ typedef struct sPAPRXive {
>      /* KVM support */
>      int           fd;
>      void          *tm_mmap;
> +    VMChangeStateEntry *change;
>  } sPAPRXive;
>  
>  /*
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 4b1ffb9835f9..44d80175b1b5 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -419,9 +419,81 @@ static void kvmppc_xive_get_queues(sPAPRXive *xive, Error **errp)
>      }
>  }
>  
> +/*
> + * The primary goal of the XIVE VM change handler is to mark the EQ
> + * pages dirty when all XIVE event notifications have stopped.
> + *
> + * Whenever the VM is stopped, the VM change handler masks the sources
> + * (PQ=01) to stop the flow of events and saves the previous state in
> + * anticipation of a migration. The XIVE controller is then synced
> + * through KVM to flush any in-flight event notification and stabilize
> + * the EQs.
> + *
> + * At this stage, we can mark the EQ pages dirty and let a migration
> + * sequence transfer the EQ pages to the destination, which is done
> + * just after the stop state.
> + *
> + * The previous configuration of the sources is restored when the VM
> + * runs again.
> + */
> +static void kvmppc_xive_change_state_handler(void *opaque, int running,
> +                                             RunState state)
> +{
> +    sPAPRXive *xive = opaque;
> +    XiveSource *xsrc = &xive->source;
> +    Error *local_err = NULL;
> +    int i;
> +
> +    /*
> +     * Restore the sources to their initial state. This is called when
> +     * the VM resumes after a stop or a migration.
> +     */
> +    if (running) {
> +        for (i = 0; i < xsrc->nr_irqs; i++) {
> +            uint8_t pq = xive_source_esb_get(xsrc, i);
> +            if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
> +                error_report("XIVE: IRQ %d has an invalid state", i);
> +            }
> +        }
> +
> +        return;
> +    }
> +
> +    /*
> +     * Mask the sources, to stop the flow of event notifications, and
> +     * save the PQs locally in the XiveSource object. The XiveSource
> +     * state will be collected later on by its vmstate handler if a
> +     * migration is in progress.
> +     */
> +    for (i = 0; i < xsrc->nr_irqs; i++) {
> +        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
> +        xive_source_esb_set(xsrc, i, pq);
> +    }
> +
> +    /*
> +     * Sync the XIVE controller in KVM, to flush in-flight event
> +     * notifications that should be enqueued in the EQs and mark the
> +     * XIVE EQ pages dirty to collect all updates.
> +     */
> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL,
> +                      KVM_DEV_XIVE_EQ_SYNC, NULL, true, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return;
> +    }
> +}
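The stop/resume logic above can be checked against a toy ESB model. This is a minimal sketch that assumes QEMU's XIVE_ESB_SET_PQ_xx load offsets (0xc00 to 0xf00, one 256-byte step per PQ value) and uses a hypothetical esb_load() that returns the old PQ and installs the new one. It also shows why the resume path compares the result with 0x1: every source was left at PQ=01 when the VM stopped.

```c
#include <stdint.h>

/* Assumed ESB "set PQ" load offsets: 0xc00 + (PQ << 8). */
#define XIVE_ESB_SET_PQ_00 0xc00
#define XIVE_ESB_SET_PQ_01 0xd00

/* Toy ESB: a load at SET_PQ_xx returns the old PQ and sets PQ to xx. */
static uint8_t esb_load(uint8_t *pq, uint32_t offset)
{
    uint8_t old = *pq;

    *pq = (uint8_t)((offset - XIVE_ESB_SET_PQ_00) >> 8);
    return old;
}

/* VM stop: mask every source (PQ=01) and remember its previous PQ. */
static void vm_stop(uint8_t *hw_pq, uint8_t *saved_pq, int n)
{
    for (int i = 0; i < n; i++) {
        saved_pq[i] = esb_load(&hw_pq[i], XIVE_ESB_SET_PQ_01);
    }
}

/* VM run: restore each PQ; returns how many sources were not at 01. */
static int vm_run(uint8_t *hw_pq, const uint8_t *saved_pq, int n)
{
    int bad = 0;

    for (int i = 0; i < n; i++) {
        if (esb_load(&hw_pq[i], XIVE_ESB_SET_PQ_00 + (saved_pq[i] << 8)) != 0x1) {
            bad++;
        }
    }
    return bad;
}
```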
> +
>  void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
>  {
> -    kvmppc_xive_source_get_state(&xive->source);
> +    /*
> +     * When the VM is stopped, the sources are masked and the previous
> +     * state is saved in anticipation of a migration. We should not
> +     * synchronize the source state in that case, otherwise we would
> +     * overwrite the saved state.
> +     */
> +    if (runstate_is_running()) {
> +        kvmppc_xive_source_get_state(&xive->source);
> +    }
>  
>      /* EAT: there is no extra state to query from KVM */
>  
> @@ -501,6 +573,9 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>                                        "xive.tima", tima_len, xive->tm_mmap);
>      sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
>  
> +    xive->change = qemu_add_vm_change_state_handler(
> +        kvmppc_xive_change_state_handler, xive);
> +
>      kvm_kernel_irqchip = true;
>      kvm_msi_via_irqfd_allowed = true;
>      kvm_gsi_direct_mapping = true;

* Re: [Qemu-devel] [PATCH v2 06/13] spapr/xive: add migration support for KVM
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 06/13] spapr/xive: add migration support for KVM Cédric Le Goater
@ 2019-02-26  0:58   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2019-02-26  0:58 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:15PM +0100, Cédric Le Goater wrote:
> When the VM is stopped, the VM state handler stabilizes the XIVE IC
> and marks the EQ pages dirty. These are then transferred to the destination
> before the transfer of the device vmstates starts.
> 
> The sPAPRXive interrupt controller model captures the XIVE internal
> tables, EAT and ENDT, and the XiveTCTX model does the same for the
> thread interrupt context registers.
> 
> At restart, the sPAPRXive 'post_load' method restores all the XIVE
> states. It is called by the sPAPR machine 'post_load' method, when all
> XIVE states have been transferred and loaded.
> 
> Finally, the source states are restored in the VM change state handler
> when the machine reaches the running state.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  include/hw/ppc/spapr_xive.h |  3 ++
>  include/hw/ppc/xive.h       |  1 +
>  hw/intc/spapr_xive.c        | 24 ++++++++++
>  hw/intc/spapr_xive_kvm.c    | 93 ++++++++++++++++++++++++++++++++++++-
>  hw/intc/xive.c              | 17 +++++++
>  hw/ppc/spapr_irq.c          |  2 +-
>  6 files changed, 138 insertions(+), 2 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 298d204d54ef..22d70650b51f 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -55,6 +55,7 @@ typedef struct sPAPRXive {
>  bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
>  bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
> +int spapr_xive_post_load(sPAPRXive *xive, int version_id);
>  
>  void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>  void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
> @@ -83,5 +84,7 @@ void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
>                                   uint32_t end_idx, XiveEND *end,
>                                   Error **errp);
>  void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
> +int kvmppc_xive_pre_save(sPAPRXive *xive);
> +int kvmppc_xive_post_load(sPAPRXive *xive, int version_id);
>  
>  #endif /* PPC_SPAPR_XIVE_H */
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index f3766fd881a2..3b1baa783975 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -432,5 +432,6 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
>  void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
>  void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
> +void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp);
>  
>  #endif /* PPC_XIVE_H */
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 9f07567f4d78..21fe5e1aa39f 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -469,10 +469,34 @@ static const VMStateDescription vmstate_spapr_xive_eas = {
>      },
>  };
>  
> +static int vmstate_spapr_xive_pre_save(void *opaque)
> +{
> +    if (kvm_irqchip_in_kernel()) {
> +        return kvmppc_xive_pre_save(SPAPR_XIVE(opaque));
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * Called by the sPAPR IRQ backend 'post_load' method at the machine
> + * level.
> + */
> +int spapr_xive_post_load(sPAPRXive *xive, int version_id)
> +{
> +    if (kvm_irqchip_in_kernel()) {
> +        return kvmppc_xive_post_load(xive, version_id);
> +    }
> +
> +    return 0;
> +}
> +
>  static const VMStateDescription vmstate_spapr_xive = {
>      .name = TYPE_SPAPR_XIVE,
>      .version_id = 1,
>      .minimum_version_id = 1,
> +    .pre_save = vmstate_spapr_xive_pre_save,
> +    .post_load = NULL, /* handled at the machine level */
>      .fields = (VMStateField[]) {
>          VMSTATE_UINT32_EQUAL(nr_irqs, sPAPRXive, NULL),
>          VMSTATE_STRUCT_VARRAY_POINTER_UINT32(eat, sPAPRXive, nr_irqs,
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 44d80175b1b5..119fd59fc9ae 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -15,6 +15,7 @@
>  #include "sysemu/cpus.h"
>  #include "sysemu/kvm.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/ppc/spapr_xive.h"
>  #include "hw/ppc/xive.h"
>  #include "kvm_ppc.h"
> @@ -60,7 +61,30 @@ static void kvm_cpu_enable(CPUState *cs)
>  /*
>   * XIVE Thread Interrupt Management context (KVM)
>   */
> -static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
> +
> +static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
> +{
> +    uint64_t state[4];
> +    int ret;
> +
> +    /* word0 and word1 of the OS ring. */
> +    state[0] = *((uint64_t *) &tctx->regs[TM_QW1_OS]);
> +
> +    /*
> +     * OS CAM line. Used by KVM to print out the VP identifier. This
> +     * is for debug only.
> +     */
> +    state[1] = *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]);
> +
> +    ret = kvm_set_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
> +    if (ret != 0) {
> +        error_setg_errno(errp, errno,
> +                         "XIVE: could not restore KVM state of CPU %ld",
> +                         kvm_arch_vcpu_id(tctx->cs));
> +    }
> +}
> +
> +void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
>  {
>      uint64_t state[4] = { 0 };
>      int ret;
> @@ -501,6 +525,73 @@ void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
>      kvmppc_xive_get_queues(xive, errp);
>  }
>  
> +/*
> + * The sPAPRXive 'pre_save' method is called by the vmstate handler of
> + * the sPAPRXive model, after the XIVE controller is synced in the VM
> + * change handler.
> + */
> +int kvmppc_xive_pre_save(sPAPRXive *xive)
> +{
> +    Error *local_err = NULL;
> +
> +    /* EAT: there is no extra state to query from KVM */
> +
> +    /* ENDT */
> +    kvmppc_xive_get_queues(xive, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +/*
> + * The sPAPRXive 'post_load' method is not called by a vmstate
> + * handler. It is called at the sPAPR machine level at the end of the
> + * migration sequence by the sPAPR IRQ backend 'post_load' method,
> + * when all XIVE states have been transferred and loaded.
> + */
> +int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
> +{
> +    Error *local_err = NULL;
> +    CPUState *cs;
> +    int i;
> +
> +    /* Restore the ENDT first. The targeting depends on it. */
> +    for (i = 0; i < xive->nr_ends; i++) {
> +        kvmppc_xive_set_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
> +                                     &xive->endt[i], &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return -1;
> +        }
> +    }
> +
> +    /* Restore the EAT */
> +    for (i = 0; i < xive->nr_irqs; i++) {
> +        kvmppc_xive_set_source_config(xive, i, &xive->eat[i], &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return -1;
> +        }
> +    }
> +
> +    /* Restore the thread interrupt contexts */
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        kvmppc_xive_cpu_set_state(spapr_cpu_state(cpu)->tctx, &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return -1;
> +        }
> +    }
> +
> +    /* The source states will be restored when the machine starts running */
> +    return 0;
> +}
> +
>  static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
>                                Error **errp)
>  {
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index f478c52ab2a0..1f8e923ca654 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -518,10 +518,27 @@ static void xive_tctx_unrealize(DeviceState *dev, Error **errp)
>      qemu_unregister_reset(xive_tctx_reset, dev);
>  }
>  
> +static int vmstate_xive_tctx_pre_save(void *opaque)
> +{
> +    Error *local_err = NULL;
> +
> +    if (kvm_irqchip_in_kernel()) {
> +        kvmppc_xive_cpu_get_state(XIVE_TCTX(opaque), &local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return -1;
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  static const VMStateDescription vmstate_xive_tctx = {
>      .name = TYPE_XIVE_TCTX,
>      .version_id = 1,
>      .minimum_version_id = 1,
> +    .pre_save = vmstate_xive_tctx_pre_save,
> +    .post_load = NULL, /* handled by the sPAPRxive model */
>      .fields = (VMStateField[]) {
>          VMSTATE_BUFFER(regs, XiveTCTX),
>          VMSTATE_END_OF_LIST()
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 1ad57582a403..12ecca6264f3 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -356,7 +356,7 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>  
>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>  {
> -    return 0;
> +    return spapr_xive_post_load(spapr->xive, version_id);
>  }
>  
>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
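The restore order in kvmppc_xive_post_load() (ENDT first, then EAT, then thread contexts) follows from the fact that an EAS entry targets an END. The following is a minimal model, not QEMU code. It assumes a mock kernel that rejects a valid EAS whose END has not been configured yet. mock_set_end() and mock_set_eas() are hypothetical stand-ins for kvmppc_xive_set_queue_config() and kvmppc_xive_set_source_config(), and the table sizes are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ENDS 4
#define NR_IRQS 8

typedef struct {
    bool valid;
    int end_idx;              /* END this source is routed to */
} Eas;

typedef struct {
    bool end_configured[NR_ENDS];
} MockKernel;

static int mock_set_end(MockKernel *k, int idx)
{
    k->end_configured[idx] = true;
    return 0;
}

/* Assumption of the model: routing a valid source to an unknown END fails. */
static int mock_set_eas(MockKernel *k, const Eas *eas)
{
    if (eas->valid && !k->end_configured[eas->end_idx]) {
        return -1;
    }
    return 0;
}

/* Restore the tables in either order; stop at the first failure, as
 * post_load does. */
static int restore(MockKernel *k, const Eas eat[NR_IRQS], bool ends_first)
{
    if (ends_first) {
        for (int i = 0; i < NR_ENDS; i++) {
            if (mock_set_end(k, i)) {
                return -1;
            }
        }
    }
    for (int i = 0; i < NR_IRQS; i++) {
        if (mock_set_eas(k, &eat[i])) {
            return -1;
        }
    }
    if (!ends_first) {
        for (int i = 0; i < NR_ENDS; i++) {
            (void)mock_set_end(k, i);
        }
    }
    return 0;
}
```

Under this assumption, restoring the EAT before the ENDT fails on the first routed source. Restoring the ENDT first succeeds.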

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG Cédric Le Goater
@ 2019-02-26  1:02   ` David Gibson
  2019-03-11 20:45     ` Cédric Le Goater
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2019-02-26  1:02 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:16PM +0100, Cédric Le Goater wrote:
> When the thread interrupt management state is retrieved from the KVM
> VCPU, word2 is saved under the QEMU XIVE thread context to print out
> the OS CAM line under the QEMU monitor.
> 
> This breaks the migration of a TCG guest (and with KVM when
> kernel_irqchip=off) because the matching algorithm of the presenter
> relies on the OS CAM value. Fix with an extra reset of the thread
> contexts to restore the expected value.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

As noted elsewhere, I'm not sure this is the right approach to fixing
this.  In any case this can be folded into the previous patch.
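The failure mode can be sketched with a simplified thread context. This is an illustration only: it assumes a 64-byte register file in which word2 of the OS ring holds the CAM line that the emulated presenter compares against. The offsets, the valid bit, the CAM encoding, and the host byte order are all assumptions and do not come from the QEMU headers. If the migration stream carries a debug-only word2, matching fails until word2 is recomputed after load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout assumptions, not QEMU's definitions. */
#define REGS_SIZE  64
#define QW1_OS     0x10        /* start of the OS ring */
#define WORD2      0x08        /* word2 offset within a ring */
#define CAM_VALID  0x80000000u

typedef struct {
    uint8_t regs[REGS_SIZE];
} ThreadCtx;

/* Hypothetical CAM encoding: block in the high bits, index below. */
static uint32_t cam_line(uint8_t blk, uint32_t idx)
{
    assert(idx < (1u << 19));  /* index must not overlap the block bits */
    return ((uint32_t)blk << 19) | idx;
}

/* Word2 is kept in host byte order to keep the sketch simple. */
static uint32_t os_word2(const ThreadCtx *t)
{
    uint32_t w;
    memcpy(&w, &t->regs[QW1_OS + WORD2], sizeof(w));
    return w;
}

/* Post-load fixup: recompute word2 instead of trusting the migrated copy. */
static void set_os_cam(ThreadCtx *t, uint8_t blk, uint32_t idx)
{
    uint32_t w = CAM_VALID | cam_line(blk, idx);
    memcpy(&t->regs[QW1_OS + WORD2], &w, sizeof(w));
}

/* Emulated presenter: a thread matches only if word2 holds the expected
 * valid CAM line. */
static bool presenter_match(const ThreadCtx *t, uint8_t blk, uint32_t idx)
{
    return os_word2(t) == (CAM_VALID | cam_line(blk, idx));
}
```

The model shows why the extra reset works. It does not settle the open question of whether the debug value should be written into word2 in the first place.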

> ---
>  hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 12ecca6264f3..3176098b9f7c 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -356,7 +356,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>  
>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>  {
> -    return spapr_xive_post_load(spapr->xive, version_id);
> +    CPUState *cs;
> +    int ret;
> +
> +    ret = spapr_xive_post_load(spapr->xive, version_id);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    /*
> +     * When the states are collected from the KVM XIVE device, word2
> +     * of the XiveTCTX is set to print out the OS CAM line under the
> +     * QEMU monitor.
> +     *
> +     * This breaks the migration on a TCG guest (or on KVM with
> +     * kernel_irqchip=off) because the matching algorithm of the
> +     * presenter relies on the OS CAM value. Fix with an extra reset
> +     * of the thread contexts to restore the expected value.
> +     */
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        /* (TCG) Set the OS CAM line of the thread interrupt context. */
> +        spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
> +    }
> +    return 0;
>  }
>  
>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)

* Re: [Qemu-devel] [PATCH v2 10/13] spapr: introduce routines to delete the KVM IRQ device
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 10/13] spapr: introduce routines to delete the KVM IRQ device Cédric Le Goater
@ 2019-02-26  1:10   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2019-02-26  1:10 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:19PM +0100, Cédric Le Goater wrote:
> If a new interrupt mode is chosen by CAS, the machine generates a
> reset to reconfigure. At this point, the connection with the previous
> KVM device needs to be closed and a new connection needs to be opened
> with the KVM device operating the chosen interrupt mode.
> 
> New routines are introduced to destroy the XICS and the XIVE KVM
> devices. They make use of a new KVM device ioctl which destroys the
> device and also disconnects the IRQ presenters from the vCPUs.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Ugly, but necessary

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
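kvm_cpu_disable_all() and kvm_disable_icps() both rely on the "safe" iteration pattern: the next pointer is read before the current node is freed. The sketch below shows the same idea in plain C with a hand-rolled singly linked list. The EnabledCPU type and its helpers are hypothetical and are not QEMU's QLIST macros.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct EnabledCPU {
    unsigned long vcpu_id;
    struct EnabledCPU *next;
} EnabledCPU;

static EnabledCPU *enabled_cpus;

/* Insert at the head, like QLIST_INSERT_HEAD. */
static void cpu_enable(unsigned long vcpu_id)
{
    EnabledCPU *e = malloc(sizeof(*e));
    assert(e != NULL);
    e->vcpu_id = vcpu_id;
    e->next = enabled_cpus;
    enabled_cpus = e;
}

static int cpu_is_enabled(unsigned long vcpu_id)
{
    for (EnabledCPU *e = enabled_cpus; e; e = e->next) {
        if (e->vcpu_id == vcpu_id) {
            return 1;
        }
    }
    return 0;
}

/* 'next' is saved before free() so the walk never touches a freed node. */
static void cpu_disable_all(void)
{
    EnabledCPU *e = enabled_cpus, *next;

    while (e) {
        next = e->next;
        free(e);
        e = next;
    }
    enabled_cpus = NULL;
}
```

Emptying the list also appears to be required for a later reconnect. Without it, the "already enabled" check in the connect path would presumably skip vCPUs that must be attached to the newly created device.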

> ---
>  include/hw/ppc/spapr_xive.h |  1 +
>  include/hw/ppc/xics_spapr.h |  1 +
>  hw/intc/spapr_xive_kvm.c    | 60 +++++++++++++++++++++++++++++++++++++
>  hw/intc/xics_kvm.c          | 56 ++++++++++++++++++++++++++++++++++
>  4 files changed, 118 insertions(+)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index 22d70650b51f..a7c4c275a747 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -71,6 +71,7 @@ int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
>   * KVM XIVE device helpers
>   */
>  void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
> +void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp);
>  void kvmppc_xive_reset(sPAPRXive *xive, Error **errp);
>  void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
>                                     Error **errp);
> diff --git a/include/hw/ppc/xics_spapr.h b/include/hw/ppc/xics_spapr.h
> index b8d924baf437..bddf09821cb0 100644
> --- a/include/hw/ppc/xics_spapr.h
> +++ b/include/hw/ppc/xics_spapr.h
> @@ -34,6 +34,7 @@
>  void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>                     uint32_t phandle);
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
> +int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp);
>  void xics_spapr_init(sPAPRMachineState *spapr);
>  
>  #endif /* XICS_SPAPR_H */
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 119fd59fc9ae..e31035c90260 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -58,6 +58,16 @@ static void kvm_cpu_enable(CPUState *cs)
>      QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
>  }
>  
> +static void kvm_cpu_disable_all(void)
> +{
> +    KVMEnabledCPU *enabled_cpu, *next;
> +
> +    QLIST_FOREACH_SAFE(enabled_cpu, &kvm_enabled_cpus, node, next) {
> +        QLIST_REMOVE(enabled_cpu, node);
> +        g_free(enabled_cpu);
> +    }
> +}
> +
>  /*
>   * XIVE Thread Interrupt Management context (KVM)
>   */
> @@ -674,3 +684,53 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>      /* Map all regions */
>      spapr_xive_map_mmio(xive);
>  }
> +
> +void kvmppc_xive_disconnect(sPAPRXive *xive, Error **errp)
> +{
> +    XiveSource *xsrc;
> +    struct kvm_destroy_device xive_destroy_device;
> +    size_t esb_len;
> +    int rc;
> +
> +    /* The KVM XIVE device is not in use */
> +    if (!xive || xive->fd == -1) {
> +        return;
> +    }
> +
> +    if (!kvmppc_has_cap_xive()) {
> +        error_setg(errp, "IRQ_XIVE capability must be present for KVM");
> +        return;
> +    }
> +
> +    /* Clear the KVM mapping */
> +    xsrc = &xive->source;
> +    esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
> +
> +    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 0);
> +    munmap(xsrc->esb_mmap, esb_len);
> +
> +    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 1);
> +
> +    sysbus_mmio_unmap(SYS_BUS_DEVICE(xive), 2);
> +    munmap(xive->tm_mmap, 4ull << TM_SHIFT);
> +
> +    /* Destroy the KVM device. This also clears the VCPU presenters */
> +    xive_destroy_device.fd = xive->fd;
> +    xive_destroy_device.flags = 0;
> +    rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xive_destroy_device);
> +    if (rc < 0) {
> +        error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XIVE");
> +    }
> +    close(xive->fd);
> +    xive->fd = -1;
> +
> +    kvm_kernel_irqchip = false;
> +    kvm_msi_via_irqfd_allowed = false;
> +    kvm_gsi_direct_mapping = false;
> +
> +    /* Clear the local list of presenter (hotplug) */
> +    kvm_cpu_disable_all();
> +
> +    /* VM Change state handler is not needed anymore */
> +    qemu_del_vm_change_state_handler(xive->change);
> +}
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index c6e1b630a404..373de3155f6b 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -51,6 +51,16 @@ typedef struct KVMEnabledICP {
>  static QLIST_HEAD(, KVMEnabledICP)
>      kvm_enabled_icps = QLIST_HEAD_INITIALIZER(&kvm_enabled_icps);
>  
> +static void kvm_disable_icps(void)
> +{
> +    KVMEnabledICP *enabled_icp, *next;
> +
> +    QLIST_FOREACH_SAFE(enabled_icp, &kvm_enabled_icps, node, next) {
> +        QLIST_REMOVE(enabled_icp, node);
> +        g_free(enabled_icp);
> +    }
> +}
> +
>  /*
>   * ICP-KVM
>   */
> @@ -360,3 +370,49 @@ fail:
>      kvmppc_define_rtas_kernel_token(0, "ibm,int-off");
>      return -1;
>  }
> +
> +int xics_kvm_disconnect(sPAPRMachineState *spapr, Error **errp)
> +{
> +    int rc;
> +    struct kvm_destroy_device xics_destroy_device = {
> +        .fd = kernel_xics_fd,
> +        .flags = 0,
> +    };
> +
> +    /* The KVM XICS device is not in use */
> +    if (kernel_xics_fd == -1) {
> +        return 0;
> +    }
> +
> +    if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
> +        error_setg(errp,
> +                   "KVM and IRQ_XICS capability must be present for KVM XICS device");
> +        return -1;
> +    }
> +
> +    rc = kvm_vm_ioctl(kvm_state, KVM_DESTROY_DEVICE, &xics_destroy_device);
> +    if (rc < 0) {
> +        error_setg_errno(errp, -rc, "Error on KVM_DESTROY_DEVICE for XICS");
> +    }
> +    close(kernel_xics_fd);
> +    kernel_xics_fd = -1;
> +
> +    spapr_rtas_unregister(RTAS_IBM_SET_XIVE);
> +    spapr_rtas_unregister(RTAS_IBM_GET_XIVE);
> +    spapr_rtas_unregister(RTAS_IBM_INT_OFF);
> +    spapr_rtas_unregister(RTAS_IBM_INT_ON);
> +
> +    kvmppc_define_rtas_kernel_token(0, "ibm,set-xive");
> +    kvmppc_define_rtas_kernel_token(0, "ibm,get-xive");
> +    kvmppc_define_rtas_kernel_token(0, "ibm,int-on");
> +    kvmppc_define_rtas_kernel_token(0, "ibm,int-off");
> +
> +    kvm_kernel_irqchip = false;
> +    kvm_msi_via_irqfd_allowed = false;
> +    kvm_gsi_direct_mapping = false;
> +
> +    /* Clear the presenter from the VCPUs */
> +    kvm_disable_icps();
> +
> +    return rc;
> +}

* Re: [Qemu-devel] [PATCH v2 11/13] spapr: check for the activation of the KVM IRQ device
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 11/13] spapr: check for the activation of " Cédric Le Goater
@ 2019-02-26  1:27   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2019-02-26  1:27 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:20PM +0100, Cédric Le Goater wrote:
> The activation of the KVM IRQ device depends on the interrupt mode
> chosen by the machine at CAS time, so some methods used at reset or
> during migration need to be protected.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
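The patch uses two conventions. State accessors that may run in either interrupt mode return early when the device fd is -1. The IRQ injection paths instead assert that the device is in use. A reduced sketch, using a hypothetical IrqDev type in place of sPAPRXive or the kernel_xics_fd global:

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int fd;         /* -1 when the KVM device is not in use */
    int synced;     /* number of state synchronizations performed */
    int injected;   /* number of interrupts forwarded to the kernel */
} IrqDev;

static bool irqdev_in_use(const IrqDev *d)
{
    return d->fd != -1;
}

/* Reached at reset or migration whatever the negotiated mode:
 * quietly does nothing when the device is not active. */
static int irqdev_sync_state(IrqDev *d)
{
    if (!irqdev_in_use(d)) {
        return 0;
    }
    d->synced++;
    return 0;
}

/* Only reachable when the mode is active, so a missing device is a
 * programming error rather than a runtime condition. */
static void irqdev_set_irq(IrqDev *d, int srcno, int val)
{
    assert(irqdev_in_use(d));
    (void)srcno;
    if (val) {
        d->injected++;
    }
}
```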

> ---
>  hw/intc/spapr_xive_kvm.c | 28 ++++++++++++++++++++++++++++
>  hw/intc/xics_kvm.c       | 26 +++++++++++++++++++++++++-
>  2 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index e31035c90260..cd81cdb23a5e 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -96,9 +96,15 @@ static void kvmppc_xive_cpu_set_state(XiveTCTX *tctx, Error **errp)
>  
>  void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
>  {
> +    sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
>      uint64_t state[4] = { 0 };
>      int ret;
>  
> +    /* The KVM XIVE device is not in use */
> +    if (xive->fd == -1) {
> +        return;
> +    }
> +
>      ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
>      if (ret != 0) {
>          error_setg_errno(errp, errno,
> @@ -152,6 +158,11 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
>      unsigned long vcpu_id;
>      int ret;
>  
> +    /* The KVM XIVE device is not in use */
> +    if (xive->fd == -1) {
> +        return;
> +    }
> +
>      /* Check if CPU was hot unplugged and replugged. */
>      if (kvm_cpu_is_enabled(tctx->cs)) {
>          return;
> @@ -330,9 +341,13 @@ static void kvmppc_xive_source_get_state(XiveSource *xsrc)
>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>  {
>      XiveSource *xsrc = opaque;
> +    sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
>      struct kvm_irq_level args;
>      int rc;
>  
> +    /* The KVM XIVE device should be in use */
> +    assert(xive->fd != -1);
> +
>      args.irq = srcno;
>      if (!xive_source_irq_is_lsi(xsrc, srcno)) {
>          if (!val) {
> @@ -519,6 +534,11 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>  
>  void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
>  {
> +    /* The KVM XIVE device is not in use */
> +    if (xive->fd == -1) {
> +        return;
> +    }
> +
>      /*
>       * When the VM is stopped, the sources are masked and the previous
>       * state is saved in anticipation of a migration. We should not
> @@ -544,6 +564,11 @@ int kvmppc_xive_pre_save(sPAPRXive *xive)
>  {
>      Error *local_err = NULL;
>  
> +    /* The KVM XIVE device is not in use */
> +    if (xive->fd == -1) {
> +        return 0;
> +    }
> +
>      /* EAT: there is no extra state to query from KVM */
>  
>      /* ENDT */
> @@ -568,6 +593,9 @@ int kvmppc_xive_post_load(sPAPRXive *xive, int version_id)
>      CPUState *cs;
>      int i;
>  
> +    /* The KVM XIVE device should be in use */
> +    assert(xive->fd != -1);
> +
>      /* Restore the ENDT first. The targetting depends on it. */
>      for (i = 0; i < xive->nr_ends; i++) {
>          kvmppc_xive_set_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index 373de3155f6b..9855316e4831 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -69,6 +69,11 @@ void icp_get_kvm_state(ICPState *icp)
>      uint64_t state;
>      int ret;
>  
> +    /* The KVM XICS device is not in use */
> +    if (kernel_xics_fd == -1) {
> +        return;
> +    }
> +
>      /* ICP for this CPU thread is not in use, exiting */
>      if (!icp->cs) {
>          return;
> @@ -105,6 +110,11 @@ int icp_set_kvm_state(ICPState *icp)
>      uint64_t state;
>      int ret;
>  
> +    /* The KVM XICS device is not in use */
> +    if (kernel_xics_fd == -1) {
> +        return 0;
> +    }
> +
>      /* ICP for this CPU thread is not in use, exiting */
>      if (!icp->cs) {
>          return 0;
> @@ -133,8 +143,9 @@ void icp_kvm_realize(DeviceState *dev, Error **errp)
>      unsigned long vcpu_id;
>      int ret;
>  
> +    /* The KVM XICS device is not in use */
>      if (kernel_xics_fd == -1) {
> -        abort();
> +        return;
>      }
>  
>      cs = icp->cs;
> @@ -170,6 +181,11 @@ void ics_get_kvm_state(ICSState *ics)
>      uint64_t state;
>      int i;
>  
> +    /* The KVM XICS device is not in use */
> +    if (kernel_xics_fd == -1) {
> +        return;
> +    }
> +
>      for (i = 0; i < ics->nr_irqs; i++) {
>          ICSIRQState *irq = &ics->irqs[i];
>  
> @@ -269,6 +285,11 @@ int ics_set_kvm_state(ICSState *ics)
>  {
>      int i;
>  
> +    /* The KVM XICS device is not in use */
> +    if (kernel_xics_fd == -1) {
> +        return 0;
> +    }
> +
>      for (i = 0; i < ics->nr_irqs; i++) {
>          int ret;
>  
> @@ -286,6 +307,9 @@ void ics_kvm_set_irq(ICSState *ics, int srcno, int val)
>      struct kvm_irq_level args;
>      int rc;
>  
> +    /* The KVM XICS device should be in use */
> +    assert(kernel_xics_fd != -1);
> +
>      args.irq = srcno + ics->offset;
>      if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
>          if (!val) {

* Re: [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped Cédric Le Goater
@ 2019-02-26  4:17   ` David Gibson
  2019-03-11 20:59     ` Cédric Le Goater
  0 siblings, 1 reply; 32+ messages in thread
From: David Gibson @ 2019-02-26  4:17 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:22PM +0100, Cédric Le Goater wrote:
> Instead of switching off the sources, set their state to PENDING to
> possibly catch a hotplug event occurring while the VM is stopped. At
> resume, check the previous state and if an interrupt was queued,
> generate a trigger.

First, I think it would be better to fold this fix into the patch
introducing the state change handlers.

Second, IIUC this would handle any instance of an irq being triggered
while the VM is stopped.  Hotplug interrupts are one obvious case of
that, but I'm not sure it's the only one.  VFIO devices could interrupt
while the VM is stopped, I think.  Maybe even emulated devices
depending on how their synchronization with the cpu run state works.
There might be other cases.  Does that sound right to you?
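The PQ handling in this patch can be modelled in a few lines. The encodings below match the XIVE_ESB_RESET/OFF/PENDING/QUEUED values used by the QEMU XIVE model. The function names are hypothetical. esb_set() mimics a SET_PQ_xx load, which installs a new value and returns the old one. The model never looks at where a trigger came from, which fits the point above: any event arriving while the VM is stopped, not only a hotplug interrupt, would be replayed at resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESB_RESET   0x0   /* P=0 Q=0 */
#define ESB_OFF     0x1   /* P=0 Q=1: source masked */
#define ESB_PENDING 0x2   /* P=1 Q=0 */
#define ESB_QUEUED  0x3   /* P=1 Q=1 */

/* Trigger: returns true when the event must be forwarded. */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        *pq = ESB_QUEUED;
        return false;
    default:              /* ESB_OFF: masked, nothing recorded */
        return false;
    }
}

/* Like a SET_PQ_xx load: install a new value, return the old one. */
static uint8_t esb_set(uint8_t *pq, uint8_t new_pq)
{
    uint8_t old = *pq;
    *pq = new_pq;
    return old;
}

/* VM stop: save the PQ and park enabled sources in PENDING, so that a
 * trigger moves them to QUEUED instead of being forwarded. */
static uint8_t vm_stop(uint8_t *pq)
{
    uint8_t saved = *pq;
    if (saved != ESB_OFF) {
        esb_set(pq, ESB_PENDING);
    }
    return saved;
}

/* VM resume: restore the saved PQ. If the source was idle and an event
 * was queued while stopped, replay it. Returns true on replay. */
static bool vm_resume(uint8_t *pq, uint8_t saved)
{
    uint8_t old = esb_set(pq, saved);
    if (saved == ESB_RESET && old == ESB_QUEUED) {
        return esb_trigger(pq);
    }
    return false;
}
```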

> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  hw/intc/spapr_xive_kvm.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index 99a829fb3f60..64d160babb26 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -500,8 +500,16 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>      if (running) {
>          for (i = 0; i < xsrc->nr_irqs; i++) {
>              uint8_t pq = xive_source_esb_get(xsrc, i);
> -            if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
> -                error_report("XIVE: IRQ %d has an invalid state", i);
> +            uint8_t old_pq;
> +
> +            old_pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8));
> +
> +            /*
> +             * If an interrupt was queued (hotplug event) while VM was
> +             * stopped, generate a trigger.
> +             */
> +            if (pq == XIVE_ESB_RESET && old_pq == XIVE_ESB_QUEUED) {
> +                xive_esb_trigger(xsrc, i);
>              }
>          }
>  
> @@ -515,7 +523,15 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>       * migration is in progress.
>       */
>      for (i = 0; i < xsrc->nr_irqs; i++) {
> -        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
> +        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
> +
> +        /*
> +         * PQ is set to PENDING to possibly catch a hotplug event
> +         * occurring while the VM is stopped.
> +         */
> +        if (pq != XIVE_ESB_OFF) {
> +            pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_10);
> +        }
>          xive_source_esb_set(xsrc, i, pq);
>      }
>  

* Re: [Qemu-devel] [PATCH v2 12/13] spapr: add KVM support to the 'dual' machine
  2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 12/13] spapr: add KVM support to the 'dual' machine Cédric Le Goater
@ 2019-02-28  5:15   ` David Gibson
  0 siblings, 0 replies; 32+ messages in thread
From: David Gibson @ 2019-02-28  5:15 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On Fri, Feb 22, 2019 at 02:13:21PM +0100, Cédric Le Goater wrote:
> The interrupt mode is chosen by the CAS negotiation process and
> activated after a reset to take into account the required changes in
> the machine. This brings new constraints on how the associated KVM IRQ
> device is initialized.
> 
> Currently, each model takes care of the initialization of the KVM
> device in their realize method but this is not possible anymore as the
> initialization needs to be done globally when the interrupt mode is
> known, i.e. when the machine is reset. It also means that we need a way
> to delete a KVM device when another mode is chosen.
> 
> Also, to support migration, the QEMU objects holding the state to
> transfer should always be available but not necessarily activated.
> 
> The overall approach of this proposal is to initialize both interrupt
> modes at the QEMU level and keep the IRQ number space in sync to allow
> switching from one mode to another. For the KVM side of things, the
> whole initialization of the KVM device, sources and presenters, is
> grouped in a single routine. The XICS and XIVE sPAPR IRQ reset
> handlers are modified accordingly to handle the init and the delete
> sequences of the KVM device.
> 
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  include/hw/ppc/spapr_xive.h |  1 +
>  hw/intc/spapr_xive.c        | 19 +++++++-
>  hw/intc/spapr_xive_kvm.c    | 27 +++++++++++
>  hw/intc/xics_kvm.c          | 26 ++++++++++
>  hw/intc/xive.c              |  4 --
>  hw/ppc/spapr_irq.c          | 97 ++++++++++++++++++++++++++++---------
>  6 files changed, 145 insertions(+), 29 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
> index a7c4c275a747..a1593ac2fcf0 100644
> --- a/include/hw/ppc/spapr_xive.h
> +++ b/include/hw/ppc/spapr_xive.h
> @@ -66,6 +66,7 @@ void spapr_xive_map_mmio(sPAPRXive *xive);
>  
>  int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
>                               uint32_t *out_server, uint8_t *out_prio);
> +void spapr_xive_late_realize(sPAPRXive *xive, Error **errp);
>  
>  /*
>   * KVM XIVE device helpers
> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
> index 21fe5e1aa39f..b0cbc2fe21ee 100644
> --- a/hw/intc/spapr_xive.c
> +++ b/hw/intc/spapr_xive.c
> @@ -278,7 +278,6 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>      XiveSource *xsrc = &xive->source;
>      XiveENDSource *end_xsrc = &xive->end_source;
>      Error *local_err = NULL;
> -    MachineState *machine = MACHINE(qdev_get_machine());
>  
>      if (!xive->nr_irqs) {
>          error_setg(errp, "Number of interrupt needs to be greater 0");
> @@ -329,6 +328,15 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>                             xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
>  
>      qemu_register_reset(spapr_xive_reset, dev);
> +}
> +
> +void spapr_xive_late_realize(sPAPRXive *xive, Error **errp)
> +{
> +    Error *local_err = NULL;
> +    MachineState *machine = MACHINE(qdev_get_machine());
> +    XiveSource *xsrc = &xive->source;
> +    XiveENDSource *end_xsrc = &xive->end_source;
> +    static bool once;
>  
>      if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
>          kvmppc_xive_connect(xive, &local_err);
> @@ -351,6 +359,15 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>          error_report_err(local_err);
>      }
>  
> +    /*
> +     * TODO: Emulated mode can only be initialized once. Should we
> +     * store the information under the device model for later usage ?
> +     */
> +    if (once) {
> +        return;
> +    }
> +    once = true;

Urgh.  static locals are a bad smell.  I think at least this flag
should go into the instance structure (if we can't deduce it from
something else in there).
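A sketch of the suggested change, with the flag moved from a function-level static into the instance. The IrqModel type and its field names are hypothetical. With the flag per instance, a second instance still gets initialized. A single static would skip it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; 'emulated_ready' replaces 'static bool once'. */
typedef struct {
    bool emulated_ready;
    int init_count;      /* how many times emulated init actually ran */
} IrqModel;

static void irq_model_init_emulated(IrqModel *m)
{
    m->init_count++;     /* stands in for TIMA/ESB region setup */
}

static void irq_model_late_realize(IrqModel *m)
{
    if (m->emulated_ready) {
        return;          /* already initialized on a previous reset */
    }
    irq_model_init_emulated(m);
    m->emulated_ready = true;
}
```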

> +
>      /* TIMA initialization */
>      memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
>                            "xive.tima", 4ull << TM_SHIFT);
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index cd81cdb23a5e..99a829fb3f60 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -657,6 +657,15 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>      Error *local_err = NULL;
>      size_t esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
>      size_t tima_len = 4ull << TM_SHIFT;
> +    CPUState *cs;
> +
> +    /*
> +     * The KVM XIVE device is already in use. This is the case when
> +     * rebooting XIVE -> XIVE

I take it from this we're leaving the XIVE KVM device initialized if
we don't change to XICS during CAS.  Would it make things simpler if
we always removed both the XICS and XIVE KVM devices at reset and
recreated the one we need at CAS?
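The two reset policies under discussion can be compared in a small model. It uses a hypothetical Machine with one fd per KVM device, and fake fds and a creation counter stand in for the real ioctls. The patch keeps the current device when the mode is unchanged and makes connect idempotent. The alternative drops both devices at every reset and creates only the one selected at CAS.

```c
#include <assert.h>

enum irq_mode { MODE_XICS, MODE_XIVE };

typedef struct {
    int xics_fd;     /* -1 when the KVM XICS device is not in use */
    int xive_fd;     /* -1 when the KVM XIVE device is not in use */
    int next_fd;     /* fake fd allocator for the sketch */
    int creations;   /* number of KVM devices created so far */
} Machine;

static void connect_dev(Machine *m, int *fd)
{
    if (*fd != -1) {
        return;      /* already connected, e.g. a XIVE -> XIVE reboot */
    }
    *fd = m->next_fd++;
    m->creations++;
}

static void disconnect_dev(int *fd)
{
    *fd = -1;        /* stands in for destroy + close() */
}

/* Policy in the patch: keep the device if the mode is unchanged. */
static void reset_keep(Machine *m, enum irq_mode mode)
{
    if (mode == MODE_XIVE) {
        disconnect_dev(&m->xics_fd);
        connect_dev(m, &m->xive_fd);
    } else {
        disconnect_dev(&m->xive_fd);
        connect_dev(m, &m->xics_fd);
    }
}

/* Alternative raised in review: always start from a clean slate. */
static void reset_recreate(Machine *m, enum irq_mode mode)
{
    disconnect_dev(&m->xics_fd);
    disconnect_dev(&m->xive_fd);
    connect_dev(m, mode == MODE_XIVE ? &m->xive_fd : &m->xics_fd);
}
```

The second policy costs one extra device creation per reboot. In return, reset always follows a single path, and the "already in use" early returns are no longer needed.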

> +     */
> +    if (xive->fd != -1) {
> +        return;
> +    }
>  
>      if (!kvmppc_has_cap_xive()) {
>          error_setg(errp, "IRQ_XIVE capability must be present for KVM");
> @@ -705,6 +714,24 @@ void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>      xive->change = qemu_add_vm_change_state_handler(
>          kvmppc_xive_change_state_handler, xive);
>  
> +    /* Connect the presenters to the initial VCPUs of the machine */
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        kvmppc_xive_cpu_connect(spapr_cpu_state(cpu)->tctx, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
> +    /* Update the KVM sources */
> +    kvmppc_xive_source_reset(xsrc, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      kvm_kernel_irqchip = true;
>      kvm_msi_via_irqfd_allowed = true;
>      kvm_gsi_direct_mapping = true;
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index 9855316e4831..8ffd4c7a36f8 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -33,6 +33,7 @@
>  #include "trace.h"
>  #include "sysemu/kvm.h"
>  #include "hw/ppc/spapr.h"
> +#include "hw/ppc/spapr_cpu_core.h"
>  #include "hw/ppc/xics.h"
>  #include "hw/ppc/xics_spapr.h"
>  #include "kvm_ppc.h"
> @@ -337,6 +338,16 @@ static void rtas_dummy(PowerPCCPU *cpu, sPAPRMachineState *spapr,
>  int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
>  {
>      int rc;
> +    CPUState *cs;
> +    Error *local_err = NULL;
> +
> +    /*
> +     * The KVM XICS device is already in use. This is the case when
> +     * rebooting XICS -> XICS
> +     */
> +    if (kernel_xics_fd != -1) {
> +        return 0;
> +    }
>  
>      if (!kvm_enabled() || !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
>          error_setg(errp,
> @@ -385,6 +396,21 @@ int xics_kvm_init(sPAPRMachineState *spapr, Error **errp)
>      kvm_msi_via_irqfd_allowed = true;
>      kvm_gsi_direct_mapping = true;
>  
> +    /* Connect the presenters to the initial VCPUs of the machine */
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +        icp_kvm_realize(DEVICE(spapr_cpu_state(cpu)->icp), &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            goto fail;
> +        }
> +        icp_set_kvm_state(spapr_cpu_state(cpu)->icp);
> +    }
> +
> +    /* Update the KVM sources */
> +    ics_set_kvm_state(spapr->ics);
> +
>      return 0;
>  
>  fail:
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 1f8e923ca654..715d5a7e65ed 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -929,10 +929,6 @@ static void xive_source_reset(void *dev)
>  
>      /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
>      memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
> -
> -    if (kvm_irqchip_in_kernel()) {
> -        kvmppc_xive_source_reset(xsrc, &error_fatal);
> -    }
>  }
>  
>  static void xive_source_realize(DeviceState *dev, Error **errp)
> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
> index 3176098b9f7c..f8260c14aecd 100644
> --- a/hw/ppc/spapr_irq.c
> +++ b/hw/ppc/spapr_irq.c
> @@ -92,35 +92,55 @@ error:
>      return NULL;
>  }
>  
> -static void spapr_irq_init_xics(sPAPRMachineState *spapr, int nr_irqs,
> -                                Error **errp)
> +static void spapr_ics_late_realize(sPAPRMachineState *spapr, Error **errp)
>  {
>      MachineState *machine = MACHINE(spapr);
>      Error *local_err = NULL;
> -    bool xics_kvm = false;
> +    static bool once;
>  
> -    if (kvm_enabled()) {
> -        if (machine_kernel_irqchip_allowed(machine) &&
> -            !xics_kvm_init(spapr, &local_err)) {
> -            xics_kvm = true;
> -        }
> -        if (machine_kernel_irqchip_required(machine) && !xics_kvm) {
> +    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
> +        xics_kvm_init(spapr, &local_err);
> +        if (local_err && machine_kernel_irqchip_required(machine)) {
>              error_prepend(&local_err,
>                            "kernel_irqchip requested but unavailable: ");
> -            goto error;
> +            error_propagate(errp, local_err);
> +            return;
>          }
> -        error_free(local_err);
> -        local_err = NULL;
> +
> +        if (!local_err) {
> +            return;
> +        }
> +
> +        /*
> +         * We failed to initialize the XIVE KVM device, fallback to
> +         * emulated mode
> +         */
> +        error_prepend(&local_err, "kernel_irqchip allowed but unavailable: ");
> +        error_report_err(local_err);

Should only warn (at most) in this case, since fallback is permitted
and should work.
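
The policy described here can be modelled as a small standalone function. This is a sketch, not QEMU code. The enum, the function name and the boolean inputs are hypothetical. They stand in for kvm_enabled(), machine_kernel_irqchip_allowed(), machine_kernel_irqchip_required() and the outcome of the KVM device setup. Only the "required" case is fatal. The fallback case emits a warning at most.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical model of the kernel_irqchip policy. The names are
 * illustrative only and are not QEMU APIs.
 */
typedef enum {
    IRQCHIP_KVM,      /* in-kernel irqchip connected */
    IRQCHIP_EMULATED, /* emulated interrupt controller in QEMU */
    IRQCHIP_FATAL,    /* kernel_irqchip required but unavailable */
} irqchip_choice;

static irqchip_choice choose_irqchip(bool kvm_enabled, bool allowed,
                                     bool required, bool kvm_init_ok)
{
    /* kernel_irqchip=on implies that it is also allowed */
    assert(allowed || !required);

    if (!kvm_enabled || !allowed) {
        return IRQCHIP_EMULATED;     /* KVM device is never attempted */
    }
    if (kvm_init_ok) {
        return IRQCHIP_KVM;
    }
    if (required) {
        return IRQCHIP_FATAL;        /* error_propagate() and fail */
    }
    /* Fallback is permitted, so warn (at most) and keep going */
    fprintf(stderr, "warning: kernel_irqchip allowed but unavailable\n");
    return IRQCHIP_EMULATED;
}
```

Keeping the "required" and "allowed" outcomes separate is what lets a setup without a usable KVM device continue on the emulated model instead of failing.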

>      }
>  
> -    if (!xics_kvm) {
> -        xics_spapr_init(spapr);
> +    /*
> +     * TODO: Emulated mode can only be initialized once. Should we
> +     * store the information under the device model for later usage ?
> +     */
> +    if (once) {
> +        return;
>      }
> +    once = true;
>  
> -    spapr->ics = spapr_ics_create(spapr, nr_irqs, &local_err);
> +    xics_spapr_init(spapr);
> +}
>  
> -error:
> -    error_propagate(errp, local_err);
> +static void spapr_irq_init_xics(sPAPRMachineState *spapr, int nr_irqs,
> +                                Error **errp)
> +{
> +    Error *local_err = NULL;
> +
> +    spapr->ics = spapr_ics_create(spapr, nr_irqs, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
>  }
>  
>  #define ICS_IRQ_FREE(ics, srcno)   \
> @@ -227,7 +247,13 @@ static void spapr_irq_set_irq_xics(void *opaque, int srcno, int val)
>  
>  static void spapr_irq_reset_xics(sPAPRMachineState *spapr, Error **errp)
>  {
> -    /* TODO: create the KVM XICS device */
> +    Error *local_err = NULL;
> +
> +    spapr_ics_late_realize(spapr, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
>  }
>  
>  static const char *spapr_irq_get_nodename_xics(sPAPRMachineState *spapr)
> @@ -386,6 +412,7 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>  {
>      CPUState *cs;
> +    Error *local_err = NULL;
>  
>      CPU_FOREACH(cs) {
>          PowerPCCPU *cpu = POWERPC_CPU(cs);
> @@ -394,6 +421,12 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
>          spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
>      }
>  
> +    spapr_xive_late_realize(spapr->xive, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      /* Activate the XIVE MMIOs */
>      spapr_xive_mmio_set_enabled(spapr->xive, true);
>  }
> @@ -462,14 +495,8 @@ static sPAPRIrq *spapr_irq_current(sPAPRMachineState *spapr)
>  static void spapr_irq_init_dual(sPAPRMachineState *spapr, int nr_irqs,
>                                  Error **errp)
>  {
> -    MachineState *machine = MACHINE(spapr);
>      Error *local_err = NULL;
>  
> -    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
> -        error_setg(errp, "No KVM support for the 'dual' machine");
> -        return;
> -    }
> -
>      spapr_irq_xics.init(spapr, spapr_irq_xics.nr_irqs, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
> @@ -548,6 +575,9 @@ static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
>       * defaults to XICS at startup.
>       */
>      if (spapr_ovec_test(spapr->ov5_cas, OV5_XIVE_EXPLOIT)) {
> +        if (kvm_irqchip_in_kernel()) {
> +            xics_kvm_disconnect(spapr, &error_fatal);

Yeah, this is kinda nasty.  I'm wondering if we could make things
simpler by always using the emulated irqchip until CAS and only setting
up the kernel irqchip then.

> +        }
>          spapr_irq_xive.reset(spapr, &error_fatal);
>      }
>  
> @@ -556,12 +586,30 @@ static int spapr_irq_post_load_dual(sPAPRMachineState *spapr, int version_id)
>  
>  static void spapr_irq_reset_dual(sPAPRMachineState *spapr, Error **errp)
>  {
> +    Error *local_err = NULL;
> +
>      /*
>       * Deactivate the XIVE MMIOs. The XIVE backend will reenable them
>       * if selected.
>       */
>      spapr_xive_mmio_set_enabled(spapr->xive, false);
>  
> +    /* Destroy all KVM devices */
> +    if (kvm_irqchip_in_kernel()) {
> +        xics_kvm_disconnect(spapr, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            error_prepend(errp, "KVM XICS disconnect failed: ");
> +            return;
> +        }
> +        kvmppc_xive_disconnect(spapr->xive, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            error_prepend(errp, "KVM XIVE disconnect failed: ");
> +            return;
> +        }
> +    }
> +
>      spapr_irq_current(spapr)->reset(spapr, errp);
>  }
>  
> @@ -748,6 +796,7 @@ sPAPRIrq spapr_irq_xics_legacy = {
>      .dt_populate = spapr_dt_xics,
>      .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
>      .post_load   = spapr_irq_post_load_xics,
> +    .reset       = spapr_irq_reset_xics,
>      .set_irq     = spapr_irq_set_irq_xics,
>      .get_nodename = spapr_irq_get_nodename_xics,
>  };

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support
  2019-02-25  5:55   ` David Gibson
@ 2019-03-11 15:53     ` Cédric Le Goater
  0 siblings, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-03-11 15:53 UTC
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On 2/25/19 6:55 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:10PM +0100, Cédric Le Goater wrote:
>> This introduces a set of helpers, used when KVM is enabled, which
>> create the KVM XIVE device, initialize the interrupt sources at the
>> KVM level and connect the interrupt presenters to the vCPUs.
>>
>> They also handle the initialization of the TIMA and the source ESB
>> memory regions of the controller. These have a different type under
>> KVM: they are 'ram device' memory mappings, similar to VFIO, exposed
>> to the guest, and the associated VMAs on the host are populated
>> dynamically with the appropriate pages by a fault handler.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  default-configs/ppc64-softmmu.mak |   1 +
>>  include/hw/ppc/spapr_xive.h       |  10 ++
>>  include/hw/ppc/xive.h             |  13 ++
>>  target/ppc/kvm_ppc.h              |   6 +
>>  hw/intc/spapr_xive.c              |  48 +++++-
>>  hw/intc/spapr_xive_kvm.c          | 237 ++++++++++++++++++++++++++++++
>>  hw/intc/xive.c                    |  21 ++-
>>  hw/ppc/spapr_irq.c                |   6 +-
>>  target/ppc/kvm.c                  |   7 +
>>  hw/intc/Makefile.objs             |   1 +
>>  10 files changed, 340 insertions(+), 10 deletions(-)
>>  create mode 100644 hw/intc/spapr_xive_kvm.c
>>
>> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
>> index 7f34ad0528ed..c1bf5cd951f5 100644
>> --- a/default-configs/ppc64-softmmu.mak
>> +++ b/default-configs/ppc64-softmmu.mak
>> @@ -18,6 +18,7 @@ CONFIG_XICS_SPAPR=$(CONFIG_PSERIES)
>>  CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>>  CONFIG_XIVE=$(CONFIG_PSERIES)
>>  CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES)
>> +CONFIG_XIVE_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM))
>>  CONFIG_MEM_DEVICE=y
>>  CONFIG_DIMM=y
>>  CONFIG_SPAPR_RNG=y
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 2d31f24e3bfe..ab6732b14a02 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -38,6 +38,10 @@ typedef struct sPAPRXive {
>>      /* TIMA mapping address */
>>      hwaddr        tm_base;
>>      MemoryRegion  tm_mmio;
>> +
>> +    /* KVM support */
>> +    int           fd;
>> +    void          *tm_mmap;
>>  } sPAPRXive;
>>  
>>  bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
>> @@ -49,5 +53,11 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>                     uint32_t phandle);
>>  void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
>>  void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
>> +void spapr_xive_map_mmio(sPAPRXive *xive);
>> +
>> +/*
>> + * KVM XIVE device helpers
>> + */
>> +void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 13a487527b11..061d43fea24d 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -140,6 +140,7 @@
>>  #ifndef PPC_XIVE_H
>>  #define PPC_XIVE_H
>>  
>> +#include "sysemu/kvm.h"
>>  #include "hw/qdev-core.h"
>>  #include "hw/sysbus.h"
>>  #include "hw/ppc/xive_regs.h"
>> @@ -194,6 +195,9 @@ typedef struct XiveSource {
>>      uint32_t        esb_shift;
>>      MemoryRegion    esb_mmio;
>>  
>> +    /* KVM support */
>> +    void            *esb_mmap;
>> +
>>      XiveNotifier    *xive;
>>  } XiveSource;
>>  
>> @@ -419,4 +423,13 @@ static inline uint32_t xive_nvt_cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
>>      return (nvt_blk << 19) | nvt_idx;
>>  }
>>  
>> +/*
>> + * KVM XIVE device helpers
>> + */
>> +
>> +void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp);
>> +void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
>> +void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
>> +void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
>> +
>>  #endif /* PPC_XIVE_H */
>> diff --git a/target/ppc/kvm_ppc.h b/target/ppc/kvm_ppc.h
>> index bdfaa4e70a83..d2159660f9f2 100644
>> --- a/target/ppc/kvm_ppc.h
>> +++ b/target/ppc/kvm_ppc.h
>> @@ -59,6 +59,7 @@ bool kvmppc_has_cap_fixup_hcalls(void);
>>  bool kvmppc_has_cap_htm(void);
>>  bool kvmppc_has_cap_mmu_radix(void);
>>  bool kvmppc_has_cap_mmu_hash_v3(void);
>> +bool kvmppc_has_cap_xive(void);
>>  int kvmppc_get_cap_safe_cache(void);
>>  int kvmppc_get_cap_safe_bounds_check(void);
>>  int kvmppc_get_cap_safe_indirect_branch(void);
>> @@ -307,6 +308,11 @@ static inline bool kvmppc_has_cap_mmu_hash_v3(void)
>>      return false;
>>  }
>>  
>> +static inline bool kvmppc_has_cap_xive(void)
>> +{
>> +    return false;
>> +}
>> +
>>  static inline int kvmppc_get_cap_safe_cache(void)
>>  {
>>      return 0;
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 06e3c9fdbfeb..c24d649e3668 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -173,7 +173,7 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>>      }
>>  }
>>  
>> -static void spapr_xive_map_mmio(sPAPRXive *xive)
>> +void spapr_xive_map_mmio(sPAPRXive *xive)
>>  {
>>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 0, xive->vc_base);
>>      sysbus_mmio_map(SYS_BUS_DEVICE(xive), 1, xive->end_base);
>> @@ -251,6 +251,9 @@ static void spapr_xive_instance_init(Object *obj)
>>                        TYPE_XIVE_END_SOURCE);
>>      object_property_add_child(obj, "end_source", OBJECT(&xive->end_source),
>>                                NULL);
>> +
>> +    /* Not connected to the KVM XIVE device */
>> +    xive->fd = -1;
>>  }
>>  
>>  static void spapr_xive_realize(DeviceState *dev, Error **errp)
>> @@ -259,6 +262,7 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>      XiveSource *xsrc = &xive->source;
>>      XiveENDSource *end_xsrc = &xive->end_source;
>>      Error *local_err = NULL;
>> +    MachineState *machine = MACHINE(qdev_get_machine());
>>  
>>      if (!xive->nr_irqs) {
>>          error_setg(errp, "Number of interrupt needs to be greater 0");
>> @@ -305,6 +309,32 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>      xive->eat = g_new0(XiveEAS, xive->nr_irqs);
>>      xive->endt = g_new0(XiveEND, xive->nr_ends);
>>  
>> +    xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
>> +                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
>> +
>> +    qemu_register_reset(spapr_xive_reset, dev);
>> +
>> +    if (kvm_enabled() && machine_kernel_irqchip_allowed(machine)) {
>> +        kvmppc_xive_connect(xive, &local_err);
>> +        if (local_err && machine_kernel_irqchip_required(machine)) {
>> +            error_prepend(&local_err,
>> +                          "kernel_irqchip requested but unavailable: ");
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>> +
>> +        if (!local_err) {
>> +            return;
>> +        }
>> +
>> +        /*
>> +         * We failed to initialize the XIVE KVM device, fallback to
>> +         * emulated mode
>> +         */
>> +        error_prepend(&local_err, "kernel_irqchip allowed but unavailable: ");
>> +        error_report_err(local_err);
> 
> Since we can fall back, this should probably just be
> warn_report_err().  Maybe not even that, for the case where the host
> kernel doesn't support KVM XIVE at all.

Let's use a warn_report_err().
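
This sketch shows how the report severity could be chosen once kvmppc_xive_connect() has failed. It covers both options raised in the review. The enum and function are hypothetical. has_cap_xive stands in for kvmppc_has_cap_xive(), and allowed/required stand in for the kernel_irqchip machine options. The REPORT_NONE branch for a host without KVM XIVE corresponds to the "maybe not even that" option. The reply above settles on warn_report_err().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: how loudly to report a failed kvmppc_xive_connect() */
typedef enum { REPORT_NONE, REPORT_WARN, REPORT_ERROR } report_level;

static report_level xive_kvm_report_level(bool allowed, bool required,
                                          bool has_cap_xive)
{
    /* kernel_irqchip=on implies that it is also allowed */
    assert(allowed || !required);

    if (required) {
        return REPORT_ERROR;  /* realize fails: error_propagate() */
    }
    if (!allowed || !has_cap_xive) {
        return REPORT_NONE;   /* not attempted, or host has no KVM XIVE */
    }
    return REPORT_WARN;       /* capability present but setup failed */
}
```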

Thanks, 

C. 

 
>> +    }
>> +
>>      /* TIMA initialization */
>>      memory_region_init_io(&xive->tm_mmio, OBJECT(xive), &xive_tm_ops, xive,
>>                            "xive.tima", 4ull << TM_SHIFT);
>> @@ -316,11 +346,6 @@ static void spapr_xive_realize(DeviceState *dev, Error **errp)
>>  
>>      /* Map all regions */
>>      spapr_xive_map_mmio(xive);
>> -
>> -    xive->nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
>> -                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
>> -
>> -    qemu_register_reset(spapr_xive_reset, dev);
>>  }
>>  
>>  static int spapr_xive_get_eas(XiveRouter *xrtr, uint8_t eas_blk,
>> @@ -495,6 +520,17 @@ bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi)
>>      if (lsi) {
>>          xive_source_irq_set_lsi(xsrc, lisn);
>>      }
>> +
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_source_reset_one(xsrc, lisn, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return false;
>> +        }
>> +    }
>> +
>>      return true;
>>  }
>>  
>> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
>> new file mode 100644
>> index 000000000000..623fbf74f23e
>> --- /dev/null
>> +++ b/hw/intc/spapr_xive_kvm.c
>> @@ -0,0 +1,237 @@
>> +/*
>> + * QEMU PowerPC sPAPR XIVE interrupt controller model
>> + *
>> + * Copyright (c) 2017-2019, IBM Corporation.
>> + *
>> + * This code is licensed under the GPL version 2 or later. See the
>> + * COPYING file in the top-level directory.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu/log.h"
>> +#include "qemu/error-report.h"
>> +#include "qapi/error.h"
>> +#include "target/ppc/cpu.h"
>> +#include "sysemu/cpus.h"
>> +#include "sysemu/kvm.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/ppc/spapr_xive.h"
>> +#include "hw/ppc/xive.h"
>> +#include "kvm_ppc.h"
>> +
>> +#include <sys/ioctl.h>
>> +
>> +/*
>> + * Helpers for CPU hotplug
>> + *
>> + * TODO: make a common KVMEnabledCPU layer for XICS and XIVE
>> + */
>> +typedef struct KVMEnabledCPU {
>> +    unsigned long vcpu_id;
>> +    QLIST_ENTRY(KVMEnabledCPU) node;
>> +} KVMEnabledCPU;
>> +
>> +static QLIST_HEAD(, KVMEnabledCPU)
>> +    kvm_enabled_cpus = QLIST_HEAD_INITIALIZER(&kvm_enabled_cpus);
>> +
>> +static bool kvm_cpu_is_enabled(CPUState *cs)
>> +{
>> +    KVMEnabledCPU *enabled_cpu;
>> +    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
>> +
>> +    QLIST_FOREACH(enabled_cpu, &kvm_enabled_cpus, node) {
>> +        if (enabled_cpu->vcpu_id == vcpu_id) {
>> +            return true;
>> +        }
>> +    }
>> +    return false;
>> +}
>> +
>> +static void kvm_cpu_enable(CPUState *cs)
>> +{
>> +    KVMEnabledCPU *enabled_cpu;
>> +    unsigned long vcpu_id = kvm_arch_vcpu_id(cs);
>> +
>> +    enabled_cpu = g_malloc(sizeof(*enabled_cpu));
>> +    enabled_cpu->vcpu_id = vcpu_id;
>> +    QLIST_INSERT_HEAD(&kvm_enabled_cpus, enabled_cpu, node);
>> +}
>> +
>> +/*
>> + * XIVE Thread Interrupt Management context (KVM)
>> + */
>> +
>> +void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
>> +{
>> +    sPAPRXive *xive = SPAPR_MACHINE(qdev_get_machine())->xive;
>> +    unsigned long vcpu_id;
>> +    int ret;
>> +
>> +    /* Check if CPU was hot unplugged and replugged. */
>> +    if (kvm_cpu_is_enabled(tctx->cs)) {
>> +        return;
>> +    }
>> +
>> +    vcpu_id = kvm_arch_vcpu_id(tctx->cs);
>> +
>> +    ret = kvm_vcpu_enable_cap(tctx->cs, KVM_CAP_PPC_IRQ_XIVE, 0, xive->fd,
>> +                              vcpu_id, 0);
>> +    if (ret < 0) {
>> +        error_setg(errp, "XIVE: unable to connect CPU%ld to KVM device: %s",
>> +                   vcpu_id, strerror(errno));
>> +        return;
>> +    }
>> +
>> +    kvm_cpu_enable(tctx->cs);
>> +}
>> +
>> +/*
>> + * XIVE Interrupt Source (KVM)
>> + */
>> +
>> +/*
>> + * At reset, the interrupt sources are simply created and MASKED. We
>> + * only need to inform the KVM XIVE device about their type: LSI or
>> + * MSI.
>> + */
>> +void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp)
>> +{
>> +    sPAPRXive *xive = SPAPR_XIVE(xsrc->xive);
>> +    uint64_t state = 0;
>> +
>> +    if (xive_source_irq_is_lsi(xsrc, srcno)) {
>> +        state |= KVM_XIVE_LEVEL_SENSITIVE;
>> +        if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
>> +            state |= KVM_XIVE_LEVEL_ASSERTED;
>> +        }
>> +    }
>> +
>> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE, srcno, &state,
>> +                      true, errp);
>> +}
>> +
>> +void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < xsrc->nr_irqs; i++) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_source_reset_one(xsrc, i, &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>> +void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>> +{
>> +    XiveSource *xsrc = opaque;
>> +    struct kvm_irq_level args;
>> +    int rc;
>> +
>> +    args.irq = srcno;
>> +    if (!xive_source_irq_is_lsi(xsrc, srcno)) {
>> +        if (!val) {
>> +            return;
>> +        }
>> +        args.level = KVM_INTERRUPT_SET;
>> +    } else {
>> +        if (val) {
>> +            xsrc->status[srcno] |= XIVE_STATUS_ASSERTED;
>> +            args.level = KVM_INTERRUPT_SET_LEVEL;
>> +        } else {
>> +            xsrc->status[srcno] &= ~XIVE_STATUS_ASSERTED;
>> +            args.level = KVM_INTERRUPT_UNSET;
>> +        }
>> +    }
>> +    rc = kvm_vm_ioctl(kvm_state, KVM_IRQ_LINE, &args);
>> +    if (rc < 0) {
>> +        error_report("XIVE: kvm_irq_line() failed : %s", strerror(errno));
>> +    }
>> +}
>> +
>> +/*
>> + * sPAPR XIVE interrupt controller (KVM)
>> + */
>> +
>> +static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
>> +                              Error **errp)
>> +{
>> +    void *addr;
>> +    uint32_t page_shift = 16; /* TODO: fix page_shift */
>> +
>> +    addr = mmap(NULL, len, PROT_WRITE | PROT_READ, MAP_SHARED, xive->fd,
>> +                pgoff << page_shift);
>> +    if (addr == MAP_FAILED) {
>> +        error_setg_errno(errp, errno, "XIVE: unable to set memory mapping");
>> +        return NULL;
>> +    }
>> +
>> +    return addr;
>> +}
>> +
>> +/*
>> + * All the XIVE memory regions are now backed by mappings from the KVM
>> + * XIVE device.
>> + */
>> +void kvmppc_xive_connect(sPAPRXive *xive, Error **errp)
>> +{
>> +    XiveSource *xsrc = &xive->source;
>> +    XiveENDSource *end_xsrc = &xive->end_source;
>> +    Error *local_err = NULL;
>> +    size_t esb_len = (1ull << xsrc->esb_shift) * xsrc->nr_irqs;
>> +    size_t tima_len = 4ull << TM_SHIFT;
>> +
>> +    if (!kvmppc_has_cap_xive()) {
>> +        error_setg(errp, "IRQ_XIVE capability must be present for KVM");
>> +        return;
>> +    }
>> +
>> +    /* First, create the KVM XIVE device */
>> +    xive->fd = kvm_create_device(kvm_state, KVM_DEV_TYPE_XIVE, false);
>> +    if (xive->fd < 0) {
>> +        error_setg_errno(errp, -xive->fd, "XIVE: error creating KVM device");
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * 1. Source ESB pages - KVM mapping
>> +     */
>> +    xsrc->esb_mmap = kvmppc_xive_mmap(xive, KVM_XIVE_ESB_PAGE_OFFSET, esb_len,
>> +                                      &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    memory_region_init_ram_device_ptr(&xsrc->esb_mmio, OBJECT(xsrc),
>> +                                      "xive.esb", esb_len, xsrc->esb_mmap);
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xsrc->esb_mmio);
>> +
>> +    /*
>> +     * 2. END ESB pages (No KVM support yet)
>> +     */
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &end_xsrc->esb_mmio);
>> +
>> +    /*
>> +     * 3. TIMA pages - KVM mapping
>> +     */
>> +    xive->tm_mmap = kvmppc_xive_mmap(xive, KVM_XIVE_TIMA_PAGE_OFFSET, tima_len,
>> +                                     &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +    memory_region_init_ram_device_ptr(&xive->tm_mmio, OBJECT(xive),
>> +                                      "xive.tima", tima_len, xive->tm_mmap);
>> +    sysbus_init_mmio(SYS_BUS_DEVICE(xive), &xive->tm_mmio);
>> +
>> +    kvm_kernel_irqchip = true;
>> +    kvm_msi_via_irqfd_allowed = true;
>> +    kvm_gsi_direct_mapping = true;
>> +
>> +    /* Map all regions */
>> +    spapr_xive_map_mmio(xive);
>> +}
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index daa7badc8492..0284b5803551 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -491,6 +491,15 @@ static void xive_tctx_realize(DeviceState *dev, Error **errp)
>>          return;
>>      }
>>  
>> +    /* Connect the presenter to the VCPU (required for CPU hotplug) */
>> +    if (kvm_irqchip_in_kernel()) {
>> +        kvmppc_xive_cpu_connect(tctx, &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>> +    }
>> +
>>      qemu_register_reset(xive_tctx_reset, dev);
>>  }
>>  
>> @@ -893,6 +902,10 @@ static void xive_source_reset(void *dev)
>>  
>>      /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */
>>      memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs);
>> +
>> +    if (kvm_irqchip_in_kernel()) {
>> +        kvmppc_xive_source_reset(xsrc, &error_fatal);
>> +    }
>>  }
>>  
>>  static void xive_source_realize(DeviceState *dev, Error **errp)
>> @@ -926,9 +939,11 @@ static void xive_source_realize(DeviceState *dev, Error **errp)
>>      xsrc->status = g_malloc0(xsrc->nr_irqs);
>>      xsrc->lsi_map = bitmap_new(xsrc->nr_irqs);
>>  
>> -    memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
>> -                          &xive_source_esb_ops, xsrc, "xive.esb",
>> -                          (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
>> +    if (!kvm_irqchip_in_kernel()) {
>> +        memory_region_init_io(&xsrc->esb_mmio, OBJECT(xsrc),
>> +                              &xive_source_esb_ops, xsrc, "xive.esb",
>> +                              (1ull << xsrc->esb_shift) * xsrc->nr_irqs);
>> +    }
>>  
>>      qemu_register_reset(xive_source_reset, dev);
>>  }
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 4145079d7fa5..6e1c36dc62ca 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -387,7 +387,11 @@ static void spapr_irq_set_irq_xive(void *opaque, int srcno, int val)
>>  {
>>      sPAPRMachineState *spapr = opaque;
>>  
>> -    xive_source_set_irq(&spapr->xive->source, srcno, val);
>> +    if (kvm_irqchip_in_kernel()) {
>> +        kvmppc_xive_source_set_irq(&spapr->xive->source, srcno, val);
>> +    } else {
>> +        xive_source_set_irq(&spapr->xive->source, srcno, val);
>> +    }
>>  }
>>  
>>  static const char *spapr_irq_get_nodename_xive(sPAPRMachineState *spapr)
>> diff --git a/target/ppc/kvm.c b/target/ppc/kvm.c
>> index d01852fe3112..43e42e3c2af9 100644
>> --- a/target/ppc/kvm.c
>> +++ b/target/ppc/kvm.c
>> @@ -85,6 +85,7 @@ static int cap_fixup_hcalls;
>>  static int cap_htm;             /* Hardware transactional memory support */
>>  static int cap_mmu_radix;
>>  static int cap_mmu_hash_v3;
>> +static int cap_xive;
>>  static int cap_resize_hpt;
>>  static int cap_ppc_pvr_compat;
>>  static int cap_ppc_safe_cache;
>> @@ -148,6 +149,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
>>      cap_htm = kvm_vm_check_extension(s, KVM_CAP_PPC_HTM);
>>      cap_mmu_radix = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_RADIX);
>>      cap_mmu_hash_v3 = kvm_vm_check_extension(s, KVM_CAP_PPC_MMU_HASH_V3);
>> +    cap_xive = kvm_vm_check_extension(s, KVM_CAP_PPC_IRQ_XIVE);
>>      cap_resize_hpt = kvm_vm_check_extension(s, KVM_CAP_SPAPR_RESIZE_HPT);
>>      kvmppc_get_cpu_characteristics(s);
>>      cap_ppc_nested_kvm_hv = kvm_vm_check_extension(s, KVM_CAP_PPC_NESTED_HV);
>> @@ -2388,6 +2390,11 @@ static int parse_cap_ppc_safe_indirect_branch(struct kvm_ppc_cpu_char c)
>>      return 0;
>>  }
>>  
>> +bool kvmppc_has_cap_xive(void)
>> +{
>> +    return cap_xive;
>> +}
>> +
>>  static void kvmppc_get_cpu_characteristics(KVMState *s)
>>  {
>>      struct kvm_ppc_cpu_char c;
>> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
>> index 301a8e972d91..23126c199178 100644
>> --- a/hw/intc/Makefile.objs
>> +++ b/hw/intc/Makefile.objs
>> @@ -39,6 +39,7 @@ obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o
>>  obj-$(CONFIG_XICS_KVM) += xics_kvm.o
>>  obj-$(CONFIG_XIVE) += xive.o
>>  obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o
>> +obj-$(CONFIG_XIVE_KVM) += spapr_xive_kvm.o
>>  obj-$(CONFIG_POWERNV) += xics_pnv.o
>>  obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o
>>  obj-$(CONFIG_S390_FLIC) += s390_flic.o
> 
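
The following is a worked example of the sizes and mmap() offsets computed in kvmppc_xive_connect() and kvmppc_xive_mmap() above. The fixed page_shift of 16 comes from the patch. TM_SHIFT is assumed to be 16. The esb_shift and nr_irqs values used with it are illustrative inputs, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TM_SHIFT          16  /* assumed value */
#define XIVE_PAGE_SHIFT   16  /* the patch's hard-coded page_shift */

/* Size of the source ESB mapping: 2^esb_shift bytes per interrupt source */
static uint64_t esb_len(uint32_t esb_shift, uint32_t nr_irqs)
{
    assert(esb_shift < 64);
    return (1ull << esb_shift) * nr_irqs;
}

/* Size of the TIMA mapping: four pages of 2^TM_SHIFT bytes */
static uint64_t tima_len(void)
{
    return 4ull << TM_SHIFT;
}

/* File offset handed to mmap() on the KVM device fd for a page offset */
static uint64_t mmap_offset(uint32_t pgoff)
{
    return (uint64_t)pgoff << XIVE_PAGE_SHIFT;
}
```

With esb_shift = 17 and 4096 sources, the ESB mapping is 512 MiB. With TM_SHIFT = 16, the TIMA mapping is 256 KiB (four 64 KiB pages). The sketch widens pgoff to 64 bits before shifting. In the patch, pgoff << page_shift is evaluated as an int, which is fine for the small page offsets used there.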

* Re: [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM
  2019-02-25 23:22   ` David Gibson
@ 2019-03-11 17:32     ` Cédric Le Goater
  0 siblings, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-03-11 17:32 UTC
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On 2/26/19 12:22 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:11PM +0100, Cédric Le Goater wrote:
>> XIVE hcalls are all redirected to QEMU as none are on a fast path.
>> When necessary, QEMU invokes KVM through specific ioctls to perform
>> host operations. QEMU should have done the necessary checks before
>> calling KVM and, in case of failure, H_HARDWARE is simply returned.
>>
>> H_INT_ESB is a special case that could have been handled under KVM,
>> but the performance impact of handling it in QEMU was low. Here are
>> some figures:
>>
>>     kernel irqchip      OFF          ON
>>     H_INT_ESB                    KVM   QEMU
>>
>>     rtl8139 (LSI)       1.19     1.24  1.23  Gbits/sec
>>     virtio             31.80    42.30   --   Gbits/sec
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/spapr_xive.h |  15 +++
>>  hw/intc/spapr_xive.c        |  87 +++++++++++++++--
>>  hw/intc/spapr_xive_kvm.c    | 184 ++++++++++++++++++++++++++++++++++++
>>  3 files changed, 278 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index ab6732b14a02..749c6cbc2c56 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -55,9 +55,24 @@ void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
>>  void spapr_xive_mmio_set_enabled(sPAPRXive *xive, bool enable);
>>  void spapr_xive_map_mmio(sPAPRXive *xive);
>>  
>> +int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
>> +                             uint32_t *out_server, uint8_t *out_prio);
>> +
>>  /*
>>   * KVM XIVE device helpers
>>   */
>>  void kvmppc_xive_connect(sPAPRXive *xive, Error **errp);
>> +void kvmppc_xive_reset(sPAPRXive *xive, Error **errp);
>> +void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
>> +                                   Error **errp);
>> +void kvmppc_xive_sync_source(sPAPRXive *xive, uint32_t lisn, Error **errp);
>> +uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
>> +                            uint64_t data, bool write);
>> +void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
>> +                                 uint32_t end_idx, XiveEND *end,
>> +                                 Error **errp);
>> +void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
>> +                                 uint32_t end_idx, XiveEND *end,
>> +                                 Error **errp);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index c24d649e3668..3db24391e31c 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -86,6 +86,19 @@ static int spapr_xive_target_to_nvt(uint32_t target,
>>   * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8
>>   * priorities per CPU
>>   */
>> +int spapr_xive_end_to_target(uint8_t end_blk, uint32_t end_idx,
>> +                             uint32_t *out_server, uint8_t *out_prio)
>> +{
> 
> Since you don't support irq blocks as yet, should this error out
> rather than ignoring if end_blk != 0?

Yes, we could. I will add a test against SPAPR_XIVE_BLOCK, which is the
value of the sPAPR block ID. I would like to be able to track where it
is used, even if it is constant.

> 
>> +    if (out_server) {
>> +        *out_server = end_idx >> 3;
>> +    }
>> +
>> +    if (out_prio) {
>> +        *out_prio = end_idx & 0x7;
>> +    }
>> +    return 0;
>> +}
>> +
>>  static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio,
>>                                    uint8_t *out_end_blk, uint32_t *out_end_idx)
>>  {
>> @@ -792,6 +805,16 @@ static target_ulong h_int_set_source_config(PowerPCCPU *cpu,
>>          new_eas.w = xive_set_field64(EAS_END_DATA, new_eas.w, eisn);
>>      }
>>  
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_set_source_config(xive, lisn, &new_eas, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return H_HARDWARE;
>> +        }
>> +    }
>> +
>>  out:
>>      xive->eat[lisn] = new_eas;
>>      return H_SUCCESS;
>> @@ -1097,6 +1120,16 @@ static target_ulong h_int_set_queue_config(PowerPCCPU *cpu,
>>       */
>>  
>>  out:
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_set_queue_config(xive, end_blk, end_idx, &end, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return H_HARDWARE;
>> +        }
>> +    }
>> +
>>      /* Update END */
>>      memcpy(&xive->endt[end_idx], &end, sizeof(XiveEND));
>>      return H_SUCCESS;
>> @@ -1189,6 +1222,16 @@ static target_ulong h_int_get_queue_config(PowerPCCPU *cpu,
>>          args[2] = 0;
>>      }
>>  
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_get_queue_config(xive, end_blk, end_idx, end, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return H_HARDWARE;
>> +        }
>> +    }
>> +
>>      /* TODO: do we need any locking on the END ? */
>>      if (flags & SPAPR_XIVE_END_DEBUG) {
>>          /* Load the event queue generation number into the return flags */
>> @@ -1341,15 +1384,20 @@ static target_ulong h_int_esb(PowerPCCPU *cpu,
>>          return H_P3;
>>      }
>>  
>> -    mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
>> +    if (kvm_irqchip_in_kernel()) {
>> +        args[0] = kvmppc_xive_esb_rw(xsrc, lisn, offset, data,
>> +                                     flags & SPAPR_XIVE_ESB_STORE);
>> +    } else {
>> +        mmio_addr = xive->vc_base + xive_source_esb_mgmt(xsrc, lisn) + offset;
>>  
>> -    if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
>> -                      (flags & SPAPR_XIVE_ESB_STORE))) {
>> -        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
>> -                      HWADDR_PRIx "\n", mmio_addr);
>> -        return H_HARDWARE;
>> +        if (dma_memory_rw(&address_space_memory, mmio_addr, &data, 8,
>> +                          (flags & SPAPR_XIVE_ESB_STORE))) {
>> +            qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to access ESB @0x%"
>> +                          HWADDR_PRIx "\n", mmio_addr);
>> +            return H_HARDWARE;
>> +        }
>> +        args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
>>      }
>> -    args[0] = (flags & SPAPR_XIVE_ESB_STORE) ? -1 : data;
>>      return H_SUCCESS;
>>  }
>>  
>> @@ -1406,7 +1454,20 @@ static target_ulong h_int_sync(PowerPCCPU *cpu,
>>       * This is not needed when running the emulation under QEMU
>>       */
>>  
>> -    /* This is not real hardware. Nothing to be done */
>> +    /*
>> +     * This is not real hardware. Nothing to be done unless when
>> +     * under KVM
>> +     */
>> +
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_sync_source(xive, lisn, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return H_HARDWARE;
>> +        }
>> +    }
>>      return H_SUCCESS;
>>  }
>>  
>> @@ -1441,6 +1502,16 @@ static target_ulong h_int_reset(PowerPCCPU *cpu,
>>      }
>>  
>>      device_reset(DEVICE(xive));
>> +
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_reset(xive, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return H_HARDWARE;
>> +        }
>> +    }
>>      return H_SUCCESS;
>>  }
>>  
>> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
>> index 623fbf74f23e..6b50451b4f85 100644
>> --- a/hw/intc/spapr_xive_kvm.c
>> +++ b/hw/intc/spapr_xive_kvm.c
>> @@ -89,6 +89,52 @@ void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
>>   * XIVE Interrupt Source (KVM)
>>   */
>>  
>> +void kvmppc_xive_set_source_config(sPAPRXive *xive, uint32_t lisn, XiveEAS *eas,
>> +                                   Error **errp)
>> +{
>> +    uint32_t end_idx;
>> +    uint32_t end_blk;
>> +    uint32_t eisn;
>> +    uint8_t priority;
>> +    uint32_t server;
>> +    uint64_t kvm_src;
>> +    Error *local_err = NULL;
>> +
>> +    /*
>> +     * No need to set a MASKED source, this is the default state after
>> +     * reset.
> 
> I don't quite follow this comment, why is there no need to call a
> MASKED source?

because MASKED is the default state in which KVM initializes the IRQ. I will
clarify.
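To illustrate, here is a minimal standalone sketch of the source-config encoding with the early return for masked or invalid sources. The shift and mask values are assumptions modelled on the KVM XIVE uapi header, not taken from this version of the series. `encode_source_config()` and the `SRC_*` macros are hypothetical names. The sketch also widens every field to 64 bits before shifting, which the EISN field needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout of the 64-bit KVM source-config attribute
 * (illustrative values; the authoritative ones are in the Linux uapi). */
#define SRC_PRIORITY_SHIFT 0
#define SRC_PRIORITY_MASK  0x7ULL
#define SRC_SERVER_SHIFT   3
#define SRC_SERVER_MASK    0xfffffff8ULL
#define SRC_EISN_SHIFT     33
#define SRC_EISN_MASK      0xfffffffe00000000ULL

/* Returns false when nothing needs to be pushed to KVM: a masked (or
 * invalid) source is already in the state KVM creates it in. */
static bool encode_source_config(bool valid, bool masked, uint8_t priority,
                                 uint32_t server, uint32_t eisn, uint64_t *out)
{
    if (!valid || masked) {
        return false;
    }
    /* Widen every field to 64 bits before shifting so no bits are lost. */
    *out  = ((uint64_t)priority << SRC_PRIORITY_SHIFT) & SRC_PRIORITY_MASK;
    *out |= ((uint64_t)server << SRC_SERVER_SHIFT) & SRC_SERVER_MASK;
    *out |= ((uint64_t)eisn << SRC_EISN_SHIFT) & SRC_EISN_MASK;
    return true;
}
```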
 
>> +     */
>> +    if (!xive_eas_is_valid(eas) || xive_eas_is_masked(eas)) {
>> +        return;
>> +    }
>> +
>> +    end_idx = xive_get_field64(EAS_END_INDEX, eas->w);
>> +    end_blk = xive_get_field64(EAS_END_BLOCK, eas->w);
>> +    eisn = xive_get_field64(EAS_END_DATA, eas->w);
>> +
>> +    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
>> +
>> +    kvm_src = priority << KVM_XIVE_SOURCE_PRIORITY_SHIFT &
>> +        KVM_XIVE_SOURCE_PRIORITY_MASK;
>> +    kvm_src |= server << KVM_XIVE_SOURCE_SERVER_SHIFT &
>> +        KVM_XIVE_SOURCE_SERVER_MASK;
>> +    kvm_src |= ((uint64_t)eisn << KVM_XIVE_SOURCE_EISN_SHIFT) &
>> +        KVM_XIVE_SOURCE_EISN_MASK;
>> +
>> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE_CONFIG, lisn,
>> +                      &kvm_src, true, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +}
>> +
>> +void kvmppc_xive_sync_source(sPAPRXive *xive, uint32_t lisn, Error **errp)
>> +{
>> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_SOURCE_SYNC, lisn,
>> +                      NULL, true, errp);
>> +}
>> +
>>  /*
>>   * At reset, the interrupt sources are simply created and MASKED. We
>>   * only need to inform the KVM XIVE device about their type: LSI or
>> @@ -125,6 +171,64 @@ void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp)
>>      }
>>  }
>>  
>> +/*
>> + * This is used to perform the magic loads on the ESB pages, described
>> + * in xive.h.
>> + */
>> +static uint64_t xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
>> +                            uint64_t data, bool write)
>> +{
>> +    unsigned long addr = (unsigned long) xsrc->esb_mmap +
>> +        xive_source_esb_mgmt(xsrc, srcno) + offset;
> 
> Casting the esb_mmap into unsigned long then back to a pointer looks
> unnecessary.  You should be able to do this with pointer arithmetic.

yes.

>> +    if (write) {
>> +        *((uint64_t *) addr) = data;
>> +        return -1;
>> +    } else {
>> +        return *((uint64_t *) addr);
>> +    }
> 
> Since this is always dealing with 64-bit values, couldn't you put the
> byteswaps in here rather than in all the callers?

indeed.
 
>> +}
>> +
>> +static uint8_t xive_esb_read(XiveSource *xsrc, int srcno, uint32_t offset)
>> +{
>> +    /* Prevent the compiler from optimizing away the load */
>> +    volatile uint64_t value = xive_esb_rw(xsrc, srcno, offset, 0, 0);
> 
> Wouldn't the volatile magic be better inside xive_esb_rw()?

sure. I will rework these helpers. 
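One possible shape for the reworked helper, as a minimal sketch: plain pointer arithmetic instead of a round trip through `unsigned long`, the `volatile` access inside the helper, and the big-endian conversion done once there instead of in every caller. The ESB mapping is modelled by a caller-supplied buffer plus `page_off`. `esb_rw()`, `esb_read_pq()` and `be64_convert()` are hypothetical names; QEMU code would use its own `be64_to_cpu()`/`cpu_to_be64()` rather than the portable helper below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Interpret the in-memory bytes of v as big-endian. This is the identity
 * on a big-endian host and a byte swap on a little-endian one, so the
 * same helper converts in both directions. */
static uint64_t be64_convert(uint64_t v)
{
    unsigned char b[8];
    uint64_t r = 0;

    memcpy(b, &v, sizeof(b));
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | b[i];
    }
    return r;
}

/* base: start of the (hypothetical) ESB mapping, page_off: offset of the
 * source's management page, offset: the special load/store offset. */
static uint64_t esb_rw(volatile uint8_t *base, size_t page_off,
                       uint32_t offset, uint64_t data, int write)
{
    /* volatile: every access must reach the mapping, never be elided */
    volatile uint64_t *addr = (volatile uint64_t *)(base + page_off + offset);

    if (write) {
        *addr = be64_convert(data);
        return UINT64_MAX;
    }
    return be64_convert(*addr);
}

/* PQ bits are the two low-order bits of the loaded value */
static uint8_t esb_read_pq(volatile uint8_t *base, size_t page_off,
                           uint32_t offset)
{
    return (uint8_t)(esb_rw(base, page_off, offset, 0, 0) & 0x3);
}
```

The buffer passed as `base` must be 8-byte aligned, as a real ESB mapping is.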

>> +    return be64_to_cpu(value) & 0x3;
>> +}
>> +
>> +static void xive_esb_trigger(XiveSource *xsrc, int srcno)
>> +{
>> +    unsigned long addr = (unsigned long) xsrc->esb_mmap +
>> +        xive_source_esb_page(xsrc, srcno);
>> +
>> +    *((uint64_t *) addr) = 0x0;
>> +}
> 
> Also.. aren't some of these register accesses likely to need memory
> barriers?

AIUI, these are cache-inhibited (CI) pages, so we shouldn't need barriers.

>> +
>> +uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
>> +                            uint64_t data, bool write)
>> +{
>> +    if (write) {
>> +        return xive_esb_rw(xsrc, srcno, offset, data, 1);
>> +    }
>> +
>> +    /*
>> +     * Special Load EOI handling for LSI sources. Q bit is never set
>> +     * and the interrupt should be re-triggered if the level is still
>> +     * asserted.
>> +     */
>> +    if (xive_source_irq_is_lsi(xsrc, srcno) &&
>> +        offset == XIVE_ESB_LOAD_EOI) {
>> +        xive_esb_read(xsrc, srcno, XIVE_ESB_SET_PQ_00);
>> +        if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
>> +            xive_esb_trigger(xsrc, srcno);
>> +        }
>> +        return 0;
>> +    } else {
>> +        return xive_esb_rw(xsrc, srcno, offset, 0, 0);
>> +    }
>> +}
>> +
>>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>>  {
>>      XiveSource *xsrc = opaque;
>> @@ -155,6 +259,86 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>>  /*
>>   * sPAPR XIVE interrupt controller (KVM)
>>   */
>> +void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
>> +                                  uint32_t end_idx, XiveEND *end,
>> +                                  Error **errp)
>> +{
>> +    struct kvm_ppc_xive_eq kvm_eq = { 0 };
>> +    uint64_t kvm_eq_idx;
>> +    uint8_t priority;
>> +    uint32_t server;
>> +    Error *local_err = NULL;
>> +
>> +    if (!xive_end_is_valid(end)) {
> 
> This should set an error, shouldn't it?

Hmm, this helper is used in the hcall h_int_get_queue_config() and, later,
in kvmppc_xive_get_queues() to synchronize the state from KVM.

I should probably move the test outside this routine, return H_HARDWARE
in the hcall and skip invalid ENDs in kvmppc_xive_get_queues().
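A minimal sketch of that split, using simplified stand-in types: the validity check lives in the callers, the hcall path reports an invalid END as `H_HARDWARE`, and the state-synchronization loop just skips it. `End`, `kvm_get_eq()`, `hcall_get_queue_config()` and `sync_queues()` are hypothetical and only model the control flow, not the QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified END: only the fields this sketch needs. */
typedef struct {
    bool valid;
    uint32_t qindex;
    uint32_t qtoggle;
} End;

enum { H_SUCCESS = 0, H_HARDWARE = -1 };

/* Stand-in for the KVM query; callers guarantee the END is valid. */
static int kvm_get_eq(size_t idx, End *end)
{
    assert(end->valid);
    end->qindex = (uint32_t)idx * 16;   /* fake "hardware" state */
    end->qtoggle = 1;
    return 0;
}

/* hcall path: an invalid END is an error reported to the guest */
static long hcall_get_queue_config(End *endt, size_t idx)
{
    if (!endt[idx].valid) {
        return H_HARDWARE;
    }
    return kvm_get_eq(idx, &endt[idx]) ? H_HARDWARE : H_SUCCESS;
}

/* state-synchronization path: invalid ENDs are simply skipped */
static int sync_queues(End *endt, size_t nr_ends)
{
    for (size_t i = 0; i < nr_ends; i++) {
        if (!endt[i].valid) {
            continue;
        }
        if (kvm_get_eq(i, &endt[i])) {
            return -1;
        }
    }
    return 0;
}
```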

Thanks,

C.


> 
>> +        return;
>> +    }
>> +
>> +    /* Encode the tuple (server, prio) as a KVM EQ index */
>> +    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
>> +
>> +    kvm_eq_idx = priority << KVM_XIVE_EQ_PRIORITY_SHIFT &
>> +            KVM_XIVE_EQ_PRIORITY_MASK;
>> +    kvm_eq_idx |= server << KVM_XIVE_EQ_SERVER_SHIFT &
>> +        KVM_XIVE_EQ_SERVER_MASK;
>> +
>> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ_CONFIG, kvm_eq_idx,
>> +                      &kvm_eq, false, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    /*
>> +     * The EQ index and toggle bit are updated by HW. These are the
>> +     * only fields we want to return.
>> +     */
>> +    end->w1 = xive_set_field32(END_W1_GENERATION, 0ul, kvm_eq.qtoggle) |
>> +        xive_set_field32(END_W1_PAGE_OFF, 0ul, kvm_eq.qindex);
>> +}
>> +
>> +void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
>> +                                  uint32_t end_idx, XiveEND *end,
>> +                                  Error **errp)
>> +{
>> +    struct kvm_ppc_xive_eq kvm_eq = { 0 };
>> +    uint64_t kvm_eq_idx;
>> +    uint8_t priority;
>> +    uint32_t server;
>> +    Error *local_err = NULL;
>> +
>> +    if (!xive_end_is_valid(end)) {
>> +        return;
>> +    }
>> +
>> +    /* Build the KVM state from the local END structure */
>> +    kvm_eq.flags   = KVM_XIVE_EQ_FLAG_ALWAYS_NOTIFY;
>> +    kvm_eq.qsize   = xive_get_field32(END_W0_QSIZE, end->w0) + 12;
>> +    kvm_eq.qpage   = (uint64_t) be32_to_cpu(end->w2 & 0x0fffffff) << 32 |
>> +        be32_to_cpu(end->w3);
>> +    kvm_eq.qtoggle = xive_get_field32(END_W1_GENERATION, end->w1);
>> +    kvm_eq.qindex  = xive_get_field32(END_W1_PAGE_OFF, end->w1);
>> +
>> +    /* Encode the tuple (server, prio) as a KVM EQ index */
>> +    spapr_xive_end_to_target(end_blk, end_idx, &server, &priority);
>> +
>> +    kvm_eq_idx = priority << KVM_XIVE_EQ_PRIORITY_SHIFT &
>> +            KVM_XIVE_EQ_PRIORITY_MASK;
>> +    kvm_eq_idx |= server << KVM_XIVE_EQ_SERVER_SHIFT &
>> +        KVM_XIVE_EQ_SERVER_MASK;
>> +
>> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_EQ_CONFIG, kvm_eq_idx,
>> +                      &kvm_eq, true, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +}
>> +
>> +void kvmppc_xive_reset(sPAPRXive *xive, Error **errp)
>> +{
>> +    kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL, KVM_DEV_XIVE_RESET,
>> +                      NULL, true, errp);
>> +}
>>  
>>  static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
>>                                Error **errp)
> 


* Re: [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM
  2019-02-26  0:01   ` David Gibson
@ 2019-03-11 20:41     ` Cédric Le Goater
  0 siblings, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-03-11 20:41 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On 2/26/19 1:01 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:13PM +0100, Cédric Le Goater wrote:
>> This extends the KVM XIVE device backend with 'synchronize_state'
>> methods used to retrieve the state from KVM. The HW state of the
>> sources, the KVM device and the thread interrupt contexts are
>> collected for the monitor usage and also migration.
>>
>> These get operations rely on their KVM counterpart in the host kernel
>> which acts as a proxy for OPAL, the host firmware. The set operations
>> will be added for migration support later.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  include/hw/ppc/spapr_xive.h |  8 ++++
>>  include/hw/ppc/xive.h       |  1 +
>>  hw/intc/spapr_xive.c        | 17 ++++---
>>  hw/intc/spapr_xive_kvm.c    | 89 +++++++++++++++++++++++++++++++++++++
>>  hw/intc/xive.c              | 10 +++++
>>  5 files changed, 118 insertions(+), 7 deletions(-)
>>
>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>> index 749c6cbc2c56..ebd65e7fe36b 100644
>> --- a/include/hw/ppc/spapr_xive.h
>> +++ b/include/hw/ppc/spapr_xive.h
>> @@ -44,6 +44,13 @@ typedef struct sPAPRXive {
>>      void          *tm_mmap;
>>  } sPAPRXive;
>>  
>> +/*
>> + * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
>> + * to the controller block id value. It can nevertheless be changed
>> + * for testing purpose.
>> + */
>> +#define SPAPR_XIVE_BLOCK_ID 0x0
>> +
>>  bool spapr_xive_irq_claim(sPAPRXive *xive, uint32_t lisn, bool lsi);
>>  bool spapr_xive_irq_free(sPAPRXive *xive, uint32_t lisn);
>>  void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon);
>> @@ -74,5 +81,6 @@ void kvmppc_xive_set_queue_config(sPAPRXive *xive, uint8_t end_blk,
>>  void kvmppc_xive_get_queue_config(sPAPRXive *xive, uint8_t end_blk,
>>                                   uint32_t end_idx, XiveEND *end,
>>                                   Error **errp);
>> +void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp);
>>  
>>  #endif /* PPC_SPAPR_XIVE_H */
>> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
>> index 061d43fea24d..f3766fd881a2 100644
>> --- a/include/hw/ppc/xive.h
>> +++ b/include/hw/ppc/xive.h
>> @@ -431,5 +431,6 @@ void kvmppc_xive_source_reset_one(XiveSource *xsrc, int srcno, Error **errp);
>>  void kvmppc_xive_source_reset(XiveSource *xsrc, Error **errp);
>>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val);
>>  void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp);
>> +void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp);
>>  
>>  #endif /* PPC_XIVE_H */
>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>> index 3db24391e31c..9f07567f4d78 100644
>> --- a/hw/intc/spapr_xive.c
>> +++ b/hw/intc/spapr_xive.c
>> @@ -40,13 +40,6 @@
>>  
>>  #define SPAPR_XIVE_NVT_BASE 0x400
>>  
>> -/*
>> - * The sPAPR machine has a unique XIVE IC device. Assign a fixed value
>> - * to the controller block id value. It can nevertheless be changed
>> - * for testing purpose.
>> - */
>> -#define SPAPR_XIVE_BLOCK_ID 0x0
>> -
>>  /*
>>   * sPAPR NVT and END indexing helpers
>>   */
>> @@ -153,6 +146,16 @@ void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon)
>>      XiveSource *xsrc = &xive->source;
>>      int i;
>>  
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_synchronize_state(xive, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return;
>> +        }
>> +    }
>> +
>>      monitor_printf(mon, "  LSIN         PQ    EISN     CPU/PRIO EQ\n");
>>  
>>      for (i = 0; i < xive->nr_irqs; i++) {
>> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
>> index 6b50451b4f85..4b1ffb9835f9 100644
>> --- a/hw/intc/spapr_xive_kvm.c
>> +++ b/hw/intc/spapr_xive_kvm.c
>> @@ -60,6 +60,57 @@ static void kvm_cpu_enable(CPUState *cs)
>>  /*
>>   * XIVE Thread Interrupt Management context (KVM)
>>   */
>> +static void kvmppc_xive_cpu_get_state(XiveTCTX *tctx, Error **errp)
>> +{
>> +    uint64_t state[4] = { 0 };
>> +    int ret;
>> +
>> +    ret = kvm_get_one_reg(tctx->cs, KVM_REG_PPC_NVT_STATE, state);
>> +    if (ret != 0) {
>> +        error_setg_errno(errp, errno,
>> +                         "XIVE: could not capture KVM state of CPU %ld",
>> +                         kvm_arch_vcpu_id(tctx->cs));
>> +        return;
>> +    }
>> +
>> +    /* word0 and word1 of the OS ring. */
>> +    *((uint64_t *) &tctx->regs[TM_QW1_OS]) = state[0];
>> +
>> +    /*
>> +     * KVM also returns word2 containing the OS CAM line which is
>> +     * interesting to print out in the QEMU monitor.
>> +     */
>> +    *((uint64_t *) &tctx->regs[TM_QW1_OS + TM_WORD2]) = state[1];
> 
> As mentioned elsewhere, it is interesting for debugging, but doesn't
> seem to really match the guest visible CAM state, 

The guest is not allowed to see these registers in the TIMA OS page,
and we are not violating the XIVE architecture: this is exactly where
the CAM value belongs in HW. I was even thinking of propagating the
POOL value, which could be useful for nested.

> so I'm not convinced it's a good idea to put it into the regs[] 
> structure.

I understand it is problematic in case of a KVM->QEMU migration,
because we need to force a XiveTCTX reset to put the QEMU CAM line
back into the registers after it has been overwritten with the KVM
CAM line.

Another solution could be to add an 'nvt_base' property to SpaprXive
and a KVM control to get its value (xive->vp_base in the KVM XIVE
device). It would be migrated and used by the QEMU XIVE device
after migration.
 
>> +}
>> +
>> +typedef struct {
>> +    XiveTCTX *tctx;
>> +    Error *err;
>> +} XiveCpuGetState;
>> +
>> +static void kvmppc_xive_cpu_do_synchronize_state(CPUState *cpu,
>> +                                                 run_on_cpu_data arg)
>> +{
>> +    XiveCpuGetState *s = arg.host_ptr;
>> +
>> +    kvmppc_xive_cpu_get_state(s->tctx, &s->err);
>> +}
>> +
>> +void kvmppc_xive_cpu_synchronize_state(XiveTCTX *tctx, Error **errp)
>> +{
>> +    XiveCpuGetState s = {
>> +        .tctx = tctx,
>> +        .err = NULL,
>> +    };
>> +
>> +    run_on_cpu(tctx->cs, kvmppc_xive_cpu_do_synchronize_state,
>> +               RUN_ON_CPU_HOST_PTR(&s));
> 
> Why does this need a run_on_cpu() ?  The KVM call which is getting the
> actual info takes a cpu parameter.

Don't we need to kick the vCPU?

Thanks,

C. 


> 
>> +
>> +    if (s.err) {
>> +        error_propagate(errp, s.err);
>> +        return;
>> +    }
>> +}
>>  
>>  void kvmppc_xive_cpu_connect(XiveTCTX *tctx, Error **errp)
>>  {
>> @@ -229,6 +280,19 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, uint32_t offset,
>>      }
>>  }
>>  
>> +static void kvmppc_xive_source_get_state(XiveSource *xsrc)
>> +{
>> +    int i;
>> +
>> +    for (i = 0; i < xsrc->nr_irqs; i++) {
>> +        /* Perform a load without side effect to retrieve the PQ bits */
>> +        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
>> +
>> +        /* and save PQ locally */
>> +        xive_source_esb_set(xsrc, i, pq);
>> +    }
>> +}
>> +
>>  void kvmppc_xive_source_set_irq(void *opaque, int srcno, int val)
>>  {
>>      XiveSource *xsrc = opaque;
>> @@ -340,6 +404,31 @@ void kvmppc_xive_reset(sPAPRXive *xive, Error **errp)
>>                        NULL, true, errp);
>>  }
>>  
>> +static void kvmppc_xive_get_queues(sPAPRXive *xive, Error **errp)
>> +{
>> +    Error *local_err = NULL;
>> +    int i;
>> +
>> +    for (i = 0; i < xive->nr_ends; i++) {
>> +        kvmppc_xive_get_queue_config(xive, SPAPR_XIVE_BLOCK_ID, i,
>> +                                     &xive->endt[i], &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>> +    }
>> +}
>> +
>> +void kvmppc_xive_synchronize_state(sPAPRXive *xive, Error **errp)
>> +{
>> +    kvmppc_xive_source_get_state(&xive->source);
>> +
>> +    /* EAT: there is no extra state to query from KVM */
>> +
>> +    /* ENDT */
>> +    kvmppc_xive_get_queues(xive, errp);
>> +}
>> +
>>  static void *kvmppc_xive_mmap(sPAPRXive *xive, int pgoff, size_t len,
>>                                Error **errp)
>>  {
>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>> index 0284b5803551..f478c52ab2a0 100644
>> --- a/hw/intc/xive.c
>> +++ b/hw/intc/xive.c
>> @@ -431,6 +431,16 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon)
>>      int cpu_index = tctx->cs ? tctx->cs->cpu_index : -1;
>>      int i;
>>  
>> +    if (kvm_irqchip_in_kernel()) {
>> +        Error *local_err = NULL;
>> +
>> +        kvmppc_xive_cpu_synchronize_state(tctx, &local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return;
>> +        }
>> +    }
>> +
>>      monitor_printf(mon, "CPU[%04x]:   QW   NSR CPPR IPB LSMFB ACK# INC AGE PIPR"
>>                     "  W2\n", cpu_index);
>>  
> 


* Re: [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support
  2019-02-25 23:49   ` David Gibson
  2019-02-25 23:49     ` David Gibson
@ 2019-03-11 20:44     ` Cédric Le Goater
  1 sibling, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-03-11 20:44 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On 2/26/19 12:49 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:12PM +0100, Cédric Le Goater wrote:
>> All is in place for KVM now. State synchronization and migration will
>> come next.
> 
> As with the kernel side capability, this should be moved later in the
> series to avoid breaking bisections.

I am not sure I understand. At this stage of the patchset, the XIVE
exploitation mode is operational. We cannot synchronize the state
or migrate yet, but it runs.

Should we move the XIVE activation after the migration patch?

C. 

 
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/ppc/spapr_irq.c | 9 ---------
>>  1 file changed, 9 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 6e1c36dc62ca..1ad57582a403 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -263,19 +263,10 @@ sPAPRIrq spapr_irq_xics = {
>>  static void spapr_irq_init_xive(sPAPRMachineState *spapr, int nr_irqs,
>>                                  Error **errp)
>>  {
>> -    MachineState *machine = MACHINE(spapr);
>>      uint32_t nr_servers = spapr_max_server_number(spapr);
>>      DeviceState *dev;
>>      int i;
>>  
>> -    /* KVM XIVE device not yet available */
>> -    if (kvm_enabled()) {
>> -        if (machine_kernel_irqchip_required(machine)) {
>> -            error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
>> -            return;
>> -        }
>> -    }
>> -
>>      dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
>>      qdev_prop_set_uint32(dev, "nr-irqs", nr_irqs);
>>      /*
> 


* Re: [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG
  2019-02-26  1:02   ` David Gibson
@ 2019-03-11 20:45     ` Cédric Le Goater
  0 siblings, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-03-11 20:45 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On 2/26/19 2:02 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:16PM +0100, Cédric Le Goater wrote:
>> When the thread interrupt management state is retrieved from the KVM
>> VCPU, word2 is saved under the QEMU XIVE thread context to print out
>> the OS CAM line under the QEMU monitor.
>>
>> This breaks the migration of a TCG guest (and with KVM when
>> kernel_irqchip=off) because the matching algorithm of the presenter
>> relies on the OS CAM value. Fix with an extra reset of the thread
>> contexts to restore the expected value.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> As noted elsewhere, I'm not sure this is the right approach to fixing
> this.  In any case this can be folded into the previous patch.

I have proposed an alternative in my response to:

 [PATCH v2 04/13] spapr/xive: add state synchronization with KVM 

C.


> 
>> ---
>>  hw/ppc/spapr_irq.c | 26 +++++++++++++++++++++++++-
>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
>> index 12ecca6264f3..3176098b9f7c 100644
>> --- a/hw/ppc/spapr_irq.c
>> +++ b/hw/ppc/spapr_irq.c
>> @@ -356,7 +356,31 @@ static void spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr,
>>  
>>  static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id)
>>  {
>> -    return spapr_xive_post_load(spapr->xive, version_id);
>> +    CPUState *cs;
>> +    int ret;
>> +
>> +    ret = spapr_xive_post_load(spapr->xive, version_id);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    /*
>> +     * When the states are collected from the KVM XIVE device, word2
>> +     * of the XiveTCTX is set to print out the OS CAM line under the
>> +     * QEMU monitor.
>> +     *
>> +     * This breaks the migration on a TCG guest (or on KVM with
>> +     * kernel_irqchip=off) because the matching algorithm of the
>> +     * presenter relies on the OS CAM value. Fix with an extra reset
>> +     * of the thread contexts to restore the expected value.
>> +     */
>> +    CPU_FOREACH(cs) {
>> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
>> +
>> +        /* (TCG) Set the OS CAM line of the thread interrupt context. */
>> +        spapr_xive_set_tctx_os_cam(spapr_cpu_state(cpu)->tctx);
>> +    }
>> +    return 0;
>>  }
>>  
>>  static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp)
> 


* Re: [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped
  2019-02-26  4:17   ` David Gibson
@ 2019-03-11 20:59     ` Cédric Le Goater
  0 siblings, 0 replies; 32+ messages in thread
From: Cédric Le Goater @ 2019-03-11 20:59 UTC (permalink / raw)
  To: David Gibson; +Cc: Greg Kurz, qemu-ppc, qemu-devel

On 2/26/19 5:17 AM, David Gibson wrote:
> On Fri, Feb 22, 2019 at 02:13:22PM +0100, Cédric Le Goater wrote:
>> Instead of switching off the sources, set their state to PENDING to
>> possibly catch a hotplug event occurring while the VM is stopped. At
>> resume, check the previous state and if an interrupt was queued,
>> generate a trigger.
> 
> First, I think it would be better to fold this fix into the patch
> introducing the state change handlers.
>
> Second, IIUC this would handle any instance of an irq being triggered
> while the VM is stopped. 

yes.

> Hotplug interrupts are one obvious case of that, but I'm not sure it's
> the only one.

Do we really need to support device hotplug when the VM is stopped?
Is that a libvirt requirement?

> VFIO devices could interrupt while the VM is stopped, I think. 

If the guest has configured and mapped the IRQs, I would say yes. 

> Maybe even emulated devices
> depending on how their synchronization with the cpu run state works.

The console is one example.

> There might be other cases.  Does that sound right to you?

yes.

Supporting interrupts while a VM is stopped seems like a weird
test scenario to me. Should we support it or not?
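For reference, the PQ handling in this patch can be modelled in a few lines of C. This is a simplified sketch, not the ESB hardware: `esb_trigger()` follows the usual PQ trigger transitions (00 to 10 with a notification, 10 to 11, 01 and 11 unchanged), and the hypothetical `on_stop()`/`on_resume()` helpers mirror what the change handler does. It shows why any trigger arriving while the VM is stopped, not only a hotplug event, is latched in Q and replayed at resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PQ encodings (P is bit 1, Q is bit 0) */
enum { ESB_RESET = 0x0, ESB_OFF = 0x1, ESB_PENDING = 0x2, ESB_QUEUED = 0x3 };

/* Model of a trigger: returns true if an event notification is forwarded. */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq) {
    case ESB_RESET:
        *pq = ESB_PENDING;
        return true;
    case ESB_PENDING:
        *pq = ESB_QUEUED;
        return false;
    default:
        return false;   /* OFF stays OFF, QUEUED stays QUEUED */
    }
}

/* Model of a "set PQ" special load: returns the old PQ, installs the new. */
static uint8_t esb_set_pq(uint8_t *pq, uint8_t new_pq)
{
    uint8_t old = *pq;

    assert(new_pq <= ESB_QUEUED);
    *pq = new_pq;
    return old;
}

/* VM stop: save the state and park enabled sources in PENDING, so a
 * trigger only sets Q and no notification is forwarded. */
static uint8_t on_stop(uint8_t *pq)
{
    uint8_t saved = *pq;

    if (saved != ESB_OFF) {
        (void)esb_set_pq(pq, ESB_PENDING);
    }
    return saved;
}

/* VM resume: restore the saved state and replay a trigger latched in Q
 * while the VM was stopped. Returns true if a trigger was replayed. */
static bool on_resume(uint8_t *pq, uint8_t saved)
{
    uint8_t old_pq = esb_set_pq(pq, saved);

    if (saved == ESB_RESET && old_pq == ESB_QUEUED) {
        return esb_trigger(pq);
    }
    return false;
}
```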

Thanks,

C.

>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> ---
>>  hw/intc/spapr_xive_kvm.c | 22 +++++++++++++++++++---
>>  1 file changed, 19 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
>> index 99a829fb3f60..64d160babb26 100644
>> --- a/hw/intc/spapr_xive_kvm.c
>> +++ b/hw/intc/spapr_xive_kvm.c
>> @@ -500,8 +500,16 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>>      if (running) {
>>          for (i = 0; i < xsrc->nr_irqs; i++) {
>>              uint8_t pq = xive_source_esb_get(xsrc, i);
>> -            if (xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8)) != 0x1) {
>> -                error_report("XIVE: IRQ %d has an invalid state", i);
>> +            uint8_t old_pq;
>> +
>> +            old_pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_00 + (pq << 8));
>> +
>> +            /*
>> +             * If an interrupt was queued (hotplug event) while VM was
>> +             * stopped, generate a trigger.
>> +             */
>> +            if (pq == XIVE_ESB_RESET && old_pq == XIVE_ESB_QUEUED) {
>> +                xive_esb_trigger(xsrc, i);
>>              }
>>          }
>>  
>> @@ -515,7 +523,15 @@ static void kvmppc_xive_change_state_handler(void *opaque, int running,
>>       * migration is in progress.
>>       */
>>      for (i = 0; i < xsrc->nr_irqs; i++) {
>> -        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_01);
>> +        uint8_t pq = xive_esb_read(xsrc, i, XIVE_ESB_GET);
>> +
>> +        /*
>> +         * PQ is set to PENDING to possibly catch a hotplug event
>> +         * occurring while the VM is stopped.
>> +         */
>> +        if (pq != XIVE_ESB_OFF) {
>> +            pq = xive_esb_read(xsrc, i, XIVE_ESB_SET_PQ_10);
>> +        }
>>          xive_source_esb_set(xsrc, i, pq);
>>      }
>>  
> 


Thread overview: 32+ messages
2019-02-22 13:13 [Qemu-devel] [PATCH v2 00/13] spapr: add KVM support to the XIVE interrupt mode Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 01/13] spapr/xive: add KVM support Cédric Le Goater
2019-02-25  5:55   ` David Gibson
2019-03-11 15:53     ` Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 02/13] spapr/xive: add hcall support when under KVM Cédric Le Goater
2019-02-25 23:22   ` David Gibson
2019-03-11 17:32     ` Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 03/13] spapr/xive: activate KVM support Cédric Le Goater
2019-02-25 23:49   ` David Gibson
2019-02-25 23:49     ` David Gibson
2019-03-11 20:44     ` Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 04/13] spapr/xive: add state synchronization with KVM Cédric Le Goater
2019-02-26  0:01   ` David Gibson
2019-03-11 20:41     ` Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 05/13] spapr/xive: introduce a VM state change handler Cédric Le Goater
2019-02-26  0:39   ` David Gibson
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 06/13] spapr/xive: add migration support for KVM Cédric Le Goater
2019-02-26  0:58   ` David Gibson
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 07/13] spapr/xive: fix migration of the XiveTCTX under TCG Cédric Le Goater
2019-02-26  1:02   ` David Gibson
2019-03-11 20:45     ` Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 08/13] spapr/rtas: modify spapr_rtas_register() to remove RTAS handlers Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 09/13] sysbus: add a sysbus_mmio_unmap() helper Cédric Le Goater
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 10/13] spapr: introduce routines to delete the KVM IRQ device Cédric Le Goater
2019-02-26  1:10   ` David Gibson
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 11/13] spapr: check for the activation of " Cédric Le Goater
2019-02-26  1:27   ` David Gibson
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 12/13] spapr: add KVM support to the 'dual' machine Cédric Le Goater
2019-02-28  5:15   ` David Gibson
2019-02-22 13:13 ` [Qemu-devel] [PATCH v2 13/13] spapr/xive: fix device hotplug when VM is stopped Cédric Le Goater
2019-02-26  4:17   ` David Gibson
2019-03-11 20:59     ` Cédric Le Goater
