All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration
@ 2014-03-14  4:18 Alexey Kardashevskiy
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices Alexey Kardashevskiy
                   ` (10 more replies)
  0 siblings, 11 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

This initial problem came form libvirt - it does not preserve
the device order when running QEMU. So it is easy to get source QEMU with:
-device spapr-vscsi,id=scsi1,reg=0x2000 -device spapr-vscsi,id=scsi0,reg=0x3000
and destination QEMU with:
-device spapr-vscsi,id=scsi0,reg=0x3000 -device spapr-vscsi,id=scsi1,reg=0x2000

Since SPAPR IOMMU device does not have a bus, it is identified in
the migration stream as "spapr-iommu" and @instance_id which is assigned
as IOMMUs are created. This results in broken migration as @reg does not match.
The first patch fixes this issue by adding a bus device and a bridge.

However just 1/8 does not fix the migration as device creation order also
affects IRQs assigned to the devices, for both PCI and VIO.
2/7..8/8 fix that by moving XICS IRQ management from SPAPR to XICS and
implementing migration support for the entire XICS IRQ map.

As we are here, the patchset also prepares XICS to support multiple ICS
(interrupt servers).

This is a bugfix patchset but it feels too big to go to 2.0, right? :)


Alexey Kardashevskiy (8):
  spapr-iommu: add a bus for spapr-iommu devices
  xics: add flags for interrupts
  xics: add find_server
  xics: add pre_load() hook to ICSStateClass
  xics: disable flags reset on xics reset
  spapr: move interrupt allocator to xics
  spapr: remove @next_irq
  xics: enable interrupt configuration reset on migration

 hw/intc/xics.c         | 158 +++++++++++++++++++++++++++++++++++++++++++++----
 hw/intc/xics_kvm.c     |   9 +--
 hw/ppc/spapr.c         |  73 +----------------------
 hw/ppc/spapr_events.c  |   2 +-
 hw/ppc/spapr_iommu.c   |  59 +++++++++++++++++-
 hw/ppc/spapr_pci.c     |   8 +--
 hw/ppc/spapr_vio.c     |   4 +-
 include/hw/ppc/spapr.h |  18 +++---
 include/hw/ppc/xics.h  |   8 ++-
 trace-events           |   4 ++
 10 files changed, 238 insertions(+), 105 deletions(-)

-- 
1.8.4.rc4

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-04-10 12:40   ` Alexander Graf
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 2/8] xics: add flags for interrupts Alexey Kardashevskiy
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

At the moment sPAPR IOMMU table is a device which participates in
a migration stream. Normally QEMU uses a get_dev_path() hook from
the device's bus to compose the section name and @instance_id which are
used to match the section to the real device. This works till the user
changes the device order in the command line - if this happens,
devices get other instance_id's and migration fails.

This adds a TCE bridge bus device per sPAPR machine and places all sPAPR
IOMMU devices onto it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c         |  3 +++
 hw/ppc/spapr_iommu.c   | 59 +++++++++++++++++++++++++++++++++++++++++++++++++-
 include/hw/ppc/spapr.h |  7 ++++++
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5c9a154..12adc21 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1263,6 +1263,9 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     /* Set up EPOW events infrastructure */
     spapr_events_init(spapr);
 
+    /* Set up TCE IOMMUs bus */
+    spapr->tce_bus = spapr_tce_bus_init();
+
     /* Set up VIO bus */
     spapr->vio_bus = spapr_vio_bus_init();
 
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index d9fe946..7db0acf 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -157,7 +157,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t wi
         return NULL;
     }
 
-    tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
+    tcet = SPAPR_TCE_TABLE(qdev_create(spapr->tce_bus, TYPE_SPAPR_TCE_TABLE));
     tcet->liobn = liobn;
     tcet->window_size = window_size;
 
@@ -342,9 +342,66 @@ static TypeInfo spapr_tce_table_info = {
     .instance_finalize = spapr_tce_table_finalize,
 };
 
+static char *spapr_tce_bus_get_dev_name(DeviceState *qdev)
+{
+    sPAPRTCETable *tcet = SPAPR_TCE_TABLE(qdev);
+    char *name;
+
+    name = g_strdup_printf("liobn@%x", tcet->liobn);
+    return name;
+}
+
+static void spapr_tce_bus_class_init(ObjectClass *klass, void *data)
+{
+    BusClass *k = BUS_CLASS(klass);
+
+    k->get_dev_path = spapr_tce_bus_get_dev_name;
+}
+
+static const TypeInfo spapr_tce_bus_info = {
+    .name = TYPE_SPAPR_TCE_BUS,
+    .parent = TYPE_BUS,
+    .class_init = spapr_tce_bus_class_init,
+    .instance_size = sizeof(BusState),
+};
+
+static int spapr_tce_bridge_init(SysBusDevice *dev)
+{
+    /* nothing */
+    return 0;
+}
+
+static void spapr_tce_bridge_class_init(ObjectClass *klass, void *data)
+{
+    SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
+
+    k->init = spapr_tce_bridge_init;
+}
+
+static const TypeInfo spapr_tce_bridge_info = {
+    .name          = "spapr-tce-bridge",
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(SysBusDevice),
+    .class_init    = spapr_tce_bridge_class_init,
+};
+
 static void register_types(void)
 {
     type_register_static(&spapr_tce_table_info);
+    type_register_static(&spapr_tce_bridge_info);
+    type_register_static(&spapr_tce_bus_info);
+}
+
+BusState *spapr_tce_bus_init(void)
+{
+    DeviceState *dev;
+
+    /* Create bridge device */
+    dev = qdev_create(NULL, spapr_tce_bridge_info.name);
+    qdev_init_nofail(dev);
+
+    /* Create bus on bridge device */
+    return qbus_create(TYPE_SPAPR_TCE_BUS, dev, "spapr-tce");
 }
 
 type_init(register_types);
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 449fc7c..18332fd 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -12,6 +12,7 @@ struct sPAPRNVRAM;
 
 typedef struct sPAPREnvironment {
     struct VIOsPAPRBus *vio_bus;
+    BusState *tce_bus;
     QLIST_HEAD(, sPAPRPHBState) phbs;
     hwaddr msi_win_addr;
     MemoryRegion msiwindow;
@@ -405,4 +406,10 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
 int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
                       sPAPRTCETable *tcet);
 
+#define TYPE_SPAPR_TCE_BUS "spapr-tce-bus"
+#define SPAPR_TCE_BUS(obj) \
+    OBJECT_CHECK(BusState, (obj), TYPE_SPAPR_TCE_BUS)
+
+BusState *spapr_tce_bus_init(void);
+
 #endif /* !defined (__HW_SPAPR_H__) */
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 2/8] xics: add flags for interrupts
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-04-10 12:43   ` Alexander Graf
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 3/8] xics: add find_server Alexey Kardashevskiy
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

We will need soon an "allocated" flag for every interrupt to support
interrupt configuration change which may happen during migration.

This replaces a separate lslsi[] array with a byte in the ICSIRQState
struct and defines "LSI" and "MSI" flags. Neither of these flags set
signals that the descriptor is not in use.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        | 17 +++++++++++------
 hw/intc/xics_kvm.c    |  5 ++---
 include/hw/ppc/xics.h |  4 +++-
 3 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 64aabe7..7eac85a 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -438,7 +438,7 @@ static void ics_set_irq(void *opaque, int srcno, int val)
 {
     ICSState *ics = (ICSState *)opaque;
 
-    if (ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
         set_irq_lsi(ics, srcno, val);
     } else {
         set_irq_msi(ics, srcno, val);
@@ -475,7 +475,7 @@ static void ics_write_xive(ICSState *ics, int nr, int server,
 
     trace_xics_ics_write_xive(nr, srcno, server, priority);
 
-    if (ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
         write_xive_lsi(ics, srcno);
     } else {
         write_xive_msi(ics, srcno);
@@ -497,7 +497,7 @@ static void ics_resend(ICSState *ics)
 
     for (i = 0; i < ics->nr_irqs; i++) {
         /* FIXME: filter by server#? */
-        if (ics->islsi[i]) {
+        if (ics->irqs[i].flags & XICS_FLAGS_LSI) {
             resend_lsi(ics, i);
         } else {
             resend_msi(ics, i);
@@ -512,7 +512,7 @@ static void ics_eoi(ICSState *ics, int nr)
 
     trace_xics_ics_eoi(nr);
 
-    if (ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
         irq->status &= ~XICS_STATUS_SENT;
     }
 }
@@ -609,7 +609,6 @@ static void ics_realize(DeviceState *dev, Error **errp)
         return;
     }
     ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
-    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
     ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, ics->nr_irqs);
 }
 
@@ -646,11 +645,17 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq)
     return icp->ics->qirqs[irq - icp->ics->offset];
 }
 
+static void ics_set_irq_type(ICSState *ics, int irq, bool lsi)
+{
+    ics->irqs[irq - ics->offset].flags |=
+        lsi ? XICS_FLAGS_LSI : XICS_FLAGS_MSI;
+}
+
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
 {
     assert(ics_valid_irq(icp->ics, irq));
 
-    icp->ics->islsi[irq - icp->ics->offset] = lsi;
+    ics_set_irq_type(icp->ics, irq, lsi);
 }
 
 /*
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index c93dae0..8073827 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -224,7 +224,7 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
             state |= KVM_XICS_MASKED;
         }
 
-        if (ics->islsi[i]) {
+        if (ics->irqs[i].flags & XICS_FLAGS_LSI) {
             state |= KVM_XICS_LEVEL_SENSITIVE;
             if (irq->status & XICS_STATUS_ASSERTED) {
                 state |= KVM_XICS_PENDING;
@@ -253,7 +253,7 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int val)
     int rc;
 
     args.irq = srcno + ics->offset;
-    if (!ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
         if (!val) {
             return;
         }
@@ -290,7 +290,6 @@ static void ics_kvm_realize(DeviceState *dev, Error **errp)
         return;
     }
     ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
-    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
     ics->qirqs = qemu_allocate_irqs(ics_kvm_set_irq, ics, ics->nr_irqs);
 }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 0d7673d..02bbe31 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -136,7 +136,6 @@ struct ICSState {
     uint32_t nr_irqs;
     uint32_t offset;
     qemu_irq *qirqs;
-    bool *islsi;
     ICSIRQState *irqs;
     XICSState *icp;
 };
@@ -150,6 +149,9 @@ struct ICSIRQState {
 #define XICS_STATUS_REJECTED           0x4
 #define XICS_STATUS_MASKED_PENDING     0x8
     uint8_t status;
+#define XICS_FLAGS_LSI                 0x1
+#define XICS_FLAGS_MSI                 0x2
+    uint8_t flags;
 };
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 3/8] xics: add find_server
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices Alexey Kardashevskiy
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 2/8] xics: add flags for interrupts Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 4/8] xics: add pre_load() hook to ICSStateClass Alexey Kardashevskiy
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

PAPR allows having multiple interrupr servers. However now we
only support one.

This adds a server lookup function and makes use of it.

At the moment no change is expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 7eac85a..88ef9ef 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -635,14 +635,30 @@ static const TypeInfo ics_info = {
 /*
  * Exported functions
  */
+static int xics_find_server(XICSState *icp, int irq)
+{
+    int server;
+
+    for (server = 0; server < icp->nr_servers; ++server) {
+        ICSState *ics = &icp->ics[server];
+        if (ics_valid_irq(ics, irq)) {
+            return server;
+        }
+    }
+
+    return -1;
+}
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq)
 {
-    if (!ics_valid_irq(icp->ics, irq)) {
-        return NULL;
+    int server = xics_find_server(icp, irq);
+
+    if (server >= 0) {
+        ICSState *ics = &icp->ics[server];
+        return ics->qirqs[irq - ics->offset];
     }
 
-    return icp->ics->qirqs[irq - icp->ics->offset];
+    return NULL;
 }
 
 static void ics_set_irq_type(ICSState *ics, int irq, bool lsi)
@@ -653,9 +669,10 @@ static void ics_set_irq_type(ICSState *ics, int irq, bool lsi)
 
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
 {
-    assert(ics_valid_irq(icp->ics, irq));
+    int server = xics_find_server(icp, irq);
 
-    ics_set_irq_type(icp->ics, irq, lsi);
+    assert(server >= 0);
+    ics_set_irq_type(&icp->ics[server], irq, lsi);
 }
 
 /*
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 4/8] xics: add pre_load() hook to ICSStateClass
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 3/8] xics: add find_server Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 5/8] xics: disable flags reset on xics reset Alexey Kardashevskiy
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

This adds a hook which will be used to reset ICS state before incoming
migration.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        | 13 +++++++++++++
 include/hw/ppc/xics.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 88ef9ef..39c33bc 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -550,6 +550,18 @@ static void ics_dispatch_pre_save(void *opaque)
     }
 }
 
+static int ics_dispatch_pre_load(void *opaque)
+{
+    ICSState *ics = opaque;
+    ICSStateClass *info = ICS_GET_CLASS(ics);
+
+    if (info->pre_load) {
+        return info->pre_load(ics);
+    }
+
+    return 0;
+}
+
 static int ics_dispatch_post_load(void *opaque, int version_id)
 {
     ICSState *ics = opaque;
@@ -582,6 +594,7 @@ static const VMStateDescription vmstate_ics = {
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .pre_save = ics_dispatch_pre_save,
+    .pre_load = ics_dispatch_pre_load,
     .post_load = ics_dispatch_post_load,
     .fields      = (VMStateField []) {
         /* Sanity check */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 02bbe31..4b30e9a 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -126,6 +126,7 @@ struct ICSStateClass {
     DeviceClass parent_class;
 
     void (*pre_save)(ICSState *s);
+    int (*pre_load)(ICSState *s);
     int (*post_load)(ICSState *s, int version_id);
 };
 
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 5/8] xics: disable flags reset on xics reset
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 4/8] xics: add pre_load() hook to ICSStateClass Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

Since islsi[] array has been merged into the ICSState struct,
we must not reset flags as they tell if the interrupt is in use.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c     | 4 +++-
 hw/intc/xics_kvm.c | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 39c33bc..e5195bf 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -522,10 +522,12 @@ static void ics_reset(DeviceState *dev)
     ICSState *ics = ICS(dev);
     int i;
 
-    memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
     for (i = 0; i < ics->nr_irqs; i++) {
+        ics->irqs[i].server = 0;
         ics->irqs[i].priority = 0xff;
         ics->irqs[i].saved_priority = 0xff;
+        ics->irqs[i].status = 0;
+        /* Do not reset @flags as IRQ might be allocated */
     }
 }
 
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 8073827..a0f43ba 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -272,10 +272,12 @@ static void ics_kvm_reset(DeviceState *dev)
     ICSState *ics = ICS(dev);
     int i;
 
-    memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
     for (i = 0; i < ics->nr_irqs; i++) {
+        ics->irqs[i].server = 0;
         ics->irqs[i].priority = 0xff;
         ics->irqs[i].saved_priority = 0xff;
+        ics->irqs[i].status = 0;
+        /* Do not reset @flags as IRQ might be allocated */
     }
 
     ics_set_kvm_state(ics, 1);
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 5/8] xics: disable flags reset on xics reset Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-04-10 12:51   ` Alexander Graf
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 7/8] spapr: remove @next_irq Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only had the last returned IRQ.
However migration may change interrupts for devices depending on
their order in the command line.

This moves an allocator from SPAPR to XICS.

This switches IRQ users to use new API.

This uses LSI/MSI flags to know if interrupt is in use.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr.c         | 67 ------------------------------------------
 hw/ppc/spapr_events.c  |  2 +-
 hw/ppc/spapr_pci.c     |  6 ++--
 hw/ppc/spapr_vio.c     |  2 +-
 include/hw/ppc/spapr.h | 10 -------
 include/hw/ppc/xics.h  |  2 ++
 trace-events           |  3 ++
 8 files changed, 90 insertions(+), 82 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index e5195bf..8d101a3 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -690,6 +690,86 @@ void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
     ics_set_irq_type(&icp->ics[server], irq, lsi);
 }
 
+#define XICS_IRQ_FREE(ics, n)   \
+    (!((ics)->irqs[(n) - (ics)->offset].flags & \
+       (XICS_FLAGS_LSI | XICS_FLAGS_MSI)))
+
+static int ics_find_free_block(ICSState *ics, int num, int alignnum)
+{
+    int first, i;
+
+    for (first = 0; first < ics->nr_irqs; first += alignnum) {
+        if (num > (ics->nr_irqs - first)) {
+            return -1;
+        }
+        for (i = first; i < first + num; ++i) {
+            if (!XICS_IRQ_FREE(ics, i + ics->offset)) {
+                break;
+            }
+        }
+        if (i == (first + num)) {
+            return first + ics->offset;
+        }
+    }
+
+    return -1;
+}
+
+int xics_alloc(XICSState *icp, int server, int irq, bool lsi)
+{
+    ICSState *ics = &icp->ics[server];
+
+    if (irq) {
+        assert(server == xics_find_server(icp, irq));
+        if (!XICS_IRQ_FREE(ics, irq)) {
+            trace_xics_alloc_failed(server, irq);
+            return -1;
+        }
+    } else {
+        irq = ics_find_free_block(ics, 1, 1);
+    }
+
+    ics_set_irq_type(ics, irq, lsi);
+    trace_xics_alloc(server, irq);
+
+    return irq;
+}
+
+/*
+ * Allocate block of consequtive IRQs, returns a number of the first.
+ * If align==true, aligns the first IRQ number to num.
+ */
+int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align)
+{
+    int i, first = -1;
+    ICSState *ics = &icp->ics[server];
+
+    assert(server == 0);
+    /*
+     * MSIMesage::data is used for storing VIRQ so
+     * it has to be aligned to num to support multiple
+     * MSI vectors. MSI-X is not affected by this.
+     * The hint is used for the first IRQ, the rest should
+     * be allocated continuously.
+     */
+    if (align) {
+        assert((num == 1) || (num == 2) || (num == 4) ||
+               (num == 8) || (num == 16) || (num == 32));
+        first = ics_find_free_block(ics, num, num);
+    } else {
+        first = ics_find_free_block(ics, num, 1);
+    }
+
+    if (first > 0) {
+        for (i = first; i < first + num; ++i) {
+            ics_set_irq_type(ics, i, lsi);
+        }
+    }
+    trace_xics_alloc_block(server, first, num, lsi, align);
+
+    return first;
+}
+
 /*
  * Guest interfaces
  */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 12adc21..29ca2e0 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -83,73 +83,6 @@
 
 sPAPREnvironment *spapr;
 
-int spapr_allocate_irq(int hint, bool lsi)
-{
-    int irq;
-
-    if (hint) {
-        irq = hint;
-        if (hint >= spapr->next_irq) {
-            spapr->next_irq = hint + 1;
-        }
-        /* FIXME: we should probably check for collisions somehow */
-    } else {
-        irq = spapr->next_irq++;
-    }
-
-    /* Configure irq type */
-    if (!xics_get_qirq(spapr->icp, irq)) {
-        return 0;
-    }
-
-    xics_set_irq_type(spapr->icp, irq, lsi);
-
-    return irq;
-}
-
-/*
- * Allocate block of consequtive IRQs, returns a number of the first.
- * If msi==true, aligns the first IRQ number to num.
- */
-int spapr_allocate_irq_block(int num, bool lsi, bool msi)
-{
-    int first = -1;
-    int i, hint = 0;
-
-    /*
-     * MSIMesage::data is used for storing VIRQ so
-     * it has to be aligned to num to support multiple
-     * MSI vectors. MSI-X is not affected by this.
-     * The hint is used for the first IRQ, the rest should
-     * be allocated continuously.
-     */
-    if (msi) {
-        assert((num == 1) || (num == 2) || (num == 4) ||
-               (num == 8) || (num == 16) || (num == 32));
-        hint = (spapr->next_irq + num - 1) & ~(num - 1);
-    }
-
-    for (i = 0; i < num; ++i) {
-        int irq;
-
-        irq = spapr_allocate_irq(hint, lsi);
-        if (!irq) {
-            return -1;
-        }
-
-        if (0 == i) {
-            first = irq;
-            hint = 0;
-        }
-
-        /* If the above doesn't create a consecutive block then that's
-         * an internal bug */
-        assert(irq == (first + i));
-    }
-
-    return first;
-}
-
 static XICSState *try_create_xics(const char *type, int nr_servers,
                                   int nr_irqs)
 {
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 16fa49e..e605430 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -314,7 +314,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
 void spapr_events_init(sPAPREnvironment *spapr)
 {
-    spapr->epow_irq = spapr_allocate_msi(0);
+    spapr->epow_irq = xics_alloc_block(spapr->icp, 0, 1, false, false);
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register("check-exception", check_exception);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index cbef095..4eaf364 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -343,8 +343,8 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
     /* There is no cached config, allocate MSIs */
     if (!phb->msi_table[ndev].nvec) {
-        irq = spapr_allocate_irq_block(req_num, false,
-                                       ret_intr_type == RTAS_TYPE_MSI);
+        irq = xics_alloc_block(spapr->icp, 0, req_num, false,
+                               ret_intr_type == RTAS_TYPE_MSI);
         if (irq < 0) {
             error_report("Cannot allocate MSIs for device#%d", ndev);
             rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
@@ -623,7 +623,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     for (i = 0; i < PCI_NUM_PINS; i++) {
         uint32_t irq;
 
-        irq = spapr_allocate_lsi(0);
+        irq = xics_alloc_block(spapr->icp, 0, 1, true, false);
         if (!irq) {
             error_setg(errp, "spapr_allocate_lsi failed");
             return;
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 4e33f46..8aeb263 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -448,7 +448,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
         dev->qdev.id = id;
     }
 
-    dev->irq = spapr_allocate_msi(dev->irq);
+    dev->irq = xics_alloc(spapr->icp, 0, dev->irq, false);
     if (!dev->irq) {
         return -1;
     }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 18332fd..5d577ff 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -323,16 +323,6 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 int spapr_allocate_irq(int hint, bool lsi);
 int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 
-static inline int spapr_allocate_msi(int hint)
-{
-    return spapr_allocate_irq(hint, false);
-}
-
-static inline int spapr_allocate_lsi(int hint)
-{
-    return spapr_allocate_irq(hint, true);
-}
-
 /* RTAS return codes */
 #define RTAS_OUT_SUCCESS            0
 #define RTAS_OUT_NO_ERRORS_FOUND    1
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 4b30e9a..337398d 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -157,6 +157,8 @@ struct ICSIRQState {
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi);
+int xics_alloc(XICSState *icp, int server, int irq, bool lsi);
+int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align);
 
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
 
diff --git a/trace-events b/trace-events
index 002c260..ad7400e 100644
--- a/trace-events
+++ b/trace-events
@@ -1143,6 +1143,9 @@ xics_set_irq_lsi(int srcno, int nr) "set_irq_lsi: srcno %d [irq %#x]"
 xics_ics_write_xive(int nr, int srcno, int server, uint8_t priority) "ics_write_xive: irq %#x [src %d] server %#x prio %#x"
 xics_ics_reject(int nr, int srcno) "reject irq %#x [src %d]"
 xics_ics_eoi(int nr) "ics_eoi: irq %#x"
+xics_alloc(int server, int irq) "server#%d, irq %d"
+xics_alloc_failed(int server, int irq) "server#%d, irq %d"
+xics_alloc_block(int server, int first, int num, bool lsi, int align) "server#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
 
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 7/8] spapr: remove @next_irq
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-03-14  7:19   ` Thomas Huth
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 8/8] xics: enable interrupt configuration reset on migration Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

This removes @next_irq from sPAPREnvironment which was used in old
IRQ allocator as XICS is now responsible for IRQs and keep track of
allocated IRQs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c         | 3 ---
 include/hw/ppc/spapr.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 29ca2e0..1f7162c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -732,8 +732,6 @@ static const VMStateDescription vmstate_spapr = {
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .fields      = (VMStateField []) {
-        VMSTATE_UINT32(next_irq, sPAPREnvironment),
-
         /* RTC offset */
         VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
 
@@ -1136,7 +1134,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     /* Set up Interrupt Controller before we create the VCPUs */
     spapr->icp = xics_system_init(smp_cpus * kvmppc_smt_threads() / smp_threads,
                                   XICS_IRQS);
-    spapr->next_irq = XICS_IRQ_BASE;
 
     /* init CPUs */
     if (cpu_model == NULL) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5d577ff..aac2505 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -28,7 +28,6 @@ typedef struct sPAPREnvironment {
     long rtas_size;
     void *fdt_skel;
     target_ulong entry_point;
-    uint32_t next_irq;
     uint64_t rtc_offset;
     bool has_graphics;
 
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 8/8] xics: enable interrupt configuration reset on migration
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (6 preceding siblings ...)
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 7/8] spapr: remove @next_irq Alexey Kardashevskiy
@ 2014-03-14  4:18 ` Alexey Kardashevskiy
  2014-04-10 12:55   ` Alexander Graf
  2014-03-20  1:25 ` [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Andreas Färber
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-03-14  4:18 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf, Andreas Färber

Interrupt numbers migrate along with other properties so
the initial QEMU setup will be reset by migration. Since
XICS migrates as well and this includes IRQ map with all
the flags saying which ones are already used, all we need
is just to reset the XICS IRQ array on the destination.

This resets XICS IRQ usage map.

This enables devices to migrate IRQ number instead of
checking that the number has not changed since
the initialization.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        | 19 +++++++++++++++++++
 hw/ppc/spapr_pci.c    |  2 +-
 hw/ppc/spapr_vio.c    |  2 +-
 include/hw/ppc/xics.h |  1 +
 trace-events          |  1 +
 5 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 8d101a3..0809a52 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -33,6 +33,8 @@
 #include "qemu/error-report.h"
 #include "qapi/visitor.h"
 
+static void ics_free(ICSState *ics, int irq, int num);
+
 static int get_cpu_index_by_dt_id(int cpu_dt_id)
 {
     PowerPCCPU *cpu = ppc_get_vcpu_by_dt_id(cpu_dt_id);
@@ -531,6 +533,12 @@ static void ics_reset(DeviceState *dev)
     }
 }
 
+static int ics_pre_load(ICSState *ics)
+{
+    ics_free(ics, ics->offset, ics->nr_irqs);
+    return 0;
+}
+
 static int ics_post_load(ICSState *ics, int version_id)
 {
     int i;
@@ -635,6 +643,7 @@ static void ics_class_init(ObjectClass *klass, void *data)
     dc->realize = ics_realize;
     dc->vmsd = &vmstate_ics;
     dc->reset = ics_reset;
+    isc->pre_load = ics_pre_load;
     isc->post_load = ics_post_load;
 }
 
@@ -770,6 +779,16 @@ int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align)
     return first;
 }
 
+static void ics_free(ICSState *ics, int irq, int num)
+{
+    int i;
+
+    trace_xics_ics_free(ics - ics->icp->ics, irq, num);
+    for (i = irq; i < irq + num; ++i) {
+        memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
+    }
+}
+
 /*
  * Guest interfaces
  */
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4eaf364..aa12d1a 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -661,7 +661,7 @@ static const VMStateDescription vmstate_spapr_pci_lsi = {
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .fields      = (VMStateField []) {
-        VMSTATE_UINT32_EQUAL(irq, struct spapr_pci_lsi),
+        VMSTATE_UINT32(irq, struct spapr_pci_lsi),
 
         VMSTATE_END_OF_LIST()
     },
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 8aeb263..022c914 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -548,7 +548,7 @@ const VMStateDescription vmstate_spapr_vio = {
     .fields      = (VMStateField []) {
         /* Sanity check */
         VMSTATE_UINT32_EQUAL(reg, VIOsPAPRDevice),
-        VMSTATE_UINT32_EQUAL(irq, VIOsPAPRDevice),
+        VMSTATE_UINT32(irq, VIOsPAPRDevice),
 
         /* General VIO device state */
         VMSTATE_UINTTL(signal_state, VIOsPAPRDevice),
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 337398d..6ee6279 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -159,6 +159,7 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq);
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi);
 int xics_alloc(XICSState *icp, int server, int irq, bool lsi);
 int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align);
+void xics_free(XICSState *icp, int server, int irq, int num);
 
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
 
diff --git a/trace-events b/trace-events
index ad7400e..948ab93 100644
--- a/trace-events
+++ b/trace-events
@@ -1146,6 +1146,7 @@ xics_ics_eoi(int nr) "ics_eoi: irq %#x"
 xics_alloc(int server, int irq) "server#%d, irq %d"
 xics_alloc_failed(int server, int irq) "server#%d, irq %d"
 xics_alloc_block(int server, int first, int num, bool lsi, int align) "server#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
+xics_ics_free(int server, int irq, int num) "server#%d, first irq %d, %d irqs"
 
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
-- 
1.8.4.rc4

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 7/8] spapr: remove @next_irq
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 7/8] spapr: remove @next_irq Alexey Kardashevskiy
@ 2014-03-14  7:19   ` Thomas Huth
  0 siblings, 0 replies; 35+ messages in thread
From: Thomas Huth @ 2014-03-14  7:19 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: qemu-ppc, qemu-devel, Andreas Färber, Alexander Graf

On Fri, 14 Mar 2014 15:18:08 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> This removes @next_irq from sPAPREnvironment which was used in old
> IRQ allocator as XICS is now responsible for IRQs and keep track of
> allocated IRQs.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr.c         | 3 ---
>  include/hw/ppc/spapr.h | 1 -
>  2 files changed, 4 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 29ca2e0..1f7162c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -732,8 +732,6 @@ static const VMStateDescription vmstate_spapr = {
>      .minimum_version_id = 1,
>      .minimum_version_id_old = 1,
>      .fields      = (VMStateField []) {
> -        VMSTATE_UINT32(next_irq, sPAPREnvironment),
> -

You're changing the layout of a state description here... maybe it
would be a good idea to increase the version_id (and
minimum_version_id) in that case?

 Thomas

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (7 preceding siblings ...)
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 8/8] xics: enable interrupt configuration reset on migration Alexey Kardashevskiy
@ 2014-03-20  1:25 ` Andreas Färber
  2014-04-04  5:53 ` Alexey Kardashevskiy
  2014-05-04 13:56 ` Alexey Kardashevskiy
  10 siblings, 0 replies; 35+ messages in thread
From: Andreas Färber @ 2014-03-20  1:25 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc, Alexander Graf

Am 14.03.2014 05:18, schrieb Alexey Kardashevskiy:
> This initial problem came form libvirt - it does not preserve
> the device order when running QEMU. So it is easy to get source QEMU with:
> -device spapr-vscsi,id=scsi1,reg=0x2000 -device spapr-vscsi,id=scsi0,reg=0x3000
> and destination QEMU with:
> -device spapr-vscsi,id=scsi0,reg=0x3000 -device spapr-vscsi,id=scsi1,reg=0x2000
> 
> Since SPAPR IOMMU device does not have a bus, it is identified in
> the migration stream as "spapr-iommu" and @instance_id which is assigned
> as IOMMUs are created. This results in broken migration as @reg does not match.
> The first patch fixes this issue by adding a bus device and a bridge.
> 
> However just 1/8 does not fix the migration as device creation order also
> affects IRQs assigned to the devices, for both PCI and VIO.
> 2/7..8/8 fix that by moving XICS IRQ management from SPAPR to XICS and
> implementing migration support for the entire XICS IRQ map.
> 
> As we are here, the patchset also prepares XICS to support multiple ICS
> (interrupt servers).
> 
> This is a bugfix patchset but it feels too big to go to 2.0, right? :)

Too big for tonight at least, yeah. :/

Regards,
Andreas

> 
> 
> Alexey Kardashevskiy (8):
>   spapr-iommu: add a bus for spapr-iommu devices
>   xics: add flags for interrupts
>   xics: add find_server
>   xics: add pre_load() hook to ICSStateClass
>   xics: disable flags reset on xics reset
>   spapr: move interrupt allocator to xics
>   spapr: remove @next_irq
>   xics: enable interrupt configuration reset on migration
> 
>  hw/intc/xics.c         | 158 +++++++++++++++++++++++++++++++++++++++++++++----
>  hw/intc/xics_kvm.c     |   9 +--
>  hw/ppc/spapr.c         |  73 +----------------------
>  hw/ppc/spapr_events.c  |   2 +-
>  hw/ppc/spapr_iommu.c   |  59 +++++++++++++++++-
>  hw/ppc/spapr_pci.c     |   8 +--
>  hw/ppc/spapr_vio.c     |   4 +-
>  include/hw/ppc/spapr.h |  18 +++---
>  include/hw/ppc/xics.h  |   8 ++-
>  trace-events           |   4 ++
>  10 files changed, 238 insertions(+), 105 deletions(-)
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (8 preceding siblings ...)
  2014-03-20  1:25 ` [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Andreas Färber
@ 2014-04-04  5:53 ` Alexey Kardashevskiy
  2014-05-04 13:56 ` Alexey Kardashevskiy
  10 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-04  5:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, Alexander Graf, Andreas Färber

On 03/14/2014 03:18 PM, Alexey Kardashevskiy wrote:
> This initial problem came form libvirt - it does not preserve
> the device order when running QEMU. So it is easy to get source QEMU with:
> -device spapr-vscsi,id=scsi1,reg=0x2000 -device spapr-vscsi,id=scsi0,reg=0x3000
> and destination QEMU with:
> -device spapr-vscsi,id=scsi0,reg=0x3000 -device spapr-vscsi,id=scsi1,reg=0x2000
> 
> Since SPAPR IOMMU device does not have a bus, it is identified in
> the migration stream as "spapr-iommu" and @instance_id which is assigned
> as IOMMUs are created. This results in broken migration as @reg does not match.
> The first patch fixes this issue by adding a bus device and a bridge.
> 
> However just 1/8 does not fix the migration as device creation order also
> affects IRQs assigned to the devices, for both PCI and VIO.
> 2/7..8/8 fix that by moving XICS IRQ management from SPAPR to XICS and
> implementing migration support for the entire XICS IRQ map.
> 
> As we are here, the patchset also prepares XICS to support multiple ICS
> (interrupt servers).
> 
> This is a bugfix patchset but it feels too big to go to 2.0, right? :)

Alex, ping?


> 
> 
> Alexey Kardashevskiy (8):
>   spapr-iommu: add a bus for spapr-iommu devices
>   xics: add flags for interrupts
>   xics: add find_server
>   xics: add pre_load() hook to ICSStateClass
>   xics: disable flags reset on xics reset
>   spapr: move interrupt allocator to xics
>   spapr: remove @next_irq
>   xics: enable interrupt configuration reset on migration
> 
>  hw/intc/xics.c         | 158 +++++++++++++++++++++++++++++++++++++++++++++----
>  hw/intc/xics_kvm.c     |   9 +--
>  hw/ppc/spapr.c         |  73 +----------------------
>  hw/ppc/spapr_events.c  |   2 +-
>  hw/ppc/spapr_iommu.c   |  59 +++++++++++++++++-
>  hw/ppc/spapr_pci.c     |   8 +--
>  hw/ppc/spapr_vio.c     |   4 +-
>  include/hw/ppc/spapr.h |  18 +++---
>  include/hw/ppc/xics.h  |   8 ++-
>  trace-events           |   4 ++
>  10 files changed, 238 insertions(+), 105 deletions(-)
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices Alexey Kardashevskiy
@ 2014-04-10 12:40   ` Alexander Graf
  2014-04-10 14:40     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-10 12:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: qemu-ppc, Andreas Färber, Juan Quintela


On 14.03.14 05:18, Alexey Kardashevskiy wrote:
> At the moment sPAPR IOMMU table is a device which participates in
> a migration stream. Normally QEMU uses a get_dev_path() hook from
> the device's bus to compose the section name and @instance_id which are
> used to match the section to the real device. This works till the user
> changes the device order in the command line - if this happens,
> devices get other instance_id's and migration fails.
>
> This adds a TCE bridge bus device per sPAPR machine and places all sPAPR
> IOMMU devices onto it.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Juan, is a different command line device order supposed to work with 
migration?


Alex

> ---
>   hw/ppc/spapr.c         |  3 +++
>   hw/ppc/spapr_iommu.c   | 59 +++++++++++++++++++++++++++++++++++++++++++++++++-
>   include/hw/ppc/spapr.h |  7 ++++++
>   3 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 5c9a154..12adc21 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1263,6 +1263,9 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>       /* Set up EPOW events infrastructure */
>       spapr_events_init(spapr);
>   
> +    /* Set up TCE IOMMUs bus */
> +    spapr->tce_bus = spapr_tce_bus_init();
> +
>       /* Set up VIO bus */
>       spapr->vio_bus = spapr_vio_bus_init();
>   
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index d9fe946..7db0acf 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -157,7 +157,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn, size_t wi
>           return NULL;
>       }
>   
> -    tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
> +    tcet = SPAPR_TCE_TABLE(qdev_create(spapr->tce_bus, TYPE_SPAPR_TCE_TABLE));
>       tcet->liobn = liobn;
>       tcet->window_size = window_size;
>   
> @@ -342,9 +342,66 @@ static TypeInfo spapr_tce_table_info = {
>       .instance_finalize = spapr_tce_table_finalize,
>   };
>   
> +static char *spapr_tce_bus_get_dev_name(DeviceState *qdev)
> +{
> +    sPAPRTCETable *tcet = SPAPR_TCE_TABLE(qdev);
> +    char *name;
> +
> +    name = g_strdup_printf("liobn@%x", tcet->liobn);
> +    return name;
> +}
> +
> +static void spapr_tce_bus_class_init(ObjectClass *klass, void *data)
> +{
> +    BusClass *k = BUS_CLASS(klass);
> +
> +    k->get_dev_path = spapr_tce_bus_get_dev_name;
> +}
> +
> +static const TypeInfo spapr_tce_bus_info = {
> +    .name = TYPE_SPAPR_TCE_BUS,
> +    .parent = TYPE_BUS,
> +    .class_init = spapr_tce_bus_class_init,
> +    .instance_size = sizeof(BusState),
> +};
> +
> +static int spapr_tce_bridge_init(SysBusDevice *dev)
> +{
> +    /* nothing */
> +    return 0;
> +}
> +
> +static void spapr_tce_bridge_class_init(ObjectClass *klass, void *data)
> +{
> +    SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
> +
> +    k->init = spapr_tce_bridge_init;
> +}
> +
> +static const TypeInfo spapr_tce_bridge_info = {
> +    .name          = "spapr-tce-bridge",
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(SysBusDevice),
> +    .class_init    = spapr_tce_bridge_class_init,
> +};
> +
>   static void register_types(void)
>   {
>       type_register_static(&spapr_tce_table_info);
> +    type_register_static(&spapr_tce_bridge_info);
> +    type_register_static(&spapr_tce_bus_info);
> +}
> +
> +BusState *spapr_tce_bus_init(void)
> +{
> +    DeviceState *dev;
> +
> +    /* Create bridge device */
> +    dev = qdev_create(NULL, spapr_tce_bridge_info.name);
> +    qdev_init_nofail(dev);
> +
> +    /* Create bus on bridge device */
> +    return qbus_create(TYPE_SPAPR_TCE_BUS, dev, "spapr-tce");
>   }
>   
>   type_init(register_types);
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 449fc7c..18332fd 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -12,6 +12,7 @@ struct sPAPRNVRAM;
>   
>   typedef struct sPAPREnvironment {
>       struct VIOsPAPRBus *vio_bus;
> +    BusState *tce_bus;
>       QLIST_HEAD(, sPAPRPHBState) phbs;
>       hwaddr msi_win_addr;
>       MemoryRegion msiwindow;
> @@ -405,4 +406,10 @@ int spapr_dma_dt(void *fdt, int node_off, const char *propname,
>   int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>                         sPAPRTCETable *tcet);
>   
> +#define TYPE_SPAPR_TCE_BUS "spapr-tce-bus"
> +#define SPAPR_TCE_BUS(obj) \
> +    OBJECT_CHECK(BusState, (obj), TYPE_SPAPR_TCE_BUS)
> +
> +BusState *spapr_tce_bus_init(void);
> +
>   #endif /* !defined (__HW_SPAPR_H__) */

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] xics: add flags for interrupts
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 2/8] xics: add flags for interrupts Alexey Kardashevskiy
@ 2014-04-10 12:43   ` Alexander Graf
  0 siblings, 0 replies; 35+ messages in thread
From: Alexander Graf @ 2014-04-10 12:43 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc, Andreas Färber


On 14.03.14 05:18, Alexey Kardashevskiy wrote:
> We will need soon an "allocated" flag for every interrupt to support
> interrupt configuration change which may happen during migration.
>
> This replaces a separate lslsi[] array with a byte in the ICSIRQState
> struct and defines "LSI" and "MSI" flags. Neither of these flags set
> signals that the descriptor is not in use.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   hw/intc/xics.c        | 17 +++++++++++------
>   hw/intc/xics_kvm.c    |  5 ++---
>   include/hw/ppc/xics.h |  4 +++-
>   3 files changed, 16 insertions(+), 10 deletions(-)
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 64aabe7..7eac85a 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -438,7 +438,7 @@ static void ics_set_irq(void *opaque, int srcno, int val)
>   {
>       ICSState *ics = (ICSState *)opaque;
>   
> -    if (ics->islsi[srcno]) {
> +    if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
>           set_irq_lsi(ics, srcno, val);
>       } else {
>           set_irq_msi(ics, srcno, val);
> @@ -475,7 +475,7 @@ static void ics_write_xive(ICSState *ics, int nr, int server,
>   
>       trace_xics_ics_write_xive(nr, srcno, server, priority);
>   
> -    if (ics->islsi[srcno]) {
> +    if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
>           write_xive_lsi(ics, srcno);
>       } else {
>           write_xive_msi(ics, srcno);
> @@ -497,7 +497,7 @@ static void ics_resend(ICSState *ics)
>   
>       for (i = 0; i < ics->nr_irqs; i++) {
>           /* FIXME: filter by server#? */
> -        if (ics->islsi[i]) {
> +        if (ics->irqs[i].flags & XICS_FLAGS_LSI) {
>               resend_lsi(ics, i);
>           } else {
>               resend_msi(ics, i);
> @@ -512,7 +512,7 @@ static void ics_eoi(ICSState *ics, int nr)
>   
>       trace_xics_ics_eoi(nr);
>   
> -    if (ics->islsi[srcno]) {
> +    if (ics->irqs[srcno].flags & XICS_FLAGS_LSI) {
>           irq->status &= ~XICS_STATUS_SENT;
>       }
>   }
> @@ -609,7 +609,6 @@ static void ics_realize(DeviceState *dev, Error **errp)
>           return;
>       }
>       ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
> -    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>       ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, ics->nr_irqs);
>   }
>   
> @@ -646,11 +645,17 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq)
>       return icp->ics->qirqs[irq - icp->ics->offset];
>   }
>   
> +static void ics_set_irq_type(ICSState *ics, int irq, bool lsi)
> +{
> +    ics->irqs[irq - ics->offset].flags |=
> +        lsi ? XICS_FLAGS_LSI : XICS_FLAGS_MSI;

If I configure an IRQ as LSI then as MSI this doesn't work. Sure, we 
probably don't do this but in general this is not how a "set" function 
should behave.


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics Alexey Kardashevskiy
@ 2014-04-10 12:51   ` Alexander Graf
  2014-04-10 13:24     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-10 12:51 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc, Andreas Färber


On 14.03.14 05:18, Alexey Kardashevskiy wrote:
> The current allocator returns IRQ numbers from a pool and does not
> support IRQs reuse in any form as it did not keep track of what it
> previously returned, it only had the last returned IRQ.
> However migration may change interrupts for devices depending on
> their order in the command line.

Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't 
change anything.


Alex

> This moves an allocator from SPAPR to XICS.
>
> This switches IRQ users to use new API.
>
> This uses LSI/MSI flags to know if interrupt is in use.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   hw/intc/xics.c         | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/ppc/spapr.c         | 67 ------------------------------------------
>   hw/ppc/spapr_events.c  |  2 +-
>   hw/ppc/spapr_pci.c     |  6 ++--
>   hw/ppc/spapr_vio.c     |  2 +-
>   include/hw/ppc/spapr.h | 10 -------
>   include/hw/ppc/xics.h  |  2 ++
>   trace-events           |  3 ++
>   8 files changed, 90 insertions(+), 82 deletions(-)
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index e5195bf..8d101a3 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -690,6 +690,86 @@ void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
>       ics_set_irq_type(&icp->ics[server], irq, lsi);
>   }
>   
> +#define XICS_IRQ_FREE(ics, n)   \
> +    (!((ics)->irqs[(n) - (ics)->offset].flags & \
> +       (XICS_FLAGS_LSI | XICS_FLAGS_MSI)))
> +
> +static int ics_find_free_block(ICSState *ics, int num, int alignnum)
> +{
> +    int first, i;
> +
> +    for (first = 0; first < ics->nr_irqs; first += alignnum) {
> +        if (num > (ics->nr_irqs - first)) {
> +            return -1;
> +        }
> +        for (i = first; i < first + num; ++i) {
> +            if (!XICS_IRQ_FREE(ics, i + ics->offset)) {
> +                break;
> +            }
> +        }
> +        if (i == (first + num)) {
> +            return first + ics->offset;
> +        }
> +    }
> +
> +    return -1;
> +}
> +
> +int xics_alloc(XICSState *icp, int server, int irq, bool lsi)
> +{
> +    ICSState *ics = &icp->ics[server];
> +
> +    if (irq) {
> +        assert(server == xics_find_server(icp, irq));
> +        if (!XICS_IRQ_FREE(ics, irq)) {
> +            trace_xics_alloc_failed(server, irq);
> +            return -1;
> +        }
> +    } else {
> +        irq = ics_find_free_block(ics, 1, 1);
> +    }
> +
> +    ics_set_irq_type(ics, irq, lsi);
> +    trace_xics_alloc(server, irq);
> +
> +    return irq;
> +}
> +
> +/*
> + * Allocate block of consequtive IRQs, returns a number of the first.
> + * If align==true, aligns the first IRQ number to num.
> + */
> +int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align)
> +{
> +    int i, first = -1;
> +    ICSState *ics = &icp->ics[server];
> +
> +    assert(server == 0);
> +    /*
> +     * MSIMesage::data is used for storing VIRQ so
> +     * it has to be aligned to num to support multiple
> +     * MSI vectors. MSI-X is not affected by this.
> +     * The hint is used for the first IRQ, the rest should
> +     * be allocated continuously.
> +     */
> +    if (align) {
> +        assert((num == 1) || (num == 2) || (num == 4) ||
> +               (num == 8) || (num == 16) || (num == 32));
> +        first = ics_find_free_block(ics, num, num);
> +    } else {
> +        first = ics_find_free_block(ics, num, 1);
> +    }
> +
> +    if (first > 0) {
> +        for (i = first; i < first + num; ++i) {
> +            ics_set_irq_type(ics, i, lsi);
> +        }
> +    }
> +    trace_xics_alloc_block(server, first, num, lsi, align);
> +
> +    return first;
> +}
> +
>   /*
>    * Guest interfaces
>    */
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 12adc21..29ca2e0 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -83,73 +83,6 @@
>   
>   sPAPREnvironment *spapr;
>   
> -int spapr_allocate_irq(int hint, bool lsi)
> -{
> -    int irq;
> -
> -    if (hint) {
> -        irq = hint;
> -        if (hint >= spapr->next_irq) {
> -            spapr->next_irq = hint + 1;
> -        }
> -        /* FIXME: we should probably check for collisions somehow */
> -    } else {
> -        irq = spapr->next_irq++;
> -    }
> -
> -    /* Configure irq type */
> -    if (!xics_get_qirq(spapr->icp, irq)) {
> -        return 0;
> -    }
> -
> -    xics_set_irq_type(spapr->icp, irq, lsi);
> -
> -    return irq;
> -}
> -
> -/*
> - * Allocate block of consequtive IRQs, returns a number of the first.
> - * If msi==true, aligns the first IRQ number to num.
> - */
> -int spapr_allocate_irq_block(int num, bool lsi, bool msi)
> -{
> -    int first = -1;
> -    int i, hint = 0;
> -
> -    /*
> -     * MSIMesage::data is used for storing VIRQ so
> -     * it has to be aligned to num to support multiple
> -     * MSI vectors. MSI-X is not affected by this.
> -     * The hint is used for the first IRQ, the rest should
> -     * be allocated continuously.
> -     */
> -    if (msi) {
> -        assert((num == 1) || (num == 2) || (num == 4) ||
> -               (num == 8) || (num == 16) || (num == 32));
> -        hint = (spapr->next_irq + num - 1) & ~(num - 1);
> -    }
> -
> -    for (i = 0; i < num; ++i) {
> -        int irq;
> -
> -        irq = spapr_allocate_irq(hint, lsi);
> -        if (!irq) {
> -            return -1;
> -        }
> -
> -        if (0 == i) {
> -            first = irq;
> -            hint = 0;
> -        }
> -
> -        /* If the above doesn't create a consecutive block then that's
> -         * an internal bug */
> -        assert(irq == (first + i));
> -    }
> -
> -    return first;
> -}
> -
>   static XICSState *try_create_xics(const char *type, int nr_servers,
>                                     int nr_irqs)
>   {
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index 16fa49e..e605430 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -314,7 +314,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>   
>   void spapr_events_init(sPAPREnvironment *spapr)
>   {
> -    spapr->epow_irq = spapr_allocate_msi(0);
> +    spapr->epow_irq = xics_alloc_block(spapr->icp, 0, 1, false, false);
>       spapr->epow_notifier.notify = spapr_powerdown_req;
>       qemu_register_powerdown_notifier(&spapr->epow_notifier);
>       spapr_rtas_register("check-exception", check_exception);
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index cbef095..4eaf364 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -343,8 +343,8 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>   
>       /* There is no cached config, allocate MSIs */
>       if (!phb->msi_table[ndev].nvec) {
> -        irq = spapr_allocate_irq_block(req_num, false,
> -                                       ret_intr_type == RTAS_TYPE_MSI);
> +        irq = xics_alloc_block(spapr->icp, 0, req_num, false,
> +                               ret_intr_type == RTAS_TYPE_MSI);
>           if (irq < 0) {
>               error_report("Cannot allocate MSIs for device#%d", ndev);
>               rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
> @@ -623,7 +623,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>       for (i = 0; i < PCI_NUM_PINS; i++) {
>           uint32_t irq;
>   
> -        irq = spapr_allocate_lsi(0);
> +        irq = xics_alloc_block(spapr->icp, 0, 1, true, false);
>           if (!irq) {
>               error_setg(errp, "spapr_allocate_lsi failed");
>               return;
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index 4e33f46..8aeb263 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -448,7 +448,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>           dev->qdev.id = id;
>       }
>   
> -    dev->irq = spapr_allocate_msi(dev->irq);
> +    dev->irq = xics_alloc(spapr->icp, 0, dev->irq, false);
>       if (!dev->irq) {
>           return -1;
>       }
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 18332fd..5d577ff 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -323,16 +323,6 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
>   int spapr_allocate_irq(int hint, bool lsi);
>   int spapr_allocate_irq_block(int num, bool lsi, bool msi);
>   
> -static inline int spapr_allocate_msi(int hint)
> -{
> -    return spapr_allocate_irq(hint, false);
> -}
> -
> -static inline int spapr_allocate_lsi(int hint)
> -{
> -    return spapr_allocate_irq(hint, true);
> -}
> -
>   /* RTAS return codes */
>   #define RTAS_OUT_SUCCESS            0
>   #define RTAS_OUT_NO_ERRORS_FOUND    1
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 4b30e9a..337398d 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -157,6 +157,8 @@ struct ICSIRQState {
>   
>   qemu_irq xics_get_qirq(XICSState *icp, int irq);
>   void xics_set_irq_type(XICSState *icp, int irq, bool lsi);
> +int xics_alloc(XICSState *icp, int server, int irq, bool lsi);
> +int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align);
>   
>   void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
>   
> diff --git a/trace-events b/trace-events
> index 002c260..ad7400e 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1143,6 +1143,9 @@ xics_set_irq_lsi(int srcno, int nr) "set_irq_lsi: srcno %d [irq %#x]"
>   xics_ics_write_xive(int nr, int srcno, int server, uint8_t priority) "ics_write_xive: irq %#x [src %d] server %#x prio %#x"
>   xics_ics_reject(int nr, int srcno) "reject irq %#x [src %d]"
>   xics_ics_eoi(int nr) "ics_eoi: irq %#x"
> +xics_alloc(int server, int irq) "server#%d, irq %d"
> +xics_alloc_failed(int server, int irq) "server#%d, irq %d"
> +xics_alloc_block(int server, int first, int num, bool lsi, int align) "server#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
>   
>   # hw/ppc/spapr_iommu.c
>   spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 8/8] xics: enable interrupt configuration reset on migration
  2014-03-14  4:18 ` [Qemu-devel] [PATCH 8/8] xics: enable interrupt configuration reset on migration Alexey Kardashevskiy
@ 2014-04-10 12:55   ` Alexander Graf
  0 siblings, 0 replies; 35+ messages in thread
From: Alexander Graf @ 2014-04-10 12:55 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: qemu-ppc, Andreas Färber, Juan Quintela


On 14.03.14 05:18, Alexey Kardashevskiy wrote:
> Interrupt numbers migrate along with other properties so
> the initial QEMU setup will be reset by migration. Since
> XICS migrates as well and this includes IRQ map with all
> the flags saying which ones are already used, all we need
> is just to reset the XICS IRQ array on the destination.
>
> This resets XICS IRQ usage map.
>
> This enables devices to migrate IRQ number instead of
> checking that the number has not changed since
> the initialization.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

I'm not sure this is how it's supposed to work. Juan, what's the usual 
way to ensure that interrupt allocation stays identical between source 
and destination on migration?


Alex

> ---
>   hw/intc/xics.c        | 19 +++++++++++++++++++
>   hw/ppc/spapr_pci.c    |  2 +-
>   hw/ppc/spapr_vio.c    |  2 +-
>   include/hw/ppc/xics.h |  1 +
>   trace-events          |  1 +
>   5 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 8d101a3..0809a52 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -33,6 +33,8 @@
>   #include "qemu/error-report.h"
>   #include "qapi/visitor.h"
>   
> +static void ics_free(ICSState *ics, int irq, int num);
> +
>   static int get_cpu_index_by_dt_id(int cpu_dt_id)
>   {
>       PowerPCCPU *cpu = ppc_get_vcpu_by_dt_id(cpu_dt_id);
> @@ -531,6 +533,12 @@ static void ics_reset(DeviceState *dev)
>       }
>   }
>   
> +static int ics_pre_load(ICSState *ics)
> +{
> +    ics_free(ics, ics->offset, ics->nr_irqs);
> +    return 0;
> +}
> +
>   static int ics_post_load(ICSState *ics, int version_id)
>   {
>       int i;
> @@ -635,6 +643,7 @@ static void ics_class_init(ObjectClass *klass, void *data)
>       dc->realize = ics_realize;
>       dc->vmsd = &vmstate_ics;
>       dc->reset = ics_reset;
> +    isc->pre_load = ics_pre_load;
>       isc->post_load = ics_post_load;
>   }
>   
> @@ -770,6 +779,16 @@ int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align)
>       return first;
>   }
>   
> +static void ics_free(ICSState *ics, int irq, int num)
> +{
> +    int i;
> +
> +    trace_xics_ics_free(ics - ics->icp->ics, irq, num);
> +    for (i = irq; i < irq + num; ++i) {
> +        memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
> +    }
> +}
> +
>   /*
>    * Guest interfaces
>    */
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 4eaf364..aa12d1a 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -661,7 +661,7 @@ static const VMStateDescription vmstate_spapr_pci_lsi = {
>       .minimum_version_id = 1,
>       .minimum_version_id_old = 1,
>       .fields      = (VMStateField []) {
> -        VMSTATE_UINT32_EQUAL(irq, struct spapr_pci_lsi),
> +        VMSTATE_UINT32(irq, struct spapr_pci_lsi),
>   
>           VMSTATE_END_OF_LIST()
>       },
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index 8aeb263..022c914 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -548,7 +548,7 @@ const VMStateDescription vmstate_spapr_vio = {
>       .fields      = (VMStateField []) {
>           /* Sanity check */
>           VMSTATE_UINT32_EQUAL(reg, VIOsPAPRDevice),
> -        VMSTATE_UINT32_EQUAL(irq, VIOsPAPRDevice),
> +        VMSTATE_UINT32(irq, VIOsPAPRDevice),
>   
>           /* General VIO device state */
>           VMSTATE_UINTTL(signal_state, VIOsPAPRDevice),
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 337398d..6ee6279 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -159,6 +159,7 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq);
>   void xics_set_irq_type(XICSState *icp, int irq, bool lsi);
>   int xics_alloc(XICSState *icp, int server, int irq, bool lsi);
>   int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool align);
> +void xics_free(XICSState *icp, int server, int irq, int num);
>   
>   void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
>   
> diff --git a/trace-events b/trace-events
> index ad7400e..948ab93 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -1146,6 +1146,7 @@ xics_ics_eoi(int nr) "ics_eoi: irq %#x"
>   xics_alloc(int server, int irq) "server#%d, irq %d"
>   xics_alloc_failed(int server, int irq) "server#%d, irq %d"
>   xics_alloc_block(int server, int first, int num, bool lsi, int align) "server#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
> +xics_ics_free(int server, int irq, int num) "server#%d, first irq %d, %d irqs"
>   
>   # hw/ppc/spapr_iommu.c
>   spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-10 12:51   ` Alexander Graf
@ 2014-04-10 13:24     ` Alexey Kardashevskiy
  2014-04-10 13:26       ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-10 13:24 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc, Andreas Färber

On 04/10/2014 10:51 PM, Alexander Graf wrote:
> 
> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>> The current allocator returns IRQ numbers from a pool and does not
>> support IRQs reuse in any form as it did not keep track of what it
>> previously returned, it only had the last returned IRQ.
>> However migration may change interrupts for devices depending on
>> their order in the command line.
> 
> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
> anything.


I put wrong commit message. By change I meant that the default state before
the destination guest started accepting migration is different from what
the destination guest became after migration finished. And migration cannot
avoid changing this default state.




> Alex
> 
>> This moves an allocator from SPAPR to XICS.
>>
>> This switches IRQ users to use new API.
>>
>> This uses LSI/MSI flags to know if interrupt is in use.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   hw/intc/xics.c         | 80
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/ppc/spapr.c         | 67 ------------------------------------------
>>   hw/ppc/spapr_events.c  |  2 +-
>>   hw/ppc/spapr_pci.c     |  6 ++--
>>   hw/ppc/spapr_vio.c     |  2 +-
>>   include/hw/ppc/spapr.h | 10 -------
>>   include/hw/ppc/xics.h  |  2 ++
>>   trace-events           |  3 ++
>>   8 files changed, 90 insertions(+), 82 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index e5195bf..8d101a3 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -690,6 +690,86 @@ void xics_set_irq_type(XICSState *icp, int irq, bool
>> lsi)
>>       ics_set_irq_type(&icp->ics[server], irq, lsi);
>>   }
>>   +#define XICS_IRQ_FREE(ics, n)   \
>> +    (!((ics)->irqs[(n) - (ics)->offset].flags & \
>> +       (XICS_FLAGS_LSI | XICS_FLAGS_MSI)))
>> +
>> +static int ics_find_free_block(ICSState *ics, int num, int alignnum)
>> +{
>> +    int first, i;
>> +
>> +    for (first = 0; first < ics->nr_irqs; first += alignnum) {
>> +        if (num > (ics->nr_irqs - first)) {
>> +            return -1;
>> +        }
>> +        for (i = first; i < first + num; ++i) {
>> +            if (!XICS_IRQ_FREE(ics, i + ics->offset)) {
>> +                break;
>> +            }
>> +        }
>> +        if (i == (first + num)) {
>> +            return first + ics->offset;
>> +        }
>> +    }
>> +
>> +    return -1;
>> +}
>> +
>> +int xics_alloc(XICSState *icp, int server, int irq, bool lsi)
>> +{
>> +    ICSState *ics = &icp->ics[server];
>> +
>> +    if (irq) {
>> +        assert(server == xics_find_server(icp, irq));
>> +        if (!XICS_IRQ_FREE(ics, irq)) {
>> +            trace_xics_alloc_failed(server, irq);
>> +            return -1;
>> +        }
>> +    } else {
>> +        irq = ics_find_free_block(ics, 1, 1);
>> +    }
>> +
>> +    ics_set_irq_type(ics, irq, lsi);
>> +    trace_xics_alloc(server, irq);
>> +
>> +    return irq;
>> +}
>> +
>> +/*
>> + * Allocate block of consequtive IRQs, returns a number of the first.
>> + * If align==true, aligns the first IRQ number to num.
>> + */
>> +int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool
>> align)
>> +{
>> +    int i, first = -1;
>> +    ICSState *ics = &icp->ics[server];
>> +
>> +    assert(server == 0);
>> +    /*
>> +     * MSIMesage::data is used for storing VIRQ so
>> +     * it has to be aligned to num to support multiple
>> +     * MSI vectors. MSI-X is not affected by this.
>> +     * The hint is used for the first IRQ, the rest should
>> +     * be allocated continuously.
>> +     */
>> +    if (align) {
>> +        assert((num == 1) || (num == 2) || (num == 4) ||
>> +               (num == 8) || (num == 16) || (num == 32));
>> +        first = ics_find_free_block(ics, num, num);
>> +    } else {
>> +        first = ics_find_free_block(ics, num, 1);
>> +    }
>> +
>> +    if (first > 0) {
>> +        for (i = first; i < first + num; ++i) {
>> +            ics_set_irq_type(ics, i, lsi);
>> +        }
>> +    }
>> +    trace_xics_alloc_block(server, first, num, lsi, align);
>> +
>> +    return first;
>> +}
>> +
>>   /*
>>    * Guest interfaces
>>    */
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 12adc21..29ca2e0 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -83,73 +83,6 @@
>>     sPAPREnvironment *spapr;
>>   -int spapr_allocate_irq(int hint, bool lsi)
>> -{
>> -    int irq;
>> -
>> -    if (hint) {
>> -        irq = hint;
>> -        if (hint >= spapr->next_irq) {
>> -            spapr->next_irq = hint + 1;
>> -        }
>> -        /* FIXME: we should probably check for collisions somehow */
>> -    } else {
>> -        irq = spapr->next_irq++;
>> -    }
>> -
>> -    /* Configure irq type */
>> -    if (!xics_get_qirq(spapr->icp, irq)) {
>> -        return 0;
>> -    }
>> -
>> -    xics_set_irq_type(spapr->icp, irq, lsi);
>> -
>> -    return irq;
>> -}
>> -
>> -/*
>> - * Allocate block of consequtive IRQs, returns a number of the first.
>> - * If msi==true, aligns the first IRQ number to num.
>> - */
>> -int spapr_allocate_irq_block(int num, bool lsi, bool msi)
>> -{
>> -    int first = -1;
>> -    int i, hint = 0;
>> -
>> -    /*
>> -     * MSIMesage::data is used for storing VIRQ so
>> -     * it has to be aligned to num to support multiple
>> -     * MSI vectors. MSI-X is not affected by this.
>> -     * The hint is used for the first IRQ, the rest should
>> -     * be allocated continuously.
>> -     */
>> -    if (msi) {
>> -        assert((num == 1) || (num == 2) || (num == 4) ||
>> -               (num == 8) || (num == 16) || (num == 32));
>> -        hint = (spapr->next_irq + num - 1) & ~(num - 1);
>> -    }
>> -
>> -    for (i = 0; i < num; ++i) {
>> -        int irq;
>> -
>> -        irq = spapr_allocate_irq(hint, lsi);
>> -        if (!irq) {
>> -            return -1;
>> -        }
>> -
>> -        if (0 == i) {
>> -            first = irq;
>> -            hint = 0;
>> -        }
>> -
>> -        /* If the above doesn't create a consecutive block then that's
>> -         * an internal bug */
>> -        assert(irq == (first + i));
>> -    }
>> -
>> -    return first;
>> -}
>> -
>>   static XICSState *try_create_xics(const char *type, int nr_servers,
>>                                     int nr_irqs)
>>   {
>> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
>> index 16fa49e..e605430 100644
>> --- a/hw/ppc/spapr_events.c
>> +++ b/hw/ppc/spapr_events.c
>> @@ -314,7 +314,7 @@ static void check_exception(PowerPCCPU *cpu,
>> sPAPREnvironment *spapr,
>>     void spapr_events_init(sPAPREnvironment *spapr)
>>   {
>> -    spapr->epow_irq = spapr_allocate_msi(0);
>> +    spapr->epow_irq = xics_alloc_block(spapr->icp, 0, 1, false, false);
>>       spapr->epow_notifier.notify = spapr_powerdown_req;
>>       qemu_register_powerdown_notifier(&spapr->epow_notifier);
>>       spapr_rtas_register("check-exception", check_exception);
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index cbef095..4eaf364 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -343,8 +343,8 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu,
>> sPAPREnvironment *spapr,
>>         /* There is no cached config, allocate MSIs */
>>       if (!phb->msi_table[ndev].nvec) {
>> -        irq = spapr_allocate_irq_block(req_num, false,
>> -                                       ret_intr_type == RTAS_TYPE_MSI);
>> +        irq = xics_alloc_block(spapr->icp, 0, req_num, false,
>> +                               ret_intr_type == RTAS_TYPE_MSI);
>>           if (irq < 0) {
>>               error_report("Cannot allocate MSIs for device#%d", ndev);
>>               rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
>> @@ -623,7 +623,7 @@ static void spapr_phb_realize(DeviceState *dev, Error
>> **errp)
>>       for (i = 0; i < PCI_NUM_PINS; i++) {
>>           uint32_t irq;
>>   -        irq = spapr_allocate_lsi(0);
>> +        irq = xics_alloc_block(spapr->icp, 0, 1, true, false);
>>           if (!irq) {
>>               error_setg(errp, "spapr_allocate_lsi failed");
>>               return;
>> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
>> index 4e33f46..8aeb263 100644
>> --- a/hw/ppc/spapr_vio.c
>> +++ b/hw/ppc/spapr_vio.c
>> @@ -448,7 +448,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
>>           dev->qdev.id = id;
>>       }
>>   -    dev->irq = spapr_allocate_msi(dev->irq);
>> +    dev->irq = xics_alloc(spapr->icp, 0, dev->irq, false);
>>       if (!dev->irq) {
>>           return -1;
>>       }
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 18332fd..5d577ff 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -323,16 +323,6 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu,
>> target_ulong opcode,
>>   int spapr_allocate_irq(int hint, bool lsi);
>>   int spapr_allocate_irq_block(int num, bool lsi, bool msi);
>>   -static inline int spapr_allocate_msi(int hint)
>> -{
>> -    return spapr_allocate_irq(hint, false);
>> -}
>> -
>> -static inline int spapr_allocate_lsi(int hint)
>> -{
>> -    return spapr_allocate_irq(hint, true);
>> -}
>> -
>>   /* RTAS return codes */
>>   #define RTAS_OUT_SUCCESS            0
>>   #define RTAS_OUT_NO_ERRORS_FOUND    1
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 4b30e9a..337398d 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -157,6 +157,8 @@ struct ICSIRQState {
>>     qemu_irq xics_get_qirq(XICSState *icp, int irq);
>>   void xics_set_irq_type(XICSState *icp, int irq, bool lsi);
>> +int xics_alloc(XICSState *icp, int server, int irq, bool lsi);
>> +int xics_alloc_block(XICSState *icp, int server, int num, bool lsi, bool
>> align);
>>     void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
>>   diff --git a/trace-events b/trace-events
>> index 002c260..ad7400e 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -1143,6 +1143,9 @@ xics_set_irq_lsi(int srcno, int nr) "set_irq_lsi:
>> srcno %d [irq %#x]"
>>   xics_ics_write_xive(int nr, int srcno, int server, uint8_t priority)
>> "ics_write_xive: irq %#x [src %d] server %#x prio %#x"
>>   xics_ics_reject(int nr, int srcno) "reject irq %#x [src %d]"
>>   xics_ics_eoi(int nr) "ics_eoi: irq %#x"
>> +xics_alloc(int server, int irq) "server#%d, irq %d"
>> +xics_alloc_failed(int server, int irq) "server#%d, irq %d"
>> +xics_alloc_block(int server, int first, int num, bool lsi, int align)
>> "server#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
>>     # hw/ppc/spapr_iommu.c
>>   spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t
>> ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-10 13:24     ` Alexey Kardashevskiy
@ 2014-04-10 13:26       ` Alexander Graf
  2014-04-10 14:43         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-10 13:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc, Andreas Färber


On 10.04.14 15:24, Alexey Kardashevskiy wrote:
> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>> The current allocator returns IRQ numbers from a pool and does not
>>> support IRQs reuse in any form as it did not keep track of what it
>>> previously returned, it only had the last returned IRQ.
>>> However migration may change interrupts for devices depending on
>>> their order in the command line.
>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
>> anything.
>
> I put wrong commit message. By change I meant that the default state before
> the destination guest started accepting migration is different from what
> the destination guest became after migration finished. And migration cannot
> avoid changing this default state.

Ok, why is the IRQ configuration different?


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices
  2014-04-10 12:40   ` Alexander Graf
@ 2014-04-10 14:40     ` Alexey Kardashevskiy
  2014-04-10 14:52       ` Andreas Färber
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-10 14:40 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel
  Cc: Paolo Bonzini, qemu-ppc, Andreas Färber, Juan Quintela

On 04/10/2014 10:40 PM, Alexander Graf wrote:
> 
> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>> At the moment sPAPR IOMMU table is a device which participates in
>> a migration stream. Normally QEMU uses a get_dev_path() hook from
>> the device's bus to compose the section name and @instance_id which are
>> used to match the section to the real device. This works till the user
>> changes the device order in the command line - if this happens,
>> devices get other instance_id's and migration fails.
>>
>> This adds a TCE bridge bus device per sPAPR machine and places all sPAPR
>> IOMMU devices onto it.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Juan, is a different command line device order supposed to work with
> migration?


We discussed this on IRC with Paolo and the conclusion is that yes, the
order should not matter.

Ideally we should implement "irq" property for every device (and INTA/B/C/D
for PHB) and run the source and destination QEMU with exact IRQ numbers (of
nail IRQ numbers to devices somehow?). But for me either is overkill.


> 
> Alex
> 
>> ---
>>   hw/ppc/spapr.c         |  3 +++
>>   hw/ppc/spapr_iommu.c   | 59
>> +++++++++++++++++++++++++++++++++++++++++++++++++-
>>   include/hw/ppc/spapr.h |  7 ++++++
>>   3 files changed, 68 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 5c9a154..12adc21 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -1263,6 +1263,9 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>       /* Set up EPOW events infrastructure */
>>       spapr_events_init(spapr);
>>   +    /* Set up TCE IOMMUs bus */
>> +    spapr->tce_bus = spapr_tce_bus_init();
>> +
>>       /* Set up VIO bus */
>>       spapr->vio_bus = spapr_vio_bus_init();
>>   diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index d9fe946..7db0acf 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -157,7 +157,7 @@ sPAPRTCETable *spapr_tce_new_table(DeviceState
>> *owner, uint32_t liobn, size_t wi
>>           return NULL;
>>       }
>>   -    tcet = SPAPR_TCE_TABLE(object_new(TYPE_SPAPR_TCE_TABLE));
>> +    tcet = SPAPR_TCE_TABLE(qdev_create(spapr->tce_bus,
>> TYPE_SPAPR_TCE_TABLE));
>>       tcet->liobn = liobn;
>>       tcet->window_size = window_size;
>>   @@ -342,9 +342,66 @@ static TypeInfo spapr_tce_table_info = {
>>       .instance_finalize = spapr_tce_table_finalize,
>>   };
>>   +static char *spapr_tce_bus_get_dev_name(DeviceState *qdev)
>> +{
>> +    sPAPRTCETable *tcet = SPAPR_TCE_TABLE(qdev);
>> +    char *name;
>> +
>> +    name = g_strdup_printf("liobn@%x", tcet->liobn);
>> +    return name;
>> +}
>> +
>> +static void spapr_tce_bus_class_init(ObjectClass *klass, void *data)
>> +{
>> +    BusClass *k = BUS_CLASS(klass);
>> +
>> +    k->get_dev_path = spapr_tce_bus_get_dev_name;
>> +}
>> +
>> +static const TypeInfo spapr_tce_bus_info = {
>> +    .name = TYPE_SPAPR_TCE_BUS,
>> +    .parent = TYPE_BUS,
>> +    .class_init = spapr_tce_bus_class_init,
>> +    .instance_size = sizeof(BusState),
>> +};
>> +
>> +static int spapr_tce_bridge_init(SysBusDevice *dev)
>> +{
>> +    /* nothing */
>> +    return 0;
>> +}
>> +
>> +static void spapr_tce_bridge_class_init(ObjectClass *klass, void *data)
>> +{
>> +    SysBusDeviceClass *k = SYS_BUS_DEVICE_CLASS(klass);
>> +
>> +    k->init = spapr_tce_bridge_init;
>> +}
>> +
>> +static const TypeInfo spapr_tce_bridge_info = {
>> +    .name          = "spapr-tce-bridge",
>> +    .parent        = TYPE_SYS_BUS_DEVICE,
>> +    .instance_size = sizeof(SysBusDevice),
>> +    .class_init    = spapr_tce_bridge_class_init,
>> +};
>> +
>>   static void register_types(void)
>>   {
>>       type_register_static(&spapr_tce_table_info);
>> +    type_register_static(&spapr_tce_bridge_info);
>> +    type_register_static(&spapr_tce_bus_info);
>> +}
>> +
>> +BusState *spapr_tce_bus_init(void)
>> +{
>> +    DeviceState *dev;
>> +
>> +    /* Create bridge device */
>> +    dev = qdev_create(NULL, spapr_tce_bridge_info.name);
>> +    qdev_init_nofail(dev);
>> +
>> +    /* Create bus on bridge device */
>> +    return qbus_create(TYPE_SPAPR_TCE_BUS, dev, "spapr-tce");
>>   }
>>     type_init(register_types);
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index 449fc7c..18332fd 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -12,6 +12,7 @@ struct sPAPRNVRAM;
>>     typedef struct sPAPREnvironment {
>>       struct VIOsPAPRBus *vio_bus;
>> +    BusState *tce_bus;
>>       QLIST_HEAD(, sPAPRPHBState) phbs;
>>       hwaddr msi_win_addr;
>>       MemoryRegion msiwindow;
>> @@ -405,4 +406,10 @@ int spapr_dma_dt(void *fdt, int node_off, const char
>> *propname,
>>   int spapr_tcet_dma_dt(void *fdt, int node_off, const char *propname,
>>                         sPAPRTCETable *tcet);
>>   +#define TYPE_SPAPR_TCE_BUS "spapr-tce-bus"
>> +#define SPAPR_TCE_BUS(obj) \
>> +    OBJECT_CHECK(BusState, (obj), TYPE_SPAPR_TCE_BUS)
>> +
>> +BusState *spapr_tce_bus_init(void);
>> +
>>   #endif /* !defined (__HW_SPAPR_H__) */
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-10 13:26       ` Alexander Graf
@ 2014-04-10 14:43         ` Alexey Kardashevskiy
  2014-04-11  9:24           ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-10 14:43 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc, Andreas Färber

On 04/10/2014 11:26 PM, Alexander Graf wrote:
> 
> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>> The current allocator returns IRQ numbers from a pool and does not
>>>> support IRQs reuse in any form as it did not keep track of what it
>>>> previously returned, it only had the last returned IRQ.
>>>> However migration may change interrupts for devices depending on
>>>> their order in the command line.
>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
>>> anything.
>>
>> I put wrong commit message. By change I meant that the default state before
>> the destination guest started accepting migration is different from what
>> the destination guest became after migration finished. And migration cannot
>> avoid changing this default state.
> 
> Ok, why is the IRQ configuration different?

Because QEMU creates devices in the order as in the command line, and
libvirt changes this order - the XML used to create the guest and the XML
which is sends during migration are different. libvirt thinks it is ok
while it keeps @reg property for (for example) spapr-vscsi devices but it
is not because since the order is different, devices call IRQ allocator in
different order and get different IRQs.



-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices
  2014-04-10 14:40     ` Alexey Kardashevskiy
@ 2014-04-10 14:52       ` Andreas Färber
  2014-04-10 15:18         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Andreas Färber @ 2014-04-10 14:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Paolo Bonzini, Juan Quintela, qemu-ppc, Alexander Graf, qemu-devel

Am 10.04.2014 16:40, schrieb Alexey Kardashevskiy:
> On 04/10/2014 10:40 PM, Alexander Graf wrote:
>>
>> Juan, is a different command line device order supposed to work with
>> migration?
> 
> 
> We discussed this on IRC with Paolo and the conclusion is that yes, the
> order should not matter.

Huh?! If you ever tried changing the order of PCI devices such as
virtio-blk-pci on the command line (or changing between if=virtio and
-device virtio-blk-pci) then surely it does change the order of the
/dev/vdX the guest sees and will be migration-incompatible. The order of
-drive and -device however, for instance, does not seem to matter. But
in this generality, your statement does not seem to reflect current reality.

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices
  2014-04-10 14:52       ` Andreas Färber
@ 2014-04-10 15:18         ` Alexey Kardashevskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-10 15:18 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Paolo Bonzini, Juan Quintela, qemu-ppc, Alexander Graf, qemu-devel

On 04/11/2014 12:52 AM, Andreas Färber wrote:
> Am 10.04.2014 16:40, schrieb Alexey Kardashevskiy:
>> On 04/10/2014 10:40 PM, Alexander Graf wrote:
>>>
>>> Juan, is a different command line device order supposed to work with
>>> migration?
>>
>>
>> We discussed this on IRC with Paolo and the conclusion is that yes, the
>> order should not matter.
> 
> Huh?! If you ever tried changing the order of PCI devices such as
> virtio-blk-pci on the command line (or changing between if=virtio and
> -device virtio-blk-pci) then surely it does change the order of the
> /dev/vdX the guest sees and will be migration-incompatible. The order of
> -drive and -device however, for instance, does not seem to matter. But
> in this generality, your statement does not seem to reflect current reality.

As far as I can tell libvirt is trying to be precise and specifies PCI bus
name and PCI address for PCI devices.


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-10 14:43         ` Alexey Kardashevskiy
@ 2014-04-11  9:24           ` Alexander Graf
  2014-04-11 12:38             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-11  9:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc, Andreas Färber


On 10.04.14 16:43, Alexey Kardashevskiy wrote:
> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>> previously returned, it only had the last returned IRQ.
>>>>> However migration may change interrupts for devices depending on
>>>>> their order in the command line.
>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
>>>> anything.
>>> I put wrong commit message. By change I meant that the default state before
>>> the destination guest started accepting migration is different from what
>>> the destination guest became after migration finished. And migration cannot
>>> avoid changing this default state.
>> Ok, why is the IRQ configuration different?
> Because QEMU creates devices in the order as in the command line, and
> libvirt changes this order - the XML used to create the guest and the XML
> which is sends during migration are different. libvirt thinks it is ok
> while it keeps @reg property for (for example) spapr-vscsi devices but it
> is not because since the order is different, devices call IRQ allocator in
> different order and get different IRQs.

So your patch migrates the current IRQ configuration, but once you 
restart the virtual machine on the destination host it will have 
different IRQ numbering again, right?

I'm not sure that's a good solution to the problem. I guess we should 
rather aim to make sure that we can make IRQ allocation explicit. 
Fundamentally the problem sounds very similar to the PCI slot allocation 
which eventually got solved by libvirt specifying the slots manually.


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11  9:24           ` Alexander Graf
@ 2014-04-11 12:38             ` Alexey Kardashevskiy
  2014-04-11 13:58               ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-11 12:38 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc, Andreas Färber

On 04/11/2014 07:24 PM, Alexander Graf wrote:
> 
> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>> previously returned, it only had the last returned IRQ.
>>>>>> However migration may change interrupts for devices depending on
>>>>>> their order in the command line.
>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
>>>>> anything.
>>>> I put wrong commit message. By change I meant that the default state
>>>> before
>>>> the destination guest started accepting migration is different from what
>>>> the destination guest became after migration finished. And migration
>>>> cannot
>>>> avoid changing this default state.
>>> Ok, why is the IRQ configuration different?
>> Because QEMU creates devices in the order as in the command line, and
>> libvirt changes this order - the XML used to create the guest and the XML
>> which is sends during migration are different. libvirt thinks it is ok
>> while it keeps @reg property for (for example) spapr-vscsi devices but it
>> is not because since the order is different, devices call IRQ allocator in
>> different order and get different IRQs.
> 
> So your patch migrates the current IRQ configuration, but once you restart
> the virtual machine on the destination host it will have different IRQ
> numbering again, right?

No, why? IRQs are assigned at init time from realize() callbacks (and
survive reset) or as a part of ibm,change-msi rtas call which happens in
the same order as it only depends on pci addresses and we do not change
this either.


> I'm not sure that's a good solution to the problem. I guess we should
> rather aim to make sure that we can make IRQ allocation explicit.
> Fundamentally the problem sounds very similar to the PCI slot allocation
> which eventually got solved by libvirt specifying the slots manually.

We can do that too. Who decides? :)



-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 12:38             ` Alexey Kardashevskiy
@ 2014-04-11 13:58               ` Alexander Graf
  2014-04-11 14:50                 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-11 13:58 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Andreas Färber


On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>> 
>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>> However migration may change interrupts for devices depending on
>>>>>>> their order in the command line.
>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
>>>>>> anything.
>>>>> I put wrong commit message. By change I meant that the default state
>>>>> before
>>>>> the destination guest started accepting migration is different from what
>>>>> the destination guest became after migration finished. And migration
>>>>> cannot
>>>>> avoid changing this default state.
>>>> Ok, why is the IRQ configuration different?
>>> Because QEMU creates devices in the order as in the command line, and
>>> libvirt changes this order - the XML used to create the guest and the XML
>>> which is sends during migration are different. libvirt thinks it is ok
>>> while it keeps @reg property for (for example) spapr-vscsi devices but it
>>> is not because since the order is different, devices call IRQ allocator in
>>> different order and get different IRQs.
>> 
>> So your patch migrates the current IRQ configuration, but once you restart
>> the virtual machine on the destination host it will have different IRQ
>> numbering again, right?
> 
> No, why? IRQs are assigned at init time from realize() callbacks (and
> survive reset) or as a part of ibm,change-msi rtas call which happens in
> the same order as it only depends on pci addresses and we do not change
> this either.

Ok, let me rephrase. If I shut the machine down because I'm doing on-disk hibernate and then boot it back up, will the guest find the same configuration?

> 
> 
>> I'm not sure that's a good solution to the problem. I guess we should
>> rather aim to make sure that we can make IRQ allocation explicit.
>> Fundamentally the problem sounds very similar to the PCI slot allocation
>> which eventually got solved by libvirt specifying the slots manually.
> 
> We can do that too. Who decides? :)

The better solution wins :)


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 13:58               ` Alexander Graf
@ 2014-04-11 14:50                 ` Alexey Kardashevskiy
  2014-04-11 14:58                   ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-11 14:50 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-ppc, qemu-devel, Andreas Färber

On 04/11/2014 11:58 PM, Alexander Graf wrote:
> 
> On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> 
>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>
>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>> their order in the command line.
>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
>>>>>>> anything.
>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>> before
>>>>>> the destination guest started accepting migration is different from what
>>>>>> the destination guest became after migration finished. And migration
>>>>>> cannot
>>>>>> avoid changing this default state.
>>>>> Ok, why is the IRQ configuration different?
>>>> Because QEMU creates devices in the order as in the command line, and
>>>> libvirt changes this order - the XML used to create the guest and the XML
>>>> which is sends during migration are different. libvirt thinks it is ok
>>>> while it keeps @reg property for (for example) spapr-vscsi devices but it
>>>> is not because since the order is different, devices call IRQ allocator in
>>>> different order and get different IRQs.
>>>
>>> So your patch migrates the current IRQ configuration, but once you restart
>>> the virtual machine on the destination host it will have different IRQ
>>> numbering again, right?
>>
>> No, why? IRQs are assigned at init time from realize() callbacks (and
>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>> the same order as it only depends on pci addresses and we do not change
>> this either.
> 

> Ok, let me rephrase. If I shut the machine down because I'm doing
> on-disk hibernate and then boot it back up, will the guest find the same
> configuration?


I do not understand what you mean by this. Hibernation by the guest OS
itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
config may be different. If it is "migrate to file" and then "migrate from
file" (do not know what you call it when migration goes to a pipe which is
"tar") - then config will be the same.


>>> I'm not sure that's a good solution to the problem. I guess we should
>>> rather aim to make sure that we can make IRQ allocation explicit.
>>> Fundamentally the problem sounds very similar to the PCI slot allocation
>>> which eventually got solved by libvirt specifying the slots manually.
>>
>> We can do that too. Who decides? :)
> 
> The better solution wins :)

We both know who decides ;) I posted series, I need heads up if it is going
the right way or not.


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 14:50                 ` Alexey Kardashevskiy
@ 2014-04-11 14:58                   ` Alexander Graf
  2014-04-11 15:27                     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-11 14:58 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Andreas Färber


On 11.04.14 16:50, Alexey Kardashevskiy wrote:
> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>
>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>> their order in the command line.
>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't change
>>>>>>>> anything.
>>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>>> before
>>>>>>> the destination guest started accepting migration is different from what
>>>>>>> the destination guest became after migration finished. And migration
>>>>>>> cannot
>>>>>>> avoid changing this default state.
>>>>>> Ok, why is the IRQ configuration different?
>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>> libvirt changes this order - the XML used to create the guest and the XML
>>>>> which is sends during migration are different. libvirt thinks it is ok
>>>>> while it keeps @reg property for (for example) spapr-vscsi devices but it
>>>>> is not because since the order is different, devices call IRQ allocator in
>>>>> different order and get different IRQs.
>>>> So your patch migrates the current IRQ configuration, but once you restart
>>>> the virtual machine on the destination host it will have different IRQ
>>>> numbering again, right?
>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>> the same order as it only depends on pci addresses and we do not change
>>> this either.
>> Ok, let me rephrase. If I shut the machine down because I'm doing
>> on-disk hibernate and then boot it back up, will the guest find the same
>> configuration?
>
> I do not understand what you mean by this. Hibernation by the guest OS
> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,

by the guest OS. The host will only see a genuine "shutdown" event. The 
guest OS will expect the machine to look *the exact same* as before the 
shutdown.

> config may be different. If it is "migrate to file" and then "migrate from
> file" (do not know what you call it when migration goes to a pipe which is
> "tar") - then config will be the same.
>
>
>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>> Fundamentally the problem sounds very similar to the PCI slot allocation
>>>> which eventually got solved by libvirt specifying the slots manually.
>>> We can do that too. Who decides? :)
>> The better solution wins :)
> We both know who decides ;) I posted series, I need heads up if it is going
> the right way or not.

It's not :). If a guest may not have different IRQ allocation after 
migration, it also must not have different IRQ allocation after shutdown 
+ restart.


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 14:58                   ` Alexander Graf
@ 2014-04-11 15:27                     ` Alexey Kardashevskiy
  2014-04-11 15:38                       ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-11 15:27 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-ppc, qemu-devel, Andreas Färber

On 04/12/2014 12:58 AM, Alexander Graf wrote:
> 
> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>
>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>> their order in the command line.
>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't
>>>>>>>>> change
>>>>>>>>> anything.
>>>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>>>> before
>>>>>>>> the destination guest started accepting migration is different from
>>>>>>>> what
>>>>>>>> the destination guest became after migration finished. And migration
>>>>>>>> cannot
>>>>>>>> avoid changing this default state.
>>>>>>> Ok, why is the IRQ configuration different?
>>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>>> libvirt changes this order - the XML used to create the guest and the
>>>>>> XML
>>>>>> which is sends during migration are different. libvirt thinks it is ok
>>>>>> while it keeps @reg property for (for example) spapr-vscsi devices
>>>>>> but it
>>>>>> is not because since the order is different, devices call IRQ
>>>>>> allocator in
>>>>>> different order and get different IRQs.
>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>> restart
>>>>> the virtual machine on the destination host it will have different IRQ
>>>>> numbering again, right?
>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>> the same order as it only depends on pci addresses and we do not change
>>>> this either.
>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>> on-disk hibernate and then boot it back up, will the guest find the same
>>> configuration?
>>
>> I do not understand what you mean by this. Hibernation by the guest OS
>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
> 
> by the guest OS. The host will only see a genuine "shutdown" event. The
> guest OS will expect the machine to look *the exact same* as before the
> shutdown.

Ok. So. I have to implement "irq" property everywhere (PHB is missing
INTA/B/C/D now) and check if they did not change during migration via those
VMSTATE.*EQUAL. Correct?

If so (more or less), I still would like to keep patches 1..7.
In fact, the first one is independent and we need it anyway.
Yes/no?


>> config may be different. If it is "migrate to file" and then "migrate from
>> file" (do not know what you call it when migration goes to a pipe which is
>> "tar") - then config will be the same.
>>
>>
>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>> Fundamentally the problem sounds very similar to the PCI slot allocation
>>>>> which eventually got solved by libvirt specifying the slots manually.
>>>> We can do that too. Who decides? :)
>>> The better solution wins :)
>> We both know who decides ;) I posted series, I need heads up if it is going
>> the right way or not.
> 
> It's not :). If a guest may not have different IRQ allocation after
> migration, it also must not have different IRQ allocation after shutdown +
> restart.

Ok. That's good answer, thanks. How does x86 work then? IRQs are hardcoded
(some are for sure but I do not know about MSI)? Or in order to support
migration, the user has to specify IRQs for the devices which may get
different IRQs depending on things like command line parameters order?


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 15:27                     ` Alexey Kardashevskiy
@ 2014-04-11 15:38                       ` Alexander Graf
  2014-04-11 16:01                         ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-11 15:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Andreas Färber


On 11.04.14 17:27, Alexey Kardashevskiy wrote:
> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>
>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>> their order in the command line.
>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't
>>>>>>>>>> change
>>>>>>>>>> anything.
>>>>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>>>>> before
>>>>>>>>> the destination guest started accepting migration is different from
>>>>>>>>> what
>>>>>>>>> the destination guest became after migration finished. And migration
>>>>>>>>> cannot
>>>>>>>>> avoid changing this default state.
>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>>>> libvirt changes this order - the XML used to create the guest and the
>>>>>>> XML
>>>>>>> which is sends during migration are different. libvirt thinks it is ok
>>>>>>> while it keeps @reg property for (for example) spapr-vscsi devices
>>>>>>> but it
>>>>>>> is not because since the order is different, devices call IRQ
>>>>>>> allocator in
>>>>>>> different order and get different IRQs.
>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>> restart
>>>>>> the virtual machine on the destination host it will have different IRQ
>>>>>> numbering again, right?
>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>>> the same order as it only depends on pci addresses and we do not change
>>>>> this either.
>>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>>> on-disk hibernate and then boot it back up, will the guest find the same
>>>> configuration?
>>> I do not understand what you mean by this. Hibernation by the guest OS
>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>> by the guest OS. The host will only see a genuine "shutdown" event. The
>> guest OS will expect the machine to look *the exact same* as before the
>> shutdown.
> Ok. So. I have to implement "irq" property everywhere (PHB is missing
> INTA/B/C/D now) and check if they did not change during migration via those

Hrm. Not sure. Maybe it'd make sense to join next week's call on 
platform device creation. The problem seems pretty closely related.

> VMSTATE.*EQUAL. Correct?

Why would you need this? I think we already said a couple dozen times 
that configuration matching is a bigger problem, no?

> If so (more or less), I still would like to keep patches 1..7.
> In fact, the first one is independent and we need it anyway.
> Yes/no?

Why?

>
>
>>> config may be different. If it is "migrate to file" and then "migrate from
>>> file" (do not know what you call it when migration goes to a pipe which is
>>> "tar") - then config will be the same.
>>>
>>>
>>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>>> Fundamentally the problem sounds very similar to the PCI slot allocation
>>>>>> which eventually got solved by libvirt specifying the slots manually.
>>>>> We can do that too. Who decides? :)
>>>> The better solution wins :)
>>> We both know who decides ;) I posted series, I need heads up if it is going
>>> the right way or not.
>> It's not :). If a guest may not have different IRQ allocation after
>> migration, it also must not have different IRQ allocation after shutdown +
>> restart.
> Ok. That's good answer, thanks. How does x86 work then? IRQs are hardcoded
> (some are for sure but I do not know about MSI)? Or in order to support

Non-PCI IRQs are hardcoded, yes. PCI IRQs are mapped to one of the 4 PCI 
interrupts which again are hardcoded to IOAPIC interrupt lines after 
some PCI line swizzling.

MSI gets configured by the guest, so it has to make sure MSIs are set up 
identically again after hibernation.


Alex
> migration, the user has to specify IRQs for the devices which may get
> different IRQs depending on things like command line parameters order?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 15:38                       ` Alexander Graf
@ 2014-04-11 16:01                         ` Alexey Kardashevskiy
  2014-04-11 16:15                           ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-11 16:01 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-ppc, qemu-devel, Andreas Färber

On 04/12/2014 01:38 AM, Alexander Graf wrote:
> 
> On 11.04.14 17:27, Alexey Kardashevskiy wrote:
>> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>
>>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>>> their order in the command line.
>>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't
>>>>>>>>>>> change
>>>>>>>>>>> anything.
>>>>>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>>>>>> before
>>>>>>>>>> the destination guest started accepting migration is different from
>>>>>>>>>> what
>>>>>>>>>> the destination guest became after migration finished. And migration
>>>>>>>>>> cannot
>>>>>>>>>> avoid changing this default state.
>>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>>>>> libvirt changes this order - the XML used to create the guest and the
>>>>>>>> XML
>>>>>>>> which is sends during migration are different. libvirt thinks it is ok
>>>>>>>> while it keeps @reg property for (for example) spapr-vscsi devices
>>>>>>>> but it
>>>>>>>> is not because since the order is different, devices call IRQ
>>>>>>>> allocator in
>>>>>>>> different order and get different IRQs.
>>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>>> restart
>>>>>>> the virtual machine on the destination host it will have different IRQ
>>>>>>> numbering again, right?
>>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>>>> the same order as it only depends on pci addresses and we do not change
>>>>>> this either.
>>>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>>>> on-disk hibernate and then boot it back up, will the guest find the same
>>>>> configuration?
>>>> I do not understand what you mean by this. Hibernation by the guest OS
>>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>>> by the guest OS. The host will only see a genuine "shutdown" event. The
>>> guest OS will expect the machine to look *the exact same* as before the
>>> shutdown.
>> Ok. So. I have to implement "irq" property everywhere (PHB is missing
>> INTA/B/C/D now) and check if they did not change during migration via those
> 
> Hrm. Not sure. Maybe it'd make sense to join next week's call on platform
> device creation. The problem seems pretty closely related.

What are those platform devices and what are you going to discuss exactly?


>> VMSTATE.*EQUAL. Correct?
> 
> Why would you need this? I think we already said a couple dozen times that
> configuration matching is a bigger problem, no?

For debug! It is not needed in general, yes.


>> If so (more or less), I still would like to keep patches 1..7.
>> In fact, the first one is independent and we need it anyway.
>> Yes/no?
> 
> Why?

IOMMUs do not migrate correctly - they only have a class have and
instance_id and this instance_it depends on command line arguments order.
The #1 patch makes it classname + liobn.


> 
>>
>>
>>>> config may be different. If it is "migrate to file" and then "migrate from
>>>> file" (do not know what you call it when migration goes to a pipe which is
>>>> "tar") - then config will be the same.
>>>>
>>>>
>>>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>>>> Fundamentally the problem sounds very similar to the PCI slot
>>>>>>> allocation
>>>>>>> which eventually got solved by libvirt specifying the slots manually.
>>>>>> We can do that too. Who decides? :)
>>>>> The better solution wins :)
>>>> We both know who decides ;) I posted series, I need heads up if it is
>>>> going
>>>> the right way or not.
>>> It's not :). If a guest may not have different IRQ allocation after
>>> migration, it also must not have different IRQ allocation after shutdown +
>>> restart.
>> Ok. That's good answer, thanks. How does x86 work then? IRQs are hardcoded
>> (some are for sure but I do not know about MSI)? Or in order to support
> 
> Non-PCI IRQs are hardcoded, yes. PCI IRQs are mapped to one of the 4 PCI
> interrupts which again are hardcoded to IOAPIC interrupt lines after some
> PCI line swizzling.

This is what I meant - I need to have a way to tell PHB IRQ numbers for
INTA/B/C/D.



> MSI gets configured by the guest, so it has to make sure MSIs are set up
> identically again after hibernation.
> 
> 
> Alex
>> migration, the user has to specify IRQs for the devices which may get
>> different IRQs depending on things like command line parameters order?
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 16:01                         ` Alexey Kardashevskiy
@ 2014-04-11 16:15                           ` Alexander Graf
  2014-04-11 16:30                             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-04-11 16:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Andreas Färber


On 11.04.14 18:01, Alexey Kardashevskiy wrote:
> On 04/12/2014 01:38 AM, Alexander Graf wrote:
>> On 11.04.14 17:27, Alexey Kardashevskiy wrote:
>>> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>>>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>>
>>>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and does not
>>>>>>>>>>>>> support IRQs reuse in any form as it did not keep track of what it
>>>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>>>> their order in the command line.
>>>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration shouldn't
>>>>>>>>>>>> change
>>>>>>>>>>>> anything.
>>>>>>>>>>> I put wrong commit message. By change I meant that the default state
>>>>>>>>>>> before
>>>>>>>>>>> the destination guest started accepting migration is different from
>>>>>>>>>>> what
>>>>>>>>>>> the destination guest became after migration finished. And migration
>>>>>>>>>>> cannot
>>>>>>>>>>> avoid changing this default state.
>>>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>>>> Because QEMU creates devices in the order as in the command line, and
>>>>>>>>> libvirt changes this order - the XML used to create the guest and the
>>>>>>>>> XML
>>>>>>>>> which is sends during migration are different. libvirt thinks it is ok
>>>>>>>>> while it keeps @reg property for (for example) spapr-vscsi devices
>>>>>>>>> but it
>>>>>>>>> is not because since the order is different, devices call IRQ
>>>>>>>>> allocator in
>>>>>>>>> different order and get different IRQs.
>>>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>>>> restart
>>>>>>>> the virtual machine on the destination host it will have different IRQ
>>>>>>>> numbering again, right?
>>>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>>>> survive reset) or as a part of ibm,change-msi rtas call which happens in
>>>>>>> the same order as it only depends on pci addresses and we do not change
>>>>>>> this either.
>>>>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>>>>> on-disk hibernate and then boot it back up, will the guest find the same
>>>>>> configuration?
>>>>> I do not understand what you mean by this. Hibernation by the guest OS
>>>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>>>> by the guest OS. The host will only see a genuine "shutdown" event. The
>>>> guest OS will expect the machine to look *the exact same* as before the
>>>> shutdown.
>>> Ok. So. I have to implement "irq" property everywhere (PHB is missing
>>> INTA/B/C/D now) and check if they did not change during migration via those
>> Hrm. Not sure. Maybe it'd make sense to join next week's call on platform
>> device creation. The problem seems pretty closely related.
> What are those platform devices and what are you going to discuss exactly?

Devices that don't have a unified interrupt routing scheme like PCI 
where you just link lines A/B/C/D to your controller and you're good to go.

>
>
>>> VMSTATE.*EQUAL. Correct?
>> Why would you need this? I think we already said a couple dozen times that
>> configuration matching is a bigger problem, no?
> For debug! It is not needed in general, yes.
>
>
>>> If so (more or less), I still would like to keep patches 1..7.
>>> In fact, the first one is independent and we need it anyway.
>>> Yes/no?
>> Why?
> IOMMUs do not migrate correctly - they only have a class have and
> instance_id and this instance_it depends on command line arguments order.
> The #1 patch makes it classname + liobn.

Why do we need a bus for that?

>
>
>>>
>>>>> config may be different. If it is "migrate to file" and then "migrate from
>>>>> file" (do not know what you call it when migration goes to a pipe which is
>>>>> "tar") - then config will be the same.
>>>>>
>>>>>
>>>>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>>>>> Fundamentally the problem sounds very similar to the PCI slot
>>>>>>>> allocation
>>>>>>>> which eventually got solved by libvirt specifying the slots manually.
>>>>>>> We can do that too. Who decides? :)
>>>>>> The better solution wins :)
>>>>> We both know who decides ;) I posted series, I need heads up if it is
>>>>> going
>>>>> the right way or not.
>>>> It's not :). If a guest may not have different IRQ allocation after
>>>> migration, it also must not have different IRQ allocation after shutdown +
>>>> restart.
>>> Ok. That's good answer, thanks. How does x86 work then? IRQs are hardcoded
>>> (some are for sure but I do not know about MSI)? Or in order to support
>> Non-PCI IRQs are hardcoded, yes. PCI IRQs are mapped to one of the 4 PCI
>> interrupts which again are hardcoded to IOAPIC interrupt lines after some
>> PCI line swizzling.
> This is what I meant - I need to have a way to tell PHB IRQ numbers for
> INTA/B/C/D.

Yes, just like platform devices ;).


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics
  2014-04-11 16:15                           ` Alexander Graf
@ 2014-04-11 16:30                             ` Alexey Kardashevskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-04-11 16:30 UTC (permalink / raw)
  To: Alexander Graf; +Cc: qemu-ppc, qemu-devel, Andreas Färber

On 04/12/2014 02:15 AM, Alexander Graf wrote:
> 
> On 11.04.14 18:01, Alexey Kardashevskiy wrote:
>> On 04/12/2014 01:38 AM, Alexander Graf wrote:
>>> On 11.04.14 17:27, Alexey Kardashevskiy wrote:
>>>> On 04/12/2014 12:58 AM, Alexander Graf wrote:
>>>>> On 11.04.14 16:50, Alexey Kardashevskiy wrote:
>>>>>> On 04/11/2014 11:58 PM, Alexander Graf wrote:
>>>>>>> On 11.04.2014, at 14:38, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>>>
>>>>>>>> On 04/11/2014 07:24 PM, Alexander Graf wrote:
>>>>>>>>> On 10.04.14 16:43, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 04/10/2014 11:26 PM, Alexander Graf wrote:
>>>>>>>>>>> On 10.04.14 15:24, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> On 04/10/2014 10:51 PM, Alexander Graf wrote:
>>>>>>>>>>>>> On 14.03.14 05:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> The current allocator returns IRQ numbers from a pool and
>>>>>>>>>>>>>> does not
>>>>>>>>>>>>>> support IRQs reuse in any form as it did not keep track of
>>>>>>>>>>>>>> what it
>>>>>>>>>>>>>> previously returned, it only had the last returned IRQ.
>>>>>>>>>>>>>> However migration may change interrupts for devices depending on
>>>>>>>>>>>>>> their order in the command line.
>>>>>>>>>>>>> Wtf? Nonono, this sounds very bogus and wrong. Migration
>>>>>>>>>>>>> shouldn't
>>>>>>>>>>>>> change
>>>>>>>>>>>>> anything.
>>>>>>>>>>>> I put wrong commit message. By change I meant that the default
>>>>>>>>>>>> state
>>>>>>>>>>>> before
>>>>>>>>>>>> the destination guest started accepting migration is different
>>>>>>>>>>>> from
>>>>>>>>>>>> what
>>>>>>>>>>>> the destination guest became after migration finished. And
>>>>>>>>>>>> migration
>>>>>>>>>>>> cannot
>>>>>>>>>>>> avoid changing this default state.
>>>>>>>>>>> Ok, why is the IRQ configuration different?
>>>>>>>>>> Because QEMU creates devices in the order as in the command line,
>>>>>>>>>> and
>>>>>>>>>> libvirt changes this order - the XML used to create the guest and
>>>>>>>>>> the
>>>>>>>>>> XML
>>>>>>>>>> which is sends during migration are different. libvirt thinks it
>>>>>>>>>> is ok
>>>>>>>>>> while it keeps @reg property for (for example) spapr-vscsi devices
>>>>>>>>>> but it
>>>>>>>>>> is not because since the order is different, devices call IRQ
>>>>>>>>>> allocator in
>>>>>>>>>> different order and get different IRQs.
>>>>>>>>> So your patch migrates the current IRQ configuration, but once you
>>>>>>>>> restart
>>>>>>>>> the virtual machine on the destination host it will have different
>>>>>>>>> IRQ
>>>>>>>>> numbering again, right?
>>>>>>>> No, why? IRQs are assigned at init time from realize() callbacks (and
>>>>>>>> survive reset) or as a part of ibm,change-msi rtas call which
>>>>>>>> happens in
>>>>>>>> the same order as it only depends on pci addresses and we do not
>>>>>>>> change
>>>>>>>> this either.
>>>>>>> Ok, let me rephrase. If I shut the machine down because I'm doing
>>>>>>> on-disk hibernate and then boot it back up, will the guest find the
>>>>>>> same
>>>>>>> configuration?
>>>>>> I do not understand what you mean by this. Hibernation by the guest OS
>>>>>> itself or by QEMU? If this involves QEMU exit and QEMU start - then yes,
>>>>> by the guest OS. The host will only see a genuine "shutdown" event. The
>>>>> guest OS will expect the machine to look *the exact same* as before the
>>>>> shutdown.
>>>> Ok. So. I have to implement "irq" property everywhere (PHB is missing
>>>> INTA/B/C/D now) and check if they did not change during migration via
>>>> those
>>> Hrm. Not sure. Maybe it'd make sense to join next week's call on platform
>>> device creation. The problem seems pretty closely related.
>> What are those platform devices and what are you going to discuss exactly?
> 
> Devices that don't have a unified interrupt routing scheme like PCI where
> you just link lines A/B/C/D to your controller and you're good to go.

Ah. VIO in my case.



>>>> VMSTATE.*EQUAL. Correct?
>>> Why would you need this? I think we already said a couple dozen times that
>>> configuration matching is a bigger problem, no?
>> For debug! It is not needed in general, yes.
>>
>>
>>>> If so (more or less), I still would like to keep patches 1..7.
>>>> In fact, the first one is independent and we need it anyway.
>>>> Yes/no?
>>> Why?
>> IOMMUs do not migrate correctly - they only have a class have and
>> instance_id and this instance_it depends on command line arguments order.
>> The #1 patch makes it classname + liobn.
> 
> Why do we need a bus for that?


For BusClass::get_dev_path callback to get an unique name.



>>>>>> config may be different. If it is "migrate to file" and then "migrate
>>>>>> from
>>>>>> file" (do not know what you call it when migration goes to a pipe
>>>>>> which is
>>>>>> "tar") - then config will be the same.
>>>>>>
>>>>>>
>>>>>>>>> I'm not sure that's a good solution to the problem. I guess we should
>>>>>>>>> rather aim to make sure that we can make IRQ allocation explicit.
>>>>>>>>> Fundamentally the problem sounds very similar to the PCI slot
>>>>>>>>> allocation
>>>>>>>>> which eventually got solved by libvirt specifying the slots manually.
>>>>>>>> We can do that too. Who decides? :)
>>>>>>> The better solution wins :)
>>>>>> We both know who decides ;) I posted series, I need heads up if it is
>>>>>> going
>>>>>> the right way or not.
>>>>> It's not :). If a guest may not have different IRQ allocation after
>>>>> migration, it also must not have different IRQ allocation after
>>>>> shutdown +
>>>>> restart.
>>>> Ok. That's good answer, thanks. How does x86 work then? IRQs are hardcoded
>>>> (some are for sure but I do not know about MSI)? Or in order to support
>>> Non-PCI IRQs are hardcoded, yes. PCI IRQs are mapped to one of the 4 PCI
>>> interrupts which again are hardcoded to IOAPIC interrupt lines after some
>>> PCI line swizzling.
>> This is what I meant - I need to have a way to tell PHB IRQ numbers for
>> INTA/B/C/D.
> 
> Yes, just like platform devices ;).




-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration
  2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
                   ` (9 preceding siblings ...)
  2014-04-04  5:53 ` Alexey Kardashevskiy
@ 2014-05-04 13:56 ` Alexey Kardashevskiy
  2014-05-04 21:52   ` Paolo Bonzini
  10 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-04 13:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, Alexander Graf, Andreas Färber

On 03/14/2014 03:18 PM, Alexey Kardashevskiy wrote:
> This initial problem came form libvirt - it does not preserve
> the device order when running QEMU. So it is easy to get source QEMU with:
> -device spapr-vscsi,id=scsi1,reg=0x2000 -device spapr-vscsi,id=scsi0,reg=0x3000
> and destination QEMU with:
> -device spapr-vscsi,id=scsi0,reg=0x3000 -device spapr-vscsi,id=scsi1,reg=0x2000
> 
> Since SPAPR IOMMU device does not have a bus, it is identified in
> the migration stream as "spapr-iommu" and @instance_id which is assigned
> as IOMMUs are created. This results in broken migration as @reg does not match.
> The first patch fixes this issue by adding a bus device and a bridge.
> 
> However just 1/8 does not fix the migration as device creation order also
> affects IRQs assigned to the devices, for both PCI and VIO.
> 2/7..8/8 fix that by moving XICS IRQ management from SPAPR to XICS and
> implementing migration support for the entire XICS IRQ map.
> 
> As we are here, the patchset also prepares XICS to support multiple ICS
> (interrupt servers).
> 
> This is a bugfix patchset but it feels too big to go to 2.0, right? :)


I understand that the first and last patches of the series are wrong but I
would still like others to get in upstream. How to proceed now? Repost them
all again? Thanks.


> 
> 
> Alexey Kardashevskiy (8):
>   spapr-iommu: add a bus for spapr-iommu devices
>   xics: add flags for interrupts
>   xics: add find_server
>   xics: add pre_load() hook to ICSStateClass
>   xics: disable flags reset on xics reset
>   spapr: move interrupt allocator to xics
>   spapr: remove @next_irq
>   xics: enable interrupt configuration reset on migration
> 
>  hw/intc/xics.c         | 158 +++++++++++++++++++++++++++++++++++++++++++++----
>  hw/intc/xics_kvm.c     |   9 +--
>  hw/ppc/spapr.c         |  73 +----------------------
>  hw/ppc/spapr_events.c  |   2 +-
>  hw/ppc/spapr_iommu.c   |  59 +++++++++++++++++-
>  hw/ppc/spapr_pci.c     |   8 +--
>  hw/ppc/spapr_vio.c     |   4 +-
>  include/hw/ppc/spapr.h |  18 +++---
>  include/hw/ppc/xics.h  |   8 ++-
>  trace-events           |   4 ++
>  10 files changed, 238 insertions(+), 105 deletions(-)
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration
  2014-05-04 13:56 ` Alexey Kardashevskiy
@ 2014-05-04 21:52   ` Paolo Bonzini
  2014-05-04 23:48     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Paolo Bonzini @ 2014-05-04 21:52 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: qemu-ppc, Alexander Graf, Andreas Färber

Il 04/05/2014 15:56, Alexey Kardashevskiy ha scritto:
> On 03/14/2014 03:18 PM, Alexey Kardashevskiy wrote:
>> This initial problem came form libvirt - it does not preserve
>> the device order when running QEMU. So it is easy to get source QEMU with:
>> -device spapr-vscsi,id=scsi1,reg=0x2000 -device spapr-vscsi,id=scsi0,reg=0x3000
>> and destination QEMU with:
>> -device spapr-vscsi,id=scsi0,reg=0x3000 -device spapr-vscsi,id=scsi1,reg=0x2000
>>
>> Since SPAPR IOMMU device does not have a bus, it is identified in
>> the migration stream as "spapr-iommu" and @instance_id which is assigned
>> as IOMMUs are created. This results in broken migration as @reg does not match.
>> The first patch fixes this issue by adding a bus device and a bridge.
>>
>> However just 1/8 does not fix the migration as device creation order also
>> affects IRQs assigned to the devices, for both PCI and VIO.
>> 2/7..8/8 fix that by moving XICS IRQ management from SPAPR to XICS and
>> implementing migration support for the entire XICS IRQ map.
>>
>> As we are here, the patchset also prepares XICS to support multiple ICS
>> (interrupt servers).
>>
>> This is a bugfix patchset but it feels too big to go to 2.0, right? :)
>
>
> I understand that the first and last patches of the series are wrong but I
> would still like others to get in upstream. How to proceed now? Repost them
> all again? Thanks.

You are working around a libvirt problem caused by a missing QEMU feature.

At least patch 4 is probably unnecessary too.  2/3/5/6/7 (aka what's 
left) seem sane and likely to remain useful in order to implement the 
QEMU feature correctly.

Paolo

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration
  2014-05-04 21:52   ` Paolo Bonzini
@ 2014-05-04 23:48     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-04 23:48 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel; +Cc: qemu-ppc, Alexander Graf, Andreas Färber

On 05/05/2014 07:52 AM, Paolo Bonzini wrote:
> Il 04/05/2014 15:56, Alexey Kardashevskiy ha scritto:
>> On 03/14/2014 03:18 PM, Alexey Kardashevskiy wrote:
>>> This initial problem came form libvirt - it does not preserve
>>> the device order when running QEMU. So it is easy to get source QEMU with:
>>> -device spapr-vscsi,id=scsi1,reg=0x2000 -device
>>> spapr-vscsi,id=scsi0,reg=0x3000
>>> and destination QEMU with:
>>> -device spapr-vscsi,id=scsi0,reg=0x3000 -device
>>> spapr-vscsi,id=scsi1,reg=0x2000
>>>
>>> Since SPAPR IOMMU device does not have a bus, it is identified in
>>> the migration stream as "spapr-iommu" and @instance_id which is assigned
>>> as IOMMUs are created. This results in broken migration as @reg does not
>>> match.
>>> The first patch fixes this issue by adding a bus device and a bridge.
>>>
>>> However just 1/8 does not fix the migration as device creation order also
>>> affects IRQs assigned to the devices, for both PCI and VIO.
>>> 2/7..8/8 fix that by moving XICS IRQ management from SPAPR to XICS and
>>> implementing migration support for the entire XICS IRQ map.
>>>
>>> As we are here, the patchset also prepares XICS to support multiple ICS
>>> (interrupt servers).
>>>
>>> This is a bugfix patchset but it feels too big to go to 2.0, right? :)
>>
>>
>> I understand that the first and last patches of the series are wrong but I
>> would still like others to get in upstream. How to proceed now? Repost them
>> all again? Thanks.
> 
> You are working around a libvirt problem caused by a missing QEMU feature.

Not anymore, I am solving a different issue now - I want to return IRQs
back to pool after PCI hot unplug.

> At least patch 4 is probably unnecessary too.  2/3/5/6/7 (aka what's left)
> seem sane and likely to remain useful in order to implement the QEMU
> feature correctly.

Yep, patch#4 has to go too. I'll repost then. Thanks!



-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-05-04 23:48 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-14  4:18 [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Alexey Kardashevskiy
2014-03-14  4:18 ` [Qemu-devel] [PATCH 1/8] spapr-iommu: add a bus for spapr-iommu devices Alexey Kardashevskiy
2014-04-10 12:40   ` Alexander Graf
2014-04-10 14:40     ` Alexey Kardashevskiy
2014-04-10 14:52       ` Andreas Färber
2014-04-10 15:18         ` Alexey Kardashevskiy
2014-03-14  4:18 ` [Qemu-devel] [PATCH 2/8] xics: add flags for interrupts Alexey Kardashevskiy
2014-04-10 12:43   ` Alexander Graf
2014-03-14  4:18 ` [Qemu-devel] [PATCH 3/8] xics: add find_server Alexey Kardashevskiy
2014-03-14  4:18 ` [Qemu-devel] [PATCH 4/8] xics: add pre_load() hook to ICSStateClass Alexey Kardashevskiy
2014-03-14  4:18 ` [Qemu-devel] [PATCH 5/8] xics: disable flags reset on xics reset Alexey Kardashevskiy
2014-03-14  4:18 ` [Qemu-devel] [PATCH 6/8] spapr: move interrupt allocator to xics Alexey Kardashevskiy
2014-04-10 12:51   ` Alexander Graf
2014-04-10 13:24     ` Alexey Kardashevskiy
2014-04-10 13:26       ` Alexander Graf
2014-04-10 14:43         ` Alexey Kardashevskiy
2014-04-11  9:24           ` Alexander Graf
2014-04-11 12:38             ` Alexey Kardashevskiy
2014-04-11 13:58               ` Alexander Graf
2014-04-11 14:50                 ` Alexey Kardashevskiy
2014-04-11 14:58                   ` Alexander Graf
2014-04-11 15:27                     ` Alexey Kardashevskiy
2014-04-11 15:38                       ` Alexander Graf
2014-04-11 16:01                         ` Alexey Kardashevskiy
2014-04-11 16:15                           ` Alexander Graf
2014-04-11 16:30                             ` Alexey Kardashevskiy
2014-03-14  4:18 ` [Qemu-devel] [PATCH 7/8] spapr: remove @next_irq Alexey Kardashevskiy
2014-03-14  7:19   ` Thomas Huth
2014-03-14  4:18 ` [Qemu-devel] [PATCH 8/8] xics: enable interrupt configuration reset on migration Alexey Kardashevskiy
2014-04-10 12:55   ` Alexander Graf
2014-03-20  1:25 ` [Qemu-devel] [PATCH 0/8] spapr: fix IOMMU and XICS/IRQs migration Andreas Färber
2014-04-04  5:53 ` Alexey Kardashevskiy
2014-05-04 13:56 ` Alexey Kardashevskiy
2014-05-04 21:52   ` Paolo Bonzini
2014-05-04 23:48     ` Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.