All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics
@ 2014-05-15  9:59 Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts Alexey Kardashevskiy
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

This moves interrupts allocation business from SPAPR to XICS
and makes use of it.

Changes:
v2:
* s/server/source/
* fixed typos, code style, added an assert
* added patch for spapr_pci for better IRQ reuse for MSI/MSIX

There is just one source at the moment. We might create one
per PHB and one per VIO device or VIO bus but I do not see any
immediate profit from it. Makes any sense?

Please comment. Thanks!



Alexey Kardashevskiy (8):
  xics: Add flags for interrupts
  xics: Add xics_find_source()
  xics: Disable flags reset on xics reset
  spapr: Move interrupt allocator to xics
  xics: Remove obsolete xics_set_irq_type()
  spapr: Remove @next_irq
  xics: Implement xics_ics_free()
  spapr_pci: Use XICS interrupt allocator and do not cache interrupts in
    PHB

 hw/intc/xics.c              | 160 ++++++++++++++++++++++++++++++++++++---
 hw/intc/xics_kvm.c          |  12 ++-
 hw/ppc/spapr.c              |  70 +----------------
 hw/ppc/spapr_events.c       |   2 +-
 hw/ppc/spapr_pci.c          | 181 +++++++++++++++++++++++---------------------
 hw/ppc/spapr_vio.c          |   2 +-
 include/hw/pci-host/spapr.h |  11 +--
 include/hw/ppc/spapr.h      |  11 ---
 include/hw/ppc/xics.h       |  10 ++-
 trace-events                |  11 ++-
 10 files changed, 275 insertions(+), 195 deletions(-)

-- 
1.9.rc0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-21  6:30   ` Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 2/8] xics: Add xics_find_source() Alexey Kardashevskiy
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

The existing interrupt allocation scheme in SPAPR assumes that
interrupts are allocated at the start time, continously and the config
will not change. However, there are cases when this is not going to work
such as:

1. migration - we will have to have an ability to choose interrupt
numbers for devices in the command line and this will create gaps in
interrupt space.

2. PCI hotplug - interrupts from unplugged device need to be returned
back to interrupt pool, otherwise we will quickly run out of interrupts.

This replaces a separate lslsi[] array with a byte in the ICSIRQState
struct and defines "LSI" and "MSI" flags. Neither of these flags set
signals that the descriptor is not allocated and not in use.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        | 23 ++++++++++++++++-------
 hw/intc/xics_kvm.c    |  5 ++---
 include/hw/ppc/xics.h |  6 +++++-
 3 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 64aabe7..220ca0e 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -438,7 +438,7 @@ static void ics_set_irq(void *opaque, int srcno, int val)
 {
     ICSState *ics = (ICSState *)opaque;
 
-    if (ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
         set_irq_lsi(ics, srcno, val);
     } else {
         set_irq_msi(ics, srcno, val);
@@ -475,7 +475,7 @@ static void ics_write_xive(ICSState *ics, int nr, int server,
 
     trace_xics_ics_write_xive(nr, srcno, server, priority);
 
-    if (ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
         write_xive_lsi(ics, srcno);
     } else {
         write_xive_msi(ics, srcno);
@@ -497,7 +497,7 @@ static void ics_resend(ICSState *ics)
 
     for (i = 0; i < ics->nr_irqs; i++) {
         /* FIXME: filter by server#? */
-        if (ics->islsi[i]) {
+        if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
             resend_lsi(ics, i);
         } else {
             resend_msi(ics, i);
@@ -512,7 +512,7 @@ static void ics_eoi(ICSState *ics, int nr)
 
     trace_xics_ics_eoi(nr);
 
-    if (ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
         irq->status &= ~XICS_STATUS_SENT;
     }
 }
@@ -609,7 +609,6 @@ static void ics_realize(DeviceState *dev, Error **errp)
         return;
     }
     ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
-    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
     ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, ics->nr_irqs);
 }
 
@@ -646,11 +645,21 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq)
     return icp->ics->qirqs[irq - icp->ics->offset];
 }
 
+static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
+{
+    assert(!(ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MASK));
+
+    ics->irqs[srcno].flags |=
+        lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
+}
+
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
 {
-    assert(ics_valid_irq(icp->ics, irq));
+    ICSState *ics = icp->ics;
 
-    icp->ics->islsi[irq - icp->ics->offset] = lsi;
+    assert(ics_valid_irq(ics, irq));
+
+    ics_set_irq_type(ics, irq - ics->offset, lsi);
 }
 
 /*
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 09476ae..8719a88 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -224,7 +224,7 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
             state |= KVM_XICS_MASKED;
         }
 
-        if (ics->islsi[i]) {
+        if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
             state |= KVM_XICS_LEVEL_SENSITIVE;
             if (irq->status & XICS_STATUS_ASSERTED) {
                 state |= KVM_XICS_PENDING;
@@ -253,7 +253,7 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int val)
     int rc;
 
     args.irq = srcno + ics->offset;
-    if (!ics->islsi[srcno]) {
+    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
         if (!val) {
             return;
         }
@@ -290,7 +290,6 @@ static void ics_kvm_realize(DeviceState *dev, Error **errp)
         return;
     }
     ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
-    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
     ics->qirqs = qemu_allocate_irqs(ics_kvm_set_irq, ics, ics->nr_irqs);
 }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 0d7673d..fa8e9c2 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -136,7 +136,6 @@ struct ICSState {
     uint32_t nr_irqs;
     uint32_t offset;
     qemu_irq *qirqs;
-    bool *islsi;
     ICSIRQState *irqs;
     XICSState *icp;
 };
@@ -150,6 +149,11 @@ struct ICSIRQState {
 #define XICS_STATUS_REJECTED           0x4
 #define XICS_STATUS_MASKED_PENDING     0x8
     uint8_t status;
+/* (flags & XICS_FLAGS_IRQ_MASK) == 0 means the interrupt is not allocated */
+#define XICS_FLAGS_IRQ_LSI             0x1
+#define XICS_FLAGS_IRQ_MSI             0x2
+#define XICS_FLAGS_IRQ_MASK            0x3
+    uint8_t flags;
 };
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 2/8] xics: Add xics_find_source()
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 3/8] xics: Disable flags reset on xics reset Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

PAPR allows having multiple interrupt sources such as PHB.

This adds a source lookup function and makes use of it.

Since at the moment QEMU only supports a single source,
no change in behaviour is expected.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 220ca0e..7065978 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -635,14 +635,32 @@ static const TypeInfo ics_info = {
 /*
  * Exported functions
  */
+static int xics_find_source(XICSState *icp, int irq)
+{
+    int sources = 1;
+    int src;
+
+    /* FIXME: implement multiple sources */
+    for (src = 0; src < sources; ++src) {
+        ICSState *ics = &icp->ics[src];
+        if (ics_valid_irq(ics, irq)) {
+            return src;
+        }
+    }
+
+    return -1;
+}
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq)
 {
-    if (!ics_valid_irq(icp->ics, irq)) {
-        return NULL;
+    int src = xics_find_source(icp, irq);
+
+    if (src >= 0) {
+        ICSState *ics = &icp->ics[src];
+        return ics->qirqs[irq - ics->offset];
     }
 
-    return icp->ics->qirqs[irq - icp->ics->offset];
+    return NULL;
 }
 
 static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
@@ -655,10 +673,12 @@ static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
 
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
 {
-    ICSState *ics = icp->ics;
+    int src = xics_find_source(icp, irq);
+    ICSState *ics;
 
-    assert(ics_valid_irq(ics, irq));
+    assert(src >= 0);
 
+    ics = &icp->ics[src];
     ics_set_irq_type(ics, irq - ics->offset, lsi);
 }
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 3/8] xics: Disable flags reset on xics reset
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 2/8] xics: Add xics_find_source() Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

Since islsi[] array has been merged into the ICSState struct,
we must not reset flags as they tell if the interrupt is in use.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c     | 7 +++++++
 hw/intc/xics_kvm.c | 7 +++++++
 2 files changed, 14 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 7065978..83a809e 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -521,11 +521,18 @@ static void ics_reset(DeviceState *dev)
 {
     ICSState *ics = ICS(dev);
     int i;
+    uint8_t flags[ics->nr_irqs];
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        flags[i] = ics->irqs[i].flags;
+    }
 
     memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
+
     for (i = 0; i < ics->nr_irqs; i++) {
         ics->irqs[i].priority = 0xff;
         ics->irqs[i].saved_priority = 0xff;
+        ics->irqs[i].flags = flags[i];
     }
 }
 
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 8719a88..6fa2412 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -271,11 +271,18 @@ static void ics_kvm_reset(DeviceState *dev)
 {
     ICSState *ics = ICS(dev);
     int i;
+    uint8_t flags[ics->nr_irqs];
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        flags[i] = ics->irqs[i].flags;
+    }
 
     memset(ics->irqs, 0, sizeof(ICSIRQState) * ics->nr_irqs);
+
     for (i = 0; i < ics->nr_irqs; i++) {
         ics->irqs[i].priority = 0xff;
         ics->irqs[i].saved_priority = 0xff;
+        ics->irqs[i].flags = flags[i];
     }
 
     ics_set_kvm_state(ics, 1);
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 3/8] xics: Disable flags reset on xics reset Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-21  8:34   ` Alexander Graf
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 5/8] xics: Remove obsolete xics_set_irq_type() Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

The current allocator returns IRQ numbers from a pool and does not
support IRQs reuse in any form as it did not keep track of what it
previously returned, it only keeps the last returned IRQ. Some use
cases such as PCI hot(un)plug may require IRQ release and reallocation.

This moves an allocator from SPAPR to XICS.

This switches IRQ users to use new API.

This uses LSI/MSI flags to know if interrupt is allocated.

The interrupt release function will be posted as a separate patch.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c         | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr.c         | 67 --------------------------------------
 hw/ppc/spapr_events.c  |  2 +-
 hw/ppc/spapr_pci.c     |  6 ++--
 hw/ppc/spapr_vio.c     |  2 +-
 include/hw/ppc/spapr.h | 10 ------
 include/hw/ppc/xics.h  |  2 ++
 trace-events           |  4 +++
 8 files changed, 99 insertions(+), 82 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 83a809e..fdcbb3a 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -689,6 +689,94 @@ void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
     ics_set_irq_type(ics, irq - ics->offset, lsi);
 }
 
+#define ICS_IRQ_FREE(ics, srcno)   \
+    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
+
+static int ics_find_free_block(ICSState *ics, int num, int alignnum)
+{
+    int first, i;
+
+    for (first = 0; first < ics->nr_irqs; first += alignnum) {
+        if (num > (ics->nr_irqs - first)) {
+            return -1;
+        }
+        for (i = first; i < first + num; ++i) {
+            if (!ICS_IRQ_FREE(ics, i)) {
+                break;
+            }
+        }
+        if (i == (first + num)) {
+            return first;
+        }
+    }
+
+    return -1;
+}
+
+int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi)
+{
+    ICSState *ics = &icp->ics[src];
+    int irq;
+
+    if (irq_hint) {
+        assert(src == xics_find_source(icp, irq_hint));
+        if (!ICS_IRQ_FREE(ics, irq_hint - ics->offset)) {
+            trace_xics_alloc_failed_hint(src, irq_hint);
+            return -1;
+        }
+        irq = irq_hint;
+    } else {
+        irq = ics_find_free_block(ics, 1, 1);
+        if (irq < 0) {
+            trace_xics_alloc_failed_no_left(src);
+            return -1;
+        }
+        irq += ics->offset;
+    }
+
+    ics_set_irq_type(ics, irq - ics->offset, lsi);
+    trace_xics_alloc(src, irq);
+
+    return irq;
+}
+
+/*
+ * Allocate block of consequtive IRQs, returns a number of the first.
+ * If align==true, aligns the first IRQ number to num.
+ */
+int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align)
+{
+    int i, first = -1;
+    ICSState *ics = &icp->ics[src];
+
+    assert(src == 0);
+    /*
+     * MSIMesage::data is used for storing VIRQ so
+     * it has to be aligned to num to support multiple
+     * MSI vectors. MSI-X is not affected by this.
+     * The hint is used for the first IRQ, the rest should
+     * be allocated continuously.
+     */
+    if (align) {
+        assert((num == 1) || (num == 2) || (num == 4) ||
+               (num == 8) || (num == 16) || (num == 32));
+        first = ics_find_free_block(ics, num, num);
+    } else {
+        first = ics_find_free_block(ics, num, 1);
+    }
+
+    if (first >= 0) {
+        for (i = first; i < first + num; ++i) {
+            ics_set_irq_type(ics, i, lsi);
+        }
+    }
+    first += ics->offset;
+
+    trace_xics_alloc_block(src, first, num, lsi, align);
+
+    return first;
+}
+
 /*
  * Guest interfaces
  */
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b898a39..fce7e1c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -88,73 +88,6 @@
 
 sPAPREnvironment *spapr;
 
-int spapr_allocate_irq(int hint, bool lsi)
-{
-    int irq;
-
-    if (hint) {
-        irq = hint;
-        if (hint >= spapr->next_irq) {
-            spapr->next_irq = hint + 1;
-        }
-        /* FIXME: we should probably check for collisions somehow */
-    } else {
-        irq = spapr->next_irq++;
-    }
-
-    /* Configure irq type */
-    if (!xics_get_qirq(spapr->icp, irq)) {
-        return 0;
-    }
-
-    xics_set_irq_type(spapr->icp, irq, lsi);
-
-    return irq;
-}
-
-/*
- * Allocate block of consequtive IRQs, returns a number of the first.
- * If msi==true, aligns the first IRQ number to num.
- */
-int spapr_allocate_irq_block(int num, bool lsi, bool msi)
-{
-    int first = -1;
-    int i, hint = 0;
-
-    /*
-     * MSIMesage::data is used for storing VIRQ so
-     * it has to be aligned to num to support multiple
-     * MSI vectors. MSI-X is not affected by this.
-     * The hint is used for the first IRQ, the rest should
-     * be allocated continuously.
-     */
-    if (msi) {
-        assert((num == 1) || (num == 2) || (num == 4) ||
-               (num == 8) || (num == 16) || (num == 32));
-        hint = (spapr->next_irq + num - 1) & ~(num - 1);
-    }
-
-    for (i = 0; i < num; ++i) {
-        int irq;
-
-        irq = spapr_allocate_irq(hint, lsi);
-        if (!irq) {
-            return -1;
-        }
-
-        if (0 == i) {
-            first = irq;
-            hint = 0;
-        }
-
-        /* If the above doesn't create a consecutive block then that's
-         * an internal bug */
-        assert(irq == (first + i));
-    }
-
-    return first;
-}
-
 static XICSState *try_create_xics(const char *type, int nr_servers,
                                   int nr_irqs)
 {
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 16fa49e..60f282f 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -314,7 +314,7 @@ static void check_exception(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
 void spapr_events_init(sPAPREnvironment *spapr)
 {
-    spapr->epow_irq = spapr_allocate_msi(0);
+    spapr->epow_irq = xics_alloc(spapr->icp, 0, 0, false);
     spapr->epow_notifier.notify = spapr_powerdown_req;
     qemu_register_powerdown_notifier(&spapr->epow_notifier);
     spapr_rtas_register("check-exception", check_exception);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 1db73f2..e574014 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -360,8 +360,8 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
     /* There is no cached config, allocate MSIs */
     if (!phb->msi_table[ndev].nvec) {
-        irq = spapr_allocate_irq_block(req_num, false,
-                                       ret_intr_type == RTAS_TYPE_MSI);
+        irq = xics_alloc_block(spapr->icp, 0, req_num, false,
+                               ret_intr_type == RTAS_TYPE_MSI);
         if (irq < 0) {
             error_report("Cannot allocate MSIs for device#%d", ndev);
             rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
@@ -631,7 +631,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
     for (i = 0; i < PCI_NUM_PINS; i++) {
         uint32_t irq;
 
-        irq = spapr_allocate_lsi(0);
+        irq = xics_alloc_block(spapr->icp, 0, 1, true, false);
         if (!irq) {
             error_setg(errp, "spapr_allocate_lsi failed");
             return;
diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 2ae06a3..713499e 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -449,7 +449,7 @@ static int spapr_vio_busdev_init(DeviceState *qdev)
         dev->qdev.id = id;
     }
 
-    dev->irq = spapr_allocate_msi(dev->irq);
+    dev->irq = xics_alloc(spapr->icp, 0, dev->irq, false);
     if (!dev->irq) {
         return -1;
     }
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5fdac1e..feb241a 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -327,16 +327,6 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
 int spapr_allocate_irq(int hint, bool lsi);
 int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 
-static inline int spapr_allocate_msi(int hint)
-{
-    return spapr_allocate_irq(hint, false);
-}
-
-static inline int spapr_allocate_lsi(int hint)
-{
-    return spapr_allocate_irq(hint, true);
-}
-
 /* RTAS return codes */
 #define RTAS_OUT_SUCCESS            0
 #define RTAS_OUT_NO_ERRORS_FOUND    1
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index fa8e9c2..8e13488 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -158,6 +158,8 @@ struct ICSIRQState {
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
 void xics_set_irq_type(XICSState *icp, int irq, bool lsi);
+int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi);
+int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align);
 
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
 
diff --git a/trace-events b/trace-events
index af4449d..d8e83cc 100644
--- a/trace-events
+++ b/trace-events
@@ -1178,6 +1178,10 @@ xics_set_irq_lsi(int srcno, int nr) "set_irq_lsi: srcno %d [irq %#x]"
 xics_ics_write_xive(int nr, int srcno, int server, uint8_t priority) "ics_write_xive: irq %#x [src %d] server %#x prio %#x"
 xics_ics_reject(int nr, int srcno) "reject irq %#x [src %d]"
 xics_ics_eoi(int nr) "ics_eoi: irq %#x"
+xics_alloc(int src, int irq) "source#%d, irq %d"
+xics_alloc_failed_hint(int src, int irq) "source#%d, irq %d is already in use"
+xics_alloc_failed_no_left(int src) "source#%d, no irq left"
+xics_alloc_block(int src, int first, int num, bool lsi, int align) "source#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
 
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 5/8] xics: Remove obsolete xics_set_irq_type()
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 6/8] spapr: Remove @next_irq Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

This removes xics_set_irq_type() as it is not used anymore.

This is done by a separate patch to make the previous patch
look nicer.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        | 11 -----------
 include/hw/ppc/xics.h |  1 -
 2 files changed, 12 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index fdcbb3a..1a8cfd7 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -678,17 +678,6 @@ static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
         lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
 }
 
-void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
-{
-    int src = xics_find_source(icp, irq);
-    ICSState *ics;
-
-    assert(src >= 0);
-
-    ics = &icp->ics[src];
-    ics_set_irq_type(ics, irq - ics->offset, lsi);
-}
-
 #define ICS_IRQ_FREE(ics, srcno)   \
     (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 8e13488..0d8af1b 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -157,7 +157,6 @@ struct ICSIRQState {
 };
 
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
-void xics_set_irq_type(XICSState *icp, int irq, bool lsi);
 int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi);
 int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align);
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 6/8] spapr: Remove @next_irq
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 5/8] xics: Remove obsolete xics_set_irq_type() Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 7/8] xics: Implement xics_ics_free() Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
  7 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

This removes @next_irq from sPAPREnvironment which was used in old
IRQ allocator as XICS is now responsible for IRQs and keeps track of
allocated IRQs.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c         | 3 +--
 include/hw/ppc/spapr.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fce7e1c..d95e9b4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -767,7 +767,7 @@ static const VMStateDescription vmstate_spapr = {
     .minimum_version_id = 1,
     .minimum_version_id_old = 1,
     .fields      = (VMStateField []) {
-        VMSTATE_UINT32(next_irq, sPAPREnvironment),
+        VMSTATE_UNUSED(4), /* used to be @next_irq */
 
         /* RTC offset */
         VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
@@ -1171,7 +1171,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     /* Set up Interrupt Controller before we create the VCPUs */
     spapr->icp = xics_system_init(smp_cpus * kvmppc_smt_threads() / smp_threads,
                                   XICS_IRQS);
-    spapr->next_irq = XICS_IRQ_BASE;
 
     /* init CPUs */
     if (cpu_model == NULL) {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index feb241a..f8d7326 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -27,7 +27,6 @@ typedef struct sPAPREnvironment {
     long rtas_size;
     void *fdt_skel;
     target_ulong entry_point;
-    uint32_t next_irq;
     uint64_t rtc_offset;
     bool has_graphics;
 
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 7/8] xics: Implement xics_ics_free()
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 6/8] spapr: Remove @next_irq Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
  7 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

This implements interrupt release function so IRQs can be returned back
to the pool for reuse in cases such as PCI hot plug.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        | 25 +++++++++++++++++++++++++
 include/hw/ppc/xics.h |  1 +
 trace-events          |  2 ++
 3 files changed, 28 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 1a8cfd7..f6a2066 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -766,6 +766,31 @@ int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align)
     return first;
 }
 
+static void ics_free(ICSState *ics, int srcno, int num)
+{
+    int i;
+
+    for (i = srcno; i < srcno + num; ++i) {
+        if (ICS_IRQ_FREE(ics, i)) {
+            trace_xics_ics_free_warn(ics - ics->icp->ics, i + ics->offset);
+        }
+        memset(&ics->irqs[i], 0, sizeof(ICSIRQState));
+    }
+}
+
+void xics_free(XICSState *icp, int src, int irq, int num)
+{
+    ICSState *ics = &icp->ics[src];
+
+    /* FIXME: implement multiple sources */
+    assert(src == 0);
+
+    trace_xics_ics_free(ics - icp->ics, irq, num);
+    if (src >= 0) {
+        ics_free(ics, irq - ics->offset, num);
+    }
+}
+
 /*
  * Guest interfaces
  */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 0d8af1b..267d045 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -159,6 +159,7 @@ struct ICSIRQState {
 qemu_irq xics_get_qirq(XICSState *icp, int irq);
 int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi);
 int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align);
+void xics_free(XICSState *icp, int src, int irq, int num);
 
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
 
diff --git a/trace-events b/trace-events
index d8e83cc..0a66c71 100644
--- a/trace-events
+++ b/trace-events
@@ -1182,6 +1182,8 @@ xics_alloc(int src, int irq) "source#%d, irq %d"
 xics_alloc_failed_hint(int src, int irq) "source#%d, irq %d is already in use"
 xics_alloc_failed_no_left(int src) "source#%d, no irq left"
 xics_alloc_block(int src, int first, int num, bool lsi, int align) "source#%d, first irq %d, %d irqs, lsi=%d, alignnum %d"
+xics_ics_free(int src, int irq, int num) "Source#%d, first irq %d, %d irqs"
+xics_ics_free_warn(int src, int irq) "Source#%d, irq %d is already free"
 
 # hw/ppc/spapr_iommu.c
 spapr_iommu_put(uint64_t liobn, uint64_t ioba, uint64_t tce, uint64_t ret) "liobn=%"PRIx64" ioba=0x%"PRIx64" tce=0x%"PRIx64" ret=%"PRId64
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
                   ` (6 preceding siblings ...)
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 7/8] xics: Implement xics_ics_free() Alexey Kardashevskiy
@ 2014-05-15  9:59 ` Alexey Kardashevskiy
  2014-05-21  8:40   ` Alexander Graf
  7 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-15  9:59 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
XICS used to be unable to reuse interrupts which becomes a problem for
dynamic MSI reconfiguration which is happening on guest driver reload or
PCI hot (un)plug. Another problem is that PHB has a limit of devices
supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
for that.

This makes use of new XICS ability to reuse interrupts.

This removes cached MSI configuration from SPAPR PHB so the first IRQ number
of a device is stored in MSI/MSIX config space so there is no need to store
this anywhere else. From now on, SPAPR PHB only keeps flags telling what type
of interrupt for which device it has configured in order to return error if
(for example) MSIX was enabled and the guest is trying to disable MSI which
it has not enabled.

This removes a limit for the maximum number of MSIX-enabled devices per PHB,
now XICS and PCI bus capacity are the only limitation.

This changes migration stream as it fixes vmstate_spapr_pci_msi::name which was
wrong since the beginning.

This fixed traces to be more informative.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

In reality either MSIX or MSI is enabled, never both. So I could remove msi/msix
bitmaps from this patch, would it make sense?
---
 hw/ppc/spapr_pci.c          | 179 +++++++++++++++++++++++---------------------
 include/hw/pci-host/spapr.h |  11 +--
 trace-events                |   5 +-
 3 files changed, 99 insertions(+), 96 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e574014..49e0382 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -220,36 +220,12 @@ static void rtas_write_pci_config(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 }
 
 /*
- * Find an entry with config_addr or returns the empty one if not found AND
- * alloc_new is set.
- * At the moment the msi_table entries are never released so there is
- * no point to look till the end of the list if we need to find the free entry.
- */
-static int spapr_msicfg_find(sPAPRPHBState *phb, uint32_t config_addr,
-                             bool alloc_new)
-{
-    int i;
-
-    for (i = 0; i < SPAPR_MSIX_MAX_DEVS; ++i) {
-        if (!phb->msi_table[i].nvec) {
-            break;
-        }
-        if (phb->msi_table[i].config_addr == config_addr) {
-            return i;
-        }
-    }
-    if ((i < SPAPR_MSIX_MAX_DEVS) && alloc_new) {
-        trace_spapr_pci_msi("Allocating new MSI config", i, config_addr);
-        return i;
-    }
-
-    return -1;
-}
-
-/*
  * Set MSI/MSIX message data.
  * This is required for msi_notify()/msix_notify() which
  * will write at the addresses via spapr_msi_write().
+ *
+ * If hwaddr == 0, all entries will have .data == first_irq i.e.
+ * table will be reset.
  */
 static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, bool msix,
                              unsigned first_irq, unsigned req_num)
@@ -263,12 +239,51 @@ static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, bool msix,
         return;
     }
 
-    for (i = 0; i < req_num; ++i, ++msg.data) {
+    for (i = 0; i < req_num; ++i) {
         msix_set_message(pdev, i, msg);
         trace_spapr_pci_msi_setup(pdev->name, i, msg.address);
+        if (addr) {
+            ++msg.data;
+        }
     }
 }
 
+static unsigned spapr_msi_get(sPAPRPHBState *phb, PCIDevice *pdev,
+                              unsigned *num, bool *msix)
+{
+    MSIMessage msg;
+    unsigned irq = 0;
+    uint8_t offs = (pci_bus_num(pdev->bus) << SPAPR_PCI_BUS_SHIFT) |
+        PCI_SLOT(pdev->devfn);
+
+    if ((phb->msi[offs] & (1 << PCI_FUNC(pdev->devfn))) &&
+        (phb->msix[offs] & (1 << PCI_FUNC(pdev->devfn)))) {
+        error_report("Both MSI and MSIX configured! MSIX will be used.");
+    }
+
+    if (phb->msix[offs] & (1 << PCI_FUNC(pdev->devfn))) {
+        *num = pdev->msix_entries_nr;
+        if (*num) {
+            msg = msix_get_message(pdev, 0);
+            irq = msg.data;
+            if (msix) {
+                *msix = true;
+            }
+        }
+    } else if (phb->msi[offs] & (1 << PCI_FUNC(pdev->devfn))) {
+        *num = msi_nr_vectors_allocated(pdev);
+        if (*num) {
+            msg = msi_get_message(pdev, 0);
+            irq = msg.data;
+            if (msix) {
+                *msix = false;
+            }
+        }
+    }
+
+    return irq;
+}
+
 static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                                 uint32_t token, uint32_t nargs,
                                 target_ulong args, uint32_t nret,
@@ -280,9 +295,10 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     unsigned int req_num = rtas_ld(args, 4); /* 0 == remove all */
     unsigned int seq_num = rtas_ld(args, 5);
     unsigned int ret_intr_type;
-    int ndev, irq, max_irqs = 0;
+    unsigned int irq, max_irqs = 0, num = 0, offs;
     sPAPRPHBState *phb = NULL;
     PCIDevice *pdev = NULL;
+    bool msix = false;
 
     switch (func) {
     case RTAS_CHANGE_MSI_FN:
@@ -307,16 +323,29 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
         return;
     }
+    offs = (pci_bus_num(pdev->bus) << SPAPR_PCI_BUS_SHIFT) |
+        PCI_SLOT(pdev->devfn);
 
     /* Releasing MSIs */
     if (!req_num) {
-        ndev = spapr_msicfg_find(phb, config_addr, false);
-        if (ndev < 0) {
-            trace_spapr_pci_msi("MSI has not been enabled", -1, config_addr);
+        irq = spapr_msi_get(phb, pdev, &num, &msix);
+        if (!num || !irq ||
+            ((func == RTAS_CHANGE_MSI_FN) && msix) ||
+            ((func == RTAS_CHANGE_MSIX_FN) && !msix)) {
+            trace_spapr_pci_msi("Releasing wrong config", config_addr);
             rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
             return;
         }
-        trace_spapr_pci_msi("Released MSIs", ndev, config_addr);
+
+        xics_free(spapr->icp, 0, irq, num);
+        spapr_msi_setmsg(pdev, 0, msix, 0, num);
+
+        if (msix) {
+            phb->msix[offs] &= ~(1 << PCI_FUNC(pdev->devfn));
+        } else {
+            phb->msi[offs] &= ~(1 << PCI_FUNC(pdev->devfn));
+        }
+        trace_spapr_pci_msi("Released MSIs", config_addr);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
         rtas_st(rets, 1, 0);
         return;
@@ -324,15 +353,6 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
     /* Enabling MSI */
 
-    /* Find a device number in the map to add or reuse the existing one */
-    ndev = spapr_msicfg_find(phb, config_addr, true);
-    if (ndev >= SPAPR_MSIX_MAX_DEVS || ndev < 0) {
-        error_report("No free entry for a new MSI device");
-        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
-        return;
-    }
-    trace_spapr_pci_msi("Configuring MSI", ndev, config_addr);
-
     /* Check if the device supports as many IRQs as requested */
     if (ret_intr_type == RTAS_TYPE_MSI) {
         max_irqs = msi_nr_vectors_allocated(pdev);
@@ -340,48 +360,44 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         max_irqs = pdev->msix_entries_nr;
     }
     if (!max_irqs) {
-        error_report("Requested interrupt type %d is not enabled for device#%d",
-                     ret_intr_type, ndev);
+        error_report("Requested interrupt type %d is not enabled for device %x",
+                     ret_intr_type, config_addr);
         rtas_st(rets, 0, -1); /* Hardware error */
         return;
     }
     /* Correct the number if the guest asked for too many */
     if (req_num > max_irqs) {
+        trace_spapr_pci_msi_retry(config_addr, req_num, max_irqs);
         req_num = max_irqs;
+        irq = 0; /* to avoid misleading trace */
+        goto out;
     }
 
-    /* Check if there is an old config and MSI number has not changed */
-    if (phb->msi_table[ndev].nvec && (req_num != phb->msi_table[ndev].nvec)) {
-        /* Unexpected behaviour */
-        error_report("Cannot reuse MSI config for device#%d", ndev);
+    /* Allocate MSIs */
+    irq = xics_alloc_block(spapr->icp, 0, req_num, false,
+                           ret_intr_type == RTAS_TYPE_MSI);
+    if (!irq) {
+        error_report("Cannot allocate MSIs for device %x", config_addr);
         rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
         return;
     }
 
-    /* There is no cached config, allocate MSIs */
-    if (!phb->msi_table[ndev].nvec) {
-        irq = xics_alloc_block(spapr->icp, 0, req_num, false,
-                               ret_intr_type == RTAS_TYPE_MSI);
-        if (irq < 0) {
-            error_report("Cannot allocate MSIs for device#%d", ndev);
-            rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
-            return;
-        }
-        phb->msi_table[ndev].irq = irq;
-        phb->msi_table[ndev].nvec = req_num;
-        phb->msi_table[ndev].config_addr = config_addr;
-    }
-
     /* Setup MSI/MSIX vectors in the device (via cfgspace or MSIX BAR) */
     spapr_msi_setmsg(pdev, spapr->msi_win_addr, ret_intr_type == RTAS_TYPE_MSIX,
-                     phb->msi_table[ndev].irq, req_num);
+                     irq, req_num);
+    if (ret_intr_type == RTAS_TYPE_MSIX) {
+        phb->msix[offs] |= 1 << PCI_FUNC(pdev->devfn);
+    } else {
+        phb->msi[offs] |= 1 << PCI_FUNC(pdev->devfn);
+    }
 
+out:
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
     rtas_st(rets, 1, req_num);
     rtas_st(rets, 2, ++seq_num);
     rtas_st(rets, 3, ret_intr_type);
 
-    trace_spapr_pci_rtas_ibm_change_msi(func, req_num);
+    trace_spapr_pci_rtas_ibm_change_msi(config_addr, func, req_num, irq);
 }
 
 static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
@@ -395,25 +411,28 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
     uint32_t config_addr = rtas_ld(args, 0);
     uint64_t buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
     unsigned int intr_src_num = -1, ioa_intr_num = rtas_ld(args, 3);
-    int ndev;
+    unsigned irq, num = 0;
     sPAPRPHBState *phb = NULL;
+    PCIDevice *pdev = NULL;
 
     /* Fins sPAPRPHBState */
     phb = find_phb(spapr, buid);
-    if (!phb) {
+    if (phb) {
+        pdev = find_dev(spapr, buid, config_addr);
+    }
+    if (!phb || !pdev) {
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
         return;
     }
 
     /* Find device descriptor and start IRQ */
-    ndev = spapr_msicfg_find(phb, config_addr, false);
-    if (ndev < 0) {
-        trace_spapr_pci_msi("MSI has not been enabled", -1, config_addr);
+    irq = spapr_msi_get(phb, pdev, &num, NULL);
+    if (!irq || !num || (ioa_intr_num >= num)) {
+        trace_spapr_pci_msi("Failed to return vector", config_addr);
         rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
         return;
     }
-
-    intr_src_num = phb->msi_table[ndev].irq + ioa_intr_num;
+    intr_src_num = irq + ioa_intr_num;
     trace_spapr_pci_rtas_ibm_query_interrupt_source_number(ioa_intr_num,
                                                            intr_src_num);
 
@@ -675,20 +694,6 @@ static const VMStateDescription vmstate_spapr_pci_lsi = {
     },
 };
 
-static const VMStateDescription vmstate_spapr_pci_msi = {
-    .name = "spapr_pci/lsi",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .minimum_version_id_old = 1,
-    .fields      = (VMStateField []) {
-        VMSTATE_UINT32(config_addr, struct spapr_pci_msi),
-        VMSTATE_UINT32(irq, struct spapr_pci_msi),
-        VMSTATE_UINT32(nvec, struct spapr_pci_msi),
-
-        VMSTATE_END_OF_LIST()
-    },
-};
-
 static const VMStateDescription vmstate_spapr_pci = {
     .name = "spapr_pci",
     .version_id = 1,
@@ -703,8 +708,8 @@ static const VMStateDescription vmstate_spapr_pci = {
         VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
         VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
                              vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
-        VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
-                             vmstate_spapr_pci_msi, struct spapr_pci_msi),
+        VMSTATE_UINT8_ARRAY(msi, sPAPRPHBState, PCI_BUS_MAX * PCI_SLOT_MAX),
+        VMSTATE_UINT8_ARRAY(msix, sPAPRPHBState, PCI_BUS_MAX * PCI_SLOT_MAX),
 
         VMSTATE_END_OF_LIST()
     },
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 970b4a9..7ef29ab 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -27,8 +27,6 @@
 #include "hw/pci/pci_host.h"
 #include "hw/ppc/xics.h"
 
-#define SPAPR_MSIX_MAX_DEVS 32
-
 #define TYPE_SPAPR_PCI_HOST_BRIDGE "spapr-pci-host-bridge"
 
 #define SPAPR_PCI_HOST_BRIDGE(obj) \
@@ -55,11 +53,10 @@ typedef struct sPAPRPHBState {
         uint32_t irq;
     } lsi_table[PCI_NUM_PINS];
 
-    struct spapr_pci_msi {
-        uint32_t config_addr;
-        uint32_t irq;
-        uint32_t nvec;
-    } msi_table[SPAPR_MSIX_MAX_DEVS];
+#define PCI_BUS_MAX             0xFF
+#define SPAPR_PCI_BUS_SHIFT     5
+    uint8_t msi[PCI_BUS_MAX * PCI_SLOT_MAX];
+    uint8_t msix[PCI_BUS_MAX * PCI_SLOT_MAX];
 
     QLIST_ENTRY(sPAPRPHBState) list;
 } sPAPRPHBState;
diff --git a/trace-events b/trace-events
index 0a66c71..c180ea1 100644
--- a/trace-events
+++ b/trace-events
@@ -1159,12 +1159,13 @@ qxl_render_guest_primary_resized(int32_t width, int32_t height, int32_t stride,
 qxl_render_update_area_done(void *cookie) "%p"
 
 # hw/ppc/spapr_pci.c
-spapr_pci_msi(const char *msg, uint32_t n, uint32_t ca) "%s (device#%d, cfg=%x)"
+spapr_pci_msi(const char *msg, uint32_t ca) "%s (cfg=%x)"
 spapr_pci_msi_setup(const char *name, unsigned vector, uint64_t addr) "dev\"%s\" vector %u, addr=%"PRIx64
-spapr_pci_rtas_ibm_change_msi(unsigned func, unsigned req) "func %u, requested %u"
+spapr_pci_rtas_ibm_change_msi(unsigned cfg, unsigned func, unsigned req, unsigned first) "cfgaddr %x func %u, requested %u, first irq %u"
 spapr_pci_rtas_ibm_query_interrupt_source_number(unsigned ioa, unsigned intr) "queries for #%u, IRQ%u"
 spapr_pci_msi_write(uint64_t addr, uint64_t data, uint32_t dt_irq) "@%"PRIx64"<=%"PRIx64" IRQ %u"
 spapr_pci_lsi_set(const char *busname, int pin, uint32_t irq) "%s PIN%d IRQ %u"
+spapr_pci_msi_retry(unsigned config_addr, unsigned req_num, unsigned max_irqs) "Guest device at %x asked %u, have only %u"
 
 # hw/intc/xics.c
 xics_icp_check_ipi(int server, uint8_t mfrr) "CPU %d can take IPI mfrr=%#x"
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts Alexey Kardashevskiy
@ 2014-05-21  6:30   ` Alexey Kardashevskiy
  2014-05-21  8:40     ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21  6:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-ppc, Alexander Graf

On 05/15/2014 07:59 PM, Alexey Kardashevskiy wrote:
> The existing interrupt allocation scheme in SPAPR assumes that
> interrupts are allocated at the start time, continously and the config
> will not change. However, there are cases when this is not going to work
> such as:
> 
> 1. migration - we will have to have an ability to choose interrupt
> numbers for devices in the command line and this will create gaps in
> interrupt space.
> 
> 2. PCI hotplug - interrupts from unplugged device need to be returned
> back to interrupt pool, otherwise we will quickly run out of interrupts.
> 
> This replaces a separate lslsi[] array with a byte in the ICSIRQState
> struct and defines "LSI" and "MSI" flags. Neither of these flags set
> signals that the descriptor is not allocated and not in use.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/intc/xics.c        | 23 ++++++++++++++++-------
>  hw/intc/xics_kvm.c    |  5 ++---
>  include/hw/ppc/xics.h |  6 +++++-
>  3 files changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 64aabe7..220ca0e 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -438,7 +438,7 @@ static void ics_set_irq(void *opaque, int srcno, int val)
>  {
>      ICSState *ics = (ICSState *)opaque;
>  
> -    if (ics->islsi[srcno]) {
> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
>          set_irq_lsi(ics, srcno, val);
>      } else {
>          set_irq_msi(ics, srcno, val);
> @@ -475,7 +475,7 @@ static void ics_write_xive(ICSState *ics, int nr, int server,
>  
>      trace_xics_ics_write_xive(nr, srcno, server, priority);
>  
> -    if (ics->islsi[srcno]) {
> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
>          write_xive_lsi(ics, srcno);
>      } else {
>          write_xive_msi(ics, srcno);
> @@ -497,7 +497,7 @@ static void ics_resend(ICSState *ics)
>  
>      for (i = 0; i < ics->nr_irqs; i++) {
>          /* FIXME: filter by server#? */
> -        if (ics->islsi[i]) {
> +        if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
>              resend_lsi(ics, i);
>          } else {
>              resend_msi(ics, i);
> @@ -512,7 +512,7 @@ static void ics_eoi(ICSState *ics, int nr)
>  
>      trace_xics_ics_eoi(nr);
>  
> -    if (ics->islsi[srcno]) {
> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
>          irq->status &= ~XICS_STATUS_SENT;
>      }
>  }
> @@ -609,7 +609,6 @@ static void ics_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>      ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
> -    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>      ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, ics->nr_irqs);
>  }
>  
> @@ -646,11 +645,21 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq)
>      return icp->ics->qirqs[irq - icp->ics->offset];
>  }
>  
> +static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
> +{
> +    assert(!(ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MASK));
> +
> +    ics->irqs[srcno].flags |=
> +        lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
> +}
> +
>  void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
>  {
> -    assert(ics_valid_irq(icp->ics, irq));
> +    ICSState *ics = icp->ics;
>  
> -    icp->ics->islsi[irq - icp->ics->offset] = lsi;
> +    assert(ics_valid_irq(ics, irq));
> +
> +    ics_set_irq_type(ics, irq - ics->offset, lsi);
>  }
>  
>  /*
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index 09476ae..8719a88 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -224,7 +224,7 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
>              state |= KVM_XICS_MASKED;
>          }
>  
> -        if (ics->islsi[i]) {
> +        if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
>              state |= KVM_XICS_LEVEL_SENSITIVE;
>              if (irq->status & XICS_STATUS_ASSERTED) {
>                  state |= KVM_XICS_PENDING;
> @@ -253,7 +253,7 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int val)
>      int rc;
>  
>      args.irq = srcno + ics->offset;
> -    if (!ics->islsi[srcno]) {
> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
>          if (!val) {
>              return;
>          }
> @@ -290,7 +290,6 @@ static void ics_kvm_realize(DeviceState *dev, Error **errp)
>          return;
>      }
>      ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
> -    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>      ics->qirqs = qemu_allocate_irqs(ics_kvm_set_irq, ics, ics->nr_irqs);
>  }
>  
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 0d7673d..fa8e9c2 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -136,7 +136,6 @@ struct ICSState {
>      uint32_t nr_irqs;
>      uint32_t offset;
>      qemu_irq *qirqs;
> -    bool *islsi;
>      ICSIRQState *irqs;
>      XICSState *icp;
>  };
> @@ -150,6 +149,11 @@ struct ICSIRQState {
>  #define XICS_STATUS_REJECTED           0x4
>  #define XICS_STATUS_MASKED_PENDING     0x8
>      uint8_t status;
> +/* (flags & XICS_FLAGS_IRQ_MASK) == 0 means the interrupt is not allocated */
> +#define XICS_FLAGS_IRQ_LSI             0x1
> +#define XICS_FLAGS_IRQ_MSI             0x2
> +#define XICS_FLAGS_IRQ_MASK            0x3
> +    uint8_t flags;
>  };
>  
>  qemu_irq xics_get_qirq(XICSState *icp, int irq);
> 

This is missing. But I suspect they still mmm not perfect so no point in
reposting them?


diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index f6a2066..c920410 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -579,6 +579,7 @@ static const VMStateDescription vmstate_ics_irq = {
         VMSTATE_UINT8(priority, ICSIRQState),
         VMSTATE_UINT8(saved_priority, ICSIRQState),
         VMSTATE_UINT8(status, ICSIRQState),
+        VMSTATE_UINT8(flags, ICSIRQState),
         VMSTATE_END_OF_LIST()
     },
 };




-- 
Alexey

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics Alexey Kardashevskiy
@ 2014-05-21  8:34   ` Alexander Graf
  2014-05-21  8:46     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-21  8:34 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 15.05.14 11:59, Alexey Kardashevskiy wrote:
> The current allocator returns IRQ numbers from a pool and does not
> support IRQs reuse in any form as it did not keep track of what it
> previously returned, it only keeps the last returned IRQ. Some use
> cases such as PCI hot(un)plug may require IRQ release and reallocation.
>
> This moves an allocator from SPAPR to XICS.
>
> This switches IRQ users to use new API.
>
> This uses LSI/MSI flags to know if interrupt is allocated.
>
> The interrupt release function will be posted as a separate patch.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   hw/intc/xics.c         | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++
>   hw/ppc/spapr.c         | 67 --------------------------------------
>   hw/ppc/spapr_events.c  |  2 +-
>   hw/ppc/spapr_pci.c     |  6 ++--
>   hw/ppc/spapr_vio.c     |  2 +-
>   include/hw/ppc/spapr.h | 10 ------
>   include/hw/ppc/xics.h  |  2 ++
>   trace-events           |  4 +++
>   8 files changed, 99 insertions(+), 82 deletions(-)
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 83a809e..fdcbb3a 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -689,6 +689,94 @@ void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
>       ics_set_irq_type(ics, irq - ics->offset, lsi);
>   }
>   
> +#define ICS_IRQ_FREE(ics, srcno)   \
> +    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
> +
> +static int ics_find_free_block(ICSState *ics, int num, int alignnum)
> +{
> +    int first, i;
> +
> +    for (first = 0; first < ics->nr_irqs; first += alignnum) {
> +        if (num > (ics->nr_irqs - first)) {
> +            return -1;
> +        }
> +        for (i = first; i < first + num; ++i) {
> +            if (!ICS_IRQ_FREE(ics, i)) {
> +                break;
> +            }
> +        }
> +        if (i == (first + num)) {
> +            return first;
> +        }
> +    }
> +
> +    return -1;
> +}
> +
> +int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi)
> +{
> +    ICSState *ics = &icp->ics[src];
> +    int irq;
> +
> +    if (irq_hint) {
> +        assert(src == xics_find_source(icp, irq_hint));

Could this ever get triggered by a guest? Why don't we just fail as if 
the IRQ wasn't available?


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
@ 2014-05-21  8:40   ` Alexander Graf
  2014-05-21  8:52     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-21  8:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 15.05.14 11:59, Alexey Kardashevskiy wrote:
> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
> XICS used to be unable to reuse interrupts which becomes a problem for
> dynamic MSI reconfiguration which is happening on guest driver reload or
> PCI hot (un)plug. Another problem is that PHB has a limit of devices
> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
> for that.
>
> This makes use of new XICS ability to reuse interrupts.
>
> This removes cached MSI configuration from SPAPR PHB so the first IRQ number
> of a device is stored in MSI/MSIX config space so there is no need to store
> this anywhere else. From now on, SPAPR PHB only keeps flags telling what type
> of interrupt for which device it has configured in order to return error if
> (for example) MSIX was enabled and the guest is trying to disable MSI which
> it has not enabled.
>
> This removes a limit for the maximum number of MSIX-enabled devices per PHB,
> now XICS and PCI bus capacity are the only limitation.
>
> This changes migration stream as it fixes vmstate_spapr_pci_msi::name which was
> wrong since the beginning.
>
> This fixed traces to be more informative.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> In reality either MSIX or MSI is enabled, never both. So I could remove msi/msix
> bitmaps from this patch, would it make sense?

Is this a hard requirement? Does a device have to choose between MSIX 
and MSI or could it theoretically have both enabled? Is this a PCI 
limitation, a PAPR/XICS limitation or just a limitation of your 
implementation?


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts
  2014-05-21  6:30   ` Alexey Kardashevskiy
@ 2014-05-21  8:40     ` Alexander Graf
  0 siblings, 0 replies; 35+ messages in thread
From: Alexander Graf @ 2014-05-21  8:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 08:30, Alexey Kardashevskiy wrote:
> On 05/15/2014 07:59 PM, Alexey Kardashevskiy wrote:
>> The existing interrupt allocation scheme in SPAPR assumes that
>> interrupts are allocated at the start time, continously and the config
>> will not change. However, there are cases when this is not going to work
>> such as:
>>
>> 1. migration - we will have to have an ability to choose interrupt
>> numbers for devices in the command line and this will create gaps in
>> interrupt space.
>>
>> 2. PCI hotplug - interrupts from unplugged device need to be returned
>> back to interrupt pool, otherwise we will quickly run out of interrupts.
>>
>> This replaces a separate lslsi[] array with a byte in the ICSIRQState
>> struct and defines "LSI" and "MSI" flags. Neither of these flags set
>> signals that the descriptor is not allocated and not in use.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   hw/intc/xics.c        | 23 ++++++++++++++++-------
>>   hw/intc/xics_kvm.c    |  5 ++---
>>   include/hw/ppc/xics.h |  6 +++++-
>>   3 files changed, 23 insertions(+), 11 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index 64aabe7..220ca0e 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -438,7 +438,7 @@ static void ics_set_irq(void *opaque, int srcno, int val)
>>   {
>>       ICSState *ics = (ICSState *)opaque;
>>   
>> -    if (ics->islsi[srcno]) {
>> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
>>           set_irq_lsi(ics, srcno, val);
>>       } else {
>>           set_irq_msi(ics, srcno, val);
>> @@ -475,7 +475,7 @@ static void ics_write_xive(ICSState *ics, int nr, int server,
>>   
>>       trace_xics_ics_write_xive(nr, srcno, server, priority);
>>   
>> -    if (ics->islsi[srcno]) {
>> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
>>           write_xive_lsi(ics, srcno);
>>       } else {
>>           write_xive_msi(ics, srcno);
>> @@ -497,7 +497,7 @@ static void ics_resend(ICSState *ics)
>>   
>>       for (i = 0; i < ics->nr_irqs; i++) {
>>           /* FIXME: filter by server#? */
>> -        if (ics->islsi[i]) {
>> +        if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
>>               resend_lsi(ics, i);
>>           } else {
>>               resend_msi(ics, i);
>> @@ -512,7 +512,7 @@ static void ics_eoi(ICSState *ics, int nr)
>>   
>>       trace_xics_ics_eoi(nr);
>>   
>> -    if (ics->islsi[srcno]) {
>> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_LSI) {
>>           irq->status &= ~XICS_STATUS_SENT;
>>       }
>>   }
>> @@ -609,7 +609,6 @@ static void ics_realize(DeviceState *dev, Error **errp)
>>           return;
>>       }
>>       ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>> -    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>>       ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, ics->nr_irqs);
>>   }
>>   
>> @@ -646,11 +645,21 @@ qemu_irq xics_get_qirq(XICSState *icp, int irq)
>>       return icp->ics->qirqs[irq - icp->ics->offset];
>>   }
>>   
>> +static void ics_set_irq_type(ICSState *ics, int srcno, bool lsi)
>> +{
>> +    assert(!(ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MASK));
>> +
>> +    ics->irqs[srcno].flags |=
>> +        lsi ? XICS_FLAGS_IRQ_LSI : XICS_FLAGS_IRQ_MSI;
>> +}
>> +
>>   void xics_set_irq_type(XICSState *icp, int irq, bool lsi)
>>   {
>> -    assert(ics_valid_irq(icp->ics, irq));
>> +    ICSState *ics = icp->ics;
>>   
>> -    icp->ics->islsi[irq - icp->ics->offset] = lsi;
>> +    assert(ics_valid_irq(ics, irq));
>> +
>> +    ics_set_irq_type(ics, irq - ics->offset, lsi);
>>   }
>>   
>>   /*
>> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
>> index 09476ae..8719a88 100644
>> --- a/hw/intc/xics_kvm.c
>> +++ b/hw/intc/xics_kvm.c
>> @@ -224,7 +224,7 @@ static int ics_set_kvm_state(ICSState *ics, int version_id)
>>               state |= KVM_XICS_MASKED;
>>           }
>>   
>> -        if (ics->islsi[i]) {
>> +        if (ics->irqs[i].flags & XICS_FLAGS_IRQ_LSI) {
>>               state |= KVM_XICS_LEVEL_SENSITIVE;
>>               if (irq->status & XICS_STATUS_ASSERTED) {
>>                   state |= KVM_XICS_PENDING;
>> @@ -253,7 +253,7 @@ static void ics_kvm_set_irq(void *opaque, int srcno, int val)
>>       int rc;
>>   
>>       args.irq = srcno + ics->offset;
>> -    if (!ics->islsi[srcno]) {
>> +    if (ics->irqs[srcno].flags & XICS_FLAGS_IRQ_MSI) {
>>           if (!val) {
>>               return;
>>           }
>> @@ -290,7 +290,6 @@ static void ics_kvm_realize(DeviceState *dev, Error **errp)
>>           return;
>>       }
>>       ics->irqs = g_malloc0(ics->nr_irqs * sizeof(ICSIRQState));
>> -    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>>       ics->qirqs = qemu_allocate_irqs(ics_kvm_set_irq, ics, ics->nr_irqs);
>>   }
>>   
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 0d7673d..fa8e9c2 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -136,7 +136,6 @@ struct ICSState {
>>       uint32_t nr_irqs;
>>       uint32_t offset;
>>       qemu_irq *qirqs;
>> -    bool *islsi;
>>       ICSIRQState *irqs;
>>       XICSState *icp;
>>   };
>> @@ -150,6 +149,11 @@ struct ICSIRQState {
>>   #define XICS_STATUS_REJECTED           0x4
>>   #define XICS_STATUS_MASKED_PENDING     0x8
>>       uint8_t status;
>> +/* (flags & XICS_FLAGS_IRQ_MASK) == 0 means the interrupt is not allocated */
>> +#define XICS_FLAGS_IRQ_LSI             0x1
>> +#define XICS_FLAGS_IRQ_MSI             0x2
>> +#define XICS_FLAGS_IRQ_MASK            0x3
>> +    uint8_t flags;
>>   };
>>   
>>   qemu_irq xics_get_qirq(XICSState *icp, int irq);
>>
> This is missing. But I suspect they still mmm not perfect so no point in
> reposting them?
>
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index f6a2066..c920410 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -579,6 +579,7 @@ static const VMStateDescription vmstate_ics_irq = {
>           VMSTATE_UINT8(priority, ICSIRQState),
>           VMSTATE_UINT8(saved_priority, ICSIRQState),
>           VMSTATE_UINT8(status, ICSIRQState),
> +        VMSTATE_UINT8(flags, ICSIRQState),

The version change is also missing.


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics
  2014-05-21  8:34   ` Alexander Graf
@ 2014-05-21  8:46     ` Alexey Kardashevskiy
  2014-05-21  8:47       ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21  8:46 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/21/2014 06:34 PM, Alexander Graf wrote:
> 
> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>> The current allocator returns IRQ numbers from a pool and does not
>> support IRQs reuse in any form as it did not keep track of what it
>> previously returned, it only keeps the last returned IRQ. Some use
>> cases such as PCI hot(un)plug may require IRQ release and reallocation.
>>
>> This moves an allocator from SPAPR to XICS.
>>
>> This switches IRQ users to use new API.
>>
>> This uses LSI/MSI flags to know if interrupt is allocated.
>>
>> The interrupt release function will be posted as a separate patch.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   hw/intc/xics.c         | 88
>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/ppc/spapr.c         | 67 --------------------------------------
>>   hw/ppc/spapr_events.c  |  2 +-
>>   hw/ppc/spapr_pci.c     |  6 ++--
>>   hw/ppc/spapr_vio.c     |  2 +-
>>   include/hw/ppc/spapr.h | 10 ------
>>   include/hw/ppc/xics.h  |  2 ++
>>   trace-events           |  4 +++
>>   8 files changed, 99 insertions(+), 82 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index 83a809e..fdcbb3a 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -689,6 +689,94 @@ void xics_set_irq_type(XICSState *icp, int irq, bool
>> lsi)
>>       ics_set_irq_type(ics, irq - ics->offset, lsi);
>>   }
>>   +#define ICS_IRQ_FREE(ics, srcno)   \
>> +    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
>> +
>> +static int ics_find_free_block(ICSState *ics, int num, int alignnum)
>> +{
>> +    int first, i;
>> +
>> +    for (first = 0; first < ics->nr_irqs; first += alignnum) {
>> +        if (num > (ics->nr_irqs - first)) {
>> +            return -1;
>> +        }
>> +        for (i = first; i < first + num; ++i) {
>> +            if (!ICS_IRQ_FREE(ics, i)) {
>> +                break;
>> +            }
>> +        }
>> +        if (i == (first + num)) {
>> +            return first;
>> +        }
>> +    }
>> +
>> +    return -1;
>> +}
>> +
>> +int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi)
>> +{
>> +    ICSState *ics = &icp->ics[src];
>> +    int irq;
>> +
>> +    if (irq_hint) {
>> +        assert(src == xics_find_source(icp, irq_hint));
> 
> Could this ever get triggered by a guest? Why don't we just fail as if the
> IRQ wasn't available?

The @irq_hist is only used when the user specified IRQ in the command line,
this is why it is assert. And the guest never gets to choose IRQ number, it
receives numbers either via device tree (and QEMU puts those numbers there,
not SLOF) or ibm,change-msi (QEMU returns numbers).



-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics
  2014-05-21  8:46     ` Alexey Kardashevskiy
@ 2014-05-21  8:47       ` Alexander Graf
  0 siblings, 0 replies; 35+ messages in thread
From: Alexander Graf @ 2014-05-21  8:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 21.05.14 10:46, Alexey Kardashevskiy wrote:
> On 05/21/2014 06:34 PM, Alexander Graf wrote:
>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>> The current allocator returns IRQ numbers from a pool and does not
>>> support IRQs reuse in any form as it did not keep track of what it
>>> previously returned, it only keeps the last returned IRQ. Some use
>>> cases such as PCI hot(un)plug may require IRQ release and reallocation.
>>>
>>> This moves an allocator from SPAPR to XICS.
>>>
>>> This switches IRQ users to use new API.
>>>
>>> This uses LSI/MSI flags to know if interrupt is allocated.
>>>
>>> The interrupt release function will be posted as a separate patch.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>    hw/intc/xics.c         | 88
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    hw/ppc/spapr.c         | 67 --------------------------------------
>>>    hw/ppc/spapr_events.c  |  2 +-
>>>    hw/ppc/spapr_pci.c     |  6 ++--
>>>    hw/ppc/spapr_vio.c     |  2 +-
>>>    include/hw/ppc/spapr.h | 10 ------
>>>    include/hw/ppc/xics.h  |  2 ++
>>>    trace-events           |  4 +++
>>>    8 files changed, 99 insertions(+), 82 deletions(-)
>>>
>>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>>> index 83a809e..fdcbb3a 100644
>>> --- a/hw/intc/xics.c
>>> +++ b/hw/intc/xics.c
>>> @@ -689,6 +689,94 @@ void xics_set_irq_type(XICSState *icp, int irq, bool
>>> lsi)
>>>        ics_set_irq_type(ics, irq - ics->offset, lsi);
>>>    }
>>>    +#define ICS_IRQ_FREE(ics, srcno)   \
>>> +    (!((ics)->irqs[(srcno)].flags & (XICS_FLAGS_IRQ_MASK)))
>>> +
>>> +static int ics_find_free_block(ICSState *ics, int num, int alignnum)
>>> +{
>>> +    int first, i;
>>> +
>>> +    for (first = 0; first < ics->nr_irqs; first += alignnum) {
>>> +        if (num > (ics->nr_irqs - first)) {
>>> +            return -1;
>>> +        }
>>> +        for (i = first; i < first + num; ++i) {
>>> +            if (!ICS_IRQ_FREE(ics, i)) {
>>> +                break;
>>> +            }
>>> +        }
>>> +        if (i == (first + num)) {
>>> +            return first;
>>> +        }
>>> +    }
>>> +
>>> +    return -1;
>>> +}
>>> +
>>> +int xics_alloc(XICSState *icp, int src, int irq_hint, bool lsi)
>>> +{
>>> +    ICSState *ics = &icp->ics[src];
>>> +    int irq;
>>> +
>>> +    if (irq_hint) {
>>> +        assert(src == xics_find_source(icp, irq_hint));
>> Could this ever get triggered by a guest? Why don't we just fail as if the
>> IRQ wasn't available?
> The @irq_hist is only used when the user specified IRQ in the command line,
> this is why it is assert. And the guest never gets to choose IRQ number, it
> receives numbers either via device tree (and QEMU puts those numbers there,
> not SLOF) or ibm,change-msi (QEMU returns numbers).

Ok, it's safe then :).


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  8:40   ` Alexander Graf
@ 2014-05-21  8:52     ` Alexey Kardashevskiy
  2014-05-21  9:06       ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21  8:52 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/21/2014 06:40 PM, Alexander Graf wrote:
> 
> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>> XICS used to be unable to reuse interrupts which becomes a problem for
>> dynamic MSI reconfiguration which is happening on guest driver reload or
>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
>> for that.
>>
>> This makes use of new XICS ability to reuse interrupts.
>>
>> This removes cached MSI configuration from SPAPR PHB so the first IRQ number
>> of a device is stored in MSI/MSIX config space so there is no need to store
>> this anywhere else. From now on, SPAPR PHB only keeps flags telling what
>> type
>> of interrupt for which device it has configured in order to return error if
>> (for example) MSIX was enabled and the guest is trying to disable MSI which
>> it has not enabled.
>>
>> This removes a limit for the maximum number of MSIX-enabled devices per PHB,
>> now XICS and PCI bus capacity are the only limitation.
>>
>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>> which was
>> wrong since the beginning.
>>
>> This fixed traces to be more informative.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> In reality either MSIX or MSI is enabled, never both. So I could remove
>> msi/msix
>> bitmaps from this patch, would it make sense?
> 
> Is this a hard requirement? Does a device have to choose between MSIX and
> MSI or could it theoretically have both enabled? Is this a PCI limitation,
> a PAPR/XICS limitation or just a limitation of your implementation?


My implementation does not have this limitation, I asked if I can simplify
code by introducing one :)

I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
it does not seem to be used by anyone => cannot debug and confirm.

PAPR spec assumes that if the guest tries enabling MSIX when MSI is already
enabled, this is a "change", not enabling both types. But it also says MSI
and MSIX vector numbers are not shared. Hm.


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  8:52     ` Alexey Kardashevskiy
@ 2014-05-21  9:06       ` Alexander Graf
  2014-05-21  9:11         ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-21  9:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc, Michael S. Tsirkin


On 21.05.14 10:52, Alexey Kardashevskiy wrote:
> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>>> XICS used to be unable to reuse interrupts which becomes a problem for
>>> dynamic MSI reconfiguration which is happening on guest driver reload or
>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
>>> for that.
>>>
>>> This makes use of new XICS ability to reuse interrupts.
>>>
>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ number
>>> of a device is stored in MSI/MSIX config space so there is no need to store
>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling what
>>> type
>>> of interrupt for which device it has configured in order to return error if
>>> (for example) MSIX was enabled and the guest is trying to disable MSI which
>>> it has not enabled.
>>>
>>> This removes a limit for the maximum number of MSIX-enabled devices per PHB,
>>> now XICS and PCI bus capacity are the only limitation.
>>>
>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>>> which was
>>> wrong since the beginning.
>>>
>>> This fixed traces to be more informative.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>
>>> In reality either MSIX or MSI is enabled, never both. So I could remove
>>> msi/msix
>>> bitmaps from this patch, would it make sense?
>> Is this a hard requirement? Does a device have to choose between MSIX and
>> MSI or could it theoretically have both enabled? Is this a PCI limitation,
>> a PAPR/XICS limitation or just a limitation of your implementation?
>
> My implementation does not have this limitation, I asked if I can simplify
> code by introducing one :)
>
> I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
> it does not seem to be used by anyone => cannot debug and confirm.
>
> PAPR spec assumes that if the guest tries enabling MSIX when MSI is already
> enabled, this is a "change", not enabling both types. But it also says MSI
> and MSIX vector numbers are not shared. Hm.

Yeah, I'm not aware of any limitation on hardware here and I'd rather 
not impose one.

Michael, do you know of any hardware that uses MSI *and* MSI-X at the 
same time?


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  9:06       ` Alexander Graf
@ 2014-05-21  9:11         ` Michael S. Tsirkin
  2014-05-21  9:13           ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2014-05-21  9:11 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Alexey Kardashevskiy, qemu-ppc, qemu-devel

On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
> 
> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
> >On 05/21/2014 06:40 PM, Alexander Graf wrote:
> >>On 15.05.14 11:59, Alexey Kardashevskiy wrote:
> >>>Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
> >>>XICS used to be unable to reuse interrupts which becomes a problem for
> >>>dynamic MSI reconfiguration which is happening on guest driver reload or
> >>>PCI hot (un)plug. Another problem is that PHB has a limit of devices
> >>>supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
> >>>for that.
> >>>
> >>>This makes use of new XICS ability to reuse interrupts.
> >>>
> >>>This removes cached MSI configuration from SPAPR PHB so the first IRQ number
> >>>of a device is stored in MSI/MSIX config space so there is no need to store
> >>>this anywhere else. From now on, SPAPR PHB only keeps flags telling what
> >>>type
> >>>of interrupt for which device it has configured in order to return error if
> >>>(for example) MSIX was enabled and the guest is trying to disable MSI which
> >>>it has not enabled.
> >>>
> >>>This removes a limit for the maximum number of MSIX-enabled devices per PHB,
> >>>now XICS and PCI bus capacity are the only limitation.
> >>>
> >>>This changes migration stream as it fixes vmstate_spapr_pci_msi::name
> >>>which was
> >>>wrong since the beginning.
> >>>
> >>>This fixed traces to be more informative.
> >>>
> >>>Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>---
> >>>
> >>>In reality either MSIX or MSI is enabled, never both. So I could remove
> >>>msi/msix
> >>>bitmaps from this patch, would it make sense?
> >>Is this a hard requirement? Does a device have to choose between MSIX and
> >>MSI or could it theoretically have both enabled? Is this a PCI limitation,
> >>a PAPR/XICS limitation or just a limitation of your implementation?
> >
> >My implementation does not have this limitation, I asked if I can simplify
> >code by introducing one :)
> >
> >I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
> >it does not seem to be used by anyone => cannot debug and confirm.
> >
> >PAPR spec assumes that if the guest tries enabling MSIX when MSI is already
> >enabled, this is a "change", not enabling both types. But it also says MSI
> >and MSIX vector numbers are not shared. Hm.
> 
> Yeah, I'm not aware of any limitation on hardware here and I'd
> rather not impose one.
> 
> Michael, do you know of any hardware that uses MSI *and* MSI-X at
> the same time?
> 
> 
> Alex

No, and the PCI spec says:
	A function is permitted to implement both MSI and MSI-X, but system
	software is
	prohibited from enabling both at the same time. If system software
	enables both at the same time, the result is undefined.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  9:11         ` Michael S. Tsirkin
@ 2014-05-21  9:13           ` Alexander Graf
  2014-05-21  9:33             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-21  9:13 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Alexey Kardashevskiy, qemu-ppc, qemu-devel


On 21.05.14 11:11, Michael S. Tsirkin wrote:
> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>>>>> XICS used to be unable to reuse interrupts which becomes a problem for
>>>>> dynamic MSI reconfiguration which is happening on guest driver reload or
>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
>>>>> for that.
>>>>>
>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>
>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ number
>>>>> of a device is stored in MSI/MSIX config space so there is no need to store
>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling what
>>>>> type
>>>>> of interrupt for which device it has configured in order to return error if
>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI which
>>>>> it has not enabled.
>>>>>
>>>>> This removes a limit for the maximum number of MSIX-enabled devices per PHB,
>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>
>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>>>>> which was
>>>>> wrong since the beginning.
>>>>>
>>>>> This fixed traces to be more informative.
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> ---
>>>>>
>>>>> In reality either MSIX or MSI is enabled, never both. So I could remove
>>>>> msi/msix
>>>>> bitmaps from this patch, would it make sense?
>>>> Is this a hard requirement? Does a device have to choose between MSIX and
>>>> MSI or could it theoretically have both enabled? Is this a PCI limitation,
>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>> My implementation does not have this limitation, I asked if I can simplify
>>> code by introducing one :)
>>>
>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>
>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is already
>>> enabled, this is a "change", not enabling both types. But it also says MSI
>>> and MSIX vector numbers are not shared. Hm.
>> Yeah, I'm not aware of any limitation on hardware here and I'd
>> rather not impose one.
>>
>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>> the same time?
>>
>>
>> Alex
> No, and the PCI spec says:
> 	A function is permitted to implement both MSI and MSI-X, but system
> 	software is
> 	prohibited from enabling both at the same time. If system software
> 	enables both at the same time, the result is undefined.

Ah, cool. So yes Alexey, feel free to impose it :).


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  9:13           ` Alexander Graf
@ 2014-05-21  9:33             ` Alexey Kardashevskiy
  2014-05-21  9:38               ` Michael S. Tsirkin
  2014-05-21  9:50               ` Alexander Graf
  0 siblings, 2 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21  9:33 UTC (permalink / raw)
  To: Alexander Graf, Michael S. Tsirkin; +Cc: qemu-ppc, qemu-devel

On 05/21/2014 07:13 PM, Alexander Graf wrote:
> 
> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>>>>>> XICS used to be unable to reuse interrupts which becomes a problem for
>>>>>> dynamic MSI reconfiguration which is happening on guest driver reload or
>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
>>>>>> for that.
>>>>>>
>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>
>>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ
>>>>>> number
>>>>>> of a device is stored in MSI/MSIX config space so there is no need to
>>>>>> store
>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling what
>>>>>> type
>>>>>> of interrupt for which device it has configured in order to return
>>>>>> error if
>>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI
>>>>>> which
>>>>>> it has not enabled.
>>>>>>
>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>> per PHB,
>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>
>>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>>>>>> which was
>>>>>> wrong since the beginning.
>>>>>>
>>>>>> This fixed traces to be more informative.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>
>>>>>> In reality either MSIX or MSI is enabled, never both. So I could remove
>>>>>> msi/msix
>>>>>> bitmaps from this patch, would it make sense?
>>>>> Is this a hard requirement? Does a device have to choose between MSIX and
>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>> limitation,
>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>> My implementation does not have this limitation, I asked if I can simplify
>>>> code by introducing one :)
>>>>
>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>
>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>> already
>>>> enabled, this is a "change", not enabling both types. But it also says MSI
>>>> and MSIX vector numbers are not shared. Hm.
>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>> rather not impose one.
>>>
>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>> the same time?
>>>
>>>
>>> Alex
>> No, and the PCI spec says:
>>     A function is permitted to implement both MSI and MSI-X, but system
>>     software is
>>     prohibited from enabling both at the same time. If system software
>>     enables both at the same time, the result is undefined.
> 
> Ah, cool. So yes Alexey, feel free to impose it :).

Heh. This solves just half of the problem - I still have to keep track of
what device got MSI/MSIX configured via that ibm,change-msi interface. I
was hoping I can store such flag somewhere in a device PCI config space but
MSI/MSIX enable bit is not good as it is not set when those calls are made.
And I cannot rely on address/data fields much as the guest can change them
(I already use them to store IRQ numbers and btw it is missing checks when
I read them back for disposal, I'll fix in next round).

Or on "enable" event I could put IRQ numbers to .data of MSI config space
and on "disable" check if it is not zero, then configuration took place,
then I can remove my msi[]/msix[] flag arrays. If the guest did any change
to MSI/MSIX config space (it does not on SPAPR except weird selftest
cases), I compare .data with what ICS can possibly have and either reject
"disable" or handle it and if it breaks XICS - that's too bad for the
stupid guest. Would that be acceptable?


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  9:33             ` Alexey Kardashevskiy
@ 2014-05-21  9:38               ` Michael S. Tsirkin
  2014-05-21 10:03                 ` Alexey Kardashevskiy
  2014-05-21  9:50               ` Alexander Graf
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2014-05-21  9:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, Alexander Graf, qemu-devel

On Wed, May 21, 2014 at 07:33:36PM +1000, Alexey Kardashevskiy wrote:
> On 05/21/2014 07:13 PM, Alexander Graf wrote:
> > 
> > On 21.05.14 11:11, Michael S. Tsirkin wrote:
> >> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
> >>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
> >>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
> >>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
> >>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
> >>>>>> XICS used to be unable to reuse interrupts which becomes a problem for
> >>>>>> dynamic MSI reconfiguration which is happening on guest driver reload or
> >>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
> >>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
> >>>>>> for that.
> >>>>>>
> >>>>>> This makes use of new XICS ability to reuse interrupts.
> >>>>>>
> >>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ
> >>>>>> number
> >>>>>> of a device is stored in MSI/MSIX config space so there is no need to
> >>>>>> store
> >>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling what
> >>>>>> type
> >>>>>> of interrupt for which device it has configured in order to return
> >>>>>> error if
> >>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI
> >>>>>> which
> >>>>>> it has not enabled.
> >>>>>>
> >>>>>> This removes a limit for the maximum number of MSIX-enabled devices
> >>>>>> per PHB,
> >>>>>> now XICS and PCI bus capacity are the only limitation.
> >>>>>>
> >>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
> >>>>>> which was
> >>>>>> wrong since the beginning.
> >>>>>>
> >>>>>> This fixed traces to be more informative.
> >>>>>>
> >>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>>> ---
> >>>>>>
> >>>>>> In reality either MSIX or MSI is enabled, never both. So I could remove
> >>>>>> msi/msix
> >>>>>> bitmaps from this patch, would it make sense?
> >>>>> Is this a hard requirement? Does a device have to choose between MSIX and
> >>>>> MSI or could it theoretically have both enabled? Is this a PCI
> >>>>> limitation,
> >>>>> a PAPR/XICS limitation or just a limitation of your implementation?
> >>>> My implementation does not have this limitation, I asked if I can simplify
> >>>> code by introducing one :)
> >>>>
> >>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
> >>>> it does not seem to be used by anyone => cannot debug and confirm.
> >>>>
> >>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
> >>>> already
> >>>> enabled, this is a "change", not enabling both types. But it also says MSI
> >>>> and MSIX vector numbers are not shared. Hm.
> >>> Yeah, I'm not aware of any limitation on hardware here and I'd
> >>> rather not impose one.
> >>>
> >>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
> >>> the same time?
> >>>
> >>>
> >>> Alex
> >> No, and the PCI spec says:
> >>     A function is permitted to implement both MSI and MSI-X, but system
> >>     software is
> >>     prohibited from enabling both at the same time. If system software
> >>     enables both at the same time, the result is undefined.
> > 
> > Ah, cool. So yes Alexey, feel free to impose it :).
> 
> Heh. This solves just half of the problem - I still have to keep track of
> what device got MSI/MSIX configured via that ibm,change-msi interface. I
> was hoping I can store such flag somewhere in a device PCI config space but
> MSI/MSIX enable bit is not good as it is not set when those calls are made.

Hmm could you pls remind me why is it desirable to store this
in device? Device is not yet sending MSI interrupts after all
otherwise enable would be set.

> And I cannot rely on address/data fields much as the guest can change them
> (I already use them to store IRQ numbers and btw it is missing checks when
> I read them back for disposal, I'll fix in next round).
> 
> Or on "enable" event I could put IRQ numbers to .data of MSI config space
> and on "disable" check if it is not zero, then configuration took place,
> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
> to MSI/MSIX config space (it does not on SPAPR except weird selftest
> cases), I compare .data with what ICS can possibly have and either reject
> "disable" or handle it and if it breaks XICS - that's too bad for the
> stupid guest. Would that be acceptable?


> 
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  9:33             ` Alexey Kardashevskiy
  2014-05-21  9:38               ` Michael S. Tsirkin
@ 2014-05-21  9:50               ` Alexander Graf
  2014-05-21 10:13                 ` Alexey Kardashevskiy
  1 sibling, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-21  9:50 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael S. Tsirkin; +Cc: qemu-ppc, qemu-devel


On 21.05.14 11:33, Alexey Kardashevskiy wrote:
> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>>>>>>> XICS used to be unable to reuse interrupts which becomes a problem for
>>>>>>> dynamic MSI reconfiguration which is happening on guest driver reload or
>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
>>>>>>> for that.
>>>>>>>
>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>
>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ
>>>>>>> number
>>>>>>> of a device is stored in MSI/MSIX config space so there is no need to
>>>>>>> store
>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling what
>>>>>>> type
>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>> error if
>>>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI
>>>>>>> which
>>>>>>> it has not enabled.
>>>>>>>
>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>> per PHB,
>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>
>>>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>>>>>>> which was
>>>>>>> wrong since the beginning.
>>>>>>>
>>>>>>> This fixed traces to be more informative.
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>> ---
>>>>>>>
>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could remove
>>>>>>> msi/msix
>>>>>>> bitmaps from this patch, would it make sense?
>>>>>> Is this a hard requirement? Does a device have to choose between MSIX and
>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>> limitation,
>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>> My implementation does not have this limitation, I asked if I can simplify
>>>>> code by introducing one :)
>>>>>
>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>
>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>> already
>>>>> enabled, this is a "change", not enabling both types. But it also says MSI
>>>>> and MSIX vector numbers are not shared. Hm.
>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>> rather not impose one.
>>>>
>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>> the same time?
>>>>
>>>>
>>>> Alex
>>> No, and the PCI spec says:
>>>      A function is permitted to implement both MSI and MSI-X, but system
>>>      software is
>>>      prohibited from enabling both at the same time. If system software
>>>      enables both at the same time, the result is undefined.
>> Ah, cool. So yes Alexey, feel free to impose it :).
> Heh. This solves just half of the problem - I still have to keep track of
> what device got MSI/MSIX configured via that ibm,change-msi interface. I
> was hoping I can store such flag somewhere in a device PCI config space but
> MSI/MSIX enable bit is not good as it is not set when those calls are made.
> And I cannot rely on address/data fields much as the guest can change them
> (I already use them to store IRQ numbers and btw it is missing checks when
> I read them back for disposal, I'll fix in next round).
>
> Or on "enable" event I could put IRQ numbers to .data of MSI config space
> and on "disable" check if it is not zero, then configuration took place,
> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
> to MSI/MSIX config space (it does not on SPAPR except weird selftest
> cases), I compare .data with what ICS can possibly have and either reject
> "disable" or handle it and if it breaks XICS - that's too bad for the
> stupid guest. Would that be acceptable?

Can't you prohibit the guest from writing to the MSI configuration 
registers itself? Then you don't need to do sanity checks.


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  9:38               ` Michael S. Tsirkin
@ 2014-05-21 10:03                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 10:03 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: qemu-ppc, Alexander Graf, qemu-devel

On 05/21/2014 07:38 PM, Michael S. Tsirkin wrote:
> On Wed, May 21, 2014 at 07:33:36PM +1000, Alexey Kardashevskiy wrote:
>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>
>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>>>>>>>> XICS used to be unable to reuse interrupts which becomes a problem for
>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver reload or
>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
>>>>>>>> for that.
>>>>>>>>
>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>
>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ
>>>>>>>> number
>>>>>>>> of a device is stored in MSI/MSIX config space so there is no need to
>>>>>>>> store
>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling what
>>>>>>>> type
>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>> error if
>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI
>>>>>>>> which
>>>>>>>> it has not enabled.
>>>>>>>>
>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>> per PHB,
>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>
>>>>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>>>>>>>> which was
>>>>>>>> wrong since the beginning.
>>>>>>>>
>>>>>>>> This fixed traces to be more informative.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could remove
>>>>>>>> msi/msix
>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>> Is this a hard requirement? Does a device have to choose between MSIX and
>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>> limitation,
>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>> My implementation does not have this limitation, I asked if I can simplify
>>>>>> code by introducing one :)
>>>>>>
>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled but
>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>
>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>> already
>>>>>> enabled, this is a "change", not enabling both types. But it also says MSI
>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>> rather not impose one.
>>>>>
>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>> the same time?
>>>>>
>>>>>
>>>>> Alex
>>>> No, and the PCI spec says:
>>>>     A function is permitted to implement both MSI and MSI-X, but system
>>>>     software is
>>>>     prohibited from enabling both at the same time. If system software
>>>>     enables both at the same time, the result is undefined.
>>>
>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>
>> Heh. This solves just half of the problem - I still have to keep track of
>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>> was hoping I can store such flag somewhere in a device PCI config space but
>> MSI/MSIX enable bit is not good as it is not set when those calls are made.
> 
> Hmm could you pls remind me why is it desirable to store this
> in device?

I need this flag to know if I can process "disable" or return an error. So
I need to save it somewhere. And there can be up to 256 buses * 32 dev * 8
functions = 65536 flags which is 8KB. And only a small portion of it will
ever be used for obvious reasons. Having 1 bit anywhere in config space or
QEMU's PCIDevice would help here...

At the moment I keep an array in SPAPR's PHB, it is 32 entries long so
SPAPR PHB can have only 32 MSI-enabled devices and our testers think this
is not enough :)


> Device is not yet sending MSI interrupts after all
> otherwise enable would be set.

That is correct.


> 
>> And I cannot rely on address/data fields much as the guest can change them
>> (I already use them to store IRQ numbers and btw it is missing checks when
>> I read them back for disposal, I'll fix in next round).
>>
>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>> and on "disable" check if it is not zero, then configuration took place,
>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>> cases), I compare .data with what ICS can possibly have and either reject
>> "disable" or handle it and if it breaks XICS - that's too bad for the
>> stupid guest. Would that be acceptable?
> 
> 
>>
>> -- 
>> Alexey


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21  9:50               ` Alexander Graf
@ 2014-05-21 10:13                 ` Alexey Kardashevskiy
  2014-05-21 10:35                   ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 10:13 UTC (permalink / raw)
  To: Alexander Graf, Michael S. Tsirkin; +Cc: qemu-ppc, qemu-devel

On 05/21/2014 07:50 PM, Alexander Graf wrote:
> 
> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>>>>>>>> XICS used to be unable to reuse interrupts which becomes a problem for
>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>> reload or
>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>> reason
>>>>>>>> for that.
>>>>>>>>
>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>
>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ
>>>>>>>> number
>>>>>>>> of a device is stored in MSI/MSIX config space so there is no need to
>>>>>>>> store
>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>> what
>>>>>>>> type
>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>> error if
>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI
>>>>>>>> which
>>>>>>>> it has not enabled.
>>>>>>>>
>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>> per PHB,
>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>
>>>>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>>>>>>>> which was
>>>>>>>> wrong since the beginning.
>>>>>>>>
>>>>>>>> This fixed traces to be more informative.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>> remove
>>>>>>>> msi/msix
>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>> MSIX and
>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>> limitation,
>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>> simplify
>>>>>> code by introducing one :)
>>>>>>
>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>> but
>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>
>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>> already
>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>> says MSI
>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>> rather not impose one.
>>>>>
>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>> the same time?
>>>>>
>>>>>
>>>>> Alex
>>>> No, and the PCI spec says:
>>>>      A function is permitted to implement both MSI and MSI-X, but system
>>>>      software is
>>>>      prohibited from enabling both at the same time. If system software
>>>>      enables both at the same time, the result is undefined.
>>> Ah, cool. So yes Alexey, feel free to impose it :).
>> Heh. This solves just half of the problem - I still have to keep track of
>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>> was hoping I can store such flag somewhere in a device PCI config space but
>> MSI/MSIX enable bit is not good as it is not set when those calls are made.
>> And I cannot rely on address/data fields much as the guest can change them
>> (I already use them to store IRQ numbers and btw it is missing checks when
>> I read them back for disposal, I'll fix in next round).
>>
>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>> and on "disable" check if it is not zero, then configuration took place,
>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>> cases), I compare .data with what ICS can possibly have and either reject
>> "disable" or handle it and if it breaks XICS - that's too bad for the
>> stupid guest. Would that be acceptable?
> 
> Can't you prohibit the guest from writing to the MSI configuration
> registers itself? Then you don't need to do sanity checks.


I could for emulated devices but VFIO uses the same code. For example,
there is an IBM SCSI IPR card which does a "self test". For that, it saves
MSIX BAR content, does reboot via some backdoor interface and restores MSIX
BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
from cached values when guest is trying to restore it with what it thinks
is actual MSIX data (it is virtualized because of x86). But there is cache
in the host kernel and I am trying to avoid cache here (or put it into the
device).


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21 10:13                 ` Alexey Kardashevskiy
@ 2014-05-21 10:35                   ` Alexander Graf
  2014-05-21 12:42                     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-21 10:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Michael S. Tsirkin; +Cc: qemu-ppc, qemu-devel


On 21.05.14 12:13, Alexey Kardashevskiy wrote:
> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a problem for
>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>> reload or
>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>>> reason
>>>>>>>>> for that.
>>>>>>>>>
>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>
>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first IRQ
>>>>>>>>> number
>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no need to
>>>>>>>>> store
>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>>> what
>>>>>>>>> type
>>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>>> error if
>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable MSI
>>>>>>>>> which
>>>>>>>>> it has not enabled.
>>>>>>>>>
>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>>> per PHB,
>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>
>>>>>>>>> This changes migration stream as it fixes vmstate_spapr_pci_msi::name
>>>>>>>>> which was
>>>>>>>>> wrong since the beginning.
>>>>>>>>>
>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>> ---
>>>>>>>>>
>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>>> remove
>>>>>>>>> msi/msix
>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>> MSIX and
>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>> limitation,
>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>> simplify
>>>>>>> code by introducing one :)
>>>>>>>
>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>>> but
>>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>>
>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>>> already
>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>> says MSI
>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>> rather not impose one.
>>>>>>
>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>> the same time?
>>>>>>
>>>>>>
>>>>>> Alex
>>>>> No, and the PCI spec says:
>>>>>       A function is permitted to implement both MSI and MSI-X, but system
>>>>>       software is
>>>>>       prohibited from enabling both at the same time. If system software
>>>>>       enables both at the same time, the result is undefined.
>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>> Heh. This solves just half of the problem - I still have to keep track of
>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>>> was hoping I can store such flag somewhere in a device PCI config space but
>>> MSI/MSIX enable bit is not good as it is not set when those calls are made.
>>> And I cannot rely on address/data fields much as the guest can change them
>>> (I already use them to store IRQ numbers and btw it is missing checks when
>>> I read them back for disposal, I'll fix in next round).
>>>
>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>>> and on "disable" check if it is not zero, then configuration took place,
>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>> cases), I compare .data with what ICS can possibly have and either reject
>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>> stupid guest. Would that be acceptable?
>> Can't you prohibit the guest from writing to the MSI configuration
>> registers itself? Then you don't need to do sanity checks.
>
> I could for emulated devices but VFIO uses the same code. For example,
> there is an IBM SCSI IPR card which does a "self test". For that, it saves
> MSIX BAR content, does reboot via some backdoor interface and restores MSIX
> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
> from cached values when guest is trying to restore it with what it thinks
> is actual MSIX data (it is virtualized because of x86). But there is cache

We already have a cache because we don't access the real PCI registers 
with msi_set_message(), no?


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21 10:35                   ` Alexander Graf
@ 2014-05-21 12:42                     ` Alexey Kardashevskiy
  2014-05-22  6:53                       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-21 12:42 UTC (permalink / raw)
  To: Alexander Graf, Michael S. Tsirkin; +Cc: qemu-ppc, qemu-devel

On 05/21/2014 08:35 PM, Alexander Graf wrote:
> 
> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>> interrupt as
>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>> problem for
>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>> reload or
>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>>>> reason
>>>>>>>>>> for that.
>>>>>>>>>>
>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>
>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first
>>>>>>>>>> IRQ
>>>>>>>>>> number
>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>> need to
>>>>>>>>>> store
>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>>>> what
>>>>>>>>>> type
>>>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>>>> error if
>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable
>>>>>>>>>> MSI
>>>>>>>>>> which
>>>>>>>>>> it has not enabled.
>>>>>>>>>>
>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>>>> per PHB,
>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>
>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>> which was
>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>
>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>>>> remove
>>>>>>>>>> msi/msix
>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>> MSIX and
>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>> limitation,
>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>> simplify
>>>>>>>> code by introducing one :)
>>>>>>>>
>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>>>> but
>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>>>
>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>>>> already
>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>> says MSI
>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>> rather not impose one.
>>>>>>>
>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>> the same time?
>>>>>>>
>>>>>>>
>>>>>>> Alex
>>>>>> No, and the PCI spec says:
>>>>>>       A function is permitted to implement both MSI and MSI-X, but
>>>>>> system
>>>>>>       software is
>>>>>>       prohibited from enabling both at the same time. If system software
>>>>>>       enables both at the same time, the result is undefined.
>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>> Heh. This solves just half of the problem - I still have to keep track of
>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>>>> was hoping I can store such flag somewhere in a device PCI config space
>>>> but
>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>> made.
>>>> And I cannot rely on address/data fields much as the guest can change them
>>>> (I already use them to store IRQ numbers and btw it is missing checks when
>>>> I read them back for disposal, I'll fix in next round).
>>>>
>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>>>> and on "disable" check if it is not zero, then configuration took place,
>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>> cases), I compare .data with what ICS can possibly have and either reject
>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>> stupid guest. Would that be acceptable?
>>> Can't you prohibit the guest from writing to the MSI configuration
>>> registers itself? Then you don't need to do sanity checks.
>>
>> I could for emulated devices but VFIO uses the same code. For example,
>> there is an IBM SCSI IPR card which does a "self test". For that, it saves
>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX
>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
>> from cached values when guest is trying to restore it with what it thinks
>> is actual MSIX data (it is virtualized because of x86). But there is cache
> 
> We already have a cache because we don't access the real PCI registers with
> msi_set_message(), no?


For emulated devices there is no cache. And in any case the guest is
allowed to write to it... Who knows what AIX does? I do not.


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-21 12:42                     ` Alexey Kardashevskiy
@ 2014-05-22  6:53                       ` Alexey Kardashevskiy
  2014-05-22  7:16                         ` Alexander Graf
  0 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22  6:53 UTC (permalink / raw)
  To: Alexander Graf, Michael S. Tsirkin; +Cc: qemu-ppc, qemu-devel

On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>
>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>> interrupt as
>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>> problem for
>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>> reload or
>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>>>>> reason
>>>>>>>>>>> for that.
>>>>>>>>>>>
>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>
>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first
>>>>>>>>>>> IRQ
>>>>>>>>>>> number
>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>> need to
>>>>>>>>>>> store
>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>>>>> what
>>>>>>>>>>> type
>>>>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>>>>> error if
>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable
>>>>>>>>>>> MSI
>>>>>>>>>>> which
>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>
>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>>>>> per PHB,
>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>
>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>> which was
>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>
>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>> ---
>>>>>>>>>>>
>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>>>>> remove
>>>>>>>>>>> msi/msix
>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>> MSIX and
>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>> limitation,
>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>> simplify
>>>>>>>>> code by introducing one :)
>>>>>>>>>
>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>>>>> but
>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>>>>
>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>>>>> already
>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>> says MSI
>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>> rather not impose one.
>>>>>>>>
>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>> the same time?
>>>>>>>>
>>>>>>>>
>>>>>>>> Alex
>>>>>>> No, and the PCI spec says:
>>>>>>>       A function is permitted to implement both MSI and MSI-X, but
>>>>>>> system
>>>>>>>       software is
>>>>>>>       prohibited from enabling both at the same time. If system software
>>>>>>>       enables both at the same time, the result is undefined.
>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>> Heh. This solves just half of the problem - I still have to keep track of
>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>>>>> was hoping I can store such flag somewhere in a device PCI config space
>>>>> but
>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>> made.
>>>>> And I cannot rely on address/data fields much as the guest can change them
>>>>> (I already use them to store IRQ numbers and btw it is missing checks when
>>>>> I read them back for disposal, I'll fix in next round).
>>>>>
>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>>>>> and on "disable" check if it is not zero, then configuration took place,
>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>> cases), I compare .data with what ICS can possibly have and either reject
>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>> stupid guest. Would that be acceptable?
>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>> registers itself? Then you don't need to do sanity checks.
>>>
>>> I could for emulated devices but VFIO uses the same code. For example,
>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves
>>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX
>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
>>> from cached values when guest is trying to restore it with what it thinks
>>> is actual MSIX data (it is virtualized because of x86). But there is cache
>>
>> We already have a cache because we don't access the real PCI registers with
>> msi_set_message(), no?
> 
> 
> For emulated devices there is no cache. And in any case the guest is
> allowed to write to it... Who knows what AIX does? I do not.


Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but
how to migrate such thing? Temporary cache somewhere and then unpack it? Or
use old style migration callbacks?



-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-22  6:53                       ` Alexey Kardashevskiy
@ 2014-05-22  7:16                         ` Alexander Graf
  2014-05-22 10:53                           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-22  7:16 UTC (permalink / raw)
  To: Alexey Kardashevskiy; +Cc: qemu-ppc, qemu-devel, Michael S. Tsirkin



> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
> 
>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>> 
>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>> interrupt as
>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>> problem for
>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>> reload or
>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>>>>>> reason
>>>>>>>>>>>> for that.
>>>>>>>>>>>> 
>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>> 
>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first
>>>>>>>>>>>> IRQ
>>>>>>>>>>>> number
>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>> need to
>>>>>>>>>>>> store
>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>>>>>> what
>>>>>>>>>>>> type
>>>>>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>>>>>> error if
>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable
>>>>>>>>>>>> MSI
>>>>>>>>>>>> which
>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>> 
>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>>>>>> per PHB,
>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>> 
>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>> which was
>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>> 
>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>> 
>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>> ---
>>>>>>>>>>>> 
>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>>>>>> remove
>>>>>>>>>>>> msi/msix
>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>> MSIX and
>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>> limitation,
>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>> simplify
>>>>>>>>>> code by introducing one :)
>>>>>>>>>> 
>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>>>>>> but
>>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>>>>> 
>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>>>>>> already
>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>> says MSI
>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>> rather not impose one.
>>>>>>>>> 
>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>> the same time?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Alex
>>>>>>>> No, and the PCI spec says:
>>>>>>>>      A function is permitted to implement both MSI and MSI-X, but
>>>>>>>> system
>>>>>>>>      software is
>>>>>>>>      prohibited from enabling both at the same time. If system software
>>>>>>>>      enables both at the same time, the result is undefined.
>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>> Heh. This solves just half of the problem - I still have to keep track of
>>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>>>>>> was hoping I can store such flag somewhere in a device PCI config space
>>>>>> but
>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>> made.
>>>>>> And I cannot rely on address/data fields much as the guest can change them
>>>>>> (I already use them to store IRQ numbers and btw it is missing checks when
>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>> 
>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>>>>>> and on "disable" check if it is not zero, then configuration took place,
>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>> cases), I compare .data with what ICS can possibly have and either reject
>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>> stupid guest. Would that be acceptable?
>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>> registers itself? Then you don't need to do sanity checks.
>>>> 
>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves
>>>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX
>>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
>>>> from cached values when guest is trying to restore it with what it thinks
>>>> is actual MSIX data (it is virtualized because of x86). But there is cache
>>> 
>>> We already have a cache because we don't access the real PCI registers with
>>> msi_set_message(), no?
>> 
>> 
>> For emulated devices there is no cache. And in any case the guest is
>> allowed to write to it... Who knows what AIX does? I do not.
> 
> 
> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but
> how to migrate such thing? Temporary cache somewhere and then unpack it? Or
> use old style migration callbacks?

Could you try to introduce a new vmstate type that just serializes and deserializes hash tables? Maybe there is already a serialization function for it in glib?


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-22  7:16                         ` Alexander Graf
@ 2014-05-22 10:53                           ` Alexey Kardashevskiy
  2014-05-22 10:53                             ` [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration Alexey Kardashevskiy
                                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 10:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf


On 05/22/2014 05:16 PM, Alexander Graf wrote:> 
> 
>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
>>
>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>>>
>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>>> interrupt as
>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>>> problem for
>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>>> reload or
>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>>>>>>> reason
>>>>>>>>>>>>> for that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first
>>>>>>>>>>>>> IRQ
>>>>>>>>>>>>> number
>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>>> need to
>>>>>>>>>>>>> store
>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>>>>>>> what
>>>>>>>>>>>>> type
>>>>>>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>>>>>>> error if
>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable
>>>>>>>>>>>>> MSI
>>>>>>>>>>>>> which
>>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>>>>>>> per PHB,
>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>>> which was
>>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>
>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>>>>>>> remove
>>>>>>>>>>>>> msi/msix
>>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>>> MSIX and
>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>>> limitation,
>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>>> simplify
>>>>>>>>>>> code by introducing one :)
>>>>>>>>>>>
>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>>>>>>> but
>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>>>>>>
>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>>>>>>> already
>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>>> says MSI
>>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>>> rather not impose one.
>>>>>>>>>>
>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>>> the same time?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>> No, and the PCI spec says:
>>>>>>>>>      A function is permitted to implement both MSI and MSI-X, but
>>>>>>>>> system
>>>>>>>>>      software is
>>>>>>>>>      prohibited from enabling both at the same time. If system software
>>>>>>>>>      enables both at the same time, the result is undefined.
>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>>> Heh. This solves just half of the problem - I still have to keep track of
>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>>>>>>> was hoping I can store such flag somewhere in a device PCI config space
>>>>>>> but
>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>>> made.
>>>>>>> And I cannot rely on address/data fields much as the guest can change them
>>>>>>> (I already use them to store IRQ numbers and btw it is missing checks when
>>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>>>
>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>>>>>>> and on "disable" check if it is not zero, then configuration took place,
>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>>> cases), I compare .data with what ICS can possibly have and either reject
>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>>> stupid guest. Would that be acceptable?
>>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>>> registers itself? Then you don't need to do sanity checks.
>>>>>
>>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves
>>>>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX
>>>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
>>>>> from cached values when guest is trying to restore it with what it thinks
>>>>> is actual MSIX data (it is virtualized because of x86). But there is cache
>>>>
>>>> We already have a cache because we don't access the real PCI registers with
>>>> msi_set_message(), no?
>>>
>>>
>>> For emulated devices there is no cache. And in any case the guest is
>>> allowed to write to it... Who knows what AIX does? I do not.
>>
>>
>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but
>> how to migrate such thing? Temporary cache somewhere and then unpack it? Or
>> use old style migration callbacks?
> 

> Could you try to introduce a new vmstate type that just serializes and
> deserializes hash tables? Maybe there is already a serialization
> function for it in glib?

I have not found any (most likely I do not know how to search there),
I added mine, here are VMSTATE_HASH + its use for SPAPR.


Is this a movement to right direction? I need to put key/value sizes
into VMSTATE definition somehow but do not really want to touch
VMStateField.




Alexey Kardashevskiy (2):
  vmstate: Add helper to enable GHashTable migration
  spapr_pci: Use XICS interrupt allocator and do not cache interrupts in
    PHB

 hw/ppc/spapr_pci.c          | 144 +++++++++++++++++---------------------------
 include/hw/pci-host/spapr.h |  13 ++--
 include/migration/vmstate.h |  10 +++
 include/qemu-common.h       |  13 ++++
 trace-events                |   5 +-
 vmstate.c                   |  54 +++++++++++++++++
 6 files changed, 141 insertions(+), 98 deletions(-)

-- 
1.9.rc0

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration
  2014-05-22 10:53                           ` Alexey Kardashevskiy
@ 2014-05-22 10:53                             ` Alexey Kardashevskiy
  2014-05-22 10:57                               ` Alexander Graf
  2014-05-22 10:53                             ` [Qemu-devel] [PATCH 2/2] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
  2014-05-22 10:57                             ` [Qemu-devel] [PATCH v2 8/8] " Alexander Graf
  2 siblings, 1 reply; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 10:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

This adds a VMSTATE_HASH_V macro. This implements put/get callbacks for it.
This implements a qemu_hash_init() wrapper to save key/value sizes.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/migration/vmstate.h | 10 +++++++++
 include/qemu-common.h       | 13 +++++++++++
 vmstate.c                   | 54 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 77 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 7e45048..6af599d 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -166,6 +166,7 @@ extern const VMStateInfo vmstate_info_timer;
 extern const VMStateInfo vmstate_info_buffer;
 extern const VMStateInfo vmstate_info_unused_buffer;
 extern const VMStateInfo vmstate_info_bitmap;
+static const VMStateInfo vmstate_info_hash;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
@@ -740,6 +741,15 @@ extern const VMStateInfo vmstate_info_bitmap;
 #define VMSTATE_BUFFER_UNSAFE(_field, _state, _version, _size)        \
     VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, vmstate_info_buffer, _size)
 
+#define VMSTATE_HASH_V(_field, _state, _version) {                      \
+    .name       = (stringify(_field)),                                  \
+    .version_id = (_version),                                           \
+    .size       = sizeof(qemu_hash),                                    \
+    .info       = &vmstate_info_hash,                                   \
+    .flags      = VMS_SINGLE,                                           \
+    .offset     = vmstate_offset_value(_state, _field, qemu_hash),      \
+}
+
 #define VMSTATE_UNUSED_V(_v, _size)                                   \
     VMSTATE_UNUSED_BUFFER(NULL, _v, _size)
 
diff --git a/include/qemu-common.h b/include/qemu-common.h
index 3f3fd60..ee973e7 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -462,4 +462,17 @@ int parse_debug_env(const char *name, int max, int initial);
 
 const char *qemu_ether_ntoa(const MACAddr *mac);
 
+typedef struct qemu_hash {
+    GHashTable *hash;
+    int key_size;
+    int value_size;
+} qemu_hash;
+
+static inline void qemu_hash_init(qemu_hash *h, int key_size, int value_size)
+{
+    h->key_size = key_size;
+    h->value_size = value_size;
+    h->hash = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
+}
+
 #endif
diff --git a/vmstate.c b/vmstate.c
index b5882fa..2148e73 100644
--- a/vmstate.c
+++ b/vmstate.c
@@ -667,3 +667,57 @@ const VMStateInfo vmstate_info_bitmap = {
     .get = get_bitmap,
     .put = put_bitmap,
 };
+
+/* Save restore for qemu_hash which is a wrapper over GHashTable */
+static int get_hash(QEMUFile *f, void *pv, size_t size)
+{
+    qemu_hash *h = pv;
+    uint32_t num = g_hash_table_size(h->hash);
+    gpointer key, value;
+
+    num = qemu_get_be32(f);
+
+    for ( ; num; --num) {
+        int i;
+
+        key = g_malloc0(h->key_size);
+        for (i = 0; i < h->key_size; ++i) {
+            ((uint8_t *)key)[i] = qemu_get_byte(f);
+        }
+        value = g_malloc0(h->value_size);
+        for (i = 0; i < h->value_size; ++i) {
+            ((uint8_t *)value)[i] = qemu_get_byte(f);
+        }
+        g_hash_table_insert(h->hash, key, value);
+    }
+
+    return 0;
+}
+
+static void put_hash(QEMUFile *f, void *pv, size_t size)
+{
+    qemu_hash *h = pv;
+    uint32_t num = g_hash_table_size(h->hash);
+    GHashTableIter iter;
+    gpointer key, value;
+
+    qemu_put_be32(f, num);
+
+    g_hash_table_iter_init(&iter, h->hash);
+    while (g_hash_table_iter_next (&iter, &key, &value)) {
+        int i;
+
+        for (i = 0; i < h->key_size; ++i) {
+            qemu_put_byte(f, ((uint8_t *)key)[i]);
+        }
+        for (i = 0; i < h->value_size; ++i) {
+            qemu_put_byte(f, ((uint8_t *)value)[i]);
+        }
+    }
+}
+
+static const VMStateInfo vmstate_info_hash = {
+    .name = "qemu-hash-desc",
+    .get  = get_hash,
+    .put  = put_hash,
+};
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [Qemu-devel] [PATCH 2/2] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-22 10:53                           ` Alexey Kardashevskiy
  2014-05-22 10:53                             ` [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration Alexey Kardashevskiy
@ 2014-05-22 10:53                             ` Alexey Kardashevskiy
  2014-05-22 10:57                             ` [Qemu-devel] [PATCH v2 8/8] " Alexander Graf
  2 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 10:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexey Kardashevskiy, qemu-ppc, Alexander Graf

Currently SPAPR PHB keeps track of all allocated MSI/MISX interrupt as
XICS used to be unable to reuse interrupts which becomes a problem for
dynamic MSI reconfiguration which is happening on guest driver reload or
PCI hot (un)plug. Another problem is that PHB has a limit of devices
supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason
for that.

This makes use of new XICS ability to reuse interrupts.

This removes cached MSI configuration from SPAPR PHB so the first IRQ number
of a device is stored in MSI/MSIX config space so there is no need to store
this anywhere else. From now on, SPAPR PHB only keeps flags telling what type
of interrupt for which device it has configured in order to return error if
(for example) MSIX was enabled and the guest is trying to disable MSI which
it has not enabled.

This removes a limit for the maximum number of MSIX-enabled devices per PHB,
now XICS and PCI bus capacity are the only limitation.

This changes migration stream as it fixes vmstate_spapr_pci_msi::name which was
wrong since the beginning.

This fixed traces to be more informative.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_pci.c          | 144 +++++++++++++++++---------------------------
 include/hw/pci-host/spapr.h |  13 ++--
 trace-events                |   5 +-
 3 files changed, 64 insertions(+), 98 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index e574014..6dc0c50 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -220,36 +220,12 @@ static void rtas_write_pci_config(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 }
 
 /*
- * Find an entry with config_addr or returns the empty one if not found AND
- * alloc_new is set.
- * At the moment the msi_table entries are never released so there is
- * no point to look till the end of the list if we need to find the free entry.
- */
-static int spapr_msicfg_find(sPAPRPHBState *phb, uint32_t config_addr,
-                             bool alloc_new)
-{
-    int i;
-
-    for (i = 0; i < SPAPR_MSIX_MAX_DEVS; ++i) {
-        if (!phb->msi_table[i].nvec) {
-            break;
-        }
-        if (phb->msi_table[i].config_addr == config_addr) {
-            return i;
-        }
-    }
-    if ((i < SPAPR_MSIX_MAX_DEVS) && alloc_new) {
-        trace_spapr_pci_msi("Allocating new MSI config", i, config_addr);
-        return i;
-    }
-
-    return -1;
-}
-
-/*
  * Set MSI/MSIX message data.
  * This is required for msi_notify()/msix_notify() which
  * will write at the addresses via spapr_msi_write().
+ *
+ * If hwaddr == 0, all entries will have .data == first_irq i.e.
+ * table will be reset.
  */
 static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, bool msix,
                              unsigned first_irq, unsigned req_num)
@@ -263,9 +239,12 @@ static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr, bool msix,
         return;
     }
 
-    for (i = 0; i < req_num; ++i, ++msg.data) {
+    for (i = 0; i < req_num; ++i) {
         msix_set_message(pdev, i, msg);
         trace_spapr_pci_msi_setup(pdev->name, i, msg.address);
+        if (addr) {
+            ++msg.data;
+        }
     }
 }
 
@@ -280,9 +259,12 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     unsigned int req_num = rtas_ld(args, 4); /* 0 == remove all */
     unsigned int seq_num = rtas_ld(args, 5);
     unsigned int ret_intr_type;
-    int ndev, irq, max_irqs = 0;
+    unsigned int irq, max_irqs = 0, num = 0;
     sPAPRPHBState *phb = NULL;
     PCIDevice *pdev = NULL;
+    bool msix = false;
+    spapr_pci_msi *msi;
+    int *config_addr_key;
 
     switch (func) {
     case RTAS_CHANGE_MSI_FN:
@@ -310,13 +292,18 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
     /* Releasing MSIs */
     if (!req_num) {
-        ndev = spapr_msicfg_find(phb, config_addr, false);
-        if (ndev < 0) {
-            trace_spapr_pci_msi("MSI has not been enabled", -1, config_addr);
+        msi = (spapr_pci_msi *) g_hash_table_lookup(phb->msi.hash, &config_addr);
+        if (!msi) {
+            trace_spapr_pci_msi("Releasing wrong config", config_addr);
             rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
             return;
         }
-        trace_spapr_pci_msi("Released MSIs", ndev, config_addr);
+
+        xics_free(spapr->icp, msi->first_irq, msi->num);
+        spapr_msi_setmsg(pdev, 0, msix, 0, num);
+        g_hash_table_remove(phb->msi.hash, &config_addr);
+
+        trace_spapr_pci_msi("Released MSIs", config_addr);
         rtas_st(rets, 0, RTAS_OUT_SUCCESS);
         rtas_st(rets, 1, 0);
         return;
@@ -324,15 +311,6 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
     /* Enabling MSI */
 
-    /* Find a device number in the map to add or reuse the existing one */
-    ndev = spapr_msicfg_find(phb, config_addr, true);
-    if (ndev >= SPAPR_MSIX_MAX_DEVS || ndev < 0) {
-        error_report("No free entry for a new MSI device");
-        rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
-        return;
-    }
-    trace_spapr_pci_msi("Configuring MSI", ndev, config_addr);
-
     /* Check if the device supports as many IRQs as requested */
     if (ret_intr_type == RTAS_TYPE_MSI) {
         max_irqs = msi_nr_vectors_allocated(pdev);
@@ -340,48 +318,47 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         max_irqs = pdev->msix_entries_nr;
     }
     if (!max_irqs) {
-        error_report("Requested interrupt type %d is not enabled for device#%d",
-                     ret_intr_type, ndev);
+        error_report("Requested interrupt type %d is not enabled for device %x",
+                     ret_intr_type, config_addr);
         rtas_st(rets, 0, -1); /* Hardware error */
         return;
     }
     /* Correct the number if the guest asked for too many */
     if (req_num > max_irqs) {
+        trace_spapr_pci_msi_retry(config_addr, req_num, max_irqs);
         req_num = max_irqs;
+        irq = 0; /* to avoid misleading trace */
+        goto out;
     }
 
-    /* Check if there is an old config and MSI number has not changed */
-    if (phb->msi_table[ndev].nvec && (req_num != phb->msi_table[ndev].nvec)) {
-        /* Unexpected behaviour */
-        error_report("Cannot reuse MSI config for device#%d", ndev);
+    /* Allocate MSIs */
+    irq = xics_alloc_block(spapr->icp, 0, req_num, false,
+                           ret_intr_type == RTAS_TYPE_MSI);
+    if (!irq) {
+        error_report("Cannot allocate MSIs for device %x", config_addr);
         rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
         return;
     }
 
-    /* There is no cached config, allocate MSIs */
-    if (!phb->msi_table[ndev].nvec) {
-        irq = xics_alloc_block(spapr->icp, 0, req_num, false,
-                               ret_intr_type == RTAS_TYPE_MSI);
-        if (irq < 0) {
-            error_report("Cannot allocate MSIs for device#%d", ndev);
-            rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
-            return;
-        }
-        phb->msi_table[ndev].irq = irq;
-        phb->msi_table[ndev].nvec = req_num;
-        phb->msi_table[ndev].config_addr = config_addr;
-    }
-
     /* Setup MSI/MSIX vectors in the device (via cfgspace or MSIX BAR) */
     spapr_msi_setmsg(pdev, spapr->msi_win_addr, ret_intr_type == RTAS_TYPE_MSIX,
-                     phb->msi_table[ndev].irq, req_num);
+                     irq, req_num);
 
+    /* Add MSI device to cache */
+    msi = g_malloc0(sizeof(*msi));
+    msi->first_irq = irq;
+    msi->num = req_num;
+    config_addr_key = g_new(int, 1);
+    *config_addr_key = config_addr;
+    g_hash_table_insert(phb->msi.hash, config_addr_key, msi);
+
+out:
     rtas_st(rets, 0, RTAS_OUT_SUCCESS);
     rtas_st(rets, 1, req_num);
     rtas_st(rets, 2, ++seq_num);
     rtas_st(rets, 3, ret_intr_type);
 
-    trace_spapr_pci_rtas_ibm_change_msi(func, req_num);
+    trace_spapr_pci_rtas_ibm_change_msi(config_addr, func, req_num, irq);
 }
 
 static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
@@ -395,25 +372,28 @@ static void rtas_ibm_query_interrupt_source_number(PowerPCCPU *cpu,
     uint32_t config_addr = rtas_ld(args, 0);
     uint64_t buid = ((uint64_t)rtas_ld(args, 1) << 32) | rtas_ld(args, 2);
     unsigned int intr_src_num = -1, ioa_intr_num = rtas_ld(args, 3);
-    int ndev;
     sPAPRPHBState *phb = NULL;
+    PCIDevice *pdev = NULL;
+    spapr_pci_msi *msi;
 
-    /* Fins sPAPRPHBState */
+    /* Find sPAPRPHBState */
     phb = find_phb(spapr, buid);
-    if (!phb) {
+    if (phb) {
+        pdev = find_dev(spapr, buid, config_addr);
+    }
+    if (!phb || !pdev) {
         rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
         return;
     }
 
     /* Find device descriptor and start IRQ */
-    ndev = spapr_msicfg_find(phb, config_addr, false);
-    if (ndev < 0) {
-        trace_spapr_pci_msi("MSI has not been enabled", -1, config_addr);
+    msi = (spapr_pci_msi *) g_hash_table_lookup(phb->msi.hash, &config_addr);
+    if (!msi || !msi->first_irq || !msi->num || (ioa_intr_num >= msi->num)) {
+        trace_spapr_pci_msi("Failed to return vector", config_addr);
         rtas_st(rets, 0, RTAS_OUT_HW_ERROR);
         return;
     }
-
-    intr_src_num = phb->msi_table[ndev].irq + ioa_intr_num;
+    intr_src_num = msi->first_irq + ioa_intr_num;
     trace_spapr_pci_rtas_ibm_query_interrupt_source_number(ioa_intr_num,
                                                            intr_src_num);
 
@@ -639,6 +619,8 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
 
         sphb->lsi_table[i].irq = irq;
     }
+
+    qemu_hash_init(&sphb->msi, sizeof(uint32_t), sizeof(spapr_pci_msi));
 }
 
 static void spapr_phb_reset(DeviceState *qdev)
@@ -675,20 +657,6 @@ static const VMStateDescription vmstate_spapr_pci_lsi = {
     },
 };
 
-static const VMStateDescription vmstate_spapr_pci_msi = {
-    .name = "spapr_pci/lsi",
-    .version_id = 1,
-    .minimum_version_id = 1,
-    .minimum_version_id_old = 1,
-    .fields      = (VMStateField []) {
-        VMSTATE_UINT32(config_addr, struct spapr_pci_msi),
-        VMSTATE_UINT32(irq, struct spapr_pci_msi),
-        VMSTATE_UINT32(nvec, struct spapr_pci_msi),
-
-        VMSTATE_END_OF_LIST()
-    },
-};
-
 static const VMStateDescription vmstate_spapr_pci = {
     .name = "spapr_pci",
     .version_id = 1,
@@ -703,9 +671,7 @@ static const VMStateDescription vmstate_spapr_pci = {
         VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
         VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
                              vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
-        VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
-                             vmstate_spapr_pci_msi, struct spapr_pci_msi),
-
+        VMSTATE_HASH_V(msi, sPAPRPHBState, 0),
         VMSTATE_END_OF_LIST()
     },
 };
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 970b4a9..b6d1626 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -27,13 +27,16 @@
 #include "hw/pci/pci_host.h"
 #include "hw/ppc/xics.h"
 
-#define SPAPR_MSIX_MAX_DEVS 32
-
 #define TYPE_SPAPR_PCI_HOST_BRIDGE "spapr-pci-host-bridge"
 
 #define SPAPR_PCI_HOST_BRIDGE(obj) \
     OBJECT_CHECK(sPAPRPHBState, (obj), TYPE_SPAPR_PCI_HOST_BRIDGE)
 
+typedef struct spapr_pci_msi {
+    uint32_t first_irq;
+    uint32_t num;
+} spapr_pci_msi;
+
 typedef struct sPAPRPHBState {
     PCIHostState parent_obj;
 
@@ -55,11 +58,7 @@ typedef struct sPAPRPHBState {
         uint32_t irq;
     } lsi_table[PCI_NUM_PINS];
 
-    struct spapr_pci_msi {
-        uint32_t config_addr;
-        uint32_t irq;
-        uint32_t nvec;
-    } msi_table[SPAPR_MSIX_MAX_DEVS];
+    qemu_hash msi;
 
     QLIST_ENTRY(sPAPRPHBState) list;
 } sPAPRPHBState;
diff --git a/trace-events b/trace-events
index f76323c..1a74bea 100644
--- a/trace-events
+++ b/trace-events
@@ -1163,12 +1163,13 @@ qxl_render_guest_primary_resized(int32_t width, int32_t height, int32_t stride,
 qxl_render_update_area_done(void *cookie) "%p"
 
 # hw/ppc/spapr_pci.c
-spapr_pci_msi(const char *msg, uint32_t n, uint32_t ca) "%s (device#%d, cfg=%x)"
+spapr_pci_msi(const char *msg, uint32_t ca) "%s (cfg=%x)"
 spapr_pci_msi_setup(const char *name, unsigned vector, uint64_t addr) "dev\"%s\" vector %u, addr=%"PRIx64
-spapr_pci_rtas_ibm_change_msi(unsigned func, unsigned req) "func %u, requested %u"
+spapr_pci_rtas_ibm_change_msi(unsigned cfg, unsigned func, unsigned req, unsigned first) "cfgaddr %x func %u, requested %u, first irq %u"
 spapr_pci_rtas_ibm_query_interrupt_source_number(unsigned ioa, unsigned intr) "queries for #%u, IRQ%u"
 spapr_pci_msi_write(uint64_t addr, uint64_t data, uint32_t dt_irq) "@%"PRIx64"<=%"PRIx64" IRQ %u"
 spapr_pci_lsi_set(const char *busname, int pin, uint32_t irq) "%s PIN%d IRQ %u"
+spapr_pci_msi_retry(unsigned config_addr, unsigned req_num, unsigned max_irqs) "Guest device at %x asked %u, have only %u"
 
 # hw/intc/xics.c
 xics_icp_check_ipi(int server, uint8_t mfrr) "CPU %d can take IPI mfrr=%#x"
-- 
1.9.rc0

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration
  2014-05-22 10:53                             ` [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration Alexey Kardashevskiy
@ 2014-05-22 10:57                               ` Alexander Graf
  2014-05-22 11:04                                 ` Alexey Kardashevskiy
  0 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-22 10:57 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc


On 22.05.14 12:53, Alexey Kardashevskiy wrote:
> This adds a VMSTATE_HASH_V macro. This implements put/get callbacks for it.
> This implements a qemu_hash_init() wrapper to save key/value sizes.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>   include/migration/vmstate.h | 10 +++++++++
>   include/qemu-common.h       | 13 +++++++++++
>   vmstate.c                   | 54 +++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 77 insertions(+)
>
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 7e45048..6af599d 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -166,6 +166,7 @@ extern const VMStateInfo vmstate_info_timer;
>   extern const VMStateInfo vmstate_info_buffer;
>   extern const VMStateInfo vmstate_info_unused_buffer;
>   extern const VMStateInfo vmstate_info_bitmap;
> +static const VMStateInfo vmstate_info_hash;
>   
>   #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
>   #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
> @@ -740,6 +741,15 @@ extern const VMStateInfo vmstate_info_bitmap;
>   #define VMSTATE_BUFFER_UNSAFE(_field, _state, _version, _size)        \
>       VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version, vmstate_info_buffer, _size)
>   
> +#define VMSTATE_HASH_V(_field, _state, _version) {                      \
> +    .name       = (stringify(_field)),                                  \
> +    .version_id = (_version),                                           \
> +    .size       = sizeof(qemu_hash),                                    \
> +    .info       = &vmstate_info_hash,                                   \
> +    .flags      = VMS_SINGLE,                                           \
> +    .offset     = vmstate_offset_value(_state, _field, qemu_hash),      \
> +}
> +
>   #define VMSTATE_UNUSED_V(_v, _size)                                   \
>       VMSTATE_UNUSED_BUFFER(NULL, _v, _size)
>   
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 3f3fd60..ee973e7 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -462,4 +462,17 @@ int parse_debug_env(const char *name, int max, int initial);
>   
>   const char *qemu_ether_ntoa(const MACAddr *mac);
>   
> +typedef struct qemu_hash {
> +    GHashTable *hash;
> +    int key_size;
> +    int value_size;
> +} qemu_hash;
> +
> +static inline void qemu_hash_init(qemu_hash *h, int key_size, int value_size)
> +{
> +    h->key_size = key_size;
> +    h->value_size = value_size;
> +    h->hash = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free);
> +}
> +
>   #endif
> diff --git a/vmstate.c b/vmstate.c
> index b5882fa..2148e73 100644
> --- a/vmstate.c
> +++ b/vmstate.c
> @@ -667,3 +667,57 @@ const VMStateInfo vmstate_info_bitmap = {
>       .get = get_bitmap,
>       .put = put_bitmap,
>   };
> +
> +/* Save restore for qemu_hash which is a wrapper over GHashTable */
> +static int get_hash(QEMUFile *f, void *pv, size_t size)
> +{
> +    qemu_hash *h = pv;
> +    uint32_t num = g_hash_table_size(h->hash);
> +    gpointer key, value;
> +
> +    num = qemu_get_be32(f);
> +
> +    for ( ; num; --num) {
> +        int i;
> +
> +        key = g_malloc0(h->key_size);
> +        for (i = 0; i < h->key_size; ++i) {
> +            ((uint8_t *)key)[i] = qemu_get_byte(f);
> +        }
> +        value = g_malloc0(h->value_size);
> +        for (i = 0; i < h->value_size; ++i) {
> +            ((uint8_t *)value)[i] = qemu_get_byte(f);
> +        }
> +        g_hash_table_insert(h->hash, key, value);
> +    }
> +
> +    return 0;
> +}
> +
> +static void put_hash(QEMUFile *f, void *pv, size_t size)
> +{
> +    qemu_hash *h = pv;
> +    uint32_t num = g_hash_table_size(h->hash);
> +    GHashTableIter iter;
> +    gpointer key, value;
> +
> +    qemu_put_be32(f, num);
> +
> +    g_hash_table_iter_init(&iter, h->hash);
> +    while (g_hash_table_iter_next (&iter, &key, &value)) {
> +        int i;
> +
> +        for (i = 0; i < h->key_size; ++i) {
> +            qemu_put_byte(f, ((uint8_t *)key)[i]);
> +        }
> +        for (i = 0; i < h->value_size; ++i) {
> +            qemu_put_byte(f, ((uint8_t *)value)[i]);

I think the only way to do this safely would be to do a recursive 
introspective walk through the hash table's key and value properties. 
Every key and every property can be an object again, right?

We would basically impose a limit onto ourselves that every member of 
the hash table would have to be a GObject again.


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-22 10:53                           ` Alexey Kardashevskiy
  2014-05-22 10:53                             ` [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration Alexey Kardashevskiy
  2014-05-22 10:53                             ` [Qemu-devel] [PATCH 2/2] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
@ 2014-05-22 10:57                             ` Alexander Graf
  2014-05-22 14:25                               ` Alexey Kardashevskiy
  2 siblings, 1 reply; 35+ messages in thread
From: Alexander Graf @ 2014-05-22 10:57 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel; +Cc: qemu-ppc, Juan Quintela


On 22.05.14 12:53, Alexey Kardashevskiy wrote:
> On 05/22/2014 05:16 PM, Alexander Graf wrote:>
>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
>>>
>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>>>>
>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>>>> interrupt as
>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>>>> problem for
>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>>>> reload or
>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of devices
>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no good
>>>>>>>>>>>>>> reason
>>>>>>>>>>>>>> for that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the first
>>>>>>>>>>>>>> IRQ
>>>>>>>>>>>>>> number
>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>> store
>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags telling
>>>>>>>>>>>>>> what
>>>>>>>>>>>>>> type
>>>>>>>>>>>>>> of interrupt for which device it has configured in order to return
>>>>>>>>>>>>>> error if
>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to disable
>>>>>>>>>>>>>> MSI
>>>>>>>>>>>>>> which
>>>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled devices
>>>>>>>>>>>>>> per PHB,
>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>>>> which was
>>>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I could
>>>>>>>>>>>>>> remove
>>>>>>>>>>>>>> msi/msix
>>>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>>>> MSIX and
>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>>>> limitation,
>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your implementation?
>>>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>>>> simplify
>>>>>>>>>>>> code by introducing one :)
>>>>>>>>>>>>
>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX enabled
>>>>>>>>>>>> but
>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and confirm.
>>>>>>>>>>>>
>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when MSI is
>>>>>>>>>>>> already
>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>>>> says MSI
>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>>>> rather not impose one.
>>>>>>>>>>>
>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>>>> the same time?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Alex
>>>>>>>>>> No, and the PCI spec says:
>>>>>>>>>>       A function is permitted to implement both MSI and MSI-X, but
>>>>>>>>>> system
>>>>>>>>>>       software is
>>>>>>>>>>       prohibited from enabling both at the same time. If system software
>>>>>>>>>>       enables both at the same time, the result is undefined.
>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>>>> Heh. This solves just half of the problem - I still have to keep track of
>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi interface. I
>>>>>>>> was hoping I can store such flag somewhere in a device PCI config space
>>>>>>>> but
>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>>>> made.
>>>>>>>> And I cannot rely on address/data fields much as the guest can change them
>>>>>>>> (I already use them to store IRQ numbers and btw it is missing checks when
>>>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>>>>
>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI config space
>>>>>>>> and on "disable" check if it is not zero, then configuration took place,
>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did any change
>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>>>> cases), I compare .data with what ICS can possibly have and either reject
>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>>>> stupid guest. Would that be acceptable?
>>>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>>>> registers itself? Then you don't need to do sanity checks.
>>>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it saves
>>>>>> MSIX BAR content, does reboot via some backdoor interface and restores MSIX
>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring MSIX data
>>>>>> from cached values when guest is trying to restore it with what it thinks
>>>>>> is actual MSIX data (it is virtualized because of x86). But there is cache
>>>>> We already have a cache because we don't access the real PCI registers with
>>>>> msi_set_message(), no?
>>>>
>>>> For emulated devices there is no cache. And in any case the guest is
>>>> allowed to write to it... Who knows what AIX does? I do not.
>>>
>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok but
>>> how to migrate such thing? Temporary cache somewhere and then unpack it? Or
>>> use old style migration callbacks?
>> Could you try to introduce a new vmstate type that just serializes and
>> deserializes hash tables? Maybe there is already a serialization
>> function for it in glib?
> I have not found any (most likely I do not know how to search there),
> I added mine, here are VMSTATE_HASH + its use for SPAPR.
>
>
> Is this a movement to right direction? I need to put key/value sizes
> into VMSTATE definition somehow but do not really want to touch
> VMStateField.

I'm not sure. Juan?


Alex

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration
  2014-05-22 10:57                               ` Alexander Graf
@ 2014-05-22 11:04                                 ` Alexey Kardashevskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 11:04 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc

On 05/22/2014 08:57 PM, Alexander Graf wrote:
> 
> On 22.05.14 12:53, Alexey Kardashevskiy wrote:
>> This adds a VMSTATE_HASH_V macro. This implements put/get callbacks for it.
>> This implements a qemu_hash_init() wrapper to save key/value sizes.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>   include/migration/vmstate.h | 10 +++++++++
>>   include/qemu-common.h       | 13 +++++++++++
>>   vmstate.c                   | 54
>> +++++++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 77 insertions(+)
>>
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 7e45048..6af599d 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -166,6 +166,7 @@ extern const VMStateInfo vmstate_info_timer;
>>   extern const VMStateInfo vmstate_info_buffer;
>>   extern const VMStateInfo vmstate_info_unused_buffer;
>>   extern const VMStateInfo vmstate_info_bitmap;
>> +static const VMStateInfo vmstate_info_hash;
>>     #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
>>   #define type_check_array(t1,t2,n) ((t1(*)[n])0 - (t2*)0)
>> @@ -740,6 +741,15 @@ extern const VMStateInfo vmstate_info_bitmap;
>>   #define VMSTATE_BUFFER_UNSAFE(_field, _state, _version, _size)        \
>>       VMSTATE_BUFFER_UNSAFE_INFO(_field, _state, _version,
>> vmstate_info_buffer, _size)
>>   +#define VMSTATE_HASH_V(_field, _state, _version) {                      \
>> +    .name       = (stringify(_field)),                                  \
>> +    .version_id = (_version),                                           \
>> +    .size       = sizeof(qemu_hash),                                    \
>> +    .info       = &vmstate_info_hash,                                   \
>> +    .flags      = VMS_SINGLE,                                           \
>> +    .offset     = vmstate_offset_value(_state, _field, qemu_hash),      \
>> +}
>> +
>>   #define VMSTATE_UNUSED_V(_v, _size)                                   \
>>       VMSTATE_UNUSED_BUFFER(NULL, _v, _size)
>>   diff --git a/include/qemu-common.h b/include/qemu-common.h
>> index 3f3fd60..ee973e7 100644
>> --- a/include/qemu-common.h
>> +++ b/include/qemu-common.h
>> @@ -462,4 +462,17 @@ int parse_debug_env(const char *name, int max, int
>> initial);
>>     const char *qemu_ether_ntoa(const MACAddr *mac);
>>   +typedef struct qemu_hash {
>> +    GHashTable *hash;
>> +    int key_size;
>> +    int value_size;
>> +} qemu_hash;
>> +
>> +static inline void qemu_hash_init(qemu_hash *h, int key_size, int
>> value_size)
>> +{
>> +    h->key_size = key_size;
>> +    h->value_size = value_size;
>> +    h->hash = g_hash_table_new_full(g_int_hash, g_int_equal, g_free,
>> g_free);
>> +}
>> +
>>   #endif
>> diff --git a/vmstate.c b/vmstate.c
>> index b5882fa..2148e73 100644
>> --- a/vmstate.c
>> +++ b/vmstate.c
>> @@ -667,3 +667,57 @@ const VMStateInfo vmstate_info_bitmap = {
>>       .get = get_bitmap,
>>       .put = put_bitmap,
>>   };
>> +
>> +/* Save restore for qemu_hash which is a wrapper over GHashTable */
>> +static int get_hash(QEMUFile *f, void *pv, size_t size)
>> +{
>> +    qemu_hash *h = pv;
>> +    uint32_t num = g_hash_table_size(h->hash);
>> +    gpointer key, value;
>> +
>> +    num = qemu_get_be32(f);
>> +
>> +    for ( ; num; --num) {
>> +        int i;
>> +
>> +        key = g_malloc0(h->key_size);
>> +        for (i = 0; i < h->key_size; ++i) {
>> +            ((uint8_t *)key)[i] = qemu_get_byte(f);
>> +        }
>> +        value = g_malloc0(h->value_size);
>> +        for (i = 0; i < h->value_size; ++i) {
>> +            ((uint8_t *)value)[i] = qemu_get_byte(f);
>> +        }
>> +        g_hash_table_insert(h->hash, key, value);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static void put_hash(QEMUFile *f, void *pv, size_t size)
>> +{
>> +    qemu_hash *h = pv;
>> +    uint32_t num = g_hash_table_size(h->hash);
>> +    GHashTableIter iter;
>> +    gpointer key, value;
>> +
>> +    qemu_put_be32(f, num);
>> +
>> +    g_hash_table_iter_init(&iter, h->hash);
>> +    while (g_hash_table_iter_next (&iter, &key, &value)) {
>> +        int i;
>> +
>> +        for (i = 0; i < h->key_size; ++i) {
>> +            qemu_put_byte(f, ((uint8_t *)key)[i]);
>> +        }
>> +        for (i = 0; i < h->value_size; ++i) {
>> +            qemu_put_byte(f, ((uint8_t *)value)[i]);
> 
> I think the only way to do this safely would be to do a recursive
> introspective walk through the hash table's key and value properties. Every
> key and every property can be an object again, right?
>
> We would basically impose a limit onto ourselves that every member of the
> hash table would have to be a GObject again.

This is a bit too much for the task I am trying to solve :)

@key here is uin32_t and there is no point in making it an object...



-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
  2014-05-22 10:57                             ` [Qemu-devel] [PATCH v2 8/8] " Alexander Graf
@ 2014-05-22 14:25                               ` Alexey Kardashevskiy
  0 siblings, 0 replies; 35+ messages in thread
From: Alexey Kardashevskiy @ 2014-05-22 14:25 UTC (permalink / raw)
  To: Alexander Graf, qemu-devel; +Cc: qemu-ppc, Juan Quintela

On 05/22/2014 08:57 PM, Alexander Graf wrote:
> 
> On 22.05.14 12:53, Alexey Kardashevskiy wrote:
>> On 05/22/2014 05:16 PM, Alexander Graf wrote:>
>>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
>>>>
>>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>>>>>
>>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>>>>> interrupt as
>>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>>>>> problem for
>>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>>>>> reload or
>>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of
>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no
>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>> reason
>>>>>>>>>>>>>>> for that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the
>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>> IRQ
>>>>>>>>>>>>>>> number
>>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags
>>>>>>>>>>>>>>> telling
>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>> type
>>>>>>>>>>>>>>> of interrupt for which device it has configured in order to
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>> error if
>>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to
>>>>>>>>>>>>>>> disable
>>>>>>>>>>>>>>> MSI
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled
>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>> per PHB,
>>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>>>>> which was
>>>>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>> remove
>>>>>>>>>>>>>>> msi/msix
>>>>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>>>>> MSIX and
>>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>>>>> limitation,
>>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your
>>>>>>>>>>>>>> implementation?
>>>>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>>>>> simplify
>>>>>>>>>>>>> code by introducing one :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX
>>>>>>>>>>>>> enabled
>>>>>>>>>>>>> but
>>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and
>>>>>>>>>>>>> confirm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when
>>>>>>>>>>>>> MSI is
>>>>>>>>>>>>> already
>>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>>>>> says MSI
>>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>>>>> rather not impose one.
>>>>>>>>>>>>
>>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>>>>> the same time?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Alex
>>>>>>>>>>> No, and the PCI spec says:
>>>>>>>>>>>       A function is permitted to implement both MSI and MSI-X, but
>>>>>>>>>>> system
>>>>>>>>>>>       software is
>>>>>>>>>>>       prohibited from enabling both at the same time. If system
>>>>>>>>>>> software
>>>>>>>>>>>       enables both at the same time, the result is undefined.
>>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>>>>> Heh. This solves just half of the problem - I still have to keep
>>>>>>>>> track of
>>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi
>>>>>>>>> interface. I
>>>>>>>>> was hoping I can store such flag somewhere in a device PCI config
>>>>>>>>> space
>>>>>>>>> but
>>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>>>>> made.
>>>>>>>>> And I cannot rely on address/data fields much as the guest can
>>>>>>>>> change them
>>>>>>>>> (I already use them to store IRQ numbers and btw it is missing
>>>>>>>>> checks when
>>>>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>>>>>
>>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI
>>>>>>>>> config space
>>>>>>>>> and on "disable" check if it is not zero, then configuration took
>>>>>>>>> place,
>>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did
>>>>>>>>> any change
>>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>>>>> cases), I compare .data with what ICS can possibly have and either
>>>>>>>>> reject
>>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>>>>> stupid guest. Would that be acceptable?
>>>>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>>>>> registers itself? Then you don't need to do sanity checks.
>>>>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it
>>>>>>> saves
>>>>>>> MSIX BAR content, does reboot via some backdoor interface and
>>>>>>> restores MSIX
>>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring
>>>>>>> MSIX data
>>>>>>> from cached values when guest is trying to restore it with what it
>>>>>>> thinks
>>>>>>> is actual MSIX data (it is virtualized because of x86). But there is
>>>>>>> cache
>>>>>> We already have a cache because we don't access the real PCI
>>>>>> registers with
>>>>>> msi_set_message(), no?
>>>>>
>>>>> For emulated devices there is no cache. And in any case the guest is
>>>>> allowed to write to it... Who knows what AIX does? I do not.
>>>>
>>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok
>>>> but
>>>> how to migrate such thing? Temporary cache somewhere and then unpack
>>>> it? Or
>>>> use old style migration callbacks?
>>> Could you try to introduce a new vmstate type that just serializes and
>>> deserializes hash tables? Maybe there is already a serialization
>>> function for it in glib?
>> I have not found any (most likely I do not know how to search there),
>> I added mine, here are VMSTATE_HASH + its use for SPAPR.
>>
>>
>> Is this a movement to right direction? I need to put key/value sizes
>> into VMSTATE definition somehow but do not really want to touch
>> VMStateField.
> 
> I'm not sure. Juan?


I also tried to simplify this particular thing more by assuming that the
key is always "int" and put size of value to VMStateField::size. But there
is no way to get the size in VMStateInfo::get(). Or I could do a
"subsection" and try implementing has as an array (and have get/put defined
for items, should work) but in this case I'll need to cache number of
elements of the hash table somewhere so I'll need a wrapper around GHashTable.

Making generalized version is not obvious for me :(


-- 
Alexey

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2014-05-22 14:26 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-15  9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts Alexey Kardashevskiy
2014-05-21  6:30   ` Alexey Kardashevskiy
2014-05-21  8:40     ` Alexander Graf
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 2/8] xics: Add xics_find_source() Alexey Kardashevskiy
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 3/8] xics: Disable flags reset on xics reset Alexey Kardashevskiy
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics Alexey Kardashevskiy
2014-05-21  8:34   ` Alexander Graf
2014-05-21  8:46     ` Alexey Kardashevskiy
2014-05-21  8:47       ` Alexander Graf
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 5/8] xics: Remove obsolete xics_set_irq_type() Alexey Kardashevskiy
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 6/8] spapr: Remove @next_irq Alexey Kardashevskiy
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 7/8] xics: Implement xics_ics_free() Alexey Kardashevskiy
2014-05-15  9:59 ` [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
2014-05-21  8:40   ` Alexander Graf
2014-05-21  8:52     ` Alexey Kardashevskiy
2014-05-21  9:06       ` Alexander Graf
2014-05-21  9:11         ` Michael S. Tsirkin
2014-05-21  9:13           ` Alexander Graf
2014-05-21  9:33             ` Alexey Kardashevskiy
2014-05-21  9:38               ` Michael S. Tsirkin
2014-05-21 10:03                 ` Alexey Kardashevskiy
2014-05-21  9:50               ` Alexander Graf
2014-05-21 10:13                 ` Alexey Kardashevskiy
2014-05-21 10:35                   ` Alexander Graf
2014-05-21 12:42                     ` Alexey Kardashevskiy
2014-05-22  6:53                       ` Alexey Kardashevskiy
2014-05-22  7:16                         ` Alexander Graf
2014-05-22 10:53                           ` Alexey Kardashevskiy
2014-05-22 10:53                             ` [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration Alexey Kardashevskiy
2014-05-22 10:57                               ` Alexander Graf
2014-05-22 11:04                                 ` Alexey Kardashevskiy
2014-05-22 10:53                             ` [Qemu-devel] [PATCH 2/2] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
2014-05-22 10:57                             ` [Qemu-devel] [PATCH v2 8/8] " Alexander Graf
2014-05-22 14:25                               ` Alexey Kardashevskiy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.