* [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
@ 2013-06-27  6:45 Alexey Kardashevskiy
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
                   ` (19 more replies)
  0 siblings, 20 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson


This series spent quite a lot of time waiting for David's PCI series to
reach upstream, but that does not seem likely to happen soon, so I rebased
these patches on top of agraf/ppc-next, itself rebased on top of
qemu.org/master.


While this series applies and compiles, migration will often fail until
the "migration: do not sent zero pages in bulk stage" patch is reverted
or fixed.


Alexey Kardashevskiy (4):
  pseries: move interrupt controllers to hw/intc/
  pseries: rework XICS
  pseries: rework PAPR virtual SCSI
  spapr-pci: rework MSI/MSIX

David Gibson (12):
  savevm: Implement VMS_DIVIDE flag
  target-ppc: Convert ppc cpu savevm to VMStateDescription
  pseries: savevm support for XICS interrupt controller
  pseries: savevm support for VIO devices
  pseries: savevm support for PAPR VIO logical lan
  pseries: savevm support for PAPR TCE tables
  pseries: savevm support for PAPR virtual SCSI
  pseries: savevm support for pseries machine
  pseries: savevm support for PCI host bridge
  target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
  pseries: Support for in-kernel XICS interrupt controller
  pseries: savevm support with KVM

Prerna Saxena (1):
  ppc64: Enable QEMU to run on POWER 8 DD1 chip.

 default-configs/ppc64-softmmu.mak |    2 +
 hw/char/spapr_vty.c               |   16 ++
 hw/intc/Makefile.objs             |    2 +
 hw/{ppc => intc}/xics.c           |  172 ++++++++----
 hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
 hw/net/spapr_llan.c               |   24 +-
 hw/ppc/Makefile.objs              |    2 +-
 hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
 hw/ppc/spapr_hcall.c              |    8 +-
 hw/ppc/spapr_iommu.c              |   25 ++
 hw/ppc/spapr_pci.c                |  141 ++++++----
 hw/ppc/spapr_vio.c                |   20 ++
 hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
 include/hw/pci-host/spapr.h       |   14 +-
 include/hw/ppc/spapr.h            |   17 +-
 include/hw/ppc/spapr_vio.h        |    5 +
 include/hw/ppc/xics.h             |   72 ++++-
 include/migration/vmstate.h       |   13 +
 savevm.c                          |    8 +
 target-ppc/cpu-models.c           |    3 +
 target-ppc/cpu-models.h           |    1 +
 target-ppc/cpu-qom.h              |    4 +
 target-ppc/cpu.h                  |    8 +-
 target-ppc/kvm.c                  |   83 ++++++
 target-ppc/kvm_ppc.h              |   29 ++
 target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
 target-ppc/translate_init.c       |   36 +++
 27 files changed, 2088 insertions(+), 319 deletions(-)
 rename hw/{ppc => intc}/xics.c (80%)
 create mode 100644 hw/intc/xics_kvm.c

-- 
1.7.10.4


* [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-02 20:54   ` Andreas Färber
  2013-07-08 18:15   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 02/17] pseries: rework XICS Alexey Kardashevskiy
                   ` (18 subsequent siblings)
  19 siblings, 2 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 default-configs/ppc64-softmmu.mak |    1 +
 hw/intc/Makefile.objs             |    1 +
 hw/{ppc => intc}/xics.c           |    0
 hw/ppc/Makefile.objs              |    2 +-
 4 files changed, 3 insertions(+), 1 deletion(-)
 rename hw/{ppc => intc}/xics.c (100%)

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index cb279cb..69a9f8d 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -47,5 +47,6 @@ CONFIG_E500=y
 CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
 # For pSeries
 CONFIG_PCI_HOTPLUG=y
+CONFIG_XICS=$(CONFIG_PSERIES)
 # For PReP
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 2ba49d0..abe8f80 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -22,3 +22,4 @@ obj-$(CONFIG_OMAP) += omap_intc.o
 obj-$(CONFIG_OPENPIC) += openpic.o
 obj-$(CONFIG_OPENPIC_KVM) += openpic_kvm.o
 obj-$(CONFIG_SH4) += sh_intc.o
+obj-$(CONFIG_XICS) += xics.o
diff --git a/hw/ppc/xics.c b/hw/intc/xics.c
similarity index 100%
rename from hw/ppc/xics.c
rename to hw/intc/xics.c
diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index be00d1d..7a1cd5d 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -1,7 +1,7 @@
 # shared objects
 obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
-obj-$(CONFIG_PSERIES) += spapr.o xics.o spapr_vio.o spapr_events.o
+obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o
 # PowerPC 4xx boards
-- 
1.7.10.4


* [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-06-27 11:47   ` David Gibson
  2013-07-08 18:22   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag Alexey Kardashevskiy
                   ` (17 subsequent siblings)
  19 siblings, 2 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

Currently the XICS interrupt controller is not a QEMU device. As we are
going to support the in-kernel emulated XICS, which is part of KVM, it
makes sense not to extend the existing XICS with multiple KVM stub
functions, but to create another device and share code between the fully
emulated XICS and the in-kernel XICS.

The rework includes:
* port to QOM
* make a few functions public for use by the in-kernel XICS implementation
* make the VMStateDescription public for use by in-kernel XICS migration
* move xics_system_init() to spapr.c; it creates the fully emulated XICS
now and will try the in-kernel XICS in upcoming patches.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        |  109 ++++++++++++++++++++++++++-----------------------
 hw/ppc/spapr.c        |   28 +++++++++++++
 include/hw/ppc/xics.h |   59 ++++++++++++++++++++++++--
 3 files changed, 141 insertions(+), 55 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 091912e..0e374c8 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -34,13 +34,6 @@
  * ICP: Presentation layer
  */
 
-struct icp_server_state {
-    uint32_t xirr;
-    uint8_t pending_priority;
-    uint8_t mfrr;
-    qemu_irq output;
-};
-
 #define XISR_MASK  0x00ffffff
 #define CPPR_MASK  0xff000000
 
@@ -49,12 +42,6 @@ struct icp_server_state {
 
 struct ics_state;
 
-struct icp_state {
-    long nr_servers;
-    struct icp_server_state *ss;
-    struct ics_state *ics;
-};
-
 static void ics_reject(struct ics_state *ics, int nr);
 static void ics_resend(struct ics_state *ics);
 static void ics_eoi(struct ics_state *ics, int nr);
@@ -171,27 +158,6 @@ static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
 /*
  * ICS: Source layer
  */
-
-struct ics_irq_state {
-    int server;
-    uint8_t priority;
-    uint8_t saved_priority;
-#define XICS_STATUS_ASSERTED           0x1
-#define XICS_STATUS_SENT               0x2
-#define XICS_STATUS_REJECTED           0x4
-#define XICS_STATUS_MASKED_PENDING     0x8
-    uint8_t status;
-};
-
-struct ics_state {
-    int nr_irqs;
-    int offset;
-    qemu_irq *qirqs;
-    bool *islsi;
-    struct ics_irq_state *irqs;
-    struct icp_state *icp;
-};
-
 static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
 {
     return (nr >= ics->offset)
@@ -506,9 +472,8 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 0, 0); /* Success */
 }
 
-static void xics_reset(void *opaque)
+void xics_common_reset(struct icp_state *icp)
 {
-    struct icp_state *icp = (struct icp_state *)opaque;
     struct ics_state *ics = icp->ics;
     int i;
 
@@ -527,7 +492,12 @@ static void xics_reset(void *opaque)
     }
 }
 
-void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
+static void xics_reset(DeviceState *d)
+{
+    xics_common_reset(XICS(d));
+}
+
+void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 {
     CPUState *cs = CPU(cpu);
     CPUPPCState *env = &cpu->env;
@@ -551,37 +521,72 @@ void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
     }
 }
 
-struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
+void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
+{
+    xics_common_cpu_setup(icp, cpu);
+}
+
+void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
 {
-    struct icp_state *icp;
-    struct ics_state *ics;
+    struct ics_state *ics = icp->ics;
 
-    icp = g_malloc0(sizeof(*icp));
-    icp->nr_servers = nr_servers;
     icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
 
     ics = g_malloc0(sizeof(*ics));
-    ics->nr_irqs = nr_irqs;
+    ics->nr_irqs = icp->nr_irqs;
     ics->offset = XICS_IRQ_BASE;
-    ics->irqs = g_malloc0(nr_irqs * sizeof(struct ics_irq_state));
-    ics->islsi = g_malloc0(nr_irqs * sizeof(bool));
+    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(struct ics_irq_state));
+    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
 
     icp->ics = ics;
     ics->icp = icp;
 
-    ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
+    ics->qirqs = qemu_allocate_irqs(handler, ics, ics->nr_irqs);
+}
 
-    spapr_register_hypercall(H_CPPR, h_cppr);
-    spapr_register_hypercall(H_IPI, h_ipi);
-    spapr_register_hypercall(H_XIRR, h_xirr);
-    spapr_register_hypercall(H_EOI, h_eoi);
+static void xics_realize(DeviceState *dev, Error **errp)
+{
+    struct icp_state *icp = XICS(dev);
+
+    xics_common_init(icp, ics_set_irq);
 
     spapr_rtas_register("ibm,set-xive", rtas_set_xive);
     spapr_rtas_register("ibm,get-xive", rtas_get_xive);
     spapr_rtas_register("ibm,int-off", rtas_int_off);
     spapr_rtas_register("ibm,int-on", rtas_int_on);
 
-    qemu_register_reset(xics_reset, icp);
+}
+
+static Property xics_properties[] = {
+    DEFINE_PROP_UINT32("nr_servers", struct icp_state, nr_servers, -1),
+    DEFINE_PROP_UINT32("nr_irqs", struct icp_state, nr_irqs, -1),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xics_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = xics_realize;
+    dc->props = xics_properties;
+    dc->reset = xics_reset;
+}
+
+static const TypeInfo xics_info = {
+    .name          = TYPE_XICS,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(struct icp_state),
+    .class_init    = xics_class_init,
+};
+
+static void xics_register_types(void)
+{
+    spapr_register_hypercall(H_CPPR, h_cppr);
+    spapr_register_hypercall(H_IPI, h_ipi);
+    spapr_register_hypercall(H_XIRR, h_xirr);
+    spapr_register_hypercall(H_EOI, h_eoi);
 
-    return icp;
+    type_register_static(&xics_info);
 }
+
+type_init(xics_register_types)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 38c29b7..def3505 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -719,6 +719,34 @@ static int spapr_vga_init(PCIBus *pci_bus)
     }
 }
 
+static struct icp_state *try_create_xics(const char *type, int nr_servers,
+                                         int nr_irqs)
+{
+    DeviceState *dev;
+
+    dev = qdev_create(NULL, type);
+    qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
+    qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
+    if (qdev_init(dev) < 0) {
+        return NULL;
+    }
+
+    return XICS(dev);
+}
+
+static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
+{
+    struct icp_state *icp = NULL;
+
+    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
+    if (!icp) {
+        perror("Failed to create XICS\n");
+        abort();
+    }
+
+    return icp;
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(QEMUMachineInitArgs *args)
 {
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 6bce042..3f72806 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -27,15 +27,68 @@
 #if !defined(__XICS_H__)
 #define __XICS_H__
 
+#include "hw/sysbus.h"
+
+#define TYPE_XICS "xics"
+#define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
+
 #define XICS_IPI        0x2
-#define XICS_IRQ_BASE   0x10
+#define XICS_BUID       0x1
+#define XICS_IRQ_BASE   (XICS_BUID << 12)
+
+/*
+ * We currently only support one BUID which is our interrupt base
+ * (the kernel implementation supports more but we don't exploit
+ *  that yet)
+ */
 
-struct icp_state;
+struct icp_state {
+    /*< private >*/
+    SysBusDevice parent_obj;
+    /*< public >*/
+    uint32_t nr_servers;
+    uint32_t nr_irqs;
+    struct icp_server_state *ss;
+    struct ics_state *ics;
+};
+
+struct icp_server_state {
+    uint32_t xirr;
+    uint8_t pending_priority;
+    uint8_t mfrr;
+    qemu_irq output;
+};
+
+struct ics_state {
+    uint32_t nr_irqs;
+    uint32_t offset;
+    qemu_irq *qirqs;
+    bool *islsi;
+    struct ics_irq_state *irqs;
+    struct icp_state *icp;
+};
+
+struct ics_irq_state {
+    uint32_t server;
+    uint8_t priority;
+    uint8_t saved_priority;
+#define XICS_STATUS_ASSERTED           0x1
+#define XICS_STATUS_SENT               0x2
+#define XICS_STATUS_REJECTED           0x4
+#define XICS_STATUS_MASKED_PENDING     0x8
+    uint8_t status;
+};
 
 qemu_irq xics_get_qirq(struct icp_state *icp, int irq);
 void xics_set_irq_type(struct icp_state *icp, int irq, bool lsi);
 
-struct icp_state *xics_system_init(int nr_servers, int nr_irqs);
+void xics_common_init(struct icp_state *icp, qemu_irq_handler handler);
+void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
+void xics_common_reset(struct icp_state *icp);
+
 void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
 
+extern const VMStateDescription vmstate_icp_server;
+extern const VMStateDescription vmstate_ics;
+
 #endif /* __XICS_H__ */
-- 
1.7.10.4


* [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 02/17] pseries: rework XICS Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:27   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription Alexey Kardashevskiy
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The vmstate infrastructure includes a VMS_MULTIPLY flag, and associated
VMSTATE_VBUFFER_MULTIPLY helper macro.  These can be used to save a
variably sized buffer where the size in bytes of the buffer isn't directly
accessible as a structure field, but an element count from which the size
can be derived is.

This patch adds an analogous VMS_DIVIDE option, which handles a variably
sized buffer whose size is a submultiple of a field, rather than a
multiple.  For example a buffer containing per-page structures whose size
is derived from a field storing the total address space described by the
structures could use this construct.
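
As a rough illustration of the size calculation described above, here is a
minimal standalone sketch. It is not the savevm.c code: the demo_* names
are hypothetical, and only the flag values and the multiply/divide steps
are taken from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the flag bits used by this patch. */
enum {
    DEMO_VMS_MULTIPLY = 0x200,  /* multiply the size field by a factor */
    DEMO_VMS_DIVIDE   = 0x1000  /* divide the size field by a factor   */
};

/*
 * Derive the byte size of a variably sized buffer.
 * size_field: value read from the structure field that describes the size
 * factor:     constant stored in the field descriptor (".size")
 * flags:      DEMO_VMS_* bits selecting how the two are combined
 */
static size_t demo_vbuffer_size(uint32_t size_field, size_t factor,
                                unsigned flags)
{
    size_t size = size_field;

    if (flags & DEMO_VMS_MULTIPLY) {
        size *= factor;
    }
    if (flags & DEMO_VMS_DIVIDE) {
        /* The field must be an exact multiple of the divisor. */
        assert(factor != 0 && (size % factor) == 0);
        size /= factor;
    }
    return size;
}
```

With these assumptions, a field value of 65536 and a divisor of 4096 give
a 16-byte buffer, while a field value of 16 and a multiplier of 8 give 128
bytes.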

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/migration/vmstate.h |   13 +++++++++++++
 savevm.c                    |    8 ++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ebc4d09..787f1cb 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -98,6 +98,7 @@ enum VMStateFlags {
     VMS_MULTIPLY         = 0x200,  /* multiply "size" field by field_size */
     VMS_VARRAY_UINT8     = 0x400,  /* Array with size in uint8_t field*/
     VMS_VARRAY_UINT32    = 0x800,  /* Array with size in uint32_t field*/
+    VMS_DIVIDE           = 0x1000, /* divide "size" field by field_size */
 };
 
 typedef struct {
@@ -420,6 +421,18 @@ extern const VMStateInfo vmstate_info_bitmap;
     .start        = (_start),                                        \
 }
 
+#define VMSTATE_VBUFFER_DIVIDE(_field, _state, _version, _test, _start, _field_size, _divide) { \
+    .name         = (stringify(_field)),                             \
+    .version_id   = (_version),                                      \
+    .field_exists = (_test),                                         \
+    .size_offset  = vmstate_offset_value(_state, _field_size, uint32_t),\
+    .size         = (_divide),                                       \
+    .info         = &vmstate_info_buffer,                            \
+    .flags        = VMS_VBUFFER|VMS_POINTER|VMS_DIVIDE,              \
+    .offset       = offsetof(_state, _field),                        \
+    .start        = (_start),                                        \
+}
+
 #define VMSTATE_VBUFFER(_field, _state, _version, _test, _start, _field_size) { \
     .name         = (stringify(_field)),                             \
     .version_id   = (_version),                                      \
diff --git a/savevm.c b/savevm.c
index 48cc2a9..c0fb4a3 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1658,6 +1658,10 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                 if (field->flags & VMS_MULTIPLY) {
                     size *= field->size;
                 }
+                if (field->flags & VMS_DIVIDE) {
+                    assert((size % field->size) == 0);
+                    size /= field->size;
+                }
             }
             if (field->flags & VMS_ARRAY) {
                 n_elems = field->num;
@@ -1722,6 +1726,10 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
                 if (field->flags & VMS_MULTIPLY) {
                     size *= field->size;
                 }
+                if (field->flags & VMS_DIVIDE) {
+                    assert((size % field->size) == 0);
+                    size /= field->size;
+                }
             }
             if (field->flags & VMS_ARRAY) {
                 n_elems = field->num;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:29   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller Alexey Kardashevskiy
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The savevm code for the powerpc cpu emulation is currently based around
the old register_savevm() rather than the vmstate_register() method.  It's also
rather broken, missing some important state on some CPU models.

This patch completely rewrites the savevm for target-ppc, using the new
VMStateDescription approach.  Exactly what needs to be saved in what
configurations has been more carefully examined, too.  This introduces a
new version (5) of the cpu save format.  The old load function is retained
to support version 4 images.
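
The way an old load function can coexist with a new format may be sketched
as follows. This is a standalone illustration under assumptions, not the
vmstate code itself: the demo_* names are hypothetical, and the version
numbers (5 as current and minimum, 4 as the oldest accepted) are inferred
from the description above rather than read from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*demo_load_fn)(void *opaque, int version_id);

/* Hypothetical, reduced description of a versioned state format. */
struct demo_vmsd {
    int version_id;              /* newest version this code writes      */
    int minimum_version_id;      /* oldest version the field list reads  */
    int minimum_version_id_old;  /* oldest version the old loader reads  */
    demo_load_fn load_state_old; /* legacy loader, may be NULL           */
    demo_load_fn load_fields;    /* loader driven by the new description */
};

/* Illustrative loaders: they only record which path was taken. */
static int demo_load_v4(void *opaque, int version_id)
{
    (void)version_id;
    *(int *)opaque = 4;
    return 0;
}

static int demo_load_v5(void *opaque, int version_id)
{
    (void)version_id;
    *(int *)opaque = 5;
    return 0;
}

/* Pick a loader for an incoming stream of the given version. */
static int demo_load(const struct demo_vmsd *d, void *opaque, int version_id)
{
    assert(d != NULL);

    if (version_id > d->version_id) {
        return -EINVAL;          /* stream is newer than this code */
    }
    if (version_id < d->minimum_version_id_old) {
        return -EINVAL;          /* too old even for the legacy loader */
    }
    if (version_id < d->minimum_version_id) {
        if (!d->load_state_old) {
            return -EINVAL;
        }
        return d->load_state_old(opaque, version_id);
    }
    return d->load_fields(opaque, version_id);
}
```

With a descriptor of { 5, 5, 4, demo_load_v4, demo_load_v5 }, a version 4
stream is routed to the legacy loader, a version 5 stream to the new one,
and any other version is rejected.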

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[aik: ppc cpu savevm conversion fixed to use PowerPCCPU instead of CPUPPCState]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 target-ppc/cpu-qom.h        |    4 +
 target-ppc/cpu.h            |    8 +-
 target-ppc/machine.c        |  533 ++++++++++++++++++++++++++++++++++++-------
 target-ppc/translate_init.c |    2 +
 4 files changed, 454 insertions(+), 93 deletions(-)

diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index eb03a00..2b96b04 100644
--- a/target-ppc/cpu-qom.h
+++ b/target-ppc/cpu-qom.h
@@ -102,4 +102,8 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr);
 
 void ppc_cpu_do_interrupt(CPUState *cpu);
 
+#ifndef CONFIG_USER_ONLY
+extern const struct VMStateDescription vmstate_ppc_cpu;
+#endif
+
 #endif
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 0ede077..f30577d 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -948,7 +948,7 @@ struct CPUPPCState {
 #if defined(TARGET_PPC64)
     /* PowerPC 64 SLB area */
     ppc_slb_t slb[64];
-    int slb_nr;
+    int32_t slb_nr;
 #endif
     /* segment registers */
     hwaddr htab_base;
@@ -957,11 +957,11 @@ struct CPUPPCState {
     /* externally stored hash table */
     uint8_t *external_htab;
     /* BATs */
-    int nb_BATs;
+    uint32_t nb_BATs;
     target_ulong DBAT[2][8];
     target_ulong IBAT[2][8];
     /* PowerPC TLB registers (for 4xx, e500 and 60x software driven TLBs) */
-    int nb_tlb;      /* Total number of TLB                                  */
+    int32_t nb_tlb;      /* Total number of TLB                              */
     int tlb_per_way; /* Speed-up helper: used to avoid divisions at run time */
     int nb_ways;     /* Number of ways in the TLB set                        */
     int last_way;    /* Last used way used to allocate TLB in a LRU way      */
@@ -1176,8 +1176,6 @@ static inline CPUPPCState *cpu_init(const char *cpu_model)
 #define cpu_signal_handler cpu_ppc_signal_handler
 #define cpu_list ppc_cpu_list
 
-#define CPU_SAVE_VERSION 4
-
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _user
 #define MMU_MODE1_SUFFIX _kernel
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 2d10adb..1fcc6bc 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -1,96 +1,12 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "sysemu/kvm.h"
+#include "helper_regs.h"
 
-void cpu_save(QEMUFile *f, void *opaque)
+static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
 {
-    CPUPPCState *env = (CPUPPCState *)opaque;
-    unsigned int i, j;
-    uint32_t fpscr;
-    target_ulong xer;
-
-    for (i = 0; i < 32; i++)
-        qemu_put_betls(f, &env->gpr[i]);
-#if !defined(TARGET_PPC64)
-    for (i = 0; i < 32; i++)
-        qemu_put_betls(f, &env->gprh[i]);
-#endif
-    qemu_put_betls(f, &env->lr);
-    qemu_put_betls(f, &env->ctr);
-    for (i = 0; i < 8; i++)
-        qemu_put_be32s(f, &env->crf[i]);
-    xer = cpu_read_xer(env);
-    qemu_put_betls(f, &xer);
-    qemu_put_betls(f, &env->reserve_addr);
-    qemu_put_betls(f, &env->msr);
-    for (i = 0; i < 4; i++)
-        qemu_put_betls(f, &env->tgpr[i]);
-    for (i = 0; i < 32; i++) {
-        union {
-            float64 d;
-            uint64_t l;
-        } u;
-        u.d = env->fpr[i];
-        qemu_put_be64(f, u.l);
-    }
-    fpscr = env->fpscr;
-    qemu_put_be32s(f, &fpscr);
-    qemu_put_sbe32s(f, &env->access_type);
-#if defined(TARGET_PPC64)
-    qemu_put_betls(f, &env->spr[SPR_ASR]);
-    qemu_put_sbe32s(f, &env->slb_nr);
-#endif
-    qemu_put_betls(f, &env->spr[SPR_SDR1]);
-    for (i = 0; i < 32; i++)
-        qemu_put_betls(f, &env->sr[i]);
-    for (i = 0; i < 2; i++)
-        for (j = 0; j < 8; j++)
-            qemu_put_betls(f, &env->DBAT[i][j]);
-    for (i = 0; i < 2; i++)
-        for (j = 0; j < 8; j++)
-            qemu_put_betls(f, &env->IBAT[i][j]);
-    qemu_put_sbe32s(f, &env->nb_tlb);
-    qemu_put_sbe32s(f, &env->tlb_per_way);
-    qemu_put_sbe32s(f, &env->nb_ways);
-    qemu_put_sbe32s(f, &env->last_way);
-    qemu_put_sbe32s(f, &env->id_tlbs);
-    qemu_put_sbe32s(f, &env->nb_pids);
-    if (env->tlb.tlb6) {
-        // XXX assumes 6xx
-        for (i = 0; i < env->nb_tlb; i++) {
-            qemu_put_betls(f, &env->tlb.tlb6[i].pte0);
-            qemu_put_betls(f, &env->tlb.tlb6[i].pte1);
-            qemu_put_betls(f, &env->tlb.tlb6[i].EPN);
-        }
-    }
-    for (i = 0; i < 4; i++)
-        qemu_put_betls(f, &env->pb[i]);
-    for (i = 0; i < 1024; i++)
-        qemu_put_betls(f, &env->spr[i]);
-    qemu_put_be32s(f, &env->vscr);
-    qemu_put_be64s(f, &env->spe_acc);
-    qemu_put_be32s(f, &env->spe_fscr);
-    qemu_put_betls(f, &env->msr_mask);
-    qemu_put_be32s(f, &env->flags);
-    qemu_put_sbe32s(f, &env->error_code);
-    qemu_put_be32s(f, &env->pending_interrupts);
-    qemu_put_be32s(f, &env->irq_input_state);
-    for (i = 0; i < POWERPC_EXCP_NB; i++)
-        qemu_put_betls(f, &env->excp_vectors[i]);
-    qemu_put_betls(f, &env->excp_prefix);
-    qemu_put_betls(f, &env->ivor_mask);
-    qemu_put_betls(f, &env->ivpr_mask);
-    qemu_put_betls(f, &env->hreset_vector);
-    qemu_put_betls(f, &env->nip);
-    qemu_put_betls(f, &env->hflags);
-    qemu_put_betls(f, &env->hflags_nmsr);
-    qemu_put_sbe32s(f, &env->mmu_idx);
-    qemu_put_sbe32(f, 0);
-}
-
-int cpu_load(QEMUFile *f, void *opaque, int version_id)
-{
-    CPUPPCState *env = (CPUPPCState *)opaque;
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
     unsigned int i, j;
     target_ulong sdr1;
     uint32_t fpscr;
@@ -177,3 +93,444 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 
     return 0;
 }
+
+static int get_avr(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_avr_t *v = pv;
+
+    v->u64[0] = qemu_get_be64(f);
+    v->u64[1] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static void put_avr(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_avr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[0]);
+    qemu_put_be64(f, v->u64[1]);
+}
+
+const VMStateInfo vmstate_info_avr = {
+    .name = "avr",
+    .get  = get_avr,
+    .put  = put_avr,
+};
+
+#define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
+
+#define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
+
+static void cpu_pre_save(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+    int i;
+
+    env->spr[SPR_LR] = env->lr;
+    env->spr[SPR_CTR] = env->ctr;
+    env->spr[SPR_XER] = env->xer;
+#if defined(TARGET_PPC64)
+    env->spr[SPR_CFAR] = env->cfar;
+#endif
+    env->spr[SPR_BOOKE_SPEFSCR] = env->spe_fscr;
+
+    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
+        env->spr[SPR_DBAT0U + 2*i] = env->DBAT[0][i];
+        env->spr[SPR_DBAT0U + 2*i + 1] = env->DBAT[1][i];
+        env->spr[SPR_IBAT0U + 2*i] = env->IBAT[0][i];
+        env->spr[SPR_IBAT0U + 2*i + 1] = env->IBAT[1][i];
+    }
+    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
+        env->spr[SPR_DBAT4U + 2*i] = env->DBAT[0][i+4];
+        env->spr[SPR_DBAT4U + 2*i + 1] = env->DBAT[1][i+4];
+        env->spr[SPR_IBAT4U + 2*i] = env->IBAT[0][i+4];
+        env->spr[SPR_IBAT4U + 2*i + 1] = env->IBAT[1][i+4];
+    }
+}
+
+static int cpu_post_load(void *opaque, int version_id)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+    int i;
+
+    env->lr = env->spr[SPR_LR];
+    env->ctr = env->spr[SPR_CTR];
+    env->xer = env->spr[SPR_XER];
+#if defined(TARGET_PPC64)
+    env->cfar = env->spr[SPR_CFAR];
+#endif
+    env->spe_fscr = env->spr[SPR_BOOKE_SPEFSCR];
+
+    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
+        env->DBAT[0][i] = env->spr[SPR_DBAT0U + 2*i];
+        env->DBAT[1][i] = env->spr[SPR_DBAT0U + 2*i + 1];
+        env->IBAT[0][i] = env->spr[SPR_IBAT0U + 2*i];
+        env->IBAT[1][i] = env->spr[SPR_IBAT0U + 2*i + 1];
+    }
+    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
+        env->DBAT[0][i+4] = env->spr[SPR_DBAT4U + 2*i];
+        env->DBAT[1][i+4] = env->spr[SPR_DBAT4U + 2*i + 1];
+        env->IBAT[0][i+4] = env->spr[SPR_IBAT4U + 2*i];
+        env->IBAT[1][i+4] = env->spr[SPR_IBAT4U + 2*i + 1];
+    }
+
+    /* Restore htab_base and htab_mask variables */
+    ppc_store_sdr1(env, env->spr[SPR_SDR1]);
+
+    hreg_compute_hflags(env);
+    hreg_compute_mem_idx(env);
+
+    return 0;
+}
+
+static bool fpu_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    return (cpu->env.insns_flags & PPC_FLOAT);
+}
+
+static const VMStateDescription vmstate_fpu = {
+    .name = "cpu/fpu",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
+        VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool altivec_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    return (cpu->env.insns_flags & PPC_ALTIVEC);
+}
+
+static const VMStateDescription vmstate_altivec = {
+    .name = "cpu/altivec",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
+        VMSTATE_UINT32(env.vscr, PowerPCCPU),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool vsx_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    return (cpu->env.insns_flags2 & PPC2_VSX);
+}
+
+static const VMStateDescription vmstate_vsx = {
+    .name = "cpu/vsx",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool sr_needed(void *opaque)
+{
+#ifdef TARGET_PPC64
+    PowerPCCPU *cpu = opaque;
+
+    return !(cpu->env.mmu_model & POWERPC_MMU_64);
+#else
+    return true;
+#endif
+}
+
+static const VMStateDescription vmstate_sr = {
+    .name = "cpu/sr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINTTL_ARRAY(env.sr, PowerPCCPU, 32),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+#ifdef TARGET_PPC64
+static int get_slbe(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_slb_t *v = pv;
+
+    v->esid = qemu_get_be64(f);
+    v->vsid = qemu_get_be64(f);
+
+    return 0;
+}
+
+static void put_slbe(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_slb_t *v = pv;
+
+    qemu_put_be64(f, v->esid);
+    qemu_put_be64(f, v->vsid);
+}
+
+const VMStateInfo vmstate_info_slbe = {
+    .name = "slbe",
+    .get  = get_slbe,
+    .put  = put_slbe,
+};
+
+#define VMSTATE_SLB_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_slbe, ppc_slb_t)
+
+#define VMSTATE_SLB_ARRAY(_f, _s, _n)                             \
+    VMSTATE_SLB_ARRAY_V(_f, _s, _n, 0)
+
+static bool slb_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    /* We don't support any of the old segment table based 64-bit CPUs */
+    return (cpu->env.mmu_model & POWERPC_MMU_64);
+}
+
+static const VMStateDescription vmstate_slb = {
+    .name = "cpu/slb",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.slb_nr, PowerPCCPU),
+        VMSTATE_SLB_ARRAY(env.slb, PowerPCCPU, 64),
+        VMSTATE_END_OF_LIST()
+    }
+};
+#endif /* TARGET_PPC64 */
+
+static const VMStateDescription vmstate_tlb6xx_entry = {
+    .name = "cpu/tlb6xx_entry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINTTL(pte0, ppc6xx_tlb_t),
+        VMSTATE_UINTTL(pte1, ppc6xx_tlb_t),
+        VMSTATE_UINTTL(EPN, ppc6xx_tlb_t),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool tlb6xx_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+
+    return env->nb_tlb && (env->tlb_type == TLB_6XX);
+}
+
+static const VMStateDescription vmstate_tlb6xx = {
+    .name = "cpu/tlb6xx",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
+        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlb6, PowerPCCPU,
+                                            env.nb_tlb,
+                                            vmstate_tlb6xx_entry,
+                                            ppc6xx_tlb_t),
+        VMSTATE_UINTTL_ARRAY(env.tgpr, PowerPCCPU, 4),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_tlbemb_entry = {
+    .name = "cpu/tlbemb_entry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64(RPN, ppcemb_tlb_t),
+        VMSTATE_UINTTL(EPN, ppcemb_tlb_t),
+        VMSTATE_UINTTL(PID, ppcemb_tlb_t),
+        VMSTATE_UINTTL(size, ppcemb_tlb_t),
+        VMSTATE_UINT32(prot, ppcemb_tlb_t),
+        VMSTATE_UINT32(attr, ppcemb_tlb_t),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool tlbemb_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+
+    return env->nb_tlb && (env->tlb_type == TLB_EMB);
+}
+
+static bool pbr403_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    uint32_t pvr = cpu->env.spr[SPR_PVR];
+
+    return (pvr & 0xffff0000) == 0x00200000;
+}
+
+static const VMStateDescription vmstate_pbr403 = {
+    .name = "cpu/pbr403",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINTTL_ARRAY(env.pb, PowerPCCPU, 4),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_tlbemb = {
+    .name = "cpu/tlbemb",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
+        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbe, PowerPCCPU,
+                                            env.nb_tlb,
+                                            vmstate_tlbemb_entry,
+                                            ppcemb_tlb_t),
+        /* 403 protection registers */
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (VMStateSubsection []) {
+        {
+            .vmsd = &vmstate_pbr403,
+            .needed = pbr403_needed,
+        } , {
+            /* empty */
+        }
+    }
+};
+
+static const VMStateDescription vmstate_tlbmas_entry = {
+    .name = "cpu/tlbmas_entry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(mas8, ppcmas_tlb_t),
+        VMSTATE_UINT32(mas1, ppcmas_tlb_t),
+        VMSTATE_UINT64(mas2, ppcmas_tlb_t),
+        VMSTATE_UINT64(mas7_3, ppcmas_tlb_t),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool tlbmas_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+
+    return env->nb_tlb && (env->tlb_type == TLB_MAS);
+}
+
+static const VMStateDescription vmstate_tlbmas = {
+    .name = "cpu/tlbmas",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
+        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbm, PowerPCCPU,
+                                            env.nb_tlb,
+                                            vmstate_tlbmas_entry,
+                                            ppcmas_tlb_t),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+const VMStateDescription vmstate_ppc_cpu = {
+    .name = "cpu",
+    .version_id = 5,
+    .minimum_version_id = 5,
+    .minimum_version_id_old = 4,
+    .load_state_old = cpu_load_old,
+    .pre_save = cpu_pre_save,
+    .post_load = cpu_post_load,
+    .fields      = (VMStateField []) {
+        /* Verify we haven't changed the pvr */
+        VMSTATE_UINTTL_EQUAL(env.spr[SPR_PVR], PowerPCCPU),
+
+        /* User mode architected state */
+        VMSTATE_UINTTL_ARRAY(env.gpr, PowerPCCPU, 32),
+#if !defined(TARGET_PPC64)
+        VMSTATE_UINTTL_ARRAY(env.gprh, PowerPCCPU, 32),
+#endif
+        VMSTATE_UINT32_ARRAY(env.crf, PowerPCCPU, 8),
+        VMSTATE_UINTTL(env.nip, PowerPCCPU),
+
+        /* SPRs */
+        VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
+        VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
+
+        /* Reservation */
+        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+
+        /* Supervisor mode architected state */
+        VMSTATE_UINTTL(env.msr, PowerPCCPU),
+
+        /* Internal state */
+        VMSTATE_UINTTL(env.hflags_nmsr, PowerPCCPU),
+        /* FIXME: access_type? */
+
+        /* Sanity checking */
+        VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
+        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
+        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
+        VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (VMStateSubsection []) {
+        {
+            .vmsd = &vmstate_fpu,
+            .needed = fpu_needed,
+        } , {
+            .vmsd = &vmstate_altivec,
+            .needed = altivec_needed,
+        } , {
+            .vmsd = &vmstate_vsx,
+            .needed = vsx_needed,
+        } , {
+            .vmsd = &vmstate_sr,
+            .needed = sr_needed,
+        } , {
+#ifdef TARGET_PPC64
+            .vmsd = &vmstate_slb,
+            .needed = slb_needed,
+        } , {
+#endif /* TARGET_PPC64 */
+            .vmsd = &vmstate_tlb6xx,
+            .needed = tlb6xx_needed,
+        } , {
+            .vmsd = &vmstate_tlbemb,
+            .needed = tlbemb_needed,
+        } , {
+            .vmsd = &vmstate_tlbmas,
+            .needed = tlbmas_needed,
+        } , {
+            /* FIXME: DCRs? */
+            /* FIXME: timebase? */
+            /* empty */
+        }
+    }
+};
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index d8758d5..95aebf7 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8295,6 +8295,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
 
     cc->class_by_name = ppc_cpu_class_by_name;
     cc->do_interrupt = ppc_cpu_do_interrupt;
+
+    cpu_class_set_vmsd(cc, &vmstate_ppc_cpu);
 }
 
 static const TypeInfo ppc_cpu_type_info = {
-- 
1.7.10.4
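
A note on the subsections table in the patch above: each optional block
of CPU state is paired with a "needed" predicate, and the table ends
with an empty entry. The following is only a minimal sketch of that
pattern. ToyCPU, ToySubsection, TOY_ALTIVEC and the toy_* functions are
hypothetical names made up for illustration and are not QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are QEMU types. */
typedef struct ToyCPU {
    uint64_t insns_flags;   /* feature bits, in the spirit of env.insns_flags */
    int nb_tlb;             /* number of software TLB entries */
} ToyCPU;

#define TOY_ALTIVEC (1ULL << 0)

typedef struct ToySubsection {
    const char *name;                   /* NULL marks the end of the table */
    bool (*needed)(const ToyCPU *cpu);  /* decides whether to emit the block */
} ToySubsection;

static bool toy_altivec_needed(const ToyCPU *cpu)
{
    return (cpu->insns_flags & TOY_ALTIVEC) != 0;
}

static bool toy_tlb_needed(const ToyCPU *cpu)
{
    return cpu->nb_tlb > 0;
}

static const ToySubsection toy_subsections[] = {
    { "cpu/altivec", toy_altivec_needed },
    { "cpu/tlb",     toy_tlb_needed },
    { NULL, NULL }  /* sentinel entry */
};

/* Count the subsections that would be written for this CPU. */
static size_t toy_count_needed(const ToyCPU *cpu)
{
    size_t n = 0;

    assert(cpu != NULL);
    for (const ToySubsection *s = toy_subsections; s->name; s++) {
        if (s->needed(cpu)) {
            n++;
        }
    }
    return n;
}
```

A CPU without a given feature emits nothing for it, so the stream only
carries state that the configured CPU model actually has.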

* [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:31   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 06/17] pseries: savevm support for VIO devices Alexey Kardashevskiy
                   ` (14 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary VMStateDescription information to support
savevm/loadvm for the XICS interrupt controller used on the pseries
machine.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[aik: added icp_resend() on post_load]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
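
For readers new to the save/load pattern described above, here is a
minimal sketch of the idea: write a count, then each entry field by
field, and on load refuse a stream whose count differs from the
destination's configuration (the role the _EQUAL field plays in the
patch). ToyIrq, ToyIcs and the toy_* helpers are hypothetical and are
not the QEMU structures or the VMState API; the byte layout is an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not the QEMU structures. */
typedef struct ToyIrq {
    uint32_t server;
    uint8_t priority;
    uint8_t status;
} ToyIrq;

typedef struct ToyIcs {
    uint32_t nr_irqs;
    ToyIrq *irqs;
} ToyIcs;

/* Store a 32-bit value big-endian so the stream is host independent. */
static size_t toy_put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return 4;
}

static uint32_t toy_get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Save: the count first, then every entry. Returns bytes written. */
static size_t toy_ics_save(const ToyIcs *ics, uint8_t *buf)
{
    size_t off = 0;

    assert(ics != NULL && buf != NULL);
    off += toy_put_be32(buf + off, ics->nr_irqs);
    for (uint32_t i = 0; i < ics->nr_irqs; i++) {
        off += toy_put_be32(buf + off, ics->irqs[i].server);
        buf[off++] = ics->irqs[i].priority;
        buf[off++] = ics->irqs[i].status;
    }
    return off;
}

/* Load: returns 0 on success, -1 if the saved count does not match. */
static int toy_ics_load(ToyIcs *ics, const uint8_t *buf)
{
    size_t off = 4;

    assert(ics != NULL && buf != NULL);
    if (toy_get_be32(buf) != ics->nr_irqs) {
        return -1;  /* sanity check failed: differently configured target */
    }
    for (uint32_t i = 0; i < ics->nr_irqs; i++) {
        ics->irqs[i].server = toy_get_be32(buf + off);
        off += 4;
        ics->irqs[i].priority = buf[off++];
        ics->irqs[i].status = buf[off++];
    }
    return 0;
}
```

The count check is what lets the loader trust the array length without
allocating anything based on untrusted stream contents.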
 hw/intc/xics.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 0e374c8..3e8f48f 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -497,6 +497,61 @@ static void xics_reset(DeviceState *d)
     xics_common_reset(XICS(d));
 }
 
+static int ics_post_load(void *opaque, int version_id)
+{
+    int i;
+    struct ics_state *ics = opaque;
+
+    for (i = 0; i < ics->icp->nr_servers; i++) {
+        icp_resend(ics->icp, i);
+    }
+
+    return 0;
+}
+
+const VMStateDescription vmstate_icp_server = {
+    .name = "icp/server",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        /* ICP state */
+        VMSTATE_UINT32(xirr, struct icp_server_state),
+        VMSTATE_UINT8(pending_priority, struct icp_server_state),
+        VMSTATE_UINT8(mfrr, struct icp_server_state),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_ics_irq = {
+    .name = "ics/irq",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(server, struct ics_irq_state),
+        VMSTATE_UINT8(priority, struct ics_irq_state),
+        VMSTATE_UINT8(saved_priority, struct ics_irq_state),
+        VMSTATE_UINT8(status, struct ics_irq_state),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+const VMStateDescription vmstate_ics = {
+    .name = "ics",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .post_load = ics_post_load,
+    .fields      = (VMStateField []) {
+        /* Sanity check */
+        VMSTATE_UINT32_EQUAL(nr_irqs, struct ics_state),
+
+        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(irqs, struct ics_state, nr_irqs, vmstate_ics_irq, struct ics_irq_state),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 {
     CPUState *cs = CPU(cpu);
@@ -523,7 +578,11 @@ void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 
 void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 {
+    CPUState *cs = CPU(cpu);
+    struct icp_server_state *ss = &icp->ss[cs->cpu_index];
+
     xics_common_cpu_setup(icp, cpu);
+    vmstate_register(NULL, cs->cpu_index, &vmstate_icp_server, ss);
 }
 
 void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
@@ -555,6 +614,10 @@ static void xics_realize(DeviceState *dev, Error **errp)
     spapr_rtas_register("ibm,int-off", rtas_int_off);
     spapr_rtas_register("ibm,int-on", rtas_int_on);
 
+    /* We use each ICS's offset into the global irq number space
+     * as an instance id.  This means we can extend to multiple ICS
+     * instances without needing to change the savevm format */
+    vmstate_register(NULL, icp->ics->offset, &vmstate_ics, icp->ics);
 }
 
 static Property xics_properties[] = {
-- 
1.7.10.4

* [Qemu-devel] [PATCH 06/17] pseries: savevm support for VIO devices
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:35   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 07/17] pseries: savevm support for PAPR VIO logical lan Alexey Kardashevskiy
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds helpers to allow PAPR VIO devices to save state common
to all VIO devices during savevm.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
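
To illustrate how state common to all devices can be described once and
reused by each device, here is a minimal sketch of a field table that
nests another field table at the offset of the embedded structure.
ToyVioDev, ToyVty, ToyField and toy_save() are hypothetical names for
this example only. Unlike a real migration format, the sketch copies
raw host-endian bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy types; these are not the QEMU structures or macros. */
typedef struct ToyVioDev {
    uint64_t signal_state;
    uint32_t qsize;
} ToyVioDev;

typedef struct ToyVty {
    ToyVioDev sdev;     /* common state embedded in the device */
    uint32_t in;
    uint32_t out;
} ToyVty;

typedef struct ToyField {
    size_t offset;                  /* offset of the field in its struct */
    size_t size;                    /* bytes to copy; 0 for a nested struct */
    const struct ToyField *nested;  /* field table of the nested struct */
} ToyField;

#define TOY_END { 0, 0, NULL }

/* Described once, shared by every device that embeds ToyVioDev. */
static const ToyField toy_vio_fields[] = {
    { offsetof(ToyVioDev, signal_state), sizeof(uint64_t), NULL },
    { offsetof(ToyVioDev, qsize),        sizeof(uint32_t), NULL },
    TOY_END
};

static const ToyField toy_vty_fields[] = {
    { offsetof(ToyVty, sdev), 0,                toy_vio_fields },
    { offsetof(ToyVty, in),   sizeof(uint32_t), NULL },
    { offsetof(ToyVty, out),  sizeof(uint32_t), NULL },
    TOY_END
};

/* Walk a field table, recursing into nested tables. Returns bytes written. */
static size_t toy_save(const ToyField *f, const void *base, uint8_t *out)
{
    size_t n = 0;

    assert(f != NULL && base != NULL && out != NULL);
    for (; f->size || f->nested; f++) {
        const uint8_t *p = (const uint8_t *)base + f->offset;

        if (f->nested) {
            n += toy_save(f->nested, p, out + n);
        } else {
            memcpy(out + n, p, f->size);
            n += f->size;
        }
    }
    return n;
}
```

Calling toy_save(toy_vty_fields, &vty, buf) emits the common fields
first and then the device-specific ones, in table order.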
 hw/ppc/spapr_vio.c         |   20 ++++++++++++++++++++
 include/hw/ppc/spapr_vio.h |    5 +++++
 2 files changed, 25 insertions(+)

diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 9c18741..565d883 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -542,6 +542,26 @@ static const TypeInfo spapr_vio_bridge_info = {
     .class_init    = spapr_vio_bridge_class_init,
 };
 
+const VMStateDescription vmstate_spapr_vio = {
+    .name = "spapr_vio",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        /* Sanity check */
+        VMSTATE_UINT32_EQUAL(reg, VIOsPAPRDevice),
+        VMSTATE_UINT32_EQUAL(irq, VIOsPAPRDevice),
+
+        /* General VIO device state */
+        VMSTATE_UINTTL(signal_state, VIOsPAPRDevice),
+        VMSTATE_UINT64(crq.qladdr, VIOsPAPRDevice),
+        VMSTATE_UINT32(crq.qsize, VIOsPAPRDevice),
+        VMSTATE_UINT32(crq.qnext, VIOsPAPRDevice),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void vio_spapr_device_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *k = DEVICE_CLASS(klass);
diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index 3609327..46edc2a 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -134,4 +134,9 @@ VIOsPAPRDevice *spapr_vty_get_default(VIOsPAPRBus *bus);
 
 void spapr_vio_quiesce(void);
 
+extern const VMStateDescription vmstate_spapr_vio;
+
+#define VMSTATE_SPAPR_VIO(_f, _s) \
+    VMSTATE_STRUCT(_f, _s, 0, vmstate_spapr_vio, VIOsPAPRDevice)
+
 #endif /* _HW_SPAPR_VIO_H */
-- 
1.7.10.4

* [Qemu-devel] [PATCH 07/17] pseries: savevm support for PAPR VIO logical lan
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 06/17] pseries: savevm support for VIO devices Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:36   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables Alexey Kardashevskiy
                   ` (12 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary VMStateDescription information to support
savevm/loadvm for the spapr_llan (PAPR logical lan) device, and likewise
for the spapr_vty device.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/char/spapr_vty.c |   16 ++++++++++++++++
 hw/net/spapr_llan.c |   24 ++++++++++++++++++++++--
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
index 2993848..a799721 100644
--- a/hw/char/spapr_vty.c
+++ b/hw/char/spapr_vty.c
@@ -142,6 +142,21 @@ static Property spapr_vty_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_vty = {
+    .name = "spapr_vty",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVTYDevice),
+
+        VMSTATE_UINT32(in, VIOsPAPRVTYDevice),
+        VMSTATE_UINT32(out, VIOsPAPRVTYDevice),
+        VMSTATE_BUFFER(buf, VIOsPAPRVTYDevice),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vty_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -152,6 +167,7 @@ static void spapr_vty_class_init(ObjectClass *klass, void *data)
     k->dt_type = "serial";
     k->dt_compatible = "hvterm1";
     dc->props = spapr_vty_properties;
+    dc->vmsd = &vmstate_spapr_vty;
 }
 
 static const TypeInfo spapr_vty_info = {
diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
index 03a09f2..46f7d5f 100644
--- a/hw/net/spapr_llan.c
+++ b/hw/net/spapr_llan.c
@@ -81,9 +81,9 @@ typedef struct VIOsPAPRVLANDevice {
     VIOsPAPRDevice sdev;
     NICConf nicconf;
     NICState *nic;
-    int isopen;
+    bool isopen;
     target_ulong buf_list;
-    int add_buf_ptr, use_buf_ptr, rx_bufs;
+    uint32_t add_buf_ptr, use_buf_ptr, rx_bufs;
     target_ulong rxq_ptr;
 } VIOsPAPRVLANDevice;
 
@@ -500,6 +500,25 @@ static Property spapr_vlan_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_llan = {
+    .name = "spapr_llan",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVLANDevice),
+        /* LLAN state */
+        VMSTATE_BOOL(isopen, VIOsPAPRVLANDevice),
+        VMSTATE_UINTTL(buf_list, VIOsPAPRVLANDevice),
+        VMSTATE_UINT32(add_buf_ptr, VIOsPAPRVLANDevice),
+        VMSTATE_UINT32(use_buf_ptr, VIOsPAPRVLANDevice),
+        VMSTATE_UINT32(rx_bufs, VIOsPAPRVLANDevice),
+        VMSTATE_UINTTL(rxq_ptr, VIOsPAPRVLANDevice),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vlan_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -514,6 +533,7 @@ static void spapr_vlan_class_init(ObjectClass *klass, void *data)
     k->signal_mask = 0x1;
     dc->props = spapr_vlan_properties;
     k->rtce_window_size = 0x10000000;
+    dc->vmsd = &vmstate_spapr_llan;
 }
 
 static const TypeInfo spapr_vlan_info = {
-- 
1.7.10.4

* [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (6 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 07/17] pseries: savevm support for PAPR VIO logical lan Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:39   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 09/17] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary VMStateDescription information to save the
state of PAPR TCE tables (that is, the PAPR specified IOMMU).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
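
As a worked example of the buffer size implied by the divide argument
in the patch below: the table has one entry per IOMMU page of the DMA
window, so its size in bytes is window_size divided by
(page size / entry size). The sketch assumes 4 KiB IOMMU pages and
8-byte entries; both values, and the toy_* names, are assumptions made
for illustration and should be checked against the real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration: 4 KiB IOMMU pages, 8-byte entries. */
#define TOY_TCE_PAGE_SIZE 4096u

typedef struct ToyTCE {
    uint64_t tce;
} ToyTCE;

/* Number of table entries: one per IOMMU page in the DMA window. */
static size_t toy_tce_entries(size_t window_size)
{
    return window_size / TOY_TCE_PAGE_SIZE;
}

/* Table size in bytes: window_size / (page size / entry size). */
static size_t toy_tce_table_bytes(size_t window_size)
{
    size_t divisor = TOY_TCE_PAGE_SIZE / sizeof(ToyTCE);

    assert(window_size % TOY_TCE_PAGE_SIZE == 0);
    return window_size / divisor;
}
```

Under these assumptions a 0x10000000-byte window has 65536 entries and
a 524288-byte table, which is the amount of data the buffer field
would carry.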
 hw/ppc/spapr_iommu.c |   23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 91bc8e4..ba1f7b6 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
     };
 }
 
+static const VMStateDescription vmstate_spapr_tce_table = {
+    .name = "spapr_iommu",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        /* Sanity check */
+        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
+        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
+
+        /* IOMMU state */
+        VMSTATE_BOOL(bypass, sPAPRTCETable),
+        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
+                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
     .translate = spapr_tce_translate_iommu,
 };
@@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
 
     QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
 
+    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
+
     return tcet;
 }
 
@@ -163,6 +184,8 @@ void spapr_tce_free(sPAPRTCETable *tcet)
 {
     QLIST_REMOVE(tcet, list);
 
+    vmstate_unregister(NULL, &vmstate_spapr_tce_table, tcet);
+
     if (!kvm_enabled() ||
         (kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
                                  tcet->window_size) != 0)) {
-- 
1.7.10.4

* [Qemu-devel] [PATCH 09/17] pseries: rework PAPR virtual SCSI
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (7 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:42   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 10/17] pseries: savevm support for " Alexey Kardashevskiy
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

The patch reimplements handling of indirect requests in order to
simplify upcoming live migration support:
- all pointers (except SCSIRequest*) were replaced with integer
indexes and offsets;
- the DMA'ed srp_direct_buf is kept untouched (i.e. in BE format);
- vscsi_fetch_desc() is added; it is now the only place where
descriptors are fetched and byteswapped;
- vscsi_req struct fields are converted to migration-friendly types;
- many dprintf()'s are fixed.

This also removes the 'lun' field from the spapr_vscsi device, which
is assigned but never used.

[David Gibson: removed unused 'lun']
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: David Gibson <david@gibson.dropbear.id.au>
---
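
The index-and-offset scheme described above can be sketched as follows:
the descriptor list stays in its big-endian wire form, a single fetch
function decodes descriptor n and applies the byte offset, and the only
state carried between calls is two integers. This is a simplified
illustration, not the code in this patch: ToyDesc, ToyCursor and the
toy_* functions are hypothetical, and the 12-byte layout (8-byte
address, 4-byte length) is an assumption that omits fields of the real
srp_direct_buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DESC_BYTES 12   /* assumed wire size: 8-byte va + 4-byte len */

typedef struct ToyDesc {
    uint64_t va;
    uint32_t len;
} ToyDesc;

/* All the state that has to survive between calls (and a migration). */
typedef struct ToyCursor {
    uint16_t cur_desc_num;      /* index of the descriptor in use */
    uint32_t cur_desc_offset;   /* bytes already consumed from it */
} ToyCursor;

static uint64_t toy_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint32_t toy_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode descriptor n and skip 'offset' bytes of it.
 * Returns 1 if data remains, 0 if there is nothing left, -1 on error. */
static int toy_fetch_desc(const uint8_t *raw, unsigned total, unsigned n,
                          uint32_t offset, ToyDesc *ret)
{
    const uint8_t *p;

    assert(raw != NULL && ret != NULL);
    if (n >= total) {
        return 0;               /* out of descriptors */
    }
    p = raw + (size_t)n * TOY_DESC_BYTES;
    ret->va = toy_be64(p);
    ret->len = toy_be32(p + 8);
    if (offset > ret->len) {
        return -1;              /* offset past the end of the descriptor */
    }
    ret->va += offset;
    ret->len -= offset;
    return ret->len ? 1 : 0;
}

/* Consume up to len bytes, advancing the cursor. Returns bytes consumed. */
static uint32_t toy_consume(const uint8_t *raw, unsigned total,
                            ToyCursor *c, uint32_t len)
{
    uint32_t done = 0;
    ToyDesc d;

    while (len && toy_fetch_desc(raw, total, c->cur_desc_num,
                                 c->cur_desc_offset, &d) == 1) {
        uint32_t chunk = len < d.len ? len : d.len;

        len -= chunk;
        done += chunk;
        c->cur_desc_offset += chunk;
        if (chunk == d.len) {   /* descriptor finished, move to the next */
            c->cur_desc_num++;
            c->cur_desc_offset = 0;
        }
    }
    return done;
}
```

Because the raw list is never modified, re-running the fetch after a
restore yields the same descriptor, which is why only the two cursor
integers need to be saved.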
 hw/scsi/spapr_vscsi.c |  224 +++++++++++++++++++++++++++++--------------------
 1 file changed, 131 insertions(+), 93 deletions(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index e8978bf..1e93102 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -75,20 +75,19 @@ typedef struct vscsi_req {
     /* SCSI request tracking */
     SCSIRequest             *sreq;
     uint32_t                qtag; /* qemu tag != srp tag */
-    int                     lun;
-    int                     active;
-    long                    data_len;
-    int                     writing;
-    int                     senselen;
+    bool                    active;
+    uint32_t                data_len;
+    bool                    writing;
+    uint32_t                senselen;
     uint8_t                 sense[SCSI_SENSE_BUF_SIZE];
 
     /* RDMA related bits */
     uint8_t                 dma_fmt;
-    struct srp_direct_buf   ext_desc;
-    struct srp_direct_buf   *cur_desc;
-    struct srp_indirect_buf *ind_desc;
-    int                     local_desc;
-    int                     total_desc;
+    uint16_t                local_desc;
+    uint16_t                total_desc;
+    uint16_t                cdb_offset;
+    uint16_t                cur_desc_num;
+    uint16_t                cur_desc_offset;
 } vscsi_req;
 
 #define TYPE_VIO_SPAPR_VSCSI_DEVICE "spapr-vscsi"
@@ -264,93 +263,139 @@ static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req,
     return 0;
 }
 
-static inline void vscsi_swap_desc(struct srp_direct_buf *desc)
+static inline struct srp_direct_buf vscsi_swap_desc(struct srp_direct_buf desc)
 {
-    desc->va = be64_to_cpu(desc->va);
-    desc->len = be32_to_cpu(desc->len);
+    desc.va = be64_to_cpu(desc.va);
+    desc.len = be32_to_cpu(desc.len);
+    return desc;
+}
+
+static int vscsi_fetch_desc(VSCSIState *s, struct vscsi_req *req,
+                            unsigned n, unsigned buf_offset,
+                            struct srp_direct_buf *ret)
+{
+    struct srp_cmd *cmd = &req->iu.srp.cmd;
+
+    switch (req->dma_fmt) {
+    case SRP_NO_DATA_DESC: {
+        dprintf("VSCSI: no data descriptor\n");
+        return 0;
+    }
+    case SRP_DATA_DESC_DIRECT: {
+        *ret = *(struct srp_direct_buf *)(cmd->add_data + req->cdb_offset);
+        assert(req->cur_desc_num == 0);
+        dprintf("VSCSI: direct segment");
+        break;
+    }
+    case SRP_DATA_DESC_INDIRECT: {
+        struct srp_indirect_buf *tmp = (struct srp_indirect_buf *)
+                                       (cmd->add_data + req->cdb_offset);
+        if (n < req->local_desc) {
+            *ret = tmp->desc_list[n];
+            dprintf("VSCSI: indirect segment local tag=0x%x desc#%d/%d",
+                    req->qtag, n, req->local_desc);
+
+        } else if (n < req->total_desc) {
+            int rc;
+            struct srp_direct_buf tbl_desc = vscsi_swap_desc(tmp->table_desc);
+            unsigned desc_offset = (n - req->local_desc) *
+                                    sizeof(struct srp_direct_buf);
+
+            if (desc_offset >= tbl_desc.len) {
+                dprintf("VSCSI:   #%d is out of range (%d bytes)\n",
+                        n, desc_offset);
+                return -1;
+            }
+            rc = spapr_vio_dma_read(&s->vdev, tbl_desc.va + desc_offset,
+                                    ret, sizeof(struct srp_direct_buf));
+            if (rc) {
+                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
+                        rc);
+                return rc;
+            }
+            dprintf("VSCSI: indirect segment ext. tag=0x%x desc#%d/%d { va=%"PRIx64" len=%x }",
+                    req->qtag, n, req->total_desc, tbl_desc.va, tbl_desc.len);
+        } else {
+            dprintf("VSCSI:   Out of descriptors !\n");
+            return 0;
+        }
+        break;
+    }
+    default:
+        fprintf(stderr, "VSCSI:   Unknown format %x\n", req->dma_fmt);
+        return -1;
+    }
+
+    *ret = vscsi_swap_desc(*ret);
+    if (buf_offset > ret->len) {
+        dprintf("   offset=%x is out of a descriptor #%d boundary=%x\n",
+                buf_offset, req->cur_desc_num, ret->len);
+        return -1;
+    }
+    ret->va += buf_offset;
+    ret->len -= buf_offset;
+
+    dprintf("   cur=%d offs=%x ret { va=%"PRIx64" len=%x }\n",
+            req->cur_desc_num, req->cur_desc_offset, ret->va, ret->len);
+
+    return ret->len ? 1 : 0;
 }
 
 static int vscsi_srp_direct_data(VSCSIState *s, vscsi_req *req,
                                  uint8_t *buf, uint32_t len)
 {
-    struct srp_direct_buf *md = req->cur_desc;
+    struct srp_direct_buf md;
     uint32_t llen;
     int rc = 0;
 
-    dprintf("VSCSI: direct segment 0x%x bytes, va=0x%llx desc len=0x%x\n",
-            len, (unsigned long long)md->va, md->len);
+    rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
+    if (rc < 0) {
+        return -1;
+    } else if (rc == 0) {
+        return 0;
+    }
 
-    llen = MIN(len, md->len);
+    llen = MIN(len, md.len);
     if (llen) {
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
         } else {
-            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
         }
     }
-    md->len -= llen;
-    md->va += llen;
 
     if (rc) {
         return -1;
     }
+    req->cur_desc_offset += llen;
+
     return llen;
 }
 
 static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
                                    uint8_t *buf, uint32_t len)
 {
-    struct srp_direct_buf *td = &req->ind_desc->table_desc;
-    struct srp_direct_buf *md = req->cur_desc;
+    struct srp_direct_buf md;
     int rc = 0;
     uint32_t llen, total = 0;
 
-    dprintf("VSCSI: indirect segment 0x%x bytes, td va=0x%llx len=0x%x\n",
-            len, (unsigned long long)td->va, td->len);
+    dprintf("VSCSI: indirect segment 0x%x bytes\n", len);
 
     /* While we have data ... */
     while (len) {
-        /* If we have a descriptor but it's empty, go fetch a new one */
-        if (md && md->len == 0) {
-            /* More local available, use one */
-            if (req->local_desc) {
-                md = ++req->cur_desc;
-                --req->local_desc;
-                --req->total_desc;
-                td->va += sizeof(struct srp_direct_buf);
-            } else {
-                md = req->cur_desc = NULL;
-            }
-        }
-        /* No descriptor at hand, fetch one */
-        if (!md) {
-            if (!req->total_desc) {
-                dprintf("VSCSI:   Out of descriptors !\n");
-                break;
-            }
-            md = req->cur_desc = &req->ext_desc;
-            dprintf("VSCSI:   Reading desc from 0x%llx\n",
-                    (unsigned long long)td->va);
-            rc = spapr_vio_dma_read(&s->vdev, td->va, md,
-                                    sizeof(struct srp_direct_buf));
-            if (rc) {
-                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
-                        rc);
-                break;
-            }
-            vscsi_swap_desc(md);
-            td->va += sizeof(struct srp_direct_buf);
-            --req->total_desc;
+        rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
+        if (rc < 0) {
+            return -1;
+        } else if (rc == 0) {
+            break;
         }
-        dprintf("VSCSI:   [desc va=0x%llx,len=0x%x] remaining=0x%x\n",
-                (unsigned long long)md->va, md->len, len);
 
         /* Perform transfer */
-        llen = MIN(len, md->len);
+        llen = MIN(len, md.len);
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
         } else {
-            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
         }
         if (rc) {
             dprintf("VSCSI: spapr_vio_dma_r/w(%d) -> %d\n", req->writing, rc);
@@ -361,10 +406,18 @@ static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
 
         len -= llen;
         buf += llen;
+
         total += llen;
-        md->va += llen;
-        md->len -= llen;
+
+        /* Update current position in the current descriptor */
+        req->cur_desc_offset += llen;
+        if (md.len == llen) {
+            /* Go to the next descriptor if the current one finished */
+            ++req->cur_desc_num;
+            req->cur_desc_offset = 0;
+        }
     }
+
     return rc ? -1 : total;
 }
 
@@ -412,14 +465,13 @@ static int data_out_desc_size(struct srp_cmd *cmd)
 static int vscsi_preprocess_desc(vscsi_req *req)
 {
     struct srp_cmd *cmd = &req->iu.srp.cmd;
-    int offset, i;
 
-    offset = cmd->add_cdb_len & ~3;
+    req->cdb_offset = cmd->add_cdb_len & ~3;
 
     if (req->writing) {
         req->dma_fmt = cmd->buf_fmt >> 4;
     } else {
-        offset += data_out_desc_size(cmd);
+        req->cdb_offset += data_out_desc_size(cmd);
         req->dma_fmt = cmd->buf_fmt & ((1U << 4) - 1);
     }
 
@@ -427,31 +479,18 @@ static int vscsi_preprocess_desc(vscsi_req *req)
     case SRP_NO_DATA_DESC:
         break;
     case SRP_DATA_DESC_DIRECT:
-        req->cur_desc = (struct srp_direct_buf *)(cmd->add_data + offset);
         req->total_desc = req->local_desc = 1;
-        vscsi_swap_desc(req->cur_desc);
-        dprintf("VSCSI: using direct RDMA %s, 0x%x bytes MD: 0x%llx\n",
-                req->writing ? "write" : "read",
-                req->cur_desc->len, (unsigned long long)req->cur_desc->va);
         break;
-    case SRP_DATA_DESC_INDIRECT:
-        req->ind_desc = (struct srp_indirect_buf *)(cmd->add_data + offset);
-        vscsi_swap_desc(&req->ind_desc->table_desc);
-        req->total_desc = req->ind_desc->table_desc.len /
-            sizeof(struct srp_direct_buf);
+    case SRP_DATA_DESC_INDIRECT: {
+        struct srp_indirect_buf *ind_tmp = (struct srp_indirect_buf *)
+                (cmd->add_data + req->cdb_offset);
+
+        req->total_desc = be32_to_cpu(ind_tmp->table_desc.len) /
+                          sizeof(struct srp_direct_buf);
         req->local_desc = req->writing ? cmd->data_out_desc_cnt :
-            cmd->data_in_desc_cnt;
-        for (i = 0; i < req->local_desc; i++) {
-            vscsi_swap_desc(&req->ind_desc->desc_list[i]);
-        }
-        req->cur_desc = req->local_desc ? &req->ind_desc->desc_list[0] : NULL;
-        dprintf("VSCSI: using indirect RDMA %s, 0x%x bytes %d descs "
-                "(%d local) VA: 0x%llx\n",
-                req->writing ? "read" : "write",
-                be32_to_cpu(req->ind_desc->len),
-                req->total_desc, req->local_desc,
-                (unsigned long long)req->ind_desc->table_desc.va);
+                          cmd->data_in_desc_cnt;
         break;
+    }
     default:
         fprintf(stderr,
                 "vscsi_preprocess_desc: Unknown format %x\n", req->dma_fmt);
@@ -499,8 +538,8 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
     vscsi_req *req = sreq->hba_private;
     int32_t res_in = 0, res_out = 0;
 
-    dprintf("VSCSI: SCSI cmd complete, r=0x%x tag=0x%x status=0x%x, req=%p\n",
-            reason, sreq->tag, status, req);
+    dprintf("VSCSI: SCSI cmd complete, tag=0x%x status=0x%x, req=%p\n",
+            sreq->tag, status, req);
     if (req == NULL) {
         fprintf(stderr, "VSCSI: Can't find request for tag 0x%x\n", sreq->tag);
         return;
@@ -509,7 +548,7 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
     if (status == CHECK_CONDITION) {
         req->senselen = scsi_req_get_sense(req->sreq, req->sense,
                                            sizeof(req->sense));
-        dprintf("VSCSI: Sense data, %d bytes:\n", len);
+        dprintf("VSCSI: Sense data, %d bytes:\n", req->senselen);
         dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
                 req->sense[0], req->sense[1], req->sense[2], req->sense[3],
                 req->sense[4], req->sense[5], req->sense[6], req->sense[7]);
@@ -621,12 +660,11 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
         } return 1;
     }
 
-    req->lun = lun;
     req->sreq = scsi_req_new(sdev, req->qtag, lun, srp->cmd.cdb, req);
     n = scsi_req_enqueue(req->sreq);
 
-    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
-            req->qtag, srp->cmd.cdb[0], id, lun, n);
+    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x LUN %d ret: %d\n",
+            req->qtag, srp->cmd.cdb[0], lun, n);
 
     if (n) {
         /* Transfer direction must be set before preprocessing the
-- 
1.7.10.4


* [Qemu-devel] [PATCH 10/17] pseries: savevm support for PAPR virtual SCSI
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (8 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 09/17] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine Alexey Kardashevskiy
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary support for saving the state of the PAPR VIO
virtual SCSI device. This also saves and restores active SCSI requests.

[aik: implemented vscsi_req save/restore]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: David Gibson <david@gibson.dropbear.id.au>
---
 hw/scsi/spapr_vscsi.c |   82 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index 1e93102..4db3a47 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -579,6 +579,69 @@ static void vscsi_request_cancelled(SCSIRequest *sreq)
     vscsi_put_req(req);
 }
 
+static const VMStateDescription vmstate_spapr_vscsi_req = {
+    .name = "spapr_vscsi_req",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_BUFFER(crq.raw, vscsi_req),
+        VMSTATE_BUFFER(iu.srp.reserved, vscsi_req),
+        VMSTATE_UINT32(qtag, vscsi_req),
+        VMSTATE_BOOL(active, vscsi_req),
+        VMSTATE_UINT32(data_len, vscsi_req),
+        VMSTATE_BOOL(writing, vscsi_req),
+        VMSTATE_UINT32(senselen, vscsi_req),
+        VMSTATE_BUFFER(sense, vscsi_req),
+        VMSTATE_UINT8(dma_fmt, vscsi_req),
+        VMSTATE_UINT16(local_desc, vscsi_req),
+        VMSTATE_UINT16(total_desc, vscsi_req),
+        VMSTATE_UINT16(cdb_offset, vscsi_req),
+        /* Restart SCSI request from the beginning for now */
+        /* VMSTATE_UINT16(cur_desc_num, vscsi_req),
+           VMSTATE_UINT16(cur_desc_offset, vscsi_req), */
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void vscsi_save_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    vscsi_req *req = sreq->hba_private;
+    assert(req->active);
+
+    vmstate_save_state(f, &vmstate_spapr_vscsi_req, req);
+
+    dprintf("VSCSI: saving tag=%u, current desc#%d, offset=%x\n",
+            req->qtag, req->cur_desc_num, req->cur_desc_offset);
+}
+
+static void *vscsi_load_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    SCSIBus *bus = sreq->bus;
+    VSCSIState *s = VIO_SPAPR_VSCSI_DEVICE(bus->qbus.parent);
+    vscsi_req *req;
+    int rc;
+
+    assert(sreq->tag < VSCSI_REQ_LIMIT);
+    req = &s->reqs[sreq->tag];
+    assert(!req->active);
+
+    memset(req, 0, sizeof(*req));
+    rc = vmstate_load_state(f, &vmstate_spapr_vscsi_req, req, 1);
+    if (rc) {
+        fprintf(stderr, "VSCSI: failed loading request tag#%u\n", sreq->tag);
+        return NULL;
+    }
+    assert(req->active);
+
+    req->sreq = scsi_req_ref(sreq);
+
+    dprintf("VSCSI: restoring tag=%u, current desc#%d, offset=%x\n",
+            req->qtag, req->cur_desc_num, req->cur_desc_offset);
+
+    return req;
+}
+
 static void vscsi_process_login(VSCSIState *s, vscsi_req *req)
 {
     union viosrp_iu *iu = &req->iu;
@@ -933,7 +996,9 @@ static const struct SCSIBusInfo vscsi_scsi_info = {
 
     .transfer_data = vscsi_transfer_data,
     .complete = vscsi_command_complete,
-    .cancel = vscsi_request_cancelled
+    .cancel = vscsi_request_cancelled,
+    .save_request = vscsi_save_request,
+    .load_request = vscsi_load_request,
 };
 
 static void spapr_vscsi_reset(VIOsPAPRDevice *dev)
@@ -992,6 +1057,20 @@ static Property spapr_vscsi_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_vscsi = {
+    .name = "spapr_vscsi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(vdev, VSCSIState),
+        /* VSCSI state */
+        /* ???? */
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1006,6 +1085,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
     k->signal_mask = 0x00000001;
     dc->props = spapr_vscsi_properties;
     k->rtce_window_size = 0x10000000;
+    dc->vmsd = &vmstate_spapr_vscsi;
 }
 
 static const TypeInfo spapr_vscsi_info = {
-- 
1.7.10.4


* [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (9 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 10/17] pseries: savevm support for " Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:45   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 12/17] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This adds the necessary pieces to implement savevm / migration for the
pseries machine.  The most complex part here is migrating the hash
table: for the paravirtualized pseries machine the guest's hash page
table is not stored within guest memory but externally, and the guest
accesses it via hypercalls.

This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
(tracking changes to the HPTE itself, not the page it references).
This is used to implement a live migration style incremental save and
restore of the hash table contents.
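
The dirty-bit idea can be illustrated with a small standalone sketch.  It
is not the code from this patch: the demo_* names are hypothetical, the
table is assumed to hold host-endian 64-bit words (the patch goes through
tswap64()), and only the dirty bit value is taken from the patch's
HPTE64_V_HPTE_DIRTY.  Every update to word 0 of an entry sets the dirty
bit, and a later save pass collects runs of dirty entries and clears the
bit before moving on to the next entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Demo constants; the dirty bit mirrors HPTE64_V_HPTE_DIRTY in the patch. */
#define DEMO_HPTE_VALID  0x0000000000000001ULL
#define DEMO_HPTE_DIRTY  0x0000000000000040ULL

/* Each HPTE is two 64-bit words; word 0 holds the valid and dirty bits. */
static uint64_t *demo_hpte(uint64_t *table, size_t i)
{
    return table + i * 2;
}

/* An h_enter-style update: store word 0 and mark the entry dirty. */
static void demo_store_hpte0(uint64_t *table, size_t i, uint64_t v)
{
    *demo_hpte(table, i) = v | DEMO_HPTE_DIRTY;
}

/*
 * Find the next run of dirty entries at or after *pos.  The run's first
 * index is stored in *start, its dirty bits are cleared, *pos is advanced
 * past the run, and the run length is returned (0 if nothing is dirty).
 */
static size_t demo_next_dirty_run(uint64_t *table, size_t nslots,
                                  size_t *pos, size_t *start)
{
    size_t i = *pos;

    /* Skip clean entries */
    while (i < nslots && !(*demo_hpte(table, i) & DEMO_HPTE_DIRTY)) {
        i++;
    }
    *start = i;

    /* Consume dirty entries, cleaning each one before advancing */
    while (i < nslots && (*demo_hpte(table, i) & DEMO_HPTE_DIRTY)) {
        *demo_hpte(table, i) &= ~DEMO_HPTE_DIRTY;
        i++;
    }
    *pos = i;
    return i - *start;
}
```

A caller would transmit each run as (start, length, entry data) and repeat
until a full sweep of the table finds nothing dirty.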

In addition it adds VMStateDescription information to save and restore
the (few) remaining pieces of state information needed by the pseries
machine.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c         |  269 +++++++++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_hcall.c   |    8 +-
 include/hw/ppc/spapr.h |   12 ++-
 3 files changed, 281 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index def3505..f989a22 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -32,6 +32,7 @@
 #include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
 #include "kvm_ppc.h"
+#include "mmu-hash64.h"
 
 #include "hw/boards.h"
 #include "hw/ppc/ppc.h"
@@ -667,7 +668,7 @@ static void spapr_cpu_reset(void *opaque)
 
     env->spr[SPR_HIOR] = 0;
 
-    env->external_htab = spapr->htab;
+    env->external_htab = (uint8_t *)spapr->htab;
     env->htab_base = -1;
     env->htab_mask = HTAB_SIZE(spapr) - 1;
     env->spr[SPR_SDR1] = (target_ulong)spapr->htab |
@@ -719,6 +720,268 @@ static int spapr_vga_init(PCIBus *pci_bus)
     }
 }
 
+static const VMStateDescription vmstate_spapr = {
+    .name = "spapr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(next_irq, sPAPREnvironment),
+
+        /* RTC offset */
+        VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+#define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
+#define HPTE_VALID(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_VALID)
+#define HPTE_DIRTY(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_HPTE_DIRTY)
+#define CLEAN_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) &= tswap64(~HPTE64_V_HPTE_DIRTY))
+
+static int htab_save_setup(QEMUFile *f, void *opaque)
+{
+    sPAPREnvironment *spapr = opaque;
+
+    spapr->htab_save_index = 0;
+    spapr->htab_first_pass = true;
+
+    /* "Iteration" header */
+    qemu_put_be32(f, spapr->htab_shift);
+
+    return 0;
+}
+
+#define MAX_ITERATION_NS    5000000 /* 5 ms */
+
+static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
+                                 int64_t max_ns)
+{
+    int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
+    int index = spapr->htab_save_index;
+    int64_t starttime = qemu_get_clock_ns(rt_clock);
+
+    assert(spapr->htab_first_pass);
+
+    do {
+        int chunkstart;
+
+        /* Consume invalid HPTEs */
+        while ((index < htabslots)
+               && !HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+        }
+
+        /* Consume valid HPTEs */
+        chunkstart = index;
+        while ((index < htabslots)
+               && HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+        }
+
+        if (index > chunkstart) {
+            int n_valid = index - chunkstart;
+
+            qemu_put_be32(f, chunkstart);
+            qemu_put_be16(f, n_valid);
+            qemu_put_be16(f, 0);
+            qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
+                            HASH_PTE_SIZE_64 * n_valid);
+
+            if ((qemu_get_clock_ns(rt_clock) - starttime) > max_ns) {
+                break;
+            }
+        }
+    } while ((index < htabslots) && !qemu_file_rate_limit(f));
+
+    if (index >= htabslots) {
+        assert(index == htabslots);
+        index = 0;
+        spapr->htab_first_pass = false;
+    }
+    spapr->htab_save_index = index;
+}
+
+static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
+                                 int64_t max_ns)
+{
+    bool final = max_ns < 0;
+    int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
+    int examined = 0, sent = 0;
+    int index = spapr->htab_save_index;
+    int64_t starttime = qemu_get_clock_ns(rt_clock);
+
+    assert(!spapr->htab_first_pass);
+
+    do {
+        int chunkstart, invalidstart;
+
+        /* Consume non-dirty HPTEs */
+        while ((index < htabslots)
+               && !HPTE_DIRTY(HPTE(spapr->htab, index))) {
+            index++;
+            examined++;
+        }
+
+        chunkstart = index;
+        /* Consume valid dirty HPTEs */
+        while ((index < htabslots)
+               && HPTE_DIRTY(HPTE(spapr->htab, index))
+               && HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+            examined++;
+        }
+
+        invalidstart = index;
+        /* Consume invalid dirty HPTEs */
+        while ((index < htabslots)
+               && HPTE_DIRTY(HPTE(spapr->htab, index))
+               && !HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+            examined++;
+        }
+
+        if (index > chunkstart) {
+            int n_valid = invalidstart - chunkstart;
+            int n_invalid = index - invalidstart;
+
+            qemu_put_be32(f, chunkstart);
+            qemu_put_be16(f, n_valid);
+            qemu_put_be16(f, n_invalid);
+            qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
+                            HASH_PTE_SIZE_64 * n_valid);
+            sent += index - chunkstart;
+
+            if (!final && (qemu_get_clock_ns(rt_clock) - starttime) > max_ns) {
+                break;
+            }
+        }
+
+        if (examined >= htabslots) {
+            break;
+        }
+
+        if (index >= htabslots) {
+            assert(index == htabslots);
+            index = 0;
+        }
+    } while ((examined < htabslots) && (!qemu_file_rate_limit(f) || final));
+
+    if (index >= htabslots) {
+        assert(index == htabslots);
+        index = 0;
+    }
+
+    spapr->htab_save_index = index;
+
+    return (examined >= htabslots) && (sent == 0);
+}
+
+static int htab_save_iterate(QEMUFile *f, void *opaque)
+{
+    sPAPREnvironment *spapr = opaque;
+    bool nothingleft = false;
+
+    /* Iteration header */
+    qemu_put_be32(f, 0);
+
+    if (spapr->htab_first_pass) {
+        htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
+    } else {
+        nothingleft = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
+    }
+
+    /* End marker */
+    qemu_put_be32(f, 0);
+    qemu_put_be16(f, 0);
+    qemu_put_be16(f, 0);
+
+    return nothingleft ? 1 : 0;
+}
+
+static int htab_save_complete(QEMUFile *f, void *opaque)
+{
+    sPAPREnvironment *spapr = opaque;
+
+    /* Iteration header */
+    qemu_put_be32(f, 0);
+
+    htab_save_later_pass(f, spapr, -1);
+
+    /* End marker */
+    qemu_put_be32(f, 0);
+    qemu_put_be16(f, 0);
+    qemu_put_be16(f, 0);
+
+    return 0;
+}
+
+static int htab_load(QEMUFile *f, void *opaque, int version_id)
+{
+    sPAPREnvironment *spapr = opaque;
+    uint32_t section_hdr;
+
+    if (version_id < 1 || version_id > 1) {
+        fprintf(stderr, "htab_load() bad version\n");
+        return -EINVAL;
+    }
+
+    section_hdr = qemu_get_be32(f);
+
+    if (section_hdr) {
+        /* First section, just the hash shift */
+        if (spapr->htab_shift != section_hdr) {
+            return -EINVAL;
+        }
+        return 0;
+    }
+
+    while (true) {
+        uint32_t index;
+        uint16_t n_valid, n_invalid;
+
+        index = qemu_get_be32(f);
+        n_valid = qemu_get_be16(f);
+        n_invalid = qemu_get_be16(f);
+
+        if ((index == 0) && (n_valid == 0) && (n_invalid == 0)) {
+            /* End of Stream */
+            break;
+        }
+
+        if ((index + n_valid + n_invalid) >
+            (HTAB_SIZE(spapr) / HASH_PTE_SIZE_64)) {
+            /* Bad index in stream */
+            fprintf(stderr, "htab_load() bad index %d (%hd+%hd entries) "
+                    "in htab stream\n", index, n_valid, n_invalid);
+            return -EINVAL;
+        }
+
+        if (n_valid) {
+            qemu_get_buffer(f, HPTE(spapr->htab, index),
+                            HASH_PTE_SIZE_64 * n_valid);
+        }
+        if (n_invalid) {
+            memset(HPTE(spapr->htab, index + n_valid), 0,
+                   HASH_PTE_SIZE_64 * n_invalid);
+        }
+    }
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_htab_handlers = {
+    .save_live_setup = htab_save_setup,
+    .save_live_iterate = htab_save_iterate,
+    .save_live_complete = htab_save_complete,
+    .load_state = htab_load,
+};
+
 static struct icp_state *try_create_xics(const char *type, int nr_servers,
                                          int nr_irqs)
 {
@@ -987,6 +1250,10 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 
     spapr->entry_point = 0x100;
 
+    vmstate_register(NULL, 0, &vmstate_spapr, spapr);
+    register_savevm_live(NULL, "spapr/htab", -1, 1,
+                         &savevm_htab_handlers, spapr);
+
     /* Prepare the device tree */
     spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
                                             initrd_base, initrd_size,
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index e6f321d..7ca984e 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -115,7 +115,7 @@ static target_ulong h_enter(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     }
     ppc_hash64_store_hpte1(env, hpte, ptel);
     /* eieio();  FIXME: need some sort of barrier for smp? */
-    ppc_hash64_store_hpte0(env, hpte, pteh);
+    ppc_hash64_store_hpte0(env, hpte, pteh | HPTE64_V_HPTE_DIRTY);
 
     args[0] = pte_index + i;
     return H_SUCCESS;
@@ -152,7 +152,7 @@ static target_ulong remove_hpte(CPUPPCState *env, target_ulong ptex,
     }
     *vp = v;
     *rp = r;
-    ppc_hash64_store_hpte0(env, hpte, 0);
+    ppc_hash64_store_hpte0(env, hpte, HPTE64_V_HPTE_DIRTY);
     rb = compute_tlbie_rb(v, r, ptex);
     ppc_tlb_invalidate_one(env, rb);
     return REMOVE_SUCCESS;
@@ -282,11 +282,11 @@ static target_ulong h_protect(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     r |= (flags << 48) & HPTE64_R_KEY_HI;
     r |= flags & (HPTE64_R_PP | HPTE64_R_N | HPTE64_R_KEY_LO);
     rb = compute_tlbie_rb(v, r, pte_index);
-    ppc_hash64_store_hpte0(env, hpte, v & ~HPTE64_V_VALID);
+    ppc_hash64_store_hpte0(env, hpte, (v & ~HPTE64_V_VALID) | HPTE64_V_HPTE_DIRTY);
     ppc_tlb_invalidate_one(env, rb);
     ppc_hash64_store_hpte1(env, hpte, r);
     /* Don't need a memory barrier, due to qemu's global lock */
-    ppc_hash64_store_hpte0(env, hpte, v);
+    ppc_hash64_store_hpte0(env, hpte, v | HPTE64_V_HPTE_DIRTY);
     return H_SUCCESS;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 09c4570..4cfe449 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -9,6 +9,8 @@ struct sPAPRPHBState;
 struct sPAPRNVRAM;
 struct icp_state;
 
+#define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
+
 typedef struct sPAPREnvironment {
     struct VIOsPAPRBus *vio_bus;
     QLIST_HEAD(, sPAPRPHBState) phbs;
@@ -17,20 +19,24 @@ typedef struct sPAPREnvironment {
 
     hwaddr ram_limit;
     void *htab;
-    long htab_shift;
+    uint32_t htab_shift;
     hwaddr rma_size;
     int vrma_adjust;
     hwaddr fdt_addr, rtas_addr;
     long rtas_size;
     void *fdt_skel;
     target_ulong entry_point;
-    int next_irq;
-    int rtc_offset;
+    uint32_t next_irq;
+    uint64_t rtc_offset;
     char *cpu_model;
     bool has_graphics;
 
     uint32_t epow_irq;
     Notifier epow_notifier;
+
+    /* Migration state */
+    int htab_save_index;
+    bool htab_first_pass;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
-- 
1.7.10.4


* [Qemu-devel] [PATCH 12/17] pseries: savevm support for PCI host bridge
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (10 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:45   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 13/17] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN Alexey Kardashevskiy
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This adds the necessary support for saving the state of the PAPR virtual
PCI host bridge (or host bridges).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_pci.c          |   49 +++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci-host/spapr.h |    6 +++---
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index c8c12c8..4d8e3cd 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -696,6 +696,54 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_pci_lsi = {
+    .name = "spapr_pci/lsi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32_EQUAL(irq, struct spapr_pci_lsi),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_pci_msi = {
+    .name = "spapr_pci/msi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(config_addr, struct spapr_pci_msi),
+        VMSTATE_UINT32(irq, struct spapr_pci_msi),
+        VMSTATE_UINT32(nvec, struct spapr_pci_msi),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_pci = {
+    .name = "spapr_pci",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64_EQUAL(buid, sPAPRPHBState),
+        VMSTATE_UINT32_EQUAL(dma_liobn, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(mem_win_addr, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState),
+        VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
+                             vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
+        VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
+                             vmstate_spapr_pci_msi, struct spapr_pci_msi),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_phb_class_init(ObjectClass *klass, void *data)
 {
     SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
@@ -704,6 +752,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     sdc->init = spapr_phb_init;
     dc->props = spapr_phb_properties;
     dc->reset = spapr_phb_reset;
+    dc->vmsd = &vmstate_spapr_pci;
 }
 
 static const TypeInfo spapr_phb_info = {
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 1e23dbf..93f9511 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -52,14 +52,14 @@ typedef struct sPAPRPHBState {
     sPAPRTCETable *tcet;
     AddressSpace iommu_as;
 
-    struct {
+    struct spapr_pci_lsi {
         uint32_t irq;
     } lsi_table[PCI_NUM_PINS];
 
-    struct {
+    struct spapr_pci_msi {
         uint32_t config_addr;
         uint32_t irq;
-        int nvec;
+        uint32_t nvec;
     } msi_table[SPAPR_MSIX_MAX_DEVS];
 
     QLIST_ENTRY(sPAPRPHBState) list;
-- 
1.7.10.4


* [Qemu-devel] [PATCH 13/17] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (11 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 12/17] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller Alexey Kardashevskiy
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Recent PowerKVM allows the kernel to intercept some RTAS calls from the
guest directly.  This is used to implement the more efficient in-kernel
XICS for example.  qemu is still responsible for assigning the RTAS token
numbers however, and needs to tell the kernel which RTAS function name is
assigned to a given token value.  This patch adds a convenience wrapper for
the KVM_PPC_RTAS_DEFINE_TOKEN ioctl() which is used for this purpose.
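
As a rough illustration of the token/name pairing, the sketch below fills
a fixed-size argument block and rejects names that would not fit with a
terminating NUL.  The struct is a hypothetical stand-in for the kernel's
argument structure (its real layout and buffer size are not shown here),
and demo_fill_rtas_args() is an assumed helper name, not part of this
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the ioctl argument structure. */
struct demo_rtas_token_args {
    char name[16];
    uint64_t token;
};

/* Fill in the args; returns 0 on success, -1 if the name does not fit. */
static int demo_fill_rtas_args(struct demo_rtas_token_args *args,
                               uint32_t token, const char *function)
{
    size_t len = strlen(function);

    /* Leave room for the terminating NUL */
    if (len >= sizeof(args->name)) {
        return -1;
    }

    /* Zeroing first guarantees the copied name is NUL-terminated */
    memset(args, 0, sizeof(*args));
    args->token = token;
    memcpy(args->name, function, len);
    return 0;
}
```

Rejecting an over-long name up front avoids handing the kernel a
truncated or unterminated string; the token value used by a caller is
whatever qemu assigned to that RTAS function.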

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 target-ppc/kvm.c     |   14 ++++++++++++++
 target-ppc/kvm_ppc.h |    7 +++++++
 2 files changed, 21 insertions(+)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index c89dd58..33ddf63 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1787,6 +1787,20 @@ static int kvm_ppc_register_host_cpu_type(void)
     return 0;
 }
 
+int kvmppc_define_rtas_token(uint32_t token, const char *function)
+{
+    struct kvm_rtas_token_args args = {
+        .token = token,
+    };
+
+    if (!kvm_check_extension(kvm_state, KVM_CAP_PPC_RTAS)) {
+        return -ENOENT;
+    }
+
+    strncpy(args.name, function, sizeof(args.name) - 1);
+
+    return kvm_vm_ioctl(kvm_state, KVM_PPC_RTAS_DEFINE_TOKEN, &args);
+}
 
 bool kvm_arch_stop_on_emulation_error(CPUState *cpu)
 {
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 771cfbe..21939a8 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -38,6 +38,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
 #endif /* !CONFIG_USER_ONLY */
 int kvmppc_fixup_cpu(PowerPCCPU *cpu);
 bool kvmppc_has_cap_epr(void);
+int kvmppc_define_rtas_token(uint32_t token, const char *function);
 
 #else
 
@@ -159,6 +160,12 @@ static inline bool kvmppc_has_cap_epr(void)
 {
     return false;
 }
+
+static inline int kvmppc_define_rtas_token(uint32_t token,
+                                           const char *function)
+{
+    return -1;
+}
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.10.4


* [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (12 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 13/17] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-08 18:50   ` Anthony Liguori
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 15/17] pseries: savevm support with KVM Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
controller system within KVM.  This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.

This should improve performance considerably.  For example, on a simple
IPI ping-pong test between hardware threads, the qemu XICS handles
around 5,000 irqs/second, whereas the in-kernel XICS handles around
70,000 irqs/second on the same hardware configuration.

[Mike Qiu <qiudayu@linux.vnet.ibm.com>: fixed a typo which caused ics_set_kvm_state() to fail]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[aik: moved to a separate device]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 default-configs/ppc64-softmmu.mak |    1 +
 hw/intc/Makefile.objs             |    1 +
 hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr.c                    |   32 ++-
 include/hw/ppc/xics.h             |   13 ++
 5 files changed, 489 insertions(+), 3 deletions(-)
 create mode 100644 hw/intc/xics_kvm.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 69a9f8d..5b995f9 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -48,5 +48,6 @@ CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
 # For pSeries
 CONFIG_PCI_HOTPLUG=y
 CONFIG_XICS=$(CONFIG_PSERIES)
+CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
 # For PReP
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index abe8f80..9e77afe 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -23,3 +23,4 @@ obj-$(CONFIG_OPENPIC) += openpic.o
 obj-$(CONFIG_OPENPIC_KVM) += openpic_kvm.o
 obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
+obj-$(CONFIG_XICS_KVM) += xics_kvm.o
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
new file mode 100644
index 0000000..d5604a7
--- /dev/null
+++ b/hw/intc/xics_kvm.c
@@ -0,0 +1,445 @@
+/*
+ * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
+ *
+ * PAPR Virtualized Interrupt System, aka ICS/ICP aka xics, in-kernel emulation
+ *
+ * Copyright (c) 2013 David Gibson, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+
+#include "hw/hw.h"
+#include "trace.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/xics.h"
+#include "kvm_ppc.h"
+#include "qemu/config-file.h"
+
+#include <sys/ioctl.h>
+
+struct icp_state_kvm {
+    struct icp_state parent;
+
+    uint32_t set_xive_token;
+    uint32_t get_xive_token;
+    uint32_t int_off_token;
+    uint32_t int_on_token;
+    int kernel_xics_fd;
+};
+
+static void icp_get_kvm_state(struct icp_server_state *ss)
+{
+    uint64_t state;
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_PPC_ICP_STATE,
+        .addr = (uintptr_t)&state,
+    };
+    int ret;
+
+    if (!ss->cs) {
+        return; /* kernel irqchip not in use */
+    }
+
+    ret = kvm_vcpu_ioctl(ss->cs, KVM_GET_ONE_REG, &reg);
+    if (ret != 0) {
+        fprintf(stderr, "Unable to retrieve KVM interrupt controller state"
+                " for CPU %d: %s\n", ss->cs->cpu_index, strerror(errno));
+        exit(1);
+    }
+
+    ss->xirr = state >> KVM_REG_PPC_ICP_XISR_SHIFT;
+    ss->mfrr = (state >> KVM_REG_PPC_ICP_MFRR_SHIFT)
+        & KVM_REG_PPC_ICP_MFRR_MASK;
+    ss->pending_priority = (state >> KVM_REG_PPC_ICP_PPRI_SHIFT)
+        & KVM_REG_PPC_ICP_PPRI_MASK;
+}
+
+static int icp_set_kvm_state(struct icp_server_state *ss)
+{
+    uint64_t state;
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_PPC_ICP_STATE,
+        .addr = (uintptr_t)&state,
+    };
+    int ret;
+
+    if (!ss->cs) {
+        return 0; /* kernel irqchip not in use */
+    }
+
+    state = ((uint64_t)ss->xirr << KVM_REG_PPC_ICP_XISR_SHIFT)
+        | ((uint64_t)ss->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT)
+        | ((uint64_t)ss->pending_priority << KVM_REG_PPC_ICP_PPRI_SHIFT);
+
+    ret = kvm_vcpu_ioctl(ss->cs, KVM_SET_ONE_REG, &reg);
+    if (ret != 0) {
+        fprintf(stderr, "Unable to restore KVM interrupt controller state (0x%"
+                PRIx64 ") for CPU %d: %s\n", state, ss->cs->cpu_index,
+                strerror(errno));
+        exit(1);
+        return ret;
+    }
+
+    return 0;
+}
+
+static void ics_get_kvm_state(struct ics_state *ics)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(ics->icp);
+    uint64_t state;
+    struct kvm_device_attr attr = {
+        .flags = 0,
+        .group = KVM_DEV_XICS_GRP_SOURCES,
+        .addr = (uint64_t)(uintptr_t)&state,
+    };
+    int i;
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        struct ics_irq_state *irq = &ics->irqs[i];
+        int ret;
+
+        attr.attr = i + ics->offset;
+
+        ret = ioctl(icpkvm->kernel_xics_fd, KVM_GET_DEVICE_ATTR, &attr);
+        if (ret != 0) {
+            fprintf(stderr, "Unable to retrieve KVM interrupt controller state"
+                    " for IRQ %d: %s\n", i + ics->offset, strerror(errno));
+            exit(1);
+        }
+
+        irq->server = state & KVM_XICS_DESTINATION_MASK;
+        irq->saved_priority = (state >> KVM_XICS_PRIORITY_SHIFT)
+            & KVM_XICS_PRIORITY_MASK;
+        /*
+         * To be consistent with the software emulation in xics.c, we
+         * split out the masked state + priority that we get from the
+         * kernel into 'current priority' (0xff if masked) and
+         * 'saved priority' (if masked, this is the priority the
+         * interrupt had before it was masked).  Masking and unmasking
+         * are done with the ibm,int-off and ibm,int-on RTAS calls.
+         */
+        if (state & KVM_XICS_MASKED) {
+            irq->priority = 0xff;
+        } else {
+            irq->priority = irq->saved_priority;
+        }
+
+        if (state & KVM_XICS_PENDING) {
+            if (state & KVM_XICS_LEVEL_SENSITIVE) {
+                irq->status |= XICS_STATUS_ASSERTED;
+            } else {
+                /*
+                 * A pending edge-triggered interrupt (or MSI)
+                 * must have been rejected previously when we
+                 * first detected it and tried to deliver it,
+                 * so mark it as pending and previously rejected
+                 * for consistency with how xics.c works.
+                 */
+                irq->status |= XICS_STATUS_MASKED_PENDING
+                    | XICS_STATUS_REJECTED;
+            }
+        }
+    }
+}
+
+static int ics_set_kvm_state(struct ics_state *ics)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(ics->icp);
+    uint64_t state;
+    struct kvm_device_attr attr = {
+        .flags = 0,
+        .group = KVM_DEV_XICS_GRP_SOURCES,
+        .addr = (uint64_t)(uintptr_t)&state,
+    };
+    int i;
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        struct ics_irq_state *irq = &ics->irqs[i];
+        int ret;
+
+        attr.attr = i + ics->offset;
+
+        state = irq->server;
+        state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
+            << KVM_XICS_PRIORITY_SHIFT;
+        if (irq->priority != irq->saved_priority) {
+            assert(irq->priority == 0xff);
+            state |= KVM_XICS_MASKED;
+        }
+
+        if (ics->islsi[i]) {
+            state |= KVM_XICS_LEVEL_SENSITIVE;
+            if (irq->status & XICS_STATUS_ASSERTED) {
+                state |= KVM_XICS_PENDING;
+            }
+        } else {
+            if (irq->status & XICS_STATUS_MASKED_PENDING) {
+                state |= KVM_XICS_PENDING;
+            }
+        }
+
+        ret = ioctl(icpkvm->kernel_xics_fd, KVM_SET_DEVICE_ATTR, &attr);
+        if (ret != 0) {
+            fprintf(stderr, "Unable to restore KVM interrupt controller state"
+                    " for IRQ %d: %s\n", i + ics->offset, strerror(errno));
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static void icp_pre_save(void *opaque)
+{
+    struct icp_server_state *ss = opaque;
+
+    icp_get_kvm_state(ss);
+}
+
+static int icp_post_load(void *opaque, int version_id)
+{
+    struct icp_server_state *ss = opaque;
+
+    return icp_set_kvm_state(ss);
+}
+
+static void ics_pre_save(void *opaque)
+{
+    struct ics_state *ics = opaque;
+
+    ics_get_kvm_state(ics);
+}
+
+static int ics_post_load(void *opaque, int version_id)
+{
+    struct ics_state *ics = opaque;
+
+    return ics_set_kvm_state(ics);
+}
+
+static VMStateDescription vmstate_icpkvm_server = {
+    .name = "icpkvm/server",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = icp_pre_save,
+    .post_load = icp_post_load,
+};
+
+static VMStateDescription vmstate_icskvm = {
+    .name = "icskvm",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = ics_pre_save,
+    .post_load = ics_post_load,
+};
+
+static void ics_set_irq_kvm(void *opaque, int srcno, int val)
+{
+    struct ics_state *ics = opaque;
+    struct kvm_irq_level args;
+    int rc;
+
+    args.irq = srcno + ics->offset;
+    if (!ics->islsi[srcno]) {
+        if (!val) {
+            return;
+        }
+        args.level = KVM_INTERRUPT_SET;
+    } else {
+        args.level = val ? KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET;
+    }
+    rc = kvm_vm_ioctl(kvm_state, KVM_IRQ_LINE, &args);
+    if (rc < 0) {
+        perror("kvm_irq_line");
+    }
+}
+
+int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
+{
+    CPUState *cs;
+    struct icp_server_state *ss;
+    struct icp_state_kvm *icpkvm = (struct icp_state_kvm *) object_dynamic_cast(
+            OBJECT(icp), TYPE_XICS_KVM);
+
+    if (!icpkvm) {
+        return -1;
+    }
+
+    cs = CPU(cpu);
+    assert(cs->cpu_index < icp->nr_servers);
+
+    ss = &icp->ss[cs->cpu_index];
+    if (icpkvm->kernel_xics_fd == -1) {
+        abort();
+    }
+
+    if (icpkvm->kernel_xics_fd != -1) {
+        int ret;
+        struct kvm_enable_cap xics_enable_cap = {
+            .cap = KVM_CAP_IRQ_XICS,
+            .flags = 0,
+            .args = {icpkvm->kernel_xics_fd, cs->cpu_index, 0, 0},
+        };
+
+        ss->cs = cs;
+
+        ret = kvm_vcpu_ioctl(ss->cs, KVM_ENABLE_CAP, &xics_enable_cap);
+        if (ret < 0) {
+            fprintf(stderr, "Unable to connect CPU%d to kernel XICS: %s\n",
+                    cs->cpu_index, strerror(errno));
+            exit(1);
+        }
+    }
+    xics_common_cpu_setup(icp, cpu);
+
+    vmstate_icpkvm_server.fields = vmstate_icp_server.fields;
+    vmstate_register(NULL, cs->cpu_index, &vmstate_icpkvm_server, ss);
+
+    return 0;
+}
+
+static void rtas_dummy(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                       uint32_t token,
+                       uint32_t nargs, target_ulong args,
+                       uint32_t nret, target_ulong rets)
+{
+    fprintf(stderr, "pseries: %s() should never be called for in-kernel XICS\n", __func__);
+}
+
+static void xics_kvm_realize(DeviceState *dev, Error **errp)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(dev);
+    QemuOptsList *list = qemu_find_opts("machine");
+    int rc;
+    struct kvm_create_device xics_create_device = {
+        .type = KVM_DEV_TYPE_XICS,
+        .flags = 0,
+    };
+
+    if (!kvm_enabled()) {
+        error_setg(errp, "KVM must be enabled for in-kernel XICS");
+        goto fail;
+    }
+
+    if (QTAILQ_EMPTY(&list->head) ||
+        !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                           "kernel_irqchip", true) ||
+        !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
+        error_setg(errp, "In-kernel XICS is disabled or not supported by KVM");
+        return;
+    }
+
+    icpkvm->set_xive_token = spapr_rtas_register("ibm,set-xive", rtas_dummy);
+    icpkvm->get_xive_token = spapr_rtas_register("ibm,get-xive", rtas_dummy);
+    icpkvm->int_off_token = spapr_rtas_register("ibm,int-off", rtas_dummy);
+    icpkvm->int_on_token = spapr_rtas_register("ibm,int-on", rtas_dummy);
+
+    rc = kvmppc_define_rtas_token(icpkvm->set_xive_token, "ibm,set-xive");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,set-xive");
+        goto fail;
+    }
+
+    rc = kvmppc_define_rtas_token(icpkvm->get_xive_token, "ibm,get-xive");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,get-xive");
+        goto fail;
+    }
+
+    rc = kvmppc_define_rtas_token(icpkvm->int_on_token, "ibm,int-on");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,int-on");
+        goto fail;
+    }
+
+    rc = kvmppc_define_rtas_token(icpkvm->int_off_token, "ibm,int-off");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,int-off");
+        goto fail;
+    }
+
+    /* Create the kernel ICP */
+    rc = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &xics_create_device);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "Error on KVM_CREATE_DEVICE for XICS");
+        goto fail;
+    }
+
+    icpkvm->kernel_xics_fd = xics_create_device.fd;
+
+    xics_common_init(&icpkvm->parent, ics_set_irq_kvm);
+
+    /* We use each ICS's offset into the global irq number space
+     * as an instance id.  This means we can extend to multiple ICS
+     * instances without needing to change the savevm format */
+    vmstate_icskvm.fields = vmstate_ics.fields;
+    vmstate_register(NULL, icpkvm->parent.ics->offset, &vmstate_icskvm,
+                     icpkvm->parent.ics);
+
+    return;
+
+fail:
+    kvmppc_define_rtas_token(0, "ibm,set-xive");
+    kvmppc_define_rtas_token(0, "ibm,get-xive");
+    kvmppc_define_rtas_token(0, "ibm,int-on");
+    kvmppc_define_rtas_token(0, "ibm,int-off");
+    return;
+}
+
+static void xics_kvm_reset(DeviceState *d)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(d);
+    struct icp_state *icp = &icpkvm->parent;
+    int i;
+
+    xics_common_reset(icp);
+
+    for (i = 0; i < icp->nr_servers; i++) {
+        if (icp->ss[i].cs) {
+            icp_set_kvm_state(&icp->ss[i]);
+        }
+    }
+
+    ics_set_kvm_state(icp->ics);
+}
+
+static void xics_kvm_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = xics_kvm_realize;
+    dc->reset = xics_kvm_reset;
+}
+
+static const TypeInfo xics_kvm_info = {
+    .name          = TYPE_XICS_KVM,
+    .parent        = TYPE_XICS,
+    .instance_size = sizeof(struct icp_state_kvm),
+    .class_init    = xics_kvm_class_init,
+};
+
+static void xics_kvm_register_types(void)
+{
+    type_register_static(&xics_kvm_info);
+}
+
+type_init(xics_kvm_register_types)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f989a22..211f434 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1001,7 +1001,31 @@ static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
 {
     struct icp_state *icp = NULL;
 
-    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
+    if (kvm_enabled()) {
+        bool irqchip_allowed = true, irqchip_required = false;
+        QemuOptsList *list = qemu_find_opts("machine");
+
+        if (!QTAILQ_EMPTY(&list->head)) {
+            irqchip_allowed = qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                                                "kernel_irqchip", true);
+            irqchip_required = qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                                                 "kernel_irqchip", false);
+        }
+
+        if (irqchip_allowed) {
+            icp = try_create_xics(TYPE_XICS_KVM, nr_servers, nr_irqs);
+        }
+
+        if (irqchip_required && !icp) {
+            perror("Failed to create in-kernel XICS\n");
+            abort();
+        }
+    }
+
+    if (!icp) {
+        icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
+    }
+
     if (!icp) {
         perror("Failed to create XICS\n");
         abort();
@@ -1102,8 +1126,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
         }
         env = &cpu->env;
 
-        xics_cpu_setup(spapr->icp, cpu);
-
         /* Set time-base frequency to 512 MHz */
         cpu_ppc_tb_init(env, TIMEBASE_FREQ);
 
@@ -1117,6 +1139,10 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
             kvmppc_set_papr(cpu);
         }
 
+        if (xics_kvm_cpu_setup(spapr->icp, cpu)) {
+            xics_cpu_setup(spapr->icp, cpu);
+        }
+
         qemu_register_reset(spapr_cpu_reset, cpu);
     }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 3f72806..e474c01 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -32,6 +32,9 @@
 #define TYPE_XICS "xics"
 #define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
 
+#define TYPE_XICS_KVM "xics-kvm"
+#define XICS_KVM(obj) OBJECT_CHECK(struct icp_state_kvm, (obj), TYPE_XICS_KVM)
+
 #define XICS_IPI        0x2
 #define XICS_BUID       0x1
 #define XICS_IRQ_BASE   (XICS_BUID << 12)
@@ -53,6 +56,7 @@ struct icp_state {
 };
 
 struct icp_server_state {
+    CPUState *cs;
     uint32_t xirr;
     uint8_t pending_priority;
     uint8_t mfrr;
@@ -88,6 +92,15 @@ void xics_common_reset(struct icp_state *icp);
 
 void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
 
+#ifdef CONFIG_KVM
+int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
+#else
+static inline int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
+{
+    return -1;
+}
+#endif
+
 extern const VMStateDescription vmstate_icp_server;
 extern const VMStateDescription vmstate_ics;
 
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH 15/17] pseries: savevm support with KVM
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (13 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

At present, the savevm / migration support for the pseries machine will not
work when KVM is enabled.  That's because KVM manages the guest's hash page
table in the host kernel, so qemu has no visibility of it.  This patch
fixes this by using new kernel interfaces to extract and reinsert the
guest's hash table during the migration process.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
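Reader's note (not part of the commit): the hash table travels in the
migration stream as chunks, each led by a big-endian header of a 32-bit
index, a 16-bit n_valid and a 16-bit n_invalid, with an all-zero header
as the end marker.  The sketch below shows one way to decode and
bounds-check such a header.  The names hdr_decode(), hdr_classify(),
struct htab_chunk_hdr and enum chunk_kind are hypothetical, and the
64-bit sum is a deliberate difference from the patch, chosen here only to
rule out wrap-around.

```c
#include <assert.h>
#include <stdint.h>

enum chunk_kind { CHUNK_END, CHUNK_OK, CHUNK_BAD };

/* Hypothetical in-memory form of one chunk header. */
struct htab_chunk_hdr {
    uint32_t index;      /* first HPTE slot covered by the chunk */
    uint16_t n_valid;    /* valid HPTEs that follow in the stream */
    uint16_t n_invalid;  /* slots to clear after the valid ones */
};

/* Decode the 8 big-endian header bytes as they appear in the stream. */
static struct htab_chunk_hdr hdr_decode(const uint8_t b[8])
{
    struct htab_chunk_hdr h;

    h.index = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
        | ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    h.n_valid = (uint16_t)((b[4] << 8) | b[5]);
    h.n_invalid = (uint16_t)((b[6] << 8) | b[7]);
    return h;
}

/*
 * Classify a header against a table of nslots entries.  A chunk that
 * ends exactly at the last slot is accepted, matching the change from
 * ">=" to ">" in htab_load().
 */
static enum chunk_kind hdr_classify(const struct htab_chunk_hdr *h,
                                    uint64_t nslots)
{
    uint64_t end;

    assert(nslots > 0);
    if (h->index == 0 && h->n_valid == 0 && h->n_invalid == 0) {
        return CHUNK_END;
    }
    end = (uint64_t)h->index + h->n_valid + h->n_invalid;
    return end > nslots ? CHUNK_BAD : CHUNK_OK;
}
```

With nslots = 22, a header of index 16, n_valid 4 and n_invalid 2 is
accepted because it ends exactly at the table boundary; the same header
against 21 slots is rejected.
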
 hw/ppc/spapr.c         |  106 ++++++++++++++++++++++++++++++++++++++----------
 include/hw/ppc/spapr.h |    1 +
 target-ppc/kvm.c       |   69 +++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h   |   22 ++++++++++
 4 files changed, 176 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 211f434..9489edc 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -744,17 +744,27 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
 {
     sPAPREnvironment *spapr = opaque;
 
-    spapr->htab_save_index = 0;
-    spapr->htab_first_pass = true;
-
     /* "Iteration" header */
     qemu_put_be32(f, spapr->htab_shift);
 
+    if (spapr->htab) {
+        spapr->htab_save_index = 0;
+        spapr->htab_first_pass = true;
+    } else {
+        assert(kvm_enabled());
+
+        spapr->htab_fd = kvmppc_get_htab_fd(false);
+        if (spapr->htab_fd < 0) {
+            fprintf(stderr, "Unable to open fd for reading hash table from KVM: %s\n",
+                    strerror(errno));
+            return -1;
+        }
+    }
+
+
     return 0;
 }
 
-#define MAX_ITERATION_NS    5000000 /* 5 ms */
-
 static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
                                  int64_t max_ns)
 {
@@ -805,8 +815,8 @@ static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
     spapr->htab_save_index = index;
 }
 
-static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
-                                 int64_t max_ns)
+static int htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
+                                int64_t max_ns)
 {
     bool final = max_ns < 0;
     int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
@@ -879,21 +889,32 @@ static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
 
     spapr->htab_save_index = index;
 
-    return (examined >= htabslots) && (sent == 0);
+    return (examined >= htabslots) && (sent == 0) ? 1 : 0;
 }
 
+#define MAX_ITERATION_NS    5000000 /* 5 ms */
+#define MAX_KVM_BUF_SIZE    2048
+
 static int htab_save_iterate(QEMUFile *f, void *opaque)
 {
     sPAPREnvironment *spapr = opaque;
-    bool nothingleft = false;;
+    int rc = 0;
 
     /* Iteration header */
     qemu_put_be32(f, 0);
 
-    if (spapr->htab_first_pass) {
+    if (!spapr->htab) {
+        assert(kvm_enabled());
+
+        rc = kvmppc_save_htab(f, spapr->htab_fd,
+                              MAX_KVM_BUF_SIZE, MAX_ITERATION_NS);
+        if (rc < 0) {
+            return rc;
+        }
+    } else if (spapr->htab_first_pass) {
         htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
     } else {
-        nothingleft = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
+        rc = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
     }
 
     /* End marker */
@@ -901,7 +922,7 @@ static int htab_save_iterate(QEMUFile *f, void *opaque)
     qemu_put_be16(f, 0);
     qemu_put_be16(f, 0);
 
-    return nothingleft ? 1 : 0;
+    return rc;
 }
 
 static int htab_save_complete(QEMUFile *f, void *opaque)
@@ -911,7 +932,20 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
     /* Iteration header */
     qemu_put_be32(f, 0);
 
-    htab_save_later_pass(f, spapr, -1);
+    if (!spapr->htab) {
+        int rc;
+
+        assert(kvm_enabled());
+
+        rc = kvmppc_save_htab(f, spapr->htab_fd, MAX_KVM_BUF_SIZE, -1);
+        if (rc < 0) {
+            return rc;
+        }
+        close(spapr->htab_fd);
+        spapr->htab_fd = -1;
+    } else {
+        htab_save_later_pass(f, spapr, -1);
+    }
 
     /* End marker */
     qemu_put_be32(f, 0);
@@ -925,6 +959,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
 {
     sPAPREnvironment *spapr = opaque;
     uint32_t section_hdr;
+    int fd = -1;
 
     if (version_id < 1 || version_id > 1) {
         fprintf(stderr, "htab_load() bad version\n");
@@ -941,6 +976,16 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
         return 0;
     }
 
+    if (!spapr->htab) {
+        assert(kvm_enabled());
+
+        fd = kvmppc_get_htab_fd(true);
+        if (fd < 0) {
+            fprintf(stderr, "Unable to open fd to restore KVM hash table: %s\n",
+                    strerror(errno));
+        }
+    }
+
     while (true) {
         uint32_t index;
         uint16_t n_valid, n_invalid;
@@ -954,24 +999,41 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
             break;
         }
 
-        if ((index + n_valid + n_invalid) >=
+        if ((index + n_valid + n_invalid) >
             (HTAB_SIZE(spapr) / HASH_PTE_SIZE_64)) {
             /* Bad index in stream */
             fprintf(stderr, "htab_load() bad index %d (%hd+%hd entries) "
-                    "in htab stream\n", index, n_valid, n_invalid);
+                    "in htab stream (htab_shift=%d)\n", index, n_valid, n_invalid,
+                    spapr->htab_shift);
             return -EINVAL;
         }
 
-        if (n_valid) {
-            qemu_get_buffer(f, HPTE(spapr->htab, index),
-                            HASH_PTE_SIZE_64 * n_valid);
-        }
-        if (n_invalid) {
-            memset(HPTE(spapr->htab, index + n_valid), 0,
-                   HASH_PTE_SIZE_64 * n_invalid);
+        if (spapr->htab) {
+            if (n_valid) {
+                qemu_get_buffer(f, HPTE(spapr->htab, index),
+                                HASH_PTE_SIZE_64 * n_valid);
+            }
+            if (n_invalid) {
+                memset(HPTE(spapr->htab, index + n_valid), 0,
+                       HASH_PTE_SIZE_64 * n_invalid);
+            }
+        } else {
+            int rc;
+
+            assert(fd >= 0);
+
+            rc = kvmppc_load_htab_chunk(f, fd, index, n_valid, n_invalid);
+            if (rc < 0) {
+                return rc;
+            }
         }
     }
 
+    if (!spapr->htab) {
+        assert(fd >= 0);
+        close(fd);
+    }
+
     return 0;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4cfe449..3da31f0 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -37,6 +37,7 @@ typedef struct sPAPREnvironment {
     /* Migration state */
     int htab_save_index;
     bool htab_first_pass;
+    int htab_fd;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 33ddf63..ff85c19 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -65,6 +65,7 @@ static int cap_one_reg;
 static int cap_epr;
 static int cap_ppc_watchdog;
 static int cap_papr;
+static int cap_htab_fd;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -101,6 +102,7 @@ int kvm_arch_init(KVMState *s)
     cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG);
     /* Note: we don't set cap_papr here, because this capability is
      * only activated after this by kvmppc_set_papr() */
+    cap_htab_fd = kvm_check_extension(s, KVM_CAP_PPC_HTAB_FD);
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -1802,6 +1804,73 @@ int kvmppc_define_rtas_token(uint32_t token, const char *function)
     return kvm_vm_ioctl(kvm_state, KVM_PPC_RTAS_DEFINE_TOKEN, &args);
 }
 
+int kvmppc_get_htab_fd(bool write)
+{
+    struct kvm_get_htab_fd s = {
+        .flags = write ? KVM_GET_HTAB_WRITE : 0,
+        .start_index = 0,
+    };
+
+    if (!cap_htab_fd) {
+        fprintf(stderr, "KVM version doesn't support saving the hash table\n");
+        return -1;
+    }
+
+    return kvm_vm_ioctl(kvm_state, KVM_PPC_GET_HTAB_FD, &s);
+}
+
+int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns)
+{
+    int64_t starttime = qemu_get_clock_ns(rt_clock);
+    uint8_t buf[bufsize];
+    ssize_t rc;
+
+    do {
+        rc = read(fd, buf, bufsize);
+        if (rc < 0) {
+            fprintf(stderr, "Error reading data from KVM HTAB fd: %s\n",
+                    strerror(errno));
+            return rc;
+        } else if (rc) {
+            /* Kernel already returns data in BE format for the file */
+            qemu_put_buffer(f, buf, rc);
+        }
+    } while ((rc != 0)
+             && ((max_ns < 0)
+                 || ((qemu_get_clock_ns(rt_clock) - starttime) < max_ns)));
+
+    return (rc == 0) ? 1 : 0;
+}
+
+int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
+                           uint16_t n_valid, uint16_t n_invalid)
+{
+    struct kvm_get_htab_header *buf;
+    size_t chunksize = sizeof(*buf) + n_valid*HASH_PTE_SIZE_64;
+    ssize_t rc;
+
+    buf = alloca(chunksize);
+    /* This is KVM on ppc, so this is all big-endian */
+    buf->index = index;
+    buf->n_valid = n_valid;
+    buf->n_invalid = n_invalid;
+
+    qemu_get_buffer(f, (void *)(buf + 1), HASH_PTE_SIZE_64*n_valid);
+
+    rc = write(fd, buf, chunksize);
+    if (rc < 0) {
+        fprintf(stderr, "Error writing KVM hash table: %s\n",
+                strerror(errno));
+        return rc;
+    }
+    if (rc != chunksize) {
+        /* We should never get a short write on a single chunk */
+        fprintf(stderr, "Short write, restoring KVM hash table\n");
+        return -1;
+    }
+    return 0;
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *cpu)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 21939a8..12564ef 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -39,6 +39,10 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
 int kvmppc_fixup_cpu(PowerPCCPU *cpu);
 bool kvmppc_has_cap_epr(void);
 int kvmppc_define_rtas_token(uint32_t token, const char *function);
+int kvmppc_get_htab_fd(bool write);
+int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns);
+int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
+                           uint16_t n_valid, uint16_t n_invalid);
 
 #else
 
@@ -166,6 +170,24 @@ static inline int kvmppc_define_rtas_token(uint32_t token,
 {
     return -1;
 }
+
+static inline int kvmppc_get_htab_fd(bool write)
+{
+    return -1;
+}
+
+static inline int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize,
+                                   int64_t max_ns)
+{
+    abort();
+}
+
+static inline int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
+                                         uint16_t n_valid, uint16_t n_invalid)
+{
+    abort();
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 92+ messages in thread

* [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip.
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (14 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 15/17] pseries: savevm support with KVM Alexey Kardashevskiy
@ 2013-06-27  6:45 ` Alexey Kardashevskiy
  2013-07-04  5:54   ` Andreas Färber
  2013-06-27  6:46 ` [Qemu-devel] [PATCH 17/17] spapr-pci: rework MSI/MSIX Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, Paolo Bonzini, qemu-ppc,
	Prerna Saxena, Paul Mackerras, David Gibson

From: Prerna Saxena <prerna@linux.vnet.ibm.com>

This patch enables QEMU to launch VM guests on a POWER8 chip. I have tested
it with a BML kernel on a P8 DD1 chip.

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Paul Mackerras <paulus@samba.org>
---
 target-ppc/cpu-models.c     |    3 +++
 target-ppc/cpu-models.h     |    1 +
 target-ppc/translate_init.c |   34 ++++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index 9bb68c8..f8c64dd 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -1145,6 +1145,8 @@
                 "POWER7 v2.1")
     POWERPC_DEF("POWER7_v2.3",   CPU_POWERPC_POWER7_v23,             POWER7,
                 "POWER7 v2.3")
+    POWERPC_DEF("POWER8_v0.1",   CPU_POWERPC_POWER8_v01,             POWER8,
+                "POWER8 v0.1")
     POWERPC_DEF("970",           CPU_POWERPC_970,                    970,
                 "PowerPC 970")
     POWERPC_DEF("970fx_v1.0",    CPU_POWERPC_970FX_v10,              970FX,
@@ -1390,6 +1392,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
     { "Dino",  "POWER3" },
     { "POWER3+", "631" },
     { "POWER7", "POWER7_v2.3" },
+    { "POWER8", "POWER8_v0.1" },
     { "970fx", "970fx_v3.1" },
     { "970mp", "970mp_v1.1" },
     { "Apache", "RS64" },
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index 262ca47..b349ad2 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -556,6 +556,7 @@ enum {
     CPU_POWERPC_POWER7_v20         = 0x003F0200,
     CPU_POWERPC_POWER7_v21         = 0x003F0201,
     CPU_POWERPC_POWER7_v23         = 0x003F0203,
+    CPU_POWERPC_POWER8_v01         = 0x004B0100,
     CPU_POWERPC_970                = 0x00390202,
     CPU_POWERPC_970FX_v10          = 0x00391100,
     CPU_POWERPC_970FX_v20          = 0x003C0200,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 95aebf7..2502758 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7011,6 +7011,40 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
     pcc->l1_dcache_size = 0x8000;
     pcc->l1_icache_size = 0x8000;
 }
+
+POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+    PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
+
+    dc->desc = "POWER8";
+    pcc->init_proc = init_proc_POWER7;
+    pcc->check_pow = check_pow_nocheck;
+    pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
+                       PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
+                       PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
+                       PPC_FLOAT_STFIWX |
+                       PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ |
+                       PPC_MEM_SYNC | PPC_MEM_EIEIO |
+                       PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
+                       PPC_64B | PPC_ALTIVEC |
+                       PPC_SEGMENT_64B | PPC_SLBI |
+                       PPC_POPCNTB | PPC_POPCNTWD;
+    pcc->insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX;
+    pcc->msr_mask = 0x800000000204FF36ULL;
+    pcc->mmu_model = POWERPC_MMU_2_06;
+#if defined(CONFIG_SOFTMMU)
+    pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
+#endif
+    pcc->excp_model = POWERPC_EXCP_POWER7;
+    pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
+    pcc->bfd_mach = bfd_mach_ppc64;
+    pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
+                 POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
+                 POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR;
+    pcc->l1_dcache_size = 0x8000;
+    pcc->l1_icache_size = 0x8000;
+}
 #endif /* defined (TARGET_PPC64) */
 
 
-- 
1.7.10.4


* [Qemu-devel] [PATCH 17/17] spapr-pci: rework MSI/MSIX
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (15 preceding siblings ...)
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip Alexey Kardashevskiy
@ 2013-06-27  6:46 ` Alexey Kardashevskiy
  2013-07-04  2:31 ` [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27  6:46 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

A peculiarity of the sPAPR platform is that the guest allocates MSI/MSIX
vectors via RTAS hypercalls and only operates with global IRQ numbers.
On real hardware, the PHB is expected to convert an MSIMessage to an IRQ
number, so it is up to the host kernel to set up the correct MSIMessage in
a real device and in the PHB that the device sits on.

Therefore MSIMessage handling is completely hidden inside QEMU.

Previously every PCI host bridge implemented its own MSI memory window
to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
or vfio) and redirect them to the guest via qemu_pulse_irq().

MSIMessage encoding was:
* .addr - address within the PHB MSI window;
* .data - the device index on PHB plus vector number.

The MSI MR write function translated this MSIMessage to a global VIRQ
number and called qemu_pulse_irq().
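
As a rough illustration of the old decoding described above, the
following C sketch mirrors the arithmetic of the removed
spapr_msi_write(). The struct msi_entry type, the ntable bound and the
old_msi_decode() name are made up for this example; only the shift and
mask logic is taken from the patch, and addr is assumed to be the offset
within the PHB MSI window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one phb->msi_table[] entry. */
struct msi_entry {
    uint32_t irq;   /* first global IRQ allocated to this device */
};

/*
 * Old scheme: bits 16 and up of the window offset select the device,
 * bits 2..15 (MSI-X) or the data value (MSI) select the vector.
 */
static uint32_t old_msi_decode(const struct msi_entry *table,
                               unsigned ntable,
                               uint64_t addr, uint64_t data)
{
    unsigned ndev = (unsigned)(addr >> 16);
    uint32_t vec = (uint32_t)(((addr & 0xFFFF) >> 2) | data);

    assert(ndev < ntable);   /* bounds check added for the sketch only */
    return table[ndev].irq + vec;
}
```

Every write therefore needed a table lookup per PHB, which is the step
the new encoding removes.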

However, the total number of IRQs is small (at the moment 1024 IRQs
starting from 4096), so even the 16-bit data field of an MSIMessage is
enough to hold a VIRQ number directly and no decoding is needed.

This patch:

1. removes the MSI windows from the PHBs;
2. adds a single memory region for all MSIs in the guest;
3. encodes MSIMessage as:
    * .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
    * .data - an IRQ number.
4. changes the IRQ allocator to align the first IRQ number for MSI, as MSI
uses the lowest .data bits to carry the vector number; this is not required
for MSI-X, which has a per-vector .data field.
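
The alignment requirement in item 4 can be sketched in plain C as below.
This is only an illustration under stated assumptions: the function names
msi_align_first_irq() and msi_data_for_vector() are hypothetical, and the
second helper models how a multi-vector MSI device combines the vector
index with the low bits of .data. The rounding expression itself is the
one used in spapr_allocate_irq_block().

```c
#include <assert.h>
#include <stdint.h>

/* Round next_irq up to a multiple of num; num must be a power of two. */
static uint32_t msi_align_first_irq(uint32_t next_irq, uint32_t num)
{
    assert(num != 0 && (num & (num - 1)) == 0);
    return (next_irq + num - 1) & ~(num - 1);
}

/*
 * With an aligned base the low bits of .data are zero, so OR-ing in
 * the vector index gives the same result as adding it.
 */
static uint32_t msi_data_for_vector(uint32_t first_irq, uint32_t num,
                                    uint32_t vec)
{
    assert(vec < num);
    assert((first_irq & (num - 1)) == 0);
    return first_irq | vec;
}
```

For example, with next_irq at 4099 and four vectors requested, the block
starts at 4100 and vector 3 is signalled with .data == 4103, which the
MSI write handler can use as the IRQ number without any table lookup.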

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c              |   29 +++++++++++--
 hw/ppc/spapr_pci.c          |   94 ++++++++++++++++++-------------------------
 include/hw/pci-host/spapr.h |    8 ++--
 include/hw/ppc/spapr.h      |    4 +-
 4 files changed, 73 insertions(+), 62 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9489edc..75d29d8 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -88,6 +88,9 @@ int spapr_allocate_irq(int hint, bool lsi)
 
     if (hint) {
         irq = hint;
+        if (hint >= spapr->next_irq) {
+            spapr->next_irq = hint + 1;
+        }
         /* FIXME: we should probably check for collisions somehow */
     } else {
         irq = spapr->next_irq++;
@@ -103,22 +106,39 @@ int spapr_allocate_irq(int hint, bool lsi)
     return irq;
 }
 
-/* Allocate block of consequtive IRQs, returns a number of the first */
-int spapr_allocate_irq_block(int num, bool lsi)
+/*
+ * Allocate block of consecutive IRQs, returns a number of the first.
+ * If msi==true, aligns the first IRQ number to num.
+ */
+int spapr_allocate_irq_block(int num, bool lsi, bool msi)
 {
     int first = -1;
-    int i;
+    int i, hint = 0;
+
+    /*
+     * MSIMessage::data is used for storing VIRQ so
+     * it has to be aligned to num to support multiple
+     * MSI vectors. MSI-X is not affected by this.
+     * The hint is used for the first IRQ, the rest should
+     * be allocated contiguously.
+     */
+    if (msi) {
+        assert((num == 1) || (num == 2) || (num == 4) ||
+               (num == 8) || (num == 16) || (num == 32));
+        hint = (spapr->next_irq + num - 1) & ~(num - 1);
+    }
 
     for (i = 0; i < num; ++i) {
         int irq;
 
-        irq = spapr_allocate_irq(0, lsi);
+        irq = spapr_allocate_irq(hint, lsi);
         if (!irq) {
             return -1;
         }
 
         if (0 == i) {
             first = irq;
+            hint = 0;
         }
 
         /* If the above doesn't create a consecutive block then that's
@@ -1252,6 +1272,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     spapr_create_nvram(spapr);
 
     /* Set up PCI */
+    spapr_pci_msi_init(spapr, SPAPR_PCI_MSI_WINDOW);
     spapr_pci_rtas_init();
 
     phb = spapr_create_phb(spapr, 0);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4d8e3cd..23dbc0e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -253,30 +253,6 @@ static int spapr_msicfg_find(sPAPRPHBState *phb, uint32_t config_addr,
     return -1;
 }
 
-/*
- * Set MSI/MSIX message data.
- * This is required for msi_notify()/msix_notify() which
- * will write at the addresses via spapr_msi_write().
- */
-static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr,
-                             bool msix, unsigned req_num)
-{
-    unsigned i;
-    MSIMessage msg = { .address = addr, .data = 0 };
-
-    if (!msix) {
-        msi_set_message(pdev, msg);
-        trace_spapr_pci_msi_setup(pdev->name, 0, msg.address);
-        return;
-    }
-
-    for (i = 0; i < req_num; ++i) {
-        msg.address = addr | (i << 2);
-        msix_set_message(pdev, i, msg);
-        trace_spapr_pci_msi_setup(pdev->name, i, msg.address);
-    }
-}
-
 static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                                 uint32_t token, uint32_t nargs,
                                 target_ulong args, uint32_t nret,
@@ -288,9 +264,10 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     unsigned int req_num = rtas_ld(args, 4); /* 0 == remove all */
     unsigned int seq_num = rtas_ld(args, 5);
     unsigned int ret_intr_type;
-    int ndev, irq;
+    int i, ndev, irq;
     sPAPRPHBState *phb = NULL;
     PCIDevice *pdev = NULL;
+    MSIMessage msg;
 
     switch (func) {
     case RTAS_CHANGE_MSI_FN:
@@ -351,7 +328,8 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
     /* There is no cached config, allocate MSIs */
     if (!phb->msi_table[ndev].nvec) {
-        irq = spapr_allocate_irq_block(req_num, false);
+        irq = spapr_allocate_irq_block(req_num, false,
+                                       ret_intr_type == RTAS_TYPE_MSI);
         if (irq < 0) {
             fprintf(stderr, "Cannot allocate MSIs for device#%d", ndev);
             rtas_st(rets, 0, -1); /* Hardware error */
@@ -362,9 +340,23 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         phb->msi_table[ndev].config_addr = config_addr;
     }
 
-    /* Setup MSI/MSIX vectors in the device (via cfgspace or MSIX BAR) */
-    spapr_msi_setmsg(pdev, phb->msi_win_addr | (ndev << 16),
-                     ret_intr_type == RTAS_TYPE_MSIX, req_num);
+    /*
+     * Set MSI/MSIX message data.
+     * This is required for msi_notify()/msix_notify() which
+     * will write at the addresses via spapr_msi_write().
+     */
+    msg.address = spapr->msi_win_addr;
+    if (ret_intr_type == RTAS_TYPE_MSI) {
+        msg.data = phb->msi_table[ndev].irq;
+        msi_set_message(pdev, msg);
+        trace_spapr_pci_msi_setup(pdev->name, 0, msg.address);
+    } else {
+        for (i = 0; i < phb->msi_table[ndev].nvec; ++i) {
+            msg.data = phb->msi_table[ndev].irq + i;
+            msix_set_message(pdev, i, msg);
+            trace_spapr_pci_msi_setup(pdev->name, i, msg.address);
+        }
+    }
 
     rtas_st(rets, 0, 0);
     rtas_st(rets, 1, req_num);
@@ -487,10 +479,7 @@ static const MemoryRegionOps spapr_io_ops = {
 static void spapr_msi_write(void *opaque, hwaddr addr,
                             uint64_t data, unsigned size)
 {
-    sPAPRPHBState *phb = opaque;
-    int ndev = addr >> 16;
-    int vec = ((addr & 0xFFFF) >> 2) | data;
-    uint32_t irq = phb->msi_table[ndev].irq + vec;
+    uint32_t irq = data;
 
     trace_spapr_pci_msi_write(addr, data, irq);
 
@@ -504,6 +493,23 @@ static const MemoryRegionOps spapr_msi_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN
 };
 
+void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr)
+{
+    /*
+     * As MSI/MSIX interrupts trigger by writing at MSI/MSIX vectors,
+     * we need to allocate some memory to catch those writes coming
+     * from msi_notify()/msix_notify().
+     * As MSIMessage:addr is going to be the same and MSIMessage:data
+     * is going to be a VIRQ number, only 4 bytes of the MSI MR will
+     * be used.
+     */
+    spapr->msi_win_addr = addr;
+    memory_region_init_io(&spapr->msiwindow, &spapr_msi_ops, spapr,
+                          "msi", getpagesize());
+    memory_region_add_subregion(get_system_memory(), spapr->msi_win_addr,
+                                &spapr->msiwindow);
+}
+
 /*
  * PHB PCI device
  */
@@ -528,8 +534,7 @@ static int spapr_phb_init(SysBusDevice *s)
 
         if ((sphb->buid != -1) || (sphb->dma_liobn != -1)
             || (sphb->mem_win_addr != -1)
-            || (sphb->io_win_addr != -1)
-            || (sphb->msi_win_addr != -1)) {
+            || (sphb->io_win_addr != -1)) {
             fprintf(stderr, "Either \"index\" or other parameters must"
                     " be specified for PAPR PHB, not both\n");
             return -1;
@@ -542,7 +547,6 @@ static int spapr_phb_init(SysBusDevice *s)
             + sphb->index * SPAPR_PCI_WINDOW_SPACING;
         sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF;
         sphb->io_win_addr = windows_base + SPAPR_PCI_IO_WIN_OFF;
-        sphb->msi_win_addr = windows_base + SPAPR_PCI_MSI_WIN_OFF;
     }
 
     if (sphb->buid == -1) {
@@ -565,11 +569,6 @@ static int spapr_phb_init(SysBusDevice *s)
         return -1;
     }
 
-    if (sphb->msi_win_addr == -1) {
-        fprintf(stderr, "MSI window address not specified for PHB\n");
-        return -1;
-    }
-
     if (find_phb(spapr, sphb->buid)) {
         fprintf(stderr, "PCI host bridges must have unique BUIDs\n");
         return -1;
@@ -608,17 +607,6 @@ static int spapr_phb_init(SysBusDevice *s)
     memory_region_add_subregion(get_system_memory(), sphb->io_win_addr,
                                 &sphb->iowindow);
 
-    /* As MSI/MSIX interrupts trigger by writing at MSI/MSIX vectors,
-     * we need to allocate some memory to catch those writes coming
-     * from msi_notify()/msix_notify() */
-    if (msi_supported) {
-        sprintf(namebuf, "%s.msi", sphb->dtbusname);
-        memory_region_init_io(&sphb->msiwindow, &spapr_msi_ops, sphb,
-                              namebuf, SPAPR_MSIX_MAX_DEVS * 0x10000);
-        memory_region_add_subregion(get_system_memory(), sphb->msi_win_addr,
-                                    &sphb->msiwindow);
-    }
-
     /*
      * Selecting a busname is more complex than you'd think, due to
      * interacting constraints.  If the user has specified an id
@@ -692,7 +680,6 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_HEX64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
     DEFINE_PROP_HEX64("io_win_size", sPAPRPHBState, io_win_size,
                       SPAPR_PCI_IO_WIN_SIZE),
-    DEFINE_PROP_HEX64("msi_win_addr", sPAPRPHBState, msi_win_addr, -1),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -734,7 +721,6 @@ static const VMStateDescription vmstate_spapr_pci = {
         VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState),
         VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState),
         VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
-        VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState),
         VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
                              vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
         VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 93f9511..970b4a9 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -43,8 +43,7 @@ typedef struct sPAPRPHBState {
 
     MemoryRegion memspace, iospace;
     hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
-    hwaddr msi_win_addr;
-    MemoryRegion memwindow, iowindow, msiwindow;
+    MemoryRegion memwindow, iowindow;
 
     uint32_t dma_liobn;
     uint64_t dma_window_start;
@@ -73,7 +72,8 @@ typedef struct sPAPRPHBState {
 #define SPAPR_PCI_MMIO_WIN_SIZE      0x20000000
 #define SPAPR_PCI_IO_WIN_OFF         0x80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
-#define SPAPR_PCI_MSI_WIN_OFF        0x90000000
+
+#define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
 
 #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
 
@@ -88,6 +88,8 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
                           void *fdt);
 
+void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr);
+
 void spapr_pci_rtas_init(void);
 
 #endif /* __HW_SPAPR_PCI_H__ */
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 3da31f0..f0129f4 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -14,6 +14,8 @@ struct icp_state;
 typedef struct sPAPREnvironment {
     struct VIOsPAPRBus *vio_bus;
     QLIST_HEAD(, sPAPRPHBState) phbs;
+    hwaddr msi_win_addr;
+    MemoryRegion msiwindow;
     struct sPAPRNVRAM *nvram;
     struct icp_state *icp;
 
@@ -304,7 +306,7 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
                              target_ulong *args);
 
 int spapr_allocate_irq(int hint, bool lsi);
-int spapr_allocate_irq_block(int num, bool lsi);
+int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 
 static inline int spapr_allocate_msi(int hint)
 {
-- 
1.7.10.4


* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 02/17] pseries: rework XICS Alexey Kardashevskiy
@ 2013-06-27 11:47   ` David Gibson
  2013-06-27 12:17     ` Alexey Kardashevskiy
  2013-07-08 18:22   ` Anthony Liguori
  1 sibling, 1 reply; 92+ messages in thread
From: David Gibson @ 2013-06-27 11:47 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

On Thu, Jun 27, 2013 at 04:45:45PM +1000, Alexey Kardashevskiy wrote:
> Currently XICS interrupt controller is not a QEMU device. As we are going
> to support in-kernel emulated XICS which is a part of KVM, it make
> sense not to extend the existing XICS and have multiple KVM stub functions
> but to create yet another device and share pieces between fully emulated
> XICS and in-kernel XICS.

Hmm.  So, I think changing the xics to the qdev/qom framework is a
generally good idea.  But I'm not convinced it's a good idea to have
different devices for the kernel and non-kernel xics.  Won't that
prevent migrating from a system with a kernel xics to one without, or
vice versa?

> 
> The rework includes:
> * port to QOM
> * made few functions public to use from in-kernel XICS implementation
> * made VMStateDescription public to be used for in-kernel XICS migration
> * move xics_system_init() to spapr.c, it tries creating fully-emulated
> XICS now and will try in-kernel XICS in upcoming patches.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/intc/xics.c        |  109 ++++++++++++++++++++++++++-----------------------
>  hw/ppc/spapr.c        |   28 +++++++++++++
>  include/hw/ppc/xics.h |   59 ++++++++++++++++++++++++--
>  3 files changed, 141 insertions(+), 55 deletions(-)
> 
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 091912e..0e374c8 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -34,13 +34,6 @@
>   * ICP: Presentation layer
>   */
>  
> -struct icp_server_state {
> -    uint32_t xirr;
> -    uint8_t pending_priority;
> -    uint8_t mfrr;
> -    qemu_irq output;
> -};
> -
>  #define XISR_MASK  0x00ffffff
>  #define CPPR_MASK  0xff000000
>  
> @@ -49,12 +42,6 @@ struct icp_server_state {
>  
>  struct ics_state;
>  
> -struct icp_state {
> -    long nr_servers;
> -    struct icp_server_state *ss;
> -    struct ics_state *ics;
> -};
> -
>  static void ics_reject(struct ics_state *ics, int nr);
>  static void ics_resend(struct ics_state *ics);
>  static void ics_eoi(struct ics_state *ics, int nr);
> @@ -171,27 +158,6 @@ static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
>  /*
>   * ICS: Source layer
>   */
> -
> -struct ics_irq_state {
> -    int server;
> -    uint8_t priority;
> -    uint8_t saved_priority;
> -#define XICS_STATUS_ASSERTED           0x1
> -#define XICS_STATUS_SENT               0x2
> -#define XICS_STATUS_REJECTED           0x4
> -#define XICS_STATUS_MASKED_PENDING     0x8
> -    uint8_t status;
> -};
> -
> -struct ics_state {
> -    int nr_irqs;
> -    int offset;
> -    qemu_irq *qirqs;
> -    bool *islsi;
> -    struct ics_irq_state *irqs;
> -    struct icp_state *icp;
> -};
> -
>  static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
>  {
>      return (nr >= ics->offset)
> @@ -506,9 +472,8 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      rtas_st(rets, 0, 0); /* Success */
>  }
>  
> -static void xics_reset(void *opaque)
> +void xics_common_reset(struct icp_state *icp)

Why do you need to expose this interface?  Couldn't the caller use
qdev_reset(xics) just as easily?

>  {
> -    struct icp_state *icp = (struct icp_state *)opaque;
>      struct ics_state *ics = icp->ics;
>      int i;
>  
> @@ -527,7 +492,12 @@ static void xics_reset(void *opaque)
>      }
>  }
>  
> -void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> +static void xics_reset(DeviceState *d)
> +{
> +    xics_common_reset(XICS(d));
> +}
> +
> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>  {
>      CPUState *cs = CPU(cpu);
>      CPUPPCState *env = &cpu->env;
> @@ -551,37 +521,72 @@ void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>      }
>  }
>  
> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
> +void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> +{
> +    xics_common_cpu_setup(icp, cpu);
> +}
> +
> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>  {
> -    struct icp_state *icp;
> -    struct ics_state *ics;
> +    struct ics_state *ics = icp->ics;
>  
> -    icp = g_malloc0(sizeof(*icp));
> -    icp->nr_servers = nr_servers;
>      icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
>  
>      ics = g_malloc0(sizeof(*ics));
> -    ics->nr_irqs = nr_irqs;
> +    ics->nr_irqs = icp->nr_irqs;
>      ics->offset = XICS_IRQ_BASE;
> -    ics->irqs = g_malloc0(nr_irqs * sizeof(struct ics_irq_state));
> -    ics->islsi = g_malloc0(nr_irqs * sizeof(bool));
> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(struct ics_irq_state));
> +    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>  
>      icp->ics = ics;
>      ics->icp = icp;
>  
> -    ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
> +    ics->qirqs = qemu_allocate_irqs(handler, ics, ics->nr_irqs);
> +}
>  
> -    spapr_register_hypercall(H_CPPR, h_cppr);
> -    spapr_register_hypercall(H_IPI, h_ipi);
> -    spapr_register_hypercall(H_XIRR, h_xirr);
> -    spapr_register_hypercall(H_EOI, h_eoi);
> +static void xics_realize(DeviceState *dev, Error **errp)
> +{
> +    struct icp_state *icp = XICS(dev);
> +
> +    xics_common_init(icp, ics_set_irq);
>  
>      spapr_rtas_register("ibm,set-xive", rtas_set_xive);
>      spapr_rtas_register("ibm,get-xive", rtas_get_xive);
>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>  
> -    qemu_register_reset(xics_reset, icp);
> +}
> +
> +static Property xics_properties[] = {
> +    DEFINE_PROP_UINT32("nr_servers", struct icp_state, nr_servers, -1),
> +    DEFINE_PROP_UINT32("nr_irqs", struct icp_state, nr_irqs, -1),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void xics_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +
> +    dc->realize = xics_realize;
> +    dc->props = xics_properties;
> +    dc->reset = xics_reset;
> +}
> +
> +static const TypeInfo xics_info = {
> +    .name          = TYPE_XICS,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(struct icp_state),
> +    .class_init    = xics_class_init,
> +};
> +
> +static void xics_register_types(void)
> +{
> +    spapr_register_hypercall(H_CPPR, h_cppr);
> +    spapr_register_hypercall(H_IPI, h_ipi);
> +    spapr_register_hypercall(H_XIRR, h_xirr);
> +    spapr_register_hypercall(H_EOI, h_eoi);
>  
> -    return icp;
> +    type_register_static(&xics_info);
>  }
> +
> +type_init(xics_register_types)
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 38c29b7..def3505 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -719,6 +719,34 @@ static int spapr_vga_init(PCIBus *pci_bus)
>      }
>  }
>  
> +static struct icp_state *try_create_xics(const char *type, int nr_servers,
> +                                         int nr_irqs)
> +{
> +    DeviceState *dev;
> +
> +    dev = qdev_create(NULL, type);
> +    qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
> +    qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
> +    if (qdev_init(dev) < 0) {
> +        return NULL;

You could just use qdev_init_nofail() here to avoid the manual
handling of failures.

> +    }
> +
> +    return XICS(dev);
> +}
> +
> +static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
> +{
> +    struct icp_state *icp = NULL;
> +
> +    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
> +    if (!icp) {
> +        perror("Failed to create XICS\n");
> +        abort();
> +    }
> +
> +    return icp;
> +}
> +
>  /* pSeries LPAR / sPAPR hardware init */
>  static void ppc_spapr_init(QEMUMachineInitArgs *args)
>  {
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 6bce042..3f72806 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -27,15 +27,68 @@
>  #if !defined(__XICS_H__)
>  #define __XICS_H__
>  
> +#include "hw/sysbus.h"
> +
> +#define TYPE_XICS "xics"
> +#define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
> +
>  #define XICS_IPI        0x2
> -#define XICS_IRQ_BASE   0x10
> +#define XICS_BUID       0x1
> +#define XICS_IRQ_BASE   (XICS_BUID << 12)
> +
> +/*
> + * We currently only support one BUID which is our interrupt base
> + * (the kernel implementation supports more but we don't exploit
> + *  that yet)
> + */
>  
> -struct icp_state;
> +struct icp_state {
> +    /*< private >*/
> +    SysBusDevice parent_obj;
> +    /*< public >*/
> +    uint32_t nr_servers;
> +    uint32_t nr_irqs;
> +    struct icp_server_state *ss;
> +    struct ics_state *ics;
> +};
> +
> +struct icp_server_state {
> +    uint32_t xirr;
> +    uint8_t pending_priority;
> +    uint8_t mfrr;
> +    qemu_irq output;
> +};

The individual server_state and irq_state structures probably
shouldn't be exported.

> +struct ics_state {
> +    uint32_t nr_irqs;
> +    uint32_t offset;
> +    qemu_irq *qirqs;
> +    bool *islsi;
> +    struct ics_irq_state *irqs;
> +    struct icp_state *icp;
> +};
> +
> +struct ics_irq_state {
> +    uint32_t server;
> +    uint8_t priority;
> +    uint8_t saved_priority;
> +#define XICS_STATUS_ASSERTED           0x1
> +#define XICS_STATUS_SENT               0x2
> +#define XICS_STATUS_REJECTED           0x4
> +#define XICS_STATUS_MASKED_PENDING     0x8
> +    uint8_t status;
> +};
>  
>  qemu_irq xics_get_qirq(struct icp_state *icp, int irq);
>  void xics_set_irq_type(struct icp_state *icp, int irq, bool lsi);
>  
> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs);
> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler);
> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
> +void xics_common_reset(struct icp_state *icp);
> +
>  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
>  
> +extern const VMStateDescription vmstate_icp_server;
> +extern const VMStateDescription vmstate_ics;
> +
>  #endif /* __XICS_H__ */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-06-27 11:47   ` David Gibson
@ 2013-06-27 12:17     ` Alexey Kardashevskiy
  2013-07-02  0:06       ` David Gibson
  2013-07-08 18:24       ` Anthony Liguori
  0 siblings, 2 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-06-27 12:17 UTC (permalink / raw)
  To: David Gibson
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

On 06/27/2013 09:47 PM, David Gibson wrote:
> On Thu, Jun 27, 2013 at 04:45:45PM +1000, Alexey Kardashevskiy wrote:
>> Currently XICS interrupt controller is not a QEMU device. As we are going
>> to support in-kernel emulated XICS which is a part of KVM, it make
>> sense not to extend the existing XICS and have multiple KVM stub functions
>> but to create yet another device and share pieces between fully emulated
>> XICS and in-kernel XICS.
> 
> Hmm.  So, I think changing the xics to the qdev/qom framework is a
> generally good idea.  But I'm not convinced its a good idea to have
> different devices for the kernel and non-kernel xics.

The idea came from Alex Graf; this is already done for openpic/openpic-kvm.
The normal practice is to move KVM ioctls into KVM-specific code and provide
empty stubs for the non-KVM case. There would be too many stubs here, so
having a separate xics-kvm helps.


> Won't that
> prevent migrating from a system with a kernel xics to one without, or
> vice versa?

Mmm. Do we care much about that?...
At the moment that is not supported, as the VMStateDescriptions have different
.name values for xics and xics-kvm, but it is easy to fix. And we do not pass
a device to vmstate_register, so that must be it.


> 
>>
>> The rework includes:
>> * port to QOM
>> * made few functions public to use from in-kernel XICS implementation
>> * made VMStateDescription public to be used for in-kernel XICS migration
>> * move xics_system_init() to spapr.c, it tries creating fully-emulated
>> XICS now and will try in-kernel XICS in upcoming patches.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/intc/xics.c        |  109 ++++++++++++++++++++++++++-----------------------
>>  hw/ppc/spapr.c        |   28 +++++++++++++
>>  include/hw/ppc/xics.h |   59 ++++++++++++++++++++++++--
>>  3 files changed, 141 insertions(+), 55 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index 091912e..0e374c8 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -34,13 +34,6 @@
>>   * ICP: Presentation layer
>>   */
>>  
>> -struct icp_server_state {
>> -    uint32_t xirr;
>> -    uint8_t pending_priority;
>> -    uint8_t mfrr;
>> -    qemu_irq output;
>> -};
>> -
>>  #define XISR_MASK  0x00ffffff
>>  #define CPPR_MASK  0xff000000
>>  
>> @@ -49,12 +42,6 @@ struct icp_server_state {
>>  
>>  struct ics_state;
>>  
>> -struct icp_state {
>> -    long nr_servers;
>> -    struct icp_server_state *ss;
>> -    struct ics_state *ics;
>> -};
>> -
>>  static void ics_reject(struct ics_state *ics, int nr);
>>  static void ics_resend(struct ics_state *ics);
>>  static void ics_eoi(struct ics_state *ics, int nr);
>> @@ -171,27 +158,6 @@ static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
>>  /*
>>   * ICS: Source layer
>>   */
>> -
>> -struct ics_irq_state {
>> -    int server;
>> -    uint8_t priority;
>> -    uint8_t saved_priority;
>> -#define XICS_STATUS_ASSERTED           0x1
>> -#define XICS_STATUS_SENT               0x2
>> -#define XICS_STATUS_REJECTED           0x4
>> -#define XICS_STATUS_MASKED_PENDING     0x8
>> -    uint8_t status;
>> -};
>> -
>> -struct ics_state {
>> -    int nr_irqs;
>> -    int offset;
>> -    qemu_irq *qirqs;
>> -    bool *islsi;
>> -    struct ics_irq_state *irqs;
>> -    struct icp_state *icp;
>> -};
>> -
>>  static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
>>  {
>>      return (nr >= ics->offset)
>> @@ -506,9 +472,8 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>      rtas_st(rets, 0, 0); /* Success */
>>  }
>>  
>> -static void xics_reset(void *opaque)
>> +void xics_common_reset(struct icp_state *icp)
> 
> Why do you need to expose this interface?  Couldn't the caller use
> qdev_reset(xics) just as easily?
> 
>>  {
>> -    struct icp_state *icp = (struct icp_state *)opaque;
>>      struct ics_state *ics = icp->ics;
>>      int i;
>>  
>> @@ -527,7 +492,12 @@ static void xics_reset(void *opaque)
>>      }
>>  }
>>  
>> -void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>> +static void xics_reset(DeviceState *d)
>> +{
>> +    xics_common_reset(XICS(d));
>> +}
>> +
>> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>  {
>>      CPUState *cs = CPU(cpu);
>>      CPUPPCState *env = &cpu->env;
>> @@ -551,37 +521,72 @@ void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>      }
>>  }
>>  
>> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
>> +void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>> +{
>> +    xics_common_cpu_setup(icp, cpu);
>> +}
>> +
>> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>>  {
>> -    struct icp_state *icp;
>> -    struct ics_state *ics;
>> +    struct ics_state *ics = icp->ics;
>>  
>> -    icp = g_malloc0(sizeof(*icp));
>> -    icp->nr_servers = nr_servers;
>>      icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
>>  
>>      ics = g_malloc0(sizeof(*ics));
>> -    ics->nr_irqs = nr_irqs;
>> +    ics->nr_irqs = icp->nr_irqs;
>>      ics->offset = XICS_IRQ_BASE;
>> -    ics->irqs = g_malloc0(nr_irqs * sizeof(struct ics_irq_state));
>> -    ics->islsi = g_malloc0(nr_irqs * sizeof(bool));
>> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(struct ics_irq_state));
>> +    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>>  
>>      icp->ics = ics;
>>      ics->icp = icp;
>>  
>> -    ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
>> +    ics->qirqs = qemu_allocate_irqs(handler, ics, ics->nr_irqs);
>> +}
>>  
>> -    spapr_register_hypercall(H_CPPR, h_cppr);
>> -    spapr_register_hypercall(H_IPI, h_ipi);
>> -    spapr_register_hypercall(H_XIRR, h_xirr);
>> -    spapr_register_hypercall(H_EOI, h_eoi);
>> +static void xics_realize(DeviceState *dev, Error **errp)
>> +{
>> +    struct icp_state *icp = XICS(dev);
>> +
>> +    xics_common_init(icp, ics_set_irq);
>>  
>>      spapr_rtas_register("ibm,set-xive", rtas_set_xive);
>>      spapr_rtas_register("ibm,get-xive", rtas_get_xive);
>>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>>  
>> -    qemu_register_reset(xics_reset, icp);
>> +}
>> +
>> +static Property xics_properties[] = {
>> +    DEFINE_PROP_UINT32("nr_servers", struct icp_state, nr_servers, -1),
>> +    DEFINE_PROP_UINT32("nr_irqs", struct icp_state, nr_irqs, -1),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void xics_class_init(ObjectClass *oc, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(oc);
>> +
>> +    dc->realize = xics_realize;
>> +    dc->props = xics_properties;
>> +    dc->reset = xics_reset;
>> +}
>> +
>> +static const TypeInfo xics_info = {
>> +    .name          = TYPE_XICS,
>> +    .parent        = TYPE_SYS_BUS_DEVICE,
>> +    .instance_size = sizeof(struct icp_state),
>> +    .class_init    = xics_class_init,
>> +};
>> +
>> +static void xics_register_types(void)
>> +{
>> +    spapr_register_hypercall(H_CPPR, h_cppr);
>> +    spapr_register_hypercall(H_IPI, h_ipi);
>> +    spapr_register_hypercall(H_XIRR, h_xirr);
>> +    spapr_register_hypercall(H_EOI, h_eoi);
>>  
>> -    return icp;
>> +    type_register_static(&xics_info);
>>  }
>> +
>> +type_init(xics_register_types)
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 38c29b7..def3505 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -719,6 +719,34 @@ static int spapr_vga_init(PCIBus *pci_bus)
>>      }
>>  }
>>  
>> +static struct icp_state *try_create_xics(const char *type, int nr_servers,
>> +                                         int nr_irqs)
>> +{
>> +    DeviceState *dev;
>> +
>> +    dev = qdev_create(NULL, type);
>> +    qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
>> +    qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
>> +    if (qdev_init(dev) < 0) {
>> +        return NULL;
> 
> You could just use qdev_init_nofail() here to avoid the manual
> handling of failures.
> 
>> +    }
>> +
>> +    return XICS(dev);
>> +}
>> +
>> +static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
>> +{
>> +    struct icp_state *icp = NULL;
>> +
>> +    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
>> +    if (!icp) {
>> +        perror("Failed to create XICS\n");
>> +        abort();
>> +    }
>> +
>> +    return icp;
>> +}
>> +
>>  /* pSeries LPAR / sPAPR hardware init */
>>  static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>  {
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 6bce042..3f72806 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -27,15 +27,68 @@
>>  #if !defined(__XICS_H__)
>>  #define __XICS_H__
>>  
>> +#include "hw/sysbus.h"
>> +
>> +#define TYPE_XICS "xics"
>> +#define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
>> +
>>  #define XICS_IPI        0x2
>> -#define XICS_IRQ_BASE   0x10
>> +#define XICS_BUID       0x1
>> +#define XICS_IRQ_BASE   (XICS_BUID << 12)
>> +
>> +/*
>> + * We currently only support one BUID which is our interrupt base
>> + * (the kernel implementation supports more but we don't exploit
>> + *  that yet)
>> + */
>>  
>> -struct icp_state;
>> +struct icp_state {
>> +    /*< private >*/
>> +    SysBusDevice parent_obj;
>> +    /*< public >*/
>> +    uint32_t nr_servers;
>> +    uint32_t nr_irqs;
>> +    struct icp_server_state *ss;
>> +    struct ics_state *ics;
>> +};
>> +
>> +struct icp_server_state {
>> +    uint32_t xirr;
>> +    uint8_t pending_priority;
>> +    uint8_t mfrr;
>> +    qemu_irq output;
>> +};
> 
> The individual server_state and irq_state structures probably
> shouldn't be exported.
> 
>> +struct ics_state {
>> +    uint32_t nr_irqs;
>> +    uint32_t offset;
>> +    qemu_irq *qirqs;
>> +    bool *islsi;
>> +    struct ics_irq_state *irqs;
>> +    struct icp_state *icp;
>> +};
>> +
>> +struct ics_irq_state {
>> +    uint32_t server;
>> +    uint8_t priority;
>> +    uint8_t saved_priority;
>> +#define XICS_STATUS_ASSERTED           0x1
>> +#define XICS_STATUS_SENT               0x2
>> +#define XICS_STATUS_REJECTED           0x4
>> +#define XICS_STATUS_MASKED_PENDING     0x8
>> +    uint8_t status;
>> +};
>>  
>>  qemu_irq xics_get_qirq(struct icp_state *icp, int irq);
>>  void xics_set_irq_type(struct icp_state *icp, int irq, bool lsi);
>>  
>> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs);
>> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler);
>> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
>> +void xics_common_reset(struct icp_state *icp);
>> +
>>  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
>>  
>> +extern const VMStateDescription vmstate_icp_server;
>> +extern const VMStateDescription vmstate_ics;
>> +
>>  #endif /* __XICS_H__ */
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-06-27 12:17     ` Alexey Kardashevskiy
@ 2013-07-02  0:06       ` David Gibson
  2013-07-02  0:21         ` Alexander Graf
  2013-07-08 18:24       ` Anthony Liguori
  1 sibling, 1 reply; 92+ messages in thread
From: David Gibson @ 2013-07-02  0:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

[-- Attachment #1: Type: text/plain, Size: 1685 bytes --]

On Thu, Jun 27, 2013 at 10:17:19PM +1000, Alexey Kardashevskiy wrote:
> On 06/27/2013 09:47 PM, David Gibson wrote:
> > On Thu, Jun 27, 2013 at 04:45:45PM +1000, Alexey Kardashevskiy wrote:
> >> Currently XICS interrupt controller is not a QEMU device. As we are going
> >> to support in-kernel emulated XICS which is a part of KVM, it make
> >> sense not to extend the existing XICS and have multiple KVM stub functions
> >> but to create yet another device and share pieces between fully emulated
> >> XICS and in-kernel XICS.
> > 
> > Hmm.  So, I think changing the xics to the qdev/qom framework is a
> > generally good idea.  But I'm not convinced its a good idea to have
> > different devices for the kernel and non-kernel xics.
> 
> The idea came from Alex Graf, this is already done for openpic/openpic-kvm.
> The normal practice is to move ioctls to KVM to KVM code and provide empty
> stubs for non-KVM case. There were too many so having a separate xics-kvm
> is kind of help.
> 
> 
> > Won't that
> > prevent migrating from a system with a kernel xics to one without, or
> > vice versa?
> 
> Mmm. Do we care much about that?...

Enough to avoid making it impossible by design.

> At the moment it is not supported that as VMStateDescription have different
> .name for xics and xics-kvm but easy to fix. And we do not pass a device to
> vmstate_register so that must be it.

Ok, if you can make the ids in the vmsd match, then that should be ok.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-02  0:06       ` David Gibson
@ 2013-07-02  0:21         ` Alexander Graf
  2013-07-02  2:08           ` Alexey Kardashevskiy
  0 siblings, 1 reply; 92+ messages in thread
From: Alexander Graf @ 2013-07-02  0:21 UTC (permalink / raw)
  To: David Gibson
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras


On 02.07.2013, at 02:06, David Gibson wrote:

> On Thu, Jun 27, 2013 at 10:17:19PM +1000, Alexey Kardashevskiy wrote:
>> On 06/27/2013 09:47 PM, David Gibson wrote:
>>> On Thu, Jun 27, 2013 at 04:45:45PM +1000, Alexey Kardashevskiy wrote:
>>>> Currently XICS interrupt controller is not a QEMU device. As we are going
>>>> to support in-kernel emulated XICS which is a part of KVM, it make
>>>> sense not to extend the existing XICS and have multiple KVM stub functions
>>>> but to create yet another device and share pieces between fully emulated
>>>> XICS and in-kernel XICS.
>>> 
>>> Hmm.  So, I think changing the xics to the qdev/qom framework is a
>>> generally good idea.  But I'm not convinced its a good idea to have
>>> different devices for the kernel and non-kernel xics.
>> 
>> The idea came from Alex Graf, this is already done for openpic/openpic-kvm.
>> The normal practice is to move ioctls to KVM to KVM code and provide empty
>> stubs for non-KVM case. There were too many so having a separate xics-kvm
>> is kind of help.
>> 
>> 
>>> Won't that
>>> prevent migrating from a system with a kernel xics to one without, or
>>> vice versa?
>> 
>> Mmm. Do we care much about that?...
> 
> Enough to avoid making it impossible by design.

We went that route with x86 too after lots of hassle trying to shoehorn the in-kernel APIC into the emulation device. It's more hassle than gain.

> 
>> At the moment it is not supported that as VMStateDescription have different
>> .name for xics and xics-kvm but easy to fix. And we do not pass a device to
>> vmstate_register so that must be it.
> 
> Ok, if you can make the ids in the vmsd match, then that should be ok.

I really just wouldn't bother too much about it. Sooner or later QEMU-XICS is going to be a legacy and debug only option.


Alex

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-02  0:21         ` Alexander Graf
@ 2013-07-02  2:08           ` Alexey Kardashevskiy
  0 siblings, 0 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-02  2:08 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Anthony Liguori, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

On 07/02/2013 10:21 AM, Alexander Graf wrote:
> 
> On 02.07.2013, at 02:06, David Gibson wrote:
> 
>> On Thu, Jun 27, 2013 at 10:17:19PM +1000, Alexey Kardashevskiy wrote:
>>> On 06/27/2013 09:47 PM, David Gibson wrote:
>>>> On Thu, Jun 27, 2013 at 04:45:45PM +1000, Alexey Kardashevskiy wrote:
>>>>> Currently XICS interrupt controller is not a QEMU device. As we are going
>>>>> to support in-kernel emulated XICS which is a part of KVM, it make
>>>>> sense not to extend the existing XICS and have multiple KVM stub functions
>>>>> but to create yet another device and share pieces between fully emulated
>>>>> XICS and in-kernel XICS.
>>>>
>>>> Hmm.  So, I think changing the xics to the qdev/qom framework is a
>>>> generally good idea.  But I'm not convinced its a good idea to have
>>>> different devices for the kernel and non-kernel xics.
>>>
>>> The idea came from Alex Graf, this is already done for openpic/openpic-kvm.
>>> The normal practice is to move ioctls to KVM to KVM code and provide empty
>>> stubs for non-KVM case. There were too many so having a separate xics-kvm
>>> is kind of help.
>>>
>>>
>>>> Won't that
>>>> prevent migrating from a system with a kernel xics to one without, or
>>>> vice versa?
>>>
>>> Mmm. Do we care much about that?...
>>
>> Enough to avoid making it impossible by design.
> 
> We went that route with x86 too after lots of hassle trying to shoehorn the in-kernel APIC into the emulation device. It's more hassle than gain.

At the moment it can be supported at no cost so next time I'll post it with
matched vmsd.
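
To illustrate what "matched vmsd" means here, a rough sketch follows. It
assumes the destination picks the handler for an incoming section by its
name and then checks the version range. struct mini_vmsd and
mini_vmsd_accepts() are made-up stand-ins, not QEMU's real
VMStateDescription or loader, and the "ics" / "ics-kvm" strings are
placeholders rather than the names used in the patches.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for a VMStateDescription. */
struct mini_vmsd {
    const char *name;          /* section name written to the stream */
    int version_id;            /* newest version this side understands */
    int minimum_version_id;    /* oldest version this side still accepts */
};

/* Emulated and in-kernel devices sharing one section name (placeholder). */
static const struct mini_vmsd vmsd_ics_emulated = { "ics", 1, 1 };
static const struct mini_vmsd vmsd_ics_kvm      = { "ics", 1, 1 };

/* What happens if the in-kernel device keeps its own name instead. */
static const struct mini_vmsd vmsd_ics_kvm_own  = { "ics-kvm", 1, 1 };

/*
 * Returns true when the destination description would accept a section
 * that the source saved under src_name with version src_version.
 */
static inline bool mini_vmsd_accepts(const struct mini_vmsd *dst,
                                     const char *src_name, int src_version)
{
    return strcmp(dst->name, src_name) == 0 &&
           src_version >= dst->minimum_version_id &&
           src_version <= dst->version_id;
}
```

With the shared name, a stream saved by the emulated device is accepted by
the in-kernel one and vice versa. With different names the section is
simply not recognised, which is the case David was worried about.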



>>> At the moment it is not supported that as VMStateDescription have different
>>> .name for xics and xics-kvm but easy to fix. And we do not pass a device to
>>> vmstate_register so that must be it.
>>
>> Ok, if you can make the ids in the vmsd match, then that should be ok.
> 
> I really just wouldn't bother too much about it. Sooner or later QEMU-XICS is going to be a legacy and debug only option.


-- 
Alexey

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
@ 2013-07-02 20:54   ` Andreas Färber
  2013-07-08 18:15   ` Anthony Liguori
  1 sibling, 0 replies; 92+ messages in thread
From: Andreas Färber @ 2013-07-02 20:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

Am 27.06.2013 08:45, schrieb Alexey Kardashevskiy:
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  default-configs/ppc64-softmmu.mak |    1 +
>  hw/intc/Makefile.objs             |    1 +
>  hw/{ppc => intc}/xics.c           |    0
>  hw/ppc/Makefile.objs              |    2 +-
>  4 files changed, 3 insertions(+), 1 deletion(-)
>  rename hw/{ppc => intc}/xics.c (100%)

Looks sensible,

Reviewed-by: Andreas Färber <afaerber@suse.de>

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (16 preceding siblings ...)
  2013-06-27  6:46 ` [Qemu-devel] [PATCH 17/17] spapr-pci: rework MSI/MSIX Alexey Kardashevskiy
@ 2013-07-04  2:31 ` Alexey Kardashevskiy
  2013-07-04  2:40   ` Anthony Liguori
  2013-07-08 18:01 ` Anthony Liguori
  2013-07-09 14:04 ` Anthony Liguori
  19 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-04  2:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

Alex, Anthony, ping? Do I need to post a rebased version? I know there is
something to fix for migration (Anthony mentioned it once), but what? Thanks.


On 06/27/2013 04:45 PM, Alexey Kardashevskiy wrote:
> This series spent quite a lot of time waiting when David's PCI series
> reaches the upstream but it does not seem to happen soon so I rebased
> those on top of agraf/ppc-next rebased on top qemu.org/master.
> 
> 
> While this series applies and compiles, the migration will often fail
> until the "migration: do not sent zero pages in bulk stage" patch is reverted
> or fixed somehow.
> 
> 
> Alexey Kardashevskiy (4):
>   pseries: move interrupt controllers to hw/intc/
>   pseries: rework XICS
>   pseries: rework PAPR virtual SCSI
>   spapr-pci: rework MSI/MSIX
> 
> David Gibson (12):
>   savevm: Implement VMS_DIVIDE flag
>   target-ppc: Convert ppc cpu savevm to VMStateDescription
>   pseries: savevm support for XICS interrupt controller
>   pseries: savevm support for VIO devices
>   pseries: savevm support for PAPR VIO logical lan
>   pseries: savevm support for PAPR TCE tables
>   pseries: savevm support for PAPR virtual SCSI
>   pseries: savevm support for pseries machine
>   pseries: savevm support for PCI host bridge
>   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
>   pseries: Support for in-kernel XICS interrupt controller
>   pseries: savevm support with KVM
> 
> Prerna Saxena (1):
>   ppc64: Enable QEMU to run on POWER 8 DD1 chip.
> 
>  default-configs/ppc64-softmmu.mak |    2 +
>  hw/char/spapr_vty.c               |   16 ++
>  hw/intc/Makefile.objs             |    2 +
>  hw/{ppc => intc}/xics.c           |  172 ++++++++----
>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
>  hw/net/spapr_llan.c               |   24 +-
>  hw/ppc/Makefile.objs              |    2 +-
>  hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
>  hw/ppc/spapr_hcall.c              |    8 +-
>  hw/ppc/spapr_iommu.c              |   25 ++
>  hw/ppc/spapr_pci.c                |  141 ++++++----
>  hw/ppc/spapr_vio.c                |   20 ++
>  hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
>  include/hw/pci-host/spapr.h       |   14 +-
>  include/hw/ppc/spapr.h            |   17 +-
>  include/hw/ppc/spapr_vio.h        |    5 +
>  include/hw/ppc/xics.h             |   72 ++++-
>  include/migration/vmstate.h       |   13 +
>  savevm.c                          |    8 +
>  target-ppc/cpu-models.c           |    3 +
>  target-ppc/cpu-models.h           |    1 +
>  target-ppc/cpu-qom.h              |    4 +
>  target-ppc/cpu.h                  |    8 +-
>  target-ppc/kvm.c                  |   83 ++++++
>  target-ppc/kvm_ppc.h              |   29 ++
>  target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
>  target-ppc/translate_init.c       |   36 +++
>  27 files changed, 2088 insertions(+), 319 deletions(-)
>  rename hw/{ppc => intc}/xics.c (80%)
>  create mode 100644 hw/intc/xics_kvm.c
> 


-- 
Alexey

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
  2013-07-04  2:31 ` [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
@ 2013-07-04  2:40   ` Anthony Liguori
  2013-07-04  2:48     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-04  2:40 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On Wed, Jul 3, 2013 at 9:31 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
> Alex, Anthony, ping? Do I need to post rebased version? I know there is
> something to fix for migration (Anthony mentioned once) but what? Thanks.

You need to rework the XICS patches no?

You're also missing save/restore support for VTY.

Regards,

Anthony Liguori


> On 06/27/2013 04:45 PM, Alexey Kardashevskiy wrote:
>> This series spent quite a lot of time waiting when David's PCI series
>> reaches the upstream but it does not seem to happen soon so I rebased
>> those on top of agraf/ppc-next rebased on top qemu.org/master.
>>
>>
>> While this series applies and compiles, the migration will often fail
>> until the "migration: do not sent zero pages in bulk stage" patch is reverted
>> or fixed somehow.
>>
>>
>> Alexey Kardashevskiy (4):
>>   pseries: move interrupt controllers to hw/intc/
>>   pseries: rework XICS
>>   pseries: rework PAPR virtual SCSI
>>   spapr-pci: rework MSI/MSIX
>>
>> David Gibson (12):
>>   savevm: Implement VMS_DIVIDE flag
>>   target-ppc: Convert ppc cpu savevm to VMStateDescription
>>   pseries: savevm support for XICS interrupt controller
>>   pseries: savevm support for VIO devices
>>   pseries: savevm support for PAPR VIO logical lan
>>   pseries: savevm support for PAPR TCE tables
>>   pseries: savevm support for PAPR virtual SCSI
>>   pseries: savevm support for pseries machine
>>   pseries: savevm support for PCI host bridge
>>   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
>>   pseries: Support for in-kernel XICS interrupt controller
>>   pseries: savevm support with KVM
>>
>> Prerna Saxena (1):
>>   ppc64: Enable QEMU to run on POWER 8 DD1 chip.
>>
>>  default-configs/ppc64-softmmu.mak |    2 +
>>  hw/char/spapr_vty.c               |   16 ++
>>  hw/intc/Makefile.objs             |    2 +
>>  hw/{ppc => intc}/xics.c           |  172 ++++++++----
>>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
>>  hw/net/spapr_llan.c               |   24 +-
>>  hw/ppc/Makefile.objs              |    2 +-
>>  hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
>>  hw/ppc/spapr_hcall.c              |    8 +-
>>  hw/ppc/spapr_iommu.c              |   25 ++
>>  hw/ppc/spapr_pci.c                |  141 ++++++----
>>  hw/ppc/spapr_vio.c                |   20 ++
>>  hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
>>  include/hw/pci-host/spapr.h       |   14 +-
>>  include/hw/ppc/spapr.h            |   17 +-
>>  include/hw/ppc/spapr_vio.h        |    5 +
>>  include/hw/ppc/xics.h             |   72 ++++-
>>  include/migration/vmstate.h       |   13 +
>>  savevm.c                          |    8 +
>>  target-ppc/cpu-models.c           |    3 +
>>  target-ppc/cpu-models.h           |    1 +
>>  target-ppc/cpu-qom.h              |    4 +
>>  target-ppc/cpu.h                  |    8 +-
>>  target-ppc/kvm.c                  |   83 ++++++
>>  target-ppc/kvm_ppc.h              |   29 ++
>>  target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
>>  target-ppc/translate_init.c       |   36 +++
>>  27 files changed, 2088 insertions(+), 319 deletions(-)
>>  rename hw/{ppc => intc}/xics.c (80%)
>>  create mode 100644 hw/intc/xics_kvm.c
>>
>
>
> --
> Alexey
>

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
  2013-07-04  2:40   ` Anthony Liguori
@ 2013-07-04  2:48     ` Alexey Kardashevskiy
  0 siblings, 0 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-04  2:48 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On 07/04/2013 12:40 PM, Anthony Liguori wrote:
> On Wed, Jul 3, 2013 at 9:31 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> Alex, Anthony, ping? Do I need to post rebased version? I know there is
>> something to fix for migration (Anthony mentioned once) but what? Thanks.
> 
> You need to rework the XICS patches no?

I thought I did already, I even QOM'ed it and made a new xics-kvm device.
Need something more?

> You're also missing save/restore support for VTY.

Oh. Just noticed. It is there already but somehow it got merged into
"[PATCH 07/17] pseries: savevm support for PAPR VIO logical lan" before I
got this stuff in my hands :) I'll split it.

Is anything else missing?



> Regards,
> 
> Anthony Liguori
> 
> 
>> On 06/27/2013 04:45 PM, Alexey Kardashevskiy wrote:
>>> This series spent quite a lot of time waiting when David's PCI series
>>> reaches the upstream but it does not seem to happen soon so I rebased
>>> those on top of agraf/ppc-next rebased on top qemu.org/master.
>>>
>>>
>>> While this series applies and compiles, the migration will often fail
>>> until the "migration: do not sent zero pages in bulk stage" patch is reverted
>>> or fixed somehow.
>>>
>>>
>>> Alexey Kardashevskiy (4):
>>>   pseries: move interrupt controllers to hw/intc/
>>>   pseries: rework XICS
>>>   pseries: rework PAPR virtual SCSI
>>>   spapr-pci: rework MSI/MSIX
>>>
>>> David Gibson (12):
>>>   savevm: Implement VMS_DIVIDE flag
>>>   target-ppc: Convert ppc cpu savevm to VMStateDescription
>>>   pseries: savevm support for XICS interrupt controller
>>>   pseries: savevm support for VIO devices
>>>   pseries: savevm support for PAPR VIO logical lan
>>>   pseries: savevm support for PAPR TCE tables
>>>   pseries: savevm support for PAPR virtual SCSI
>>>   pseries: savevm support for pseries machine
>>>   pseries: savevm support for PCI host bridge
>>>   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
>>>   pseries: Support for in-kernel XICS interrupt controller
>>>   pseries: savevm support with KVM
>>>
>>> Prerna Saxena (1):
>>>   ppc64: Enable QEMU to run on POWER 8 DD1 chip.
>>>
>>>  default-configs/ppc64-softmmu.mak |    2 +
>>>  hw/char/spapr_vty.c               |   16 ++
>>>  hw/intc/Makefile.objs             |    2 +
>>>  hw/{ppc => intc}/xics.c           |  172 ++++++++----
>>>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
>>>  hw/net/spapr_llan.c               |   24 +-
>>>  hw/ppc/Makefile.objs              |    2 +-
>>>  hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
>>>  hw/ppc/spapr_hcall.c              |    8 +-
>>>  hw/ppc/spapr_iommu.c              |   25 ++
>>>  hw/ppc/spapr_pci.c                |  141 ++++++----
>>>  hw/ppc/spapr_vio.c                |   20 ++
>>>  hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
>>>  include/hw/pci-host/spapr.h       |   14 +-
>>>  include/hw/ppc/spapr.h            |   17 +-
>>>  include/hw/ppc/spapr_vio.h        |    5 +
>>>  include/hw/ppc/xics.h             |   72 ++++-
>>>  include/migration/vmstate.h       |   13 +
>>>  savevm.c                          |    8 +
>>>  target-ppc/cpu-models.c           |    3 +
>>>  target-ppc/cpu-models.h           |    1 +
>>>  target-ppc/cpu-qom.h              |    4 +
>>>  target-ppc/cpu.h                  |    8 +-
>>>  target-ppc/kvm.c                  |   83 ++++++
>>>  target-ppc/kvm_ppc.h              |   29 ++
>>>  target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
>>>  target-ppc/translate_init.c       |   36 +++
>>>  27 files changed, 2088 insertions(+), 319 deletions(-)
>>>  rename hw/{ppc => intc}/xics.c (80%)
>>>  create mode 100644 hw/intc/xics_kvm.c
>>>
>>
>>
>> --
>> Alexey
>>


-- 
Alexey

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip.
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip Alexey Kardashevskiy
@ 2013-07-04  5:54   ` Andreas Färber
  2013-07-04  6:26     ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
  2013-07-04  6:42     ` [Qemu-devel] " Prerna Saxena
  0 siblings, 2 replies; 92+ messages in thread
From: Andreas Färber @ 2013-07-04  5:54 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Alexander Graf
  Cc: Anthony Liguori, qemu-devel, qemu-ppc, Prerna Saxena,
	Paolo Bonzini, Paul Mackerras, David Gibson

Am 27.06.2013 08:45, schrieb Alexey Kardashevskiy:
> From: Prerna Saxena <prerna@linux.vnet.ibm.com>
> 
> This patch enables QEMU to launch VM guests on POWER8 chip. I have tested
> this to work with BML kernel on P8 dd1 chip.
> 
> Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Reviewed-by: Paul Mackerras <paulus@samba.org>

The subject slightly hides what the patch is actually doing:
Suggest "target-ppc: Add POWER8 v0.1 CPU model"?

What's DD1, should that be added to the textual description?

> ---
>  target-ppc/cpu-models.c     |    3 +++
>  target-ppc/cpu-models.h     |    1 +
>  target-ppc/translate_init.c |   34 ++++++++++++++++++++++++++++++++++
>  3 files changed, 38 insertions(+)
> 
> diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
> index 9bb68c8..f8c64dd 100644
> --- a/target-ppc/cpu-models.c
> +++ b/target-ppc/cpu-models.c
> @@ -1145,6 +1145,8 @@
>                  "POWER7 v2.1")
>      POWERPC_DEF("POWER7_v2.3",   CPU_POWERPC_POWER7_v23,             POWER7,
>                  "POWER7 v2.3")
> +    POWERPC_DEF("POWER8_v0.1",   CPU_POWERPC_POWER8_v01,             POWER8,
> +                "POWER8 v0.1")
>      POWERPC_DEF("970",           CPU_POWERPC_970,                    970,
>                  "PowerPC 970")
>      POWERPC_DEF("970fx_v1.0",    CPU_POWERPC_970FX_v10,              970FX,
> @@ -1390,6 +1392,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
>      { "Dino",  "POWER3" },
>      { "POWER3+", "631" },
>      { "POWER7", "POWER7_v2.3" },
> +    { "POWER8", "POWER8_v0.1" },
>      { "970fx", "970fx_v3.1" },
>      { "970mp", "970mp_v1.1" },
>      { "Apache", "RS64" },
> diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
> index 262ca47..b349ad2 100644
> --- a/target-ppc/cpu-models.h
> +++ b/target-ppc/cpu-models.h
> @@ -556,6 +556,7 @@ enum {
>      CPU_POWERPC_POWER7_v20         = 0x003F0200,
>      CPU_POWERPC_POWER7_v21         = 0x003F0201,
>      CPU_POWERPC_POWER7_v23         = 0x003F0203,
> +    CPU_POWERPC_POWER8_v01         = 0x004B0100,

Are you sure this PVR is v0.1 and not v1.0?

Rest looks okay, although I wouldn't know how to check all flags.

Andreas

>      CPU_POWERPC_970                = 0x00390202,
>      CPU_POWERPC_970FX_v10          = 0x00391100,
>      CPU_POWERPC_970FX_v20          = 0x003C0200,
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index 95aebf7..2502758 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -7011,6 +7011,40 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
>      pcc->l1_dcache_size = 0x8000;
>      pcc->l1_icache_size = 0x8000;
>  }
> +
> +POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +    PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
> +
> +    dc->desc = "POWER8";
> +    pcc->init_proc = init_proc_POWER7;
> +    pcc->check_pow = check_pow_nocheck;
> +    pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
> +                       PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
> +                       PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
> +                       PPC_FLOAT_STFIWX |
> +                       PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ |
> +                       PPC_MEM_SYNC | PPC_MEM_EIEIO |
> +                       PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
> +                       PPC_64B | PPC_ALTIVEC |
> +                       PPC_SEGMENT_64B | PPC_SLBI |
> +                       PPC_POPCNTB | PPC_POPCNTWD;
> +    pcc->insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX;
> +    pcc->msr_mask = 0x800000000204FF36ULL;
> +    pcc->mmu_model = POWERPC_MMU_2_06;
> +#if defined(CONFIG_SOFTMMU)
> +    pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
> +#endif
> +    pcc->excp_model = POWERPC_EXCP_POWER7;
> +    pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
> +    pcc->bfd_mach = bfd_mach_ppc64;
> +    pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
> +                 POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
> +                 POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR;
> +    pcc->l1_dcache_size = 0x8000;
> +    pcc->l1_icache_size = 0x8000;
> +}
>  #endif /* defined (TARGET_PPC64) */
>  
>  


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip.
  2013-07-04  5:54   ` Andreas Färber
@ 2013-07-04  6:26     ` Benjamin Herrenschmidt
  2013-07-04  6:42     ` [Qemu-devel] " Prerna Saxena
  1 sibling, 0 replies; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-04  6:26 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf,
	qemu-devel, qemu-ppc, Prerna Saxena, Paolo Bonzini,
	Paul Mackerras, David Gibson

On Thu, 2013-07-04 at 07:54 +0200, Andreas Färber wrote:
> Am 27.06.2013 08:45, schrieb Alexey Kardashevskiy:
> > From: Prerna Saxena <prerna@linux.vnet.ibm.com>
> > 
> > This patch enables QEMU to launch VM guests on POWER8 chip. I have tested
> > this to work with BML kernel on P8 dd1 chip.
> > 
> > Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > Reviewed-by: Paul Mackerras <paulus@samba.org>
> 
> The subject slightly hides what the patch is actually doing:
> Suggest "target-ppc: Add POWER8 v0.1 CPU model"?

It's 1.0 anyway :-)

> What's DD1, should that be added to the textual description?

"DD" is what we call our chip revisions internally. DD1 is 1.0, DD1.1 is
1.1, etc.
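To make the revision question concrete, here is a minimal sketch of how the PVR value from the patch could be split into fields. The helper names are hypothetical, and the layout (upper 16 bits = processor version, then one byte each for major and minor revision) is an assumption that happens to match the POWER7 entries in the same enum; it is not taken from QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the field split is an assumption for illustration. */
static unsigned pvr_version(uint32_t pvr)   { return pvr >> 16; }
static unsigned pvr_rev_major(uint32_t pvr) { return (pvr >> 8) & 0xFF; }
static unsigned pvr_rev_minor(uint32_t pvr) { return pvr & 0xFF; }

/* Checks the two PVR values that appear in the quoted cpu-models.h hunk. */
static void pvr_check_examples(void)
{
    const uint32_t power8 = 0x004B0100; /* POWER8 entry added by the patch */
    const uint32_t power7 = 0x003F0203; /* existing POWER7 v2.3 entry */

    /* Under the assumed layout the POWER8 value reads as revision 1.0. */
    assert(pvr_version(power8) == 0x004B);
    assert(pvr_rev_major(power8) == 1 && pvr_rev_minor(power8) == 0);

    /* The same split gives 2.3 for the POWER7 v2.3 entry. */
    assert(pvr_rev_major(power7) == 2 && pvr_rev_minor(power7) == 3);
}
```

Under that reading, 0x004B0100 would be "v1.0" rather than "v0.1", which is consistent with DD1 being 1.0.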

Cheers,
Ben.

> > ---
> >  target-ppc/cpu-models.c     |    3 +++
> >  target-ppc/cpu-models.h     |    1 +
> >  target-ppc/translate_init.c |   34 ++++++++++++++++++++++++++++++++++
> >  3 files changed, 38 insertions(+)
> > 
> > diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
> > index 9bb68c8..f8c64dd 100644
> > --- a/target-ppc/cpu-models.c
> > +++ b/target-ppc/cpu-models.c
> > @@ -1145,6 +1145,8 @@
> >                  "POWER7 v2.1")
> >      POWERPC_DEF("POWER7_v2.3",   CPU_POWERPC_POWER7_v23,             POWER7,
> >                  "POWER7 v2.3")
> > +    POWERPC_DEF("POWER8_v0.1",   CPU_POWERPC_POWER8_v01,             POWER8,
> > +                "POWER8 v0.1")
> >      POWERPC_DEF("970",           CPU_POWERPC_970,                    970,
> >                  "PowerPC 970")
> >      POWERPC_DEF("970fx_v1.0",    CPU_POWERPC_970FX_v10,              970FX,
> > @@ -1390,6 +1392,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
> >      { "Dino",  "POWER3" },
> >      { "POWER3+", "631" },
> >      { "POWER7", "POWER7_v2.3" },
> > +    { "POWER8", "POWER8_v0.1" },
> >      { "970fx", "970fx_v3.1" },
> >      { "970mp", "970mp_v1.1" },
> >      { "Apache", "RS64" },
> > diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
> > index 262ca47..b349ad2 100644
> > --- a/target-ppc/cpu-models.h
> > +++ b/target-ppc/cpu-models.h
> > @@ -556,6 +556,7 @@ enum {
> >      CPU_POWERPC_POWER7_v20         = 0x003F0200,
> >      CPU_POWERPC_POWER7_v21         = 0x003F0201,
> >      CPU_POWERPC_POWER7_v23         = 0x003F0203,
> > +    CPU_POWERPC_POWER8_v01         = 0x004B0100,
> 
> Are you sure this PVR is v0.1 and not v1.0?
> 
> Rest looks okay, although I wouldn't know how to check all flags.
> 
> Andreas
> 
> >      CPU_POWERPC_970                = 0x00390202,
> >      CPU_POWERPC_970FX_v10          = 0x00391100,
> >      CPU_POWERPC_970FX_v20          = 0x003C0200,
> > diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> > index 95aebf7..2502758 100644
> > --- a/target-ppc/translate_init.c
> > +++ b/target-ppc/translate_init.c
> > @@ -7011,6 +7011,40 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
> >      pcc->l1_dcache_size = 0x8000;
> >      pcc->l1_icache_size = 0x8000;
> >  }
> > +
> > +POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(oc);
> > +    PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
> > +
> > +    dc->desc = "POWER8";
> > +    pcc->init_proc = init_proc_POWER7;
> > +    pcc->check_pow = check_pow_nocheck;
> > +    pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
> > +                       PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
> > +                       PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
> > +                       PPC_FLOAT_STFIWX |
> > +                       PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ |
> > +                       PPC_MEM_SYNC | PPC_MEM_EIEIO |
> > +                       PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
> > +                       PPC_64B | PPC_ALTIVEC |
> > +                       PPC_SEGMENT_64B | PPC_SLBI |
> > +                       PPC_POPCNTB | PPC_POPCNTWD;
> > +    pcc->insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX;
> > +    pcc->msr_mask = 0x800000000204FF36ULL;
> > +    pcc->mmu_model = POWERPC_MMU_2_06;
> > +#if defined(CONFIG_SOFTMMU)
> > +    pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
> > +#endif
> > +    pcc->excp_model = POWERPC_EXCP_POWER7;
> > +    pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
> > +    pcc->bfd_mach = bfd_mach_ppc64;
> > +    pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
> > +                 POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
> > +                 POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR;
> > +    pcc->l1_dcache_size = 0x8000;
> > +    pcc->l1_icache_size = 0x8000;
> > +}
> >  #endif /* defined (TARGET_PPC64) */
> >  
> >  
> 
> 


* Re: [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip.
  2013-07-04  5:54   ` Andreas Färber
  2013-07-04  6:26     ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
@ 2013-07-04  6:42     ` Prerna Saxena
  2013-07-10 11:19       ` Alexander Graf
  1 sibling, 1 reply; 92+ messages in thread
From: Prerna Saxena @ 2013-07-04  6:42 UTC (permalink / raw)
  To: Andreas Färber
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf,
	qemu-devel, qemu-ppc, Paolo Bonzini, Paul Mackerras,
	David Gibson

Hi Andreas,
Thank you for taking a look.
I have incorporated your feedback into a new patch, attached herewith.


Regards,
Prerna

Subject: [PATCH] target-ppc: Add POWER8 v1.0 CPU model

This patch adds CPU PVR definition for POWER8,
and enables QEMU to launch guests on POWER8 hardware.

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Andreas Farber <afaerber@suse.de>
---
 target-ppc/cpu-models.c     |  3 +++
 target-ppc/cpu-models.h     |  1 +
 target-ppc/translate_init.c | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index 17f56b7..72f7088 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -1145,6 +1145,8 @@
                 "POWER7 v2.1")
     POWERPC_DEF("POWER7_v2.3",   CPU_POWERPC_POWER7_v23,             POWER7,
                 "POWER7 v2.3")
+    POWERPC_DEF("POWER8_v1.0",   CPU_POWERPC_POWER8_v10,             POWER8,
+                "POWER8 v1.0")
     POWERPC_DEF("970",           CPU_POWERPC_970,                    970,
                 "PowerPC 970")
     POWERPC_DEF("970fx_v1.0",    CPU_POWERPC_970FX_v10,              970FX,
@@ -1390,6 +1392,7 @@ const PowerPCCPUAlias ppc_cpu_aliases[] = {
     { "Dino",  "POWER3" },
     { "POWER3+", "631" },
     { "POWER7", "POWER7_v2.3" },
+    { "POWER8", "POWER8_v1.0" },
     { "970fx", "970fx_v3.1" },
     { "970mp", "970mp_v1.1" },
     { "Apache", "RS64" },
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index a94f835..1c67a0e 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -555,6 +555,7 @@ enum {
     CPU_POWERPC_POWER7_v20         = 0x003F0200,
     CPU_POWERPC_POWER7_v21         = 0x003F0201,
     CPU_POWERPC_POWER7_v23         = 0x003F0203,
+    CPU_POWERPC_POWER8_v10         = 0x004B0100,
     CPU_POWERPC_970                = 0x00390202,
     CPU_POWERPC_970FX_v10          = 0x00391100,
     CPU_POWERPC_970FX_v20          = 0x003C0200,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 71e434a..a1d8e70 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7042,6 +7042,40 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
     pcc->l1_dcache_size = 0x8000;
     pcc->l1_icache_size = 0x8000;
 }
+
+POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+    PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
+
+    dc->desc = "POWER8";
+    pcc->init_proc = init_proc_POWER7;
+    pcc->check_pow = check_pow_nocheck;
+    pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
+                       PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
+                       PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
+                       PPC_FLOAT_STFIWX |
+                       PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ |
+                       PPC_MEM_SYNC | PPC_MEM_EIEIO |
+                       PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
+                       PPC_64B | PPC_ALTIVEC |
+                       PPC_SEGMENT_64B | PPC_SLBI |
+                       PPC_POPCNTB | PPC_POPCNTWD;
+    pcc->insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX;
+    pcc->msr_mask = 0x800000000204FF36ULL;
+    pcc->mmu_model = POWERPC_MMU_2_06;
+#if defined(CONFIG_SOFTMMU)
+    pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
+#endif
+    pcc->excp_model = POWERPC_EXCP_POWER7;
+    pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
+    pcc->bfd_mach = bfd_mach_ppc64;
+    pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
+                 POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
+                 POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR;
+    pcc->l1_dcache_size = 0x8000;
+    pcc->l1_icache_size = 0x8000;
+}
 #endif /* defined (TARGET_PPC64) */
 
 
-- 
1.7.11.4



-- 
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India


* Re: [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (17 preceding siblings ...)
  2013-07-04  2:31 ` [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
@ 2013-07-08 18:01 ` Anthony Liguori
  2013-07-09  6:37   ` Alexey Kardashevskiy
  2013-07-09 14:04 ` Anthony Liguori
  19 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:01 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> This series spent quite a lot of time waiting when David's PCI series
> reaches the upstream but it does not seem to happen soon so I rebased
> those on top of agraf/ppc-next rebased on top qemu.org/master.
>
>
> While this series applies and compiles, the migration will often fail
> until the "migration: do not sent zero pages in bulk stage" patch is reverted
> or fixed somehow.

Your cover letter is out of date.  This patch has been applied.  Can you
confirm the series now works as expected?

David's PCI series is now upstream too.

This should be at least three if not four distinct patch series.
Sending it as a single series means it cannot be applied in chunks easily.

Regards,

Anthony Liguori

> Alexey Kardashevskiy (4):
>   pseries: move interrupt controllers to hw/intc/
>   pseries: rework XICS
>   pseries: rework PAPR virtual SCSI
>   spapr-pci: rework MSI/MSIX
>
> David Gibson (12):
>   savevm: Implement VMS_DIVIDE flag
>   target-ppc: Convert ppc cpu savevm to VMStateDescription
>   pseries: savevm support for XICS interrupt controller
>   pseries: savevm support for VIO devices
>   pseries: savevm support for PAPR VIO logical lan
>   pseries: savevm support for PAPR TCE tables
>   pseries: savevm support for PAPR virtual SCSI
>   pseries: savevm support for pseries machine
>   pseries: savevm support for PCI host bridge
>   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
>   pseries: Support for in-kernel XICS interrupt controller
>   pseries: savevm support with KVM
>
> Prerna Saxena (1):
>   ppc64: Enable QEMU to run on POWER 8 DD1 chip.
>
>  default-configs/ppc64-softmmu.mak |    2 +
>  hw/char/spapr_vty.c               |   16 ++
>  hw/intc/Makefile.objs             |    2 +
>  hw/{ppc => intc}/xics.c           |  172 ++++++++----
>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
>  hw/net/spapr_llan.c               |   24 +-
>  hw/ppc/Makefile.objs              |    2 +-
>  hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
>  hw/ppc/spapr_hcall.c              |    8 +-
>  hw/ppc/spapr_iommu.c              |   25 ++
>  hw/ppc/spapr_pci.c                |  141 ++++++----
>  hw/ppc/spapr_vio.c                |   20 ++
>  hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
>  include/hw/pci-host/spapr.h       |   14 +-
>  include/hw/ppc/spapr.h            |   17 +-
>  include/hw/ppc/spapr_vio.h        |    5 +
>  include/hw/ppc/xics.h             |   72 ++++-
>  include/migration/vmstate.h       |   13 +
>  savevm.c                          |    8 +
>  target-ppc/cpu-models.c           |    3 +
>  target-ppc/cpu-models.h           |    1 +
>  target-ppc/cpu-qom.h              |    4 +
>  target-ppc/cpu.h                  |    8 +-
>  target-ppc/kvm.c                  |   83 ++++++
>  target-ppc/kvm_ppc.h              |   29 ++
>  target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
>  target-ppc/translate_init.c       |   36 +++
>  27 files changed, 2088 insertions(+), 319 deletions(-)
>  rename hw/{ppc => intc}/xics.c (80%)
>  create mode 100644 hw/intc/xics_kvm.c
>
> -- 
> 1.7.10.4


* Re: [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
  2013-07-02 20:54   ` Andreas Färber
@ 2013-07-08 18:15   ` Anthony Liguori
  2013-07-08 18:34     ` Alexander Graf
  1 sibling, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:15 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  default-configs/ppc64-softmmu.mak |    1 +
>  hw/intc/Makefile.objs             |    1 +
>  hw/{ppc => intc}/xics.c           |    0
>  hw/ppc/Makefile.objs              |    2 +-
>  4 files changed, 3 insertions(+), 1 deletion(-)
>  rename hw/{ppc => intc}/xics.c (100%)
>
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index cb279cb..69a9f8d 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -47,5 +47,6 @@ CONFIG_E500=y
>  CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
>  # For pSeries
>  CONFIG_PCI_HOTPLUG=y
> +CONFIG_XICS=$(CONFIG_PSERIES)
>  # For PReP
>  CONFIG_MC146818RTC=y
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index 2ba49d0..abe8f80 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -22,3 +22,4 @@ obj-$(CONFIG_OMAP) += omap_intc.o
>  obj-$(CONFIG_OPENPIC) += openpic.o
>  obj-$(CONFIG_OPENPIC_KVM) += openpic_kvm.o
>  obj-$(CONFIG_SH4) += sh_intc.o
> +obj-$(CONFIG_XICS) += xics.o
> diff --git a/hw/ppc/xics.c b/hw/intc/xics.c
> similarity index 100%
> rename from hw/ppc/xics.c
> rename to hw/intc/xics.c
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index be00d1d..7a1cd5d 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -1,7 +1,7 @@
>  # shared objects
>  obj-y += ppc.o ppc_booke.o
>  # IBM pSeries (sPAPR)
> -obj-$(CONFIG_PSERIES) += spapr.o xics.o spapr_vio.o spapr_events.o
> +obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
>  obj-$(CONFIG_PSERIES) += spapr_pci.o
>  # PowerPC 4xx boards
> -- 
> 1.7.10.4


* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 02/17] pseries: rework XICS Alexey Kardashevskiy
  2013-06-27 11:47   ` David Gibson
@ 2013-07-08 18:22   ` Anthony Liguori
  2013-07-09  3:40     ` Alexey Kardashevskiy
  1 sibling, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:22 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> Currently XICS interrupt controller is not a QEMU device. As we are going
> to support in-kernel emulated XICS which is a part of KVM, it makes
> sense not to extend the existing XICS and have multiple KVM stub functions
> but to create yet another device and share pieces between fully emulated
> XICS and in-kernel XICS.
>
> The rework includes:
> * port to QOM
> * made few functions public to use from in-kernel XICS implementation
> * made VMStateDescription public to be used for in-kernel XICS migration
> * move xics_system_init() to spapr.c, it tries creating fully-emulated
> XICS now and will try in-kernel XICS in upcoming patches.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/intc/xics.c        |  109 ++++++++++++++++++++++++++-----------------------
>  hw/ppc/spapr.c        |   28 +++++++++++++
>  include/hw/ppc/xics.h |   59 ++++++++++++++++++++++++--
>  3 files changed, 141 insertions(+), 55 deletions(-)
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 091912e..0e374c8 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -34,13 +34,6 @@
>   * ICP: Presentation layer
>   */
>  
> -struct icp_server_state {
> -    uint32_t xirr;
> -    uint8_t pending_priority;
> -    uint8_t mfrr;
> -    qemu_irq output;
> -};
> -
>  #define XISR_MASK  0x00ffffff
>  #define CPPR_MASK  0xff000000
>  
> @@ -49,12 +42,6 @@ struct icp_server_state {
>  
>  struct ics_state;
>  
> -struct icp_state {
> -    long nr_servers;
> -    struct icp_server_state *ss;
> -    struct ics_state *ics;
> -};
> -
>  static void ics_reject(struct ics_state *ics, int nr);
>  static void ics_resend(struct ics_state *ics);
>  static void ics_eoi(struct ics_state *ics, int nr);
> @@ -171,27 +158,6 @@ static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
>  /*
>   * ICS: Source layer
>   */
> -
> -struct ics_irq_state {
> -    int server;
> -    uint8_t priority;
> -    uint8_t saved_priority;
> -#define XICS_STATUS_ASSERTED           0x1
> -#define XICS_STATUS_SENT               0x2
> -#define XICS_STATUS_REJECTED           0x4
> -#define XICS_STATUS_MASKED_PENDING     0x8
> -    uint8_t status;
> -};
> -
> -struct ics_state {
> -    int nr_irqs;
> -    int offset;
> -    qemu_irq *qirqs;
> -    bool *islsi;
> -    struct ics_irq_state *irqs;
> -    struct icp_state *icp;
> -};
> -
>  static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
>  {
>      return (nr >= ics->offset)
> @@ -506,9 +472,8 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      rtas_st(rets, 0, 0); /* Success */
>  }
>  
> -static void xics_reset(void *opaque)
> +void xics_common_reset(struct icp_state *icp)
>  {
> -    struct icp_state *icp = (struct icp_state *)opaque;
>      struct ics_state *ics = icp->ics;
>      int i;
>  
> @@ -527,7 +492,12 @@ static void xics_reset(void *opaque)
>      }
>  }
>  
> -void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> +static void xics_reset(DeviceState *d)
> +{
> +    xics_common_reset(XICS(d));
> +}
> +
> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>  {
>      CPUState *cs = CPU(cpu);
>      CPUPPCState *env = &cpu->env;
> @@ -551,37 +521,72 @@ void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>      }
>  }
>  
> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
> +void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> +{
> +    xics_common_cpu_setup(icp, cpu);
> +}
> +
> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>  {
> -    struct icp_state *icp;
> -    struct ics_state *ics;
> +    struct ics_state *ics = icp->ics;
>  
> -    icp = g_malloc0(sizeof(*icp));
> -    icp->nr_servers = nr_servers;
>      icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
>  
>      ics = g_malloc0(sizeof(*ics));
> -    ics->nr_irqs = nr_irqs;
> +    ics->nr_irqs = icp->nr_irqs;
>      ics->offset = XICS_IRQ_BASE;
> -    ics->irqs = g_malloc0(nr_irqs * sizeof(struct ics_irq_state));
> -    ics->islsi = g_malloc0(nr_irqs * sizeof(bool));
> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(struct ics_irq_state));
> +    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>  
>      icp->ics = ics;
>      ics->icp = icp;
>  
> -    ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
> +    ics->qirqs = qemu_allocate_irqs(handler, ics, ics->nr_irqs);
> +}
>  
> -    spapr_register_hypercall(H_CPPR, h_cppr);
> -    spapr_register_hypercall(H_IPI, h_ipi);
> -    spapr_register_hypercall(H_XIRR, h_xirr);
> -    spapr_register_hypercall(H_EOI, h_eoi);
> +static void xics_realize(DeviceState *dev, Error **errp)
> +{
> +    struct icp_state *icp = XICS(dev);
> +
> +    xics_common_init(icp, ics_set_irq);
>  
>      spapr_rtas_register("ibm,set-xive", rtas_set_xive);
>      spapr_rtas_register("ibm,get-xive", rtas_get_xive);
>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>  
> -    qemu_register_reset(xics_reset, icp);
> +}
> +
> +static Property xics_properties[] = {
> +    DEFINE_PROP_UINT32("nr_servers", struct icp_state, nr_servers, -1),
> +    DEFINE_PROP_UINT32("nr_irqs", struct icp_state, nr_irqs, -1),
> +    DEFINE_PROP_END_OF_LIST(),
> +};
> +
> +static void xics_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +
> +    dc->realize = xics_realize;
> +    dc->props = xics_properties;
> +    dc->reset = xics_reset;
> +}
> +
> +static const TypeInfo xics_info = {
> +    .name          = TYPE_XICS,
> +    .parent        = TYPE_SYS_BUS_DEVICE,
> +    .instance_size = sizeof(struct icp_state),
> +    .class_init    = xics_class_init,
> +};
> +
> +static void xics_register_types(void)
> +{
> +    spapr_register_hypercall(H_CPPR, h_cppr);
> +    spapr_register_hypercall(H_IPI, h_ipi);
> +    spapr_register_hypercall(H_XIRR, h_xirr);
> +    spapr_register_hypercall(H_EOI, h_eoi);
>  
> -    return icp;
> +    type_register_static(&xics_info);
>  }
> +
> +type_init(xics_register_types)
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 38c29b7..def3505 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -719,6 +719,34 @@ static int spapr_vga_init(PCIBus *pci_bus)
>      }
>  }
>  
> +static struct icp_state *try_create_xics(const char *type, int nr_servers,
> +                                         int nr_irqs)
> +{
> +    DeviceState *dev;
> +
> +    dev = qdev_create(NULL, type);
> +    qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
> +    qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
> +    if (qdev_init(dev) < 0) {
> +        return NULL;
> +    }
> +
> +    return XICS(dev);
> +}
> +
> +static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
> +{
> +    struct icp_state *icp = NULL;
> +
> +    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
> +    if (!icp) {
> +        perror("Failed to create XICS\n");
> +        abort();
> +    }
> +
> +    return icp;
> +}
> +
>  /* pSeries LPAR / sPAPR hardware init */
>  static void ppc_spapr_init(QEMUMachineInitArgs *args)
>  {
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 6bce042..3f72806 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -27,15 +27,68 @@
>  #if !defined(__XICS_H__)
>  #define __XICS_H__
>  
> +#include "hw/sysbus.h"
> +
> +#define TYPE_XICS "xics"
> +#define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
> +
>  #define XICS_IPI        0x2
> -#define XICS_IRQ_BASE   0x10
> +#define XICS_BUID       0x1
> +#define XICS_IRQ_BASE   (XICS_BUID << 12)
> +
> +/*
> + * We currently only support one BUID which is our interrupt base
> + * (the kernel implementation supports more but we don't exploit
> + *  that yet)
> + */
>  
> -struct icp_state;
> +struct icp_state {
> +    /*< private >*/
> +    SysBusDevice parent_obj;
> +    /*< public >*/
> +    uint32_t nr_servers;
> +    uint32_t nr_irqs;
> +    struct icp_server_state *ss;
> +    struct ics_state *ics;
> +};
> +
> +struct icp_server_state {
> +    uint32_t xirr;
> +    uint8_t pending_priority;
> +    uint8_t mfrr;
> +    qemu_irq output;
> +};

If you're exposing all of this, please fix coding style while you're at
it.

> +
> +struct ics_state {
> +    uint32_t nr_irqs;
> +    uint32_t offset;
> +    qemu_irq *qirqs;
> +    bool *islsi;
> +    struct ics_irq_state *irqs;
> +    struct icp_state *icp;
> +};

Shouldn't this be a device too?

> +
> +struct ics_irq_state {
> +    uint32_t server;
> +    uint8_t priority;
> +    uint8_t saved_priority;
> +#define XICS_STATUS_ASSERTED           0x1
> +#define XICS_STATUS_SENT               0x2
> +#define XICS_STATUS_REJECTED           0x4
> +#define XICS_STATUS_MASKED_PENDING     0x8
> +    uint8_t status;
> +};
>  
>  qemu_irq xics_get_qirq(struct icp_state *icp, int irq);
>  void xics_set_irq_type(struct icp_state *icp, int irq, bool lsi);
>  
> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs);
> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler);
> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
> +void xics_common_reset(struct icp_state *icp);
> +
>  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
>  
> +extern const VMStateDescription vmstate_icp_server;
> +extern const VMStateDescription vmstate_ics;

This is the wrong way of doing whatever you're trying to do.

Regards,

Anthony Liguori

> +
>  #endif /* __XICS_H__ */
> -- 
> 1.7.10.4


* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-06-27 12:17     ` Alexey Kardashevskiy
  2013-07-02  0:06       ` David Gibson
@ 2013-07-08 18:24       ` Anthony Liguori
  1 sibling, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy, David Gibson
  Cc: qemu-devel, Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 06/27/2013 09:47 PM, David Gibson wrote:
>> On Thu, Jun 27, 2013 at 04:45:45PM +1000, Alexey Kardashevskiy wrote:
>>> Currently XICS interrupt controller is not a QEMU device. As we are going
>>> to support in-kernel emulated XICS which is a part of KVM, it makes
>>> sense not to extend the existing XICS and have multiple KVM stub functions
>>> but to create yet another device and share pieces between fully emulated
>>> XICS and in-kernel XICS.
>> 
>> Hmm.  So, I think changing the xics to the qdev/qom framework is a
>> generally good idea.  But I'm not convinced it's a good idea to have
>> different devices for the kernel and non-kernel xics.
>
> The idea came from Alex Graf; this is already done for openpic/openpic-kvm.
> The normal practice is to move the KVM ioctls into KVM code and provide empty
> stubs for the non-KVM case. There were too many of them, so having a separate
> xics-kvm helps.

The way this should be modelled is:

XICSCommon
 -> XICS
 -> XICSKVM

With vmstate et al being part of XICSCommon.  See how the i8259 and
i8254 are modelled.
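A rough sketch of that layering in plain C, to show why keeping the save/load code in the common part gives both back ends one migration format. This is not QEMU's QOM or vmstate API; every name below (XICSCommonState, XICSCommonClass, xics_common_save, fake_kernel_xirr, ...) is hypothetical, and the single xirr field stands in for the real controller state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct XICSCommonState XICSCommonState;

/* Per-back-end hooks, playing the role of the class of each subtype. */
typedef struct XICSCommonClass {
    const char *name;
    void (*pre_save)(XICSCommonState *s);  /* fetch state before saving  */
    void (*post_load)(XICSCommonState *s); /* apply state after loading  */
} XICSCommonClass;

/* State owned by the common part; this is all that gets migrated. */
struct XICSCommonState {
    const XICSCommonClass *klass;
    uint32_t xirr; /* stand-in for the real presentation/source state */
};

/* Stand-in for state that an in-kernel controller would hold. */
static uint32_t fake_kernel_xirr;

/* Emulated back end: state already lives in the struct, nothing to sync. */
static void emul_nop(XICSCommonState *s) { (void)s; }

/* "KVM" back end: sync with the (fake) kernel copy around save/load. */
static void kvm_pre_save(XICSCommonState *s)  { s->xirr = fake_kernel_xirr; }
static void kvm_post_load(XICSCommonState *s) { fake_kernel_xirr = s->xirr; }

static const XICSCommonClass xics_emul_class = {
    "xics", emul_nop, emul_nop
};
static const XICSCommonClass xics_kvm_class = {
    "xics-kvm", kvm_pre_save, kvm_post_load
};

/* Common save: same byte layout whichever back end is in use
 * (host byte order here, purely to keep the sketch short). */
static void xics_common_save(XICSCommonState *s, uint8_t buf[4])
{
    s->klass->pre_save(s);
    memcpy(buf, &s->xirr, sizeof(s->xirr));
}

/* Common load: fill the shared state, then let the back end apply it. */
static void xics_common_load(XICSCommonState *s, const uint8_t buf[4])
{
    memcpy(&s->xirr, buf, sizeof(s->xirr));
    s->klass->post_load(s);
}
```

Because only the common state is serialized, a buffer written by the "xics-kvm" instance can be loaded by the "xics" instance and vice versa, which is the cross-migration case David raised.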

Regards,

Anthony Liguori


>
>
>> Won't that
>> prevent migrating from a system with a kernel xics to one without, or
>> vice versa?
>
> Mmm. Do we care much about that?...
> At the moment that is not supported, as the VMStateDescriptions have different
> .name for xics and xics-kvm, but that is easy to fix. And we do not pass a
> device to vmstate_register so that must be it.
>
>
>> 
>>>
>>> The rework includes:
>>> * port to QOM
>>> * made few functions public to use from in-kernel XICS implementation
>>> * made VMStateDescription public to be used for in-kernel XICS migration
>>> * move xics_system_init() to spapr.c, it tries creating fully-emulated
>>> XICS now and will try in-kernel XICS in upcoming patches.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  hw/intc/xics.c        |  109 ++++++++++++++++++++++++++-----------------------
>>>  hw/ppc/spapr.c        |   28 +++++++++++++
>>>  include/hw/ppc/xics.h |   59 ++++++++++++++++++++++++--
>>>  3 files changed, 141 insertions(+), 55 deletions(-)
>>>
>>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>>> index 091912e..0e374c8 100644
>>> --- a/hw/intc/xics.c
>>> +++ b/hw/intc/xics.c
>>> @@ -34,13 +34,6 @@
>>>   * ICP: Presentation layer
>>>   */
>>>  
>>> -struct icp_server_state {
>>> -    uint32_t xirr;
>>> -    uint8_t pending_priority;
>>> -    uint8_t mfrr;
>>> -    qemu_irq output;
>>> -};
>>> -
>>>  #define XISR_MASK  0x00ffffff
>>>  #define CPPR_MASK  0xff000000
>>>  
>>> @@ -49,12 +42,6 @@ struct icp_server_state {
>>>  
>>>  struct ics_state;
>>>  
>>> -struct icp_state {
>>> -    long nr_servers;
>>> -    struct icp_server_state *ss;
>>> -    struct ics_state *ics;
>>> -};
>>> -
>>>  static void ics_reject(struct ics_state *ics, int nr);
>>>  static void ics_resend(struct ics_state *ics);
>>>  static void ics_eoi(struct ics_state *ics, int nr);
>>> @@ -171,27 +158,6 @@ static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
>>>  /*
>>>   * ICS: Source layer
>>>   */
>>> -
>>> -struct ics_irq_state {
>>> -    int server;
>>> -    uint8_t priority;
>>> -    uint8_t saved_priority;
>>> -#define XICS_STATUS_ASSERTED           0x1
>>> -#define XICS_STATUS_SENT               0x2
>>> -#define XICS_STATUS_REJECTED           0x4
>>> -#define XICS_STATUS_MASKED_PENDING     0x8
>>> -    uint8_t status;
>>> -};
>>> -
>>> -struct ics_state {
>>> -    int nr_irqs;
>>> -    int offset;
>>> -    qemu_irq *qirqs;
>>> -    bool *islsi;
>>> -    struct ics_irq_state *irqs;
>>> -    struct icp_state *icp;
>>> -};
>>> -
>>>  static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
>>>  {
>>>      return (nr >= ics->offset)
>>> @@ -506,9 +472,8 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>>      rtas_st(rets, 0, 0); /* Success */
>>>  }
>>>  
>>> -static void xics_reset(void *opaque)
>>> +void xics_common_reset(struct icp_state *icp)
>> 
>> Why do you need to expose this interface?  Couldn't the caller use
>> qdev_reset(xics) just as easily?
>> 
>>>  {
>>> -    struct icp_state *icp = (struct icp_state *)opaque;
>>>      struct ics_state *ics = icp->ics;
>>>      int i;
>>>  
>>> @@ -527,7 +492,12 @@ static void xics_reset(void *opaque)
>>>      }
>>>  }
>>>  
>>> -void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>> +static void xics_reset(DeviceState *d)
>>> +{
>>> +    xics_common_reset(XICS(d));
>>> +}
>>> +
>>> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>>  {
>>>      CPUState *cs = CPU(cpu);
>>>      CPUPPCState *env = &cpu->env;
>>> @@ -551,37 +521,72 @@ void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>>      }
>>>  }
>>>  
>>> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
>>> +void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>> +{
>>> +    xics_common_cpu_setup(icp, cpu);
>>> +}
>>> +
>>> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>>>  {
>>> -    struct icp_state *icp;
>>> -    struct ics_state *ics;
>>> +    struct ics_state *ics = icp->ics;
>>>  
>>> -    icp = g_malloc0(sizeof(*icp));
>>> -    icp->nr_servers = nr_servers;
>>>      icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
>>>  
>>>      ics = g_malloc0(sizeof(*ics));
>>> -    ics->nr_irqs = nr_irqs;
>>> +    ics->nr_irqs = icp->nr_irqs;
>>>      ics->offset = XICS_IRQ_BASE;
>>> -    ics->irqs = g_malloc0(nr_irqs * sizeof(struct ics_irq_state));
>>> -    ics->islsi = g_malloc0(nr_irqs * sizeof(bool));
>>> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(struct ics_irq_state));
>>> +    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>>>  
>>>      icp->ics = ics;
>>>      ics->icp = icp;
>>>  
>>> -    ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
>>> +    ics->qirqs = qemu_allocate_irqs(handler, ics, ics->nr_irqs);
>>> +}
>>>  
>>> -    spapr_register_hypercall(H_CPPR, h_cppr);
>>> -    spapr_register_hypercall(H_IPI, h_ipi);
>>> -    spapr_register_hypercall(H_XIRR, h_xirr);
>>> -    spapr_register_hypercall(H_EOI, h_eoi);
>>> +static void xics_realize(DeviceState *dev, Error **errp)
>>> +{
>>> +    struct icp_state *icp = XICS(dev);
>>> +
>>> +    xics_common_init(icp, ics_set_irq);
>>>  
>>>      spapr_rtas_register("ibm,set-xive", rtas_set_xive);
>>>      spapr_rtas_register("ibm,get-xive", rtas_get_xive);
>>>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>>>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>>>  
>>> -    qemu_register_reset(xics_reset, icp);
>>> +}
>>> +
>>> +static Property xics_properties[] = {
>>> +    DEFINE_PROP_UINT32("nr_servers", struct icp_state, nr_servers, -1),
>>> +    DEFINE_PROP_UINT32("nr_irqs", struct icp_state, nr_irqs, -1),
>>> +    DEFINE_PROP_END_OF_LIST(),
>>> +};
>>> +
>>> +static void xics_class_init(ObjectClass *oc, void *data)
>>> +{
>>> +    DeviceClass *dc = DEVICE_CLASS(oc);
>>> +
>>> +    dc->realize = xics_realize;
>>> +    dc->props = xics_properties;
>>> +    dc->reset = xics_reset;
>>> +}
>>> +
>>> +static const TypeInfo xics_info = {
>>> +    .name          = TYPE_XICS,
>>> +    .parent        = TYPE_SYS_BUS_DEVICE,
>>> +    .instance_size = sizeof(struct icp_state),
>>> +    .class_init    = xics_class_init,
>>> +};
>>> +
>>> +static void xics_register_types(void)
>>> +{
>>> +    spapr_register_hypercall(H_CPPR, h_cppr);
>>> +    spapr_register_hypercall(H_IPI, h_ipi);
>>> +    spapr_register_hypercall(H_XIRR, h_xirr);
>>> +    spapr_register_hypercall(H_EOI, h_eoi);
>>>  
>>> -    return icp;
>>> +    type_register_static(&xics_info);
>>>  }
>>> +
>>> +type_init(xics_register_types)
>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>> index 38c29b7..def3505 100644
>>> --- a/hw/ppc/spapr.c
>>> +++ b/hw/ppc/spapr.c
>>> @@ -719,6 +719,34 @@ static int spapr_vga_init(PCIBus *pci_bus)
>>>      }
>>>  }
>>>  
>>> +static struct icp_state *try_create_xics(const char *type, int nr_servers,
>>> +                                         int nr_irqs)
>>> +{
>>> +    DeviceState *dev;
>>> +
>>> +    dev = qdev_create(NULL, type);
>>> +    qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
>>> +    qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
>>> +    if (qdev_init(dev) < 0) {
>>> +        return NULL;
>> 
>> You could just use qdev_init_nofail() here to avoid the manual
>> handling of failures.
>> 
>>> +    }
>>> +
>>> +    return XICS(dev);
>>> +}
>>> +
>>> +static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
>>> +{
>>> +    struct icp_state *icp = NULL;
>>> +
>>> +    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
>>> +    if (!icp) {
>>> +        perror("Failed to create XICS\n");
>>> +        abort();
>>> +    }
>>> +
>>> +    return icp;
>>> +}
>>> +
>>>  /* pSeries LPAR / sPAPR hardware init */
>>>  static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>>  {
>>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>>> index 6bce042..3f72806 100644
>>> --- a/include/hw/ppc/xics.h
>>> +++ b/include/hw/ppc/xics.h
>>> @@ -27,15 +27,68 @@
>>>  #if !defined(__XICS_H__)
>>>  #define __XICS_H__
>>>  
>>> +#include "hw/sysbus.h"
>>> +
>>> +#define TYPE_XICS "xics"
>>> +#define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
>>> +
>>>  #define XICS_IPI        0x2
>>> -#define XICS_IRQ_BASE   0x10
>>> +#define XICS_BUID       0x1
>>> +#define XICS_IRQ_BASE   (XICS_BUID << 12)
>>> +
>>> +/*
>>> + * We currently only support one BUID which is our interrupt base
>>> + * (the kernel implementation supports more but we don't exploit
>>> + *  that yet)
>>> + */
>>>  
>>> -struct icp_state;
>>> +struct icp_state {
>>> +    /*< private >*/
>>> +    SysBusDevice parent_obj;
>>> +    /*< public >*/
>>> +    uint32_t nr_servers;
>>> +    uint32_t nr_irqs;
>>> +    struct icp_server_state *ss;
>>> +    struct ics_state *ics;
>>> +};
>>> +
>>> +struct icp_server_state {
>>> +    uint32_t xirr;
>>> +    uint8_t pending_priority;
>>> +    uint8_t mfrr;
>>> +    qemu_irq output;
>>> +};
>> 
>> The individual server_state and irq_state structures probably
>> shouldn't be exported.
>> 
>>> +struct ics_state {
>>> +    uint32_t nr_irqs;
>>> +    uint32_t offset;
>>> +    qemu_irq *qirqs;
>>> +    bool *islsi;
>>> +    struct ics_irq_state *irqs;
>>> +    struct icp_state *icp;
>>> +};
>>> +
>>> +struct ics_irq_state {
>>> +    uint32_t server;
>>> +    uint8_t priority;
>>> +    uint8_t saved_priority;
>>> +#define XICS_STATUS_ASSERTED           0x1
>>> +#define XICS_STATUS_SENT               0x2
>>> +#define XICS_STATUS_REJECTED           0x4
>>> +#define XICS_STATUS_MASKED_PENDING     0x8
>>> +    uint8_t status;
>>> +};
>>>  
>>>  qemu_irq xics_get_qirq(struct icp_state *icp, int irq);
>>>  void xics_set_irq_type(struct icp_state *icp, int irq, bool lsi);
>>>  
>>> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs);
>>> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler);
>>> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
>>> +void xics_common_reset(struct icp_state *icp);
>>> +
>>>  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
>>>  
>>> +extern const VMStateDescription vmstate_icp_server;
>>> +extern const VMStateDescription vmstate_ics;
>>> +
>>>  #endif /* __XICS_H__ */
>> 
>
>
> -- 
> Alexey


* Re: [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag Alexey Kardashevskiy
@ 2013-07-08 18:27   ` Anthony Liguori
  2013-07-08 23:57     ` David Gibson
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> The vmstate infrastructure includes a VMS_MULTIPLY flag, and associated
> VMSTATE_VBUFFER_MULTIPLY helper macro.  These can be used to save a
> variably sized buffer where the size in bytes of the buffer isn't directly
> accessible as a structure field, but an element count from which the size
> can be derived is.

Why?  What's the point of sending the total size vs. the element count?

It's not like we have legacy that we have to support here...

Regards,

Anthony Liguori

>
> This patch adds an analogous VMS_DIVIDE option, which handles a variably
> sized buffer whose size is a submultiple of a field, rather than a
> multiple.  For example a buffer containing per-page structures whose size
> is derived from a field storing the total address space described by the
> structures could use this construct.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  include/migration/vmstate.h |   13 +++++++++++++
>  savevm.c                    |    8 ++++++++
>  2 files changed, 21 insertions(+)
>
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index ebc4d09..787f1cb 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -98,6 +98,7 @@ enum VMStateFlags {
>      VMS_MULTIPLY         = 0x200,  /* multiply "size" field by field_size */
>      VMS_VARRAY_UINT8     = 0x400,  /* Array with size in uint8_t field*/
>      VMS_VARRAY_UINT32    = 0x800,  /* Array with size in uint32_t field*/
> +    VMS_DIVIDE           = 0x1000, /* divide "size" field by field_size */
>  };
>  
>  typedef struct {
> @@ -420,6 +421,18 @@ extern const VMStateInfo vmstate_info_bitmap;
>      .start        = (_start),                                        \
>  }
>  
> +#define VMSTATE_VBUFFER_DIVIDE(_field, _state, _version, _test, _start, _field_size, _divide) { \
> +    .name         = (stringify(_field)),                             \
> +    .version_id   = (_version),                                      \
> +    .field_exists = (_test),                                         \
> +    .size_offset  = vmstate_offset_value(_state, _field_size, uint32_t),\
> +    .size         = (_divide),                                       \
> +    .info         = &vmstate_info_buffer,                            \
> +    .flags        = VMS_VBUFFER|VMS_POINTER|VMS_DIVIDE,              \
> +    .offset       = offsetof(_state, _field),                        \
> +    .start        = (_start),                                        \
> +}
> +
>  #define VMSTATE_VBUFFER(_field, _state, _version, _test, _start, _field_size) { \
>      .name         = (stringify(_field)),                             \
>      .version_id   = (_version),                                      \
> diff --git a/savevm.c b/savevm.c
> index 48cc2a9..c0fb4a3 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -1658,6 +1658,10 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
>                  if (field->flags & VMS_MULTIPLY) {
>                      size *= field->size;
>                  }
> +                if (field->flags & VMS_DIVIDE) {
> +                    assert((size % field->size) == 0);
> +                    size /= field->size;
> +                }
>              }
>              if (field->flags & VMS_ARRAY) {
>                  n_elems = field->num;
> @@ -1722,6 +1726,10 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
>                  if (field->flags & VMS_MULTIPLY) {
>                      size *= field->size;
>                  }
> +                if (field->flags & VMS_DIVIDE) {
> +                    assert((size % field->size) == 0);
> +                    size /= field->size;
> +                }
>              }
>              if (field->flags & VMS_ARRAY) {
>                  n_elems = field->num;
> -- 
> 1.7.10.4
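
For readers following the size discussion above, here is a minimal standalone sketch of how the buffer size is derived from the size field under each flag. It is not the QEMU code itself: the function name sketch_vbuffer_size and the SKETCH_ prefixed constants are hypothetical, and only the flag values (0x200, 0x1000) and the multiply/divide/assert logic are taken from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from enum VMStateFlags in the quoted patch;
 * the SKETCH_ names themselves are illustrative only. */
enum {
    SKETCH_VMS_MULTIPLY = 0x200,   /* multiply "size" field by field_size */
    SKETCH_VMS_DIVIDE   = 0x1000,  /* divide "size" field by field_size */
};

/*
 * Hypothetical helper mirroring the size computation that the patch adds
 * to vmstate_load_state() and vmstate_save_state().
 *
 * size_field: value read from the structure field that holds the count
 * field_size: the constant stored in the VMStateField's .size member
 */
static size_t sketch_vbuffer_size(uint32_t size_field, size_t field_size,
                                  int flags)
{
    size_t size = size_field;

    if (flags & SKETCH_VMS_MULTIPLY) {
        /* Field holds an element count; scale up to bytes. */
        size *= field_size;
    }
    if (flags & SKETCH_VMS_DIVIDE) {
        /* Field holds a larger quantity; the buffer is a submultiple.
         * As in the patch, a non-exact division is treated as a bug. */
        assert((size % field_size) == 0);
        size /= field_size;
    }
    return size;
}
```

With these assumptions, a field value of 16 with field_size 8 under the multiply flag gives a 128-byte buffer, while a field value of 1 MiB with field_size 4096 under the divide flag gives a 256-byte buffer (one byte per 4 KiB page).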


* Re: [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription Alexey Kardashevskiy
@ 2013-07-08 18:29   ` Anthony Liguori
  2013-07-09  5:14     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:29 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> The savevm code for the powerpc cpu emulation is currently based around
> the old register_savevm() rather than register_vmstate() method.  It's also
> rather broken, missing some important state on some CPU models.
>
> This patch completely rewrites the savevm for target-ppc, using the new
> VMStateDescription approach.  Exactly what needs to be saved in what
> configurations has been more carefully examined, too.  This introduces a
> new version (5) of the cpu save format.  The old load function is retained
> to support version 4 images.

Supporting "version 4" is purely an academic exercise.  I wouldn't bother.

> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> [aik: ppc cpu savevm conversion fixed to use PowerPCCPU instead of CPUPPCState]
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  target-ppc/cpu-qom.h        |    4 +
>  target-ppc/cpu.h            |    8 +-
>  target-ppc/machine.c        |  533 ++++++++++++++++++++++++++++++++++++-------
>  target-ppc/translate_init.c |    2 +
>  4 files changed, 454 insertions(+), 93 deletions(-)
>
> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
> index eb03a00..2b96b04 100644
> --- a/target-ppc/cpu-qom.h
> +++ b/target-ppc/cpu-qom.h
> @@ -102,4 +102,8 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr);
>  
>  void ppc_cpu_do_interrupt(CPUState *cpu);
>  
> +#ifndef CONFIG_USER_ONLY
> +extern const struct VMStateDescription vmstate_ppc_cpu;
> +#endif
> +
>  #endif
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 0ede077..f30577d 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -948,7 +948,7 @@ struct CPUPPCState {
>  #if defined(TARGET_PPC64)
>      /* PowerPC 64 SLB area */
>      ppc_slb_t slb[64];
> -    int slb_nr;
> +    int32_t slb_nr;
>  #endif
>      /* segment registers */
>      hwaddr htab_base;
> @@ -957,11 +957,11 @@ struct CPUPPCState {
>      /* externally stored hash table */
>      uint8_t *external_htab;
>      /* BATs */
> -    int nb_BATs;
> +    uint32_t nb_BATs;
>      target_ulong DBAT[2][8];
>      target_ulong IBAT[2][8];
>      /* PowerPC TLB registers (for 4xx, e500 and 60x software driven TLBs) */
> -    int nb_tlb;      /* Total number of TLB                                  */
> +    int32_t nb_tlb;      /* Total number of TLB                              */
>      int tlb_per_way; /* Speed-up helper: used to avoid divisions at run time */
>      int nb_ways;     /* Number of ways in the TLB set                        */
>      int last_way;    /* Last used way used to allocate TLB in a LRU way      */
> @@ -1176,8 +1176,6 @@ static inline CPUPPCState *cpu_init(const char *cpu_model)
>  #define cpu_signal_handler cpu_ppc_signal_handler
>  #define cpu_list ppc_cpu_list
>  
> -#define CPU_SAVE_VERSION 4
> -
>  /* MMU modes definitions */
>  #define MMU_MODE0_SUFFIX _user
>  #define MMU_MODE1_SUFFIX _kernel
> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
> index 2d10adb..1fcc6bc 100644
> --- a/target-ppc/machine.c
> +++ b/target-ppc/machine.c
> @@ -1,96 +1,12 @@
>  #include "hw/hw.h"
>  #include "hw/boards.h"
>  #include "sysemu/kvm.h"
> +#include "helper_regs.h"
>  
> -void cpu_save(QEMUFile *f, void *opaque)
> +static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>  {
> -    CPUPPCState *env = (CPUPPCState *)opaque;
> -    unsigned int i, j;
> -    uint32_t fpscr;
> -    target_ulong xer;
> -
> -    for (i = 0; i < 32; i++)
> -        qemu_put_betls(f, &env->gpr[i]);
> -#if !defined(TARGET_PPC64)
> -    for (i = 0; i < 32; i++)
> -        qemu_put_betls(f, &env->gprh[i]);
> -#endif
> -    qemu_put_betls(f, &env->lr);
> -    qemu_put_betls(f, &env->ctr);
> -    for (i = 0; i < 8; i++)
> -        qemu_put_be32s(f, &env->crf[i]);
> -    xer = cpu_read_xer(env);
> -    qemu_put_betls(f, &xer);
> -    qemu_put_betls(f, &env->reserve_addr);
> -    qemu_put_betls(f, &env->msr);
> -    for (i = 0; i < 4; i++)
> -        qemu_put_betls(f, &env->tgpr[i]);
> -    for (i = 0; i < 32; i++) {
> -        union {
> -            float64 d;
> -            uint64_t l;
> -        } u;
> -        u.d = env->fpr[i];
> -        qemu_put_be64(f, u.l);
> -    }
> -    fpscr = env->fpscr;
> -    qemu_put_be32s(f, &fpscr);
> -    qemu_put_sbe32s(f, &env->access_type);
> -#if defined(TARGET_PPC64)
> -    qemu_put_betls(f, &env->spr[SPR_ASR]);
> -    qemu_put_sbe32s(f, &env->slb_nr);
> -#endif
> -    qemu_put_betls(f, &env->spr[SPR_SDR1]);
> -    for (i = 0; i < 32; i++)
> -        qemu_put_betls(f, &env->sr[i]);
> -    for (i = 0; i < 2; i++)
> -        for (j = 0; j < 8; j++)
> -            qemu_put_betls(f, &env->DBAT[i][j]);
> -    for (i = 0; i < 2; i++)
> -        for (j = 0; j < 8; j++)
> -            qemu_put_betls(f, &env->IBAT[i][j]);
> -    qemu_put_sbe32s(f, &env->nb_tlb);
> -    qemu_put_sbe32s(f, &env->tlb_per_way);
> -    qemu_put_sbe32s(f, &env->nb_ways);
> -    qemu_put_sbe32s(f, &env->last_way);
> -    qemu_put_sbe32s(f, &env->id_tlbs);
> -    qemu_put_sbe32s(f, &env->nb_pids);
> -    if (env->tlb.tlb6) {
> -        // XXX assumes 6xx
> -        for (i = 0; i < env->nb_tlb; i++) {
> -            qemu_put_betls(f, &env->tlb.tlb6[i].pte0);
> -            qemu_put_betls(f, &env->tlb.tlb6[i].pte1);
> -            qemu_put_betls(f, &env->tlb.tlb6[i].EPN);
> -        }
> -    }
> -    for (i = 0; i < 4; i++)
> -        qemu_put_betls(f, &env->pb[i]);
> -    for (i = 0; i < 1024; i++)
> -        qemu_put_betls(f, &env->spr[i]);
> -    qemu_put_be32s(f, &env->vscr);
> -    qemu_put_be64s(f, &env->spe_acc);
> -    qemu_put_be32s(f, &env->spe_fscr);
> -    qemu_put_betls(f, &env->msr_mask);
> -    qemu_put_be32s(f, &env->flags);
> -    qemu_put_sbe32s(f, &env->error_code);
> -    qemu_put_be32s(f, &env->pending_interrupts);
> -    qemu_put_be32s(f, &env->irq_input_state);
> -    for (i = 0; i < POWERPC_EXCP_NB; i++)
> -        qemu_put_betls(f, &env->excp_vectors[i]);
> -    qemu_put_betls(f, &env->excp_prefix);
> -    qemu_put_betls(f, &env->ivor_mask);
> -    qemu_put_betls(f, &env->ivpr_mask);
> -    qemu_put_betls(f, &env->hreset_vector);
> -    qemu_put_betls(f, &env->nip);
> -    qemu_put_betls(f, &env->hflags);
> -    qemu_put_betls(f, &env->hflags_nmsr);
> -    qemu_put_sbe32s(f, &env->mmu_idx);
> -    qemu_put_sbe32(f, 0);
> -}
> -
> -int cpu_load(QEMUFile *f, void *opaque, int version_id)
> -{
> -    CPUPPCState *env = (CPUPPCState *)opaque;
> +    PowerPCCPU *cpu = opaque;
> +    CPUPPCState *env = &cpu->env;
>      unsigned int i, j;
>      target_ulong sdr1;
>      uint32_t fpscr;
> @@ -177,3 +93,444 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
>  
>      return 0;
>  }
> +
> +static int get_avr(QEMUFile *f, void *pv, size_t size)
> +{
> +    ppc_avr_t *v = pv;
> +
> +    v->u64[0] = qemu_get_be64(f);
> +    v->u64[1] = qemu_get_be64(f);
> +
> +    return 0;
> +}
> +
> +static void put_avr(QEMUFile *f, void *pv, size_t size)
> +{
> +    ppc_avr_t *v = pv;
> +
> +    qemu_put_be64(f, v->u64[0]);
> +    qemu_put_be64(f, v->u64[1]);
> +}
> +
> +const VMStateInfo vmstate_info_avr = {
> +    .name = "avr",
> +    .get  = get_avr,
> +    .put  = put_avr,
> +};
> +
> +#define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
> +    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
> +
> +#define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
> +    VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
> +
> +static void cpu_pre_save(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    CPUPPCState *env = &cpu->env;
> +    int i;
> +
> +    env->spr[SPR_LR] = env->lr;
> +    env->spr[SPR_CTR] = env->ctr;
> +    env->spr[SPR_XER] = env->xer;
> +#if defined(TARGET_PPC64)
> +    env->spr[SPR_CFAR] = env->cfar;
> +#endif
> +    env->spr[SPR_BOOKE_SPEFSCR] = env->spe_fscr;
> +
> +    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
> +        env->spr[SPR_DBAT0U + 2*i] = env->DBAT[0][i];
> +        env->spr[SPR_DBAT0U + 2*i + 1] = env->DBAT[1][i];
> +        env->spr[SPR_IBAT0U + 2*i] = env->IBAT[0][i];
> +        env->spr[SPR_IBAT0U + 2*i + 1] = env->IBAT[1][i];
> +    }
> +    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
> +        env->spr[SPR_DBAT4U + 2*i] = env->DBAT[0][i+4];
> +        env->spr[SPR_DBAT4U + 2*i + 1] = env->DBAT[1][i+4];
> +        env->spr[SPR_IBAT4U + 2*i] = env->IBAT[0][i+4];
> +        env->spr[SPR_IBAT4U + 2*i + 1] = env->IBAT[1][i+4];
> +    }
> +}
> +
> +static int cpu_post_load(void *opaque, int version_id)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    CPUPPCState *env = &cpu->env;
> +    int i;
> +
> +    env->lr = env->spr[SPR_LR];
> +    env->ctr = env->spr[SPR_CTR];
> +    env->xer = env->spr[SPR_XER];
> +#if defined(TARGET_PPC64)
> +    env->cfar = env->spr[SPR_CFAR];
> +#endif
> +    env->spe_fscr = env->spr[SPR_BOOKE_SPEFSCR];
> +
> +    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
> +        env->DBAT[0][i] = env->spr[SPR_DBAT0U + 2*i];
> +        env->DBAT[1][i] = env->spr[SPR_DBAT0U + 2*i + 1];
> +        env->IBAT[0][i] = env->spr[SPR_IBAT0U + 2*i];
> +        env->IBAT[1][i] = env->spr[SPR_IBAT0U + 2*i + 1];
> +    }
> +    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
> +        env->DBAT[0][i+4] = env->spr[SPR_DBAT4U + 2*i];
> +        env->DBAT[1][i+4] = env->spr[SPR_DBAT4U + 2*i + 1];
> +        env->IBAT[0][i+4] = env->spr[SPR_IBAT4U + 2*i];
> +        env->IBAT[1][i+4] = env->spr[SPR_IBAT4U + 2*i + 1];
> +    }
> +
> +    /* Restore htab_base and htab_mask variables */
> +    ppc_store_sdr1(env, env->spr[SPR_SDR1]);
> +
> +    hreg_compute_hflags(env);
> +    hreg_compute_mem_idx(env);
> +
> +    return 0;
> +}
> +
> +static bool fpu_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +
> +    return (cpu->env.insns_flags & PPC_FLOAT);
> +}
> +
> +static const VMStateDescription vmstate_fpu = {
> +    .name = "cpu/fpu",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
> +        VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool altivec_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +
> +    return (cpu->env.insns_flags & PPC_ALTIVEC);
> +}
> +
> +static const VMStateDescription vmstate_altivec = {
> +    .name = "cpu/altivec",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
> +        VMSTATE_UINT32(env.vscr, PowerPCCPU),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool vsx_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +
> +    return (cpu->env.insns_flags2 & PPC2_VSX);
> +}
> +
> +static const VMStateDescription vmstate_vsx = {
> +    .name = "cpu/vsx",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool sr_needed(void *opaque)
> +{
> +#ifdef TARGET_PPC64
> +    PowerPCCPU *cpu = opaque;
> +
> +    return !(cpu->env.mmu_model & POWERPC_MMU_64);
> +#else
> +    return true;
> +#endif
> +}
> +
> +static const VMStateDescription vmstate_sr = {
> +    .name = "cpu/sr",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINTTL_ARRAY(env.sr, PowerPCCPU, 32),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +#ifdef TARGET_PPC64
> +static int get_slbe(QEMUFile *f, void *pv, size_t size)
> +{
> +    ppc_slb_t *v = pv;
> +
> +    v->esid = qemu_get_be64(f);
> +    v->vsid = qemu_get_be64(f);
> +
> +    return 0;
> +}
> +
> +static void put_slbe(QEMUFile *f, void *pv, size_t size)
> +{
> +    ppc_slb_t *v = pv;
> +
> +    qemu_put_be64(f, v->esid);
> +    qemu_put_be64(f, v->vsid);
> +}
> +
> +const VMStateInfo vmstate_info_slbe = {
> +    .name = "slbe",
> +    .get  = get_slbe,
> +    .put  = put_slbe,
> +};
> +
> +#define VMSTATE_SLB_ARRAY_V(_f, _s, _n, _v)                       \
> +    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_slbe, ppc_slb_t)
> +
> +#define VMSTATE_SLB_ARRAY(_f, _s, _n)                             \
> +    VMSTATE_SLB_ARRAY_V(_f, _s, _n, 0)
> +
> +static bool slb_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +
> +    /* We don't support any of the old segment table based 64-bit CPUs */
> +    return (cpu->env.mmu_model & POWERPC_MMU_64);
> +}
> +
> +static const VMStateDescription vmstate_slb = {
> +    .name = "cpu/slb",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_INT32_EQUAL(env.slb_nr, PowerPCCPU),
> +        VMSTATE_SLB_ARRAY(env.slb, PowerPCCPU, 64),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +#endif /* TARGET_PPC64 */
> +
> +static const VMStateDescription vmstate_tlb6xx_entry = {
> +    .name = "cpu/tlb6xx_entry",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINTTL(pte0, ppc6xx_tlb_t),
> +        VMSTATE_UINTTL(pte1, ppc6xx_tlb_t),
> +        VMSTATE_UINTTL(EPN, ppc6xx_tlb_t),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool tlb6xx_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    CPUPPCState *env = &cpu->env;
> +
> +    return env->nb_tlb && (env->tlb_type == TLB_6XX);
> +}
> +
> +static const VMStateDescription vmstate_tlb6xx = {
> +    .name = "cpu/tlb6xx",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlb6, PowerPCCPU,
> +                                            env.nb_tlb,
> +                                            vmstate_tlb6xx_entry,
> +                                            ppc6xx_tlb_t),
> +        VMSTATE_UINTTL_ARRAY(env.tgpr, PowerPCCPU, 4),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_tlbemb_entry = {
> +    .name = "cpu/tlbemb_entry",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT64(RPN, ppcemb_tlb_t),
> +        VMSTATE_UINTTL(EPN, ppcemb_tlb_t),
> +        VMSTATE_UINTTL(PID, ppcemb_tlb_t),
> +        VMSTATE_UINTTL(size, ppcemb_tlb_t),
> +        VMSTATE_UINT32(prot, ppcemb_tlb_t),
> +        VMSTATE_UINT32(attr, ppcemb_tlb_t),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool tlbemb_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    CPUPPCState *env = &cpu->env;
> +
> +    return env->nb_tlb && (env->tlb_type == TLB_EMB);
> +}
> +
> +static bool pbr403_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    uint32_t pvr = cpu->env.spr[SPR_PVR];
> +
> +    return (pvr & 0xffff0000) == 0x00200000;
> +}
> +
> +static const VMStateDescription vmstate_pbr403 = {
> +    .name = "cpu/pbr403",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINTTL_ARRAY(env.pb, PowerPCCPU, 4),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_tlbemb = {
> +    .name = "cpu/tlbemb",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbe, PowerPCCPU,
> +                                            env.nb_tlb,
> +                                            vmstate_tlbemb_entry,
> +                                            ppcemb_tlb_t),
> +        /* 403 protection registers */
> +        VMSTATE_END_OF_LIST()
> +    },
> +    .subsections = (VMStateSubsection []) {
> +        {
> +            .vmsd = &vmstate_pbr403,
> +            .needed = pbr403_needed,
> +        } , {
> +            /* empty */
> +        }
> +    }
> +};
> +
> +static const VMStateDescription vmstate_tlbmas_entry = {
> +    .name = "cpu/tlbmas_entry",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT32(mas8, ppcmas_tlb_t),
> +        VMSTATE_UINT32(mas1, ppcmas_tlb_t),
> +        VMSTATE_UINT64(mas2, ppcmas_tlb_t),
> +        VMSTATE_UINT64(mas7_3, ppcmas_tlb_t),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static bool tlbmas_needed(void *opaque)
> +{
> +    PowerPCCPU *cpu = opaque;
> +    CPUPPCState *env = &cpu->env;
> +
> +    return env->nb_tlb && (env->tlb_type == TLB_MAS);
> +}
> +
> +static const VMStateDescription vmstate_tlbmas = {
> +    .name = "cpu/tlbmas",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbm, PowerPCCPU,
> +                                            env.nb_tlb,
> +                                            vmstate_tlbmas_entry,
> +                                            ppcmas_tlb_t),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +const VMStateDescription vmstate_ppc_cpu = {
> +    .name = "cpu",
> +    .version_id = 5,
> +    .minimum_version_id = 5,
> +    .minimum_version_id_old = 4,
> +    .load_state_old = cpu_load_old,
> +    .pre_save = cpu_pre_save,
> +    .post_load = cpu_post_load,
> +    .fields      = (VMStateField []) {
> +        /* Verify we haven't changed the pvr */
> +        VMSTATE_UINTTL_EQUAL(env.spr[SPR_PVR], PowerPCCPU),
> +
> +        /* User mode architected state */
> +        VMSTATE_UINTTL_ARRAY(env.gpr, PowerPCCPU, 32),
> +#if !defined(TARGET_PPC64)
> +        VMSTATE_UINTTL_ARRAY(env.gprh, PowerPCCPU, 32),
> +#endif
> +        VMSTATE_UINT32_ARRAY(env.crf, PowerPCCPU, 8),
> +        VMSTATE_UINTTL(env.nip, PowerPCCPU),
> +
> +        /* SPRs */
> +        VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
> +        VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
> +
> +        /* Reservation */
> +        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
> +
> +        /* Supervisor mode architected state */
> +        VMSTATE_UINTTL(env.msr, PowerPCCPU),
> +
> +        /* Internal state */
> +        VMSTATE_UINTTL(env.hflags_nmsr, PowerPCCPU),
> +        /* FIXME: access_type? */
> +
> +        /* Sanity checking */
> +        VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
> +        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
> +        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
> +        VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
> +        VMSTATE_END_OF_LIST()
> +    },
> +    .subsections = (VMStateSubsection []) {
> +        {
> +            .vmsd = &vmstate_fpu,
> +            .needed = fpu_needed,
> +        } , {
> +            .vmsd = &vmstate_altivec,
> +            .needed = altivec_needed,
> +        } , {
> +            .vmsd = &vmstate_vsx,
> +            .needed = vsx_needed,
> +        } , {
> +            .vmsd = &vmstate_sr,
> +            .needed = sr_needed,
> +        } , {
> +#ifdef TARGET_PPC64
> +            .vmsd = &vmstate_slb,
> +            .needed = slb_needed,
> +        } , {
> +#endif /* TARGET_PPC64 */
> +            .vmsd = &vmstate_tlb6xx,
> +            .needed = tlb6xx_needed,
> +        } , {
> +            .vmsd = &vmstate_tlbemb,
> +            .needed = tlbemb_needed,
> +        } , {
> +            .vmsd = &vmstate_tlbmas,
> +            .needed = tlbmas_needed,
> +        } , {
> +            /* FIXME: DCRs? */
> +            /* FIXME: timebase? */
> +            /* empty */

Are they needed or not needed?

If they're needed, please add them.

Regards,

Anthony Liguori

> +        }
> +    }
> +};
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index d8758d5..95aebf7 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -8295,6 +8295,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
>  
>      cc->class_by_name = ppc_cpu_class_by_name;
>      cc->do_interrupt = ppc_cpu_do_interrupt;
> +
> +    cpu_class_set_vmsd(cc, &vmstate_ppc_cpu);
>  }
>  
>  static const TypeInfo ppc_cpu_type_info = {
> -- 
> 1.7.10.4
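
The patch above relies on optional subsections, each guarded by a "needed" predicate, so that only state relevant to the running CPU model ends up in the stream. A rough standalone sketch of that selection pattern follows. Everything named sketch_ or SKETCH_ is hypothetical, the feature bit values are made up for illustration (they are not the real PPC_FLOAT or PPC_ALTIVEC values), and the real iteration is done inside QEMU's vmstate code rather than by a helper like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Made-up feature bits standing in for the CPU's instruction flags. */
#define SKETCH_FLOAT   0x1u
#define SKETCH_ALTIVEC 0x2u

/* Hypothetical, heavily reduced stand-in for the CPU state. */
struct sketch_cpu {
    uint64_t insns_flags;
};

/* One optional chunk of state plus the predicate that decides
 * whether it is saved, modelled on the .vmsd/.needed pairs above. */
struct sketch_subsection {
    const char *name;
    bool (*needed)(const struct sketch_cpu *cpu);
};

static bool sketch_fpu_needed(const struct sketch_cpu *cpu)
{
    return (cpu->insns_flags & SKETCH_FLOAT) != 0;
}

static bool sketch_altivec_needed(const struct sketch_cpu *cpu)
{
    return (cpu->insns_flags & SKETCH_ALTIVEC) != 0;
}

/* The list ends with an empty entry, like the patch's subsection arrays. */
static const struct sketch_subsection sketch_subsections[] = {
    { "cpu/fpu",     sketch_fpu_needed },
    { "cpu/altivec", sketch_altivec_needed },
    { NULL,          NULL },
};

/* Collect the names of the subsections that would be emitted for this
 * CPU; returns how many names were written to out (at most max). */
static size_t sketch_select_subsections(const struct sketch_cpu *cpu,
                                        const char **out, size_t max)
{
    size_t n = 0;
    const struct sketch_subsection *s;

    for (s = sketch_subsections; s->name != NULL; s++) {
        if (s->needed(cpu) && n < max) {
            out[n++] = s->name;
        }
    }
    return n;
}
```

In this sketch a CPU with only the float bit set selects just "cpu/fpu", and a CPU with no feature bits selects nothing. The point of the pattern is that adding a new optional block later does not change the layout of the main section.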


* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller Alexey Kardashevskiy
@ 2013-07-08 18:31   ` Anthony Liguori
  2013-07-09  0:06     ` Alexey Kardashevskiy
  2013-07-09  7:17     ` David Gibson
  0 siblings, 2 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:31 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> This patch adds the necessary VMStateDescription information to support
> savevm/loadvm for the XICS interrupt controller used on the pseries
> machine.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> [aik: added ics_resend() on post_load]
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/intc/xics.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 63 insertions(+)
>
> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
> index 0e374c8..3e8f48f 100644
> --- a/hw/intc/xics.c
> +++ b/hw/intc/xics.c
> @@ -497,6 +497,61 @@ static void xics_reset(DeviceState *d)
>      xics_common_reset(XICS(d));
>  }
>  
> +static int ics_post_load(void *opaque, int version_id)
> +{
> +    int i;
> +    struct ics_state *ics = opaque;
> +
> +    for (i = 0; i < ics->icp->nr_servers; i++) {
> +        icp_resend(ics->icp, i);
> +    }
> +
> +    return 0;
> +}
> +
> +const VMStateDescription vmstate_icp_server = {
> +    .name = "icp/server",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        /* Sanity check */
> +        VMSTATE_UINT32(xirr, struct icp_server_state),
> +        VMSTATE_UINT8(pending_priority, struct icp_server_state),
> +        VMSTATE_UINT8(mfrr, struct icp_server_state),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_ics_irq = {
> +    .name = "ics/irq",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT32(server, struct ics_irq_state),
> +        VMSTATE_UINT8(priority, struct ics_irq_state),
> +        VMSTATE_UINT8(saved_priority, struct ics_irq_state),
> +        VMSTATE_UINT8(status, struct ics_irq_state),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +const VMStateDescription vmstate_ics = {
> +    .name = "ics",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .post_load = ics_post_load,
> +    .fields      = (VMStateField []) {
> +        /* Sanity check */
> +        VMSTATE_UINT32_EQUAL(nr_irqs, struct ics_state),
> +
> +        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(irqs, struct ics_state, nr_irqs, vmstate_ics_irq, struct ics_irq_state),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>  {
>      CPUState *cs = CPU(cpu);
> @@ -523,7 +578,11 @@ void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>  
>  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>  {
> +    CPUState *cs = CPU(cpu);
> +    struct icp_server_state *ss = &icp->ss[cs->cpu_index];
> +
>      xics_common_cpu_setup(icp, cpu);
> +    vmstate_register(NULL, cs->cpu_index, &vmstate_icp_server, ss);

This is an indication that something is wrong.

You should tie the vmstate section to DeviceState::vmsd.  You only need
to do this because you haven't converted everything to QOM yet.

Please do that to avoid these hacks.

Regards,

Anthony Liguori
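
The point about DeviceState::vmsd can be illustrated with a toy model. This is a sketch only, not QEMU code: every Toy* type and toy_* function below is a hypothetical stand-in, and the real qdev core does considerably more. The idea is that the device class declares its migration description once, and generic realize code registers it, so no per-device vmstate_register() call is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VMStateDescription, DeviceClass, DeviceState. */
typedef struct ToyVMState {
    const char *name;
    int version_id;
} ToyVMState;

typedef struct ToyDeviceClass {
    const ToyVMState *vmsd;     /* declared once per device type */
} ToyDeviceClass;

typedef struct ToyDevice {
    const ToyDeviceClass *klass;
    int registered;             /* set by the generic realize path */
} ToyDevice;

static const ToyVMState toy_vmstate_icp = { "icp/server", 1 };

/* Class init: the only place the device mentions its vmsd. */
static void toy_icp_class_init(ToyDeviceClass *dc)
{
    dc->vmsd = &toy_vmstate_icp;
}

/* Generic realize: the core registers whatever the class declared. */
static const ToyVMState *toy_device_realize(ToyDevice *dev)
{
    assert(dev != NULL && dev->klass != NULL);
    dev->registered = (dev->klass->vmsd != NULL);
    return dev->klass->vmsd;
}
```

The spapr_vty patch later in this thread follows the same shape with the real API, by setting dc->vmsd in its class_init function.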

>  }
>  
>  void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
> @@ -555,6 +614,10 @@ static void xics_realize(DeviceState *dev, Error **errp)
>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>  
> +    /* We use each ICS's offset into the global irq number space
> +     * as an instance id.  This means we can extend to multiple ICS
> +     * instances without needing to change the savevm format */
> +    vmstate_register(NULL, icp->ics->offset, &vmstate_ics, icp->ics);
>  }
>  
>  static Property xics_properties[] = {
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/
  2013-07-08 18:15   ` Anthony Liguori
@ 2013-07-08 18:34     ` Alexander Graf
  0 siblings, 0 replies; 92+ messages in thread
From: Alexander Graf @ 2013-07-08 18:34 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson


On 08.07.2013, at 20:15, Anthony Liguori wrote:

> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Thanks, applied to ppc-next.


Alex

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 06/17] pseries: savevm support for VIO devices
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 06/17] pseries: savevm support for VIO devices Alexey Kardashevskiy
@ 2013-07-08 18:35   ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:35 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> This patch adds helpers to allow PAPR VIO devices to save state common
> to all VIO devices during savevm.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  hw/ppc/spapr_vio.c         |   20 ++++++++++++++++++++
>  include/hw/ppc/spapr_vio.h |    5 +++++
>  2 files changed, 25 insertions(+)
>
> diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
> index 9c18741..565d883 100644
> --- a/hw/ppc/spapr_vio.c
> +++ b/hw/ppc/spapr_vio.c
> @@ -542,6 +542,26 @@ static const TypeInfo spapr_vio_bridge_info = {
>      .class_init    = spapr_vio_bridge_class_init,
>  };
>  
> +const VMStateDescription vmstate_spapr_vio = {
> +    .name = "spapr_vio",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        /* Sanity check */
> +        VMSTATE_UINT32_EQUAL(reg, VIOsPAPRDevice),
> +        VMSTATE_UINT32_EQUAL(irq, VIOsPAPRDevice),
> +
> +        /* General VIO device state */
> +        VMSTATE_UINTTL(signal_state, VIOsPAPRDevice),
> +        VMSTATE_UINT64(crq.qladdr, VIOsPAPRDevice),
> +        VMSTATE_UINT32(crq.qsize, VIOsPAPRDevice),
> +        VMSTATE_UINT32(crq.qnext, VIOsPAPRDevice),
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static void vio_spapr_device_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *k = DEVICE_CLASS(klass);
> diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
> index 3609327..46edc2a 100644
> --- a/include/hw/ppc/spapr_vio.h
> +++ b/include/hw/ppc/spapr_vio.h
> @@ -134,4 +134,9 @@ VIOsPAPRDevice *spapr_vty_get_default(VIOsPAPRBus *bus);
>  
>  void spapr_vio_quiesce(void);
>  
> +extern const VMStateDescription vmstate_spapr_vio;
> +
> +#define VMSTATE_SPAPR_VIO(_f, _s) \
> +    VMSTATE_STRUCT(_f, _s, 0, vmstate_spapr_vio, VIOsPAPRDevice)
> +
>  #endif /* _HW_SPAPR_VIO_H */
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 07/17] pseries: savevm support for PAPR VIO logical lan
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 07/17] pseries: savevm support for PAPR VIO logical lan Alexey Kardashevskiy
@ 2013-07-08 18:36   ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:36 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> This patch adds the necessary VMStateDescription information to support
> savevm/loadvm for the spapr_llan (PAPR logical lan) device.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/char/spapr_vty.c |   16 ++++++++++++++++

Please split this out.  But then you can add:

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

>  hw/net/spapr_llan.c |   24 ++++++++++++++++++++++--
>  2 files changed, 38 insertions(+), 2 deletions(-)
>
> diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
> index 2993848..a799721 100644
> --- a/hw/char/spapr_vty.c
> +++ b/hw/char/spapr_vty.c
> @@ -142,6 +142,21 @@ static Property spapr_vty_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static const VMStateDescription vmstate_spapr_vty = {
> +    .name = "spapr_vty",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVTYDevice),
> +
> +        VMSTATE_UINT32(in, VIOsPAPRVTYDevice),
> +        VMSTATE_UINT32(out, VIOsPAPRVTYDevice),
> +        VMSTATE_BUFFER(buf, VIOsPAPRVTYDevice),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static void spapr_vty_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -152,6 +167,7 @@ static void spapr_vty_class_init(ObjectClass *klass, void *data)
>      k->dt_type = "serial";
>      k->dt_compatible = "hvterm1";
>      dc->props = spapr_vty_properties;
> +    dc->vmsd = &vmstate_spapr_vty;
>  }
>  
>  static const TypeInfo spapr_vty_info = {
> diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
> index 03a09f2..46f7d5f 100644
> --- a/hw/net/spapr_llan.c
> +++ b/hw/net/spapr_llan.c
> @@ -81,9 +81,9 @@ typedef struct VIOsPAPRVLANDevice {
>      VIOsPAPRDevice sdev;
>      NICConf nicconf;
>      NICState *nic;
> -    int isopen;
> +    bool isopen;
>      target_ulong buf_list;
> -    int add_buf_ptr, use_buf_ptr, rx_bufs;
> +    uint32_t add_buf_ptr, use_buf_ptr, rx_bufs;
>      target_ulong rxq_ptr;
>  } VIOsPAPRVLANDevice;
>  
> @@ -500,6 +500,25 @@ static Property spapr_vlan_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static const VMStateDescription vmstate_spapr_llan = {
> +    .name = "spapr_llan",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVLANDevice),
> +        /* LLAN state */
> +        VMSTATE_BOOL(isopen, VIOsPAPRVLANDevice),
> +        VMSTATE_UINTTL(buf_list, VIOsPAPRVLANDevice),
> +        VMSTATE_UINT32(add_buf_ptr, VIOsPAPRVLANDevice),
> +        VMSTATE_UINT32(use_buf_ptr, VIOsPAPRVLANDevice),
> +        VMSTATE_UINT32(rx_bufs, VIOsPAPRVLANDevice),
> +        VMSTATE_UINTTL(rxq_ptr, VIOsPAPRVLANDevice),
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static void spapr_vlan_class_init(ObjectClass *klass, void *data)
>  {
>      DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -514,6 +533,7 @@ static void spapr_vlan_class_init(ObjectClass *klass, void *data)
>      k->signal_mask = 0x1;
>      dc->props = spapr_vlan_properties;
>      k->rtce_window_size = 0x10000000;
> +    dc->vmsd = &vmstate_spapr_llan;
>  }
>  
>  static const TypeInfo spapr_vlan_info = {
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables Alexey Kardashevskiy
@ 2013-07-08 18:39   ` Anthony Liguori
  2013-07-08 21:45     ` Benjamin Herrenschmidt
                       ` (2 more replies)
  0 siblings, 3 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:39 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> This patch adds the necessary VMStateDescription information to save the
> state of PAPR TCE tables (that is, the PAPR specified IOMMU).
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr_iommu.c |   25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> index 91bc8e4..ba1f7b6 100644
> --- a/hw/ppc/spapr_iommu.c
> +++ b/hw/ppc/spapr_iommu.c
> @@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>      };
>  }
>  
> +static const VMStateDescription vmstate_spapr_tce_table = {
> +    .name = "spapr_iommu",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        /* Sanity check */
> +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
> +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
> +
> +        /* IOMMU state */
> +        VMSTATE_BOOL(bypass, sPAPRTCETable),
> +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
> +                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),

Not endian safe.  I really don't get the divide bit at all either.

> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>      .translate = spapr_tce_translate_iommu,
>  };
> @@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
>  
>      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>  
> +    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
> +

If you need to add these, then you need to do more QOM conversion.

Regards,

Anthony Liguori
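
One reading of the "not endian safe" remark is that dumping a table of 64-bit entries as a raw byte buffer writes them in host byte order, so a stream saved on a little-endian host would be misread on a big-endian one. A minimal sketch of the usual remedy, assuming the entries are plain uint64_t values: serialize each entry in a fixed (here big-endian) order. The toy_* names are hypothetical and not part of the patch or of QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 64-bit value most-significant byte first. */
static void toy_put_be64(uint8_t *dst, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        dst[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Load one 64-bit value stored most-significant byte first. */
static uint64_t toy_get_be64(const uint8_t *src)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | src[i];
    }
    return v;
}

/* Write n table entries to the wire; returns the number of bytes written. */
static size_t toy_table_save(uint8_t *out, const uint64_t *table, size_t n)
{
    assert(out != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        toy_put_be64(out + 8 * i, table[i]);
    }
    return 8 * n;
}

/* Read n table entries back; the result is the same on any host. */
static void toy_table_load(uint64_t *table, const uint8_t *in, size_t n)
{
    assert(table != NULL && in != NULL);
    for (size_t i = 0; i < n; i++) {
        table[i] = toy_get_be64(in + 8 * i);
    }
}
```

Because the helpers only use shifts, the wire format does not depend on the byte order of the machine doing the save or the load.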

>      return tcet;
>  }
>  
> @@ -163,6 +184,10 @@ void spapr_tce_free(sPAPRTCETable *tcet)
>  {
>      QLIST_REMOVE(tcet, list);
>  
> +    vmstate_unregister(NULL, &vmstate_spapr_tce_table, tcet);
> +
> +    QLIST_REMOVE(tcet, list);
> +
>      if (!kvm_enabled() ||
>          (kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
>                                   tcet->window_size) != 0)) {
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 09/17] pseries: rework PAPR virtual SCSI
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 09/17] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
@ 2013-07-08 18:42   ` Anthony Liguori
  2013-07-15 13:11     ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:42 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> The patch reimplements handling of indirect requests in order to
> simplify upcoming live migration support.
> - all pointers (except SCSIRequest*) were replaced with integer
> indexes and offsets;
> - DMA'ed srp_direct_buf kept untouched (ie. BE format);
> - vscsi_fetch_desc() is added, now it is the only place where
> descriptors are fetched and byteswapped;
> - vscsi_req struct fields converted to migration-friendly types;
> - many dprintf()'s fixed.
>
> This also removes the 'lun' field from the spapr_vscsi device, which
> is assigned but never used.
>
> [David Gibson: removed unused 'lun']
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/scsi/spapr_vscsi.c |  224 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 131 insertions(+), 93 deletions(-)
>
> diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
> index e8978bf..1e93102 100644
> --- a/hw/scsi/spapr_vscsi.c
> +++ b/hw/scsi/spapr_vscsi.c
> @@ -75,20 +75,19 @@ typedef struct vscsi_req {
>      /* SCSI request tracking */
>      SCSIRequest             *sreq;
>      uint32_t                qtag; /* qemu tag != srp tag */
> -    int                     lun;
> -    int                     active;
> -    long                    data_len;
> -    int                     writing;
> -    int                     senselen;
> +    bool                    active;
> +    uint32_t                data_len;
> +    bool                    writing;
> +    uint32_t                senselen;
>      uint8_t                 sense[SCSI_SENSE_BUF_SIZE];
>  
>      /* RDMA related bits */
>      uint8_t                 dma_fmt;
> -    struct srp_direct_buf   ext_desc;
> -    struct srp_direct_buf   *cur_desc;
> -    struct srp_indirect_buf *ind_desc;
> -    int                     local_desc;
> -    int                     total_desc;
> +    uint16_t                local_desc;
> +    uint16_t                total_desc;
> +    uint16_t                cdb_offset;
> +    uint16_t                cur_desc_num;
> +    uint16_t                cur_desc_offset;
>  } vscsi_req;
>  
>  #define TYPE_VIO_SPAPR_VSCSI_DEVICE "spapr-vscsi"
> @@ -264,93 +263,139 @@ static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req,
>      return 0;
>  }
>  
> -static inline void vscsi_swap_desc(struct srp_direct_buf *desc)
> +static inline struct srp_direct_buf vscsi_swap_desc(struct srp_direct_buf desc)
>  {
> -    desc->va = be64_to_cpu(desc->va);
> -    desc->len = be32_to_cpu(desc->len);
> +    desc.va = be64_to_cpu(desc.va);
> +    desc.len = be32_to_cpu(desc.len);
> +    return desc;
> +}
> +
> +static int vscsi_fetch_desc(VSCSIState *s, struct vscsi_req *req,
> +                            unsigned n, unsigned buf_offset,
> +                            struct srp_direct_buf *ret)
> +{
> +    struct srp_cmd *cmd = &req->iu.srp.cmd;
> +
> +    switch (req->dma_fmt) {
> +    case SRP_NO_DATA_DESC: {
> +        dprintf("VSCSI: no data descriptor\n");
> +        return 0;
> +    }
> +    case SRP_DATA_DESC_DIRECT: {
> +        *ret = *(struct srp_direct_buf *)(cmd->add_data + req->cdb_offset);

If you're reworking this code, you should remove these casts.  It's not
safe to assume that cdb_offset is aligned properly.  memcpy()'ing would
be much safer.

Regards,

Anthony Liguori
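
A minimal sketch of the memcpy() approach suggested above, under stated assumptions: struct toy_direct_buf is a hypothetical stand-in for struct srp_direct_buf (a 64-bit address followed by two 32-bit fields, all big-endian on the wire), and toy_fetch_direct_buf() is an invented helper, not the function from the patch. The descriptor is copied out of the byte buffer into an aligned local before any field is read, so the offset may be arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct srp_direct_buf (fields big-endian). */
struct toy_direct_buf {
    uint64_t va;
    uint32_t key;
    uint32_t len;
};

static uint64_t toy_load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint32_t toy_load_be32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Copy a descriptor from add_data + offset and convert it to host order.
 * Returns 0 on success, -1 if the descriptor would run past the buffer. */
static int toy_fetch_direct_buf(const uint8_t *add_data, size_t add_len,
                                size_t offset, struct toy_direct_buf *out)
{
    struct toy_direct_buf tmp;

    assert(sizeof(tmp) == 16);  /* layout assumption: no padding */
    if (offset > add_len || add_len - offset < sizeof(tmp)) {
        return -1;
    }
    /* memcpy() has no alignment requirement, unlike a pointer cast. */
    memcpy(&tmp, add_data + offset, sizeof(tmp));

    out->va = toy_load_be64((const uint8_t *)&tmp.va);
    out->key = toy_load_be32((const uint8_t *)&tmp.key);
    out->len = toy_load_be32((const uint8_t *)&tmp.len);
    return 0;
}
```

The bounds check is an addition for the sketch; the quoted patch does not show where the length of add_data is validated.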

> +        assert(req->cur_desc_num == 0);
> +        dprintf("VSCSI: direct segment");
> +        break;
> +    }
> +    case SRP_DATA_DESC_INDIRECT: {
> +        struct srp_indirect_buf *tmp = (struct srp_indirect_buf *)
> +                                       (cmd->add_data + req->cdb_offset);
> +        if (n < req->local_desc) {
> +            *ret = tmp->desc_list[n];
> +            dprintf("VSCSI: indirect segment local tag=0x%x desc#%d/%d",
> +                    req->qtag, n, req->local_desc);
> +
> +        } else if (n < req->total_desc) {
> +            int rc;
> +            struct srp_direct_buf tbl_desc = vscsi_swap_desc(tmp->table_desc);
> +            unsigned desc_offset = (n - req->local_desc) *
> +                                    sizeof(struct srp_direct_buf);
> +
> +            if (desc_offset > tbl_desc.len) {
> +                dprintf("VSCSI:   #%d is out of range (%d bytes)\n",
> +                        n, desc_offset);
> +                return -1;
> +            }
> +            rc = spapr_vio_dma_read(&s->vdev, tbl_desc.va + desc_offset,
> +                                    ret, sizeof(struct srp_direct_buf));
> +            if (rc) {
> +                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
> +                        rc);
> +                return rc;
> +            }
> +            dprintf("VSCSI: indirect segment ext. tag=0x%x desc#%d/%d { va=%"PRIx64" len=%x }",
> +                    req->qtag, n, req->total_desc, tbl_desc.va, tbl_desc.len);
> +        } else {
> +            dprintf("VSCSI:   Out of descriptors !\n");
> +            return 0;
> +        }
> +        break;
> +    }
> +    default:
> +        fprintf(stderr, "VSCSI:   Unknown format %x\n", req->dma_fmt);
> +        return -1;
> +    }
> +
> +    *ret = vscsi_swap_desc(*ret);
> +    if (buf_offset > ret->len) {
> +        dprintf("   offset=%x is out of a descriptor #%d boundary=%x\n",
> +                buf_offset, req->cur_desc_num, ret->len);
> +        return -1;
> +    }
> +    ret->va += buf_offset;
> +    ret->len -= buf_offset;
> +
> +    dprintf("   cur=%d offs=%x ret { va=%"PRIx64" len=%x }\n",
> +            req->cur_desc_num, req->cur_desc_offset, ret->va, ret->len);
> +
> +    return ret->len ? 1 : 0;
>  }
>  
>  static int vscsi_srp_direct_data(VSCSIState *s, vscsi_req *req,
>                                   uint8_t *buf, uint32_t len)
>  {
> -    struct srp_direct_buf *md = req->cur_desc;
> +    struct srp_direct_buf md;
>      uint32_t llen;
>      int rc = 0;
>  
> -    dprintf("VSCSI: direct segment 0x%x bytes, va=0x%llx desc len=0x%x\n",
> -            len, (unsigned long long)md->va, md->len);
> +    rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
> +    if (rc < 0) {
> +        return -1;
> +    } else if (rc == 0) {
> +        return 0;
> +    }
>  
> -    llen = MIN(len, md->len);
> +    llen = MIN(len, md.len);
>      if (llen) {
>          if (req->writing) { /* writing = to device = reading from memory */
> -            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
> +            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
>          } else {
> -            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
> +            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
>          }
>      }
> -    md->len -= llen;
> -    md->va += llen;
>  
>      if (rc) {
>          return -1;
>      }
> +    req->cur_desc_offset += llen;
> +
>      return llen;
>  }
>  
>  static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
>                                     uint8_t *buf, uint32_t len)
>  {
> -    struct srp_direct_buf *td = &req->ind_desc->table_desc;
> -    struct srp_direct_buf *md = req->cur_desc;
> +    struct srp_direct_buf md;
>      int rc = 0;
>      uint32_t llen, total = 0;
>  
> -    dprintf("VSCSI: indirect segment 0x%x bytes, td va=0x%llx len=0x%x\n",
> -            len, (unsigned long long)td->va, td->len);
> +    dprintf("VSCSI: indirect segment 0x%x bytes\n", len);
>  
>      /* While we have data ... */
>      while (len) {
> -        /* If we have a descriptor but it's empty, go fetch a new one */
> -        if (md && md->len == 0) {
> -            /* More local available, use one */
> -            if (req->local_desc) {
> -                md = ++req->cur_desc;
> -                --req->local_desc;
> -                --req->total_desc;
> -                td->va += sizeof(struct srp_direct_buf);
> -            } else {
> -                md = req->cur_desc = NULL;
> -            }
> -        }
> -        /* No descriptor at hand, fetch one */
> -        if (!md) {
> -            if (!req->total_desc) {
> -                dprintf("VSCSI:   Out of descriptors !\n");
> -                break;
> -            }
> -            md = req->cur_desc = &req->ext_desc;
> -            dprintf("VSCSI:   Reading desc from 0x%llx\n",
> -                    (unsigned long long)td->va);
> -            rc = spapr_vio_dma_read(&s->vdev, td->va, md,
> -                                    sizeof(struct srp_direct_buf));
> -            if (rc) {
> -                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
> -                        rc);
> -                break;
> -            }
> -            vscsi_swap_desc(md);
> -            td->va += sizeof(struct srp_direct_buf);
> -            --req->total_desc;
> +        rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
> +        if (rc < 0) {
> +            return -1;
> +        } else if (rc == 0) {
> +            break;
>          }
> -        dprintf("VSCSI:   [desc va=0x%llx,len=0x%x] remaining=0x%x\n",
> -                (unsigned long long)md->va, md->len, len);
>  
>          /* Perform transfer */
> -        llen = MIN(len, md->len);
> +        llen = MIN(len, md.len);
>          if (req->writing) { /* writing = to device = reading from memory */
> -            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
> +            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
>          } else {
> -            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
> +            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
>          }
>          if (rc) {
>              dprintf("VSCSI: spapr_vio_dma_r/w(%d) -> %d\n", req->writing, rc);
> @@ -361,10 +406,18 @@ static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
>  
>          len -= llen;
>          buf += llen;
> +
>          total += llen;
> -        md->va += llen;
> -        md->len -= llen;
> +
> +        /* Update current position in the current descriptor */
> +        req->cur_desc_offset += llen;
> +        if (md.len == llen) {
> +            /* Go to the next descriptor if the current one finished */
> +            ++req->cur_desc_num;
> +            req->cur_desc_offset = 0;
> +        }
>      }
> +
>      return rc ? -1 : total;
>  }
>  
> @@ -412,14 +465,13 @@ static int data_out_desc_size(struct srp_cmd *cmd)
>  static int vscsi_preprocess_desc(vscsi_req *req)
>  {
>      struct srp_cmd *cmd = &req->iu.srp.cmd;
> -    int offset, i;
>  
> -    offset = cmd->add_cdb_len & ~3;
> +    req->cdb_offset = cmd->add_cdb_len & ~3;
>  
>      if (req->writing) {
>          req->dma_fmt = cmd->buf_fmt >> 4;
>      } else {
> -        offset += data_out_desc_size(cmd);
> +        req->cdb_offset += data_out_desc_size(cmd);
>          req->dma_fmt = cmd->buf_fmt & ((1U << 4) - 1);
>      }
>  
> @@ -427,31 +479,18 @@ static int vscsi_preprocess_desc(vscsi_req *req)
>      case SRP_NO_DATA_DESC:
>          break;
>      case SRP_DATA_DESC_DIRECT:
> -        req->cur_desc = (struct srp_direct_buf *)(cmd->add_data + offset);
>          req->total_desc = req->local_desc = 1;
> -        vscsi_swap_desc(req->cur_desc);
> -        dprintf("VSCSI: using direct RDMA %s, 0x%x bytes MD: 0x%llx\n",
> -                req->writing ? "write" : "read",
> -                req->cur_desc->len, (unsigned long long)req->cur_desc->va);
>          break;
> -    case SRP_DATA_DESC_INDIRECT:
> -        req->ind_desc = (struct srp_indirect_buf *)(cmd->add_data + offset);
> -        vscsi_swap_desc(&req->ind_desc->table_desc);
> -        req->total_desc = req->ind_desc->table_desc.len /
> -            sizeof(struct srp_direct_buf);
> +    case SRP_DATA_DESC_INDIRECT: {
> +        struct srp_indirect_buf *ind_tmp = (struct srp_indirect_buf *)
> +                (cmd->add_data + req->cdb_offset);
> +
> +        req->total_desc = be32_to_cpu(ind_tmp->table_desc.len) /
> +                          sizeof(struct srp_direct_buf);
>          req->local_desc = req->writing ? cmd->data_out_desc_cnt :
> -            cmd->data_in_desc_cnt;
> -        for (i = 0; i < req->local_desc; i++) {
> -            vscsi_swap_desc(&req->ind_desc->desc_list[i]);
> -        }
> -        req->cur_desc = req->local_desc ? &req->ind_desc->desc_list[0] : NULL;
> -        dprintf("VSCSI: using indirect RDMA %s, 0x%x bytes %d descs "
> -                "(%d local) VA: 0x%llx\n",
> -                req->writing ? "read" : "write",
> -                be32_to_cpu(req->ind_desc->len),
> -                req->total_desc, req->local_desc,
> -                (unsigned long long)req->ind_desc->table_desc.va);
> +                          cmd->data_in_desc_cnt;
>          break;
> +    }
>      default:
>          fprintf(stderr,
>                  "vscsi_preprocess_desc: Unknown format %x\n", req->dma_fmt);
> @@ -499,8 +538,8 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
>      vscsi_req *req = sreq->hba_private;
>      int32_t res_in = 0, res_out = 0;
>  
> -    dprintf("VSCSI: SCSI cmd complete, r=0x%x tag=0x%x status=0x%x, req=%p\n",
> -            reason, sreq->tag, status, req);
> +    dprintf("VSCSI: SCSI cmd complete, tag=0x%x status=0x%x, req=%p\n",
> +            sreq->tag, status, req);
>      if (req == NULL) {
>          fprintf(stderr, "VSCSI: Can't find request for tag 0x%x\n", sreq->tag);
>          return;
> @@ -509,7 +548,7 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
>      if (status == CHECK_CONDITION) {
>          req->senselen = scsi_req_get_sense(req->sreq, req->sense,
>                                             sizeof(req->sense));
> -        dprintf("VSCSI: Sense data, %d bytes:\n", len);
> +        dprintf("VSCSI: Sense data, %d bytes:\n", req->senselen);
>          dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
>                  req->sense[0], req->sense[1], req->sense[2], req->sense[3],
>                  req->sense[4], req->sense[5], req->sense[6], req->sense[7]);
> @@ -621,12 +660,11 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
>          } return 1;
>      }
>  
> -    req->lun = lun;
>      req->sreq = scsi_req_new(sdev, req->qtag, lun, srp->cmd.cdb, req);
>      n = scsi_req_enqueue(req->sreq);
>  
> -    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
> -            req->qtag, srp->cmd.cdb[0], id, lun, n);
> +    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x LUN %d ret: %d\n",
> +            req->qtag, srp->cmd.cdb[0], lun, n);
>  
>      if (n) {
>          /* Transfer direction must be set before preprocessing the
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine Alexey Kardashevskiy
@ 2013-07-08 18:45   ` Anthony Liguori
  2013-07-08 18:50     ` Alexander Graf
  2013-07-08 21:48     ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> This adds the necessary pieces to implement savevm / migration for the
> pseries machine.  The most complex part here is migrating the hash
> table - for the paravirtualized pseries machine the guest's hash page
> table is not stored within guest memory, but externally and the guest
> accesses it via hypercalls.
>
> This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
> (tracking changes to the HPTE itself, not the page it references).
> This is used to implement a live migration style incremental save and
> restore of the hash table contents.
>
> In addition it adds VMStateDescription information to save and restore
> the (few) remaining pieces of state information needed by the pseries
> machine.
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

I vaguely recall making the suggestion to use a live section like this.
How large is the HTAB typically?

Regards,

Anthony Liguori
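
The incremental save described in the commit message can be sketched in isolation. This is a toy model under stated assumptions, not the patch's code: each table slot is reduced to a flags word with hypothetical TOY_VALID and TOY_DIRTY bits, and toy_next_chunk() is an invented name. The scan skips invalid slots, collects a run of consecutive valid slots as one chunk (start index plus count), and clears the dirty bit on every slot it visits so that a later pass only needs to resend slots modified since.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_VALID 0x1u  /* slot holds a live entry */
#define TOY_DIRTY 0x2u  /* slot changed since it was last visited */

struct toy_chunk {
    size_t start;       /* index of the first valid slot in the run */
    size_t n_valid;     /* number of consecutive valid slots */
};

/* Find the next run of valid slots at or after *index.
 * Returns 1 and fills *out if a run was found, 0 at end of table.
 * *index is advanced past everything that was scanned. */
static int toy_next_chunk(uint32_t *flags, size_t nslots,
                          size_t *index, struct toy_chunk *out)
{
    size_t i = *index;

    assert(flags != NULL && index != NULL && out != NULL);

    /* Consume invalid slots, marking each one clean as it is visited. */
    while (i < nslots && !(flags[i] & TOY_VALID)) {
        flags[i] &= ~TOY_DIRTY;
        i++;
    }

    /* Consume valid slots; these form the chunk to transmit. */
    out->start = i;
    while (i < nslots && (flags[i] & TOY_VALID)) {
        flags[i] &= ~TOY_DIRTY;
        i++;
    }
    out->n_valid = i - out->start;

    *index = i;
    return out->n_valid > 0;
}
```

A caller would loop on toy_next_chunk(), writing the start index, the count and the entries for each chunk, and stop either at end of table or when its time budget for the iteration runs out.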

> ---
>  hw/ppc/spapr.c         |  269 +++++++++++++++++++++++++++++++++++++++++++++++-
>  hw/ppc/spapr_hcall.c   |    8 +-
>  include/hw/ppc/spapr.h |   12 ++-
>  3 files changed, 281 insertions(+), 8 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index def3505..f989a22 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -32,6 +32,7 @@
>  #include "sysemu/cpus.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_ppc.h"
> +#include "mmu-hash64.h"
>  
>  #include "hw/boards.h"
>  #include "hw/ppc/ppc.h"
> @@ -667,7 +668,7 @@ static void spapr_cpu_reset(void *opaque)
>  
>      env->spr[SPR_HIOR] = 0;
>  
> -    env->external_htab = spapr->htab;
> +    env->external_htab = (uint8_t *)spapr->htab;
>      env->htab_base = -1;
>      env->htab_mask = HTAB_SIZE(spapr) - 1;
>      env->spr[SPR_SDR1] = (target_ulong)spapr->htab |
> @@ -719,6 +720,268 @@ static int spapr_vga_init(PCIBus *pci_bus)
>      }
>  }
>  
> +static const VMStateDescription vmstate_spapr = {
> +    .name = "spapr",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT32(next_irq, sPAPREnvironment),
> +
> +        /* RTC offset */
> +        VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +#define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
> +#define HPTE_VALID(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_VALID)
> +#define HPTE_DIRTY(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_HPTE_DIRTY)
> +#define CLEAN_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) &= tswap64(~HPTE64_V_HPTE_DIRTY))
> +
> +static int htab_save_setup(QEMUFile *f, void *opaque)
> +{
> +    sPAPREnvironment *spapr = opaque;
> +
> +    spapr->htab_save_index = 0;
> +    spapr->htab_first_pass = true;
> +
> +    /* "Iteration" header */
> +    qemu_put_be32(f, spapr->htab_shift);
> +
> +    return 0;
> +}
> +
> +#define MAX_ITERATION_NS    5000000 /* 5 ms */
> +
> +static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
> +                                 int64_t max_ns)
> +{
> +    int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
> +    int index = spapr->htab_save_index;
> +    int64_t starttime = qemu_get_clock_ns(rt_clock);
> +
> +    assert(spapr->htab_first_pass);
> +
> +    do {
> +        int chunkstart;
> +
> +        /* Consume invalid HPTEs */
> +        while ((index < htabslots)
> +               && !HPTE_VALID(HPTE(spapr->htab, index))) {
> +            CLEAN_HPTE(HPTE(spapr->htab, index));
> +            index++;
> +        }
> +
> +        /* Consume valid HPTEs */
> +        chunkstart = index;
> +        while ((index < htabslots)
> +               && HPTE_VALID(HPTE(spapr->htab, index))) {
> +            CLEAN_HPTE(HPTE(spapr->htab, index));
> +            index++;
> +        }
> +
> +        if (index > chunkstart) {
> +            int n_valid = index - chunkstart;
> +
> +            qemu_put_be32(f, chunkstart);
> +            qemu_put_be16(f, n_valid);
> +            qemu_put_be16(f, 0);
> +            qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
> +                            HASH_PTE_SIZE_64 * n_valid);
> +
> +            if ((qemu_get_clock_ns(rt_clock) - starttime) > max_ns) {
> +                break;
> +            }
> +        }
> +    } while ((index < htabslots) && !qemu_file_rate_limit(f));
> +
> +    if (index >= htabslots) {
> +        assert(index == htabslots);
> +        index = 0;
> +        spapr->htab_first_pass = false;
> +    }
> +    spapr->htab_save_index = index;
> +}
> +
> +static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
> +                                 int64_t max_ns)
> +{
> +    bool final = max_ns < 0;
> +    int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
> +    int examined = 0, sent = 0;
> +    int index = spapr->htab_save_index;
> +    int64_t starttime = qemu_get_clock_ns(rt_clock);
> +
> +    assert(!spapr->htab_first_pass);
> +
> +    do {
> +        int chunkstart, invalidstart;
> +
> +        /* Consume non-dirty HPTEs */
> +        while ((index < htabslots)
> +               && !HPTE_DIRTY(HPTE(spapr->htab, index))) {
> +            index++;
> +            examined++;
> +        }
> +
> +        chunkstart = index;
> +        /* Consume valid dirty HPTEs */
> +        while ((index < htabslots)
> +               && HPTE_DIRTY(HPTE(spapr->htab, index))
> +               && HPTE_VALID(HPTE(spapr->htab, index))) {
> +            CLEAN_HPTE(HPTE(spapr->htab, index));
> +            index++;
> +            examined++;
> +        }
> +
> +        invalidstart = index;
> +        /* Consume invalid dirty HPTEs */
> +        while ((index < htabslots)
> +               && HPTE_DIRTY(HPTE(spapr->htab, index))
> +               && !HPTE_VALID(HPTE(spapr->htab, index))) {
> +            CLEAN_HPTE(HPTE(spapr->htab, index));
> +            index++;
> +            examined++;
> +        }
> +
> +        if (index > chunkstart) {
> +            int n_valid = invalidstart - chunkstart;
> +            int n_invalid = index - invalidstart;
> +
> +            qemu_put_be32(f, chunkstart);
> +            qemu_put_be16(f, n_valid);
> +            qemu_put_be16(f, n_invalid);
> +            qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
> +                            HASH_PTE_SIZE_64 * n_valid);
> +            sent += index - chunkstart;
> +
> +            if (!final && (qemu_get_clock_ns(rt_clock) - starttime) > max_ns) {
> +                break;
> +            }
> +        }
> +
> +        if (examined >= htabslots) {
> +            break;
> +        }
> +
> +        if (index >= htabslots) {
> +            assert(index == htabslots);
> +            index = 0;
> +        }
> +    } while ((examined < htabslots) && (!qemu_file_rate_limit(f) || final));
> +
> +    if (index >= htabslots) {
> +        assert(index == htabslots);
> +        index = 0;
> +    }
> +
> +    spapr->htab_save_index = index;
> +
> +    return (examined >= htabslots) && (sent == 0);
> +}
> +
> +static int htab_save_iterate(QEMUFile *f, void *opaque)
> +{
> +    sPAPREnvironment *spapr = opaque;
> +    bool nothingleft = false;
> +
> +    /* Iteration header */
> +    qemu_put_be32(f, 0);
> +
> +    if (spapr->htab_first_pass) {
> +        htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
> +    } else {
> +        nothingleft = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
> +    }
> +
> +    /* End marker */
> +    qemu_put_be32(f, 0);
> +    qemu_put_be16(f, 0);
> +    qemu_put_be16(f, 0);
> +
> +    return nothingleft ? 1 : 0;
> +}
> +
> +static int htab_save_complete(QEMUFile *f, void *opaque)
> +{
> +    sPAPREnvironment *spapr = opaque;
> +
> +    /* Iteration header */
> +    qemu_put_be32(f, 0);
> +
> +    htab_save_later_pass(f, spapr, -1);
> +
> +    /* End marker */
> +    qemu_put_be32(f, 0);
> +    qemu_put_be16(f, 0);
> +    qemu_put_be16(f, 0);
> +
> +    return 0;
> +}
> +
> +static int htab_load(QEMUFile *f, void *opaque, int version_id)
> +{
> +    sPAPREnvironment *spapr = opaque;
> +    uint32_t section_hdr;
> +
> +    if (version_id < 1 || version_id > 1) {
> +        fprintf(stderr, "htab_load() bad version\n");
> +        return -EINVAL;
> +    }
> +
> +    section_hdr = qemu_get_be32(f);
> +
> +    if (section_hdr) {
> +        /* First section, just the hash shift */
> +        if (spapr->htab_shift != section_hdr) {
> +            return -EINVAL;
> +        }
> +        return 0;
> +    }
> +
> +    while (true) {
> +        uint32_t index;
> +        uint16_t n_valid, n_invalid;
> +
> +        index = qemu_get_be32(f);
> +        n_valid = qemu_get_be16(f);
> +        n_invalid = qemu_get_be16(f);
> +
> +        if ((index == 0) && (n_valid == 0) && (n_invalid == 0)) {
> +            /* End of Stream */
> +            break;
> +        }
> +
> +        if ((index + n_valid + n_invalid) >
> +            (HTAB_SIZE(spapr) / HASH_PTE_SIZE_64)) {
> +            /* Bad index in stream */
> +            fprintf(stderr, "htab_load() bad index %u (%hu+%hu entries) "
> +                    "in htab stream\n", index, n_valid, n_invalid);
> +            return -EINVAL;
> +        }
> +
> +        if (n_valid) {
> +            qemu_get_buffer(f, HPTE(spapr->htab, index),
> +                            HASH_PTE_SIZE_64 * n_valid);
> +        }
> +        if (n_invalid) {
> +            memset(HPTE(spapr->htab, index + n_valid), 0,
> +                   HASH_PTE_SIZE_64 * n_invalid);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +static SaveVMHandlers savevm_htab_handlers = {
> +    .save_live_setup = htab_save_setup,
> +    .save_live_iterate = htab_save_iterate,
> +    .save_live_complete = htab_save_complete,
> +    .load_state = htab_load,
> +};
> +
>  static struct icp_state *try_create_xics(const char *type, int nr_servers,
>                                           int nr_irqs)
>  {
> @@ -987,6 +1250,10 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>  
>      spapr->entry_point = 0x100;
>  
> +    vmstate_register(NULL, 0, &vmstate_spapr, spapr);
> +    register_savevm_live(NULL, "spapr/htab", -1, 1,
> +                         &savevm_htab_handlers, spapr);
> +
>      /* Prepare the device tree */
>      spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
>                                              initrd_base, initrd_size,
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index e6f321d..7ca984e 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -115,7 +115,7 @@ static target_ulong h_enter(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      }
>      ppc_hash64_store_hpte1(env, hpte, ptel);
>      /* eieio();  FIXME: need some sort of barrier for smp? */
> -    ppc_hash64_store_hpte0(env, hpte, pteh);
> +    ppc_hash64_store_hpte0(env, hpte, pteh | HPTE64_V_HPTE_DIRTY);
>  
>      args[0] = pte_index + i;
>      return H_SUCCESS;
> @@ -152,7 +152,7 @@ static target_ulong remove_hpte(CPUPPCState *env, target_ulong ptex,
>      }
>      *vp = v;
>      *rp = r;
> -    ppc_hash64_store_hpte0(env, hpte, 0);
> +    ppc_hash64_store_hpte0(env, hpte, HPTE64_V_HPTE_DIRTY);
>      rb = compute_tlbie_rb(v, r, ptex);
>      ppc_tlb_invalidate_one(env, rb);
>      return REMOVE_SUCCESS;
> @@ -282,11 +282,11 @@ static target_ulong h_protect(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>      r |= (flags << 48) & HPTE64_R_KEY_HI;
>      r |= flags & (HPTE64_R_PP | HPTE64_R_N | HPTE64_R_KEY_LO);
>      rb = compute_tlbie_rb(v, r, pte_index);
> -    ppc_hash64_store_hpte0(env, hpte, v & ~HPTE64_V_VALID);
> +    ppc_hash64_store_hpte0(env, hpte, (v & ~HPTE64_V_VALID) | HPTE64_V_HPTE_DIRTY);
>      ppc_tlb_invalidate_one(env, rb);
>      ppc_hash64_store_hpte1(env, hpte, r);
>      /* Don't need a memory barrier, due to qemu's global lock */
> -    ppc_hash64_store_hpte0(env, hpte, v);
> +    ppc_hash64_store_hpte0(env, hpte, v | HPTE64_V_HPTE_DIRTY);
>      return H_SUCCESS;
>  }
>  
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 09c4570..4cfe449 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -9,6 +9,8 @@ struct sPAPRPHBState;
>  struct sPAPRNVRAM;
>  struct icp_state;
>  
> +#define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
> +
>  typedef struct sPAPREnvironment {
>      struct VIOsPAPRBus *vio_bus;
>      QLIST_HEAD(, sPAPRPHBState) phbs;
> @@ -17,20 +19,24 @@ typedef struct sPAPREnvironment {
>  
>      hwaddr ram_limit;
>      void *htab;
> -    long htab_shift;
> +    uint32_t htab_shift;
>      hwaddr rma_size;
>      int vrma_adjust;
>      hwaddr fdt_addr, rtas_addr;
>      long rtas_size;
>      void *fdt_skel;
>      target_ulong entry_point;
> -    int next_irq;
> -    int rtc_offset;
> +    uint32_t next_irq;
> +    uint64_t rtc_offset;
>      char *cpu_model;
>      bool has_graphics;
>  
>      uint32_t epow_irq;
>      Notifier epow_notifier;
> +
> +    /* Migration state */
> +    int htab_save_index;
> +    bool htab_first_pass;
>  } sPAPREnvironment;
>  
>  #define H_SUCCESS         0
> -- 
> 1.7.10.4
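
For readers following the stream layout: each iteration written by the
htab_save_*() functions above is a sequence of chunks, and htab_load()
replays them. A chunk is a be32 start index, a be16 count of valid HPTEs,
a be16 count of invalid HPTEs, then the raw valid HPTEs; an all-zero header
ends the iteration. Below is a minimal standalone sketch of that framing.
The helper names (htab_chunk_encode, htab_chunk_apply, put_be16, ...) and
the flat byte buffers are assumptions made for illustration, not QEMU API.
The bound check in this sketch accepts a chunk that ends exactly at the
last slot.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HPTE_SIZE 16 /* bytes per 64-bit HPTE, as HASH_PTE_SIZE_64 */

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint16_t get_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Write one chunk: 8-byte header plus n_valid raw HPTEs.
 * Returns the number of bytes written to out. */
static size_t htab_chunk_encode(uint8_t *out, uint32_t index,
                                uint16_t n_valid, uint16_t n_invalid,
                                const uint8_t *hptes)
{
    put_be32(out, index);
    put_be16(out + 4, n_valid);
    put_be16(out + 6, n_invalid);
    memcpy(out + 8, hptes, (size_t)n_valid * HPTE_SIZE);
    return 8 + (size_t)n_valid * HPTE_SIZE;
}

/* Apply one chunk to a table of nslots HPTEs.
 * Returns bytes consumed, 0 for the end marker, -1 for a bad index. */
static long htab_chunk_apply(const uint8_t *in, uint8_t *table,
                             uint32_t nslots)
{
    uint32_t index = get_be32(in);
    uint16_t n_valid = get_be16(in + 4);
    uint16_t n_invalid = get_be16(in + 6);

    if (index == 0 && n_valid == 0 && n_invalid == 0) {
        return 0; /* end of this iteration */
    }
    if ((uint64_t)index + n_valid + n_invalid > nslots) {
        return -1; /* chunk would run past the end of the table */
    }
    /* Valid entries are copied verbatim, invalid ones are zeroed. */
    memcpy(table + (size_t)index * HPTE_SIZE, in + 8,
           (size_t)n_valid * HPTE_SIZE);
    memset(table + ((size_t)index + n_valid) * HPTE_SIZE, 0,
           (size_t)n_invalid * HPTE_SIZE);
    return 8 + (long)n_valid * HPTE_SIZE;
}
```

Invalid entries cost nothing beyond the header, which is why the later
passes can report a run of removed HPTEs cheaply.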

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 12/17] pseries: savevm support for PCI host bridge
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 12/17] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
@ 2013-07-08 18:45   ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:45 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> This adds the necessary support for saving the state of the PAPR virtual
> PCI host bridge (or host bridges).
>
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori

> ---
>  hw/ppc/spapr_pci.c          |   49 +++++++++++++++++++++++++++++++++++++++++++
>  include/hw/pci-host/spapr.h |    6 +++---
>  2 files changed, 52 insertions(+), 3 deletions(-)
>
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index c8c12c8..4d8e3cd 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -696,6 +696,54 @@ static Property spapr_phb_properties[] = {
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +static const VMStateDescription vmstate_spapr_pci_lsi = {
> +    .name = "spapr_pci/lsi",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT32_EQUAL(irq, struct spapr_pci_lsi),
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_pci_msi = {
> +    .name = "spapr_pci/msi",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT32(config_addr, struct spapr_pci_msi),
> +        VMSTATE_UINT32(irq, struct spapr_pci_msi),
> +        VMSTATE_UINT32(nvec, struct spapr_pci_msi),
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static const VMStateDescription vmstate_spapr_pci = {
> +    .name = "spapr_pci",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .fields      = (VMStateField []) {
> +        VMSTATE_UINT64_EQUAL(buid, sPAPRPHBState),
> +        VMSTATE_UINT32_EQUAL(dma_liobn, sPAPRPHBState),
> +        VMSTATE_UINT64_EQUAL(mem_win_addr, sPAPRPHBState),
> +        VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState),
> +        VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState),
> +        VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
> +        VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState),
> +        VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
> +                             vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
> +        VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
> +                             vmstate_spapr_pci_msi, struct spapr_pci_msi),
> +
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static void spapr_phb_class_init(ObjectClass *klass, void *data)
>  {
>      SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
> @@ -704,6 +752,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
>      sdc->init = spapr_phb_init;
>      dc->props = spapr_phb_properties;
>      dc->reset = spapr_phb_reset;
> +    dc->vmsd = &vmstate_spapr_pci;
>  }
>  
>  static const TypeInfo spapr_phb_info = {
> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
> index 1e23dbf..93f9511 100644
> --- a/include/hw/pci-host/spapr.h
> +++ b/include/hw/pci-host/spapr.h
> @@ -52,14 +52,14 @@ typedef struct sPAPRPHBState {
>      sPAPRTCETable *tcet;
>      AddressSpace iommu_as;
>  
> -    struct {
> +    struct spapr_pci_lsi {
>          uint32_t irq;
>      } lsi_table[PCI_NUM_PINS];
>  
> -    struct {
> +    struct spapr_pci_msi {
>          uint32_t config_addr;
>          uint32_t irq;
> -        int nvec;
> +        uint32_t nvec;
>      } msi_table[SPAPR_MSIX_MAX_DEVS];
>  
>      QLIST_ENTRY(sPAPRPHBState) list;
> -- 
> 1.7.10.4

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine
  2013-07-08 18:45   ` Anthony Liguori
@ 2013-07-08 18:50     ` Alexander Graf
  2013-07-08 19:01       ` Anthony Liguori
  2013-07-08 21:48     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 92+ messages in thread
From: Alexander Graf @ 2013-07-08 18:50 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson


On 08.07.2013, at 20:45, Anthony Liguori wrote:

> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> From: David Gibson <david@gibson.dropbear.id.au>
>> 
>> This adds the necessary pieces to implement savevm / migration for the
>> pseries machine.  The most complex part here is migrating the hash
>> table - for the paravirtualized pseries machine the guest's hash page
>> table is not stored within guest memory, but externally and the guest
>> accesses it via hypercalls.
>> 
>> This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
>> (tracking changes to the HPTE itself, not the page it references).
>> This is used to implement a live migration style incremental save and
>> restore of the hash table contents.
>> 
>> In addition it adds VMStateDescription information to save and restore
>> the (few) remaining pieces of state information needed by the pseries
>> machine.
>> 
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> 
> I vaguely recall making the suggestion to use a live section like this.
> How large is the HTAB typically?

The default for HV KVM is 16MB, IIRC.


Alex
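
To put that figure in terms of what the migration code has to walk: the
table size is 2^htab_shift bytes and each HPTE is 16 bytes, so a 16MB table
corresponds to a shift of 24 and about a million slots. A small sketch of
the arithmetic follows. The helper names are hypothetical; the constant
mirrors HASH_PTE_SIZE_64 and the size formula mirrors HTAB_SIZE() as used
in the patch, on the assumption that HTAB_SIZE() is 1 << htab_shift.

```c
#include <stdint.h>

#define HPTE_BYTES 16 /* assumed equal to HASH_PTE_SIZE_64 */

/* Size in bytes of a hash table described by htab_shift. */
static inline uint64_t htab_bytes(uint32_t htab_shift)
{
    return UINT64_C(1) << htab_shift;
}

/* Number of HPTE slots a save pass has to examine. */
static inline uint64_t htab_slots(uint32_t htab_shift)
{
    return htab_bytes(htab_shift) / HPTE_BYTES;
}
```

With htab_shift = 24 this gives 16777216 bytes and 1048576 slots, which is
the count the first pass and each later pass iterate over.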

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller
  2013-06-27  6:45 ` [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller Alexey Kardashevskiy
@ 2013-07-08 18:50   ` Anthony Liguori
  2013-07-09  3:21     ` Alexey Kardashevskiy
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 18:50 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> From: David Gibson <david@gibson.dropbear.id.au>
>
> Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
> controller system within KVM.  This patch allows qemu to initialize and
> configure the in-kernel XICS, and keep its state in sync with qemu's XICS
> state as necessary.
>
> This should give considerable performance improvements.  e.g. on a simple
> IPI ping-pong test between hardware threads, using qemu XICS gives us
> around 5,000 irqs/second, whereas the in-kernel XICS gives us around
> 70,000 irqs/s on the same hardware configuration.
>
> [Mike Qiu <qiudayu@linux.vnet.ibm.com>: fixed mistype which caused ics_set_kvm_state() to fail]
> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> [aik: moved to a separate device]
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  default-configs/ppc64-softmmu.mak |    1 +
>  hw/intc/Makefile.objs             |    1 +
>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr.c                    |   32 ++-
>  include/hw/ppc/xics.h             |   13 ++
>  5 files changed, 489 insertions(+), 3 deletions(-)
>  create mode 100644 hw/intc/xics_kvm.c
>
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 69a9f8d..5b995f9 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -48,5 +48,6 @@ CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
>  # For pSeries
>  CONFIG_PCI_HOTPLUG=y
>  CONFIG_XICS=$(CONFIG_PSERIES)
> +CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
>  # For PReP
>  CONFIG_MC146818RTC=y
> diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
> index abe8f80..9e77afe 100644
> --- a/hw/intc/Makefile.objs
> +++ b/hw/intc/Makefile.objs
> @@ -23,3 +23,4 @@ obj-$(CONFIG_OPENPIC) += openpic.o
>  obj-$(CONFIG_OPENPIC_KVM) += openpic_kvm.o
>  obj-$(CONFIG_SH4) += sh_intc.o
>  obj-$(CONFIG_XICS) += xics.o
> +obj-$(CONFIG_XICS_KVM) += xics_kvm.o
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> new file mode 100644
> index 0000000..d5604a7
> --- /dev/null
> +++ b/hw/intc/xics_kvm.c
> @@ -0,0 +1,445 @@
> +/*
> + * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
> + *
> + * PAPR Virtualized Interrupt System, aka ICS/ICP aka xics, in-kernel emulation
> + *
> + * Copyright (c) 2013 David Gibson, IBM Corporation.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a copy
> + * of this software and associated documentation files (the "Software"), to deal
> + * in the Software without restriction, including without limitation the rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + *
> + */
> +
> +#include "hw/hw.h"
> +#include "trace.h"
> +#include "hw/ppc/spapr.h"
> +#include "hw/ppc/xics.h"
> +#include "kvm_ppc.h"
> +#include "qemu/config-file.h"
> +
> +#include <sys/ioctl.h>
> +
> +struct icp_state_kvm {

CodingStyle

Regards,

Anthony Liguori

> +    struct icp_state parent;
> +
> +    uint32_t set_xive_token;
> +    uint32_t get_xive_token;
> +    uint32_t int_off_token;
> +    uint32_t int_on_token;
> +    int kernel_xics_fd;
> +};
> +
> +static void icp_get_kvm_state(struct icp_server_state *ss)
> +{
> +    uint64_t state;
> +    struct kvm_one_reg reg = {
> +        .id = KVM_REG_PPC_ICP_STATE,
> +        .addr = (uintptr_t)&state,
> +    };
> +    int ret;
> +
> +    if (!ss->cs) {
> +        return; /* kernel irqchip not in use */
> +    }
> +
> +    ret = kvm_vcpu_ioctl(ss->cs, KVM_GET_ONE_REG, &reg);
> +    if (ret != 0) {
> +        fprintf(stderr, "Unable to retrieve KVM interrupt controller state"
> +                " for CPU %d: %s\n", ss->cs->cpu_index, strerror(errno));
> +        exit(1);
> +    }
> +
> +    ss->xirr = state >> KVM_REG_PPC_ICP_XISR_SHIFT;
> +    ss->mfrr = (state >> KVM_REG_PPC_ICP_MFRR_SHIFT)
> +        & KVM_REG_PPC_ICP_MFRR_MASK;
> +    ss->pending_priority = (state >> KVM_REG_PPC_ICP_PPRI_SHIFT)
> +        & KVM_REG_PPC_ICP_PPRI_MASK;
> +}
> +
> +static int icp_set_kvm_state(struct icp_server_state *ss)
> +{
> +    uint64_t state;
> +    struct kvm_one_reg reg = {
> +        .id = KVM_REG_PPC_ICP_STATE,
> +        .addr = (uintptr_t)&state,
> +    };
> +    int ret;
> +
> +    if (!ss->cs) {
> +        return 0; /* kernel irqchip not in use */
> +    }
> +
> +    state = ((uint64_t)ss->xirr << KVM_REG_PPC_ICP_XISR_SHIFT)
> +        | ((uint64_t)ss->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT)
> +        | ((uint64_t)ss->pending_priority << KVM_REG_PPC_ICP_PPRI_SHIFT);
> +
> +    ret = kvm_vcpu_ioctl(ss->cs, KVM_SET_ONE_REG, &reg);
> +    if (ret != 0) {
> +        fprintf(stderr, "Unable to restore KVM interrupt controller state (0x%"
> +                PRIx64 ") for CPU %d: %s\n", state, ss->cs->cpu_index,
> +                strerror(errno));
> +        exit(1);
> +        return ret;
> +    }
> +
> +    return 0;
> +}
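
The two functions above move the per-CPU ICP state through a single 64-bit
register value. A standalone sketch of that packing is below. The shift
and mask values are assumptions written from memory of the kernel's
KVM_REG_PPC_ICP_* definitions and should be checked against the uapi
header; ICPRegs, icp_pack and icp_unpack are hypothetical names. Note that
shifting the whole 32-bit XIRR by the XISR shift also places its top byte
(the CPPR) in the top byte of the 64-bit value.

```c
#include <stdint.h>

/* Assumed to match KVM_REG_PPC_ICP_*_SHIFT / _MASK. */
#define ICP_XISR_SHIFT 32
#define ICP_MFRR_SHIFT 24
#define ICP_MFRR_MASK  0xff
#define ICP_PPRI_SHIFT 16
#define ICP_PPRI_MASK  0xff

typedef struct ICPRegs {
    uint32_t xirr;            /* CPPR in the top byte, XISR below it */
    uint8_t mfrr;             /* pending IPI priority */
    uint8_t pending_priority; /* priority of the pending interrupt */
} ICPRegs;

/* qemu-side fields -> one 64-bit register value */
static uint64_t icp_pack(ICPRegs r)
{
    return ((uint64_t)r.xirr << ICP_XISR_SHIFT)
        | ((uint64_t)r.mfrr << ICP_MFRR_SHIFT)
        | ((uint64_t)r.pending_priority << ICP_PPRI_SHIFT);
}

/* one 64-bit register value -> qemu-side fields */
static ICPRegs icp_unpack(uint64_t state)
{
    ICPRegs r;

    r.xirr = (uint32_t)(state >> ICP_XISR_SHIFT);
    r.mfrr = (state >> ICP_MFRR_SHIFT) & ICP_MFRR_MASK;
    r.pending_priority = (state >> ICP_PPRI_SHIFT) & ICP_PPRI_MASK;
    return r;
}
```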
> +
> +static void ics_get_kvm_state(struct ics_state *ics)
> +{
> +    struct icp_state_kvm *icpkvm = XICS_KVM(ics->icp);
> +    uint64_t state;
> +    struct kvm_device_attr attr = {
> +        .flags = 0,
> +        .group = KVM_DEV_XICS_GRP_SOURCES,
> +        .addr = (uint64_t)(uintptr_t)&state,
> +    };
> +    int i;
> +
> +    for (i = 0; i < ics->nr_irqs; i++) {
> +        struct ics_irq_state *irq = &ics->irqs[i];
> +        int ret;
> +
> +        attr.attr = i + ics->offset;
> +
> +        ret = ioctl(icpkvm->kernel_xics_fd, KVM_GET_DEVICE_ATTR, &attr);
> +        if (ret != 0) {
> +            fprintf(stderr, "Unable to retrieve KVM interrupt controller state"
> +                    " for IRQ %d: %s\n", i + ics->offset, strerror(errno));
> +            exit(1);
> +        }
> +
> +        irq->server = state & KVM_XICS_DESTINATION_MASK;
> +        irq->saved_priority = (state >> KVM_XICS_PRIORITY_SHIFT)
> +            & KVM_XICS_PRIORITY_MASK;
> +        /*
> +         * To be consistent with the software emulation in xics.c, we
> +         * split out the masked state + priority that we get from the
> +         * kernel into 'current priority' (0xff if masked) and
> +         * 'saved priority' (if masked, this is the priority the
> +         * interrupt had before it was masked).  Masking and unmasking
> +         * are done with the ibm,int-off and ibm,int-on RTAS calls.
> +         */
> +        if (state & KVM_XICS_MASKED) {
> +            irq->priority = 0xff;
> +        } else {
> +            irq->priority = irq->saved_priority;
> +        }
> +
> +        if (state & KVM_XICS_PENDING) {
> +            if (state & KVM_XICS_LEVEL_SENSITIVE) {
> +                irq->status |= XICS_STATUS_ASSERTED;
> +            } else {
> +                /*
> +                 * A pending edge-triggered interrupt (or MSI)
> +                 * must have been rejected previously when we
> +                 * first detected it and tried to deliver it,
> +                 * so mark it as pending and previously rejected
> +                 * for consistency with how xics.c works.
> +                 */
> +                irq->status |= XICS_STATUS_MASKED_PENDING
> +                    | XICS_STATUS_REJECTED;
> +            }
> +        }
> +    }
> +}
> +
> +static int ics_set_kvm_state(struct ics_state *ics)
> +{
> +    struct icp_state_kvm *icpkvm = XICS_KVM(ics->icp);
> +    uint64_t state;
> +    struct kvm_device_attr attr = {
> +        .flags = 0,
> +        .group = KVM_DEV_XICS_GRP_SOURCES,
> +        .addr = (uint64_t)(uintptr_t)&state,
> +    };
> +    int i;
> +
> +    for (i = 0; i < ics->nr_irqs; i++) {
> +        struct ics_irq_state *irq = &ics->irqs[i];
> +        int ret;
> +
> +        attr.attr = i + ics->offset;
> +
> +        state = irq->server;
> +        state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
> +            << KVM_XICS_PRIORITY_SHIFT;
> +        if (irq->priority != irq->saved_priority) {
> +            assert(irq->priority == 0xff);
> +            state |= KVM_XICS_MASKED;
> +        }
> +
> +        if (ics->islsi[i]) {
> +            state |= KVM_XICS_LEVEL_SENSITIVE;
> +            if (irq->status & XICS_STATUS_ASSERTED) {
> +                state |= KVM_XICS_PENDING;
> +            }
> +        } else {
> +            if (irq->status & XICS_STATUS_MASKED_PENDING) {
> +                state |= KVM_XICS_PENDING;
> +            }
> +        }
> +
> +        ret = ioctl(icpkvm->kernel_xics_fd, KVM_SET_DEVICE_ATTR, &attr);
> +        if (ret != 0) {
> +            fprintf(stderr, "Unable to restore KVM interrupt controller state"
> +                    " for IRQ %d: %s\n", i + ics->offset, strerror(errno));
> +            return ret;
> +        }
> +    }
> +
> +    return 0;
> +}
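
The priority handling in the get/set pair above is the least obvious part:
the kernel reports one priority plus a "masked" flag, while xics.c keeps a
current priority (0xff when masked) and a saved priority. A reduced sketch
of the two conversions follows. IrqPrio and both function names are
hypothetical, and the sketch leaves out the server and pending bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct IrqPrio {
    uint8_t priority;       /* 0xff while the source is masked */
    uint8_t saved_priority; /* priority to restore on unmask */
} IrqPrio;

/* Kernel view (priority + masked flag) -> xics.c view. */
static IrqPrio prio_from_kernel(uint8_t kernel_priority, bool masked)
{
    IrqPrio p;

    p.saved_priority = kernel_priority;
    p.priority = masked ? 0xff : kernel_priority;
    return p;
}

/* xics.c view -> kernel masked flag: masked iff the two differ. */
static bool prio_is_masked(IrqPrio p)
{
    return p.priority != p.saved_priority;
}
```

One consequence, if I read the encoding right: a source masked while its
saved priority is already 0xff converts back as unmasked, since the two
fields are equal. That looks benign as long as priority 0xff is never
delivered, but it means the round trip is not exact in that corner.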
> +
> +static void icp_pre_save(void *opaque)
> +{
> +    struct icp_server_state *ss = opaque;
> +
> +    icp_get_kvm_state(ss);
> +}
> +
> +static int icp_post_load(void *opaque, int version_id)
> +{
> +    struct icp_server_state *ss = opaque;
> +
> +    return icp_set_kvm_state(ss);
> +}
> +
> +static void ics_pre_save(void *opaque)
> +{
> +    struct ics_state *ics = opaque;
> +
> +    ics_get_kvm_state(ics);
> +}
> +
> +static int ics_post_load(void *opaque, int version_id)
> +{
> +    struct ics_state *ics = opaque;
> +
> +    return ics_set_kvm_state(ics);
> +}
> +
> +static VMStateDescription vmstate_icpkvm_server = {
> +    .name = "icpkvm/server",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .pre_save = icp_pre_save,
> +    .post_load = icp_post_load,
> +};
> +
> +static VMStateDescription vmstate_icskvm = {
> +    .name = "icskvm",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .minimum_version_id_old = 1,
> +    .pre_save = ics_pre_save,
> +    .post_load = ics_post_load,
> +};
> +
> +static void ics_set_irq_kvm(void *opaque, int srcno, int val)
> +{
> +    struct ics_state *ics = opaque;
> +    struct kvm_irq_level args;
> +    int rc;
> +
> +    args.irq = srcno + ics->offset;
> +    if (!ics->islsi[srcno]) {
> +        if (!val) {
> +            return;
> +        }
> +        args.level = KVM_INTERRUPT_SET;
> +    } else {
> +        args.level = val ? KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET;
> +    }
> +    rc = kvm_vm_ioctl(kvm_state, KVM_IRQ_LINE, &args);
> +    if (rc < 0) {
> +        perror("kvm_irq_line");
> +    }
> +}
> +
> +int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> +{
> +    CPUState *cs;
> +    struct icp_server_state *ss;
> +    struct icp_state_kvm *icpkvm = (struct icp_state_kvm *) object_dynamic_cast(
> +            OBJECT(icp), TYPE_XICS_KVM);
> +
> +    if (!icpkvm) {
> +        return -1;
> +    }
> +
> +    cs = CPU(cpu);
> +    ss = &icp->ss[cs->cpu_index];
> +
> +    assert(cs->cpu_index < icp->nr_servers);
> +    if (icpkvm->kernel_xics_fd == -1) {
> +        abort();
> +    }
> +
> +    if (icpkvm->kernel_xics_fd != -1) {
> +        int ret;
> +        struct kvm_enable_cap xics_enable_cap = {
> +            .cap = KVM_CAP_IRQ_XICS,
> +            .flags = 0,
> +            .args = {icpkvm->kernel_xics_fd, cs->cpu_index, 0, 0},
> +        };
> +
> +        ss->cs = cs;
> +
> +        ret = kvm_vcpu_ioctl(ss->cs, KVM_ENABLE_CAP, &xics_enable_cap);
> +        if (ret < 0) {
> +            fprintf(stderr, "Unable to connect CPU%d to kernel XICS: %s\n",
> +                    cs->cpu_index, strerror(errno));
> +            exit(1);
> +        }
> +    }
> +    xics_common_cpu_setup(icp, cpu);
> +
> +    vmstate_icpkvm_server.fields = vmstate_icp_server.fields;
> +    vmstate_register(NULL, cs->cpu_index, &vmstate_icpkvm_server, ss);
> +
> +    return 0;
> +}
> +
> +static void rtas_dummy(PowerPCCPU *cpu, sPAPREnvironment *spapr,
> +                       uint32_t token,
> +                       uint32_t nargs, target_ulong args,
> +                       uint32_t nret, target_ulong rets)
> +{
> +    fprintf(stderr, "pseries: %s() should never be called for in-kernel XICS\n", __func__);
> +}
> +
> +static void xics_kvm_realize(DeviceState *dev, Error **errp)
> +{
> +    struct icp_state_kvm *icpkvm = XICS_KVM(dev);
> +    QemuOptsList *list = qemu_find_opts("machine");
> +    int rc;
> +    struct kvm_create_device xics_create_device = {
> +        .type = KVM_DEV_TYPE_XICS,
> +        .flags = 0,
> +    };
> +
> +    if (!kvm_enabled()) {
> +        error_setg(errp, "KVM must be enabled for in-kernel XICS");
> +        goto fail;
> +    }
> +
> +    if (QTAILQ_EMPTY(&list->head) ||
> +        !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
> +                           "kernel_irqchip", true) ||
> +        !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
> +        error_setg(errp, "Kernel irqchip or KVM_CAP_IRQ_XICS not available");
> +        return;
> +    }
> +
> +    icpkvm->set_xive_token = spapr_rtas_register("ibm,set-xive", rtas_dummy);
> +    icpkvm->get_xive_token = spapr_rtas_register("ibm,get-xive", rtas_dummy);
> +    icpkvm->int_off_token = spapr_rtas_register("ibm,int-off", rtas_dummy);
> +    icpkvm->int_on_token = spapr_rtas_register("ibm,int-on", rtas_dummy);
> +
> +    rc = kvmppc_define_rtas_token(icpkvm->set_xive_token, "ibm,set-xive");
> +    if (rc < 0) {
> +        error_setg(errp, "kvmppc_define_rtas_token: ibm,set-xive");
> +        goto fail;
> +    }
> +
> +    rc = kvmppc_define_rtas_token(icpkvm->get_xive_token, "ibm,get-xive");
> +    if (rc < 0) {
> +        error_setg(errp, "kvmppc_define_rtas_token: ibm,get-xive");
> +        goto fail;
> +    }
> +
> +    rc = kvmppc_define_rtas_token(icpkvm->int_on_token, "ibm,int-on");
> +    if (rc < 0) {
> +        error_setg(errp, "kvmppc_define_rtas_token: ibm,int-on");
> +        goto fail;
> +    }
> +
> +    rc = kvmppc_define_rtas_token(icpkvm->int_off_token, "ibm,int-off");
> +    if (rc < 0) {
> +        error_setg(errp, "kvmppc_define_rtas_token: ibm,int-off");
> +        goto fail;
> +    }
> +
> +    /* Create the kernel ICP */
> +    rc = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &xics_create_device);
> +    if (rc < 0) {
> +        error_setg_errno(errp, -rc, "Error on KVM_CREATE_DEVICE for XICS");
> +        goto fail;
> +    }
> +
> +    icpkvm->kernel_xics_fd = xics_create_device.fd;
> +
> +    xics_common_init(&icpkvm->parent, ics_set_irq_kvm);
> +
> +    /* We use each ICS's offset into the global irq number space
> +     * as an instance id.  This means we can extend to multiple ICS
> +     * instances without needing to change the savevm format */
> +    vmstate_icskvm.fields = vmstate_ics.fields;
> +    vmstate_register(NULL, icpkvm->parent.ics->offset, &vmstate_icskvm,
> +                     icpkvm->parent.ics);
> +
> +    return;
> +
> +fail:
> +    kvmppc_define_rtas_token(0, "ibm,set-xive");
> +    kvmppc_define_rtas_token(0, "ibm,get-xive");
> +    kvmppc_define_rtas_token(0, "ibm,int-on");
> +    kvmppc_define_rtas_token(0, "ibm,int-off");
> +    return;
> +}
> +
> +static void xics_kvm_reset(DeviceState *d)
> +{
> +    struct icp_state_kvm *icpkvm = XICS_KVM(d);
> +    struct icp_state *icp = &icpkvm->parent;
> +    int i;
> +
> +    xics_common_reset(icp);
> +
> +    for (i = 0; i < icp->nr_servers; i++) {
> +        if (icp->ss[i].cs) {
> +            icp_set_kvm_state(&icp->ss[i]);
> +        }
> +    }
> +
> +    ics_set_kvm_state(icp->ics);
> +}
> +
> +static void xics_kvm_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +
> +    dc->realize = xics_kvm_realize;
> +    dc->reset = xics_kvm_reset;
> +}
> +
> +static const TypeInfo xics_kvm_info = {
> +    .name          = TYPE_XICS_KVM,
> +    .parent        = TYPE_XICS,
> +    .instance_size = sizeof(struct icp_state_kvm),
> +    .class_init    = xics_kvm_class_init,
> +};
> +
> +static void xics_kvm_register_types(void)
> +{
> +    type_register_static(&xics_kvm_info);
> +}
> +
> +type_init(xics_kvm_register_types)
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f989a22..211f434 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1001,7 +1001,31 @@ static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
>  {
>      struct icp_state *icp = NULL;
>  
> -    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
> +    if (kvm_enabled()) {
> +        bool irqchip_allowed = true, irqchip_required = false;
> +        QemuOptsList *list = qemu_find_opts("machine");
> +
> +        if (!QTAILQ_EMPTY(&list->head)) {
> +            irqchip_allowed = qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
> +                                                "kernel_irqchip", true);
> +            irqchip_required = qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
> +                                                 "kernel_irqchip", false);
> +        }
> +
> +        if (irqchip_allowed) {
> +            icp = try_create_xics(TYPE_XICS_KVM, nr_servers, nr_irqs);
> +        }
> +
> +        if (irqchip_required && !icp) {
> +            perror("Failed to create in-kernel XICS\n");
> +            abort();
> +        }
> +    }
> +
> +    if (!icp) {
> +        icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
> +    }
> +
>      if (!icp) {
>          perror("Failed to create XICS\n");
>          abort();
> @@ -1102,8 +1126,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>          }
>          env = &cpu->env;
>  
> -        xics_cpu_setup(spapr->icp, cpu);
> -
>          /* Set time-base frequency to 512 MHz */
>          cpu_ppc_tb_init(env, TIMEBASE_FREQ);
>  
> @@ -1117,6 +1139,10 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>              kvmppc_set_papr(cpu);
>          }
>  
> +        if (xics_kvm_cpu_setup(spapr->icp, cpu)) {
> +            xics_cpu_setup(spapr->icp, cpu);
> +        }
> +
>          qemu_register_reset(spapr_cpu_reset, cpu);
>      }
>  
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index 3f72806..e474c01 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -32,6 +32,9 @@
>  #define TYPE_XICS "xics"
>  #define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
>  
> +#define TYPE_XICS_KVM "xics-kvm"
> +#define XICS_KVM(obj) OBJECT_CHECK(struct icp_state_kvm, (obj), TYPE_XICS_KVM)
> +
>  #define XICS_IPI        0x2
>  #define XICS_BUID       0x1
>  #define XICS_IRQ_BASE   (XICS_BUID << 12)
> @@ -53,6 +56,7 @@ struct icp_state {
>  };
>  
>  struct icp_server_state {
> +    CPUState *cs;
>      uint32_t xirr;
>      uint8_t pending_priority;
>      uint8_t mfrr;
> @@ -88,6 +92,15 @@ void xics_common_reset(struct icp_state *icp);
>  
>  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
>  
> +#ifdef CONFIG_KVM
> +int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
> +#else
> +static inline int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> +{
> +    return -1;
> +}
> +#endif
> +
>  extern const VMStateDescription vmstate_icp_server;
>  extern const VMStateDescription vmstate_ics;
>  
> -- 
> 1.7.10.4


* Re: [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine
  2013-07-08 18:50     ` Alexander Graf
@ 2013-07-08 19:01       ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 19:01 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Alexey Kardashevskiy, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

Alexander Graf <agraf@suse.de> writes:

> On 08.07.2013, at 20:45, Anthony Liguori wrote:
>
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> From: David Gibson <david@gibson.dropbear.id.au>
>>> 
>>> This adds the necessary pieces to implement savevm / migration for the
>>> pseries machine.  The most complex part here is migrating the hash
>>> table - for the paravirtualized pseries machine the guest's hash page
>>> table is not stored within guest memory, but externally and the guest
>>> accesses it via hypercalls.
>>> 
>>> This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
>>> (tracking changes to the HPTE itself, not the page it references).
>>> This is used to implement a live migration style incremental save and
>>> restore of the hash table contents.
>>> 
>>> In addition it adds VMStateDescription information to save and restore
>>> the (few) remaining pieces of state information needed by the pseries
>>> machine.
>>> 
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> 
>> I vaguely recall making the suggestion to use a live section like this.
>> How large is the HTAB typically?
>
> The default for HV KVM is at 16MB IIRC.

And if I recall correctly, since it's a hash table, updates are random
access and not at all page aligned, which makes QEMU RAM quite unusable
for this purpose.

I guess:

Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>

Regards,

Anthony Liguori


>
>
> Alex


* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-08 18:39   ` Anthony Liguori
@ 2013-07-08 21:45     ` Benjamin Herrenschmidt
  2013-07-08 22:15       ` Anthony Liguori
  2013-07-09  7:20     ` David Gibson
  2013-07-15 13:26     ` Paolo Bonzini
  2 siblings, 1 reply; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-08 21:45 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, Alexander Graf, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On Mon, 2013-07-08 at 13:39 -0500, Anthony Liguori wrote:
> > +    .fields      = (VMStateField []) {
> > +        /* Sanity check */
> > +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
> > +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
> > +
> > +        /* IOMMU state */
> > +        VMSTATE_BOOL(bypass, sPAPRTCETable),
> > +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
> > +                               SPAPR_TCE_PAGE_SIZE /
> > sizeof(sPAPRTCE)),
> 
> Not endian safe.  I really don't get the divide bit at all either.

What do you mean by not endian safe? The TCE table is a well-defined format;
it is always big endian regardless of the endianness of either host or guest.

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine
  2013-07-08 18:45   ` Anthony Liguori
  2013-07-08 18:50     ` Alexander Graf
@ 2013-07-08 21:48     ` Benjamin Herrenschmidt
  2013-07-08 22:23       ` Anthony Liguori
  1 sibling, 1 reply; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-08 21:48 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, Alexander Graf, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On Mon, 2013-07-08 at 13:45 -0500, Anthony Liguori wrote:
> I vaguely recall making the suggestion to use a live section like
> this. How large is the HTAB typically?

Depends on how much RAM you put in your VM. The default is around 16M
but it can get bigger.

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-08 21:45     ` Benjamin Herrenschmidt
@ 2013-07-08 22:15       ` Anthony Liguori
  2013-07-08 22:41         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 22:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, Alexander Graf, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2013-07-08 at 13:39 -0500, Anthony Liguori wrote:
>> > +    .fields      = (VMStateField []) {
>> > +        /* Sanity check */
>> > +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
>> > +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
>> > +
>> > +        /* IOMMU state */
>> > +        VMSTATE_BOOL(bypass, sPAPRTCETable),
>> > +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
>> > +                               SPAPR_TCE_PAGE_SIZE /
>> > sizeof(sPAPRTCE)),
>> 
>> Not endian safe.  I really don't get the divide bit at all either.
>
> What do you mean by not endian safe ? The TCE table is a well defined format,
> it's always big endian regardless of the endianness of either host or
> guest.

VMSTATE_VBUFFER is essentially:

  write(fd, s->table, byte_size_of_table);

It treats whatever it is given as a sized data blob.

table is an array of sPAPRTCE which is just a struct wrapper around a
uint64_t value (the tce entry).

Those entries are set via the h_put_tce hcall through a simple
assignment:

> static target_ulong put_tce_emu(sPAPRTCETable *tcet, target_ulong ioba,
>                                target_ulong tce)
> {
>     ...
> 
>     tcep = tcet->table + (ioba >> SPAPR_TCE_PAGE_SHIFT);
>     tcep->tce = tce;
>  ...
>
> static target_ulong h_put_tce(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>                               target_ulong opcode, target_ulong *args)
> {
>     ...
>     target_ulong tce = args[2];
>     sPAPRTCETable *tcet = spapr_tce_find_by_liobn(liobn);
> 
>     ...
> 
>     if (tcet) {
>         return put_tce_emu(tcet, ioba, tce);
>     }

Hypercall arguments are passed in CPU endianness so what's being stored
in the tce table is CPU endianness.

Since VBUFFER just does a blind write() of the full array of uint64s,
what goes on the wire will be CPU endianness.

So if you do a savevm on a little endian host and loadvm on a big endian
host, badness ensues.

The proper thing to do is use a VARRAY instead of a VBUFFER.  VARRAY
will handle endianness because it treats the data as an array, not as an
opaque buffer.
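
To illustrate the difference described above, here is a minimal,
self-contained sketch.  It is not QEMU code: DemoTCE, put_raw(),
put_be64_array() and get_be64() are hypothetical names invented for this
example.  It only assumes what the message states, namely that a table
entry is a wrapper around a single uint64_t.  The raw copy reproduces
the host's byte order on the wire, while the per-element writer emits a
fixed big-endian layout on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for sPAPRTCE: a wrapper around one 64-bit entry. */
typedef struct {
    uint64_t tce;
} DemoTCE;

/* Buffer-style save: copies the host memory layout unchanged, so the
 * byte order of the output follows the host CPU. */
static size_t put_raw(uint8_t *out, const DemoTCE *t, size_t n)
{
    memcpy(out, t, n * sizeof(*t));
    return n * sizeof(*t);
}

/* Array-style save: writes each element most significant byte first,
 * so the output is identical on little and big endian hosts. */
static size_t put_be64_array(uint8_t *out, const DemoTCE *t, size_t n)
{
    size_t i;
    int b;

    for (i = 0; i < n; i++) {
        for (b = 0; b < 8; b++) {
            out[i * 8 + b] = (uint8_t)(t[i].tce >> (56 - 8 * b));
        }
    }
    return n * 8;
}

/* Matching load helper: rebuilds a host value from 8 big-endian bytes. */
static uint64_t get_be64(const uint8_t *in)
{
    uint64_t v = 0;
    int b;

    for (b = 0; b < 8; b++) {
        v = (v << 8) | in[b];
    }
    return v;
}
```

A stream written with put_be64_array() can be read back with get_be64()
on a host of either endianness; a stream written with put_raw() can only
be read back safely on a host with the same byte order as the sender.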

Regards,

Anthony Liguori

>
> Cheers,
> Ben.


* Re: [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine
  2013-07-08 21:48     ` Benjamin Herrenschmidt
@ 2013-07-08 22:23       ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-08 22:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Juan Quintela, Alexey Kardashevskiy, Alexander Graf, qemu-devel,
	qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Mon, 2013-07-08 at 13:45 -0500, Anthony Liguori wrote:
>> I vaguely recall making the suggestion to use a live section like
>> this. How large is the HTAB typically?
>
> Depends on how much RAM you put in your VM. The default is around 16M
> but it can get bigger.

Yeah, it's worth adding a comment to the commit message explaining
this on the next spin.  There are very, very few live savevm handlers (I
think this would be the third) so it's a very unusual thing to do.

I don't know of a better option though.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.


* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-08 22:15       ` Anthony Liguori
@ 2013-07-08 22:41         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-08 22:41 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, Alexander Graf, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On Mon, 2013-07-08 at 17:15 -0500, Anthony Liguori wrote:

> Hypercall arguments are passed in CPU endianness so what's being stored
> in the tce table is CPU endianness.
> 
> Since VBUFFER just does a blind write() of the full array of uint64s,
> what goes on the wire will be CPU endianness.
> 
> So if you do a savevm on a little endian host and loadvm on a big endian
> host, badness ensues.
> 
> The proper thing to do is use a VARRAY instead of a VBUFFER.  VARRAY
> will handle endian because it treats the data as an array, not as an
> opaque buffer.

Ok, so that's indeed an issue for emulated TCEs because what qemu stores
is not the real (BE) TCE table but a "host native" version of it. I see.

Cheers,
Ben.
 


* Re: [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag
  2013-07-08 18:27   ` Anthony Liguori
@ 2013-07-08 23:57     ` David Gibson
  2013-07-09 14:06       ` Anthony Liguori
  0 siblings, 1 reply; 92+ messages in thread
From: David Gibson @ 2013-07-08 23:57 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras


On Mon, Jul 08, 2013 at 01:27:05PM -0500, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
> > From: David Gibson <david@gibson.dropbear.id.au>
> >
> > The vmstate infrastructure includes a VMS_MULTIPLY flag, and associated
> > VMSTATE_VBUFFER_MULTIPLY helper macro.  These can be used to save a
> > variably sized buffer where the size in bytes of the buffer isn't directly
> > accessible as a structure field, but an element count from which the size
> > can be derived is.
> 
> Why?  What's the point of sending the total size vs. the element
> count?

Because it's more convenient to work with the total size at runtime,
and because the VMSTATE stuff works with actual structure fields,
there's not really a way to convert it at migrate time, short of this.
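
As a rough illustration of the size derivation being discussed, the
sketch below models how a variable buffer size could be computed from a
stored structure field plus a constant factor.  It is not the QEMU
implementation: demo_vbuffer_size() and the DEMO_FLAG_* names are
hypothetical.  In the test values, the factor 4096 / 8 assumes a 4096-byte
TCE page and an 8-byte table entry, so a window size in bytes divided by
512 gives the table size in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags mirroring the multiply/divide idea from the patch. */
enum {
    DEMO_FLAG_MULTIPLY = 1, /* field holds an element count */
    DEMO_FLAG_DIVIDE   = 2  /* field holds a larger total to scale down */
};

/* Derive the number of bytes to save from the value of a structure
 * field and a compile-time constant factor. */
static size_t demo_vbuffer_size(uint32_t field_value, uint32_t factor,
                                int flags)
{
    size_t size = field_value;

    if (flags & DEMO_FLAG_MULTIPLY) {
        size *= factor;
    }
    if (flags & DEMO_FLAG_DIVIDE) {
        size /= factor;
    }
    return size;
}
```

With these assumptions a 1 GiB window (0x40000000 bytes) maps to a
2 MiB table, without storing the table size as a separate field.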

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-08 18:31   ` Anthony Liguori
@ 2013-07-09  0:06     ` Alexey Kardashevskiy
  2013-07-09  0:49       ` Anthony Liguori
  2013-07-09  7:17     ` David Gibson
  1 sibling, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-09  0:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

On 07/09/2013 04:31 AM, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> From: David Gibson <david@gibson.dropbear.id.au>
>>
>> This patch adds the necessary VMStateDescription information to support
>> savevm/loadvm for the XICS interrupt controller used on the pseries
>> machine.
>>
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> [aik: added ics_resend() on post_load]
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/intc/xics.c |   63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 63 insertions(+)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index 0e374c8..3e8f48f 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -497,6 +497,61 @@ static void xics_reset(DeviceState *d)
>>      xics_common_reset(XICS(d));
>>  }
>>  
>> +static int ics_post_load(void *opaque, int version_id)
>> +{
>> +    int i;
>> +    struct ics_state *ics = opaque;
>> +
>> +    for (i = 0; i < ics->icp->nr_servers; i++) {
>> +        icp_resend(ics->icp, i);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +const VMStateDescription vmstate_icp_server = {
>> +    .name = "icp/server",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        /* Sanity check */
>> +        VMSTATE_UINT32(xirr, struct icp_server_state),
>> +        VMSTATE_UINT8(pending_priority, struct icp_server_state),
>> +        VMSTATE_UINT8(mfrr, struct icp_server_state),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static const VMStateDescription vmstate_ics_irq = {
>> +    .name = "ics/irq",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_UINT32(server, struct ics_irq_state),
>> +        VMSTATE_UINT8(priority, struct ics_irq_state),
>> +        VMSTATE_UINT8(saved_priority, struct ics_irq_state),
>> +        VMSTATE_UINT8(status, struct ics_irq_state),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +const VMStateDescription vmstate_ics = {
>> +    .name = "ics",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .post_load = ics_post_load,
>> +    .fields      = (VMStateField []) {
>> +        /* Sanity check */
>> +        VMSTATE_UINT32_EQUAL(nr_irqs, struct ics_state),
>> +
>> +        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(irqs, struct ics_state, nr_irqs, vmstate_ics_irq, struct ics_irq_state),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>  {
>>      CPUState *cs = CPU(cpu);
>> @@ -523,7 +578,11 @@ void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>  
>>  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>  {
>> +    CPUState *cs = CPU(cpu);
>> +    struct icp_server_state *ss = &icp->ss[cs->cpu_index];
>> +
>>      xics_common_cpu_setup(icp, cpu);
>> +    vmstate_register(NULL, cs->cpu_index, &vmstate_icp_server, ss);
> 
> This is an indication that something is wrong.
> 
> You should tie the vmstate section to DeviceState::vmsd.  You only need
> to do this because you haven't converted everything to QOM yet.
> 
> Please do that to avoid these hacks.


How? I want to support migration from xics to xics-kvm and vice versa.
A vmsd cannot be inherited, and even if it could, the different device
names would break that support.


> 
> Regards,
> 
> Anthony Liguori
> 
>>  }
>>  
>>  void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>> @@ -555,6 +614,10 @@ static void xics_realize(DeviceState *dev, Error **errp)
>>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>>  
>> +    /* We use each ICS's offset into the global irq number space
>> +     * as an instance id.  This means we can extend to multiple ICS
>> +     * instances without needing to change the savevm format */
>> +    vmstate_register(NULL, icp->ics->offset, &vmstate_ics, icp->ics);
>>  }
>>  
>>  static Property xics_properties[] = {
>> -- 
>> 1.7.10.4
> 


-- 
Alexey


* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-09  0:06     ` Alexey Kardashevskiy
@ 2013-07-09  0:49       ` Anthony Liguori
  2013-07-09  0:59         ` Alexey Kardashevskiy
  2013-07-09  3:37         ` Alexey Kardashevskiy
  0 siblings, 2 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09  0:49 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, Alexander Graf, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On Mon, Jul 8, 2013 at 7:06 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>> You should tie the vmstate section to DeviceState::vmsd.  You only need
>> to do this because you haven't converted everything to QOM yet.
>>
>> Please do that to avoid these hacks.
>
>
> How? I want to support migration from xics to xics-kvm and vice versa.
> vmsd cannot be inherited and even if they could, different device names
> would kill that support.

Please look at hw/intc/i8259_common.c and then hw/i386/kvm/i8259.c and
hw/intc/i8259.c.

The vmsd is in the common base class shared between the KVM version
and the non-KVM version.  As long as the subclasses don't introduce
any new state members, you can safely migrate between the two devices.
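
A minimal sketch of that pattern follows.  It is not the i8259 or XICS
code: every Demo* name is hypothetical, and kernel_xirr merely stands in
for state that lives in the kernel.  The assumption illustrated is that
one save/load routine belongs to the common base, and each subclass only
supplies optional pre_save/post_load hooks, so the wire format is the
same whichever subclass wrote it.

```c
#include <stddef.h>
#include <stdint.h>

struct DemoIcpClass;

/* Hypothetical state common to the emulated and the KVM flavour. */
typedef struct DemoIcpState {
    const struct DemoIcpClass *klass;
    uint32_t xirr;
    uint8_t mfrr;
    uint32_t kernel_xirr; /* stand-in for state held by the kernel */
} DemoIcpState;

/* Per-subclass hooks; NULL means "nothing to synchronise". */
typedef struct DemoIcpClass {
    const char *type_name;
    void (*pre_save)(DemoIcpState *s);
    void (*post_load)(DemoIcpState *s);
} DemoIcpClass;

/* Common save routine: run the subclass hook, then emit shared fields. */
static size_t demo_save(DemoIcpState *s, uint8_t *out)
{
    if (s->klass->pre_save) {
        s->klass->pre_save(s);
    }
    out[0] = (uint8_t)(s->xirr >> 24);
    out[1] = (uint8_t)(s->xirr >> 16);
    out[2] = (uint8_t)(s->xirr >> 8);
    out[3] = (uint8_t)s->xirr;
    out[4] = s->mfrr;
    return 5;
}

/* Common load routine: fill shared fields, then run the subclass hook. */
static void demo_load(DemoIcpState *s, const uint8_t *in)
{
    s->xirr = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
              ((uint32_t)in[2] << 8) | (uint32_t)in[3];
    s->mfrr = in[4];
    if (s->klass->post_load) {
        s->klass->post_load(s);
    }
}

/* The KVM flavour pulls state from, and pushes it to, the "kernel". */
static void demo_kvm_pre_save(DemoIcpState *s)
{
    s->xirr = s->kernel_xirr;
}

static void demo_kvm_post_load(DemoIcpState *s)
{
    s->kernel_xirr = s->xirr;
}

static const DemoIcpClass demo_emulated_class = {
    "demo-xics", NULL, NULL
};
static const DemoIcpClass demo_kvm_class = {
    "demo-xics-kvm", demo_kvm_pre_save, demo_kvm_post_load
};
```

Because the hooks only move data in and out of the shared fields, a
stream saved by the KVM flavour can be loaded by the emulated one and
vice versa.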

You should consider splitting the implementations up into separate
files just like i8259 too.

Regards,

Anthony Liguori

>
>
>>
>> Regards,
>>
>> Anthony Liguori
>>
>>>  }
>>>
>>>  void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>>> @@ -555,6 +614,10 @@ static void xics_realize(DeviceState *dev, Error **errp)
>>>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>>>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>>>
>>> +    /* We use each ICS's offset into the global irq number space
>>> +     * as an instance id.  This means we can extend to multiple ICS
>>> +     * instances without needing to change the savevm format */
>>> +    vmstate_register(NULL, icp->ics->offset, &vmstate_ics, icp->ics);
>>>  }
>>>
>>>  static Property xics_properties[] = {
>>> --
>>> 1.7.10.4
>>
>
>
> --
> Alexey
>


* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-09  0:49       ` Anthony Liguori
@ 2013-07-09  0:59         ` Alexey Kardashevskiy
  2013-07-09  1:25           ` Anthony Liguori
  2013-07-09  3:37         ` Alexey Kardashevskiy
  1 sibling, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-09  0:59 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Alexander Graf, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On 07/09/2013 10:49 AM, Anthony Liguori wrote:
> On Mon, Jul 8, 2013 at 7:06 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>> You should tie the vmstate section to DeviceState::vmsd.  You only need
>>> to do this because you haven't converted everything to QOM yet.
>>>
>>> Please do that to avoid these hacks.
>>
>>
>> How? I want to support migration from xics to xics-kvm and vice versa.
>> vmsd cannot be inherited and even if they could, different device names
>> would kill that support.
> 
> Please look at hw/intc/i8259_common.c and then hw/i386/kvm/i8259.c and
> hw/i386/intc/i8259.c.
> 
> The vmsd is in the common base class shared between the KVM version
> and the non-KVM version.  As long as the subclasses don't introduce
> any new state members, you can safely migrate between the two devices.

Ok, thanks.

> You should consider splitting the implementations up into separate
> files just like i8259 too.


I already split it into xics and xics-kvm devices, so you are definitely
talking about something else, but I do not understand what exactly...



> Regards,
> 
> Anthony Liguori
> 
>>
>>
>>>
>>> Regards,
>>>
>>> Anthony Liguori
>>>
>>>>  }
>>>>
>>>>  void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>>>> @@ -555,6 +614,10 @@ static void xics_realize(DeviceState *dev, Error **errp)
>>>>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>>>>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>>>>
>>>> +    /* We use each ICS's offset into the global irq number space
>>>> +     * as an instance id.  This means we can extend to multiple ICS
>>>> +     * instances without needing to change the savevm format */
>>>> +    vmstate_register(NULL, icp->ics->offset, &vmstate_ics, icp->ics);
>>>>  }
>>>>
>>>>  static Property xics_properties[] = {
>>>> --
>>>> 1.7.10.4
>>>
>>
>>
>> --
>> Alexey
>>


-- 
Alexey


* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-09  0:59         ` Alexey Kardashevskiy
@ 2013-07-09  1:25           ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09  1:25 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alexander Graf, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 07/09/2013 10:49 AM, Anthony Liguori wrote:
>> On Mon, Jul 8, 2013 at 7:06 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>> You should tie the vmstate section to DeviceState::vmsd.  You only need
>>>> to do this because you haven't converted everything to QOM yet.
>>>>
>>>> Please do that to avoid these hacks.
>>>
>>>
>>> How? I want to support migration from xics to xics-kvm and vice versa.
>>> vmsd cannot be inherited and even if they could, different device names
>>> would kill that support.
>> 
>> Please look at hw/intc/i8259_common.c and then hw/i386/kvm/i8259.c and
>> hw/i386/intc/i8259.c.
>> 
>> The vmsd is in the common base class shared between the KVM version
>> and the non-KVM version.  As long as the subclasses don't introduce
>> any new state members, you can safely migrate between the two devices.
>
> Ok, thanks.
>
>> You should consider splitting the implementations up into separate
>> files just like i8259 too.
>
>
> I already split it into xics and xics-kvm devices, so you are definitely
> talking about something else, but I do not understand what exactly...

There are three classes for the i8259 split between three files.  I was
suggesting factoring out a base class and putting that in a separate
file.

Regards,

Anthony Liguori

>
>
>
>> Regards,
>> 
>> Anthony Liguori
>> 
>>>
>>>
>>>>
>>>> Regards,
>>>>
>>>> Anthony Liguori
>>>>
>>>>>  }
>>>>>
>>>>>  void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>>>>> @@ -555,6 +614,10 @@ static void xics_realize(DeviceState *dev, Error **errp)
>>>>>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>>>>>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>>>>>
>>>>> +    /* We use each ICS's offset into the global irq number space
>>>>> +     * as an instance id.  This means we can extend to multiple ICS
>>>>> +     * instances without needing to change the savevm format */
>>>>> +    vmstate_register(NULL, icp->ics->offset, &vmstate_ics, icp->ics);
>>>>>  }
>>>>>
>>>>>  static Property xics_properties[] = {
>>>>> --
>>>>> 1.7.10.4
>>>>
>>>
>>>
>>> --
>>> Alexey
>>>
>
>
> -- 
> Alexey


* Re: [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller
  2013-07-08 18:50   ` Anthony Liguori
@ 2013-07-09  3:21     ` Alexey Kardashevskiy
  2013-07-09  7:21       ` David Gibson
  0 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-09  3:21 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

On 07/09/2013 04:50 AM, Anthony Liguori wrote:
>> +#include "hw/hw.h"
>> +#include "trace.h"
>> +#include "hw/ppc/spapr.h"
>> +#include "hw/ppc/xics.h"
>> +#include "kvm_ppc.h"
>> +#include "qemu/config-file.h"
>> +
>> +#include <sys/ioctl.h>
>> +
>> +struct icp_state_kvm {
> 
> CodingStyle


./scripts/checkpatch.pl finds nothing.

Did you mean missing typedef?


-- 
Alexey


* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-09  0:49       ` Anthony Liguori
  2013-07-09  0:59         ` Alexey Kardashevskiy
@ 2013-07-09  3:37         ` Alexey Kardashevskiy
  2013-07-15 13:05           ` Paolo Bonzini
  1 sibling, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-09  3:37 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Anthony Liguori, Alexander Graf, qemu-devel, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On 07/09/2013 10:49 AM, Anthony Liguori wrote:
> On Mon, Jul 8, 2013 at 7:06 PM, Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>> You should tie the vmstate section to DeviceState::vmsd.  You only need
>>> to do this because you haven't converted everything to QOM yet.
>>>
>>> Please do that to avoid these hacks.
>>
>>
>> How? I want to support migration from xics to xics-kvm and vice versa.
>> vmsd cannot be inherited and even if they could, different device names
>> would kill that support.
> 
> Please look at hw/intc/i8259_common.c and then hw/i386/kvm/i8259.c and
> hw/i386/intc/i8259.c.

By the way, do I have to put xics_kvm.c in hw/ppc64/kvm (which does not exist
yet) or in hw/intc? What is the convention here? I am really confused.

> The vmsd is in the common base class shared between the KVM version
> and the non-KVM version.  As long as the subclasses don't introduce
> any new state members, you can safely migrate between the two devices.


btw xics-kvm does not introduce new members but does have very different
.pre_save and .post_load. This actually was the whole point of splitting
xics into xics and xics-kvm. I cannot see how I can fix it without hacks.
Properties can be inherited from a parent class (?) but a VMStateDescription
cannot.


> You should consider splitting the implementations up into separate
> files just like i8259 too.




-- 
Alexey


* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-08 18:22   ` Anthony Liguori
@ 2013-07-09  3:40     ` Alexey Kardashevskiy
  2013-07-09  4:48       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-09  3:40 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

On 07/09/2013 04:22 AM, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> Currently the XICS interrupt controller is not a QEMU device. As we are going
>> to support an in-kernel emulated XICS which is part of KVM, it makes
>> sense not to extend the existing XICS with multiple KVM stub functions
>> but to create another device and share pieces between the fully emulated
>> XICS and the in-kernel XICS.
>>
>> The rework includes:
>> * port to QOM
>> * make a few functions public for use by the in-kernel XICS implementation
>> * make VMStateDescription public for use in in-kernel XICS migration
>> * move xics_system_init() to spapr.c; it creates a fully emulated
>> XICS now and will try the in-kernel XICS in upcoming patches.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/intc/xics.c        |  109 ++++++++++++++++++++++++++-----------------------
>>  hw/ppc/spapr.c        |   28 +++++++++++++
>>  include/hw/ppc/xics.h |   59 ++++++++++++++++++++++++--
>>  3 files changed, 141 insertions(+), 55 deletions(-)
>>
>> diff --git a/hw/intc/xics.c b/hw/intc/xics.c
>> index 091912e..0e374c8 100644
>> --- a/hw/intc/xics.c
>> +++ b/hw/intc/xics.c
>> @@ -34,13 +34,6 @@
>>   * ICP: Presentation layer
>>   */
>>  
>> -struct icp_server_state {
>> -    uint32_t xirr;
>> -    uint8_t pending_priority;
>> -    uint8_t mfrr;
>> -    qemu_irq output;
>> -};
>> -
>>  #define XISR_MASK  0x00ffffff
>>  #define CPPR_MASK  0xff000000
>>  
>> @@ -49,12 +42,6 @@ struct icp_server_state {
>>  
>>  struct ics_state;
>>  
>> -struct icp_state {
>> -    long nr_servers;
>> -    struct icp_server_state *ss;
>> -    struct ics_state *ics;
>> -};
>> -
>>  static void ics_reject(struct ics_state *ics, int nr);
>>  static void ics_resend(struct ics_state *ics);
>>  static void ics_eoi(struct ics_state *ics, int nr);
>> @@ -171,27 +158,6 @@ static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
>>  /*
>>   * ICS: Source layer
>>   */
>> -
>> -struct ics_irq_state {
>> -    int server;
>> -    uint8_t priority;
>> -    uint8_t saved_priority;
>> -#define XICS_STATUS_ASSERTED           0x1
>> -#define XICS_STATUS_SENT               0x2
>> -#define XICS_STATUS_REJECTED           0x4
>> -#define XICS_STATUS_MASKED_PENDING     0x8
>> -    uint8_t status;
>> -};
>> -
>> -struct ics_state {
>> -    int nr_irqs;
>> -    int offset;
>> -    qemu_irq *qirqs;
>> -    bool *islsi;
>> -    struct ics_irq_state *irqs;
>> -    struct icp_state *icp;
>> -};
>> -
>>  static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
>>  {
>>      return (nr >= ics->offset)
>> @@ -506,9 +472,8 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>>      rtas_st(rets, 0, 0); /* Success */
>>  }
>>  
>> -static void xics_reset(void *opaque)
>> +void xics_common_reset(struct icp_state *icp)
>>  {
>> -    struct icp_state *icp = (struct icp_state *)opaque;
>>      struct ics_state *ics = icp->ics;
>>      int i;
>>  
>> @@ -527,7 +492,12 @@ static void xics_reset(void *opaque)
>>      }
>>  }
>>  
>> -void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>> +static void xics_reset(DeviceState *d)
>> +{
>> +    xics_common_reset(XICS(d));
>> +}
>> +
>> +void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>  {
>>      CPUState *cs = CPU(cpu);
>>      CPUPPCState *env = &cpu->env;
>> @@ -551,37 +521,72 @@ void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>>      }
>>  }
>>  
>> -struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
>> +void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
>> +{
>> +    xics_common_cpu_setup(icp, cpu);
>> +}
>> +
>> +void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
>>  {
>> -    struct icp_state *icp;
>> -    struct ics_state *ics;
>> +    struct ics_state *ics = icp->ics;
>>  
>> -    icp = g_malloc0(sizeof(*icp));
>> -    icp->nr_servers = nr_servers;
>>      icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
>>  
>>      ics = g_malloc0(sizeof(*ics));
>> -    ics->nr_irqs = nr_irqs;
>> +    ics->nr_irqs = icp->nr_irqs;
>>      ics->offset = XICS_IRQ_BASE;
>> -    ics->irqs = g_malloc0(nr_irqs * sizeof(struct ics_irq_state));
>> -    ics->islsi = g_malloc0(nr_irqs * sizeof(bool));
>> +    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(struct ics_irq_state));
>> +    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
>>  
>>      icp->ics = ics;
>>      ics->icp = icp;
>>  
>> -    ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
>> +    ics->qirqs = qemu_allocate_irqs(handler, ics, ics->nr_irqs);
>> +}
>>  
>> -    spapr_register_hypercall(H_CPPR, h_cppr);
>> -    spapr_register_hypercall(H_IPI, h_ipi);
>> -    spapr_register_hypercall(H_XIRR, h_xirr);
>> -    spapr_register_hypercall(H_EOI, h_eoi);
>> +static void xics_realize(DeviceState *dev, Error **errp)
>> +{
>> +    struct icp_state *icp = XICS(dev);
>> +
>> +    xics_common_init(icp, ics_set_irq);
>>  
>>      spapr_rtas_register("ibm,set-xive", rtas_set_xive);
>>      spapr_rtas_register("ibm,get-xive", rtas_get_xive);
>>      spapr_rtas_register("ibm,int-off", rtas_int_off);
>>      spapr_rtas_register("ibm,int-on", rtas_int_on);
>>  
>> -    qemu_register_reset(xics_reset, icp);
>> +}
>> +
>> +static Property xics_properties[] = {
>> +    DEFINE_PROP_UINT32("nr_servers", struct icp_state, nr_servers, -1),
>> +    DEFINE_PROP_UINT32("nr_irqs", struct icp_state, nr_irqs, -1),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void xics_class_init(ObjectClass *oc, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(oc);
>> +
>> +    dc->realize = xics_realize;
>> +    dc->props = xics_properties;
>> +    dc->reset = xics_reset;
>> +}
>> +
>> +static const TypeInfo xics_info = {
>> +    .name          = TYPE_XICS,
>> +    .parent        = TYPE_SYS_BUS_DEVICE,
>> +    .instance_size = sizeof(struct icp_state),
>> +    .class_init    = xics_class_init,
>> +};
>> +
>> +static void xics_register_types(void)
>> +{
>> +    spapr_register_hypercall(H_CPPR, h_cppr);
>> +    spapr_register_hypercall(H_IPI, h_ipi);
>> +    spapr_register_hypercall(H_XIRR, h_xirr);
>> +    spapr_register_hypercall(H_EOI, h_eoi);
>>  
>> -    return icp;
>> +    type_register_static(&xics_info);
>>  }
>> +
>> +type_init(xics_register_types)
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 38c29b7..def3505 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -719,6 +719,34 @@ static int spapr_vga_init(PCIBus *pci_bus)
>>      }
>>  }
>>  
>> +static struct icp_state *try_create_xics(const char *type, int nr_servers,
>> +                                         int nr_irqs)
>> +{
>> +    DeviceState *dev;
>> +
>> +    dev = qdev_create(NULL, type);
>> +    qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
>> +    qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
>> +    if (qdev_init(dev) < 0) {
>> +        return NULL;
>> +    }
>> +
>> +    return XICS(dev);
>> +}
>> +
>> +static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
>> +{
>> +    struct icp_state *icp = NULL;
>> +
>> +    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
>> +    if (!icp) {
>> +        perror("Failed to create XICS\n");
>> +        abort();
>> +    }
>> +
>> +    return icp;
>> +}
>> +
>>  /* pSeries LPAR / sPAPR hardware init */
>>  static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>  {
>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>> index 6bce042..3f72806 100644
>> --- a/include/hw/ppc/xics.h
>> +++ b/include/hw/ppc/xics.h
>> @@ -27,15 +27,68 @@
>>  #if !defined(__XICS_H__)
>>  #define __XICS_H__
>>  
>> +#include "hw/sysbus.h"
>> +
>> +#define TYPE_XICS "xics"
>> +#define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
>> +
>>  #define XICS_IPI        0x2
>> -#define XICS_IRQ_BASE   0x10
>> +#define XICS_BUID       0x1
>> +#define XICS_IRQ_BASE   (XICS_BUID << 12)
>> +
>> +/*
>> + * We currently only support one BUID which is our interrupt base
>> + * (the kernel implementation supports more but we don't exploit
>> + *  that yet)
>> + */
>>  
>> -struct icp_state;
>> +struct icp_state {
>> +    /*< private >*/
>> +    SysBusDevice parent_obj;
>> +    /*< public >*/
>> +    uint32_t nr_servers;
>> +    uint32_t nr_irqs;
>> +    struct icp_server_state *ss;
>> +    struct ics_state *ics;
>> +};
>> +
>> +struct icp_server_state {
>> +    uint32_t xirr;
>> +    uint8_t pending_priority;
>> +    uint8_t mfrr;
>> +    qemu_irq output;
>> +};
> 
> If you're exposing all of this, please fix coding style while you're at
> it.

>> +
>> +struct ics_state {
>> +    uint32_t nr_irqs;
>> +    uint32_t offset;
>> +    qemu_irq *qirqs;
>> +    bool *islsi;
>> +    struct ics_irq_state *irqs;
>> +    struct icp_state *icp;
>> +};
> 
> Shouldn't this be a device too?

No, why? It is a per CPU state of XICS controller, never exists apart from
XICS.


-- 
Alexey


* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-09  3:40     ` Alexey Kardashevskiy
@ 2013-07-09  4:48       ` Benjamin Herrenschmidt
  2013-07-09 13:58         ` Anthony Liguori
  0 siblings, 1 reply; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-09  4:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, Paul Mackerras,
	Paolo Bonzini, qemu-ppc, David Gibson

On Tue, 2013-07-09 at 13:40 +1000, Alexey Kardashevskiy wrote:
> No, why? It is a per CPU state of XICS controller, never exists apart
> from XICS.

ICP is. ICS is ... different but can mostly be considered to be the
XICS itself.

Anthony, we could be completely anal about it and create a gigantic
cathedral of devices or just be a bit realistic and do something simpler
that has the exact same functionality :)

Basically, in HW the layout of the interrupt network is:

 - One ICP per processor thread (the "presenter"). This contains the
registers to fetch a pending interrupt (ack), EOI, and control the
processor priority.

 - One ICS per logical source of interrupts (ie, one per PCI host
bridge, and a few others here or there). This contains the per-interrupt
source configuration (target processor(s), priority, mask) and the
per-interrupt internal state.

Under PAPR, there is a single "virtual" ICS ... somewhat (it's a bit
oddball what pHyp does here, arguably there are two but we can ignore
that distinction). There is no register level access. A pair of firmware
(RTAS) calls is used to configure each virtual interrupt.

So our model here is somewhat the same. We have one ICS in the emulated
XICS which arguably *is* the emulated XICS, there's no point making it a
separate "device", that would just be gross, and each VCPU has an
associated ICP.

Cheers,
Ben.


* Re: [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-07-08 18:29   ` Anthony Liguori
@ 2013-07-09  5:14     ` Alexey Kardashevskiy
  2013-07-09 14:08       ` Anthony Liguori
  0 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-09  5:14 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexander Graf, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

On 07/09/2013 04:29 AM, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> From: David Gibson <david@gibson.dropbear.id.au>
>>
>> The savevm code for the powerpc cpu emulation is currently based around
>> the old register_savevm() rather than register_vmstate() method.  It's also
>> rather broken, missing some important state on some CPU models.
>>
>> This patch completely rewrites the savevm for target-ppc, using the new
>> VMStateDescription approach.  Exactly what needs to be saved in what
>> configurations has been more carefully examined, too.  This introduces a
>> new version (5) of the cpu save format.  The old load function is retained
>> to support version 4 images.
> 
> Supporting "version 4" is purely an academic exercise.  I wouldn't bother.


Sorry, I do not get it. Will the patch be accepted as is (with the comments
at the bottom removed), or do I have to remove the old handlers to get it
upstream? Thanks.
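
To make the question concrete, here is a sketch of how the version fields
appear to gate the old loader, as far as I understand them: streams newer
than version_id or older than minimum_version_id_old are refused, and
anything in between that is below minimum_version_id goes to the old
handler. This is plain C with hypothetical names (DescSketch,
load_dispatch() and the two loaders), not the actual savevm code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; a model of the dispatch, not QEMU code. */
typedef int (*LoadFn)(void *opaque, int version_id);

typedef struct DescSketch {
    int    version_id;              /* newest format this build writes */
    int    minimum_version_id;      /* oldest format the field list reads */
    int    minimum_version_id_old;  /* oldest format the old handler reads */
    LoadFn load_state_old;          /* hand-written legacy loader, or NULL */
    LoadFn load_fields;             /* stands in for the declarative fields */
} DescSketch;

static int load_dispatch(const DescSketch *d, void *opaque, int version_id)
{
    assert(d != NULL && d->load_fields != NULL);
    if (version_id > d->version_id) {
        return -1;                  /* stream is from a newer format */
    }
    if (version_id < d->minimum_version_id_old) {
        return -1;                  /* too old for either loader */
    }
    if (version_id < d->minimum_version_id) {
        return d->load_state_old ? d->load_state_old(opaque, version_id) : -1;
    }
    return d->load_fields(opaque, version_id);
}

/* Toy loaders that only record which path was taken. */
static int load_v4(void *opaque, int version_id)
{
    (void)version_id;
    *(int *)opaque = 4;
    return 0;
}

static int load_v5(void *opaque, int version_id)
{
    (void)version_id;
    *(int *)opaque = 5;
    return 0;
}

/* Same numbers as in the patch: write v5, still accept v4 via the old path. */
static const DescSketch cpu_desc_sketch = {
    .version_id = 5,
    .minimum_version_id = 5,
    .minimum_version_id_old = 4,
    .load_state_old = load_v4,
    .load_fields = load_v5,
};
```

In this model, dropping version 4 support would amount to setting
minimum_version_id_old to 5 and load_state_old to NULL, after which a
version 4 stream is simply rejected.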


>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> [aik: ppc cpu savevm convertion fixed to use PowerPCCPU instead of CPUPPCState]
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  target-ppc/cpu-qom.h        |    4 +
>>  target-ppc/cpu.h            |    8 +-
>>  target-ppc/machine.c        |  533 ++++++++++++++++++++++++++++++++++++-------
>>  target-ppc/translate_init.c |    2 +
>>  4 files changed, 454 insertions(+), 93 deletions(-)
>>
>> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
>> index eb03a00..2b96b04 100644
>> --- a/target-ppc/cpu-qom.h
>> +++ b/target-ppc/cpu-qom.h
>> @@ -102,4 +102,8 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr);
>>  
>>  void ppc_cpu_do_interrupt(CPUState *cpu);
>>  
>> +#ifndef CONFIG_USER_ONLY
>> +extern const struct VMStateDescription vmstate_ppc_cpu;
>> +#endif
>> +
>>  #endif
>> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
>> index 0ede077..f30577d 100644
>> --- a/target-ppc/cpu.h
>> +++ b/target-ppc/cpu.h
>> @@ -948,7 +948,7 @@ struct CPUPPCState {
>>  #if defined(TARGET_PPC64)
>>      /* PowerPC 64 SLB area */
>>      ppc_slb_t slb[64];
>> -    int slb_nr;
>> +    int32_t slb_nr;
>>  #endif
>>      /* segment registers */
>>      hwaddr htab_base;
>> @@ -957,11 +957,11 @@ struct CPUPPCState {
>>      /* externally stored hash table */
>>      uint8_t *external_htab;
>>      /* BATs */
>> -    int nb_BATs;
>> +    uint32_t nb_BATs;
>>      target_ulong DBAT[2][8];
>>      target_ulong IBAT[2][8];
>>      /* PowerPC TLB registers (for 4xx, e500 and 60x software driven TLBs) */
>> -    int nb_tlb;      /* Total number of TLB                                  */
>> +    int32_t nb_tlb;      /* Total number of TLB                              */
>>      int tlb_per_way; /* Speed-up helper: used to avoid divisions at run time */
>>      int nb_ways;     /* Number of ways in the TLB set                        */
>>      int last_way;    /* Last used way used to allocate TLB in a LRU way      */
>> @@ -1176,8 +1176,6 @@ static inline CPUPPCState *cpu_init(const char *cpu_model)
>>  #define cpu_signal_handler cpu_ppc_signal_handler
>>  #define cpu_list ppc_cpu_list
>>  
>> -#define CPU_SAVE_VERSION 4
>> -
>>  /* MMU modes definitions */
>>  #define MMU_MODE0_SUFFIX _user
>>  #define MMU_MODE1_SUFFIX _kernel
>> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
>> index 2d10adb..1fcc6bc 100644
>> --- a/target-ppc/machine.c
>> +++ b/target-ppc/machine.c
>> @@ -1,96 +1,12 @@
>>  #include "hw/hw.h"
>>  #include "hw/boards.h"
>>  #include "sysemu/kvm.h"
>> +#include "helper_regs.h"
>>  
>> -void cpu_save(QEMUFile *f, void *opaque)
>> +static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>>  {
>> -    CPUPPCState *env = (CPUPPCState *)opaque;
>> -    unsigned int i, j;
>> -    uint32_t fpscr;
>> -    target_ulong xer;
>> -
>> -    for (i = 0; i < 32; i++)
>> -        qemu_put_betls(f, &env->gpr[i]);
>> -#if !defined(TARGET_PPC64)
>> -    for (i = 0; i < 32; i++)
>> -        qemu_put_betls(f, &env->gprh[i]);
>> -#endif
>> -    qemu_put_betls(f, &env->lr);
>> -    qemu_put_betls(f, &env->ctr);
>> -    for (i = 0; i < 8; i++)
>> -        qemu_put_be32s(f, &env->crf[i]);
>> -    xer = cpu_read_xer(env);
>> -    qemu_put_betls(f, &xer);
>> -    qemu_put_betls(f, &env->reserve_addr);
>> -    qemu_put_betls(f, &env->msr);
>> -    for (i = 0; i < 4; i++)
>> -        qemu_put_betls(f, &env->tgpr[i]);
>> -    for (i = 0; i < 32; i++) {
>> -        union {
>> -            float64 d;
>> -            uint64_t l;
>> -        } u;
>> -        u.d = env->fpr[i];
>> -        qemu_put_be64(f, u.l);
>> -    }
>> -    fpscr = env->fpscr;
>> -    qemu_put_be32s(f, &fpscr);
>> -    qemu_put_sbe32s(f, &env->access_type);
>> -#if defined(TARGET_PPC64)
>> -    qemu_put_betls(f, &env->spr[SPR_ASR]);
>> -    qemu_put_sbe32s(f, &env->slb_nr);
>> -#endif
>> -    qemu_put_betls(f, &env->spr[SPR_SDR1]);
>> -    for (i = 0; i < 32; i++)
>> -        qemu_put_betls(f, &env->sr[i]);
>> -    for (i = 0; i < 2; i++)
>> -        for (j = 0; j < 8; j++)
>> -            qemu_put_betls(f, &env->DBAT[i][j]);
>> -    for (i = 0; i < 2; i++)
>> -        for (j = 0; j < 8; j++)
>> -            qemu_put_betls(f, &env->IBAT[i][j]);
>> -    qemu_put_sbe32s(f, &env->nb_tlb);
>> -    qemu_put_sbe32s(f, &env->tlb_per_way);
>> -    qemu_put_sbe32s(f, &env->nb_ways);
>> -    qemu_put_sbe32s(f, &env->last_way);
>> -    qemu_put_sbe32s(f, &env->id_tlbs);
>> -    qemu_put_sbe32s(f, &env->nb_pids);
>> -    if (env->tlb.tlb6) {
>> -        // XXX assumes 6xx
>> -        for (i = 0; i < env->nb_tlb; i++) {
>> -            qemu_put_betls(f, &env->tlb.tlb6[i].pte0);
>> -            qemu_put_betls(f, &env->tlb.tlb6[i].pte1);
>> -            qemu_put_betls(f, &env->tlb.tlb6[i].EPN);
>> -        }
>> -    }
>> -    for (i = 0; i < 4; i++)
>> -        qemu_put_betls(f, &env->pb[i]);
>> -    for (i = 0; i < 1024; i++)
>> -        qemu_put_betls(f, &env->spr[i]);
>> -    qemu_put_be32s(f, &env->vscr);
>> -    qemu_put_be64s(f, &env->spe_acc);
>> -    qemu_put_be32s(f, &env->spe_fscr);
>> -    qemu_put_betls(f, &env->msr_mask);
>> -    qemu_put_be32s(f, &env->flags);
>> -    qemu_put_sbe32s(f, &env->error_code);
>> -    qemu_put_be32s(f, &env->pending_interrupts);
>> -    qemu_put_be32s(f, &env->irq_input_state);
>> -    for (i = 0; i < POWERPC_EXCP_NB; i++)
>> -        qemu_put_betls(f, &env->excp_vectors[i]);
>> -    qemu_put_betls(f, &env->excp_prefix);
>> -    qemu_put_betls(f, &env->ivor_mask);
>> -    qemu_put_betls(f, &env->ivpr_mask);
>> -    qemu_put_betls(f, &env->hreset_vector);
>> -    qemu_put_betls(f, &env->nip);
>> -    qemu_put_betls(f, &env->hflags);
>> -    qemu_put_betls(f, &env->hflags_nmsr);
>> -    qemu_put_sbe32s(f, &env->mmu_idx);
>> -    qemu_put_sbe32(f, 0);
>> -}
>> -
>> -int cpu_load(QEMUFile *f, void *opaque, int version_id)
>> -{
>> -    CPUPPCState *env = (CPUPPCState *)opaque;
>> +    PowerPCCPU *cpu = opaque;
>> +    CPUPPCState *env = &cpu->env;
>>      unsigned int i, j;
>>      target_ulong sdr1;
>>      uint32_t fpscr;
>> @@ -177,3 +93,444 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
>>  
>>      return 0;
>>  }
>> +
>> +static int get_avr(QEMUFile *f, void *pv, size_t size)
>> +{
>> +    ppc_avr_t *v = pv;
>> +
>> +    v->u64[0] = qemu_get_be64(f);
>> +    v->u64[1] = qemu_get_be64(f);
>> +
>> +    return 0;
>> +}
>> +
>> +static void put_avr(QEMUFile *f, void *pv, size_t size)
>> +{
>> +    ppc_avr_t *v = pv;
>> +
>> +    qemu_put_be64(f, v->u64[0]);
>> +    qemu_put_be64(f, v->u64[1]);
>> +}
>> +
>> +const VMStateInfo vmstate_info_avr = {
>> +    .name = "avr",
>> +    .get  = get_avr,
>> +    .put  = put_avr,
>> +};
>> +
>> +#define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
>> +    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
>> +
>> +#define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
>> +    VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
>> +
>> +static void cpu_pre_save(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +    CPUPPCState *env = &cpu->env;
>> +    int i;
>> +
>> +    env->spr[SPR_LR] = env->lr;
>> +    env->spr[SPR_CTR] = env->ctr;
>> +    env->spr[SPR_XER] = env->xer;
>> +#if defined(TARGET_PPC64)
>> +    env->spr[SPR_CFAR] = env->cfar;
>> +#endif
>> +    env->spr[SPR_BOOKE_SPEFSCR] = env->spe_fscr;
>> +
>> +    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
>> +        env->spr[SPR_DBAT0U + 2*i] = env->DBAT[0][i];
>> +        env->spr[SPR_DBAT0U + 2*i + 1] = env->DBAT[1][i];
>> +        env->spr[SPR_IBAT0U + 2*i] = env->IBAT[0][i];
>> +        env->spr[SPR_IBAT0U + 2*i + 1] = env->IBAT[1][i];
>> +    }
>> +    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
>> +        env->spr[SPR_DBAT4U + 2*i] = env->DBAT[0][i+4];
>> +        env->spr[SPR_DBAT4U + 2*i + 1] = env->DBAT[1][i+4];
>> +        env->spr[SPR_IBAT4U + 2*i] = env->IBAT[0][i+4];
>> +        env->spr[SPR_IBAT4U + 2*i + 1] = env->IBAT[1][i+4];
>> +    }
>> +}
>> +
>> +static int cpu_post_load(void *opaque, int version_id)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +    CPUPPCState *env = &cpu->env;
>> +    int i;
>> +
>> +    env->lr = env->spr[SPR_LR];
>> +    env->ctr = env->spr[SPR_CTR];
>> +    env->xer = env->spr[SPR_XER];
>> +#if defined(TARGET_PPC64)
>> +    env->cfar = env->spr[SPR_CFAR];
>> +#endif
>> +    env->spe_fscr = env->spr[SPR_BOOKE_SPEFSCR];
>> +
>> +    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
>> +        env->DBAT[0][i] = env->spr[SPR_DBAT0U + 2*i];
>> +        env->DBAT[1][i] = env->spr[SPR_DBAT0U + 2*i + 1];
>> +        env->IBAT[0][i] = env->spr[SPR_IBAT0U + 2*i];
>> +        env->IBAT[1][i] = env->spr[SPR_IBAT0U + 2*i + 1];
>> +    }
>> +    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
>> +        env->DBAT[0][i+4] = env->spr[SPR_DBAT4U + 2*i];
>> +        env->DBAT[1][i+4] = env->spr[SPR_DBAT4U + 2*i + 1];
>> +        env->IBAT[0][i+4] = env->spr[SPR_IBAT4U + 2*i];
>> +        env->IBAT[1][i+4] = env->spr[SPR_IBAT4U + 2*i + 1];
>> +    }
>> +
>> +    /* Restore htab_base and htab_mask variables */
>> +    ppc_store_sdr1(env, env->spr[SPR_SDR1]);
>> +
>> +    hreg_compute_hflags(env);
>> +    hreg_compute_mem_idx(env);
>> +
>> +    return 0;
>> +}
>> +
>> +static bool fpu_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +
>> +    return (cpu->env.insns_flags & PPC_FLOAT);
>> +}
>> +
>> +static const VMStateDescription vmstate_fpu = {
>> +    .name = "cpu/fpu",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
>> +        VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool altivec_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +
>> +    return (cpu->env.insns_flags & PPC_ALTIVEC);
>> +}
>> +
>> +static const VMStateDescription vmstate_altivec = {
>> +    .name = "cpu/altivec",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
>> +        VMSTATE_UINT32(env.vscr, PowerPCCPU),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool vsx_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +
>> +    return (cpu->env.insns_flags2 & PPC2_VSX);
>> +}
>> +
>> +static const VMStateDescription vmstate_vsx = {
>> +    .name = "cpu/vsx",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool sr_needed(void *opaque)
>> +{
>> +#ifdef TARGET_PPC64
>> +    PowerPCCPU *cpu = opaque;
>> +
>> +    return !(cpu->env.mmu_model & POWERPC_MMU_64);
>> +#else
>> +    return true;
>> +#endif
>> +}
>> +
>> +static const VMStateDescription vmstate_sr = {
>> +    .name = "cpu/sr",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_UINTTL_ARRAY(env.sr, PowerPCCPU, 32),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +#ifdef TARGET_PPC64
>> +static int get_slbe(QEMUFile *f, void *pv, size_t size)
>> +{
>> +    ppc_slb_t *v = pv;
>> +
>> +    v->esid = qemu_get_be64(f);
>> +    v->vsid = qemu_get_be64(f);
>> +
>> +    return 0;
>> +}
>> +
>> +static void put_slbe(QEMUFile *f, void *pv, size_t size)
>> +{
>> +    ppc_slb_t *v = pv;
>> +
>> +    qemu_put_be64(f, v->esid);
>> +    qemu_put_be64(f, v->vsid);
>> +}
>> +
>> +const VMStateInfo vmstate_info_slbe = {
>> +    .name = "slbe",
>> +    .get  = get_slbe,
>> +    .put  = put_slbe,
>> +};
>> +
>> +#define VMSTATE_SLB_ARRAY_V(_f, _s, _n, _v)                       \
>> +    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_slbe, ppc_slb_t)
>> +
>> +#define VMSTATE_SLB_ARRAY(_f, _s, _n)                             \
>> +    VMSTATE_SLB_ARRAY_V(_f, _s, _n, 0)
>> +
>> +static bool slb_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +
>> +    /* We don't support any of the old segment table based 64-bit CPUs */
>> +    return (cpu->env.mmu_model & POWERPC_MMU_64);
>> +}
>> +
>> +static const VMStateDescription vmstate_slb = {
>> +    .name = "cpu/slb",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_INT32_EQUAL(env.slb_nr, PowerPCCPU),
>> +        VMSTATE_SLB_ARRAY(env.slb, PowerPCCPU, 64),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +#endif /* TARGET_PPC64 */
>> +
>> +static const VMStateDescription vmstate_tlb6xx_entry = {
>> +    .name = "cpu/tlb6xx_entry",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_UINTTL(pte0, ppc6xx_tlb_t),
>> +        VMSTATE_UINTTL(pte1, ppc6xx_tlb_t),
>> +        VMSTATE_UINTTL(EPN, ppc6xx_tlb_t),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool tlb6xx_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +    CPUPPCState *env = &cpu->env;
>> +
>> +    return env->nb_tlb && (env->tlb_type == TLB_6XX);
>> +}
>> +
>> +static const VMStateDescription vmstate_tlb6xx = {
>> +    .name = "cpu/tlb6xx",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
>> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlb6, PowerPCCPU,
>> +                                            env.nb_tlb,
>> +                                            vmstate_tlb6xx_entry,
>> +                                            ppc6xx_tlb_t),
>> +        VMSTATE_UINTTL_ARRAY(env.tgpr, PowerPCCPU, 4),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>> +static const VMStateDescription vmstate_tlbemb_entry = {
>> +    .name = "cpu/tlbemb_entry",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_UINT64(RPN, ppcemb_tlb_t),
>> +        VMSTATE_UINTTL(EPN, ppcemb_tlb_t),
>> +        VMSTATE_UINTTL(PID, ppcemb_tlb_t),
>> +        VMSTATE_UINTTL(size, ppcemb_tlb_t),
>> +        VMSTATE_UINT32(prot, ppcemb_tlb_t),
>> +        VMSTATE_UINT32(attr, ppcemb_tlb_t),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool tlbemb_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +    CPUPPCState *env = &cpu->env;
>> +
>> +    return env->nb_tlb && (env->tlb_type == TLB_EMB);
>> +}
>> +
>> +static bool pbr403_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +    uint32_t pvr = cpu->env.spr[SPR_PVR];
>> +
>> +    return (pvr & 0xffff0000) == 0x00200000;
>> +}
>> +
>> +static const VMStateDescription vmstate_pbr403 = {
>> +    .name = "cpu/pbr403",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_UINTTL_ARRAY(env.pb, PowerPCCPU, 4),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static const VMStateDescription vmstate_tlbemb = {
>> +    .name = "cpu/tlb6xx",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
>> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbe, PowerPCCPU,
>> +                                            env.nb_tlb,
>> +                                            vmstate_tlbemb_entry,
>> +                                            ppcemb_tlb_t),
>> +        /* 403 protection registers */
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +    .subsections = (VMStateSubsection []) {
>> +        {
>> +            .vmsd = &vmstate_pbr403,
>> +            .needed = pbr403_needed,
>> +        } , {
>> +            /* empty */
>> +        }
>> +    }
>> +};
>> +
>> +static const VMStateDescription vmstate_tlbmas_entry = {
>> +    .name = "cpu/tlbmas_entry",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_UINT32(mas8, ppcmas_tlb_t),
>> +        VMSTATE_UINT32(mas1, ppcmas_tlb_t),
>> +        VMSTATE_UINT64(mas2, ppcmas_tlb_t),
>> +        VMSTATE_UINT64(mas7_3, ppcmas_tlb_t),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +static bool tlbmas_needed(void *opaque)
>> +{
>> +    PowerPCCPU *cpu = opaque;
>> +    CPUPPCState *env = &cpu->env;
>> +
>> +    return env->nb_tlb && (env->tlb_type == TLB_MAS);
>> +}
>> +
>> +static const VMStateDescription vmstate_tlbmas = {
>> +    .name = "cpu/tlbmas",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
>> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbm, PowerPCCPU,
>> +                                            env.nb_tlb,
>> +                                            vmstate_tlbmas_entry,
>> +                                            ppcmas_tlb_t),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>> +const VMStateDescription vmstate_ppc_cpu = {
>> +    .name = "cpu",
>> +    .version_id = 5,
>> +    .minimum_version_id = 5,
>> +    .minimum_version_id_old = 4,
>> +    .load_state_old = cpu_load_old,
>> +    .pre_save = cpu_pre_save,
>> +    .post_load = cpu_post_load,
>> +    .fields      = (VMStateField []) {
>> +        /* Verify we haven't changed the pvr */
>> +        VMSTATE_UINTTL_EQUAL(env.spr[SPR_PVR], PowerPCCPU),
>> +
>> +        /* User mode architected state */
>> +        VMSTATE_UINTTL_ARRAY(env.gpr, PowerPCCPU, 32),
>> +#if !defined(TARGET_PPC64)
>> +        VMSTATE_UINTTL_ARRAY(env.gprh, PowerPCCPU, 32),
>> +#endif
>> +        VMSTATE_UINT32_ARRAY(env.crf, PowerPCCPU, 8),
>> +        VMSTATE_UINTTL(env.nip, PowerPCCPU),
>> +
>> +        /* SPRs */
>> +        VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
>> +        VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
>> +
>> +        /* Reservation */
>> +        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
>> +
>> +        /* Supervisor mode architected state */
>> +        VMSTATE_UINTTL(env.msr, PowerPCCPU),
>> +
>> +        /* Internal state */
>> +        VMSTATE_UINTTL(env.hflags_nmsr, PowerPCCPU),
>> +        /* FIXME: access_type? */
>> +
>> +        /* Sanity checking */
>> +        VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
>> +        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
>> +        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
>> +        VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +    .subsections = (VMStateSubsection []) {
>> +        {
>> +            .vmsd = &vmstate_fpu,
>> +            .needed = fpu_needed,
>> +        } , {
>> +            .vmsd = &vmstate_altivec,
>> +            .needed = altivec_needed,
>> +        } , {
>> +            .vmsd = &vmstate_vsx,
>> +            .needed = vsx_needed,
>> +        } , {
>> +            .vmsd = &vmstate_sr,
>> +            .needed = sr_needed,
>> +        } , {
>> +#ifdef TARGET_PPC64
>> +            .vmsd = &vmstate_slb,
>> +            .needed = slb_needed,
>> +        } , {
>> +#endif /* TARGET_PPC64 */
>> +            .vmsd = &vmstate_tlb6xx,
>> +            .needed = tlb6xx_needed,
>> +        } , {
>> +            .vmsd = &vmstate_tlbemb,
>> +            .needed = tlbemb_needed,
>> +        } , {
>> +            .vmsd = &vmstate_tlbmas,
>> +            .needed = tlbmas_needed,
>> +        } , {
>> +            /* FIXME: DCRs? */
>> +            /* FIXME: timebase? */
>> +            /* empty */
> 
> Are they needed or not needed?

DCR is not needed, I'll remove it.

Timebase is needed, but it requires kernel support, and either way it should
not prevent the rest of the patch from going upstream.

I'll remove both comments anyway.


> If they're needed, please add them.
> 
> Regards,
> 
> Anthony Liguori
> 
>> +        }
>> +    }
>> +};
>> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
>> index d8758d5..95aebf7 100644
>> --- a/target-ppc/translate_init.c
>> +++ b/target-ppc/translate_init.c
>> @@ -8295,6 +8295,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
>>  
>>      cc->class_by_name = ppc_cpu_class_by_name;
>>      cc->do_interrupt = ppc_cpu_do_interrupt;
>> +
>> +    cpu_class_set_vmsd(cc, &vmstate_ppc_cpu);
>>  }
>>  
>>  static const TypeInfo ppc_cpu_type_info = {
>> -- 
>> 1.7.10.4
> 


-- 
Alexey


* Re: [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
  2013-07-08 18:01 ` Anthony Liguori
@ 2013-07-09  6:37   ` Alexey Kardashevskiy
  2013-07-09 15:26     ` Anthony Liguori
  0 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-09  6:37 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

On 07/09/2013 04:01 AM, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> This series spent quite a lot of time waiting when David's PCI series
>> reaches the upstream but it does not seem to happen soon so I rebased
>> those on top of agraf/ppc-next rebased on top qemu.org/master.
>>
>>
>> While this series applies and compiles, the migration will often fail
>> until the "migration: do not sent zero pages in bulk stage" patch is reverted
>> or fixed somehow.
> 
> Your cover letter is out of date.  This patch has been applied.  Can you
> confirm the series now works as expected?

Sorry, my bad. It all works now.

> David's PCI series is now upstream too.
> 
> This should be at least three if not four distinct patch series.
> Sending it as a single series means it cannot be applied in chunks easily.

Besides "savevm: Implement VMS_DIVIDE flag" (can we keep it, or should I
get rid of it?), the rest should go through Alex Graf's ppc-next tree. I
cannot easily move the patches in this series, as that would require a rebase
almost every time. So what is the point in splitting this into 4 series?
I can try grouping some of them, though...



> Regards,
> 
> Anthony Liguori
> 
>> Alexey Kardashevskiy (4):
>>   pseries: move interrupt controllers to hw/intc/
>>   pseries: rework XICS
>>   pseries: rework PAPR virtual SCSI
>>   spapr-pci: rework MSI/MSIX
>>
>> David Gibson (12):
>>   savevm: Implement VMS_DIVIDE flag
>>   target-ppc: Convert ppc cpu savevm to VMStateDescription
>>   pseries: savevm support for XICS interrupt controller
>>   pseries: savevm support for VIO devices
>>   pseries: savevm support for PAPR VIO logical lan
>>   pseries: savevm support for PAPR TCE tables
>>   pseries: savevm support for PAPR virtual SCSI
>>   pseries: savevm support for pseries machine
>>   pseries: savevm support for PCI host bridge
>>   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
>>   pseries: Support for in-kernel XICS interrupt controller
>>   pseries: savevm support with KVM
>>
>> Prerna Saxena (1):
>>   ppc64: Enable QEMU to run on POWER 8 DD1 chip.
>>
>>  default-configs/ppc64-softmmu.mak |    2 +
>>  hw/char/spapr_vty.c               |   16 ++
>>  hw/intc/Makefile.objs             |    2 +
>>  hw/{ppc => intc}/xics.c           |  172 ++++++++----
>>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
>>  hw/net/spapr_llan.c               |   24 +-
>>  hw/ppc/Makefile.objs              |    2 +-
>>  hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
>>  hw/ppc/spapr_hcall.c              |    8 +-
>>  hw/ppc/spapr_iommu.c              |   25 ++
>>  hw/ppc/spapr_pci.c                |  141 ++++++----
>>  hw/ppc/spapr_vio.c                |   20 ++
>>  hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
>>  include/hw/pci-host/spapr.h       |   14 +-
>>  include/hw/ppc/spapr.h            |   17 +-
>>  include/hw/ppc/spapr_vio.h        |    5 +
>>  include/hw/ppc/xics.h             |   72 ++++-
>>  include/migration/vmstate.h       |   13 +
>>  savevm.c                          |    8 +
>>  target-ppc/cpu-models.c           |    3 +
>>  target-ppc/cpu-models.h           |    1 +
>>  target-ppc/cpu-qom.h              |    4 +
>>  target-ppc/cpu.h                  |    8 +-
>>  target-ppc/kvm.c                  |   83 ++++++
>>  target-ppc/kvm_ppc.h              |   29 ++
>>  target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
>>  target-ppc/translate_init.c       |   36 +++
>>  27 files changed, 2088 insertions(+), 319 deletions(-)
>>  rename hw/{ppc => intc}/xics.c (80%)
>>  create mode 100644 hw/intc/xics_kvm.c
>>
>> -- 
>> 1.7.10.4
> 
> 


-- 
Alexey


* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-08 18:31   ` Anthony Liguori
  2013-07-09  0:06     ` Alexey Kardashevskiy
@ 2013-07-09  7:17     ` David Gibson
  2013-07-15 13:10       ` Paolo Bonzini
  1 sibling, 1 reply; 92+ messages in thread
From: David Gibson @ 2013-07-09  7:17 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras


On Mon, Jul 08, 2013 at 01:31:59PM -0500, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> > From: David Gibson <david@gibson.dropbear.id.au>
[snip]
> >  void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> >  {
> >      CPUState *cs = CPU(cpu);
> > @@ -523,7 +578,11 @@ void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> >  
> >  void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
> >  {
> > +    CPUState *cs = CPU(cpu);
> > +    struct icp_server_state *ss = &icp->ss[cs->cpu_index];
> > +
> >      xics_common_cpu_setup(icp, cpu);
> > +    vmstate_register(NULL, cs->cpu_index, &vmstate_icp_server, ss);
> 
> This is an indication that something is wrong.
> 
> You should tie the vmstate section to DeviceState::vmsd.  You only need
> to do this because you haven't converted everything to QOM yet.
> 
> Please do that to avoid these hacks.

So, Alexey addressed the xics vs. xics-kvm issues.  But there's
another factor here.  It's not clear to me how you'd QOM this
component.

What's being registered here is the "presentation server".  That's the
per-CPU part - vaguely equivalent to the LAPIC on x86.  x86 doesn't
have something equivalent here, because they register the LAPIC state
as part of the CPU state, but we can't do that because the ICP is not
bound to the CPU as tightly - a POWER7 using a different interrupt
architecture would certainly be possible.

So to do this with QOM, would the ICP need to be registered as a child
of the cpu object?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-08 18:39   ` Anthony Liguori
  2013-07-08 21:45     ` Benjamin Herrenschmidt
@ 2013-07-09  7:20     ` David Gibson
  2013-07-09 15:22       ` Anthony Liguori
  2013-07-09 16:26       ` Anthony Liguori
  2013-07-15 13:26     ` Paolo Bonzini
  2 siblings, 2 replies; 92+ messages in thread
From: David Gibson @ 2013-07-09  7:20 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras


On Mon, Jul 08, 2013 at 01:39:26PM -0500, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
> > From: David Gibson <david@gibson.dropbear.id.au>
> >
> > This patch adds the necessary VMStateDescription information to save the
> > state of PAPR TCE tables (that is, the PAPR specified IOMMU).
> >
> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> > ---
> >  hw/ppc/spapr_iommu.c |   25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> >
> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> > index 91bc8e4..ba1f7b6 100644
> > --- a/hw/ppc/spapr_iommu.c
> > +++ b/hw/ppc/spapr_iommu.c
> > @@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
> >      };
> >  }
> >  
> > +static const VMStateDescription vmstate_spapr_tce_table = {
> > +    .name = "spapr_iommu",
> > +    .version_id = 1,
> > +    .minimum_version_id = 1,
> > +    .minimum_version_id_old = 1,
> > +    .fields      = (VMStateField []) {
> > +        /* Sanity check */
> > +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
> > +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
> > +
> > +        /* IOMMU state */
> > +        VMSTATE_BOOL(bypass, sPAPRTCETable),
> > +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
> > +                               SPAPR_TCE_PAGE_SIZE /
> > sizeof(sPAPRTCE)),
> 
> Not endian safe.  I really don't get the divide bit at all either.

So, the actual bug is that we're currently storing the TCE table
native endian, whereas it should always be stored big endian.
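
To make the endianness point concrete, here is a minimal sketch of writing
64-bit table entries to a byte stream in big-endian order, independent of the
host byte order. The helper names (put_be64, get_be64, table_to_wire) are made
up for this illustration and are not QEMU functions; treating each entry as a
plain 64-bit value is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 64-bit value most-significant byte first (big endian). */
static inline void put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Load one 64-bit value written by put_be64(); works on any host. */
static inline uint64_t get_be64(const uint8_t *in)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | in[i];
    }
    return v;
}

/* Serialise n table entries; out must have room for n * 8 bytes. */
static inline void table_to_wire(uint8_t *out, const uint64_t *table, size_t n)
{
    assert(out != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        put_be64(out + 8 * i, table[i]);
    }
}
```

Because the byte order is fixed by the shifts rather than by the host's memory
layout, a stream written on a little-endian host reads back identically on a
big-endian one.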
 
> > +
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> > +
> >  static MemoryRegionIOMMUOps spapr_iommu_ops = {
> >      .translate = spapr_tce_translate_iommu,
> >  };
> > @@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
> >  
> >      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
> >  
> > +    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
> > +
> 
> If you need to add these, then you need to do more QOM conversion.

Again, it's not clear how this should be QOMed.  Child of the device
constructing the TCE table?  But since that can often be a bus bridge,
wouldn't the TCE table instances get confused with the real bus
devices?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller
  2013-07-09  3:21     ` Alexey Kardashevskiy
@ 2013-07-09  7:21       ` David Gibson
  2013-07-10  3:24         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 92+ messages in thread
From: David Gibson @ 2013-07-09  7:21 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras


On Tue, Jul 09, 2013 at 01:21:27PM +1000, Alexey Kardashevskiy wrote:
> On 07/09/2013 04:50 AM, Anthony Liguori wrote:
> >> +#include "hw/hw.h"
> >> +#include "trace.h"
> >> +#include "hw/ppc/spapr.h"
> >> +#include "hw/ppc/xics.h"
> >> +#include "kvm_ppc.h"
> >> +#include "qemu/config-file.h"
> >> +
> >> +#include <sys/ioctl.h>
> >> +
> >> +struct icp_state_kvm {
> > 
> > CodingStyle
> 
> 
> ./scripts/checkpatch.pl finds nothing.
> 
> Did you mean missing typedef?

I think he means the kernel_style_struct_name instead of the
QemuStyleStudlyCapsStructName.
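
A quick illustration of the two naming styles. The type name ICPStateKVM and
the nr_servers field are hypothetical, chosen only to show the convention;
they are not names taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Kernel style: lower_case_with_underscores, used with the struct keyword. */
struct icp_state_kvm_example {
    uint32_t nr_servers;
};

/* QEMU style: CamelCase type name, normally introduced through a typedef. */
typedef struct ICPStateKVM {
    uint32_t nr_servers;
} ICPStateKVM;

/* Functions and variables stay lower case in both styles. */
static inline uint32_t icp_nr_servers(const ICPStateKVM *icp)
{
    assert(icp != NULL);
    return icp->nr_servers;
}
```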

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-09  4:48       ` Benjamin Herrenschmidt
@ 2013-07-09 13:58         ` Anthony Liguori
  2013-07-10  3:06           ` Alexey Kardashevskiy
  2013-07-10  3:26           ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09 13:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexey Kardashevskiy
  Cc: qemu-devel, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Tue, 2013-07-09 at 13:40 +1000, Alexey Kardashevskiy wrote:
>> No, why? It is a per CPU state of XICS controller, never exists apart
>> from XICS.
>
> ICP is. ICS is  ... different but can mostly be considered to be the
> XICS itself.
>
> Anthony, we could be completely anal about it and create a gigantic
> cathedral of devices or just be a bit realistic and do something simpler
> that has the exact same functionality :)

There's very little complexity in making something a device.  It's just
a matter of sticking a DeviceState member in the struct, changing the
way the object is created (object_new vs. malloc), and adding a
TypeInfo.
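
As a rough illustration of those three steps (embed the parent state in the
struct, create the object through a type-aware constructor rather than malloc,
describe the type in a TypeInfo-like record), here is a self-contained toy
model. It deliberately does not use QEMU's headers: ToyDeviceState,
ToyTypeInfo, toy_object_new and ICPServer are stand-ins invented for this
sketch, and the real QOM API differs in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the parent object every device embeds as its first member. */
typedef struct ToyDeviceState {
    const char *type_name;
} ToyDeviceState;

/* Stand-in for a TypeInfo: name, instance size and an init hook. */
typedef struct ToyTypeInfo {
    const char *name;
    size_t instance_size;
    void (*instance_init)(ToyDeviceState *dev);
} ToyTypeInfo;

/* Step 1: the existing struct gains a parent member at offset 0. */
typedef struct ICPServer {
    ToyDeviceState parent;
    unsigned pending_priority;
} ICPServer;

static void icp_server_init(ToyDeviceState *dev)
{
    ICPServer *ss = (ICPServer *)dev;  /* valid: parent is the first member */

    ss->pending_priority = 0xff;
}

/* Step 3: the type description. */
static const ToyTypeInfo icp_server_info = {
    .name = "icp-server",
    .instance_size = sizeof(ICPServer),
    .instance_init = icp_server_init,
};

/* Step 2: creation goes through the type record instead of a bare malloc. */
static inline ToyDeviceState *toy_object_new(const ToyTypeInfo *ti)
{
    ToyDeviceState *dev;

    assert(ti != NULL && ti->instance_size >= sizeof(ToyDeviceState));
    dev = calloc(1, ti->instance_size);
    if (dev == NULL) {
        return NULL;
    }
    dev->type_name = ti->name;
    if (ti->instance_init) {
        ti->instance_init(dev);
    }
    return dev;
}
```

The point of the pattern is that generic code only ever sees the parent
pointer and the type record, which is what lets a framework attach things
like migration sections and properties by type.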

There's a very good reason to have things be devices too.  You can only
control the section naming of devices for live migration.  The only way
to set compatibility properties for live migration is by having device
properties too.

You haven't dealt with these problems yet, but you will, and doing the
work up front means that you don't have to break migration once in order
to keep it compatible in the future.

> Basically, in HW the layout of the interrupt network is:
>
>  - One ICP per processor thread (the "presenter"). This contains the
> registers to fetch a pending interrupt (ack), EOI, and control the
> processor priority.
>
>  - One ICS per logical source of interrupts (ie, one per PCI host
> bridge, and a few others here or there). This contains the per-interrupt
> source configuration (target processor(s), priority, mask) and the
> per-interrupt internal state.

This sounds an awful lot like the relationship between the I/O APIC(s)
and the local APICs FWIW.
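
The layout quoted above can be sketched as plain data structures: one
presenter record per processor thread, and one source controller holding the
per-interrupt configuration. All names, sizes and fields below are
illustrative assumptions, not the structures from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_SERVERS 4   /* assumed number of processor threads */
#define TOY_NR_IRQS    8   /* assumed number of sources on the one ICS */

/* Presenter (ICP): per-thread priority and the interrupt being presented. */
typedef struct ToyICP {
    uint8_t  priority;
    uint32_t pending_irq;   /* 0 means nothing pending */
} ToyICP;

/* Per-interrupt configuration and state held by the source controller. */
typedef struct ToyIRQ {
    uint32_t server;        /* index of the target presenter */
    uint8_t  priority;
    bool     masked;
    bool     asserted;
} ToyIRQ;

/* Source controller (ICS): a block of interrupts starting at 'offset'. */
typedef struct ToyICS {
    uint32_t offset;
    ToyIRQ   irqs[TOY_NR_IRQS];
} ToyICS;

/* The whole controller: one ICP per thread plus a single ICS. */
typedef struct ToyXICS {
    ToyICP icp[TOY_NR_SERVERS];
    ToyICS ics;
} ToyXICS;

/* Return the presenter an interrupt is routed to, or NULL if the
 * interrupt is out of range, masked, or targets an unknown server. */
static inline ToyICP *toy_xics_route(ToyXICS *x, uint32_t irq)
{
    ToyIRQ *src;

    assert(x != NULL);
    if (irq < x->ics.offset || irq - x->ics.offset >= TOY_NR_IRQS) {
        return NULL;
    }
    src = &x->ics.irqs[irq - x->ics.offset];
    if (src->masked || src->server >= TOY_NR_SERVERS) {
        return NULL;
    }
    return &x->icp[src->server];
}
```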

> Under PAPR, there is a single "virtual" ICS ... somewhat (it's a bit
> oddball what pHyp does here, arguably there are two but we can ignore
> that distinction). There is no register level access. A pair of firmware
> (RTAS) calls is used to configure each virtual interrupt.
>
> So our model here is somewhat the same. We have one ICS in the emulated
> XICS which arguably *is* the emulated XICS, there's no point making it a
> separate "device", that would just be gross, and each VCPU has an
> associated ICP.

There's nothing gross about making the things that are devices devices.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.


* Re: [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
  2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (18 preceding siblings ...)
  2013-07-08 18:01 ` Anthony Liguori
@ 2013-07-09 14:04 ` Anthony Liguori
  19 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09 14:04 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Alexander Graf, qemu-ppc, Paolo Bonzini, Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> This series spent quite a lot of time waiting when David's PCI series
> reaches the upstream but it does not seem to happen soon so I rebased
> those on top of agraf/ppc-next rebased on top qemu.org/master.
>
>
> While this series applies and compiles, the migration will often fail
> until the "migration: do not sent zero pages in bulk stage" patch is reverted
> or fixed somehow.

Can you publish this in a branch please somewhere that I can pull from?

Regards,

Anthony Liguori

>
>
> Alexey Kardashevskiy (4):
>   pseries: move interrupt controllers to hw/intc/
>   pseries: rework XICS
>   pseries: rework PAPR virtual SCSI
>   spapr-pci: rework MSI/MSIX
>
> David Gibson (12):
>   savevm: Implement VMS_DIVIDE flag
>   target-ppc: Convert ppc cpu savevm to VMStateDescription
>   pseries: savevm support for XICS interrupt controller
>   pseries: savevm support for VIO devices
>   pseries: savevm support for PAPR VIO logical lan
>   pseries: savevm support for PAPR TCE tables
>   pseries: savevm support for PAPR virtual SCSI
>   pseries: savevm support for pseries machine
>   pseries: savevm support for PCI host bridge
>   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
>   pseries: Support for in-kernel XICS interrupt controller
>   pseries: savevm support with KVM
>
> Prerna Saxena (1):
>   ppc64: Enable QEMU to run on POWER 8 DD1 chip.
>
>  default-configs/ppc64-softmmu.mak |    2 +
>  hw/char/spapr_vty.c               |   16 ++
>  hw/intc/Makefile.objs             |    2 +
>  hw/{ppc => intc}/xics.c           |  172 ++++++++----
>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
>  hw/net/spapr_llan.c               |   24 +-
>  hw/ppc/Makefile.objs              |    2 +-
>  hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
>  hw/ppc/spapr_hcall.c              |    8 +-
>  hw/ppc/spapr_iommu.c              |   25 ++
>  hw/ppc/spapr_pci.c                |  141 ++++++----
>  hw/ppc/spapr_vio.c                |   20 ++
>  hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
>  include/hw/pci-host/spapr.h       |   14 +-
>  include/hw/ppc/spapr.h            |   17 +-
>  include/hw/ppc/spapr_vio.h        |    5 +
>  include/hw/ppc/xics.h             |   72 ++++-
>  include/migration/vmstate.h       |   13 +
>  savevm.c                          |    8 +
>  target-ppc/cpu-models.c           |    3 +
>  target-ppc/cpu-models.h           |    1 +
>  target-ppc/cpu-qom.h              |    4 +
>  target-ppc/cpu.h                  |    8 +-
>  target-ppc/kvm.c                  |   83 ++++++
>  target-ppc/kvm_ppc.h              |   29 ++
>  target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
>  target-ppc/translate_init.c       |   36 +++
>  27 files changed, 2088 insertions(+), 319 deletions(-)
>  rename hw/{ppc => intc}/xics.c (80%)
>  create mode 100644 hw/intc/xics_kvm.c
>
> -- 
> 1.7.10.4


* Re: [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag
  2013-07-08 23:57     ` David Gibson
@ 2013-07-09 14:06       ` Anthony Liguori
  2013-07-09 14:38         ` David Gibson
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09 14:06 UTC (permalink / raw)
  To: David Gibson
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 08, 2013 at 01:27:05PM -0500, Anthony Liguori wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>> > From: David Gibson <david@gibson.dropbear.id.au>
>> >
>> > The vmstate infrastructure includes a VMS_MULTIPLY flag, and associated
>> > VMSTATE_VBUFFER_MULTIPLY helper macro.  These can be used to save a
>> > variably sized buffer where the size in bytes of the buffer isn't directly
>> > accessible as a structure field, but an element count from which the size
>> > can be derived is.
>> 
>> Why?  What's the point of sending the total size vs. the element
>> count?
>
> Because it's more convenient to work with the total size at runtime,
> and because the VMSTATE stuff works with actual structure fields,
> there's not really a way to convert it at migrate time, short of this.

The only thing I see using it is the tce array which is broken anyway.
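
For readers following the size arithmetic: as used for the TCE table in patch
08/17, the DIVIDE variant appears to derive the buffer length in bytes by
dividing a structure field by a constant, the mirror image of the MULTIPLY
case. A minimal sketch of that calculation follows. The 4 KiB page size, the
8-byte entry and the helper names (vbuffer_size_divide, tce_table_bytes) are
assumptions made for illustration, not taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_PAGE_SIZE 4096u                  /* assumed IOMMU page size */
typedef struct { uint64_t tce; } TCEEntry;   /* assumed 8-byte table entry */

/* Buffer length in bytes derived from a size field and a constant divisor. */
static inline size_t vbuffer_size_divide(uint32_t field, size_t divisor)
{
    assert(divisor != 0);
    return field / divisor;
}

/* Bytes occupied by the table mapping a DMA window of window_size bytes:
 * one entry per page, so window_size / (page size / entry size). */
static inline size_t tce_table_bytes(uint32_t window_size)
{
    return vbuffer_size_divide(window_size, TCE_PAGE_SIZE / sizeof(TCEEntry));
}
```

Under these assumptions a 1 GiB window has 262144 pages and therefore a 2 MiB
table.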

Regards,

Anthony Liguori

>
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-07-09  5:14     ` Alexey Kardashevskiy
@ 2013-07-09 14:08       ` Anthony Liguori
  2013-07-09 15:11         ` David Gibson
  0 siblings, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09 14:08 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Alexander Graf, qemu-devel, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 07/09/2013 04:29 AM, Anthony Liguori wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> From: David Gibson <david@gibson.dropbear.id.au>
>>>
>>> The savevm code for the powerpc cpu emulation is currently based around
>>> the old register_savevm() rather than register_vmstate() method.  It's also
>>> rather broken, missing some important state on some CPU models.
>>>
>>> This patch completely rewrites the savevm for target-ppc, using the new
>>> VMStateDescription approach.  Exactly what needs to be saved in what
>>> configurations has been more carefully examined, too.  This introduces a
>>> new version (5) of the cpu save format.  The old load function is retained
>>> to support version 4 images.
>> 
>> Supporting "version 4" is purely an academic exercise.  I wouldn't bother.
>
>
> Sorry, I do not get it. Will or will not the patch be accepted as is (with
> removed comments from the bottom)? Or do I have to remove the old handlers
> to get it in upstream? Thanks.

It's dead code.  Please remove it.

>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> [aik: ppc cpu savevm conversion fixed to use PowerPCCPU instead of CPUPPCState]
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  target-ppc/cpu-qom.h        |    4 +
>>>  target-ppc/cpu.h            |    8 +-
>>>  target-ppc/machine.c        |  533 ++++++++++++++++++++++++++++++++++++-------
>>>  target-ppc/translate_init.c |    2 +
>>>  4 files changed, 454 insertions(+), 93 deletions(-)
>>>
>>> diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
>>> index eb03a00..2b96b04 100644
>>> --- a/target-ppc/cpu-qom.h
>>> +++ b/target-ppc/cpu-qom.h
>>> @@ -102,4 +102,8 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr);
>>>  
>>>  void ppc_cpu_do_interrupt(CPUState *cpu);
>>>  
>>> +#ifndef CONFIG_USER_ONLY
>>> +extern const struct VMStateDescription vmstate_ppc_cpu;
>>> +#endif
>>> +
>>>  #endif
>>> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
>>> index 0ede077..f30577d 100644
>>> --- a/target-ppc/cpu.h
>>> +++ b/target-ppc/cpu.h
>>> @@ -948,7 +948,7 @@ struct CPUPPCState {
>>>  #if defined(TARGET_PPC64)
>>>      /* PowerPC 64 SLB area */
>>>      ppc_slb_t slb[64];
>>> -    int slb_nr;
>>> +    int32_t slb_nr;
>>>  #endif
>>>      /* segment registers */
>>>      hwaddr htab_base;
>>> @@ -957,11 +957,11 @@ struct CPUPPCState {
>>>      /* externally stored hash table */
>>>      uint8_t *external_htab;
>>>      /* BATs */
>>> -    int nb_BATs;
>>> +    uint32_t nb_BATs;
>>>      target_ulong DBAT[2][8];
>>>      target_ulong IBAT[2][8];
>>>      /* PowerPC TLB registers (for 4xx, e500 and 60x software driven TLBs) */
>>> -    int nb_tlb;      /* Total number of TLB                                  */
>>> +    int32_t nb_tlb;      /* Total number of TLB                              */
>>>      int tlb_per_way; /* Speed-up helper: used to avoid divisions at run time */
>>>      int nb_ways;     /* Number of ways in the TLB set                        */
>>>      int last_way;    /* Last used way used to allocate TLB in a LRU way      */
>>> @@ -1176,8 +1176,6 @@ static inline CPUPPCState *cpu_init(const char *cpu_model)
>>>  #define cpu_signal_handler cpu_ppc_signal_handler
>>>  #define cpu_list ppc_cpu_list
>>>  
>>> -#define CPU_SAVE_VERSION 4
>>> -
>>>  /* MMU modes definitions */
>>>  #define MMU_MODE0_SUFFIX _user
>>>  #define MMU_MODE1_SUFFIX _kernel
>>> diff --git a/target-ppc/machine.c b/target-ppc/machine.c
>>> index 2d10adb..1fcc6bc 100644
>>> --- a/target-ppc/machine.c
>>> +++ b/target-ppc/machine.c
>>> @@ -1,96 +1,12 @@
>>>  #include "hw/hw.h"
>>>  #include "hw/boards.h"
>>>  #include "sysemu/kvm.h"
>>> +#include "helper_regs.h"
>>>  
>>> -void cpu_save(QEMUFile *f, void *opaque)
>>> +static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
>>>  {
>>> -    CPUPPCState *env = (CPUPPCState *)opaque;
>>> -    unsigned int i, j;
>>> -    uint32_t fpscr;
>>> -    target_ulong xer;
>>> -
>>> -    for (i = 0; i < 32; i++)
>>> -        qemu_put_betls(f, &env->gpr[i]);
>>> -#if !defined(TARGET_PPC64)
>>> -    for (i = 0; i < 32; i++)
>>> -        qemu_put_betls(f, &env->gprh[i]);
>>> -#endif
>>> -    qemu_put_betls(f, &env->lr);
>>> -    qemu_put_betls(f, &env->ctr);
>>> -    for (i = 0; i < 8; i++)
>>> -        qemu_put_be32s(f, &env->crf[i]);
>>> -    xer = cpu_read_xer(env);
>>> -    qemu_put_betls(f, &xer);
>>> -    qemu_put_betls(f, &env->reserve_addr);
>>> -    qemu_put_betls(f, &env->msr);
>>> -    for (i = 0; i < 4; i++)
>>> -        qemu_put_betls(f, &env->tgpr[i]);
>>> -    for (i = 0; i < 32; i++) {
>>> -        union {
>>> -            float64 d;
>>> -            uint64_t l;
>>> -        } u;
>>> -        u.d = env->fpr[i];
>>> -        qemu_put_be64(f, u.l);
>>> -    }
>>> -    fpscr = env->fpscr;
>>> -    qemu_put_be32s(f, &fpscr);
>>> -    qemu_put_sbe32s(f, &env->access_type);
>>> -#if defined(TARGET_PPC64)
>>> -    qemu_put_betls(f, &env->spr[SPR_ASR]);
>>> -    qemu_put_sbe32s(f, &env->slb_nr);
>>> -#endif
>>> -    qemu_put_betls(f, &env->spr[SPR_SDR1]);
>>> -    for (i = 0; i < 32; i++)
>>> -        qemu_put_betls(f, &env->sr[i]);
>>> -    for (i = 0; i < 2; i++)
>>> -        for (j = 0; j < 8; j++)
>>> -            qemu_put_betls(f, &env->DBAT[i][j]);
>>> -    for (i = 0; i < 2; i++)
>>> -        for (j = 0; j < 8; j++)
>>> -            qemu_put_betls(f, &env->IBAT[i][j]);
>>> -    qemu_put_sbe32s(f, &env->nb_tlb);
>>> -    qemu_put_sbe32s(f, &env->tlb_per_way);
>>> -    qemu_put_sbe32s(f, &env->nb_ways);
>>> -    qemu_put_sbe32s(f, &env->last_way);
>>> -    qemu_put_sbe32s(f, &env->id_tlbs);
>>> -    qemu_put_sbe32s(f, &env->nb_pids);
>>> -    if (env->tlb.tlb6) {
>>> -        // XXX assumes 6xx
>>> -        for (i = 0; i < env->nb_tlb; i++) {
>>> -            qemu_put_betls(f, &env->tlb.tlb6[i].pte0);
>>> -            qemu_put_betls(f, &env->tlb.tlb6[i].pte1);
>>> -            qemu_put_betls(f, &env->tlb.tlb6[i].EPN);
>>> -        }
>>> -    }
>>> -    for (i = 0; i < 4; i++)
>>> -        qemu_put_betls(f, &env->pb[i]);
>>> -    for (i = 0; i < 1024; i++)
>>> -        qemu_put_betls(f, &env->spr[i]);
>>> -    qemu_put_be32s(f, &env->vscr);
>>> -    qemu_put_be64s(f, &env->spe_acc);
>>> -    qemu_put_be32s(f, &env->spe_fscr);
>>> -    qemu_put_betls(f, &env->msr_mask);
>>> -    qemu_put_be32s(f, &env->flags);
>>> -    qemu_put_sbe32s(f, &env->error_code);
>>> -    qemu_put_be32s(f, &env->pending_interrupts);
>>> -    qemu_put_be32s(f, &env->irq_input_state);
>>> -    for (i = 0; i < POWERPC_EXCP_NB; i++)
>>> -        qemu_put_betls(f, &env->excp_vectors[i]);
>>> -    qemu_put_betls(f, &env->excp_prefix);
>>> -    qemu_put_betls(f, &env->ivor_mask);
>>> -    qemu_put_betls(f, &env->ivpr_mask);
>>> -    qemu_put_betls(f, &env->hreset_vector);
>>> -    qemu_put_betls(f, &env->nip);
>>> -    qemu_put_betls(f, &env->hflags);
>>> -    qemu_put_betls(f, &env->hflags_nmsr);
>>> -    qemu_put_sbe32s(f, &env->mmu_idx);
>>> -    qemu_put_sbe32(f, 0);
>>> -}
>>> -
>>> -int cpu_load(QEMUFile *f, void *opaque, int version_id)
>>> -{
>>> -    CPUPPCState *env = (CPUPPCState *)opaque;
>>> +    PowerPCCPU *cpu = opaque;
>>> +    CPUPPCState *env = &cpu->env;
>>>      unsigned int i, j;
>>>      target_ulong sdr1;
>>>      uint32_t fpscr;
>>> @@ -177,3 +93,444 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
>>>  
>>>      return 0;
>>>  }
>>> +
>>> +static int get_avr(QEMUFile *f, void *pv, size_t size)
>>> +{
>>> +    ppc_avr_t *v = pv;
>>> +
>>> +    v->u64[0] = qemu_get_be64(f);
>>> +    v->u64[1] = qemu_get_be64(f);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void put_avr(QEMUFile *f, void *pv, size_t size)
>>> +{
>>> +    ppc_avr_t *v = pv;
>>> +
>>> +    qemu_put_be64(f, v->u64[0]);
>>> +    qemu_put_be64(f, v->u64[1]);
>>> +}
>>> +
>>> +const VMStateInfo vmstate_info_avr = {
>>> +    .name = "avr",
>>> +    .get  = get_avr,
>>> +    .put  = put_avr,
>>> +};
>>> +
>>> +#define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
>>> +    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
>>> +
>>> +#define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
>>> +    VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
>>> +
>>> +static void cpu_pre_save(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +    CPUPPCState *env = &cpu->env;
>>> +    int i;
>>> +
>>> +    env->spr[SPR_LR] = env->lr;
>>> +    env->spr[SPR_CTR] = env->ctr;
>>> +    env->spr[SPR_XER] = env->xer;
>>> +#if defined(TARGET_PPC64)
>>> +    env->spr[SPR_CFAR] = env->cfar;
>>> +#endif
>>> +    env->spr[SPR_BOOKE_SPEFSCR] = env->spe_fscr;
>>> +
>>> +    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
>>> +        env->spr[SPR_DBAT0U + 2*i] = env->DBAT[0][i];
>>> +        env->spr[SPR_DBAT0U + 2*i + 1] = env->DBAT[1][i];
>>> +        env->spr[SPR_IBAT0U + 2*i] = env->IBAT[0][i];
>>> +        env->spr[SPR_IBAT0U + 2*i + 1] = env->IBAT[1][i];
>>> +    }
>>> +    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
>>> +        env->spr[SPR_DBAT4U + 2*i] = env->DBAT[0][i+4];
>>> +        env->spr[SPR_DBAT4U + 2*i + 1] = env->DBAT[1][i+4];
>>> +        env->spr[SPR_IBAT4U + 2*i] = env->IBAT[0][i+4];
>>> +        env->spr[SPR_IBAT4U + 2*i + 1] = env->IBAT[1][i+4];
>>> +    }
>>> +}
>>> +
>>> +static int cpu_post_load(void *opaque, int version_id)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +    CPUPPCState *env = &cpu->env;
>>> +    int i;
>>> +
>>> +    env->lr = env->spr[SPR_LR];
>>> +    env->ctr = env->spr[SPR_CTR];
>>> +    env->xer = env->spr[SPR_XER];
>>> +#if defined(TARGET_PPC64)
>>> +    env->cfar = env->spr[SPR_CFAR];
>>> +#endif
>>> +    env->spe_fscr = env->spr[SPR_BOOKE_SPEFSCR];
>>> +
>>> +    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
>>> +        env->DBAT[0][i] = env->spr[SPR_DBAT0U + 2*i];
>>> +        env->DBAT[1][i] = env->spr[SPR_DBAT0U + 2*i + 1];
>>> +        env->IBAT[0][i] = env->spr[SPR_IBAT0U + 2*i];
>>> +        env->IBAT[1][i] = env->spr[SPR_IBAT0U + 2*i + 1];
>>> +    }
>>> +    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
>>> +        env->DBAT[0][i+4] = env->spr[SPR_DBAT4U + 2*i];
>>> +        env->DBAT[1][i+4] = env->spr[SPR_DBAT4U + 2*i + 1];
>>> +        env->IBAT[0][i+4] = env->spr[SPR_IBAT4U + 2*i];
>>> +        env->IBAT[1][i+4] = env->spr[SPR_IBAT4U + 2*i + 1];
>>> +    }
>>> +
>>> +    /* Restore htab_base and htab_mask variables */
>>> +    ppc_store_sdr1(env, env->spr[SPR_SDR1]);
>>> +
>>> +    hreg_compute_hflags(env);
>>> +    hreg_compute_mem_idx(env);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static bool fpu_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +
>>> +    return (cpu->env.insns_flags & PPC_FLOAT);
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_fpu = {
>>> +    .name = "cpu/fpu",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
>>> +        VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +static bool altivec_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +
>>> +    return (cpu->env.insns_flags & PPC_ALTIVEC);
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_altivec = {
>>> +    .name = "cpu/altivec",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
>>> +        VMSTATE_UINT32(env.vscr, PowerPCCPU),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +static bool vsx_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +
>>> +    return (cpu->env.insns_flags2 & PPC2_VSX);
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_vsx = {
>>> +    .name = "cpu/vsx",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +static bool sr_needed(void *opaque)
>>> +{
>>> +#ifdef TARGET_PPC64
>>> +    PowerPCCPU *cpu = opaque;
>>> +
>>> +    return !(cpu->env.mmu_model & POWERPC_MMU_64);
>>> +#else
>>> +    return true;
>>> +#endif
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_sr = {
>>> +    .name = "cpu/sr",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_UINTTL_ARRAY(env.sr, PowerPCCPU, 32),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +#ifdef TARGET_PPC64
>>> +static int get_slbe(QEMUFile *f, void *pv, size_t size)
>>> +{
>>> +    ppc_slb_t *v = pv;
>>> +
>>> +    v->esid = qemu_get_be64(f);
>>> +    v->vsid = qemu_get_be64(f);
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +static void put_slbe(QEMUFile *f, void *pv, size_t size)
>>> +{
>>> +    ppc_slb_t *v = pv;
>>> +
>>> +    qemu_put_be64(f, v->esid);
>>> +    qemu_put_be64(f, v->vsid);
>>> +}
>>> +
>>> +const VMStateInfo vmstate_info_slbe = {
>>> +    .name = "slbe",
>>> +    .get  = get_slbe,
>>> +    .put  = put_slbe,
>>> +};
>>> +
>>> +#define VMSTATE_SLB_ARRAY_V(_f, _s, _n, _v)                       \
>>> +    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_slbe, ppc_slb_t)
>>> +
>>> +#define VMSTATE_SLB_ARRAY(_f, _s, _n)                             \
>>> +    VMSTATE_SLB_ARRAY_V(_f, _s, _n, 0)
>>> +
>>> +static bool slb_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +
>>> +    /* We don't support any of the old segment table based 64-bit CPUs */
>>> +    return (cpu->env.mmu_model & POWERPC_MMU_64);
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_slb = {
>>> +    .name = "cpu/slb",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_INT32_EQUAL(env.slb_nr, PowerPCCPU),
>>> +        VMSTATE_SLB_ARRAY(env.slb, PowerPCCPU, 64),
>>> +        VMSTATE_END_OF_LIST()
>>> +    }
>>> +};
>>> +#endif /* TARGET_PPC64 */
>>> +
>>> +static const VMStateDescription vmstate_tlb6xx_entry = {
>>> +    .name = "cpu/tlb6xx_entry",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_UINTTL(pte0, ppc6xx_tlb_t),
>>> +        VMSTATE_UINTTL(pte1, ppc6xx_tlb_t),
>>> +        VMSTATE_UINTTL(EPN, ppc6xx_tlb_t),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +static bool tlb6xx_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +    CPUPPCState *env = &cpu->env;
>>> +
>>> +    return env->nb_tlb && (env->tlb_type == TLB_6XX);
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_tlb6xx = {
>>> +    .name = "cpu/tlb6xx",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
>>> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlb6, PowerPCCPU,
>>> +                                            env.nb_tlb,
>>> +                                            vmstate_tlb6xx_entry,
>>> +                                            ppc6xx_tlb_t),
>>> +        VMSTATE_UINTTL_ARRAY(env.tgpr, PowerPCCPU, 4),
>>> +        VMSTATE_END_OF_LIST()
>>> +    }
>>> +};
>>> +
>>> +static const VMStateDescription vmstate_tlbemb_entry = {
>>> +    .name = "cpu/tlbemb_entry",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_UINT64(RPN, ppcemb_tlb_t),
>>> +        VMSTATE_UINTTL(EPN, ppcemb_tlb_t),
>>> +        VMSTATE_UINTTL(PID, ppcemb_tlb_t),
>>> +        VMSTATE_UINTTL(size, ppcemb_tlb_t),
>>> +        VMSTATE_UINT32(prot, ppcemb_tlb_t),
>>> +        VMSTATE_UINT32(attr, ppcemb_tlb_t),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +static bool tlbemb_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +    CPUPPCState *env = &cpu->env;
>>> +
>>> +    return env->nb_tlb && (env->tlb_type == TLB_EMB);
>>> +}
>>> +
>>> +static bool pbr403_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +    uint32_t pvr = cpu->env.spr[SPR_PVR];
>>> +
>>> +    return (pvr & 0xffff0000) == 0x00200000;
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_pbr403 = {
>>> +    .name = "cpu/pbr403",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_UINTTL_ARRAY(env.pb, PowerPCCPU, 4),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +static const VMStateDescription vmstate_tlbemb = {
>>> +    .name = "cpu/tlbemb",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
>>> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbe, PowerPCCPU,
>>> +                                            env.nb_tlb,
>>> +                                            vmstate_tlbemb_entry,
>>> +                                            ppcemb_tlb_t),
>>> +        /* 403 protection registers */
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +    .subsections = (VMStateSubsection []) {
>>> +        {
>>> +            .vmsd = &vmstate_pbr403,
>>> +            .needed = pbr403_needed,
>>> +        } , {
>>> +            /* empty */
>>> +        }
>>> +    }
>>> +};
>>> +
>>> +static const VMStateDescription vmstate_tlbmas_entry = {
>>> +    .name = "cpu/tlbmas_entry",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_UINT32(mas8, ppcmas_tlb_t),
>>> +        VMSTATE_UINT32(mas1, ppcmas_tlb_t),
>>> +        VMSTATE_UINT64(mas2, ppcmas_tlb_t),
>>> +        VMSTATE_UINT64(mas7_3, ppcmas_tlb_t),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>> +static bool tlbmas_needed(void *opaque)
>>> +{
>>> +    PowerPCCPU *cpu = opaque;
>>> +    CPUPPCState *env = &cpu->env;
>>> +
>>> +    return env->nb_tlb && (env->tlb_type == TLB_MAS);
>>> +}
>>> +
>>> +static const VMStateDescription vmstate_tlbmas = {
>>> +    .name = "cpu/tlbmas",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
>>> +        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbm, PowerPCCPU,
>>> +                                            env.nb_tlb,
>>> +                                            vmstate_tlbmas_entry,
>>> +                                            ppcmas_tlb_t),
>>> +        VMSTATE_END_OF_LIST()
>>> +    }
>>> +};
>>> +
>>> +const VMStateDescription vmstate_ppc_cpu = {
>>> +    .name = "cpu",
>>> +    .version_id = 5,
>>> +    .minimum_version_id = 5,
>>> +    .minimum_version_id_old = 4,
>>> +    .load_state_old = cpu_load_old,
>>> +    .pre_save = cpu_pre_save,
>>> +    .post_load = cpu_post_load,
>>> +    .fields      = (VMStateField []) {
>>> +        /* Verify we haven't changed the pvr */
>>> +        VMSTATE_UINTTL_EQUAL(env.spr[SPR_PVR], PowerPCCPU),
>>> +
>>> +        /* User mode architected state */
>>> +        VMSTATE_UINTTL_ARRAY(env.gpr, PowerPCCPU, 32),
>>> +#if !defined(TARGET_PPC64)
>>> +        VMSTATE_UINTTL_ARRAY(env.gprh, PowerPCCPU, 32),
>>> +#endif
>>> +        VMSTATE_UINT32_ARRAY(env.crf, PowerPCCPU, 8),
>>> +        VMSTATE_UINTTL(env.nip, PowerPCCPU),
>>> +
>>> +        /* SPRs */
>>> +        VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
>>> +        VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
>>> +
>>> +        /* Reservation */
>>> +        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
>>> +
>>> +        /* Supervisor mode architected state */
>>> +        VMSTATE_UINTTL(env.msr, PowerPCCPU),
>>> +
>>> +        /* Internal state */
>>> +        VMSTATE_UINTTL(env.hflags_nmsr, PowerPCCPU),
>>> +        /* FIXME: access_type? */
>>> +
>>> +        /* Sanity checking */
>>> +        VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
>>> +        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
>>> +        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
>>> +        VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +    .subsections = (VMStateSubsection []) {
>>> +        {
>>> +            .vmsd = &vmstate_fpu,
>>> +            .needed = fpu_needed,
>>> +        } , {
>>> +            .vmsd = &vmstate_altivec,
>>> +            .needed = altivec_needed,
>>> +        } , {
>>> +            .vmsd = &vmstate_vsx,
>>> +            .needed = vsx_needed,
>>> +        } , {
>>> +            .vmsd = &vmstate_sr,
>>> +            .needed = sr_needed,
>>> +        } , {
>>> +#ifdef TARGET_PPC64
>>> +            .vmsd = &vmstate_slb,
>>> +            .needed = slb_needed,
>>> +        } , {
>>> +#endif /* TARGET_PPC64 */
>>> +            .vmsd = &vmstate_tlb6xx,
>>> +            .needed = tlb6xx_needed,
>>> +        } , {
>>> +            .vmsd = &vmstate_tlbemb,
>>> +            .needed = tlbemb_needed,
>>> +        } , {
>>> +            .vmsd = &vmstate_tlbmas,
>>> +            .needed = tlbmas_needed,
>>> +        } , {
>>> +            /* FIXME: DCRs? */
>>> +            /* FIXME: timebase? */
>>> +            /* empty */
>> 
>> Are they needed or not needed?
>
> DCR is not needed, I'll remove it.
>
> Timebase is needed but it requires kernel support and either way it should
> not prevent the rest of the patch from going upstream.

So migration doesn't work?

If you need timebase, what happens without it?

Regards,

Anthony Liguori

>
> I'll remove both comments anyway.
>
>
>> If they're needed, please add them.
>> 
>> Regards,
>> 
>> Anthony Liguori
>> 
>>> +        }
>>> +    }
>>> +};
>>> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
>>> index d8758d5..95aebf7 100644
>>> --- a/target-ppc/translate_init.c
>>> +++ b/target-ppc/translate_init.c
>>> @@ -8295,6 +8295,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
>>>  
>>>      cc->class_by_name = ppc_cpu_class_by_name;
>>>      cc->do_interrupt = ppc_cpu_do_interrupt;
>>> +
>>> +    cpu_class_set_vmsd(cc, &vmstate_ppc_cpu);
>>>  }
>>>  
>>>  static const TypeInfo ppc_cpu_type_info = {
>>> -- 
>>> 1.7.10.4
>> 
>
>
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag
  2013-07-09 14:06       ` Anthony Liguori
@ 2013-07-09 14:38         ` David Gibson
  0 siblings, 0 replies; 92+ messages in thread
From: David Gibson @ 2013-07-09 14:38 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

[-- Attachment #1: Type: text/plain, Size: 1474 bytes --]

On Tue, Jul 09, 2013 at 09:06:21AM -0500, Anthony Liguori wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Mon, Jul 08, 2013 at 01:27:05PM -0500, Anthony Liguori wrote:
> >> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> >> 
> >> > From: David Gibson <david@gibson.dropbear.id.au>
> >> >
> >> > The vmstate infrastructure includes a VMS_MULTIPLY flag, and associated
> >> > VMSTATE_VBUFFER_MULTIPLY helper macro.  These can be used to save a
> >> > variably sized buffer where the size in bytes of the buffer isn't directly
> >> > accessible as a structure field, but an element count from which the size
> >> > can be derived is.
> >> 
> >> Why?  What's the point of sending the total size vs. the element
> >> count?
> >
> > Because it's more convenient to work with the total size at runtime,
> > and because the VMSTATE stuff works with actual structure fields,
> > there's not really a way to convert it at migrate time, short of this.
> 
> The only thing I see using it is the tce array which is broken
> anyway.

It's only broken due to a bug elsewhere.  Whatever is done there it
will still be some sort of VBUFFER arrangement, and the window will
still be a more convenient thing to work with than the number of
entries.
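
To make the size arithmetic under discussion concrete, here is a minimal
sketch of how a byte count could be derived from a size field by either
multiplying or dividing by a constant.  The names and values below
(TCE_PAGE_SIZE, tce_entry, the helper functions) are hypothetical
stand-ins chosen for illustration, not the definitions from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real definitions; values are assumptions. */
#define TCE_PAGE_SIZE 4096u

typedef struct {
    uint64_t tce;
} tce_entry;

/* Bytes to transfer when the size field is an element count. */
static size_t vbuffer_size_multiply(uint32_t field, uint32_t multiplier)
{
    return (size_t)field * multiplier;
}

/* Bytes to transfer when the size field is larger than the buffer. */
static size_t vbuffer_size_divide(uint32_t field, uint32_t divisor)
{
    return (size_t)field / divisor;
}

/* A DMA window of window_size bytes needs one table entry per page. */
static size_t tce_table_bytes(uint32_t window_size)
{
    return vbuffer_size_divide(window_size,
                               TCE_PAGE_SIZE / sizeof(tce_entry));
}
```

With these assumed values, a 1 GiB window divides by 512, which is the
same byte count as multiplying the number of pages by the entry size.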

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-07-09 14:08       ` Anthony Liguori
@ 2013-07-09 15:11         ` David Gibson
  2013-07-10  3:31           ` Benjamin Herrenschmidt
  2013-07-15 13:24           ` Paolo Bonzini
  0 siblings, 2 replies; 92+ messages in thread
From: David Gibson @ 2013-07-09 15:11 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

[-- Attachment #1: Type: text/plain, Size: 2756 bytes --]

On Tue, Jul 09, 2013 at 09:08:01AM -0500, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> > On 07/09/2013 04:29 AM, Anthony Liguori wrote:
> >> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
[snip]
> >>> +#endif /* TARGET_PPC64 */
> >>> +            .vmsd = &vmstate_tlb6xx,
> >>> +            .needed = tlb6xx_needed,
> >>> +        } , {
> >>> +            .vmsd = &vmstate_tlbemb,
> >>> +            .needed = tlbemb_needed,
> >>> +        } , {
> >>> +            .vmsd = &vmstate_tlbmas,
> >>> +            .needed = tlbmas_needed,
> >>> +        } , {
> >>> +            /* FIXME: DCRs? */
> >>> +            /* FIXME: timebase? */
> >>> +            /* empty */
> >> 
> >> Are they needed or not needed?
> >
> > DCR is not needed, I'll remove it.

More precisely, DCRs are only needed on the BookE CPUs which have
them.  They can be added later without breaking compatibility, and
would be best added by someone working on the BookE stuff who can test
it properly.

> > Timebase is needed but it requires kernel support and either way it should
> > not prevent the rest of the patch from going upstream.
> 
> So migration doesn't work?
> 
> If you need timebase, what happens without it?

Migration will (in fact, does) work without anything extra for the
timebase.  What's less clear is if all the timing edge cases are
correct at present.

As a rule, the guest should see the timebase advance across the
migration according to the elapsed wall clock time.  But the guest
*must not* see the timebase go backwards, even if the source and
destination host clocks are out of sync in such a way that time
appears to go backwards across the migration.

Under TCG, the guest timebase is not tracked as it advances, but an
appropriate value is computed from the host system time when the
timebase is read.  Under KVM, the host and guest timebase are the same
register physically.  We don't yet, but we probably should, context
switch the upper bits of the timebase, to give the guest its own
logical value for it.

Getting all the combinations of cases correct probably needs some
sort of real time <-> guest timebase delta transferred across the
migration, but working out exactly what's needed and how to encode it
is a bit fiddly.

Since the common cases work already, and it's fairly straightforward
to add whatever delta is needed in a backwards compatible way, it
seems reasonable to get migration mostly working, even with some
known bugs in timing edge cases.
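
As a rough illustration of the two rules above (advance by elapsed wall
clock time, never go backwards), a delta calculation might look like the
sketch below.  Every name here (tb_snapshot, tb_after_migration, the
fields) is hypothetical and not an existing QEMU or KVM interface; it
also assumes the timebase frequency is the same on both hosts.

```c
#include <stdint.h>

/*
 * Sketch only: the source records the guest timebase together with its
 * wall clock time; the destination recomputes the timebase from the
 * elapsed wall clock time.
 */
typedef struct {
    uint64_t guest_tb;   /* guest timebase when the snapshot was taken */
    int64_t  host_ns;    /* source host wall clock at that moment, in ns */
} tb_snapshot;

static uint64_t tb_after_migration(const tb_snapshot *snap,
                                   int64_t dest_host_ns,
                                   uint64_t tb_freq_hz)
{
    int64_t elapsed_ns = dest_host_ns - snap->host_ns;
    uint64_t secs, rem_ns;

    /* Clamp: unsynchronised host clocks must never move the guest back. */
    if (elapsed_ns < 0) {
        elapsed_ns = 0;
    }

    /* Split the conversion so long gaps do not overflow 64 bits. */
    secs = (uint64_t)elapsed_ns / 1000000000u;
    rem_ns = (uint64_t)elapsed_ns % 1000000000u;

    return snap->guest_tb + secs * tb_freq_hz
           + rem_ns * tb_freq_hz / 1000000000u;
}
```

The clamp trades accuracy for safety: if the destination clock is behind,
the guest simply sees no time pass across the migration.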

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-09  7:20     ` David Gibson
@ 2013-07-09 15:22       ` Anthony Liguori
  2013-07-10  7:42         ` David Gibson
  2013-07-09 16:26       ` Anthony Liguori
  1 sibling, 1 reply; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09 15:22 UTC (permalink / raw)
  To: David Gibson
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 08, 2013 at 01:39:26PM -0500, Anthony Liguori wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>> > From: David Gibson <david@gibson.dropbear.id.au>
>> >
>> > This patch adds the necessary VMStateDescription information to save the
>> > state of PAPR TCE tables (that is, the PAPR specified IOMMU).
>> >
>> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> > ---
>> >  hw/ppc/spapr_iommu.c |   25 +++++++++++++++++++++++++
>> >  1 file changed, 25 insertions(+)
>> >
>> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> > index 91bc8e4..ba1f7b6 100644
>> > --- a/hw/ppc/spapr_iommu.c
>> > +++ b/hw/ppc/spapr_iommu.c
>> > @@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>> >      };
>> >  }
>> >  
>> > +static const VMStateDescription vmstate_spapr_tce_table = {
>> > +    .name = "spapr_iommu",
>> > +    .version_id = 1,
>> > +    .minimum_version_id = 1,
>> > +    .minimum_version_id_old = 1,
>> > +    .fields      = (VMStateField []) {
>> > +        /* Sanity check */
>> > +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
>> > +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
>> > +
>> > +        /* IOMMU state */
>> > +        VMSTATE_BOOL(bypass, sPAPRTCETable),
>> > +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
>> > +                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),
>> 
>> Not endian safe.  I really don't get the divide bit at all either.
>
> So, the actual bug is that we're currently storing the TCE table
> native endian, whereas it should be stored big endian always.

Why?  There are no guest visible byte accesses done to the table
AFAICT.  Everything is done as words and there's quite a lot of math
done to the entries.

It seems like native endian is the right internal representation.
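
For what it's worth, the in-memory representation and the migration
stream format can be decoupled.  The sketch below keeps the table in
host order and converts each 64-bit entry to big endian only when it is
serialised.  The helper names are hypothetical and this is not the code
from the patch; it only illustrates one way to get a host-independent
stream.

```c
#include <stddef.h>
#include <stdint.h>

/* Write one 64-bit value in big-endian order, whatever the host is. */
static void wire_put_be64(uint8_t *buf, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++) {
        buf[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Read one big-endian 64-bit value back into host order. */
static uint64_t wire_get_be64(const uint8_t *buf)
{
    uint64_t v = 0;
    int i;

    for (i = 0; i < 8; i++) {
        v = (v << 8) | buf[i];
    }
    return v;
}

/* Serialise n host-order entries; out must hold n * 8 bytes. */
static void table_to_wire(uint8_t *out, const uint64_t *table, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        wire_put_be64(out + 8 * i, table[i]);
    }
}

/* Load n entries from the stream back into a host-order table. */
static void table_from_wire(uint64_t *table, const uint8_t *in, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        table[i] = wire_get_be64(in + 8 * i);
    }
}
```

Because the conversion uses shifts rather than memcpy, the same bytes
come out on little-endian and big-endian hosts.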

>  
>> > +
>> > +        VMSTATE_END_OF_LIST()
>> > +    },
>> > +};
>> > +
>> >  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>> >      .translate = spapr_tce_translate_iommu,
>> >  };
>> > @@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
>> >  
>> >      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>> >  
>> > +    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
>> > +
>> 
>> If you need to add these, then you need to do more QOM conversion.
>
> Again, it's not clear how this should be QOMed.  Child of the device
> constructing the TCE table?  But since that can often be a bus bridge,
> wouldn't the TCE table instances get confused with the real bus
> devices.

I can't apply this series (I'm not sure what tree it's against), but if
Alexey pushes a branch somewhere I can do the QOM conversions to
demonstrate.

Regards,

Anthony Liguori

>
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8
  2013-07-09  6:37   ` Alexey Kardashevskiy
@ 2013-07-09 15:26     ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09 15:26 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: qemu-devel, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> On 07/09/2013 04:01 AM, Anthony Liguori wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> This series spent quite a lot of time waiting when David's PCI series
>>> reaches the upstream but it does not seem to happen soon so I rebased
>>> those on top of agraf/ppc-next rebased on top qemu.org/master.
>>>
>>>
>>> While this series applies and compiles, the migration will often fail
>>> until the "migration: do not sent zero pages in bulk stage" patch is reverted
>>> or fixed somehow.
>> 
>> Your cover letter is out of date.  This patch has been applied.  Can you
>> confirm the series now works as expected?
>
> Sorry, my bad. It all works now.
>
>> David's PCI series is now upstream too.
>> 
>> This should be at least three if not four distinct patch series.
>> Sending it as a single series means it cannot be applied in chunks easily.
>
> Besides "savevm: Implement VMS_DIVIDE flag" (can we keep it? or I should
> get rid of it?), the rest should go through Alex Graf's ppc-next tree. I
> cannot easily move the patches in this series as it will require rebase
> almost every time. So what is the point in splitting this into 4 series?
> Can try grouping some of them though...

It's four different things.

Regards,

Anthony Liguori

>
>
>
>> Regards,
>> 
>> Anthony Liguori
>> 
>>> Alexey Kardashevskiy (4):
>>>   pseries: move interrupt controllers to hw/intc/
>>>   pseries: rework XICS
>>>   pseries: rework PAPR virtual SCSI
>>>   spapr-pci: rework MSI/MSIX
>>>
>>> David Gibson (12):
>>>   savevm: Implement VMS_DIVIDE flag
>>>   target-ppc: Convert ppc cpu savevm to VMStateDescription
>>>   pseries: savevm support for XICS interrupt controller
>>>   pseries: savevm support for VIO devices
>>>   pseries: savevm support for PAPR VIO logical lan
>>>   pseries: savevm support for PAPR TCE tables
>>>   pseries: savevm support for PAPR virtual SCSI
>>>   pseries: savevm support for pseries machine
>>>   pseries: savevm support for PCI host bridge
>>>   target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
>>>   pseries: Support for in-kernel XICS interrupt controller
>>>   pseries: savevm support with KVM
>>>
>>> Prerna Saxena (1):
>>>   ppc64: Enable QEMU to run on POWER 8 DD1 chip.
>>>
>>>  default-configs/ppc64-softmmu.mak |    2 +
>>>  hw/char/spapr_vty.c               |   16 ++
>>>  hw/intc/Makefile.objs             |    2 +
>>>  hw/{ppc => intc}/xics.c           |  172 ++++++++----
>>>  hw/intc/xics_kvm.c                |  445 +++++++++++++++++++++++++++++++
>>>  hw/net/spapr_llan.c               |   24 +-
>>>  hw/ppc/Makefile.objs              |    2 +-
>>>  hw/ppc/spapr.c                    |  418 ++++++++++++++++++++++++++++-
>>>  hw/ppc/spapr_hcall.c              |    8 +-
>>>  hw/ppc/spapr_iommu.c              |   25 ++
>>>  hw/ppc/spapr_pci.c                |  141 ++++++----
>>>  hw/ppc/spapr_vio.c                |   20 ++
>>>  hw/scsi/spapr_vscsi.c             |  306 ++++++++++++++-------
>>>  include/hw/pci-host/spapr.h       |   14 +-
>>>  include/hw/ppc/spapr.h            |   17 +-
>>>  include/hw/ppc/spapr_vio.h        |    5 +
>>>  include/hw/ppc/xics.h             |   72 ++++-
>>>  include/migration/vmstate.h       |   13 +
>>>  savevm.c                          |    8 +
>>>  target-ppc/cpu-models.c           |    3 +
>>>  target-ppc/cpu-models.h           |    1 +
>>>  target-ppc/cpu-qom.h              |    4 +
>>>  target-ppc/cpu.h                  |    8 +-
>>>  target-ppc/kvm.c                  |   83 ++++++
>>>  target-ppc/kvm_ppc.h              |   29 ++
>>>  target-ppc/machine.c              |  533 +++++++++++++++++++++++++++++++------
>>>  target-ppc/translate_init.c       |   36 +++
>>>  27 files changed, 2088 insertions(+), 319 deletions(-)
>>>  rename hw/{ppc => intc}/xics.c (80%)
>>>  create mode 100644 hw/intc/xics_kvm.c
>>>
>>> -- 
>>> 1.7.10.4
>> 
>> 
>
>
> -- 
> Alexey

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-09  7:20     ` David Gibson
  2013-07-09 15:22       ` Anthony Liguori
@ 2013-07-09 16:26       ` Anthony Liguori
  1 sibling, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-09 16:26 UTC (permalink / raw)
  To: David Gibson
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

David Gibson <david@gibson.dropbear.id.au> writes:

> On Mon, Jul 08, 2013 at 01:39:26PM -0500, Anthony Liguori wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>> > From: David Gibson <david@gibson.dropbear.id.au>
>> >
>> > This patch adds the necessary VMStateDescription information to save the
>> > state of PAPR TCE tables (that is, the PAPR specified IOMMU).
>> >
>> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> > ---
>> >  hw/ppc/spapr_iommu.c |   25 +++++++++++++++++++++++++
>> >  1 file changed, 25 insertions(+)
>> >
>> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> > index 91bc8e4..ba1f7b6 100644
>> > --- a/hw/ppc/spapr_iommu.c
>> > +++ b/hw/ppc/spapr_iommu.c
>> > @@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>> >      };
>> >  }
>> >  
>> > +static const VMStateDescription vmstate_spapr_tce_table = {
>> > +    .name = "spapr_iommu",
>> > +    .version_id = 1,
>> > +    .minimum_version_id = 1,
>> > +    .minimum_version_id_old = 1,
>> > +    .fields      = (VMStateField []) {
>> > +        /* Sanity check */
>> > +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
>> > +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
>> > +
>> > +        /* IOMMU state */
>> > +        VMSTATE_BOOL(bypass, sPAPRTCETable),
>> > +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
>> > +                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),
>> 
>> Not endian safe.  I really don't get the divide bit at all either.
>
> So, the actual bug is that we're currently storing the TCE table
> native endian, whereas it should be stored big endian always.
>  
>> > +
>> > +        VMSTATE_END_OF_LIST()
>> > +    },
>> > +};
>> > +
>> >  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>> >      .translate = spapr_tce_translate_iommu,
>> >  };
>> > @@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
>> >  
>> >      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>> >  
>> > +    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
>> > +
>> 
>> If you need to add these, then you need to do more QOM conversion.
>
> Again, it's not clear how this should be QOMed.  Child of the device
> constructing the TCE table?  But since that can often be a bus bridge,
> wouldn't the TCE table instances get confused with the real bus
> devices?

Only build tested.

https://github.com/aliguori/qemu/commit/a47a391c875a69f203110811c730877da12f5b14

I'll put together a patch series once I have a chance to test properly.

Regards,

Anthony Liguori

>
> -- 
> David Gibson			| I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
> 				| _way_ _around_!
> http://www.ozlabs.org/~dgibson

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-09 13:58         ` Anthony Liguori
@ 2013-07-10  3:06           ` Alexey Kardashevskiy
  2013-07-10  3:26           ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-10  3:06 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: qemu-devel, Alexander Graf, qemu-ppc, Paolo Bonzini,
	Paul Mackerras, David Gibson

On 07/09/2013 11:58 PM, Anthony Liguori wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> 
>> On Tue, 2013-07-09 at 13:40 +1000, Alexey Kardashevskiy wrote:
>>> No, why? It is a per CPU state of XICS controller, never exists apart
>>> from XICS.
>>
>> ICP is. ICS is  ... different but can mostly be considered to be the
>> XICS itself.
>>
>> Anthony, we could be completely anal about it and create a gigantic
>> cathedral of devices or just be a bit realistic and do something simpler
>> that has the exact same functionality :)
> 
> There's very little complexity in making something a device.  It's just
> a matter of sticking a DeviceState member in the struct, changing the
> way the object is created (object_new vs. malloc), and adding a
> TypeInfo.
> 
> There's a very good reason to have things be devices too.  You can only
> control the section naming of devices for live migration.  The only way
> to set compatibility properties for live migration is by having device
> properties too.
> 
> You haven't dealt with these problems yet, but you will, and doing the
> work up front means that you don't have to break migration once in order
> to keep it compatible in the future.



I have got a problem right now. I need to have 2 devices - xics and
xics-kvm - give them the same VMState properties and migration names, and
have different pre_load/post_load (btw, this is the whole point of
separating xics-kvm from xics).

How do I solve this without anyone saying that I am doing a terribly wrong
thing?

I already asked this in "[Qemu-devel] [PATCH 05/17] pseries: savevm support
for XICS interrupt controller" but have not seen any response yet. Thank you.




>> Basically, in HW the layout of the interrupt network is:
>>
>>  - One ICP per processor thread (the "presenter"). This contains the
>> registers to fetch a pending interrupt (ack), EOI, and control the
>> processor priority.
>>
>>  - One ICS per logical source of interrupts (ie, one per PCI host
>> bridge, and a few others here or there). This contains the per-interrupt
>> source configuration (target processor(s), priority, mask) and the
>> per-interrupt internal state.
> 
> This sounds an awful lot like the relationship between the I/O APIC(s)
> and the local APICs FWIW.
> 
>> Under PAPR, there is a single "virtual" ICS ... somewhat (it's a bit
>> oddball what pHyp does here, arguably there are two but we can ignore
>> that distinction). There is no register level access. A pair of firmware
>> (RTAS) calls is used to configure each virtual interrupt.
>>
>> So our model here is somewhat the same. We have one ICS in the emulated
>> XICS which arguably *is* the emulated XICS, there's no point making it a
>> separate "device", that would just be gross, and each VCPU has an
>> associated ICP.
> 
> There's nothing gross about making the things that are devices devices.


-- 
Alexey

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller
  2013-07-09  7:21       ` David Gibson
@ 2013-07-10  3:24         ` Benjamin Herrenschmidt
  2013-07-10  7:48           ` David Gibson
  0 siblings, 1 reply; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-10  3:24 UTC (permalink / raw)
  To: David Gibson
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Alexander Graf, Paul Mackerras, Paolo Bonzini, qemu-ppc

On Tue, 2013-07-09 at 17:21 +1000, David Gibson wrote:
> > Did you mean missing typedef?
> 
> I think he means the kernel_style_struct_name instead of the
> QemuStyleStudlyCapsStructName.

Looks like we missed the mandatory MakeCodeFugly rule of qemu :-)

Cheers,
Ben. 

* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-09 13:58         ` Anthony Liguori
  2013-07-10  3:06           ` Alexey Kardashevskiy
@ 2013-07-10  3:26           ` Benjamin Herrenschmidt
  2013-07-10 12:09             ` Anthony Liguori
  1 sibling, 1 reply; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-10  3:26 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

On Tue, 2013-07-09 at 08:58 -0500, Anthony Liguori wrote:
> There's nothing gross about making the things that are devices
> devices.

But there is no such thing as the XICS ...

The "XICS" is just the combination of ICP's and ICS... so XICS *is* the
device...

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-07-09 15:11         ` David Gibson
@ 2013-07-10  3:31           ` Benjamin Herrenschmidt
  2013-07-10  7:49             ` David Gibson
  2013-07-15 13:24           ` Paolo Bonzini
  1 sibling, 1 reply; 92+ messages in thread
From: Benjamin Herrenschmidt @ 2013-07-10  3:31 UTC (permalink / raw)
  To: David Gibson
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Alexander Graf, Paul Mackerras, Paolo Bonzini, qemu-ppc

On Wed, 2013-07-10 at 01:11 +1000, David Gibson wrote:

> More precisely, DCRs are only needed on the BookE CPUs which have
> them.  They can be added later without breaking compatibility, and
> would be best added by someone working on the BookE stuff who can test
> it properly.

DCRs are also not in the core, they are in the fabric (ie, global
chip things) anyway.

> Migration will (in fact, does) work without anything extra for the
> timebase.  What's less clear is if all the timing edge cases are
> correct at present.
> 
> As a rule, the guest should see the timebase advance across the
> migration according to the elapsed wall clock time.  But the guest
> *must not* see the timebase go backwards, even if the source and
> destination host clocks are out of sync in such a way that time
> appears to go backwards across the migration.
> 
> Under TCG, the guest timebase is not tracked as it advances, but an
> appropriate value is computed from the host system time when the
> timebase is read.  Under KVM, the host and guest timebase are the same
> register physically.  We don't yet, but we probably should, context
> switch the upper bits of the timebase, to give the guest its own
> logical value for it.
> 
> Getting all the combinations of cases correct probably needs some
> sort of real time <-> guest timebase delta transferred across the
> migration, but working out exactly what's needed and how to encode it
> is a bit fiddly.
> 
> Since the common cases work already, and it's fairly straightforward
> to add whatever delta is needed in a backwards compatible way, it
> seems reasonable to get migration mostly working, even with
> some known bugs in timing edge cases.

What do you mean by the "common case"? The common case of KVM does not
work afaik. Today the timebase *will* appear to go backward if the target
machine was booted after the source machine, which is likely to crash
the kernel.

The timebase context switching must be implemented asap. We've discussed
it a few times here and we know how to do it, it's just not done yet.

Cheers,
Ben.

* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-09 15:22       ` Anthony Liguori
@ 2013-07-10  7:42         ` David Gibson
  0 siblings, 0 replies; 92+ messages in thread
From: David Gibson @ 2013-07-10  7:42 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras

On Tue, Jul 09, 2013 at 10:22:39AM -0500, Anthony Liguori wrote:
> David Gibson <david@gibson.dropbear.id.au> writes:
> 
> > On Mon, Jul 08, 2013 at 01:39:26PM -0500, Anthony Liguori wrote:
> >> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> >> 
> >> > From: David Gibson <david@gibson.dropbear.id.au>
> >> >
> >> > This patch adds the necessary VMStateDescription information to save the
> >> > state of PAPR TCE tables (that is, the PAPR specified IOMMU).
> >> >
> >> > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> >> > Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >> > ---
> >> >  hw/ppc/spapr_iommu.c |   25 +++++++++++++++++++++++++
> >> >  1 file changed, 25 insertions(+)
> >> >
> >> > diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
> >> > index 91bc8e4..ba1f7b6 100644
> >> > --- a/hw/ppc/spapr_iommu.c
> >> > +++ b/hw/ppc/spapr_iommu.c
> >> > @@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
> >> >      };
> >> >  }
> >> >  
> >> > +static const VMStateDescription vmstate_spapr_tce_table = {
> >> > +    .name = "spapr_iommu",
> >> > +    .version_id = 1,
> >> > +    .minimum_version_id = 1,
> >> > +    .minimum_version_id_old = 1,
> >> > +    .fields      = (VMStateField []) {
> >> > +        /* Sanity check */
> >> > +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
> >> > +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
> >> > +
> >> > +        /* IOMMU state */
> >> > +        VMSTATE_BOOL(bypass, sPAPRTCETable),
> >> > +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
> >> > +                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),
> >> 
> >> Not endian safe.  I really don't get the divide bit at all either.
> >
> > So, the actual bug is that we're currently storing the TCE table
> > native endian, whereas it should be stored big endian always.
> 
> Why?  There are no guest visible byte accesses done to the table
> AFAICT.  Everything is done as words and there's quite a lot of math
> done to the entries.
> 
> It seems like native endian is the right internal representation.

Hrm.  I suppose it could be fixed at either end.  The idea was that
the table array would contain exactly the same bytes as would be
present in physical memory on a real bare-metal system, which seems
like a generally nice property.
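
A minimal sketch of the "store it big endian always" option discussed above,
assuming each TCE is a single 64-bit word. The helper names (tce_store_be,
tce_load_be, tce_table_to_wire, tce_table_from_wire) are hypothetical and do
not come from the patch; the point is only that the byte stream is the same
whatever the host byte order is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one 64-bit TCE as 8 big-endian bytes, independent of host order. */
void tce_store_be(uint8_t out[8], uint64_t tce)
{
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(tce >> (56 - 8 * i));
    }
}

/* Read 8 big-endian bytes back into a host-order 64-bit TCE. */
uint64_t tce_load_be(const uint8_t in[8])
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | in[i];
    }
    return v;
}

/* Serialise n host-order entries into a big-endian byte stream. */
void tce_table_to_wire(uint8_t *wire, const uint64_t *table, size_t n)
{
    assert(wire != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        tce_store_be(wire + 8 * i, table[i]);
    }
}

/* Rebuild n host-order entries from a big-endian byte stream. */
void tce_table_from_wire(uint64_t *table, const uint8_t *wire, size_t n)
{
    assert(wire != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        table[i] = tce_load_be(wire + 8 * i);
    }
}
```

With this shape the in-memory table can stay native endian for the arithmetic
on entries, and only the migration stream is pinned to one byte order.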

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller
  2013-07-10  3:24         ` Benjamin Herrenschmidt
@ 2013-07-10  7:48           ` David Gibson
  0 siblings, 0 replies; 92+ messages in thread
From: David Gibson @ 2013-07-10  7:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Alexander Graf, Paul Mackerras, Paolo Bonzini, qemu-ppc

On Wed, Jul 10, 2013 at 01:24:39PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-07-09 at 17:21 +1000, David Gibson wrote:
> > > Did you mean missing typedef?
> > 
> > I think he means the kernel_style_struct_name instead of the
> > QemuStyleStudlyCapsStructName.
> 
> Looks like we missed the mandatory MakeCodeFugly rule of qemu :-)

ThatsTheOne

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-07-10  3:31           ` Benjamin Herrenschmidt
@ 2013-07-10  7:49             ` David Gibson
  0 siblings, 0 replies; 92+ messages in thread
From: David Gibson @ 2013-07-10  7:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Alexander Graf, Paul Mackerras, Paolo Bonzini, qemu-ppc

On Wed, Jul 10, 2013 at 01:31:24PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2013-07-10 at 01:11 +1000, David Gibson wrote:
> 
> > More precisely, DCRs are only needed on the BookE CPUs which have
> > them.  They can be added later without breaking compatibility, and
> > would be best added by someone working on the BookE stuff who can test
> > it properly.
> 
> DCRs are also not in the core, they are in the fabric (ie, global
> chip things) anyway.
> 
> > Migration will (in fact, does) work without anything extra for the
> > timebase.  What's less clear is if all the timing edge cases are
> > correct at present.
> > 
> > As a rule, the guest should see the timebase advance across the
> > migration according to the elapsed wall clock time.  But the guest
> > *must not* see the timebase go backwards, even if the source and
> > destination host clocks are out of sync in such a way that time
> > appears to go backwards across the migration.
> > 
> > Under TCG, the guest timebase is not tracked as it advances, but an
> > appropriate value is computed from the host system time when the
> > timebase is read.  Under KVM, the host and guest timebase are the same
> > register physically.  We don't yet, but we probably should, context
> > switch the upper bits of the timebase, to give the guest its own
> > logical value for it.
> > 
> > Getting all the combinations of cases correct probably needs some
> > sort of real time <-> guest timebase delta transferred across the
> > migration, but working out exactly what's needed and how to encode it
> > is a bit fiddly.
> > 
> > Since the common cases work already, and it's fairly straightforward
> > to add whatever delta is needed in a backwards compatible way, it
> > seems reasonable to get migration mostly working, even with
> > some known bugs in timing edge cases.
> 
> What do you mean by the "common case"? The common case of KVM does not
> work afaik. Today the timebase *will* appear to go backward if the target
> machine was booted after the source machine, which is likely to crash
> the kernel.
> 
> The timebase context switching must be implemented asap. We've discussed
> it a few times here and we know how to do it, it's just not done yet.

Hmm.. good point.  I mean that I did some sample migrates and they
worked.  But that was probably full-emu and/or the same source and
dest machine.  So above should be rephrased as "at least one case"
works, which is more than previously.

But yes the timebase handling needs to be sorted out.
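
A rough sketch of the "real time <-> guest timebase delta" idea from the
quoted text, under stated assumptions: the source records the guest timebase
and its wall clock when the guest stops, and the destination derives an offset
to add to its own host timebase. TimebaseSnapshot and
timebase_offset_after_migration are hypothetical names, not an existing QEMU
or KVM interface, and the clamping policy is only one possible choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* State captured on the source when the guest is stopped (hypothetical). */
typedef struct TimebaseSnapshot {
    uint64_t guest_tb;  /* guest-visible timebase at stop time */
    int64_t wall_ns;    /* source host wall clock at stop time, in ns */
} TimebaseSnapshot;

/*
 * Compute the offset the destination adds to its host timebase so that
 * the guest sees its timebase advance by the elapsed wall clock time.
 * If the host clocks disagree and time appears to run backwards, the
 * elapsed time is clamped to zero so the guest timebase never steps back.
 */
int64_t timebase_offset_after_migration(const TimebaseSnapshot *snap,
                                        int64_t dest_wall_ns,
                                        uint64_t dest_host_tb,
                                        uint64_t tb_freq_hz)
{
    const uint64_t ns_per_s = 1000000000ULL;
    int64_t elapsed_ns;
    uint64_t e, ticks, guest_tb;

    assert(snap != NULL && tb_freq_hz > 0);

    elapsed_ns = dest_wall_ns - snap->wall_ns;
    if (elapsed_ns < 0) {
        elapsed_ns = 0;
    }

    /* Split into whole seconds and remainder to avoid 64-bit overflow. */
    e = (uint64_t)elapsed_ns;
    ticks = (e / ns_per_s) * tb_freq_hz
          + ((e % ns_per_s) * tb_freq_hz) / ns_per_s;

    guest_tb = snap->guest_tb + ticks;

    /* May be negative when the destination host has been up longer. */
    return (int64_t)(guest_tb - dest_host_tb);
}
```

The offset comes out positive when the destination was booted after the
source, which is the case described above as failing today.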

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip.
  2013-07-04  6:42     ` [Qemu-devel] " Prerna Saxena
@ 2013-07-10 11:19       ` Alexander Graf
  0 siblings, 0 replies; 92+ messages in thread
From: Alexander Graf @ 2013-07-10 11:19 UTC (permalink / raw)
  To: Prerna Saxena
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel, qemu-ppc,
	Paolo Bonzini, Andreas Färber, Paul Mackerras, David Gibson


On 04.07.2013, at 08:42, Prerna Saxena wrote:

> Hi Andreas,
> Thank you for taking a look.
> I have incorporated your feedback into a new patch, attached herewith.
> 
> 
> Regards,
> Prerna
> 
> Subject: [PATCH] target-ppc: Add POWER8 v1.0 CPU model
> 
> This patch adds CPU PVR definition for POWER8,
> and enables QEMU to launch guests on POWER8 hardware.
> 
> Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Reviewed-by: Paul Mackerras <paulus@samba.org>
> Reviewed-by: Andreas Farber <afaerber@suse.de>

Thanks, applied to ppc-next. Next time please send new revisions of patches as new emails.


Alex

* Re: [Qemu-devel] [PATCH 02/17] pseries: rework XICS
  2013-07-10  3:26           ` Benjamin Herrenschmidt
@ 2013-07-10 12:09             ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-10 12:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paolo Bonzini, Paul Mackerras, David Gibson

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> On Tue, 2013-07-09 at 08:58 -0500, Anthony Liguori wrote:
>> There's nothing gross about making the things that are devices
>> devices.
>
> But there is no such thing as the XICS ...
>
> The "XICS" is just the combination of ICP's and ICS... so XICS *is* the
> device...

Then you have an XICS device which has a single ICP and multiple ICSs
as child devices.

Regards,

Anthony Liguori

>
> Cheers,
> Ben.

* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-09  3:37         ` Alexey Kardashevskiy
@ 2013-07-15 13:05           ` Paolo Bonzini
  2013-07-15 13:13             ` Alexey Kardashevskiy
  0 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2013-07-15 13:05 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, Alexander Graf, qemu-devel, Paul Mackerras,
	Anthony Liguori, qemu-ppc, David Gibson

On 09/07/2013 05:37, Alexey Kardashevskiy wrote:
> 
> btw xics-kvm does not introduce new members but does have very different
> .pre_save and .post_load. This actually was the whole point of splitting
> xics into xics and xics-kvm. I cannot see how I can fix it without hacks.
> Properties can be inherited from a parent class (?) but VMStateDescription
> cannot.

The vmstate's pre_save and post_load functions can dispatch to a method
in the subclass.  Again, i8259 does exactly what you want:

static void pic_dispatch_pre_save(void *opaque)
{
    PICCommonState *s = opaque;
    PICCommonClass *info = PIC_COMMON_GET_CLASS(s);

    if (info->pre_save) {
        info->pre_save(s);
    }
}
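
A self-contained sketch of how the same dispatch idea might look for xics and
xics-kvm, assuming one shared state layout and one shared set of vmstate
callbacks. Every name here (XICSCommonState, XICSCommonClass, the
kernel_pending field) is a hypothetical stand-in rather than a QEMU type; the
class pointer is stored by hand where QOM would look it up.

```c
#include <assert.h>
#include <stddef.h>

typedef struct XICSCommonState XICSCommonState;

/* Per-variant hooks; either may be NULL when the variant needs nothing. */
typedef struct XICSCommonClass {
    void (*pre_save)(XICSCommonState *s);
    int (*post_load)(XICSCommonState *s, int version_id);
} XICSCommonClass;

struct XICSCommonState {
    const XICSCommonClass *klass;  /* stands in for the QOM class lookup */
    unsigned pending;              /* the field that is actually migrated */
    unsigned kernel_pending;       /* stands in for in-kernel state */
};

/* The single pre_save the shared vmstate would point at. */
void xics_dispatch_pre_save(void *opaque)
{
    XICSCommonState *s = opaque;

    assert(s != NULL && s->klass != NULL);
    if (s->klass->pre_save) {
        s->klass->pre_save(s);
    }
}

/* The single post_load the shared vmstate would point at. */
int xics_dispatch_post_load(void *opaque, int version_id)
{
    XICSCommonState *s = opaque;

    assert(s != NULL && s->klass != NULL);
    if (s->klass->post_load) {
        return s->klass->post_load(s, version_id);
    }
    return 0;
}

/* KVM variant: pull state from the kernel before saving. */
void xics_kvm_pre_save(XICSCommonState *s)
{
    s->pending = s->kernel_pending;
}

/* KVM variant: push loaded state back to the kernel. */
int xics_kvm_post_load(XICSCommonState *s, int version_id)
{
    (void)version_id;
    s->kernel_pending = s->pending;
    return 0;
}

/* Emulated variant has no hooks; KVM variant overrides both. */
const XICSCommonClass xics_emulated_class = { NULL, NULL };
const XICSCommonClass xics_kvm_class = {
    xics_kvm_pre_save, xics_kvm_post_load
};
```

Both variants would then share one field list and one migration name, and
only the hooks differ.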

Paolo

* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-09  7:17     ` David Gibson
@ 2013-07-15 13:10       ` Paolo Bonzini
  0 siblings, 0 replies; 92+ messages in thread
From: Paolo Bonzini @ 2013-07-15 13:10 UTC (permalink / raw)
  To: David Gibson
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Alexander Graf, qemu-ppc, Paul Mackerras

On 09/07/2013 09:17, David Gibson wrote:
> So, Alexey addressed the xics vs. xics-kvm issues.  But there's
> another factor here.  It's not clear to me how you'd QOM this
> component.
> 
> What's being registered here is the "presentation server".  That's
> the per-CPU part - vaguely equivalent to the LAPIC on x86.  x86
> doesn't have something equivalent here, because they register the
> LAPIC state as part of the CPU state, but we can't do that because
> the ICP is not bound to the CPU as tightly - a POWER7 using a
> different interrupt architecture would certainly be possible.

That's also possible with x86; in fact there is a command line option
to only use the legacy 8259 interrupt controller.

The LAPIC is a separate device from the CPU; it just happens that the
CPU also needs a back-pointer to the LAPIC.  If you do not need that
back-pointer, just do not put it in.  The ICP can still have a link
property that points to the CPU.

Paolo

> So to do this with QOM, would the ICP need to be registered as a
> child of the cpu object?

* Re: [Qemu-devel] [PATCH 09/17] pseries: rework PAPR virtual SCSI
  2013-07-08 18:42   ` Anthony Liguori
@ 2013-07-15 13:11     ` Paolo Bonzini
  0 siblings, 0 replies; 92+ messages in thread
From: Paolo Bonzini @ 2013-07-15 13:11 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, qemu-devel, Alexander Graf, qemu-ppc,
	Paul Mackerras, David Gibson

On 08/07/2013 20:42, Anthony Liguori wrote:
>> > +static int vscsi_fetch_desc(VSCSIState *s, struct vscsi_req *req,
>> > +                            unsigned n, unsigned buf_offset,
>> > +                            struct srp_direct_buf *ret)
>> > +{
>> > +    struct srp_cmd *cmd = &req->iu.srp.cmd;
>> > +
>> > +    switch (req->dma_fmt) {
>> > +    case SRP_NO_DATA_DESC: {
>> > +        dprintf("VSCSI: no data descriptor\n");
>> > +        return 0;
>> > +    }
>> > +    case SRP_DATA_DESC_DIRECT: {
>> > +        *ret = *(struct srp_direct_buf *)(cmd->add_data + req->cdb_offset);
> If you're reworking this code, you should remove these casts.  It's not
> safe to assume that cdb_offset is aligned properly.  memcpy()'ing would
> be much safer.

Or simply declare struct srp_direct_buf as packed (even better, use a
typedef as in the coding conventions).
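
A small sketch of the memcpy() variant suggested in the quoted review, which
avoids dereferencing a possibly misaligned pointer. struct direct_buf is a
hypothetical stand-in whose field layout (va, key, len) is an assumption, and
fetch_direct_buf is not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the SRP direct descriptor; the layout is assumed. */
struct direct_buf {
    uint64_t va;
    uint32_t key;
    uint32_t len;
};

/*
 * Copy a descriptor out of a byte buffer at an arbitrary offset.
 * memcpy() has no alignment requirement, unlike a cast followed by a
 * struct assignment, so cdb_offset does not need to be a multiple of 8.
 */
void fetch_direct_buf(struct direct_buf *ret, const uint8_t *add_data,
                      size_t cdb_offset)
{
    assert(ret != NULL && add_data != NULL);
    memcpy(ret, add_data + cdb_offset, sizeof(*ret));
}
```

Declaring the struct packed, as suggested above, is the alternative that lets
the compiler generate the unaligned access itself.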

Paolo

* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-15 13:05           ` Paolo Bonzini
@ 2013-07-15 13:13             ` Alexey Kardashevskiy
  2013-07-15 13:17               ` Paolo Bonzini
  0 siblings, 1 reply; 92+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-15 13:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Anthony Liguori, Alexander Graf, qemu-devel, Paul Mackerras,
	Anthony Liguori, qemu-ppc, David Gibson

On 07/15/2013 11:05 PM, Paolo Bonzini wrote:
> On 09/07/2013 05:37, Alexey Kardashevskiy wrote:
>>
>> btw xics-kvm does not introduce new members but does have very different
>> .pre_save and .post_load. This actually was the whole point of splitting
>> xics into xics and xics-kvm. I cannot see how I can fix it without hacks.
>> Properties can be inherited from a parent class (?) but VMStateDescription
>> cannot.
> 
> The vmstate's pre_save and post_load functions can dispatch to a method
> in the subclass.  Again, i8259 does exactly what you want:
> 
> static void pic_dispatch_pre_save(void *opaque)
> {
>     PICCommonState *s = opaque;
>     PICCommonClass *info = PIC_COMMON_GET_CLASS(s);
> 
>     if (info->pre_save) {
>         info->pre_save(s);
>     }
> }

And this is not a hack. Hm. I do not get it. There is even INTERFACE_CLASS
defined but no one is using it. Instead you are proposing to add callbacks
called from callbacks. And this is all for not having dev==NULL in
vmstate_register()... Gosh :(


-- 
Alexey

* Re: [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller
  2013-07-15 13:13             ` Alexey Kardashevskiy
@ 2013-07-15 13:17               ` Paolo Bonzini
  0 siblings, 0 replies; 92+ messages in thread
From: Paolo Bonzini @ 2013-07-15 13:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, Alexander Graf, qemu-devel, Paul Mackerras,
	Anthony Liguori, qemu-ppc, David Gibson

On 15/07/2013 15:13, Alexey Kardashevskiy wrote:
>> > 
>> > The vmstate's pre_save and post_load functions can dispatch to a method
>> > in the subclass.  Again, i8259 does exactly what you want:
>> > 
>> > static void pic_dispatch_pre_save(void *opaque)
>> > {
>> >     PICCommonState *s = opaque;
>> >     PICCommonClass *info = PIC_COMMON_GET_CLASS(s);
>> > 
>> >     if (info->pre_save) {
>> >         info->pre_save(s);
>> >     }
>> > }
> And this is not a hack. Hm. I do not get it. There is even INTERFACE_CLASS
> defined but no one is using it. Instead you are proposing to add callbacks
> called from callbacks. And this is all for not having dev==NULL in
> vmstate_register()... Gosh :(

This is not about having dev!=NULL.  It is about not using
vmstate_register at all.

Paolo

* Re: [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-07-09 15:11         ` David Gibson
  2013-07-10  3:31           ` Benjamin Herrenschmidt
@ 2013-07-15 13:24           ` Paolo Bonzini
  1 sibling, 0 replies; 92+ messages in thread
From: Paolo Bonzini @ 2013-07-15 13:24 UTC (permalink / raw)
  To: David Gibson
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Alexander Graf, qemu-ppc, Paul Mackerras

On 09/07/2013 17:11, David Gibson wrote:
> Under TCG, the guest timebase is not tracked as it advances, but
> an appropriate value is computed from the host system time when
> the timebase is read.

Under TCG, the timebase uses vm_clock so it is migrated correctly.

Paolo

* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-08 18:39   ` Anthony Liguori
  2013-07-08 21:45     ` Benjamin Herrenschmidt
  2013-07-09  7:20     ` David Gibson
@ 2013-07-15 13:26     ` Paolo Bonzini
  2013-07-15 15:06       ` Anthony Liguori
  2 siblings, 1 reply; 92+ messages in thread
From: Paolo Bonzini @ 2013-07-15 13:26 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Alexey Kardashevskiy, Alexander Graf, qemu-devel, qemu-ppc,
	Paul Mackerras, David Gibson

On 08/07/2013 20:39, Anthony Liguori wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> From: David Gibson <david@gibson.dropbear.id.au>
>>
>> This patch adds the necessary VMStateDescription information to save the
>> state of PAPR TCE tables (that is, the PAPR specified IOMMU).
>>
>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>  hw/ppc/spapr_iommu.c |   25 +++++++++++++++++++++++++
>>  1 file changed, 25 insertions(+)
>>
>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>> index 91bc8e4..ba1f7b6 100644
>> --- a/hw/ppc/spapr_iommu.c
>> +++ b/hw/ppc/spapr_iommu.c
>> @@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>>      };
>>  }
>>  
>> +static const VMStateDescription vmstate_spapr_tce_table = {
>> +    .name = "spapr_iommu",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .minimum_version_id_old = 1,
>> +    .fields      = (VMStateField []) {
>> +        /* Sanity check */
>> +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
>> +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
>> +
>> +        /* IOMMU state */
>> +        VMSTATE_BOOL(bypass, sPAPRTCETable),
>> +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
>> +                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),
> 
> Not endian safe.  I really don't get the divide bit at all either.
> 
>> +
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>>      .translate = spapr_tce_translate_iommu,
>>  };
>> @@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
>>  
>>      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>>  
>> +    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
>> +
> 
> If you need to add these, then you need to do more QOM conversion.

No, this does not need QOM conversion.  It needs a sub-vmstate that is
then used by both the PCI and VIO bridges via VMSTATE_STRUCT.
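
On the "divide bit" questioned in the quoted review: as the patch appears to
use it, the buffer length is the window_size field divided by a constant. The
arithmetic below is only a sketch, assuming a 4096-byte TCE page and an 8-byte
table entry; TCE_PAGE_SIZE, TCEEntry and both helper functions are
hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_PAGE_SIZE 4096u                /* assumed IOMMU page size */

typedef struct { uint64_t tce; } TCEEntry; /* assumed 8-byte entry */

/* One table entry per IOMMU page in the DMA window. */
size_t tce_table_entries(uint32_t window_size)
{
    assert(window_size % TCE_PAGE_SIZE == 0);
    return window_size / TCE_PAGE_SIZE;
}

/*
 * Bytes in the table, written the way the divide form expresses it:
 * window_size / (page size / entry size).  With the assumed values the
 * divisor is 4096 / 8 = 512.
 */
size_t tce_table_bytes(uint32_t window_size)
{
    size_t divisor = TCE_PAGE_SIZE / sizeof(TCEEntry);

    assert(window_size % TCE_PAGE_SIZE == 0);
    return window_size / divisor;
}
```

Under those assumptions a 1 GiB window gives 262144 entries and a 2 MiB
table, so both ways of computing the size agree.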

Paolo

> Regards,
> 
> Anthony Liguori
> 
>>      return tcet;
>>  }
>>  
>> @@ -163,6 +184,10 @@ void spapr_tce_free(sPAPRTCETable *tcet)
>>  {
>>      QLIST_REMOVE(tcet, list);
>>  
>> +    vmstate_unregister(NULL, &vmstate_spapr_tce_table, tcet);
>> +
>> +    QLIST_REMOVE(tcet, list);
>> +
>>      if (!kvm_enabled() ||
>>          (kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
>>                                   tcet->window_size) != 0)) {
>> -- 
>> 1.7.10.4
> 

* Re: [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables
  2013-07-15 13:26     ` Paolo Bonzini
@ 2013-07-15 15:06       ` Anthony Liguori
  0 siblings, 0 replies; 92+ messages in thread
From: Anthony Liguori @ 2013-07-15 15:06 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Alexey Kardashevskiy, Alexander Graf, qemu-devel, Paul Mackerras,
	qemu-ppc, David Gibson

Paolo Bonzini <pbonzini@redhat.com> writes:

> On 08/07/2013 20:39, Anthony Liguori wrote:
>> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
>> 
>>> From: David Gibson <david@gibson.dropbear.id.au>
>>>
>>> This patch adds the necessary VMStateDescription information to save the
>>> state of PAPR TCE tables (that is, the PAPR specified IOMMU).
>>>
>>> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>  hw/ppc/spapr_iommu.c |   25 +++++++++++++++++++++++++
>>>  1 file changed, 25 insertions(+)
>>>
>>> diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
>>> index 91bc8e4..ba1f7b6 100644
>>> --- a/hw/ppc/spapr_iommu.c
>>> +++ b/hw/ppc/spapr_iommu.c
>>> @@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
>>>      };
>>>  }
>>>  
>>> +static const VMStateDescription vmstate_spapr_tce_table = {
>>> +    .name = "spapr_iommu",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .fields      = (VMStateField []) {
>>> +        /* Sanity check */
>>> +        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
>>> +        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
>>> +
>>> +        /* IOMMU state */
>>> +        VMSTATE_BOOL(bypass, sPAPRTCETable),
>>> +        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
>>> +                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),
>> 
>> Not endian safe.  I really don't get the divide bit at all either.
>> 
>>> +
>>> +        VMSTATE_END_OF_LIST()
>>> +    },
>>> +};
>>> +
>>>  static MemoryRegionIOMMUOps spapr_iommu_ops = {
>>>      .translate = spapr_tce_translate_iommu,
>>>  };
>>> @@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
>>>  
>>>      QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
>>>  
>>> +    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
>>> +
>> 
>> If you need to add these, then you need to do more QOM conversion.
>
> No, this does not need QOM conversion.  It needs a sub-vmstate that is
> then used by both the PCI and VIO bridges via VMSTATE_STRUCT.

I already QOM converted it and made it a sub-object.

I think that's better from a modeling point of view than using a
sub-vmstate.

Patches coming shortly.

Regards,

Anthony Liguori

>
> Paolo
>
>> Regards,
>> 
>> Anthony Liguori
>> 
>>>      return tcet;
>>>  }
>>>  
>>> @@ -163,6 +184,10 @@ void spapr_tce_free(sPAPRTCETable *tcet)
>>>  {
>>>      QLIST_REMOVE(tcet, list);
>>>  
>>> +    vmstate_unregister(NULL, &vmstate_spapr_tce_table, tcet);
>>> +
>>> +    QLIST_REMOVE(tcet, list);
>>> +
>>>      if (!kvm_enabled() ||
>>>          (kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
>>>                                   tcet->window_size) != 0)) {
>>> -- 
>>> 1.7.10.4
>> 

Thread overview: 92+ messages
2013-06-27  6:45 [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
2013-06-27  6:45 ` [Qemu-devel] [PATCH 01/17] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
2013-07-02 20:54   ` Andreas Färber
2013-07-08 18:15   ` Anthony Liguori
2013-07-08 18:34     ` Alexander Graf
2013-06-27  6:45 ` [Qemu-devel] [PATCH 02/17] pseries: rework XICS Alexey Kardashevskiy
2013-06-27 11:47   ` David Gibson
2013-06-27 12:17     ` Alexey Kardashevskiy
2013-07-02  0:06       ` David Gibson
2013-07-02  0:21         ` Alexander Graf
2013-07-02  2:08           ` Alexey Kardashevskiy
2013-07-08 18:24       ` Anthony Liguori
2013-07-08 18:22   ` Anthony Liguori
2013-07-09  3:40     ` Alexey Kardashevskiy
2013-07-09  4:48       ` Benjamin Herrenschmidt
2013-07-09 13:58         ` Anthony Liguori
2013-07-10  3:06           ` Alexey Kardashevskiy
2013-07-10  3:26           ` Benjamin Herrenschmidt
2013-07-10 12:09             ` Anthony Liguori
2013-06-27  6:45 ` [Qemu-devel] [PATCH 03/17] savevm: Implement VMS_DIVIDE flag Alexey Kardashevskiy
2013-07-08 18:27   ` Anthony Liguori
2013-07-08 23:57     ` David Gibson
2013-07-09 14:06       ` Anthony Liguori
2013-07-09 14:38         ` David Gibson
2013-06-27  6:45 ` [Qemu-devel] [PATCH 04/17] target-ppc: Convert ppc cpu savevm to VMStateDescription Alexey Kardashevskiy
2013-07-08 18:29   ` Anthony Liguori
2013-07-09  5:14     ` Alexey Kardashevskiy
2013-07-09 14:08       ` Anthony Liguori
2013-07-09 15:11         ` David Gibson
2013-07-10  3:31           ` Benjamin Herrenschmidt
2013-07-10  7:49             ` David Gibson
2013-07-15 13:24           ` Paolo Bonzini
2013-06-27  6:45 ` [Qemu-devel] [PATCH 05/17] pseries: savevm support for XICS interrupt controller Alexey Kardashevskiy
2013-07-08 18:31   ` Anthony Liguori
2013-07-09  0:06     ` Alexey Kardashevskiy
2013-07-09  0:49       ` Anthony Liguori
2013-07-09  0:59         ` Alexey Kardashevskiy
2013-07-09  1:25           ` Anthony Liguori
2013-07-09  3:37         ` Alexey Kardashevskiy
2013-07-15 13:05           ` Paolo Bonzini
2013-07-15 13:13             ` Alexey Kardashevskiy
2013-07-15 13:17               ` Paolo Bonzini
2013-07-09  7:17     ` David Gibson
2013-07-15 13:10       ` Paolo Bonzini
2013-06-27  6:45 ` [Qemu-devel] [PATCH 06/17] pseries: savevm support for VIO devices Alexey Kardashevskiy
2013-07-08 18:35   ` Anthony Liguori
2013-06-27  6:45 ` [Qemu-devel] [PATCH 07/17] pseries: savevm support for PAPR VIO logical lan Alexey Kardashevskiy
2013-07-08 18:36   ` Anthony Liguori
2013-06-27  6:45 ` [Qemu-devel] [PATCH 08/17] pseries: savevm support for PAPR TCE tables Alexey Kardashevskiy
2013-07-08 18:39   ` Anthony Liguori
2013-07-08 21:45     ` Benjamin Herrenschmidt
2013-07-08 22:15       ` Anthony Liguori
2013-07-08 22:41         ` Benjamin Herrenschmidt
2013-07-09  7:20     ` David Gibson
2013-07-09 15:22       ` Anthony Liguori
2013-07-10  7:42         ` David Gibson
2013-07-09 16:26       ` Anthony Liguori
2013-07-15 13:26     ` Paolo Bonzini
2013-07-15 15:06       ` Anthony Liguori
2013-06-27  6:45 ` [Qemu-devel] [PATCH 09/17] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
2013-07-08 18:42   ` Anthony Liguori
2013-07-15 13:11     ` Paolo Bonzini
2013-06-27  6:45 ` [Qemu-devel] [PATCH 10/17] pseries: savevm support for " Alexey Kardashevskiy
2013-06-27  6:45 ` [Qemu-devel] [PATCH 11/17] pseries: savevm support for pseries machine Alexey Kardashevskiy
2013-07-08 18:45   ` Anthony Liguori
2013-07-08 18:50     ` Alexander Graf
2013-07-08 19:01       ` Anthony Liguori
2013-07-08 21:48     ` Benjamin Herrenschmidt
2013-07-08 22:23       ` Anthony Liguori
2013-06-27  6:45 ` [Qemu-devel] [PATCH 12/17] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
2013-07-08 18:45   ` Anthony Liguori
2013-06-27  6:45 ` [Qemu-devel] [PATCH 13/17] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN Alexey Kardashevskiy
2013-06-27  6:45 ` [Qemu-devel] [PATCH 14/17] pseries: Support for in-kernel XICS interrupt controller Alexey Kardashevskiy
2013-07-08 18:50   ` Anthony Liguori
2013-07-09  3:21     ` Alexey Kardashevskiy
2013-07-09  7:21       ` David Gibson
2013-07-10  3:24         ` Benjamin Herrenschmidt
2013-07-10  7:48           ` David Gibson
2013-06-27  6:45 ` [Qemu-devel] [PATCH 15/17] pseries: savevm support with KVM Alexey Kardashevskiy
2013-06-27  6:45 ` [Qemu-devel] [PATCH 16/17] ppc64: Enable QEMU to run on POWER 8 DD1 chip Alexey Kardashevskiy
2013-07-04  5:54   ` Andreas Färber
2013-07-04  6:26     ` [Qemu-devel] [Qemu-ppc] " Benjamin Herrenschmidt
2013-07-04  6:42     ` [Qemu-devel] " Prerna Saxena
2013-07-10 11:19       ` Alexander Graf
2013-06-27  6:46 ` [Qemu-devel] [PATCH 17/17] spapr-pci: rework MSI/MSIX Alexey Kardashevskiy
2013-07-04  2:31 ` [Qemu-devel] [PATCH 00/17 v3] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
2013-07-04  2:40   ` Anthony Liguori
2013-07-04  2:48     ` Alexey Kardashevskiy
2013-07-08 18:01 ` Anthony Liguori
2013-07-09  6:37   ` Alexey Kardashevskiy
2013-07-09 15:26     ` Anthony Liguori
2013-07-09 14:04 ` Anthony Liguori
