* [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8
@ 2013-07-06 13:53 Alexey Kardashevskiy
  2013-07-06 13:53 ` [Qemu-devel] [PATCH 01/19] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
                   ` (19 more replies)
  0 siblings, 20 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paul Mackerras, David Gibson


This version adds a new patch ("target-ppc: Enhance the CPU node labels
for the guest device tree for pseries") and separates "savevm for VIO TTY"
from "savevm for VIO LAN".

The series was also rebased on top of current master from qemu.org.

There are no other changes.


Alexey Kardashevskiy (4):
  pseries: move interrupt controllers to hw/intc/
  pseries: rework XICS
  pseries: rework PAPR virtual SCSI
  spapr-pci: rework MSI/MSIX

David Gibson (13):
  savevm: Implement VMS_DIVIDE flag
  target-ppc: Convert ppc cpu savevm to VMStateDescription
  pseries: savevm support for XICS interrupt controller
  pseries: savevm support for VIO devices
  pseries: savevm support for PAPR VIO logical lan
  pseries: savevm support for PAPR VIO logical tty
  pseries: savevm support for PAPR TCE tables
  pseries: savevm support for PAPR virtual SCSI
  pseries: savevm support for pseries machine
  pseries: savevm support for PCI host bridge
  target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
  pseries: Support for in-kernel XICS interrupt controller
  pseries: savevm support with KVM

Prerna Saxena (2):
  target-ppc: Add POWER8 v1.0 CPU model
  target-ppc: Enhance the CPU node labels for the guest device tree for
    pseries.

 default-configs/ppc64-softmmu.mak |   2 +
 hw/char/spapr_vty.c               |  16 ++
 hw/intc/Makefile.objs             |   2 +
 hw/{ppc => intc}/xics.c           | 172 ++++++++----
 hw/intc/xics_kvm.c                | 445 +++++++++++++++++++++++++++++++
 hw/net/spapr_llan.c               |  24 +-
 hw/ppc/Makefile.objs              |   2 +-
 hw/ppc/spapr.c                    | 435 ++++++++++++++++++++++++++++++-
 hw/ppc/spapr_hcall.c              |   8 +-
 hw/ppc/spapr_iommu.c              |  25 ++
 hw/ppc/spapr_pci.c                | 141 ++++++----
 hw/ppc/spapr_vio.c                |  20 ++
 hw/scsi/spapr_vscsi.c             | 306 +++++++++++++++-------
 include/hw/pci-host/spapr.h       |  14 +-
 include/hw/ppc/spapr.h            |  17 +-
 include/hw/ppc/spapr_vio.h        |   5 +
 include/hw/ppc/xics.h             |  72 ++++-
 include/migration/vmstate.h       |  13 +
 savevm.c                          |   8 +
 target-ppc/cpu-models.c           |   3 +
 target-ppc/cpu-models.h           |   1 +
 target-ppc/cpu-qom.h              |   5 +
 target-ppc/cpu.h                  |   8 +-
 target-ppc/kvm.c                  |  83 ++++++
 target-ppc/kvm_ppc.h              |  29 +++
 target-ppc/machine.c              | 533 +++++++++++++++++++++++++++++++-------
 target-ppc/translate_init.c       |  64 +++++
 27 files changed, 2131 insertions(+), 322 deletions(-)
 rename hw/{ppc => intc}/xics.c (80%)
 create mode 100644 hw/intc/xics_kvm.c

-- 
1.8.3.2


* [Qemu-devel] [PATCH 01/19] pseries: move interrupt controllers to hw/intc/
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
@ 2013-07-06 13:53 ` Alexey Kardashevskiy
  2013-07-06 13:53 ` [Qemu-devel] [PATCH 02/19] pseries: rework XICS Alexey Kardashevskiy
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paul Mackerras, David Gibson

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Andreas Färber <afaerber@suse.de>
---
 default-configs/ppc64-softmmu.mak | 1 +
 hw/intc/Makefile.objs             | 1 +
 hw/{ppc => intc}/xics.c           | 0
 hw/ppc/Makefile.objs              | 2 +-
 4 files changed, 3 insertions(+), 1 deletion(-)
 rename hw/{ppc => intc}/xics.c (100%)

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index cb279cb..69a9f8d 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -47,5 +47,6 @@ CONFIG_E500=y
 CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
 # For pSeries
 CONFIG_PCI_HOTPLUG=y
+CONFIG_XICS=$(CONFIG_PSERIES)
 # For PReP
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index 2ba49d0..abe8f80 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -22,3 +22,4 @@ obj-$(CONFIG_OMAP) += omap_intc.o
 obj-$(CONFIG_OPENPIC) += openpic.o
 obj-$(CONFIG_OPENPIC_KVM) += openpic_kvm.o
 obj-$(CONFIG_SH4) += sh_intc.o
+obj-$(CONFIG_XICS) += xics.o
diff --git a/hw/ppc/xics.c b/hw/intc/xics.c
similarity index 100%
rename from hw/ppc/xics.c
rename to hw/intc/xics.c
diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index be00d1d..7a1cd5d 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -1,7 +1,7 @@
 # shared objects
 obj-y += ppc.o ppc_booke.o
 # IBM pSeries (sPAPR)
-obj-$(CONFIG_PSERIES) += spapr.o xics.o spapr_vio.o spapr_events.o
+obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
 obj-$(CONFIG_PSERIES) += spapr_pci.o
 # PowerPC 4xx boards
-- 
1.8.3.2


* [Qemu-devel] [PATCH 02/19] pseries: rework XICS
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
  2013-07-06 13:53 ` [Qemu-devel] [PATCH 01/19] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
@ 2013-07-06 13:53 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 03/19] savevm: Implement VMS_DIVIDE flag Alexey Kardashevskiy
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paul Mackerras, David Gibson

Currently the XICS interrupt controller is not a QEMU device. As we are
going to support an in-kernel emulated XICS which is a part of KVM, it
makes sense not to extend the existing XICS with multiple KVM stub
functions but to create another device and share pieces between the
fully emulated XICS and the in-kernel XICS.

The rework includes:
* port to QOM
* make a few functions public for use from the in-kernel XICS implementation
* make VMStateDescription public for use in in-kernel XICS migration
* move xics_system_init() to spapr.c; it tries creating the fully emulated
  XICS now and will try the in-kernel XICS in upcoming patches.
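
The sharing described above can be pictured with a small standalone
sketch. It is not the code from this patch: every name prefixed with
sketch_ is hypothetical, the types are simplified stand-ins for the QEMU
ones, and the "in-kernel" handler only marks the source instead of talking
to KVM. It only illustrates the idea that a common init routine takes the
backend-specific IRQ handler as a parameter, much like xics_common_init()
takes a qemu_irq_handler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the QEMU types; sketch only. */
typedef void (*sketch_irq_handler)(void *opaque, int irq, int level);

struct sketch_ics {
    uint32_t nr_irqs;
    uint8_t *status;            /* one status byte per interrupt source */
    sketch_irq_handler handler; /* backend-specific set_irq callback */
};

/*
 * Shared part: allocate the per-source state and record which backend
 * handler to use. Returns 0 on success, -1 on allocation failure.
 */
static int sketch_common_init(struct sketch_ics *ics, uint32_t nr_irqs,
                              sketch_irq_handler handler)
{
    ics->status = calloc(nr_irqs, sizeof(*ics->status));
    if (!ics->status) {
        return -1;
    }
    ics->nr_irqs = nr_irqs;
    ics->handler = handler;
    return 0;
}

/* Backend 1: a fully emulated controller tracks the level itself. */
static void sketch_emulated_set_irq(void *opaque, int irq, int level)
{
    struct sketch_ics *ics = opaque;

    ics->status[irq] = level ? 1 : 0;
}

/*
 * Backend 2: an in-kernel controller would forward the line to the
 * kernel; this sketch only records that it would have been forwarded.
 */
static void sketch_kernel_set_irq(void *opaque, int irq, int level)
{
    struct sketch_ics *ics = opaque;

    ics->status[irq] = level ? 2 : 0;
}

/* Common entry point: raise a source through whichever backend is set. */
static void sketch_raise(struct sketch_ics *ics, int irq)
{
    assert(irq >= 0 && (uint32_t)irq < ics->nr_irqs);
    ics->handler(ics, irq, 1);
}
```

Two controllers initialised through sketch_common_init() with different
handlers then share allocation and dispatch code while differing only in
what raising an interrupt does.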

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/intc/xics.c        | 109 ++++++++++++++++++++++++++------------------------
 hw/ppc/spapr.c        |  28 +++++++++++++
 include/hw/ppc/xics.h |  59 +++++++++++++++++++++++++--
 3 files changed, 141 insertions(+), 55 deletions(-)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 091912e..0e374c8 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -34,13 +34,6 @@
  * ICP: Presentation layer
  */
 
-struct icp_server_state {
-    uint32_t xirr;
-    uint8_t pending_priority;
-    uint8_t mfrr;
-    qemu_irq output;
-};
-
 #define XISR_MASK  0x00ffffff
 #define CPPR_MASK  0xff000000
 
@@ -49,12 +42,6 @@ struct icp_server_state {
 
 struct ics_state;
 
-struct icp_state {
-    long nr_servers;
-    struct icp_server_state *ss;
-    struct ics_state *ics;
-};
-
 static void ics_reject(struct ics_state *ics, int nr);
 static void ics_resend(struct ics_state *ics);
 static void ics_eoi(struct ics_state *ics, int nr);
@@ -171,27 +158,6 @@ static void icp_irq(struct icp_state *icp, int server, int nr, uint8_t priority)
 /*
  * ICS: Source layer
  */
-
-struct ics_irq_state {
-    int server;
-    uint8_t priority;
-    uint8_t saved_priority;
-#define XICS_STATUS_ASSERTED           0x1
-#define XICS_STATUS_SENT               0x2
-#define XICS_STATUS_REJECTED           0x4
-#define XICS_STATUS_MASKED_PENDING     0x8
-    uint8_t status;
-};
-
-struct ics_state {
-    int nr_irqs;
-    int offset;
-    qemu_irq *qirqs;
-    bool *islsi;
-    struct ics_irq_state *irqs;
-    struct icp_state *icp;
-};
-
 static int ics_valid_irq(struct ics_state *ics, uint32_t nr)
 {
     return (nr >= ics->offset)
@@ -506,9 +472,8 @@ static void rtas_int_on(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     rtas_st(rets, 0, 0); /* Success */
 }
 
-static void xics_reset(void *opaque)
+void xics_common_reset(struct icp_state *icp)
 {
-    struct icp_state *icp = (struct icp_state *)opaque;
     struct ics_state *ics = icp->ics;
     int i;
 
@@ -527,7 +492,12 @@ static void xics_reset(void *opaque)
     }
 }
 
-void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
+static void xics_reset(DeviceState *d)
+{
+    xics_common_reset(XICS(d));
+}
+
+void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 {
     CPUState *cs = CPU(cpu);
     CPUPPCState *env = &cpu->env;
@@ -551,37 +521,72 @@ void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
     }
 }
 
-struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
+void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 {
-    struct icp_state *icp;
-    struct ics_state *ics;
+    xics_common_cpu_setup(icp, cpu);
+}
+
+void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
+{
+    struct ics_state *ics = icp->ics;
 
-    icp = g_malloc0(sizeof(*icp));
-    icp->nr_servers = nr_servers;
     icp->ss = g_malloc0(icp->nr_servers*sizeof(struct icp_server_state));
 
     ics = g_malloc0(sizeof(*ics));
-    ics->nr_irqs = nr_irqs;
+    ics->nr_irqs = icp->nr_irqs;
     ics->offset = XICS_IRQ_BASE;
-    ics->irqs = g_malloc0(nr_irqs * sizeof(struct ics_irq_state));
-    ics->islsi = g_malloc0(nr_irqs * sizeof(bool));
+    ics->irqs = g_malloc0(ics->nr_irqs * sizeof(struct ics_irq_state));
+    ics->islsi = g_malloc0(ics->nr_irqs * sizeof(bool));
 
     icp->ics = ics;
     ics->icp = icp;
 
-    ics->qirqs = qemu_allocate_irqs(ics_set_irq, ics, nr_irqs);
+    ics->qirqs = qemu_allocate_irqs(handler, ics, ics->nr_irqs);
+}
 
-    spapr_register_hypercall(H_CPPR, h_cppr);
-    spapr_register_hypercall(H_IPI, h_ipi);
-    spapr_register_hypercall(H_XIRR, h_xirr);
-    spapr_register_hypercall(H_EOI, h_eoi);
+static void xics_realize(DeviceState *dev, Error **errp)
+{
+    struct icp_state *icp = XICS(dev);
+
+    xics_common_init(icp, ics_set_irq);
 
     spapr_rtas_register("ibm,set-xive", rtas_set_xive);
     spapr_rtas_register("ibm,get-xive", rtas_get_xive);
     spapr_rtas_register("ibm,int-off", rtas_int_off);
     spapr_rtas_register("ibm,int-on", rtas_int_on);
 
-    qemu_register_reset(xics_reset, icp);
+}
+
+static Property xics_properties[] = {
+    DEFINE_PROP_UINT32("nr_servers", struct icp_state, nr_servers, -1),
+    DEFINE_PROP_UINT32("nr_irqs", struct icp_state, nr_irqs, -1),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xics_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = xics_realize;
+    dc->props = xics_properties;
+    dc->reset = xics_reset;
+}
+
+static const TypeInfo xics_info = {
+    .name          = TYPE_XICS,
+    .parent        = TYPE_SYS_BUS_DEVICE,
+    .instance_size = sizeof(struct icp_state),
+    .class_init    = xics_class_init,
+};
 
-    return icp;
+static void xics_register_types(void)
+{
+    spapr_register_hypercall(H_CPPR, h_cppr);
+    spapr_register_hypercall(H_IPI, h_ipi);
+    spapr_register_hypercall(H_XIRR, h_xirr);
+    spapr_register_hypercall(H_EOI, h_eoi);
+
+    type_register_static(&xics_info);
 }
+
+type_init(xics_register_types)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fe34291..d8f1614 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -719,6 +719,34 @@ static int spapr_vga_init(PCIBus *pci_bus)
     }
 }
 
+static struct icp_state *try_create_xics(const char *type, int nr_servers,
+                                         int nr_irqs)
+{
+    DeviceState *dev;
+
+    dev = qdev_create(NULL, type);
+    qdev_prop_set_uint32(dev, "nr_servers", nr_servers);
+    qdev_prop_set_uint32(dev, "nr_irqs", nr_irqs);
+    if (qdev_init(dev) < 0) {
+        return NULL;
+    }
+
+    return XICS(dev);
+}
+
+static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
+{
+    struct icp_state *icp = NULL;
+
+    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
+    if (!icp) {
+        perror("Failed to create XICS\n");
+        abort();
+    }
+
+    return icp;
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(QEMUMachineInitArgs *args)
 {
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 6bce042..3f72806 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -27,15 +27,68 @@
 #if !defined(__XICS_H__)
 #define __XICS_H__
 
+#include "hw/sysbus.h"
+
+#define TYPE_XICS "xics"
+#define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
+
 #define XICS_IPI        0x2
-#define XICS_IRQ_BASE   0x10
+#define XICS_BUID       0x1
+#define XICS_IRQ_BASE   (XICS_BUID << 12)
 
-struct icp_state;
+/*
+ * We currently only support one BUID which is our interrupt base
+ * (the kernel implementation supports more but we don't exploit
+ *  that yet)
+ */
+
+struct icp_state {
+    /*< private >*/
+    SysBusDevice parent_obj;
+    /*< public >*/
+    uint32_t nr_servers;
+    uint32_t nr_irqs;
+    struct icp_server_state *ss;
+    struct ics_state *ics;
+};
+
+struct icp_server_state {
+    uint32_t xirr;
+    uint8_t pending_priority;
+    uint8_t mfrr;
+    qemu_irq output;
+};
+
+struct ics_state {
+    uint32_t nr_irqs;
+    uint32_t offset;
+    qemu_irq *qirqs;
+    bool *islsi;
+    struct ics_irq_state *irqs;
+    struct icp_state *icp;
+};
+
+struct ics_irq_state {
+    uint32_t server;
+    uint8_t priority;
+    uint8_t saved_priority;
+#define XICS_STATUS_ASSERTED           0x1
+#define XICS_STATUS_SENT               0x2
+#define XICS_STATUS_REJECTED           0x4
+#define XICS_STATUS_MASKED_PENDING     0x8
+    uint8_t status;
+};
 
 qemu_irq xics_get_qirq(struct icp_state *icp, int irq);
 void xics_set_irq_type(struct icp_state *icp, int irq, bool lsi);
 
-struct icp_state *xics_system_init(int nr_servers, int nr_irqs);
+void xics_common_init(struct icp_state *icp, qemu_irq_handler handler);
+void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
+void xics_common_reset(struct icp_state *icp);
+
 void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
 
+extern const VMStateDescription vmstate_icp_server;
+extern const VMStateDescription vmstate_ics;
+
 #endif /* __XICS_H__ */
-- 
1.8.3.2


* [Qemu-devel] [PATCH 03/19] savevm: Implement VMS_DIVIDE flag
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
  2013-07-06 13:53 ` [Qemu-devel] [PATCH 01/19] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
  2013-07-06 13:53 ` [Qemu-devel] [PATCH 02/19] pseries: rework XICS Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 04/19] target-ppc: Convert ppc cpu savevm to VMStateDescription Alexey Kardashevskiy
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The vmstate infrastructure includes a VMS_MULTIPLY flag, and associated
VMSTATE_VBUFFER_MULTIPLY helper macro.  These can be used to save a
variably sized buffer where the size in bytes of the buffer isn't directly
accessible as a structure field, but an element count from which the size
can be derived is.

This patch adds an analogous VMS_DIVIDE option, which handles a variably
sized buffer whose size is a submultiple of a field, rather than a
multiple.  For example a buffer containing per-page structures whose size
is derived from a field storing the total address space described by the
structures could use this construct.
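
As a rough illustration of the size derivation described above, here is a
standalone sketch. It is not the vmstate code itself: the flag values and
the sketch_buffer_size() helper are hypothetical and exist only for this
example. It mirrors the idea that the value read from the state field is
either multiplied or divided by a constant stored in the field
description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only; not QEMU's enum. */
enum sketch_flags {
    SKETCH_MULTIPLY = 1 << 0,
    SKETCH_DIVIDE   = 1 << 1,
};

/*
 * Derive the buffer size in bytes from a value read out of a state
 * field ("field_value") and a constant factor kept in the field
 * description ("factor").
 */
static size_t sketch_buffer_size(uint32_t field_value, size_t factor,
                                 int flags)
{
    size_t size = field_value;

    if (flags & SKETCH_MULTIPLY) {
        size *= factor;
    }
    if (flags & SKETCH_DIVIDE) {
        /* The field must be an exact multiple of the divisor. */
        assert(factor != 0 && size % factor == 0);
        size /= factor;
    }
    return size;
}
```

For instance, a field holding a 1 MiB window size with a divisor of 4096
yields a 256 byte buffer, while an element count of 16 with a multiplier
of 8 yields 128 bytes.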

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 include/migration/vmstate.h | 13 +++++++++++++
 savevm.c                    |  8 ++++++++
 2 files changed, 21 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 1c31b5d..672b0a7 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -100,6 +100,7 @@ enum VMStateFlags {
     VMS_MULTIPLY         = 0x200,  /* multiply "size" field by field_size */
     VMS_VARRAY_UINT8     = 0x400,  /* Array with size in uint8_t field*/
     VMS_VARRAY_UINT32    = 0x800,  /* Array with size in uint32_t field*/
+    VMS_DIVIDE           = 0x1000, /* divide "size" field by field_size */
 };
 
 typedef struct {
@@ -422,6 +423,18 @@ extern const VMStateInfo vmstate_info_bitmap;
     .start        = (_start),                                        \
 }
 
+#define VMSTATE_VBUFFER_DIVIDE(_field, _state, _version, _test, _start, _field_size, _divide) { \
+    .name         = (stringify(_field)),                             \
+    .version_id   = (_version),                                      \
+    .field_exists = (_test),                                         \
+    .size_offset  = vmstate_offset_value(_state, _field_size, uint32_t),\
+    .size         = (_divide),                                       \
+    .info         = &vmstate_info_buffer,                            \
+    .flags        = VMS_VBUFFER|VMS_POINTER|VMS_DIVIDE,              \
+    .offset       = offsetof(_state, _field),                        \
+    .start        = (_start),                                        \
+}
+
 #define VMSTATE_VBUFFER(_field, _state, _version, _test, _start, _field_size) { \
     .name         = (stringify(_field)),                             \
     .version_id   = (_version),                                      \
diff --git a/savevm.c b/savevm.c
index e0491e7..788af85 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1700,6 +1700,10 @@ int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                 if (field->flags & VMS_MULTIPLY) {
                     size *= field->size;
                 }
+                if (field->flags & VMS_DIVIDE) {
+                    assert((size % field->size) == 0);
+                    size /= field->size;
+                }
             }
             if (field->flags & VMS_ARRAY) {
                 n_elems = field->num;
@@ -1764,6 +1768,10 @@ void vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
                 if (field->flags & VMS_MULTIPLY) {
                     size *= field->size;
                 }
+                if (field->flags & VMS_DIVIDE) {
+                    assert((size % field->size) == 0);
+                    size /= field->size;
+                }
             }
             if (field->flags & VMS_ARRAY) {
                 n_elems = field->num;
-- 
1.8.3.2


* [Qemu-devel] [PATCH 04/19] target-ppc: Convert ppc cpu savevm to VMStateDescription
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (2 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 03/19] savevm: Implement VMS_DIVIDE flag Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 05/19] pseries: savevm support for XICS interrupt controller Alexey Kardashevskiy
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

The savevm code for the powerpc cpu emulation is currently based around
the old register_savevm() method rather than vmstate_register().  It's also
rather broken, missing some important state on some CPU models.

This patch completely rewrites the savevm for target-ppc, using the new
VMStateDescription approach.  Exactly what needs to be saved in what
configurations has been more carefully examined, too.  This introduces a
new version (5) of the cpu save format.  The old load function is retained
to support version 4 images.
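
The version handling described above can be sketched in standalone form as
follows. This is only an illustration under stated assumptions: all sketch_
names, the two stream layouts and the struct are made up for the example,
and the dispatch is a simplified picture of how a loader can accept the
current format while handing older images to a retained legacy function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical version numbers matching the ones named in the text. */
#define SKETCH_VERSION_CURRENT 5
#define SKETCH_VERSION_OLD     4

/* Hypothetical, heavily reduced CPU state for this sketch. */
struct sketch_cpu {
    uint32_t msr;
    int loaded_by;  /* which loader filled in the state */
};

/* Legacy loader: in this made-up old layout the MSR is the second word. */
static int sketch_load_old(struct sketch_cpu *cpu, const uint32_t *stream)
{
    cpu->msr = stream[1];
    cpu->loaded_by = SKETCH_VERSION_OLD;
    return 0;
}

/* Current loader: in this made-up new layout the MSR comes first. */
static int sketch_load_new(struct sketch_cpu *cpu, const uint32_t *stream)
{
    cpu->msr = stream[0];
    cpu->loaded_by = SKETCH_VERSION_CURRENT;
    return 0;
}

/* Pick a loader by version; reject anything newer or older than known. */
static int sketch_load(struct sketch_cpu *cpu, const uint32_t *stream,
                       int version_id)
{
    assert(cpu && stream);

    if (version_id == SKETCH_VERSION_CURRENT) {
        return sketch_load_new(cpu, stream);
    }
    if (version_id == SKETCH_VERSION_OLD) {
        return sketch_load_old(cpu, stream);
    }
    return -1;
}
```

Keeping the legacy loader separate means the new description only has to
describe the new format, at the cost of carrying the old parsing code
until version 4 images no longer need to be supported.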

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[aik: ppc cpu savevm conversion fixed to use PowerPCCPU instead of CPUPPCState]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 target-ppc/cpu-qom.h        |   4 +
 target-ppc/cpu.h            |   8 +-
 target-ppc/machine.c        | 533 ++++++++++++++++++++++++++++++++++++--------
 target-ppc/translate_init.c |   2 +
 4 files changed, 454 insertions(+), 93 deletions(-)

diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index 84ba105..a14a3d9 100644
--- a/target-ppc/cpu-qom.h
+++ b/target-ppc/cpu-qom.h
@@ -106,4 +106,8 @@ void ppc_cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf,
 void ppc_cpu_dump_statistics(CPUState *cpu, FILE *f,
                              fprintf_function cpu_fprintf, int flags);
 
+#ifndef CONFIG_USER_ONLY
+extern const struct VMStateDescription vmstate_ppc_cpu;
+#endif
+
 #endif
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 0ede077..f30577d 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -948,7 +948,7 @@ struct CPUPPCState {
 #if defined(TARGET_PPC64)
     /* PowerPC 64 SLB area */
     ppc_slb_t slb[64];
-    int slb_nr;
+    int32_t slb_nr;
 #endif
     /* segment registers */
     hwaddr htab_base;
@@ -957,11 +957,11 @@ struct CPUPPCState {
     /* externally stored hash table */
     uint8_t *external_htab;
     /* BATs */
-    int nb_BATs;
+    uint32_t nb_BATs;
     target_ulong DBAT[2][8];
     target_ulong IBAT[2][8];
     /* PowerPC TLB registers (for 4xx, e500 and 60x software driven TLBs) */
-    int nb_tlb;      /* Total number of TLB                                  */
+    int32_t nb_tlb;      /* Total number of TLB                              */
     int tlb_per_way; /* Speed-up helper: used to avoid divisions at run time */
     int nb_ways;     /* Number of ways in the TLB set                        */
     int last_way;    /* Last used way used to allocate TLB in a LRU way      */
@@ -1176,8 +1176,6 @@ static inline CPUPPCState *cpu_init(const char *cpu_model)
 #define cpu_signal_handler cpu_ppc_signal_handler
 #define cpu_list ppc_cpu_list
 
-#define CPU_SAVE_VERSION 4
-
 /* MMU modes definitions */
 #define MMU_MODE0_SUFFIX _user
 #define MMU_MODE1_SUFFIX _kernel
diff --git a/target-ppc/machine.c b/target-ppc/machine.c
index 2d10adb..1fcc6bc 100644
--- a/target-ppc/machine.c
+++ b/target-ppc/machine.c
@@ -1,96 +1,12 @@
 #include "hw/hw.h"
 #include "hw/boards.h"
 #include "sysemu/kvm.h"
+#include "helper_regs.h"
 
-void cpu_save(QEMUFile *f, void *opaque)
+static int cpu_load_old(QEMUFile *f, void *opaque, int version_id)
 {
-    CPUPPCState *env = (CPUPPCState *)opaque;
-    unsigned int i, j;
-    uint32_t fpscr;
-    target_ulong xer;
-
-    for (i = 0; i < 32; i++)
-        qemu_put_betls(f, &env->gpr[i]);
-#if !defined(TARGET_PPC64)
-    for (i = 0; i < 32; i++)
-        qemu_put_betls(f, &env->gprh[i]);
-#endif
-    qemu_put_betls(f, &env->lr);
-    qemu_put_betls(f, &env->ctr);
-    for (i = 0; i < 8; i++)
-        qemu_put_be32s(f, &env->crf[i]);
-    xer = cpu_read_xer(env);
-    qemu_put_betls(f, &xer);
-    qemu_put_betls(f, &env->reserve_addr);
-    qemu_put_betls(f, &env->msr);
-    for (i = 0; i < 4; i++)
-        qemu_put_betls(f, &env->tgpr[i]);
-    for (i = 0; i < 32; i++) {
-        union {
-            float64 d;
-            uint64_t l;
-        } u;
-        u.d = env->fpr[i];
-        qemu_put_be64(f, u.l);
-    }
-    fpscr = env->fpscr;
-    qemu_put_be32s(f, &fpscr);
-    qemu_put_sbe32s(f, &env->access_type);
-#if defined(TARGET_PPC64)
-    qemu_put_betls(f, &env->spr[SPR_ASR]);
-    qemu_put_sbe32s(f, &env->slb_nr);
-#endif
-    qemu_put_betls(f, &env->spr[SPR_SDR1]);
-    for (i = 0; i < 32; i++)
-        qemu_put_betls(f, &env->sr[i]);
-    for (i = 0; i < 2; i++)
-        for (j = 0; j < 8; j++)
-            qemu_put_betls(f, &env->DBAT[i][j]);
-    for (i = 0; i < 2; i++)
-        for (j = 0; j < 8; j++)
-            qemu_put_betls(f, &env->IBAT[i][j]);
-    qemu_put_sbe32s(f, &env->nb_tlb);
-    qemu_put_sbe32s(f, &env->tlb_per_way);
-    qemu_put_sbe32s(f, &env->nb_ways);
-    qemu_put_sbe32s(f, &env->last_way);
-    qemu_put_sbe32s(f, &env->id_tlbs);
-    qemu_put_sbe32s(f, &env->nb_pids);
-    if (env->tlb.tlb6) {
-        // XXX assumes 6xx
-        for (i = 0; i < env->nb_tlb; i++) {
-            qemu_put_betls(f, &env->tlb.tlb6[i].pte0);
-            qemu_put_betls(f, &env->tlb.tlb6[i].pte1);
-            qemu_put_betls(f, &env->tlb.tlb6[i].EPN);
-        }
-    }
-    for (i = 0; i < 4; i++)
-        qemu_put_betls(f, &env->pb[i]);
-    for (i = 0; i < 1024; i++)
-        qemu_put_betls(f, &env->spr[i]);
-    qemu_put_be32s(f, &env->vscr);
-    qemu_put_be64s(f, &env->spe_acc);
-    qemu_put_be32s(f, &env->spe_fscr);
-    qemu_put_betls(f, &env->msr_mask);
-    qemu_put_be32s(f, &env->flags);
-    qemu_put_sbe32s(f, &env->error_code);
-    qemu_put_be32s(f, &env->pending_interrupts);
-    qemu_put_be32s(f, &env->irq_input_state);
-    for (i = 0; i < POWERPC_EXCP_NB; i++)
-        qemu_put_betls(f, &env->excp_vectors[i]);
-    qemu_put_betls(f, &env->excp_prefix);
-    qemu_put_betls(f, &env->ivor_mask);
-    qemu_put_betls(f, &env->ivpr_mask);
-    qemu_put_betls(f, &env->hreset_vector);
-    qemu_put_betls(f, &env->nip);
-    qemu_put_betls(f, &env->hflags);
-    qemu_put_betls(f, &env->hflags_nmsr);
-    qemu_put_sbe32s(f, &env->mmu_idx);
-    qemu_put_sbe32(f, 0);
-}
-
-int cpu_load(QEMUFile *f, void *opaque, int version_id)
-{
-    CPUPPCState *env = (CPUPPCState *)opaque;
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
     unsigned int i, j;
     target_ulong sdr1;
     uint32_t fpscr;
@@ -177,3 +93,444 @@ int cpu_load(QEMUFile *f, void *opaque, int version_id)
 
     return 0;
 }
+
+static int get_avr(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_avr_t *v = pv;
+
+    v->u64[0] = qemu_get_be64(f);
+    v->u64[1] = qemu_get_be64(f);
+
+    return 0;
+}
+
+static void put_avr(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_avr_t *v = pv;
+
+    qemu_put_be64(f, v->u64[0]);
+    qemu_put_be64(f, v->u64[1]);
+}
+
+const VMStateInfo vmstate_info_avr = {
+    .name = "avr",
+    .get  = get_avr,
+    .put  = put_avr,
+};
+
+#define VMSTATE_AVR_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_avr, ppc_avr_t)
+
+#define VMSTATE_AVR_ARRAY(_f, _s, _n)                             \
+    VMSTATE_AVR_ARRAY_V(_f, _s, _n, 0)
+
+static void cpu_pre_save(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+    int i;
+
+    env->spr[SPR_LR] = env->lr;
+    env->spr[SPR_CTR] = env->ctr;
+    env->spr[SPR_XER] = env->xer;
+#if defined(TARGET_PPC64)
+    env->spr[SPR_CFAR] = env->cfar;
+#endif
+    env->spr[SPR_BOOKE_SPEFSCR] = env->spe_fscr;
+
+    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
+        env->spr[SPR_DBAT0U + 2*i] = env->DBAT[0][i];
+        env->spr[SPR_DBAT0U + 2*i + 1] = env->DBAT[1][i];
+        env->spr[SPR_IBAT0U + 2*i] = env->IBAT[0][i];
+        env->spr[SPR_IBAT0U + 2*i + 1] = env->IBAT[1][i];
+    }
+    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
+        env->spr[SPR_DBAT4U + 2*i] = env->DBAT[0][i+4];
+        env->spr[SPR_DBAT4U + 2*i + 1] = env->DBAT[1][i+4];
+        env->spr[SPR_IBAT4U + 2*i] = env->IBAT[0][i+4];
+        env->spr[SPR_IBAT4U + 2*i + 1] = env->IBAT[1][i+4];
+    }
+}
+
+static int cpu_post_load(void *opaque, int version_id)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+    int i;
+
+    env->lr = env->spr[SPR_LR];
+    env->ctr = env->spr[SPR_CTR];
+    env->xer = env->spr[SPR_XER];
+#if defined(TARGET_PPC64)
+    env->cfar = env->spr[SPR_CFAR];
+#endif
+    env->spe_fscr = env->spr[SPR_BOOKE_SPEFSCR];
+
+    for (i = 0; (i < 4) && (i < env->nb_BATs); i++) {
+        env->DBAT[0][i] = env->spr[SPR_DBAT0U + 2*i];
+        env->DBAT[1][i] = env->spr[SPR_DBAT0U + 2*i + 1];
+        env->IBAT[0][i] = env->spr[SPR_IBAT0U + 2*i];
+        env->IBAT[1][i] = env->spr[SPR_IBAT0U + 2*i + 1];
+    }
+    for (i = 0; (i < 4) && ((i+4) < env->nb_BATs); i++) {
+        env->DBAT[0][i+4] = env->spr[SPR_DBAT4U + 2*i];
+        env->DBAT[1][i+4] = env->spr[SPR_DBAT4U + 2*i + 1];
+        env->IBAT[0][i+4] = env->spr[SPR_IBAT4U + 2*i];
+        env->IBAT[1][i+4] = env->spr[SPR_IBAT4U + 2*i + 1];
+    }
+
+    /* Restore htab_base and htab_mask variables */
+    ppc_store_sdr1(env, env->spr[SPR_SDR1]);
+
+    hreg_compute_hflags(env);
+    hreg_compute_mem_idx(env);
+
+    return 0;
+}
+
+static bool fpu_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    return (cpu->env.insns_flags & PPC_FLOAT);
+}
+
+static const VMStateDescription vmstate_fpu = {
+    .name = "cpu/fpu",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_FLOAT64_ARRAY(env.fpr, PowerPCCPU, 32),
+        VMSTATE_UINTTL(env.fpscr, PowerPCCPU),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool altivec_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    return (cpu->env.insns_flags & PPC_ALTIVEC);
+}
+
+static const VMStateDescription vmstate_altivec = {
+    .name = "cpu/altivec",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_AVR_ARRAY(env.avr, PowerPCCPU, 32),
+        VMSTATE_UINT32(env.vscr, PowerPCCPU),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool vsx_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    return (cpu->env.insns_flags2 & PPC2_VSX);
+}
+
+static const VMStateDescription vmstate_vsx = {
+    .name = "cpu/vsx",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64_ARRAY(env.vsr, PowerPCCPU, 32),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool sr_needed(void *opaque)
+{
+#ifdef TARGET_PPC64
+    PowerPCCPU *cpu = opaque;
+
+    return !(cpu->env.mmu_model & POWERPC_MMU_64);
+#else
+    return true;
+#endif
+}
+
+static const VMStateDescription vmstate_sr = {
+    .name = "cpu/sr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINTTL_ARRAY(env.sr, PowerPCCPU, 32),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+#ifdef TARGET_PPC64
+static int get_slbe(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_slb_t *v = pv;
+
+    v->esid = qemu_get_be64(f);
+    v->vsid = qemu_get_be64(f);
+
+    return 0;
+}
+
+static void put_slbe(QEMUFile *f, void *pv, size_t size)
+{
+    ppc_slb_t *v = pv;
+
+    qemu_put_be64(f, v->esid);
+    qemu_put_be64(f, v->vsid);
+}
+
+const VMStateInfo vmstate_info_slbe = {
+    .name = "slbe",
+    .get  = get_slbe,
+    .put  = put_slbe,
+};
+
+#define VMSTATE_SLB_ARRAY_V(_f, _s, _n, _v)                       \
+    VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_slbe, ppc_slb_t)
+
+#define VMSTATE_SLB_ARRAY(_f, _s, _n)                             \
+    VMSTATE_SLB_ARRAY_V(_f, _s, _n, 0)
+
+static bool slb_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+
+    /* We don't support any of the old segment table based 64-bit CPUs */
+    return (cpu->env.mmu_model & POWERPC_MMU_64);
+}
+
+static const VMStateDescription vmstate_slb = {
+    .name = "cpu/slb",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.slb_nr, PowerPCCPU),
+        VMSTATE_SLB_ARRAY(env.slb, PowerPCCPU, 64),
+        VMSTATE_END_OF_LIST()
+    }
+};
+#endif /* TARGET_PPC64 */
+
+static const VMStateDescription vmstate_tlb6xx_entry = {
+    .name = "cpu/tlb6xx_entry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINTTL(pte0, ppc6xx_tlb_t),
+        VMSTATE_UINTTL(pte1, ppc6xx_tlb_t),
+        VMSTATE_UINTTL(EPN, ppc6xx_tlb_t),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool tlb6xx_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+
+    return env->nb_tlb && (env->tlb_type == TLB_6XX);
+}
+
+static const VMStateDescription vmstate_tlb6xx = {
+    .name = "cpu/tlb6xx",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
+        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlb6, PowerPCCPU,
+                                            env.nb_tlb,
+                                            vmstate_tlb6xx_entry,
+                                            ppc6xx_tlb_t),
+        VMSTATE_UINTTL_ARRAY(env.tgpr, PowerPCCPU, 4),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_tlbemb_entry = {
+    .name = "cpu/tlbemb_entry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64(RPN, ppcemb_tlb_t),
+        VMSTATE_UINTTL(EPN, ppcemb_tlb_t),
+        VMSTATE_UINTTL(PID, ppcemb_tlb_t),
+        VMSTATE_UINTTL(size, ppcemb_tlb_t),
+        VMSTATE_UINT32(prot, ppcemb_tlb_t),
+        VMSTATE_UINT32(attr, ppcemb_tlb_t),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool tlbemb_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+
+    return env->nb_tlb && (env->tlb_type == TLB_EMB);
+}
+
+static bool pbr403_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    uint32_t pvr = cpu->env.spr[SPR_PVR];
+
+    return (pvr & 0xffff0000) == 0x00200000;
+}
+
+static const VMStateDescription vmstate_pbr403 = {
+    .name = "cpu/pbr403",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINTTL_ARRAY(env.pb, PowerPCCPU, 4),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_tlbemb = {
+    .name = "cpu/tlbemb",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
+        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbe, PowerPCCPU,
+                                            env.nb_tlb,
+                                            vmstate_tlbemb_entry,
+                                            ppcemb_tlb_t),
+        /* 403 protection registers */
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (VMStateSubsection []) {
+        {
+            .vmsd = &vmstate_pbr403,
+            .needed = pbr403_needed,
+        } , {
+            /* empty */
+        }
+    }
+};
+
+static const VMStateDescription vmstate_tlbmas_entry = {
+    .name = "cpu/tlbmas_entry",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(mas8, ppcmas_tlb_t),
+        VMSTATE_UINT32(mas1, ppcmas_tlb_t),
+        VMSTATE_UINT64(mas2, ppcmas_tlb_t),
+        VMSTATE_UINT64(mas7_3, ppcmas_tlb_t),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static bool tlbmas_needed(void *opaque)
+{
+    PowerPCCPU *cpu = opaque;
+    CPUPPCState *env = &cpu->env;
+
+    return env->nb_tlb && (env->tlb_type == TLB_MAS);
+}
+
+static const VMStateDescription vmstate_tlbmas = {
+    .name = "cpu/tlbmas",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_INT32_EQUAL(env.nb_tlb, PowerPCCPU),
+        VMSTATE_STRUCT_VARRAY_POINTER_INT32(env.tlb.tlbm, PowerPCCPU,
+                                            env.nb_tlb,
+                                            vmstate_tlbmas_entry,
+                                            ppcmas_tlb_t),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+const VMStateDescription vmstate_ppc_cpu = {
+    .name = "cpu",
+    .version_id = 5,
+    .minimum_version_id = 5,
+    .minimum_version_id_old = 4,
+    .load_state_old = cpu_load_old,
+    .pre_save = cpu_pre_save,
+    .post_load = cpu_post_load,
+    .fields      = (VMStateField []) {
+        /* Verify we haven't changed the pvr */
+        VMSTATE_UINTTL_EQUAL(env.spr[SPR_PVR], PowerPCCPU),
+
+        /* User mode architected state */
+        VMSTATE_UINTTL_ARRAY(env.gpr, PowerPCCPU, 32),
+#if !defined(TARGET_PPC64)
+        VMSTATE_UINTTL_ARRAY(env.gprh, PowerPCCPU, 32),
+#endif
+        VMSTATE_UINT32_ARRAY(env.crf, PowerPCCPU, 8),
+        VMSTATE_UINTTL(env.nip, PowerPCCPU),
+
+        /* SPRs */
+        VMSTATE_UINTTL_ARRAY(env.spr, PowerPCCPU, 1024),
+        VMSTATE_UINT64(env.spe_acc, PowerPCCPU),
+
+        /* Reservation */
+        VMSTATE_UINTTL(env.reserve_addr, PowerPCCPU),
+
+        /* Supervisor mode architected state */
+        VMSTATE_UINTTL(env.msr, PowerPCCPU),
+
+        /* Internal state */
+        VMSTATE_UINTTL(env.hflags_nmsr, PowerPCCPU),
+        /* FIXME: access_type? */
+
+        /* Sanity checking */
+        VMSTATE_UINTTL_EQUAL(env.msr_mask, PowerPCCPU),
+        VMSTATE_UINT64_EQUAL(env.insns_flags, PowerPCCPU),
+        VMSTATE_UINT64_EQUAL(env.insns_flags2, PowerPCCPU),
+        VMSTATE_UINT32_EQUAL(env.nb_BATs, PowerPCCPU),
+        VMSTATE_END_OF_LIST()
+    },
+    .subsections = (VMStateSubsection []) {
+        {
+            .vmsd = &vmstate_fpu,
+            .needed = fpu_needed,
+        } , {
+            .vmsd = &vmstate_altivec,
+            .needed = altivec_needed,
+        } , {
+            .vmsd = &vmstate_vsx,
+            .needed = vsx_needed,
+        } , {
+            .vmsd = &vmstate_sr,
+            .needed = sr_needed,
+        } , {
+#ifdef TARGET_PPC64
+            .vmsd = &vmstate_slb,
+            .needed = slb_needed,
+        } , {
+#endif /* TARGET_PPC64 */
+            .vmsd = &vmstate_tlb6xx,
+            .needed = tlb6xx_needed,
+        } , {
+            .vmsd = &vmstate_tlbemb,
+            .needed = tlbemb_needed,
+        } , {
+            .vmsd = &vmstate_tlbmas,
+            .needed = tlbmas_needed,
+        } , {
+            /* FIXME: DCRs? */
+            /* FIXME: timebase? */
+            /* empty */
+        }
+    }
+};
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 50e0ee5..02f3825 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8295,6 +8295,8 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
     cc->do_interrupt = ppc_cpu_do_interrupt;
     cc->dump_state = ppc_cpu_dump_state;
     cc->dump_statistics = ppc_cpu_dump_statistics;
+
+    cpu_class_set_vmsd(cc, &vmstate_ppc_cpu);
 }
 
 static const TypeInfo ppc_cpu_type_info = {
-- 
1.8.3.2


* [Qemu-devel] [PATCH 05/19] pseries: savevm support for XICS interrupt controller
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (3 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 04/19] target-ppc: Convert ppc cpu savevm to VMStateDescription Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 06/19] pseries: savevm support for VIO devices Alexey Kardashevskiy
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary VMStateDescription information to support
savevm/loadvm for the XICS interrupt controller used on the pseries
machine.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[aik: added icp_resend() on post_load]

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
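Note (not part of the commit): the "sanity check" field in vmstate_ics
can be pictured with the minimal sketch below. It does not use QEMU's
VMState API; get_be32() and load_u32_equal() are hypothetical names
invented for illustration. The idea is that the value already configured
on the destination is never overwritten; the value from the stream only
has to match it, otherwise the load fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 32-bit value from a byte buffer (hypothetical helper). */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * "Equal" field: compare the incoming value with the one the destination
 * was configured with.  Returns 0 on match, -1 on mismatch.
 */
static int load_u32_equal(const uint8_t *stream, uint32_t configured)
{
    assert(stream != NULL);
    return get_be32(stream) == configured ? 0 : -1;
}
```

A destination configured with a different number of interrupts than the
source would get -1 here, which corresponds to refusing the incoming
state instead of silently loading a mismatched irqs array.
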
 hw/intc/xics.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 0e374c8..3e8f48f 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -497,6 +497,61 @@ static void xics_reset(DeviceState *d)
     xics_common_reset(XICS(d));
 }
 
+static int ics_post_load(void *opaque, int version_id)
+{
+    int i;
+    struct ics_state *ics = opaque;
+
+    for (i = 0; i < ics->icp->nr_servers; i++) {
+        icp_resend(ics->icp, i);
+    }
+
+    return 0;
+}
+
+const VMStateDescription vmstate_icp_server = {
+    .name = "icp/server",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        /* ICP state */
+        VMSTATE_UINT32(xirr, struct icp_server_state),
+        VMSTATE_UINT8(pending_priority, struct icp_server_state),
+        VMSTATE_UINT8(mfrr, struct icp_server_state),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_ics_irq = {
+    .name = "ics/irq",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(server, struct ics_irq_state),
+        VMSTATE_UINT8(priority, struct ics_irq_state),
+        VMSTATE_UINT8(saved_priority, struct ics_irq_state),
+        VMSTATE_UINT8(status, struct ics_irq_state),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+const VMStateDescription vmstate_ics = {
+    .name = "ics",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .post_load = ics_post_load,
+    .fields      = (VMStateField []) {
+        /* Sanity check */
+        VMSTATE_UINT32_EQUAL(nr_irqs, struct ics_state),
+
+        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(irqs, struct ics_state, nr_irqs, vmstate_ics_irq, struct ics_irq_state),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 {
     CPUState *cs = CPU(cpu);
@@ -523,7 +578,11 @@ void xics_common_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 
 void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
 {
+    CPUState *cs = CPU(cpu);
+    struct icp_server_state *ss = &icp->ss[cs->cpu_index];
+
     xics_common_cpu_setup(icp, cpu);
+    vmstate_register(NULL, cs->cpu_index, &vmstate_icp_server, ss);
 }
 
 void xics_common_init(struct icp_state *icp, qemu_irq_handler handler)
@@ -555,6 +614,10 @@ static void xics_realize(DeviceState *dev, Error **errp)
     spapr_rtas_register("ibm,int-off", rtas_int_off);
     spapr_rtas_register("ibm,int-on", rtas_int_on);
 
+    /* We use each ICS's offset into the global irq number space
+     * as an instance id.  This means we can extend to multiple ICS
+     * instances without needing to change the savevm format */
+    vmstate_register(NULL, icp->ics->offset, &vmstate_ics, icp->ics);
 }
 
 static Property xics_properties[] = {
-- 
1.8.3.2


* [Qemu-devel] [PATCH 06/19] pseries: savevm support for VIO devices
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (4 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 05/19] pseries: savevm support for XICS interrupt controller Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 07/19] pseries: savevm support for PAPR VIO logical lan Alexey Kardashevskiy
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds helpers to allow PAPR VIO devices to save state common
to all VIO devices during savevm.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
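Note (not part of the commit): a minimal sketch of what "state common to
all VIO devices" means for the stream layout, assuming the common part is
simply written first and the device-specific fields follow. The structs
and helpers below are hypothetical stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the common VIO state and one device. */
struct common_state { uint32_t qsize; uint32_t qnext; };
struct tty_device { struct common_state common; uint32_t in; uint32_t out; };

/* Append a big-endian 32-bit value at pos; returns the new position. */
static size_t put_be32(uint8_t *buf, size_t pos, uint32_t v)
{
    buf[pos] = (uint8_t)(v >> 24);
    buf[pos + 1] = (uint8_t)(v >> 16);
    buf[pos + 2] = (uint8_t)(v >> 8);
    buf[pos + 3] = (uint8_t)v;
    return pos + 4;
}

/* Shared helper: every device type saves the common part the same way. */
static size_t save_common(uint8_t *buf, size_t pos,
                          const struct common_state *c)
{
    pos = put_be32(buf, pos, c->qsize);
    return put_be32(buf, pos, c->qnext);
}

/* Device-specific save: common state first, then the device's own fields. */
static size_t save_tty(uint8_t *buf, const struct tty_device *d)
{
    size_t pos;

    assert(buf != NULL && d != NULL);
    pos = save_common(buf, 0, &d->common);
    pos = put_be32(buf, pos, d->in);
    return put_be32(buf, pos, d->out);
}
```

Because every device embeds the common struct as its first saved field,
the common layout is defined in one place and reused by all of them.
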
 hw/ppc/spapr_vio.c         | 20 ++++++++++++++++++++
 include/hw/ppc/spapr_vio.h |  5 +++++
 2 files changed, 25 insertions(+)

diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c
index 9c18741..565d883 100644
--- a/hw/ppc/spapr_vio.c
+++ b/hw/ppc/spapr_vio.c
@@ -542,6 +542,26 @@ static const TypeInfo spapr_vio_bridge_info = {
     .class_init    = spapr_vio_bridge_class_init,
 };
 
+const VMStateDescription vmstate_spapr_vio = {
+    .name = "spapr_vio",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        /* Sanity check */
+        VMSTATE_UINT32_EQUAL(reg, VIOsPAPRDevice),
+        VMSTATE_UINT32_EQUAL(irq, VIOsPAPRDevice),
+
+        /* General VIO device state */
+        VMSTATE_UINTTL(signal_state, VIOsPAPRDevice),
+        VMSTATE_UINT64(crq.qladdr, VIOsPAPRDevice),
+        VMSTATE_UINT32(crq.qsize, VIOsPAPRDevice),
+        VMSTATE_UINT32(crq.qnext, VIOsPAPRDevice),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void vio_spapr_device_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *k = DEVICE_CLASS(klass);
diff --git a/include/hw/ppc/spapr_vio.h b/include/hw/ppc/spapr_vio.h
index 3609327..46edc2a 100644
--- a/include/hw/ppc/spapr_vio.h
+++ b/include/hw/ppc/spapr_vio.h
@@ -134,4 +134,9 @@ VIOsPAPRDevice *spapr_vty_get_default(VIOsPAPRBus *bus);
 
 void spapr_vio_quiesce(void);
 
+extern const VMStateDescription vmstate_spapr_vio;
+
+#define VMSTATE_SPAPR_VIO(_f, _s) \
+    VMSTATE_STRUCT(_f, _s, 0, vmstate_spapr_vio, VIOsPAPRDevice)
+
 #endif /* _HW_SPAPR_VIO_H */
-- 
1.8.3.2


* [Qemu-devel] [PATCH 07/19] pseries: savevm support for PAPR VIO logical lan
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (5 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 06/19] pseries: savevm support for VIO devices Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 08/19] pseries: savevm support for PAPR VIO logical tty Alexey Kardashevskiy
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary VMStateDescription information to support
savevm/loadvm for the spapr_llan (PAPR logical lan) device.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/net/spapr_llan.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
index 03a09f2..46f7d5f 100644
--- a/hw/net/spapr_llan.c
+++ b/hw/net/spapr_llan.c
@@ -81,9 +81,9 @@ typedef struct VIOsPAPRVLANDevice {
     VIOsPAPRDevice sdev;
     NICConf nicconf;
     NICState *nic;
-    int isopen;
+    bool isopen;
     target_ulong buf_list;
-    int add_buf_ptr, use_buf_ptr, rx_bufs;
+    uint32_t add_buf_ptr, use_buf_ptr, rx_bufs;
     target_ulong rxq_ptr;
 } VIOsPAPRVLANDevice;
 
@@ -500,6 +500,25 @@ static Property spapr_vlan_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_llan = {
+    .name = "spapr_llan",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVLANDevice),
+        /* LLAN state */
+        VMSTATE_BOOL(isopen, VIOsPAPRVLANDevice),
+        VMSTATE_UINTTL(buf_list, VIOsPAPRVLANDevice),
+        VMSTATE_UINT32(add_buf_ptr, VIOsPAPRVLANDevice),
+        VMSTATE_UINT32(use_buf_ptr, VIOsPAPRVLANDevice),
+        VMSTATE_UINT32(rx_bufs, VIOsPAPRVLANDevice),
+        VMSTATE_UINTTL(rxq_ptr, VIOsPAPRVLANDevice),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vlan_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -514,6 +533,7 @@ static void spapr_vlan_class_init(ObjectClass *klass, void *data)
     k->signal_mask = 0x1;
     dc->props = spapr_vlan_properties;
     k->rtce_window_size = 0x10000000;
+    dc->vmsd = &vmstate_spapr_llan;
 }
 
 static const TypeInfo spapr_vlan_info = {
-- 
1.8.3.2


* [Qemu-devel] [PATCH 08/19] pseries: savevm support for PAPR VIO logical tty
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (6 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 07/19] pseries: savevm support for PAPR VIO logical lan Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 09/19] pseries: savevm support for PAPR TCE tables Alexey Kardashevskiy
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary VMStateDescription information to support
savevm/loadvm for the spapr_tty (PAPR logical serial) device.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/char/spapr_vty.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/hw/char/spapr_vty.c b/hw/char/spapr_vty.c
index 2993848..a799721 100644
--- a/hw/char/spapr_vty.c
+++ b/hw/char/spapr_vty.c
@@ -142,6 +142,21 @@ static Property spapr_vty_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_vty = {
+    .name = "spapr_vty",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(sdev, VIOsPAPRVTYDevice),
+
+        VMSTATE_UINT32(in, VIOsPAPRVTYDevice),
+        VMSTATE_UINT32(out, VIOsPAPRVTYDevice),
+        VMSTATE_BUFFER(buf, VIOsPAPRVTYDevice),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vty_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -152,6 +167,7 @@ static void spapr_vty_class_init(ObjectClass *klass, void *data)
     k->dt_type = "serial";
     k->dt_compatible = "hvterm1";
     dc->props = spapr_vty_properties;
+    dc->vmsd = &vmstate_spapr_vty;
 }
 
 static const TypeInfo spapr_vty_info = {
-- 
1.8.3.2


* [Qemu-devel] [PATCH 09/19] pseries: savevm support for PAPR TCE tables
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (7 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 08/19] pseries: savevm support for PAPR VIO logical tty Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 10/19] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary VMStateDescription information to save the
state of PAPR TCE tables (that is, the PAPR specified IOMMU).

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
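Note (not part of the commit): a worked example of the size arithmetic
behind the divided buffer field, assuming the divide flag makes the saved
buffer length equal to the size field divided by the given factor. The
constants are assumptions made for illustration (4 KiB TCE pages, one
64-bit word per TCE entry); tce_table_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4 KiB TCE pages and one 64-bit word per TCE entry. */
#define TCE_PAGE_SIZE  4096u
#define TCE_ENTRY_SIZE 8u   /* stands in for sizeof(sPAPRTCE) */

/* Bytes of TCE table needed to cover a DMA window of window_size bytes. */
static uint32_t tce_table_bytes(uint32_t window_size)
{
    uint32_t divisor = TCE_PAGE_SIZE / TCE_ENTRY_SIZE;   /* 512 */

    assert(window_size % TCE_PAGE_SIZE == 0);
    return window_size / divisor;
}
```

Under these assumptions a 0x10000000-byte window needs 0x80000 bytes of
table, that is 65536 entries of 8 bytes each, so only the table and not
the window size itself determines how much data is migrated.
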
 hw/ppc/spapr_iommu.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 91bc8e4..ba1f7b6 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -112,6 +112,25 @@ static IOMMUTLBEntry spapr_tce_translate_iommu(MemoryRegion *iommu, hwaddr addr)
     };
 }
 
+static const VMStateDescription vmstate_spapr_tce_table = {
+    .name = "spapr_iommu",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        /* Sanity check */
+        VMSTATE_UINT32_EQUAL(liobn, sPAPRTCETable),
+        VMSTATE_UINT32_EQUAL(window_size, sPAPRTCETable),
+
+        /* IOMMU state */
+        VMSTATE_BOOL(bypass, sPAPRTCETable),
+        VMSTATE_VBUFFER_DIVIDE(table, sPAPRTCETable, 0, NULL, 0, window_size,
+                               SPAPR_TCE_PAGE_SIZE / sizeof(sPAPRTCE)),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static MemoryRegionIOMMUOps spapr_iommu_ops = {
     .translate = spapr_tce_translate_iommu,
 };
@@ -156,6 +175,8 @@ sPAPRTCETable *spapr_tce_new_table(uint32_t liobn, size_t window_size)
 
     QLIST_INSERT_HEAD(&spapr_tce_tables, tcet, list);
 
+    vmstate_register(NULL, tcet->liobn, &vmstate_spapr_tce_table, tcet);
+
     return tcet;
 }
 
@@ -163,6 +184,8 @@ void spapr_tce_free(sPAPRTCETable *tcet)
 {
     QLIST_REMOVE(tcet, list);
 
+    vmstate_unregister(NULL, &vmstate_spapr_tce_table, tcet);
+
     if (!kvm_enabled() ||
         (kvmppc_remove_spapr_tce(tcet->table, tcet->fd,
                                  tcet->window_size) != 0)) {
-- 
1.8.3.2


* [Qemu-devel] [PATCH 10/19] pseries: rework PAPR virtual SCSI
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (8 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 09/19] pseries: savevm support for PAPR TCE tables Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-08 11:57   ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 11/19] pseries: savevm support for " Alexey Kardashevskiy
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paul Mackerras, David Gibson

The patch reimplements handling of indirect requests in order to
simplify upcoming live migration support:
- all pointers (except SCSIRequest*) were replaced with integer
indexes and offsets;
- the DMA'ed srp_direct_buf is kept untouched (i.e. in BE format);
- vscsi_fetch_desc() is added; it is now the only place where
descriptors are fetched and byteswapped;
- vscsi_req struct fields are converted to migration-friendly types;
- many dprintf()'s are fixed.

This also removes the 'lun' field from the spapr_vscsi device's request
structure, which was assigned but never used.

[David Gibson: removed unused 'lun']
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: David Gibson <david@gibson.dropbear.id.au>
---
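Note (not part of the commit): the switch from descriptor pointers to an
index plus an offset can be sketched as below. This is a simplified model
with hypothetical names (struct desc, struct cursor, transfer()); it
copies into a local array instead of doing DMA, and it is not the
driver's code. The point is that the position in a transfer is fully
described by two integers, which can be saved and restored, whereas a
pointer into a request buffer cannot.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: a descriptor names a region inside 'mem'. */
struct desc { uint32_t start; uint32_t len; };

struct cursor {
    uint16_t num;      /* index of the current descriptor */
    uint32_t offset;   /* bytes already consumed in that descriptor */
};

/*
 * Copy up to len bytes from buf into the regions named by descs[],
 * resuming at *cur.  Returns the number of bytes transferred.
 */
static uint32_t transfer(uint8_t *mem, const struct desc *descs,
                         uint16_t ndesc, struct cursor *cur,
                         const uint8_t *buf, uint32_t len)
{
    uint32_t total = 0;

    while (len && cur->num < ndesc) {
        const struct desc *d = &descs[cur->num];
        uint32_t left, chunk;

        assert(cur->offset <= d->len);
        left = d->len - cur->offset;
        chunk = len < left ? len : left;

        memcpy(mem + d->start + cur->offset, buf, chunk);
        buf += chunk;
        len -= chunk;
        total += chunk;

        /* Update the position; move on once the descriptor is used up. */
        cur->offset += chunk;
        if (chunk == left) {
            cur->num++;
            cur->offset = 0;
        }
    }
    return total;
}
```

Two calls of three bytes each over descriptors of four bytes leave the
cursor at descriptor 1, offset 2; interrupting between the calls loses
nothing because the cursor alone records where to resume.
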
 hw/scsi/spapr_vscsi.c | 224 +++++++++++++++++++++++++++++---------------------
 1 file changed, 131 insertions(+), 93 deletions(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index e8978bf..1e93102 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -75,20 +75,19 @@ typedef struct vscsi_req {
     /* SCSI request tracking */
     SCSIRequest             *sreq;
     uint32_t                qtag; /* qemu tag != srp tag */
-    int                     lun;
-    int                     active;
-    long                    data_len;
-    int                     writing;
-    int                     senselen;
+    bool                    active;
+    uint32_t                data_len;
+    bool                    writing;
+    uint32_t                senselen;
     uint8_t                 sense[SCSI_SENSE_BUF_SIZE];
 
     /* RDMA related bits */
     uint8_t                 dma_fmt;
-    struct srp_direct_buf   ext_desc;
-    struct srp_direct_buf   *cur_desc;
-    struct srp_indirect_buf *ind_desc;
-    int                     local_desc;
-    int                     total_desc;
+    uint16_t                local_desc;
+    uint16_t                total_desc;
+    uint16_t                cdb_offset;
+    uint16_t                cur_desc_num;
+    uint16_t                cur_desc_offset;
 } vscsi_req;
 
 #define TYPE_VIO_SPAPR_VSCSI_DEVICE "spapr-vscsi"
@@ -264,93 +263,139 @@ static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req,
     return 0;
 }
 
-static inline void vscsi_swap_desc(struct srp_direct_buf *desc)
+static inline struct srp_direct_buf vscsi_swap_desc(struct srp_direct_buf desc)
 {
-    desc->va = be64_to_cpu(desc->va);
-    desc->len = be32_to_cpu(desc->len);
+    desc.va = be64_to_cpu(desc.va);
+    desc.len = be32_to_cpu(desc.len);
+    return desc;
+}
+
+static int vscsi_fetch_desc(VSCSIState *s, struct vscsi_req *req,
+                            unsigned n, unsigned buf_offset,
+                            struct srp_direct_buf *ret)
+{
+    struct srp_cmd *cmd = &req->iu.srp.cmd;
+
+    switch (req->dma_fmt) {
+    case SRP_NO_DATA_DESC: {
+        dprintf("VSCSI: no data descriptor\n");
+        return 0;
+    }
+    case SRP_DATA_DESC_DIRECT: {
+        *ret = *(struct srp_direct_buf *)(cmd->add_data + req->cdb_offset);
+        assert(req->cur_desc_num == 0);
+        dprintf("VSCSI: direct segment");
+        break;
+    }
+    case SRP_DATA_DESC_INDIRECT: {
+        struct srp_indirect_buf *tmp = (struct srp_indirect_buf *)
+                                       (cmd->add_data + req->cdb_offset);
+        if (n < req->local_desc) {
+            *ret = tmp->desc_list[n];
+            dprintf("VSCSI: indirect segment local tag=0x%x desc#%d/%d",
+                    req->qtag, n, req->local_desc);
+
+        } else if (n < req->total_desc) {
+            int rc;
+            struct srp_direct_buf tbl_desc = vscsi_swap_desc(tmp->table_desc);
+            unsigned desc_offset = (n - req->local_desc) *
+                                    sizeof(struct srp_direct_buf);
+
+            if (desc_offset > tbl_desc.len) {
+                dprintf("VSCSI:   #%d is out of range (%d bytes)\n",
+                        n, desc_offset);
+                return -1;
+            }
+            rc = spapr_vio_dma_read(&s->vdev, tbl_desc.va + desc_offset,
+                                    ret, sizeof(struct srp_direct_buf));
+            if (rc) {
+                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
+                        rc);
+                return rc;
+            }
+            dprintf("VSCSI: indirect segment ext. tag=0x%x desc#%d/%d { va=%"PRIx64" len=%x }",
+                    req->qtag, n, req->total_desc, tbl_desc.va, tbl_desc.len);
+        } else {
+            dprintf("VSCSI:   Out of descriptors !\n");
+            return 0;
+        }
+        break;
+    }
+    default:
+        fprintf(stderr, "VSCSI:   Unknown format %x\n", req->dma_fmt);
+        return -1;
+    }
+
+    *ret = vscsi_swap_desc(*ret);
+    if (buf_offset > ret->len) {
+        dprintf("   offset=%x is out of a descriptor #%d boundary=%x\n",
+                buf_offset, req->cur_desc_num, ret->len);
+        return -1;
+    }
+    ret->va += buf_offset;
+    ret->len -= buf_offset;
+
+    dprintf("   cur=%d offs=%x ret { va=%"PRIx64" len=%x }\n",
+            req->cur_desc_num, req->cur_desc_offset, ret->va, ret->len);
+
+    return ret->len ? 1 : 0;
 }
 
 static int vscsi_srp_direct_data(VSCSIState *s, vscsi_req *req,
                                  uint8_t *buf, uint32_t len)
 {
-    struct srp_direct_buf *md = req->cur_desc;
+    struct srp_direct_buf md;
     uint32_t llen;
     int rc = 0;
 
-    dprintf("VSCSI: direct segment 0x%x bytes, va=0x%llx desc len=0x%x\n",
-            len, (unsigned long long)md->va, md->len);
+    rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
+    if (rc < 0) {
+        return -1;
+    } else if (rc == 0) {
+        return 0;
+    }
 
-    llen = MIN(len, md->len);
+    llen = MIN(len, md.len);
     if (llen) {
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
         } else {
-            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
         }
     }
-    md->len -= llen;
-    md->va += llen;
 
     if (rc) {
         return -1;
     }
+    req->cur_desc_offset += llen;
+
     return llen;
 }
 
 static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
                                    uint8_t *buf, uint32_t len)
 {
-    struct srp_direct_buf *td = &req->ind_desc->table_desc;
-    struct srp_direct_buf *md = req->cur_desc;
+    struct srp_direct_buf md;
     int rc = 0;
     uint32_t llen, total = 0;
 
-    dprintf("VSCSI: indirect segment 0x%x bytes, td va=0x%llx len=0x%x\n",
-            len, (unsigned long long)td->va, td->len);
+    dprintf("VSCSI: indirect segment 0x%x bytes\n", len);
 
     /* While we have data ... */
     while (len) {
-        /* If we have a descriptor but it's empty, go fetch a new one */
-        if (md && md->len == 0) {
-            /* More local available, use one */
-            if (req->local_desc) {
-                md = ++req->cur_desc;
-                --req->local_desc;
-                --req->total_desc;
-                td->va += sizeof(struct srp_direct_buf);
-            } else {
-                md = req->cur_desc = NULL;
-            }
+        rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
+        if (rc < 0) {
+            return -1;
+        } else if (rc == 0) {
+            break;
         }
-        /* No descriptor at hand, fetch one */
-        if (!md) {
-            if (!req->total_desc) {
-                dprintf("VSCSI:   Out of descriptors !\n");
-                break;
-            }
-            md = req->cur_desc = &req->ext_desc;
-            dprintf("VSCSI:   Reading desc from 0x%llx\n",
-                    (unsigned long long)td->va);
-            rc = spapr_vio_dma_read(&s->vdev, td->va, md,
-                                    sizeof(struct srp_direct_buf));
-            if (rc) {
-                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
-                        rc);
-                break;
-            }
-            vscsi_swap_desc(md);
-            td->va += sizeof(struct srp_direct_buf);
-            --req->total_desc;
-        }
-        dprintf("VSCSI:   [desc va=0x%llx,len=0x%x] remaining=0x%x\n",
-                (unsigned long long)md->va, md->len, len);
 
         /* Perform transfer */
-        llen = MIN(len, md->len);
+        llen = MIN(len, md.len);
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
         } else {
-            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
         }
         if (rc) {
             dprintf("VSCSI: spapr_vio_dma_r/w(%d) -> %d\n", req->writing, rc);
@@ -361,10 +406,18 @@ static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
 
         len -= llen;
         buf += llen;
+
         total += llen;
-        md->va += llen;
-        md->len -= llen;
+
+        /* Update current position in the current descriptor */
+        req->cur_desc_offset += llen;
+        if (md.len == llen) {
+            /* Go to the next descriptor if the current one finished */
+            ++req->cur_desc_num;
+            req->cur_desc_offset = 0;
+        }
     }
+
     return rc ? -1 : total;
 }
 
@@ -412,14 +465,13 @@ static int data_out_desc_size(struct srp_cmd *cmd)
 static int vscsi_preprocess_desc(vscsi_req *req)
 {
     struct srp_cmd *cmd = &req->iu.srp.cmd;
-    int offset, i;
 
-    offset = cmd->add_cdb_len & ~3;
+    req->cdb_offset = cmd->add_cdb_len & ~3;
 
     if (req->writing) {
         req->dma_fmt = cmd->buf_fmt >> 4;
     } else {
-        offset += data_out_desc_size(cmd);
+        req->cdb_offset += data_out_desc_size(cmd);
         req->dma_fmt = cmd->buf_fmt & ((1U << 4) - 1);
     }
 
@@ -427,31 +479,18 @@ static int vscsi_preprocess_desc(vscsi_req *req)
     case SRP_NO_DATA_DESC:
         break;
     case SRP_DATA_DESC_DIRECT:
-        req->cur_desc = (struct srp_direct_buf *)(cmd->add_data + offset);
         req->total_desc = req->local_desc = 1;
-        vscsi_swap_desc(req->cur_desc);
-        dprintf("VSCSI: using direct RDMA %s, 0x%x bytes MD: 0x%llx\n",
-                req->writing ? "write" : "read",
-                req->cur_desc->len, (unsigned long long)req->cur_desc->va);
         break;
-    case SRP_DATA_DESC_INDIRECT:
-        req->ind_desc = (struct srp_indirect_buf *)(cmd->add_data + offset);
-        vscsi_swap_desc(&req->ind_desc->table_desc);
-        req->total_desc = req->ind_desc->table_desc.len /
-            sizeof(struct srp_direct_buf);
+    case SRP_DATA_DESC_INDIRECT: {
+        struct srp_indirect_buf *ind_tmp = (struct srp_indirect_buf *)
+                (cmd->add_data + req->cdb_offset);
+
+        req->total_desc = be32_to_cpu(ind_tmp->table_desc.len) /
+                          sizeof(struct srp_direct_buf);
         req->local_desc = req->writing ? cmd->data_out_desc_cnt :
-            cmd->data_in_desc_cnt;
-        for (i = 0; i < req->local_desc; i++) {
-            vscsi_swap_desc(&req->ind_desc->desc_list[i]);
-        }
-        req->cur_desc = req->local_desc ? &req->ind_desc->desc_list[0] : NULL;
-        dprintf("VSCSI: using indirect RDMA %s, 0x%x bytes %d descs "
-                "(%d local) VA: 0x%llx\n",
-                req->writing ? "read" : "write",
-                be32_to_cpu(req->ind_desc->len),
-                req->total_desc, req->local_desc,
-                (unsigned long long)req->ind_desc->table_desc.va);
+                          cmd->data_in_desc_cnt;
         break;
+    }
     default:
         fprintf(stderr,
                 "vscsi_preprocess_desc: Unknown format %x\n", req->dma_fmt);
@@ -499,8 +538,8 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
     vscsi_req *req = sreq->hba_private;
     int32_t res_in = 0, res_out = 0;
 
-    dprintf("VSCSI: SCSI cmd complete, r=0x%x tag=0x%x status=0x%x, req=%p\n",
-            reason, sreq->tag, status, req);
+    dprintf("VSCSI: SCSI cmd complete, tag=0x%x status=0x%x, req=%p\n",
+            sreq->tag, status, req);
     if (req == NULL) {
         fprintf(stderr, "VSCSI: Can't find request for tag 0x%x\n", sreq->tag);
         return;
@@ -509,7 +548,7 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
     if (status == CHECK_CONDITION) {
         req->senselen = scsi_req_get_sense(req->sreq, req->sense,
                                            sizeof(req->sense));
-        dprintf("VSCSI: Sense data, %d bytes:\n", len);
+        dprintf("VSCSI: Sense data, %d bytes:\n", req->senselen);
         dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
                 req->sense[0], req->sense[1], req->sense[2], req->sense[3],
                 req->sense[4], req->sense[5], req->sense[6], req->sense[7]);
@@ -621,12 +660,11 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
         } return 1;
     }
 
-    req->lun = lun;
     req->sreq = scsi_req_new(sdev, req->qtag, lun, srp->cmd.cdb, req);
     n = scsi_req_enqueue(req->sreq);
 
-    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
-            req->qtag, srp->cmd.cdb[0], id, lun, n);
+    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x LUN %d ret: %d\n",
+            req->qtag, srp->cmd.cdb[0], lun, n);
 
     if (n) {
         /* Transfer direction must be set before preprocessing the
-- 
1.8.3.2


* [Qemu-devel] [PATCH 11/19] pseries: savevm support for PAPR virtual SCSI
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (9 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 10/19] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 12/19] pseries: savevm support for pseries machine Alexey Kardashevskiy
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This patch adds the necessary support for saving the state of the PAPR VIO
virtual SCSI device. This also saves and restores active SCSI requests.
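
As the commented-out fields in the VMState description below indicate, the
position inside a request is not migrated yet; a restored request restarts
from its first descriptor.  The following is only a rough sketch of the
bookkeeping that makes this simple: the position is a descriptor index plus
a byte offset rather than a pointer.  All names here (SketchDesc, SketchPos,
sketch_advance) are hypothetical and are not part of spapr_vscsi.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one SRP direct buffer descriptor. */
typedef struct {
    uint64_t va;   /* guest DMA address */
    uint32_t len;  /* length in bytes */
} SketchDesc;

/* Position inside a request: which descriptor, and how far into it. */
typedef struct {
    uint16_t desc_num;
    uint32_t desc_offset;
} SketchPos;

/*
 * Consume up to len bytes starting at *pos and return the number of
 * bytes consumed.  A real device would DMA each piece at
 * descs[pos->desc_num].va + pos->desc_offset; this sketch only moves
 * the position forward.
 */
static uint32_t sketch_advance(const SketchDesc *descs, size_t ndesc,
                               SketchPos *pos, uint32_t len)
{
    uint32_t total = 0;

    while (len && pos->desc_num < ndesc) {
        uint32_t left = descs[pos->desc_num].len - pos->desc_offset;
        uint32_t llen = len < left ? len : left;

        len -= llen;
        total += llen;
        pos->desc_offset += llen;
        if (llen == left) {
            /* Current descriptor finished, go to the next one */
            pos->desc_num++;
            pos->desc_offset = 0;
        }
    }
    return total;
}
```

Restarting a request after load then amounts to resetting the position to
descriptor 0, offset 0.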

[aik: implemented vscsi_req save/restore]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: David Gibson <david@gibson.dropbear.id.au>

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/scsi/spapr_vscsi.c | 82 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index 1e93102..4db3a47 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -579,6 +579,69 @@ static void vscsi_request_cancelled(SCSIRequest *sreq)
     vscsi_put_req(req);
 }
 
+static const VMStateDescription vmstate_spapr_vscsi_req = {
+    .name = "spapr_vscsi_req",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_BUFFER(crq.raw, vscsi_req),
+        VMSTATE_BUFFER(iu.srp.reserved, vscsi_req),
+        VMSTATE_UINT32(qtag, vscsi_req),
+        VMSTATE_BOOL(active, vscsi_req),
+        VMSTATE_UINT32(data_len, vscsi_req),
+        VMSTATE_BOOL(writing, vscsi_req),
+        VMSTATE_UINT32(senselen, vscsi_req),
+        VMSTATE_BUFFER(sense, vscsi_req),
+        VMSTATE_UINT8(dma_fmt, vscsi_req),
+        VMSTATE_UINT16(local_desc, vscsi_req),
+        VMSTATE_UINT16(total_desc, vscsi_req),
+        VMSTATE_UINT16(cdb_offset, vscsi_req),
+      /*Restart SCSI request from the beginning for now */
+      /*VMSTATE_UINT16(cur_desc_num, vscsi_req),
+        VMSTATE_UINT16(cur_desc_offset, vscsi_req),*/
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static void vscsi_save_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    vscsi_req *req = sreq->hba_private;
+    assert(req->active);
+
+    vmstate_save_state(f, &vmstate_spapr_vscsi_req, req);
+
+    dprintf("VSCSI: saving tag=%u, current desc#%d, offset=%x\n",
+            req->qtag, req->cur_desc_num, req->cur_desc_offset);
+}
+
+static void *vscsi_load_request(QEMUFile *f, SCSIRequest *sreq)
+{
+    SCSIBus *bus = sreq->bus;
+    VSCSIState *s = VIO_SPAPR_VSCSI_DEVICE(bus->qbus.parent);
+    vscsi_req *req;
+    int rc;
+
+    assert(sreq->tag < VSCSI_REQ_LIMIT);
+    req = &s->reqs[sreq->tag];
+    assert(!req->active);
+
+    memset(req, 0, sizeof(*req));
+    rc = vmstate_load_state(f, &vmstate_spapr_vscsi_req, req, 1);
+    if (rc) {
+        fprintf(stderr, "VSCSI: failed loading request tag#%u\n", sreq->tag);
+        return NULL;
+    }
+    assert(req->active);
+
+    req->sreq = scsi_req_ref(sreq);
+
+    dprintf("VSCSI: restoring tag=%u, current desc#%d, offset=%x\n",
+            req->qtag, req->cur_desc_num, req->cur_desc_offset);
+
+    return req;
+}
+
 static void vscsi_process_login(VSCSIState *s, vscsi_req *req)
 {
     union viosrp_iu *iu = &req->iu;
@@ -933,7 +996,9 @@ static const struct SCSIBusInfo vscsi_scsi_info = {
 
     .transfer_data = vscsi_transfer_data,
     .complete = vscsi_command_complete,
-    .cancel = vscsi_request_cancelled
+    .cancel = vscsi_request_cancelled,
+    .save_request = vscsi_save_request,
+    .load_request = vscsi_load_request,
 };
 
 static void spapr_vscsi_reset(VIOsPAPRDevice *dev)
@@ -992,6 +1057,20 @@ static Property spapr_vscsi_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_vscsi = {
+    .name = "spapr_vscsi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_SPAPR_VIO(vdev, VSCSIState),
+        /* VSCSI state */
+        /* ???? */
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1006,6 +1085,7 @@ static void spapr_vscsi_class_init(ObjectClass *klass, void *data)
     k->signal_mask = 0x00000001;
     dc->props = spapr_vscsi_properties;
     k->rtce_window_size = 0x10000000;
+    dc->vmsd = &vmstate_spapr_vscsi;
 }
 
 static const TypeInfo spapr_vscsi_info = {
-- 
1.8.3.2


* [Qemu-devel] [PATCH 12/19] pseries: savevm support for pseries machine
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (10 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 11/19] pseries: savevm support for " Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 13/19] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This adds the necessary pieces to implement savevm / migration for the
pseries machine.  The most complex part is migrating the hash page
table: on the paravirtualized pseries machine the guest's hash page
table is stored outside guest memory, and the guest accesses it via
hypercalls.

This patch uses a hypervisor reserved bit of the HPTE as a dirty bit
(tracking changes to the HPTE itself, not the page it references).
This is used to implement a live migration style incremental save and
restore of the hash table contents.
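
A minimal sketch of the dirty-bit idea, under simplifying assumptions: only
the first doubleword of each HPTE is modelled, values are kept in host byte
order (the real code goes through tswap64()), and all sketch_* names are
hypothetical.  The two bit values are intended to mirror HPTE64_V_VALID and
the HPTE64_V_HPTE_DIRTY bit introduced by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_V_VALID  0x0000000000000001ULL  /* entry is valid */
#define SKETCH_V_DIRTY  0x0000000000000040ULL  /* entry changed since last sent */

/* Result of one scan: how many dirty entries were valid / invalid. */
typedef struct {
    size_t valid;
    size_t invalid;
} SketchScan;

/* Every hypercall-side store to an entry also tags it as dirty. */
static void sketch_store_hpte0(uint64_t *htab, size_t i, uint64_t v)
{
    htab[i] = v | SKETCH_V_DIRTY;
}

/*
 * One incremental pass: visit each dirty entry, clear its dirty bit and
 * count it as valid (contents would be sent) or invalid (only a count
 * would be sent, and the receiver zeroes the slot).
 */
static SketchScan sketch_scan_dirty(uint64_t *htab, size_t n)
{
    SketchScan r = { 0, 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        if (!(htab[i] & SKETCH_V_DIRTY)) {
            continue;
        }
        htab[i] &= ~SKETCH_V_DIRTY;
        if (htab[i] & SKETCH_V_VALID) {
            r.valid++;
        } else {
            r.invalid++;
        }
    }
    return r;
}
```

A second scan with no intervening stores finds nothing to send, which is
the condition the iterate handler uses to report that it has converged.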

In addition it adds VMStateDescription information to save and restore
the (few) remaining pieces of state information needed by the pseries
machine.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c         | 269 ++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_hcall.c   |   8 +-
 include/hw/ppc/spapr.h |  12 ++-
 3 files changed, 281 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d8f1614..bf348c7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -32,6 +32,7 @@
 #include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
 #include "kvm_ppc.h"
+#include "mmu-hash64.h"
 
 #include "hw/boards.h"
 #include "hw/ppc/ppc.h"
@@ -667,7 +668,7 @@ static void spapr_cpu_reset(void *opaque)
 
     env->spr[SPR_HIOR] = 0;
 
-    env->external_htab = spapr->htab;
+    env->external_htab = (uint8_t *)spapr->htab;
     env->htab_base = -1;
     env->htab_mask = HTAB_SIZE(spapr) - 1;
     env->spr[SPR_SDR1] = (target_ulong)(uintptr_t)spapr->htab |
@@ -719,6 +720,268 @@ static int spapr_vga_init(PCIBus *pci_bus)
     }
 }
 
+static const VMStateDescription vmstate_spapr = {
+    .name = "spapr",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(next_irq, sPAPREnvironment),
+
+        /* RTC offset */
+        VMSTATE_UINT64(rtc_offset, sPAPREnvironment),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+#define HPTE(_table, _i)   (void *)(((uint64_t *)(_table)) + ((_i) * 2))
+#define HPTE_VALID(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_VALID)
+#define HPTE_DIRTY(_hpte)  (tswap64(*((uint64_t *)(_hpte))) & HPTE64_V_HPTE_DIRTY)
+#define CLEAN_HPTE(_hpte)  ((*(uint64_t *)(_hpte)) &= tswap64(~HPTE64_V_HPTE_DIRTY))
+
+static int htab_save_setup(QEMUFile *f, void *opaque)
+{
+    sPAPREnvironment *spapr = opaque;
+
+    spapr->htab_save_index = 0;
+    spapr->htab_first_pass = true;
+
+    /* "Iteration" header */
+    qemu_put_be32(f, spapr->htab_shift);
+
+    return 0;
+}
+
+#define MAX_ITERATION_NS    5000000 /* 5 ms */
+
+static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
+                                 int64_t max_ns)
+{
+    int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
+    int index = spapr->htab_save_index;
+    int64_t starttime = qemu_get_clock_ns(rt_clock);
+
+    assert(spapr->htab_first_pass);
+
+    do {
+        int chunkstart;
+
+        /* Consume invalid HPTEs */
+        while ((index < htabslots)
+               && !HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+        }
+
+        /* Consume valid HPTEs */
+        chunkstart = index;
+        while ((index < htabslots)
+               && HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+        }
+
+        if (index > chunkstart) {
+            int n_valid = index - chunkstart;
+
+            qemu_put_be32(f, chunkstart);
+            qemu_put_be16(f, n_valid);
+            qemu_put_be16(f, 0);
+            qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
+                            HASH_PTE_SIZE_64 * n_valid);
+
+            if ((qemu_get_clock_ns(rt_clock) - starttime) > max_ns) {
+                break;
+            }
+        }
+    } while ((index < htabslots) && !qemu_file_rate_limit(f));
+
+    if (index >= htabslots) {
+        assert(index == htabslots);
+        index = 0;
+        spapr->htab_first_pass = false;
+    }
+    spapr->htab_save_index = index;
+}
+
+static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
+                                 int64_t max_ns)
+{
+    bool final = max_ns < 0;
+    int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
+    int examined = 0, sent = 0;
+    int index = spapr->htab_save_index;
+    int64_t starttime = qemu_get_clock_ns(rt_clock);
+
+    assert(!spapr->htab_first_pass);
+
+    do {
+        int chunkstart, invalidstart;
+
+        /* Consume non-dirty HPTEs */
+        while ((index < htabslots)
+               && !HPTE_DIRTY(HPTE(spapr->htab, index))) {
+            index++;
+            examined++;
+        }
+
+        chunkstart = index;
+        /* Consume valid dirty HPTEs */
+        while ((index < htabslots)
+               && HPTE_DIRTY(HPTE(spapr->htab, index))
+               && HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+            examined++;
+        }
+
+        invalidstart = index;
+        /* Consume invalid dirty HPTEs */
+        while ((index < htabslots)
+               && HPTE_DIRTY(HPTE(spapr->htab, index))
+               && !HPTE_VALID(HPTE(spapr->htab, index))) {
+            CLEAN_HPTE(HPTE(spapr->htab, index));
+            index++;
+            examined++;
+        }
+
+        if (index > chunkstart) {
+            int n_valid = invalidstart - chunkstart;
+            int n_invalid = index - invalidstart;
+
+            qemu_put_be32(f, chunkstart);
+            qemu_put_be16(f, n_valid);
+            qemu_put_be16(f, n_invalid);
+            qemu_put_buffer(f, HPTE(spapr->htab, chunkstart),
+                            HASH_PTE_SIZE_64 * n_valid);
+            sent += index - chunkstart;
+
+            if (!final && (qemu_get_clock_ns(rt_clock) - starttime) > max_ns) {
+                break;
+            }
+        }
+
+        if (examined >= htabslots) {
+            break;
+        }
+
+        if (index >= htabslots) {
+            assert(index == htabslots);
+            index = 0;
+        }
+    } while ((examined < htabslots) && (!qemu_file_rate_limit(f) || final));
+
+    if (index >= htabslots) {
+        assert(index == htabslots);
+        index = 0;
+    }
+
+    spapr->htab_save_index = index;
+
+    return (examined >= htabslots) && (sent == 0);
+}
+
+static int htab_save_iterate(QEMUFile *f, void *opaque)
+{
+    sPAPREnvironment *spapr = opaque;
+    bool nothingleft = false;
+
+    /* Iteration header */
+    qemu_put_be32(f, 0);
+
+    if (spapr->htab_first_pass) {
+        htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
+    } else {
+        nothingleft = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
+    }
+
+    /* End marker */
+    qemu_put_be32(f, 0);
+    qemu_put_be16(f, 0);
+    qemu_put_be16(f, 0);
+
+    return nothingleft ? 1 : 0;
+}
+
+static int htab_save_complete(QEMUFile *f, void *opaque)
+{
+    sPAPREnvironment *spapr = opaque;
+
+    /* Iteration header */
+    qemu_put_be32(f, 0);
+
+    htab_save_later_pass(f, spapr, -1);
+
+    /* End marker */
+    qemu_put_be32(f, 0);
+    qemu_put_be16(f, 0);
+    qemu_put_be16(f, 0);
+
+    return 0;
+}
+
+static int htab_load(QEMUFile *f, void *opaque, int version_id)
+{
+    sPAPREnvironment *spapr = opaque;
+    uint32_t section_hdr;
+
+    if (version_id < 1 || version_id > 1) {
+        fprintf(stderr, "htab_load() bad version\n");
+        return -EINVAL;
+    }
+
+    section_hdr = qemu_get_be32(f);
+
+    if (section_hdr) {
+        /* First section, just the hash shift */
+        if (spapr->htab_shift != section_hdr) {
+            return -EINVAL;
+        }
+        return 0;
+    }
+
+    while (true) {
+        uint32_t index;
+        uint16_t n_valid, n_invalid;
+
+        index = qemu_get_be32(f);
+        n_valid = qemu_get_be16(f);
+        n_invalid = qemu_get_be16(f);
+
+        if ((index == 0) && (n_valid == 0) && (n_invalid == 0)) {
+            /* End of Stream */
+            break;
+        }
+
+        if ((index + n_valid + n_invalid) >
+            (HTAB_SIZE(spapr) / HASH_PTE_SIZE_64)) {
+            /* Bad index in stream */
+            fprintf(stderr, "htab_load() bad index %u (%hu+%hu entries) "
+                    "in htab stream\n", index, n_valid, n_invalid);
+            return -EINVAL;
+        }
+
+        if (n_valid) {
+            qemu_get_buffer(f, HPTE(spapr->htab, index),
+                            HASH_PTE_SIZE_64 * n_valid);
+        }
+        if (n_invalid) {
+            memset(HPTE(spapr->htab, index + n_valid), 0,
+                   HASH_PTE_SIZE_64 * n_invalid);
+        }
+    }
+
+    return 0;
+}
+
+static SaveVMHandlers savevm_htab_handlers = {
+    .save_live_setup = htab_save_setup,
+    .save_live_iterate = htab_save_iterate,
+    .save_live_complete = htab_save_complete,
+    .load_state = htab_load,
+};
+
 static struct icp_state *try_create_xics(const char *type, int nr_servers,
                                          int nr_irqs)
 {
@@ -987,6 +1250,10 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 
     spapr->entry_point = 0x100;
 
+    vmstate_register(NULL, 0, &vmstate_spapr, spapr);
+    register_savevm_live(NULL, "spapr/htab", -1, 1,
+                         &savevm_htab_handlers, spapr);
+
     /* Prepare the device tree */
     spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
                                             initrd_base, initrd_size,
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index e6f321d..7ca984e 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -115,7 +115,7 @@ static target_ulong h_enter(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     }
     ppc_hash64_store_hpte1(env, hpte, ptel);
     /* eieio();  FIXME: need some sort of barrier for smp? */
-    ppc_hash64_store_hpte0(env, hpte, pteh);
+    ppc_hash64_store_hpte0(env, hpte, pteh | HPTE64_V_HPTE_DIRTY);
 
     args[0] = pte_index + i;
     return H_SUCCESS;
@@ -152,7 +152,7 @@ static target_ulong remove_hpte(CPUPPCState *env, target_ulong ptex,
     }
     *vp = v;
     *rp = r;
-    ppc_hash64_store_hpte0(env, hpte, 0);
+    ppc_hash64_store_hpte0(env, hpte, HPTE64_V_HPTE_DIRTY);
     rb = compute_tlbie_rb(v, r, ptex);
     ppc_tlb_invalidate_one(env, rb);
     return REMOVE_SUCCESS;
@@ -282,11 +282,11 @@ static target_ulong h_protect(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     r |= (flags << 48) & HPTE64_R_KEY_HI;
     r |= flags & (HPTE64_R_PP | HPTE64_R_N | HPTE64_R_KEY_LO);
     rb = compute_tlbie_rb(v, r, pte_index);
-    ppc_hash64_store_hpte0(env, hpte, v & ~HPTE64_V_VALID);
+    ppc_hash64_store_hpte0(env, hpte, (v & ~HPTE64_V_VALID) | HPTE64_V_HPTE_DIRTY);
     ppc_tlb_invalidate_one(env, rb);
     ppc_hash64_store_hpte1(env, hpte, r);
     /* Don't need a memory barrier, due to qemu's global lock */
-    ppc_hash64_store_hpte0(env, hpte, v);
+    ppc_hash64_store_hpte0(env, hpte, v | HPTE64_V_HPTE_DIRTY);
     return H_SUCCESS;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 09c4570..4cfe449 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -9,6 +9,8 @@ struct sPAPRPHBState;
 struct sPAPRNVRAM;
 struct icp_state;
 
+#define HPTE64_V_HPTE_DIRTY     0x0000000000000040ULL
+
 typedef struct sPAPREnvironment {
     struct VIOsPAPRBus *vio_bus;
     QLIST_HEAD(, sPAPRPHBState) phbs;
@@ -17,20 +19,24 @@ typedef struct sPAPREnvironment {
 
     hwaddr ram_limit;
     void *htab;
-    long htab_shift;
+    uint32_t htab_shift;
     hwaddr rma_size;
     int vrma_adjust;
     hwaddr fdt_addr, rtas_addr;
     long rtas_size;
     void *fdt_skel;
     target_ulong entry_point;
-    int next_irq;
-    int rtc_offset;
+    uint32_t next_irq;
+    uint64_t rtc_offset;
     char *cpu_model;
     bool has_graphics;
 
     uint32_t epow_irq;
     Notifier epow_notifier;
+
+    /* Migration state */
+    int htab_save_index;
+    bool htab_first_pass;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
-- 
1.8.3.2


* [Qemu-devel] [PATCH 13/19] pseries: savevm support for PCI host bridge
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (11 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 12/19] pseries: savevm support for pseries machine Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-07 23:33   ` David Gibson
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 14/19] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN Alexey Kardashevskiy
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

This adds the necessary support for saving the state of the PAPR virtual
PCI host bridge (or host bridges).
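
The VMState description in this patch declares the bridge's fixed
configuration (BUID, LIOBN, window addresses and sizes) with the
VMSTATE_*_EQUAL macros, while the MSI table entries are ordinary fields.
As a rough illustration of the difference, assuming the usual QEMU
behaviour that an "equal" field fails the load on mismatch instead of
overwriting the destination, with hypothetical sketch_* helpers:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Ordinary field: the incoming value replaces the destination's value. */
static int sketch_load_u64(uint64_t *field, uint64_t incoming)
{
    *field = incoming;
    return 0;
}

/*
 * "Equal" field: the destination must already be configured with the
 * same value, otherwise loading fails and the field is left untouched.
 */
static int sketch_load_u64_equal(const uint64_t *field, uint64_t incoming)
{
    return (*field == incoming) ? 0 : -EINVAL;
}
```

In other words, source and destination have to be started with matching
host bridge parameters; only the runtime state is actually transferred.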

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr_pci.c          | 49 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/pci-host/spapr.h |  6 +++---
 2 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index c8c12c8..4d8e3cd 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -696,6 +696,54 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static const VMStateDescription vmstate_spapr_pci_lsi = {
+    .name = "spapr_pci/lsi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32_EQUAL(irq, struct spapr_pci_lsi),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_pci_msi = {
+    .name = "spapr_pci/msi",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT32(config_addr, struct spapr_pci_msi),
+        VMSTATE_UINT32(irq, struct spapr_pci_msi),
+        VMSTATE_UINT32(nvec, struct spapr_pci_msi),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static const VMStateDescription vmstate_spapr_pci = {
+    .name = "spapr_pci",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .fields      = (VMStateField []) {
+        VMSTATE_UINT64_EQUAL(buid, sPAPRPHBState),
+        VMSTATE_UINT32_EQUAL(dma_liobn, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(mem_win_addr, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
+        VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState),
+        VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
+                             vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
+        VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
+                             vmstate_spapr_pci_msi, struct spapr_pci_msi),
+
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static void spapr_phb_class_init(ObjectClass *klass, void *data)
 {
     SysBusDeviceClass *sdc = SYS_BUS_DEVICE_CLASS(klass);
@@ -704,6 +752,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     sdc->init = spapr_phb_init;
     dc->props = spapr_phb_properties;
     dc->reset = spapr_phb_reset;
+    dc->vmsd = &vmstate_spapr_pci;
 }
 
 static const TypeInfo spapr_phb_info = {
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 1e23dbf..93f9511 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -52,14 +52,14 @@ typedef struct sPAPRPHBState {
     sPAPRTCETable *tcet;
     AddressSpace iommu_as;
 
-    struct {
+    struct spapr_pci_lsi {
         uint32_t irq;
     } lsi_table[PCI_NUM_PINS];
 
-    struct {
+    struct spapr_pci_msi {
         uint32_t config_addr;
         uint32_t irq;
-        int nvec;
+        uint32_t nvec;
     } msi_table[SPAPR_MSIX_MAX_DEVS];
 
     QLIST_ENTRY(sPAPRPHBState) list;
-- 
1.8.3.2


* [Qemu-devel] [PATCH 14/19] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (12 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 13/19] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 15/19] pseries: Support for in-kernel XICS interrupt controller Alexey Kardashevskiy
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Recent PowerKVM allows the kernel to intercept some RTAS calls from the
guest directly.  This is used, for example, to implement the more efficient
in-kernel XICS.  However, qemu is still responsible for assigning the RTAS
token numbers, and needs to tell the kernel which RTAS function name is
assigned to a given token value.  This patch adds a convenience wrapper for
the KVM_PPC_RTAS_DEFINE_TOKEN ioctl() which is used for this purpose.
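
For illustration only, a sketch of how a token/name pair could be packed
before being handed to the ioctl().  The struct below is a hypothetical
mirror of the kernel's argument layout (a fixed-size name buffer plus a
token) and the helper name is made up; unlike the wrapper in this patch,
the sketch rejects names that do not fit together with a terminating NUL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the ioctl argument, not the real kernel header. */
struct sketch_rtas_token_args {
    char name[120];
    uint64_t token;
};

/*
 * Fill *args with the given token and RTAS function name.
 * Returns 0 on success, -1 if the name does not fit.
 */
static int sketch_fill_rtas_token(struct sketch_rtas_token_args *args,
                                  uint32_t token, const char *function)
{
    size_t len = strlen(function);

    if (len >= sizeof(args->name)) {
        return -1;
    }
    memset(args, 0, sizeof(*args));
    args->token = token;
    memcpy(args->name, function, len + 1); /* copies the trailing NUL too */
    return 0;
}
```

A caller would fill one such structure per intercepted RTAS function, for
instance "ibm,set-xive" with whatever token value qemu assigned to it, and
then issue the ioctl() on the VM file descriptor.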

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 target-ppc/kvm.c     | 14 ++++++++++++++
 target-ppc/kvm_ppc.h |  7 +++++++
 2 files changed, 21 insertions(+)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index c89dd58..33ddf63 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1787,6 +1787,20 @@ static int kvm_ppc_register_host_cpu_type(void)
     return 0;
 }
 
+int kvmppc_define_rtas_token(uint32_t token, const char *function)
+{
+    struct kvm_rtas_token_args args = {
+        .token = token,
+    };
+
+    if (!kvm_check_extension(kvm_state, KVM_CAP_PPC_RTAS)) {
+        return -ENOENT;
+    }
+
+    strncpy(args.name, function, sizeof(args.name));
+
+    return kvm_vm_ioctl(kvm_state, KVM_PPC_RTAS_DEFINE_TOKEN, &args);
+}
 
 bool kvm_arch_stop_on_emulation_error(CPUState *cpu)
 {
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 771cfbe..21939a8 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -38,6 +38,7 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
 #endif /* !CONFIG_USER_ONLY */
 int kvmppc_fixup_cpu(PowerPCCPU *cpu);
 bool kvmppc_has_cap_epr(void);
+int kvmppc_define_rtas_token(uint32_t token, const char *function);
 
 #else
 
@@ -159,6 +160,12 @@ static inline bool kvmppc_has_cap_epr(void)
 {
     return false;
 }
+
+static inline int kvmppc_define_rtas_token(uint32_t token,
+                                           const char *function)
+{
+    return -1;
+}
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.8.3.2


* [Qemu-devel] [PATCH 15/19] pseries: Support for in-kernel XICS interrupt controller
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (13 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 14/19] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 16/19] pseries: savevm support with KVM Alexey Kardashevskiy
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

Recent (host) kernels support emulating the PAPR defined "XICS" interrupt
controller system within KVM.  This patch allows qemu to initialize and
configure the in-kernel XICS, and keep its state in sync with qemu's XICS
state as necessary.

This should give considerable performance improvements.  For example, on a
simple IPI ping-pong test between hardware threads, the qemu XICS delivers
around 5,000 irqs/second, whereas the in-kernel XICS delivers around
70,000 irqs/second on the same hardware configuration.

[Mike Qiu <qiudayu@linux.vnet.ibm.com>: fixed mistype which caused ics_set_kvm_state() to fail]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
[aik: moved to a separate device]

---
Changes:
2013/07/01
* fixed VMState names in order to support xics-kvm migration to xics and vice versa

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 default-configs/ppc64-softmmu.mak |   1 +
 hw/intc/Makefile.objs             |   1 +
 hw/intc/xics_kvm.c                | 445 ++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr.c                    |  32 ++-
 include/hw/ppc/xics.h             |  13 ++
 5 files changed, 489 insertions(+), 3 deletions(-)
 create mode 100644 hw/intc/xics_kvm.c

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 69a9f8d..5b995f9 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -48,5 +48,6 @@ CONFIG_OPENPIC_KVM=$(and $(CONFIG_E500),$(CONFIG_KVM))
 # For pSeries
 CONFIG_PCI_HOTPLUG=y
 CONFIG_XICS=$(CONFIG_PSERIES)
+CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
 # For PReP
 CONFIG_MC146818RTC=y
diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs
index abe8f80..9e77afe 100644
--- a/hw/intc/Makefile.objs
+++ b/hw/intc/Makefile.objs
@@ -23,3 +23,4 @@ obj-$(CONFIG_OPENPIC) += openpic.o
 obj-$(CONFIG_OPENPIC_KVM) += openpic_kvm.o
 obj-$(CONFIG_SH4) += sh_intc.o
 obj-$(CONFIG_XICS) += xics.o
+obj-$(CONFIG_XICS_KVM) += xics_kvm.o
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
new file mode 100644
index 0000000..b630150
--- /dev/null
+++ b/hw/intc/xics_kvm.c
@@ -0,0 +1,445 @@
+/*
+ * QEMU PowerPC pSeries Logical Partition (aka sPAPR) hardware System Emulator
+ *
+ * PAPR Virtualized Interrupt System, aka ICS/ICP aka xics, in-kernel emulation
+ *
+ * Copyright (c) 2013 David Gibson, IBM Corporation.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ *
+ */
+
+#include "hw/hw.h"
+#include "trace.h"
+#include "hw/ppc/spapr.h"
+#include "hw/ppc/xics.h"
+#include "kvm_ppc.h"
+#include "qemu/config-file.h"
+
+#include <sys/ioctl.h>
+
+struct icp_state_kvm {
+    struct icp_state parent;
+
+    uint32_t set_xive_token;
+    uint32_t get_xive_token;
+    uint32_t int_off_token;
+    uint32_t int_on_token;
+    int kernel_xics_fd;
+};
+
+static void icp_get_kvm_state(struct icp_server_state *ss)
+{
+    uint64_t state;
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_PPC_ICP_STATE,
+        .addr = (uintptr_t)&state,
+    };
+    int ret;
+
+    if (!ss->cs) {
+        return; /* kernel irqchip not in use */
+    }
+
+    ret = kvm_vcpu_ioctl(ss->cs, KVM_GET_ONE_REG, &reg);
+    if (ret != 0) {
+        fprintf(stderr, "Unable to retrieve KVM interrupt controller state"
+                " for CPU %d: %s\n", ss->cs->cpu_index, strerror(errno));
+        exit(1);
+    }
+
+    ss->xirr = state >> KVM_REG_PPC_ICP_XISR_SHIFT;
+    ss->mfrr = (state >> KVM_REG_PPC_ICP_MFRR_SHIFT)
+        & KVM_REG_PPC_ICP_MFRR_MASK;
+    ss->pending_priority = (state >> KVM_REG_PPC_ICP_PPRI_SHIFT)
+        & KVM_REG_PPC_ICP_PPRI_MASK;
+}
+
+static int icp_set_kvm_state(struct icp_server_state *ss)
+{
+    uint64_t state;
+    struct kvm_one_reg reg = {
+        .id = KVM_REG_PPC_ICP_STATE,
+        .addr = (uintptr_t)&state,
+    };
+    int ret;
+
+    if (!ss->cs) {
+        return 0; /* kernel irqchip not in use */
+    }
+
+    state = ((uint64_t)ss->xirr << KVM_REG_PPC_ICP_XISR_SHIFT)
+        | ((uint64_t)ss->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT)
+        | ((uint64_t)ss->pending_priority << KVM_REG_PPC_ICP_PPRI_SHIFT);
+
+    ret = kvm_vcpu_ioctl(ss->cs, KVM_SET_ONE_REG, &reg);
+    if (ret != 0) {
+        fprintf(stderr, "Unable to restore KVM interrupt controller state (0x%"
+                PRIx64 ") for CPU %d: %s\n", state, ss->cs->cpu_index,
+                strerror(errno));
+        exit(1);
+        return ret;
+    }
+
+    return 0;
+}
+
+static void ics_get_kvm_state(struct ics_state *ics)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(ics->icp);
+    uint64_t state;
+    struct kvm_device_attr attr = {
+        .flags = 0,
+        .group = KVM_DEV_XICS_GRP_SOURCES,
+        .addr = (uint64_t)(uintptr_t)&state,
+    };
+    int i;
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        struct ics_irq_state *irq = &ics->irqs[i];
+        int ret;
+
+        attr.attr = i + ics->offset;
+
+        ret = ioctl(icpkvm->kernel_xics_fd, KVM_GET_DEVICE_ATTR, &attr);
+        if (ret != 0) {
+            fprintf(stderr, "Unable to retrieve KVM interrupt controller state"
+                    " for IRQ %d: %s\n", i + ics->offset, strerror(errno));
+            exit(1);
+        }
+
+        irq->server = state & KVM_XICS_DESTINATION_MASK;
+        irq->saved_priority = (state >> KVM_XICS_PRIORITY_SHIFT)
+            & KVM_XICS_PRIORITY_MASK;
+        /*
+         * To be consistent with the software emulation in xics.c, we
+         * split out the masked state + priority that we get from the
+         * kernel into 'current priority' (0xff if masked) and
+         * 'saved priority' (if masked, this is the priority the
+         * interrupt had before it was masked).  Masking and unmasking
+         * are done with the ibm,int-off and ibm,int-on RTAS calls.
+         */
+        if (state & KVM_XICS_MASKED) {
+            irq->priority = 0xff;
+        } else {
+            irq->priority = irq->saved_priority;
+        }
+
+        if (state & KVM_XICS_PENDING) {
+            if (state & KVM_XICS_LEVEL_SENSITIVE) {
+                irq->status |= XICS_STATUS_ASSERTED;
+            } else {
+                /*
+                 * A pending edge-triggered interrupt (or MSI)
+                 * must have been rejected previously when we
+                 * first detected it and tried to deliver it,
+                 * so mark it as pending and previously rejected
+                 * for consistency with how xics.c works.
+                 */
+                irq->status |= XICS_STATUS_MASKED_PENDING
+                    | XICS_STATUS_REJECTED;
+            }
+        }
+    }
+}
+
+static int ics_set_kvm_state(struct ics_state *ics)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(ics->icp);
+    uint64_t state;
+    struct kvm_device_attr attr = {
+        .flags = 0,
+        .group = KVM_DEV_XICS_GRP_SOURCES,
+        .addr = (uint64_t)(uintptr_t)&state,
+    };
+    int i;
+
+    for (i = 0; i < ics->nr_irqs; i++) {
+        struct ics_irq_state *irq = &ics->irqs[i];
+        int ret;
+
+        attr.attr = i + ics->offset;
+
+        state = irq->server;
+        state |= (uint64_t)(irq->saved_priority & KVM_XICS_PRIORITY_MASK)
+            << KVM_XICS_PRIORITY_SHIFT;
+        if (irq->priority != irq->saved_priority) {
+            assert(irq->priority == 0xff);
+            state |= KVM_XICS_MASKED;
+        }
+
+        if (ics->islsi[i]) {
+            state |= KVM_XICS_LEVEL_SENSITIVE;
+            if (irq->status & XICS_STATUS_ASSERTED) {
+                state |= KVM_XICS_PENDING;
+            }
+        } else {
+            if (irq->status & XICS_STATUS_MASKED_PENDING) {
+                state |= KVM_XICS_PENDING;
+            }
+        }
+
+        ret = ioctl(icpkvm->kernel_xics_fd, KVM_SET_DEVICE_ATTR, &attr);
+        if (ret != 0) {
+            fprintf(stderr, "Unable to restore KVM interrupt controller state"
+                    " for IRQ %d: %s\n", i + ics->offset, strerror(errno));
+            return ret;
+        }
+    }
+
+    return 0;
+}
+
+static void icp_pre_save(void *opaque)
+{
+    struct icp_server_state *ss = opaque;
+
+    icp_get_kvm_state(ss);
+}
+
+static int icp_post_load(void *opaque, int version_id)
+{
+    struct icp_server_state *ss = opaque;
+
+    return icp_set_kvm_state(ss);
+}
+
+static void ics_pre_save(void *opaque)
+{
+    struct ics_state *ics = opaque;
+
+    ics_get_kvm_state(ics);
+}
+
+static int ics_post_load(void *opaque, int version_id)
+{
+    struct ics_state *ics = opaque;
+
+    return ics_set_kvm_state(ics);
+}
+
+static VMStateDescription vmstate_icpkvm_server = {
+    .name = "icp/server",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = icp_pre_save,
+    .post_load = icp_post_load,
+};
+
+static VMStateDescription vmstate_icskvm = {
+    .name = "ics",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = ics_pre_save,
+    .post_load = ics_post_load,
+};
+
+static void ics_set_irq_kvm(void *opaque, int srcno, int val)
+{
+    struct ics_state *ics = opaque;
+    struct kvm_irq_level args;
+    int rc;
+
+    args.irq = srcno + ics->offset;
+    if (!ics->islsi[srcno]) {
+        if (!val) {
+            return;
+        }
+        args.level = KVM_INTERRUPT_SET;
+    } else {
+        args.level = val ? KVM_INTERRUPT_SET_LEVEL : KVM_INTERRUPT_UNSET;
+    }
+    rc = kvm_vm_ioctl(kvm_state, KVM_IRQ_LINE, &args);
+    if (rc < 0) {
+        perror("kvm_irq_line");
+    }
+}
+
+int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
+{
+    CPUState *cs;
+    struct icp_server_state *ss;
+    struct icp_state_kvm *icpkvm = (struct icp_state_kvm *) object_dynamic_cast(
+            OBJECT(icp), TYPE_XICS_KVM);
+
+    if (!icpkvm) {
+        return -1;
+    }
+
+    cs = CPU(cpu);
+    ss = &icp->ss[cs->cpu_index];
+
+    assert(cs->cpu_index < icp->nr_servers);
+    if (icpkvm->kernel_xics_fd == -1) {
+        abort();
+    }
+
+    if (icpkvm->kernel_xics_fd != -1) {
+        int ret;
+        struct kvm_enable_cap xics_enable_cap = {
+            .cap = KVM_CAP_IRQ_XICS,
+            .flags = 0,
+            .args = {icpkvm->kernel_xics_fd, cs->cpu_index, 0, 0},
+        };
+
+        ss->cs = cs;
+
+        ret = kvm_vcpu_ioctl(ss->cs, KVM_ENABLE_CAP, &xics_enable_cap);
+        if (ret < 0) {
+            fprintf(stderr, "Unable to connect CPU%d to kernel XICS: %s\n",
+                    cs->cpu_index, strerror(errno));
+            exit(1);
+        }
+    }
+    xics_common_cpu_setup(icp, cpu);
+
+    vmstate_icpkvm_server.fields = vmstate_icp_server.fields;
+    vmstate_register(NULL, cs->cpu_index, &vmstate_icpkvm_server, ss);
+
+    return 0;
+}
+
+static void rtas_dummy(PowerPCCPU *cpu, sPAPREnvironment *spapr,
+                       uint32_t token,
+                       uint32_t nargs, target_ulong args,
+                       uint32_t nret, target_ulong rets)
+{
+    fprintf(stderr, "pseries: %s() should never be called for in-kernel XICS\n", __func__);
+}
+
+static void xics_kvm_realize(DeviceState *dev, Error **errp)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(dev);
+    QemuOptsList *list = qemu_find_opts("machine");
+    int rc;
+    struct kvm_create_device xics_create_device = {
+        .type = KVM_DEV_TYPE_XICS,
+        .flags = 0,
+    };
+
+    if (!kvm_enabled()) {
+        error_setg(errp, "KVM must be enabled for in-kernel XICS");
+        goto fail;
+    }
+
+    if (QTAILQ_EMPTY(&list->head) ||
+        !qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                           "kernel_irqchip", true) ||
+        !kvm_check_extension(kvm_state, KVM_CAP_IRQ_XICS)) {
+        error_setg(errp, "In-kernel XICS is disabled or unsupported by KVM");
+        return;
+    }
+
+    icpkvm->set_xive_token = spapr_rtas_register("ibm,set-xive", rtas_dummy);
+    icpkvm->get_xive_token = spapr_rtas_register("ibm,get-xive", rtas_dummy);
+    icpkvm->int_off_token = spapr_rtas_register("ibm,int-off", rtas_dummy);
+    icpkvm->int_on_token = spapr_rtas_register("ibm,int-on", rtas_dummy);
+
+    rc = kvmppc_define_rtas_token(icpkvm->set_xive_token, "ibm,set-xive");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,set-xive");
+        goto fail;
+    }
+
+    rc = kvmppc_define_rtas_token(icpkvm->get_xive_token, "ibm,get-xive");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,get-xive");
+        goto fail;
+    }
+
+    rc = kvmppc_define_rtas_token(icpkvm->int_on_token, "ibm,int-on");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,int-on");
+        goto fail;
+    }
+
+    rc = kvmppc_define_rtas_token(icpkvm->int_off_token, "ibm,int-off");
+    if (rc < 0) {
+        error_setg(errp, "kvmppc_define_rtas_token: ibm,int-off");
+        goto fail;
+    }
+
+    /* Create the kernel ICP */
+    rc = kvm_vm_ioctl(kvm_state, KVM_CREATE_DEVICE, &xics_create_device);
+    if (rc < 0) {
+        error_setg_errno(errp, -rc, "Error on KVM_CREATE_DEVICE for XICS");
+        goto fail;
+    }
+
+    icpkvm->kernel_xics_fd = xics_create_device.fd;
+
+    xics_common_init(&icpkvm->parent, ics_set_irq_kvm);
+
+    /* We use each ICS's offset into the global irq number space
+     * as an instance id.  This means we can extend to multiple ICS
+     * instances without needing to change the savevm format */
+    vmstate_icskvm.fields = vmstate_ics.fields;
+    vmstate_register(NULL, icpkvm->parent.ics->offset, &vmstate_icskvm,
+                     icpkvm->parent.ics);
+
+    return;
+
+fail:
+    kvmppc_define_rtas_token(0, "ibm,set-xive");
+    kvmppc_define_rtas_token(0, "ibm,get-xive");
+    kvmppc_define_rtas_token(0, "ibm,int-on");
+    kvmppc_define_rtas_token(0, "ibm,int-off");
+    return;
+}
+
+static void xics_kvm_reset(DeviceState *d)
+{
+    struct icp_state_kvm *icpkvm = XICS_KVM(d);
+    struct icp_state *icp = &icpkvm->parent;
+    int i;
+
+    xics_common_reset(icp);
+
+    for (i = 0; i < icp->nr_servers; i++) {
+        if (icp->ss[i].cs) {
+            icp_set_kvm_state(&icp->ss[i]);
+        }
+    }
+
+    ics_set_kvm_state(icp->ics);
+}
+
+static void xics_kvm_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = xics_kvm_realize;
+    dc->reset = xics_kvm_reset;
+}
+
+static const TypeInfo xics_kvm_info = {
+    .name          = TYPE_XICS_KVM,
+    .parent        = TYPE_XICS,
+    .instance_size = sizeof(struct icp_state_kvm),
+    .class_init    = xics_kvm_class_init,
+};
+
+static void xics_kvm_register_types(void)
+{
+    type_register_static(&xics_kvm_info);
+}
+
+type_init(xics_kvm_register_types)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index bf348c7..961f2f7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1001,7 +1001,31 @@ static struct icp_state *xics_system_init(int nr_servers, int nr_irqs)
 {
     struct icp_state *icp = NULL;
 
-    icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
+    if (kvm_enabled()) {
+        bool irqchip_allowed = true, irqchip_required = false;
+        QemuOptsList *list = qemu_find_opts("machine");
+
+        if (!QTAILQ_EMPTY(&list->head)) {
+            irqchip_allowed = qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                                                "kernel_irqchip", true);
+            irqchip_required = qemu_opt_get_bool(QTAILQ_FIRST(&list->head),
+                                                 "kernel_irqchip", false);
+        }
+
+        if (irqchip_allowed) {
+            icp = try_create_xics(TYPE_XICS_KVM, nr_servers, nr_irqs);
+        }
+
+        if (irqchip_required && !icp) {
+            perror("Failed to create in-kernel XICS\n");
+            abort();
+        }
+    }
+
+    if (!icp) {
+        icp = try_create_xics(TYPE_XICS, nr_servers, nr_irqs);
+    }
+
     if (!icp) {
         perror("Failed to create XICS\n");
         abort();
@@ -1102,8 +1126,6 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
         }
         env = &cpu->env;
 
-        xics_cpu_setup(spapr->icp, cpu);
-
         /* Set time-base frequency to 512 MHz */
         cpu_ppc_tb_init(env, TIMEBASE_FREQ);
 
@@ -1117,6 +1139,10 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
             kvmppc_set_papr(cpu);
         }
 
+        if (xics_kvm_cpu_setup(spapr->icp, cpu)) {
+            xics_cpu_setup(spapr->icp, cpu);
+        }
+
         qemu_register_reset(spapr_cpu_reset, cpu);
     }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 3f72806..e474c01 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -32,6 +32,9 @@
 #define TYPE_XICS "xics"
 #define XICS(obj) OBJECT_CHECK(struct icp_state, (obj), TYPE_XICS)
 
+#define TYPE_XICS_KVM "xics-kvm"
+#define XICS_KVM(obj) OBJECT_CHECK(struct icp_state_kvm, (obj), TYPE_XICS_KVM)
+
 #define XICS_IPI        0x2
 #define XICS_BUID       0x1
 #define XICS_IRQ_BASE   (XICS_BUID << 12)
@@ -53,6 +56,7 @@ struct icp_state {
 };
 
 struct icp_server_state {
+    CPUState *cs;
     uint32_t xirr;
     uint8_t pending_priority;
     uint8_t mfrr;
@@ -88,6 +92,15 @@ void xics_common_reset(struct icp_state *icp);
 
 void xics_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
 
+#ifdef CONFIG_KVM
+int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu);
+#else
+static inline int xics_kvm_cpu_setup(struct icp_state *icp, PowerPCCPU *cpu)
+{
+    return -1;
+}
+#endif
+
 extern const VMStateDescription vmstate_icp_server;
 extern const VMStateDescription vmstate_ics;
 
-- 
1.8.3.2


* [Qemu-devel] [PATCH 16/19] pseries: savevm support with KVM
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (14 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 15/19] pseries: Support for in-kernel XICS interrupt controller Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 17/19] target-ppc: Add POWER8 v1.0 CPU model Alexey Kardashevskiy
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, qemu-ppc, Paul Mackerras,
	David Gibson

From: David Gibson <david@gibson.dropbear.id.au>

At present, the savevm / migration support for the pseries machine will not
work when KVM is enabled.  That's because KVM manages the guest's hash page
table in the host kernel, so qemu has no visibility of it.  This patch
fixes this by using new kernel interfaces to extract and reinsert the
guest's hash table during the migration process.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c         | 106 +++++++++++++++++++++++++++++++++++++++----------
 include/hw/ppc/spapr.h |   1 +
 target-ppc/kvm.c       |  69 ++++++++++++++++++++++++++++++++
 target-ppc/kvm_ppc.h   |  22 ++++++++++
 4 files changed, 176 insertions(+), 22 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 961f2f7..26dd3f7 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -744,17 +744,27 @@ static int htab_save_setup(QEMUFile *f, void *opaque)
 {
     sPAPREnvironment *spapr = opaque;
 
-    spapr->htab_save_index = 0;
-    spapr->htab_first_pass = true;
-
     /* "Iteration" header */
     qemu_put_be32(f, spapr->htab_shift);
 
+    if (spapr->htab) {
+        spapr->htab_save_index = 0;
+        spapr->htab_first_pass = true;
+    } else {
+        assert(kvm_enabled());
+
+        spapr->htab_fd = kvmppc_get_htab_fd(false);
+        if (spapr->htab_fd < 0) {
+            fprintf(stderr, "Unable to open fd for reading hash table from KVM: %s\n",
+                    strerror(errno));
+            return -1;
+        }
+    }
+
+
     return 0;
 }
 
-#define MAX_ITERATION_NS    5000000 /* 5 ms */
-
 static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
                                  int64_t max_ns)
 {
@@ -805,8 +815,8 @@ static void htab_save_first_pass(QEMUFile *f, sPAPREnvironment *spapr,
     spapr->htab_save_index = index;
 }
 
-static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
-                                 int64_t max_ns)
+static int htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
+                                int64_t max_ns)
 {
     bool final = max_ns < 0;
     int htabslots = HTAB_SIZE(spapr) / HASH_PTE_SIZE_64;
@@ -879,21 +889,32 @@ static bool htab_save_later_pass(QEMUFile *f, sPAPREnvironment *spapr,
 
     spapr->htab_save_index = index;
 
-    return (examined >= htabslots) && (sent == 0);
+    return (examined >= htabslots) && (sent == 0) ? 1 : 0;
 }
 
+#define MAX_ITERATION_NS    5000000 /* 5 ms */
+#define MAX_KVM_BUF_SIZE    2048
+
 static int htab_save_iterate(QEMUFile *f, void *opaque)
 {
     sPAPREnvironment *spapr = opaque;
-    bool nothingleft = false;;
+    int rc = 0;
 
     /* Iteration header */
     qemu_put_be32(f, 0);
 
-    if (spapr->htab_first_pass) {
+    if (!spapr->htab) {
+        assert(kvm_enabled());
+
+        rc = kvmppc_save_htab(f, spapr->htab_fd,
+                              MAX_KVM_BUF_SIZE, MAX_ITERATION_NS);
+        if (rc < 0) {
+            return rc;
+        }
+    } else if (spapr->htab_first_pass) {
         htab_save_first_pass(f, spapr, MAX_ITERATION_NS);
     } else {
-        nothingleft = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
+        rc = htab_save_later_pass(f, spapr, MAX_ITERATION_NS);
     }
 
     /* End marker */
@@ -901,7 +922,7 @@ static int htab_save_iterate(QEMUFile *f, void *opaque)
     qemu_put_be16(f, 0);
     qemu_put_be16(f, 0);
 
-    return nothingleft ? 1 : 0;
+    return rc;
 }
 
 static int htab_save_complete(QEMUFile *f, void *opaque)
@@ -911,7 +932,20 @@ static int htab_save_complete(QEMUFile *f, void *opaque)
     /* Iteration header */
     qemu_put_be32(f, 0);
 
-    htab_save_later_pass(f, spapr, -1);
+    if (!spapr->htab) {
+        int rc;
+
+        assert(kvm_enabled());
+
+        rc = kvmppc_save_htab(f, spapr->htab_fd, MAX_KVM_BUF_SIZE, -1);
+        if (rc < 0) {
+            return rc;
+        }
+        close(spapr->htab_fd);
+        spapr->htab_fd = -1;
+    } else {
+        htab_save_later_pass(f, spapr, -1);
+    }
 
     /* End marker */
     qemu_put_be32(f, 0);
@@ -925,6 +959,7 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
 {
     sPAPREnvironment *spapr = opaque;
     uint32_t section_hdr;
+    int fd = -1;
 
     if (version_id < 1 || version_id > 1) {
         fprintf(stderr, "htab_load() bad version\n");
@@ -941,6 +976,16 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
         return 0;
     }
 
+    if (!spapr->htab) {
+        assert(kvm_enabled());
+
+        fd = kvmppc_get_htab_fd(true);
+        if (fd < 0) {
+            fprintf(stderr, "Unable to open fd to restore KVM hash table: %s\n",
+                    strerror(errno));
+        }
+    }
+
     while (true) {
         uint32_t index;
         uint16_t n_valid, n_invalid;
@@ -954,24 +999,41 @@ static int htab_load(QEMUFile *f, void *opaque, int version_id)
             break;
         }
 
-        if ((index + n_valid + n_invalid) >=
+        if ((index + n_valid + n_invalid) >
             (HTAB_SIZE(spapr) / HASH_PTE_SIZE_64)) {
             /* Bad index in stream */
             fprintf(stderr, "htab_load() bad index %d (%hd+%hd entries) "
-                    "in htab stream\n", index, n_valid, n_invalid);
+                    "in htab stream (htab_shift=%d)\n", index, n_valid, n_invalid,
+                    spapr->htab_shift);
             return -EINVAL;
         }
 
-        if (n_valid) {
-            qemu_get_buffer(f, HPTE(spapr->htab, index),
-                            HASH_PTE_SIZE_64 * n_valid);
-        }
-        if (n_invalid) {
-            memset(HPTE(spapr->htab, index + n_valid), 0,
-                   HASH_PTE_SIZE_64 * n_invalid);
+        if (spapr->htab) {
+            if (n_valid) {
+                qemu_get_buffer(f, HPTE(spapr->htab, index),
+                                HASH_PTE_SIZE_64 * n_valid);
+            }
+            if (n_invalid) {
+                memset(HPTE(spapr->htab, index + n_valid), 0,
+                       HASH_PTE_SIZE_64 * n_invalid);
+            }
+        } else {
+            int rc;
+
+            assert(fd >= 0);
+
+            rc = kvmppc_load_htab_chunk(f, fd, index, n_valid, n_invalid);
+            if (rc < 0) {
+                return rc;
+            }
         }
     }
 
+    if (!spapr->htab) {
+        assert(fd >= 0);
+        close(fd);
+    }
+
     return 0;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4cfe449..3da31f0 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -37,6 +37,7 @@ typedef struct sPAPREnvironment {
     /* Migration state */
     int htab_save_index;
     bool htab_first_pass;
+    int htab_fd;
 } sPAPREnvironment;
 
 #define H_SUCCESS         0
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 33ddf63..ff85c19 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -65,6 +65,7 @@ static int cap_one_reg;
 static int cap_epr;
 static int cap_ppc_watchdog;
 static int cap_papr;
+static int cap_htab_fd;
 
 /* XXX We have a race condition where we actually have a level triggered
  *     interrupt, but the infrastructure can't expose that yet, so the guest
@@ -101,6 +102,7 @@ int kvm_arch_init(KVMState *s)
     cap_ppc_watchdog = kvm_check_extension(s, KVM_CAP_PPC_BOOKE_WATCHDOG);
     /* Note: we don't set cap_papr here, because this capability is
      * only activated after this by kvmppc_set_papr() */
+    cap_htab_fd = kvm_check_extension(s, KVM_CAP_PPC_HTAB_FD);
 
     if (!cap_interrupt_level) {
         fprintf(stderr, "KVM: Couldn't find level irq capability. Expect the "
@@ -1802,6 +1804,73 @@ int kvmppc_define_rtas_token(uint32_t token, const char *function)
     return kvm_vm_ioctl(kvm_state, KVM_PPC_RTAS_DEFINE_TOKEN, &args);
 }
 
+int kvmppc_get_htab_fd(bool write)
+{
+    struct kvm_get_htab_fd s = {
+        .flags = write ? KVM_GET_HTAB_WRITE : 0,
+        .start_index = 0,
+    };
+
+    if (!cap_htab_fd) {
+        fprintf(stderr, "KVM version doesn't support saving the hash table\n");
+        return -1;
+    }
+
+    return kvm_vm_ioctl(kvm_state, KVM_PPC_GET_HTAB_FD, &s);
+}
+
+int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns)
+{
+    int64_t starttime = qemu_get_clock_ns(rt_clock);
+    uint8_t buf[bufsize];
+    ssize_t rc;
+
+    do {
+        rc = read(fd, buf, bufsize);
+        if (rc < 0) {
+            fprintf(stderr, "Error reading data from KVM HTAB fd: %s\n",
+                    strerror(errno));
+            return rc;
+        } else if (rc) {
+            /* Kernel already returns data in BE format for the file */
+            qemu_put_buffer(f, buf, rc);
+        }
+    } while ((rc != 0)
+             && ((max_ns < 0)
+                 || ((qemu_get_clock_ns(rt_clock) - starttime) < max_ns)));
+
+    return (rc == 0) ? 1 : 0;
+}
+
+int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
+                           uint16_t n_valid, uint16_t n_invalid)
+{
+    struct kvm_get_htab_header *buf;
+    size_t chunksize = sizeof(*buf) + n_valid*HASH_PTE_SIZE_64;
+    ssize_t rc;
+
+    buf = alloca(chunksize);
+    /* This is KVM on ppc, so this is all big-endian */
+    buf->index = index;
+    buf->n_valid = n_valid;
+    buf->n_invalid = n_invalid;
+
+    qemu_get_buffer(f, (void *)(buf + 1), HASH_PTE_SIZE_64*n_valid);
+
+    rc = write(fd, buf, chunksize);
+    if (rc < 0) {
+        fprintf(stderr, "Error writing KVM hash table: %s\n",
+                strerror(errno));
+        return rc;
+    }
+    if (rc != chunksize) {
+        /* We should never get a short write on a single chunk */
+        fprintf(stderr, "Short write, restoring KVM hash table\n");
+        return -1;
+    }
+    return 0;
+}
+
 bool kvm_arch_stop_on_emulation_error(CPUState *cpu)
 {
     return true;
diff --git a/target-ppc/kvm_ppc.h b/target-ppc/kvm_ppc.h
index 21939a8..12564ef 100644
--- a/target-ppc/kvm_ppc.h
+++ b/target-ppc/kvm_ppc.h
@@ -39,6 +39,10 @@ uint64_t kvmppc_rma_size(uint64_t current_size, unsigned int hash_shift);
 int kvmppc_fixup_cpu(PowerPCCPU *cpu);
 bool kvmppc_has_cap_epr(void);
 int kvmppc_define_rtas_token(uint32_t token, const char *function);
+int kvmppc_get_htab_fd(bool write);
+int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize, int64_t max_ns);
+int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
+                           uint16_t n_valid, uint16_t n_invalid);
 
 #else
 
@@ -166,6 +170,24 @@ static inline int kvmppc_define_rtas_token(uint32_t token,
 {
     return -1;
 }
+
+static inline int kvmppc_get_htab_fd(bool write)
+{
+    return -1;
+}
+
+static inline int kvmppc_save_htab(QEMUFile *f, int fd, size_t bufsize,
+                                   int64_t max_ns)
+{
+    abort();
+}
+
+static inline int kvmppc_load_htab_chunk(QEMUFile *f, int fd, uint32_t index,
+                                         uint16_t n_valid, uint16_t n_invalid)
+{
+    abort();
+}
+
 #endif
 
 #ifndef CONFIG_KVM
-- 
1.8.3.2


* [Qemu-devel] [PATCH 17/19] target-ppc: Add POWER8 v1.0 CPU model
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (15 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 16/19] pseries: savevm support with KVM Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries Alexey Kardashevskiy
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, Paul Mackerras,
	Prerna Saxena, qemu-ppc, David Gibson

From: Prerna Saxena <prerna@linux.vnet.ibm.com>

This patch adds the CPU PVR definition for POWER8 v1.0,
and enables QEMU to launch guests on POWER8 hardware.

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Andreas Farber <afaerber@suse.de>

---
Changes:
2013/07/04:
* version 0.1 fixed to 1.0

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 target-ppc/cpu-models.c     |  3 +++
 target-ppc/cpu-models.h     |  1 +
 target-ppc/translate_init.c | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 38 insertions(+)

diff --git a/target-ppc/cpu-models.c b/target-ppc/cpu-models.c
index 9bb68c8..623ad29 100644
--- a/target-ppc/cpu-models.c
+++ b/target-ppc/cpu-models.c
@@ -1145,6 +1145,8 @@
                 "POWER7 v2.1")
     POWERPC_DEF("POWER7_v2.3",   CPU_POWERPC_POWER7_v23,             POWER7,
                 "POWER7 v2.3")
+    POWERPC_DEF("POWER8_v1.0",   CPU_POWERPC_POWER8_v10,             POWER8,
+                "POWER8 v1.0")
     POWERPC_DEF("970",           CPU_POWERPC_970,                    970,
                 "PowerPC 970")
     POWERPC_DEF("970fx_v1.0",    CPU_POWERPC_970FX_v10,              970FX,
@@ -1390,6 +1392,7 @@ PowerPCCPUAlias ppc_cpu_aliases[] = {
     { "Dino",  "POWER3" },
     { "POWER3+", "631" },
     { "POWER7", "POWER7_v2.3" },
+    { "POWER8", "POWER8_v1.0" },
     { "970fx", "970fx_v3.1" },
     { "970mp", "970mp_v1.1" },
     { "Apache", "RS64" },
diff --git a/target-ppc/cpu-models.h b/target-ppc/cpu-models.h
index ae8f7c7..5458529 100644
--- a/target-ppc/cpu-models.h
+++ b/target-ppc/cpu-models.h
@@ -556,6 +556,7 @@ enum {
     CPU_POWERPC_POWER7_v20         = 0x003F0200,
     CPU_POWERPC_POWER7_v21         = 0x003F0201,
     CPU_POWERPC_POWER7_v23         = 0x003F0203,
+    CPU_POWERPC_POWER8_v10         = 0x004B0100,
     CPU_POWERPC_970                = 0x00390202,
     CPU_POWERPC_970FX_v10          = 0x00391100,
     CPU_POWERPC_970FX_v20          = 0x003C0200,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 02f3825..c4b466b 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7011,6 +7011,40 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
     pcc->l1_dcache_size = 0x8000;
     pcc->l1_icache_size = 0x8000;
 }
+
+POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+    PowerPCCPUClass *pcc = POWERPC_CPU_CLASS(oc);
+
+    dc->desc = "POWER8";
+    pcc->init_proc = init_proc_POWER7;
+    pcc->check_pow = check_pow_nocheck;
+    pcc->insns_flags = PPC_INSNS_BASE | PPC_STRING | PPC_MFTB |
+                       PPC_FLOAT | PPC_FLOAT_FSEL | PPC_FLOAT_FRES |
+                       PPC_FLOAT_FSQRT | PPC_FLOAT_FRSQRTE |
+                       PPC_FLOAT_STFIWX |
+                       PPC_CACHE | PPC_CACHE_ICBI | PPC_CACHE_DCBZ |
+                       PPC_MEM_SYNC | PPC_MEM_EIEIO |
+                       PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
+                       PPC_64B | PPC_ALTIVEC |
+                       PPC_SEGMENT_64B | PPC_SLBI |
+                       PPC_POPCNTB | PPC_POPCNTWD;
+    pcc->insns_flags2 = PPC2_VSX | PPC2_DFP | PPC2_DBRX;
+    pcc->msr_mask = 0x800000000204FF36ULL;
+    pcc->mmu_model = POWERPC_MMU_2_06;
+#if defined(CONFIG_SOFTMMU)
+    pcc->handle_mmu_fault = ppc_hash64_handle_mmu_fault;
+#endif
+    pcc->excp_model = POWERPC_EXCP_POWER7;
+    pcc->bus_model = PPC_FLAGS_INPUT_POWER7;
+    pcc->bfd_mach = bfd_mach_ppc64;
+    pcc->flags = POWERPC_FLAG_VRE | POWERPC_FLAG_SE |
+                 POWERPC_FLAG_BE | POWERPC_FLAG_PMM |
+                 POWERPC_FLAG_BUS_CLK | POWERPC_FLAG_CFAR;
+    pcc->l1_dcache_size = 0x8000;
+    pcc->l1_icache_size = 0x8000;
+}
 #endif /* defined (TARGET_PPC64) */
 
 
-- 
1.8.3.2


* [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (16 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 17/19] target-ppc: Add POWER8 v1.0 CPU model Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-08  1:09   ` David Gibson
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 19/19] spapr-pci: rework MSI/MSIX Alexey Kardashevskiy
  2013-07-29 20:24 ` [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Anthony Liguori
  19 siblings, 1 reply; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, aik, Alexander Graf, Paul Mackerras,
	Prerna Saxena, qemu-ppc, David Gibson

From: Prerna Saxena <prerna@linux.vnet.ibm.com>

In the absence of a -cpu parameter on the qemu command line, the CPU nodes
of a KVM-enabled guest's device tree look like this:

/proc/device-tree/cpus/HOST@0/...
/proc/device-tree/cpus/HOST@4/...

This patch replaces the obscure 'HOST' label with a more descriptive one.
The new label is obtained by first reading the PVR of the host, and then
looking up the CPU alias that corresponds to the model this PVR indicates.

Sample final outcome for a KVM-enabled pseries guest running on POWER7:
/proc/device-tree/cpus/PowerPC,POWER7@0/...
/proc/device-tree/cpus/PowerPC,POWER7@4/...

This also helps userspace tools like ppc64_cpu, which expect the device tree
to be in this format in the guest.
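
The label construction can be sketched in plain C as follows. This is an
illustration only, not the code of this patch: make_cpu_node_name() is a
hypothetical helper, and it uses malloc()/snprintf() where the patch uses
g_strdup_printf().

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PPC_DEVTREE_STR "PowerPC,"

/*
 * Hypothetical helper: build "PowerPC,<ALIAS>" from a CPU alias.
 * Only the alias part is upper-cased; the prefix keeps its mixed case.
 * The caller frees the result.
 */
char *make_cpu_node_name(const char *cpu_alias)
{
    size_t prefix_len = strlen(PPC_DEVTREE_STR);
    size_t len = prefix_len + strlen(cpu_alias);
    char *name = malloc(len + 1);
    size_t i;

    assert(name != NULL);
    snprintf(name, len + 1, "%s%s", PPC_DEVTREE_STR, cpu_alias);
    for (i = prefix_len; i < len; i++) {
        name[i] = (char)toupper((unsigned char)name[i]);
    }
    return name;
}
```

Starting the loop at the end of the prefix is what keeps "PowerPC," from
being turned into "POWERPC,".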

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c              | 17 ++++++++++++++---
 target-ppc/cpu-qom.h        |  1 +
 target-ppc/translate_init.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 26dd3f7..5ecd81b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -80,6 +80,7 @@
 
 #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
 
+#define PPC_DEVTREE_STR         "PowerPC,"
 sPAPREnvironment *spapr;
 
 int spapr_allocate_irq(int hint, bool lsi)
@@ -296,9 +297,12 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
     _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
     _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
 
-    modelname = g_strdup(cpu_model);
+    /* device tree nodes must look like this :
+     * PowerPC,CPU_ALIAS@0
+     */
+    modelname = g_strdup_printf(PPC_DEVTREE_STR "%s", cpu_model);
 
-    for (i = 0; i < strlen(modelname); i++) {
+    for (i = strlen(PPC_DEVTREE_STR); i < strlen(modelname); i++) {
         modelname[i] = toupper(modelname[i]);
     }
 
@@ -1112,7 +1116,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     MemoryRegion *sysmem = get_system_memory();
     MemoryRegion *ram = g_new(MemoryRegion, 1);
     hwaddr rma_alloc_size;
-    uint32_t initrd_base = 0;
+    uint32_t initrd_base = 0, pvr = 0;
     long kernel_size = 0, initrd_size = 0;
     long load_limit, rtas_limit, fw_size;
     char *filename;
@@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     register_savevm_live(NULL, "spapr/htab", -1, 1,
                          &savevm_htab_handlers, spapr);
 
+    /* Ensure that cpu_model is correctly reflected for a KVM guest */
+    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
+        asm ("mfpvr %0"
+            : "=r"(pvr));
+        cpu_model = ppc_cpu_alias_by_pvr(pvr);
+    }
+
     /* Prepare the device tree */
     spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
                                             initrd_base, initrd_size,
diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index a14a3d9..2ad45c2 100644
--- a/target-ppc/cpu-qom.h
+++ b/target-ppc/cpu-qom.h
@@ -99,6 +99,7 @@ static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env)
 #define ENV_OFFSET offsetof(PowerPCCPU, env)
 
 PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr);
+const char *ppc_cpu_alias_by_pvr(uint32_t pvr);
 
 void ppc_cpu_do_interrupt(CPUState *cpu);
 void ppc_cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index c4b466b..2b013c2 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7947,6 +7947,34 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr)
     return pcc;
 }
 
+const char *ppc_cpu_alias_by_pvr(uint32_t pvr)
+{
+    int i;
+    const char *cpu_alias;
+    char *offset, *model;
+
+    cpu_alias  = object_class_get_name(OBJECT_CLASS
+                            (ppc_cpu_class_by_pvr(pvr)));
+
+    /* Replace the full class name in cpu_alias with the CPU alias
+     * Eg, POWER7_V2.3-POWERPC64-CPU can simply be called
+     * POWER7
+     */
+
+    offset = strstr(cpu_alias, "-" TYPE_POWERPC_CPU);
+    if (offset) {
+        model = g_strndup(cpu_alias, offset - cpu_alias);
+        for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) {
+            if (strcmp(ppc_cpu_aliases[i].model, model) == 0) {
+                g_free(model);
+                return ppc_cpu_aliases[i].alias;
+            }
+        }
+        g_free(model);
+    }
+    return NULL;
+}
+
 static gint ppc_cpu_compare_class_name(gconstpointer a, gconstpointer b)
 {
     ObjectClass *oc = (ObjectClass *)a;
-- 
1.8.3.2


* [Qemu-devel] [PATCH 19/19] spapr-pci: rework MSI/MSIX
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (17 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries Alexey Kardashevskiy
@ 2013-07-06 13:54 ` Alexey Kardashevskiy
  2013-07-29 20:24 ` [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Anthony Liguori
  19 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-06 13:54 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paul Mackerras, David Gibson

The sPAPR platform is specific in that the guest allocates MSI/MSIX
vectors via RTAS hypercalls and operates only on global IRQ numbers.
On real hardware, the PHB is expected to convert an MSIMessage to an IRQ
number, so it is up to the host kernel to set up the correct MSIMessage
in a real device and in the PHB the device sits on.

Therefore MSIMessage handling is completely hidden in QEMU.

Previously every PCI host bridge implemented its own MSI memory window
to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
or vfio) and redirect them to the guest via qemu_pulse_irq().

MSIMessage encoding was:
* .addr - address within the PHB MSI window;
* .data - the device index on PHB plus vector number.

The MSI MR write function translated this MSIMessage to a global VIRQ
number and called qemu_pulse_irq().

However, the total number of IRQs is not large (at the moment it is
1024 IRQs starting from 4096), and even the 16-bit data field of an
MSIMessage is enough to store a VIRQ number, so no decoding is needed.

The patch does the following:

1. removes MSI windows from a PHB;
2. adds a single memory region for all MSIs in the guest;
3. encodes MSIMessage as:
    * .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
    * .data - an IRQ number;
4. changes the IRQ allocator to align the first IRQ number for MSI, as MSI
uses the lowest .data bits to hold a vector number; this is not required
for MSI-X, which has a per-vector .data field.
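
The alignment and the new encoding can be sketched as below. This is a
minimal illustration under stated assumptions: struct msi_message_sketch
stands in for QEMU's MSIMessage, and both function names are hypothetical.
Only the window address and the rounding expression come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Address from the patch; every MSI message targets this one window. */
#define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL

/* Illustrative stand-in for QEMU's MSIMessage. */
struct msi_message_sketch {
    uint64_t address;
    uint32_t data;
};

/* Round next_irq up to a multiple of num; num must be a power of two. */
int msi_align_first_irq(int next_irq, int num)
{
    assert(num > 0 && (num & (num - 1)) == 0);
    return (next_irq + num - 1) & ~(num - 1);
}

/* Encode vector vec of a block whose first IRQ is first_irq. */
struct msi_message_sketch msi_encode(int first_irq, int vec)
{
    struct msi_message_sketch msg = {
        .address = SPAPR_PCI_MSI_WINDOW,
        .data = (uint32_t)(first_irq + vec),
    };
    return msg;
}
```

With an aligned first IRQ, the low bits of .data are zero for vector 0, so
adding the vector number and OR-ing it in give the same result. That is
why the alignment matters only for multi-vector MSI.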

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 hw/ppc/spapr.c              | 29 ++++++++++++--
 hw/ppc/spapr_pci.c          | 94 +++++++++++++++++++--------------------------
 include/hw/pci-host/spapr.h |  8 ++--
 include/hw/ppc/spapr.h      |  4 +-
 4 files changed, 73 insertions(+), 62 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5ecd81b..29d2be5 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -89,6 +89,9 @@ int spapr_allocate_irq(int hint, bool lsi)
 
     if (hint) {
         irq = hint;
+        if (hint >= spapr->next_irq) {
+            spapr->next_irq = hint + 1;
+        }
         /* FIXME: we should probably check for collisions somehow */
     } else {
         irq = spapr->next_irq++;
@@ -104,22 +107,39 @@ int spapr_allocate_irq(int hint, bool lsi)
     return irq;
 }
 
-/* Allocate block of consequtive IRQs, returns a number of the first */
-int spapr_allocate_irq_block(int num, bool lsi)
+/*
+ * Allocate block of consecutive IRQs, returns a number of the first.
+ * If msi==true, aligns the first IRQ number to num.
+ */
+int spapr_allocate_irq_block(int num, bool lsi, bool msi)
 {
     int first = -1;
-    int i;
+    int i, hint = 0;
+
+    /*
+     * MSIMessage::data is used for storing VIRQ so
+     * it has to be aligned to num to support multiple
+     * MSI vectors. MSI-X is not affected by this.
+     * The hint is used for the first IRQ, the rest should
+     * be allocated continuously.
+     */
+    if (msi) {
+        assert((num == 1) || (num == 2) || (num == 4) ||
+               (num == 8) || (num == 16) || (num == 32));
+        hint = (spapr->next_irq + num - 1) & ~(num - 1);
+    }
 
     for (i = 0; i < num; ++i) {
         int irq;
 
-        irq = spapr_allocate_irq(0, lsi);
+        irq = spapr_allocate_irq(hint, lsi);
         if (!irq) {
             return -1;
         }
 
         if (0 == i) {
             first = irq;
+            hint = 0;
         }
 
         /* If the above doesn't create a consecutive block then that's
@@ -1256,6 +1276,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     spapr_create_nvram(spapr);
 
     /* Set up PCI */
+    spapr_pci_msi_init(spapr, SPAPR_PCI_MSI_WINDOW);
     spapr_pci_rtas_init();
 
     phb = spapr_create_phb(spapr, 0);
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 4d8e3cd..23dbc0e 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -253,30 +253,6 @@ static int spapr_msicfg_find(sPAPRPHBState *phb, uint32_t config_addr,
     return -1;
 }
 
-/*
- * Set MSI/MSIX message data.
- * This is required for msi_notify()/msix_notify() which
- * will write at the addresses via spapr_msi_write().
- */
-static void spapr_msi_setmsg(PCIDevice *pdev, hwaddr addr,
-                             bool msix, unsigned req_num)
-{
-    unsigned i;
-    MSIMessage msg = { .address = addr, .data = 0 };
-
-    if (!msix) {
-        msi_set_message(pdev, msg);
-        trace_spapr_pci_msi_setup(pdev->name, 0, msg.address);
-        return;
-    }
-
-    for (i = 0; i < req_num; ++i) {
-        msg.address = addr | (i << 2);
-        msix_set_message(pdev, i, msg);
-        trace_spapr_pci_msi_setup(pdev->name, i, msg.address);
-    }
-}
-
 static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                                 uint32_t token, uint32_t nargs,
                                 target_ulong args, uint32_t nret,
@@ -288,9 +264,10 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     unsigned int req_num = rtas_ld(args, 4); /* 0 == remove all */
     unsigned int seq_num = rtas_ld(args, 5);
     unsigned int ret_intr_type;
-    int ndev, irq;
+    int i, ndev, irq;
     sPAPRPHBState *phb = NULL;
     PCIDevice *pdev = NULL;
+    MSIMessage msg;
 
     switch (func) {
     case RTAS_CHANGE_MSI_FN:
@@ -351,7 +328,8 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
 
     /* There is no cached config, allocate MSIs */
     if (!phb->msi_table[ndev].nvec) {
-        irq = spapr_allocate_irq_block(req_num, false);
+        irq = spapr_allocate_irq_block(req_num, false,
+                                       ret_intr_type == RTAS_TYPE_MSI);
         if (irq < 0) {
             fprintf(stderr, "Cannot allocate MSIs for device#%d", ndev);
             rtas_st(rets, 0, -1); /* Hardware error */
@@ -362,9 +340,23 @@ static void rtas_ibm_change_msi(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         phb->msi_table[ndev].config_addr = config_addr;
     }
 
-    /* Setup MSI/MSIX vectors in the device (via cfgspace or MSIX BAR) */
-    spapr_msi_setmsg(pdev, phb->msi_win_addr | (ndev << 16),
-                     ret_intr_type == RTAS_TYPE_MSIX, req_num);
+    /*
+     * Set MSI/MSIX message data.
+     * This is required for msi_notify()/msix_notify() which
+     * will write at the addresses via spapr_msi_write().
+     */
+    msg.address = spapr->msi_win_addr;
+    if (ret_intr_type == RTAS_TYPE_MSI) {
+        msg.data = phb->msi_table[ndev].irq;
+        msi_set_message(pdev, msg);
+        trace_spapr_pci_msi_setup(pdev->name, 0, msg.address);
+    } else {
+        for (i = 0; i < phb->msi_table[ndev].nvec; ++i) {
+            msg.data = phb->msi_table[ndev].irq + i;
+            msix_set_message(pdev, i, msg);
+            trace_spapr_pci_msi_setup(pdev->name, i, msg.address);
+        }
+    }
 
     rtas_st(rets, 0, 0);
     rtas_st(rets, 1, req_num);
@@ -487,10 +479,7 @@ static const MemoryRegionOps spapr_io_ops = {
 static void spapr_msi_write(void *opaque, hwaddr addr,
                             uint64_t data, unsigned size)
 {
-    sPAPRPHBState *phb = opaque;
-    int ndev = addr >> 16;
-    int vec = ((addr & 0xFFFF) >> 2) | data;
-    uint32_t irq = phb->msi_table[ndev].irq + vec;
+    uint32_t irq = data;
 
     trace_spapr_pci_msi_write(addr, data, irq);
 
@@ -504,6 +493,23 @@ static const MemoryRegionOps spapr_msi_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN
 };
 
+void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr)
+{
+    /*
+     * As MSI/MSIX interrupts trigger by writing at MSI/MSIX vectors,
+     * we need to allocate some memory to catch those writes coming
+     * from msi_notify()/msix_notify().
+     * As MSIMessage:addr is going to be the same and MSIMessage:data
+     * is going to be a VIRQ number, only 4 bytes of the MSI MR will
+     * be used.
+     */
+    spapr->msi_win_addr = addr;
+    memory_region_init_io(&spapr->msiwindow, &spapr_msi_ops, spapr,
+                          "msi", getpagesize());
+    memory_region_add_subregion(get_system_memory(), spapr->msi_win_addr,
+                                &spapr->msiwindow);
+}
+
 /*
  * PHB PCI device
  */
@@ -528,8 +534,7 @@ static int spapr_phb_init(SysBusDevice *s)
 
         if ((sphb->buid != -1) || (sphb->dma_liobn != -1)
             || (sphb->mem_win_addr != -1)
-            || (sphb->io_win_addr != -1)
-            || (sphb->msi_win_addr != -1)) {
+            || (sphb->io_win_addr != -1)) {
             fprintf(stderr, "Either \"index\" or other parameters must"
                     " be specified for PAPR PHB, not both\n");
             return -1;
@@ -542,7 +547,6 @@ static int spapr_phb_init(SysBusDevice *s)
             + sphb->index * SPAPR_PCI_WINDOW_SPACING;
         sphb->mem_win_addr = windows_base + SPAPR_PCI_MMIO_WIN_OFF;
         sphb->io_win_addr = windows_base + SPAPR_PCI_IO_WIN_OFF;
-        sphb->msi_win_addr = windows_base + SPAPR_PCI_MSI_WIN_OFF;
     }
 
     if (sphb->buid == -1) {
@@ -565,11 +569,6 @@ static int spapr_phb_init(SysBusDevice *s)
         return -1;
     }
 
-    if (sphb->msi_win_addr == -1) {
-        fprintf(stderr, "MSI window address not specified for PHB\n");
-        return -1;
-    }
-
     if (find_phb(spapr, sphb->buid)) {
         fprintf(stderr, "PCI host bridges must have unique BUIDs\n");
         return -1;
@@ -608,17 +607,6 @@ static int spapr_phb_init(SysBusDevice *s)
     memory_region_add_subregion(get_system_memory(), sphb->io_win_addr,
                                 &sphb->iowindow);
 
-    /* As MSI/MSIX interrupts trigger by writing at MSI/MSIX vectors,
-     * we need to allocate some memory to catch those writes coming
-     * from msi_notify()/msix_notify() */
-    if (msi_supported) {
-        sprintf(namebuf, "%s.msi", sphb->dtbusname);
-        memory_region_init_io(&sphb->msiwindow, &spapr_msi_ops, sphb,
-                              namebuf, SPAPR_MSIX_MAX_DEVS * 0x10000);
-        memory_region_add_subregion(get_system_memory(), sphb->msi_win_addr,
-                                    &sphb->msiwindow);
-    }
-
     /*
      * Selecting a busname is more complex than you'd think, due to
      * interacting constraints.  If the user has specified an id
@@ -692,7 +680,6 @@ static Property spapr_phb_properties[] = {
     DEFINE_PROP_HEX64("io_win_addr", sPAPRPHBState, io_win_addr, -1),
     DEFINE_PROP_HEX64("io_win_size", sPAPRPHBState, io_win_size,
                       SPAPR_PCI_IO_WIN_SIZE),
-    DEFINE_PROP_HEX64("msi_win_addr", sPAPRPHBState, msi_win_addr, -1),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -734,7 +721,6 @@ static const VMStateDescription vmstate_spapr_pci = {
         VMSTATE_UINT64_EQUAL(mem_win_size, sPAPRPHBState),
         VMSTATE_UINT64_EQUAL(io_win_addr, sPAPRPHBState),
         VMSTATE_UINT64_EQUAL(io_win_size, sPAPRPHBState),
-        VMSTATE_UINT64_EQUAL(msi_win_addr, sPAPRPHBState),
         VMSTATE_STRUCT_ARRAY(lsi_table, sPAPRPHBState, PCI_NUM_PINS, 0,
                              vmstate_spapr_pci_lsi, struct spapr_pci_lsi),
         VMSTATE_STRUCT_ARRAY(msi_table, sPAPRPHBState, SPAPR_MSIX_MAX_DEVS, 0,
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 93f9511..970b4a9 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -43,8 +43,7 @@ typedef struct sPAPRPHBState {
 
     MemoryRegion memspace, iospace;
     hwaddr mem_win_addr, mem_win_size, io_win_addr, io_win_size;
-    hwaddr msi_win_addr;
-    MemoryRegion memwindow, iowindow, msiwindow;
+    MemoryRegion memwindow, iowindow;
 
     uint32_t dma_liobn;
     uint64_t dma_window_start;
@@ -73,7 +72,8 @@ typedef struct sPAPRPHBState {
 #define SPAPR_PCI_MMIO_WIN_SIZE      0x20000000
 #define SPAPR_PCI_IO_WIN_OFF         0x80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
-#define SPAPR_PCI_MSI_WIN_OFF        0x90000000
+
+#define SPAPR_PCI_MSI_WINDOW         0x40000000000ULL
 
 #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL
 
@@ -88,6 +88,8 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
                           uint32_t xics_phandle,
                           void *fdt);
 
+void spapr_pci_msi_init(sPAPREnvironment *spapr, hwaddr addr);
+
 void spapr_pci_rtas_init(void);
 
 #endif /* __HW_SPAPR_PCI_H__ */
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 3da31f0..f0129f4 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -14,6 +14,8 @@ struct icp_state;
 typedef struct sPAPREnvironment {
     struct VIOsPAPRBus *vio_bus;
     QLIST_HEAD(, sPAPRPHBState) phbs;
+    hwaddr msi_win_addr;
+    MemoryRegion msiwindow;
     struct sPAPRNVRAM *nvram;
     struct icp_state *icp;
 
@@ -304,7 +306,7 @@ target_ulong spapr_hypercall(PowerPCCPU *cpu, target_ulong opcode,
                              target_ulong *args);
 
 int spapr_allocate_irq(int hint, bool lsi);
-int spapr_allocate_irq_block(int num, bool lsi);
+int spapr_allocate_irq_block(int num, bool lsi, bool msi);
 
 static inline int spapr_allocate_msi(int hint)
 {
-- 
1.8.3.2


* Re: [Qemu-devel] [PATCH 13/19] pseries: savevm support for PCI host bridge
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 13/19] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
@ 2013-07-07 23:33   ` David Gibson
  0 siblings, 0 replies; 30+ messages in thread
From: David Gibson @ 2013-07-07 23:33 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, qemu-devel, Alexander Graf, qemu-ppc, Paul Mackerras


On Sat, Jul 06, 2013 at 11:54:10PM +1000, Alexey Kardashevskiy wrote:
> From: David Gibson <david@gibson.dropbear.id.au>
> 
> This adds the necessary support for saving the state of the PAPR virtual
> PCI host bridge (or host bridges).

Note that the migration streams created by this patch will be changed
by the multiple host bridge rework which I see that Michael Tsirkin
has now sent for merge.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries Alexey Kardashevskiy
@ 2013-07-08  1:09   ` David Gibson
  2013-07-08  9:02     ` Andreas Färber
  2013-07-08 15:45     ` [Qemu-devel] [PATCH v2 " Prerna Saxena
  0 siblings, 2 replies; 30+ messages in thread
From: David Gibson @ 2013-07-08  1:09 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: Anthony Liguori, Alexander Graf, qemu-devel, qemu-ppc,
	Prerna Saxena, Paul Mackerras


On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy wrote:
> From: Prerna Saxena <prerna@linux.vnet.ibm.com>
> 
> In absence of a -CPU parameter in the qemu command line, the nodes of
> KVM-enabled guest device tree look like this :
> 
> /proc/device-tree/cpus/HOST@0/...
> /proc/device-tree/cpus/HOST@4/...
> 
> This patch replaces this obscure 'HOST' label with a more descriptive label.
> This is gathered by first identifying the PVR of the host, and then determining
> the host CPU alias which corresponds to the model indicated by this PVR.
> 
> Sample Final outcome for an KVM-enabled pseries guest running on POWER7:
> /proc/device-tree/cpus/PowerPC,POWER7@0/...
> /proc/device-tree/cpus/PowerPC,POWER7@4/...
> 
> This also helps userspace tools like ppc64_cpu, which expect the device tree
> to be in this format in the guest.
> 
> Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>  hw/ppc/spapr.c              | 17 ++++++++++++++---
>  target-ppc/cpu-qom.h        |  1 +
>  target-ppc/translate_init.c | 28 ++++++++++++++++++++++++++++
>  3 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 26dd3f7..5ecd81b 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -80,6 +80,7 @@
>  
>  #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
>  
> +#define PPC_DEVTREE_STR         "PowerPC,"

I thought under PowerVM, modern CPUs showed up as simply
e.g. "POWER7@0" not "PowerPC,POWER7@0".  Have I misremembered?


>  sPAPREnvironment *spapr;
>  
>  int spapr_allocate_irq(int hint, bool lsi)
> @@ -296,9 +297,12 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
>      _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
>      _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
>  
> -    modelname = g_strdup(cpu_model);
> +    /* device tree nodes must look like this :
> +     * PowerPC,CPU_ALIAS@0
> +     */
> +    modelname = g_strdup_printf(PPC_DEVTREE_STR "%s", cpu_model);
>  
> -    for (i = 0; i < strlen(modelname); i++) {
> +    for (i = strlen(PPC_DEVTREE_STR); i < strlen(modelname); i++) {
>          modelname[i] = toupper(modelname[i]);
>      }
>  
> @@ -1112,7 +1116,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>      MemoryRegion *sysmem = get_system_memory();
>      MemoryRegion *ram = g_new(MemoryRegion, 1);
>      hwaddr rma_alloc_size;
> -    uint32_t initrd_base = 0;
> +    uint32_t initrd_base = 0, pvr = 0;
>      long kernel_size = 0, initrd_size = 0;
>      long load_limit, rtas_limit, fw_size;
>      char *filename;
> @@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>      register_savevm_live(NULL, "spapr/htab", -1, 1,
>                           &savevm_htab_handlers, spapr);
>  
> +    /* Ensure that cpu_model is correctly reflected for a KVM guest */
> +    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
> +        asm ("mfpvr %0"
> +            : "=r"(pvr));
> +        cpu_model = ppc_cpu_alias_by_pvr(pvr);

This needs to be protected by an ifdef CONFIG_KVM or similar.  If the
compiler optimization level is turned down, so that it doesn't
recognize that the kvm_enabled() is always false, then this could
attempt to compile the ppc asm instructions on an x86 (or whatever)
host.
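
One possible shape for such a guard, as a sketch only. To stay
self-contained it keys on the compiler's __powerpc__ macros rather than
on CONFIG_KVM, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 1 when this file is compiled for a PowerPC host, 0 otherwise. */
int host_pvr_available(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    return 1;
#else
    return 0;
#endif
}

/*
 * Return the host PVR, or 0 on hosts where mfpvr does not exist.
 * The asm is hidden by the preprocessor on other hosts, so the build
 * does not depend on the optimizer discarding an unreachable branch.
 */
uint32_t read_host_pvr(void)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    uint32_t pvr;

    __asm__ volatile("mfpvr %0" : "=r"(pvr));
    return pvr;
#else
    return 0;
#endif
}
```

A caller would check host_pvr_available() (or, in QEMU, the equivalent
configuration macro) before trusting the returned value.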

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson



* Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-08  1:09   ` David Gibson
@ 2013-07-08  9:02     ` Andreas Färber
  2013-07-08 15:49       ` Prerna Saxena
  2013-07-08 15:45     ` [Qemu-devel] [PATCH v2 " Prerna Saxena
  1 sibling, 1 reply; 30+ messages in thread
From: Andreas Färber @ 2013-07-08  9:02 UTC (permalink / raw)
  To: David Gibson, Alexander Graf
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Paul Mackerras, Prerna Saxena, qemu-ppc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Am 08.07.2013 03:09, schrieb David Gibson:
> On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy
> wrote:
>> @@ -1342,6 +1346,13 @@ static void
>> ppc_spapr_init(QEMUMachineInitArgs *args) 
>> register_savevm_live(NULL, "spapr/htab", -1, 1, 
>> &savevm_htab_handlers, spapr);
>> 
>> +    /* Ensure that cpu_model is correctly reflected for a KVM
>> guest */ +    if (kvm_enabled() && !strcmp(cpu_model, "host")) { 
>> +        asm ("mfpvr %0" +            : "=r"(pvr)); +
>> cpu_model = ppc_cpu_alias_by_pvr(pvr);
> 
> This needs to be protected by an ifdef CONFIG_KVM or similar.  If
> the compiler optimization level is turned down, so that it doesn't 
> recognize that the kvm_enabled() is always false, then this could 
> attempt to compile the ppc asm instructions on an x86 (or
> whatever) host.

This hunk can be completely replaced by QOM mechanisms - just didn't
get to replying yet...

Andreas

- -- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)

iQIcBAEBAgAGBQJR2oAjAAoJEPou0S0+fgE/kxEP/2hvke1o/T4/h3Gl48W2+ASv
84iu5M7atndRF1L1bI6VogmQGlhE4qiAMxuLljpqriXz5lZndslMMcP3mx4skljD
Y3YX9Hi37yR9KEaw0AzoQCBhhS5ZIMGjd/mtW/DqPDcN0H0IdCu340Mz/Lr+0HHy
wp+ChUA8q8iYWJd6zmFmIvnaHUdbRoOHePhVlJD+GZQ2oBNu48DAaiiCdnrEJy+R
ipyZJEF+QmO2RlNDgImOKfyKry6PhuWPBIjMB3qZWyuFzmkwNEcQILnOVSW/bBIl
zXkEkWy3u5fES1+bYs1J4ZL6MZ+Edcd0c2BRKQ9JNUGM6mBj1S61aP8rC7u1VgLp
eUfkSRYOrsvVvJJ/kpOzgWgcgYnfSYp/CUTRURHxlyIxNuvhjDllRhC4wxbF4Bk4
l6jbIDa8jAMTlbCj9EW03Fi+i+oGemkOg2g5Dxl5GnFwdPC95fE39RvSa5vB3X3q
6IgdkbicFReR1dY8JxdcJsTln6b2eMTSHvUjH56FEvDQ9Z/W7TM/qc1jpmNDX7WS
bdWHcziPeAoY9Sk0aMK/LlTKmgZQM1gi5eyKIrL4ujtU3O4VKcNSihYu+Moc+oyx
pEfJrkXP6cvYLwW60yxj8soBv9ssCSBU5ZqgcSK7NlfST0KxtQe4y+jwCT0LkhyS
Qoat9lALzVlVlQwCWM6/
=2tHJ
-----END PGP SIGNATURE-----


* [Qemu-devel] [PATCH v2] pseries: rework PAPR virtual SCSI
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 10/19] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
@ 2013-07-08 11:57   ` Alexey Kardashevskiy
  0 siblings, 0 replies; 30+ messages in thread
From: Alexey Kardashevskiy @ 2013-07-08 11:57 UTC (permalink / raw)
  To: qemu-devel
  Cc: Anthony Liguori, Alexey Kardashevskiy, Alexander Graf, qemu-ppc,
	Paul Mackerras, David Gibson

The patch reimplements handling of indirect requests in order to
simplify upcoming live migration support.
- all pointers (except SCSIRequest*) were replaced with integer
indexes and offsets;
- DMA'ed srp_direct_buf is kept untouched (i.e. in BE format);
- vscsi_fetch_desc() is added, now it is the only place where
descriptors are fetched and byteswapped;
- vscsi_req struct fields were converted to migration-friendly types;
- many dprintf()'s were fixed.
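
The by-value descriptor fetch described above can be sketched without
QEMU headers. The struct mirrors the va/key/len fields of an SRP direct
descriptor; be32_load()/be64_load() are stand-ins for QEMU's
be32_to_cpu()/be64_to_cpu(), and all names ending in _sketch are
assumptions of this example, not the patch's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct direct_buf_sketch {
    uint64_t va;
    uint32_t key;
    uint32_t len;
};

/* Read a big-endian 32-bit value regardless of host byte order. */
static uint32_t be32_load(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Read a big-endian 64-bit value regardless of host byte order. */
static uint64_t be64_load(const uint8_t *p)
{
    return ((uint64_t)be32_load(p) << 32) | be32_load(p + 4);
}

/*
 * Decode one 16-byte wire descriptor into a host-order copy.
 * The wire buffer itself is never modified, so it stays in BE format.
 */
struct direct_buf_sketch fetch_desc_sketch(const uint8_t wire[16])
{
    struct direct_buf_sketch d;

    d.va = be64_load(wire);
    /* key is copied as-is; the patch swaps only va and len. */
    memcpy(&d.key, wire + 8, sizeof(d.key));
    d.len = be32_load(wire + 12);
    return d;
}
```

Returning a swapped copy instead of swapping in place means the request
buffer can be saved and restored in its original form.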

This also removes the 'lun' field from the spapr_vscsi device, which
was assigned but never used.

[David Gibson: removed unused 'lun']
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
2013/08/07:
* fixed handling of indirect requests with an additional table descriptor

---
 hw/scsi/spapr_vscsi.c | 223 +++++++++++++++++++++++++++++---------------------
 1 file changed, 130 insertions(+), 93 deletions(-)

diff --git a/hw/scsi/spapr_vscsi.c b/hw/scsi/spapr_vscsi.c
index e8978bf..104fde4 100644
--- a/hw/scsi/spapr_vscsi.c
+++ b/hw/scsi/spapr_vscsi.c
@@ -75,20 +75,19 @@ typedef struct vscsi_req {
     /* SCSI request tracking */
     SCSIRequest             *sreq;
     uint32_t                qtag; /* qemu tag != srp tag */
-    int                     lun;
-    int                     active;
-    long                    data_len;
-    int                     writing;
-    int                     senselen;
+    bool                    active;
+    uint32_t                data_len;
+    bool                    writing;
+    uint32_t                senselen;
     uint8_t                 sense[SCSI_SENSE_BUF_SIZE];
 
     /* RDMA related bits */
     uint8_t                 dma_fmt;
-    struct srp_direct_buf   ext_desc;
-    struct srp_direct_buf   *cur_desc;
-    struct srp_indirect_buf *ind_desc;
-    int                     local_desc;
-    int                     total_desc;
+    uint16_t                local_desc;
+    uint16_t                total_desc;
+    uint16_t                cdb_offset;
+    uint16_t                cur_desc_num;
+    uint16_t                cur_desc_offset;
 } vscsi_req;
 
 #define TYPE_VIO_SPAPR_VSCSI_DEVICE "spapr-vscsi"
@@ -264,93 +263,138 @@ static int vscsi_send_rsp(VSCSIState *s, vscsi_req *req,
     return 0;
 }
 
-static inline void vscsi_swap_desc(struct srp_direct_buf *desc)
+static inline struct srp_direct_buf vscsi_swap_desc(struct srp_direct_buf desc)
 {
-    desc->va = be64_to_cpu(desc->va);
-    desc->len = be32_to_cpu(desc->len);
+    desc.va = be64_to_cpu(desc.va);
+    desc.len = be32_to_cpu(desc.len);
+    return desc;
+}
+
+static int vscsi_fetch_desc(VSCSIState *s, struct vscsi_req *req,
+                            unsigned n, unsigned buf_offset,
+                            struct srp_direct_buf *ret)
+{
+    struct srp_cmd *cmd = &req->iu.srp.cmd;
+
+    switch (req->dma_fmt) {
+    case SRP_NO_DATA_DESC: {
+        dprintf("VSCSI: no data descriptor\n");
+        return 0;
+    }
+    case SRP_DATA_DESC_DIRECT: {
+        *ret = *(struct srp_direct_buf *)(cmd->add_data + req->cdb_offset);
+        assert(req->cur_desc_num == 0);
+        dprintf("VSCSI: direct segment\n");
+        break;
+    }
+    case SRP_DATA_DESC_INDIRECT: {
+        struct srp_indirect_buf *tmp = (struct srp_indirect_buf *)
+                                       (cmd->add_data + req->cdb_offset);
+        if (n < req->local_desc) {
+            *ret = tmp->desc_list[n];
+            dprintf("VSCSI: indirect segment local tag=0x%x desc#%d/%d\n",
+                    req->qtag, n, req->local_desc);
+
+        } else if (n < req->total_desc) {
+            int rc;
+            struct srp_direct_buf tbl_desc = vscsi_swap_desc(tmp->table_desc);
+            unsigned desc_offset = n * sizeof(struct srp_direct_buf);
+
+            if (desc_offset >= tbl_desc.len) {
+                dprintf("VSCSI:   #%d is out of range (%d bytes)\n",
+                        n, desc_offset);
+                return -1;
+            }
+            rc = spapr_vio_dma_read(&s->vdev, tbl_desc.va + desc_offset,
+                                    ret, sizeof(struct srp_direct_buf));
+            if (rc) {
+                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
+                        rc);
+                return -1;
+            }
+            dprintf("VSCSI: indirect segment ext. tag=0x%x desc#%d/%d { va=%"PRIx64" len=%x }\n",
+                    req->qtag, n, req->total_desc, tbl_desc.va, tbl_desc.len);
+        } else {
+            dprintf("VSCSI:   Out of descriptors !\n");
+            return 0;
+        }
+        break;
+    }
+    default:
+        fprintf(stderr, "VSCSI:   Unknown format %x\n", req->dma_fmt);
+        return -1;
+    }
+
+    *ret = vscsi_swap_desc(*ret);
+    if (buf_offset > ret->len) {
+        dprintf("   offset=%x is out of a descriptor #%d boundary=%x\n",
+                buf_offset, req->cur_desc_num, ret->len);
+        return -1;
+    }
+    ret->va += buf_offset;
+    ret->len -= buf_offset;
+
+    dprintf("   cur=%d offs=%x ret { va=%"PRIx64" len=%x }\n",
+            req->cur_desc_num, req->cur_desc_offset, ret->va, ret->len);
+
+    return ret->len ? 1 : 0;
 }
 
 static int vscsi_srp_direct_data(VSCSIState *s, vscsi_req *req,
                                  uint8_t *buf, uint32_t len)
 {
-    struct srp_direct_buf *md = req->cur_desc;
+    struct srp_direct_buf md;
     uint32_t llen;
     int rc = 0;
 
-    dprintf("VSCSI: direct segment 0x%x bytes, va=0x%llx desc len=0x%x\n",
-            len, (unsigned long long)md->va, md->len);
+    rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
+    if (rc < 0) {
+        return -1;
+    } else if (rc == 0) {
+        return 0;
+    }
 
-    llen = MIN(len, md->len);
+    llen = MIN(len, md.len);
     if (llen) {
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
         } else {
-            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
         }
     }
-    md->len -= llen;
-    md->va += llen;
 
     if (rc) {
         return -1;
     }
+    req->cur_desc_offset += llen;
+
     return llen;
 }
 
 static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
                                    uint8_t *buf, uint32_t len)
 {
-    struct srp_direct_buf *td = &req->ind_desc->table_desc;
-    struct srp_direct_buf *md = req->cur_desc;
+    struct srp_direct_buf md;
     int rc = 0;
     uint32_t llen, total = 0;
 
-    dprintf("VSCSI: indirect segment 0x%x bytes, td va=0x%llx len=0x%x\n",
-            len, (unsigned long long)td->va, td->len);
+    dprintf("VSCSI: indirect segment 0x%x bytes\n", len);
 
     /* While we have data ... */
     while (len) {
-        /* If we have a descriptor but it's empty, go fetch a new one */
-        if (md && md->len == 0) {
-            /* More local available, use one */
-            if (req->local_desc) {
-                md = ++req->cur_desc;
-                --req->local_desc;
-                --req->total_desc;
-                td->va += sizeof(struct srp_direct_buf);
-            } else {
-                md = req->cur_desc = NULL;
-            }
+        rc = vscsi_fetch_desc(s, req, req->cur_desc_num, req->cur_desc_offset, &md);
+        if (rc < 0) {
+            return -1;
+        } else if (rc == 0) {
+            break;
         }
-        /* No descriptor at hand, fetch one */
-        if (!md) {
-            if (!req->total_desc) {
-                dprintf("VSCSI:   Out of descriptors !\n");
-                break;
-            }
-            md = req->cur_desc = &req->ext_desc;
-            dprintf("VSCSI:   Reading desc from 0x%llx\n",
-                    (unsigned long long)td->va);
-            rc = spapr_vio_dma_read(&s->vdev, td->va, md,
-                                    sizeof(struct srp_direct_buf));
-            if (rc) {
-                dprintf("VSCSI: spapr_vio_dma_read -> %d reading ext_desc\n",
-                        rc);
-                break;
-            }
-            vscsi_swap_desc(md);
-            td->va += sizeof(struct srp_direct_buf);
-            --req->total_desc;
-        }
-        dprintf("VSCSI:   [desc va=0x%llx,len=0x%x] remaining=0x%x\n",
-                (unsigned long long)md->va, md->len, len);
 
         /* Perform transfer */
-        llen = MIN(len, md->len);
+        llen = MIN(len, md.len);
         if (req->writing) { /* writing = to device = reading from memory */
-            rc = spapr_vio_dma_read(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_read(&s->vdev, md.va, buf, llen);
         } else {
-            rc = spapr_vio_dma_write(&s->vdev, md->va, buf, llen);
+            rc = spapr_vio_dma_write(&s->vdev, md.va, buf, llen);
         }
         if (rc) {
             dprintf("VSCSI: spapr_vio_dma_r/w(%d) -> %d\n", req->writing, rc);
@@ -361,10 +405,18 @@ static int vscsi_srp_indirect_data(VSCSIState *s, vscsi_req *req,
 
         len -= llen;
         buf += llen;
+
         total += llen;
-        md->va += llen;
-        md->len -= llen;
+
+        /* Update current position in the current descriptor */
+        req->cur_desc_offset += llen;
+        if (md.len == llen) {
+            /* Go to the next descriptor if the current one finished */
+            ++req->cur_desc_num;
+            req->cur_desc_offset = 0;
+        }
     }
+
     return rc ? -1 : total;
 }
 
@@ -412,14 +464,13 @@ static int data_out_desc_size(struct srp_cmd *cmd)
 static int vscsi_preprocess_desc(vscsi_req *req)
 {
     struct srp_cmd *cmd = &req->iu.srp.cmd;
-    int offset, i;
 
-    offset = cmd->add_cdb_len & ~3;
+    req->cdb_offset = cmd->add_cdb_len & ~3;
 
     if (req->writing) {
         req->dma_fmt = cmd->buf_fmt >> 4;
     } else {
-        offset += data_out_desc_size(cmd);
+        req->cdb_offset += data_out_desc_size(cmd);
         req->dma_fmt = cmd->buf_fmt & ((1U << 4) - 1);
     }
 
@@ -427,31 +478,18 @@ static int vscsi_preprocess_desc(vscsi_req *req)
     case SRP_NO_DATA_DESC:
         break;
     case SRP_DATA_DESC_DIRECT:
-        req->cur_desc = (struct srp_direct_buf *)(cmd->add_data + offset);
         req->total_desc = req->local_desc = 1;
-        vscsi_swap_desc(req->cur_desc);
-        dprintf("VSCSI: using direct RDMA %s, 0x%x bytes MD: 0x%llx\n",
-                req->writing ? "write" : "read",
-                req->cur_desc->len, (unsigned long long)req->cur_desc->va);
         break;
-    case SRP_DATA_DESC_INDIRECT:
-        req->ind_desc = (struct srp_indirect_buf *)(cmd->add_data + offset);
-        vscsi_swap_desc(&req->ind_desc->table_desc);
-        req->total_desc = req->ind_desc->table_desc.len /
-            sizeof(struct srp_direct_buf);
+    case SRP_DATA_DESC_INDIRECT: {
+        struct srp_indirect_buf *ind_tmp = (struct srp_indirect_buf *)
+                (cmd->add_data + req->cdb_offset);
+
+        req->total_desc = be32_to_cpu(ind_tmp->table_desc.len) /
+                          sizeof(struct srp_direct_buf);
         req->local_desc = req->writing ? cmd->data_out_desc_cnt :
-            cmd->data_in_desc_cnt;
-        for (i = 0; i < req->local_desc; i++) {
-            vscsi_swap_desc(&req->ind_desc->desc_list[i]);
-        }
-        req->cur_desc = req->local_desc ? &req->ind_desc->desc_list[0] : NULL;
-        dprintf("VSCSI: using indirect RDMA %s, 0x%x bytes %d descs "
-                "(%d local) VA: 0x%llx\n",
-                req->writing ? "read" : "write",
-                be32_to_cpu(req->ind_desc->len),
-                req->total_desc, req->local_desc,
-                (unsigned long long)req->ind_desc->table_desc.va);
+                          cmd->data_in_desc_cnt;
         break;
+    }
     default:
         fprintf(stderr,
                 "vscsi_preprocess_desc: Unknown format %x\n", req->dma_fmt);
@@ -499,8 +537,8 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
     vscsi_req *req = sreq->hba_private;
     int32_t res_in = 0, res_out = 0;
 
-    dprintf("VSCSI: SCSI cmd complete, r=0x%x tag=0x%x status=0x%x, req=%p\n",
-            reason, sreq->tag, status, req);
+    dprintf("VSCSI: SCSI cmd complete, tag=0x%x status=0x%x, req=%p\n",
+            sreq->tag, status, req);
     if (req == NULL) {
         fprintf(stderr, "VSCSI: Can't find request for tag 0x%x\n", sreq->tag);
         return;
@@ -509,7 +547,7 @@ static void vscsi_command_complete(SCSIRequest *sreq, uint32_t status, size_t re
     if (status == CHECK_CONDITION) {
         req->senselen = scsi_req_get_sense(req->sreq, req->sense,
                                            sizeof(req->sense));
-        dprintf("VSCSI: Sense data, %d bytes:\n", len);
+        dprintf("VSCSI: Sense data, %d bytes:\n", req->senselen);
         dprintf("       %02x  %02x  %02x  %02x  %02x  %02x  %02x  %02x\n",
                 req->sense[0], req->sense[1], req->sense[2], req->sense[3],
                 req->sense[4], req->sense[5], req->sense[6], req->sense[7]);
@@ -621,12 +659,11 @@ static int vscsi_queue_cmd(VSCSIState *s, vscsi_req *req)
         } return 1;
     }
 
-    req->lun = lun;
     req->sreq = scsi_req_new(sdev, req->qtag, lun, srp->cmd.cdb, req);
     n = scsi_req_enqueue(req->sreq);
 
-    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x ID %d LUN %d ret: %d\n",
-            req->qtag, srp->cmd.cdb[0], id, lun, n);
+    dprintf("VSCSI: Queued command tag 0x%x CMD 0x%x LUN %d ret: %d\n",
+            req->qtag, srp->cmd.cdb[0], lun, n);
 
     if (n) {
         /* Transfer direction must be set before preprocessing the
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [Qemu-devel] [PATCH v2 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-08  1:09   ` David Gibson
  2013-07-08  9:02     ` Andreas Färber
@ 2013-07-08 15:45     ` Prerna Saxena
  1 sibling, 0 replies; 30+ messages in thread
From: Prerna Saxena @ 2013-07-08 15:45 UTC (permalink / raw)
  To: David Gibson
  Cc: Anthony Liguori, Alexey Kardashevskiy, qemu-devel,
	Alexander Graf, qemu-ppc, Paul Mackerras

Hi David,
Thanks for the review feedback. I have incorporated your changes in v2 of
the patch, which follows herewith.

Regards,
Prerna

Subject: [PATCH v2] target-ppc: Enhance the CPU node labels for the guest
 device tree for pseries.

In the absence of a -cpu parameter on the QEMU command line, the CPU nodes
of a KVM-enabled guest device tree look like this:

/proc/device-tree/cpus/HOST@0/...
/proc/device-tree/cpus/HOST@4/...

This patch replaces the obscure 'HOST' label with a more descriptive one.
The label is derived by first reading the PVR of the host, and then looking up
the CPU alias that corresponds to the model indicated by this PVR.

Sample final outcome for a KVM-enabled pseries guest running on POWER7:
/proc/device-tree/cpus/PowerPC,POWER7@0/...
/proc/device-tree/cpus/PowerPC,POWER7@4/...

This also helps userspace tools like ppc64_cpu, which expect the device tree
to be in this format.

Signed-off-by: Prerna Saxena <prerna@linux.vnet.ibm.com>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c              | 18 +++++++++++++++---
 target-ppc/cpu-qom.h        |  1 +
 target-ppc/translate_init.c | 28 ++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+), 3 deletions(-)
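
As a rough illustration of the label format described above, here is a
standalone sketch that builds "PowerPC,<MODEL>" and upper-cases only the
model part, leaving the prefix untouched. It is not the patch code: the
helper name make_cpu_node_label() and the fixed-size output buffer are
assumptions of the example. Note that the alias lookup in the patch can
return NULL, so the sketch asserts a non-NULL model instead of formatting
it blindly.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define PPC_DEVTREE_STR "PowerPC,"

/*
 * Write "PowerPC,<MODEL>" into out, upper-casing only the model part.
 * Returns 0 on success, -1 if out is too small (hypothetical helper).
 */
static int make_cpu_node_label(const char *cpu_model, char *out, size_t outlen)
{
    size_t i;
    int n;

    assert(cpu_model != NULL);
    n = snprintf(out, outlen, PPC_DEVTREE_STR "%s", cpu_model);
    if (n < 0 || (size_t)n >= outlen) {
        return -1;
    }
    /* Start after the prefix so "PowerPC," keeps its mixed case. */
    for (i = strlen(PPC_DEVTREE_STR); out[i] != '\0'; i++) {
        out[i] = (char)toupper((unsigned char)out[i]);
    }
    return 0;
}
```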

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index fe34291..ddf263a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -79,6 +79,7 @@
 
 #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
 
+#define PPC_DEVTREE_STR         "PowerPC,"
 sPAPREnvironment *spapr;
 
 int spapr_allocate_irq(int hint, bool lsi)
@@ -295,9 +296,12 @@ static void *spapr_create_fdt_skel(const char *cpu_model,
     _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
     _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
 
-    modelname = g_strdup(cpu_model);
+    /* device tree nodes must look like this :
+     * PowerPC,CPU_ALIAS@0
+     */
+    modelname = g_strdup_printf(PPC_DEVTREE_STR "%s", cpu_model);
 
-    for (i = 0; i < strlen(modelname); i++) {
+    for (i = strlen(PPC_DEVTREE_STR); i < strlen(modelname); i++) {
         modelname[i] = toupper(modelname[i]);
     }
 
@@ -735,7 +739,7 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
     MemoryRegion *sysmem = get_system_memory();
     MemoryRegion *ram = g_new(MemoryRegion, 1);
     hwaddr rma_alloc_size;
-    uint32_t initrd_base = 0;
+    uint32_t initrd_base = 0, pvr = 0;
     long kernel_size = 0, initrd_size = 0;
     long load_limit, rtas_limit, fw_size;
     char *filename;
@@ -959,6 +963,14 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
 
     spapr->entry_point = 0x100;
 
+#ifdef CONFIG_KVM
+    /* Ensure that cpu_model is correctly reflected for a KVM guest */
+    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
+        asm ("mfpvr %0"
+            : "=r"(pvr));
+        cpu_model = ppc_cpu_alias_by_pvr(pvr);
+    }
+#endif
     /* Prepare the device tree */
     spapr->fdt_skel = spapr_create_fdt_skel(cpu_model,
                                             initrd_base, initrd_size,
diff --git a/target-ppc/cpu-qom.h b/target-ppc/cpu-qom.h
index 84ba105..90dd1dd 100644
--- a/target-ppc/cpu-qom.h
+++ b/target-ppc/cpu-qom.h
@@ -99,6 +99,7 @@ static inline PowerPCCPU *ppc_env_get_cpu(CPUPPCState *env)
 #define ENV_OFFSET offsetof(PowerPCCPU, env)
 
 PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr);
+const char *ppc_cpu_alias_by_pvr(uint32_t pvr);
 
 void ppc_cpu_do_interrupt(CPUState *cpu);
 void ppc_cpu_dump_state(CPUState *cpu, FILE *f, fprintf_function cpu_fprintf,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 50e0ee5..21a7f6f 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -7913,6 +7913,34 @@ PowerPCCPUClass *ppc_cpu_class_by_pvr(uint32_t pvr)
     return pcc;
 }
 
+const char *ppc_cpu_alias_by_pvr(uint32_t pvr)
+{
+    int i;
+    const char *cpu_alias;
+    char *offset, *model;
+
+    cpu_alias  = object_class_get_name(OBJECT_CLASS
+                            (ppc_cpu_class_by_pvr(pvr)));
+
+    /* Replace the full class name in cpu_alias with the CPU alias
+     * Eg, POWER7_V2.3-POWERPC64-CPU can simply be called
+     * POWER7
+     */
+
+    offset = strstr(cpu_alias, "-" TYPE_POWERPC_CPU);
+    if (offset) {
+        model = g_strndup(cpu_alias, offset - cpu_alias);
+        for (i = 0; ppc_cpu_aliases[i].model != NULL; i++) {
+            if (strcmp(ppc_cpu_aliases[i].model, model) == 0) {
+                g_free(model);
+                return ppc_cpu_aliases[i].alias;
+            }
+        }
+        g_free(model);
+    }
+    return NULL;
+}
+
 static gint ppc_cpu_compare_class_name(gconstpointer a, gconstpointer b)
 {
     ObjectClass *oc = (ObjectClass *)a;
-- 
1.7.11.7



-- 
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-08  9:02     ` Andreas Färber
@ 2013-07-08 15:49       ` Prerna Saxena
  2013-07-08 16:45         ` Andreas Färber
  0 siblings, 1 reply; 30+ messages in thread
From: Prerna Saxena @ 2013-07-08 15:49 UTC (permalink / raw)
  To: afaerber; +Cc: qemu-devel

On 07/08/2013 02:32 PM, Andreas Färber wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Am 08.07.2013 03:09, schrieb David Gibson:
>> On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy
>> wrote:
>>> @@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>>      register_savevm_live(NULL, "spapr/htab", -1, 1,
>>>                           &savevm_htab_handlers, spapr);
>>>
>>> +    /* Ensure that cpu_model is correctly reflected for a KVM guest */
>>> +    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
>>> +        asm ("mfpvr %0"
>>> +            : "=r"(pvr));
>>> +        cpu_model = ppc_cpu_alias_by_pvr(pvr);
>>
>> This needs to be protected by an ifdef CONFIG_KVM or similar.  If
>> the compiler optimization level is turned down, so that it doesn't 
>> recognize that the kvm_enabled() is always false, then this could 
>> attempt to compile the ppc asm instructions on an x86 (or
>> whatever) host.
> 
> This hunk can be completely replaced by QOM mechanisms - just didn't
> get to replying yet...
> 

Hi Andreas,
Sorry, I had already sent out a v2 and only then saw your message. Could you
please explain how I could use QOM to replace this code block?

Regards,
-- 
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-08 15:49       ` Prerna Saxena
@ 2013-07-08 16:45         ` Andreas Färber
  2013-07-10  6:38           ` Prerna Saxena
  0 siblings, 1 reply; 30+ messages in thread
From: Andreas Färber @ 2013-07-08 16:45 UTC (permalink / raw)
  To: Prerna Saxena; +Cc: qemu-ppc, qemu-devel, Alexander Graf

Hi,

Am 08.07.2013 17:49, schrieb Prerna Saxena:
> On 07/08/2013 02:32 PM, Andreas Färber wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Am 08.07.2013 03:09, schrieb David Gibson:
>>> On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy
>>> wrote:
>>>> @@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>>>      register_savevm_live(NULL, "spapr/htab", -1, 1,
>>>>                           &savevm_htab_handlers, spapr);
>>>>
>>>> +    /* Ensure that cpu_model is correctly reflected for a KVM guest */
>>>> +    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
>>>> +        asm ("mfpvr %0"
>>>> +            : "=r"(pvr));
>>>> +        cpu_model = ppc_cpu_alias_by_pvr(pvr);
>>>
>>> This needs to be protected by an ifdef CONFIG_KVM or similar.  If
>>> the compiler optimization level is turned down, so that it doesn't 
>>> recognize that the kvm_enabled() is always false, then this could 
>>> attempt to compile the ppc asm instructions on an x86 (or
>>> whatever) host.
>>
>> This hunk can be completely replaced by QOM mechanisms - just didn't
>> get to replying yet...
> 
> Sorry I already sent out a v2, and only then saw your message. Could you
> pls explain how I could use QOM to replace this code block ?

Well, in short the thing is it has not much to do with KVM. The
KVM-specific host-powerpc64-cpu type is derived from the one you're
looking for and thus you can use object_class_get_parent() to obtain the
parent type and look at its name - stripping "-" TYPE_POWERPC_CPU from
it should be much more efficient but will give you the detailed name
including revision. I was planning to propose an alternative patch for that.
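
The string handling involved is small. As a minimal sketch under stated
assumptions: the suffix is taken to be "-powerpc64-cpu" (what
"-" TYPE_POWERPC_CPU would expand to on a 64-bit target, judging by the
host-powerpc64-cpu type name above), and model_from_type_name() is a
made-up helper, not an existing QEMU function. In QEMU the input would
come from the class name of the parent type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed suffix; stands in for "-" TYPE_POWERPC_CPU. */
#define CPU_TYPE_SUFFIX "-powerpc64-cpu"

/*
 * Return a newly allocated copy of type_name without the trailing
 * CPU_TYPE_SUFFIX, or NULL if the suffix is absent or allocation fails.
 * The caller frees the result.
 */
static char *model_from_type_name(const char *type_name)
{
    size_t len, slen = strlen(CPU_TYPE_SUFFIX);
    char *model;

    assert(type_name != NULL);
    len = strlen(type_name);
    if (len < slen || strcmp(type_name + len - slen, CPU_TYPE_SUFFIX) != 0) {
        return NULL;
    }
    model = malloc(len - slen + 1);
    if (model != NULL) {
        memcpy(model, type_name, len - slen);
        model[len - slen] = '\0';
    }
    return model;
}
```

As noted, the result still carries the revision (for example
POWER7_v2.3), so mapping it to a shorter alias remains a separate step.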

Replacing a concrete model name with its simpler alias is a secondary
issue (separate patch) that is not specific to KVM or -cpu host. Compare
-cpu POWER8_v1.0 printing .../POWER8_v1.0@0/... presumably.

Further, Alex has already applied a patch of his working around the
alias table being a rather archaic construct, not intended for frequent
use. Instead of adding even more functions that iterate it, we should
turn it into a hashtable for efficient lookup.
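
To make the hashtable suggestion concrete, here is a self-contained
sketch of a model-to-alias lookup using a small open-addressing table.
Everything in it is illustrative: the two table entries, the slot count
and the function names are assumptions, and QEMU itself would more
likely use GLib's GHashTable than a hand-rolled table. The sketch only
shows that the linear scan over the alias array can be replaced by a
hashed lookup keyed on the model name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct cpu_alias {
    const char *alias;
    const char *model;
};

/* Illustrative entries only; the real table lives in QEMU. */
static const struct cpu_alias aliases[] = {
    { "POWER7", "POWER7_v2.3" },
    { "POWER8", "POWER8_v1.0" },
};

#define N_ALIASES (sizeof(aliases) / sizeof(aliases[0]))
#define N_SLOTS 8 /* power of two, strictly larger than N_ALIASES */

static const struct cpu_alias *slots[N_SLOTS];

/* FNV-1a string hash. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;

    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Insert every entry, keyed by model name, using linear probing. */
static void alias_table_init(void)
{
    size_t i;

    memset(slots, 0, sizeof(slots));
    for (i = 0; i < N_ALIASES; i++) {
        uint32_t idx = hash_str(aliases[i].model) & (N_SLOTS - 1);

        while (slots[idx] != NULL) {
            idx = (idx + 1) & (N_SLOTS - 1);
        }
        slots[idx] = &aliases[i];
    }
}

/* Return the alias for model, or NULL if the model has none. */
static const char *alias_for_model(const char *model)
{
    uint32_t idx;

    assert(model != NULL);
    idx = hash_str(model) & (N_SLOTS - 1);
    /* The table is never full, so probing always reaches an empty slot. */
    while (slots[idx] != NULL) {
        if (strcmp(slots[idx]->model, model) == 0) {
            return slots[idx]->alias;
        }
        idx = (idx + 1) & (N_SLOTS - 1);
    }
    return NULL;
}
```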

(Note that the cpu_model_str field may contain more than just the model
name, it is otherwise unused in softmmu and I was therefore preparing a
patch to ban its use to linux-user solely, so the type name seems the
most reliable indicator we have and as a bonus no PVR needed for it.)

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-08 16:45         ` Andreas Färber
@ 2013-07-10  6:38           ` Prerna Saxena
  2013-07-10  9:11             ` Andreas Färber
  0 siblings, 1 reply; 30+ messages in thread
From: Prerna Saxena @ 2013-07-10  6:38 UTC (permalink / raw)
  To: Andreas Färber; +Cc: qemu-ppc, qemu-devel, Alexander Graf

Hi Andreas,
Thanks for the response.

On 07/08/2013 10:15 PM, Andreas Färber wrote:
> Hi,
> 
> Am 08.07.2013 17:49, schrieb Prerna Saxena:
>> On 07/08/2013 02:32 PM, Andreas Färber wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Am 08.07.2013 03:09, schrieb David Gibson:
>>>> On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy
>>>> wrote:
>>>>> @@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>>>>      register_savevm_live(NULL, "spapr/htab", -1, 1,
>>>>>                           &savevm_htab_handlers, spapr);
>>>>>
>>>>> +    /* Ensure that cpu_model is correctly reflected for a KVM guest */
>>>>> +    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
>>>>> +        asm ("mfpvr %0"
>>>>> +            : "=r"(pvr));
>>>>> +        cpu_model = ppc_cpu_alias_by_pvr(pvr);
>>>>
>>>> This needs to be protected by an ifdef CONFIG_KVM or similar.  If
>>>> the compiler optimization level is turned down, so that it doesn't 
>>>> recognize that the kvm_enabled() is always false, then this could 
>>>> attempt to compile the ppc asm instructions on an x86 (or
>>>> whatever) host.
>>>
>>> This hunk can be completely replaced by QOM mechanisms - just didn't
>>> get to replying yet...
>>
>> Sorry I already sent out a v2, and only then saw your message. Could you
>> pls explain how I could use QOM to replace this code block ?
> 
> Well, in short the thing is it has not much to do with KVM. The
> KVM-specific host-powerpc64-cpu type is derived from the one you're
> looking for and thus you can use object_class_get_parent() to obtain the
> parent type and look at its name - stripping "-" TYPE_POWERPC_CPU from
> it should be much more efficient but will give you the detailed name
> including revision. I was planning to propose an alternative patch for that.

This is what my patch does :-)

+const char *ppc_cpu_alias_by_pvr(uint32_t pvr)
+{
+    int i;
+    const char *cpu_alias;
+    char *offset, *model;
+
+    cpu_alias  = object_class_get_name(OBJECT_CLASS
+                            (ppc_cpu_class_by_pvr(pvr)));
+ ....[snip]

> 
> Replacing a concrete model name with its simpler alias is a secondary
> issue (separate patch) that is not specific to KVM or -cpu host. Compare
> -cpu POWER8_v1.0 printing .../POWER8_v1.0@0/... presumably.
> 

Agreed, this is not specific to KVM. That is why I put it in a separate
function, which can be called from elsewhere as well.

Just to clarify your response: do you want the function I wrote to be split
into two pieces, one for each of the two requirements you mention? That can
be done, but I am not sure whether it would be too much code bloat.

> Further, Alex has already applied a patch of his working around the
> alias table being a rather archaic construct, not intended for frequent
> use. Instead of adding even more functions that iterate it, we should
> turn it into a hashtable for efficient lookup.
> 

Could you or Alexander Graf point me to the fix? I can rework my patch to
use it.

> (Note that the cpu_model_str field may contain more than just the model
> name, it is otherwise unused in softmmu and I was therefore preparing a
> patch to ban its use to linux-user solely, so the type name seems the
> most reliable indicator we have and as a bonus no PVR needed for it.)
> 

Hmm, maybe obsoleting the PVR check is not such a great idea.
I'm not sure my earlier email clearly outlined the use case this
patch was attempting to fix. Here is a detailed explanation:

We will still need PVR-based lookups for cases such as the one I have
described. As an illustration, consider running in a KVM environment
where QEMU hasn't been started with a specific CPU type via "-cpu
PPC_MODEL". In this case, only a PVR-based lookup can make sure the
guest gets initialized with the same CPU as the host. The notion of
_same_cpu_model_ can only be built on a PVR check.

Regards,
-- 
Prerna Saxena

Linux Technology Centre,
IBM Systems and Technology Lab,
Bangalore, India

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries.
  2013-07-10  6:38           ` Prerna Saxena
@ 2013-07-10  9:11             ` Andreas Färber
  0 siblings, 0 replies; 30+ messages in thread
From: Andreas Färber @ 2013-07-10  9:11 UTC (permalink / raw)
  To: Prerna Saxena; +Cc: qemu-ppc, qemu-devel, Alexander Graf

Am 10.07.2013 08:38, schrieb Prerna Saxena:
> On 07/08/2013 10:15 PM, Andreas Färber wrote:
>> Am 08.07.2013 17:49, schrieb Prerna Saxena:
>>> On 07/08/2013 02:32 PM, Andreas Färber wrote:
>>>> Am 08.07.2013 03:09, schrieb David Gibson:
>>>>> On Sat, Jul 06, 2013 at 11:54:15PM +1000, Alexey Kardashevskiy
>>>>> wrote:
>>>>>> @@ -1342,6 +1346,13 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>>>>>>      register_savevm_live(NULL, "spapr/htab", -1, 1,
>>>>>>                           &savevm_htab_handlers, spapr);
>>>>>>
>>>>>> +    /* Ensure that cpu_model is correctly reflected for a KVM guest */
>>>>>> +    if (kvm_enabled() && !strcmp(cpu_model, "host")) {
>>>>>> +        asm ("mfpvr %0"
>>>>>> +            : "=r"(pvr));
>>>>>> +        cpu_model = ppc_cpu_alias_by_pvr(pvr);
>>>>>
>>>>> This needs to be protected by an ifdef CONFIG_KVM or similar.  If
>>>>> the compiler optimization level is turned down, so that it doesn't 
>>>>> recognize that the kvm_enabled() is always false, then this could 
>>>>> attempt to compile the ppc asm instructions on an x86 (or
>>>>> whatever) host.
>>>>
>>>> This hunk can be completely replaced by QOM mechanisms - just didn't
>>>> get to replying yet...
>>>
>>> Sorry I already sent out a v2, and only then saw your message. Could you
>>> pls explain how I could use QOM to replace this code block ?
>>
>> Well, in short the thing is it has not much to do with KVM. The
>> KVM-specific host-powerpc64-cpu type is derived from the one you're
>> looking for and thus you can use object_class_get_parent() to obtain the
>> parent type and look at its name - stripping "-" TYPE_POWERPC_CPU from
>> it should be much more efficient but will give you the detailed name
>> including revision. I was planning to propose an alternative patch for that.
> 
> This is what my patch does :-)
> 
> +const char *ppc_cpu_alias_by_pvr(uint32_t pvr)
> +{
> +    int i;
> +    const char *cpu_alias;
> +    char *offset, *model;
> +
> +    cpu_alias  = object_class_get_name(OBJECT_CLASS
> +                            (ppc_cpu_class_by_pvr(pvr)));
> + ....[snip]

And I am complaining about code duplication: Your use of
ppc_cpu_class_by_pvr() should be replaced with object_class_get_parent()
as I said above, because the PVR lookup is already done in KVM code for
you. :-)

>> Replacing a concrete model name with its simpler alias is a secondary
>> issue (separate patch) that is not specific to KVM or -cpu host. Compare
>> -cpu POWER8_v1.0 printing .../POWER8_v1.0@0/... presumably.
>>
> 
> Agree that this is not specific to KVM. That is the reason I have set it
> in a separate function, which can be called otherwise as well.
> 
> Just to clarify your response, you want the function I coded to be split
> into 2 different pieces, to cater to the two specific requirements you
> mention ? That can be done, but not sure if it is too much code bloat.

Your function duplicates runtime functionality (while the model list
keeps growing...) and you are duplicating KVM code into sPAPR. I was
asking you to make better reuse of existing code and I asked you whether
we need the model-to-alias lookup at all. It should not be limited to
your KVMish cpu_model == host check but either be dropped or called
afterwards on any cpu_model for consistent results.

>> (Note that the cpu_model_str field may contain more than just the model
>> name, it is otherwise unused in softmmu and I was therefore preparing a
>> patch to ban its use to linux-user solely, so the type name seems the
>> most reliable indicator we have and as a bonus no PVR needed for it.)
>>
> 
> Hmm, maybe obsoleting PVR check is not such a great idea.
> I'm not sure if my earlier email clearly outlined the use-case this
> patch was attempting to fix. Here is a detailed explanation :
> 
> We will still need PVR based lookups for cases such as the one I have
> described. As an illustration, consider running in a KVM environment
> where QEMU hasnt been started with a specific CPU type via "-CPU
> PPC_MODEL". In this case, we will be required to do a PVR_based lookup
> only -- to make sure the guest gets initialized with the same CPU as
> host. The notion of _same_cpu_model_ can only be built over a PVR check.

Sorry? That is done in kvm.c (I wrote the current form of that code!)
and no one proposed changing it. What I am asking is not to introduce
yet another mfpvr in your *sPAPR* code.

There is no requirement to use -cpu host (the default) with KVM; you can
use -cpu some_model with KVM just as well, for instance when your PVR
is not yet enabled in QEMU (try -cpu POWER7_v2.3 on POWER8 DD1 to
see what I mean).

Regards,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

* Re: [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8
  2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
                   ` (18 preceding siblings ...)
  2013-07-06 13:54 ` [Qemu-devel] [PATCH 19/19] spapr-pci: rework MSI/MSIX Alexey Kardashevskiy
@ 2013-07-29 20:24 ` Anthony Liguori
  19 siblings, 0 replies; 30+ messages in thread
From: Anthony Liguori @ 2013-07-29 20:24 UTC (permalink / raw)
  To: Alexey Kardashevskiy, qemu-devel
  Cc: Anthony Liguori, Alexander Graf, Paul Mackerras, qemu-ppc, David Gibson

Applied.  Thanks.

Regards,

Anthony Liguori

Thread overview: 30+ messages
2013-07-06 13:53 [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Alexey Kardashevskiy
2013-07-06 13:53 ` [Qemu-devel] [PATCH 01/19] pseries: move interrupt controllers to hw/intc/ Alexey Kardashevskiy
2013-07-06 13:53 ` [Qemu-devel] [PATCH 02/19] pseries: rework XICS Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 03/19] savevm: Implement VMS_DIVIDE flag Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 04/19] target-ppc: Convert ppc cpu savevm to VMStateDescription Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 05/19] pseries: savevm support for XICS interrupt controller Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 06/19] pseries: savevm support for VIO devices Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 07/19] pseries: savevm support for PAPR VIO logical lan Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 08/19] pseries: savevm support for PAPR VIO logical tty Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 09/19] pseries: savevm support for PAPR TCE tables Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 10/19] pseries: rework PAPR virtual SCSI Alexey Kardashevskiy
2013-07-08 11:57   ` [Qemu-devel] [PATCH v2] " Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 11/19] pseries: savevm support for " Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 12/19] pseries: savevm support for pseries machine Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 13/19] pseries: savevm support for PCI host bridge Alexey Kardashevskiy
2013-07-07 23:33   ` David Gibson
2013-07-06 13:54 ` [Qemu-devel] [PATCH 14/19] target-ppc: Add helper for KVM_PPC_RTAS_DEFINE_TOKEN Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 15/19] pseries: Support for in-kernel XICS interrupt controller Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 16/19] pseries: savevm support with KVM Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 17/19] target-ppc: Add POWER8 v1.0 CPU model Alexey Kardashevskiy
2013-07-06 13:54 ` [Qemu-devel] [PATCH 18/19] target-ppc: Enhance the CPU node labels for the guest device tree for pseries Alexey Kardashevskiy
2013-07-08  1:09   ` David Gibson
2013-07-08  9:02     ` Andreas Färber
2013-07-08 15:49       ` Prerna Saxena
2013-07-08 16:45         ` Andreas Färber
2013-07-10  6:38           ` Prerna Saxena
2013-07-10  9:11             ` Andreas Färber
2013-07-08 15:45     ` [Qemu-devel] [PATCH v2 " Prerna Saxena
2013-07-06 13:54 ` [Qemu-devel] [PATCH 19/19] spapr-pci: rework MSI/MSIX Alexey Kardashevskiy
2013-07-29 20:24 ` [Qemu-devel] [PATCH 00/19 v4] spapr: migration, pci, msi, power8 Anthony Liguori
