* [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests
@ 2015-04-24  6:47 Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 01/24] spapr: enable PHB/CPU/LMB hotplug for pseries-2.3 Bharata B Rao
                   ` (23 more replies)
  0 siblings, 24 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

This is v3 of the CPU and Memory hotplug patches for PowerPC sPAPR.

- CPU hotplug implementation here is based on device_add command and has
  nothing to do with the existing cpu-add QEMU monitor command.
  (qemu) device_add powerpc64-cpu-socket,id=sock1
  (qemu) device_del sock1
- This version adds a full socket in response to device_add. Though the term
  socket doesn't really make sense for PowerPC, it is used here as a
  container of CPU cores. Unless the sockets= option is explicitly specified,
  QEMU assigns one core per socket by default for PowerPC, and the hotplug
  implementation sticks to this behaviour too.
- Currently hotplugging a socket populates the cores and threads based on
  the topology specified at boot time, i.e., there isn't a way to add
  partially populated sockets resulting in non-homogeneous configurations.
  I have kept it this way since I don't want to invent and implement
  any new semantics that haven't been agreed upon.
- Andreas has suggested having link<> properties defined for the machine
  at boot time and populated at hotplug time by the CPU objects that get
  created. I am not following that approach here as the need for it wasn't felt.
- The CPU hotplug semantics discussion is still underway but I am posting
  this version with v2 review comments addressed and more fixes incorporated.

Changes in v3
-------------
CPU
---
- Minor reworks of the CPU device tree code reorganization. Refer to the patch
  description in 05/24 to see what the DT generation code flow looks like
  from the different call sites.
- Don't create ibm,my-drc-index property when CPU DR isn't enabled. (12/24)
- Enable reuse of the device ID string after unplug by releasing the core and
  socket objects when a vCPU is destroyed. (17/24) (Removed David's Reviewed-by
  from 17/24 since the patch got changed)
- Releasing cores and sockets during vcpu destroy needed a new QOM API (16/24).
- Deduce socket, core and thread numbers correctly even when none or only
  some of them are explicitly specified. (11/24) 
- Remove vCPU thread only after sending the unplug notification to guest and
  getting ACK from it. (20/24)
- Add CPU device coldplug support which allows CPUs to be specified on the QEMU
  cmdline like -device powerpc64-cpu-socket,id=sock1. This is in addition
  to the CPUs that are specified using -smp X; see the sketch after this list.
  Coldplug support is needed for migration. (12/24)
- Set up CPU DR connectors early, before the boot CPUs are set up. This is
  needed to support coldplug. (02/24 - Removed David's Reviewed-by since the
  patch got changed)
- Ensure hot and cold plug of CPUs are handled correctly for machines where
  CPU DR isn't enabled. (12/24)
- Ensure NUMA node information is set for cold-plugged CPUs too. (12/24)
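
  As an illustration of combining boot-time CPUs with a cold-plugged socket
  (a sketch; the machine type, IDs and -smp values here are purely illustrative):
  qemu-system-ppc64 -machine pseries -smp 4,sockets=1,cores=1,threads=4 \
      -device powerpc64-cpu-socket,id=sock1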

Memory
------
- Use ldl_phys instead of rtas_ld as suggested by David. (23/24)
- Support cold plugging of memory devices via the QEMU cmdline like
  -object memory-backend-ram,id=ram1,size=1G -device pc-dimm,id=dimm,memdev=ram1
  Ensure the SPAPR_LMB_FLAGS_ASSIGNED flag is set for cold-plugged memory
  too. Cold plugging support enables migration; the hot plug counterpart is
  sketched after this list. (23/24)
- New API to lookup NUMA node by address. (22/24)
- Removed unused enforced-aligned-dimm property.
- Enforce memory hotplug to node 0 only since the pseries kernel still doesn't
  allow updating the associativity index for hotplugged LMBs. (24/24)
- Do vm_unregister_ram() and memory_region_del_subregion() when hot adding
  memory fails. (24/24)
- Fail hotplug and unplug for machine versions that don't support memory
  DR (24/24)
- Resorted to fdt_setprop instead of fdt_setprop64 since the latter isn't
  present yet in the DTC submodule of QEMU. (23/24)
- Use uint64_t for lmb_size so that memory > 4G is handled correctly by
  the guest (23/24)
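
  As noted above, a sketch of the corresponding memory hot plug flow at the
  monitor, assuming the usual pc-dimm interface and a guest started with
  something like -m 4G,slots=32,maxmem=32G (values are illustrative):
  (qemu) object_add memory-backend-ram,id=ram1,size=1G
  (qemu) device_add pc-dimm,id=dimm1,memdev=ram1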
 
Known issues and limitations
----------------------------
- Hot removal of CPUs in random order breaks migration. Fixing this needs
  invention of new semantics.
- Still not able to fail the hotplug and do proper cleanup for hot plug
  requests on machines which don't support CPU DR; we resort to silent
  failure without raising an error to the user.
- Memory can be hotplugged to Node 0 only currently. IIUC, to support hotplug
  to other NUMA nodes, guest kernel support for sending configure-connector
  call during hotplug is needed.
- Guest NUMA nodes come up with flat distance after supporting
  ibm,dynamic-reconfiguration-memory node. Still debugging this.
- DRC states are still not migrated; this is needed to support migration
  with hotplug correctly. Without it there are limitations:
        - A VM with hotplugged CPUs can be migrated and the hotplugged CPUs
          can be removed at the target (supported by a hack)
        - A VM with hotplugged memory can be migrated, however hot memory
          addition to a migrated VM isn't possible yet.
  These oddities will go away after DRC state migration is supported.
- David's long-standing review comment about sharing code between x86 and
  PowerPC memory hotplug handlers is yet to be addressed.

Dependencies
------------
- For CPU and memory hotplug to work, recent powerpc-utils and ppc64-diag
  packages are needed in the guest: version 1.2.25 of powerpc-utils
  and the latest git master (unreleased) of ppc64-diag are required.

Previous versions
-----------------
v2: http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg04737.html
v1: http://lists.gnu.org/archive/html/qemu-devel/2015-01/msg00611.html
v0: http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg00752.html

Git tree for this implementation
--------------------------------
- These patches apply against spapr-hotplug-pci-v7 branch of Michael Roth's
  PCI hotplug tree (git://github.com/mdroth/qemu)
- The current patchset can be fetched from
  spapr-hotplug branch at https://github.com/bharata/qemu/
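- As an example, one way to assemble the tree locally (a sketch; remote and
  branch names are as listed above, the local directory name is arbitrary):
  $ git clone git://github.com/mdroth/qemu qemu && cd qemu
  $ git checkout spapr-hotplug-pci-v7
  $ git remote add bharata https://github.com/bharata/qemu/
  $ git fetch bharata
  $ git checkout -b spapr-hotplug bharata/spapr-hotplug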


Andreas Färber (1):
  cpu: Prepare Socket container type

Bharata B Rao (21):
  spapr: Add DRC dt entries for CPUs
  spapr: Consider max_cpus during xics initialization
  spapr: Support ibm,lrdr-capacity device tree property
  spapr: Reorganize CPU dt generation code
  spapr: Consolidate cpu init code into a routine
  ppc: Prepare CPU socket/core abstraction
  spapr: Add CPU hotplug handler
  ppc: Update cpu_model in MachineState
  ppc: Create sockets and cores for CPUs
  spapr: CPU hotplug support
  cpus: Add Error argument to cpu_exec_init()
  cpus: Convert cpu_index into a bitmap
  ppc: Move cpu_exec_init() call to realize function
  qom: Introduce object_has_no_children() API
  xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
  xics_kvm: Add cpu_destroy method to XICS
  spapr: CPU hot unplug support
  spapr: Initialize hotplug memory address space
  numa: API to lookup NUMA node by address
  spapr: Support ibm,dynamic-reconfiguration-memory
  spapr: Memory hotplug support

Gu Zheng (1):
  cpus, qom: Reclaim vCPU objects

Michael Roth (1):
  spapr: enable PHB/CPU/LMB hotplug for pseries-2.3

 cpus.c                            |   67 +++
 default-configs/ppc64-softmmu.mak |    1 +
 docs/specs/ppc-spapr-hotplug.txt  |   66 +++
 exec.c                            |   39 +-
 hw/cpu/Makefile.objs              |    2 +-
 hw/cpu/socket.c                   |   21 +
 hw/intc/xics.c                    |   12 +
 hw/intc/xics_kvm.c                |   19 +
 hw/ppc/Makefile.objs              |    1 +
 hw/ppc/cpu-core.c                 |   65 +++
 hw/ppc/cpu-socket.c               |   68 +++
 hw/ppc/mac_newworld.c             |   10 +-
 hw/ppc/mac_oldworld.c             |    7 +-
 hw/ppc/ppc440_bamboo.c            |    7 +-
 hw/ppc/prep.c                     |    7 +-
 hw/ppc/spapr.c                    | 1054 ++++++++++++++++++++++++++++++-------
 hw/ppc/spapr_events.c             |   11 +-
 hw/ppc/spapr_hcall.c              |   51 +-
 hw/ppc/spapr_rtas.c               |   29 +-
 hw/ppc/virtex_ml507.c             |    7 +-
 include/exec/exec-all.h           |    2 +-
 include/hw/cpu/socket.h           |   14 +
 include/hw/ppc/cpu-core.h         |   32 ++
 include/hw/ppc/cpu-socket.h       |   32 ++
 include/hw/ppc/spapr.h            |   37 +-
 include/hw/ppc/xics.h             |    3 +
 include/qom/cpu.h                 |   19 +
 include/qom/object.h              |   11 +
 include/sysemu/kvm.h              |    1 +
 include/sysemu/numa.h             |    3 +
 kvm-all.c                         |   57 +-
 kvm-stub.c                        |    5 +
 numa.c                            |   61 +++
 qom/object.c                      |   12 +
 target-alpha/cpu.c                |    8 +-
 target-arm/cpu.c                  |    3 +-
 target-cris/cpu.c                 |    8 +-
 target-i386/cpu.c                 |    8 +-
 target-lm32/cpu.c                 |    8 +-
 target-m68k/cpu.c                 |    8 +-
 target-microblaze/cpu.c           |    8 +-
 target-mips/cpu.c                 |    8 +-
 target-moxie/cpu.c                |    8 +-
 target-openrisc/cpu.c             |    8 +-
 target-ppc/cpu.h                  |    1 +
 target-ppc/translate_init.c       |   65 ++-
 target-s390x/cpu.c                |    3 +-
 target-sh4/cpu.c                  |    8 +-
 target-sparc/cpu.c                |    3 +-
 target-tricore/cpu.c              |    7 +-
 target-unicore32/cpu.c            |    8 +-
 target-xtensa/cpu.c               |    8 +-
 52 files changed, 1747 insertions(+), 264 deletions(-)
 create mode 100644 hw/cpu/socket.c
 create mode 100644 hw/ppc/cpu-core.c
 create mode 100644 hw/ppc/cpu-socket.c
 create mode 100644 include/hw/cpu/socket.h
 create mode 100644 include/hw/ppc/cpu-core.h
 create mode 100644 include/hw/ppc/cpu-socket.h

-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 01/24] spapr: enable PHB/CPU/LMB hotplug for pseries-2.3
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 02/24] spapr: Add DRC dt entries for CPUs Bharata B Rao
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

From: Michael Roth <mdroth@linux.vnet.ibm.com>

Introduce an sPAPRMachineClass sub-class of MachineClass to
handle sPAPR-specific machine configuration properties.

The 'dr_[phb,cpu,lmb]_enabled' fields of that class can be set as
part of machine-specific init code, and are then propagated
to sPAPREnvironment to conditionally enable creation of DRC
objects and the device-tree descriptions that facilitate hotplug
of PHBs/CPUs/LMBs.

Since we can't migrate this state to older machine types,
default the option to false and only enable it for new
machine types.

TODO: Change this to pseries-2.4.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
              [Added CPU and LMB bits]
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c         | 32 ++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h |  3 +++
 2 files changed, 35 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 74ee277..981814d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -90,11 +90,29 @@
 
 #define HTAB_SIZE(spapr)        (1ULL << ((spapr)->htab_shift))
 
+typedef struct sPAPRMachineClass sPAPRMachineClass;
 typedef struct sPAPRMachineState sPAPRMachineState;
 
 #define TYPE_SPAPR_MACHINE      "spapr-machine"
 #define SPAPR_MACHINE(obj) \
     OBJECT_CHECK(sPAPRMachineState, (obj), TYPE_SPAPR_MACHINE)
+#define SPAPR_MACHINE_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(sPAPRMachineClass, obj, TYPE_SPAPR_MACHINE)
+#define SPAPR_MACHINE_CLASS(klass) \
+        OBJECT_CLASS_CHECK(sPAPRMachineClass, klass, TYPE_SPAPR_MACHINE)
+
+/**
+ * sPAPRMachineClass:
+ */
+struct sPAPRMachineClass {
+    /*< private >*/
+    MachineClass parent_class;
+
+    /*< public >*/
+    bool dr_phb_enabled; /* enable dynamic-reconfig/hotplug of PHBs */
+    bool dr_cpu_enabled; /* enable dynamic-reconfig/hotplug of CPUs */
+    bool dr_lmb_enabled; /* enable dynamic-reconfig/hotplug of LMBs */
+};
 
 /**
  * sPAPRMachineState:
@@ -1378,6 +1396,7 @@ static SaveVMHandlers savevm_htab_handlers = {
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(MachineState *machine)
 {
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
     ram_addr_t ram_size = machine->ram_size;
     const char *cpu_model = machine->cpu_model;
     const char *kernel_filename = machine->kernel_filename;
@@ -1459,6 +1478,10 @@ static void ppc_spapr_init(MachineState *machine)
     spapr->icp = xics_system_init(smp_cpus * kvmppc_smt_threads() / smp_threads,
                                   XICS_IRQS);
 
+    spapr->dr_phb_enabled = smc->dr_phb_enabled;
+    spapr->dr_cpu_enabled = smc->dr_cpu_enabled;
+    spapr->dr_lmb_enabled = smc->dr_lmb_enabled;
+
     /* init CPUs */
     if (cpu_model == NULL) {
         cpu_model = kvm_enabled() ? "host" : "POWER7";
@@ -1767,6 +1790,7 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(oc);
     FWPathProviderClass *fwc = FW_PATH_PROVIDER_CLASS(oc);
     NMIClass *nc = NMI_CLASS(oc);
 
@@ -1778,6 +1802,9 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     mc->default_boot_order = NULL;
     mc->kvm_type = spapr_kvm_type;
     mc->has_dynamic_sysbus = true;
+    smc->dr_phb_enabled = false;
+    smc->dr_cpu_enabled = false;
+    smc->dr_lmb_enabled = false;
 
     fwc->get_dev_path = spapr_get_fw_dev_path;
     nc->nmi_monitor_handler = spapr_nmi;
@@ -1789,6 +1816,7 @@ static const TypeInfo spapr_machine_info = {
     .abstract      = true,
     .instance_size = sizeof(sPAPRMachineState),
     .instance_init = spapr_machine_initfn,
+    .class_size    = sizeof(sPAPRMachineClass),
     .class_init    = spapr_machine_class_init,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_FW_PATH_PROVIDER },
@@ -1854,11 +1882,15 @@ static const TypeInfo spapr_machine_2_2_info = {
 static void spapr_machine_2_3_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(oc);
 
     mc->name = "pseries-2.3";
     mc->desc = "pSeries Logical Partition (PAPR compliant) v2.3";
     mc->alias = "pseries";
     mc->is_default = 1;
+    smc->dr_phb_enabled = true;
+    smc->dr_cpu_enabled = true;
+    smc->dr_lmb_enabled = true;
 }
 
 static const TypeInfo spapr_machine_2_3_info = {
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4ab289b..4578433 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -32,6 +32,9 @@ typedef struct sPAPREnvironment {
     uint64_t rtc_offset; /* Now used only during incoming migration */
     struct PPCTimebase tb;
     bool has_graphics;
+    bool dr_phb_enabled; /* hotplug / dynamic-reconfiguration of PHBs */
+    bool dr_cpu_enabled; /* hotplug / dynamic-reconfiguration of CPUs */
+    bool dr_lmb_enabled; /* hotplug / dynamic-reconfiguration of LMBs */
 
     uint32_t check_exception_irq;
     Notifier epow_notifier;
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 02/24] spapr: Add DRC dt entries for CPUs
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 01/24] spapr: enable PHB/CPU/LMB hotplug for pseries-2.3 Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-04 11:46   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 03/24] spapr: Consider max_cpus during xics initialization Bharata B Rao
                   ` (21 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Advertise CPU DR-capability to the guest via device tree.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
               [spapr_drc_reset implementation]
---
 hw/ppc/spapr.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 981814d..9ea3a38 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -807,6 +807,15 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
         spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
     }
 
+    if (spapr->dr_cpu_enabled) {
+        int offset = fdt_path_offset(fdt, "/cpus");
+        ret = spapr_drc_populate_dt(fdt, offset, NULL,
+                                    SPAPR_DR_CONNECTOR_TYPE_CPU);
+        if (ret < 0) {
+            fprintf(stderr, "Couldn't set up CPU DR device tree properties\n");
+        }
+    }
+
     _FDT((fdt_pack(fdt)));
 
     if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
@@ -1393,6 +1402,16 @@ static SaveVMHandlers savevm_htab_handlers = {
     .load_state = htab_load,
 };
 
+static void spapr_drc_reset(void *opaque)
+{
+    sPAPRDRConnector *drc = opaque;
+    DeviceState *d = DEVICE(drc);
+
+    if (d) {
+        device_reset(d);
+    }
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(MachineState *machine)
 {
@@ -1418,6 +1437,7 @@ static void ppc_spapr_init(MachineState *machine)
     long load_limit, fw_size;
     bool kernel_le = false;
     char *filename;
+    int smt = kvmppc_smt_threads();
 
     msi_supported = true;
 
@@ -1482,6 +1502,15 @@ static void ppc_spapr_init(MachineState *machine)
     spapr->dr_cpu_enabled = smc->dr_cpu_enabled;
     spapr->dr_lmb_enabled = smc->dr_lmb_enabled;
 
+    if (spapr->dr_cpu_enabled) {
+        for (i = 0; i < max_cpus/smp_threads; i++) {
+            sPAPRDRConnector *drc =
+                spapr_dr_connector_new(OBJECT(machine),
+                                       SPAPR_DR_CONNECTOR_TYPE_CPU, i * smt);
+            qemu_register_reset(spapr_drc_reset, drc);
+        }
+    }
+
     /* init CPUs */
     if (cpu_model == NULL) {
         cpu_model = kvm_enabled() ? "host" : "POWER7";
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 03/24] spapr: Consider max_cpus during xics initialization
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 01/24] spapr: enable PHB/CPU/LMB hotplug for pseries-2.3 Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 02/24] spapr: Add DRC dt entries for CPUs Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 04/24] spapr: Support ibm,lrdr-capacity device tree property Bharata B Rao
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Use max_cpus instead of smp_cpus when initializing the XICS system. Also
report max_cpus in the ibm,interrupt-server-ranges device tree property of
the interrupt controller node.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9ea3a38..5151d71 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -347,7 +347,7 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     GString *hypertas = g_string_sized_new(256);
     GString *qemu_hypertas = g_string_sized_new(256);
     uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
-    uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(smp_cpus)};
+    uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(max_cpus)};
     int smt = kvmppc_smt_threads();
     unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
     QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
@@ -1495,7 +1495,7 @@ static void ppc_spapr_init(MachineState *machine)
     }
 
     /* Set up Interrupt Controller before we create the VCPUs */
-    spapr->icp = xics_system_init(smp_cpus * kvmppc_smt_threads() / smp_threads,
+    spapr->icp = xics_system_init(max_cpus * kvmppc_smt_threads() / smp_threads,
                                   XICS_IRQS);
 
     spapr->dr_phb_enabled = smc->dr_phb_enabled;
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 04/24] spapr: Support ibm,lrdr-capacity device tree property
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (2 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 03/24] spapr: Consider max_cpus during xics initialization Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code Bharata B Rao
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Add support for ibm,lrdr-capacity since this is needed by the guest
kernel to know about the possible hot-pluggable CPUs and memory. With
this, pseries kernels will start reporting the correct maxcpus in
/sys/devices/system/cpu/possible.

Define minimum hotpluggable memory size as 256MB and start storing maximum
possible memory for the guest in sPAPREnvironment.
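
For illustration only (the values here are assumed for this example, not taken
from the patch): with a maximum RAM size of 8G, the 256MB increment and a
<maxcpus> value of 16, the property would be encoded as five BE cells:

  ibm,lrdr-capacity = <0x2 0x0 0x0 0x10000000 0x10>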

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 docs/specs/ppc-spapr-hotplug.txt | 18 ++++++++++++++++++
 hw/ppc/spapr.c                   |  3 ++-
 hw/ppc/spapr_rtas.c              | 18 ++++++++++++++++--
 include/hw/ppc/spapr.h           |  7 +++++--
 4 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
index d35771c..46e0719 100644
--- a/docs/specs/ppc-spapr-hotplug.txt
+++ b/docs/specs/ppc-spapr-hotplug.txt
@@ -284,4 +284,22 @@ struct rtas_event_log_v6_hp {
     } drc;
 } QEMU_PACKED;
 
+== ibm,lrdr-capacity ==
+
+ibm,lrdr-capacity is a property in the /rtas device tree node that identifies
+the dynamic reconfiguration capabilities of the guest. It consists of a triple
+consisting of <phys>, <size> and <maxcpus>.
+
+  <phys>, encoded in BE format represents the maximum address in bytes and
+  hence the maximum memory that can be allocated to the guest.
+
+  <size>, encoded in BE format represents the size increments in which
+  memory can be hot-plugged to the guest.
+
+  <maxcpus>, a BE-encoded integer, represents the maximum number of
+  processors that the guest can have.
+
+pseries guests use this property to note the maximum allowed CPUs for the
+guest.
+
 [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5151d71..4e72f26 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -777,7 +777,7 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
     }
 
     /* RTAS */
-    ret = spapr_rtas_device_tree_setup(fdt, rtas_addr, rtas_size);
+    ret = spapr_rtas_device_tree_setup(spapr, fdt, rtas_addr, rtas_size);
     if (ret < 0) {
         fprintf(stderr, "Couldn't set up RTAS device tree properties\n");
     }
@@ -1549,6 +1549,7 @@ static void ppc_spapr_init(MachineState *machine)
 
     /* allocate RAM */
     spapr->ram_limit = ram_size;
+    spapr->maxram_limit = machine->maxram_size;
     memory_region_allocate_system_memory(ram, NULL, "ppc_spapr.ram",
                                          spapr->ram_limit);
     memory_region_add_subregion(sysmem, 0, ram);
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index b4047af..57ec97a 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -29,6 +29,7 @@
 #include "sysemu/char.h"
 #include "hw/qdev.h"
 #include "sysemu/device_tree.h"
+#include "sysemu/cpus.h"
 
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_vio.h"
@@ -613,11 +614,12 @@ void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn)
     rtas_table[token].fn = fn;
 }
 
-int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
-                                 hwaddr rtas_size)
+int spapr_rtas_device_tree_setup(sPAPREnvironment *spapr, void *fdt,
+                                 hwaddr rtas_addr, hwaddr rtas_size)
 {
     int ret;
     int i;
+    uint32_t lrdr_capacity[5];
 
     ret = fdt_add_mem_rsv(fdt, rtas_addr, rtas_size);
     if (ret < 0) {
@@ -666,6 +668,18 @@ int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
         }
 
     }
+
+    lrdr_capacity[0] = cpu_to_be32(spapr->maxram_limit >> 32);
+    lrdr_capacity[1] = cpu_to_be32(spapr->maxram_limit & 0xffffffff);
+    lrdr_capacity[2] = 0;
+    lrdr_capacity[3] = cpu_to_be32(SPAPR_MEMORY_BLOCK_SIZE);
+    lrdr_capacity[4] = cpu_to_be32(max_cpus/smp_threads);
+    ret = qemu_fdt_setprop(fdt, "/rtas", "ibm,lrdr-capacity", lrdr_capacity,
+                     sizeof(lrdr_capacity));
+    if (ret < 0) {
+        fprintf(stderr, "Couldn't add ibm,lrdr-capacity rtas property\n");
+    }
+
     return 0;
 }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 4578433..ecac6e3 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -20,6 +20,7 @@ typedef struct sPAPREnvironment {
     DeviceState *rtc;
 
     hwaddr ram_limit;
+    hwaddr maxram_limit;
     void *htab;
     uint32_t htab_shift;
     hwaddr rma_size;
@@ -497,8 +498,8 @@ void spapr_rtas_register(int token, const char *name, spapr_rtas_fn fn);
 target_ulong spapr_rtas_call(PowerPCCPU *cpu, sPAPREnvironment *spapr,
                              uint32_t token, uint32_t nargs, target_ulong args,
                              uint32_t nret, target_ulong rets);
-int spapr_rtas_device_tree_setup(void *fdt, hwaddr rtas_addr,
-                                 hwaddr rtas_size);
+int spapr_rtas_device_tree_setup(sPAPREnvironment *spapr, void *fdt,
+                                 hwaddr rtas_addr, hwaddr rtas_size);
 
 #define SPAPR_TCE_PAGE_SHIFT   12
 #define SPAPR_TCE_PAGE_SIZE    (1ULL << SPAPR_TCE_PAGE_SHIFT)
@@ -539,6 +540,8 @@ struct sPAPREventLogEntry {
     QTAILQ_ENTRY(sPAPREventLogEntry) next;
 };
 
+#define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
+
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (3 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 04/24] spapr: Support ibm,lrdr-capacity device tree property Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-04-26 11:47   ` Bharata B Rao
  2015-05-04 11:59   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine Bharata B Rao
                   ` (18 subsequent siblings)
  23 siblings, 2 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Reorganize the CPU device tree generation code so that it can be reused from
the hotplug path. CPU dt entries are now generated from spapr_finalize_fdt()
instead of spapr_create_fdt_skel().

Note: This is how the split-up looks now:

Boot path
---------
spapr_finalize_fdt
 spapr_populate_cpus_dt_node
  spapr_populate_cpu_dt
   spapr_fixup_cpu_numa_dt
   spapr_fixup_cpu_smt_dt

Hotplug path
------------
spapr_cpu_plug
 spapr_populate_hotplug_cpu_dt
  spapr_populate_cpu_dt
   spapr_fixup_cpu_numa_dt
   spapr_fixup_cpu_smt_dt

ibm,cas path
------------
spapr_h_cas_compose_response
 spapr_fixup_cpu_dt
  spapr_fixup_cpu_numa_dt
  spapr_fixup_cpu_smt_dt

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 278 +++++++++++++++++++++++++++++++--------------------------
 1 file changed, 153 insertions(+), 125 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 4e72f26..a56f9a1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -206,6 +206,27 @@ static int spapr_fixup_cpu_smt_dt(void *fdt, int offset, PowerPCCPU *cpu,
     return ret;
 }
 
+static int spapr_fixup_cpu_numa_dt(void *fdt, int offset, CPUState *cs)
+{
+    int ret = 0;
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    int index = ppc_get_vcpu_dt_id(cpu);
+    uint32_t associativity[] = {cpu_to_be32(0x5),
+                                cpu_to_be32(0x0),
+                                cpu_to_be32(0x0),
+                                cpu_to_be32(0x0),
+                                cpu_to_be32(cs->numa_node),
+                                cpu_to_be32(index)};
+
+    /* Advertise NUMA via ibm,associativity */
+    if (nb_numa_nodes > 1) {
+        ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,
+                          sizeof(associativity));
+    }
+
+    return ret;
+}
+
 static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
 {
     int ret = 0, offset, cpus_offset;
@@ -218,12 +239,6 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
         PowerPCCPU *cpu = POWERPC_CPU(cs);
         DeviceClass *dc = DEVICE_GET_CLASS(cs);
         int index = ppc_get_vcpu_dt_id(cpu);
-        uint32_t associativity[] = {cpu_to_be32(0x5),
-                                    cpu_to_be32(0x0),
-                                    cpu_to_be32(0x0),
-                                    cpu_to_be32(0x0),
-                                    cpu_to_be32(cs->numa_node),
-                                    cpu_to_be32(index)};
 
         if ((index % smt) != 0) {
             continue;
@@ -247,16 +262,13 @@ static int spapr_fixup_cpu_dt(void *fdt, sPAPREnvironment *spapr)
             }
         }
 
-        if (nb_numa_nodes > 1) {
-            ret = fdt_setprop(fdt, offset, "ibm,associativity", associativity,
-                              sizeof(associativity));
-            if (ret < 0) {
-                return ret;
-            }
+        ret = fdt_setprop(fdt, offset, "ibm,pft-size",
+                      pft_size_prop, sizeof(pft_size_prop));
+        if (ret < 0) {
+            return ret;
         }
 
-        ret = fdt_setprop(fdt, offset, "ibm,pft-size",
-                          pft_size_prop, sizeof(pft_size_prop));
+        ret = spapr_fixup_cpu_numa_dt(fdt, offset, cs);
         if (ret < 0) {
             return ret;
         }
@@ -341,18 +353,13 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
                                    uint32_t epow_irq)
 {
     void *fdt;
-    CPUState *cs;
     uint32_t start_prop = cpu_to_be32(initrd_base);
     uint32_t end_prop = cpu_to_be32(initrd_base + initrd_size);
     GString *hypertas = g_string_sized_new(256);
     GString *qemu_hypertas = g_string_sized_new(256);
     uint32_t refpoints[] = {cpu_to_be32(0x4), cpu_to_be32(0x4)};
     uint32_t interrupt_server_ranges_prop[] = {0, cpu_to_be32(max_cpus)};
-    int smt = kvmppc_smt_threads();
     unsigned char vec5[] = {0x0, 0x0, 0x0, 0x0, 0x0, 0x80};
-    QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
-    unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
-    uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
     char *buf;
 
     add_str(hypertas, "hcall-pft");
@@ -441,107 +448,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
 
     _FDT((fdt_end_node(fdt)));
 
-    /* cpus */
-    _FDT((fdt_begin_node(fdt, "cpus")));
-
-    _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
-    _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
-
-    CPU_FOREACH(cs) {
-        PowerPCCPU *cpu = POWERPC_CPU(cs);
-        CPUPPCState *env = &cpu->env;
-        DeviceClass *dc = DEVICE_GET_CLASS(cs);
-        PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
-        int index = ppc_get_vcpu_dt_id(cpu);
-        char *nodename;
-        uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
-                           0xffffffff, 0xffffffff};
-        uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ;
-        uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 1000000000;
-        uint32_t page_sizes_prop[64];
-        size_t page_sizes_prop_size;
-
-        if ((index % smt) != 0) {
-            continue;
-        }
-
-        nodename = g_strdup_printf("%s@%x", dc->fw_name, index);
-
-        _FDT((fdt_begin_node(fdt, nodename)));
-
-        g_free(nodename);
-
-        _FDT((fdt_property_cell(fdt, "reg", index)));
-        _FDT((fdt_property_string(fdt, "device_type", "cpu")));
-
-        _FDT((fdt_property_cell(fdt, "cpu-version", env->spr[SPR_PVR])));
-        _FDT((fdt_property_cell(fdt, "d-cache-block-size",
-                                env->dcache_line_size)));
-        _FDT((fdt_property_cell(fdt, "d-cache-line-size",
-                                env->dcache_line_size)));
-        _FDT((fdt_property_cell(fdt, "i-cache-block-size",
-                                env->icache_line_size)));
-        _FDT((fdt_property_cell(fdt, "i-cache-line-size",
-                                env->icache_line_size)));
-
-        if (pcc->l1_dcache_size) {
-            _FDT((fdt_property_cell(fdt, "d-cache-size", pcc->l1_dcache_size)));
-        } else {
-            fprintf(stderr, "Warning: Unknown L1 dcache size for cpu\n");
-        }
-        if (pcc->l1_icache_size) {
-            _FDT((fdt_property_cell(fdt, "i-cache-size", pcc->l1_icache_size)));
-        } else {
-            fprintf(stderr, "Warning: Unknown L1 icache size for cpu\n");
-        }
-
-        _FDT((fdt_property_cell(fdt, "timebase-frequency", tbfreq)));
-        _FDT((fdt_property_cell(fdt, "clock-frequency", cpufreq)));
-        _FDT((fdt_property_cell(fdt, "ibm,slb-size", env->slb_nr)));
-        _FDT((fdt_property_string(fdt, "status", "okay")));
-        _FDT((fdt_property(fdt, "64-bit", NULL, 0)));
-
-        if (env->spr_cb[SPR_PURR].oea_read) {
-            _FDT((fdt_property(fdt, "ibm,purr", NULL, 0)));
-        }
-
-        if (env->mmu_model & POWERPC_MMU_1TSEG) {
-            _FDT((fdt_property(fdt, "ibm,processor-segment-sizes",
-                               segs, sizeof(segs))));
-        }
-
-        /* Advertise VMX/VSX (vector extensions) if available
-         *   0 / no property == no vector extensions
-         *   1               == VMX / Altivec available
-         *   2               == VSX available */
-        if (env->insns_flags & PPC_ALTIVEC) {
-            uint32_t vmx = (env->insns_flags2 & PPC2_VSX) ? 2 : 1;
-
-            _FDT((fdt_property_cell(fdt, "ibm,vmx", vmx)));
-        }
-
-        /* Advertise DFP (Decimal Floating Point) if available
-         *   0 / no property == no DFP
-         *   1               == DFP available */
-        if (env->insns_flags2 & PPC2_DFP) {
-            _FDT((fdt_property_cell(fdt, "ibm,dfp", 1)));
-        }
-
-        page_sizes_prop_size = create_page_sizes_prop(env, page_sizes_prop,
-                                                      sizeof(page_sizes_prop));
-        if (page_sizes_prop_size) {
-            _FDT((fdt_property(fdt, "ibm,segment-page-sizes",
-                               page_sizes_prop, page_sizes_prop_size)));
-        }
-
-        _FDT((fdt_property_cell(fdt, "ibm,chip-id",
-                                cs->cpu_index / cpus_per_socket)));
-
-        _FDT((fdt_end_node(fdt)));
-    }
-
-    _FDT((fdt_end_node(fdt)));
-
     /* RTAS */
     _FDT((fdt_begin_node(fdt, "rtas")));
 
@@ -739,6 +645,131 @@ static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt)
     return 0;
 }
 
+static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    CPUPPCState *env = &cpu->env;
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(cs);
+    int index = ppc_get_vcpu_dt_id(cpu);
+    uint32_t segs[] = {cpu_to_be32(28), cpu_to_be32(40),
+                       0xffffffff, 0xffffffff};
+    uint32_t tbfreq = kvm_enabled() ? kvmppc_get_tbfreq() : TIMEBASE_FREQ;
+    uint32_t cpufreq = kvm_enabled() ? kvmppc_get_clockfreq() : 1000000000;
+    uint32_t page_sizes_prop[64];
+    size_t page_sizes_prop_size;
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
+    unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
+    uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
+    uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
+
+    _FDT((fdt_setprop_cell(fdt, offset, "reg", index)));
+    _FDT((fdt_setprop_string(fdt, offset, "device_type", "cpu")));
+
+    _FDT((fdt_setprop_cell(fdt, offset, "cpu-version", env->spr[SPR_PVR])));
+    _FDT((fdt_setprop_cell(fdt, offset, "d-cache-block-size",
+                            env->dcache_line_size)));
+    _FDT((fdt_setprop_cell(fdt, offset, "d-cache-line-size",
+                            env->dcache_line_size)));
+    _FDT((fdt_setprop_cell(fdt, offset, "i-cache-block-size",
+                            env->icache_line_size)));
+    _FDT((fdt_setprop_cell(fdt, offset, "i-cache-line-size",
+                            env->icache_line_size)));
+
+    if (pcc->l1_dcache_size) {
+        _FDT((fdt_setprop_cell(fdt, offset, "d-cache-size",
+                             pcc->l1_dcache_size)));
+    } else {
+        fprintf(stderr, "Warning: Unknown L1 dcache size for cpu\n");
+    }
+    if (pcc->l1_icache_size) {
+        _FDT((fdt_setprop_cell(fdt, offset, "i-cache-size",
+                             pcc->l1_icache_size)));
+    } else {
+        fprintf(stderr, "Warning: Unknown L1 icache size for cpu\n");
+    }
+
+    _FDT((fdt_setprop_cell(fdt, offset, "timebase-frequency", tbfreq)));
+    _FDT((fdt_setprop_cell(fdt, offset, "clock-frequency", cpufreq)));
+    _FDT((fdt_setprop_cell(fdt, offset, "ibm,slb-size", env->slb_nr)));
+    _FDT((fdt_setprop_string(fdt, offset, "status", "okay")));
+    _FDT((fdt_setprop(fdt, offset, "64-bit", NULL, 0)));
+
+    if (env->spr_cb[SPR_PURR].oea_read) {
+        _FDT((fdt_setprop(fdt, offset, "ibm,purr", NULL, 0)));
+    }
+
+    if (env->mmu_model & POWERPC_MMU_1TSEG) {
+        _FDT((fdt_setprop(fdt, offset, "ibm,processor-segment-sizes",
+                           segs, sizeof(segs))));
+    }
+
+    /* Advertise VMX/VSX (vector extensions) if available
+     *   0 / no property == no vector extensions
+     *   1               == VMX / Altivec available
+     *   2               == VSX available */
+    if (env->insns_flags & PPC_ALTIVEC) {
+        uint32_t vmx = (env->insns_flags2 & PPC2_VSX) ? 2 : 1;
+
+        _FDT((fdt_setprop_cell(fdt, offset, "ibm,vmx", vmx)));
+    }
+
+    /* Advertise DFP (Decimal Floating Point) if available
+     *   0 / no property == no DFP
+     *   1               == DFP available */
+    if (env->insns_flags2 & PPC2_DFP) {
+        _FDT((fdt_setprop_cell(fdt, offset, "ibm,dfp", 1)));
+    }
+
+    page_sizes_prop_size = create_page_sizes_prop(env, page_sizes_prop,
+                                                  sizeof(page_sizes_prop));
+    if (page_sizes_prop_size) {
+        _FDT((fdt_setprop(fdt, offset, "ibm,segment-page-sizes",
+                           page_sizes_prop, page_sizes_prop_size)));
+    }
+
+    _FDT((fdt_setprop_cell(fdt, offset, "ibm,chip-id",
+                            cs->cpu_index / cpus_per_socket)));
+
+    _FDT((fdt_setprop(fdt, offset, "ibm,pft-size",
+                      pft_size_prop, sizeof(pft_size_prop))));
+
+    _FDT(spapr_fixup_cpu_numa_dt(fdt, offset, cs));
+
+    _FDT(spapr_fixup_cpu_smt_dt(fdt, offset, cpu,
+                                 ppc_get_compat_smt_threads(cpu)));
+}
+
+static void spapr_populate_cpus_dt_node(void *fdt, sPAPREnvironment *spapr)
+{
+    CPUState *cs;
+    int cpus_offset;
+    char *nodename;
+    int smt = kvmppc_smt_threads();
+
+    cpus_offset = fdt_add_subnode(fdt, 0, "cpus");
+    _FDT(cpus_offset);
+    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#address-cells", 0x1)));
+    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#size-cells", 0x0)));
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+        int index = ppc_get_vcpu_dt_id(cpu);
+        DeviceClass *dc = DEVICE_GET_CLASS(cs);
+        int offset;
+
+        if ((index % smt) != 0) {
+            continue;
+        }
+
+        nodename = g_strdup_printf("%s@%x", dc->fw_name, index);
+        offset = fdt_add_subnode(fdt, cpus_offset, nodename);
+        g_free(nodename);
+        _FDT(offset);
+        spapr_populate_cpu_dt(cs, fdt, offset);
+    }
+
+}
+
 static void spapr_finalize_fdt(sPAPREnvironment *spapr,
                                hwaddr fdt_addr,
                                hwaddr rtas_addr,
@@ -782,11 +813,8 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
         fprintf(stderr, "Couldn't set up RTAS device tree properties\n");
     }
 
-    /* Advertise NUMA via ibm,associativity */
-    ret = spapr_fixup_cpu_dt(fdt, spapr);
-    if (ret < 0) {
-        fprintf(stderr, "Couldn't finalize CPU device tree properties\n");
-    }
+    /* cpus */
+    spapr_populate_cpus_dt_node(fdt, spapr);
 
     bootlist = get_boot_devices_list(&cb, true);
     if (cb && bootlist) {
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (4 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-04 16:10   ` Thomas Huth
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 07/24] cpu: Prepare Socket container type Bharata B Rao
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Factor out bits of sPAPR-specific CPU initialization code into
a separate routine so that it can be called from the CPU hotplug
path too.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index a56f9a1..5c8f2ff 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1440,6 +1440,34 @@ static void spapr_drc_reset(void *opaque)
     }
 }
 
+static void spapr_cpu_init(PowerPCCPU *cpu)
+{
+    CPUPPCState *env = &cpu->env;
+
+    /* Set time-base frequency to 512 MHz */
+    cpu_ppc_tb_init(env, TIMEBASE_FREQ);
+
+    /* PAPR always has exception vectors in RAM not ROM. To ensure this,
+     * MSR[IP] should never be set.
+     */
+    env->msr_mask &= ~(1 << 6);
+
+    /* Tell KVM that we're in PAPR mode */
+    if (kvm_enabled()) {
+        kvmppc_set_papr(cpu);
+    }
+
+    if (cpu->max_compat) {
+        if (ppc_set_compat(cpu, cpu->max_compat) < 0) {
+            exit(1);
+        }
+    }
+
+    xics_cpu_setup(spapr->icp, cpu);
+
+    qemu_register_reset(spapr_cpu_reset, cpu);
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(MachineState *machine)
 {
@@ -1451,7 +1479,6 @@ static void ppc_spapr_init(MachineState *machine)
     const char *initrd_filename = machine->initrd_filename;
     const char *boot_device = machine->boot_order;
     PowerPCCPU *cpu;
-    CPUPPCState *env;
     PCIHostState *phb;
     int i;
     MemoryRegion *sysmem = get_system_memory();
@@ -1549,30 +1576,7 @@ static void ppc_spapr_init(MachineState *machine)
             fprintf(stderr, "Unable to find PowerPC CPU definition\n");
             exit(1);
         }
-        env = &cpu->env;
-
-        /* Set time-base frequency to 512 MHz */
-        cpu_ppc_tb_init(env, TIMEBASE_FREQ);
-
-        /* PAPR always has exception vectors in RAM not ROM. To ensure this,
-         * MSR[IP] should never be set.
-         */
-        env->msr_mask &= ~(1 << 6);
-
-        /* Tell KVM that we're in PAPR mode */
-        if (kvm_enabled()) {
-            kvmppc_set_papr(cpu);
-        }
-
-        if (cpu->max_compat) {
-            if (ppc_set_compat(cpu, cpu->max_compat) < 0) {
-                exit(1);
-            }
-        }
-
-        xics_cpu_setup(spapr->icp, cpu);
-
-        qemu_register_reset(spapr_cpu_reset, cpu);
+        spapr_cpu_init(cpu);
     }
 
     /* allocate RAM */
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 07/24] cpu: Prepare Socket container type
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (5 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  1:47   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction Bharata B Rao
                   ` (16 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

From: Andreas Färber <afaerber@suse.de>

Signed-off-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/cpu/Makefile.objs    |  2 +-
 hw/cpu/socket.c         | 21 +++++++++++++++++++++
 include/hw/cpu/socket.h | 14 ++++++++++++++
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu/socket.c
 create mode 100644 include/hw/cpu/socket.h

diff --git a/hw/cpu/Makefile.objs b/hw/cpu/Makefile.objs
index 6381238..e6890cf 100644
--- a/hw/cpu/Makefile.objs
+++ b/hw/cpu/Makefile.objs
@@ -3,4 +3,4 @@ obj-$(CONFIG_REALVIEW) += realview_mpcore.o
 obj-$(CONFIG_A9MPCORE) += a9mpcore.o
 obj-$(CONFIG_A15MPCORE) += a15mpcore.o
 obj-$(CONFIG_ICC_BUS) += icc_bus.o
-
+obj-y += socket.o
diff --git a/hw/cpu/socket.c b/hw/cpu/socket.c
new file mode 100644
index 0000000..5ca47e9
--- /dev/null
+++ b/hw/cpu/socket.c
@@ -0,0 +1,21 @@
+/*
+ * CPU socket abstraction
+ *
+ * Copyright (c) 2013-2014 SUSE LINUX Products GmbH
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+
+#include "hw/cpu/socket.h"
+
+static const TypeInfo cpu_socket_type_info = {
+    .name = TYPE_CPU_SOCKET,
+    .parent = TYPE_DEVICE,
+    .abstract = true,
+};
+
+static void cpu_socket_register_types(void)
+{
+    type_register_static(&cpu_socket_type_info);
+}
+
+type_init(cpu_socket_register_types)
diff --git a/include/hw/cpu/socket.h b/include/hw/cpu/socket.h
new file mode 100644
index 0000000..c8e0c18
--- /dev/null
+++ b/include/hw/cpu/socket.h
@@ -0,0 +1,14 @@
+/*
+ * CPU socket abstraction
+ *
+ * Copyright (c) 2013-2014 SUSE LINUX Products GmbH
+ * Copyright (c) 2015 SUSE Linux GmbH
+ */
+#ifndef HW_CPU_SOCKET_H
+#define HW_CPU_SOCKET_H
+
+#include "hw/qdev.h"
+
+#define TYPE_CPU_SOCKET "cpu-socket"
+
+#endif
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (6 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 07/24] cpu: Prepare Socket container type Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-04 15:15   ` Thomas Huth
  2015-05-05  6:46   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 09/24] spapr: Add CPU hotplug handler Bharata B Rao
                   ` (15 subsequent siblings)
  23 siblings, 2 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andreas Färber <afaerber@suse.de>
---
 hw/ppc/Makefile.objs        |  1 +
 hw/ppc/cpu-core.c           | 46 ++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/cpu-socket.c         | 47 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/cpu-core.h   | 32 ++++++++++++++++++++++++++++++
 include/hw/ppc/cpu-socket.h | 32 ++++++++++++++++++++++++++++++
 5 files changed, 158 insertions(+)
 create mode 100644 hw/ppc/cpu-core.c
 create mode 100644 hw/ppc/cpu-socket.c
 create mode 100644 include/hw/ppc/cpu-core.h
 create mode 100644 include/hw/ppc/cpu-socket.h

diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
index c8ab06e..a35cac5 100644
--- a/hw/ppc/Makefile.objs
+++ b/hw/ppc/Makefile.objs
@@ -1,5 +1,6 @@
 # shared objects
 obj-y += ppc.o ppc_booke.o
+obj-y += cpu-socket.o cpu-core.o
 # IBM pSeries (sPAPR)
 obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
 obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
diff --git a/hw/ppc/cpu-core.c b/hw/ppc/cpu-core.c
new file mode 100644
index 0000000..ed0481f
--- /dev/null
+++ b/hw/ppc/cpu-core.c
@@ -0,0 +1,46 @@
+/*
+ * ppc CPU core abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ */
+
+#include "hw/qdev.h"
+#include "hw/ppc/cpu-core.h"
+
+static int ppc_cpu_core_realize_child(Object *child, void *opaque)
+{
+    Error **errp = opaque;
+
+    object_property_set_bool(child, true, "realized", errp);
+    if (*errp) {
+        return 1;
+    }
+
+    return 0;
+}
+
+static void ppc_cpu_core_realize(DeviceState *dev, Error **errp)
+{
+    object_child_foreach(OBJECT(dev), ppc_cpu_core_realize_child, errp);
+}
+
+static void ppc_cpu_core_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = ppc_cpu_core_realize;
+}
+
+static const TypeInfo ppc_cpu_core_type_info = {
+    .name = TYPE_POWERPC_CPU_CORE,
+    .parent = TYPE_DEVICE,
+    .class_init = ppc_cpu_core_class_init,
+};
+
+static void ppc_cpu_core_register_types(void)
+{
+    type_register_static(&ppc_cpu_core_type_info);
+}
+
+type_init(ppc_cpu_core_register_types)
diff --git a/hw/ppc/cpu-socket.c b/hw/ppc/cpu-socket.c
new file mode 100644
index 0000000..602a060
--- /dev/null
+++ b/hw/ppc/cpu-socket.c
@@ -0,0 +1,47 @@
+/*
+ * PPC CPU socket abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ */
+
+#include "hw/qdev.h"
+#include "hw/ppc/cpu-socket.h"
+#include "sysemu/cpus.h"
+
+static int ppc_cpu_socket_realize_child(Object *child, void *opaque)
+{
+    Error **errp = opaque;
+
+    object_property_set_bool(child, true, "realized", errp);
+    if (*errp) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static void ppc_cpu_socket_realize(DeviceState *dev, Error **errp)
+{
+    object_child_foreach(OBJECT(dev), ppc_cpu_socket_realize_child, errp);
+}
+
+static void ppc_cpu_socket_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+
+    dc->realize = ppc_cpu_socket_realize;
+}
+
+static const TypeInfo ppc_cpu_socket_type_info = {
+    .name = TYPE_POWERPC_CPU_SOCKET,
+    .parent = TYPE_CPU_SOCKET,
+    .class_init = ppc_cpu_socket_class_init,
+};
+
+static void ppc_cpu_socket_register_types(void)
+{
+    type_register_static(&ppc_cpu_socket_type_info);
+}
+
+type_init(ppc_cpu_socket_register_types)
diff --git a/include/hw/ppc/cpu-core.h b/include/hw/ppc/cpu-core.h
new file mode 100644
index 0000000..95f1c28
--- /dev/null
+++ b/include/hw/ppc/cpu-core.h
@@ -0,0 +1,32 @@
+/*
+ * PowerPC CPU core abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ */
+#ifndef HW_PPC_CPU_CORE_H
+#define HW_PPC_CPU_CORE_H
+
+#include "hw/qdev.h"
+#include "cpu.h"
+
+#ifdef TARGET_PPC64
+#define TYPE_POWERPC_CPU_CORE "powerpc64-cpu-core"
+#elif defined(TARGET_PPCEMB)
+#define TYPE_POWERPC_CPU_CORE "embedded-powerpc-cpu-core"
+#else
+#define TYPE_POWERPC_CPU_CORE "powerpc-cpu-core"
+#endif
+
+#define POWERPC_CPU_CORE(obj) \
+    OBJECT_CHECK(PowerPCCPUCore, (obj), TYPE_POWERPC_CPU_CORE)
+
+typedef struct PowerPCCPUCore {
+    /*< private >*/
+    DeviceState parent_obj;
+    /*< public >*/
+
+    PowerPCCPU thread[0];
+} PowerPCCPUCore;
+
+#endif
diff --git a/include/hw/ppc/cpu-socket.h b/include/hw/ppc/cpu-socket.h
new file mode 100644
index 0000000..5ae19d0
--- /dev/null
+++ b/include/hw/ppc/cpu-socket.h
@@ -0,0 +1,32 @@
+/*
+ * PowerPC CPU socket abstraction
+ *
+ * Copyright (c) 2015 SUSE Linux GmbH
+ * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
+ */
+#ifndef HW_PPC_CPU_SOCKET_H
+#define HW_PPC_CPU_SOCKET_H
+
+#include "hw/cpu/socket.h"
+#include "cpu-core.h"
+
+#ifdef TARGET_PPC64
+#define TYPE_POWERPC_CPU_SOCKET "powerpc64-cpu-socket"
+#elif defined(TARGET_PPCEMB)
+#define TYPE_POWERPC_CPU_SOCKET "embedded-powerpc-cpu-socket"
+#else
+#define TYPE_POWERPC_CPU_SOCKET "powerpc-cpu-socket"
+#endif
+
+#define POWERPC_CPU_SOCKET(obj) \
+    OBJECT_CHECK(PowerPCCPUSocket, (obj), TYPE_POWERPC_CPU_SOCKET)
+
+typedef struct PowerPCCPUSocket {
+    /*< private >*/
+    DeviceState parent_obj;
+    /*< public >*/
+
+    PowerPCCPUCore core[0];
+} PowerPCCPUSocket;
+
+#endif
-- 
2.1.0

* [Qemu-devel] [RFC PATCH v3 09/24] spapr: Add CPU hotplug handler
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (7 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 10/24] ppc: Update cpu_model in MachineState Bharata B Rao
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Add a CPU hotplug handler to the sPAPR machine class and let the plug handler
perform the sPAPR-specific CPU initialization for a realized CPU.
This lets the CPU boot path and the hotplug path share as much code as possible.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/spapr.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5c8f2ff..f2c4fbd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1576,7 +1576,6 @@ static void ppc_spapr_init(MachineState *machine)
             fprintf(stderr, "Unable to find PowerPC CPU definition\n");
             exit(1);
         }
-        spapr_cpu_init(cpu);
     }
 
     /* allocate RAM */
@@ -1849,12 +1848,33 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     }
 }
 
+static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
+                                      DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+        CPUState *cs = CPU(dev);
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        spapr_cpu_init(cpu);
+    }
+}
+
+static HotplugHandler *spapr_get_hotplug_handler(MachineState *machine,
+                                                 DeviceState *dev)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+        return HOTPLUG_HANDLER(machine);
+    }
+    return NULL;
+}
+
 static void spapr_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
     sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(oc);
     FWPathProviderClass *fwc = FW_PATH_PROVIDER_CLASS(oc);
     NMIClass *nc = NMI_CLASS(oc);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
     mc->init = ppc_spapr_init;
     mc->reset = ppc_spapr_reset;
@@ -1864,6 +1884,8 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     mc->default_boot_order = NULL;
     mc->kvm_type = spapr_kvm_type;
     mc->has_dynamic_sysbus = true;
+    mc->get_hotplug_handler = spapr_get_hotplug_handler;
+    hc->plug = spapr_machine_device_plug;
     smc->dr_phb_enabled = false;
     smc->dr_cpu_enabled = false;
     smc->dr_lmb_enabled = false;
@@ -1883,6 +1905,7 @@ static const TypeInfo spapr_machine_info = {
     .interfaces = (InterfaceInfo[]) {
         { TYPE_FW_PATH_PROVIDER },
         { TYPE_NMI },
+        { TYPE_HOTPLUG_HANDLER },
         { }
     },
 };
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 10/24] ppc: Update cpu_model in MachineState
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (8 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 09/24] spapr: Add CPU hotplug handler Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  6:49   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 11/24] ppc: Create sockets and cores for CPUs Bharata B Rao
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Keep the cpu_model field in MachineState up to date so that it can be
used from the CPU hotplug path.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/ppc/mac_newworld.c  | 10 +++++-----
 hw/ppc/mac_oldworld.c  |  7 +++----
 hw/ppc/ppc440_bamboo.c |  7 +++----
 hw/ppc/prep.c          |  7 +++----
 hw/ppc/spapr.c         |  7 +++----
 hw/ppc/virtex_ml507.c  |  7 +++----
 6 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
index 624b4ab..fe18bce 100644
--- a/hw/ppc/mac_newworld.c
+++ b/hw/ppc/mac_newworld.c
@@ -145,7 +145,6 @@ static void ppc_core99_reset(void *opaque)
 static void ppc_core99_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *cpu_model = machine->cpu_model;
     const char *kernel_filename = machine->kernel_filename;
     const char *kernel_cmdline = machine->kernel_cmdline;
     const char *initrd_filename = machine->initrd_filename;
@@ -182,14 +181,15 @@ static void ppc_core99_init(MachineState *machine)
     linux_boot = (kernel_filename != NULL);
 
     /* init CPUs */
-    if (cpu_model == NULL)
+    if (machine->cpu_model == NULL) {
 #ifdef TARGET_PPC64
-        cpu_model = "970fx";
+        machine->cpu_model = "970fx";
 #else
-        cpu_model = "G4";
+        machine->cpu_model = "G4";
 #endif
+    }
     for (i = 0; i < smp_cpus; i++) {
-        cpu = cpu_ppc_init(cpu_model);
+        cpu = cpu_ppc_init(machine->cpu_model);
         if (cpu == NULL) {
             fprintf(stderr, "Unable to find PowerPC CPU definition\n");
             exit(1);
diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
index 3079510..2732319 100644
--- a/hw/ppc/mac_oldworld.c
+++ b/hw/ppc/mac_oldworld.c
@@ -75,7 +75,6 @@ static void ppc_heathrow_reset(void *opaque)
 static void ppc_heathrow_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *cpu_model = machine->cpu_model;
     const char *kernel_filename = machine->kernel_filename;
     const char *kernel_cmdline = machine->kernel_cmdline;
     const char *initrd_filename = machine->initrd_filename;
@@ -107,10 +106,10 @@ static void ppc_heathrow_init(MachineState *machine)
     linux_boot = (kernel_filename != NULL);
 
     /* init CPUs */
-    if (cpu_model == NULL)
-        cpu_model = "G3";
+    if (machine->cpu_model == NULL)
+        machine->cpu_model = "G3";
     for (i = 0; i < smp_cpus; i++) {
-        cpu = cpu_ppc_init(cpu_model);
+        cpu = cpu_ppc_init(machine->cpu_model);
         if (cpu == NULL) {
             fprintf(stderr, "Unable to find PowerPC CPU definition\n");
             exit(1);
diff --git a/hw/ppc/ppc440_bamboo.c b/hw/ppc/ppc440_bamboo.c
index 778970a..032fa80 100644
--- a/hw/ppc/ppc440_bamboo.c
+++ b/hw/ppc/ppc440_bamboo.c
@@ -159,7 +159,6 @@ static void main_cpu_reset(void *opaque)
 static void bamboo_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *cpu_model = machine->cpu_model;
     const char *kernel_filename = machine->kernel_filename;
     const char *kernel_cmdline = machine->kernel_cmdline;
     const char *initrd_filename = machine->initrd_filename;
@@ -184,10 +183,10 @@ static void bamboo_init(MachineState *machine)
     int i;
 
     /* Setup CPU. */
-    if (cpu_model == NULL) {
-        cpu_model = "440EP";
+    if (machine->cpu_model == NULL) {
+        machine->cpu_model = "440EP";
     }
-    cpu = cpu_ppc_init(cpu_model);
+    cpu = cpu_ppc_init(machine->cpu_model);
     if (cpu == NULL) {
         fprintf(stderr, "Unable to initialize CPU!\n");
         exit(1);
diff --git a/hw/ppc/prep.c b/hw/ppc/prep.c
index 15df7f3..55e9643 100644
--- a/hw/ppc/prep.c
+++ b/hw/ppc/prep.c
@@ -364,7 +364,6 @@ static PortioList prep_port_list;
 static void ppc_prep_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *cpu_model = machine->cpu_model;
     const char *kernel_filename = machine->kernel_filename;
     const char *kernel_cmdline = machine->kernel_cmdline;
     const char *initrd_filename = machine->initrd_filename;
@@ -396,10 +395,10 @@ static void ppc_prep_init(MachineState *machine)
     linux_boot = (kernel_filename != NULL);
 
     /* init CPUs */
-    if (cpu_model == NULL)
-        cpu_model = "602";
+    if (machine->cpu_model == NULL)
+        machine->cpu_model = "602";
     for (i = 0; i < smp_cpus; i++) {
-        cpu = cpu_ppc_init(cpu_model);
+        cpu = cpu_ppc_init(machine->cpu_model);
         if (cpu == NULL) {
             fprintf(stderr, "Unable to find PowerPC CPU definition\n");
             exit(1);
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f2c4fbd..8cc55fe 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1473,7 +1473,6 @@ static void ppc_spapr_init(MachineState *machine)
 {
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
     ram_addr_t ram_size = machine->ram_size;
-    const char *cpu_model = machine->cpu_model;
     const char *kernel_filename = machine->kernel_filename;
     const char *kernel_cmdline = machine->kernel_cmdline;
     const char *initrd_filename = machine->initrd_filename;
@@ -1567,11 +1566,11 @@ static void ppc_spapr_init(MachineState *machine)
     }
 
     /* init CPUs */
-    if (cpu_model == NULL) {
-        cpu_model = kvm_enabled() ? "host" : "POWER7";
+    if (machine->cpu_model == NULL) {
+        machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
     }
     for (i = 0; i < smp_cpus; i++) {
-        cpu = cpu_ppc_init(cpu_model);
+        cpu = cpu_ppc_init(machine->cpu_model);
         if (cpu == NULL) {
             fprintf(stderr, "Unable to find PowerPC CPU definition\n");
             exit(1);
diff --git a/hw/ppc/virtex_ml507.c b/hw/ppc/virtex_ml507.c
index 6ebd5be..f33d398 100644
--- a/hw/ppc/virtex_ml507.c
+++ b/hw/ppc/virtex_ml507.c
@@ -197,7 +197,6 @@ static int xilinx_load_device_tree(hwaddr addr,
 static void virtex_init(MachineState *machine)
 {
     ram_addr_t ram_size = machine->ram_size;
-    const char *cpu_model = machine->cpu_model;
     const char *kernel_filename = machine->kernel_filename;
     const char *kernel_cmdline = machine->kernel_cmdline;
     hwaddr initrd_base = 0;
@@ -214,11 +213,11 @@ static void virtex_init(MachineState *machine)
     int i;
 
     /* init CPUs */
-    if (cpu_model == NULL) {
-        cpu_model = "440-Xilinx";
+    if (machine->cpu_model == NULL) {
+        machine->cpu_model = "440-Xilinx";
     }
 
-    cpu = ppc440_init_xilinx(&ram_size, 1, cpu_model, 400000000);
+    cpu = ppc440_init_xilinx(&ram_size, 1, machine->cpu_model, 400000000);
     env = &cpu->env;
     qemu_register_reset(main_cpu_reset, cpu);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 11/24] ppc: Create sockets and cores for CPUs
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (9 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 10/24] ppc: Update cpu_model in MachineState Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  6:52   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support Bharata B Rao
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

ppc machine init functions create individual CPU threads. Change this
for sPAPR by switching to socket creation: CPUs are created recursively
by the socket and core instance_init routines.

TODO: Switching to socket-level CPU creation is done only for the sPAPR
target for now.
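
As a worked example (hypothetical -smp values, not part of this patch),
-smp 8,sockets=2,threads=2 resolves as follows with the code below:

    threads per core = MIN(smp_cpus, smp_threads) = 2
    total cores      = smp_cpus / smp_threads     = 4
    sockets          = 2 (from -smp; defaults to the total core count,
                          i.e. one core per socket, when not specified)
    cores per socket = total cores / sockets      = 2

so ppc_spapr_init() creates 2 socket objects, each socket instance_init
creates 2 core children, and each core instance_init creates 2 CPU
threads, giving the expected 8 vCPUs.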

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/ppc/cpu-core.c           | 19 +++++++++++++++++++
 hw/ppc/cpu-socket.c         | 21 +++++++++++++++++++++
 hw/ppc/spapr.c              | 17 ++++++++++-------
 target-ppc/cpu.h            |  1 +
 target-ppc/translate_init.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 97 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/cpu-core.c b/hw/ppc/cpu-core.c
index ed0481f..f257495 100644
--- a/hw/ppc/cpu-core.c
+++ b/hw/ppc/cpu-core.c
@@ -7,6 +7,9 @@
 
 #include "hw/qdev.h"
 #include "hw/ppc/cpu-core.h"
+#include "hw/boards.h"
+#include "sysemu/cpus.h"
+#include "sysemu/sysemu.h"
 
 static int ppc_cpu_core_realize_child(Object *child, void *opaque)
 {
@@ -32,10 +35,26 @@ static void ppc_cpu_core_class_init(ObjectClass *oc, void *data)
     dc->realize = ppc_cpu_core_realize;
 }
 
+static void ppc_cpu_core_instance_init(Object *obj)
+{
+    int i;
+    PowerPCCPU *cpu = NULL;
+    MachineState *machine = MACHINE(qdev_get_machine());
+    int threads_per_core = (smp_cpus > smp_threads) ? smp_threads : smp_cpus;
+
+    for (i = 0; i < threads_per_core; i++) {
+        cpu = POWERPC_CPU(cpu_ppc_create(TYPE_POWERPC_CPU, machine->cpu_model));
+        object_property_add_child(obj, "thread[*]", OBJECT(cpu), &error_abort);
+        object_unref(OBJECT(cpu));
+    }
+}
+
 static const TypeInfo ppc_cpu_core_type_info = {
     .name = TYPE_POWERPC_CPU_CORE,
     .parent = TYPE_DEVICE,
     .class_init = ppc_cpu_core_class_init,
+    .instance_init = ppc_cpu_core_instance_init,
+    .instance_size = sizeof(PowerPCCPUCore),
 };
 
 static void ppc_cpu_core_register_types(void)
diff --git a/hw/ppc/cpu-socket.c b/hw/ppc/cpu-socket.c
index 602a060..af8c35f 100644
--- a/hw/ppc/cpu-socket.c
+++ b/hw/ppc/cpu-socket.c
@@ -8,6 +8,9 @@
 #include "hw/qdev.h"
 #include "hw/ppc/cpu-socket.h"
 #include "sysemu/cpus.h"
+#include "sysemu/sysemu.h"
+#include "qemu/config-file.h"
+#include "cpu.h"
 
 static int ppc_cpu_socket_realize_child(Object *child, void *opaque)
 {
@@ -33,10 +36,28 @@ static void ppc_cpu_socket_class_init(ObjectClass *oc, void *data)
     dc->realize = ppc_cpu_socket_realize;
 }
 
+static void ppc_cpu_socket_instance_init(Object *obj)
+{
+    int i;
+    Object *core;
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
+    int sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
+    int cores = (smp_cpus/smp_threads) ? smp_cpus/smp_threads : 1;
+
+    sockets = sockets ? sockets : cores;
+    for (i = 0; i < cores/sockets; i++) {
+        core = object_new(TYPE_POWERPC_CPU_CORE);
+        object_property_add_child(obj, "core[*]", core, &error_abort);
+        object_unref(core);
+    }
+}
+
 static const TypeInfo ppc_cpu_socket_type_info = {
     .name = TYPE_POWERPC_CPU_SOCKET,
     .parent = TYPE_CPU_SOCKET,
     .class_init = ppc_cpu_socket_class_init,
+    .instance_init = ppc_cpu_socket_instance_init,
+    .instance_size = sizeof(PowerPCCPUSocket),
 };
 
 static void ppc_cpu_socket_register_types(void)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8cc55fe..b526b7d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -37,6 +37,7 @@
 #include "mmu-hash64.h"
 #include "qom/cpu.h"
 
+#include "hw/ppc/cpu-socket.h"
 #include "hw/boards.h"
 #include "hw/ppc/ppc.h"
 #include "hw/loader.h"
@@ -1477,7 +1478,6 @@ static void ppc_spapr_init(MachineState *machine)
     const char *kernel_cmdline = machine->kernel_cmdline;
     const char *initrd_filename = machine->initrd_filename;
     const char *boot_device = machine->boot_order;
-    PowerPCCPU *cpu;
     PCIHostState *phb;
     int i;
     MemoryRegion *sysmem = get_system_memory();
@@ -1492,7 +1492,12 @@ static void ppc_spapr_init(MachineState *machine)
     bool kernel_le = false;
     char *filename;
     int smt = kvmppc_smt_threads();
+    Object *socket;
+    QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
+    int sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
+    int cores = (smp_cpus/smp_threads) ? smp_cpus/smp_threads : 1;
 
+    sockets = sockets ? sockets : cores;
     msi_supported = true;
 
     spapr = g_malloc0(sizeof(*spapr));
@@ -1569,12 +1574,10 @@ static void ppc_spapr_init(MachineState *machine)
     if (machine->cpu_model == NULL) {
         machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
     }
-    for (i = 0; i < smp_cpus; i++) {
-        cpu = cpu_ppc_init(machine->cpu_model);
-        if (cpu == NULL) {
-            fprintf(stderr, "Unable to find PowerPC CPU definition\n");
-            exit(1);
-        }
+
+    for (i = 0; i < sockets; i++) {
+        socket = object_new(TYPE_POWERPC_CPU_SOCKET);
+        object_property_set_bool(socket, true, "realized", &error_abort);
     }
 
     /* allocate RAM */
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index abc3545..f15cc2c 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -1162,6 +1162,7 @@ do {                                            \
 
 /*****************************************************************************/
 PowerPCCPU *cpu_ppc_init(const char *cpu_model);
+CPUState *cpu_ppc_create(const char *typename, const char *cpu_model);
 void ppc_translate_init(void);
 void gen_update_current_nip(void *opaque);
 int cpu_ppc_exec (CPUPPCState *s);
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index d74f4f0..a8716cf 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9365,6 +9365,52 @@ static ObjectClass *ppc_cpu_class_by_name(const char *name)
     return NULL;
 }
 
+/*
+ * This is essentially the same as cpu_generic_init(), but without
+ * setting the CPU to realized.
+ */
+CPUState *cpu_ppc_create(const char *typename, const char *cpu_model)
+{
+    char *str, *name, *featurestr;
+    CPUState *cpu;
+    ObjectClass *oc;
+    CPUClass *cc;
+    Error *err = NULL;
+
+    str = g_strdup(cpu_model);
+    name = strtok(str, ",");
+
+    oc = cpu_class_by_name(typename, name);
+    if (oc == NULL) {
+        g_free(str);
+        return NULL;
+    }
+
+    cpu = CPU(object_new(object_class_get_name(oc)));
+    cc = CPU_GET_CLASS(cpu);
+
+    featurestr = strtok(NULL, ",");
+    cc->parse_features(cpu, featurestr, &err);
+    g_free(str);
+    if (err != NULL) {
+        goto out;
+    }
+
+out:
+    if (err != NULL) {
+        error_report("%s", error_get_pretty(err));
+        error_free(err);
+        object_unref(OBJECT(cpu));
+        return NULL;
+    }
+
+    return cpu;
+}
+
+/*
+ * TODO: This can be removed when all powerpc targets are converted to
+ * socket level CPU realization.
+ */
 PowerPCCPU *cpu_ppc_init(const char *cpu_model)
 {
     return POWERPC_CPU(cpu_generic_init(TYPE_POWERPC_CPU, cpu_model));
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (10 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 11/24] ppc: Create sockets and cores for CPUs Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-04 15:53   ` Thomas Huth
  2015-05-05  6:59   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 13/24] cpus: Add Error argument to cpu_exec_init() Bharata B Rao
                   ` (11 subsequent siblings)
  23 siblings, 2 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Support CPU hotplug via the device_add command. Set up device tree
entries for the hotplugged CPU core and use the existing EPOW event
infrastructure to send a CPU hotplug notification to the guest.

Also support cold-plugged CPUs that are specified with the -device
option on the command line.
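
In rough outline, hotplug of a core with this patch proceeds as follows:

  1. spapr_cpu_plug() builds a device tree fragment for the new core via
     spapr_populate_hotplug_cpu_dt();
  2. the fragment is handed to the core's DR connector with drck->attach();
  3. spapr_hotplug_req_add_event() queues an EPOW hotplug event (type CPU,
     identified by DRC index) for the guest;
  4. the guest then configures and onlines the core through the usual RTAS
     dynamic reconfiguration calls.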

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c        | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_events.c |   8 ++--
 hw/ppc/spapr_rtas.c   |  11 +++++
 3 files changed, 145 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b526b7d..9b0701c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -33,6 +33,7 @@
 #include "sysemu/block-backend.h"
 #include "sysemu/cpus.h"
 #include "sysemu/kvm.h"
+#include "sysemu/device_tree.h"
 #include "kvm_ppc.h"
 #include "mmu-hash64.h"
 #include "qom/cpu.h"
@@ -662,6 +663,17 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset)
     unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
     uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
     uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
+    sPAPRDRConnector *drc;
+    sPAPRDRConnectorClass *drck;
+    int drc_index;
+
+    if (spapr->dr_cpu_enabled) {
+        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index);
+        g_assert(drc);
+        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+        drc_index = drck->get_index(drc);
+        _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index)));
+    }
 
     _FDT((fdt_setprop_cell(fdt, offset, "reg", index)));
     _FDT((fdt_setprop_string(fdt, offset, "device_type", "cpu")));
@@ -1850,6 +1862,114 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
     }
 }
 
+static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
+                                            int *fdt_offset)
+{
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    DeviceClass *dc = DEVICE_GET_CLASS(cs);
+    int id = ppc_get_vcpu_dt_id(cpu);
+    void *fdt;
+    int offset, fdt_size;
+    char *nodename;
+
+    fdt = create_device_tree(&fdt_size);
+    nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
+    offset = fdt_add_subnode(fdt, 0, nodename);
+
+    spapr_populate_cpu_dt(cs, fdt, offset);
+    g_free(nodename);
+
+    *fdt_offset = offset;
+    return fdt;
+}
+
+static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                            Error **errp)
+{
+    CPUState *cs = CPU(dev);
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    int id = ppc_get_vcpu_dt_id(cpu);
+    sPAPRDRConnector *drc =
+        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
+    sPAPRDRConnectorClass *drck;
+    int smt = kvmppc_smt_threads();
+    Error *local_err = NULL;
+    void *fdt = NULL;
+    int i, fdt_offset = 0;
+
+    /* Set NUMA node for the added CPUs  */
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
+            cs->numa_node = i;
+            break;
+        }
+    }
+
+    /*
+     * Secondary SMT threads return from here; only the first thread of
+     * each core continues and signals the hotplug event to the guest.
+     */
+    if ((id % smt) != 0) {
+        return;
+    }
+
+    if (!spapr->dr_cpu_enabled) {
+        /*
+         * This is a cold-plugged CPU but the machine doesn't support
+         * DR. So skip the hotplug path, ensuring that the CPU is brought
+         * online without an associated DR connector.
+         */
+        return;
+    }
+
+    g_assert(drc);
+
+    /*
+     * Set up CPU DT entries only for hotplugged CPUs. For boot-time or
+     * cold-plugged CPUs, the DT entries are set up in spapr_finalize_fdt().
+     */
+    if (dev->hotplugged) {
+        fdt = spapr_populate_hotplug_cpu_dt(dev, cs, &fdt_offset);
+    }
+
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, &local_err);
+    if (local_err) {
+        g_free(fdt);
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    /*
+     * We send hotplug notification interrupt to the guest only in case
+     * of hotplugged CPUs.
+     */
+    if (dev->hotplugged) {
+        spapr_hotplug_req_add_event(drc);
+    } else {
+        /*
+         * HACK to support removal of hotplugged CPU after VM migration:
+         *
+         * Since we want to be able to hot-remove those cold-plugged CPUs
+         * started at boot time with the -device option on the target VM, we
+         * set the right allocation_state and isolation_state for them, which
+         * for hotplugged CPUs would be set via RTAS calls done from the
+         * guest during hotplug.
+         *
+         * This allows the coldplugged CPUs started using -device option to
+         * have the right isolation and allocation states as expected by the
+         * CPU hot removal code.
+         *
+         * This hack will be removed once we have DRC states migrated as part
+         * of VM migration.
+         */
+        drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
+        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
+    }
+
+    return;
+}
+
 static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
@@ -1858,6 +1978,15 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
         PowerPCCPU *cpu = POWERPC_CPU(cs);
 
         spapr_cpu_init(cpu);
+        spapr_cpu_reset(cpu);
+
+        /*
+         * Hotplug isn't supported on machines where CPU DR isn't enabled.
+         */
+        if (!spapr->dr_cpu_enabled && dev->hotplugged) {
+            return;
+        }
+        spapr_cpu_plug(hotplug_dev, dev, errp);
     }
 }
 
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index be82815..4ae818a 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -421,14 +421,16 @@ static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
     hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
     hp->hdr.section_version = 1; /* includes extended modifier */
     hp->hotplug_action = hp_action;
-
+    hp->drc.index = cpu_to_be32(drck->get_index(drc));
+    hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
 
     switch (drc_type) {
     case SPAPR_DR_CONNECTOR_TYPE_PCI:
-        hp->drc.index = cpu_to_be32(drck->get_index(drc));
-        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
         hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
         break;
+    case SPAPR_DR_CONNECTOR_TYPE_CPU:
+        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
+        break;
     default:
         /* we shouldn't be signaling hotplug events for resources
          * that don't support them
diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index 57ec97a..48aeb86 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -121,6 +121,16 @@ static void rtas_query_cpu_stopped_state(PowerPCCPU *cpu_,
     rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
 }
 
+static void spapr_cpu_set_endianness(PowerPCCPU *cpu)
+{
+    PowerPCCPU *fcpu = POWERPC_CPU(first_cpu);
+    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(fcpu);
+
+    if (!(*pcc->interrupts_big_endian)(fcpu)) {
+        cpu->env.spr[SPR_LPCR] |= LPCR_ILE;
+    }
+}
+
 static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPREnvironment *spapr,
                            uint32_t token, uint32_t nargs,
                            target_ulong args,
@@ -157,6 +167,7 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPREnvironment *spapr,
         env->nip = start;
         env->gpr[3] = r3;
         cs->halted = 0;
+        spapr_cpu_set_endianness(cpu);
 
         qemu_cpu_kick(cs);
 
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 13/24] cpus: Add Error argument to cpu_exec_init()
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (11 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:01   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 14/24] cpus: Convert cpu_index into a bitmap Bharata B Rao
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Add an Error argument to cpu_exec_init() to let callers collect the
error. Change all callers to pass a NULL error argument for now. This
change is needed for the following reasons:

- A subsequent commit changes the CPU enumeration logic in cpu_exec_init(),
  which can cause cpu_exec_init() to fail once cpu_index values corresponding
  to max_cpus have already been handed out.
- There is a view that cpu_exec_init() should be called from realize
  rather than instance_init. With this change, the architectures that can
  move this call into their realize function can do so in a phased manner
  (a minimal usage sketch follows).
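
A minimal sketch of that eventual realize-time usage (FooCPU/foo_* are
placeholder names, not a real target):

    static void foo_cpu_realizefn(DeviceState *dev, Error **errp)
    {
        FooCPU *cpu = FOO_CPU(dev);    /* hypothetical target type */
        Error *local_err = NULL;

        cpu_exec_init(&cpu->env, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            return;
        }
        /* ... rest of realize ... */
    }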

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 exec.c                      | 2 +-
 include/exec/exec-all.h     | 2 +-
 target-alpha/cpu.c          | 2 +-
 target-arm/cpu.c            | 2 +-
 target-cris/cpu.c           | 2 +-
 target-i386/cpu.c           | 2 +-
 target-lm32/cpu.c           | 2 +-
 target-m68k/cpu.c           | 2 +-
 target-microblaze/cpu.c     | 2 +-
 target-mips/cpu.c           | 2 +-
 target-moxie/cpu.c          | 2 +-
 target-openrisc/cpu.c       | 2 +-
 target-ppc/translate_init.c | 2 +-
 target-s390x/cpu.c          | 2 +-
 target-sh4/cpu.c            | 2 +-
 target-sparc/cpu.c          | 2 +-
 target-tricore/cpu.c        | 2 +-
 target-unicore32/cpu.c      | 2 +-
 target-xtensa/cpu.c         | 2 +-
 19 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/exec.c b/exec.c
index c85321a..e1ff6b0 100644
--- a/exec.c
+++ b/exec.c
@@ -527,7 +527,7 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as)
 }
 #endif
 
-void cpu_exec_init(CPUArchState *env)
+void cpu_exec_init(CPUArchState *env, Error **errp)
 {
     CPUState *cpu = ENV_GET_CPU(env);
     CPUClass *cc = CPU_GET_CLASS(cpu);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 8eb0db3..41a9393 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -88,7 +88,7 @@ void QEMU_NORETURN cpu_io_recompile(CPUState *cpu, uintptr_t retaddr);
 TranslationBlock *tb_gen_code(CPUState *cpu,
                               target_ulong pc, target_ulong cs_base, int flags,
                               int cflags);
-void cpu_exec_init(CPUArchState *env);
+void cpu_exec_init(CPUArchState *env, Error **errp);
 void QEMU_NORETURN cpu_loop_exit(CPUState *cpu);
 int page_unprotect(target_ulong address, uintptr_t pc, void *puc);
 void tb_invalidate_phys_page_range(tb_page_addr_t start, tb_page_addr_t end,
diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c
index a98b7d8..0a0c21e 100644
--- a/target-alpha/cpu.c
+++ b/target-alpha/cpu.c
@@ -257,7 +257,7 @@ static void alpha_cpu_initfn(Object *obj)
     CPUAlphaState *env = &cpu->env;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
     tlb_flush(cs, 1);
 
     alpha_translate_init();
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 986f04c..86edaab 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -369,7 +369,7 @@ static void arm_cpu_initfn(Object *obj)
     static bool inited;
 
     cs->env_ptr = &cpu->env;
-    cpu_exec_init(&cpu->env);
+    cpu_exec_init(&cpu->env, NULL);
     cpu->cp_regs = g_hash_table_new_full(g_int_hash, g_int_equal,
                                          g_free, g_free);
 
diff --git a/target-cris/cpu.c b/target-cris/cpu.c
index 16cfba9..8b589ec 100644
--- a/target-cris/cpu.c
+++ b/target-cris/cpu.c
@@ -170,7 +170,7 @@ static void cris_cpu_initfn(Object *obj)
     static bool tcg_initialized;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     env->pregs[PR_VR] = ccc->vr;
 
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index d543e2b..daccf4f 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2886,7 +2886,7 @@ static void x86_cpu_initfn(Object *obj)
     static int inited;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     object_property_add(obj, "family", "int",
                         x86_cpuid_version_get_family,
diff --git a/target-lm32/cpu.c b/target-lm32/cpu.c
index f8081f5..89b6631 100644
--- a/target-lm32/cpu.c
+++ b/target-lm32/cpu.c
@@ -151,7 +151,7 @@ static void lm32_cpu_initfn(Object *obj)
     static bool tcg_initialized;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     env->flags = 0;
 
diff --git a/target-m68k/cpu.c b/target-m68k/cpu.c
index 4cfb725..6a41551 100644
--- a/target-m68k/cpu.c
+++ b/target-m68k/cpu.c
@@ -168,7 +168,7 @@ static void m68k_cpu_initfn(Object *obj)
     static bool inited;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     if (tcg_enabled() && !inited) {
         inited = true;
diff --git a/target-microblaze/cpu.c b/target-microblaze/cpu.c
index 67e3182..6b3732d 100644
--- a/target-microblaze/cpu.c
+++ b/target-microblaze/cpu.c
@@ -130,7 +130,7 @@ static void mb_cpu_initfn(Object *obj)
     static bool tcg_initialized;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
 
diff --git a/target-mips/cpu.c b/target-mips/cpu.c
index 98dc94e..02f1d32 100644
--- a/target-mips/cpu.c
+++ b/target-mips/cpu.c
@@ -115,7 +115,7 @@ static void mips_cpu_initfn(Object *obj)
     CPUMIPSState *env = &cpu->env;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     if (tcg_enabled()) {
         mips_tcg_init();
diff --git a/target-moxie/cpu.c b/target-moxie/cpu.c
index 47b617f..f815fb3 100644
--- a/target-moxie/cpu.c
+++ b/target-moxie/cpu.c
@@ -66,7 +66,7 @@ static void moxie_cpu_initfn(Object *obj)
     static int inited;
 
     cs->env_ptr = &cpu->env;
-    cpu_exec_init(&cpu->env);
+    cpu_exec_init(&cpu->env, NULL);
 
     if (tcg_enabled() && !inited) {
         inited = 1;
diff --git a/target-openrisc/cpu.c b/target-openrisc/cpu.c
index 39bedc1..87b2f80 100644
--- a/target-openrisc/cpu.c
+++ b/target-openrisc/cpu.c
@@ -92,7 +92,7 @@ static void openrisc_cpu_initfn(Object *obj)
     static int inited;
 
     cs->env_ptr = &cpu->env;
-    cpu_exec_init(&cpu->env);
+    cpu_exec_init(&cpu->env, NULL);
 
 #ifndef CONFIG_USER_ONLY
     cpu_openrisc_mmu_init(cpu);
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index a8716cf..9f4f172 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9679,7 +9679,7 @@ static void ppc_cpu_initfn(Object *obj)
     CPUPPCState *env = &cpu->env;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
     cpu->cpu_dt_id = cs->cpu_index;
 
     env->msr_mask = pcc->msr_mask;
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index d2f6312..28717bd 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -185,7 +185,7 @@ static void s390_cpu_initfn(Object *obj)
 #endif
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 #if !defined(CONFIG_USER_ONLY)
     qemu_register_reset(s390_cpu_machine_reset_cb, cpu);
     qemu_get_timedate(&tm, 0);
diff --git a/target-sh4/cpu.c b/target-sh4/cpu.c
index d187a2b..ffb635e 100644
--- a/target-sh4/cpu.c
+++ b/target-sh4/cpu.c
@@ -247,7 +247,7 @@ static void superh_cpu_initfn(Object *obj)
     CPUSH4State *env = &cpu->env;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     env->movcal_backup_tail = &(env->movcal_backup);
 
diff --git a/target-sparc/cpu.c b/target-sparc/cpu.c
index a952097..d857aae 100644
--- a/target-sparc/cpu.c
+++ b/target-sparc/cpu.c
@@ -802,7 +802,7 @@ static void sparc_cpu_initfn(Object *obj)
     CPUSPARCState *env = &cpu->env;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     if (tcg_enabled()) {
         gen_intermediate_code_init(env);
diff --git a/target-tricore/cpu.c b/target-tricore/cpu.c
index 2ba0cf4..53b117b 100644
--- a/target-tricore/cpu.c
+++ b/target-tricore/cpu.c
@@ -88,7 +88,7 @@ static void tricore_cpu_initfn(Object *obj)
     CPUTriCoreState *env = &cpu->env;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     if (tcg_enabled()) {
         tricore_tcg_init();
diff --git a/target-unicore32/cpu.c b/target-unicore32/cpu.c
index 5b32987..d56d78a 100644
--- a/target-unicore32/cpu.c
+++ b/target-unicore32/cpu.c
@@ -111,7 +111,7 @@ static void uc32_cpu_initfn(Object *obj)
     static bool inited;
 
     cs->env_ptr = env;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
 #ifdef CONFIG_USER_ONLY
     env->uncached_asr = ASR_MODE_USER;
diff --git a/target-xtensa/cpu.c b/target-xtensa/cpu.c
index 6a5414f..dd23d32 100644
--- a/target-xtensa/cpu.c
+++ b/target-xtensa/cpu.c
@@ -114,7 +114,7 @@ static void xtensa_cpu_initfn(Object *obj)
 
     cs->env_ptr = env;
     env->config = xcc->config;
-    cpu_exec_init(env);
+    cpu_exec_init(env, NULL);
 
     if (tcg_enabled() && !tcg_inited) {
         tcg_inited = true;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 14/24] cpus: Convert cpu_index into a bitmap
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (12 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 13/24] cpus: Add Error argument to cpu_exec_init() Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:10   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 15/24] ppc: Move cpu_exec_init() call to realize function Bharata B Rao
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Currently CPUState.cpu_index is monotonically increasing and a newly
created CPU always gets the next higher index. The next available
index is calculated by counting the existing number of CPUs. This is
fine as long as we only add CPUs, but there are architectures which
are starting to support CPU removal too. For an architecture like PowerPC
which derives its CPU identifier (device tree ID) from cpu_index, the
existing logic of generating cpu_index values causes problems.

With the currently proposed method of handling vCPU removal by parking
the vCPU fd in QEMU
(Ref: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html),
generating cpu_index this way will not work for PowerPC.

This patch changes the way cpu_index is handed out by maintaining
a bitmap of CPUs that tracks both addition and removal of CPUs.

The CPU bitmap allocation logic is part of cpu_exec_init(), which is
called by the instance_init routines of the various CPU targets. This
patch also adds a corresponding instance_finalize routine, where needed,
so that the cpu_index can be marked free when a CPU is removed.
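
For example (hypothetical numbers): with max_cpus=8 and cpu_index 0-3
already handed out, removing the CPU with cpu_index 2 clears bit 2, and
the next hot-added CPU gets cpu_index 2 back, keeping its device tree ID
stable. With the old counting scheme, the new CPU would instead have been
given index 3 (the remaining CPU count), colliding with an existing CPU.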

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 exec.c                      | 37 ++++++++++++++++++++++++++++++++++---
 include/qom/cpu.h           |  8 ++++++++
 target-alpha/cpu.c          |  6 ++++++
 target-arm/cpu.c            |  1 +
 target-cris/cpu.c           |  6 ++++++
 target-i386/cpu.c           |  6 ++++++
 target-lm32/cpu.c           |  6 ++++++
 target-m68k/cpu.c           |  6 ++++++
 target-microblaze/cpu.c     |  6 ++++++
 target-mips/cpu.c           |  6 ++++++
 target-moxie/cpu.c          |  6 ++++++
 target-openrisc/cpu.c       |  6 ++++++
 target-ppc/translate_init.c |  6 ++++++
 target-s390x/cpu.c          |  1 +
 target-sh4/cpu.c            |  6 ++++++
 target-sparc/cpu.c          |  1 +
 target-tricore/cpu.c        |  5 +++++
 target-unicore32/cpu.c      |  6 ++++++
 target-xtensa/cpu.c         |  6 ++++++
 19 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/exec.c b/exec.c
index e1ff6b0..9bbab02 100644
--- a/exec.c
+++ b/exec.c
@@ -527,21 +527,52 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as)
 }
 #endif
 
+#ifndef CONFIG_USER_ONLY
+static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
+
+static int cpu_get_free_index(Error **errp)
+{
+    int cpu = find_first_zero_bit(cpu_index_map, max_cpus);
+
+    if (cpu == max_cpus) {
+        error_setg(errp, "Trying to use more CPUs than allowed max of %d",
+                   max_cpus);
+        return max_cpus;
+    } else {
+        bitmap_set(cpu_index_map, cpu, 1);
+        return cpu;
+    }
+}
+
+void cpu_exec_exit(CPUState *cpu)
+{
+    bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
+}
+#endif
+
 void cpu_exec_init(CPUArchState *env, Error **errp)
 {
     CPUState *cpu = ENV_GET_CPU(env);
     CPUClass *cc = CPU_GET_CLASS(cpu);
-    CPUState *some_cpu;
     int cpu_index;
-
 #if defined(CONFIG_USER_ONLY)
+    CPUState *some_cpu;
+
     cpu_list_lock();
-#endif
     cpu_index = 0;
     CPU_FOREACH(some_cpu) {
         cpu_index++;
     }
     cpu->cpu_index = cpu_index;
+#else
+    Error *local_err = NULL;
+
+    cpu_index = cpu->cpu_index = cpu_get_free_index(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+#endif
     cpu->numa_node = 0;
     QTAILQ_INIT(&cpu->breakpoints);
     QTAILQ_INIT(&cpu->watchpoints);
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 48fd6fb..5241cf4 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -659,6 +659,14 @@ void cpu_watchpoint_remove_all(CPUState *cpu, int mask);
 void QEMU_NORETURN cpu_abort(CPUState *cpu, const char *fmt, ...)
     GCC_FMT_ATTR(2, 3);
 
+#ifndef CONFIG_USER_ONLY
+void cpu_exec_exit(CPUState *cpu);
+#else
+static inline void cpu_exec_exit(CPUState *cpu)
+{
+}
+#endif
+
 #ifdef CONFIG_SOFTMMU
 extern const struct VMStateDescription vmstate_cpu_common;
 #else
diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c
index 0a0c21e..259a04c 100644
--- a/target-alpha/cpu.c
+++ b/target-alpha/cpu.c
@@ -250,6 +250,11 @@ static const TypeInfo ev68_cpu_type_info = {
     .parent = TYPE("ev67"),
 };
 
+static void alpha_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void alpha_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -305,6 +310,7 @@ static const TypeInfo alpha_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(AlphaCPU),
     .instance_init = alpha_cpu_initfn,
+    .instance_finalize = alpha_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(AlphaCPUClass),
     .class_init = alpha_cpu_class_init,
diff --git a/target-arm/cpu.c b/target-arm/cpu.c
index 86edaab..8d824d3 100644
--- a/target-arm/cpu.c
+++ b/target-arm/cpu.c
@@ -454,6 +454,7 @@ static void arm_cpu_finalizefn(Object *obj)
 {
     ARMCPU *cpu = ARM_CPU(obj);
     g_hash_table_destroy(cpu->cp_regs);
+    cpu_exec_exit(CPU(obj));
 }
 
 static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
diff --git a/target-cris/cpu.c b/target-cris/cpu.c
index 8b589ec..da39223 100644
--- a/target-cris/cpu.c
+++ b/target-cris/cpu.c
@@ -161,6 +161,11 @@ static void cris_cpu_set_irq(void *opaque, int irq, int level)
 }
 #endif
 
+static void cris_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void cris_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -299,6 +304,7 @@ static const TypeInfo cris_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(CRISCPU),
     .instance_init = cris_cpu_initfn,
+    .instance_finalize = cris_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(CRISCPUClass),
     .class_init = cris_cpu_class_init,
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index daccf4f..ca2b93e 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2877,6 +2877,11 @@ uint32_t x86_cpu_apic_id_from_index(unsigned int cpu_index)
     }
 }
 
+static void x86_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void x86_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -3046,6 +3051,7 @@ static const TypeInfo x86_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(X86CPU),
     .instance_init = x86_cpu_initfn,
+    .instance_finalize = x86_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(X86CPUClass),
     .class_init = x86_cpu_common_class_init,
diff --git a/target-lm32/cpu.c b/target-lm32/cpu.c
index 89b6631..d7bc973 100644
--- a/target-lm32/cpu.c
+++ b/target-lm32/cpu.c
@@ -143,6 +143,11 @@ static void lm32_cpu_realizefn(DeviceState *dev, Error **errp)
     lcc->parent_realize(dev, errp);
 }
 
+static void lm32_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void lm32_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -294,6 +299,7 @@ static const TypeInfo lm32_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(LM32CPU),
     .instance_init = lm32_cpu_initfn,
+    .instance_finalize = lm32_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(LM32CPUClass),
     .class_init = lm32_cpu_class_init,
diff --git a/target-m68k/cpu.c b/target-m68k/cpu.c
index 6a41551..c2fce97 100644
--- a/target-m68k/cpu.c
+++ b/target-m68k/cpu.c
@@ -160,6 +160,11 @@ static void m68k_cpu_realizefn(DeviceState *dev, Error **errp)
     mcc->parent_realize(dev, errp);
 }
 
+static void m68k_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void m68k_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -231,6 +236,7 @@ static const TypeInfo m68k_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(M68kCPU),
     .instance_init = m68k_cpu_initfn,
+    .instance_finalize = m68k_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(M68kCPUClass),
     .class_init = m68k_cpu_class_init,
diff --git a/target-microblaze/cpu.c b/target-microblaze/cpu.c
index 6b3732d..3aa3796 100644
--- a/target-microblaze/cpu.c
+++ b/target-microblaze/cpu.c
@@ -122,6 +122,11 @@ static void mb_cpu_realizefn(DeviceState *dev, Error **errp)
     mcc->parent_realize(dev, errp);
 }
 
+static void mb_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void mb_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -190,6 +195,7 @@ static const TypeInfo mb_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(MicroBlazeCPU),
     .instance_init = mb_cpu_initfn,
+    .instance_finalize = mb_cpu_finalize,
     .class_size = sizeof(MicroBlazeCPUClass),
     .class_init = mb_cpu_class_init,
 };
diff --git a/target-mips/cpu.c b/target-mips/cpu.c
index 02f1d32..2150999 100644
--- a/target-mips/cpu.c
+++ b/target-mips/cpu.c
@@ -108,6 +108,11 @@ static void mips_cpu_realizefn(DeviceState *dev, Error **errp)
     mcc->parent_realize(dev, errp);
 }
 
+static void mips_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void mips_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -159,6 +164,7 @@ static const TypeInfo mips_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(MIPSCPU),
     .instance_init = mips_cpu_initfn,
+    .instance_finalize = mips_cpu_finalize,
     .abstract = false,
     .class_size = sizeof(MIPSCPUClass),
     .class_init = mips_cpu_class_init,
diff --git a/target-moxie/cpu.c b/target-moxie/cpu.c
index f815fb3..25d5f30 100644
--- a/target-moxie/cpu.c
+++ b/target-moxie/cpu.c
@@ -59,6 +59,11 @@ static void moxie_cpu_realizefn(DeviceState *dev, Error **errp)
     mcc->parent_realize(dev, errp);
 }
 
+static void moxie_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void moxie_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -160,6 +165,7 @@ static const TypeInfo moxie_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(MoxieCPU),
     .instance_init = moxie_cpu_initfn,
+    .instance_finalize = moxie_cpu_finalize,
     .class_size = sizeof(MoxieCPUClass),
     .class_init = moxie_cpu_class_init,
 };
diff --git a/target-openrisc/cpu.c b/target-openrisc/cpu.c
index 87b2f80..f0c990f 100644
--- a/target-openrisc/cpu.c
+++ b/target-openrisc/cpu.c
@@ -85,6 +85,11 @@ static void openrisc_cpu_realizefn(DeviceState *dev, Error **errp)
     occ->parent_realize(dev, errp);
 }
 
+static void openrisc_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void openrisc_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -198,6 +203,7 @@ static const TypeInfo openrisc_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(OpenRISCCPU),
     .instance_init = openrisc_cpu_initfn,
+    .instance_finalize = openrisc_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(OpenRISCCPUClass),
     .class_init = openrisc_cpu_class_init,
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 9f4f172..a553560 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -9671,6 +9671,11 @@ static bool ppc_cpu_is_big_endian(CPUState *cs)
 }
 #endif
 
+static void ppc_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void ppc_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -9784,6 +9789,7 @@ static const TypeInfo ppc_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(PowerPCCPU),
     .instance_init = ppc_cpu_initfn,
+    .instance_finalize = ppc_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(PowerPCCPUClass),
     .class_init = ppc_cpu_class_init,
diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
index 28717bd..198e57b 100644
--- a/target-s390x/cpu.c
+++ b/target-s390x/cpu.c
@@ -212,6 +212,7 @@ static void s390_cpu_finalize(Object *obj)
 
     qemu_unregister_reset(s390_cpu_machine_reset_cb, cpu);
 #endif
+    cpu_exec_exit(CPU(obj));
 }
 
 #if !defined(CONFIG_USER_ONLY)
diff --git a/target-sh4/cpu.c b/target-sh4/cpu.c
index ffb635e..65f44c7 100644
--- a/target-sh4/cpu.c
+++ b/target-sh4/cpu.c
@@ -240,6 +240,11 @@ static void superh_cpu_realizefn(DeviceState *dev, Error **errp)
     scc->parent_realize(dev, errp);
 }
 
+static void superh_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void superh_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -296,6 +301,7 @@ static const TypeInfo superh_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(SuperHCPU),
     .instance_init = superh_cpu_initfn,
+    .instance_finalize = superh_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(SuperHCPUClass),
     .class_init = superh_cpu_class_init,
diff --git a/target-sparc/cpu.c b/target-sparc/cpu.c
index d857aae..ac7091a 100644
--- a/target-sparc/cpu.c
+++ b/target-sparc/cpu.c
@@ -815,6 +815,7 @@ static void sparc_cpu_uninitfn(Object *obj)
     CPUSPARCState *env = &cpu->env;
 
     g_free(env->def);
+    cpu_exec_exit(CPU(obj));
 }
 
 static void sparc_cpu_class_init(ObjectClass *oc, void *data)
diff --git a/target-tricore/cpu.c b/target-tricore/cpu.c
index 53b117b..e871dc4 100644
--- a/target-tricore/cpu.c
+++ b/target-tricore/cpu.c
@@ -80,6 +80,10 @@ static void tricore_cpu_realizefn(DeviceState *dev, Error **errp)
     tcc->parent_realize(dev, errp);
 }
 
+static void tricore_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
 
 static void tricore_cpu_initfn(Object *obj)
 {
@@ -180,6 +184,7 @@ static const TypeInfo tricore_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(TriCoreCPU),
     .instance_init = tricore_cpu_initfn,
+    .instance_finalize = tricore_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(TriCoreCPUClass),
     .class_init = tricore_cpu_class_init,
diff --git a/target-unicore32/cpu.c b/target-unicore32/cpu.c
index d56d78a..64af9f8 100644
--- a/target-unicore32/cpu.c
+++ b/target-unicore32/cpu.c
@@ -103,6 +103,11 @@ static void uc32_cpu_realizefn(DeviceState *dev, Error **errp)
     ucc->parent_realize(dev, errp);
 }
 
+static void uc32_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void uc32_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -174,6 +179,7 @@ static const TypeInfo uc32_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(UniCore32CPU),
     .instance_init = uc32_cpu_initfn,
+    .instance_finalize = uc32_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(UniCore32CPUClass),
     .class_init = uc32_cpu_class_init,
diff --git a/target-xtensa/cpu.c b/target-xtensa/cpu.c
index dd23d32..565a946 100644
--- a/target-xtensa/cpu.c
+++ b/target-xtensa/cpu.c
@@ -104,6 +104,11 @@ static void xtensa_cpu_realizefn(DeviceState *dev, Error **errp)
     xcc->parent_realize(dev, errp);
 }
 
+static void xtensa_cpu_finalize(Object *obj)
+{
+    cpu_exec_exit(CPU(obj));
+}
+
 static void xtensa_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -161,6 +166,7 @@ static const TypeInfo xtensa_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(XtensaCPU),
     .instance_init = xtensa_cpu_initfn,
+    .instance_finalize = xtensa_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(XtensaCPUClass),
     .class_init = xtensa_cpu_class_init,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 15/24] ppc: Move cpu_exec_init() call to realize function
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (13 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 14/24] cpus: Convert cpu_index into a bitmap Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:12   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 16/24] qom: Introduce object_has_no_children() API Bharata B Rao
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Move the cpu_exec_init() call from instance_init to realize. This allows
any failure of cpu_exec_init() to be handled appropriately.
Correspondingly, move the cpu_exec_exit() call from instance_finalize
to unrealize.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 target-ppc/translate_init.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index a553560..fccee82 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -8928,6 +8928,11 @@ static void ppc_cpu_realizefn(DeviceState *dev, Error **errp)
         return;
     }
 
+    cpu_exec_init(&cpu->env, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
     cpu->cpu_dt_id = (cs->cpu_index / smp_threads) * max_smt
         + (cs->cpu_index % smp_threads);
 #endif
@@ -9141,6 +9146,8 @@ static void ppc_cpu_unrealizefn(DeviceState *dev, Error **errp)
     opc_handler_t **table;
     int i, j;
 
+    cpu_exec_exit(CPU(dev));
+
     for (i = 0; i < PPC_CPU_OPCODES_LEN; i++) {
         if (env->opcodes[i] == &invalid_handler) {
             continue;
@@ -9671,11 +9678,6 @@ static bool ppc_cpu_is_big_endian(CPUState *cs)
 }
 #endif
 
-static void ppc_cpu_finalize(Object *obj)
-{
-    cpu_exec_exit(CPU(obj));
-}
-
 static void ppc_cpu_initfn(Object *obj)
 {
     CPUState *cs = CPU(obj);
@@ -9684,8 +9686,6 @@ static void ppc_cpu_initfn(Object *obj)
     CPUPPCState *env = &cpu->env;
 
     cs->env_ptr = env;
-    cpu_exec_init(env, NULL);
-    cpu->cpu_dt_id = cs->cpu_index;
 
     env->msr_mask = pcc->msr_mask;
     env->mmu_model = pcc->mmu_model;
@@ -9789,7 +9789,6 @@ static const TypeInfo ppc_cpu_type_info = {
     .parent = TYPE_CPU,
     .instance_size = sizeof(PowerPCCPU),
     .instance_init = ppc_cpu_initfn,
-    .instance_finalize = ppc_cpu_finalize,
     .abstract = true,
     .class_size = sizeof(PowerPCCPUClass),
     .class_init = ppc_cpu_class_init,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 16/24] qom: Introduce object_has_no_children() API
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (14 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 15/24] ppc: Move cpu_exec_init() call to realize function Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:13   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects Bharata B Rao
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

This QOM API can be used to check whether an object has any child
objects associated with it.

It is needed by the PowerPC CPU hotplug code to release the parent CPU
core and socket objects only after ascertaining that they don't have
any child objects left.
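
A rough usage sketch (cpu stands for the vCPU object being removed; the
actual caller is added in the next patch):

    Object *core = OBJECT(cpu)->parent;

    object_unparent(OBJECT(cpu));
    if (core && object_has_no_children(core)) {
        /* no threads left under this core, release it too */
        object_unparent(core);
    }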

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 include/qom/object.h | 11 +++++++++++
 qom/object.c         | 12 ++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index d2d7748..264e412 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1317,6 +1317,17 @@ int object_child_foreach(Object *obj, int (*fn)(Object *child, void *opaque),
                          void *opaque);
 
 /**
+ * object_has_no_children:
+ * @obj: the object which will be checked for children
+ *
+ * Iterate over the property list of @obj and check for any child
+ * property.
+ *
+ * Returns: true if no child properties are found, else returns false.
+ */
+bool object_has_no_children(Object *obj);
+
+/**
  * container_get:
  * @root: root of the #path, e.g., object_get_root()
  * @path: path to the container
diff --git a/qom/object.c b/qom/object.c
index d167038..0fddf00 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -683,6 +683,18 @@ int object_child_foreach(Object *obj, int (*fn)(Object *child, void *opaque),
     return ret;
 }
 
+bool object_has_no_children(Object *obj)
+{
+    ObjectProperty *prop;
+
+    QTAILQ_FOREACH(prop, &obj->properties, node) {
+        if (object_property_is_child(prop)) {
+            return false;
+        }
+    }
+    return true;
+}
+
 static void object_class_get_list_tramp(ObjectClass *klass, void *opaque)
 {
     GSList **list = opaque;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (15 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 16/24] qom: Introduce object_has_no_children() API Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:20   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled Bharata B Rao
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: Zhu Guihua, aik, Bharata B Rao, mdroth, agraf, Chen Fan,
	qemu-ppc, tyreld, nfont, Gu Zheng, imammedo, afaerber, david

From: Gu Zheng <guz.fnst@cn.fujitsu.com>

KVM vCPUs cannot be removed safely once they have been created, so on
unplug we do not close the KVM vCPU fd. Instead, record it in a list and
mark it as stopped, so that it can be reused for a subsequent CPU hot-add
request if possible. This is also the approach suggested by the KVM
developers:
https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html

This patch also uses the QOM API object_has_no_children(), introduced in
the previous patch, to release the parent CPU core and socket objects once
a destroyed vCPU leaves them without any children.
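
The fd-parking logic in kvm-all.c reduces to roughly the following
(simplified from the patch, error handling elided):

    /* On unplug: keep the fd instead of closing it. */
    vcpu = g_malloc0(sizeof(*vcpu));
    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
    vcpu->kvm_fd = cpu->kvm_fd;
    QLIST_INSERT_HEAD(&s->kvm_parked_vcpus, vcpu, node);

    /* On a later hot-add: reuse a parked fd if the vcpu_id matches,
     * otherwise fall back to KVM_CREATE_VCPU. */
    cpu->kvm_fd = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));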

Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
               [Added core and socket removal bits]
---
 cpus.c               | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 include/qom/cpu.h    | 11 +++++++++
 include/sysemu/kvm.h |  1 +
 kvm-all.c            | 57 +++++++++++++++++++++++++++++++++++++++++++-
 kvm-stub.c           |  5 ++++
 5 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/cpus.c b/cpus.c
index 0fac143..325f8a6 100644
--- a/cpus.c
+++ b/cpus.c
@@ -858,6 +858,47 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
     qemu_cpu_kick(cpu);
 }
 
+static void qemu_destroy_cpu_core(Object *core)
+{
+    Object *socket = core->parent;
+
+    object_unparent(core);
+    if (socket && object_has_no_children(socket)) {
+        object_unparent(socket);
+    }
+}
+
+static void qemu_kvm_destroy_vcpu(CPUState *cpu)
+{
+    Object *thread = OBJECT(cpu);
+    Object *core = thread->parent;
+
+    CPU_REMOVE(cpu);
+
+    if (kvm_destroy_vcpu(cpu) < 0) {
+        error_report("kvm_destroy_vcpu failed");
+        exit(EXIT_FAILURE);
+    }
+
+    object_unparent(thread);
+    if (core && object_has_no_children(core)) {
+        qemu_destroy_cpu_core(core);
+    }
+}
+
+static void qemu_tcg_destroy_vcpu(CPUState *cpu)
+{
+    Object *thread = OBJECT(cpu);
+    Object *core = thread->parent;
+
+    CPU_REMOVE(cpu);
+    object_unparent(OBJECT(cpu));
+
+    if (core && object_has_no_children(core)) {
+        qemu_destroy_cpu_core(core);
+    }
+}
+
 static void flush_queued_work(CPUState *cpu)
 {
     struct qemu_work_item *wi;
@@ -950,6 +991,11 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
             }
         }
         qemu_kvm_wait_io_event(cpu);
+        if (cpu->exit && !cpu_can_run(cpu)) {
+            qemu_kvm_destroy_vcpu(cpu);
+            qemu_mutex_unlock(&qemu_global_mutex);
+            return NULL;
+        }
     }
 
     return NULL;
@@ -1003,6 +1049,7 @@ static void tcg_exec_all(void);
 static void *qemu_tcg_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
+    CPUState *remove_cpu = NULL;
 
     qemu_tcg_init_cpu_signals();
     qemu_thread_get_self(cpu->thread);
@@ -1039,6 +1086,16 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
             }
         }
         qemu_tcg_wait_io_event();
+        CPU_FOREACH(cpu) {
+            if (cpu->exit && !cpu_can_run(cpu)) {
+                remove_cpu = cpu;
+                break;
+            }
+        }
+        if (remove_cpu) {
+            qemu_tcg_destroy_vcpu(remove_cpu);
+            remove_cpu = NULL;
+        }
     }
 
     return NULL;
@@ -1196,6 +1253,13 @@ void resume_all_vcpus(void)
     }
 }
 
+void cpu_remove(CPUState *cpu)
+{
+    cpu->stop = true;
+    cpu->exit = true;
+    qemu_cpu_kick(cpu);
+}
+
 /* For temporary buffers for forming a name */
 #define VCPU_THREAD_NAME_SIZE 16
 
@@ -1390,6 +1454,9 @@ static void tcg_exec_all(void)
                 break;
             }
         } else if (cpu->stop || cpu->stopped) {
+            if (cpu->exit) {
+                next_cpu = CPU_NEXT(cpu);
+            }
             break;
         }
     }
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 5241cf4..1bfc3d4 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -206,6 +206,7 @@ struct kvm_run;
  * @halted: Nonzero if the CPU is in suspended state.
  * @stop: Indicates a pending stop request.
  * @stopped: Indicates the CPU has been artificially stopped.
+ * @exit: Indicates the CPU has exited due to an unplug operation.
  * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
  *           CPU and return to its top level loop.
  * @singlestep_enabled: Flags for single-stepping.
@@ -249,6 +250,7 @@ struct CPUState {
     bool created;
     bool stop;
     bool stopped;
+    bool exit;
     volatile sig_atomic_t exit_request;
     uint32_t interrupt_request;
     int singlestep_enabled;
@@ -306,6 +308,7 @@ struct CPUState {
 QTAILQ_HEAD(CPUTailQ, CPUState);
 extern struct CPUTailQ cpus;
 #define CPU_NEXT(cpu) QTAILQ_NEXT(cpu, node)
+#define CPU_REMOVE(cpu) QTAILQ_REMOVE(&cpus, cpu, node)
 #define CPU_FOREACH(cpu) QTAILQ_FOREACH(cpu, &cpus, node)
 #define CPU_FOREACH_SAFE(cpu, next_cpu) \
     QTAILQ_FOREACH_SAFE(cpu, &cpus, node, next_cpu)
@@ -610,6 +613,14 @@ void cpu_exit(CPUState *cpu);
  */
 void cpu_resume(CPUState *cpu);
 
+ /**
+ * cpu_remove:
+ * @cpu: The CPU to remove.
+ *
+ * Requests the CPU to be removed.
+ */
+void cpu_remove(CPUState *cpu);
+
 /**
  * qemu_init_vcpu:
  * @cpu: The vCPU to initialize.
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 30cb84d..560caef 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -188,6 +188,7 @@ int kvm_has_intx_set_mask(void);
 
 int kvm_init_vcpu(CPUState *cpu);
 int kvm_cpu_exec(CPUState *cpu);
+int kvm_destroy_vcpu(CPUState *cpu);
 
 #ifdef NEED_CPU_H
 
diff --git a/kvm-all.c b/kvm-all.c
index 05a79c2..46e7853 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -71,6 +71,12 @@ typedef struct KVMSlot
 
 typedef struct kvm_dirty_log KVMDirtyLog;
 
+struct KVMParkedVcpu {
+    unsigned long vcpu_id;
+    int kvm_fd;
+    QLIST_ENTRY(KVMParkedVcpu) node;
+};
+
 struct KVMState
 {
     AccelState parent_obj;
@@ -107,6 +113,7 @@ struct KVMState
     QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
     bool direct_msi;
 #endif
+    QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
 };
 
 #define TYPE_KVM_ACCEL ACCEL_CLASS_NAME("kvm")
@@ -247,6 +254,53 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
     return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
 }
 
+int kvm_destroy_vcpu(CPUState *cpu)
+{
+    KVMState *s = kvm_state;
+    long mmap_size;
+    struct KVMParkedVcpu *vcpu = NULL;
+    int ret = 0;
+
+    DPRINTF("kvm_destroy_vcpu\n");
+
+    mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
+    if (mmap_size < 0) {
+        ret = mmap_size;
+        DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
+        goto err;
+    }
+
+    ret = munmap(cpu->kvm_run, mmap_size);
+    if (ret < 0) {
+        goto err;
+    }
+
+    vcpu = g_malloc0(sizeof(*vcpu));
+    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
+    vcpu->kvm_fd = cpu->kvm_fd;
+    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
+err:
+    return ret;
+}
+
+static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
+{
+    struct KVMParkedVcpu *cpu;
+
+    QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
+        if (cpu->vcpu_id == vcpu_id) {
+            int kvm_fd;
+
+            QLIST_REMOVE(cpu, node);
+            kvm_fd = cpu->kvm_fd;
+            g_free(cpu);
+            return kvm_fd;
+        }
+    }
+
+    return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
+}
+
 int kvm_init_vcpu(CPUState *cpu)
 {
     KVMState *s = kvm_state;
@@ -255,7 +309,7 @@ int kvm_init_vcpu(CPUState *cpu)
 
     DPRINTF("kvm_init_vcpu\n");
 
-    ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)kvm_arch_vcpu_id(cpu));
+    ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
     if (ret < 0) {
         DPRINTF("kvm_create_vcpu failed\n");
         goto err;
@@ -1448,6 +1502,7 @@ static int kvm_init(MachineState *ms)
 #ifdef KVM_CAP_SET_GUEST_DEBUG
     QTAILQ_INIT(&s->kvm_sw_breakpoints);
 #endif
+    QLIST_INIT(&s->kvm_parked_vcpus);
     s->vmfd = -1;
     s->fd = qemu_open("/dev/kvm", O_RDWR);
     if (s->fd == -1) {
diff --git a/kvm-stub.c b/kvm-stub.c
index 7ba90c5..79ac626 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -30,6 +30,11 @@ bool kvm_gsi_direct_mapping;
 bool kvm_allowed;
 bool kvm_readonly_mem_allowed;
 
+int kvm_destroy_vcpu(CPUState *cpu)
+{
+    return -ENOSYS;
+}
+
 int kvm_init_vcpu(CPUState *cpu)
 {
     return -ENOSYS;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (16 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:22   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 19/24] xics_kvm: Add cpu_destroy method to XICS Bharata B Rao
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

When CPU hot removal is implemented by parking the vCPU fd and reusing
it on a later hotplug, we can end up trying to re-enable the
KVM_CAP_IRQ_XICS capability for a vCPU on which it is already enabled.
Introduce a boolean member in ICPState to track this and skip enabling
the capability again in that case.

This change allows CPU hot removal to work for sPAPR.
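
In effect the per-vCPU setup path gains an idempotence guard along these
lines (condensed from the hunk below):

    if (ss->cap_irq_xics_enabled) {
        return;             /* parked vCPU fd: capability already set */
    }
    /* ... enable KVM_CAP_IRQ_XICS for this vCPU ... */
    ss->cap_irq_xics_enabled = true;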

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/intc/xics_kvm.c    | 10 ++++++++++
 include/hw/ppc/xics.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index c15453f..5b27bf8 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -331,6 +331,15 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
         abort();
     }
 
+    /*
+     * If we are reusing a parked vCPU fd corresponding to a CPU
+     * that was hot-removed earlier, we don't have to re-enable the
+     * KVM_CAP_IRQ_XICS capability again.
+     */
+    if (ss->cap_irq_xics_enabled) {
+        return;
+    }
+
     if (icpkvm->kernel_xics_fd != -1) {
         int ret;
 
@@ -343,6 +352,7 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
                     kvm_arch_vcpu_id(cs), strerror(errno));
             exit(1);
         }
+        ss->cap_irq_xics_enabled = true;
     }
 }
 
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index a214dd7..355a966 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -109,6 +109,7 @@ struct ICPState {
     uint8_t pending_priority;
     uint8_t mfrr;
     qemu_irq output;
+    bool cap_irq_xics_enabled;
 };
 
 #define TYPE_ICS "ics"
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 19/24] xics_kvm: Add cpu_destroy method to XICS
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (17 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support Bharata B Rao
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

XICS is set up for each CPU during initialization. Provide a routine
to undo that setup when a CPU is unplugged.

This allows a VM that has undergone CPU hotplug and unplug to reboot
correctly.
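
A machine that supports CPU unplug would then undo the per-CPU XICS setup
roughly as follows (this mirrors what the next patch in this series does
for sPAPR; shown here only as a sketch):

    static void spapr_cpu_destroy(PowerPCCPU *cpu)
    {
        xics_cpu_destroy(spapr->icp, cpu);           /* new hook */
        qemu_unregister_reset(spapr_cpu_reset, cpu);
    }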

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
---
 hw/intc/xics.c        | 12 ++++++++++++
 hw/intc/xics_kvm.c    |  9 +++++++++
 include/hw/ppc/xics.h |  2 ++
 3 files changed, 23 insertions(+)

diff --git a/hw/intc/xics.c b/hw/intc/xics.c
index 0fd2a84..406697d 100644
--- a/hw/intc/xics.c
+++ b/hw/intc/xics.c
@@ -44,6 +44,18 @@ static int get_cpu_index_by_dt_id(int cpu_dt_id)
     return -1;
 }
 
+void xics_cpu_destroy(XICSState *icp, PowerPCCPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+    XICSStateClass *info = XICS_COMMON_GET_CLASS(icp);
+
+    assert(cs->cpu_index < icp->nr_servers);
+
+    if (info->cpu_destroy) {
+        info->cpu_destroy(icp, cpu);
+    }
+}
+
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
 {
     CPUState *cs = CPU(cpu);
diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
index 5b27bf8..a7c6226 100644
--- a/hw/intc/xics_kvm.c
+++ b/hw/intc/xics_kvm.c
@@ -356,6 +356,14 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
     }
 }
 
+static void xics_kvm_cpu_destroy(XICSState *icp, PowerPCCPU *cpu)
+{
+    CPUState *cs = CPU(cpu);
+    ICPState *ss = &icp->ss[cs->cpu_index];
+
+    ss->cs = NULL;
+}
+
 static void xics_kvm_set_nr_irqs(XICSState *icp, uint32_t nr_irqs, Error **errp)
 {
     icp->nr_irqs = icp->ics->nr_irqs = nr_irqs;
@@ -486,6 +494,7 @@ static void xics_kvm_class_init(ObjectClass *oc, void *data)
 
     dc->realize = xics_kvm_realize;
     xsc->cpu_setup = xics_kvm_cpu_setup;
+    xsc->cpu_destroy = xics_kvm_cpu_destroy;
     xsc->set_nr_irqs = xics_kvm_set_nr_irqs;
     xsc->set_nr_servers = xics_kvm_set_nr_servers;
 }
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 355a966..2faad48 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -68,6 +68,7 @@ struct XICSStateClass {
     DeviceClass parent_class;
 
     void (*cpu_setup)(XICSState *icp, PowerPCCPU *cpu);
+    void (*cpu_destroy)(XICSState *icp, PowerPCCPU *cpu);
     void (*set_nr_irqs)(XICSState *icp, uint32_t nr_irqs, Error **errp);
     void (*set_nr_servers)(XICSState *icp, uint32_t nr_servers, Error **errp);
 };
@@ -166,5 +167,6 @@ int xics_alloc_block(XICSState *icp, int src, int num, bool lsi, bool align);
 void xics_free(XICSState *icp, int irq, int num);
 
 void xics_cpu_setup(XICSState *icp, PowerPCCPU *cpu);
+void xics_cpu_destroy(XICSState *icp, PowerPCCPU *cpu);
 
 #endif /* __XICS_H__ */
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (18 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 19/24] xics_kvm: Add cpu_destroy method to XICS Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:28   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space Bharata B Rao
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Support hot removal of CPUs from sPAPR guests by sending the hot unplug
notification to the guest via an EPOW interrupt. Release the vCPU object
after CPU hot unplug so that the vCPU fd can be parked and reused.
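
The resulting unplug sequence is roughly (function names as in the patch
below):

    device_del <socket-id>
      -> spapr_machine_device_unplug()
         -> spapr_cpu_socket_unplug() -> spapr_cpu_unplug() per thread
            (secondary SMT threads return early)
            -> drck->detach(drc, dev, spapr_cpu_release, NULL, &err)
            -> spapr_hotplug_req_remove_event(drc)   /* EPOW to guest */
    guest releases the resource, the DRC then invokes spapr_cpu_release()
      -> spapr_cpu_destroy(cpu) + cpu_remove(cs)     /* vCPU fd is parked */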

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c              | 101 +++++++++++++++++++++++++++++++++++++++++++-
 target-ppc/translate_init.c |  10 +++++
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9b0701c..910a50f 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1481,6 +1481,12 @@ static void spapr_cpu_init(PowerPCCPU *cpu)
     qemu_register_reset(spapr_cpu_reset, cpu);
 }
 
+static void spapr_cpu_destroy(PowerPCCPU *cpu)
+{
+    xics_cpu_destroy(spapr->icp, cpu);
+    qemu_unregister_reset(spapr_cpu_reset, cpu);
+}
+
 /* pSeries LPAR / sPAPR hardware init */
 static void ppc_spapr_init(MachineState *machine)
 {
@@ -1883,6 +1889,24 @@ static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
     return fdt;
 }
 
+static void spapr_cpu_release(DeviceState *dev, void *opaque)
+{
+    CPUState *cs;
+    int i;
+    int id = ppc_get_vcpu_dt_id(POWERPC_CPU(CPU(dev)));
+
+    for (i = id; i < id + smp_threads; i++) {
+        CPU_FOREACH(cs) {
+            PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+            if (i == ppc_get_vcpu_dt_id(cpu)) {
+                spapr_cpu_destroy(cpu);
+                cpu_remove(cs);
+            }
+        }
+    }
+}
+
 static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                             Error **errp)
 {
@@ -1970,6 +1994,59 @@ static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     return;
 }
 
+static int spapr_cpu_unplug(Object *obj, void *opaque)
+{
+    Error **errp = opaque;
+    DeviceState *dev = DEVICE(obj);
+    CPUState *cs = CPU(dev);
+    PowerPCCPU *cpu = POWERPC_CPU(cs);
+    int id = ppc_get_vcpu_dt_id(cpu);
+    int smt = kvmppc_smt_threads();
+    sPAPRDRConnector *drc =
+        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
+    sPAPRDRConnectorClass *drck;
+    Error *local_err = NULL;
+
+    /*
+     * Secondary SMT threads return from here; only the core's first
+     * thread continues and signals the hot unplug event to the guest.
+     */
+    if ((id % smt) != 0) {
+        return 0;
+    }
+    g_assert(drc);
+
+    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    drck->detach(drc, dev, spapr_cpu_release, NULL, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return -1;
+    }
+
+    /*
+     * In addition to hotplugged CPUs, send the hot-unplug notification
+     * interrupt to the guest for coldplugged CPUs started via -device
+     * option too.
+     */
+    spapr_hotplug_req_remove_event(drc);
+
+    return 0;
+}
+
+static int spapr_cpu_core_unplug(Object *obj, void *opaque)
+{
+    Error **errp = opaque;
+
+    object_child_foreach(obj, spapr_cpu_unplug, errp);
+    return 0;
+}
+
+static void spapr_cpu_socket_unplug(HotplugHandler *hotplug_dev,
+                            DeviceState *dev, Error **errp)
+{
+    object_child_foreach(OBJECT(dev), spapr_cpu_core_unplug, errp);
+}
+
 static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
@@ -1984,16 +2061,36 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
          * Fail hotplug on machines where CPU DR isn't enabled.
          */
         if (!spapr->dr_cpu_enabled && dev->hotplugged) {
+            /*
+             * FIXME: Ideally we should fail the hotplug here with
+             * error_setg(), but that doesn't interact well with the vCPU
+             * object removal code. Hence we silently refuse to add CPUs.
+             */
+            spapr_cpu_destroy(cpu);
+            cpu_remove(cs);
             return;
         }
         spapr_cpu_plug(hotplug_dev, dev, errp);
     }
 }
 
+static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
+                                      DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU_SOCKET)) {
+        if (!spapr->dr_cpu_enabled) {
+            error_setg(errp, "CPU hot unplug not supported on this machine");
+            return;
+        }
+        spapr_cpu_socket_unplug(hotplug_dev, dev, errp);
+    }
+}
+
 static HotplugHandler *spapr_get_hotpug_handler(MachineState *machine,
                                              DeviceState *dev)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_CPU_SOCKET)) {
         return HOTPLUG_HANDLER(machine);
     }
     return NULL;
@@ -2017,6 +2114,8 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     mc->has_dynamic_sysbus = true;
     mc->get_hotplug_handler = spapr_get_hotpug_handler;
     hc->plug = spapr_machine_device_plug;
+    hc->unplug = spapr_machine_device_unplug;
+
     smc->dr_phb_enabled = false;
     smc->dr_cpu_enabled = false;
     smc->dr_lmb_enabled = false;
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index fccee82..8e24235 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -30,6 +30,7 @@
 #include "qemu/error-report.h"
 #include "qapi/visitor.h"
 #include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
 
 //#define PPC_DUMP_CPU
 //#define PPC_DEBUG_SPR
@@ -9143,9 +9144,18 @@ static void ppc_cpu_unrealizefn(DeviceState *dev, Error **errp)
 {
     PowerPCCPU *cpu = POWERPC_CPU(dev);
     CPUPPCState *env = &cpu->env;
+    CPUClass *cc = CPU_GET_CLASS(dev);
     opc_handler_t **table;
     int i, j;
 
+    if (qdev_get_vmsd(dev) == NULL) {
+        vmstate_unregister(NULL, &vmstate_cpu_common, cpu);
+    }
+
+    if (cc->vmsd != NULL) {
+        vmstate_unregister(NULL, cc->vmsd, cpu);
+    }
+
     cpu_exec_exit(CPU(dev));
 
     for (i = 0; i < PPC_CPU_OPCODES_LEN; i++) {
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (19 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:33   ` David Gibson
  2015-05-05  8:48   ` Igor Mammedov
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 22/24] numa: API to lookup NUMA node by address Bharata B Rao
                   ` (2 subsequent siblings)
  23 siblings, 2 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Initialize a hotplug memory region under which all the hotplugged
memory is accommodated. Also enable memory hotplug by setting
CONFIG_MEM_HOTPLUG.

Modelled on i386 memory hotplug.
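
As a worked example with illustrative (not tested) numbers, booting with
-m 4G,slots=4,maxmem=8G would give:

    hotplug_mem_size    = maxram_size - ram_size                  = 4G
    hotplug_memory_base = ROUND_UP(ram_size, 1G)                  = 4G
    enforce_aligned_dimm adds 1G (SPAPR_HOTPLUG_MEM_ALIGN) * slots = +4G

so an 8G "hotplug-memory" region gets mapped at guest address 4G.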

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 default-configs/ppc64-softmmu.mak |  1 +
 hw/ppc/spapr.c                    | 38 ++++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr.h            | 12 ++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index 22ef132..16b3011 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -51,3 +51,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
 # For PReP
 CONFIG_MC146818RTC=y
 CONFIG_ISA_TESTDEV=y
+CONFIG_MEM_HOTPLUG=y
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 910a50f..9dc4c36 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -125,6 +125,9 @@ struct sPAPRMachineState {
 
     /*< public >*/
     char *kvm_type;
+    ram_addr_t hotplug_memory_base;
+    MemoryRegion hotplug_memory;
+    bool enforce_aligned_dimm;
 };
 
 sPAPREnvironment *spapr;
@@ -1514,6 +1517,7 @@ static void ppc_spapr_init(MachineState *machine)
     QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
     int sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
     int cores = (smp_cpus/smp_threads) ? smp_cpus/smp_threads : 1;
+    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
 
     sockets = sockets ? sockets : cores;
     msi_supported = true;
@@ -1613,6 +1617,36 @@ static void ppc_spapr_init(MachineState *machine)
         memory_region_add_subregion(sysmem, 0, rma_region);
     }
 
+    /* initialize hotplug memory address space */
+    if (machine->ram_size < machine->maxram_size) {
+        ram_addr_t hotplug_mem_size =
+            machine->maxram_size - machine->ram_size;
+
+        if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
+            error_report("unsupported amount of memory slots: %"PRIu64,
+                         machine->ram_slots);
+            exit(EXIT_FAILURE);
+        }
+
+        ms->hotplug_memory_base = ROUND_UP(machine->ram_size,
+                                    SPAPR_HOTPLUG_MEM_ALIGN);
+
+        if (ms->enforce_aligned_dimm) {
+            hotplug_mem_size += SPAPR_HOTPLUG_MEM_ALIGN * machine->ram_slots;
+        }
+
+        if ((ms->hotplug_memory_base + hotplug_mem_size) < hotplug_mem_size) {
+            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
+                         machine->maxram_size);
+            exit(EXIT_FAILURE);
+        }
+
+        memory_region_init(&ms->hotplug_memory, OBJECT(ms),
+                           "hotplug-memory", hotplug_mem_size);
+        memory_region_add_subregion(sysmem, ms->hotplug_memory_base,
+                                    &ms->hotplug_memory);
+    }
+
     filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
     spapr->rtas_size = get_image_size(filename);
     spapr->rtas_blob = g_malloc(spapr->rtas_size);
@@ -1844,11 +1878,15 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
 
 static void spapr_machine_initfn(Object *obj)
 {
+    sPAPRMachineState *ms = SPAPR_MACHINE(obj);
+
     object_property_add_str(obj, "kvm-type",
                             spapr_get_kvm_type, spapr_set_kvm_type, NULL);
     object_property_set_description(obj, "kvm-type",
                                     "Specifies the KVM virtualization mode (HV, PR)",
                                     NULL);
+
+    ms->enforce_aligned_dimm = true;
 }
 
 static void ppc_cpu_do_nmi_on_cpu(void *arg)
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index ecac6e3..53560e9 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -542,6 +542,18 @@ struct sPAPREventLogEntry {
 
 #define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
 
+/*
+ * This defines the maximum number of DIMM slots we can have for an sPAPR
+ * guest. This is not defined by sPAPR, but we set it to 4096 slots here.
+ * Even in the worst case of only SPAPR_MEMORY_BLOCK_SIZE (256MB) of memory
+ * per slot, this is enough to support 1TB of guest hotpluggable
+ * memory.
+ */
+#define SPAPR_MAX_RAM_SLOTS     (1ULL << 12)
+
+/* 1GB alignment for hotplug memory region */
+#define SPAPR_HOTPLUG_MEM_ALIGN (1ULL << 30)
+
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
 int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 22/24] numa: API to lookup NUMA node by address
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (20 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:35   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm, dynamic-reconfiguration-memory Bharata B Rao
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 24/24] spapr: Memory hotplug support Bharata B Rao
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, Paolo Bonzini, qemu-ppc,
	tyreld, nfont, imammedo, afaerber, david

Keep track of the start and end address of each NUMA node in the
numa_info structure so that a node can easily be looked up by address.
Add an API, get_numa_node(), to look up the NUMA node for a given address.

This is needed by the PowerPC memory hotplug implementation.
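
A caller-side sketch (hypothetical variable names, minimal error
handling):

    Error *local_err = NULL;
    uint32_t node = get_numa_node(addr, &local_err);

    if (local_err) {
        /* addr is in neither a boot-time NUMA node nor any DIMM */
        error_propagate(errp, local_err);
        return;
    }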

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 include/sysemu/numa.h |  3 +++
 numa.c                | 61 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 5633b85..7274d05 100644
--- a/include/sysemu/numa.h
+++ b/include/sysemu/numa.h
@@ -14,11 +14,14 @@ typedef struct node_info {
     DECLARE_BITMAP(node_cpu, MAX_CPUMASK_BITS);
     struct HostMemoryBackend *node_memdev;
     bool present;
+    ram_addr_t mem_start;
+    ram_addr_t mem_end;
 } NodeInfo;
 extern NodeInfo numa_info[MAX_NODES];
 void parse_numa_opts(void);
 void numa_post_machine_init(void);
 void query_numa_node_mem(uint64_t node_mem[]);
 extern QemuOptsList qemu_numa_opts;
+uint32_t get_numa_node(ram_addr_t addr, Error **errp);
 
 #endif
diff --git a/numa.c b/numa.c
index 5634bf0..85ff00f 100644
--- a/numa.c
+++ b/numa.c
@@ -53,6 +53,51 @@ static int max_numa_nodeid; /* Highest specified NUMA node ID, plus one.
 int nb_numa_nodes;
 NodeInfo numa_info[MAX_NODES];
 
+/*
+ * Given an address, return the index of the NUMA node to which the
+ * address belongs to.
+ */
+uint32_t get_numa_node(ram_addr_t addr, Error **errp)
+{
+    uint32_t i;
+    MemoryDeviceInfoList *info_list = NULL;
+    MemoryDeviceInfoList **prev = &info_list;
+    MemoryDeviceInfoList *info;
+
+    for (i = 0; i < nb_numa_nodes; i++) {
+        if (addr >= numa_info[i].mem_start && addr < numa_info[i].mem_end) {
+            return i;
+        }
+    }
+
+    /*
+     * If this @addr falls under cold or hotplugged memory regions,
+     * check there too.
+     */
+    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+    for (info = info_list; info; info = info->next) {
+        MemoryDeviceInfo *value = info->value;
+
+        if (value) {
+            switch (value->kind) {
+            case MEMORY_DEVICE_INFO_KIND_DIMM:
+                if (addr >= value->dimm->addr &&
+                        addr < (value->dimm->addr + value->dimm->size)) {
+                    qapi_free_MemoryDeviceInfoList(info_list);
+                    return value->dimm->node;
+                }
+                break;
+            default:
+                break;
+            }
+        }
+    }
+    qapi_free_MemoryDeviceInfoList(info_list);
+    error_setg(errp, "Address 0x" RAM_ADDR_FMT " doesn't belong to any NUMA node", addr);
+
+    return -1;
+}
+
 static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
 {
     uint16_t nodenr;
@@ -119,6 +164,7 @@ static void numa_node_parse(NumaNodeOptions *node, QemuOpts *opts, Error **errp)
         numa_info[nodenr].node_mem = object_property_get_int(o, "size", NULL);
         numa_info[nodenr].node_memdev = MEMORY_BACKEND(o);
     }
+
     numa_info[nodenr].present = true;
     max_numa_nodeid = MAX(max_numa_nodeid, nodenr + 1);
 }
@@ -165,6 +211,17 @@ error:
     return -1;
 }
 
+static void numa_set_mem_address(int nodenr)
+{
+    if (nodenr) {
+        numa_info[nodenr].mem_start = numa_info[nodenr-1].mem_end;
+    } else {
+        numa_info[nodenr].mem_start = 0;
+    }
+    numa_info[nodenr].mem_end = numa_info[nodenr].mem_start +
+                                   numa_info[nodenr].node_mem;
+}
+
 void parse_numa_opts(void)
 {
     int i;
@@ -229,6 +286,10 @@ void parse_numa_opts(void)
         }
 
         for (i = 0; i < nb_numa_nodes; i++) {
+            numa_set_mem_address(i);
+        }
+
+        for (i = 0; i < nb_numa_nodes; i++) {
             if (!bitmap_empty(numa_info[i].node_cpu, MAX_CPUMASK_BITS)) {
                 break;
             }
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm, dynamic-reconfiguration-memory
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (21 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 22/24] numa: API to lookup NUMA node by address Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:40   ` David Gibson
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 24/24] spapr: Memory hotplug support Bharata B Rao
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Parse the ibm,architecture.vec table obtained from the guest and enable
memory node configuration via ibm,dynamic-reconfiguration-memory if the
guest supports it. This is in preparation for supporting memory hotplug
for sPAPR guests.

This changes the way memory node configuration is done. Currently all
memory nodes are built upfront. After this patch, only the memory@0 node
for the RMA is built upfront. The guest kernel boots with just that, and
the rest of the memory nodes (via memory@XXX or
ibm,dynamic-reconfiguration-memory) are built when the guest makes the
ibm,client-architecture-support call.

Note: This patch needs a SLOF enhancement which is already part of
upstream SLOF.
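
To make the ibm,dynamic-memory layout documented below concrete, a single
LMB list entry could look like this (made-up address and DRC index; a
256MB LMB that is assigned at boot):

    0x00000000 0x10000000   /* 64-bit start address of the LMB            */
    0x90000010              /* DRC index (hypothetical value)             */
    0x00000000              /* reserved                                   */
    0x00000000              /* index into ibm,associativity-lookup-arrays */
    0x00000008              /* flags: SPAPR_LMB_FLAGS_ASSIGNED            */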

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 docs/specs/ppc-spapr-hotplug.txt |  48 ++++++++++
 hw/ppc/spapr.c                   | 192 ++++++++++++++++++++++++++++++---------
 hw/ppc/spapr_hcall.c             |  51 +++++++++--
 include/hw/ppc/spapr.h           |  15 ++-
 4 files changed, 257 insertions(+), 49 deletions(-)

diff --git a/docs/specs/ppc-spapr-hotplug.txt b/docs/specs/ppc-spapr-hotplug.txt
index 46e0719..9d574b5 100644
--- a/docs/specs/ppc-spapr-hotplug.txt
+++ b/docs/specs/ppc-spapr-hotplug.txt
@@ -302,4 +302,52 @@ consisting of <phys>, <size> and <maxcpus>.
 pseries guests use this property to note the maximum allowed CPUs for the
 guest.
 
+== ibm,dynamic-reconfiguration-memory ==
+
+ibm,dynamic-reconfiguration-memory is a device tree node that represents
+dynamically reconfigurable logical memory blocks (LMBs). This node
+is generated only when the guest advertises support for it via the
+ibm,client-architecture-support call. Memory that is not dynamically
+reconfigurable is represented by /memory nodes. The properties of this
+node that are of interest to the sPAPR memory hotplug implementation
+in QEMU are described here.
+
+ibm,lmb-size
+
+This 64-bit integer defines the size of each dynamically reconfigurable LMB.
+
+ibm,associativity-lookup-arrays
+
+This property defines a lookup array in which the NUMA associativity
+information for each LMB can be found. It is a property encoded array
+that begins with an integer M, the number of associativity lists, followed
+by an integer N, the number of entries per associativity list, and is
+terminated by M associativity lists, each of length N integers.
+
+This property provides the same information as given by ibm,associativity
+property in a /memory node. Each assigned LMB has an index value between
+0 and M-1 which is used as an index into this table to select which
+associativity list to use for the LMB. This index value for each LMB
+is defined in ibm,dynamic-memory property.
+
+ibm,dynamic-memory
+
+This property describes the dynamically reconfigurable memory. It is a
+property encoded array that has an integer N, the number of LMBs, followed
+by N LMB list entries.
+
+Each LMB list entry consists of the following elements:
+
+- Logical address of the start of the LMB encoded as a 64-bit integer. This
+  corresponds to the reg property in a /memory node.
+- DRC index of the LMB that corresponds to ibm,my-drc-index property
+  in a /memory node.
+- Four bytes reserved for expansion.
+- Associativity list index for the LMB that is used as an index into the
+  ibm,associativity-lookup-arrays property described earlier. This
+  is used to retrieve the right associativity list to be used for this
+  LMB.
+- A 32-bit flags word. The bit at bit position 0x00000008 defines whether
+  the LMB is assigned to the partition as of boot time.
+
 [1] http://thread.gmane.org/gmane.linux.ports.ppc.embedded/75350/focus=106867
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 9dc4c36..73f947b 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -540,42 +540,6 @@ static void *spapr_create_fdt_skel(hwaddr initrd_base,
     return fdt;
 }
 
-int spapr_h_cas_compose_response(target_ulong addr, target_ulong size)
-{
-    void *fdt, *fdt_skel;
-    sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
-
-    size -= sizeof(hdr);
-
-    /* Create sceleton */
-    fdt_skel = g_malloc0(size);
-    _FDT((fdt_create(fdt_skel, size)));
-    _FDT((fdt_begin_node(fdt_skel, "")));
-    _FDT((fdt_end_node(fdt_skel)));
-    _FDT((fdt_finish(fdt_skel)));
-    fdt = g_malloc0(size);
-    _FDT((fdt_open_into(fdt_skel, fdt, size)));
-    g_free(fdt_skel);
-
-    /* Fix skeleton up */
-    _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
-
-    /* Pack resulting tree */
-    _FDT((fdt_pack(fdt)));
-
-    if (fdt_totalsize(fdt) + sizeof(hdr) > size) {
-        trace_spapr_cas_failed(size);
-        return -1;
-    }
-
-    cpu_physical_memory_write(addr, &hdr, sizeof(hdr));
-    cpu_physical_memory_write(addr + sizeof(hdr), fdt, fdt_totalsize(fdt));
-    trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr));
-    g_free(fdt);
-
-    return 0;
-}
-
 static void spapr_populate_memory_node(void *fdt, int nodeid, hwaddr start,
                                        hwaddr size)
 {
@@ -629,7 +593,6 @@ static int spapr_populate_memory(sPAPREnvironment *spapr, void *fdt)
         }
         if (!mem_start) {
             /* ppc_spapr_init() checks for rma_size <= node0_size already */
-            spapr_populate_memory_node(fdt, i, 0, spapr->rma_size);
             mem_start += spapr->rma_size;
             node_size -= spapr->rma_size;
         }
@@ -786,6 +749,150 @@ static void spapr_populate_cpus_dt_node(void *fdt, sPAPREnvironment *spapr)
 
 }
 
+/*
+ * Adds ibm,dynamic-reconfiguration-memory node.
+ * Refer to docs/specs/ppc-spapr-hotplug.txt for the documentation
+ * of this device tree node.
+ */
+static int spapr_populate_drconf_memory(sPAPREnvironment *spapr, void *fdt)
+{
+    int ret, i, offset;
+    uint64_t lmb_size = SPAPR_MEMORY_BLOCK_SIZE;
+    uint32_t prop_lmb_size[] = {0, cpu_to_be32(lmb_size)};
+    uint32_t nr_rma_lmbs = spapr->rma_size/lmb_size;
+    uint32_t nr_lmbs = spapr->maxram_limit/lmb_size - nr_rma_lmbs;
+    uint32_t nr_assigned_lmbs = spapr->ram_limit/lmb_size - nr_rma_lmbs;
+    uint32_t *int_buf, *cur_index, buf_len;
+
+    /* Allocate enough buffer size to fit in ibm,dynamic-memory */
+    buf_len = nr_lmbs * SPAPR_DR_LMB_LIST_ENTRY_SIZE * sizeof(uint32_t) +
+                sizeof(uint32_t);
+    cur_index = int_buf = g_malloc0(buf_len);
+
+    offset = fdt_add_subnode(fdt, 0, "ibm,dynamic-reconfiguration-memory");
+
+    ret = fdt_setprop(fdt, offset, "ibm,lmb-size", prop_lmb_size,
+                    sizeof(prop_lmb_size));
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = fdt_setprop_cell(fdt, offset, "ibm,memory-flags-mask", 0xff);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = fdt_setprop_cell(fdt, offset, "ibm,memory-preservation-time", 0x0);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* ibm,dynamic-memory */
+    int_buf[0] = cpu_to_be32(nr_lmbs);
+    cur_index++;
+    for (i = 0; i < nr_lmbs; i++) {
+        sPAPRDRConnector *drc;
+        sPAPRDRConnectorClass *drck;
+        uint64_t addr;
+        uint32_t *dynamic_memory = cur_index;
+
+        if (i < nr_assigned_lmbs) {
+            addr = (i + nr_rma_lmbs) * lmb_size;
+        } else {
+            addr = (i - nr_assigned_lmbs) * lmb_size +
+                SPAPR_MACHINE(qdev_get_machine())->hotplug_memory_base;
+        }
+        drc = spapr_dr_connector_new(qdev_get_machine(),
+                SPAPR_DR_CONNECTOR_TYPE_LMB, addr/lmb_size);
+        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+        dynamic_memory[0] = cpu_to_be32(addr >> 32);
+        dynamic_memory[1] = cpu_to_be32(addr & 0xffffffff);
+        dynamic_memory[2] = cpu_to_be32(drck->get_index(drc));
+        dynamic_memory[3] = cpu_to_be32(0); /* reserved */
+        dynamic_memory[4] = cpu_to_be32(get_numa_node(addr, NULL));
+        if (addr < spapr->ram_limit ||
+                    memory_region_present(get_system_memory(), addr)) {
+            dynamic_memory[5] = cpu_to_be32(SPAPR_LMB_FLAGS_ASSIGNED);
+        } else {
+            dynamic_memory[5] = cpu_to_be32(0);
+        }
+
+        cur_index += SPAPR_DR_LMB_LIST_ENTRY_SIZE;
+    }
+    ret = fdt_setprop(fdt, offset, "ibm,dynamic-memory", int_buf, buf_len);
+    if (ret < 0) {
+        goto out;
+    }
+
+    /* ibm,associativity-lookup-arrays */
+    cur_index = int_buf;
+    int_buf[0] = cpu_to_be32(nb_numa_nodes);
+    int_buf[1] = cpu_to_be32(4); /* Number of entries per associativity list */
+    cur_index += 2;
+    for (i = 0; i < nb_numa_nodes; i++) {
+        uint32_t associativity[] = {
+            cpu_to_be32(0x0),
+            cpu_to_be32(0x0),
+            cpu_to_be32(0x0),
+            cpu_to_be32(i)
+        };
+        memcpy(cur_index, associativity, sizeof(associativity));
+        cur_index += 4;
+    }
+    ret = fdt_setprop(fdt, offset, "ibm,associativity-lookup-arrays", int_buf,
+            (cur_index - int_buf) * sizeof(uint32_t));
+out:
+    g_free(int_buf);
+    return ret;
+}
+
+int spapr_h_cas_compose_response(target_ulong addr, target_ulong size,
+                                bool cpu_update, bool memory_update)
+{
+    void *fdt, *fdt_skel;
+    sPAPRDeviceTreeUpdateHeader hdr = { .version_id = 1 };
+
+    size -= sizeof(hdr);
+
+    /* Create sceleton */
+    fdt_skel = g_malloc0(size);
+    _FDT((fdt_create(fdt_skel, size)));
+    _FDT((fdt_begin_node(fdt_skel, "")));
+    _FDT((fdt_end_node(fdt_skel)));
+    _FDT((fdt_finish(fdt_skel)));
+    fdt = g_malloc0(size);
+    _FDT((fdt_open_into(fdt_skel, fdt, size)));
+    g_free(fdt_skel);
+
+    /* Fixup cpu nodes */
+    if (cpu_update) {
+        _FDT((spapr_fixup_cpu_dt(fdt, spapr)));
+    }
+
+    /* Generate memory nodes or ibm,dynamic-reconfiguration-memory node */
+    if (memory_update) {
+        _FDT((spapr_populate_drconf_memory(spapr, fdt)));
+    } else {
+        _FDT((spapr_populate_memory(spapr, fdt)));
+    }
+
+    /* Pack resulting tree */
+    _FDT((fdt_pack(fdt)));
+
+    if (fdt_totalsize(fdt) + sizeof(hdr) > size) {
+        trace_spapr_cas_failed(size);
+        return -1;
+    }
+
+    cpu_physical_memory_write(addr, &hdr, sizeof(hdr));
+    cpu_physical_memory_write(addr + sizeof(hdr), fdt, fdt_totalsize(fdt));
+    trace_spapr_cas_continue(fdt_totalsize(fdt) + sizeof(hdr));
+    g_free(fdt);
+
+    return 0;
+}
+
 static void spapr_finalize_fdt(sPAPREnvironment *spapr,
                                hwaddr fdt_addr,
                                hwaddr rtas_addr,
@@ -802,11 +909,12 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
     /* open out the base tree into a temp buffer for the final tweaks */
     _FDT((fdt_open_into(spapr->fdt_skel, fdt, FDT_MAX_SIZE)));
 
-    ret = spapr_populate_memory(spapr, fdt);
-    if (ret < 0) {
-        fprintf(stderr, "couldn't setup memory nodes in fdt\n");
-        exit(1);
-    }
+    /*
+     * Add memory@0 node to represent RMA. Rest of the memory is either
+     * represented by memory nodes or ibm,dynamic-reconfiguration-memory
+     * node later during ibm,client-architecture-support call.
+     */
+    spapr_populate_memory_node(fdt, 0, 0, spapr->rma_size);
 
     ret = spapr_populate_vdevice(spapr->vio_bus, fdt);
     if (ret < 0) {
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 4f76f1c..ada2123 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -807,6 +807,32 @@ static target_ulong h_set_mode(PowerPCCPU *cpu, sPAPREnvironment *spapr,
     return ret;
 }
 
+/*
+ * Return the offset to the requested option vector @vector in the
+ * option vector table @table.
+ */
+static target_ulong cas_get_option_vector(int vector, target_ulong table)
+{
+    int i;
+    char nr_vectors, nr_entries;
+
+    if (!table) {
+        return 0;
+    }
+
+    nr_vectors = (ldl_phys(&address_space_memory, table) >> 24) + 1;
+    if (!vector || vector > nr_vectors) {
+        return 0;
+    }
+    table++; /* skip nr option vectors */
+
+    for (i = 0; i < vector - 1; i++) {
+        nr_entries = ldl_phys(&address_space_memory, table) >> 24;
+        table += nr_entries + 2;
+    }
+    return table;
+}
+
 typedef struct {
     PowerPCCPU *cpu;
     uint32_t cpu_version;
@@ -827,19 +853,22 @@ static void do_set_compat(void *arg)
     ((cpuver) == CPU_POWERPC_LOGICAL_2_06_PLUS) ? 2061 : \
     ((cpuver) == CPU_POWERPC_LOGICAL_2_07) ? 2070 : 0)
 
+#define OV5_DRCONF_MEMORY 0x20
+
 static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
                                                   sPAPREnvironment *spapr,
                                                   target_ulong opcode,
                                                   target_ulong *args)
 {
-    target_ulong list = args[0];
+    target_ulong list = args[0], ov_table;
     PowerPCCPUClass *pcc_ = POWERPC_CPU_GET_CLASS(cpu_);
     CPUState *cs;
-    bool cpu_match = false;
+    bool cpu_match = false, cpu_update = true, memory_update = false;
     unsigned old_cpu_version = cpu_->cpu_version;
     unsigned compat_lvl = 0, cpu_version = 0;
     unsigned max_lvl = get_compat_level(cpu_->max_compat);
     int counter;
+    char ov5_byte2;
 
     /* Parse PVR list */
     for (counter = 0; counter < 512; ++counter) {
@@ -889,8 +918,6 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
         }
     }
 
-    /* For the future use: here @list points to the first capability */
-
     /* Parsing finished */
     trace_spapr_cas_pvr(cpu_->cpu_version, cpu_match,
                         cpu_version, pcc_->pcr_mask);
@@ -914,14 +941,26 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu_,
     }
 
     if (!cpu_version) {
-        return H_SUCCESS;
+        cpu_update = false;
     }
 
+    /* For the future use: here @ov_table points to the first option vector */
+    ov_table = list;
+
+    list = cas_get_option_vector(5, ov_table);
     if (!list) {
         return H_SUCCESS;
     }
 
-    if (spapr_h_cas_compose_response(args[1], args[2])) {
+    /* @list now points to OV 5 */
+    list += 2;
+    ov5_byte2 = rtas_ld(list, 0) >> 24;
+    if (ov5_byte2 & OV5_DRCONF_MEMORY) {
+        memory_update = true;
+    }
+
+    if (spapr_h_cas_compose_response(args[1], args[2], cpu_update,
+                                    memory_update)) {
         qemu_system_reset_request();
     }
 
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 53560e9..a286fe7 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -554,9 +554,22 @@ struct sPAPREventLogEntry {
 /* 1GB alignment for hotplug memory region */
 #define SPAPR_HOTPLUG_MEM_ALIGN (1ULL << 30)
 
+/*
+ * Number of 32 bit words in each LMB list entry in ibm,dynamic-memory
+ * property under ibm,dynamic-reconfiguration-memory node.
+ */
+#define SPAPR_DR_LMB_LIST_ENTRY_SIZE 6
+
+/*
+ * This flag value defines the LMB as assigned in ibm,dynamic-memory
+ * property under ibm,dynamic-reconfiguration-memory node.
+ */
+#define SPAPR_LMB_FLAGS_ASSIGNED 0x00000008
+
 void spapr_events_init(sPAPREnvironment *spapr);
 void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
-int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);
+int spapr_h_cas_compose_response(target_ulong addr, target_ulong size,
+                                bool cpu_update, bool memory_update);
 sPAPRTCETable *spapr_tce_new_table(DeviceState *owner, uint32_t liobn,
                                    uint64_t bus_offset,
                                    uint32_t page_shift,
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [Qemu-devel] [RFC PATCH v3 24/24] spapr: Memory hotplug support
  2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
                   ` (22 preceding siblings ...)
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm, dynamic-reconfiguration-memory Bharata B Rao
@ 2015-04-24  6:47 ` Bharata B Rao
  2015-05-05  7:45   ` David Gibson
  23 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-24  6:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, Bharata B Rao, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

Make use of the pc-dimm infrastructure to support memory hotplug
for PowerPC.

Modelled on i386 memory hotplug.
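
With this in place, memory can be hot added through the usual pc-dimm
interface, for example (illustrative IDs and sizes; the patch currently
restricts hot add to NUMA node 0):

    (qemu) object_add memory-backend-ram,id=mem1,size=1G
    (qemu) device_add pc-dimm,id=dimm1,memdev=mem1,node=0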

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c        | 157 +++++++++++++++++++++++++++++++++++++++++++++++++-
 hw/ppc/spapr_events.c |   3 +
 2 files changed, 158 insertions(+), 2 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 73f947b..9f72890 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -61,7 +61,8 @@
 #include "hw/nmi.h"
 
 #include "hw/compat.h"
-
+#include "hw/mem/pc-dimm.h"
+#include "qapi/qmp/qerror.h"
 #include <libfdt.h>
 
 /* SLOF memory layout:
@@ -877,6 +878,10 @@ int spapr_h_cas_compose_response(target_ulong addr, target_ulong size,
         _FDT((spapr_populate_memory(spapr, fdt)));
     }
 
+    if (spapr->dr_lmb_enabled) {
+        _FDT(spapr_drc_populate_dt(fdt, 0, NULL, SPAPR_DR_CONNECTOR_TYPE_LMB));
+    }
+
     /* Pack resulting tree */
     _FDT((fdt_pack(fdt)));
 
@@ -2193,6 +2198,133 @@ static void spapr_cpu_socket_unplug(HotplugHandler *hotplug_dev,
     object_child_foreach(OBJECT(dev), spapr_cpu_core_unplug, errp);
 }
 
+static void spapr_add_lmbs(DeviceState *dev, uint64_t addr, uint64_t size,
+                            Error **errp)
+{
+    sPAPRDRConnector *drc;
+    uint32_t nr_lmbs = size/SPAPR_MEMORY_BLOCK_SIZE;
+    int i;
+
+    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
+        error_setg(errp, "Hotplugged memory size must be a multiple of "
+                      "%d MB", SPAPR_MEMORY_BLOCK_SIZE/(1024 * 1024));
+        return;
+    }
+
+    /*
+     * Check for DRC connectors and send hotplug notification to the
+     * guest only in case of hotplugged memory. This allows cold plugged
+     * memory to be specified at boot time.
+     */
+    if (!dev->hotplugged) {
+        return;
+    }
+
+    for (i = 0; i < nr_lmbs; i++) {
+        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_LMB,
+                addr/SPAPR_MEMORY_BLOCK_SIZE);
+        g_assert(drc);
+
+        /*
+         * TODO: We don't call drc->attach() since it is not currently
+         * needed. When the pseries guest kernel implements the
+         * configure-connector RTAS call for memory hotplug, we will have
+         * to pass a DT node, at which point we can use ->attach().
+         */
+        spapr_hotplug_req_add_event(drc);
+        addr += SPAPR_MEMORY_BLOCK_SIZE;
+    }
+}
+
+/*
+ * TODO: Share code with pc_dimm_plug.
+ */
+static void spapr_memory_plug(HotplugHandler *hotplug_dev,
+                         DeviceState *dev, Error **errp)
+{
+    int slot;
+    Error *local_err = NULL;
+    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
+    MachineState *machine = MACHINE(hotplug_dev);
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm);
+    uint64_t existing_dimms_capacity = 0;
+    uint64_t align = TARGET_PAGE_SIZE;
+    uint64_t addr;
+
+    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    if (memory_region_get_alignment(mr) && ms->enforce_aligned_dimm) {
+        align = memory_region_get_alignment(mr);
+    }
+
+    addr = pc_dimm_get_free_addr(ms->hotplug_memory_base,
+                                 memory_region_size(&ms->hotplug_memory),
+                                 !addr ? NULL : &addr, align,
+                                 memory_region_size(mr), &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    existing_dimms_capacity = pc_existing_dimms_capacity(&local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    if (existing_dimms_capacity + memory_region_size(mr) >
+        machine->maxram_size - machine->ram_size) {
+        error_setg(&local_err, "not enough space, currently 0x%" PRIx64
+                   " in use of total hot pluggable 0x" RAM_ADDR_FMT,
+                   existing_dimms_capacity,
+                   machine->maxram_size - machine->ram_size);
+        goto out;
+    }
+
+    object_property_set_int(OBJECT(dev), addr, PC_DIMM_ADDR_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    trace_mhp_pc_dimm_assigned_address(addr);
+
+    slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+
+    slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL : &slot,
+                                 machine->ram_slots, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    object_property_set_int(OBJECT(dev), slot, PC_DIMM_SLOT_PROP, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    trace_mhp_pc_dimm_assigned_slot(slot);
+
+    if (kvm_enabled() && !kvm_has_free_slot(machine)) {
+        error_setg(&local_err, "hypervisor has no free memory slots left");
+        goto out;
+    }
+
+    memory_region_add_subregion(&ms->hotplug_memory,
+                                addr - ms->hotplug_memory_base, mr);
+    vmstate_register_ram(mr, dev);
+
+    spapr_add_lmbs(dev, addr, memory_region_size(mr), &local_err);
+    if (local_err) {
+        vmstate_unregister_ram(mr, dev);
+        memory_region_del_subregion(&ms->hotplug_memory, mr);
+    }
+
+out:
+    error_propagate(errp, local_err);
+}
+
 static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
@@ -2217,6 +2349,24 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
             return;
         }
         spapr_cpu_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        uint32_t node;
+
+        if (!spapr->dr_lmb_enabled) {
+            error_setg(errp, "Memory hotplug not supported for this machine");
+            return;
+        }
+        node = object_property_get_int(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
+        if (*errp) {
+            return;
+        }
+
+        if (node != 0) {
+            error_setg(errp, "Currently hot adding memory to only node 0"
+                        " is supported for sPAPR");
+            return;
+        }
+        spapr_memory_plug(hotplug_dev, dev, errp);
     }
 }
 
@@ -2229,6 +2379,8 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
             return;
         }
         spapr_cpu_socket_unplug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        error_setg(errp, "Memory hot unplug not supported by sPAPR");
     }
 }
 
@@ -2236,7 +2388,8 @@ static HotplugHandler *spapr_get_hotpug_handler(MachineState *machine,
                                              DeviceState *dev)
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
-        object_dynamic_cast(OBJECT(dev), TYPE_CPU_SOCKET)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_CPU_SOCKET) ||
+        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
         return HOTPLUG_HANDLER(machine);
     }
     return NULL;
diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 4ae818a..e2a22d2 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -431,6 +431,9 @@ static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
     case SPAPR_DR_CONNECTOR_TYPE_CPU:
         hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
         break;
+    case SPAPR_DR_CONNECTOR_TYPE_LMB:
+        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_MEMORY;
+        break;
     default:
         /* we shouldn't be signaling hotplug events for resources
          * that don't support them
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code Bharata B Rao
@ 2015-04-26 11:47   ` Bharata B Rao
  2015-04-27  5:36     ` Bharata B Rao
  2015-05-04 11:59   ` David Gibson
  1 sibling, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-26 11:47 UTC (permalink / raw)
  To: qemu-devel
  Cc: aik, mdroth, agraf, qemu-ppc, tyreld, nfont, imammedo, afaerber, david

On Fri, Apr 24, 2015 at 12:17:27PM +0530, Bharata B Rao wrote:
> Reorganize CPU device tree generation code so that it can be reused from
> hotplug path. CPU dt entries are now generated from spapr_finalize_fdt()
> instead of spapr_create_fdt_skel().

Creating CPU DT entries from spapr_finalize_fdt() instead of
spapr_create_fdt_skel() has an interesting side effect.

Before this patch, when I boot an SMP guest with the following configuration:

-smp 4 -numa node,cpus=0-1,mem=4G,nodeid=0 -numa node,cpus=2-3,mem=4G,nodeid=1

the guest CPUs come up in the following fashion:

[root@localhost ~]# cat /proc/cpuinfo
processor	: 0
cpu dt id	: 0

processor	: 1
cpu dt id	: 8

processor	: 2
cpu dt id	: 16

processor	: 3
cpu dt id	: 24

In the above /proc/cpuinfo output, only the relevant fields are retained
and the newly added field "cpu dt id" is essentially obtained by
arch/powerpc/include/asm/smp.h:get_hard_smp_processor_id() in the kernel.

[root@localhost ~]# lscpu
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          2
NUMA node0 CPU(s):     0,1
NUMA node1 CPU(s):     2,3

Here CPUs 0,1 are in node0 and 2,3 are in node1 as specified. The same is
reported by QEMU monitor below.

(qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 1 cpus: 2 3
node 1 size: 4096 MB

After this patch, where CPU DT entries are built completely in
spapr_finalize_fdt(), the CPU enumeration done by the guest kernel gets
reversed, since the CPU DT nodes end up being discovered by the guest
kernel in the reverse order in
arch/powerpc/kernel/setup-common.c:smp_setup_cpu_maps(). With this, the
resulting guest SMP configuration looks like this:

[root@localhost ~]# cat /proc/cpuinfo 
processor	: 0
cpu dt id	: 24  <--- was 0 previously

processor	: 1
cpu dt id	: 16  <--- was 8 previously

processor	: 2
cpu dt id	: 8   <--- was 16 previously

processor	: 3
cpu dt id	: 0   <--- was 24 previously

[root@localhost ~]# lscpu
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             4
NUMA node(s):          2
NUMA node0 CPU(s):     2,3 <--- node0 was supposed to have 0,1
NUMA node1 CPU(s):     0,1 <--- node1 was supposed to have 2,3

(qemu) info numa
2 nodes
node 0 cpus: 0 1
node 0 size: 4096 MB
node 1 cpus: 2 3
node 1 size: 4096 MB

This is not wrong per se, because the CPUs with the correct DT ids ended
up on the right NUMA nodes; it is just that the CPU numbers assigned by
the guest got reversed.

Is this acceptable, or will this break some userspace?

In both the cases, I am adding CPU DT nodes from QEMU in the same order,
but not sure why the guest kernel discovers them in different orders in
each case.

> +static void spapr_populate_cpus_dt_node(void *fdt, sPAPREnvironment *spapr)
> +{
> +    CPUState *cs;
> +    int cpus_offset;
> +    char *nodename;
> +    int smt = kvmppc_smt_threads();
> +
> +    cpus_offset = fdt_add_subnode(fdt, 0, "cpus");
> +    _FDT(cpus_offset);
> +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#address-cells", 0x1)));
> +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#size-cells", 0x0)));
> +
> +    CPU_FOREACH(cs) {
> +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> +        int index = ppc_get_vcpu_dt_id(cpu);
> +        DeviceClass *dc = DEVICE_GET_CLASS(cs);
> +        int offset;
> +
> +        if ((index % smt) != 0) {
> +            continue;
> +        }
> +
> +        nodename = g_strdup_printf("%s@%x", dc->fw_name, index);
> +        offset = fdt_add_subnode(fdt, cpus_offset, nodename);
> +        g_free(nodename);
> +        _FDT(offset);
> +        spapr_populate_cpu_dt(cs, fdt, offset);
> +    }

I can simply fix this by walking the CPUs in reverse order in the above
code, which makes the guest kernel discover the CPU DT nodes in the
right order.

s/CPU_FOREACH(cs)/CPU_FOREACH_REVERSE(cs) will solve this problem. Would this
be the right approach or should we just leave it to the guest kernel to
discover and enumerate CPUs in whatever order it finds the DT nodes in FDT ?

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code
  2015-04-26 11:47   ` Bharata B Rao
@ 2015-04-27  5:36     ` Bharata B Rao
  2015-05-04 12:01       ` David Gibson
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-04-27  5:36 UTC (permalink / raw)
  To: qemu-devel
  Cc: Nikunj A. Dadhania, aik, mdroth, agraf, qemu-ppc, tyreld, nfont,
	imammedo, afaerber, david

On Sun, Apr 26, 2015 at 05:17:48PM +0530, Bharata B Rao wrote:
> On Fri, Apr 24, 2015 at 12:17:27PM +0530, Bharata B Rao wrote:
> > Reorganize CPU device tree generation code so that it can be reused from
> > hotplug path. CPU dt entries are now generated from spapr_finalize_fdt()
> > instead of spapr_create_fdt_skel().
> 
> Creating CPU DT entries from spapr_finalize_fdt() instead of
> spapr_create_fdt_skel() has an interesting side effect.
> 
> <snip> 
> 
> In both the cases, I am adding CPU DT nodes from QEMU in the same order,
> but not sure why the guest kernel discovers them in different orders in
> each case.

Nikunj and I tracked this down to the difference in device tree APIs that
we are using in two cases.

When CPU DT nodes are created from spapr_create_fdt_skel(), we are using
fdt_begin_node() API which does sequential write and hence CPU DT nodes
end up in the same order in which they are created.

However in my patch when I create CPU DT entries in spapr_finalize_fdt(),
I am using fdt_add_subnode() which ends up writing the CPU DT node at the
same parent offset for all the CPUs. This results in CPU DT nodes being
generated in reverse order in FDT.
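
To illustrate (a simplified sketch, not the actual patch code; the node
names are made up):

    /* Sequential-write API, as used by spapr_create_fdt_skel():
     * children end up in creation order.
     */
    fdt_begin_node(fdt, "cpus");
    fdt_begin_node(fdt, "PowerPC,POWER8@0");
    fdt_end_node(fdt);
    fdt_begin_node(fdt, "PowerPC,POWER8@8");
    fdt_end_node(fdt);
    fdt_end_node(fdt);
    /* order in the FDT: @0, @8 */

    /* Read-write API, as used in spapr_finalize_fdt(): each
     * fdt_add_subnode() inserts the new node right after the parent's
     * properties, i.e. ahead of the previously added siblings.
     */
    cpus_offset = fdt_add_subnode(fdt, 0, "cpus");
    fdt_add_subnode(fdt, cpus_offset, "PowerPC,POWER8@0");
    fdt_add_subnode(fdt, cpus_offset, "PowerPC,POWER8@8");
    /* order in the FDT: @8, @0 */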

> 
> > +static void spapr_populate_cpus_dt_node(void *fdt, sPAPREnvironment *spapr)
> > +{
> > +    CPUState *cs;
> > +    int cpus_offset;
> > +    char *nodename;
> > +    int smt = kvmppc_smt_threads();
> > +
> > +    cpus_offset = fdt_add_subnode(fdt, 0, "cpus");
> > +    _FDT(cpus_offset);
> > +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#address-cells", 0x1)));
> > +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#size-cells", 0x0)));
> > +
> > +    CPU_FOREACH(cs) {
> > +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +        int index = ppc_get_vcpu_dt_id(cpu);
> > +        DeviceClass *dc = DEVICE_GET_CLASS(cs);
> > +        int offset;
> > +
> > +        if ((index % smt) != 0) {
> > +            continue;
> > +        }
> > +
> > +        nodename = g_strdup_printf("%s@%x", dc->fw_name, index);
> > +        offset = fdt_add_subnode(fdt, cpus_offset, nodename);
> > +        g_free(nodename);
> > +        _FDT(offset);
> > +        spapr_populate_cpu_dt(cs, fdt, offset);
> > +    }
> 
> I can simply fix this by walking the CPUs in reverse order in the above
> code, which makes the guest kernel discover the CPU DT nodes in the
> right order.
> 
> s/CPU_FOREACH(cs)/CPU_FOREACH_REVERSE(cs) will solve this problem. Would this
> be the right approach or should we just leave it to the guest kernel to
> discover and enumerate CPUs in whatever order it finds the DT nodes in FDT ?

So using CPU_FOREACH_REVERSE(cs) appears to be the right way to handle this.
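
i.e. the loop in the hunk quoted above would simply become (sketch):

    CPU_FOREACH_REVERSE(cs) {
        /* body unchanged from the hunk above */
    }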

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 02/24] spapr: Add DRC dt entries for CPUs
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 02/24] spapr: Add DRC dt entries for CPUs Bharata B Rao
@ 2015-05-04 11:46   ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-04 11:46 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 2924 bytes --]

On Fri, Apr 24, 2015 at 12:17:24PM +0530, Bharata B Rao wrote:
> Advertise CPU DR-capability to the guest via device tree.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>                [spapr_drc_reset implementation]
> ---
>  hw/ppc/spapr.c | 29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 981814d..9ea3a38 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -807,6 +807,15 @@ static void spapr_finalize_fdt(sPAPREnvironment *spapr,
>          spapr_populate_chosen_stdout(fdt, spapr->vio_bus);
>      }
>  
> +    if (spapr->dr_cpu_enabled) {
> +        int offset = fdt_path_offset(fdt, "/cpus");
> +        ret = spapr_drc_populate_dt(fdt, offset, NULL,
> +                                    SPAPR_DR_CONNECTOR_TYPE_CPU);
> +        if (ret < 0) {
> +            fprintf(stderr, "Couldn't set up CPU DR device tree properties\n");
> +        }
> +    }
> +
>      _FDT((fdt_pack(fdt)));
>  
>      if (fdt_totalsize(fdt) > FDT_MAX_SIZE) {
> @@ -1393,6 +1402,16 @@ static SaveVMHandlers savevm_htab_handlers = {
>      .load_state = htab_load,
>  };
>  
> +static void spapr_drc_reset(void *opaque)
> +{
> +    sPAPRDRConnector *drc = opaque;
> +    DeviceState *d = DEVICE(drc);
> +
> +    if (d) {
> +        device_reset(d);
> +    }
> +}

Why do these need an explicit reset, rather than having their reset
hook automatically called by the qdev infrastructure?

I'm guessing it's something to do with how these are linked into the
qdev tree, but it could do with a comment clarifying this.

>  /* pSeries LPAR / sPAPR hardware init */
>  static void ppc_spapr_init(MachineState *machine)
>  {
> @@ -1418,6 +1437,7 @@ static void ppc_spapr_init(MachineState *machine)
>      long load_limit, fw_size;
>      bool kernel_le = false;
>      char *filename;
> +    int smt = kvmppc_smt_threads();
>  
>      msi_supported = true;
>  
> @@ -1482,6 +1502,15 @@ static void ppc_spapr_init(MachineState *machine)
>      spapr->dr_cpu_enabled = smc->dr_cpu_enabled;
>      spapr->dr_lmb_enabled = smc->dr_lmb_enabled;
>  
> +    if (spapr->dr_cpu_enabled) {
> +        for (i = 0; i < max_cpus/smp_threads; i++) {
> +            sPAPRDRConnector *drc =
> +                spapr_dr_connector_new(OBJECT(machine),
> +                                       SPAPR_DR_CONNECTOR_TYPE_CPU, i * smt);
> +            qemu_register_reset(spapr_drc_reset, drc);
> +        }
> +    }
> +
>      /* init CPUs */
>      if (cpu_model == NULL) {
>          cpu_model = kvm_enabled() ? "host" : "POWER7";

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code Bharata B Rao
  2015-04-26 11:47   ` Bharata B Rao
@ 2015-05-04 11:59   ` David Gibson
  1 sibling, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-04 11:59 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 1117 bytes --]

On Fri, Apr 24, 2015 at 12:17:27PM +0530, Bharata B Rao wrote:
> Reorganize CPU device tree generation code so that it can be reused from
> hotplug path. CPU dt entries are now generated from spapr_finalize_fdt()
> instead of spapr_create_fdt_skel().
> 
> Note: This is how the split-up looks like now:
> 
> Boot path
> ---------
> spapr_finalize_fdt
>  spapr_populate_cpus_dt_node
>   spapr_populate_cpu_dt
>    spapr_fixup_cpu_numa_dt
>    spapr_fixup_cpu_smt_dt
> 
> Hotplug path
> ------------
> spapr_cpu_plug
>  spapr_populate_hotplug_cpu_dt
>   spapr_populate_cpu_dt
>    spapr_fixup_cpu_numa_dt
>    spapr_fixup_cpu_smt_dt
> 
> ibm,cas path
> ------------
> spapr_h_cas_compose_response
>  spapr_fixup_cpu_dt
>   spapr_fixup_cpu_numa_dt
>   spapr_fixup_cpu_smt_dt
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code
  2015-04-27  5:36     ` Bharata B Rao
@ 2015-05-04 12:01       ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-04 12:01 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, Nikunj A. Dadhania, aik, agraf, qemu-devel, qemu-ppc,
	tyreld, nfont, imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 3224 bytes --]

On Mon, Apr 27, 2015 at 11:06:07AM +0530, Bharata B Rao wrote:
> On Sun, Apr 26, 2015 at 05:17:48PM +0530, Bharata B Rao wrote:
> > On Fri, Apr 24, 2015 at 12:17:27PM +0530, Bharata B Rao wrote:
> > > Reorganize CPU device tree generation code so that it can be reused from
> > > hotplug path. CPU dt entries are now generated from spapr_finalize_fdt()
> > > instead of spapr_create_fdt_skel().
> > 
> > Creating CPU DT entries from spapr_finalize_fdt() instead of
> > spapr_create_fdt_skel() has an interesting side effect.
> > 
> > <snip> 
> > 
> > In both the cases, I am adding CPU DT nodes from QEMU in the same order,
> > but not sure why the guest kernel discovers them in different orders in
> > each case.
> 
> Nikunj and I tracked this down to the difference in device tree APIs that
> we are using in two cases.
> 
> When CPU DT nodes are created from spapr_create_fdt_skel(), we are using
> fdt_begin_node() API which does sequential write and hence CPU DT nodes
> end up in the same order in which they are created.
> 
> However in my patch when I create CPU DT entries in spapr_finalize_fdt(),
> I am using fdt_add_subnode() which ends up writing the CPU DT node at the
> same parent offset for all the CPUs. This results in CPU DT nodes being
> generated in reverse order in FDT.
> 
> > 
> > > +static void spapr_populate_cpus_dt_node(void *fdt, sPAPREnvironment *spapr)
> > > +{
> > > +    CPUState *cs;
> > > +    int cpus_offset;
> > > +    char *nodename;
> > > +    int smt = kvmppc_smt_threads();
> > > +
> > > +    cpus_offset = fdt_add_subnode(fdt, 0, "cpus");
> > > +    _FDT(cpus_offset);
> > > +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#address-cells", 0x1)));
> > > +    _FDT((fdt_setprop_cell(fdt, cpus_offset, "#size-cells", 0x0)));
> > > +
> > > +    CPU_FOREACH(cs) {
> > > +        PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > +        int index = ppc_get_vcpu_dt_id(cpu);
> > > +        DeviceClass *dc = DEVICE_GET_CLASS(cs);
> > > +        int offset;
> > > +
> > > +        if ((index % smt) != 0) {
> > > +            continue;
> > > +        }
> > > +
> > > +        nodename = g_strdup_printf("%s@%x", dc->fw_name, index);
> > > +        offset = fdt_add_subnode(fdt, cpus_offset, nodename);
> > > +        g_free(nodename);
> > > +        _FDT(offset);
> > > +        spapr_populate_cpu_dt(cs, fdt, offset);
> > > +    }
> > 
> > I can simply fix this by walking the CPUs in reverse order in the above
> > code, which makes the guest kernel discover the CPU DT nodes in the
> > right order.
> > 
> > s/CPU_FOREACH(cs)/CPU_FOREACH_REVERSE(cs) will solve this problem. Would this
> > be the right approach or should we just leave it to the guest kernel to
> > discover and enumerate CPUs in whatever order it finds the DT nodes in FDT ?
> 
> So using CPU_FOREACH_REVERSE(cs) appears to be the right way to handle this.

Yes, I think so.  In theory it shouldn't matter, but I think it's
safer to retain the device tree order.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction Bharata B Rao
@ 2015-05-04 15:15   ` Thomas Huth
  2015-05-06  4:40     ` Bharata B Rao
  2015-05-05  6:46   ` David Gibson
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Huth @ 2015-05-04 15:15 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Fri, 24 Apr 2015 12:17:30 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Andreas Färber <afaerber@suse.de>

Not sure if QEMU is fully following the kernel Sob-process, but if
that's the case, shouldn't your Sob be below Andreas' one, since you're
the last person who touched the patch (and I think that's the case here,
given that you sent it out)?
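
i.e., if the kernel convention applies here, the tags would read:

    Signed-off-by: Andreas Färber <afaerber@suse.de>
    Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>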

Also a short patch description would be really nice.

 Thomas

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support Bharata B Rao
@ 2015-05-04 15:53   ` Thomas Huth
  2015-05-06  5:37     ` Bharata B Rao
  2015-05-05  6:59   ` David Gibson
  1 sibling, 1 reply; 74+ messages in thread
From: Thomas Huth @ 2015-05-04 15:53 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Fri, 24 Apr 2015 12:17:34 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> Support CPU hotplug via device-add command. Set up device tree
> entries for the hotplugged CPU core and use the existing EPOW event
> infrastructure to send CPU hotplug notification to the guest.
> 
> Also support cold plugged CPUs that are specified by -device option
> on cmdline.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c        | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_events.c |   8 ++--
>  hw/ppc/spapr_rtas.c   |  11 +++++
>  3 files changed, 145 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b526b7d..9b0701c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
[...]
> +static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                            Error **errp)
> +{
> +    CPUState *cs = CPU(dev);
> +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> +    int id = ppc_get_vcpu_dt_id(cpu);
> +    sPAPRDRConnector *drc =
> +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> +    sPAPRDRConnectorClass *drck;
> +    int smt = kvmppc_smt_threads();
> +    Error *local_err = NULL;
> +    void *fdt = NULL;
> +    int i, fdt_offset = 0;
> +
> +    /* Set NUMA node for the added CPUs  */
> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> +            cs->numa_node = i;
> +            break;
> +        }
> +    }
> +
> +    /*
> +     * SMT threads return from here, only main thread (core) will
> +     * continue and signal hotplug event to the guest.
> +     */
> +    if ((id % smt) != 0) {
> +        return;
> +    }
> +
> +    if (!spapr->dr_cpu_enabled) {
> +        /*
> +         * This is a cold plugged CPU but the machine doesn't support
> +         * DR. So skip the hotplug path ensuring that the CPU is brought
> +         * up online with out an associated DR connector.
> +         */
> +        return;
> +    }
> +
> +    g_assert(drc);
> +
> +    /*
> +     * Setup CPU DT entries only for hotplugged CPUs. For boot time or
> +     * coldplugged CPUs DT entries are setup in spapr_finalize_fdt().
> +     */
> +    if (dev->hotplugged) {
> +        fdt = spapr_populate_hotplug_cpu_dt(dev, cs, &fdt_offset);
> +    }
> +
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, &local_err);
> +    if (local_err) {
> +        g_free(fdt);
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /*
> +     * We send hotplug notification interrupt to the guest only in case
> +     * of hotplugged CPUs.
> +     */
> +    if (dev->hotplugged) {
> +        spapr_hotplug_req_add_event(drc);
> +    } else {
> +        /*
> +         * HACK to support removal of hotplugged CPU after VM migration:
> +         *
> +         * Since we want to be able to hot-remove those coldplugged CPUs
> +         * started at boot time using -device option at the target VM, we set
> +         * the right allocation_state and isolation_state for them, which for
> +         * the hotplugged CPUs would be set via RTAS calls done from the
> +         * guest during hotplug.
> +         *
> +         * This allows the coldplugged CPUs started using -device option to
> +         * have the right isolation and allocation states as expected by the
> +         * CPU hot removal code.
> +         *
> +         * This hack will be removed once we have DRC states migrated as part
> +         * of VM migration.
> +         */
> +        drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> +        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> +    }
> +
> +    return;

Cosmetic nit: Superfluous return statement

> +}
> +
[...]
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 57ec97a..48aeb86 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -121,6 +121,16 @@ static void rtas_query_cpu_stopped_state(PowerPCCPU *cpu_,
>      rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>  }
>  
> +static void spapr_cpu_set_endianness(PowerPCCPU *cpu)
> +{
> +    PowerPCCPU *fcpu = POWERPC_CPU(first_cpu);
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(fcpu);
> +
> +    if (!(*pcc->interrupts_big_endian)(fcpu)) {

Function pointers are sometimes still confusing to me, but can't you
simplify that to:

    if (!pcc->interrupts_big_endian(fcpu)) {

?

> +        cpu->env.spr[SPR_LPCR] |= LPCR_ILE;
> +    }
> +}

 Thomas

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine Bharata B Rao
@ 2015-05-04 16:10   ` Thomas Huth
  2015-05-06  4:28     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Huth @ 2015-05-04 16:10 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Fri, 24 Apr 2015 12:17:28 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> Factor out bits of sPAPR specific CPU initialization code into
> a separate routine so that it can be called from CPU hotplug
> path too.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> ---
>  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++-------------------------
>  1 file changed, 29 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index a56f9a1..5c8f2ff 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1440,6 +1440,34 @@ static void spapr_drc_reset(void *opaque)
>      }
>  }
>  
> +static void spapr_cpu_init(PowerPCCPU *cpu)
> +{
> +    CPUPPCState *env = &cpu->env;
> +
> +    /* Set time-base frequency to 512 MHz */
> +    cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> +
> +    /* PAPR always has exception vectors in RAM not ROM. To ensure this,
> +     * MSR[IP] should never be set.
> +     */
> +    env->msr_mask &= ~(1 << 6);

While you're at it ... could we maybe get a proper #define for that MSR
bit? (just like the other ones in target-ppc/cpu.h)
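
Something like this, perhaps (untested sketch; I'm assuming the MSR_EP
define in target-ppc/cpu.h is the right name for bit 6 here, otherwise
a new define would be needed):

    env->msr_mask &= ~(1 << MSR_EP);    /* MSR[IP]: exception prefix */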

> +    /* Tell KVM that we're in PAPR mode */
> +    if (kvm_enabled()) {
> +        kvmppc_set_papr(cpu);
> +    }
> +
> +    if (cpu->max_compat) {
> +        if (ppc_set_compat(cpu, cpu->max_compat) < 0) {
> +            exit(1);
> +        }
> +    }
> +
> +    xics_cpu_setup(spapr->icp, cpu);
> +
> +    qemu_register_reset(spapr_cpu_reset, cpu);
> +}

 Thomas

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 07/24] cpu: Prepare Socket container type
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 07/24] cpu: Prepare Socket container type Bharata B Rao
@ 2015-05-05  1:47   ` David Gibson
  2015-05-06  4:36     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  1:47 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]

On Fri, Apr 24, 2015 at 12:17:29PM +0530, Bharata B Rao wrote:
> From: Andreas Färber <afaerber@suse.de>
> 
> Signed-off-by: Andreas Färber <afaerber@suse.de>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

So, how to organize this generically is still under discussion.  For
now, I don't think this generic outline is really worth it.  In any
case I can't really take it through my tree.

What I'd suggest instead is just implementing the POWER core device in
the ppc specific code.  As the generic socket vs. core vs. whatever
stuff clarifies, that POWER core device might become a "virtual
socket" or CM or whatever, but I think we'll be able to keep the
external interface compatible with the right use of aliases.

In the meantime it should at least give us a draft we can experiment
with on Power without requiring new generic infrastructure.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction Bharata B Rao
  2015-05-04 15:15   ` Thomas Huth
@ 2015-05-05  6:46   ` David Gibson
  1 sibling, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-05  6:46 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 6834 bytes --]

On Fri, Apr 24, 2015 at 12:17:30PM +0530, Bharata B Rao wrote:

As Thomas says, this really needs a commit message.

I also think building this infrastructure is a bit premature while the
discussion is still ongoing about how to do this generically.

What I'd suggest is to just have the minimal set you need, which can be
reworked into the new generic scheme once it solidifies.

So, I'd suggest just implementing a specific POWER8 core device, which
instantiates up to 8 POWER8 vcpu threads.  We know we'll need some
kind of handle for that, regardless of where it fits in the eventual
overall scheme of sockets and cores and whatever.
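
Very roughly something like this (sketch only; the type name, header
choices and fixed SMT count are just placeholders, not a proposal for
the final QOM layout):

    #include "hw/qdev.h"
    #include "cpu.h"

    #define TYPE_POWER8_CPU_CORE "power8-cpu-core"  /* hypothetical name */

    typedef struct POWER8CPUCore {
        /*< private >*/
        DeviceState parent_obj;
        /*< public >*/
        PowerPCCPU threads[8];  /* up to 8 SMT threads; sizing illustrative */
    } POWER8CPUCore;

    static const TypeInfo power8_cpu_core_info = {
        .name          = TYPE_POWER8_CPU_CORE,
        .parent        = TYPE_DEVICE,
        .instance_size = sizeof(POWER8CPUCore),
    };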

> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Andreas Färber <afaerber@suse.de>
> ---
>  hw/ppc/Makefile.objs        |  1 +
>  hw/ppc/cpu-core.c           | 46 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/cpu-socket.c         | 47 +++++++++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/cpu-core.h   | 32 ++++++++++++++++++++++++++++++
>  include/hw/ppc/cpu-socket.h | 32 ++++++++++++++++++++++++++++++
>  5 files changed, 158 insertions(+)
>  create mode 100644 hw/ppc/cpu-core.c
>  create mode 100644 hw/ppc/cpu-socket.c
>  create mode 100644 include/hw/ppc/cpu-core.h
>  create mode 100644 include/hw/ppc/cpu-socket.h
> 
> diff --git a/hw/ppc/Makefile.objs b/hw/ppc/Makefile.objs
> index c8ab06e..a35cac5 100644
> --- a/hw/ppc/Makefile.objs
> +++ b/hw/ppc/Makefile.objs
> @@ -1,5 +1,6 @@
>  # shared objects
>  obj-y += ppc.o ppc_booke.o
> +obj-y += cpu-socket.o cpu-core.o
>  # IBM pSeries (sPAPR)
>  obj-$(CONFIG_PSERIES) += spapr.o spapr_vio.o spapr_events.o
>  obj-$(CONFIG_PSERIES) += spapr_hcall.o spapr_iommu.o spapr_rtas.o
> diff --git a/hw/ppc/cpu-core.c b/hw/ppc/cpu-core.c
> new file mode 100644
> index 0000000..ed0481f
> --- /dev/null
> +++ b/hw/ppc/cpu-core.c
> @@ -0,0 +1,46 @@
> +/*
> + * ppc CPU core abstraction
> + *
> + * Copyright (c) 2015 SUSE Linux GmbH
> + * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
> + */
> +
> +#include "hw/qdev.h"
> +#include "hw/ppc/cpu-core.h"
> +
> +static int ppc_cpu_core_realize_child(Object *child, void *opaque)
> +{
> +    Error **errp = opaque;
> +
> +    object_property_set_bool(child, true, "realized", errp);
> +    if (*errp) {
> +        return 1;
> +    }
> +
> +    return 0;
> +}
> +
> +static void ppc_cpu_core_realize(DeviceState *dev, Error **errp)
> +{
> +    object_child_foreach(OBJECT(dev), ppc_cpu_core_realize_child, errp);
> +}
> +
> +static void ppc_cpu_core_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +
> +    dc->realize = ppc_cpu_core_realize;
> +}
> +
> +static const TypeInfo ppc_cpu_core_type_info = {
> +    .name = TYPE_POWERPC_CPU_CORE,
> +    .parent = TYPE_DEVICE,
> +    .class_init = ppc_cpu_core_class_init,
> +};
> +
> +static void ppc_cpu_core_register_types(void)
> +{
> +    type_register_static(&ppc_cpu_core_type_info);
> +}
> +
> +type_init(ppc_cpu_core_register_types)
> diff --git a/hw/ppc/cpu-socket.c b/hw/ppc/cpu-socket.c
> new file mode 100644
> index 0000000..602a060
> --- /dev/null
> +++ b/hw/ppc/cpu-socket.c
> @@ -0,0 +1,47 @@
> +/*
> + * PPC CPU socket abstraction
> + *
> + * Copyright (c) 2015 SUSE Linux GmbH
> + * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
> + */
> +
> +#include "hw/qdev.h"
> +#include "hw/ppc/cpu-socket.h"
> +#include "sysemu/cpus.h"
> +
> +static int ppc_cpu_socket_realize_child(Object *child, void *opaque)
> +{
> +    Error **errp = opaque;
> +
> +    object_property_set_bool(child, true, "realized", errp);
> +    if (*errp) {
> +        return 1;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static void ppc_cpu_socket_realize(DeviceState *dev, Error **errp)
> +{
> +    object_child_foreach(OBJECT(dev), ppc_cpu_socket_realize_child, errp);
> +}
> +
> +static void ppc_cpu_socket_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +
> +    dc->realize = ppc_cpu_socket_realize;
> +}
> +
> +static const TypeInfo ppc_cpu_socket_type_info = {
> +    .name = TYPE_POWERPC_CPU_SOCKET,
> +    .parent = TYPE_CPU_SOCKET,
> +    .class_init = ppc_cpu_socket_class_init,
> +};
> +
> +static void ppc_cpu_socket_register_types(void)
> +{
> +    type_register_static(&ppc_cpu_socket_type_info);
> +}
> +
> +type_init(ppc_cpu_socket_register_types)
> diff --git a/include/hw/ppc/cpu-core.h b/include/hw/ppc/cpu-core.h
> new file mode 100644
> index 0000000..95f1c28
> --- /dev/null
> +++ b/include/hw/ppc/cpu-core.h
> @@ -0,0 +1,32 @@
> +/*
> + * PowerPC CPU core abstraction
> + *
> + * Copyright (c) 2015 SUSE Linux GmbH
> + * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
> + */
> +#ifndef HW_PPC_CPU_CORE_H
> +#define HW_PPC_CPU_CORE_H
> +
> +#include "hw/qdev.h"
> +#include "cpu.h"
> +
> +#ifdef TARGET_PPC64
> +#define TYPE_POWERPC_CPU_CORE "powerpc64-cpu-core"
> +#elif defined(TARGET_PPCEMB)
> +#define TYPE_POWERPC_CPU_CORE "embedded-powerpc-cpu-core"
> +#else
> +#define TYPE_POWERPC_CPU_CORE "powerpc-cpu-core"
> +#endif
> +
> +#define POWERPC_CPU_CORE(obj) \
> +    OBJECT_CHECK(PowerPCCPUCore, (obj), TYPE_POWERPC_CPU_CORE)
> +
> +typedef struct PowerPCCPUCore {
> +    /*< private >*/
> +    DeviceState parent_obj;
> +    /*< public >*/
> +
> +    PowerPCCPU thread[0];
> +} PowerPCCPUCore;
> +
> +#endif
> diff --git a/include/hw/ppc/cpu-socket.h b/include/hw/ppc/cpu-socket.h
> new file mode 100644
> index 0000000..5ae19d0
> --- /dev/null
> +++ b/include/hw/ppc/cpu-socket.h
> @@ -0,0 +1,32 @@
> +/*
> + * PowerPC CPU socket abstraction
> + *
> + * Copyright (c) 2015 SUSE Linux GmbH
> + * Copyright (C) 2015 Bharata B Rao <bharata@linux.vnet.ibm.com>
> + */
> +#ifndef HW_PPC_CPU_SOCKET_H
> +#define HW_PPC_CPU_SOCKET_H
> +
> +#include "hw/cpu/socket.h"
> +#include "cpu-core.h"
> +
> +#ifdef TARGET_PPC64
> +#define TYPE_POWERPC_CPU_SOCKET "powerpc64-cpu-socket"
> +#elif defined(TARGET_PPCEMB)
> +#define TYPE_POWERPC_CPU_SOCKET "embedded-powerpc-cpu-socket"
> +#else
> +#define TYPE_POWERPC_CPU_SOCKET "powerpc-cpu-socket"
> +#endif
> +
> +#define POWERPC_CPU_SOCKET(obj) \
> +    OBJECT_CHECK(PowerPCCPUSocket, (obj), TYPE_POWERPC_CPU_SOCKET)
> +
> +typedef struct PowerPCCPUSocket {
> +    /*< private >*/
> +    DeviceState parent_obj;
> +    /*< public >*/
> +
> +    PowerPCCPUCore core[0];
> +} PowerPCCPUSocket;
> +
> +#endif

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 10/24] ppc: Update cpu_model in MachineState
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 10/24] ppc: Update cpu_model in MachineState Bharata B Rao
@ 2015-05-05  6:49   ` David Gibson
  2015-05-06  4:49     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  6:49 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 7759 bytes --]

On Fri, Apr 24, 2015 at 12:17:32PM +0530, Bharata B Rao wrote:
> Keep cpu_model field in MachineState uptodate so that it can be used
> from the CPU hotplug path.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

As before, this looks fine to me, but I'm not sure which tree it
should go through.

Alex, do you want to take it directly, or send an Acked-by and I'll
take it through spapr-next?

> ---
>  hw/ppc/mac_newworld.c  | 10 +++++-----
>  hw/ppc/mac_oldworld.c  |  7 +++----
>  hw/ppc/ppc440_bamboo.c |  7 +++----
>  hw/ppc/prep.c          |  7 +++----
>  hw/ppc/spapr.c         |  7 +++----
>  hw/ppc/virtex_ml507.c  |  7 +++----
>  6 files changed, 20 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c
> index 624b4ab..fe18bce 100644
> --- a/hw/ppc/mac_newworld.c
> +++ b/hw/ppc/mac_newworld.c
> @@ -145,7 +145,6 @@ static void ppc_core99_reset(void *opaque)
>  static void ppc_core99_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *cpu_model = machine->cpu_model;
>      const char *kernel_filename = machine->kernel_filename;
>      const char *kernel_cmdline = machine->kernel_cmdline;
>      const char *initrd_filename = machine->initrd_filename;
> @@ -182,14 +181,15 @@ static void ppc_core99_init(MachineState *machine)
>      linux_boot = (kernel_filename != NULL);
>  
>      /* init CPUs */
> -    if (cpu_model == NULL)
> +    if (machine->cpu_model == NULL) {
>  #ifdef TARGET_PPC64
> -        cpu_model = "970fx";
> +        machine->cpu_model = "970fx";
>  #else
> -        cpu_model = "G4";
> +        machine->cpu_model = "G4";
>  #endif
> +    }
>      for (i = 0; i < smp_cpus; i++) {
> -        cpu = cpu_ppc_init(cpu_model);
> +        cpu = cpu_ppc_init(machine->cpu_model);
>          if (cpu == NULL) {
>              fprintf(stderr, "Unable to find PowerPC CPU definition\n");
>              exit(1);
> diff --git a/hw/ppc/mac_oldworld.c b/hw/ppc/mac_oldworld.c
> index 3079510..2732319 100644
> --- a/hw/ppc/mac_oldworld.c
> +++ b/hw/ppc/mac_oldworld.c
> @@ -75,7 +75,6 @@ static void ppc_heathrow_reset(void *opaque)
>  static void ppc_heathrow_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *cpu_model = machine->cpu_model;
>      const char *kernel_filename = machine->kernel_filename;
>      const char *kernel_cmdline = machine->kernel_cmdline;
>      const char *initrd_filename = machine->initrd_filename;
> @@ -107,10 +106,10 @@ static void ppc_heathrow_init(MachineState *machine)
>      linux_boot = (kernel_filename != NULL);
>  
>      /* init CPUs */
> -    if (cpu_model == NULL)
> -        cpu_model = "G3";
> +    if (machine->cpu_model == NULL)
> +        machine->cpu_model = "G3";
>      for (i = 0; i < smp_cpus; i++) {
> -        cpu = cpu_ppc_init(cpu_model);
> +        cpu = cpu_ppc_init(machine->cpu_model);
>          if (cpu == NULL) {
>              fprintf(stderr, "Unable to find PowerPC CPU definition\n");
>              exit(1);
> diff --git a/hw/ppc/ppc440_bamboo.c b/hw/ppc/ppc440_bamboo.c
> index 778970a..032fa80 100644
> --- a/hw/ppc/ppc440_bamboo.c
> +++ b/hw/ppc/ppc440_bamboo.c
> @@ -159,7 +159,6 @@ static void main_cpu_reset(void *opaque)
>  static void bamboo_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *cpu_model = machine->cpu_model;
>      const char *kernel_filename = machine->kernel_filename;
>      const char *kernel_cmdline = machine->kernel_cmdline;
>      const char *initrd_filename = machine->initrd_filename;
> @@ -184,10 +183,10 @@ static void bamboo_init(MachineState *machine)
>      int i;
>  
>      /* Setup CPU. */
> -    if (cpu_model == NULL) {
> -        cpu_model = "440EP";
> +    if (machine->cpu_model == NULL) {
> +        machine->cpu_model = "440EP";
>      }
> -    cpu = cpu_ppc_init(cpu_model);
> +    cpu = cpu_ppc_init(machine->cpu_model);
>      if (cpu == NULL) {
>          fprintf(stderr, "Unable to initialize CPU!\n");
>          exit(1);
> diff --git a/hw/ppc/prep.c b/hw/ppc/prep.c
> index 15df7f3..55e9643 100644
> --- a/hw/ppc/prep.c
> +++ b/hw/ppc/prep.c
> @@ -364,7 +364,6 @@ static PortioList prep_port_list;
>  static void ppc_prep_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *cpu_model = machine->cpu_model;
>      const char *kernel_filename = machine->kernel_filename;
>      const char *kernel_cmdline = machine->kernel_cmdline;
>      const char *initrd_filename = machine->initrd_filename;
> @@ -396,10 +395,10 @@ static void ppc_prep_init(MachineState *machine)
>      linux_boot = (kernel_filename != NULL);
>  
>      /* init CPUs */
> -    if (cpu_model == NULL)
> -        cpu_model = "602";
> +    if (machine->cpu_model == NULL)
> +        machine->cpu_model = "602";
>      for (i = 0; i < smp_cpus; i++) {
> -        cpu = cpu_ppc_init(cpu_model);
> +        cpu = cpu_ppc_init(machine->cpu_model);
>          if (cpu == NULL) {
>              fprintf(stderr, "Unable to find PowerPC CPU definition\n");
>              exit(1);
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f2c4fbd..8cc55fe 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1473,7 +1473,6 @@ static void ppc_spapr_init(MachineState *machine)
>  {
>      sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *cpu_model = machine->cpu_model;
>      const char *kernel_filename = machine->kernel_filename;
>      const char *kernel_cmdline = machine->kernel_cmdline;
>      const char *initrd_filename = machine->initrd_filename;
> @@ -1567,11 +1566,11 @@ static void ppc_spapr_init(MachineState *machine)
>      }
>  
>      /* init CPUs */
> -    if (cpu_model == NULL) {
> -        cpu_model = kvm_enabled() ? "host" : "POWER7";
> +    if (machine->cpu_model == NULL) {
> +        machine->cpu_model = kvm_enabled() ? "host" : "POWER7";
>      }
>      for (i = 0; i < smp_cpus; i++) {
> -        cpu = cpu_ppc_init(cpu_model);
> +        cpu = cpu_ppc_init(machine->cpu_model);
>          if (cpu == NULL) {
>              fprintf(stderr, "Unable to find PowerPC CPU definition\n");
>              exit(1);
> diff --git a/hw/ppc/virtex_ml507.c b/hw/ppc/virtex_ml507.c
> index 6ebd5be..f33d398 100644
> --- a/hw/ppc/virtex_ml507.c
> +++ b/hw/ppc/virtex_ml507.c
> @@ -197,7 +197,6 @@ static int xilinx_load_device_tree(hwaddr addr,
>  static void virtex_init(MachineState *machine)
>  {
>      ram_addr_t ram_size = machine->ram_size;
> -    const char *cpu_model = machine->cpu_model;
>      const char *kernel_filename = machine->kernel_filename;
>      const char *kernel_cmdline = machine->kernel_cmdline;
>      hwaddr initrd_base = 0;
> @@ -214,11 +213,11 @@ static void virtex_init(MachineState *machine)
>      int i;
>  
>      /* init CPUs */
> -    if (cpu_model == NULL) {
> -        cpu_model = "440-Xilinx";
> +    if (machine->cpu_model == NULL) {
> +        machine->cpu_model = "440-Xilinx";
>      }
>  
> -    cpu = ppc440_init_xilinx(&ram_size, 1, cpu_model, 400000000);
> +    cpu = ppc440_init_xilinx(&ram_size, 1, machine->cpu_model, 400000000);
>      env = &cpu->env;
>      qemu_register_reset(main_cpu_reset, cpu);
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 11/24] ppc: Create sockets and cores for CPUs
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 11/24] ppc: Create sockets and cores for CPUs Bharata B Rao
@ 2015-05-05  6:52   ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-05  6:52 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 779 bytes --]

On Fri, Apr 24, 2015 at 12:17:33PM +0530, Bharata B Rao wrote:
> ppc machine init functions create individual CPU threads. Change this
> for sPAPR by switching to socket creation. CPUs are created recursively
> by socket and core instance init routines.
> 
> TODO: Switching to socket level CPU creation is done only for sPAPR
> target now.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Though it obviously may need rework depending on what other changes
happen with the core/socket structure.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support Bharata B Rao
  2015-05-04 15:53   ` Thomas Huth
@ 2015-05-05  6:59   ` David Gibson
  2015-05-06  6:14     ` Bharata B Rao
  1 sibling, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  6:59 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 8976 bytes --]

On Fri, Apr 24, 2015 at 12:17:34PM +0530, Bharata B Rao wrote:
> Support CPU hotplug via device-add command. Set up device tree
> entries for the hotplugged CPU core and use the existing EPOW event
> infrastructure to send CPU hotplug notification to the guest.
> 
> Also support cold plugged CPUs that are specified by -device option
> on cmdline.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c        | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_events.c |   8 ++--
>  hw/ppc/spapr_rtas.c   |  11 +++++
>  3 files changed, 145 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b526b7d..9b0701c 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -33,6 +33,7 @@
>  #include "sysemu/block-backend.h"
>  #include "sysemu/cpus.h"
>  #include "sysemu/kvm.h"
> +#include "sysemu/device_tree.h"
>  #include "kvm_ppc.h"
>  #include "mmu-hash64.h"
>  #include "qom/cpu.h"
> @@ -662,6 +663,17 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset)
>      unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
>      uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
>      uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> +    sPAPRDRConnector *drc;
> +    sPAPRDRConnectorClass *drck;
> +    int drc_index;
> +
> +    if (spapr->dr_cpu_enabled) {
> +        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index);
> +        g_assert(drc);
> +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +        drc_index = drck->get_index(drc);
> +        _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index)));
> +    }
>  
>      _FDT((fdt_setprop_cell(fdt, offset, "reg", index)));
>      _FDT((fdt_setprop_string(fdt, offset, "device_type", "cpu")));
> @@ -1850,6 +1862,114 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
>      }
>  }
>  
> +static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
> +                                            int *fdt_offset)
> +{
> +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> +    DeviceClass *dc = DEVICE_GET_CLASS(cs);
> +    int id = ppc_get_vcpu_dt_id(cpu);
> +    void *fdt;
> +    int offset, fdt_size;
> +    char *nodename;
> +
> +    fdt = create_device_tree(&fdt_size);
> +    nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
> +    offset = fdt_add_subnode(fdt, 0, nodename);
> +
> +    spapr_populate_cpu_dt(cs, fdt, offset);
> +    g_free(nodename);
> +
> +    *fdt_offset = offset;
> +    return fdt;
> +}
> +
> +static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                            Error **errp)
> +{
> +    CPUState *cs = CPU(dev);
> +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> +    int id = ppc_get_vcpu_dt_id(cpu);
> +    sPAPRDRConnector *drc =
> +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> +    sPAPRDRConnectorClass *drck;
> +    int smt = kvmppc_smt_threads();
> +    Error *local_err = NULL;
> +    void *fdt = NULL;
> +    int i, fdt_offset = 0;
> +
> +    /* Set NUMA node for the added CPUs  */
> +    for (i = 0; i < nb_numa_nodes; i++) {
> +        if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> +            cs->numa_node = i;
> +            break;
> +        }
> +    }
> +
> +    /*
> +     * SMT threads return from here, only main thread (core) will
> +     * continue and signal hotplug event to the guest.
> +     */
> +    if ((id % smt) != 0) {
> +        return;
> +    }

Couldn't you avoid this by attaching this call to the core device,
rather than the individual vcpu thread objects?
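
For instance (rough sketch, the handler name is hypothetical), the plug
handler could dispatch on the core device and never see the individual
threads:

    if (object_dynamic_cast(OBJECT(dev), TYPE_POWERPC_CPU_CORE)) {
        /* one DRC attach and one hotplug event per core, so the
         * per-thread "(id % smt) != 0" filter goes away */
        spapr_core_plug(hotplug_dev, dev, errp);
    }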


> +    if (!spapr->dr_cpu_enabled) {
> +        /*
> +         * This is a cold plugged CPU but the machine doesn't support
> +         * DR. So skip the hotplug path ensuring that the CPU is brought
> +         * up online with out an associated DR connector.
> +         */
> +        return;
> +    }
> +
> +    g_assert(drc);
> +
> +    /*
> +     * Setup CPU DT entries only for hotplugged CPUs. For boot time or
> +     * coldplugged CPUs DT entries are setup in spapr_finalize_fdt().
> +     */
> +    if (dev->hotplugged) {
> +        fdt = spapr_populate_hotplug_cpu_dt(dev, cs, &fdt_offset);
> +    }
> +
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, &local_err);
> +    if (local_err) {
> +        g_free(fdt);
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    /*
> +     * We send hotplug notification interrupt to the guest only in case
> +     * of hotplugged CPUs.
> +     */
> +    if (dev->hotplugged) {
> +        spapr_hotplug_req_add_event(drc);
> +    } else {
> +        /*
> +         * HACK to support removal of hotplugged CPU after VM migration:
> +         *
> +         * Since we want to be able to hot-remove those coldplugged CPUs
> +         * started at boot time using -device option at the target VM, we set
> +         * the right allocation_state and isolation_state for them, which for
> +         * the hotplugged CPUs would be set via RTAS calls done from the
> +         * guest during hotplug.
> +         *
> +         * This allows the coldplugged CPUs started using -device option to
> +         * have the right isolation and allocation states as expected by the
> +         * CPU hot removal code.
> +         *
> +         * This hack will be removed once we have DRC states migrated as part
> +         * of VM migration.
> +         */
> +        drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> +        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> +    }
> +
> +    return;
> +}
> +
>  static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>                                        DeviceState *dev, Error **errp)
>  {
> @@ -1858,6 +1978,15 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>          PowerPCCPU *cpu = POWERPC_CPU(cs);
>  
>          spapr_cpu_init(cpu);
> +        spapr_cpu_reset(cpu);

I'm a little surprised these get called here, rather than in the
creation / realize path of the core qdev.

> +        /*
> +         * Fail hotplug on machines where CPU DR isn't enabled.
> +         */
> +        if (!spapr->dr_cpu_enabled && dev->hotplugged) {
> +            return;
> +        }
> +        spapr_cpu_plug(hotplug_dev, dev, errp);
>      }
>  }
>  
> diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
> index be82815..4ae818a 100644
> --- a/hw/ppc/spapr_events.c
> +++ b/hw/ppc/spapr_events.c
> @@ -421,14 +421,16 @@ static void spapr_hotplug_req_event(sPAPRDRConnector *drc, uint8_t hp_action)
>      hp->hdr.section_length = cpu_to_be16(sizeof(*hp));
>      hp->hdr.section_version = 1; /* includes extended modifier */
>      hp->hotplug_action = hp_action;
> -
> +    hp->drc.index = cpu_to_be32(drck->get_index(drc));
> +    hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
>  
>      switch (drc_type) {
>      case SPAPR_DR_CONNECTOR_TYPE_PCI:
> -        hp->drc.index = cpu_to_be32(drck->get_index(drc));
> -        hp->hotplug_identifier = RTAS_LOG_V6_HP_ID_DRC_INDEX;
>          hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PCI;
>          break;
> +    case SPAPR_DR_CONNECTOR_TYPE_CPU:
> +        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
> +        break;
>      default:
>          /* we shouldn't be signaling hotplug events for resources
>           * that don't support them
> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> index 57ec97a..48aeb86 100644
> --- a/hw/ppc/spapr_rtas.c
> +++ b/hw/ppc/spapr_rtas.c
> @@ -121,6 +121,16 @@ static void rtas_query_cpu_stopped_state(PowerPCCPU *cpu_,
>      rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
>  }
>  
> +static void spapr_cpu_set_endianness(PowerPCCPU *cpu)
> +{
> +    PowerPCCPU *fcpu = POWERPC_CPU(first_cpu);
> +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(fcpu);
> +
> +    if (!(*pcc->interrupts_big_endian)(fcpu)) {
> +        cpu->env.spr[SPR_LPCR] |= LPCR_ILE;
> +    }
> +}
> +
>  static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPREnvironment *spapr,
>                             uint32_t token, uint32_t nargs,
>                             target_ulong args,
> @@ -157,6 +167,7 @@ static void rtas_start_cpu(PowerPCCPU *cpu_, sPAPREnvironment *spapr,
>          env->nip = start;
>          env->gpr[3] = r3;
>          cs->halted = 0;
> +        spapr_cpu_set_endianness(cpu);
>  
>          qemu_cpu_kick(cs);
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 13/24] cpus: Add Error argument to cpu_exec_init()
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 13/24] cpus: Add Error argument to cpu_exec_init() Bharata B Rao
@ 2015-05-05  7:01   ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:01 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 1091 bytes --]

On Fri, Apr 24, 2015 at 12:17:35PM +0530, Bharata B Rao wrote:
> Add an Error argument to cpu_exec_init() to let users collect the
> error. Change all callers to currently pass NULL error argument. This change
> is needed for the following reasons:
> 
> - A subsequent commit changes the CPU enumeration logic in cpu_exec_init()
>   resulting in cpu_exec_init() to fail if cpu_index values corresponding
>   to max_cpus have already been handed out.
> - There is a thinking that cpu_exec_init() should be called from realize
>   rather than instance_init. With this change, those architectures
>   that can move this call into realize function can do so in a phased
>   manner.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Again, looks good to me, but I'm not sure whose tree it needs to go
through.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 14/24] cpus: Convert cpu_index into a bitmap
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 14/24] cpus: Convert cpu_index into a bitmap Bharata B Rao
@ 2015-05-05  7:10   ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:10 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 17656 bytes --]

On Fri, Apr 24, 2015 at 12:17:36PM +0530, Bharata B Rao wrote:
> Currently CPUState.cpu_index is monotonically increasing and a newly
> created CPU always gets the next higher index. The next available
> index is calculated by counting the existing number of CPUs. This is
> fine as long as we only add CPUs, but there are architectures which
> are starting to support CPU removal too. For an architecture like PowerPC
> which derives its CPU identifier (device tree ID) from cpu_index, the
> existing logic of generating cpu_index values causes problems.
> 
> With the currently proposed method of handling vCPU removal by parking
> the vCPU fd in QEMU
> (Ref: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg02604.html),
> generating cpu_index this way will not work for PowerPC.
> 
> This patch changes the way cpu_index is handed out by maintaining
> a bit map of the CPUs that tracks both addition and removal of CPUs.
> 
> The CPU bitmap allocation logic is part of cpu_exec_init() which is
> called by instance_init routines of various CPU targets. This patch
> also adds corresponding instance_finalize routine if needed for these
> CPU targets so that CPU can be marked free when it is removed.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Looks good in concept, though there are a couple of implementation
nits noted below.

I think it might be worth posting this patch and the previous one
separately from your spapr hotplug series.  They're generic patches
which can't go through my tree, and they also look sound to me
regardless of how the remaining details of cpu hotplug work out.

> ---
>  exec.c                      | 37 ++++++++++++++++++++++++++++++++++---
>  include/qom/cpu.h           |  8 ++++++++
>  target-alpha/cpu.c          |  6 ++++++
>  target-arm/cpu.c            |  1 +
>  target-cris/cpu.c           |  6 ++++++
>  target-i386/cpu.c           |  6 ++++++
>  target-lm32/cpu.c           |  6 ++++++
>  target-m68k/cpu.c           |  6 ++++++
>  target-microblaze/cpu.c     |  6 ++++++
>  target-mips/cpu.c           |  6 ++++++
>  target-moxie/cpu.c          |  6 ++++++
>  target-openrisc/cpu.c       |  6 ++++++
>  target-ppc/translate_init.c |  6 ++++++
>  target-s390x/cpu.c          |  1 +
>  target-sh4/cpu.c            |  6 ++++++
>  target-sparc/cpu.c          |  1 +
>  target-tricore/cpu.c        |  5 +++++
>  target-unicore32/cpu.c      |  6 ++++++
>  target-xtensa/cpu.c         |  6 ++++++
>  19 files changed, 128 insertions(+), 3 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index e1ff6b0..9bbab02 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -527,21 +527,52 @@ void tcg_cpu_address_space_init(CPUState *cpu, AddressSpace *as)
>  }
>  #endif
>  
> +#ifndef CONFIG_USER_ONLY

Having different methods of handling the cpu index for USER_ONLY and
softmmu mode seems a bit ugly.  There's no need for the bitmap in user
only mode, but there's no harm to it either.
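
E.g. a rough, untested sketch of what I mean, using the bitmap allocator in
both build types (handwaving over the fact that max_cpus isn't visible in
user-only builds, so some other bound would be needed there):

    void cpu_exec_init(CPUArchState *env, Error **errp)
    {
        CPUState *cpu = ENV_GET_CPU(env);
        Error *local_err = NULL;

    #if defined(CONFIG_USER_ONLY)
        cpu_list_lock();
    #endif
        cpu->cpu_index = cpu_get_free_index(&local_err);
        if (local_err) {
            error_propagate(errp, local_err);
    #if defined(CONFIG_USER_ONLY)
            cpu_list_unlock();
    #endif
            return;
        }
        /* rest of the function as in your patch */
    }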

> +static DECLARE_BITMAP(cpu_index_map, MAX_CPUMASK_BITS);
> +
> +static int cpu_get_free_index(Error **errp)
> +{
> +    int cpu = find_first_zero_bit(cpu_index_map, max_cpus);
> +
> +    if (cpu == max_cpus) {

Might be safer to have cpu >= max_cpus here, just in case something
changes.
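
i.e. something like (untested):

    if (cpu >= max_cpus) {
        error_setg(errp, "Trying to use more CPUs than allowed max of %d",
                   max_cpus);
        return max_cpus;
    }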

> +        error_setg(errp, "Trying to use more CPUs than allowed max of %d\n",
> +                    max_cpus);
> +        return max_cpus;
> +    } else {
> +        bitmap_set(cpu_index_map, cpu, 1);
> +        return cpu;
> +    }
> +}
> +
> +void cpu_exec_exit(CPUState *cpu)
> +{
> +    bitmap_clear(cpu_index_map, cpu->cpu_index, 1);
> +}
> +#endif
> +
>  void cpu_exec_init(CPUArchState *env, Error **errp)
>  {
>      CPUState *cpu = ENV_GET_CPU(env);
>      CPUClass *cc = CPU_GET_CLASS(cpu);
> -    CPUState *some_cpu;
>      int cpu_index;
> -
>  #if defined(CONFIG_USER_ONLY)
> +    CPUState *some_cpu;
> +
>      cpu_list_lock();
> -#endif
>      cpu_index = 0;
>      CPU_FOREACH(some_cpu) {
>          cpu_index++;
>      }
>      cpu->cpu_index = cpu_index;
> +#else
> +    Error *local_err = NULL;
> +
> +    cpu_index = cpu->cpu_index = cpu_get_free_index(&local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +#endif
>      cpu->numa_node = 0;
>      QTAILQ_INIT(&cpu->breakpoints);
>      QTAILQ_INIT(&cpu->watchpoints);
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 48fd6fb..5241cf4 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -659,6 +659,14 @@ void cpu_watchpoint_remove_all(CPUState *cpu, int mask);
>  void QEMU_NORETURN cpu_abort(CPUState *cpu, const char *fmt, ...)
>      GCC_FMT_ATTR(2, 3);
>  
> +#ifndef CONFIG_USER_ONLY
> +void cpu_exec_exit(CPUState *cpu);
> +#else
> +static inline void cpu_exec_exit(CPUState *cpu)
> +{
> +}
> +#endif
> +
>  #ifdef CONFIG_SOFTMMU
>  extern const struct VMStateDescription vmstate_cpu_common;
>  #else
> diff --git a/target-alpha/cpu.c b/target-alpha/cpu.c
> index 0a0c21e..259a04c 100644
> --- a/target-alpha/cpu.c
> +++ b/target-alpha/cpu.c
> @@ -250,6 +250,11 @@ static const TypeInfo ev68_cpu_type_info = {
>      .parent = TYPE("ev67"),
>  };
>  
> +static void alpha_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void alpha_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -305,6 +310,7 @@ static const TypeInfo alpha_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(AlphaCPU),
>      .instance_init = alpha_cpu_initfn,
> +    .instance_finalize = alpha_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(AlphaCPUClass),
>      .class_init = alpha_cpu_class_init,
> diff --git a/target-arm/cpu.c b/target-arm/cpu.c
> index 86edaab..8d824d3 100644
> --- a/target-arm/cpu.c
> +++ b/target-arm/cpu.c
> @@ -454,6 +454,7 @@ static void arm_cpu_finalizefn(Object *obj)
>  {
>      ARMCPU *cpu = ARM_CPU(obj);
>      g_hash_table_destroy(cpu->cp_regs);
> +    cpu_exec_exit(CPU(obj));
>  }
>  
>  static void arm_cpu_realizefn(DeviceState *dev, Error **errp)
> diff --git a/target-cris/cpu.c b/target-cris/cpu.c
> index 8b589ec..da39223 100644
> --- a/target-cris/cpu.c
> +++ b/target-cris/cpu.c
> @@ -161,6 +161,11 @@ static void cris_cpu_set_irq(void *opaque, int irq, int level)
>  }
>  #endif
>  
> +static void cris_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void cris_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -299,6 +304,7 @@ static const TypeInfo cris_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(CRISCPU),
>      .instance_init = cris_cpu_initfn,
> +    .instance_finalize = cris_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(CRISCPUClass),
>      .class_init = cris_cpu_class_init,
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index daccf4f..ca2b93e 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -2877,6 +2877,11 @@ uint32_t x86_cpu_apic_id_from_index(unsigned int cpu_index)
>      }
>  }
>  
> +static void x86_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void x86_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -3046,6 +3051,7 @@ static const TypeInfo x86_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(X86CPU),
>      .instance_init = x86_cpu_initfn,
> +    .instance_finalize = x86_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(X86CPUClass),
>      .class_init = x86_cpu_common_class_init,
> diff --git a/target-lm32/cpu.c b/target-lm32/cpu.c
> index 89b6631..d7bc973 100644
> --- a/target-lm32/cpu.c
> +++ b/target-lm32/cpu.c
> @@ -143,6 +143,11 @@ static void lm32_cpu_realizefn(DeviceState *dev, Error **errp)
>      lcc->parent_realize(dev, errp);
>  }
>  
> +static void lm32_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void lm32_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -294,6 +299,7 @@ static const TypeInfo lm32_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(LM32CPU),
>      .instance_init = lm32_cpu_initfn,
> +    .instance_finalize = lm32_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(LM32CPUClass),
>      .class_init = lm32_cpu_class_init,
> diff --git a/target-m68k/cpu.c b/target-m68k/cpu.c
> index 6a41551..c2fce97 100644
> --- a/target-m68k/cpu.c
> +++ b/target-m68k/cpu.c
> @@ -160,6 +160,11 @@ static void m68k_cpu_realizefn(DeviceState *dev, Error **errp)
>      mcc->parent_realize(dev, errp);
>  }
>  
> +static void m68k_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void m68k_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -231,6 +236,7 @@ static const TypeInfo m68k_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(M68kCPU),
>      .instance_init = m68k_cpu_initfn,
> +    .instance_finalize = m68k_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(M68kCPUClass),
>      .class_init = m68k_cpu_class_init,
> diff --git a/target-microblaze/cpu.c b/target-microblaze/cpu.c
> index 6b3732d..3aa3796 100644
> --- a/target-microblaze/cpu.c
> +++ b/target-microblaze/cpu.c
> @@ -122,6 +122,11 @@ static void mb_cpu_realizefn(DeviceState *dev, Error **errp)
>      mcc->parent_realize(dev, errp);
>  }
>  
> +static void mb_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void mb_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -190,6 +195,7 @@ static const TypeInfo mb_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(MicroBlazeCPU),
>      .instance_init = mb_cpu_initfn,
> +    .instance_finalize = mb_cpu_finalize,
>      .class_size = sizeof(MicroBlazeCPUClass),
>      .class_init = mb_cpu_class_init,
>  };
> diff --git a/target-mips/cpu.c b/target-mips/cpu.c
> index 02f1d32..2150999 100644
> --- a/target-mips/cpu.c
> +++ b/target-mips/cpu.c
> @@ -108,6 +108,11 @@ static void mips_cpu_realizefn(DeviceState *dev, Error **errp)
>      mcc->parent_realize(dev, errp);
>  }
>  
> +static void mips_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void mips_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -159,6 +164,7 @@ static const TypeInfo mips_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(MIPSCPU),
>      .instance_init = mips_cpu_initfn,
> +    .instance_finalize = mips_cpu_finalize,
>      .abstract = false,
>      .class_size = sizeof(MIPSCPUClass),
>      .class_init = mips_cpu_class_init,
> diff --git a/target-moxie/cpu.c b/target-moxie/cpu.c
> index f815fb3..25d5f30 100644
> --- a/target-moxie/cpu.c
> +++ b/target-moxie/cpu.c
> @@ -59,6 +59,11 @@ static void moxie_cpu_realizefn(DeviceState *dev, Error **errp)
>      mcc->parent_realize(dev, errp);
>  }
>  
> +static void moxie_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void moxie_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -160,6 +165,7 @@ static const TypeInfo moxie_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(MoxieCPU),
>      .instance_init = moxie_cpu_initfn,
> +    .instance_finalize = moxie_cpu_finalize,
>      .class_size = sizeof(MoxieCPUClass),
>      .class_init = moxie_cpu_class_init,
>  };
> diff --git a/target-openrisc/cpu.c b/target-openrisc/cpu.c
> index 87b2f80..f0c990f 100644
> --- a/target-openrisc/cpu.c
> +++ b/target-openrisc/cpu.c
> @@ -85,6 +85,11 @@ static void openrisc_cpu_realizefn(DeviceState *dev, Error **errp)
>      occ->parent_realize(dev, errp);
>  }
>  
> +static void openrisc_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void openrisc_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -198,6 +203,7 @@ static const TypeInfo openrisc_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(OpenRISCCPU),
>      .instance_init = openrisc_cpu_initfn,
> +    .instance_finalize = openrisc_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(OpenRISCCPUClass),
>      .class_init = openrisc_cpu_class_init,
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index 9f4f172..a553560 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -9671,6 +9671,11 @@ static bool ppc_cpu_is_big_endian(CPUState *cs)
>  }
>  #endif
>  
> +static void ppc_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void ppc_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -9784,6 +9789,7 @@ static const TypeInfo ppc_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(PowerPCCPU),
>      .instance_init = ppc_cpu_initfn,
> +    .instance_finalize = ppc_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(PowerPCCPUClass),
>      .class_init = ppc_cpu_class_init,
> diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c
> index 28717bd..198e57b 100644
> --- a/target-s390x/cpu.c
> +++ b/target-s390x/cpu.c
> @@ -212,6 +212,7 @@ static void s390_cpu_finalize(Object *obj)
>  
>      qemu_unregister_reset(s390_cpu_machine_reset_cb, cpu);
>  #endif
> +    cpu_exec_exit(CPU(obj));
>  }
>  
>  #if !defined(CONFIG_USER_ONLY)
> diff --git a/target-sh4/cpu.c b/target-sh4/cpu.c
> index ffb635e..65f44c7 100644
> --- a/target-sh4/cpu.c
> +++ b/target-sh4/cpu.c
> @@ -240,6 +240,11 @@ static void superh_cpu_realizefn(DeviceState *dev, Error **errp)
>      scc->parent_realize(dev, errp);
>  }
>  
> +static void superh_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void superh_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -296,6 +301,7 @@ static const TypeInfo superh_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(SuperHCPU),
>      .instance_init = superh_cpu_initfn,
> +    .instance_finalize = superh_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(SuperHCPUClass),
>      .class_init = superh_cpu_class_init,
> diff --git a/target-sparc/cpu.c b/target-sparc/cpu.c
> index d857aae..ac7091a 100644
> --- a/target-sparc/cpu.c
> +++ b/target-sparc/cpu.c
> @@ -815,6 +815,7 @@ static void sparc_cpu_uninitfn(Object *obj)
>      CPUSPARCState *env = &cpu->env;
>  
>      g_free(env->def);
> +    cpu_exec_exit(CPU(obj));
>  }
>  
>  static void sparc_cpu_class_init(ObjectClass *oc, void *data)
> diff --git a/target-tricore/cpu.c b/target-tricore/cpu.c
> index 53b117b..e871dc4 100644
> --- a/target-tricore/cpu.c
> +++ b/target-tricore/cpu.c
> @@ -80,6 +80,10 @@ static void tricore_cpu_realizefn(DeviceState *dev, Error **errp)
>      tcc->parent_realize(dev, errp);
>  }
>  
> +static void tricore_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
>  
>  static void tricore_cpu_initfn(Object *obj)
>  {
> @@ -180,6 +184,7 @@ static const TypeInfo tricore_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(TriCoreCPU),
>      .instance_init = tricore_cpu_initfn,
> +    .instance_finalize = tricore_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(TriCoreCPUClass),
>      .class_init = tricore_cpu_class_init,
> diff --git a/target-unicore32/cpu.c b/target-unicore32/cpu.c
> index d56d78a..64af9f8 100644
> --- a/target-unicore32/cpu.c
> +++ b/target-unicore32/cpu.c
> @@ -103,6 +103,11 @@ static void uc32_cpu_realizefn(DeviceState *dev, Error **errp)
>      ucc->parent_realize(dev, errp);
>  }
>  
> +static void uc32_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void uc32_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -174,6 +179,7 @@ static const TypeInfo uc32_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(UniCore32CPU),
>      .instance_init = uc32_cpu_initfn,
> +    .instance_finalize = uc32_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(UniCore32CPUClass),
>      .class_init = uc32_cpu_class_init,
> diff --git a/target-xtensa/cpu.c b/target-xtensa/cpu.c
> index dd23d32..565a946 100644
> --- a/target-xtensa/cpu.c
> +++ b/target-xtensa/cpu.c
> @@ -104,6 +104,11 @@ static void xtensa_cpu_realizefn(DeviceState *dev, Error **errp)
>      xcc->parent_realize(dev, errp);
>  }
>  
> +static void xtensa_cpu_finalize(Object *obj)
> +{
> +    cpu_exec_exit(CPU(obj));
> +}
> +
>  static void xtensa_cpu_initfn(Object *obj)
>  {
>      CPUState *cs = CPU(obj);
> @@ -161,6 +166,7 @@ static const TypeInfo xtensa_cpu_type_info = {
>      .parent = TYPE_CPU,
>      .instance_size = sizeof(XtensaCPU),
>      .instance_init = xtensa_cpu_initfn,
> +    .instance_finalize = xtensa_cpu_finalize,
>      .abstract = true,
>      .class_size = sizeof(XtensaCPUClass),
>      .class_init = xtensa_cpu_class_init,

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 15/24] ppc: Move cpu_exec_init() call to realize function
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 15/24] ppc: Move cpu_exec_init() call to realize function Bharata B Rao
@ 2015-05-05  7:12   ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:12 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 607 bytes --]

On Fri, Apr 24, 2015 at 12:17:37PM +0530, Bharata B Rao wrote:
> Move cpu_exec_init() call from instance_init to realize. This allows
> any failures from cpu_exec_init() to be handled appropriately.
> Correspondingly move cpu_exec_exit() call from instance_finalize
> to unrealize.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 16/24] qom: Introduce object_has_no_children() API
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 16/24] qom: Introduce object_has_no_children() API Bharata B Rao
@ 2015-05-05  7:13   ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:13 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 727 bytes --]

On Fri, Apr 24, 2015 at 12:17:38PM +0530, Bharata B Rao wrote:
> This QOM API can be used to check if an object has any child objects
> associated with it.
> 
> Needed by PowerPC CPU hotplug code to release parent CPU core and
> socket objects only after ascertaining that they don't have any child
> objects.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Another one which might be worth posting independently of the powerpc
hotplug series.
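
For reference, a helper like that can be pretty small; a sketch of the shape
I'd expect (my guess, not necessarily what the actual patch does):

    static int has_child_cb(Object *child, void *opaque)
    {
        *(bool *)opaque = true;
        return 1;   /* non-zero stops the iteration at the first child */
    }

    bool object_has_no_children(Object *obj)
    {
        bool found = false;

        object_child_foreach(obj, has_child_cb, &found);
        return !found;
    }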

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects Bharata B Rao
@ 2015-05-05  7:20   ` David Gibson
  2015-05-06  6:37     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:20 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Zhu Guihua, mdroth, aik, agraf, qemu-devel, Chen Fan, qemu-ppc,
	tyreld, nfont, Gu Zheng, imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 10088 bytes --]

On Fri, Apr 24, 2015 at 12:17:39PM +0530, Bharata B Rao wrote:
> From: Gu Zheng <guz.fnst@cn.fujitsu.com>
> 
> In order to deal well with the kvm vcpus (which cannot be removed without any
> protection), we do not close the KVM vcpu fd, but just record and mark it as
> stopped in a list, so that we can reuse it for a subsequent cpu hot-add request if
> possible. It is also the approach that kvm guys suggested:
> https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> 
> This patch also adds a QOM API object_has_no_children(Object *obj)
> that checks whether a given object has any child objects. This API
> is needed to release CPU core and socket objects when a vCPU is destroyed.

I'm guessing this commit message needs updating, since you seem to
have split this out into the previous patch.

> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>                [Added core and socket removal bits]
> ---
>  cpus.c               | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/qom/cpu.h    | 11 +++++++++
>  include/sysemu/kvm.h |  1 +
>  kvm-all.c            | 57 +++++++++++++++++++++++++++++++++++++++++++-
>  kvm-stub.c           |  5 ++++
>  5 files changed, 140 insertions(+), 1 deletion(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 0fac143..325f8a6 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -858,6 +858,47 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
>      qemu_cpu_kick(cpu);
>  }
>  
> +static void qemu_destroy_cpu_core(Object *core)
> +{
> +    Object *socket = core->parent;
> +
> +    object_unparent(core);
> +    if (socket && object_has_no_children(socket)) {
> +        object_unparent(socket);
> +    }

This seems a bit odd to me.  I thought the general idea of the new
approaches to cpu hotplug meant that the hotplug sequence started from
the top (the socket or core) and worked down to the threads.  Rather
than starting at the thread, and working up to the core and socket
level.

> +}
> +
> +static void qemu_kvm_destroy_vcpu(CPUState *cpu)
> +{
> +    Object *thread = OBJECT(cpu);
> +    Object *core = thread->parent;
> +
> +    CPU_REMOVE(cpu);
> +
> +    if (kvm_destroy_vcpu(cpu) < 0) {
> +        error_report("kvm_destroy_vcpu failed.\n");
> +        exit(EXIT_FAILURE);
> +    }
> +
> +    object_unparent(thread);
> +    if (core && object_has_no_children(core)) {
> +        qemu_destroy_cpu_core(core);
> +    }
> +}
> +
> +static void qemu_tcg_destroy_vcpu(CPUState *cpu)
> +{
> +    Object *thread = OBJECT(cpu);
> +    Object *core = thread->parent;
> +
> +    CPU_REMOVE(cpu);
> +    object_unparent(OBJECT(cpu));
> +
> +    if (core && object_has_no_children(core)) {
> +        qemu_destroy_cpu_core(core);
> +    }
> +}
> +
>  static void flush_queued_work(CPUState *cpu)
>  {
>      struct qemu_work_item *wi;
> @@ -950,6 +991,11 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
>              }
>          }
>          qemu_kvm_wait_io_event(cpu);
> +        if (cpu->exit && !cpu_can_run(cpu)) {
> +            qemu_kvm_destroy_vcpu(cpu);
> +            qemu_mutex_unlock(&qemu_global_mutex);
> +            return NULL;
> +        }
>      }
>  
>      return NULL;
> @@ -1003,6 +1049,7 @@ static void tcg_exec_all(void);
>  static void *qemu_tcg_cpu_thread_fn(void *arg)
>  {
>      CPUState *cpu = arg;
> +    CPUState *remove_cpu = NULL;
>  
>      qemu_tcg_init_cpu_signals();
>      qemu_thread_get_self(cpu->thread);
> @@ -1039,6 +1086,16 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>              }
>          }
>          qemu_tcg_wait_io_event();
> +        CPU_FOREACH(cpu) {
> +            if (cpu->exit && !cpu_can_run(cpu)) {
> +                remove_cpu = cpu;
> +                break;
> +            }
> +        }
> +        if (remove_cpu) {
> +            qemu_tcg_destroy_vcpu(remove_cpu);
> +            remove_cpu = NULL;
> +        }
>      }
>  
>      return NULL;
> @@ -1196,6 +1253,13 @@ void resume_all_vcpus(void)
>      }
>  }
>  
> +void cpu_remove(CPUState *cpu)
> +{
> +    cpu->stop = true;
> +    cpu->exit = true;
> +    qemu_cpu_kick(cpu);
> +}
> +
>  /* For temporary buffers for forming a name */
>  #define VCPU_THREAD_NAME_SIZE 16
>  
> @@ -1390,6 +1454,9 @@ static void tcg_exec_all(void)
>                  break;
>              }
>          } else if (cpu->stop || cpu->stopped) {
> +            if (cpu->exit) {
> +                next_cpu = CPU_NEXT(cpu);
> +            }
>              break;
>          }
>      }
> diff --git a/include/qom/cpu.h b/include/qom/cpu.h
> index 5241cf4..1bfc3d4 100644
> --- a/include/qom/cpu.h
> +++ b/include/qom/cpu.h
> @@ -206,6 +206,7 @@ struct kvm_run;
>   * @halted: Nonzero if the CPU is in suspended state.
>   * @stop: Indicates a pending stop request.
>   * @stopped: Indicates the CPU has been artificially stopped.
> + * @exit: Indicates the CPU has exited due to an unplug operation.
>   * @tcg_exit_req: Set to force TCG to stop executing linked TBs for this
>   *           CPU and return to its top level loop.
>   * @singlestep_enabled: Flags for single-stepping.
> @@ -249,6 +250,7 @@ struct CPUState {
>      bool created;
>      bool stop;
>      bool stopped;
> +    bool exit;
>      volatile sig_atomic_t exit_request;
>      uint32_t interrupt_request;
>      int singlestep_enabled;
> @@ -306,6 +308,7 @@ struct CPUState {
>  QTAILQ_HEAD(CPUTailQ, CPUState);
>  extern struct CPUTailQ cpus;
>  #define CPU_NEXT(cpu) QTAILQ_NEXT(cpu, node)
> +#define CPU_REMOVE(cpu) QTAILQ_REMOVE(&cpus, cpu, node)
>  #define CPU_FOREACH(cpu) QTAILQ_FOREACH(cpu, &cpus, node)
>  #define CPU_FOREACH_SAFE(cpu, next_cpu) \
>      QTAILQ_FOREACH_SAFE(cpu, &cpus, node, next_cpu)
> @@ -610,6 +613,14 @@ void cpu_exit(CPUState *cpu);
>   */
>  void cpu_resume(CPUState *cpu);
>  
> + /**
> + * cpu_remove:
> + * @cpu: The CPU to remove.
> + *
> + * Requests the CPU to be removed.
> + */
> +void cpu_remove(CPUState *cpu);
> +
>  /**
>   * qemu_init_vcpu:
>   * @cpu: The vCPU to initialize.
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 30cb84d..560caef 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -188,6 +188,7 @@ int kvm_has_intx_set_mask(void);
>  
>  int kvm_init_vcpu(CPUState *cpu);
>  int kvm_cpu_exec(CPUState *cpu);
> +int kvm_destroy_vcpu(CPUState *cpu);
>  
>  #ifdef NEED_CPU_H
>  
> diff --git a/kvm-all.c b/kvm-all.c
> index 05a79c2..46e7853 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -71,6 +71,12 @@ typedef struct KVMSlot
>  
>  typedef struct kvm_dirty_log KVMDirtyLog;
>  
> +struct KVMParkedVcpu {
> +    unsigned long vcpu_id;
> +    int kvm_fd;
> +    QLIST_ENTRY(KVMParkedVcpu) node;
> +};
> +
>  struct KVMState
>  {
>      AccelState parent_obj;
> @@ -107,6 +113,7 @@ struct KVMState
>      QTAILQ_HEAD(msi_hashtab, KVMMSIRoute) msi_hashtab[KVM_MSI_HASHTAB_SIZE];
>      bool direct_msi;
>  #endif
> +    QLIST_HEAD(, KVMParkedVcpu) kvm_parked_vcpus;
>  };
>  
>  #define TYPE_KVM_ACCEL ACCEL_CLASS_NAME("kvm")
> @@ -247,6 +254,53 @@ static int kvm_set_user_memory_region(KVMState *s, KVMSlot *slot)
>      return kvm_vm_ioctl(s, KVM_SET_USER_MEMORY_REGION, &mem);
>  }
>  
> +int kvm_destroy_vcpu(CPUState *cpu)
> +{
> +    KVMState *s = kvm_state;
> +    long mmap_size;
> +    struct KVMParkedVcpu *vcpu = NULL;
> +    int ret = 0;
> +
> +    DPRINTF("kvm_destroy_vcpu\n");
> +
> +    mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
> +    if (mmap_size < 0) {
> +        ret = mmap_size;
> +        DPRINTF("KVM_GET_VCPU_MMAP_SIZE failed\n");
> +        goto err;
> +    }
> +
> +    ret = munmap(cpu->kvm_run, mmap_size);
> +    if (ret < 0) {
> +        goto err;
> +    }
> +
> +    vcpu = g_malloc0(sizeof(*vcpu));
> +    vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
> +    vcpu->kvm_fd = cpu->kvm_fd;
> +    QLIST_INSERT_HEAD(&kvm_state->kvm_parked_vcpus, vcpu, node);
> +err:
> +    return ret;
> +}
> +
> +static int kvm_get_vcpu(KVMState *s, unsigned long vcpu_id)
> +{
> +    struct KVMParkedVcpu *cpu;
> +
> +    QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
> +        if (cpu->vcpu_id == vcpu_id) {
> +            int kvm_fd;
> +
> +            QLIST_REMOVE(cpu, node);
> +            kvm_fd = cpu->kvm_fd;
> +            g_free(cpu);
> +            return kvm_fd;
> +        }
> +    }
> +
> +    return kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)vcpu_id);
> +}
> +
>  int kvm_init_vcpu(CPUState *cpu)
>  {
>      KVMState *s = kvm_state;
> @@ -255,7 +309,7 @@ int kvm_init_vcpu(CPUState *cpu)
>  
>      DPRINTF("kvm_init_vcpu\n");
>  
> -    ret = kvm_vm_ioctl(s, KVM_CREATE_VCPU, (void *)kvm_arch_vcpu_id(cpu));
> +    ret = kvm_get_vcpu(s, kvm_arch_vcpu_id(cpu));
>      if (ret < 0) {
>          DPRINTF("kvm_create_vcpu failed\n");
>          goto err;
> @@ -1448,6 +1502,7 @@ static int kvm_init(MachineState *ms)
>  #ifdef KVM_CAP_SET_GUEST_DEBUG
>      QTAILQ_INIT(&s->kvm_sw_breakpoints);
>  #endif
> +    QLIST_INIT(&s->kvm_parked_vcpus);
>      s->vmfd = -1;
>      s->fd = qemu_open("/dev/kvm", O_RDWR);
>      if (s->fd == -1) {
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 7ba90c5..79ac626 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -30,6 +30,11 @@ bool kvm_gsi_direct_mapping;
>  bool kvm_allowed;
>  bool kvm_readonly_mem_allowed;
>  
> +int kvm_destroy_vcpu(CPUState *cpu)
> +{
> +    return -ENOSYS;
> +}
> +
>  int kvm_init_vcpu(CPUState *cpu)
>  {
>      return -ENOSYS;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled Bharata B Rao
@ 2015-05-05  7:22   ` David Gibson
  2015-05-06  5:42     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:22 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 2218 bytes --]

On Fri, Apr 24, 2015 at 12:17:40PM +0530, Bharata B Rao wrote:
> When supporting CPU hot removal by parking the vCPU fd and reusing
> it during hotplug again, there can be cases where we try to reenable
> KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled.
> Introduce a boolean member in ICPState to track this and don't
> reenable the CAP if it was already enabled earlier.
> 
> This change allows CPU hot removal to work for sPAPR.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Do you actually need this?  Is there any harm in setting the
capability multiple times, or could you just ignore the "already set"
error?

> ---
>  hw/intc/xics_kvm.c    | 10 ++++++++++
>  include/hw/ppc/xics.h |  1 +
>  2 files changed, 11 insertions(+)
> 
> diff --git a/hw/intc/xics_kvm.c b/hw/intc/xics_kvm.c
> index c15453f..5b27bf8 100644
> --- a/hw/intc/xics_kvm.c
> +++ b/hw/intc/xics_kvm.c
> @@ -331,6 +331,15 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
>          abort();
>      }
>  
> +    /*
> +     * If we are reusing a parked vCPU fd corresponding to the CPU
> +     * which was hot-removed earlier, we don't have to re-enable the
> +     * KVM_CAP_IRQ_XICS capability again.
> +     */
> +    if (ss->cap_irq_xics_enabled) {
> +        return;
> +    }
> +
>      if (icpkvm->kernel_xics_fd != -1) {
>          int ret;
>  
> @@ -343,6 +352,7 @@ static void xics_kvm_cpu_setup(XICSState *icp, PowerPCCPU *cpu)
>                      kvm_arch_vcpu_id(cs), strerror(errno));
>              exit(1);
>          }
> +        ss->cap_irq_xics_enabled = true;
>      }
>  }
>  
> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
> index a214dd7..355a966 100644
> --- a/include/hw/ppc/xics.h
> +++ b/include/hw/ppc/xics.h
> @@ -109,6 +109,7 @@ struct ICPState {
>      uint8_t pending_priority;
>      uint8_t mfrr;
>      qemu_irq output;
> +    bool cap_irq_xics_enabled;
>  };
>  
>  #define TYPE_ICS "ics"

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support Bharata B Rao
@ 2015-05-05  7:28   ` David Gibson
  2015-05-06  7:55     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:28 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 7166 bytes --]

On Fri, Apr 24, 2015 at 12:17:42PM +0530, Bharata B Rao wrote:
> Support hot removal of CPU for sPAPR guests by sending the hot unplug
> notification to the guest via EPOW interrupt. Release the vCPU object
> after CPU hot unplug so that vCPU fd can be parked and reused.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>  hw/ppc/spapr.c              | 101 +++++++++++++++++++++++++++++++++++++++++++-
>  target-ppc/translate_init.c |  10 +++++
>  2 files changed, 110 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 9b0701c..910a50f 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1481,6 +1481,12 @@ static void spapr_cpu_init(PowerPCCPU *cpu)
>      qemu_register_reset(spapr_cpu_reset, cpu);
>  }
>  
> +static void spapr_cpu_destroy(PowerPCCPU *cpu)
> +{
> +    xics_cpu_destroy(spapr->icp, cpu);
> +    qemu_unregister_reset(spapr_cpu_reset, cpu);
> +}
> +
>  /* pSeries LPAR / sPAPR hardware init */
>  static void ppc_spapr_init(MachineState *machine)
>  {
> @@ -1883,6 +1889,24 @@ static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
>      return fdt;
>  }
>  
> +static void spapr_cpu_release(DeviceState *dev, void *opaque)
> +{
> +    CPUState *cs;
> +    int i;
> +    int id = ppc_get_vcpu_dt_id(POWERPC_CPU(CPU(dev)));
> +
> +    for (i = id; i < id + smp_threads; i++) {
> +        CPU_FOREACH(cs) {
> +            PowerPCCPU *cpu = POWERPC_CPU(cs);
> +
> +            if (i == ppc_get_vcpu_dt_id(cpu)) {
> +                spapr_cpu_destroy(cpu);
> +                cpu_remove(cs);
> +            }
> +        }
> +    }
> +}
> +
>  static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                              Error **errp)
>  {
> @@ -1970,6 +1994,59 @@ static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      return;
>  }
>  
> +static int spapr_cpu_unplug(Object *obj, void *opaque)
> +{
> +    Error **errp = opaque;
> +    DeviceState *dev = DEVICE(obj);
> +    CPUState *cs = CPU(dev);
> +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> +    int id = ppc_get_vcpu_dt_id(cpu);
> +    int smt = kvmppc_smt_threads();
> +    sPAPRDRConnector *drc =
> +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> +    sPAPRDRConnectorClass *drck;
> +    Error *local_err = NULL;
> +
> +    /*
> +     * SMT threads return from here, only main thread (core) will
> +     * continue and signal hot unplug event to the guest.
> +     */
> +    if ((id % smt) != 0) {
> +        return 0;
> +    }

As with the hot-plug side, couldn't this be done more naturally by
attaching this function to the cpu core object rather than the thread?

> +    g_assert(drc);
> +
> +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    drck->detach(drc, dev, spapr_cpu_release, NULL, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return -1;
> +    }
> +
> +    /*
> +     * In addition to hotplugged CPUs, send the hot-unplug notification
> +     * interrupt to the guest for coldplugged CPUs started via -device
> +     * option too.
> +     */
> +    spapr_hotplug_req_remove_event(drc);

Um.. doesn't the remove notification need to go *before* the
"physical" unplug?  Along with a wait for acknowledgement from the
guest?

> +    return 0;
> +}
> +
> +static int spapr_cpu_core_unplug(Object *obj, void *opaque)
> +{
> +    Error **errp = opaque;
> +
> +    object_child_foreach(obj, spapr_cpu_unplug, errp);
> +    return 0;
> +}
> +
> +static void spapr_cpu_socket_unplug(HotplugHandler *hotplug_dev,
> +                            DeviceState *dev, Error **errp)
> +{
> +    object_child_foreach(OBJECT(dev), spapr_cpu_core_unplug, errp);
> +}
> +
>  static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>                                        DeviceState *dev, Error **errp)
>  {
> @@ -1984,16 +2061,36 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
>           * Fail hotplug on machines where CPU DR isn't enabled.
>           */
>          if (!spapr->dr_cpu_enabled && dev->hotplugged) {
> +            /*
> +             * FIXME: Ideally should fail hotplug here by doing an error_setg,
> +             * but failing hotplug here doesn't work well with the vCPU object
> +             * removal code. Hence silently refusing to add CPUs here.
> +             */
> +            spapr_cpu_destroy(cpu);
> +            cpu_remove(cs);
>              return;
>          }
>          spapr_cpu_plug(hotplug_dev, dev, errp);
>      }
>  }
>  
> +static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
> +                                      DeviceState *dev, Error **errp)
> +{
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU_SOCKET)) {
> +        if (!spapr->dr_cpu_enabled) {
> +            error_setg(errp, "CPU hot unplug not supported on this machine");
> +            return;
> +        }
> +        spapr_cpu_socket_unplug(hotplug_dev, dev, errp);
> +    }
> +}
> +
>  static HotplugHandler *spapr_get_hotpug_handler(MachineState *machine,
>                                               DeviceState *dev)
>  {
> -    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
> +        object_dynamic_cast(OBJECT(dev), TYPE_CPU_SOCKET)) {
>          return HOTPLUG_HANDLER(machine);
>      }
>      return NULL;
> @@ -2017,6 +2114,8 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>      mc->has_dynamic_sysbus = true;
>      mc->get_hotplug_handler = spapr_get_hotpug_handler;
>      hc->plug = spapr_machine_device_plug;
> +    hc->unplug = spapr_machine_device_unplug;
> +
>      smc->dr_phb_enabled = false;
>      smc->dr_cpu_enabled = false;
>      smc->dr_lmb_enabled = false;
> diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
> index fccee82..8e24235 100644
> --- a/target-ppc/translate_init.c
> +++ b/target-ppc/translate_init.c
> @@ -30,6 +30,7 @@
>  #include "qemu/error-report.h"
>  #include "qapi/visitor.h"
>  #include "hw/qdev-properties.h"
> +#include "migration/vmstate.h"
>  
>  //#define PPC_DUMP_CPU
>  //#define PPC_DEBUG_SPR
> @@ -9143,9 +9144,18 @@ static void ppc_cpu_unrealizefn(DeviceState *dev, Error **errp)
>  {
>      PowerPCCPU *cpu = POWERPC_CPU(dev);
>      CPUPPCState *env = &cpu->env;
> +    CPUClass *cc = CPU_GET_CLASS(dev);
>      opc_handler_t **table;
>      int i, j;
>  
> +    if (qdev_get_vmsd(dev) == NULL) {
> +        vmstate_unregister(NULL, &vmstate_cpu_common, cpu);
> +    }
> +
> +    if (cc->vmsd != NULL) {
> +        vmstate_unregister(NULL, cc->vmsd, cpu);
> +    }
> +
>      cpu_exec_exit(CPU(dev));
>  
>      for (i = 0; i < PPC_CPU_OPCODES_LEN; i++) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space Bharata B Rao
@ 2015-05-05  7:33   ` David Gibson
  2015-05-06  7:58     ` Bharata B Rao
  2015-05-05  8:48   ` Igor Mammedov
  1 sibling, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:33 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 5632 bytes --]

On Fri, Apr 24, 2015 at 12:17:43PM +0530, Bharata B Rao wrote:
> Initialize a hotplug memory region under which all the hotplugged
> memory is accommodated. Also enable memory hotplug by setting
> CONFIG_MEM_HOTPLUG.
> 
> Modelled on i386 memory hotplug.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

So, the cpu hotplug stuff has been caught up in these generic level
design discussions.  Could you split out the memory hotplug part of
this series so that we might be able to move forwards with that while
the cpu hotplug discussion continues?

> ---
>  default-configs/ppc64-softmmu.mak |  1 +
>  hw/ppc/spapr.c                    | 38 ++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h            | 12 ++++++++++++
>  3 files changed, 51 insertions(+)
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 22ef132..16b3011 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -51,3 +51,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
>  # For PReP
>  CONFIG_MC146818RTC=y
>  CONFIG_ISA_TESTDEV=y
> +CONFIG_MEM_HOTPLUG=y
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 910a50f..9dc4c36 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -125,6 +125,9 @@ struct sPAPRMachineState {
>  
>      /*< public >*/
>      char *kvm_type;
> +    ram_addr_t hotplug_memory_base;
> +    MemoryRegion hotplug_memory;
> +    bool enforce_aligned_dimm;

As mentioned on the earlier version, on ppc we don't have any
compatibility reason to keep a mode for unaligned dimms.  Get
rid of this bool and treat it as always on instead.
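
i.e. (untested sketch) drop the field from sPAPRMachineState and just do the
alignment padding unconditionally:

        ms->hotplug_memory_base = ROUND_UP(machine->ram_size,
                                           SPAPR_HOTPLUG_MEM_ALIGN);
        hotplug_mem_size += SPAPR_HOTPLUG_MEM_ALIGN * machine->ram_slots;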

>  };
>  
>  sPAPREnvironment *spapr;
> @@ -1514,6 +1517,7 @@ static void ppc_spapr_init(MachineState *machine)
>      QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
>      int sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
>      int cores = (smp_cpus/smp_threads) ? smp_cpus/smp_threads : 1;
> +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
>  
>      sockets = sockets ? sockets : cores;
>      msi_supported = true;
> @@ -1613,6 +1617,36 @@ static void ppc_spapr_init(MachineState *machine)
>          memory_region_add_subregion(sysmem, 0, rma_region);
>      }
>  
> +    /* initialize hotplug memory address space */
> +    if (machine->ram_size < machine->maxram_size) {
> +        ram_addr_t hotplug_mem_size =
> +            machine->maxram_size - machine->ram_size;
> +
> +        if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
> +            error_report("unsupported amount of memory slots: %"PRIu64,
> +                         machine->ram_slots);
> +            exit(EXIT_FAILURE);
> +        }
> +
> +        ms->hotplug_memory_base = ROUND_UP(machine->ram_size,
> +                                    SPAPR_HOTPLUG_MEM_ALIGN);
> +
> +        if (ms->enforce_aligned_dimm) {
> +            hotplug_mem_size += SPAPR_HOTPLUG_MEM_ALIGN * machine->ram_slots;
> +        }
> +
> +        if ((ms->hotplug_memory_base + hotplug_mem_size) < hotplug_mem_size) {
> +            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
> +                         machine->maxram_size);
> +            exit(EXIT_FAILURE);
> +        }
> +
> +        memory_region_init(&ms->hotplug_memory, OBJECT(ms),
> +                           "hotplug-memory", hotplug_mem_size);
> +        memory_region_add_subregion(sysmem, ms->hotplug_memory_base,
> +                                    &ms->hotplug_memory);
> +    }
> +
>      filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
>      spapr->rtas_size = get_image_size(filename);
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> @@ -1844,11 +1878,15 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
>  
>  static void spapr_machine_initfn(Object *obj)
>  {
> +    sPAPRMachineState *ms = SPAPR_MACHINE(obj);
> +
>      object_property_add_str(obj, "kvm-type",
>                              spapr_get_kvm_type, spapr_set_kvm_type, NULL);
>      object_property_set_description(obj, "kvm-type",
>                                      "Specifies the KVM virtualization mode (HV, PR)",
>                                      NULL);
> +
> +    ms->enforce_aligned_dimm = true;
>  }
>  
>  static void ppc_cpu_do_nmi_on_cpu(void *arg)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ecac6e3..53560e9 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -542,6 +542,18 @@ struct sPAPREventLogEntry {
>  
>  #define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
>  
> +/*
> + * This defines the maximum number of DIMM slots we can have for sPAPR
> + * guest. This is not defined by sPAPR but we are defining it to 4096 slots
> + * here. With the worst case addition of SPAPR_MEMORY_BLOCK_SIZE
> + * (256MB) memory per slot, we should be able to support 1TB of guest
> + * hotpluggable memory.
> + */
> +#define SPAPR_MAX_RAM_SLOTS     (1ULL << 12)
> +
> +/* 1GB alignment for hotplug memory region */
> +#define SPAPR_HOTPLUG_MEM_ALIGN (1ULL << 30)
> +
>  void spapr_events_init(sPAPREnvironment *spapr);
>  void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
>  int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 22/24] numa: API to lookup NUMA node by address
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 22/24] numa: API to lookup NUMA node by address Bharata B Rao
@ 2015-05-05  7:35   ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:35 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, Paolo Bonzini, qemu-ppc, tyreld,
	nfont, imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 757 bytes --]

On Fri, Apr 24, 2015 at 12:17:44PM +0530, Bharata B Rao wrote:
> Keep track of start and end address of each NUMA node in numa_info
> structure so that lookup of node by address becomes easier. Add
> an API get_numa_node() to lookup a node by address.
> 
> This is needed by PowerPC memory hotplug implementation.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Another candidate for sending separately from the sPAPR hotplug series.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm, dynamic-reconfiguration-memory
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm, dynamic-reconfiguration-memory Bharata B Rao
@ 2015-05-05  7:40   ` David Gibson
  2015-05-06  8:27     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:40 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 1222 bytes --]

On Fri, Apr 24, 2015 at 12:17:45PM +0530, Bharata B Rao wrote:
> Parse ibm,architecture.vec table obtained from the guest and enable
> memory node configuration via ibm,dynamic-reconfiguration-memory if guest
> supports it. This is in preparation to support memory hotplug for
> sPAPR guests.
> 
> This changes the way memory node configuration is done. Currently all
> memory nodes are built upfront. But after this patch, only memory@0 node
> for RMA is built upfront. Guest kernel boots with just that and rest of
> the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory)
> are built when guest does ibm,client-architecture-support call.
> 
> Note: This patch needs a SLOF enhancement which is already part of
> upstream SLOF.

Is it in the SLOF included in the qemu submodule though?  If not you
should have a patch to update the submodule first.

> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Apart from that,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 24/24] spapr: Memory hotplug support
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 24/24] spapr: Memory hotplug support Bharata B Rao
@ 2015-05-05  7:45   ` David Gibson
  2015-05-06  8:30     ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-05  7:45 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 613 bytes --]

On Fri, Apr 24, 2015 at 12:17:46PM +0530, Bharata B Rao wrote:
> Make use of pc-dimm infrastructure to support memory hotplug
> for PowerPC.
> 
> Modelled on i386 memory hotplug.

Can the previous patch actually do anything without this one?  If not,
might as well fold them together.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space
  2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space Bharata B Rao
  2015-05-05  7:33   ` David Gibson
@ 2015-05-05  8:48   ` Igor Mammedov
  2015-05-06  8:23     ` Bharata B Rao
  1 sibling, 1 reply; 74+ messages in thread
From: Igor Mammedov @ 2015-05-05  8:48 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont, afaerber, david

On Fri, 24 Apr 2015 12:17:43 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> Initialize a hotplug memory region under which all the hotplugged
> memory is accommodated. Also enable memory hotplug by setting
> CONFIG_MEM_HOTPLUG.
> 
> Modelled on i386 memory hotplug.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>  default-configs/ppc64-softmmu.mak |  1 +
>  hw/ppc/spapr.c                    | 38 ++++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr.h            | 12 ++++++++++++
>  3 files changed, 51 insertions(+)
> 
> diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> index 22ef132..16b3011 100644
> --- a/default-configs/ppc64-softmmu.mak
> +++ b/default-configs/ppc64-softmmu.mak
> @@ -51,3 +51,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
>  # For PReP
>  CONFIG_MC146818RTC=y
>  CONFIG_ISA_TESTDEV=y
> +CONFIG_MEM_HOTPLUG=y
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 910a50f..9dc4c36 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -125,6 +125,9 @@ struct sPAPRMachineState {
>  
>      /*< public >*/
>      char *kvm_type;
> +    ram_addr_t hotplug_memory_base;
> +    MemoryRegion hotplug_memory;
> +    bool enforce_aligned_dimm;
>  };
>  
>  sPAPREnvironment *spapr;
> @@ -1514,6 +1517,7 @@ static void ppc_spapr_init(MachineState *machine)
>      QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
>      int sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
>      int cores = (smp_cpus/smp_threads) ? smp_cpus/smp_threads : 1;
> +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
>  
>      sockets = sockets ? sockets : cores;
>      msi_supported = true;
> @@ -1613,6 +1617,36 @@ static void ppc_spapr_init(MachineState *machine)
>          memory_region_add_subregion(sysmem, 0, rma_region);
>      }
>  
> +    /* initialize hotplug memory address space */
> +    if (machine->ram_size < machine->maxram_size) {
> +        ram_addr_t hotplug_mem_size =
> +            machine->maxram_size - machine->ram_size;
> +
> +        if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
> +            error_report("unsupported amount of memory slots: %"PRIu64,
> +                         machine->ram_slots);
> +            exit(EXIT_FAILURE);
> +        }
> +
> +        ms->hotplug_memory_base = ROUND_UP(machine->ram_size,
> +                                    SPAPR_HOTPLUG_MEM_ALIGN);
> +
> +        if (ms->enforce_aligned_dimm) {
> +            hotplug_mem_size += SPAPR_HOTPLUG_MEM_ALIGN * machine->ram_slots;
> +        }
> +
> +        if ((ms->hotplug_memory_base + hotplug_mem_size) < hotplug_mem_size) {
> +            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
> +                         machine->maxram_size);
> +            exit(EXIT_FAILURE);
> +        }
> +
> +        memory_region_init(&ms->hotplug_memory, OBJECT(ms),
> +                           "hotplug-memory", hotplug_mem_size);
> +        memory_region_add_subregion(sysmem, ms->hotplug_memory_base,
> +                                    &ms->hotplug_memory);
> +    }
> +
>      filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
>      spapr->rtas_size = get_image_size(filename);
>      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> @@ -1844,11 +1878,15 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
>  
>  static void spapr_machine_initfn(Object *obj)
>  {
> +    sPAPRMachineState *ms = SPAPR_MACHINE(obj);
> +
>      object_property_add_str(obj, "kvm-type",
>                              spapr_get_kvm_type, spapr_set_kvm_type, NULL);
>      object_property_set_description(obj, "kvm-type",
>                                      "Specifies the KVM virtualization mode (HV, PR)",
>                                      NULL);
> +
> +    ms->enforce_aligned_dimm = true;
>  }
>  
>  static void ppc_cpu_do_nmi_on_cpu(void *arg)
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ecac6e3..53560e9 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -542,6 +542,18 @@ struct sPAPREventLogEntry {
>  
>  #define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
>  
> +/*
> + * This defines the maximum number of DIMM slots we can have for sPAPR
> + * guest. This is not defined by sPAPR but we are defining it to 4096 slots
> + * here. With the worst case addition of SPAPR_MEMORY_BLOCK_SIZE
> + * (256MB) memory per slot, we should be able to support 1TB of guest
> + * hotpluggable memory.
> + */
> +#define SPAPR_MAX_RAM_SLOTS     (1ULL << 12)
why not write 4096 instead of (1ULL << 12), much easier to read.
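
i.e. simply:

    #define SPAPR_MAX_RAM_SLOTS     4096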

BTW:
KVM supports up to 509 memory slots, including slots consumed by
initial memory.

> +
> +/* 1GB alignment for hotplug memory region */
> +#define SPAPR_HOTPLUG_MEM_ALIGN (1ULL << 30)
> +
>  void spapr_events_init(sPAPREnvironment *spapr);
>  void spapr_events_fdt_skel(void *fdt, uint32_t epow_irq);
>  int spapr_h_cas_compose_response(target_ulong addr, target_ulong size);

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine
  2015-05-04 16:10   ` Thomas Huth
@ 2015-05-06  4:28     ` Bharata B Rao
  2015-05-06  6:32       ` Thomas Huth
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  4:28 UTC (permalink / raw)
  To: Thomas Huth
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Mon, May 04, 2015 at 06:10:59PM +0200, Thomas Huth wrote:
> On Fri, 24 Apr 2015 12:17:28 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> 
> > Factor out bits of sPAPR specific CPU initialization code into
> > a separate routine so that it can be called from CPU hotplug
> > path too.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > ---
> >  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++-------------------------
> >  1 file changed, 29 insertions(+), 25 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index a56f9a1..5c8f2ff 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1440,6 +1440,34 @@ static void spapr_drc_reset(void *opaque)
> >      }
> >  }
> >  
> > +static void spapr_cpu_init(PowerPCCPU *cpu)
> > +{
> > +    CPUPPCState *env = &cpu->env;
> > +
> > +    /* Set time-base frequency to 512 MHz */
> > +    cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> > +
> > +    /* PAPR always has exception vectors in RAM not ROM. To ensure this,
> > +     * MSR[IP] should never be set.
> > +     */
> > +    env->msr_mask &= ~(1 << 6);
> 
> While you're at it ... could we maybe get a proper #define for that MSR
> bit? (just like the other ones in target-ppc/cpu.h)

Sure will use MSR_EP here next time.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 07/24] cpu: Prepare Socket container type
  2015-05-05  1:47   ` David Gibson
@ 2015-05-06  4:36     ` Bharata B Rao
  0 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  4:36 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Tue, May 05, 2015 at 11:47:30AM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:29PM +0530, Bharata B Rao wrote:
> > From: Andreas Färber <afaerber@suse.de>
> > 
> > Signed-off-by: Andreas Färber <afaerber@suse.de>
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> So, how to organize this generically is still under discussion.  For
> now, I don't think this generic outline is really worth it.  In any
> case I can't really take it through my tree.
> 
> What I'd suggest instead is just implementing the POWER core device in
> the ppc specific code.  As the generic socket vs. core vs. whatever
> stuff clarifies, that POWER core device might become a "virtual
> socket" or CM or whatever, but I think we'll be able to keep the
> external interface compatible with the right use of aliases.
> 
> In the meantime it should at least give us a draft we can experiment
> with on Power without requiring new generic infrastructure.

Makes sense, I will switch to the semantics that I had in v1 where
I enabled CPU hotplug for POWER8 using device_add.

(qemu) device_add POWER8-powerpc64-cpu,id=XXX

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction
  2015-05-04 15:15   ` Thomas Huth
@ 2015-05-06  4:40     ` Bharata B Rao
  2015-05-06  6:52       ` Thomas Huth
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  4:40 UTC (permalink / raw)
  To: Thomas Huth
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Mon, May 04, 2015 at 05:15:04PM +0200, Thomas Huth wrote:
> On Fri, 24 Apr 2015 12:17:30 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Signed-off-by: Andreas Färber <afaerber@suse.de>
> 
> Not sure if QEMU is fully following the kernel Sob-process, but if
> that's the case, I think your Sob should be below the one of Andreas in
> case you're the last person who touched the patch (and I think that's
> the case here since you've sent it out)?

The patch is from me, but I included Andreas' sob and copyrights since
the code was derived from his equivalent x86 implementation.

> 
> Also a short patch description would be really nice.

Guess I will get rid of this whole socket abstraction for now as noted
in an earlier thread.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 10/24] ppc: Update cpu_model in MachineState
  2015-05-05  6:49   ` David Gibson
@ 2015-05-06  4:49     ` Bharata B Rao
  0 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  4:49 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Tue, May 05, 2015 at 04:49:08PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:32PM +0530, Bharata B Rao wrote:
> > Keep cpu_model field in MachineState uptodate so that it can be used
> > from the CPU hotplug path.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> As before, this looks fine to me, but I'm not sure which tree it
> should go through.
> 
> Alex, do you want to take it directly, or send an Acked-by and I'll
> take it through spapr-next?

In addition to this patch, there are a few other patches that are required
by hotplug but can go in independently.

03/24 - spapr: Consider max_cpus during xics initialization
04/24 - spapr: Support ibm,lrdr-capacity device tree property
05/24 - spapr: Reorganize CPU dt generation code (Maybe?)
06/24 - spapr: Consolidate cpu init code into a routine
18/24 - xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
19/24 - xics_kvm: Add cpu_destroy method to XICS

Should I post these as a separate set (prerequisites for sPAPR CPU hotplug)
so that they can be pushed independently of the core hotplug patchset?

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support
  2015-05-04 15:53   ` Thomas Huth
@ 2015-05-06  5:37     ` Bharata B Rao
  0 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  5:37 UTC (permalink / raw)
  To: Thomas Huth
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Mon, May 04, 2015 at 05:53:23PM +0200, Thomas Huth wrote:
> On Fri, 24 Apr 2015 12:17:34 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> 
> > Support CPU hotplug via device-add command. Set up device tree
> > entries for the hotplugged CPU core and use the exising EPOW event
> > infrastructure to send CPU hotplug notification to the guest.
> > 
> > Also support cold plugged CPUs that are specified by -device option
> > on cmdline.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c        | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/ppc/spapr_events.c |   8 ++--
> >  hw/ppc/spapr_rtas.c   |  11 +++++
> >  3 files changed, 145 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index b526b7d..9b0701c 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> [...]
> > +static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > +                            Error **errp)
> > +{
> > +    CPUState *cs = CPU(dev);
> > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +    int id = ppc_get_vcpu_dt_id(cpu);
> > +    sPAPRDRConnector *drc =
> > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> > +    sPAPRDRConnectorClass *drck;
> > +    int smt = kvmppc_smt_threads();
> > +    Error *local_err = NULL;
> > +    void *fdt = NULL;
> > +    int i, fdt_offset = 0;
> > +
> > +    /* Set NUMA node for the added CPUs  */
> > +    for (i = 0; i < nb_numa_nodes; i++) {
> > +        if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> > +            cs->numa_node = i;
> > +            break;
> > +        }
> > +    }
> > +
> > +    /*
> > +     * SMT threads return from here, only main thread (core) will
> > +     * continue and signal hotplug event to the guest.
> > +     */
> > +    if ((id % smt) != 0) {
> > +        return;
> > +    }
> > +
> > +    if (!spapr->dr_cpu_enabled) {
> > +        /*
> > +         * This is a cold plugged CPU but the machine doesn't support
> > +         * DR. So skip the hotplug path ensuring that the CPU is brought
> > +         * up online with out an associated DR connector.
> > +         */
> > +        return;
> > +    }
> > +
> > +    g_assert(drc);
> > +
> > +    /*
> > +     * Setup CPU DT entries only for hotplugged CPUs. For boot time or
> > +     * coldplugged CPUs DT entries are setup in spapr_finalize_fdt().
> > +     */
> > +    if (dev->hotplugged) {
> > +        fdt = spapr_populate_hotplug_cpu_dt(dev, cs, &fdt_offset);
> > +    }
> > +
> > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, &local_err);
> > +    if (local_err) {
> > +        g_free(fdt);
> > +        error_propagate(errp, local_err);
> > +        return;
> > +    }
> > +
> > +    /*
> > +     * We send hotplug notification interrupt to the guest only in case
> > +     * of hotplugged CPUs.
> > +     */
> > +    if (dev->hotplugged) {
> > +        spapr_hotplug_req_add_event(drc);
> > +    } else {
> > +        /*
> > +         * HACK to support removal of hotplugged CPU after VM migration:
> > +         *
> > +         * Since we want to be able to hot-remove those coldplugged CPUs
> > +         * started at boot time using -device option at the target VM, we set
> > +         * the right allocation_state and isolation_state for them, which for
> > +         * the hotplugged CPUs would be set via RTAS calls done from the
> > +         * guest during hotplug.
> > +         *
> > +         * This allows the coldplugged CPUs started using -device option to
> > +         * have the right isolation and allocation states as expected by the
> > +         * CPU hot removal code.
> > +         *
> > +         * This hack will be removed once we have DRC states migrated as part
> > +         * of VM migration.
> > +         */
> > +        drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> > +        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> > +    }
> > +
> > +    return;
> 
> Cosmetic nit: Superfluous return statement
> 
> > +}
> > +
> [...]
> > diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> > index 57ec97a..48aeb86 100644
> > --- a/hw/ppc/spapr_rtas.c
> > +++ b/hw/ppc/spapr_rtas.c
> > @@ -121,6 +121,16 @@ static void rtas_query_cpu_stopped_state(PowerPCCPU *cpu_,
> >      rtas_st(rets, 0, RTAS_OUT_PARAM_ERROR);
> >  }
> >  
> > +static void spapr_cpu_set_endianness(PowerPCCPU *cpu)
> > +{
> > +    PowerPCCPU *fcpu = POWERPC_CPU(first_cpu);
> > +    PowerPCCPUClass *pcc = POWERPC_CPU_GET_CLASS(fcpu);
> > +
> > +    if (!(*pcc->interrupts_big_endian)(fcpu)) {
> 
> Functions pointers are sometimes still confusing to me, but can't you
> simplify that to:
> 
>     if (!pcc->interrupts_big_endian(fcpu)) {

That should work too. Thanks.
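
Just to convince myself, here is a standalone toy example (not QEMU code)
showing that both call forms go through the same pointer:

    #include <stdio.h>

    static int next_val(int x)
    {
        return x + 1;
    }

    int main(void)
    {
        int (*fp)(int) = next_val;

        /* A function pointer is implicitly dereferenced when called,
         * so the explicit (*fp)(...) form and the plain fp(...) form
         * are equivalent. */
        printf("%d %d\n", (*fp)(41), fp(41));   /* prints "42 42" */
        return 0;
    }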

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
  2015-05-05  7:22   ` David Gibson
@ 2015-05-06  5:42     ` Bharata B Rao
  2015-05-07  1:07       ` David Gibson
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  5:42 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Tue, May 05, 2015 at 05:22:52PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:40PM +0530, Bharata B Rao wrote:
> > When supporting CPU hot removal by parking the vCPU fd and reusing
> > it during hotplug again, there can be cases where we try to reenable
> > KVM_CAP_IRQ_XICS CAP for the vCPU for which it was already enabled.
> > Introduce a boolean member in ICPState to track this and don't
> > reenable the CAP if it was already enabled earlier.
> > 
> > This change allows CPU hot removal to work for sPAPR.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> Do you actually need this?  Is there any harm in setting the
> capability multiple times, or could you just ignore the "already set"
> error?

We discussed this last time and concluded that this patch is needed.

Ref: http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg05361.html
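
For clarity, the guard is essentially the following - a standalone toy
sketch, with illustrative names that are not necessarily what the patch
uses:

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for the boolean member added to ICPState. */
    typedef struct {
        bool cap_irq_xics_enabled;
    } ICPStateSketch;

    static void icp_enable_cap_once(ICPStateSketch *icp)
    {
        if (icp->cap_irq_xics_enabled) {
            /* vCPU fd was parked and reused on hotplug: CAP already on */
            return;
        }
        printf("enabling KVM_CAP_IRQ_XICS for this vCPU\n");
        icp->cap_irq_xics_enabled = true;
    }

    int main(void)
    {
        ICPStateSketch icp = { false };

        icp_enable_cap_once(&icp);   /* first plug: enables the CAP */
        icp_enable_cap_once(&icp);   /* re-plug after unplug: no-op  */
        return 0;
    }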

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support
  2015-05-05  6:59   ` David Gibson
@ 2015-05-06  6:14     ` Bharata B Rao
  2015-05-07  1:03       ` David Gibson
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  6:14 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Tue, May 05, 2015 at 04:59:51PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:34PM +0530, Bharata B Rao wrote:
> > Support CPU hotplug via device-add command. Set up device tree
> > entries for the hotplugged CPU core and use the exising EPOW event
> > infrastructure to send CPU hotplug notification to the guest.
> > 
> > Also support cold plugged CPUs that are specified by -device option
> > on cmdline.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c        | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >  hw/ppc/spapr_events.c |   8 ++--
> >  hw/ppc/spapr_rtas.c   |  11 +++++
> >  3 files changed, 145 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index b526b7d..9b0701c 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -33,6 +33,7 @@
> >  #include "sysemu/block-backend.h"
> >  #include "sysemu/cpus.h"
> >  #include "sysemu/kvm.h"
> > +#include "sysemu/device_tree.h"
> >  #include "kvm_ppc.h"
> >  #include "mmu-hash64.h"
> >  #include "qom/cpu.h"
> > @@ -662,6 +663,17 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset)
> >      unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
> >      uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
> >      uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> > +    sPAPRDRConnector *drc;
> > +    sPAPRDRConnectorClass *drck;
> > +    int drc_index;
> > +
> > +    if (spapr->dr_cpu_enabled) {
> > +        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index);
> > +        g_assert(drc);
> > +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +        drc_index = drck->get_index(drc);
> > +        _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index)));
> > +    }
> >  
> >      _FDT((fdt_setprop_cell(fdt, offset, "reg", index)));
> >      _FDT((fdt_setprop_string(fdt, offset, "device_type", "cpu")));
> > @@ -1850,6 +1862,114 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
> >      }
> >  }
> >  
> > +static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
> > +                                            int *fdt_offset)
> > +{
> > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +    DeviceClass *dc = DEVICE_GET_CLASS(cs);
> > +    int id = ppc_get_vcpu_dt_id(cpu);
> > +    void *fdt;
> > +    int offset, fdt_size;
> > +    char *nodename;
> > +
> > +    fdt = create_device_tree(&fdt_size);
> > +    nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
> > +    offset = fdt_add_subnode(fdt, 0, nodename);
> > +
> > +    spapr_populate_cpu_dt(cs, fdt, offset);
> > +    g_free(nodename);
> > +
> > +    *fdt_offset = offset;
> > +    return fdt;
> > +}
> > +
> > +static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > +                            Error **errp)
> > +{
> > +    CPUState *cs = CPU(dev);
> > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +    int id = ppc_get_vcpu_dt_id(cpu);
> > +    sPAPRDRConnector *drc =
> > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> > +    sPAPRDRConnectorClass *drck;
> > +    int smt = kvmppc_smt_threads();
> > +    Error *local_err = NULL;
> > +    void *fdt = NULL;
> > +    int i, fdt_offset = 0;
> > +
> > +    /* Set NUMA node for the added CPUs  */
> > +    for (i = 0; i < nb_numa_nodes; i++) {
> > +        if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> > +            cs->numa_node = i;
> > +            break;
> > +        }
> > +    }
> > +
> > +    /*
> > +     * SMT threads return from here, only main thread (core) will
> > +     * continue and signal hotplug event to the guest.
> > +     */
> > +    if ((id % smt) != 0) {
> > +        return;
> > +    }
> 
> Couldn't you avoid this by attaching this call to the core device,
> rather than the individual vcpu thread objects?

Adding a socket device will result in a realize call for that device.
The socket realizefn will call the core realizefn, and the core realizefn
will call the thread (or CPU) realizefn. device_set_realized() will call
the ->plug handler for all these devices (socket, cores and threads), and
that's how we end up here even for threads.

This will be the same when I get rid of the socket abstraction and hotplug
cores, for the same reason as above.

And calling the ->plug handler in the context of threads is required to
initialize board-specific CPU bits for the CPU thread that is being
realized.
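
Roughly, my reading of the current chain is:

    device_add <socket>
      -> socket realizefn
           -> core realizefn
                -> thread (vCPU) realizefn

    and for each realized device, device_set_realized() invokes the
    machine's ->plug handler; for a thread that is spapr_cpu_plug(),
    which returns early unless (id % smt) == 0.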

> 
> 
> > +    if (!spapr->dr_cpu_enabled) {
> > +        /*
> > +         * This is a cold plugged CPU but the machine doesn't support
> > +         * DR. So skip the hotplug path ensuring that the CPU is brought
> > +         * up online with out an associated DR connector.
> > +         */
> > +        return;
> > +    }
> > +
> > +    g_assert(drc);
> > +
> > +    /*
> > +     * Setup CPU DT entries only for hotplugged CPUs. For boot time or
> > +     * coldplugged CPUs DT entries are setup in spapr_finalize_fdt().
> > +     */
> > +    if (dev->hotplugged) {
> > +        fdt = spapr_populate_hotplug_cpu_dt(dev, cs, &fdt_offset);
> > +    }
> > +
> > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, &local_err);
> > +    if (local_err) {
> > +        g_free(fdt);
> > +        error_propagate(errp, local_err);
> > +        return;
> > +    }
> > +
> > +    /*
> > +     * We send hotplug notification interrupt to the guest only in case
> > +     * of hotplugged CPUs.
> > +     */
> > +    if (dev->hotplugged) {
> > +        spapr_hotplug_req_add_event(drc);
> > +    } else {
> > +        /*
> > +         * HACK to support removal of hotplugged CPU after VM migration:
> > +         *
> > +         * Since we want to be able to hot-remove those coldplugged CPUs
> > +         * started at boot time using -device option at the target VM, we set
> > +         * the right allocation_state and isolation_state for them, which for
> > +         * the hotplugged CPUs would be set via RTAS calls done from the
> > +         * guest during hotplug.
> > +         *
> > +         * This allows the coldplugged CPUs started using -device option to
> > +         * have the right isolation and allocation states as expected by the
> > +         * CPU hot removal code.
> > +         *
> > +         * This hack will be removed once we have DRC states migrated as part
> > +         * of VM migration.
> > +         */
> > +        drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> > +        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> > +    }
> > +
> > +    return;
> > +}
> > +
> >  static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
> >                                        DeviceState *dev, Error **errp)
> >  {
> > @@ -1858,6 +1978,15 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
> >          PowerPCCPU *cpu = POWERPC_CPU(cs);
> >  
> >          spapr_cpu_init(cpu);
> > +        spapr_cpu_reset(cpu);
> 
> I'm a little surprised these get called here, rather than in the
> creation / realize path of the core qdev.

These two routines (spapr_cpu_init and spapr_cpu_reset) are sPAPR or
board specific and I can't even have them as part of ppc_cpu_realizefn.

We had discussed this earlier at
http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04399.html

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine
  2015-05-06  4:28     ` Bharata B Rao
@ 2015-05-06  6:32       ` Thomas Huth
  2015-05-06  8:45         ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: Thomas Huth @ 2015-05-06  6:32 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Wed, 6 May 2015 09:58:09 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> On Mon, May 04, 2015 at 06:10:59PM +0200, Thomas Huth wrote:
> > On Fri, 24 Apr 2015 12:17:28 +0530
> > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> > 
> > > Factor out bits of sPAPR specific CPU initialization code into
> > > a separate routine so that it can be called from CPU hotplug
> > > path too.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > ---
> > >  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++-------------------------
> > >  1 file changed, 29 insertions(+), 25 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index a56f9a1..5c8f2ff 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1440,6 +1440,34 @@ static void spapr_drc_reset(void *opaque)
> > >      }
> > >  }
> > >  
> > > +static void spapr_cpu_init(PowerPCCPU *cpu)
> > > +{
> > > +    CPUPPCState *env = &cpu->env;
> > > +
> > > +    /* Set time-base frequency to 512 MHz */
> > > +    cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> > > +
> > > +    /* PAPR always has exception vectors in RAM not ROM. To ensure this,
> > > +     * MSR[IP] should never be set.
> > > +     */
> > > +    env->msr_mask &= ~(1 << 6);
> > 
> > While you're at it ... could we maybe get a proper #define for that MSR
> > bit? (just like the other ones in target-ppc/cpu.h)
> 
> Sure will use MSR_EP here next time.

According to the comment in cpu.h, the EP bit was for the 601 CPU only,
so I think it would be better to introduce a new #define MSR_IP 6 (or
so) here instead.

 Thomas

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects
  2015-05-05  7:20   ` David Gibson
@ 2015-05-06  6:37     ` Bharata B Rao
  2015-05-07  1:06       ` David Gibson
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  6:37 UTC (permalink / raw)
  To: David Gibson
  Cc: Zhu Guihua, mdroth, aik, agraf, qemu-devel, Chen Fan, qemu-ppc,
	tyreld, nfont, Gu Zheng, imammedo, afaerber

On Tue, May 05, 2015 at 05:20:04PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:39PM +0530, Bharata B Rao wrote:
> > From: Gu Zheng <guz.fnst@cn.fujitsu.com>
> > 
> > In order to deal well with the kvm vcpus (which can not be removed without any
> > protection), we do not close KVM vcpu fd, just record and mark it as stopped
> > into a list, so that we can reuse it for the appending cpu hot-add request if
> > possible. It is also the approach that kvm guys suggested:
> > https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> > 
> > This patch also adds a QOM API object_has_no_children(Object *obj)
> > that checks whether a given object has any child objects. This API
> > is needed to release CPU core and socket objects when a vCPU is destroyed.
> 
> I'm guessing this commit message needs updating, since you seem to
> have split this out into the previous patch.
> 
> > Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> > Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >                [Added core and socket removal bits]
> > ---
> >  cpus.c               | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  include/qom/cpu.h    | 11 +++++++++
> >  include/sysemu/kvm.h |  1 +
> >  kvm-all.c            | 57 +++++++++++++++++++++++++++++++++++++++++++-
> >  kvm-stub.c           |  5 ++++
> >  5 files changed, 140 insertions(+), 1 deletion(-)
> > 
> > diff --git a/cpus.c b/cpus.c
> > index 0fac143..325f8a6 100644
> > --- a/cpus.c
> > +++ b/cpus.c
> > @@ -858,6 +858,47 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
> >      qemu_cpu_kick(cpu);
> >  }
> >  
> > +static void qemu_destroy_cpu_core(Object *core)
> > +{
> > +    Object *socket = core->parent;
> > +
> > +    object_unparent(core);
> > +    if (socket && object_has_no_children(socket)) {
> > +        object_unparent(socket);
> > +    }
> 
> This seems a bit odd to me.  I thought the general idea of the new
> approaches to cpu hotplug meant that the hotplug sequence started from
> the top (the socket or core) and worked down to the threads.  Rather
> than starting at the thread, and working up to the core and socket
> level.

Yes, that's true for both hotplug and hot unplug currently. Plug or
unplug starts at the socket and moves down to cores and threads.

However, when the unplug request comes down to the thread, we have to
destroy the vCPU, and that's when we end up in this part of the code. Here
the thread (vCPU) unparents itself from the core. The core can't unparent
until all its threads have unparented themselves. When all threads of a
core are done unparenting, the core goes ahead and unparents itself from
its parent socket. Similarly, the socket can unparent only when all cores
under it have unparented themselves from the socket.

This is the code that ensures that the socket device object finally gets
cleared and that the id associated with the hot-removed socket device is
available for reuse on the next hotplug.
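
For reference, such a helper can be built as a thin wrapper over the
existing QOM child iterator - a sketch only, the actual patch may differ:

    #include <stdbool.h>
    #include "qom/object.h"

    static int qom_any_child(Object *child, void *opaque)
    {
        return 1;   /* non-zero stops the walk: at least one child exists */
    }

    /* True when @obj has no QOM children left. */
    bool object_has_no_children(Object *obj)
    {
        return object_child_foreach(obj, qom_any_child, NULL) == 0;
    }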

> 
> > +}
> > +
> > +static void qemu_kvm_destroy_vcpu(CPUState *cpu)
> > +{
> > +    Object *thread = OBJECT(cpu);
> > +    Object *core = thread->parent;
> > +
> > +    CPU_REMOVE(cpu);
> > +
> > +    if (kvm_destroy_vcpu(cpu) < 0) {
> > +        error_report("kvm_destroy_vcpu failed.\n");
> > +        exit(EXIT_FAILURE);
> > +    }
> > +
> > +    object_unparent(thread);
> > +    if (core && object_has_no_children(core)) {
> > +        qemu_destroy_cpu_core(core);
> > +    }
> > +}
> > +

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction
  2015-05-06  4:40     ` Bharata B Rao
@ 2015-05-06  6:52       ` Thomas Huth
  0 siblings, 0 replies; 74+ messages in thread
From: Thomas Huth @ 2015-05-06  6:52 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Wed, 6 May 2015 10:10:15 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> On Mon, May 04, 2015 at 05:15:04PM +0200, Thomas Huth wrote:
> > On Fri, 24 Apr 2015 12:17:30 +0530
> > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > Signed-off-by: Andreas Färber <afaerber@suse.de>
> > 
> > Not sure if QEMU is fully following the kernel Sob-process, but if
> > that's the case, I think your Sob should be below the one of Andreas in
> > case you're the last person who touched the patch (and I think that's
> > the case here since you've sent it out)?
> 
> The patch is from me, but I included Andreas' sob and copyrights since
> the code was derived from his equivalent x86 implementation.

I am neither a lawyer nor an expert here, but I think that if the patch
itself never really passed through the hands of Andreas, it should not
include an Sob from him. An Sob is like a signature - you cannot/should
not add an Sob on somebody else's behalf; everybody has to add their own
personally.

So if you copied code from another file, simply state that clearly in the
patch description and make sure that the original author is on "CC:". You
can also explicitly put a "Cc: Original Author <e@mail>" line after your
Sob so that the "Cc:" information is retained in the patch description.

 Thomas

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support
  2015-05-05  7:28   ` David Gibson
@ 2015-05-06  7:55     ` Bharata B Rao
  2015-05-07  1:09       ` David Gibson
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  7:55 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Tue, May 05, 2015 at 05:28:38PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:42PM +0530, Bharata B Rao wrote:
> > Support hot removal of CPU for sPAPR guests by sending the hot unplug
> > notification to the guest via EPOW interrupt. Release the vCPU object
> > after CPU hot unplug so that vCPU fd can be parked and reused.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> >  hw/ppc/spapr.c              | 101 +++++++++++++++++++++++++++++++++++++++++++-
> >  target-ppc/translate_init.c |  10 +++++
> >  2 files changed, 110 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 9b0701c..910a50f 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -1481,6 +1481,12 @@ static void spapr_cpu_init(PowerPCCPU *cpu)
> >      qemu_register_reset(spapr_cpu_reset, cpu);
> >  }
> >  
> > +static void spapr_cpu_destroy(PowerPCCPU *cpu)
> > +{
> > +    xics_cpu_destroy(spapr->icp, cpu);
> > +    qemu_unregister_reset(spapr_cpu_reset, cpu);
> > +}
> > +
> >  /* pSeries LPAR / sPAPR hardware init */
> >  static void ppc_spapr_init(MachineState *machine)
> >  {
> > @@ -1883,6 +1889,24 @@ static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
> >      return fdt;
> >  }
> >  
> > +static void spapr_cpu_release(DeviceState *dev, void *opaque)
> > +{
> > +    CPUState *cs;
> > +    int i;
> > +    int id = ppc_get_vcpu_dt_id(POWERPC_CPU(CPU(dev)));
> > +
> > +    for (i = id; i < id + smp_threads; i++) {
> > +        CPU_FOREACH(cs) {
> > +            PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +
> > +            if (i == ppc_get_vcpu_dt_id(cpu)) {
> > +                spapr_cpu_destroy(cpu);
> > +                cpu_remove(cs);
> > +            }
> > +        }
> > +    }
> > +}
> > +
> >  static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                              Error **errp)
> >  {
> > @@ -1970,6 +1994,59 @@ static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      return;
> >  }
> >  
> > +static int spapr_cpu_unplug(Object *obj, void *opaque)
> > +{
> > +    Error **errp = opaque;
> > +    DeviceState *dev = DEVICE(obj);
> > +    CPUState *cs = CPU(dev);
> > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +    int id = ppc_get_vcpu_dt_id(cpu);
> > +    int smt = kvmppc_smt_threads();
> > +    sPAPRDRConnector *drc =
> > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> > +    sPAPRDRConnectorClass *drck;
> > +    Error *local_err = NULL;
> > +
> > +    /*
> > +     * SMT threads return from here, only main thread (core) will
> > +     * continue and signal hot unplug event to the guest.
> > +     */
> > +    if ((id % smt) != 0) {
> > +        return 0;
> > +    }
> 
> As with the in-plug side couldn't this be done more naturally by
> attaching this function to the cpu core object rather than the thread?

When I switch to core level addition in the next version, I will see
how best this can be handled.

> 
> > +    g_assert(drc);
> > +
> > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > +    drck->detach(drc, dev, spapr_cpu_release, NULL, &local_err);
> > +    if (local_err) {
> > +        error_propagate(errp, local_err);
> > +        return -1;
> > +    }
> > +
> > +    /*
> > +     * In addition to hotplugged CPUs, send the hot-unplug notification
> > +     * interrupt to the guest for coldplugged CPUs started via -device
> > +     * option too.
> > +     */
> > +    spapr_hotplug_req_remove_event(drc);
> 
> Um.. doesn't the remove notification need to go *before* the
> "physical" unplug?  Along with a wait for acknowledgement from the
> guest?

If I am reading it right, PCI hotplug is also following the same order
(detach followed by guest notification).

Let me experiment with this in the next version.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space
  2015-05-05  7:33   ` David Gibson
@ 2015-05-06  7:58     ` Bharata B Rao
  0 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  7:58 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Tue, May 05, 2015 at 05:33:33PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:43PM +0530, Bharata B Rao wrote:
> > Initialize a hotplug memory region under which all the hotplugged
> > memory is accommodated. Also enable memory hotplug by setting
> > CONFIG_MEM_HOTPLUG.
> > 
> > Modelled on i386 memory hotplug.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> So, the cpu hotplug stuff has been caught up in these generic level
> design discussions.  Could you split out the memory hotplug part of
> this series so that we might be able to move forwards with that while
> the cpu hotplug discussion continues?

Yes, in fact I too was planning to split memory and push it separately.

> 
> > ---
> >  default-configs/ppc64-softmmu.mak |  1 +
> >  hw/ppc/spapr.c                    | 38 ++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h            | 12 ++++++++++++
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > index 22ef132..16b3011 100644
> > --- a/default-configs/ppc64-softmmu.mak
> > +++ b/default-configs/ppc64-softmmu.mak
> > @@ -51,3 +51,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
> >  # For PReP
> >  CONFIG_MC146818RTC=y
> >  CONFIG_ISA_TESTDEV=y
> > +CONFIG_MEM_HOTPLUG=y
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 910a50f..9dc4c36 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -125,6 +125,9 @@ struct sPAPRMachineState {
> >  
> >      /*< public >*/
> >      char *kvm_type;
> > +    ram_addr_t hotplug_memory_base;
> > +    MemoryRegion hotplug_memory;
> > +    bool enforce_aligned_dimm;
> 
> As mentioned on the earlier version, on ppc we don't have
> compatibility reasons we need to have a mode for unaligned dimms.  Get
> rid of this bool and treat it as always on instead.

ok.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space
  2015-05-05  8:48   ` Igor Mammedov
@ 2015-05-06  8:23     ` Bharata B Rao
  2015-05-07  1:12       ` David Gibson
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  8:23 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont, afaerber, david

On Tue, May 05, 2015 at 10:48:50AM +0200, Igor Mammedov wrote:
> On Fri, 24 Apr 2015 12:17:43 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> 
> > Initialize a hotplug memory region under which all the hotplugged
> > memory is accommodated. Also enable memory hotplug by setting
> > CONFIG_MEM_HOTPLUG.
> > 
> > Modelled on i386 memory hotplug.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > ---
> >  default-configs/ppc64-softmmu.mak |  1 +
> >  hw/ppc/spapr.c                    | 38 ++++++++++++++++++++++++++++++++++++++
> >  include/hw/ppc/spapr.h            | 12 ++++++++++++
> >  3 files changed, 51 insertions(+)
> > 
> > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > index 22ef132..16b3011 100644
> > --- a/default-configs/ppc64-softmmu.mak
> > +++ b/default-configs/ppc64-softmmu.mak
> > @@ -51,3 +51,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
> >  # For PReP
> >  CONFIG_MC146818RTC=y
> >  CONFIG_ISA_TESTDEV=y
> > +CONFIG_MEM_HOTPLUG=y
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index 910a50f..9dc4c36 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -125,6 +125,9 @@ struct sPAPRMachineState {
> >  
> >      /*< public >*/
> >      char *kvm_type;
> > +    ram_addr_t hotplug_memory_base;
> > +    MemoryRegion hotplug_memory;
> > +    bool enforce_aligned_dimm;
> >  };
> >  
> >  sPAPREnvironment *spapr;
> > @@ -1514,6 +1517,7 @@ static void ppc_spapr_init(MachineState *machine)
> >      QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
> >      int sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
> >      int cores = (smp_cpus/smp_threads) ? smp_cpus/smp_threads : 1;
> > +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
> >  
> >      sockets = sockets ? sockets : cores;
> >      msi_supported = true;
> > @@ -1613,6 +1617,36 @@ static void ppc_spapr_init(MachineState *machine)
> >          memory_region_add_subregion(sysmem, 0, rma_region);
> >      }
> >  
> > +    /* initialize hotplug memory address space */
> > +    if (machine->ram_size < machine->maxram_size) {
> > +        ram_addr_t hotplug_mem_size =
> > +            machine->maxram_size - machine->ram_size;
> > +
> > +        if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
> > +            error_report("unsupported amount of memory slots: %"PRIu64,
> > +                         machine->ram_slots);
> > +            exit(EXIT_FAILURE);
> > +        }
> > +
> > +        ms->hotplug_memory_base = ROUND_UP(machine->ram_size,
> > +                                    SPAPR_HOTPLUG_MEM_ALIGN);
> > +
> > +        if (ms->enforce_aligned_dimm) {
> > +            hotplug_mem_size += SPAPR_HOTPLUG_MEM_ALIGN * machine->ram_slots;
> > +        }
> > +
> > +        if ((ms->hotplug_memory_base + hotplug_mem_size) < hotplug_mem_size) {
> > +            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
> > +                         machine->maxram_size);
> > +            exit(EXIT_FAILURE);
> > +        }
> > +
> > +        memory_region_init(&ms->hotplug_memory, OBJECT(ms),
> > +                           "hotplug-memory", hotplug_mem_size);
> > +        memory_region_add_subregion(sysmem, ms->hotplug_memory_base,
> > +                                    &ms->hotplug_memory);
> > +    }
> > +
> >      filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
> >      spapr->rtas_size = get_image_size(filename);
> >      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> > @@ -1844,11 +1878,15 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
> >  
> >  static void spapr_machine_initfn(Object *obj)
> >  {
> > +    sPAPRMachineState *ms = SPAPR_MACHINE(obj);
> > +
> >      object_property_add_str(obj, "kvm-type",
> >                              spapr_get_kvm_type, spapr_set_kvm_type, NULL);
> >      object_property_set_description(obj, "kvm-type",
> >                                      "Specifies the KVM virtualization mode (HV, PR)",
> >                                      NULL);
> > +
> > +    ms->enforce_aligned_dimm = true;
> >  }
> >  
> >  static void ppc_cpu_do_nmi_on_cpu(void *arg)
> > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > index ecac6e3..53560e9 100644
> > --- a/include/hw/ppc/spapr.h
> > +++ b/include/hw/ppc/spapr.h
> > @@ -542,6 +542,18 @@ struct sPAPREventLogEntry {
> >  
> >  #define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
> >  
> > +/*
> > + * This defines the maximum number of DIMM slots we can have for sPAPR
> > + * guest. This is not defined by sPAPR but we are defining it to 4096 slots
> > + * here. With the worst case addition of SPAPR_MEMORY_BLOCK_SIZE
> > + * (256MB) memory per slot, we should be able to support 1TB of guest
> > + * hotpluggable memory.
> > + */
> > +#define SPAPR_MAX_RAM_SLOTS     (1ULL << 12)
> why not write 4096 instead of (1ULL << 12), much easier to read.

Sure.

> 
> BTW:
> KVM supports upto 509 memory slots including slots consumed by
> initial memory.

I see that PowerPC defaults to 32 slots, so having 4096 slots is really
pointless then? To ensure that more hot-pluggable memory space is
available, should I increase the size of the minimum pluggable memory in
a DIMM slot (as defined by SPAPR_MEMORY_BLOCK_SIZE above)?
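
To put rough numbers on it (worst case of one 256MB block per slot):

    4096 slots x 256MB = 1TB      (theoretical maximum from the comment)
     509 slots x 256MB = ~127GB   (KVM ceiling, before subtracting the
                                   slots already used by boot memory)
      32 slots x 256MB = 8GB      (with the current 32-slot default)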

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm,dynamic-reconfiguration-memory
  2015-05-05  7:40   ` David Gibson
@ 2015-05-06  8:27     ` Bharata B Rao
  2015-05-07  1:13       ` David Gibson
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  8:27 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, Nikunj A. Dadhania, aik, agraf, qemu-devel, qemu-ppc,
	tyreld, nfont, imammedo, afaerber

On Tue, May 05, 2015 at 05:40:32PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:45PM +0530, Bharata B Rao wrote:
> > Parse ibm,architecture.vec table obtained from the guest and enable
> > memory node configuration via ibm,dynamic-reconfiguration-memory if guest
> > supports it. This is in preparation to support memory hotplug for
> > sPAPR guests.
> > 
> > This changes the way memory node configuration is done. Currently all
> > memory nodes are built upfront. But after this patch, only memory@0 node
> > for RMA is built upfront. Guest kernel boots with just that and rest of
> > the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory)
> > are built when guest does ibm,client-architecture-support call.
> > 
> > Note: This patch needs a SLOF enhancement which is already part of
> > upstream SLOF.
> 
> Is it in the SLOF included in the qemu submodule though?  If not you
> should have a patch to update the submodule first.

Nikunj confirms that the SLOF change needed to support
ibm,dynamic-reconfiguration-memory is already part of the SLOF included
in QEMU.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 24/24] spapr: Memory hotplug support
  2015-05-05  7:45   ` David Gibson
@ 2015-05-06  8:30     ` Bharata B Rao
  0 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  8:30 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Tue, May 05, 2015 at 05:45:01PM +1000, David Gibson wrote:
> On Fri, Apr 24, 2015 at 12:17:46PM +0530, Bharata B Rao wrote:
> > Make use of pc-dimm infrastructure to support memory hotplug
> > for PowerPC.
> > 
> > Modelled on i386 memory hotplug.
> 
> Can the previous patch actually do anything without this one?

Yes, the previous patch adds the ibm,dynamic-reconfiguration-memory node
and is self-contained.

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine
  2015-05-06  6:32       ` Thomas Huth
@ 2015-05-06  8:45         ` Bharata B Rao
  2015-05-06  9:37           ` Thomas Huth
  0 siblings, 1 reply; 74+ messages in thread
From: Bharata B Rao @ 2015-05-06  8:45 UTC (permalink / raw)
  To: Thomas Huth
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Wed, May 06, 2015 at 08:32:03AM +0200, Thomas Huth wrote:
> On Wed, 6 May 2015 09:58:09 +0530
> Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> 
> > On Mon, May 04, 2015 at 06:10:59PM +0200, Thomas Huth wrote:
> > > On Fri, 24 Apr 2015 12:17:28 +0530
> > > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> > > 
> > > > Factor out bits of sPAPR specific CPU initialization code into
> > > > a separate routine so that it can be called from CPU hotplug
> > > > path too.
> > > > 
> > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > > ---
> > > >  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++-------------------------
> > > >  1 file changed, 29 insertions(+), 25 deletions(-)
> > > > 
> > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > index a56f9a1..5c8f2ff 100644
> > > > --- a/hw/ppc/spapr.c
> > > > +++ b/hw/ppc/spapr.c
> > > > @@ -1440,6 +1440,34 @@ static void spapr_drc_reset(void *opaque)
> > > >      }
> > > >  }
> > > >  
> > > > +static void spapr_cpu_init(PowerPCCPU *cpu)
> > > > +{
> > > > +    CPUPPCState *env = &cpu->env;
> > > > +
> > > > +    /* Set time-base frequency to 512 MHz */
> > > > +    cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> > > > +
> > > > +    /* PAPR always has exception vectors in RAM not ROM. To ensure this,
> > > > +     * MSR[IP] should never be set.
> > > > +     */
> > > > +    env->msr_mask &= ~(1 << 6);
> > > 
> > > While you're at it ... could we maybe get a proper #define for that MSR
> > > bit? (just like the other ones in target-ppc/cpu.h)
> > 
> > Sure will use MSR_EP here next time.
> 
> According to the comment in cpu.h, the EP bit was for the 601 CPU only,
> so I think it would be better to introduce a new #define MSR_IP 6 (or
> so) here instead.

Kernel defines bit 6 as
#define MSR_IP_LG       6               /* Exception prefix 0x000/0xFFF */
(arch/powerpc/include/asm/reg.h)

QEMU defines it as
#define MSR_EP   6  /* Exception prefix on 601                               */

I can add MSR_IP in QEMU, but that would mean two defines for the same bit
position. I think MSR_IP_LG in the kernel and MSR_EP in QEMU both mean the
same thing, just named differently.
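
Either way the generated code is identical; a standalone toy (not QEMU
code) just to show the bit manipulation in question:

    #include <stdio.h>

    #define MSR_EP 6    /* same bit the kernel calls MSR_IP_LG */

    int main(void)
    {
        unsigned long msr_mask = ~0UL;

        msr_mask &= ~(1UL << 6);        /* current open-coded form */
        msr_mask &= ~(1UL << MSR_EP);   /* equivalent, self-documenting */

        printf("MSR bit %d is now %lu\n", MSR_EP,
               (msr_mask >> MSR_EP) & 1UL);   /* prints 0 */
        return 0;
    }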

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine
  2015-05-06  8:45         ` Bharata B Rao
@ 2015-05-06  9:37           ` Thomas Huth
  0 siblings, 0 replies; 74+ messages in thread
From: Thomas Huth @ 2015-05-06  9:37 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, imammedo,
	nfont, afaerber, david

On Wed, 6 May 2015 14:15:37 +0530
Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:

> On Wed, May 06, 2015 at 08:32:03AM +0200, Thomas Huth wrote:
> > On Wed, 6 May 2015 09:58:09 +0530
> > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> > 
> > > On Mon, May 04, 2015 at 06:10:59PM +0200, Thomas Huth wrote:
> > > > On Fri, 24 Apr 2015 12:17:28 +0530
> > > > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> > > > 
> > > > > Factor out bits of sPAPR specific CPU initialization code into
> > > > > a separate routine so that it can be called from CPU hotplug
> > > > > path too.
> > > > > 
> > > > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > > > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > ---
> > > > >  hw/ppc/spapr.c | 54 +++++++++++++++++++++++++++++-------------------------
> > > > >  1 file changed, 29 insertions(+), 25 deletions(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index a56f9a1..5c8f2ff 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -1440,6 +1440,34 @@ static void spapr_drc_reset(void *opaque)
> > > > >      }
> > > > >  }
> > > > >  
> > > > > +static void spapr_cpu_init(PowerPCCPU *cpu)
> > > > > +{
> > > > > +    CPUPPCState *env = &cpu->env;
> > > > > +
> > > > > +    /* Set time-base frequency to 512 MHz */
> > > > > +    cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> > > > > +
> > > > > +    /* PAPR always has exception vectors in RAM not ROM. To ensure this,
> > > > > +     * MSR[IP] should never be set.
> > > > > +     */
> > > > > +    env->msr_mask &= ~(1 << 6);
> > > > 
> > > > While you're at it ... could we maybe get a proper #define for that MSR
> > > > bit? (just like the other ones in target-ppc/cpu.h)
> > > 
> > > Sure will use MSR_EP here next time.
> > 
> > According to the comment in cpu.h, the EP bit was for the 601 CPU only,
> > so I think it would be better to introduce a new #define MSR_IP 6 (or
> > so) here instead.
> 
> Kernel defines bit 6 as
> #define MSR_IP_LG       6               /* Exception prefix 0x000/0xFFF */
> (arch/powerpc/include/asm/reg.h)
> 
> QEMU defines it as
> #define MSR_EP   6  /* Exception prefix on 601                               */
> 
> I can add MSR_IP in QEMU, but that will mean two defines for same bit position,
> but I think MSR_IP_LG in kernel or MSR_EP in QEMU both mean the same, but
> called differently.

Ok, so EP = IP ... then I also think it's fine to use the MSR_EP define
here. I first thought that EP and IP would mean something different,
since a lot of MSR bits are defined differently on the various
POWER/PowerPC chips (see the MSR bit 10 for example, it has three
defines in cpu.h), but in this case they really seem to be the same.

By the way, is this bit still used at all on recent chips (which are
used for the spapr machine)? It's apparently not defined in the PowerISA
spec anymore...

 Thomas

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support
  2015-05-06  6:14     ` Bharata B Rao
@ 2015-05-07  1:03       ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-07  1:03 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

[-- Attachment #1: Type: text/plain, Size: 8525 bytes --]

On Wed, May 06, 2015 at 11:44:20AM +0530, Bharata B Rao wrote:
> On Tue, May 05, 2015 at 04:59:51PM +1000, David Gibson wrote:
> > On Fri, Apr 24, 2015 at 12:17:34PM +0530, Bharata B Rao wrote:
> > > Support CPU hotplug via device-add command. Set up device tree
> > > entries for the hotplugged CPU core and use the exising EPOW event
> > > infrastructure to send CPU hotplug notification to the guest.
> > > 
> > > Also support cold plugged CPUs that are specified by -device option
> > > on cmdline.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr.c        | 129 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  hw/ppc/spapr_events.c |   8 ++--
> > >  hw/ppc/spapr_rtas.c   |  11 +++++
> > >  3 files changed, 145 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index b526b7d..9b0701c 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -33,6 +33,7 @@
> > >  #include "sysemu/block-backend.h"
> > >  #include "sysemu/cpus.h"
> > >  #include "sysemu/kvm.h"
> > > +#include "sysemu/device_tree.h"
> > >  #include "kvm_ppc.h"
> > >  #include "mmu-hash64.h"
> > >  #include "qom/cpu.h"
> > > @@ -662,6 +663,17 @@ static void spapr_populate_cpu_dt(CPUState *cs, void *fdt, int offset)
> > >      unsigned sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
> > >      uint32_t cpus_per_socket = sockets ? (smp_cpus / sockets) : 1;
> > >      uint32_t pft_size_prop[] = {0, cpu_to_be32(spapr->htab_shift)};
> > > +    sPAPRDRConnector *drc;
> > > +    sPAPRDRConnectorClass *drck;
> > > +    int drc_index;
> > > +
> > > +    if (spapr->dr_cpu_enabled) {
> > > +        drc = spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, index);
> > > +        g_assert(drc);
> > > +        drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +        drc_index = drck->get_index(drc);
> > > +        _FDT((fdt_setprop_cell(fdt, offset, "ibm,my-drc-index", drc_index)));
> > > +    }
> > >  
> > >      _FDT((fdt_setprop_cell(fdt, offset, "reg", index)));
> > >      _FDT((fdt_setprop_string(fdt, offset, "device_type", "cpu")));
> > > @@ -1850,6 +1862,114 @@ static void spapr_nmi(NMIState *n, int cpu_index, Error **errp)
> > >      }
> > >  }
> > >  
> > > +static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
> > > +                                            int *fdt_offset)
> > > +{
> > > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > +    DeviceClass *dc = DEVICE_GET_CLASS(cs);
> > > +    int id = ppc_get_vcpu_dt_id(cpu);
> > > +    void *fdt;
> > > +    int offset, fdt_size;
> > > +    char *nodename;
> > > +
> > > +    fdt = create_device_tree(&fdt_size);
> > > +    nodename = g_strdup_printf("%s@%x", dc->fw_name, id);
> > > +    offset = fdt_add_subnode(fdt, 0, nodename);
> > > +
> > > +    spapr_populate_cpu_dt(cs, fdt, offset);
> > > +    g_free(nodename);
> > > +
> > > +    *fdt_offset = offset;
> > > +    return fdt;
> > > +}
> > > +
> > > +static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > +                            Error **errp)
> > > +{
> > > +    CPUState *cs = CPU(dev);
> > > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > +    int id = ppc_get_vcpu_dt_id(cpu);
> > > +    sPAPRDRConnector *drc =
> > > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> > > +    sPAPRDRConnectorClass *drck;
> > > +    int smt = kvmppc_smt_threads();
> > > +    Error *local_err = NULL;
> > > +    void *fdt = NULL;
> > > +    int i, fdt_offset = 0;
> > > +
> > > +    /* Set NUMA node for the added CPUs  */
> > > +    for (i = 0; i < nb_numa_nodes; i++) {
> > > +        if (test_bit(cs->cpu_index, numa_info[i].node_cpu)) {
> > > +            cs->numa_node = i;
> > > +            break;
> > > +        }
> > > +    }
> > > +
> > > +    /*
> > > +     * SMT threads return from here, only main thread (core) will
> > > +     * continue and signal hotplug event to the guest.
> > > +     */
> > > +    if ((id % smt) != 0) {
> > > +        return;
> > > +    }
> > 
> > Couldn't you avoid this by attaching this call to the core device,
> > rather than the individual vcpu thread objects?
> 
> Adding a socket device will result in realize call for that device.
> Socket realizefn will call core realizefn and core realizefn will call
> thread (or CPU) realizefn. device_set_realized() will call ->plug handler
> for all these devices (socket, cores and threads) and that's how we
> end up here even for threads.
> 
> This will be same when I get rid of socket abstraction and hot plug
> cores for the same reason as above.
> 
> And calling ->plug handler in the context of threads is required
> to initialize board specific CPU bits for the CPU thread that is being
> realized.

Right, but can't you do the thread specific initialization from the
thread hotplug path, and the core specific initialization (basically
everything below this if) from the core hotplug path?

> 
> > 
> > 
> > > +    if (!spapr->dr_cpu_enabled) {
> > > +        /*
> > > +         * This is a cold plugged CPU but the machine doesn't support
> > > +         * DR. So skip the hotplug path ensuring that the CPU is brought
> > > +         * up online with out an associated DR connector.
> > > +         */
> > > +        return;
> > > +    }
> > > +
> > > +    g_assert(drc);
> > > +
> > > +    /*
> > > +     * Setup CPU DT entries only for hotplugged CPUs. For boot time or
> > > +     * coldplugged CPUs DT entries are setup in spapr_finalize_fdt().
> > > +     */
> > > +    if (dev->hotplugged) {
> > > +        fdt = spapr_populate_hotplug_cpu_dt(dev, cs, &fdt_offset);
> > > +    }
> > > +
> > > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +    drck->attach(drc, dev, fdt, fdt_offset, !dev->hotplugged, &local_err);
> > > +    if (local_err) {
> > > +        g_free(fdt);
> > > +        error_propagate(errp, local_err);
> > > +        return;
> > > +    }
> > > +
> > > +    /*
> > > +     * We send hotplug notification interrupt to the guest only in case
> > > +     * of hotplugged CPUs.
> > > +     */
> > > +    if (dev->hotplugged) {
> > > +        spapr_hotplug_req_add_event(drc);
> > > +    } else {
> > > +        /*
> > > +         * HACK to support removal of hotplugged CPU after VM migration:
> > > +         *
> > > +         * Since we want to be able to hot-remove those coldplugged CPUs
> > > +         * started at boot time using -device option at the target VM, we set
> > > +         * the right allocation_state and isolation_state for them, which for
> > > +         * the hotplugged CPUs would be set via RTAS calls done from the
> > > +         * guest during hotplug.
> > > +         *
> > > +         * This allows the coldplugged CPUs started using -device option to
> > > +         * have the right isolation and allocation states as expected by the
> > > +         * CPU hot removal code.
> > > +         *
> > > +         * This hack will be removed once we have DRC states migrated as part
> > > +         * of VM migration.
> > > +         */
> > > +        drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> > > +        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> > > +    }
> > > +
> > > +    return;
> > > +}
> > > +
> > >  static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
> > >                                        DeviceState *dev, Error **errp)
> > >  {
> > > @@ -1858,6 +1978,15 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
> > >          PowerPCCPU *cpu = POWERPC_CPU(cs);
> > >  
> > >          spapr_cpu_init(cpu);
> > > +        spapr_cpu_reset(cpu);
> > 
> > I'm a little surprised these get called here, rather than in the
> > creation / realize path of the core qdev.
> 
> These two routines (spapr_cpu_init and spapr_cpu_reset) are sPAPR or
> board specific and I can't even have them as part of ppc_cpu_realizefn.

Oh, yes of course.  Sorry.

> We had discussed this earlier at
> http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04399.html
> 
> Regards,
> Bharata.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects
  2015-05-06  6:37     ` Bharata B Rao
@ 2015-05-07  1:06       ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-07  1:06 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Zhu Guihua, mdroth, aik, agraf, qemu-devel, Chen Fan, qemu-ppc,
	tyreld, nfont, Gu Zheng, imammedo, afaerber

On Wed, May 06, 2015 at 12:07:57PM +0530, Bharata B Rao wrote:
> On Tue, May 05, 2015 at 05:20:04PM +1000, David Gibson wrote:
> > On Fri, Apr 24, 2015 at 12:17:39PM +0530, Bharata B Rao wrote:
> > > From: Gu Zheng <guz.fnst@cn.fujitsu.com>
> > > 
> > > In order to deal safely with KVM vCPUs (which cannot be removed without any
> > > protection), we do not close the KVM vCPU fd; instead we record it in a list
> > > and mark it as stopped, so that it can be reused for a subsequent CPU hot-add
> > > request if possible. This is also the approach that the KVM developers
> > > suggested:
> > > https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> > > 
> > > This patch also adds a QOM API object_has_no_children(Object *obj)
> > > that checks whether a given object has any child objects. This API
> > > is needed to release CPU core and socket objects when a vCPU is destroyed.
> > 
> > I'm guessing this commit message needs updating, since you seem to
> > have split this out into the previous patch.
> > 
> > > Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > > Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
> > > Signed-off-by: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > >                [Added core and socket removal bits]
> > > ---
> > >  cpus.c               | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/qom/cpu.h    | 11 +++++++++
> > >  include/sysemu/kvm.h |  1 +
> > >  kvm-all.c            | 57 +++++++++++++++++++++++++++++++++++++++++++-
> > >  kvm-stub.c           |  5 ++++
> > >  5 files changed, 140 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/cpus.c b/cpus.c
> > > index 0fac143..325f8a6 100644
> > > --- a/cpus.c
> > > +++ b/cpus.c
> > > @@ -858,6 +858,47 @@ void async_run_on_cpu(CPUState *cpu, void (*func)(void *data), void *data)
> > >      qemu_cpu_kick(cpu);
> > >  }
> > >  
> > > +static void qemu_destroy_cpu_core(Object *core)
> > > +{
> > > +    Object *socket = core->parent;
> > > +
> > > +    object_unparent(core);
> > > +    if (socket && object_has_no_children(socket)) {
> > > +        object_unparent(socket);
> > > +    }
> > 
> > This seems a bit odd to me.  I thought the general idea of the new
> > approaches to cpu hotplug meant that the hotplug sequence started from
> > the top (the socket or core) and worked down to the threads.  Rather
> > than starting at the thread, and working up to the core and socket
> > level.
> 
> Yes that's true for hotplug as well as hot unplug currently. Plug or
> unplug starts at the socket and moves down to the cores and threads.
> 
> However when the unplug request comes down to the thread, we have to
> destroy the vCPU and that's when we end up in this part of the code. Here
> the thread (vCPU) unparents itself from the core. The core can't unparent
> until all its threads have unparented themselves. When all threads of a
> core are done unparenting, the core goes ahead and unparents itself from
> its parent socket. Similarly, the socket can unparent when all cores under
> it have unparented themselves from the socket.

Why can't the core unplug routine propagate the unplug down to the
threads, let that complete, then do the per-core unplug stuff and
remove itself?

Is there an asynchronous callback in here somewhere?
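
For what it's worth, a minimal sketch of that top-down flow, assuming the
core object keeps pointers to its thread CPUs and that a synchronous
cpu_remove_sync() helper exists (both are assumptions for illustration, not
code from this series):

    typedef struct sPAPRCPUCore {
        DeviceState parent_obj;
        CPUState *threads[8];           /* filled in at realize time */
        int nr_threads;
    } sPAPRCPUCore;

    static void spapr_core_unplug(sPAPRCPUCore *core)
    {
        int i;

        /* tear the threads down first... */
        for (i = 0; i < core->nr_threads; i++) {
            CPUState *cs = core->threads[i];

            spapr_cpu_destroy(POWERPC_CPU(cs));
            cpu_remove_sync(cs);        /* assumed synchronous variant */
            object_unparent(OBJECT(cs));
        }

        /* ...then do the per-core cleanup and remove the core itself,
         * so nothing has to walk back up from the thread level */
        object_unparent(OBJECT(core));
    }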

> This is the code that ensures that the socket device object finally
> gets cleared and the id associated with the hot removed socket device
> is available for reuse with next hotplug.
> 
> > 
> > > +}
> > > +
> > > +static void qemu_kvm_destroy_vcpu(CPUState *cpu)
> > > +{
> > > +    Object *thread = OBJECT(cpu);
> > > +    Object *core = thread->parent;
> > > +
> > > +    CPU_REMOVE(cpu);
> > > +
> > > +    if (kvm_destroy_vcpu(cpu) < 0) {
> > > +        error_report("kvm_destroy_vcpu failed.\n");
> > > +        exit(EXIT_FAILURE);
> > > +    }
> > > +
> > > +    object_unparent(thread);
> > > +    if (core && object_has_no_children(core)) {
> > > +        qemu_destroy_cpu_core(core);
> > > +    }
> > > +}
> > > +
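
The hunk above only shows the cpus.c side; the kvm-all.c half of the parking
scheme described in the commit message is not quoted here. A minimal sketch
of that idea (the list, struct and field names are illustrative assumptions,
not necessarily what the patch itself uses):

    struct KVMParkedVcpu {
        unsigned long vcpu_id;
        int kvm_fd;
        QLIST_ENTRY(KVMParkedVcpu) node;
    };

    int kvm_destroy_vcpu(CPUState *cpu)
    {
        KVMState *s = kvm_state;
        struct KVMParkedVcpu *vcpu;
        long mmap_size;

        /* unmap the shared kvm_run area... */
        mmap_size = kvm_ioctl(s, KVM_GET_VCPU_MMAP_SIZE, 0);
        if (mmap_size > 0) {
            munmap(cpu->kvm_run, mmap_size);
        }

        /* ...but keep the vcpu fd open: park it so that a later hot-add
         * with the same id can reuse it instead of creating a new vcpu */
        vcpu = g_malloc0(sizeof(*vcpu));
        vcpu->vcpu_id = kvm_arch_vcpu_id(cpu);
        vcpu->kvm_fd = cpu->kvm_fd;
        QLIST_INSERT_HEAD(&s->kvm_parked_vcpus, vcpu, node);
        return 0;
    }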
> 
> Regards,
> Bharata.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled
  2015-05-06  5:42     ` Bharata B Rao
@ 2015-05-07  1:07       ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-07  1:07 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Wed, May 06, 2015 at 11:12:00AM +0530, Bharata B Rao wrote:
> On Tue, May 05, 2015 at 05:22:52PM +1000, David Gibson wrote:
> > On Fri, Apr 24, 2015 at 12:17:40PM +0530, Bharata B Rao wrote:
> > > When supporting CPU hot removal by parking the vCPU fd and reusing
> > > it during hotplug again, there can be cases where we try to re-enable the
> > > KVM_CAP_IRQ_XICS capability for a vCPU for which it was already enabled.
> > > Introduce a boolean member in ICPState to track this and don't re-enable
> > > the capability if it was already enabled earlier.
> > > 
> > > This change allows CPU hot removal to work for sPAPR.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > 
> > Do you actually need this?  Is there any harm in setting the
> > capability multiple times, or could you just ignore the "already set"
> > error?
> 
> We discussed this last time and concluded that this patch is needed.
> 
> Ref: http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg05361.html

Ah, ok.  Can you include the explanation from that message into the
commit message so I don't forget next time (and for the benefit of
other reviewers).
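
For reference, the guard being discussed amounts to something like the
following in the xics_kvm.c vCPU setup path (a sketch only; the flag name
follows the patch description above and the surrounding code is assumed):

    if (ss->cap_irq_xics_enabled) {
        return;     /* parked fd being reused: the cap is already set */
    }

    ret = kvm_vcpu_enable_cap(cs, KVM_CAP_IRQ_XICS, 0, kernel_xics_fd,
                              kvm_arch_vcpu_id(cs));
    if (ret < 0) {
        error_report("Unable to connect CPU%ld to kernel XICS: %s",
                     kvm_arch_vcpu_id(cs), strerror(errno));
        exit(1);
    }
    ss->cap_irq_xics_enabled = true;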

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support
  2015-05-06  7:55     ` Bharata B Rao
@ 2015-05-07  1:09       ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-07  1:09 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	imammedo, afaerber

On Wed, May 06, 2015 at 01:25:55PM +0530, Bharata B Rao wrote:
> On Tue, May 05, 2015 at 05:28:38PM +1000, David Gibson wrote:
> > On Fri, Apr 24, 2015 at 12:17:42PM +0530, Bharata B Rao wrote:
> > > Support hot removal of CPUs for sPAPR guests by sending the hot unplug
> > > notification to the guest via an EPOW interrupt. Release the vCPU object
> > > after CPU hot unplug so that the vCPU fd can be parked and reused.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > ---
> > >  hw/ppc/spapr.c              | 101 +++++++++++++++++++++++++++++++++++++++++++-
> > >  target-ppc/translate_init.c |  10 +++++
> > >  2 files changed, 110 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 9b0701c..910a50f 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -1481,6 +1481,12 @@ static void spapr_cpu_init(PowerPCCPU *cpu)
> > >      qemu_register_reset(spapr_cpu_reset, cpu);
> > >  }
> > >  
> > > +static void spapr_cpu_destroy(PowerPCCPU *cpu)
> > > +{
> > > +    xics_cpu_destroy(spapr->icp, cpu);
> > > +    qemu_unregister_reset(spapr_cpu_reset, cpu);
> > > +}
> > > +
> > >  /* pSeries LPAR / sPAPR hardware init */
> > >  static void ppc_spapr_init(MachineState *machine)
> > >  {
> > > @@ -1883,6 +1889,24 @@ static void *spapr_populate_hotplug_cpu_dt(DeviceState *dev, CPUState *cs,
> > >      return fdt;
> > >  }
> > >  
> > > +static void spapr_cpu_release(DeviceState *dev, void *opaque)
> > > +{
> > > +    CPUState *cs;
> > > +    int i;
> > > +    int id = ppc_get_vcpu_dt_id(POWERPC_CPU(CPU(dev)));
> > > +
> > > +    for (i = id; i < id + smp_threads; i++) {
> > > +        CPU_FOREACH(cs) {
> > > +            PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > +
> > > +            if (i == ppc_get_vcpu_dt_id(cpu)) {
> > > +                spapr_cpu_destroy(cpu);
> > > +                cpu_remove(cs);
> > > +            }
> > > +        }
> > > +    }
> > > +}
> > > +
> > >  static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                              Error **errp)
> > >  {
> > > @@ -1970,6 +1994,59 @@ static void spapr_cpu_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >      return;
> > >  }
> > >  
> > > +static int spapr_cpu_unplug(Object *obj, void *opaque)
> > > +{
> > > +    Error **errp = opaque;
> > > +    DeviceState *dev = DEVICE(obj);
> > > +    CPUState *cs = CPU(dev);
> > > +    PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > +    int id = ppc_get_vcpu_dt_id(cpu);
> > > +    int smt = kvmppc_smt_threads();
> > > +    sPAPRDRConnector *drc =
> > > +        spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> > > +    sPAPRDRConnectorClass *drck;
> > > +    Error *local_err = NULL;
> > > +
> > > +    /*
> > > +     * SMT threads return from here, only main thread (core) will
> > > +     * continue and signal hot unplug event to the guest.
> > > +     */
> > > +    if ((id % smt) != 0) {
> > > +        return 0;
> > > +    }
> > 
> > As with the in-plug side couldn't this be done more naturally by
> > attaching this function to the cpu core object rather than the thread?
> 
> When I switch to core level addition in the next version, I will see
> how best this can be handled.
> 
> > 
> > > +    g_assert(drc);
> > > +
> > > +    drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +    drck->detach(drc, dev, spapr_cpu_release, NULL, &local_err);
> > > +    if (local_err) {
> > > +        error_propagate(errp, local_err);
> > > +        return -1;
> > > +    }
> > > +
> > > +    /*
> > > +     * In addition to hotplugged CPUs, send the hot-unplug notification
> > > +     * interrupt to the guest for coldplugged CPUs started via -device
> > > +     * option too.
> > > +     */
> > > +    spapr_hotplug_req_remove_event(drc);
> > 
> > Um.. doesn't the remove notification need to go *before* the
> > "physical" unplug?  Along with a wait for acknowledgement from the
> > guest?
> 
> If I am reading it right, PCI hotplug is also following the same order
> (detach followed by guest notification).

So for PCI that might work.  Attempting to access an unplugged device
should mean IO reads return 0xff.  PCI drivers are supposed to be able
to cope with that at least long enough for the unplug notification to
come through.

You absolutely can't unplug CPUs without the OS relinquishing them
first though.
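
I.e. the sequence would need to look more like this (a sketch of the ordering
only, reusing the hooks from this patch; the actual release must be driven by
the guest's RTAS calls, not by the detach call itself):

    /* 1. ask the guest to give the CPU up */
    spapr_hotplug_req_remove_event(drc);

    /* 2. the guest offlines the threads and isolates the DRC via RTAS
     *    (e.g. set-indicator / stop-self); only then does the DRC layer
     *    run the registered release callback */
    drck->detach(drc, dev, spapr_cpu_release, NULL, &local_err);

    /* 3. spapr_cpu_release() is where the vCPU threads actually get
     *    destroyed, i.e. only after the guest has relinquished them */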

> Let me experiment with this in the next version.
> 
> Regards,
> Bharata.
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space
  2015-05-06  8:23     ` Bharata B Rao
@ 2015-05-07  1:12       ` David Gibson
  2015-05-07  5:01         ` Bharata B Rao
  0 siblings, 1 reply; 74+ messages in thread
From: David Gibson @ 2015-05-07  1:12 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	Igor Mammedov, afaerber

On Wed, May 06, 2015 at 01:53:05PM +0530, Bharata B Rao wrote:
> On Tue, May 05, 2015 at 10:48:50AM +0200, Igor Mammedov wrote:
> > On Fri, 24 Apr 2015 12:17:43 +0530
> > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
> > 
> > > Initialize a hotplug memory region under which all the hotplugged
> > > memory is accommodated. Also enable memory hotplug by setting
> > > CONFIG_MEM_HOTPLUG.
> > > 
> > > Modelled on i386 memory hotplug.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > > ---
> > >  default-configs/ppc64-softmmu.mak |  1 +
> > >  hw/ppc/spapr.c                    | 38 ++++++++++++++++++++++++++++++++++++++
> > >  include/hw/ppc/spapr.h            | 12 ++++++++++++
> > >  3 files changed, 51 insertions(+)
> > > 
> > > diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
> > > index 22ef132..16b3011 100644
> > > --- a/default-configs/ppc64-softmmu.mak
> > > +++ b/default-configs/ppc64-softmmu.mak
> > > @@ -51,3 +51,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
> > >  # For PReP
> > >  CONFIG_MC146818RTC=y
> > >  CONFIG_ISA_TESTDEV=y
> > > +CONFIG_MEM_HOTPLUG=y
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 910a50f..9dc4c36 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -125,6 +125,9 @@ struct sPAPRMachineState {
> > >  
> > >      /*< public >*/
> > >      char *kvm_type;
> > > +    ram_addr_t hotplug_memory_base;
> > > +    MemoryRegion hotplug_memory;
> > > +    bool enforce_aligned_dimm;
> > >  };
> > >  
> > >  sPAPREnvironment *spapr;
> > > @@ -1514,6 +1517,7 @@ static void ppc_spapr_init(MachineState *machine)
> > >      QemuOpts *opts = qemu_opts_find(qemu_find_opts("smp-opts"), NULL);
> > >      int sockets = opts ? qemu_opt_get_number(opts, "sockets", 0) : 0;
> > >      int cores = (smp_cpus/smp_threads) ? smp_cpus/smp_threads : 1;
> > > +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
> > >  
> > >      sockets = sockets ? sockets : cores;
> > >      msi_supported = true;
> > > @@ -1613,6 +1617,36 @@ static void ppc_spapr_init(MachineState *machine)
> > >          memory_region_add_subregion(sysmem, 0, rma_region);
> > >      }
> > >  
> > > +    /* initialize hotplug memory address space */
> > > +    if (machine->ram_size < machine->maxram_size) {
> > > +        ram_addr_t hotplug_mem_size =
> > > +            machine->maxram_size - machine->ram_size;
> > > +
> > > +        if (machine->ram_slots > SPAPR_MAX_RAM_SLOTS) {
> > > +            error_report("unsupported amount of memory slots: %"PRIu64,
> > > +                         machine->ram_slots);
> > > +            exit(EXIT_FAILURE);
> > > +        }
> > > +
> > > +        ms->hotplug_memory_base = ROUND_UP(machine->ram_size,
> > > +                                    SPAPR_HOTPLUG_MEM_ALIGN);
> > > +
> > > +        if (ms->enforce_aligned_dimm) {
> > > +            hotplug_mem_size += SPAPR_HOTPLUG_MEM_ALIGN * machine->ram_slots;
> > > +        }
> > > +
> > > +        if ((ms->hotplug_memory_base + hotplug_mem_size) < hotplug_mem_size) {
> > > +            error_report("unsupported amount of maximum memory: " RAM_ADDR_FMT,
> > > +                         machine->maxram_size);
> > > +            exit(EXIT_FAILURE);
> > > +        }
> > > +
> > > +        memory_region_init(&ms->hotplug_memory, OBJECT(ms),
> > > +                           "hotplug-memory", hotplug_mem_size);
> > > +        memory_region_add_subregion(sysmem, ms->hotplug_memory_base,
> > > +                                    &ms->hotplug_memory);
> > > +    }
> > > +
> > >      filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, "spapr-rtas.bin");
> > >      spapr->rtas_size = get_image_size(filename);
> > >      spapr->rtas_blob = g_malloc(spapr->rtas_size);
> > > @@ -1844,11 +1878,15 @@ static void spapr_set_kvm_type(Object *obj, const char *value, Error **errp)
> > >  
> > >  static void spapr_machine_initfn(Object *obj)
> > >  {
> > > +    sPAPRMachineState *ms = SPAPR_MACHINE(obj);
> > > +
> > >      object_property_add_str(obj, "kvm-type",
> > >                              spapr_get_kvm_type, spapr_set_kvm_type, NULL);
> > >      object_property_set_description(obj, "kvm-type",
> > >                                      "Specifies the KVM virtualization mode (HV, PR)",
> > >                                      NULL);
> > > +
> > > +    ms->enforce_aligned_dimm = true;
> > >  }
> > >  
> > >  static void ppc_cpu_do_nmi_on_cpu(void *arg)
> > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > index ecac6e3..53560e9 100644
> > > --- a/include/hw/ppc/spapr.h
> > > +++ b/include/hw/ppc/spapr.h
> > > @@ -542,6 +542,18 @@ struct sPAPREventLogEntry {
> > >  
> > >  #define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
> > >  
> > > +/*
> > > + * This defines the maximum number of DIMM slots we can have for sPAPR
> > > + * guest. This is not defined by sPAPR but we are defining it to 4096 slots
> > > + * here. With the worst case addition of SPAPR_MEMORY_BLOCK_SIZE
> > > + * (256MB) memory per slot, we should be able to support 1TB of guest
> > > + * hotpluggable memory.
> > > + */
> > > +#define SPAPR_MAX_RAM_SLOTS     (1ULL << 12)
> > why not write 4096 instead of (1ULL << 12), much easier to read.
> 
> Sure.
> 
> > 
> > BTW:
> > KVM supports up to 509 memory slots including slots consumed by
> > initial memory.
> 
> I see that PowerPC defaults to 32 slots. So having 4096 slots is really
> pointless then? So to ensure more hot-pluggable memory space is available,
> should I be increasing the size of the minimum pluggable memory in a
> DIMM slot (as defined by SPAPR_MEMORY_BLOCK_SIZE above)?

That seems a bit nasty, since then the granularity of adding blocks
will be enormous for small guests as well.

Is it possible to increase the maximum size of a single DIMM, but not
the minimum?  That way you can still do small inserts for small
guests.  To get the full RAM for big guests you'd have to insert big
chunks though, due to the limited number of slots.
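
To put rough numbers on it, with SPAPR_MEMORY_BLOCK_SIZE (256 MB) as the
minimum DIMM size and the 32-slot default mentioned above:

      32 slots x 256 MB minimum-size DIMMs  ->    8 GB hotpluggable
      32 slots x  32 GB DIMMs               ->    1 TB hotpluggable
    4096 slots x 256 MB minimum-size DIMMs  ->    1 TB hotpluggable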

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm, dynamic-reconfiguration-memory
  2015-05-06  8:27     ` Bharata B Rao
@ 2015-05-07  1:13       ` David Gibson
  0 siblings, 0 replies; 74+ messages in thread
From: David Gibson @ 2015-05-07  1:13 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mdroth, Nikunj A. Dadhania, aik, agraf, qemu-devel, qemu-ppc,
	tyreld, nfont, imammedo, afaerber

On Wed, May 06, 2015 at 01:57:50PM +0530, Bharata B Rao wrote:
> On Tue, May 05, 2015 at 05:40:32PM +1000, David Gibson wrote:
> > On Fri, Apr 24, 2015 at 12:17:45PM +0530, Bharata B Rao wrote:
> > > Parse ibm,architecture.vec table obtained from the guest and enable
> > > memory node configuration via ibm,dynamic-reconfiguration-memory if guest
> > > supports it. This is in preparation to support memory hotplug for
> > > sPAPR guests.
> > > 
> > > This changes the way memory node configuration is done. Currently all
> > > memory nodes are built upfront. But after this patch, only memory@0 node
> > > for RMA is built upfront. Guest kernel boots with just that and rest of
> > > the memory nodes (via memory@XXX or ibm,dynamic-reconfiguration-memory)
> > > are built when guest does ibm,client-architecture-support call.
> > > 
> > > Note: This patch needs a SLOF enhancement which is already part of
> > > upstream SLOF.
> > 
> > Is it in the SLOF included in the qemu submodule though?  If not you
> > should have a patch to update the submodule first.
> 
> Nikunj confirms that the SLOF change needed to support
> ibm,dynamic-reconfiguration-memory is already part of the SLOF shipped with QEMU.

Ok great.  Can you adjust the commit message to clarify that.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space
  2015-05-07  1:12       ` David Gibson
@ 2015-05-07  5:01         ` Bharata B Rao
  0 siblings, 0 replies; 74+ messages in thread
From: Bharata B Rao @ 2015-05-07  5:01 UTC (permalink / raw)
  To: David Gibson
  Cc: mdroth, aik, agraf, qemu-devel, qemu-ppc, tyreld, nfont,
	Igor Mammedov, afaerber

On Thu, May 07, 2015 at 11:12:36AM +1000, David Gibson wrote:
> On Wed, May 06, 2015 at 01:53:05PM +0530, Bharata B Rao wrote:
> > On Tue, May 05, 2015 at 10:48:50AM +0200, Igor Mammedov wrote:
> > > On Fri, 24 Apr 2015 12:17:43 +0530
> > > Bharata B Rao <bharata@linux.vnet.ibm.com> wrote:
<snip>
> > > 
> > > > Initialize a hotplug memory region under which all the hotplugged
> > > > memory is accommodated. Also enable memory hotplug by setting
> > > > CONFIG_MEM_HOTPLUG.
> > > > 
> > > >  }
> > > >  
> > > >  static void ppc_cpu_do_nmi_on_cpu(void *arg)
> > > > diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> > > > index ecac6e3..53560e9 100644
> > > > --- a/include/hw/ppc/spapr.h
> > > > +++ b/include/hw/ppc/spapr.h
> > > > @@ -542,6 +542,18 @@ struct sPAPREventLogEntry {
> > > >  
> > > >  #define SPAPR_MEMORY_BLOCK_SIZE (1 << 28) /* 256MB */
> > > >  
> > > > +/*
> > > > + * This defines the maximum number of DIMM slots we can have for sPAPR
> > > > + * guest. This is not defined by sPAPR but we are defining it to 4096 slots
> > > > + * here. With the worst case addition of SPAPR_MEMORY_BLOCK_SIZE
> > > > + * (256MB) memory per slot, we should be able to support 1TB of guest
> > > > + * hotpluggable memory.
> > > > + */
> > > > +#define SPAPR_MAX_RAM_SLOTS     (1ULL << 12)
> > > why not write 4096 instead of (1ULL << 12), much easier to read.
> > 
> > Sure.
> > 
> > > 
> > > BTW:
> > > KVM supports up to 509 memory slots including slots consumed by
> > > initial memory.
> > 
> > I see that PowerPC defaults to 32 slots. So having 4096 slots is really
> > pointless then? So to ensure more hot-pluggable memory space is available,
> > should I be increasing the size of the minimum pluggable memory in a
> > DIMM slot (as defined by SPAPR_MEMORY_BLOCK_SIZE above)?
> 
> That seems a bit nasty, since then the granularity of adding blocks
> will be enormous for small guests as well.
> 
> Is it possible to increase the maximum size of a single DIMM, but not
> the minimum?  That way you can still do small inserts for small
> guests.  To get the full RAM for big guests you'd have to insert big
> chunks though, due to the limited number of slots.

The maximum memory that can be plugged into a slot is not limited iiuc,
so smaller and bigger guests can use memory sizes appropriate for them.
Only the minimum size is restricted to SPAPR_MEMORY_BLOCK_SIZE (256M).

So I guess I should just stick to 32 slots here for PowerPC memory
hotplug implementation.
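
With that, hotplugging a larger chunk into one of those 32 slots would look
something like the following from the monitor (sizes here are only an
example):

    (qemu) object_add memory-backend-ram,id=mem1,size=32G
    (qemu) device_add pc-dimm,id=dimm1,memdev=mem1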

Regards,
Bharata.

end of thread, other threads:[~2015-05-07  5:02 UTC | newest]

Thread overview: 74+ messages
2015-04-24  6:47 [Qemu-devel] [RFC PATCH v3 00/24] CPU and Memory hotplug for PowerPC sPAPR guests Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 01/24] spapr: enable PHB/CPU/LMB hotplug for pseries-2.3 Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 02/24] spapr: Add DRC dt entries for CPUs Bharata B Rao
2015-05-04 11:46   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 03/24] spapr: Consider max_cpus during xics initialization Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 04/24] spapr: Support ibm, lrdr-capacity device tree property Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 05/24] spapr: Reorganize CPU dt generation code Bharata B Rao
2015-04-26 11:47   ` Bharata B Rao
2015-04-27  5:36     ` Bharata B Rao
2015-05-04 12:01       ` David Gibson
2015-05-04 11:59   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 06/24] spapr: Consolidate cpu init code into a routine Bharata B Rao
2015-05-04 16:10   ` Thomas Huth
2015-05-06  4:28     ` Bharata B Rao
2015-05-06  6:32       ` Thomas Huth
2015-05-06  8:45         ` Bharata B Rao
2015-05-06  9:37           ` Thomas Huth
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 07/24] cpu: Prepare Socket container type Bharata B Rao
2015-05-05  1:47   ` David Gibson
2015-05-06  4:36     ` Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 08/24] ppc: Prepare CPU socket/core abstraction Bharata B Rao
2015-05-04 15:15   ` Thomas Huth
2015-05-06  4:40     ` Bharata B Rao
2015-05-06  6:52       ` Thomas Huth
2015-05-05  6:46   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 09/24] spapr: Add CPU hotplug handler Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 10/24] ppc: Update cpu_model in MachineState Bharata B Rao
2015-05-05  6:49   ` David Gibson
2015-05-06  4:49     ` Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 11/24] ppc: Create sockets and cores for CPUs Bharata B Rao
2015-05-05  6:52   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 12/24] spapr: CPU hotplug support Bharata B Rao
2015-05-04 15:53   ` Thomas Huth
2015-05-06  5:37     ` Bharata B Rao
2015-05-05  6:59   ` David Gibson
2015-05-06  6:14     ` Bharata B Rao
2015-05-07  1:03       ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 13/24] cpus: Add Error argument to cpu_exec_init() Bharata B Rao
2015-05-05  7:01   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 14/24] cpus: Convert cpu_index into a bitmap Bharata B Rao
2015-05-05  7:10   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 15/24] ppc: Move cpu_exec_init() call to realize function Bharata B Rao
2015-05-05  7:12   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 16/24] qom: Introduce object_has_no_children() API Bharata B Rao
2015-05-05  7:13   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 17/24] cpus: Reclaim vCPU objects Bharata B Rao
2015-05-05  7:20   ` David Gibson
2015-05-06  6:37     ` Bharata B Rao
2015-05-07  1:06       ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 18/24] xics_kvm: Don't enable KVM_CAP_IRQ_XICS if already enabled Bharata B Rao
2015-05-05  7:22   ` David Gibson
2015-05-06  5:42     ` Bharata B Rao
2015-05-07  1:07       ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 19/24] xics_kvm: Add cpu_destroy method to XICS Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 20/24] spapr: CPU hot unplug support Bharata B Rao
2015-05-05  7:28   ` David Gibson
2015-05-06  7:55     ` Bharata B Rao
2015-05-07  1:09       ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 21/24] spapr: Initialize hotplug memory address space Bharata B Rao
2015-05-05  7:33   ` David Gibson
2015-05-06  7:58     ` Bharata B Rao
2015-05-05  8:48   ` Igor Mammedov
2015-05-06  8:23     ` Bharata B Rao
2015-05-07  1:12       ` David Gibson
2015-05-07  5:01         ` Bharata B Rao
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 22/24] numa: API to lookup NUMA node by address Bharata B Rao
2015-05-05  7:35   ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 23/24] spapr: Support ibm, dynamic-reconfiguration-memory Bharata B Rao
2015-05-05  7:40   ` David Gibson
2015-05-06  8:27     ` Bharata B Rao
2015-05-07  1:13       ` David Gibson
2015-04-24  6:47 ` [Qemu-devel] [RFC PATCH v3 24/24] spapr: Memory hotplug support Bharata B Rao
2015-05-05  7:45   ` David Gibson
2015-05-06  8:30     ` Bharata B Rao
