[Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS

* [Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS
@ 2017-05-23 11:18 Laurent Vivier
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 1/4] spapr: add pre_plug function for memory Laurent Vivier
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Laurent Vivier @ 2017-05-23 11:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth, Laurent Vivier

If CPU or memory is hotplugged before the OS has started,
QEMU sends the OS an event that is lost and cannot be
recovered. A subsequent unplug cannot restore QEMU to a
coherent state.
So, while the OS is not started, disable CPU and memory hotplug.
We use option vector 6 to know whether the OS has started.

This series moves the error checking for memory hotplug
into a pre_plug function and introduces option vector 6
management. It also reverts a previous fix that did not
really fix the hotplug problem when the OS is not running.
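
To make the intended behaviour concrete, here is a minimal,
self-contained sketch of the gate described above. It is not the
QEMU code: struct machine_model, struct device_model and the
model_*() helpers are hypothetical stand-ins, and only the OV6_*
values are taken from patch 2/4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the OV6_* constants added in patch 2/4. */
enum { OV6_NONE = 0x00, OV6_AIX = 0x01, OV6_LINUX = 0x02 };

/* Hypothetical stand-in for sPAPRMachineState; only the field we need. */
struct machine_model {
    uint8_t os_name; /* OV6_NONE until the guest negotiates via CAS */
};

/* Hypothetical stand-in for DeviceState. */
struct device_model {
    bool hotplugged; /* false for devices present at machine creation */
};

/* Machine reset: the guest is back in firmware, so forget the OS. */
static void model_reset(struct machine_model *m)
{
    assert(m != NULL);
    m->os_name = OV6_NONE;
}

/* The guest reported its OS type (h_client_architecture_support() in the series). */
static void model_cas(struct machine_model *m, uint8_t os_name)
{
    assert(m != NULL);
    m->os_name = os_name;
}

/* Returns NULL when the plug may proceed, else a static error message. */
static const char *model_pre_plug(const struct machine_model *m,
                                  const struct device_model *d)
{
    assert(m != NULL && d != NULL);
    if (d->hotplugged && m->os_name == OV6_NONE) {
        return "hotplug not supported without OS";
    }
    return NULL;
}
```

In this model a cold-plugged device is always accepted, while a
hotplugged one is refused between a reset and the next CAS call.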

Laurent Vivier (4):
  spapr: add pre_plug function for memory
  spapr: add option vector 6
  spapr: disable hotplugging without OS
  Revert "spapr: fix memory hot-unplugging"

 hw/ppc/spapr.c              | 103 ++++++++++++++++++++++++++++++++++++--------
 hw/ppc/spapr_drc.c          |  20 ++-------
 hw/ppc/spapr_hcall.c        |   5 ++-
 hw/ppc/spapr_ovec.c         |   8 ++++
 include/hw/ppc/spapr.h      |   2 +
 include/hw/ppc/spapr_drc.h  |   1 -
 include/hw/ppc/spapr_ovec.h |   7 +++
 7 files changed, 109 insertions(+), 37 deletions(-)

-- 
2.9.4

* [Qemu-devel] [PATCH 1/4] spapr: add pre_plug function for memory
  2017-05-23 11:18 [Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS Laurent Vivier
@ 2017-05-23 11:18 ` Laurent Vivier
  2017-05-23 15:28   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 2/4] spapr: add option vector 6 Laurent Vivier
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Laurent Vivier @ 2017-05-23 11:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth, Laurent Vivier

This allows errors to be handled before the memory
hotplug has started. We already have such
a function for the CPU cores.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/ppc/spapr.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0980d73..0e8d8d1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2569,20 +2569,6 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     uint64_t align = memory_region_get_alignment(mr);
     uint64_t size = memory_region_size(mr);
     uint64_t addr;
-    char *mem_dev;
-
-    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
-        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
-                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
-        goto out;
-    }
-
-    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
-    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
-        error_setg(&local_err, "Memory backend has bad page size. "
-                   "Use 'memory-backend-file' with correct mem-path.");
-        goto out;
-    }
 
     pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
     if (local_err) {
@@ -2603,6 +2589,33 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                Error **errp)
+{
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm);
+    uint64_t size = memory_region_size(mr);
+    Error *local_err = NULL;
+    char *mem_dev;
+
+    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
+        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
+                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+        goto out;
+    }
+
+    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
+    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
+        error_setg(&local_err, "Memory backend has bad page size. "
+                   "Use 'memory-backend-file' with correct mem-path.");
+        goto out;
+    }
+
+out:
+    error_propagate(errp, local_err);
+}
+
 typedef struct sPAPRDIMMState {
     uint32_t nr_lmbs;
 } sPAPRDIMMState;
@@ -2990,7 +3003,9 @@ static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
 static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
                                           DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        spapr_memory_pre_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
         spapr_core_pre_plug(hotplug_dev, dev, errp);
     }
 }
-- 
2.9.4

* [Qemu-devel] [PATCH 2/4] spapr: add option vector 6
  2017-05-23 11:18 [Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS Laurent Vivier
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 1/4] spapr: add pre_plug function for memory Laurent Vivier
@ 2017-05-23 11:18 ` Laurent Vivier
  2017-05-23 16:31   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-05-24  4:58   ` [Qemu-devel] " David Gibson
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 3/4] spapr: disable hotplugging without OS Laurent Vivier
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Laurent Vivier @ 2017-05-23 11:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth, Laurent Vivier

This allows QEMU to know when the OS has started, and its type.
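
As an illustration of reading one byte out of a bit-addressed option
vector, here is a small self-contained sketch. It assumes bits are
numbered MSB-first from the start of the vector and that the vector
is held as a plain byte array; both are assumptions of the sketch,
and sketch_test_bit() and sketch_ovec_byte() are hypothetical names.
How OV_BIT(3, 0) maps to a bit number depends on the OV_BIT macro,
which is not shown in this series, so the caller passes the bit
offset explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Test bit 'bitnr', counting from the most significant bit of vec[0]. */
static bool sketch_test_bit(const uint8_t *vec, size_t bitnr)
{
    return (vec[bitnr / 8] >> (7 - bitnr % 8)) & 1;
}

/*
 * Collect the 8 bits starting at 'bitnr' into one byte, MSB first.
 * 'nbits' is the number of valid bits in 'vec'.
 */
static uint8_t sketch_ovec_byte(const uint8_t *vec, size_t nbits, size_t bitnr)
{
    uint8_t value = 0;

    assert(vec != NULL);
    assert(bitnr + 8 <= nbits); /* the whole byte must lie inside the vector */

    for (size_t i = 0; i < 8; i++) {
        if (sketch_test_bit(vec, bitnr + i)) {
            value |= (uint8_t)(1u << (7 - i));
        }
    }
    return value;
}
```

Unlike a check on the starting bit alone, the assertion here covers
all eight bits that are read.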

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/ppc/spapr.c              | 36 ++++++++++++++++++++++++++++++++++++
 hw/ppc/spapr_hcall.c        |  5 ++++-
 hw/ppc/spapr_ovec.c         |  8 ++++++++
 include/hw/ppc/spapr.h      |  2 ++
 include/hw/ppc/spapr_ovec.h |  7 +++++++
 5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0e8d8d1..eceb4cc 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1369,6 +1369,7 @@ static void ppc_spapr_reset(void)
     first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
 
     spapr->cas_reboot = false;
+    spapr->os_name = OV6_NONE;
 }
 
 static void spapr_create_nvram(sPAPRMachineState *spapr)
@@ -1524,10 +1525,41 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
     },
 };
 
+static bool spapr_os_name_needed(void *opaque)
+{
+    sPAPRMachineState *spapr = opaque;
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    return smc->need_os_name;
+}
+
+static const VMStateDescription vmstate_spapr_os_name = {
+    .name = "spapr_os_name",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_os_name_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT8(os_name, sPAPRMachineState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static int spapr_pre_load(void *opaque)
+{
+    sPAPRMachineState *spapr = opaque;
+
+    /* if the os_name is not migrated from the source,
+     * we must allow hotplug, so set os_name to linux
+     */
+    spapr->os_name = OV6_LINUX;
+
+    return 0;
+}
+
 static const VMStateDescription vmstate_spapr = {
     .name = "spapr",
     .version_id = 3,
     .minimum_version_id = 1,
+    .pre_load = spapr_pre_load,
     .post_load = spapr_post_load,
     .fields = (VMStateField[]) {
         /* used to be @next_irq */
@@ -1542,6 +1574,7 @@ static const VMStateDescription vmstate_spapr = {
     .subsections = (const VMStateDescription*[]) {
         &vmstate_spapr_ov5_cas,
         &vmstate_spapr_patb_entry,
+        &vmstate_spapr_os_name,
         NULL
     }
 };
@@ -3216,6 +3249,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
      * in which LMBs are represented and hot-added
      */
     mc->numa_mem_align_shift = 28;
+    smc->need_os_name = true;
 }
 
 static const TypeInfo spapr_machine_info = {
@@ -3293,9 +3327,11 @@ static void spapr_machine_2_9_instance_options(MachineState *machine)
 
 static void spapr_machine_2_9_class_options(MachineClass *mc)
 {
+    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
     spapr_machine_2_10_class_options(mc);
     SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
     mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
+    smc->need_os_name = false;
 }
 
 DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0d608d6..5dbe3c7 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1058,7 +1058,8 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     uint32_t max_compat = cpu->max_compat;
     uint32_t best_compat = 0;
     int i;
-    sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
+    sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates,
+                      *ov6_guest;
     bool guest_radix;
 
     /*
@@ -1112,6 +1113,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
 
     ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
     ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
+    ov6_guest = spapr_ovec_parse_vector(ov_table, 6);
     if (spapr_ovec_test(ov5_guest, OV5_MMU_BOTH)) {
         error_report("guest requested hash and radix MMU, which is invalid.");
         exit(EXIT_FAILURE);
@@ -1154,6 +1156,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
     }
     spapr->cas_legacy_guest_workaround = !spapr_ovec_test(ov1_guest,
                                                           OV1_PPC_3_00);
+    spapr->os_name = spapr_ovec_byte(ov6_guest, OV6_OS_NAME);
     if (!spapr->cas_reboot) {
         spapr->cas_reboot =
             (spapr_h_cas_compose_response(spapr, args[1], args[2],
diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
index 41df4c3..7adc9e6 100644
--- a/hw/ppc/spapr_ovec.c
+++ b/hw/ppc/spapr_ovec.c
@@ -160,6 +160,14 @@ static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
     return entry;
 }
 
+uint8_t spapr_ovec_byte(sPAPROptionVector *ov, long bitnr)
+{
+    g_assert(ov);
+    g_assert(bitnr < OV_MAXBITS);
+
+    return guest_byte_from_bitmap(ov->bitmap, bitnr);
+}
+
 static target_ulong vector_addr(target_ulong table_addr, int vector)
 {
     uint16_t vector_count, vector_len;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5802f88..041ce19 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -52,6 +52,7 @@ struct sPAPRMachineClass {
     /*< public >*/
     bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
     bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
+    bool need_os_name;
     const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
     void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
                           uint64_t *buid, hwaddr *pio, 
@@ -90,6 +91,7 @@ struct sPAPRMachineState {
     sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
     bool cas_reboot;
     bool cas_legacy_guest_workaround;
+    uint8_t os_name;
 
     Notifier epow_notifier;
     QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
index f088833..c728bb3 100644
--- a/include/hw/ppc/spapr_ovec.h
+++ b/include/hw/ppc/spapr_ovec.h
@@ -56,6 +56,12 @@ typedef struct sPAPROptionVector sPAPROptionVector;
 #define OV5_MMU_RADIX_300       OV_BIT(24, 1) /* 1=Radix only, 0=Hash only */
 #define OV5_MMU_RADIX_GTSE      OV_BIT(26, 1) /* Radix GTSE */
 
+/* option vector 6 */
+#define OV6_OS_NAME             OV_BIT(3, 0)
+#define OV6_NONE                0x00
+#define OV6_AIX                 0x01
+#define OV6_LINUX               0x02
+
 /* interfaces */
 sPAPROptionVector *spapr_ovec_new(void);
 sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
@@ -69,6 +75,7 @@ void spapr_ovec_cleanup(sPAPROptionVector *ov);
 void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
 void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
 bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
+uint8_t spapr_ovec_byte(sPAPROptionVector *ov, long bitnr);
 sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
 int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
                            sPAPROptionVector *ov, const char *name);
-- 
2.9.4

* [Qemu-devel] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-23 11:18 [Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS Laurent Vivier
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 1/4] spapr: add pre_plug function for memory Laurent Vivier
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 2/4] spapr: add option vector 6 Laurent Vivier
@ 2017-05-23 11:18 ` Laurent Vivier
  2017-05-24  5:07   ` David Gibson
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 4/4] Revert "spapr: fix memory hot-unplugging" Laurent Vivier
  2017-05-23 17:52 ` [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS Daniel Henrique Barboza
  4 siblings, 1 reply; 27+ messages in thread
From: Laurent Vivier @ 2017-05-23 11:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth, Laurent Vivier

If CPU or memory is hotplugged before the OS has started,
QEMU sends the OS an event that is lost and cannot be
recovered. A subsequent unplug cannot restore QEMU to a
coherent state.
So, while the OS is not started, disable CPU and memory hotplug.
We use option vector 6 to know whether the OS has started.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/ppc/spapr.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index eceb4cc..2e9320d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2625,6 +2625,7 @@ out:
 static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                 Error **errp)
 {
+    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
     PCDIMMDevice *dimm = PC_DIMM(dev);
     PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
@@ -2645,6 +2646,13 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
+    if (dev->hotplugged) {
+        if (!ms->os_name) {
+            error_setg(&local_err, "Memory hotplug not supported without OS");
+            goto out;
+        }
+    }
+
 out:
     error_propagate(errp, local_err);
 }
@@ -2874,6 +2882,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                                 Error **errp)
 {
     MachineState *machine = MACHINE(OBJECT(hotplug_dev));
+    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
     MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
     Error *local_err = NULL;
     CPUCore *cc = CPU_CORE(dev);
@@ -2884,9 +2893,16 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     int node_id;
     int index;
 
-    if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
-        error_setg(&local_err, "CPU hotplug not supported for this machine");
-        goto out;
+    if (dev->hotplugged) {
+        if (!mc->has_hotpluggable_cpus) {
+            error_setg(&local_err,
+                       "CPU hotplug not supported for this machine");
+            goto out;
+        }
+        if (!ms->os_name) {
+            error_setg(&local_err, "CPU hotplug not supported without OS");
+            goto out;
+        }
     }
 
     if (strcmp(base_core_type, type)) {
-- 
2.9.4

* [Qemu-devel] [PATCH 4/4] Revert "spapr: fix memory hot-unplugging"
  2017-05-23 11:18 [Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS Laurent Vivier
                   ` (2 preceding siblings ...)
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 3/4] spapr: disable hotplugging without OS Laurent Vivier
@ 2017-05-23 11:18 ` Laurent Vivier
  2017-05-23 17:52 ` [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS Daniel Henrique Barboza
  4 siblings, 0 replies; 27+ messages in thread
From: Laurent Vivier @ 2017-05-23 11:18 UTC (permalink / raw)
  To: David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth, Laurent Vivier

This reverts commit fe6824d12642b005c69123ecf8631f9b13553f8b.

This didn't fix the problem. Once the hotplug has started,
memory and some structures are allocated.
We don't free them when we ignore the unplug, and we can't, because
they may be in use by the kernel.

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/ppc/spapr_drc.c         | 20 +++-----------------
 include/hw/ppc/spapr_drc.h |  1 -
 2 files changed, 3 insertions(+), 18 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 9fa5545..2bda00f 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -135,17 +135,6 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc,
         if (!drc->dev) {
             return RTAS_OUT_NO_SUCH_INDICATOR;
         }
-        if (drc->awaiting_release && drc->awaiting_allocation) {
-            /* kernel is acknowledging a previous hotplug event
-             * while we are already removing it.
-             * it's safe to ignore awaiting_allocation here since we know the
-             * situation is predicated on the guest either already having done
-             * so (boot-time hotplug), or never being able to acquire in the
-             * first place (hotplug followed by immediate unplug).
-             */
-            drc->awaiting_allocation_skippable = true;
-            return RTAS_OUT_NO_SUCH_INDICATOR;
-        }
     }
 
     if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
@@ -447,11 +436,9 @@ static void detach(sPAPRDRConnector *drc, DeviceState *d,
     }
 
     if (drc->awaiting_allocation) {
-        if (!drc->awaiting_allocation_skippable) {
-            drc->awaiting_release = true;
-            trace_spapr_drc_awaiting_allocation(get_index(drc));
-            return;
-        }
+        drc->awaiting_release = true;
+        trace_spapr_drc_awaiting_allocation(get_index(drc));
+        return;
     }
 
     drc->indicator_state = SPAPR_DR_INDICATOR_STATE_INACTIVE;
@@ -461,7 +448,6 @@ static void detach(sPAPRDRConnector *drc, DeviceState *d,
     }
 
     drc->awaiting_release = false;
-    drc->awaiting_allocation_skippable = false;
     g_free(drc->fdt);
     drc->fdt = NULL;
     drc->fdt_start_offset = 0;
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 5524247..fa531d5 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -154,7 +154,6 @@ typedef struct sPAPRDRConnector {
     bool awaiting_release;
     bool signalled;
     bool awaiting_allocation;
-    bool awaiting_allocation_skippable;
 
     /* device pointer, via link property */
     DeviceState *dev;
-- 
2.9.4

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 1/4] spapr: add pre_plug function for memory Laurent Vivier
@ 2017-05-23 15:28   ` Greg Kurz
  2017-05-23 16:09     ` Laurent Vivier
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2017-05-23 15:28 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: David Gibson, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

On Tue, 23 May 2017 13:18:09 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> This allows to manage errors before the memory
> has started to be hotplugged. We already have
> the function for the CPU cores.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/ppc/spapr.c | 45 ++++++++++++++++++++++++++++++---------------
>  1 file changed, 30 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0980d73..0e8d8d1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2569,20 +2569,6 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      uint64_t align = memory_region_get_alignment(mr);
>      uint64_t size = memory_region_size(mr);
>      uint64_t addr;
> -    char *mem_dev;
> -
> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> -        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
> -                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
> -        goto out;
> -    }
> -
> -    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
> -    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
> -        error_setg(&local_err, "Memory backend has bad page size. "
> -                   "Use 'memory-backend-file' with correct mem-path.");
> -        goto out;
> -    }
>  
>      pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
>      if (local_err) {
> @@ -2603,6 +2589,33 @@ out:
>      error_propagate(errp, local_err);
>  }
>  
> +static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                Error **errp)

Indentation nit

> +{
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
> +    uint64_t size = memory_region_size(mr);
> +    Error *local_err = NULL;
> +    char *mem_dev;
> +
> +    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> +        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
> +                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
> +        goto out;
> +    }
> +
> +    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
> +    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
> +        error_setg(&local_err, "Memory backend has bad page size. "
> +                   "Use 'memory-backend-file' with correct mem-path.");
> +        goto out;
> +    }
> +
> +out:
> +    error_propagate(errp, local_err);

As recently discussed with Markus Armbruster, it isn't necessary to have a
local Error * if you don't do anything else with it but propagate it.

Message-ID: <20170522090055.GN30246@umbus.fritz.box>
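
A minimal sketch of the simplification suggested above, under stated
assumptions: SketchError and sketch_error_set() are simplified
stand-ins for QEMU's Error and error_setg(), and SKETCH_BLOCK_SIZE is
an illustrative value, not the real SPAPR_MEMORY_BLOCK_SIZE. When a
function only reports an error and never inspects it, it can write
straight into errp and return, with no local_err and no "out:" label.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object; an assumption of this sketch. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Simplified stand-in for error_setg(): record msg if the caller asked. */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    if (errp == NULL) {
        return; /* caller does not care about the error */
    }
    assert(*errp == NULL); /* never overwrite an earlier error */
    *errp = malloc(sizeof(**errp));
    if (*errp == NULL) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Illustrative block size (256 MB), chosen for the sketch only. */
#define SKETCH_BLOCK_SIZE (UINT64_C(256) << 20)

/* Validation that only reports: set errp directly and return. */
static void sketch_memory_pre_plug(uint64_t size, SketchError **errp)
{
    if (size % SKETCH_BLOCK_SIZE) {
        sketch_error_set(errp,
                         "memory size must be a multiple of the block size");
        return;
    }
    /* further checks would follow the same set-and-return pattern */
}
```

A local error variable is still the right tool when the function
needs to test whether a callee failed before continuing.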

The patch looks good anyway.

Reviewed-by: Greg Kurz <groug@kaod.org>

> +}
> +
>  typedef struct sPAPRDIMMState {
>      uint32_t nr_lmbs;
>  } sPAPRDIMMState;
> @@ -2990,7 +3003,9 @@ static void spapr_machine_device_unplug_request(HotplugHandler *hotplug_dev,
>  static void spapr_machine_device_pre_plug(HotplugHandler *hotplug_dev,
>                                            DeviceState *dev, Error **errp)
>  {
> -    if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
> +        spapr_memory_pre_plug(hotplug_dev, dev, errp);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_SPAPR_CPU_CORE)) {
>          spapr_core_pre_plug(hotplug_dev, dev, errp);
>      }
>  }


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory
  2017-05-23 15:28   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-05-23 16:09     ` Laurent Vivier
  2017-05-24  4:52       ` David Gibson
  0 siblings, 1 reply; 27+ messages in thread
From: Laurent Vivier @ 2017-05-23 16:09 UTC (permalink / raw)
  To: Greg Kurz; +Cc: David Gibson, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

On 23/05/2017 17:28, Greg Kurz wrote:
> On Tue, 23 May 2017 13:18:09 +0200
> Laurent Vivier <lvivier@redhat.com> wrote:
> 
>> This allows to manage errors before the memory
>> has started to be hotplugged. We already have
>> the function for the CPU cores.
>>
>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>> ---
>>  hw/ppc/spapr.c | 45 ++++++++++++++++++++++++++++++---------------
>>  1 file changed, 30 insertions(+), 15 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index 0980d73..0e8d8d1 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -2569,20 +2569,6 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>      uint64_t align = memory_region_get_alignment(mr);
>>      uint64_t size = memory_region_size(mr);
>>      uint64_t addr;
>> -    char *mem_dev;
>> -
>> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>> -        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
>> -                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
>> -        goto out;
>> -    }
>> -
>> -    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
>> -    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
>> -        error_setg(&local_err, "Memory backend has bad page size. "
>> -                   "Use 'memory-backend-file' with correct mem-path.");
>> -        goto out;
>> -    }
>>  
>>      pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
>>      if (local_err) {
>> @@ -2603,6 +2589,33 @@ out:
>>      error_propagate(errp, local_err);
>>  }
>>  
>> +static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>> +                                Error **errp)
> 
> Indentation nit

ok

> 
>> +{
>> +    PCDIMMDevice *dimm = PC_DIMM(dev);
>> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
>> +    uint64_t size = memory_region_size(mr);
>> +    Error *local_err = NULL;
>> +    char *mem_dev;
>> +
>> +    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
>> +        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
>> +                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
>> +        goto out;
>> +    }
>> +
>> +    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
>> +    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
>> +        error_setg(&local_err, "Memory backend has bad page size. "
>> +                   "Use 'memory-backend-file' with correct mem-path.");
>> +        goto out;
>> +    }
>> +
>> +out:
>> +    error_propagate(errp, local_err);
> 
> As recently discussed with Markus Armbruster, it isn't necessary to have a
> local Error * if you don't do anything else with it but propagate it.

Yes, you are right, it's a stupid cut'n'paste.

Thanks,
Laurent

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/4] spapr: add option vector 6
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 2/4] spapr: add option vector 6 Laurent Vivier
@ 2017-05-23 16:31   ` Greg Kurz
  2017-05-24  4:58   ` [Qemu-devel] " David Gibson
  1 sibling, 0 replies; 27+ messages in thread
From: Greg Kurz @ 2017-05-23 16:31 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: David Gibson, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

On Tue, 23 May 2017 13:18:10 +0200
Laurent Vivier <lvivier@redhat.com> wrote:

> This allows to know when the OS is started and its type.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/ppc/spapr.c              | 36 ++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_hcall.c        |  5 ++++-
>  hw/ppc/spapr_ovec.c         |  8 ++++++++
>  include/hw/ppc/spapr.h      |  2 ++
>  include/hw/ppc/spapr_ovec.h |  7 +++++++
>  5 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0e8d8d1..eceb4cc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1369,6 +1369,7 @@ static void ppc_spapr_reset(void)
>      first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
>  
>      spapr->cas_reboot = false;
> +    spapr->os_name = OV6_NONE;
>  }
>  
>  static void spapr_create_nvram(sPAPRMachineState *spapr)
> @@ -1524,10 +1525,41 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
>      },
>  };
>  
> +static bool spapr_os_name_needed(void *opaque)
> +{
> +    sPAPRMachineState *spapr = opaque;
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +    return smc->need_os_name;

So this will have the subsection migrated unconditionally, even if the value hasn't
changed from the default yet? Also, it looks weird to involve a machine compat
flag here... If the concern is backwards migration, then I guess you should check
the compat flag in h_client_architecture_support() and not set @os_name for older
machines.

> +}
> +
> +static const VMStateDescription vmstate_spapr_os_name = {
> +    .name = "spapr_os_name",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_os_name_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT8(os_name, sPAPRMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static int spapr_pre_load(void *opaque)
> +{
> +    sPAPRMachineState *spapr = opaque;
> +
> +    /* if the os_name is not migrated from the source,
> +     * we must allow hotplug, so set os_name to linux
> +     */
> +    spapr->os_name = OV6_LINUX;

But maybe the source was in SLOF, and I guess you don't want @os_name
to be set in this case... The correct way to restore the older machines'
behavior is to set @os_name to OV6_LINUX according to the compat flag.
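
One possible reading of the two remarks above, as a self-contained
model rather than QEMU code: struct machine_model and the model_*()
helpers are hypothetical, and only the OV6_* values and the
need_os_name flag name come from the patch. Older machine types pin
os_name to OV6_LINUX and never send the subsection; newer ones send
it only once the value differs from its reset default.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the OV6_* constants added in this patch. */
enum { OV6_NONE = 0x00, OV6_AIX = 0x01, OV6_LINUX = 0x02 };

/* Hypothetical model of the machine class flag plus the machine state. */
struct machine_model {
    bool need_os_name; /* false on older (compat) machine types */
    uint8_t os_name;
};

/* Reset: older machine types always behave as if an OS were running. */
static void model_reset(struct machine_model *m)
{
    assert(m != NULL);
    m->os_name = m->need_os_name ? OV6_NONE : OV6_LINUX;
}

/* CAS: only newer machine types record what the guest reported. */
static void model_cas(struct machine_model *m, uint8_t guest_os_name)
{
    assert(m != NULL);
    if (m->need_os_name) {
        m->os_name = guest_os_name;
    }
}

/* Send the subsection only if the destination's default would be wrong. */
static bool model_subsection_needed(const struct machine_model *m)
{
    assert(m != NULL);
    return m->need_os_name && m->os_name != OV6_NONE;
}
```

With this shape no pre_load hook is needed in the model: a destination
that receives no subsection keeps its own reset value, which matches
the source in both cases.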

> +
> +    return 0;
> +}
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
>      .minimum_version_id = 1,
> +    .pre_load = spapr_pre_load,
>      .post_load = spapr_post_load,
>      .fields = (VMStateField[]) {
>          /* used to be @next_irq */
> @@ -1542,6 +1574,7 @@ static const VMStateDescription vmstate_spapr = {
>      .subsections = (const VMStateDescription*[]) {
>          &vmstate_spapr_ov5_cas,
>          &vmstate_spapr_patb_entry,
> +        &vmstate_spapr_os_name,
>          NULL
>      }
>  };
> @@ -3216,6 +3249,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>       * in which LMBs are represented and hot-added
>       */
>      mc->numa_mem_align_shift = 28;
> +    smc->need_os_name = true;

The name seems to indicate that the machine requires this, which is obviously not
the case... What about @parse_os_name instead?

>  }
>  
>  static const TypeInfo spapr_machine_info = {
> @@ -3293,9 +3327,11 @@ static void spapr_machine_2_9_instance_options(MachineState *machine)
>  
>  static void spapr_machine_2_9_class_options(MachineClass *mc)
>  {
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
>      spapr_machine_2_10_class_options(mc);
>      SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
>      mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> +    smc->need_os_name = false;
>  }
>  
>  DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 0d608d6..5dbe3c7 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1058,7 +1058,8 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>      uint32_t max_compat = cpu->max_compat;
>      uint32_t best_compat = 0;
>      int i;
> -    sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
> +    sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates,
> +                      *ov6_guest;
>      bool guest_radix;
>  
>      /*
> @@ -1112,6 +1113,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>  
>      ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
>      ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
> +    ov6_guest = spapr_ovec_parse_vector(ov_table, 6);
>      if (spapr_ovec_test(ov5_guest, OV5_MMU_BOTH)) {
>          error_report("guest requested hash and radix MMU, which is invalid.");
>          exit(EXIT_FAILURE);
> @@ -1154,6 +1156,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>      }
>      spapr->cas_legacy_guest_workaround = !spapr_ovec_test(ov1_guest,
>                                                            OV1_PPC_3_00);
> +    spapr->os_name = spapr_ovec_byte(ov6_guest, OV6_OS_NAME);

Is there a reason not to have these lines grouped?

+    ov6_guest = spapr_ovec_parse_vector(ov_table, 6);
+    spapr->os_name = spapr_ovec_byte(ov6_guest, OV6_OS_NAME);

>      if (!spapr->cas_reboot) {
>          spapr->cas_reboot =
>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
> diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
> index 41df4c3..7adc9e6 100644
> --- a/hw/ppc/spapr_ovec.c
> +++ b/hw/ppc/spapr_ovec.c
> @@ -160,6 +160,14 @@ static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
>      return entry;
>  }
>  
> +uint8_t spapr_ovec_byte(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert(bitnr < OV_MAXBITS);
> +
> +    return guest_byte_from_bitmap(ov->bitmap, bitnr);
> +}
> +
>  static target_ulong vector_addr(target_ulong table_addr, int vector)
>  {
>      uint16_t vector_count, vector_len;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 5802f88..041ce19 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -52,6 +52,7 @@ struct sPAPRMachineClass {
>      /*< public >*/
>      bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
> +    bool need_os_name;
>      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
>      void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
>                            uint64_t *buid, hwaddr *pio, 
> @@ -90,6 +91,7 @@ struct sPAPRMachineState {
>      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
>      bool cas_reboot;
>      bool cas_legacy_guest_workaround;
> +    uint8_t os_name;
>  
>      Notifier epow_notifier;
>      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index f088833..c728bb3 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -56,6 +56,12 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  #define OV5_MMU_RADIX_300       OV_BIT(24, 1) /* 1=Radix only, 0=Hash only */
>  #define OV5_MMU_RADIX_GTSE      OV_BIT(26, 1) /* Radix GTSE */
>  
> +/* option vector 6 */
> +#define OV6_OS_NAME             OV_BIT(3, 0)
> +#define OV6_NONE                0x00
> +#define OV6_AIX                 0x01
> +#define OV6_LINUX               0x02
> +
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);
>  sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
> @@ -69,6 +75,7 @@ void spapr_ovec_cleanup(sPAPROptionVector *ov);
>  void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
>  void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
>  bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
> +uint8_t spapr_ovec_byte(sPAPROptionVector *ov, long bitnr);
>  sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
>  int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
>                             sPAPROptionVector *ov, const char *name);


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS
  2017-05-23 11:18 [Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS Laurent Vivier
                   ` (3 preceding siblings ...)
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 4/4] Revert "spapr: fix memory hot-unplugging" Laurent Vivier
@ 2017-05-23 17:52 ` Daniel Henrique Barboza
  2017-05-23 18:07   ` Daniel Henrique Barboza
  4 siblings, 1 reply; 27+ messages in thread
From: Daniel Henrique Barboza @ 2017-05-23 17:52 UTC (permalink / raw)
  To: Laurent Vivier, David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

Hi Laurent,

This is an interesting patch series. I've been working on DRC migration
for the last few weeks, mainly to solve the problem in which a CPU hot
unplug will not succeed after a migration if the CPU was hotplugged in
the source. The problem happened when migrating with virsh because
Libvirt hotplugs the CPU in both source and target, and the DRC state of
the hotplugged CPU after the migration is inconsistent. This series
solves the issue by preventing it from happening in the first place. Of
course, migrating DRC states has other uses (pending unplug operations
during a migration, for example), so both patch series can coexist.

One possible issue I see with this series is that it breaks Libvirt
migration entirely if a CPU/mem hotplug happens in the target. With your
series applied, the migration fails before it starts with:

# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs --desturi qemu+ssh://target_ip/system --timeout 60 --verbose
error: internal error: unable to execute QEMU command 'device_add': CPU hotplug not supported without OS

Note that I say "possible issue" because, although I believe we do not
want to break Libvirt if possible, I also believe that we need to think
about what makes sense in QEMU first.

Thanks,

Daniel

On 05/23/2017 08:18 AM, Laurent Vivier wrote:
> If the OS is not started, QEMU sends an event to the OS
> that is lost and cannot be recovered. An unplug is not
> able to restore QEMU in a coherent state.
> So, while the OS is not started, disable CPU and memory hotplug.
> We use option vector 6 to know if the OS is started
>
> This series moves error checking for memory hotplug
> in a pre_plug function, and introduces the option
> vector 6 management. It also revert previous
> fix which was not really fixing the hotplug problem
> when the OS is not running.
>
> Laurent Vivier (4):
>    spapr: add pre_plug function for memory
>    spapr: add option vector 6
>    spapr: disable hotplugging without OS
>    Revert "spapr: fix memory hot-unplugging"
>
>   hw/ppc/spapr.c              | 103 ++++++++++++++++++++++++++++++++++++--------
>   hw/ppc/spapr_drc.c          |  20 ++-------
>   hw/ppc/spapr_hcall.c        |   5 ++-
>   hw/ppc/spapr_ovec.c         |   8 ++++
>   include/hw/ppc/spapr.h      |   2 +
>   include/hw/ppc/spapr_drc.h  |   1 -
>   include/hw/ppc/spapr_ovec.h |   7 +++
>   7 files changed, 109 insertions(+), 37 deletions(-)
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS
  2017-05-23 17:52 ` [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS Daniel Henrique Barboza
@ 2017-05-23 18:07   ` Daniel Henrique Barboza
  2017-05-23 18:22     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Henrique Barboza @ 2017-05-23 18:07 UTC (permalink / raw)
  To: Laurent Vivier, David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth



On 05/23/2017 02:52 PM, Daniel Henrique Barboza wrote:
> Hi Laurent,
>
> This is an interesting patch series. I've been working in the last 
> weeks in the DRC
> migration, mainly to solve the problem in which a hot CPU unplug will 
> not succeed after
> a migration if the CPU was hotplugged in the source. The problem 
> happened when
> migrating with virsh because Libvirt hotplugs the CPU in both source 
> and target, and
> the DRC state of the hotplugged after the migration is inconsistent. 
> This series solves the
> issue by preventing it from happening in the first place. Of course 
> that migrating DRC states
> has other uses (pending unplug operations during a migration, for 
> example) so both patch
> series can coexist.
>
> One possible issue I see with this series is that it breaks Libvirt 
> migration entirely if
> a CPU/mem hotplug happens in the target. With your series applied the 
> migration
> fails before start with:

Sorry: if a migration happens in the *source*, before the migration.

>
>
> # ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs 
> --desturi qemu+ssh://target_ip/system --timeout 60 --verbose
> error: internal error: unable to execute QEMU command 'device_add': 
> CPU hotplug not supported without OS
>
> Note that I say "possible issue" because, although I believe we do not 
> want to break Libvirt
> if possible, I also believe that we need to think about what makes 
> sense in QEMU first.
>
>
>
> Thanks,
>
>
> Daniel
>
>
> On 05/23/2017 08:18 AM, Laurent Vivier wrote:
>> If the OS is not started, QEMU sends an event to the OS
>> that is lost and cannot be recovered. An unplug is not
>> able to restore QEMU in a coherent state.
>> So, while the OS is not started, disable CPU and memory hotplug.
>> We use option vector 6 to know if the OS is started
>>
>> This series moves error checking for memory hotplug
>> in a pre_plug function, and introduces the option
>> vector 6 management. It also revert previous
>> fix which was not really fixing the hotplug problem
>> when the OS is not running.
>>
>> Laurent Vivier (4):
>>    spapr: add pre_plug function for memory
>>    spapr: add option vector 6
>>    spapr: disable hotplugging without OS
>>    Revert "spapr: fix memory hot-unplugging"
>>
>>   hw/ppc/spapr.c              | 103 
>> ++++++++++++++++++++++++++++++++++++--------
>>   hw/ppc/spapr_drc.c          |  20 ++-------
>>   hw/ppc/spapr_hcall.c        |   5 ++-
>>   hw/ppc/spapr_ovec.c         |   8 ++++
>>   include/hw/ppc/spapr.h      |   2 +
>>   include/hw/ppc/spapr_drc.h  |   1 -
>>   include/hw/ppc/spapr_ovec.h |   7 +++
>>   7 files changed, 109 insertions(+), 37 deletions(-)
>>
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS
  2017-05-23 18:07   ` Daniel Henrique Barboza
@ 2017-05-23 18:22     ` Daniel Henrique Barboza
  2017-05-23 19:42       ` Laurent Vivier
  0 siblings, 1 reply; 27+ messages in thread
From: Daniel Henrique Barboza @ 2017-05-23 18:22 UTC (permalink / raw)
  To: Laurent Vivier, David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth



On 05/23/2017 03:07 PM, Daniel Henrique Barboza wrote:
>
>
> On 05/23/2017 02:52 PM, Daniel Henrique Barboza wrote:
>> Hi Laurent,
>>
>> This is an interesting patch series. I've been working in the last 
>> weeks in the DRC
>> migration, mainly to solve the problem in which a hot CPU unplug will 
>> not succeed after
>> a migration if the CPU was hotplugged in the source. The problem 
>> happened when
>> migrating with virsh because Libvirt hotplugs the CPU in both source 
>> and target, and
>> the DRC state of the hotplugged after the migration is inconsistent. 
>> This series solves the
>> issue by preventing it from happening in the first place. Of course 
>> that migrating DRC states
>> has other uses (pending unplug operations during a migration, for 
>> example) so both patch
>> series can coexist.
>>
>> One possible issue I see with this series is that it breaks Libvirt 
>> migration entirely if
>> a CPU/mem hotplug happens in the target. With your series applied the 
>> migration
>> fails before start with:
>
> Sorry: if a migration happens in the *source*, before the migration.

Hehe nothing like fixing a typo with another one ...

This is the Libvirt use case that fails with this patch set applied in
QEMU master, Libvirt 3.4.0 compiled from source:

# ./virsh start dhb_ub1704_nfs 2
#
# ./virsh setvcpus dhb_ub1704_nfs 2 --live
#
# ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs --desturi qemu+ssh://target_ip/system --timeout 60 --verbose
error: internal error: unable to execute QEMU command 'device_add': CPU hotplug not supported without OS

This is the error msg that appears in Libvirt daemon:

2017-05-23 18:17:17.844+0000: 159678: error : qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU command 'device_add': CPU hotplug not supported without OS



Daniel

>
>>
>>
>> # ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs 
>> --desturi qemu+ssh://target_ip/system --timeout 60 --verbose
>> error: internal error: unable to execute QEMU command 'device_add': 
>> CPU hotplug not supported without OS
>>
>> Note that I say "possible issue" because, although I believe we do 
>> not want to break Libvirt
>> if possible, I also believe that we need to think about what makes 
>> sense in QEMU first.
>>
>>
>>
>> Thanks,
>>
>>
>> Daniel
>>
>>
>> On 05/23/2017 08:18 AM, Laurent Vivier wrote:
>>> If the OS is not started, QEMU sends an event to the OS
>>> that is lost and cannot be recovered. An unplug is not
>>> able to restore QEMU in a coherent state.
>>> So, while the OS is not started, disable CPU and memory hotplug.
>>> We use option vector 6 to know if the OS is started
>>>
>>> This series moves error checking for memory hotplug
>>> in a pre_plug function, and introduces the option
>>> vector 6 management. It also revert previous
>>> fix which was not really fixing the hotplug problem
>>> when the OS is not running.
>>>
>>> Laurent Vivier (4):
>>>    spapr: add pre_plug function for memory
>>>    spapr: add option vector 6
>>>    spapr: disable hotplugging without OS
>>>    Revert "spapr: fix memory hot-unplugging"
>>>
>>>   hw/ppc/spapr.c              | 103 
>>> ++++++++++++++++++++++++++++++++++++--------
>>>   hw/ppc/spapr_drc.c          |  20 ++-------
>>>   hw/ppc/spapr_hcall.c        |   5 ++-
>>>   hw/ppc/spapr_ovec.c         |   8 ++++
>>>   include/hw/ppc/spapr.h      |   2 +
>>>   include/hw/ppc/spapr_drc.h  |   1 -
>>>   include/hw/ppc/spapr_ovec.h |   7 +++
>>>   7 files changed, 109 insertions(+), 37 deletions(-)
>>>
>>
>>
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS
  2017-05-23 18:22     ` Daniel Henrique Barboza
@ 2017-05-23 19:42       ` Laurent Vivier
  0 siblings, 0 replies; 27+ messages in thread
From: Laurent Vivier @ 2017-05-23 19:42 UTC (permalink / raw)
  To: Daniel Henrique Barboza, David Gibson
  Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

On 23/05/2017 20:22, Daniel Henrique Barboza wrote:
> 
> 
> This is the Libvirt use case that fails with this patch set applied in
> QEMU master, Libvirt 3.4.0 compiled from
> source:
> 
> # ./virsh start dhb_ub1704_nfs 2
> #
> # ./virsh setvcpus dhb_ub1704_nfs 2 --live
> #
> # ./virsh -c 'qemu:///system' migrate --live --domain dhb_ub1704_nfs
> --desturi qemu+ssh://target_ip/system --timeout 60 --verbose
> error: internal error: unable to execute QEMU command 'device_add': CPU
> hotplug not supported without OS

Good point.

I think I should refine my series to allow hotplug if the machine is not
started.
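
One way to read that refinement, as a minimal sketch only: refuse hotplug
just in the window where the machine is already running but no OS has
identified itself yet. The struct and the vm_running field below are
hypothetical stand-ins for illustration, not the real sPAPRMachineState or
any existing QEMU API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the few bits of state the check needs (hypothetical). */
typedef struct {
    bool vm_running;   /* assumed: guest vCPUs have been started */
    uint8_t os_name;   /* OV6 OS name byte, 0 (OV6_NONE) until CAS */
} HotplugCheckState;

/*
 * Allow hotplug while the machine is not started (for example while
 * Libvirt populates the target before an incoming migration), and
 * otherwise only once the guest OS has announced itself.
 */
static bool hotplug_allowed(const HotplugCheckState *s)
{
    if (!s->vm_running) {
        return true;
    }
    return s->os_name != 0;
}
```

With this shape, the device_add issued by Libvirt on a not-yet-started
target would pass, while a hotplug during firmware boot would still be
rejected.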

Thanks,
Laurent

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory
  2017-05-23 16:09     ` Laurent Vivier
@ 2017-05-24  4:52       ` David Gibson
  2017-05-24  9:55         ` Greg Kurz
  0 siblings, 1 reply; 27+ messages in thread
From: David Gibson @ 2017-05-24  4:52 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: Greg Kurz, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

[-- Attachment #1: Type: text/plain, Size: 3497 bytes --]

On Tue, May 23, 2017 at 06:09:00PM +0200, Laurent Vivier wrote:
> On 23/05/2017 17:28, Greg Kurz wrote:
> > On Tue, 23 May 2017 13:18:09 +0200
> > Laurent Vivier <lvivier@redhat.com> wrote:
> > 
> >> This allows to manage errors before the memory
> >> has started to be hotplugged. We already have
> >> the function for the CPU cores.
> >>
> >> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> >> ---
> >>  hw/ppc/spapr.c | 45 ++++++++++++++++++++++++++++++---------------
> >>  1 file changed, 30 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> >> index 0980d73..0e8d8d1 100644
> >> --- a/hw/ppc/spapr.c
> >> +++ b/hw/ppc/spapr.c
> >> @@ -2569,20 +2569,6 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >>      uint64_t align = memory_region_get_alignment(mr);
> >>      uint64_t size = memory_region_size(mr);
> >>      uint64_t addr;
> >> -    char *mem_dev;
> >> -
> >> -    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> >> -        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
> >> -                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE/M_BYTE);
> >> -        goto out;
> >> -    }
> >> -
> >> -    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
> >> -    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
> >> -        error_setg(&local_err, "Memory backend has bad page size. "
> >> -                   "Use 'memory-backend-file' with correct mem-path.");
> >> -        goto out;
> >> -    }
> >>  
> >>      pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
> >>      if (local_err) {
> >> @@ -2603,6 +2589,33 @@ out:
> >>      error_propagate(errp, local_err);
> >>  }
> >>  
> >> +static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >> +                                Error **errp)
> > 
> > Indentation nit
> 
> ok
> 
> > 
> >> +{
> >> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> >> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> >> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
> >> +    uint64_t size = memory_region_size(mr);
> >> +    Error *local_err = NULL;
> >> +    char *mem_dev;
> >> +
> >> +    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> >> +        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
> >> +                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
> >> +        goto out;
> >> +    }
> >> +
> >> +    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
> >> +    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
> >> +        error_setg(&local_err, "Memory backend has bad page size. "
> >> +                   "Use 'memory-backend-file' with correct mem-path.");
> >> +        goto out;
> >> +    }
> >> +
> >> +out:
> >> +    error_propagate(errp, local_err);
> > 
> > As recently discussed with Markus Armbruster, it isn't necessary to have a
> > local Error * if you don't do anything else with it but propagate it.
> 
> Yes, you are right, it's a stupid cut'n'paste.

This patch seems like a good idea regardless of the rest, so I've
fixed the minor nits Greg pointed out and merged to ppc-for-2.10.
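
The point about the local Error * can be sketched as follows. This is a
self-contained illustration, not the merged code: the Error type and
error_setg() here are simplified stand-ins for QEMU's, BLOCK_SIZE assumes
a 256 MB LMB size, and memory_pre_plug_check() is a hypothetical name.
When a function does nothing with the error except hand it back, it can
set it through errp directly and drop both the local variable and the
out: label.

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object. */
typedef struct Error {
    char msg[128];
} Error;

/* Simplified stand-in for QEMU's error_setg(): a NULL errp is ignored. */
static void error_setg(Error **errp, const char *fmt, ...)
{
    va_list ap;
    Error *err;

    if (!errp) {
        return;
    }
    err = calloc(1, sizeof(*err));
    if (!err) {
        abort();
    }
    va_start(ap, fmt);
    vsnprintf(err->msg, sizeof(err->msg), fmt, ap);
    va_end(ap);
    *errp = err;
}

#define BLOCK_SIZE (256ULL << 20) /* assumed 256 MB memory block size */

/* No local_err and no error_propagate(): report straight into errp. */
static void memory_pre_plug_check(uint64_t size, Error **errp)
{
    if (size % BLOCK_SIZE) {
        error_setg(errp, "Hotplugged memory size must be a multiple of "
                   "%llu MB", (unsigned long long)(BLOCK_SIZE >> 20));
        return;
    }
    /* further checks would follow the same pattern */
}
```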

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 2/4] spapr: add option vector 6
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 2/4] spapr: add option vector 6 Laurent Vivier
  2017-05-23 16:31   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-05-24  4:58   ` David Gibson
  1 sibling, 0 replies; 27+ messages in thread
From: David Gibson @ 2017-05-24  4:58 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

[-- Attachment #1: Type: text/plain, Size: 7916 bytes --]

On Tue, May 23, 2017 at 01:18:10PM +0200, Laurent Vivier wrote:
> This allows to know when the OS is started and its type.
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

This seems a bit oddly complex for the task at hand.  AFAICT you're
never actually using the value in OV6, just whether it's set or not.
So, it seems like all you're using this as is basically a flag saying
that CAS is complete.
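
If that reading is right, the same information could be carried by a
single boolean. A minimal sketch, assuming a hypothetical cas_done field
and helper names that do not exist in the patch:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant machine state. */
typedef struct {
    bool cas_done;   /* set once the guest has completed CAS */
} CasState;

/* Machine reset: the guest has to negotiate again. */
static void cas_state_reset(CasState *s)
{
    s->cas_done = false;
}

/* Called at the end of the client-architecture-support hcall. */
static void cas_state_complete(CasState *s)
{
    s->cas_done = true;
}

/* What the hotplug path would test instead of the OV6 byte. */
static bool cas_state_is_complete(const CasState *s)
{
    return s->cas_done;
}
```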

> ---
>  hw/ppc/spapr.c              | 36 ++++++++++++++++++++++++++++++++++++
>  hw/ppc/spapr_hcall.c        |  5 ++++-
>  hw/ppc/spapr_ovec.c         |  8 ++++++++
>  include/hw/ppc/spapr.h      |  2 ++
>  include/hw/ppc/spapr_ovec.h |  7 +++++++
>  5 files changed, 57 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 0e8d8d1..eceb4cc 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1369,6 +1369,7 @@ static void ppc_spapr_reset(void)
>      first_ppc_cpu->env.nip = SPAPR_ENTRY_POINT;
>  
>      spapr->cas_reboot = false;
> +    spapr->os_name = OV6_NONE;
>  }
>  
>  static void spapr_create_nvram(sPAPRMachineState *spapr)
> @@ -1524,10 +1525,41 @@ static const VMStateDescription vmstate_spapr_patb_entry = {
>      },
>  };
>  
> +static bool spapr_os_name_needed(void *opaque)
> +{
> +    sPAPRMachineState *spapr = opaque;
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> +    return smc->need_os_name;
> +}
> +
> +static const VMStateDescription vmstate_spapr_os_name = {
> +    .name = "spapr_os_name",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_os_name_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT8(os_name, sPAPRMachineState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +static int spapr_pre_load(void *opaque)
> +{
> +    sPAPRMachineState *spapr = opaque;
> +
> +    /* if the os_name is not migrated from the source,
> +     * we must allow hotplug, so set os_name to linux
> +     */
> +    spapr->os_name = OV6_LINUX;
> +
> +    return 0;
> +}
> +
>  static const VMStateDescription vmstate_spapr = {
>      .name = "spapr",
>      .version_id = 3,
>      .minimum_version_id = 1,
> +    .pre_load = spapr_pre_load,
>      .post_load = spapr_post_load,
>      .fields = (VMStateField[]) {
>          /* used to be @next_irq */
> @@ -1542,6 +1574,7 @@ static const VMStateDescription vmstate_spapr = {
>      .subsections = (const VMStateDescription*[]) {
>          &vmstate_spapr_ov5_cas,
>          &vmstate_spapr_patb_entry,
> +        &vmstate_spapr_os_name,
>          NULL
>      }
>  };
> @@ -3216,6 +3249,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
>       * in which LMBs are represented and hot-added
>       */
>      mc->numa_mem_align_shift = 28;
> +    smc->need_os_name = true;
>  }
>  
>  static const TypeInfo spapr_machine_info = {
> @@ -3293,9 +3327,11 @@ static void spapr_machine_2_9_instance_options(MachineState *machine)
>  
>  static void spapr_machine_2_9_class_options(MachineClass *mc)
>  {
> +    sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
>      spapr_machine_2_10_class_options(mc);
>      SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_2_9);
>      mc->numa_auto_assign_ram = numa_legacy_auto_assign_ram;
> +    smc->need_os_name = false;
>  }
>  
>  DEFINE_SPAPR_MACHINE(2_9, "2.9", false);
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 0d608d6..5dbe3c7 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1058,7 +1058,8 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>      uint32_t max_compat = cpu->max_compat;
>      uint32_t best_compat = 0;
>      int i;
> -    sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates;
> +    sPAPROptionVector *ov1_guest, *ov5_guest, *ov5_cas_old, *ov5_updates,
> +                      *ov6_guest;
>      bool guest_radix;
>  
>      /*
> @@ -1112,6 +1113,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>  
>      ov1_guest = spapr_ovec_parse_vector(ov_table, 1);
>      ov5_guest = spapr_ovec_parse_vector(ov_table, 5);
> +    ov6_guest = spapr_ovec_parse_vector(ov_table, 6);
>      if (spapr_ovec_test(ov5_guest, OV5_MMU_BOTH)) {
>          error_report("guest requested hash and radix MMU, which is invalid.");
>          exit(EXIT_FAILURE);
> @@ -1154,6 +1156,7 @@ static target_ulong h_client_architecture_support(PowerPCCPU *cpu,
>      }
>      spapr->cas_legacy_guest_workaround = !spapr_ovec_test(ov1_guest,
>                                                            OV1_PPC_3_00);
> +    spapr->os_name = spapr_ovec_byte(ov6_guest, OV6_OS_NAME);
>      if (!spapr->cas_reboot) {
>          spapr->cas_reboot =
>              (spapr_h_cas_compose_response(spapr, args[1], args[2],
> diff --git a/hw/ppc/spapr_ovec.c b/hw/ppc/spapr_ovec.c
> index 41df4c3..7adc9e6 100644
> --- a/hw/ppc/spapr_ovec.c
> +++ b/hw/ppc/spapr_ovec.c
> @@ -160,6 +160,14 @@ static uint8_t guest_byte_from_bitmap(unsigned long *bitmap, long bitmap_offset)
>      return entry;
>  }
>  
> +uint8_t spapr_ovec_byte(sPAPROptionVector *ov, long bitnr)
> +{
> +    g_assert(ov);
> +    g_assert(bitnr < OV_MAXBITS);
> +
> +    return guest_byte_from_bitmap(ov->bitmap, bitnr);
> +}
> +
>  static target_ulong vector_addr(target_ulong table_addr, int vector)
>  {
>      uint16_t vector_count, vector_len;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index 5802f88..041ce19 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -52,6 +52,7 @@ struct sPAPRMachineClass {
>      /*< public >*/
>      bool dr_lmb_enabled;       /* enable dynamic-reconfig/hotplug of LMBs */
>      bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
> +    bool need_os_name;
>      const char *tcg_default_cpu; /* which (TCG) CPU to simulate by default */
>      void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
>                            uint64_t *buid, hwaddr *pio, 
> @@ -90,6 +91,7 @@ struct sPAPRMachineState {
>      sPAPROptionVector *ov5_cas;     /* negotiated (via CAS) option vectors */
>      bool cas_reboot;
>      bool cas_legacy_guest_workaround;
> +    uint8_t os_name;
>  
>      Notifier epow_notifier;
>      QTAILQ_HEAD(, sPAPREventLogEntry) pending_events;
> diff --git a/include/hw/ppc/spapr_ovec.h b/include/hw/ppc/spapr_ovec.h
> index f088833..c728bb3 100644
> --- a/include/hw/ppc/spapr_ovec.h
> +++ b/include/hw/ppc/spapr_ovec.h
> @@ -56,6 +56,12 @@ typedef struct sPAPROptionVector sPAPROptionVector;
>  #define OV5_MMU_RADIX_300       OV_BIT(24, 1) /* 1=Radix only, 0=Hash only */
>  #define OV5_MMU_RADIX_GTSE      OV_BIT(26, 1) /* Radix GTSE */
>  
> +/* option vector 6 */
> +#define OV6_OS_NAME             OV_BIT(3, 0)
> +#define OV6_NONE                0x00
> +#define OV6_AIX                 0x01
> +#define OV6_LINUX               0x02
> +
>  /* interfaces */
>  sPAPROptionVector *spapr_ovec_new(void);
>  sPAPROptionVector *spapr_ovec_clone(sPAPROptionVector *ov_orig);
> @@ -69,6 +75,7 @@ void spapr_ovec_cleanup(sPAPROptionVector *ov);
>  void spapr_ovec_set(sPAPROptionVector *ov, long bitnr);
>  void spapr_ovec_clear(sPAPROptionVector *ov, long bitnr);
>  bool spapr_ovec_test(sPAPROptionVector *ov, long bitnr);
> +uint8_t spapr_ovec_byte(sPAPROptionVector *ov, long bitnr);
>  sPAPROptionVector *spapr_ovec_parse_vector(target_ulong table_addr, int vector);
>  int spapr_ovec_populate_dt(void *fdt, int fdt_offset,
>                             sPAPROptionVector *ov, const char *name);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-23 11:18 ` [Qemu-devel] [PATCH 3/4] spapr: disable hotplugging without OS Laurent Vivier
@ 2017-05-24  5:07   ` David Gibson
  2017-05-24  9:28     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  0 siblings, 1 reply; 27+ messages in thread
From: David Gibson @ 2017-05-24  5:07 UTC (permalink / raw)
  To: Laurent Vivier; +Cc: Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

[-- Attachment #1: Type: text/plain, Size: 3587 bytes --]

On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> If the OS is not started, QEMU sends an event to the OS
> that is lost and cannot be recovered. An unplug is not
> able to restore QEMU in a coherent state.
> So, while the OS is not started, disable CPU and memory hotplug.
> We use option vector 6 to know if the OS is started
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>

Urgh.. I'm not terribly confident that this is really correct.  As
discussed on the previous patch, you're essentially using OV6 as a
flag that CAS is complete.

But while it undoubtedly makes the race window much smaller, I don't
see that there's any guarantee the guest OS will really be able to
handle hotplug events immediately after CAS.

In particular if the CAS process completes partially but then needs to
trigger a reboot, I think that would end up setting the ov6 variable,
but the OS would definitely not be in a state to accept events.
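
The reboot case can be modelled in a few lines. This is only a sketch
under assumptions: the struct is a stand-in for the two fields involved,
and the stricter check is an illustration of the gap, not a proposed fix.
It does nothing for the broader problem of an OS that is not yet ready
right after CAS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the two machine fields involved (not the real struct). */
typedef struct {
    uint8_t os_name;   /* OV6 OS name byte, 0 until CAS */
    bool cas_reboot;   /* CAS decided the guest must reboot */
} CasRebootState;

static void model_reset(CasRebootState *s)
{
    s->os_name = 0;
    s->cas_reboot = false;
}

/* CAS records the OS name even when it then asks for a reboot. */
static void model_cas(CasRebootState *s, uint8_t guest_os, bool needs_reboot)
{
    s->os_name = guest_os;
    s->cas_reboot = needs_reboot;
}

/* The check as posted: only looks at os_name. */
static bool hotplug_ok_as_posted(const CasRebootState *s)
{
    return s->os_name != 0;
}

/* Illustrative narrowing: also refuse while a CAS reboot is pending. */
static bool hotplug_ok_excluding_reboot(const CasRebootState *s)
{
    return s->os_name != 0 && !s->cas_reboot;
}
```

Between model_cas() with needs_reboot set and the following reset, the
first check accepts a hotplug that the guest cannot handle.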

Mike, I really think we need some input from someone familiar with how
these hotplug events are supposed to work.  What do we need to do to
handle lost or stale events, such as those delivered when an OS is not
booted?

> ---
>  hw/ppc/spapr.c | 22 +++++++++++++++++++---
>  1 file changed, 19 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index eceb4cc..2e9320d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2625,6 +2625,7 @@ out:
>  static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                  Error **errp)
>  {
> +    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
>      PCDIMMDevice *dimm = PC_DIMM(dev);
>      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
>      MemoryRegion *mr = ddc->get_memory_region(dimm);
> @@ -2645,6 +2646,13 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>          goto out;
>      }
>  
> +    if (dev->hotplugged) {
> +        if (!ms->os_name) {
> +            error_setg(&local_err, "Memory hotplug not supported without OS");
> +            goto out;
> +        }
> +    }
> +
>  out:
>      error_propagate(errp, local_err);
>  }
> @@ -2874,6 +2882,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                  Error **errp)
>  {
>      MachineState *machine = MACHINE(OBJECT(hotplug_dev));
> +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
>      MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
>      Error *local_err = NULL;
>      CPUCore *cc = CPU_CORE(dev);
> @@ -2884,9 +2893,16 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      int node_id;
>      int index;
>  
> -    if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> -        error_setg(&local_err, "CPU hotplug not supported for this machine");
> -        goto out;
> +    if (dev->hotplugged) {
> +        if (!mc->has_hotpluggable_cpus) {
> +            error_setg(&local_err,
> +                       "CPU hotplug not supported for this machine");
> +            goto out;
> +        }
> +        if (!ms->os_name) {
> +            error_setg(&local_err, "CPU hotplug not supported without OS");
> +            goto out;
> +        }
>      }
>  
>      if (strcmp(base_core_type, type)) {

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24  5:07   ` David Gibson
@ 2017-05-24  9:28     ` Greg Kurz
  2017-05-24 10:14       ` Igor Mammedov
  2017-05-25  2:45       ` David Gibson
  0 siblings, 2 replies; 27+ messages in thread
From: Greg Kurz @ 2017-05-24  9:28 UTC (permalink / raw)
  To: David Gibson
  Cc: Laurent Vivier, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

[-- Attachment #1: Type: text/plain, Size: 4264 bytes --]

On Wed, 24 May 2017 15:07:54 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > If the OS is not started, QEMU sends an event to the OS
> > that is lost and cannot be recovered. An unplug is not
> > able to restore QEMU in a coherent state.
> > So, while the OS is not started, disable CPU and memory hotplug.
> > We use option vector 6 to know if the OS is started
> > 
> > Signed-off-by: Laurent Vivier <lvivier@redhat.com>  
> 
> Urgh.. I'm not terribly confident that this is really correct.  As
> discussed on the previous patch, you're essentially using OV6 as a
> flag that CAS is complete.
> 
> But while it undoubtedly makes the race window much smaller, I don't
> see that there's any guarantee the guest OS will really be able to
> handle hotplug events immediately after CAS.
> 
> In particular if the CAS process completes partially but then needs to
> trigger a reboot, I think that would end up setting the ov6 variable,
> but the OS would definitely not be in a state to accept events.
> 

We never have any guarantee that the OS will process an event that
we've sent actually (think of a kernel crash just after a successful
CAS negotiation for example, or any failure with the various guest
components involved in the process of hotplug).

> Mike, I really think we need some input from someone familiar with how
> these hotplug events are supposed to work.  What do we need to do to
> handle lost or stale events, such as those delivered when an OS is not
> booted.
> 

AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.

https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm

I'm not sure we can do anything better than being able to "cancel" a previous
hotplug attempt if it takes too long, but I'm not necessarily the expert you're
looking for :)

> > ---
> >  hw/ppc/spapr.c | 22 +++++++++++++++++++---
> >  1 file changed, 19 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index eceb4cc..2e9320d 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -2625,6 +2625,7 @@ out:
> >  static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                                  Error **errp)
> >  {
> > +    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> >      PCDIMMDevice *dimm = PC_DIMM(dev);
> >      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> >      MemoryRegion *mr = ddc->get_memory_region(dimm);
> > @@ -2645,6 +2646,13 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          goto out;
> >      }
> >  
> > +    if (dev->hotplugged) {
> > +        if (!ms->os_name) {
> > +            error_setg(&local_err, "Memory hotplug not supported without OS");
> > +            goto out;
> > +        }
> > +    }
> > +
> >  out:
> >      error_propagate(errp, local_err);
> >  }
> > @@ -2874,6 +2882,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                                  Error **errp)
> >  {
> >      MachineState *machine = MACHINE(OBJECT(hotplug_dev));
> > +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
> >      MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
> >      Error *local_err = NULL;
> >      CPUCore *cc = CPU_CORE(dev);
> > @@ -2884,9 +2893,16 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      int node_id;
> >      int index;
> >  
> > -    if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> > -        error_setg(&local_err, "CPU hotplug not supported for this machine");
> > -        goto out;
> > +    if (dev->hotplugged) {
> > +        if (!mc->has_hotpluggable_cpus) {
> > +            error_setg(&local_err,
> > +                       "CPU hotplug not supported for this machine");
> > +            goto out;
> > +        }
> > +        if (!ms->os_name) {
> > +            error_setg(&local_err, "CPU hotplug not supported without OS");
> > +            goto out;
> > +        }
> >      }
> >  
> >      if (strcmp(base_core_type, type)) {  
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory
  2017-05-24  4:52       ` David Gibson
@ 2017-05-24  9:55         ` Greg Kurz
  2017-05-24 10:27           ` David Gibson
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2017-05-24  9:55 UTC (permalink / raw)
  To: David Gibson
  Cc: Laurent Vivier, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

[-- Attachment #1: Type: text/plain, Size: 1286 bytes --]

On Wed, 24 May 2017 14:52:36 +1000
David Gibson <david@gibson.dropbear.id.au> wrote:
[...]
> 
> This patch seems like a good idea regardless of the rest, so I've
> fixed the minor nits Greg pointed out and merged to ppc-for-2.10.
> 

David,

Commit d2e4c6a1437fab2fbb4553b598f25e282c475199 in your ppc-for-2.10 branch
doesn't compile:

+static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                                  Error **errp)
+{
+    PCDIMMDevice *dimm = PC_DIMM(dev);
+    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    MemoryRegion *mr = ddc->get_memory_region(dimm);
+    uint64_t size = memory_region_size(mr);
+    char *mem_dev;
+
+    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
+        error_setg(&local_err, "Hotplugged memory size must be a multiple of "

s/&local_err/errp/

+                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
+        goto out;

s/goto out/return/

+    }
+
+    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
+    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
+        error_setg(errp, "Memory backend has bad page size. "
+                   "Use 'memory-backend-file' with correct mem-path.");
+    }
+}
+
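
To make the two substitutions concrete, here is a minimal standalone sketch
of the same control flow. It is not the QEMU code: mock_memory_pre_plug, the
plain string used in place of Error, and the 256 MiB block size are all
assumptions made up for illustration. The point is only that, with no
local_err and no "out:" label in the function, each failure reports through
errp directly and returns.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed block size for this sketch (256 MiB); not taken from the thread. */
#define MOCK_MEMORY_BLOCK_SIZE (256ULL * 1024 * 1024)

/*
 * Hypothetical stand-in for the pre-plug check. There is no local error
 * variable and no "out:" label, so each failure is reported through errp
 * and the function returns straight away.
 */
void mock_memory_pre_plug(uint64_t size, bool page_size_ok, const char **errp)
{
    if (size % MOCK_MEMORY_BLOCK_SIZE) {
        if (errp) {
            *errp = "Hotplugged memory size must be a multiple of the block size";
        }
        return;
    }

    if (!page_size_ok) {
        if (errp) {
            *errp = "Memory backend has bad page size";
        }
    }
}
```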

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24  9:28     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
@ 2017-05-24 10:14       ` Igor Mammedov
  2017-05-24 15:54         ` Greg Kurz
  2017-05-25  2:49         ` David Gibson
  2017-05-25  2:45       ` David Gibson
  1 sibling, 2 replies; 27+ messages in thread
From: Igor Mammedov @ 2017-05-24 10:14 UTC (permalink / raw)
  To: Greg Kurz
  Cc: David Gibson, Laurent Vivier, Thomas Huth, qemu-ppc, qemu-devel,
	Michael Roth

On Wed, 24 May 2017 11:28:57 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Wed, 24 May 2017 15:07:54 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:  
> > > If the OS is not started, QEMU sends an event to the OS
> > > that is lost and cannot be recovered. An unplug is not
> > > able to restore QEMU in a coherent state.
> > > So, while the OS is not started, disable CPU and memory hotplug.
> > > We use option vector 6 to know if the OS is started
> > > 
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>    
> > 
> > Urgh.. I'm not terribly confident that this is really correct.  As
> > discussed on the previous patch, you're essentially using OV6 as a
> > flag that CAS is complete.
> > 
> > But while it undoubtedly makes the race window much smaller, I don't
> > see that there's any guarantee the guest OS will really be able to
> > handle hotplug events immediately after CAS.
> > 
> > In particular if the CAS process completes partially but then needs to
> > trigger a reboot, I think that would end up setting the ov6 variable,
> > but the OS would definitely not be in a state to accept events.
Wouldn't the guest, on reboot, pick up the updated fdt and online the cpu
that was hotplugged before the crash along with the initial cpus?

> We never have any guarantee that the OS will process an event that
> we've sent actually (think of a kernel crash just after a successful
> CAS negotiation for example, or any failure with the various guest
> components involved in the process of hotplug).
> 
> > Mike, I really think we need some input from someone familiar with how
> > these hotplug events are supposed to work.  What do we need to do to
> > handle lost or stale events, such as those delivered when an OS is not
> > booted.
> >   
> 
> AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> 
> https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> 
> I'm not sure we can do anything better than being able to "cancel" a previous
> hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> looking for :)
From x86/ACPI world:
 - if hotplug happens early at boot before guest OS is running
   hotplug notification (SCI interrupt) stays pending and once guest
   is up it will/should handle it and online CPU
 - if guest crashed and is rebooted it will pick up updated ACPI tables (fdt equivalent)
   with all present cpus (including hotplugged one before crash) and online
   hotplugged cpu along with coldplugged ones
 - if guest loses SCI somehow, it's considered a guest issue and such a cpu
   cannot be unplugged until guest picks it up somehow (reboot, manually running cpus scan
   method from ACPI or another cpu hotplug event) and explicitly ejects it.

Taking into account that CPUs don't support surprise removal and require
guest cooperation, it's fine to leave the CPU plugged in until the guest ejects it.
That's what I'd expect to happen on baremetal, 
you hotplug CPU, hardware notifies OS about it and that's all,
cpu won't suddenly pop out if OS isn't able to online it.

Moreover, that hotplugged cpu might be executing some code or one of
already present cpus might be executing initialization routines to online
it (think of host overcommit and arbitrary delays) so it is not really safe
to remove hotplugged but not onlined cpu without OS consent
(i.e. explicit eject by OS/firmware). I think the lost event handling should be
fixed on guest side and not in QEMU.
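
As a rough illustration of the semantics described above, here is a toy
model in plain C. It is only a sketch: HotplugModel and the model_* helpers
are hypothetical names invented for this example and do not correspond to
any QEMU or ACPI interface. It shows a notification that stays pending
while no guest is running, and a guest that onlines every present cpu once
it boots.

```c
#include <stdbool.h>

/* Toy model of the behaviour described above; all names are hypothetical. */
typedef struct {
    bool sci_pending;   /* notification raised but not yet handled */
    bool guest_running; /* guest OS is up and able to service events */
    int present_cpus;   /* cpus plugged in, whether or not onlined */
    int online_cpus;    /* cpus the guest has brought online */
} HotplugModel;

/* Guest side: service a pending notification, if the guest is running. */
void model_guest_handle_sci(HotplugModel *m)
{
    if (m->guest_running && m->sci_pending) {
        m->online_cpus = m->present_cpus;
        m->sci_pending = false;
    }
}

/* Host side: plug a cpu and raise the notification; it stays pending
 * until a running guest handles it. */
void model_hotplug_cpu(HotplugModel *m)
{
    m->present_cpus++;
    m->sci_pending = true;
    model_guest_handle_sci(m);
}

/* Guest (re)boot: the tables list every present cpu, so all of them are
 * onlined, and any notification left pending is then serviced. */
void model_guest_boot(HotplugModel *m)
{
    m->guest_running = true;
    m->online_cpus = m->present_cpus;
    model_guest_handle_sci(m);
}
```

Note that nothing in the model ever removes a cpu on the host's initiative;
the cpu stays present until the guest acts on it.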

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 1/4] spapr: add pre_plug function for memory
  2017-05-24  9:55         ` Greg Kurz
@ 2017-05-24 10:27           ` David Gibson
  0 siblings, 0 replies; 27+ messages in thread
From: David Gibson @ 2017-05-24 10:27 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Laurent Vivier, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]

On Wed, May 24, 2017 at 11:55:13AM +0200, Greg Kurz wrote:
> On Wed, 24 May 2017 14:52:36 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> [...]
> > 
> > This patch seems like a good idea regardless of the rest, so I've
> > fixed the minor nits Greg pointed out and merged to ppc-for-2.10.
> > 
> 
> David,
> 
> Commit d2e4c6a1437fab2fbb4553b598f25e282c475199 in your ppc-for-2.10 branch
> doesn't compile:
> 
> +static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> +                                  Error **errp)
> +{
> +    PCDIMMDevice *dimm = PC_DIMM(dev);
> +    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> +    MemoryRegion *mr = ddc->get_memory_region(dimm);
> +    uint64_t size = memory_region_size(mr);
> +    char *mem_dev;
> +
> +    if (size % SPAPR_MEMORY_BLOCK_SIZE) {
> +        error_setg(&local_err, "Hotplugged memory size must be a multiple of "
> 
> s/&local_err/errp/
> 
> +                      "%lld MB", SPAPR_MEMORY_BLOCK_SIZE / M_BYTE);
> +        goto out;
> 
> s/goto out/return/
> 
> +    }
> +
> +    mem_dev = object_property_get_str(OBJECT(dimm), PC_DIMM_MEMDEV_PROP, NULL);
> +    if (mem_dev && !kvmppc_is_mem_backend_page_size_ok(mem_dev)) {
> +        error_setg(errp, "Memory backend has bad page size. "
> +                   "Use 'memory-backend-file' with correct mem-path.");
> +    }
> +}
> +

Sorry, I found and fixed that already, but forgot to push the update.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24 10:14       ` Igor Mammedov
@ 2017-05-24 15:54         ` Greg Kurz
  2017-05-24 16:02           ` Laurent Vivier
  2017-05-25  2:49         ` David Gibson
  1 sibling, 1 reply; 27+ messages in thread
From: Greg Kurz @ 2017-05-24 15:54 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Laurent Vivier, Thomas Huth, Michael Roth, qemu-devel, qemu-ppc,
	David Gibson

[-- Attachment #1: Type: text/plain, Size: 4344 bytes --]

On Wed, 24 May 2017 12:14:02 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 24 May 2017 11:28:57 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
> > On Wed, 24 May 2017 15:07:54 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
> > > > If the OS is not started, QEMU sends an event to the OS
> > > > that is lost and cannot be recovered. An unplug is not
> > > > able to restore QEMU in a coherent state.
> > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > We use option vector 6 to know if the OS is started
> > > > 
> > > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>      
> > > 
> > > Urgh.. I'm not terribly confident that this is really correct.  As
> > > discussed on the previous patch, you're essentially using OV6 as a
> > > flag that CAS is complete.
> > > 
> > > But while it undoubtedly makes the race window much smaller, I don't
> > > see that there's any guarantee the guest OS will really be able to
> > > handle hotplug events immediately after CAS.
> > > 
> > > In particular if the CAS process completes partially but then needs to
> > > trigger a reboot, I think that would end up setting the ov6 variable,
> > > but the OS would definitely not be in a state to accept events.  
> wouldn't guest on reboot pick up updated fdt and online hotplugged
> before crash cpu along with initial cpus?
> 

Yes and that's what actually happens with cpus.

But catching up with the background for this series, I have the
impression that the issue isn't the fact we lose an event if the OS
isn't started (which is not true), but more something wrong happening
when hotplugging+unplugging memory as described in this commit:

commit fe6824d12642b005c69123ecf8631f9b13553f8b
Author: Laurent Vivier <lvivier@redhat.com>
Date:   Tue Mar 28 14:09:34 2017 +0200

    spapr: fix memory hot-unplugging

> > We never have any guarantee that the OS will process an event that
> > we've sent actually (think of a kernel crash just after a successful
> > CAS negotiation for example, or any failure with the various guest
> > components involved in the process of hotplug).
> >   
> > > Mike, I really think we need some input from someone familiar with how
> > > these hotplug events are supposed to work.  What do we need to do to
> > > handle lost or stale events, such as those delivered when an OS is not
> > > booted.
> > >     
> > 
> > AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> > 
> > https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> > 
> > I'm not sure we can do anything better than being able to "cancel" a previous
> > hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> > looking for :)  
> From x86/ACPI world:
>  - if hotplug happens early at boot before guest OS is running
>    hotplug notification (SCI interrupt) stays pending and once guest
>    is up it will/should handle it and online CPU
>  - if guest crashed and is rebooted it will pickup updated apci tables (fdt equivalent)
>    with all present cpus (including hotplugged one before crash) and online
>    hotplugged cpu along with coldplugged ones
>  - if guest looses SCI somehow, it's considered guest issue and such cpu
>    stays unpluggable until guest picks it somehow (reboot, manually running cpus scan
>    method from ACPI or another cpu hotplug event) and explicitly ejects it.
> 
> Taking in account that CPUs don't support surprise removal and requires
> guest cooperation it's fine to leave CPU plugged in until guest ejects it.
> That's what I'd expect to happen on baremetal, 
> you hotplug CPU, hardware notifies OS about it and that's all,
> cpu won't suddenly pop out if OS isn't able to online it.
> 
> More over that hotplugged cpu might be executing some code or one of
> already present cpus might be executing initialization routines to online
> it (think of host overcommit and arbitrary delays) so it is not really safe
> to remove hotplugged but not onlined cpu without OS consent
> (i.e. explicit eject by OS/firmware). I think the lost event handling should be
> fixed on guest side and not in QEMU.
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24 15:54         ` Greg Kurz
@ 2017-05-24 16:02           ` Laurent Vivier
  2017-05-24 17:40             ` Michael Roth
  0 siblings, 1 reply; 27+ messages in thread
From: Laurent Vivier @ 2017-05-24 16:02 UTC (permalink / raw)
  To: Greg Kurz, Igor Mammedov
  Cc: Thomas Huth, Michael Roth, qemu-devel, qemu-ppc, David Gibson

On 24/05/2017 17:54, Greg Kurz wrote:
> On Wed, 24 May 2017 12:14:02 +0200
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
>> On Wed, 24 May 2017 11:28:57 +0200
>> Greg Kurz <groug@kaod.org> wrote:
>>
>>> On Wed, 24 May 2017 15:07:54 +1000
>>> David Gibson <david@gibson.dropbear.id.au> wrote:
>>>   
>>>> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
>>>>> If the OS is not started, QEMU sends an event to the OS
>>>>> that is lost and cannot be recovered. An unplug is not
>>>>> able to restore QEMU in a coherent state.
>>>>> So, while the OS is not started, disable CPU and memory hotplug.
>>>>> We use option vector 6 to know if the OS is started
>>>>>
>>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>      
>>>>
>>>> Urgh.. I'm not terribly confident that this is really correct.  As
>>>> discussed on the previous patch, you're essentially using OV6 as a
>>>> flag that CAS is complete.
>>>>
>>>> But while it undoubtedly makes the race window much smaller, I don't
>>>> see that there's any guarantee the guest OS will really be able to
>>>> handle hotplug events immediately after CAS.
>>>>
>>>> In particular if the CAS process completes partially but then needs to
>>>> trigger a reboot, I think that would end up setting the ov6 variable,
>>>> but the OS would definitely not be in a state to accept events.  
>> wouldn't guest on reboot pick up updated fdt and online hotplugged
>> before crash cpu along with initial cpus?
>>
> 
> Yes and that's what actually happens with cpus.
> 
> But catching up with the background for this series, I have the
> impression that the issue isn't the fact we loose an event if the OS
> isn't started (which is not true), but more something wrong happening
> when hotplugging+unplugging memory as described in this commit:
> 
> commit fe6824d12642b005c69123ecf8631f9b13553f8b
> Author: Laurent Vivier <lvivier@redhat.com>
> Date:   Tue Mar 28 14:09:34 2017 +0200
> 
>     spapr: fix memory hot-unplugging
> 

Yes, this commit tries to fix that, but it's not possible. Some objects
remain in memory: you can see with "info cpus" or "info memory-devices"
that they are not really removed, and this prevents hotplugging them
again. Moreover, in the case of the memory hot-unplug, we can rerun
the device_del and crash qemu (as before the fix).

Moreover, all the state normally cleared in detach() is not cleared, and
we can't do it later in set_allocation_state() because some of it is in
use by the kernel, and this is the last call from the kernel.

Laurent

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24 16:02           ` Laurent Vivier
@ 2017-05-24 17:40             ` Michael Roth
  2017-05-25  3:16               ` David Gibson
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Roth @ 2017-05-24 17:40 UTC (permalink / raw)
  To: Greg Kurz, Igor Mammedov, Laurent Vivier
  Cc: Thomas Huth, David Gibson, qemu-ppc, qemu-devel

Quoting Laurent Vivier (2017-05-24 11:02:30)
> On 24/05/2017 17:54, Greg Kurz wrote:
> > On Wed, 24 May 2017 12:14:02 +0200
> > Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> >> On Wed, 24 May 2017 11:28:57 +0200
> >> Greg Kurz <groug@kaod.org> wrote:
> >>
> >>> On Wed, 24 May 2017 15:07:54 +1000
> >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> >>>   
> >>>> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
> >>>>> If the OS is not started, QEMU sends an event to the OS
> >>>>> that is lost and cannot be recovered. An unplug is not
> >>>>> able to restore QEMU in a coherent state.
> >>>>> So, while the OS is not started, disable CPU and memory hotplug.
> >>>>> We use option vector 6 to know if the OS is started
> >>>>>
> >>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>      
> >>>>
> >>>> Urgh.. I'm not terribly confident that this is really correct.  As
> >>>> discussed on the previous patch, you're essentially using OV6 as a
> >>>> flag that CAS is complete.
> >>>>
> >>>> But while it undoubtedly makes the race window much smaller, I don't
> >>>> see that there's any guarantee the guest OS will really be able to
> >>>> handle hotplug events immediately after CAS.
> >>>>
> >>>> In particular if the CAS process completes partially but then needs to
> >>>> trigger a reboot, I think that would end up setting the ov6 variable,
> >>>> but the OS would definitely not be in a state to accept events.  
> >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> >> before crash cpu along with initial cpus?
> >>
> > 
> > Yes and that's what actually happens with cpus.
> > 
> > But catching up with the background for this series, I have the
> > impression that the issue isn't the fact we loose an event if the OS
> > isn't started (which is not true), but more something wrong happening
> > when hotplugging+unplugging memory as described in this commit:
> > 
> > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > Author: Laurent Vivier <lvivier@redhat.com>
> > Date:   Tue Mar 28 14:09:34 2017 +0200
> > 
> >     spapr: fix memory hot-unplugging
> > 
> 
> Yes, this commit try to fix that, but it's not possible. Some objects
> remain in memory: you can see with "info cpus" or "info memory-devices"
> that they are not really removed, and this prevents to hotplug them
> again, and moreover in the case of the memory hot-unplug we can rerun
> the device_del and crash qemu (as before the fix).
> 
> Moreover all stuff normally cleared in detach() are not, and we can't do
> it later in set_allocation_state() because some are in use by the
> kernel, and this is the last call from the kernel.

Focusing on the hotplug/add case, it's a bit odd that the guest would be
using the memory even though the hotplug event is clearly still sitting
in the queue.

I think part of the issue is us not having a clear enough distinction in
the code between what constitutes the need for "boot-time" handling vs.
"hotplug" handling.

We have this hook in spapr_add_lmbs:

    if (!dev->hotplugged) {
        /* guests expect coldplugged LMBs to be pre-allocated */
        drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
        drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
    }

Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.

I need to spend some time testing to confirm, but trying to walk through the
various scenarios looking at the code:

case 1)

If the hotplug occurs before reset (not sure how likely this is), the event
will get dropped by reset handler, and the DRC stuff will be left in
UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
and set it to USABLE/UNISOLATED like the !dev->hotplugged case.

case 2)

If the hotplug occurs after reset, but before CAS,
spapr_populate_drconf_memory will be called to populate the DT with all active
LMBs. AFAICT, for hotplugged LMBs it marks everything where
memory_region_present(get_system_memory(), addr) == true as
SPAPR_LMB_FLAGS_ASSIGNED. Since the region is mapped regardless of whether the
guest has acknowledged the hotplug, I think this would end up presenting the
LMB as having been present at boot-time. However, they will still be in the
UNUSABLE/ISOLATED state because dev->hotplugged == true.

I would think that the delayed hotplug event would move them to the appropriate
state later, allowing the unplug to succeed later, but it's totally possible the
guest code bails out during the hotplug path since it already has the LMB marked
as being in use via the CAS-generated DT.

So it seems like we need to either:

a) not mark these LMBs as SPAPR_LMB_FLAGS_ASSIGNED in the DT and let them get
picked up by the deferred hotplug event (which seems to also be in need of an
extra IRQ pulse given that it's not getting picked up till later), or

b) let them get picked up as boot-time LMBs and add a CAS hook to move the
state to USABLE/UNISOLATED at that point. Optionally we could also purge any
pending hotplug events from the event queue but that gets weird if we have
subsequent unplug events and whatnot sitting there as well. Hopefully letting
the guest process the hotplug event later and possibly fail still leaves us in
a recoverable state where we can still complete the unplug after boot.
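
To make option b) a bit more concrete, here is a standalone sketch of what
such a CAS hook might do. It is not QEMU code: LmbModel, the enum names and
cas_promote_lmbs are hypothetical stand-ins for the DRC state, and it
assumes that "present" mirrors the check used when building the DT.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the DRC states; not the QEMU definitions. */
typedef enum { ALLOC_UNUSABLE, ALLOC_USABLE } AllocState;
typedef enum { ISO_ISOLATED, ISO_UNISOLATED } IsoState;

typedef struct {
    bool present;     /* region is mapped, so the DT reports it as assigned */
    bool hotplugged;  /* added after machine init */
    AllocState alloc;
    IsoState iso;
} LmbModel;

/*
 * Hypothetical CAS hook: any LMB that is present but still in the default
 * UNUSABLE/ISOLATED state is moved to USABLE/UNISOLATED, matching what the
 * guest was told in the CAS-generated DT. Returns the number of LMBs moved.
 */
size_t cas_promote_lmbs(LmbModel *lmbs, size_t n)
{
    size_t promoted = 0;

    for (size_t i = 0; i < n; i++) {
        if (lmbs[i].present && lmbs[i].alloc == ALLOC_UNUSABLE) {
            lmbs[i].alloc = ALLOC_USABLE;
            lmbs[i].iso = ISO_UNISOLATED;
            promoted++;
        }
    }
    return promoted;
}
```

The sketch deliberately leaves the event queue alone, since purging it is
the part that gets weird.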

Does this seem like an accurate assessment of the issues you're seeing?

> 
> Laurent
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24  9:28     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
  2017-05-24 10:14       ` Igor Mammedov
@ 2017-05-25  2:45       ` David Gibson
  1 sibling, 0 replies; 27+ messages in thread
From: David Gibson @ 2017-05-25  2:45 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Laurent Vivier, Thomas Huth, qemu-ppc, qemu-devel, Michael Roth

[-- Attachment #1: Type: text/plain, Size: 4881 bytes --]

On Wed, May 24, 2017 at 11:28:57AM +0200, Greg Kurz wrote:
> On Wed, 24 May 2017 15:07:54 +1000
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:
> > > If the OS is not started, QEMU sends an event to the OS
> > > that is lost and cannot be recovered. An unplug is not
> > > able to restore QEMU in a coherent state.
> > > So, while the OS is not started, disable CPU and memory hotplug.
> > > We use option vector 6 to know if the OS is started
> > > 
> > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>  
> > 
> > Urgh.. I'm not terribly confident that this is really correct.  As
> > discussed on the previous patch, you're essentially using OV6 as a
> > flag that CAS is complete.
> > 
> > But while it undoubtedly makes the race window much smaller, I don't
> > see that there's any guarantee the guest OS will really be able to
> > handle hotplug events immediately after CAS.
> > 
> > In particular if the CAS process completes partially but then needs to
> > trigger a reboot, I think that would end up setting the ov6 variable,
> > but the OS would definitely not be in a state to accept events.
> > 
> 
> We never have any guarantee that the OS will process an event that
> we've sent actually (think of a kernel crash just after a successful
> CAS negotiation for example, or any failure with the various guest
> components involved in the process of hotplug).
> 
> > Mike, I really think we need some input from someone familiar with how
> > these hotplug events are supposed to work.  What do we need to do to
> > handle lost or stale events, such as those delivered when an OS is not
> > booted.
> > 
> 
> AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> 
> https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> 
> I'm not sure we can do anything better than being able to "cancel" a previous
> hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> looking for :)

Right, but at the moment we *don't* have a way to cancel a previous
hotplug attempt.  Trying to remove again ends up with things in a
tangle.

> 
> > > ---
> > >  hw/ppc/spapr.c | 22 +++++++++++++++++++---
> > >  1 file changed, 19 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index eceb4cc..2e9320d 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -2625,6 +2625,7 @@ out:
> > >  static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                                  Error **errp)
> > >  {
> > > +    sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
> > >      PCDIMMDevice *dimm = PC_DIMM(dev);
> > >      PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
> > >      MemoryRegion *mr = ddc->get_memory_region(dimm);
> > > @@ -2645,6 +2646,13 @@ static void spapr_memory_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >          goto out;
> > >      }
> > >  
> > > +    if (dev->hotplugged) {
> > > +        if (!ms->os_name) {
> > > +            error_setg(&local_err, "Memory hotplug not supported without OS");
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > >  out:
> > >      error_propagate(errp, local_err);
> > >  }
> > > @@ -2874,6 +2882,7 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                                  Error **errp)
> > >  {
> > >      MachineState *machine = MACHINE(OBJECT(hotplug_dev));
> > > +    sPAPRMachineState *ms = SPAPR_MACHINE(machine);
> > >      MachineClass *mc = MACHINE_GET_CLASS(hotplug_dev);
> > >      Error *local_err = NULL;
> > >      CPUCore *cc = CPU_CORE(dev);
> > > @@ -2884,9 +2893,16 @@ static void spapr_core_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >      int node_id;
> > >      int index;
> > >  
> > > -    if (dev->hotplugged && !mc->has_hotpluggable_cpus) {
> > > -        error_setg(&local_err, "CPU hotplug not supported for this machine");
> > > -        goto out;
> > > +    if (dev->hotplugged) {
> > > +        if (!mc->has_hotpluggable_cpus) {
> > > +            error_setg(&local_err,
> > > +                       "CPU hotplug not supported for this machine");
> > > +            goto out;
> > > +        }
> > > +        if (!ms->os_name) {
> > > +            error_setg(&local_err, "CPU hotplug not supported without OS");
> > > +            goto out;
> > > +        }
> > >      }
> > >  
> > >      if (strcmp(base_core_type, type)) {  
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24 10:14       ` Igor Mammedov
  2017-05-24 15:54         ` Greg Kurz
@ 2017-05-25  2:49         ` David Gibson
  1 sibling, 0 replies; 27+ messages in thread
From: David Gibson @ 2017-05-25  2:49 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: Greg Kurz, Laurent Vivier, Thomas Huth, qemu-ppc, qemu-devel,
	Michael Roth

On Wed, May 24, 2017 at 12:14:02PM +0200, Igor Mammedov wrote:
> On Wed, 24 May 2017 11:28:57 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
> > On Wed, 24 May 2017 15:07:54 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> > 
> > > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:  
> > > > If the OS is not started, QEMU sends an event to the OS
> > > > that is lost and cannot be recovered. An unplug is not
> > > > able to restore QEMU in a coherent state.
> > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > We use option vector 6 to know if the OS is started
> > > > 
> > > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>    
> > > 
> > > Urgh.. I'm not terribly confident that this is really correct.  As
> > > discussed on the previous patch, you're essentially using OV6 as a
> > > flag that CAS is complete.
> > > 
> > > But while it undoubtedly makes the race window much smaller, I don't
> > > see that there's any guarantee the guest OS will really be able to
> > > handle hotplug events immediately after CAS.
> > > 
> > > In particular if the CAS process completes partially but then needs to
> > > trigger a reboot, I think that would end up setting the ov6 variable,
> > > but the OS would definitely not be in a state to accept events.
> wouldn't guest on reboot pick up updated fdt and online hotplugged
> before crash cpu along with initial cpus?

Ah.. yes, I guess so.  Are we already resetting DRC and pending event
state at reset?

> 
> > We never have any guarantee that the OS will process an event that
> > we've sent actually (think of a kernel crash just after a successful
> > CAS negotiation for example, or any failure with the various guest
> > components involved in the process of hotplug).
> > 
> > > Mike, I really think we need some input from someone familiar with how
> > > these hotplug events are supposed to work.  What do we need to do to
> > > handle lost or stale events, such as those delivered when an OS is not
> > > booted.
> > >   
> > 
> > AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> > 
> > https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> > 
> > I'm not sure we can do anything better than being able to "cancel" a previous
> > hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> > looking for :)
> From x86/ACPI world:
>  - if hotplug happens early at boot before guest OS is running
>    hotplug notification (SCI interrupt) stays pending and once guest
>    is up it will/should handle it and online CPU

Yeah.. I'm not sure how this will play out on pseries, I suspect the
problem is here.

>  - if guest crashed and is rebooted it will pick up updated ACPI tables (fdt equivalent)
>    with all present cpus (including hotplugged one before crash) and online
>    hotplugged cpu along with coldplugged ones

I think that should work ok for us, as long as we're properly
resetting in-flight hotplug state at a reset.

>  - if guest loses SCI somehow, it's considered a guest issue and such cpu
>    stays unpluggable until guest picks it somehow (reboot, manually running cpus scan
>    method from ACPI or another cpu hotplug event) and explicitly ejects it.
> 
> Taking into account that CPUs don't support surprise removal and require
> guest cooperation it's fine to leave CPU plugged in until guest ejects it.
> That's what I'd expect to happen on baremetal, 
> you hotplug CPU, hardware notifies OS about it and that's all,
> cpu won't suddenly pop out if OS isn't able to online it.
> 
> Moreover, that hotplugged cpu might be executing some code or one of
> already present cpus might be executing initialization routines to online
> it (think of host overcommit and arbitrary delays) so it is not really safe
> to remove hotplugged but not onlined cpu without OS consent
> (i.e. explicit eject by OS/firmware). I think the lost event handling should be
> fixed on guest side and not in QEMU.

I agree in principle, but it's not yet clear what needs to be done.
I'm guessing the problem amounts to lost events, but I'm not
certain.  The question is whether the mechanism we're using to present
the events has a means to safely not lose them.  Are they being
presented and lost during SLOF?  Is there some way we can prevent that?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-24 17:40             ` Michael Roth
@ 2017-05-25  3:16               ` David Gibson
  2017-05-30 17:15                 ` Michael Roth
  0 siblings, 1 reply; 27+ messages in thread
From: David Gibson @ 2017-05-25  3:16 UTC (permalink / raw)
  To: Michael Roth
  Cc: Greg Kurz, Igor Mammedov, Laurent Vivier, Thomas Huth, qemu-ppc,
	qemu-devel

On Wed, May 24, 2017 at 12:40:37PM -0500, Michael Roth wrote:
> Quoting Laurent Vivier (2017-05-24 11:02:30)
> > On 24/05/2017 17:54, Greg Kurz wrote:
> > > On Wed, 24 May 2017 12:14:02 +0200
> > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > 
> > >> On Wed, 24 May 2017 11:28:57 +0200
> > >> Greg Kurz <groug@kaod.org> wrote:
> > >>
> > >>> On Wed, 24 May 2017 15:07:54 +1000
> > >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> > >>>   
> > >>>> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
> > >>>>> If the OS is not started, QEMU sends an event to the OS
> > >>>>> that is lost and cannot be recovered. An unplug is not
> > >>>>> able to restore QEMU in a coherent state.
> > >>>>> So, while the OS is not started, disable CPU and memory hotplug.
> > >>>>> We use option vector 6 to know if the OS is started
> > >>>>>
> > >>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>      
> > >>>>
> > >>>> Urgh.. I'm not terribly confident that this is really correct.  As
> > >>>> discussed on the previous patch, you're essentially using OV6 as a
> > >>>> flag that CAS is complete.
> > >>>>
> > >>>> But while it undoubtedly makes the race window much smaller, I don't
> > >>>> see that there's any guarantee the guest OS will really be able to
> > >>>> handle hotplug events immediately after CAS.
> > >>>>
> > >>>> In particular if the CAS process completes partially but then needs to
> > >>>> trigger a reboot, I think that would end up setting the ov6 variable,
> > >>>> but the OS would definitely not be in a state to accept events.  
> > >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> > >> before crash cpu along with initial cpus?
> > >>
> > > 
> > > Yes and that's what actually happens with cpus.
> > > 
> > > But catching up with the background for this series, I have the
> > > impression that the issue isn't the fact we lose an event if the OS
> > > isn't started (which is not true), but more something wrong happening
> > > when hotplugging+unplugging memory as described in this commit:
> > > 
> > > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > > Author: Laurent Vivier <lvivier@redhat.com>
> > > Date:   Tue Mar 28 14:09:34 2017 +0200
> > > 
> > >     spapr: fix memory hot-unplugging
> > > 
> > 
> > Yes, this commit tries to fix that, but it's not possible. Some objects
> > remain in memory: you can see with "info cpus" or "info memory-devices"
> > that they are not really removed, and this prevents hotplugging them
> > again. Moreover, in the case of memory hot-unplug, we can rerun
> > device_del and crash qemu (as before the fix).
> > 
> > Moreover all stuff normally cleared in detach() are not, and we can't do
> > it later in set_allocation_state() because some are in use by the
> > kernel, and this is the last call from the kernel.
> 
> Focusing on the hotplug/add case, it's a bit odd that the guest would be
> using the memory even though the hotplug event is clearly still sitting
> in the queue.
> 
> I think part of the issue is us not having a clear enough distinction in
> the code between what constitutes the need for "boot-time" handling vs.
> "hotplug" handling.
> 
> We have this hook in spapr_add_lmbs:
> 
>     if (!dev->hotplugged) {
>         /* guests expect coldplugged LMBs to be pre-allocated */
>         drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
>         drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
>     }
> 
> Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
> UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.
> 
> I need to spend some time testing to confirm, but trying to walk through the
> various scenarios looking at the code:
> 
> case 1)
> 
> If the hotplug occurs before reset (not sure how likely this is), the event
> will get dropped by reset handler, and the DRC stuff will be left in
> UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
> and set it to USABLE/UNISOLATED like the !dev->hotplugged case.

Right.  It looks like we might need to go through all DRCs and sanitize
their state at reset time.  Essentially whatever their state before
the reset, they should appear as cold-plugged after the reset, I
think.
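
To make the reset-time idea concrete, here is a rough sketch.  It is a
toy model only, not QEMU code: ToyDrc, the TOY_* enums and
toy_drc_sanitize_at_reset() are hypothetical names invented for
illustration, and the policy it encodes (anything attached comes out of
reset as USABLE/UNISOLATED, anything empty as UNUSABLE/ISOLATED, and
stale pending-event markers are cleared) is an assumption based on the
LMB behaviour described above, not something I've checked against PAPR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DRC allocation and isolation states. */
typedef enum { TOY_ALLOC_UNUSABLE, TOY_ALLOC_USABLE } ToyAllocState;
typedef enum { TOY_ISO_ISOLATED, TOY_ISO_UNISOLATED } ToyIsoState;

typedef struct {
    bool has_device;       /* something is attached to this connector */
    bool event_pending;    /* a hotplug event was queued but never consumed */
    ToyAllocState alloc;
    ToyIsoState iso;
} ToyDrc;

/*
 * Assumed reset policy: whatever happened before the reset, an occupied
 * connector should look cold-plugged afterwards and an empty one should
 * look unused.  Returns the number of DRCs that had to be modified.
 */
size_t toy_drc_sanitize_at_reset(ToyDrc *drcs, size_t n)
{
    size_t changed = 0;

    assert(drcs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        ToyDrc *d = &drcs[i];
        ToyAllocState alloc = d->has_device ? TOY_ALLOC_USABLE
                                            : TOY_ALLOC_UNUSABLE;
        ToyIsoState iso = d->has_device ? TOY_ISO_UNISOLATED
                                        : TOY_ISO_ISOLATED;

        if (d->alloc != alloc || d->iso != iso || d->event_pending) {
            changed++;
        }
        d->alloc = alloc;
        d->iso = iso;
        d->event_pending = false;   /* the queued event is gone after reset */
    }
    return changed;
}
```

In this model, a DIMM hotplugged before reset (has_device set, still
UNUSABLE/ISOLATED) is the only kind of entry that gets rewritten;
coldplugged and empty connectors pass through unchanged.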

> case 2)
> 
> If the hotplug occurs after reset, but before CAS,
> spapr_populate_drconf_memory will be called to populate the DT with all active
> LMBs. AFAICT, for hotplugged LMBs it marks everything where
> memory_region_present(get_system_memory(), addr) == true as
> SPAPR_LMB_FLAGS_ASSIGNED. Since the region is mapped regardless of whether the
> guest has acknowledged the hotplug, I think this would end up presenting the
> LMB as having been present at boot-time. However, they will still be in the
> UNUSABLE/ISOLATED state because dev->hotplugged == true.
> 
> I would think that the delayed hotplug event would move them to the appropriate
> state later, allowing the unplug to succeed later, but it's totally possible the
> guest code bails out during the hotplug path since it already has the LMB marked
> as being in use via the CAS-generated DT.
> 
> So it seems like we need to either:
> 
> a) not mark these LMBs as SPAPR_LMB_FLAGS_ASSIGNED in the DT and let them get
> picked up by the deferred hotplug event (which seems to also be in need of an
> extra IRQ pulse given that it's not getting picked up till later), or
> 
> b) let them get picked up as boot-time LMBs and add a CAS hook to move the
> state to USABLE/UNISOLATED at that point. Optionally we could also purge any
> pending hotplug events from the event queue but that gets weird if we have
> subsequent unplug events and whatnot sitting there as well. Hopefully letting
> the guest process the hotplug event later and possibly fail still leaves us in
> a recoverable state where we can still complete the unplug after boot.
> 
> Does this seem like an accurate assessment of the issues you're seeing?

It seems plausible from my limited understanding of the situation.
The variety of possible state transitions in the PAPR hotplug model
hurts my brain.

I think plan (a) sounds simpler than plan (b).  Basically any hotplug
events that occur between reset and CAS we want to queue until CAS is
complete.  AIUI we're already effectively queuing the event that goes
to the guest, but we've already - incorrectly - made some qemu side
state changes that show up in the DT fragments handed out by CAS.

Can we just in general postpone the qemu side updates until the
hotplug event is presented to the guest, rather than when it's
submitted from the host?  Or will that raise a different bunch of problems?

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-25  3:16               ` David Gibson
@ 2017-05-30 17:15                 ` Michael Roth
  2017-05-31  6:36                   ` David Gibson
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Roth @ 2017-05-30 17:15 UTC (permalink / raw)
  To: David Gibson
  Cc: Laurent Vivier, Thomas Huth, Greg Kurz, qemu-devel, qemu-ppc,
	Igor Mammedov

Quoting David Gibson (2017-05-24 22:16:26)
> On Wed, May 24, 2017 at 12:40:37PM -0500, Michael Roth wrote:
> > Quoting Laurent Vivier (2017-05-24 11:02:30)
> > > On 24/05/2017 17:54, Greg Kurz wrote:
> > > > On Wed, 24 May 2017 12:14:02 +0200
> > > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > > 
> > > >> On Wed, 24 May 2017 11:28:57 +0200
> > > >> Greg Kurz <groug@kaod.org> wrote:
> > > >>
> > > >>> On Wed, 24 May 2017 15:07:54 +1000
> > > >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> > > >>>   
> > > >>>> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
> > > >>>>> If the OS is not started, QEMU sends an event to the OS
> > > >>>>> that is lost and cannot be recovered. An unplug is not
> > > >>>>> able to restore QEMU in a coherent state.
> > > >>>>> So, while the OS is not started, disable CPU and memory hotplug.
> > > >>>>> We use option vector 6 to know if the OS is started
> > > >>>>>
> > > >>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>      
> > > >>>>
> > > >>>> Urgh.. I'm not terribly confident that this is really correct.  As
> > > >>>> discussed on the previous patch, you're essentially using OV6 as a
> > > >>>> flag that CAS is complete.
> > > >>>>
> > > >>>> But while it undoubtedly makes the race window much smaller, I don't
> > > >>>> see that there's any guarantee the guest OS will really be able to
> > > >>>> handle hotplug events immediately after CAS.
> > > >>>>
> > > >>>> In particular if the CAS process completes partially but then needs to
> > > >>>> trigger a reboot, I think that would end up setting the ov6 variable,
> > > >>>> but the OS would definitely not be in a state to accept events.  
> > > >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> > > >> before crash cpu along with initial cpus?
> > > >>
> > > > 
> > > > Yes and that's what actually happens with cpus.
> > > > 
> > > > But catching up with the background for this series, I have the
> > > > impression that the issue isn't the fact we lose an event if the OS
> > > > isn't started (which is not true), but more something wrong happening
> > > > when hotplugging+unplugging memory as described in this commit:
> > > > 
> > > > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > > > Author: Laurent Vivier <lvivier@redhat.com>
> > > > Date:   Tue Mar 28 14:09:34 2017 +0200
> > > > 
> > > >     spapr: fix memory hot-unplugging
> > > > 
> > > 
> > > Yes, this commit tries to fix that, but it's not possible. Some objects
> > > remain in memory: you can see with "info cpus" or "info memory-devices"
> > > that they are not really removed, and this prevents hotplugging them
> > > again. Moreover, in the case of memory hot-unplug, we can rerun
> > > device_del and crash qemu (as before the fix).
> > > 
> > > Moreover all stuff normally cleared in detach() are not, and we can't do
> > > it later in set_allocation_state() because some are in use by the
> > > kernel, and this is the last call from the kernel.
> > 
> > Focusing on the hotplug/add case, it's a bit odd that the guest would be
> > using the memory even though the hotplug event is clearly still sitting
> > in the queue.
> > 
> > I think part of the issue is us not having a clear enough distinction in
> > the code between what constitutes the need for "boot-time" handling vs.
> > "hotplug" handling.
> > 
> > We have this hook in spapr_add_lmbs:
> > 
> >     if (!dev->hotplugged) {
> >         /* guests expect coldplugged LMBs to be pre-allocated */
> >         drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> >         drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> >     }
> > 
> > Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
> > UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.
> > 
> > I need to spend some time testing to confirm, but trying to walk through the
> > various scenarios looking at the code:
> > 
> > case 1)
> > 
> > If the hotplug occurs before reset (not sure how likely this is), the event
> > will get dropped by reset handler, and the DRC stuff will be left in
> > UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
> > and set it to USABLE/UNISOLATED like the !dev->hotplugged case.
> 
> Right.  It looks like we might need to go through all DRCs and sanitize
> their state at reset time.  Essentially whatever their state before
> the reset, they should appear as cold-plugged after the reset, I
> think.
> 
> > case 2)
> > 
> > If the hotplug occurs after reset, but before CAS,
> > spapr_populate_drconf_memory will be called to populate the DT with all active
> > LMBs. AFAICT, for hotplugged LMBs it marks everything where
> > memory_region_present(get_system_memory(), addr) == true as
> > SPAPR_LMB_FLAGS_ASSIGNED. Since the region is mapped regardless of whether the
> > guest has acknowledged the hotplug, I think this would end up presenting the
> > LMB as having been present at boot-time. However, they will still be in the
> > UNUSABLE/ISOLATED state because dev->hotplugged == true.
> > 
> > I would think that the delayed hotplug event would move them to the appropriate
> > state later, allowing the unplug to succeed later, but it's totally possible the
> > guest code bails out during the hotplug path since it already has the LMB marked
> > as being in use via the CAS-generated DT.
> > 
> > So it seems like we need to either:
> > 
> > a) not mark these LMBs as SPAPR_LMB_FLAGS_ASSIGNED in the DT and let them get
> > picked up by the deferred hotplug event (which seems to also be in need of an
> > extra IRQ pulse given that it's not getting picked up till later), or
> > 
> > b) let them get picked up as boot-time LMBs and add a CAS hook to move the
> > state to USABLE/UNISOLATED at that point. Optionally we could also purge any
> > pending hotplug events from the event queue but that gets weird if we have
> > subsequent unplug events and whatnot sitting there as well. Hopefully letting
> > the guest process the hotplug event later and possibly fail still leaves us in
> > a recoverable state where we can still complete the unplug after boot.
> > 
> > Does this seem like an accurate assessment of the issues you're seeing?
> 
> It seems plausible from my limited understanding of the situation.
> The variety of possible state transitions in the PAPR hotplug model
> hurts my brain.
> 
> I think plan (a) sounds simpler than plan (b).  Basically any hotplug
> events that occur between reset and CAS we want to queue until CAS is
> complete.  AIUI we're already effectively queuing the event that goes
> to the guest, but we've already - incorrectly - made some qemu side
> state changes that show up in the DT fragments handed out by CAS.

I agree. The one thing I'm a bit iffy on is why the guest is missing
the interrupt (or handling at least) for the initially-queued events;
if we go this route we need to make sure the guest acts on them as part
of boot.

I assume pending interrupts get dropped by CAS because the guest doesn't
initialize the hotplug interrupt handler until that point. If that's the
case, a CAS hook to scan through the event queue to re-signal if needed
would hopefully do the trick, but I'm still a bit uncertain about whether
that's sufficient.
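
Roughly what I have in mind for the re-signal hook, as a toy model
rather than real QEMU code (ToyEventQueue, toy_queue_event() and
toy_cas_resignal() are made-up names, and treating only hotplug/unplug
entries as worth re-signalling is my assumption):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_MAX 8

typedef enum {
    TOY_EVENT_HOTPLUG,
    TOY_EVENT_UNPLUG,
    TOY_EVENT_OTHER
} ToyEventType;

typedef struct {
    ToyEventType pending[TOY_QUEUE_MAX];
    size_t count;          /* events queued and not yet consumed by the guest */
    unsigned irq_pulses;   /* how many times the guest has been signalled */
} ToyEventQueue;

/* Queue an event and pulse the interrupt once; returns false if full. */
bool toy_queue_event(ToyEventQueue *q, ToyEventType type)
{
    assert(q != NULL);
    if (q->count == TOY_QUEUE_MAX) {
        return false;
    }
    q->pending[q->count++] = type;
    q->irq_pulses++;
    return true;
}

/*
 * Hypothetical CAS hook: if a hotplug or unplug event is still sitting
 * in the queue, pulse the interrupt again so a guest that only just
 * registered its handler gets a chance to see it.  The queue contents
 * are left untouched.  Returns true if a pulse was sent.
 */
bool toy_cas_resignal(ToyEventQueue *q)
{
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++) {
        if (q->pending[i] == TOY_EVENT_HOTPLUG ||
            q->pending[i] == TOY_EVENT_UNPLUG) {
            q->irq_pulses++;
            return true;
        }
    }
    return false;
}
```

This only models the "pulse again if something is pending" part; it
says nothing about whether the guest actually acts on the second pulse,
which is the bit I'm unsure about.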

If it's something we can't do deterministically, we might need to consider
plan (b).

> 
> Can we just in general postpone the qemu side updates until the
> hotplug event is presented to the guest, rather than when it's
> submitted from the host?  Or will that raise a different bunch of problems?

It seems like that might be problematic for migration.

Not updating the device tree with LMBs still pending delivery of hotplug
event during CAS seems fairly easy. We generate a new DT fragment for
the LMB at hp time anyways, so I think we can safely "throw away" the
updates and not worry about tracking any additional intermediate state.
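
As a sketch of the "don't advertise it during CAS" part (toy model
only: ToyLmb, TOY_LMB_FLAGS_ASSIGNED and the two helpers are
hypothetical, and the hotplug_pending marker is an assumption about
what we'd have to track, not an existing field):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag value for this toy model. */
#define TOY_LMB_FLAGS_ASSIGNED 0x00000008u

typedef struct {
    bool mapped;            /* region is present in system memory */
    bool hotplug_pending;   /* hotplug event queued, guest has not seen it */
} ToyLmb;

/*
 * Flags to advertise for one LMB when building the CAS-time DT.
 * A mapped LMB whose hotplug event is still pending is reported as
 * unassigned, so the deferred hotplug event is what brings it online.
 */
uint32_t toy_lmb_flags_for_cas(const ToyLmb *lmb)
{
    assert(lmb != NULL);
    if (lmb->mapped && !lmb->hotplug_pending) {
        return TOY_LMB_FLAGS_ASSIGNED;
    }
    return 0;
}

/* Fill flags_out[0..n-1]; returns how many LMBs were marked assigned. */
size_t toy_fill_cas_flags(const ToyLmb *lmbs, size_t n, uint32_t *flags_out)
{
    size_t assigned = 0;

    assert((lmbs != NULL && flags_out != NULL) || n == 0);
    for (size_t i = 0; i < n; i++) {
        flags_out[i] = toy_lmb_flags_for_cas(&lmbs[i]);
        if (flags_out[i] & TOY_LMB_FLAGS_ASSIGNED) {
            assigned++;
        }
    }
    return assigned;
}
```

The mapping itself is untouched here; only what gets reported to the
guest changes.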

Going to the extent of delaying the call to pc_dimm_memory_plug would be
problematic though I think. We would need a hook to make sure the call
is made if CAS completes after migration, and for cases where we do
migration by re-creating DIMMs via cmdline, we'd need some way to
synchronize state for these "pending" DIMMs, else that deferred call to
pc_dimm_memory_plug will probably generate errors due to duplicate
DIMM mappings (and if we relax/ignore those errors we still have the
original issue with CAS picking up the LMBs prematurely). This ends up
seeming really similar to the stuff that necessitated DRC migration,
which we probably want to avoid if possible.

> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

* Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
  2017-05-30 17:15                 ` Michael Roth
@ 2017-05-31  6:36                   ` David Gibson
  0 siblings, 0 replies; 27+ messages in thread
From: David Gibson @ 2017-05-31  6:36 UTC (permalink / raw)
  To: Michael Roth
  Cc: Laurent Vivier, Thomas Huth, Greg Kurz, qemu-devel, qemu-ppc,
	Igor Mammedov

On Tue, May 30, 2017 at 12:15:59PM -0500, Michael Roth wrote:
> Quoting David Gibson (2017-05-24 22:16:26)
> > On Wed, May 24, 2017 at 12:40:37PM -0500, Michael Roth wrote:
> > > Quoting Laurent Vivier (2017-05-24 11:02:30)
> > > > On 24/05/2017 17:54, Greg Kurz wrote:
> > > > > On Wed, 24 May 2017 12:14:02 +0200
> > > > > Igor Mammedov <imammedo@redhat.com> wrote:
> > > > > 
> > > > >> On Wed, 24 May 2017 11:28:57 +0200
> > > > >> Greg Kurz <groug@kaod.org> wrote:
> > > > >>
> > > > >>> On Wed, 24 May 2017 15:07:54 +1000
> > > > >>> David Gibson <david@gibson.dropbear.id.au> wrote:
> > > > >>>   
> > > > >>>> On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
> > > > >>>>> If the OS is not started, QEMU sends an event to the OS
> > > > >>>>> that is lost and cannot be recovered. An unplug is not
> > > > >>>>> able to restore QEMU in a coherent state.
> > > > >>>>> So, while the OS is not started, disable CPU and memory hotplug.
> > > > >>>>> We use option vector 6 to know if the OS is started
> > > > >>>>>
> > > > >>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>      
> > > > >>>>
> > > > >>>> Urgh.. I'm not terribly confident that this is really correct.  As
> > > > >>>> discussed on the previous patch, you're essentially using OV6 as a
> > > > >>>> flag that CAS is complete.
> > > > >>>>
> > > > >>>> But while it undoubtedly makes the race window much smaller, I don't
> > > > >>>> see that there's any guarantee the guest OS will really be able to
> > > > >>>> handle hotplug events immediately after CAS.
> > > > >>>>
> > > > >>>> In particular if the CAS process completes partially but then needs to
> > > > >>>> trigger a reboot, I think that would end up setting the ov6 variable,
> > > > >>>> but the OS would definitely not be in a state to accept events.  
> > > > >> wouldn't guest on reboot pick up updated fdt and online hotplugged
> > > > >> before crash cpu along with initial cpus?
> > > > >>
> > > > > 
> > > > > Yes and that's what actually happens with cpus.
> > > > > 
> > > > > But catching up with the background for this series, I have the
> > > > > impression that the issue isn't the fact we lose an event if the OS
> > > > > isn't started (which is not true), but more something wrong happening
> > > > > when hotplugging+unplugging memory as described in this commit:
> > > > > 
> > > > > commit fe6824d12642b005c69123ecf8631f9b13553f8b
> > > > > Author: Laurent Vivier <lvivier@redhat.com>
> > > > > Date:   Tue Mar 28 14:09:34 2017 +0200
> > > > > 
> > > > >     spapr: fix memory hot-unplugging
> > > > > 
> > > > 
> > > > Yes, this commit tries to fix that, but it's not possible. Some objects
> > > > remain in memory: you can see with "info cpus" or "info memory-devices"
> > > > that they are not really removed, and this prevents hotplugging them
> > > > again. Moreover, in the case of memory hot-unplug, we can rerun
> > > > device_del and crash qemu (as before the fix).
> > > > 
> > > > Moreover all stuff normally cleared in detach() are not, and we can't do
> > > > it later in set_allocation_state() because some are in use by the
> > > > kernel, and this is the last call from the kernel.
> > > 
> > > Focusing on the hotplug/add case, it's a bit odd that the guest would be
> > > using the memory even though the hotplug event is clearly still sitting
> > > in the queue.
> > > 
> > > I think part of the issue is us not having a clear enough distinction in
> > > the code between what constitutes the need for "boot-time" handling vs.
> > > "hotplug" handling.
> > > 
> > > We have this hook in spapr_add_lmbs:
> > > 
> > >     if (!dev->hotplugged) {
> > >         /* guests expect coldplugged LMBs to be pre-allocated */
> > >         drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_USABLE);
> > >         drck->set_isolation_state(drc, SPAPR_DR_ISOLATION_STATE_UNISOLATED);
> > >     }
> > > 
> > > Whereas the default allocation/isolation state for LMBs in spapr_drc.c is
> > > UNUSABLE/ISOLATED, which is what covers the dev->hotplugged == true case.
> > > 
> > > I need to spend some time testing to confirm, but trying to walk through the
> > > various scenarios looking at the code:
> > > 
> > > case 1)
> > > 
> > > If the hotplug occurs before reset (not sure how likely this is), the event
> > > will get dropped by reset handler, and the DRC stuff will be left in
> > > UNUSABLE/ISOLATED. I think it's more appropriate to treat this as "boot-time"
> > > and set it to USABLE/UNISOLATED like the !dev->hotplugged case.
> > 
> > Right.  It looks like we might need to go through all DRCs and sanitize
> > their state at reset time.  Essentially whatever their state before
> > the reset, they should appear as cold-plugged after the reset, I
> > think.
> > 
> > > case 2)
> > > 
> > > If the hotplug occurs after reset, but before CAS,
> > > spapr_populate_drconf_memory will be called to populate the DT with all active
> > > LMBs. AFAICT, for hotplugged LMBs it marks everything where
> > > memory_region_present(get_system_memory(), addr) == true as
> > > SPAPR_LMB_FLAGS_ASSIGNED. Since the region is mapped regardless of whether the
> > > guest has acknowledged the hotplug, I think this would end up presenting the
> > > LMB as having been present at boot-time. However, they will still be in the
> > > UNUSABLE/ISOLATED state because dev->hotplugged == true.
> > > 
> > > I would think that the delayed hotplug event would move them to the appropriate
> > > state later, allowing the unplug to succeed later, but it's totally possible the
> > > guest code bails out during the hotplug path since it already has the LMB marked
> > > as being in use via the CAS-generated DT.
> > > 
> > > So it seems like we need to either:
> > > 
> > > a) not mark these LMBs as SPAPR_LMB_FLAGS_ASSIGNED in the DT and let them get
> > > picked up by the deferred hotplug event (which seems to also be in need of an
> > > extra IRQ pulse given that it's not getting picked up till later), or
> > > 
> > > b) let them get picked up as boot-time LMBs and add a CAS hook to move the
> > > state to USABLE/UNISOLATED at that point. Optionally we could also purge any
> > > pending hotplug events from the event queue but that gets weird if we have
> > > subsequent unplug events and whatnot sitting there as well. Hopefully letting
> > > the guest process the hotplug event later and possibly fail still leaves us in
> > > a recoverable state where we can still complete the unplug after boot.
> > > 
> > > Does this seem like an accurate assessment of the issues you're seeing?
> > 
> > It seems plausible from my limited understanding of the situation.
> > The variety of possible state transitions in the PAPR hotplug model
> > hurts my brain.
> > 
> > I think plan (a) sounds simpler than plan (b).  Basically any hotplug
> > events that occur between reset and CAS we want to queue until CAS is
> > complete.  AIUI we're already effectively queuing the event that goes
> > to the guest, but we've already - incorrectly - made some qemu side
> > state changes that show up in the DT fragments handed out by CAS.
> 
> I agree. The one thing I'm a bit iffy on is why the guest is missing
> the interrupt (or handling at least) for the initially-queued events;
> if we go this route we need to make sure the guest acts on them as part
> of boot.
> 
> I assume pending interrupts get dropped by CAS because the guest doesn't
> initialize the hotplug interrupt handler until that point. If that's the
> case, a CAS hook to scan through the event queue to re-signal if needed
> would hopefully do the trick, but I'm still a bit uncertain about whether
> that's sufficient.

Have we confirmed that events are actually being dropped here?  Could
it instead be that, by the time the guest gets to them, other state is
incorrect, so they don't get processed as expected?

> If it's something we can't do deterministically, we might need to consider
> plan (b).
> 
> > 
> > Can we just in general postpone the qemu side updates until the
> > hotplug event is presented to the guest, rather than when it's
> > submitted from the host?  Or will that raise a different bunch of problems?
> 
> It seems like that might be problematic for migration.

Why?  We're now migrating the contents of the event queue...

> Not updating the device tree with LMBs still pending delivery of hotplug
> event during CAS seems fairly easy. We generate a new DT fragment for
> the LMB at hp time anyways, so I think we can safely "throw away" the
> updates and not worry about tracking any additional intermediate state.
> 
> Going to the extent of delaying the call to pc_dimm_memory_plug would be
> problematic though I think. We would need a hook to make sure the call
> is made if CAS completes after migration, and for cases where we do
> migration by re-creating DIMMs via cmdline, we'd need some way to
> synchronize state for these "pending" DIMMs, else that deferred call to
> pc_dimm_memory_plug will probably generate errors due to duplicate
> DIMM mappings (and if we relax/ignore those errors we still have the
> original issue with CAS picking up the LMBs prematurely). This ends up
> seeming really similar to the stuff that necessitated DRC migration,
> which we probably want to avoid if possible.

Hm, ok.  My brain hurts.  Any thoughts on what the next logical step
should be?
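
To make the CAS-hook option quoted above concrete, here is a rough
sketch of "scan the event queue at CAS and re-signal if needed".  It is
only an illustration under stated assumptions: every name in it
(hp_event, hp_event_queue, hp_queue_add, hp_cas_resignal) is
hypothetical and not taken from the QEMU tree, and raising the
interrupt is modelled as a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are not QEMU types. */
#define HP_MAX_EVENTS 8

struct hp_event {
    int drc_index;      /* connector the hotplug event refers to */
    bool delivered;     /* set once the guest has fetched the event */
};

struct hp_event_queue {
    struct hp_event events[HP_MAX_EVENTS];
    size_t count;
    int irq_raised;     /* models how many times the interrupt was asserted */
};

/* Queue an event and signal the guest; the signal may be lost before CAS. */
static bool hp_queue_add(struct hp_event_queue *q, int drc_index)
{
    assert(q != NULL);
    if (q->count == HP_MAX_EVENTS) {
        return false;
    }
    q->events[q->count].drc_index = drc_index;
    q->events[q->count].delivered = false;
    q->count++;
    q->irq_raised++;
    return true;
}

/*
 * Assumed to run at the end of CAS: count events the guest never
 * fetched and assert the interrupt once more if any are left.
 * Returns the number of still-pending events.
 */
static size_t hp_cas_resignal(struct hp_event_queue *q)
{
    size_t pending = 0;

    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++) {
        if (!q->events[i].delivered) {
            pending++;
        }
    }
    if (pending > 0) {
        q->irq_raised++;
    }
    return pending;
}
```

Whether a single re-signal like this would be sufficient is exactly the
open question raised above; the sketch only shows the shape of the hook.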

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


