All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors
@ 2020-12-18 10:33 Greg Kurz
  2020-12-18 10:33 ` [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS Greg Kurz
                   ` (6 more replies)
  0 siblings, 7 replies; 22+ messages in thread
From: Greg Kurz @ 2020-12-18 10:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, Greg Kurz, David Gibson

Setting a high maxmem value seriously degrades the guest's boot
time, from 3 seconds for 1T up to more than 3 minutes for 8T.
All this time is spent during initial machine setup and CAS,
preventing use of the QEMU monitor in the meantime.

Profiling reveals:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 85.48     24.08    24.08   566117     0.00     0.00  object_get_canonical_path_component
 13.67     27.93     3.85 57623944     0.00     0.00  strstart

-----------------------------------------------
                0.00    0.00       1/566117      host_memory_backend_get_name [270]
                1.41    0.22   33054/566117      drc_realize <cycle 1> [23]
               22.67    3.51  533062/566117      object_get_canonical_path <cycle 1> [3]
[2]     98.7   24.08    3.73  566117         object_get_canonical_path_component [2]
                3.73    0.00 55802324/57623944     strstart [19]
-----------------------------------------------
                                  12             object_property_set_link <cycle 1> [1267]
                               33074             device_set_realized <cycle 1> [138]
                              231378             object_get_child_property <cycle 1> [652]
[3]     93.0    0.01   26.18  264464         object_get_canonical_path <cycle 1> [3]
               22.67    3.51  533062/566117      object_get_canonical_path_component [2]
                              264464             object_get_root <cycle 1> [629]
-----------------------------------------------

This is because an 8T maxmem means QEMU can create up to 32768
LMB DRC objects, each tracking the hot-plug/unplug state of 256M
of contiguous RAM. These objects are all created during machine
init for the machine lifetime. Their realize path involves
several calls to object_get_canonical_path_component(), which
itself traverses all properties of the parent node. This results
in a quadratic operation. Worse, the full list of DRCs is traversed
7 times during the boot process, eg. to populate the device tree,
calling object_get_canonical_path_component() on each DRC again.
Yet even more costly quadratic traversals.

Modeling DR connectors as individual devices raises some
concerns, as already discussed a year ago in this thread:

https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/

First, having so many devices to track the DRC states is excessive
and can cause scalability issues in various ways. This bites again
with this quadratic traversal issue. Second, DR connectors are really
PAPR internals that shouldn't be exposed at all in the composition
tree.

This series converts DR connectors to be simple unparented
objects tracked in a separate hash table, rather than
actual devices exposed in the QOM tree. This doesn't address
the overall concern on scalability, but this brings linear
traversal of the DR connectors. The time penalty with a
8T maxmem is reduced to less than 1 second, and we get
a much shorter 'info qom-tree' output.

This is transparent to migration.

Greg Kurz (6):
  spapr: Call spapr_drc_reset() for all DRCs at CAS
  spapr: Fix reset of transient DR connectors
  spapr: Introduce spapr_drc_reset_all()
  spapr: Use spapr_drc_reset_all() at machine reset
  spapr: Add drc_ prefix to the DRC realize and unrealize functions
  spapr: Model DR connectors as simple objects

 include/hw/ppc/spapr_drc.h |  18 +++-
 hw/ppc/spapr.c             |  15 +--
 hw/ppc/spapr_drc.c         | 181 +++++++++++++++++--------------------
 hw/ppc/spapr_hcall.c       |  33 ++-----
 hw/ppc/spapr_pci.c         |   2 +-
 5 files changed, 106 insertions(+), 143 deletions(-)

-- 
2.26.2




^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS
  2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
@ 2020-12-18 10:33 ` Greg Kurz
  2020-12-21 18:24   ` Daniel Henrique Barboza
  2020-12-28  7:20   ` David Gibson
  2020-12-18 10:33 ` [PATCH 2/6] spapr: Fix reset of transient DR connectors Greg Kurz
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2020-12-18 10:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, Greg Kurz, David Gibson

Non-transient DRCs are either in the empty or the ready state,
which means spapr_drc_reset() doesn't change their state. It
is thus not needed to do any checking. Call spapr_drc_reset()
unconditionally and squash spapr_drc_transient() into its
only user, spapr_drc_needed().

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 include/hw/ppc/spapr_drc.h | 3 ---
 hw/ppc/spapr_drc.c         | 8 ++------
 hw/ppc/spapr_hcall.c       | 7 ++++---
 3 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index def3593adc8b..cff5e707d0d9 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -244,9 +244,6 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
 void spapr_drc_detach(SpaprDrc *drc);
 
-/* Returns true if a hot plug/unplug request is pending */
-bool spapr_drc_transient(SpaprDrc *drc);
-
 static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
 {
     return drc->unplug_requested;
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index fc7e321fcdf6..8d62f55066b6 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -462,8 +462,9 @@ static const VMStateDescription vmstate_spapr_drc_unplug_requested = {
     }
 };
 
-bool spapr_drc_transient(SpaprDrc *drc)
+static bool spapr_drc_needed(void *opaque)
 {
+    SpaprDrc *drc = opaque;
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
     /*
@@ -483,11 +484,6 @@ bool spapr_drc_transient(SpaprDrc *drc)
         spapr_drc_unplug_requested(drc);
 }
 
-static bool spapr_drc_needed(void *opaque)
-{
-    return spapr_drc_transient(opaque);
-}
-
 static const VMStateDescription vmstate_spapr_drc = {
     .name = "spapr_drc",
     .version_id = 1,
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index c0ea0bd5794b..4e9d50c254f0 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1650,9 +1650,10 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
                                                           prop->name,
                                                           &error_abort));
 
-        if (spapr_drc_transient(drc)) {
-            spapr_drc_reset(drc);
-        }
+        /*
+         * This will complete any pending plug/unplug requests.
+         */
+        spapr_drc_reset(drc);
     }
 
     spapr_clear_pending_hotplug_events(spapr);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/6] spapr: Fix reset of transient DR connectors
  2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
  2020-12-18 10:33 ` [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS Greg Kurz
@ 2020-12-18 10:33 ` Greg Kurz
  2020-12-21 20:34   ` Daniel Henrique Barboza
  2020-12-28  7:24   ` David Gibson
  2020-12-18 10:33 ` [PATCH 3/6] spapr: Introduce spapr_drc_reset_all() Greg Kurz
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2020-12-18 10:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, Greg Kurz, David Gibson

Documentation of object_property_iter_init() clearly stipulates that
"it is forbidden to modify the property list while iterating". But this
is exactly what we do when resetting transient DR connectors during CAS.
The call to spapr_drc_reset() can finalize the hot-unplug sequence of a
PHB or a PCI bridge, both of which will then in turn destroy their PCI
DRCs. This could potentially invalidate the iterator. It is pure luck
that this haven't caused any issues so far.

Change spapr_drc_reset() to return true if it caused a device to be
removed. Restart from scratch in this case. This can potentially
increase the overall DRC reset time, especially with a high maxmem
which generates a lot of LMB DRCs. But this kind of setup is rare,
and so is the use case of rebooting a guest while doing hot-unplug.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 include/hw/ppc/spapr_drc.h | 3 ++-
 hw/ppc/spapr_drc.c         | 6 +++++-
 hw/ppc/spapr_hcall.c       | 8 +++++++-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index cff5e707d0d9..5d80019f82e2 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -224,7 +224,8 @@ static inline bool spapr_drc_hotplugged(DeviceState *dev)
     return dev->hotplugged && !runstate_check(RUN_STATE_INMIGRATE);
 }
 
-void spapr_drc_reset(SpaprDrc *drc);
+/* Returns true if an unplug request completed */
+bool spapr_drc_reset(SpaprDrc *drc);
 
 uint32_t spapr_drc_index(SpaprDrc *drc);
 SpaprDrcType spapr_drc_type(SpaprDrc *drc);
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 8d62f55066b6..5b5e2ac58a7e 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -417,9 +417,10 @@ void spapr_drc_detach(SpaprDrc *drc)
     spapr_drc_release(drc);
 }
 
-void spapr_drc_reset(SpaprDrc *drc)
+bool spapr_drc_reset(SpaprDrc *drc)
 {
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+    bool unplug_completed = false;
 
     trace_spapr_drc_reset(spapr_drc_index(drc));
 
@@ -428,6 +429,7 @@ void spapr_drc_reset(SpaprDrc *drc)
      */
     if (drc->unplug_requested) {
         spapr_drc_release(drc);
+        unplug_completed = true;
     }
 
     if (drc->dev) {
@@ -444,6 +446,8 @@ void spapr_drc_reset(SpaprDrc *drc)
         drc->ccs_offset = -1;
         drc->ccs_depth = -1;
     }
+
+    return unplug_completed;
 }
 
 static bool spapr_drc_unplug_requested_needed(void *opaque)
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 4e9d50c254f0..aa22830ac4bd 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1639,6 +1639,7 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
     ObjectPropertyIterator iter;
 
     drc_container = container_get(object_get_root(), "/dr-connector");
+restart:
     object_property_iter_init(&iter, drc_container);
     while ((prop = object_property_iter_next(&iter))) {
         SpaprDrc *drc;
@@ -1652,8 +1653,13 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
 
         /*
          * This will complete any pending plug/unplug requests.
+         * In case of a unplugged PHB or PCI bridge, this will
+         * cause some DRCs to be destroyed and thus potentially
+         * invalidate the iterator.
          */
-        spapr_drc_reset(drc);
+        if (spapr_drc_reset(drc)) {
+            goto restart;
+        }
     }
 
     spapr_clear_pending_hotplug_events(spapr);
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/6] spapr: Introduce spapr_drc_reset_all()
  2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
  2020-12-18 10:33 ` [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS Greg Kurz
  2020-12-18 10:33 ` [PATCH 2/6] spapr: Fix reset of transient DR connectors Greg Kurz
@ 2020-12-18 10:33 ` Greg Kurz
  2020-12-21 20:35   ` Daniel Henrique Barboza
  2020-12-28  7:26   ` David Gibson
  2020-12-18 10:33 ` [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset Greg Kurz
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2020-12-18 10:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, Greg Kurz, David Gibson

No need to expose the way DRCs are traversed outside of spapr_drc.c.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 include/hw/ppc/spapr_drc.h |  6 ++++++
 hw/ppc/spapr_drc.c         | 31 +++++++++++++++++++++++++++++
 hw/ppc/spapr_hcall.c       | 40 ++++++--------------------------------
 3 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 5d80019f82e2..8982927d5c24 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -245,6 +245,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
 void spapr_drc_detach(SpaprDrc *drc);
 
+/*
+ * Reset all DRCs, causing pending hot-plug/unplug requests to complete.
+ * Safely handles potential DRC removal (eg. PHBs or PCI bridges).
+ */
+void spapr_drc_reset_all(struct SpaprMachineState *spapr);
+
 static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
 {
     return drc->unplug_requested;
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 5b5e2ac58a7e..a4d2608017c5 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -949,6 +949,37 @@ out:
     return ret;
 }
 
+void spapr_drc_reset_all(SpaprMachineState *spapr)
+{
+    Object *drc_container;
+    ObjectProperty *prop;
+    ObjectPropertyIterator iter;
+
+    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
+restart:
+    object_property_iter_init(&iter, drc_container);
+    while ((prop = object_property_iter_next(&iter))) {
+        SpaprDrc *drc;
+
+        if (!strstart(prop->type, "link<", NULL)) {
+            continue;
+        }
+        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
+                                                          prop->name,
+                                                          &error_abort));
+
+        /*
+         * This will complete any pending plug/unplug requests.
+         * In case of a unplugged PHB or PCI bridge, this will
+         * cause some DRCs to be destroyed and thus potentially
+         * invalidate the iterator.
+         */
+        if (spapr_drc_reset(drc)) {
+            goto restart;
+        }
+    }
+}
+
 /*
  * RTAS calls
  */
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index aa22830ac4bd..e5dfc1ba7acc 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1632,39 +1632,6 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, uint32_t max_compat,
     return best_compat;
 }
 
-static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
-{
-    Object *drc_container;
-    ObjectProperty *prop;
-    ObjectPropertyIterator iter;
-
-    drc_container = container_get(object_get_root(), "/dr-connector");
-restart:
-    object_property_iter_init(&iter, drc_container);
-    while ((prop = object_property_iter_next(&iter))) {
-        SpaprDrc *drc;
-
-        if (!strstart(prop->type, "link<", NULL)) {
-            continue;
-        }
-        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
-                                                          prop->name,
-                                                          &error_abort));
-
-        /*
-         * This will complete any pending plug/unplug requests.
-         * In case of a unplugged PHB or PCI bridge, this will
-         * cause some DRCs to be destroyed and thus potentially
-         * invalidate the iterator.
-         */
-        if (spapr_drc_reset(drc)) {
-            goto restart;
-        }
-    }
-
-    spapr_clear_pending_hotplug_events(spapr);
-}
-
 target_ulong do_client_architecture_support(PowerPCCPU *cpu,
                                             SpaprMachineState *spapr,
                                             target_ulong vec,
@@ -1822,7 +1789,12 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
 
     spapr_irq_update_active_intc(spapr);
 
-    spapr_handle_transient_dev_before_cas(spapr);
+    /*
+     * Process all pending hot-plug/unplug requests now. An updated full
+     * rendered FDT will be returned to the guest.
+     */
+    spapr_drc_reset_all(spapr);
+    spapr_clear_pending_hotplug_events(spapr);
 
     /*
      * If spapr_machine_reset() did not set up a HPT but one is necessary
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset
  2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
                   ` (2 preceding siblings ...)
  2020-12-18 10:33 ` [PATCH 3/6] spapr: Introduce spapr_drc_reset_all() Greg Kurz
@ 2020-12-18 10:33 ` Greg Kurz
  2020-12-21 20:36   ` Daniel Henrique Barboza
  2020-12-28  7:29   ` David Gibson
  2020-12-18 10:33 ` [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions Greg Kurz
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2020-12-18 10:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, Greg Kurz, David Gibson

Documentation of object_child_foreach_recursive() clearly stipulates
that "it is forbidden to add or remove children from @obj from the @fn
callback". But this is exactly what we do during machine reset. The call
to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a
PCI bridge, both of which will then in turn destroy their PCI DRCs. This
could potentially invalidate the iterator used by do_object_child_foreach().
It is pure luck that this haven't caused any issues so far.

Use spapr_drc_reset_all() since it can cope with DRC removal.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c | 15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 43dded87f498..8528bc90fec4 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1566,19 +1566,6 @@ void spapr_setup_hpt(SpaprMachineState *spapr)
     }
 }
 
-static int spapr_reset_drcs(Object *child, void *opaque)
-{
-    SpaprDrc *drc =
-        (SpaprDrc *) object_dynamic_cast(child,
-                                                 TYPE_SPAPR_DR_CONNECTOR);
-
-    if (drc) {
-        spapr_drc_reset(drc);
-    }
-
-    return 0;
-}
-
 static void spapr_machine_reset(MachineState *machine)
 {
     SpaprMachineState *spapr = SPAPR_MACHINE(machine);
@@ -1633,7 +1620,7 @@ static void spapr_machine_reset(MachineState *machine)
      * will crash QEMU if the DIMM holding the vring goes away). To avoid such
      * situations, we reset DRCs after all devices have been reset.
      */
-    object_child_foreach_recursive(object_get_root(), spapr_reset_drcs, NULL);
+    spapr_drc_reset_all(spapr);
 
     spapr_clear_pending_events(spapr);
 
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions
  2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
                   ` (3 preceding siblings ...)
  2020-12-18 10:33 ` [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset Greg Kurz
@ 2020-12-18 10:33 ` Greg Kurz
  2020-12-21 20:37   ` Daniel Henrique Barboza
  2020-12-28  7:31   ` David Gibson
  2020-12-18 10:34 ` [PATCH 6/6] spapr: Model DR connectors as simple objects Greg Kurz
  2020-12-22 10:14 ` [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Daniel Henrique Barboza
  6 siblings, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2020-12-18 10:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, Greg Kurz, David Gibson

Use a less generic name for an easier experience with tools such as
cscope or grep.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr_drc.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index a4d2608017c5..8571d5bafe4e 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -503,7 +503,7 @@ static const VMStateDescription vmstate_spapr_drc = {
     }
 };
 
-static void realize(DeviceState *d, Error **errp)
+static void drc_realize(DeviceState *d, Error **errp)
 {
     SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
     Object *root_container;
@@ -530,7 +530,7 @@ static void realize(DeviceState *d, Error **errp)
     trace_spapr_drc_realize_complete(spapr_drc_index(drc));
 }
 
-static void unrealize(DeviceState *d)
+static void drc_unrealize(DeviceState *d)
 {
     SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
     Object *root_container;
@@ -579,8 +579,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
 {
     DeviceClass *dk = DEVICE_CLASS(k);
 
-    dk->realize = realize;
-    dk->unrealize = unrealize;
+    dk->realize = drc_realize;
+    dk->unrealize = drc_unrealize;
     /*
      * Reason: DR connector needs to be wired to either the machine or to a
      * PHB in spapr_dr_connector_new().
@@ -628,7 +628,7 @@ static void realize_physical(DeviceState *d, Error **errp)
     SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
     Error *local_err = NULL;
 
-    realize(d, &local_err);
+    drc_realize(d, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         return;
@@ -644,7 +644,7 @@ static void unrealize_physical(DeviceState *d)
 {
     SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
 
-    unrealize(d);
+    drc_unrealize(d);
     vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
     qemu_unregister_reset(drc_physical_reset, drcp);
 }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 6/6] spapr: Model DR connectors as simple objects
  2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
                   ` (4 preceding siblings ...)
  2020-12-18 10:33 ` [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions Greg Kurz
@ 2020-12-18 10:34 ` Greg Kurz
  2020-12-21 20:45   ` Daniel Henrique Barboza
  2020-12-28  8:28   ` David Gibson
  2020-12-22 10:14 ` [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Daniel Henrique Barboza
  6 siblings, 2 replies; 22+ messages in thread
From: Greg Kurz @ 2020-12-18 10:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, Greg Kurz, David Gibson

Modeling DR connectors as individual devices raises some
concerns, as already discussed a year ago in this thread:

https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/

First, high maxmem settings creates too many DRC devices.
This causes scalability issues. It severely increase boot
time because the multiple traversals of the DRC list that
are performed during machine setup are quadratic operations.
This is directly related to the fact that DRCs are modeled
as individual devices and added to the composition tree.

Second, DR connectors are really an internal concept of
PAPR. They aren't something that the user or management
layer can manipulate in any way. We already don't allow
their creation with device_add by clearing user_creatable.

DR connectors don't even need to be modeled as actual
devices since they don't sit in a bus. They just need
to be associated to an 'owner' object and to have the
equivalent of realize/unrealize functions.

Downgrade them to be simple objects. Convert the existing
realize() and unrealize() to be methods of the DR connector
base class. Also have the base class to inherit from the
vmstate_if interface directly. The get_id() hook simply
returns NULL, just as device_vmstate_if_get_id() does for
devices that don't sit in a bus. The DR connector is no
longer made a child object. This means that it must be
explicitely freed when no longer needed. This is only
required for PHBs and PCI bridges actually : have them to
free the DRC with spapr_dr_connector_free() instead of
object_unparent().

No longer add the DRCs to the QOM composition tree. Track
them with a glib hash table using the global DRC index as
the key instead. This makes traversal a linear operation.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 include/hw/ppc/spapr_drc.h |   8 +-
 hw/ppc/spapr_drc.c         | 166 ++++++++++++++-----------------------
 hw/ppc/spapr_pci.c         |   2 +-
 3 files changed, 69 insertions(+), 107 deletions(-)

diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 8982927d5c24..a26aa8b9d4c3 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -170,7 +170,7 @@ typedef enum {
 
 typedef struct SpaprDrc {
     /*< private >*/
-    DeviceState parent;
+    Object parent;
 
     uint32_t id;
     Object *owner;
@@ -193,7 +193,7 @@ struct SpaprMachineState;
 
 typedef struct SpaprDrcClass {
     /*< private >*/
-    DeviceClass parent;
+    ObjectClass parent;
     SpaprDrcState empty_state;
     SpaprDrcState ready_state;
 
@@ -209,6 +209,9 @@ typedef struct SpaprDrcClass {
 
     int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
                        void *fdt, int *fdt_start_offset, Error **errp);
+
+    void (*realize)(SpaprDrc *drc);
+    void (*unrealize)(SpaprDrc *drc);
 } SpaprDrcClass;
 
 typedef struct SpaprDrcPhysical {
@@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc);
 
 SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
                                          uint32_t id);
+void spapr_dr_connector_free(SpaprDrc *drc);
 SpaprDrc *spapr_drc_by_index(uint32_t index);
 SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id);
 int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 8571d5bafe4e..e26763f8b5a4 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -27,7 +27,6 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 
-#define DRC_CONTAINER_PATH "/dr-connector"
 #define DRC_INDEX_TYPE_SHIFT 28
 #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1)
 
@@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = {
     }
 };
 
-static void drc_realize(DeviceState *d, Error **errp)
+static GHashTable *drc_hash_table(void)
 {
-    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
-    Object *root_container;
-    gchar *link_name;
-    const char *child_name;
+    static GHashTable *dht;
 
+    if (!dht) {
+        dht = g_hash_table_new(NULL, NULL);
+    }
+
+    return dht;
+}
+
+
+static void drc_realize(SpaprDrc *drc)
+{
     trace_spapr_drc_realize(spapr_drc_index(drc));
-    /* NOTE: we do this as part of realize/unrealize due to the fact
-     * that the guest will communicate with the DRC via RTAS calls
-     * referencing the global DRC index. By unlinking the DRC
-     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
-     * inaccessible by the guest, since lookups rely on this path
-     * existing in the composition tree
-     */
-    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
-    link_name = g_strdup_printf("%x", spapr_drc_index(drc));
-    child_name = object_get_canonical_path_component(OBJECT(drc));
-    trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name);
-    object_property_add_alias(root_container, link_name,
-                              drc->owner, child_name);
-    g_free(link_name);
+
+    g_hash_table_insert(drc_hash_table(),
+                        GUINT_TO_POINTER(spapr_drc_index(drc)), drc);
     vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
                      drc);
     trace_spapr_drc_realize_complete(spapr_drc_index(drc));
 }
 
-static void drc_unrealize(DeviceState *d)
+static void drc_unrealize(SpaprDrc *drc)
 {
-    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
-    Object *root_container;
-    gchar *name;
-
     trace_spapr_drc_unrealize(spapr_drc_index(drc));
     vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc);
-    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
-    name = g_strdup_printf("%x", spapr_drc_index(drc));
-    object_property_del(root_container, name);
-    g_free(name);
+    g_hash_table_remove(drc_hash_table(),
+                        GUINT_TO_POINTER(spapr_drc_index(drc)));
 }
 
 SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
                                          uint32_t id)
 {
     SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
-    char *prop_name;
 
     drc->id = id;
-    drc->owner = owner;
-    prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
-                                spapr_drc_index(drc));
-    object_property_add_child(owner, prop_name, OBJECT(drc));
-    object_unref(OBJECT(drc));
-    qdev_realize(DEVICE(drc), NULL, NULL);
-    g_free(prop_name);
+    drc->owner = object_ref(owner);
+    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc);
 
     return drc;
 }
 
+void spapr_dr_connector_free(SpaprDrc *drc)
+{
+    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc);
+    object_unref(drc->owner);
+    object_unref(drc);
+}
+
 static void spapr_dr_connector_instance_init(Object *obj)
 {
     SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj);
@@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj)
     drc->state = drck->empty_state;
 }
 
+static char *drc_vmstate_if_get_id(VMStateIf *obj)
+{
+    return NULL;
+}
+
 static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
 {
-    DeviceClass *dk = DEVICE_CLASS(k);
+    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
+    VMStateIfClass *vc = VMSTATE_IF_CLASS(k);
 
-    dk->realize = drc_realize;
-    dk->unrealize = drc_unrealize;
-    /*
-     * Reason: DR connector needs to be wired to either the machine or to a
-     * PHB in spapr_dr_connector_new().
-     */
-    dk->user_creatable = false;
+    drck->realize = drc_realize;
+    drck->unrealize = drc_unrealize;
+    vc->get_id = drc_vmstate_if_get_id;
 }
 
 static bool drc_physical_needed(void *opaque)
@@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque)
     }
 }
 
-static void realize_physical(DeviceState *d, Error **errp)
+static void realize_physical(SpaprDrc *drc)
 {
-    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
-    Error *local_err = NULL;
-
-    drc_realize(d, &local_err);
-    if (local_err) {
-        error_propagate(errp, local_err);
-        return;
-    }
+    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
 
+    drc_realize(drc);
     vmstate_register(VMSTATE_IF(drcp),
                      spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)),
                      &vmstate_spapr_drc_physical, drcp);
     qemu_register_reset(drc_physical_reset, drcp);
 }
 
-static void unrealize_physical(DeviceState *d)
+static void unrealize_physical(SpaprDrc *drc)
 {
-    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
+    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
 
-    drc_unrealize(d);
-    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
     qemu_unregister_reset(drc_physical_reset, drcp);
+    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
+    drc_unrealize(drc);
 }
 
 static void spapr_drc_physical_class_init(ObjectClass *k, void *data)
 {
-    DeviceClass *dk = DEVICE_CLASS(k);
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
 
-    dk->realize = realize_physical;
-    dk->unrealize = unrealize_physical;
+    drck->realize = realize_physical;
+    drck->unrealize = unrealize_physical;
     drck->dr_entity_sense = physical_entity_sense;
     drck->isolate = drc_isolate_physical;
     drck->unisolate = drc_unisolate_physical;
@@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
 
 static const TypeInfo spapr_dr_connector_info = {
     .name          = TYPE_SPAPR_DR_CONNECTOR,
-    .parent        = TYPE_DEVICE,
+    .parent        = TYPE_OBJECT,
     .instance_size = sizeof(SpaprDrc),
     .instance_init = spapr_dr_connector_instance_init,
     .class_size    = sizeof(SpaprDrcClass),
     .class_init    = spapr_dr_connector_class_init,
     .abstract      = true,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_VMSTATE_IF },
+        { }
+    },
 };
 
 static const TypeInfo spapr_drc_physical_info = {
@@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = {
 
 SpaprDrc *spapr_drc_by_index(uint32_t index)
 {
-    Object *obj;
-    gchar *name;
-
-    name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index);
-    obj = object_resolve_path(name, NULL);
-    g_free(name);
-
-    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
+    return
+        SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(),
+                                               GUINT_TO_POINTER(index)));
 }
 
 SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
@@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
  */
 int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
 {
-    Object *root_container;
-    ObjectProperty *prop;
-    ObjectPropertyIterator iter;
+    GHashTableIter iter;
     uint32_t drc_count = 0;
     GArray *drc_indexes, *drc_power_domains;
     GString *drc_names, *drc_types;
     int ret;
+    SpaprDrc *drc;
 
     /*
      * This should really be only called once per node since it overwrites
@@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
     drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
     drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
 
-    /* aliases for all DRConnector objects will be rooted in QOM
-     * composition tree at DRC_CONTAINER_PATH
-     */
-    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
-
-    object_property_iter_init(&iter, root_container);
-    while ((prop = object_property_iter_next(&iter))) {
-        Object *obj;
-        SpaprDrc *drc;
+    g_hash_table_iter_init(&iter, drc_hash_table());
+    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
         SpaprDrcClass *drck;
         char *drc_name = NULL;
         uint32_t drc_index, drc_power_domain;
 
-        if (!strstart(prop->type, "link<", NULL)) {
-            continue;
-        }
-
-        obj = object_property_get_link(root_container, prop->name,
-                                       &error_abort);
-        drc = SPAPR_DR_CONNECTOR(obj);
         drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
         if (owner && (drc->owner != owner)) {
@@ -951,23 +920,12 @@ out:
 
 void spapr_drc_reset_all(SpaprMachineState *spapr)
 {
-    Object *drc_container;
-    ObjectProperty *prop;
-    ObjectPropertyIterator iter;
+    GHashTableIter iter;
+    SpaprDrc *drc;
 
-    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
 restart:
-    object_property_iter_init(&iter, drc_container);
-    while ((prop = object_property_iter_next(&iter))) {
-        SpaprDrc *drc;
-
-        if (!strstart(prop->type, "link<", NULL)) {
-            continue;
-        }
-        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
-                                                          prop->name,
-                                                          &error_abort));
-
+    g_hash_table_iter_init(&iter, drc_hash_table());
+    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
         /*
          * This will complete any pending plug/unplug requests.
          * In case of a unplugged PHB or PCI bridge, this will
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 76d7c91e9c64..ca0cca664e3c 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus)
         SpaprDrc *drc = drc_from_devfn(phb, chassis, i);
 
         if (drc) {
-            object_unparent(OBJECT(drc));
+            spapr_dr_connector_free(drc);
         }
     }
 }
-- 
2.26.2



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS
  2020-12-18 10:33 ` [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS Greg Kurz
@ 2020-12-21 18:24   ` Daniel Henrique Barboza
  2020-12-28  7:20   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Henrique Barboza @ 2020-12-21 18:24 UTC (permalink / raw)
  To: Greg Kurz, qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, David Gibson



On 12/18/20 7:33 AM, Greg Kurz wrote:
> Non-transient DRCs are either in the empty or the ready state,
> which means spapr_drc_reset() doesn't change their state. It
> is thus not needed to do any checking. Call spapr_drc_reset()
> unconditionally and squash spapr_drc_transient() into its
> only user, spapr_drc_needed().
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---


Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   include/hw/ppc/spapr_drc.h | 3 ---
>   hw/ppc/spapr_drc.c         | 8 ++------
>   hw/ppc/spapr_hcall.c       | 7 ++++---
>   3 files changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index def3593adc8b..cff5e707d0d9 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -244,9 +244,6 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>   void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
>   void spapr_drc_detach(SpaprDrc *drc);
>   
> -/* Returns true if a hot plug/unplug request is pending */
> -bool spapr_drc_transient(SpaprDrc *drc);
> -
>   static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
>   {
>       return drc->unplug_requested;
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index fc7e321fcdf6..8d62f55066b6 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -462,8 +462,9 @@ static const VMStateDescription vmstate_spapr_drc_unplug_requested = {
>       }
>   };
>   
> -bool spapr_drc_transient(SpaprDrc *drc)
> +static bool spapr_drc_needed(void *opaque)
>   {
> +    SpaprDrc *drc = opaque;
>       SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>   
>       /*
> @@ -483,11 +484,6 @@ bool spapr_drc_transient(SpaprDrc *drc)
>           spapr_drc_unplug_requested(drc);
>   }
>   
> -static bool spapr_drc_needed(void *opaque)
> -{
> -    return spapr_drc_transient(opaque);
> -}
> -
>   static const VMStateDescription vmstate_spapr_drc = {
>       .name = "spapr_drc",
>       .version_id = 1,
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index c0ea0bd5794b..4e9d50c254f0 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1650,9 +1650,10 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
>                                                             prop->name,
>                                                             &error_abort));
>   
> -        if (spapr_drc_transient(drc)) {
> -            spapr_drc_reset(drc);
> -        }
> +        /*
> +         * This will complete any pending plug/unplug requests.
> +         */
> +        spapr_drc_reset(drc);
>       }
>   
>       spapr_clear_pending_hotplug_events(spapr);
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] spapr: Fix reset of transient DR connectors
  2020-12-18 10:33 ` [PATCH 2/6] spapr: Fix reset of transient DR connectors Greg Kurz
@ 2020-12-21 20:34   ` Daniel Henrique Barboza
  2020-12-28  7:24   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Henrique Barboza @ 2020-12-21 20:34 UTC (permalink / raw)
  To: Greg Kurz, qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, David Gibson



On 12/18/20 7:33 AM, Greg Kurz wrote:
> Documentation of object_property_iter_init() clearly stipulates that
> "it is forbidden to modify the property list while iterating". But this
> is exactly what we do when resetting transient DR connectors during CAS.
> The call to spapr_drc_reset() can finalize the hot-unplug sequence of a
> PHB or a PCI bridge, both of which will then in turn destroy their PCI
> DRCs. This could potentially invalidate the iterator. It is pure luck
> that this haven't caused any issues so far.
> 
> Change spapr_drc_reset() to return true if it caused a device to be
> removed. Restart from scratch in this case. This can potentially
> increase the overall DRC reset time, especially with a high maxmem
> which generates a lot of LMB DRCs. But this kind of setup is rare,
> and so is the use case of rebooting a guest while doing hot-unplug.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   include/hw/ppc/spapr_drc.h | 3 ++-
>   hw/ppc/spapr_drc.c         | 6 +++++-
>   hw/ppc/spapr_hcall.c       | 8 +++++++-
>   3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index cff5e707d0d9..5d80019f82e2 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -224,7 +224,8 @@ static inline bool spapr_drc_hotplugged(DeviceState *dev)
>       return dev->hotplugged && !runstate_check(RUN_STATE_INMIGRATE);
>   }
>   
> -void spapr_drc_reset(SpaprDrc *drc);
> +/* Returns true if an unplug request completed */
> +bool spapr_drc_reset(SpaprDrc *drc);
>   
>   uint32_t spapr_drc_index(SpaprDrc *drc);
>   SpaprDrcType spapr_drc_type(SpaprDrc *drc);
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8d62f55066b6..5b5e2ac58a7e 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -417,9 +417,10 @@ void spapr_drc_detach(SpaprDrc *drc)
>       spapr_drc_release(drc);
>   }
>   
> -void spapr_drc_reset(SpaprDrc *drc)
> +bool spapr_drc_reset(SpaprDrc *drc)
>   {
>       SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    bool unplug_completed = false;
>   
>       trace_spapr_drc_reset(spapr_drc_index(drc));
>   
> @@ -428,6 +429,7 @@ void spapr_drc_reset(SpaprDrc *drc)
>        */
>       if (drc->unplug_requested) {
>           spapr_drc_release(drc);
> +        unplug_completed = true;
>       }
>   
>       if (drc->dev) {
> @@ -444,6 +446,8 @@ void spapr_drc_reset(SpaprDrc *drc)
>           drc->ccs_offset = -1;
>           drc->ccs_depth = -1;
>       }
> +
> +    return unplug_completed;
>   }
>   
>   static bool spapr_drc_unplug_requested_needed(void *opaque)
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 4e9d50c254f0..aa22830ac4bd 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1639,6 +1639,7 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
>       ObjectPropertyIterator iter;
>   
>       drc_container = container_get(object_get_root(), "/dr-connector");
> +restart:
>       object_property_iter_init(&iter, drc_container);
>       while ((prop = object_property_iter_next(&iter))) {
>           SpaprDrc *drc;
> @@ -1652,8 +1653,13 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
>   
>           /*
>            * This will complete any pending plug/unplug requests.
> +         * In case of a unplugged PHB or PCI bridge, this will
> +         * cause some DRCs to be destroyed and thus potentially
> +         * invalidate the iterator.
>            */
> -        spapr_drc_reset(drc);
> +        if (spapr_drc_reset(drc)) {
> +            goto restart;
> +        }
>       }
>   
>       spapr_clear_pending_hotplug_events(spapr);
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] spapr: Introduce spapr_drc_reset_all()
  2020-12-18 10:33 ` [PATCH 3/6] spapr: Introduce spapr_drc_reset_all() Greg Kurz
@ 2020-12-21 20:35   ` Daniel Henrique Barboza
  2020-12-28  7:26   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Henrique Barboza @ 2020-12-21 20:35 UTC (permalink / raw)
  To: Greg Kurz, qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, David Gibson



On 12/18/20 7:33 AM, Greg Kurz wrote:
> No need to expose the way DRCs are traversed outside of spapr_drc.c.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   include/hw/ppc/spapr_drc.h |  6 ++++++
>   hw/ppc/spapr_drc.c         | 31 +++++++++++++++++++++++++++++
>   hw/ppc/spapr_hcall.c       | 40 ++++++--------------------------------
>   3 files changed, 43 insertions(+), 34 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 5d80019f82e2..8982927d5c24 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -245,6 +245,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>   void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
>   void spapr_drc_detach(SpaprDrc *drc);
>   
> +/*
> + * Reset all DRCs, causing pending hot-plug/unplug requests to complete.
> + * Safely handles potential DRC removal (eg. PHBs or PCI bridges).
> + */
> +void spapr_drc_reset_all(struct SpaprMachineState *spapr);
> +
>   static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
>   {
>       return drc->unplug_requested;
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 5b5e2ac58a7e..a4d2608017c5 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -949,6 +949,37 @@ out:
>       return ret;
>   }
>   
> +void spapr_drc_reset_all(SpaprMachineState *spapr)
> +{
> +    Object *drc_container;
> +    ObjectProperty *prop;
> +    ObjectPropertyIterator iter;
> +
> +    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> +restart:
> +    object_property_iter_init(&iter, drc_container);
> +    while ((prop = object_property_iter_next(&iter))) {
> +        SpaprDrc *drc;
> +
> +        if (!strstart(prop->type, "link<", NULL)) {
> +            continue;
> +        }
> +        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> +                                                          prop->name,
> +                                                          &error_abort));
> +
> +        /*
> +         * This will complete any pending plug/unplug requests.
> +         * In case of a unplugged PHB or PCI bridge, this will
> +         * cause some DRCs to be destroyed and thus potentially
> +         * invalidate the iterator.
> +         */
> +        if (spapr_drc_reset(drc)) {
> +            goto restart;
> +        }
> +    }
> +}
> +
>   /*
>    * RTAS calls
>    */
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index aa22830ac4bd..e5dfc1ba7acc 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1632,39 +1632,6 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, uint32_t max_compat,
>       return best_compat;
>   }
>   
> -static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
> -{
> -    Object *drc_container;
> -    ObjectProperty *prop;
> -    ObjectPropertyIterator iter;
> -
> -    drc_container = container_get(object_get_root(), "/dr-connector");
> -restart:
> -    object_property_iter_init(&iter, drc_container);
> -    while ((prop = object_property_iter_next(&iter))) {
> -        SpaprDrc *drc;
> -
> -        if (!strstart(prop->type, "link<", NULL)) {
> -            continue;
> -        }
> -        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> -                                                          prop->name,
> -                                                          &error_abort));
> -
> -        /*
> -         * This will complete any pending plug/unplug requests.
> -         * In case of a unplugged PHB or PCI bridge, this will
> -         * cause some DRCs to be destroyed and thus potentially
> -         * invalidate the iterator.
> -         */
> -        if (spapr_drc_reset(drc)) {
> -            goto restart;
> -        }
> -    }
> -
> -    spapr_clear_pending_hotplug_events(spapr);
> -}
> -
>   target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>                                               SpaprMachineState *spapr,
>                                               target_ulong vec,
> @@ -1822,7 +1789,12 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>   
>       spapr_irq_update_active_intc(spapr);
>   
> -    spapr_handle_transient_dev_before_cas(spapr);
> +    /*
> +     * Process all pending hot-plug/unplug requests now. An updated full
> +     * rendered FDT will be returned to the guest.
> +     */
> +    spapr_drc_reset_all(spapr);
> +    spapr_clear_pending_hotplug_events(spapr);
>   
>       /*
>        * If spapr_machine_reset() did not set up a HPT but one is necessary
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset
  2020-12-18 10:33 ` [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset Greg Kurz
@ 2020-12-21 20:36   ` Daniel Henrique Barboza
  2020-12-28  7:29   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Henrique Barboza @ 2020-12-21 20:36 UTC (permalink / raw)
  To: Greg Kurz, qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, David Gibson



On 12/18/20 7:33 AM, Greg Kurz wrote:
> Documentation of object_child_foreach_recursive() clearly stipulates
> that "it is forbidden to add or remove children from @obj from the @fn
> callback". But this is exactly what we do during machine reset. The call
> to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a
> PCI bridge, both of which will then in turn destroy their PCI DRCs. This
> could potentially invalidate the iterator used by do_object_child_foreach().
> It is pure luck that this haven't caused any issues so far.
> 
> Use spapr_drc_reset_all() since it can cope with DRC removal.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   hw/ppc/spapr.c | 15 +--------------
>   1 file changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 43dded87f498..8528bc90fec4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1566,19 +1566,6 @@ void spapr_setup_hpt(SpaprMachineState *spapr)
>       }
>   }
>   
> -static int spapr_reset_drcs(Object *child, void *opaque)
> -{
> -    SpaprDrc *drc =
> -        (SpaprDrc *) object_dynamic_cast(child,
> -                                                 TYPE_SPAPR_DR_CONNECTOR);
> -
> -    if (drc) {
> -        spapr_drc_reset(drc);
> -    }
> -
> -    return 0;
> -}
> -
>   static void spapr_machine_reset(MachineState *machine)
>   {
>       SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> @@ -1633,7 +1620,7 @@ static void spapr_machine_reset(MachineState *machine)
>        * will crash QEMU if the DIMM holding the vring goes away). To avoid such
>        * situations, we reset DRCs after all devices have been reset.
>        */
> -    object_child_foreach_recursive(object_get_root(), spapr_reset_drcs, NULL);
> +    spapr_drc_reset_all(spapr);
>   
>       spapr_clear_pending_events(spapr);
>   
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions
  2020-12-18 10:33 ` [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions Greg Kurz
@ 2020-12-21 20:37   ` Daniel Henrique Barboza
  2020-12-28  7:31   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Henrique Barboza @ 2020-12-21 20:37 UTC (permalink / raw)
  To: Greg Kurz, qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, David Gibson



On 12/18/20 7:33 AM, Greg Kurz wrote:
> Use a less generic name for an easier experience with tools such as
> cscope or grep.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---

Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>

>   hw/ppc/spapr_drc.c | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a4d2608017c5..8571d5bafe4e 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -503,7 +503,7 @@ static const VMStateDescription vmstate_spapr_drc = {
>       }
>   };
>   
> -static void realize(DeviceState *d, Error **errp)
> +static void drc_realize(DeviceState *d, Error **errp)
>   {
>       SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
>       Object *root_container;
> @@ -530,7 +530,7 @@ static void realize(DeviceState *d, Error **errp)
>       trace_spapr_drc_realize_complete(spapr_drc_index(drc));
>   }
>   
> -static void unrealize(DeviceState *d)
> +static void drc_unrealize(DeviceState *d)
>   {
>       SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
>       Object *root_container;
> @@ -579,8 +579,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>   {
>       DeviceClass *dk = DEVICE_CLASS(k);
>   
> -    dk->realize = realize;
> -    dk->unrealize = unrealize;
> +    dk->realize = drc_realize;
> +    dk->unrealize = drc_unrealize;
>       /*
>        * Reason: DR connector needs to be wired to either the machine or to a
>        * PHB in spapr_dr_connector_new().
> @@ -628,7 +628,7 @@ static void realize_physical(DeviceState *d, Error **errp)
>       SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
>       Error *local_err = NULL;
>   
> -    realize(d, &local_err);
> +    drc_realize(d, &local_err);
>       if (local_err) {
>           error_propagate(errp, local_err);
>           return;
> @@ -644,7 +644,7 @@ static void unrealize_physical(DeviceState *d)
>   {
>       SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
>   
> -    unrealize(d);
> +    drc_unrealize(d);
>       vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
>       qemu_unregister_reset(drc_physical_reset, drcp);
>   }
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] spapr: Model DR connectors as simple objects
  2020-12-18 10:34 ` [PATCH 6/6] spapr: Model DR connectors as simple objects Greg Kurz
@ 2020-12-21 20:45   ` Daniel Henrique Barboza
  2020-12-28  8:28   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: Daniel Henrique Barboza @ 2020-12-21 20:45 UTC (permalink / raw)
  To: Greg Kurz, qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, David Gibson



On 12/18/20 7:34 AM, Greg Kurz wrote:
> Modeling DR connectors as individual devices raises some
> concerns, as already discussed a year ago in this thread:
> 
> https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/
> 
> First, high maxmem settings creates too many DRC devices.
> This causes scalability issues. It severely increase boot

s/increase/increases

> time because the multiple traversals of the DRC list that
> are performed during machine setup are quadratic operations.
> This is directly related to the fact that DRCs are modeled
> as individual devices and added to the composition tree.
> 
> Second, DR connectors are really an internal concept of
> PAPR. They aren't something that the user or management
> layer can manipulate in any way. We already don't allow
> their creation with device_add by clearing user_creatable.
> 
> DR connectors don't even need to be modeled as actual
> devices since they don't sit in a bus. They just need
> to be associated to an 'owner' object and to have the
> equivalent of realize/unrealize functions.
> 
> Downgrade them to be simple objects. Convert the existing
> realize() and unrealize() to be methods of the DR connector
> base class. Also have the base class to inherit from the
> vmstate_if interface directly. The get_id() hook simply
> returns NULL, just as device_vmstate_if_get_id() does for
> devices that don't sit in a bus. The DR connector is no
> longer made a child object. This means that it must be
> explicitely freed when no longer needed. This is only


s/explicitely/explicitly

> required for PHBs and PCI bridges actually : have them to
> free the DRC with spapr_dr_connector_free() instead of
> object_unparent().
> 
> No longer add the DRCs to the QOM composition tree. Track
> them with a glib hash table using the global DRC index as
> the key instead. This makes traversal a linear operation.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---


Code LGTM. The maintainer is welcome to fix the nits while pushing.


Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>




>   include/hw/ppc/spapr_drc.h |   8 +-
>   hw/ppc/spapr_drc.c         | 166 ++++++++++++++-----------------------
>   hw/ppc/spapr_pci.c         |   2 +-
>   3 files changed, 69 insertions(+), 107 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 8982927d5c24..a26aa8b9d4c3 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -170,7 +170,7 @@ typedef enum {
>   
>   typedef struct SpaprDrc {
>       /*< private >*/
> -    DeviceState parent;
> +    Object parent;
>   
>       uint32_t id;
>       Object *owner;
> @@ -193,7 +193,7 @@ struct SpaprMachineState;
>   
>   typedef struct SpaprDrcClass {
>       /*< private >*/
> -    DeviceClass parent;
> +    ObjectClass parent;
>       SpaprDrcState empty_state;
>       SpaprDrcState ready_state;
>   
> @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass {
>   
>       int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
>                          void *fdt, int *fdt_start_offset, Error **errp);
> +
> +    void (*realize)(SpaprDrc *drc);
> +    void (*unrealize)(SpaprDrc *drc);
>   } SpaprDrcClass;
>   
>   typedef struct SpaprDrcPhysical {
> @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc);
>   
>   SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
>                                            uint32_t id);
> +void spapr_dr_connector_free(SpaprDrc *drc);
>   SpaprDrc *spapr_drc_by_index(uint32_t index);
>   SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id);
>   int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8571d5bafe4e..e26763f8b5a4 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -27,7 +27,6 @@
>   #include "sysemu/reset.h"
>   #include "trace.h"
>   
> -#define DRC_CONTAINER_PATH "/dr-connector"
>   #define DRC_INDEX_TYPE_SHIFT 28
>   #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1)
>   
> @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = {
>       }
>   };
>   
> -static void drc_realize(DeviceState *d, Error **errp)
> +static GHashTable *drc_hash_table(void)
>   {
> -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> -    Object *root_container;
> -    gchar *link_name;
> -    const char *child_name;
> +    static GHashTable *dht;
>   
> +    if (!dht) {
> +        dht = g_hash_table_new(NULL, NULL);
> +    }
> +
> +    return dht;
> +}
> +
> +
> +static void drc_realize(SpaprDrc *drc)
> +{
>       trace_spapr_drc_realize(spapr_drc_index(drc));
> -    /* NOTE: we do this as part of realize/unrealize due to the fact
> -     * that the guest will communicate with the DRC via RTAS calls
> -     * referencing the global DRC index. By unlinking the DRC
> -     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
> -     * inaccessible by the guest, since lookups rely on this path
> -     * existing in the composition tree
> -     */
> -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> -    link_name = g_strdup_printf("%x", spapr_drc_index(drc));
> -    child_name = object_get_canonical_path_component(OBJECT(drc));
> -    trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name);
> -    object_property_add_alias(root_container, link_name,
> -                              drc->owner, child_name);
> -    g_free(link_name);
> +
> +    g_hash_table_insert(drc_hash_table(),
> +                        GUINT_TO_POINTER(spapr_drc_index(drc)), drc);
>       vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
>                        drc);
>       trace_spapr_drc_realize_complete(spapr_drc_index(drc));
>   }
>   
> -static void drc_unrealize(DeviceState *d)
> +static void drc_unrealize(SpaprDrc *drc)
>   {
> -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> -    Object *root_container;
> -    gchar *name;
> -
>       trace_spapr_drc_unrealize(spapr_drc_index(drc));
>       vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc);
> -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> -    name = g_strdup_printf("%x", spapr_drc_index(drc));
> -    object_property_del(root_container, name);
> -    g_free(name);
> +    g_hash_table_remove(drc_hash_table(),
> +                        GUINT_TO_POINTER(spapr_drc_index(drc)));
>   }
>   
>   SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
>                                            uint32_t id)
>   {
>       SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
> -    char *prop_name;
>   
>       drc->id = id;
> -    drc->owner = owner;
> -    prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> -                                spapr_drc_index(drc));
> -    object_property_add_child(owner, prop_name, OBJECT(drc));
> -    object_unref(OBJECT(drc));
> -    qdev_realize(DEVICE(drc), NULL, NULL);
> -    g_free(prop_name);
> +    drc->owner = object_ref(owner);
> +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc);
>   
>       return drc;
>   }
>   
> +void spapr_dr_connector_free(SpaprDrc *drc)
> +{
> +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc);
> +    object_unref(drc->owner);
> +    object_unref(drc);
> +}
> +
>   static void spapr_dr_connector_instance_init(Object *obj)
>   {
>       SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj);
> @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj)
>       drc->state = drck->empty_state;
>   }
>   
> +static char *drc_vmstate_if_get_id(VMStateIf *obj)
> +{
> +    return NULL;
> +}
> +
>   static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>   {
> -    DeviceClass *dk = DEVICE_CLASS(k);
> +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> +    VMStateIfClass *vc = VMSTATE_IF_CLASS(k);
>   
> -    dk->realize = drc_realize;
> -    dk->unrealize = drc_unrealize;
> -    /*
> -     * Reason: DR connector needs to be wired to either the machine or to a
> -     * PHB in spapr_dr_connector_new().
> -     */
> -    dk->user_creatable = false;
> +    drck->realize = drc_realize;
> +    drck->unrealize = drc_unrealize;
> +    vc->get_id = drc_vmstate_if_get_id;
>   }
>   
>   static bool drc_physical_needed(void *opaque)
> @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque)
>       }
>   }
>   
> -static void realize_physical(DeviceState *d, Error **errp)
> +static void realize_physical(SpaprDrc *drc)
>   {
> -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> -    Error *local_err = NULL;
> -
> -    drc_realize(d, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        return;
> -    }
> +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
>   
> +    drc_realize(drc);
>       vmstate_register(VMSTATE_IF(drcp),
>                        spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)),
>                        &vmstate_spapr_drc_physical, drcp);
>       qemu_register_reset(drc_physical_reset, drcp);
>   }
>   
> -static void unrealize_physical(DeviceState *d)
> +static void unrealize_physical(SpaprDrc *drc)
>   {
> -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
>   
> -    drc_unrealize(d);
> -    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
>       qemu_unregister_reset(drc_physical_reset, drcp);
> +    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
> +    drc_unrealize(drc);
>   }
>   
>   static void spapr_drc_physical_class_init(ObjectClass *k, void *data)
>   {
> -    DeviceClass *dk = DEVICE_CLASS(k);
>       SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>   
> -    dk->realize = realize_physical;
> -    dk->unrealize = unrealize_physical;
> +    drck->realize = realize_physical;
> +    drck->unrealize = unrealize_physical;
>       drck->dr_entity_sense = physical_entity_sense;
>       drck->isolate = drc_isolate_physical;
>       drck->unisolate = drc_unisolate_physical;
> @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>   
>   static const TypeInfo spapr_dr_connector_info = {
>       .name          = TYPE_SPAPR_DR_CONNECTOR,
> -    .parent        = TYPE_DEVICE,
> +    .parent        = TYPE_OBJECT,
>       .instance_size = sizeof(SpaprDrc),
>       .instance_init = spapr_dr_connector_instance_init,
>       .class_size    = sizeof(SpaprDrcClass),
>       .class_init    = spapr_dr_connector_class_init,
>       .abstract      = true,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_VMSTATE_IF },
> +        { }
> +    },
>   };
>   
>   static const TypeInfo spapr_drc_physical_info = {
> @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = {
>   
>   SpaprDrc *spapr_drc_by_index(uint32_t index)
>   {
> -    Object *obj;
> -    gchar *name;
> -
> -    name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index);
> -    obj = object_resolve_path(name, NULL);
> -    g_free(name);
> -
> -    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
> +    return
> +        SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(),
> +                                               GUINT_TO_POINTER(index)));
>   }
>   
>   SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
> @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
>    */
>   int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
>   {
> -    Object *root_container;
> -    ObjectProperty *prop;
> -    ObjectPropertyIterator iter;
> +    GHashTableIter iter;
>       uint32_t drc_count = 0;
>       GArray *drc_indexes, *drc_power_domains;
>       GString *drc_names, *drc_types;
>       int ret;
> +    SpaprDrc *drc;
>   
>       /*
>        * This should really be only called once per node since it overwrites
> @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
>       drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
>       drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
>   
> -    /* aliases for all DRConnector objects will be rooted in QOM
> -     * composition tree at DRC_CONTAINER_PATH
> -     */
> -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> -
> -    object_property_iter_init(&iter, root_container);
> -    while ((prop = object_property_iter_next(&iter))) {
> -        Object *obj;
> -        SpaprDrc *drc;
> +    g_hash_table_iter_init(&iter, drc_hash_table());
> +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
>           SpaprDrcClass *drck;
>           char *drc_name = NULL;
>           uint32_t drc_index, drc_power_domain;
>   
> -        if (!strstart(prop->type, "link<", NULL)) {
> -            continue;
> -        }
> -
> -        obj = object_property_get_link(root_container, prop->name,
> -                                       &error_abort);
> -        drc = SPAPR_DR_CONNECTOR(obj);
>           drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>   
>           if (owner && (drc->owner != owner)) {
> @@ -951,23 +920,12 @@ out:
>   
>   void spapr_drc_reset_all(SpaprMachineState *spapr)
>   {
> -    Object *drc_container;
> -    ObjectProperty *prop;
> -    ObjectPropertyIterator iter;
> +    GHashTableIter iter;
> +    SpaprDrc *drc;
>   
> -    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
>   restart:
> -    object_property_iter_init(&iter, drc_container);
> -    while ((prop = object_property_iter_next(&iter))) {
> -        SpaprDrc *drc;
> -
> -        if (!strstart(prop->type, "link<", NULL)) {
> -            continue;
> -        }
> -        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> -                                                          prop->name,
> -                                                          &error_abort));
> -
> +    g_hash_table_iter_init(&iter, drc_hash_table());
> +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
>           /*
>            * This will complete any pending plug/unplug requests.
>            * In case of a unplugged PHB or PCI bridge, this will
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 76d7c91e9c64..ca0cca664e3c 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus)
>           SpaprDrc *drc = drc_from_devfn(phb, chassis, i);
>   
>           if (drc) {
> -            object_unparent(OBJECT(drc));
> +            spapr_dr_connector_free(drc);
>           }
>       }
>   }
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors
  2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
                   ` (5 preceding siblings ...)
  2020-12-18 10:34 ` [PATCH 6/6] spapr: Model DR connectors as simple objects Greg Kurz
@ 2020-12-22 10:14 ` Daniel Henrique Barboza
  6 siblings, 0 replies; 22+ messages in thread
From: Daniel Henrique Barboza @ 2020-12-22 10:14 UTC (permalink / raw)
  To: Greg Kurz, qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, David Gibson



On 12/18/20 7:33 AM, Greg Kurz wrote:
> Setting a high maxmem value seriously degrades the guest's boot
> time, from 3 seconds for 1T up to more than 3 minutes for 8T.
> All this time is spent during initial machine setup and CAS,
> preventing use of the QEMU monitor in the meantime.
> 
> Profiling reveals:
> 
>    %   cumulative   self              self     total
>   time   seconds   seconds    calls   s/call   s/call  name
>   85.48     24.08    24.08   566117     0.00     0.00  object_get_canonical_path_component
>   13.67     27.93     3.85 57623944     0.00     0.00  strstart
> 
> -----------------------------------------------
>                  0.00    0.00       1/566117      host_memory_backend_get_name [270]
>                  1.41    0.22   33054/566117      drc_realize <cycle 1> [23]
>                 22.67    3.51  533062/566117      object_get_canonical_path <cycle 1> [3]
> [2]     98.7   24.08    3.73  566117         object_get_canonical_path_component [2]
>                  3.73    0.00 55802324/57623944     strstart [19]
> -----------------------------------------------
>                                    12             object_property_set_link <cycle 1> [1267]
>                                 33074             device_set_realized <cycle 1> [138]
>                                231378             object_get_child_property <cycle 1> [652]
> [3]     93.0    0.01   26.18  264464         object_get_canonical_path <cycle 1> [3]
>                 22.67    3.51  533062/566117      object_get_canonical_path_component [2]
>                                264464             object_get_root <cycle 1> [629]
> -----------------------------------------------
> 
> This is because an 8T maxmem means QEMU can create up to 32768
> LMB DRC objects, each tracking the hot-plug/unplug state of 256M
> of contiguous RAM. These objects are all created during machine
> init for the machine lifetime. Their realize path involves
> several calls to object_get_canonical_path_component(), which
> itself traverses all properties of the parent node. This results
> in a quadratic operation. Worse, the full list of DRCs is traversed
> 7 times during the boot process, eg. to populate the device tree,
> calling object_get_canonical_path_component() on each DRC again.
> Yet even more costly quadratic traversals.
> 
> Modeling DR connectors as individual devices raises some
> concerns, as already discussed a year ago in this thread:
> 
> https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/
> 
> First, having so many devices to track the DRC states is excessive
> and can cause scalability issues in various ways. This bites again
> with this quadratic traversal issue. Second, DR connectors are really
> PAPR internals that shouldn't be exposed at all in the composition
> tree.
> 
> This series converts DR connectors to be simple unparented
> objects tracked in a separate hash table, rather than
> actual devices exposed in the QOM tree. This doesn't address
> the overall concern on scalability, but this brings linear
> traversal of the DR connectors. The time penalty with a
> 8T maxmem is reduced to less than 1 second, and we get
> a much shorter 'info qom-tree' output.
> 
> This is transparent to migration.


Tested in a P9 host with a 512Gb initial mem RAM guest in regular mode and P8
compat mode. Without this series, the guest takes on average 35 seconds to
start booting. With this series the boot starts in 5 seconds for the same
guest.


Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>


> 
> Greg Kurz (6):
>    spapr: Call spapr_drc_reset() for all DRCs at CAS
>    spapr: Fix reset of transient DR connectors
>    spapr: Introduce spapr_drc_reset_all()
>    spapr: Use spapr_drc_reset_all() at machine reset
>    spapr: Add drc_ prefix to the DRC realize and unrealize functions
>    spapr: Model DR connectors as simple objects
> 
>   include/hw/ppc/spapr_drc.h |  18 +++-
>   hw/ppc/spapr.c             |  15 +--
>   hw/ppc/spapr_drc.c         | 181 +++++++++++++++++--------------------
>   hw/ppc/spapr_hcall.c       |  33 ++-----
>   hw/ppc/spapr_pci.c         |   2 +-
>   5 files changed, 106 insertions(+), 143 deletions(-)
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS
  2020-12-18 10:33 ` [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS Greg Kurz
  2020-12-21 18:24   ` Daniel Henrique Barboza
@ 2020-12-28  7:20   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: David Gibson @ 2020-12-28  7:20 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2978 bytes --]

On Fri, Dec 18, 2020 at 11:33:55AM +0100, Greg Kurz wrote:
> Non-transient DRCs are either in the empty or the ready state,
> which means spapr_drc_reset() doesn't change their state. It
> is thus not needed to do any checking. Call spapr_drc_reset()
> unconditionally and squash spapr_drc_transient() into its
> only user, spapr_drc_needed().
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied to ppc-fof-6.0, thanks.

> ---
>  include/hw/ppc/spapr_drc.h | 3 ---
>  hw/ppc/spapr_drc.c         | 8 ++------
>  hw/ppc/spapr_hcall.c       | 7 ++++---
>  3 files changed, 6 insertions(+), 12 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index def3593adc8b..cff5e707d0d9 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -244,9 +244,6 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>  void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
>  void spapr_drc_detach(SpaprDrc *drc);
>  
> -/* Returns true if a hot plug/unplug request is pending */
> -bool spapr_drc_transient(SpaprDrc *drc);
> -
>  static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
>  {
>      return drc->unplug_requested;
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index fc7e321fcdf6..8d62f55066b6 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -462,8 +462,9 @@ static const VMStateDescription vmstate_spapr_drc_unplug_requested = {
>      }
>  };
>  
> -bool spapr_drc_transient(SpaprDrc *drc)
> +static bool spapr_drc_needed(void *opaque)
>  {
> +    SpaprDrc *drc = opaque;
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
>      /*
> @@ -483,11 +484,6 @@ bool spapr_drc_transient(SpaprDrc *drc)
>          spapr_drc_unplug_requested(drc);
>  }
>  
> -static bool spapr_drc_needed(void *opaque)
> -{
> -    return spapr_drc_transient(opaque);
> -}
> -
>  static const VMStateDescription vmstate_spapr_drc = {
>      .name = "spapr_drc",
>      .version_id = 1,
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index c0ea0bd5794b..4e9d50c254f0 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1650,9 +1650,10 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
>                                                            prop->name,
>                                                            &error_abort));
>  
> -        if (spapr_drc_transient(drc)) {
> -            spapr_drc_reset(drc);
> -        }
> +        /*
> +         * This will complete any pending plug/unplug requests.
> +         */
> +        spapr_drc_reset(drc);
>      }
>  
>      spapr_clear_pending_hotplug_events(spapr);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/6] spapr: Fix reset of transient DR connectors
  2020-12-18 10:33 ` [PATCH 2/6] spapr: Fix reset of transient DR connectors Greg Kurz
  2020-12-21 20:34   ` Daniel Henrique Barboza
@ 2020-12-28  7:24   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: David Gibson @ 2020-12-28  7:24 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4032 bytes --]

On Fri, Dec 18, 2020 at 11:33:56AM +0100, Greg Kurz wrote:
> Documentation of object_property_iter_init() clearly stipulates that
> "it is forbidden to modify the property list while iterating". But this
> is exactly what we do when resetting transient DR connectors during CAS.
> The call to spapr_drc_reset() can finalize the hot-unplug sequence of a
> PHB or a PCI bridge, both of which will then in turn destroy their PCI
> DRCs. This could potentially invalidate the iterator. It is pure luck
> that this haven't caused any issues so far.
> 
> Change spapr_drc_reset() to return true if it caused a device to be
> removed. Restart from scratch in this case. This can potentially
> increase the overall DRC reset time, especially with a high maxmem
> which generates a lot of LMB DRCs. But this kind of setup is rare,
> and so is the use case of rebooting a guest while doing hot-unplug.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied, thanks.

> ---
>  include/hw/ppc/spapr_drc.h | 3 ++-
>  hw/ppc/spapr_drc.c         | 6 +++++-
>  hw/ppc/spapr_hcall.c       | 8 +++++++-
>  3 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index cff5e707d0d9..5d80019f82e2 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -224,7 +224,8 @@ static inline bool spapr_drc_hotplugged(DeviceState *dev)
>      return dev->hotplugged && !runstate_check(RUN_STATE_INMIGRATE);
>  }
>  
> -void spapr_drc_reset(SpaprDrc *drc);
> +/* Returns true if an unplug request completed */
> +bool spapr_drc_reset(SpaprDrc *drc);
>  
>  uint32_t spapr_drc_index(SpaprDrc *drc);
>  SpaprDrcType spapr_drc_type(SpaprDrc *drc);
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8d62f55066b6..5b5e2ac58a7e 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -417,9 +417,10 @@ void spapr_drc_detach(SpaprDrc *drc)
>      spapr_drc_release(drc);
>  }
>  
> -void spapr_drc_reset(SpaprDrc *drc)
> +bool spapr_drc_reset(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +    bool unplug_completed = false;
>  
>      trace_spapr_drc_reset(spapr_drc_index(drc));
>  
> @@ -428,6 +429,7 @@ void spapr_drc_reset(SpaprDrc *drc)
>       */
>      if (drc->unplug_requested) {
>          spapr_drc_release(drc);
> +        unplug_completed = true;
>      }
>  
>      if (drc->dev) {
> @@ -444,6 +446,8 @@ void spapr_drc_reset(SpaprDrc *drc)
>          drc->ccs_offset = -1;
>          drc->ccs_depth = -1;
>      }
> +
> +    return unplug_completed;
>  }
>  
>  static bool spapr_drc_unplug_requested_needed(void *opaque)
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index 4e9d50c254f0..aa22830ac4bd 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1639,6 +1639,7 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
>      ObjectPropertyIterator iter;
>  
>      drc_container = container_get(object_get_root(), "/dr-connector");
> +restart:
>      object_property_iter_init(&iter, drc_container);
>      while ((prop = object_property_iter_next(&iter))) {
>          SpaprDrc *drc;
> @@ -1652,8 +1653,13 @@ static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
>  
>          /*
>           * This will complete any pending plug/unplug requests.
> +         * In case of a unplugged PHB or PCI bridge, this will
> +         * cause some DRCs to be destroyed and thus potentially
> +         * invalidate the iterator.
>           */
> -        spapr_drc_reset(drc);
> +        if (spapr_drc_reset(drc)) {
> +            goto restart;
> +        }
>      }
>  
>      spapr_clear_pending_hotplug_events(spapr);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/6] spapr: Introduce spapr_drc_reset_all()
  2020-12-18 10:33 ` [PATCH 3/6] spapr: Introduce spapr_drc_reset_all() Greg Kurz
  2020-12-21 20:35   ` Daniel Henrique Barboza
@ 2020-12-28  7:26   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: David Gibson @ 2020-12-28  7:26 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4970 bytes --]

On Fri, Dec 18, 2020 at 11:33:57AM +0100, Greg Kurz wrote:
> No need to expose the way DRCs are traversed outside of spapr_drc.c.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied, thanks.

> ---
>  include/hw/ppc/spapr_drc.h |  6 ++++++
>  hw/ppc/spapr_drc.c         | 31 +++++++++++++++++++++++++++++
>  hw/ppc/spapr_hcall.c       | 40 ++++++--------------------------------
>  3 files changed, 43 insertions(+), 34 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 5d80019f82e2..8982927d5c24 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -245,6 +245,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>  void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
>  void spapr_drc_detach(SpaprDrc *drc);
>  
> +/*
> + * Reset all DRCs, causing pending hot-plug/unplug requests to complete.
> + * Safely handles potential DRC removal (eg. PHBs or PCI bridges).
> + */
> +void spapr_drc_reset_all(struct SpaprMachineState *spapr);
> +
>  static inline bool spapr_drc_unplug_requested(SpaprDrc *drc)
>  {
>      return drc->unplug_requested;
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 5b5e2ac58a7e..a4d2608017c5 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -949,6 +949,37 @@ out:
>      return ret;
>  }
>  
> +void spapr_drc_reset_all(SpaprMachineState *spapr)
> +{
> +    Object *drc_container;
> +    ObjectProperty *prop;
> +    ObjectPropertyIterator iter;
> +
> +    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> +restart:
> +    object_property_iter_init(&iter, drc_container);
> +    while ((prop = object_property_iter_next(&iter))) {
> +        SpaprDrc *drc;
> +
> +        if (!strstart(prop->type, "link<", NULL)) {
> +            continue;
> +        }
> +        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> +                                                          prop->name,
> +                                                          &error_abort));
> +
> +        /*
> +         * This will complete any pending plug/unplug requests.
> +         * In case of a unplugged PHB or PCI bridge, this will
> +         * cause some DRCs to be destroyed and thus potentially
> +         * invalidate the iterator.
> +         */
> +        if (spapr_drc_reset(drc)) {
> +            goto restart;
> +        }
> +    }
> +}
> +
>  /*
>   * RTAS calls
>   */
> diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
> index aa22830ac4bd..e5dfc1ba7acc 100644
> --- a/hw/ppc/spapr_hcall.c
> +++ b/hw/ppc/spapr_hcall.c
> @@ -1632,39 +1632,6 @@ static uint32_t cas_check_pvr(PowerPCCPU *cpu, uint32_t max_compat,
>      return best_compat;
>  }
>  
> -static void spapr_handle_transient_dev_before_cas(SpaprMachineState *spapr)
> -{
> -    Object *drc_container;
> -    ObjectProperty *prop;
> -    ObjectPropertyIterator iter;
> -
> -    drc_container = container_get(object_get_root(), "/dr-connector");
> -restart:
> -    object_property_iter_init(&iter, drc_container);
> -    while ((prop = object_property_iter_next(&iter))) {
> -        SpaprDrc *drc;
> -
> -        if (!strstart(prop->type, "link<", NULL)) {
> -            continue;
> -        }
> -        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> -                                                          prop->name,
> -                                                          &error_abort));
> -
> -        /*
> -         * This will complete any pending plug/unplug requests.
> -         * In case of a unplugged PHB or PCI bridge, this will
> -         * cause some DRCs to be destroyed and thus potentially
> -         * invalidate the iterator.
> -         */
> -        if (spapr_drc_reset(drc)) {
> -            goto restart;
> -        }
> -    }
> -
> -    spapr_clear_pending_hotplug_events(spapr);
> -}
> -
>  target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>                                              SpaprMachineState *spapr,
>                                              target_ulong vec,
> @@ -1822,7 +1789,12 @@ target_ulong do_client_architecture_support(PowerPCCPU *cpu,
>  
>      spapr_irq_update_active_intc(spapr);
>  
> -    spapr_handle_transient_dev_before_cas(spapr);
> +    /*
> +     * Process all pending hot-plug/unplug requests now. An updated full
> +     * rendered FDT will be returned to the guest.
> +     */
> +    spapr_drc_reset_all(spapr);
> +    spapr_clear_pending_hotplug_events(spapr);
>  
>      /*
>       * If spapr_machine_reset() did not set up a HPT but one is necessary

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset
  2020-12-18 10:33 ` [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset Greg Kurz
  2020-12-21 20:36   ` Daniel Henrique Barboza
@ 2020-12-28  7:29   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: David Gibson @ 2020-12-28  7:29 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2115 bytes --]

On Fri, Dec 18, 2020 at 11:33:58AM +0100, Greg Kurz wrote:
> Documentation of object_child_foreach_recursive() clearly stipulates
> that "it is forbidden to add or remove children from @obj from the @fn
> callback". But this is exactly what we do during machine reset. The call
> to spapr_drc_reset() can finalize the hot-unplug sequence of a PHB or a
> PCI bridge, both of which will then in turn destroy their PCI DRCs. This
> could potentially invalidate the iterator used by do_object_child_foreach().
> It is pure luck that this haven't caused any issues so far.
> 
> Use spapr_drc_reset_all() since it can cope with DRC removal.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied, thanks.

> ---
>  hw/ppc/spapr.c | 15 +--------------
>  1 file changed, 1 insertion(+), 14 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 43dded87f498..8528bc90fec4 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1566,19 +1566,6 @@ void spapr_setup_hpt(SpaprMachineState *spapr)
>      }
>  }
>  
> -static int spapr_reset_drcs(Object *child, void *opaque)
> -{
> -    SpaprDrc *drc =
> -        (SpaprDrc *) object_dynamic_cast(child,
> -                                                 TYPE_SPAPR_DR_CONNECTOR);
> -
> -    if (drc) {
> -        spapr_drc_reset(drc);
> -    }
> -
> -    return 0;
> -}
> -
>  static void spapr_machine_reset(MachineState *machine)
>  {
>      SpaprMachineState *spapr = SPAPR_MACHINE(machine);
> @@ -1633,7 +1620,7 @@ static void spapr_machine_reset(MachineState *machine)
>       * will crash QEMU if the DIMM holding the vring goes away). To avoid such
>       * situations, we reset DRCs after all devices have been reset.
>       */
> -    object_child_foreach_recursive(object_get_root(), spapr_reset_drcs, NULL);
> +    spapr_drc_reset_all(spapr);
>  
>      spapr_clear_pending_events(spapr);
>  

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions
  2020-12-18 10:33 ` [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions Greg Kurz
  2020-12-21 20:37   ` Daniel Henrique Barboza
@ 2020-12-28  7:31   ` David Gibson
  1 sibling, 0 replies; 22+ messages in thread
From: David Gibson @ 2020-12-28  7:31 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 2394 bytes --]

On Fri, Dec 18, 2020 at 11:33:59AM +0100, Greg Kurz wrote:
> Use a less generic name for an easier experience with tools such as
> cscope or grep.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Applied to ppc-for-6.0, thanks.

> ---
>  hw/ppc/spapr_drc.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a4d2608017c5..8571d5bafe4e 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -503,7 +503,7 @@ static const VMStateDescription vmstate_spapr_drc = {
>      }
>  };
>  
> -static void realize(DeviceState *d, Error **errp)
> +static void drc_realize(DeviceState *d, Error **errp)
>  {
>      SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
>      Object *root_container;
> @@ -530,7 +530,7 @@ static void realize(DeviceState *d, Error **errp)
>      trace_spapr_drc_realize_complete(spapr_drc_index(drc));
>  }
>  
> -static void unrealize(DeviceState *d)
> +static void drc_unrealize(DeviceState *d)
>  {
>      SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
>      Object *root_container;
> @@ -579,8 +579,8 @@ static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>  {
>      DeviceClass *dk = DEVICE_CLASS(k);
>  
> -    dk->realize = realize;
> -    dk->unrealize = unrealize;
> +    dk->realize = drc_realize;
> +    dk->unrealize = drc_unrealize;
>      /*
>       * Reason: DR connector needs to be wired to either the machine or to a
>       * PHB in spapr_dr_connector_new().
> @@ -628,7 +628,7 @@ static void realize_physical(DeviceState *d, Error **errp)
>      SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
>      Error *local_err = NULL;
>  
> -    realize(d, &local_err);
> +    drc_realize(d, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          return;
> @@ -644,7 +644,7 @@ static void unrealize_physical(DeviceState *d)
>  {
>      SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
>  
> -    unrealize(d);
> +    drc_unrealize(d);
>      vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
>      qemu_unregister_reset(drc_physical_reset, drcp);
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] spapr: Model DR connectors as simple objects
  2020-12-18 10:34 ` [PATCH 6/6] spapr: Model DR connectors as simple objects Greg Kurz
  2020-12-21 20:45   ` Daniel Henrique Barboza
@ 2020-12-28  8:28   ` David Gibson
  2021-01-06 18:15     ` Greg Kurz
  1 sibling, 1 reply; 22+ messages in thread
From: David Gibson @ 2020-12-28  8:28 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 16007 bytes --]

On Fri, Dec 18, 2020 at 11:34:00AM +0100, Greg Kurz wrote:
> Modeling DR connectors as individual devices raises some
> concerns, as already discussed a year ago in this thread:
> 
> https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/
> 
> First, high maxmem settings creates too many DRC devices.
> This causes scalability issues. It severely increase boot
> time because the multiple traversals of the DRC list that
> are performed during machine setup are quadratic operations.
> This is directly related to the fact that DRCs are modeled
> as individual devices and added to the composition tree.
> 
> Second, DR connectors are really an internal concept of
> PAPR. They aren't something that the user or management
> layer can manipulate in any way. We already don't allow
> their creation with device_add by clearing user_creatable.
> 
> DR connectors don't even need to be modeled as actual
> devices since they don't sit in a bus. They just need
> to be associated to an 'owner' object and to have the
> equivalent of realize/unrealize functions.
> 
> Downgrade them to be simple objects. Convert the existing
> realize() and unrealize() to be methods of the DR connector
> base class. Also have the base class to inherit from the
> vmstate_if interface directly. The get_id() hook simply
> returns NULL, just as device_vmstate_if_get_id() does for
> devices that don't sit in a bus. The DR connector is no
> longer made a child object. This means that it must be
> explicitely freed when no longer needed. This is only
> required for PHBs and PCI bridges actually : have them to
> free the DRC with spapr_dr_connector_free() instead of
> object_unparent().
> 
> No longer add the DRCs to the QOM composition tree. Track
> them with a glib hash table using the global DRC index as
> the key instead. This makes traversal a linear operation.

I have some reservations about this one.  The main thing is that
attaching migration state to something that's not a device seems a bit
odd to me.  AFAICT exactly one other non-device implements
TYPE_VMSTATE_IF, and what it does isn't very clear to me.

As I might have mentioned to you I had a different idea for how to
address this problem: still use a TYPE_DEVICE, but have it manage a
whole array of DRCs as one unit, rather than just a single one.
Specifically I was thinking:

* one array per PCI bus (DRCs for each function on the bus)
* one array for each block of memory (so one for base memory, one for
  each DIMM)
* one array for all the cpus
* one array for all the PHBs

It has some disadvantages compared to your scheme: it still leaves
(less) devices which can't be user managed, which is a bit ugly.  On
the other hand, each of those arrays can reasonably be dense, so we
can use direct indexing rather than a hash table, which is a bit
nicer.

Thoughts?

> 
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  include/hw/ppc/spapr_drc.h |   8 +-
>  hw/ppc/spapr_drc.c         | 166 ++++++++++++++-----------------------
>  hw/ppc/spapr_pci.c         |   2 +-
>  3 files changed, 69 insertions(+), 107 deletions(-)
> 
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 8982927d5c24..a26aa8b9d4c3 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -170,7 +170,7 @@ typedef enum {
>  
>  typedef struct SpaprDrc {
>      /*< private >*/
> -    DeviceState parent;
> +    Object parent;
>  
>      uint32_t id;
>      Object *owner;
> @@ -193,7 +193,7 @@ struct SpaprMachineState;
>  
>  typedef struct SpaprDrcClass {
>      /*< private >*/
> -    DeviceClass parent;
> +    ObjectClass parent;
>      SpaprDrcState empty_state;
>      SpaprDrcState ready_state;
>  
> @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass {
>  
>      int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
>                         void *fdt, int *fdt_start_offset, Error **errp);
> +
> +    void (*realize)(SpaprDrc *drc);
> +    void (*unrealize)(SpaprDrc *drc);
>  } SpaprDrcClass;
>  
>  typedef struct SpaprDrcPhysical {
> @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc);
>  
>  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
>                                           uint32_t id);
> +void spapr_dr_connector_free(SpaprDrc *drc);
>  SpaprDrc *spapr_drc_by_index(uint32_t index);
>  SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id);
>  int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8571d5bafe4e..e26763f8b5a4 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -27,7 +27,6 @@
>  #include "sysemu/reset.h"
>  #include "trace.h"
>  
> -#define DRC_CONTAINER_PATH "/dr-connector"
>  #define DRC_INDEX_TYPE_SHIFT 28
>  #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1)
>  
> @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = {
>      }
>  };
>  
> -static void drc_realize(DeviceState *d, Error **errp)
> +static GHashTable *drc_hash_table(void)
>  {
> -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> -    Object *root_container;
> -    gchar *link_name;
> -    const char *child_name;
> +    static GHashTable *dht;
>  
> +    if (!dht) {
> +        dht = g_hash_table_new(NULL, NULL);
> +    }
> +
> +    return dht;
> +}
> +
> +
> +static void drc_realize(SpaprDrc *drc)
> +{
>      trace_spapr_drc_realize(spapr_drc_index(drc));
> -    /* NOTE: we do this as part of realize/unrealize due to the fact
> -     * that the guest will communicate with the DRC via RTAS calls
> -     * referencing the global DRC index. By unlinking the DRC
> -     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
> -     * inaccessible by the guest, since lookups rely on this path
> -     * existing in the composition tree
> -     */
> -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> -    link_name = g_strdup_printf("%x", spapr_drc_index(drc));
> -    child_name = object_get_canonical_path_component(OBJECT(drc));
> -    trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name);
> -    object_property_add_alias(root_container, link_name,
> -                              drc->owner, child_name);
> -    g_free(link_name);
> +
> +    g_hash_table_insert(drc_hash_table(),
> +                        GUINT_TO_POINTER(spapr_drc_index(drc)), drc);
>      vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
>                       drc);
>      trace_spapr_drc_realize_complete(spapr_drc_index(drc));
>  }
>  
> -static void drc_unrealize(DeviceState *d)
> +static void drc_unrealize(SpaprDrc *drc)
>  {
> -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> -    Object *root_container;
> -    gchar *name;
> -
>      trace_spapr_drc_unrealize(spapr_drc_index(drc));
>      vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc);
> -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> -    name = g_strdup_printf("%x", spapr_drc_index(drc));
> -    object_property_del(root_container, name);
> -    g_free(name);
> +    g_hash_table_remove(drc_hash_table(),
> +                        GUINT_TO_POINTER(spapr_drc_index(drc)));
>  }
>  
>  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
>                                           uint32_t id)
>  {
>      SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
> -    char *prop_name;
>  
>      drc->id = id;
> -    drc->owner = owner;
> -    prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> -                                spapr_drc_index(drc));
> -    object_property_add_child(owner, prop_name, OBJECT(drc));
> -    object_unref(OBJECT(drc));
> -    qdev_realize(DEVICE(drc), NULL, NULL);
> -    g_free(prop_name);
> +    drc->owner = object_ref(owner);
> +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc);
>  
>      return drc;
>  }
>  
> +void spapr_dr_connector_free(SpaprDrc *drc)
> +{
> +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc);
> +    object_unref(drc->owner);
> +    object_unref(drc);
> +}
> +
>  static void spapr_dr_connector_instance_init(Object *obj)
>  {
>      SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj);
> @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj)
>      drc->state = drck->empty_state;
>  }
>  
> +static char *drc_vmstate_if_get_id(VMStateIf *obj)
> +{
> +    return NULL;
> +}
> +
>  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
>  {
> -    DeviceClass *dk = DEVICE_CLASS(k);
> +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> +    VMStateIfClass *vc = VMSTATE_IF_CLASS(k);
>  
> -    dk->realize = drc_realize;
> -    dk->unrealize = drc_unrealize;
> -    /*
> -     * Reason: DR connector needs to be wired to either the machine or to a
> -     * PHB in spapr_dr_connector_new().
> -     */
> -    dk->user_creatable = false;
> +    drck->realize = drc_realize;
> +    drck->unrealize = drc_unrealize;
> +    vc->get_id = drc_vmstate_if_get_id;
>  }
>  
>  static bool drc_physical_needed(void *opaque)
> @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque)
>      }
>  }
>  
> -static void realize_physical(DeviceState *d, Error **errp)
> +static void realize_physical(SpaprDrc *drc)
>  {
> -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> -    Error *local_err = NULL;
> -
> -    drc_realize(d, &local_err);
> -    if (local_err) {
> -        error_propagate(errp, local_err);
> -        return;
> -    }
> +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
>  
> +    drc_realize(drc);
>      vmstate_register(VMSTATE_IF(drcp),
>                       spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)),
>                       &vmstate_spapr_drc_physical, drcp);
>      qemu_register_reset(drc_physical_reset, drcp);
>  }
>  
> -static void unrealize_physical(DeviceState *d)
> +static void unrealize_physical(SpaprDrc *drc)
>  {
> -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
>  
> -    drc_unrealize(d);
> -    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
>      qemu_unregister_reset(drc_physical_reset, drcp);
> +    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
> +    drc_unrealize(drc);
>  }
>  
>  static void spapr_drc_physical_class_init(ObjectClass *k, void *data)
>  {
> -    DeviceClass *dk = DEVICE_CLASS(k);
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
>  
> -    dk->realize = realize_physical;
> -    dk->unrealize = unrealize_physical;
> +    drck->realize = realize_physical;
> +    drck->unrealize = unrealize_physical;
>      drck->dr_entity_sense = physical_entity_sense;
>      drck->isolate = drc_isolate_physical;
>      drck->unisolate = drc_unisolate_physical;
> @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
>  
>  static const TypeInfo spapr_dr_connector_info = {
>      .name          = TYPE_SPAPR_DR_CONNECTOR,
> -    .parent        = TYPE_DEVICE,
> +    .parent        = TYPE_OBJECT,
>      .instance_size = sizeof(SpaprDrc),
>      .instance_init = spapr_dr_connector_instance_init,
>      .class_size    = sizeof(SpaprDrcClass),
>      .class_init    = spapr_dr_connector_class_init,
>      .abstract      = true,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_VMSTATE_IF },
> +        { }
> +    },
>  };
>  
>  static const TypeInfo spapr_drc_physical_info = {
> @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = {
>  
>  SpaprDrc *spapr_drc_by_index(uint32_t index)
>  {
> -    Object *obj;
> -    gchar *name;
> -
> -    name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index);
> -    obj = object_resolve_path(name, NULL);
> -    g_free(name);
> -
> -    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
> +    return
> +        SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(),
> +                                               GUINT_TO_POINTER(index)));
>  }
>  
>  SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
> @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
>   */
>  int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
>  {
> -    Object *root_container;
> -    ObjectProperty *prop;
> -    ObjectPropertyIterator iter;
> +    GHashTableIter iter;
>      uint32_t drc_count = 0;
>      GArray *drc_indexes, *drc_power_domains;
>      GString *drc_names, *drc_types;
>      int ret;
> +    SpaprDrc *drc;
>  
>      /*
>       * This should really be only called once per node since it overwrites
> @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
>      drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
>      drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
>  
> -    /* aliases for all DRConnector objects will be rooted in QOM
> -     * composition tree at DRC_CONTAINER_PATH
> -     */
> -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> -
> -    object_property_iter_init(&iter, root_container);
> -    while ((prop = object_property_iter_next(&iter))) {
> -        Object *obj;
> -        SpaprDrc *drc;
> +    g_hash_table_iter_init(&iter, drc_hash_table());
> +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
>          SpaprDrcClass *drck;
>          char *drc_name = NULL;
>          uint32_t drc_index, drc_power_domain;
>  
> -        if (!strstart(prop->type, "link<", NULL)) {
> -            continue;
> -        }
> -
> -        obj = object_property_get_link(root_container, prop->name,
> -                                       &error_abort);
> -        drc = SPAPR_DR_CONNECTOR(obj);
>          drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
>          if (owner && (drc->owner != owner)) {
> @@ -951,23 +920,12 @@ out:
>  
>  void spapr_drc_reset_all(SpaprMachineState *spapr)
>  {
> -    Object *drc_container;
> -    ObjectProperty *prop;
> -    ObjectPropertyIterator iter;
> +    GHashTableIter iter;
> +    SpaprDrc *drc;
>  
> -    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
>  restart:
> -    object_property_iter_init(&iter, drc_container);
> -    while ((prop = object_property_iter_next(&iter))) {
> -        SpaprDrc *drc;
> -
> -        if (!strstart(prop->type, "link<", NULL)) {
> -            continue;
> -        }
> -        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> -                                                          prop->name,
> -                                                          &error_abort));
> -
> +    g_hash_table_iter_init(&iter, drc_hash_table());
> +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
>          /*
>           * This will complete any pending plug/unplug requests.
>           * In case of a unplugged PHB or PCI bridge, this will
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 76d7c91e9c64..ca0cca664e3c 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus)
>          SpaprDrc *drc = drc_from_devfn(phb, chassis, i);
>  
>          if (drc) {
> -            object_unparent(OBJECT(drc));
> +            spapr_dr_connector_free(drc);
>          }
>      }
>  }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] spapr: Model DR connectors as simple objects
  2020-12-28  8:28   ` David Gibson
@ 2021-01-06 18:15     ` Greg Kurz
  2021-02-08  6:30       ` David Gibson
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Kurz @ 2021-01-06 18:15 UTC (permalink / raw)
  To: David Gibson; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 19882 bytes --]

On Mon, 28 Dec 2020 19:28:39 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Fri, Dec 18, 2020 at 11:34:00AM +0100, Greg Kurz wrote:
> > Modeling DR connectors as individual devices raises some
> > concerns, as already discussed a year ago in this thread:
> > 
> > https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/
> > 
> > First, high maxmem settings creates too many DRC devices.
> > This causes scalability issues. It severely increase boot
> > time because the multiple traversals of the DRC list that
> > are performed during machine setup are quadratic operations.
> > This is directly related to the fact that DRCs are modeled
> > as individual devices and added to the composition tree.
> > 
> > Second, DR connectors are really an internal concept of
> > PAPR. They aren't something that the user or management
> > layer can manipulate in any way. We already don't allow
> > their creation with device_add by clearing user_creatable.
> > 
> > DR connectors don't even need to be modeled as actual
> > devices since they don't sit in a bus. They just need
> > to be associated to an 'owner' object and to have the
> > equivalent of realize/unrealize functions.
> > 
> > Downgrade them to be simple objects. Convert the existing
> > realize() and unrealize() to be methods of the DR connector
> > base class. Also have the base class to inherit from the
> > vmstate_if interface directly. The get_id() hook simply
> > returns NULL, just as device_vmstate_if_get_id() does for
> > devices that don't sit in a bus. The DR connector is no
> > longer made a child object. This means that it must be
> > explicitely freed when no longer needed. This is only
> > required for PHBs and PCI bridges actually : have them to
> > free the DRC with spapr_dr_connector_free() instead of
> > object_unparent().
> > 
> > No longer add the DRCs to the QOM composition tree. Track
> > them with a glib hash table using the global DRC index as
> > the key instead. This makes traversal a linear operation.
> 
> I have some reservations about this one.  The main thing is that
> attaching migration state to something that's not a device seems a bit
> odd to me.  AFAICT exactly one other non-device implements
> TYPE_VMSTATE_IF, and what it does isn't very clear to me.
> 

Even with your proposal below, the current SpaprDrc type, which is
used all over the place, will stop being a TYPE_DEVICE but we still
need to support migration with existing machine types for which DRC
are devices. Implementing TYPE_VMSTATE_IF is essentially a hack that
allows to do that without keeping the current TYPE_DEVICE based
implementation around.

> As I might have mentioned to you I had a different idea for how to
> address this problem: still use a TYPE_DEVICE, but have it manage a
> whole array of DRCs as one unit, rather than just a single one.
> Specifically I was thinking:
> 
> * one array per PCI bus (DRCs for each function on the bus)
> * one array for each block of memory (so one for base memory, one for
>   each DIMM)
> * one array for all the cpus
> * one array for all the PHBs
> 
> It has some disadvantages compared to your scheme: it still leaves
> (less) devices which can't be user managed, which is a bit ugly.  On
> the other hand, each of those arrays can reasonably be dense, so we
> can use direct indexing rather than a hash table, which is a bit
> nicer.
> 
> Thoughts?
> 

I find it a bit overkill to introduce a new TYPE_DEVICE (let's
call it a DRC manager) for something that:
- doesn't sit on a bus
- can't be user managed
- isn't directly represented to the guest as a full node
  in the DT unlike all other devices, but just as indexes
  in some properties of actual DR capable devices.

Given that the DRC index space is global and this is what
the guest passes to DR RTAS calls, we can't do direct
indexing, strictly speaking. We need at least some logic
to dispatch operations on individual DRC states to the
appropriate DRC manager. This logic belongs to the machine
IMHO.

This shouldn't be too complex for CPUs and PHBs since they
sit directly under the machine and have 1:1 relation with
the attached device. It just boils down to instantiate
some DRC managers during machine init:

- range [ 0x10000000 ... 0x10000000 + ${max_cpus} [
  for CPUs
- range [ 0x20000000 ... 0x20000000 + 31 [
  for PHBs

For memory, the code currently generates DRC indexes in the range:

[ 0x80000000 ... 0x80000000 + ${base_ram_size}/256M ... ${max_ram_size}/256M [

ie. it doesn't generate DRC indexes for the base memory AFAICT. Also
each DIMM can be of arbitrary size, ie. consume an arbitrary amount
of DRC indexes. So the machine would instantiate SPAPR_MAX_RAM_SLOTS (32)
DRC managers, each capable of managing the full set of LMB DRCs, just
in case ? Likely a lot of zeroes with high maxmem settings but I guess
we can live with it.

PCI busses would need some extra care though since the machine
doesn't know about them. This would require to be able to
register/unregister DRC managers for SPAPR_DR_CONNECTOR_TYPE_PCI
indexes, so that the dispatching logic know about the ranges
they cover (PHB internals).

And finally comes migration : I cannot think of a way to generate
the VMState sections used by existing machine types out of a set
of arrays of integers... We could keep the current implementation
around and use it with older machine types, but this certainly
looks terrible from a maintenance perspective. Did you have any
suggestion to handle that ?

I seem to remember that one of the motivation to have arrays
of DRCs is to avoid the inflation of VMState sections that
we currently get with high maxmem settings, and it is considered
preferable to stream sparse arrays. This could be achieved by
building these arrays out of the global DRC hash table in a machine
pre-save handler and migrate them in a subsection for the default
machine type. Older machine types would continue with the current
VMState sections thanks to the TYPE_VMSTATE_IF hack.

Does this seem a reasonable trade-off to be able to support
older and newer machine types with the same implementation ?

> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> > ---
> >  include/hw/ppc/spapr_drc.h |   8 +-
> >  hw/ppc/spapr_drc.c         | 166 ++++++++++++++-----------------------
> >  hw/ppc/spapr_pci.c         |   2 +-
> >  3 files changed, 69 insertions(+), 107 deletions(-)
> > 
> > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> > index 8982927d5c24..a26aa8b9d4c3 100644
> > --- a/include/hw/ppc/spapr_drc.h
> > +++ b/include/hw/ppc/spapr_drc.h
> > @@ -170,7 +170,7 @@ typedef enum {
> >  
> >  typedef struct SpaprDrc {
> >      /*< private >*/
> > -    DeviceState parent;
> > +    Object parent;
> >  
> >      uint32_t id;
> >      Object *owner;
> > @@ -193,7 +193,7 @@ struct SpaprMachineState;
> >  
> >  typedef struct SpaprDrcClass {
> >      /*< private >*/
> > -    DeviceClass parent;
> > +    ObjectClass parent;
> >      SpaprDrcState empty_state;
> >      SpaprDrcState ready_state;
> >  
> > @@ -209,6 +209,9 @@ typedef struct SpaprDrcClass {
> >  
> >      int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
> >                         void *fdt, int *fdt_start_offset, Error **errp);
> > +
> > +    void (*realize)(SpaprDrc *drc);
> > +    void (*unrealize)(SpaprDrc *drc);
> >  } SpaprDrcClass;
> >  
> >  typedef struct SpaprDrcPhysical {
> > @@ -232,6 +235,7 @@ SpaprDrcType spapr_drc_type(SpaprDrc *drc);
> >  
> >  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >                                           uint32_t id);
> > +void spapr_dr_connector_free(SpaprDrc *drc);
> >  SpaprDrc *spapr_drc_by_index(uint32_t index);
> >  SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id);
> >  int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 8571d5bafe4e..e26763f8b5a4 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -27,7 +27,6 @@
> >  #include "sysemu/reset.h"
> >  #include "trace.h"
> >  
> > -#define DRC_CONTAINER_PATH "/dr-connector"
> >  #define DRC_INDEX_TYPE_SHIFT 28
> >  #define DRC_INDEX_ID_MASK ((1ULL << DRC_INDEX_TYPE_SHIFT) - 1)
> >  
> > @@ -503,65 +502,56 @@ static const VMStateDescription vmstate_spapr_drc = {
> >      }
> >  };
> >  
> > -static void drc_realize(DeviceState *d, Error **errp)
> > +static GHashTable *drc_hash_table(void)
> >  {
> > -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> > -    Object *root_container;
> > -    gchar *link_name;
> > -    const char *child_name;
> > +    static GHashTable *dht;
> >  
> > +    if (!dht) {
> > +        dht = g_hash_table_new(NULL, NULL);
> > +    }
> > +
> > +    return dht;
> > +}
> > +
> > +
> > +static void drc_realize(SpaprDrc *drc)
> > +{
> >      trace_spapr_drc_realize(spapr_drc_index(drc));
> > -    /* NOTE: we do this as part of realize/unrealize due to the fact
> > -     * that the guest will communicate with the DRC via RTAS calls
> > -     * referencing the global DRC index. By unlinking the DRC
> > -     * from DRC_CONTAINER_PATH/<drc_index> we effectively make it
> > -     * inaccessible by the guest, since lookups rely on this path
> > -     * existing in the composition tree
> > -     */
> > -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > -    link_name = g_strdup_printf("%x", spapr_drc_index(drc));
> > -    child_name = object_get_canonical_path_component(OBJECT(drc));
> > -    trace_spapr_drc_realize_child(spapr_drc_index(drc), child_name);
> > -    object_property_add_alias(root_container, link_name,
> > -                              drc->owner, child_name);
> > -    g_free(link_name);
> > +
> > +    g_hash_table_insert(drc_hash_table(),
> > +                        GUINT_TO_POINTER(spapr_drc_index(drc)), drc);
> >      vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
> >                       drc);
> >      trace_spapr_drc_realize_complete(spapr_drc_index(drc));
> >  }
> >  
> > -static void drc_unrealize(DeviceState *d)
> > +static void drc_unrealize(SpaprDrc *drc)
> >  {
> > -    SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> > -    Object *root_container;
> > -    gchar *name;
> > -
> >      trace_spapr_drc_unrealize(spapr_drc_index(drc));
> >      vmstate_unregister(VMSTATE_IF(drc), &vmstate_spapr_drc, drc);
> > -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > -    name = g_strdup_printf("%x", spapr_drc_index(drc));
> > -    object_property_del(root_container, name);
> > -    g_free(name);
> > +    g_hash_table_remove(drc_hash_table(),
> > +                        GUINT_TO_POINTER(spapr_drc_index(drc)));
> >  }
> >  
> >  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >                                           uint32_t id)
> >  {
> >      SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
> > -    char *prop_name;
> >  
> >      drc->id = id;
> > -    drc->owner = owner;
> > -    prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> > -                                spapr_drc_index(drc));
> > -    object_property_add_child(owner, prop_name, OBJECT(drc));
> > -    object_unref(OBJECT(drc));
> > -    qdev_realize(DEVICE(drc), NULL, NULL);
> > -    g_free(prop_name);
> > +    drc->owner = object_ref(owner);
> > +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->realize(drc);
> >  
> >      return drc;
> >  }
> >  
> > +void spapr_dr_connector_free(SpaprDrc *drc)
> > +{
> > +    SPAPR_DR_CONNECTOR_GET_CLASS(drc)->unrealize(drc);
> > +    object_unref(drc->owner);
> > +    object_unref(drc);
> > +}
> > +
> >  static void spapr_dr_connector_instance_init(Object *obj)
> >  {
> >      SpaprDrc *drc = SPAPR_DR_CONNECTOR(obj);
> > @@ -575,17 +565,19 @@ static void spapr_dr_connector_instance_init(Object *obj)
> >      drc->state = drck->empty_state;
> >  }
> >  
> > +static char *drc_vmstate_if_get_id(VMStateIf *obj)
> > +{
> > +    return NULL;
> > +}
> > +
> >  static void spapr_dr_connector_class_init(ObjectClass *k, void *data)
> >  {
> > -    DeviceClass *dk = DEVICE_CLASS(k);
> > +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> > +    VMStateIfClass *vc = VMSTATE_IF_CLASS(k);
> >  
> > -    dk->realize = drc_realize;
> > -    dk->unrealize = drc_unrealize;
> > -    /*
> > -     * Reason: DR connector needs to be wired to either the machine or to a
> > -     * PHB in spapr_dr_connector_new().
> > -     */
> > -    dk->user_creatable = false;
> > +    drck->realize = drc_realize;
> > +    drck->unrealize = drc_unrealize;
> > +    vc->get_id = drc_vmstate_if_get_id;
> >  }
> >  
> >  static bool drc_physical_needed(void *opaque)
> > @@ -623,39 +615,32 @@ static void drc_physical_reset(void *opaque)
> >      }
> >  }
> >  
> > -static void realize_physical(DeviceState *d, Error **errp)
> > +static void realize_physical(SpaprDrc *drc)
> >  {
> > -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> > -    Error *local_err = NULL;
> > -
> > -    drc_realize(d, &local_err);
> > -    if (local_err) {
> > -        error_propagate(errp, local_err);
> > -        return;
> > -    }
> > +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
> >  
> > +    drc_realize(drc);
> >      vmstate_register(VMSTATE_IF(drcp),
> >                       spapr_drc_index(SPAPR_DR_CONNECTOR(drcp)),
> >                       &vmstate_spapr_drc_physical, drcp);
> >      qemu_register_reset(drc_physical_reset, drcp);
> >  }
> >  
> > -static void unrealize_physical(DeviceState *d)
> > +static void unrealize_physical(SpaprDrc *drc)
> >  {
> > -    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(d);
> > +    SpaprDrcPhysical *drcp = SPAPR_DRC_PHYSICAL(drc);
> >  
> > -    drc_unrealize(d);
> > -    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
> >      qemu_unregister_reset(drc_physical_reset, drcp);
> > +    vmstate_unregister(VMSTATE_IF(drcp), &vmstate_spapr_drc_physical, drcp);
> > +    drc_unrealize(drc);
> >  }
> >  
> >  static void spapr_drc_physical_class_init(ObjectClass *k, void *data)
> >  {
> > -    DeviceClass *dk = DEVICE_CLASS(k);
> >      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_CLASS(k);
> >  
> > -    dk->realize = realize_physical;
> > -    dk->unrealize = unrealize_physical;
> > +    drck->realize = realize_physical;
> > +    drck->unrealize = unrealize_physical;
> >      drck->dr_entity_sense = physical_entity_sense;
> >      drck->isolate = drc_isolate_physical;
> >      drck->unisolate = drc_unisolate_physical;
> > @@ -731,12 +716,16 @@ static void spapr_drc_pmem_class_init(ObjectClass *k, void *data)
> >  
> >  static const TypeInfo spapr_dr_connector_info = {
> >      .name          = TYPE_SPAPR_DR_CONNECTOR,
> > -    .parent        = TYPE_DEVICE,
> > +    .parent        = TYPE_OBJECT,
> >      .instance_size = sizeof(SpaprDrc),
> >      .instance_init = spapr_dr_connector_instance_init,
> >      .class_size    = sizeof(SpaprDrcClass),
> >      .class_init    = spapr_dr_connector_class_init,
> >      .abstract      = true,
> > +    .interfaces = (InterfaceInfo[]) {
> > +        { TYPE_VMSTATE_IF },
> > +        { }
> > +    },
> >  };
> >  
> >  static const TypeInfo spapr_drc_physical_info = {
> > @@ -789,14 +778,9 @@ static const TypeInfo spapr_drc_pmem_info = {
> >  
> >  SpaprDrc *spapr_drc_by_index(uint32_t index)
> >  {
> > -    Object *obj;
> > -    gchar *name;
> > -
> > -    name = g_strdup_printf("%s/%x", DRC_CONTAINER_PATH, index);
> > -    obj = object_resolve_path(name, NULL);
> > -    g_free(name);
> > -
> > -    return !obj ? NULL : SPAPR_DR_CONNECTOR(obj);
> > +    return
> > +        SPAPR_DR_CONNECTOR(g_hash_table_lookup(drc_hash_table(),
> > +                                               GUINT_TO_POINTER(index)));
> >  }
> >  
> >  SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
> > @@ -824,13 +808,12 @@ SpaprDrc *spapr_drc_by_id(const char *type, uint32_t id)
> >   */
> >  int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
> >  {
> > -    Object *root_container;
> > -    ObjectProperty *prop;
> > -    ObjectPropertyIterator iter;
> > +    GHashTableIter iter;
> >      uint32_t drc_count = 0;
> >      GArray *drc_indexes, *drc_power_domains;
> >      GString *drc_names, *drc_types;
> >      int ret;
> > +    SpaprDrc *drc;
> >  
> >      /*
> >       * This should really be only called once per node since it overwrites
> > @@ -851,26 +834,12 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask)
> >      drc_names = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> >      drc_types = g_string_set_size(g_string_new(NULL), sizeof(uint32_t));
> >  
> > -    /* aliases for all DRConnector objects will be rooted in QOM
> > -     * composition tree at DRC_CONTAINER_PATH
> > -     */
> > -    root_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> > -
> > -    object_property_iter_init(&iter, root_container);
> > -    while ((prop = object_property_iter_next(&iter))) {
> > -        Object *obj;
> > -        SpaprDrc *drc;
> > +    g_hash_table_iter_init(&iter, drc_hash_table());
> > +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
> >          SpaprDrcClass *drck;
> >          char *drc_name = NULL;
> >          uint32_t drc_index, drc_power_domain;
> >  
> > -        if (!strstart(prop->type, "link<", NULL)) {
> > -            continue;
> > -        }
> > -
> > -        obj = object_property_get_link(root_container, prop->name,
> > -                                       &error_abort);
> > -        drc = SPAPR_DR_CONNECTOR(obj);
> >          drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> >  
> >          if (owner && (drc->owner != owner)) {
> > @@ -951,23 +920,12 @@ out:
> >  
> >  void spapr_drc_reset_all(SpaprMachineState *spapr)
> >  {
> > -    Object *drc_container;
> > -    ObjectProperty *prop;
> > -    ObjectPropertyIterator iter;
> > +    GHashTableIter iter;
> > +    SpaprDrc *drc;
> >  
> > -    drc_container = container_get(object_get_root(), DRC_CONTAINER_PATH);
> >  restart:
> > -    object_property_iter_init(&iter, drc_container);
> > -    while ((prop = object_property_iter_next(&iter))) {
> > -        SpaprDrc *drc;
> > -
> > -        if (!strstart(prop->type, "link<", NULL)) {
> > -            continue;
> > -        }
> > -        drc = SPAPR_DR_CONNECTOR(object_property_get_link(drc_container,
> > -                                                          prop->name,
> > -                                                          &error_abort));
> > -
> > +    g_hash_table_iter_init(&iter, drc_hash_table());
> > +    while (g_hash_table_iter_next(&iter, NULL, (gpointer) &drc)) {
> >          /*
> >           * This will complete any pending plug/unplug requests.
> >           * In case of a unplugged PHB or PCI bridge, this will
> > diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> > index 76d7c91e9c64..ca0cca664e3c 100644
> > --- a/hw/ppc/spapr_pci.c
> > +++ b/hw/ppc/spapr_pci.c
> > @@ -1262,7 +1262,7 @@ static void remove_drcs(SpaprPhbState *phb, PCIBus *bus)
> >          SpaprDrc *drc = drc_from_devfn(phb, chassis, i);
> >  
> >          if (drc) {
> > -            object_unparent(OBJECT(drc));
> > +            spapr_dr_connector_free(drc);
> >          }
> >      }
> >  }
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 6/6] spapr: Model DR connectors as simple objects
  2021-01-06 18:15     ` Greg Kurz
@ 2021-02-08  6:30       ` David Gibson
  0 siblings, 0 replies; 22+ messages in thread
From: David Gibson @ 2021-02-08  6:30 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 7595 bytes --]

On Wed, Jan 06, 2021 at 07:15:36PM +0100, Greg Kurz wrote:
> On Mon, 28 Dec 2020 19:28:39 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Fri, Dec 18, 2020 at 11:34:00AM +0100, Greg Kurz wrote:
> > > Modeling DR connectors as individual devices raises some
> > > concerns, as already discussed a year ago in this thread:
> > > 
> > > https://patchew.org/QEMU/20191017205953.13122-1-cheloha@linux.vnet.ibm.com/
> > > 
> > > First, high maxmem settings creates too many DRC devices.
> > > This causes scalability issues. It severely increase boot
> > > time because the multiple traversals of the DRC list that
> > > are performed during machine setup are quadratic operations.
> > > This is directly related to the fact that DRCs are modeled
> > > as individual devices and added to the composition tree.
> > > 
> > > Second, DR connectors are really an internal concept of
> > > PAPR. They aren't something that the user or management
> > > layer can manipulate in any way. We already don't allow
> > > their creation with device_add by clearing user_creatable.
> > > 
> > > DR connectors don't even need to be modeled as actual
> > > devices since they don't sit in a bus. They just need
> > > to be associated to an 'owner' object and to have the
> > > equivalent of realize/unrealize functions.
> > > 
> > > Downgrade them to be simple objects. Convert the existing
> > > realize() and unrealize() to be methods of the DR connector
> > > base class. Also have the base class to inherit from the
> > > vmstate_if interface directly. The get_id() hook simply
> > > returns NULL, just as device_vmstate_if_get_id() does for
> > > devices that don't sit in a bus. The DR connector is no
> > > longer made a child object. This means that it must be
> > > explicitely freed when no longer needed. This is only
> > > required for PHBs and PCI bridges actually : have them to
> > > free the DRC with spapr_dr_connector_free() instead of
> > > object_unparent().
> > > 
> > > No longer add the DRCs to the QOM composition tree. Track
> > > them with a glib hash table using the global DRC index as
> > > the key instead. This makes traversal a linear operation.
> > 
> > I have some reservations about this one.  The main thing is that
> > attaching migration state to something that's not a device seems a bit
> > odd to me.  AFAICT exactly one other non-device implements
> > TYPE_VMSTATE_IF, and what it does isn't very clear to me.
> > 
> 
> Even with your proposal below, the current SpaprDrc type, which is
> used all over the place, will stop being a TYPE_DEVICE but we still
> need to support migration with existing machine types for which DRC
> are devices.

Ah... that's a good point.

> Implementing TYPE_VMSTATE_IF is essentially a hack that
> allows to do that without keeping the current TYPE_DEVICE based
> implementation around.

Ok, that makes things clearer.

> > As I might have mentioned to you I had a different idea for how to
> > address this problem: still use a TYPE_DEVICE, but have it manage a
> > whole array of DRCs as one unit, rather than just a single one.
> > Specifically I was thinking:
> > 
> > * one array per PCI bus (DRCs for each function on the bus)
> > * one array for each block of memory (so one for base memory, one for
> >   each DIMM)
> > * one array for all the cpus
> > * one array for all the PHBs
> > 
> > It has some disadvantages compared to your scheme: it still leaves
> > (less) devices which can't be user managed, which is a bit ugly.  On
> > the other hand, each of those arrays can reasonably be dense, so we
> > can use direct indexing rather than a hash table, which is a bit
> > nicer.
> > 
> > Thoughts?
> > 
> 
> I find it a bit overkill to introduce a new TYPE_DEVICE (let's
> call it a DRC manager) for something that:
> - doesn't sit on a bus
> - can't be user managed
> - isn't directly represented to the guest as a full node
>   in the DT unlike all other devices, but just as indexes
>   in some properties of actual DR capable devices.
> 
> Given that the DRC index space is global and this is what
> the guest passes to DR RTAS calls, we can't do direct
> indexing, strictly speaking. We need at least some logic
> to dispatch operations on individual DRC states to the
> appropriate DRC manager. This logic belongs to the machine
> IMHO.
> 
> This shouldn't be too complex for CPUs and PHBs since they
> sit directly under the machine and have 1:1 relation with
> the attached device. It just boils down to instantiate
> some DRC managers during machine init:
> 
> - range [ 0x10000000 ... 0x10000000 + ${max_cpus} [
>   for CPUs
> - range [ 0x20000000 ... 0x20000000 + 31 [
>   for PHBs
> 
> For memory, the code currently generates DRC indexes in the range:
> 
> [ 0x80000000 ... 0x80000000 + ${base_ram_size}/256M ... ${max_ram_size}/256M [
> 
> ie. it doesn't generate DRC indexes for the base memory AFAICT. Also
> each DIMM can be of arbitrary size, ie. consume an arbitrary amount
> of DRC indexes. So the machine would instantiate SPAPR_MAX_RAM_SLOTS (32)
> DRC managers, each capable of managing the full set of LMB DRCs, just
> in case ? Likely a lot of zeroes with high maxmem settings but I guess
> we can live with it.

Actually, I was thinking of just a single manager for all the
(pluggable) LMB DRCs, a single manager for all CPU DRCs, a single
manager for all PHB DRCs and one per bus for PCI DRCs.  I'm not
assuming a 1:1 correspondance between manager and user side hotplug
operations.

Although... actually the "manager" could be an interface rather than
an object, in which case the DRC manager would be the machine for
LMBs, CPUs, and PHBs and the parent bus for each PCI slot.

> PCI busses would need some extra care though since the machine
> doesn't know about them. This would require to be able to
> register/unregister DRC managers for SPAPR_DR_CONNECTOR_TYPE_PCI
> indexes, so that the dispatching logic know about the ranges
> they cover (PHB internals).

Right, but that wouldn't really be any different from the dynamic
creation of DRCs we do add_drcs() / remove_drcs() right now, except
that it would create /destroy one object instead of a bunch.

> And finally comes migration : I cannot think of a way to generate
> the VMState sections used by existing machine types out of a set
> of arrays of integers... We could keep the current implementation
> around and use it with older machine types, but this certainly
> looks terrible from a maintenance perspective. Did you have any
> suggestion to handle that ?

Ugh, yeah.. that could be difficult.

> I seem to remember that one of the motivation to have arrays
> of DRCs is to avoid the inflation of VMState sections that
> we currently get with high maxmem settings, and it is considered
> preferable to stream sparse arrays. This could be achieved by
> building these arrays out of the global DRC hash table in a machine
> pre-save handler and migrate them in a subsection for the default
> machine type. Older machine types would continue with the current
> VMState sections thanks to the TYPE_VMSTATE_IF hack.
> 
> Does this seem a reasonable trade-off to be able to support
> older and newer machine types with the same implementation ?

Hrm, maybe, yeah.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2021-02-08 10:41 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-18 10:33 [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Greg Kurz
2020-12-18 10:33 ` [PATCH 1/6] spapr: Call spapr_drc_reset() for all DRCs at CAS Greg Kurz
2020-12-21 18:24   ` Daniel Henrique Barboza
2020-12-28  7:20   ` David Gibson
2020-12-18 10:33 ` [PATCH 2/6] spapr: Fix reset of transient DR connectors Greg Kurz
2020-12-21 20:34   ` Daniel Henrique Barboza
2020-12-28  7:24   ` David Gibson
2020-12-18 10:33 ` [PATCH 3/6] spapr: Introduce spapr_drc_reset_all() Greg Kurz
2020-12-21 20:35   ` Daniel Henrique Barboza
2020-12-28  7:26   ` David Gibson
2020-12-18 10:33 ` [PATCH 4/6] spapr: Use spapr_drc_reset_all() at machine reset Greg Kurz
2020-12-21 20:36   ` Daniel Henrique Barboza
2020-12-28  7:29   ` David Gibson
2020-12-18 10:33 ` [PATCH 5/6] spapr: Add drc_ prefix to the DRC realize and unrealize functions Greg Kurz
2020-12-21 20:37   ` Daniel Henrique Barboza
2020-12-28  7:31   ` David Gibson
2020-12-18 10:34 ` [PATCH 6/6] spapr: Model DR connectors as simple objects Greg Kurz
2020-12-21 20:45   ` Daniel Henrique Barboza
2020-12-28  8:28   ` David Gibson
2021-01-06 18:15     ` Greg Kurz
2021-02-08  6:30       ` David Gibson
2020-12-22 10:14 ` [PATCH 0/6] spapr: Fix visibility and traversal of DR connectors Daniel Henrique Barboza

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.