All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration
@ 2021-02-11 22:52 Daniel Henrique Barboza
  2021-02-11 22:52 ` [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical() Daniel Henrique Barboza
                   ` (7 more replies)
  0 siblings, 8 replies; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

Hi,

This is marked as a v3 as it started as a result of discussions that
followed the v2 [1]. 

The idea with this series is to add CPU hotunplug timeout to avoid the
situations where the kernel refuses to release the CPU. The reasoning
for a timeout approach is described in patch 05.

While investigating putting a timeout in memory hotunplug, I have found
out that we have a way to determine, at least in some cases, when the kernel
refuses to release the DIMM during a memory hotunplug. This alleviate one
of the most common issues (at least AFAIK) with memory hotunplug and it
made me gave up attempting to put a timeout in memory hotunplug altogether.

At this point I didn't add timeouts for PCI hotunplug operations, but it
is trivial to do so if desirable.

The series goes as follows:

- Patches 1-4: DRC simplifications/cleanups. The idea with these
  cleanups were to trim the spapr_drc_detach use as much as possible,
  since the function would be used to start the timeout timer

- Patch 5: timeout timer infrastructure

- Patch 6: add cpu unplug timeout

- Patch 7: reset DIMM unplug state when the kernel reconfigures the DRC
  connector



v2 link: [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html


Daniel Henrique Barboza (7):
  spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
  spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
  spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
  spapr_drc.c: introduce unplug_timeout_timer
  spapr_drc.c: add hotunplug timeout for CPUs
  spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state

 hw/ppc/spapr.c             |  40 ++++++++++++-
 hw/ppc/spapr_drc.c         | 116 +++++++++++++++++++++++++++----------
 hw/ppc/spapr_pci.c         |  44 +++++---------
 hw/ppc/trace-events        |   2 +-
 include/hw/ppc/spapr.h     |   2 +
 include/hw/ppc/spapr_drc.h |   7 ++-
 6 files changed, 147 insertions(+), 64 deletions(-)

-- 
2.29.2



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
@ 2021-02-11 22:52 ` Daniel Henrique Barboza
  2021-02-15 10:40   ` Greg Kurz
  2021-02-11 22:52 ` [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling Daniel Henrique Barboza
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

drc_isolate_logical() is used to move the DRC from the "Configured" to the
"Available" state, erroring out if the DRC is in the unexpected "Unisolate"
state and doing nothing (with RTAS_OUT_SUCCESS) if the DRC is already in
"Available" or in "Unusable" state.

When moving from "Configured" to "Available", the DRC is moved to the
LOGICAL_AVAILABLE state, a drc->unplug_requested check is done and, if true,
spapr_drc_detach() is called.

What spapr_drc_detach() does then is:

- set drc->unplug_requested to true. In fact, this is the only place where
unplug_request is set to true;
- does nothing else if drc->state != drck->empty_state. If the DRC state
is equal to drck->empty_state, spapr_drc_release() is called. For logical
DRCs, drck->empty_state = LOGICAL_UNUSABLE.

In short, calling spapr_drc_detach() in drc_isolate_logical() does nothing. It'll
set unplug_request to true again ('again' since it was already true - otherwise the
function wouldn't be called), and will return without calling spapr_drc_release()
because the DRC is not in LOGICAL_UNUSABLE, since drc_isolate_logical() just
moved it to LOGICAL_AVAILABLE. The only place where the logical DRC is released
is when called from drc_set_unusable(), when it is moved to the "Unusable" state.
As it should, according to PAPR.

Even though calling spapr_drc_detach() in drc_isolate_logical() is benign, removing
it will avoid further thought about the matter. So let's go ahead and do that.

As a note, this logic was introduced in commit bbf5c878ab76. Since then, the DRC
handling code was refactored and enhanced, and PAPR itself went through some
changes in the DRC area as well. It is expected that some assumptions we had back
then are now deprecated.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_drc.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 8571d5bafe..84bd3c881f 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -132,19 +132,6 @@ static uint32_t drc_isolate_logical(SpaprDrc *drc)
 
     drc->state = SPAPR_DRC_STATE_LOGICAL_AVAILABLE;
 
-    /* if we're awaiting release, but still in an unconfigured state,
-     * it's likely the guest is still in the process of configuring
-     * the device and is transitioning the devices to an ISOLATED
-     * state as a part of that process. so we only complete the
-     * removal when this transition happens for a device in a
-     * configured state, as suggested by the state diagram from PAPR+
-     * 2.7, 13.4
-     */
-    if (drc->unplug_requested) {
-        uint32_t drc_index = spapr_drc_index(drc);
-        trace_spapr_drc_set_isolation_state_finalizing(drc_index);
-        spapr_drc_detach(drc);
-    }
     return RTAS_OUT_SUCCESS;
 }
 
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
  2021-02-11 22:52 ` [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical() Daniel Henrique Barboza
@ 2021-02-11 22:52 ` Daniel Henrique Barboza
  2021-02-16 15:50   ` Greg Kurz
  2021-02-11 22:52 ` [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable Daniel Henrique Barboza
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

When hotunplugging a PCI function we'll branch out the logic in two cases,
function zero and non-zero. If non-zero, we'll call spapr_drc_detach() and
nothing else. If it's function zero, we'll loop it once between all the
functions in the slot to call spapr_drc_detach() on them, and afterwards
we'll do another backwards loop where we'll signal the event to the guest.

We can simplify this logic. We can ignore all the DRC handling for non-zero
functions, since we'll end up doing that regardless when unplugging function
zero. And for function zero, everything can be done in a single loop, since
tt doesn't matter if we end up marking the function DRCs as unplug pending in
backwards order or not, as long as we call spapr_drc_detach() before issuing
the hotunplug event to the guest.

This will also avoid a possible scenario where the user starts to hotunplug
the slot, starting with a non-zero function, and then delays/forgets to
hotunplug function zero afterwards. This would keep the function DRC marked
as unplug requested indefinitely.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_pci.c | 44 ++++++++++++++++----------------------------
 1 file changed, 16 insertions(+), 28 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index f1c7479816..1791d98a49 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1709,38 +1709,26 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
             return;
         }
 
-        /* ensure any other present functions are pending unplug */
-        if (PCI_FUNC(pdev->devfn) == 0) {
-            for (i = 1; i < 8; i++) {
-                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
-                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
-                state = func_drck->dr_entity_sense(func_drc);
-                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT
-                    && !spapr_drc_unplug_requested(func_drc)) {
-                    /*
-                     * Attempting to remove function 0 of a multifunction
-                     * device will will cascade into removing all child
-                     * functions, even if their unplug weren't requested
-                     * beforehand.
-                     */
-                    spapr_drc_detach(func_drc);
-                }
-            }
+        /*
+         * The hotunplug itself will occur when unplugging function 0,
+         * regardless of marking any other functions DRCs as pending
+         * unplug beforehand (since 02a1536eee33).
+         */
+        if (PCI_FUNC(pdev->devfn) != 0) {
+            return;
         }
 
-        spapr_drc_detach(drc);
+        for (i = 7; i >= 0; i--) {
+            func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
+            func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
+            state = func_drck->dr_entity_sense(func_drc);
 
-        /* if this isn't func 0, defer unplug event. otherwise signal removal
-         * for all present functions
-         */
-        if (PCI_FUNC(pdev->devfn) == 0) {
-            for (i = 7; i >= 0; i--) {
-                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
-                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
-                state = func_drck->dr_entity_sense(func_drc);
-                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
-                    spapr_hotplug_req_remove_by_index(func_drc);
+            if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
+                /* Mark the DRC as requested unplug if needed. */
+                if (!spapr_drc_unplug_requested(func_drc)) {
+                    spapr_drc_detach(func_drc);
                 }
+                spapr_hotplug_req_remove_by_index(func_drc);
             }
         }
     }
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
  2021-02-11 22:52 ` [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical() Daniel Henrique Barboza
  2021-02-11 22:52 ` [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling Daniel Henrique Barboza
@ 2021-02-11 22:52 ` Daniel Henrique Barboza
  2021-02-17  0:57   ` David Gibson
  2021-02-17 10:58   ` Greg Kurz
  2021-02-11 22:52 ` [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() Daniel Henrique Barboza
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

When moving a physical DRC to "Available", drc_isolate_physical() will
move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked
for unplug, call spapr_drc_detach(). For physical DRCs, drck->empty_state
is STATE_PHYSICAL_POWERON, meaning that we're sure that spapr_drc_detach()
will end up calling spapr_drc_release() in the end.

Likewise, for logical DRCs, drc_set_unusable will move the DRC to "Unusable"
state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is the
drck->empty_state for logical DRCs. spapr_drc_detach() will call
spapr_drc_release() in this case as well.

In both scenarios, spapr_drc_detach() is being used as a spapr_drc_release(),
wrapper, where we also set unplug_requested (which is already true, otherwise
spapr_drc_detach() wouldn't be called in the first place) and check if
drc->state == drck->empty_state, which we also know it's guaranteed to
be true because we just set it.

Just use spapr_drc_release() in these functions to be clear of our intentions
in both these functions.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_drc.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 84bd3c881f..555a25517d 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -50,6 +50,20 @@ uint32_t spapr_drc_index(SpaprDrc *drc)
         | (drc->id & DRC_INDEX_ID_MASK);
 }
 
+static void spapr_drc_release(SpaprDrc *drc)
+{
+    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
+
+    drck->release(drc->dev);
+
+    drc->unplug_requested = false;
+    g_free(drc->fdt);
+    drc->fdt = NULL;
+    drc->fdt_start_offset = 0;
+    object_property_del(OBJECT(drc), "device");
+    drc->dev = NULL;
+}
+
 static uint32_t drc_isolate_physical(SpaprDrc *drc)
 {
     switch (drc->state) {
@@ -68,7 +82,7 @@ static uint32_t drc_isolate_physical(SpaprDrc *drc)
     if (drc->unplug_requested) {
         uint32_t drc_index = spapr_drc_index(drc);
         trace_spapr_drc_set_isolation_state_finalizing(drc_index);
-        spapr_drc_detach(drc);
+        spapr_drc_release(drc);
     }
 
     return RTAS_OUT_SUCCESS;
@@ -209,7 +223,7 @@ static uint32_t drc_set_unusable(SpaprDrc *drc)
     if (drc->unplug_requested) {
         uint32_t drc_index = spapr_drc_index(drc);
         trace_spapr_drc_set_allocation_state_finalizing(drc_index);
-        spapr_drc_detach(drc);
+        spapr_drc_release(drc);
     }
 
     return RTAS_OUT_SUCCESS;
@@ -372,20 +386,6 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
                              NULL, 0);
 }
 
-static void spapr_drc_release(SpaprDrc *drc)
-{
-    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
-
-    drck->release(drc->dev);
-
-    drc->unplug_requested = false;
-    g_free(drc->fdt);
-    drc->fdt = NULL;
-    drc->fdt_start_offset = 0;
-    object_property_del(OBJECT(drc), "device");
-    drc->dev = NULL;
-}
-
 void spapr_drc_detach(SpaprDrc *drc)
 {
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
                   ` (2 preceding siblings ...)
  2021-02-11 22:52 ` [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable Daniel Henrique Barboza
@ 2021-02-11 22:52 ` Daniel Henrique Barboza
  2021-02-17  0:58   ` David Gibson
  2021-02-17 11:01   ` Greg Kurz
  2021-02-11 22:52 ` [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer Daniel Henrique Barboza
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

spapr_drc_detach() is not the best name for what the function does.
The function does not detach the DRC, it makes an uncommited attempt
to do it. It'll mark the DRC as pending unplug, via the 'unplug_request'
flag, and only if the DRC state is drck->empty_state it will detach the
DRC, via spapr_drc_release().

This is a contrast with its pair spapr_drc_attach(), where the function is
indeed creating the DRC QOM object. If you know what spapr_drc_attach()
does, you can be misled into thinking that spapr_drc_detach() is removing
the DRC from QEMU internal state, which isn't true.

The current role of this function is better described as a request for
detach, since there's no guarantee that we're going to detach the DRC in
the end. Rename the function to spapr_drc_unplug_request to reflect what is is
doing.

The initial idea was to change the name to spapr_drc_detach_request(), and
later on change the unplug_request flag to detach_request. However,
unplug_request is a migratable boolean for a long time now and renaming it
is not worth the trouble. spapr_drc_unplug_request() setting drc->unplug_request
is more natural than spapr_drc_detach_request setting drc->unplug_request.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c             | 6 +++---
 hw/ppc/spapr_drc.c         | 4 ++--
 hw/ppc/spapr_pci.c         | 2 +-
 hw/ppc/trace-events        | 2 +-
 include/hw/ppc/spapr_drc.h | 2 +-
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 85fe65f894..b066df68cb 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3654,7 +3654,7 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
                               addr / SPAPR_MEMORY_BLOCK_SIZE);
         g_assert(drc);
 
-        spapr_drc_detach(drc);
+        spapr_drc_unplug_request(drc);
         addr += SPAPR_MEMORY_BLOCK_SIZE;
     }
 
@@ -3722,7 +3722,7 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
     g_assert(drc);
 
     if (!spapr_drc_unplug_requested(drc)) {
-        spapr_drc_detach(drc);
+        spapr_drc_unplug_request(drc);
         spapr_hotplug_req_remove_by_index(drc);
     }
 }
@@ -3985,7 +3985,7 @@ static void spapr_phb_unplug_request(HotplugHandler *hotplug_dev,
     assert(drc);
 
     if (!spapr_drc_unplug_requested(drc)) {
-        spapr_drc_detach(drc);
+        spapr_drc_unplug_request(drc);
         spapr_hotplug_req_remove_by_index(drc);
     }
 }
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 555a25517d..67041fb212 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -386,11 +386,11 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
                              NULL, 0);
 }
 
-void spapr_drc_detach(SpaprDrc *drc)
+void spapr_drc_unplug_request(SpaprDrc *drc)
 {
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
-    trace_spapr_drc_detach(spapr_drc_index(drc));
+    trace_spapr_drc_unplug_request(spapr_drc_index(drc));
 
     g_assert(drc->dev);
 
diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 1791d98a49..9334ba5dbb 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1726,7 +1726,7 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
             if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
                 /* Mark the DRC as requested unplug if needed. */
                 if (!spapr_drc_unplug_requested(func_drc)) {
-                    spapr_drc_detach(func_drc);
+                    spapr_drc_unplug_request(func_drc);
                 }
                 spapr_hotplug_req_remove_by_index(func_drc);
             }
diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
index 1e91984526..b4bbfbb013 100644
--- a/hw/ppc/trace-events
+++ b/hw/ppc/trace-events
@@ -50,7 +50,7 @@ spapr_drc_set_allocation_state(uint32_t index, int state) "drc: 0x%"PRIx32", sta
 spapr_drc_set_allocation_state_finalizing(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_set_configured(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_attach(uint32_t index) "drc: 0x%"PRIx32
-spapr_drc_detach(uint32_t index) "drc: 0x%"PRIx32
+spapr_drc_unplug_request(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_awaiting_quiesce(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_reset(uint32_t index) "drc: 0x%"PRIx32
 spapr_drc_realize(uint32_t index) "drc: 0x%"PRIx32
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 8982927d5c..02a63b3666 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -243,7 +243,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
  * beforehand (eg. check drc->dev at pre-plug).
  */
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
-void spapr_drc_detach(SpaprDrc *drc);
+void spapr_drc_unplug_request(SpaprDrc *drc);
 
 /*
  * Reset all DRCs, causing pending hot-plug/unplug requests to complete.
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
                   ` (3 preceding siblings ...)
  2021-02-11 22:52 ` [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() Daniel Henrique Barboza
@ 2021-02-11 22:52 ` Daniel Henrique Barboza
  2021-02-17  1:14   ` David Gibson
  2021-02-17  1:20   ` David Gibson
  2021-02-11 22:52 ` [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs Daniel Henrique Barboza
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

The LoPAR spec provides no way for the guest kernel to report failure of
hotplug/hotunplug events. This wouldn't be bad if those operations were
granted to always succeed, but that's far for the reality.

What ends up happening is that, in the case of a failed hotunplug,
regardless of whether it was a QEMU error or a guest misbehavior, the pSeries
machine is retaining the unplug state of the device in the running guest.
This state is cleanup in machine reset, where it is assumed that this state
represents a device that is pending unplug, and the device is hotunpluged
from the board. Until the reset occurs, any hotunplug operation of the same
device is forbid because there is a pending unplug state.

This behavior has at least one undesirable side effect. A long standing pending
unplug state is, more often than not, the result of a hotunplug error. The user
had to dealt with it, since retrying to unplug the device is noy allowed, and then
in the machine reset we're removing the device from the guest. This means that
we're failing the user twice - failed to hotunplug when asked, then hotunplugged
without notice.

Solutions to this problem range between trying to predict when the hotunplug will
fail and forbid the operation from the QEMU layer, from opening up the IRQ queue
to allow for multiple hotunplug attempts, from telling the users to 'reboot the
machine if something goes wrong'. The first solution is flawed because we can't
fully predict guest behavior from QEMU, the second solution is a trial and error
remediation that counts on a hope that the unplug will eventually succeed, and the
third is ... well.

This patch introduces a crude, but effective solution to hotunplug errors in
the pSeries machine. For each unplug done, we'll timeout after some time. If
a certain amount of time passes, we'll cleanup the hotunplug state from the machine.
During the timeout period, any unplug operations in the same device will still
be blocked. After that, we'll assume that the guest failed the operation, and
allow the user to try again. If the timeout is too short we'll prevent legitimate
hotunplug situations to occur, so we'll need to overestimate the regular time
an unplug operation takes to succeed to account that.

The true solution for the hotunplug errors in the pSeries machines is a PAPR change
to allow for the guest to warn the platform about it. For now, the work done in this
timeout design can be used for the new PAPR 'abort hcall' in the future, given that
for both cases we'll need code to cleanup the existing unplug states of the DRCs.

At this moment we're adding the basic wiring of the timer into the DRC. Next patch
will use the timer to timeout failed CPU hotunplugs.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_drc.c         | 36 ++++++++++++++++++++++++++++++++++++
 include/hw/ppc/spapr_drc.h |  2 ++
 2 files changed, 38 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 67041fb212..c88bb524c5 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -57,6 +57,8 @@ static void spapr_drc_release(SpaprDrc *drc)
     drck->release(drc->dev);
 
     drc->unplug_requested = false;
+    timer_del(drc->unplug_timeout_timer);
+
     g_free(drc->fdt);
     drc->fdt = NULL;
     drc->fdt_start_offset = 0;
@@ -453,6 +455,24 @@ static const VMStateDescription vmstate_spapr_drc_unplug_requested = {
     }
 };
 
+static bool spapr_drc_unplug_timeout_timer_needed(void *opaque)
+{
+    SpaprDrc *drc = opaque;
+
+    return timer_pending(drc->unplug_timeout_timer);
+}
+
+static const VMStateDescription vmstate_spapr_drc_unplug_timeout_timer = {
+    .name = "DRC unplug timeout timer",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = spapr_drc_unplug_timeout_timer_needed,
+    .fields = (VMStateField[]) {
+        VMSTATE_TIMER_PTR(unplug_timeout_timer, SpaprDrc),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static bool spapr_drc_needed(void *opaque)
 {
     SpaprDrc *drc = opaque;
@@ -486,10 +506,20 @@ static const VMStateDescription vmstate_spapr_drc = {
     },
     .subsections = (const VMStateDescription * []) {
         &vmstate_spapr_drc_unplug_requested,
+        &vmstate_spapr_drc_unplug_timeout_timer,
         NULL
     }
 };
 
+static void drc_unplug_timeout_cb(void *opaque)
+{
+    SpaprDrc *drc = opaque;
+
+    if (drc->unplug_requested) {
+        drc->unplug_requested = false;
+    }
+}
+
 static void drc_realize(DeviceState *d, Error **errp)
 {
     SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
@@ -512,6 +542,11 @@ static void drc_realize(DeviceState *d, Error **errp)
     object_property_add_alias(root_container, link_name,
                               drc->owner, child_name);
     g_free(link_name);
+
+    drc->unplug_timeout_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                             drc_unplug_timeout_cb,
+                                             drc);
+
     vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
                      drc);
     trace_spapr_drc_realize_complete(spapr_drc_index(drc));
@@ -529,6 +564,7 @@ static void drc_unrealize(DeviceState *d)
     name = g_strdup_printf("%x", spapr_drc_index(drc));
     object_property_del(root_container, name);
     g_free(name);
+    timer_free(drc->unplug_timeout_timer);
 }
 
 SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index 02a63b3666..b2e6222d09 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -187,6 +187,8 @@ typedef struct SpaprDrc {
     bool unplug_requested;
     void *fdt;
     int fdt_start_offset;
+
+    QEMUTimer *unplug_timeout_timer;
 } SpaprDrc;
 
 struct SpaprMachineState;
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
                   ` (4 preceding siblings ...)
  2021-02-11 22:52 ` [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer Daniel Henrique Barboza
@ 2021-02-11 22:52 ` Daniel Henrique Barboza
  2021-02-17  1:23   ` David Gibson
  2021-02-11 22:52 ` [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state Daniel Henrique Barboza
  2021-02-17  2:33 ` [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration David Gibson
  7 siblings, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Xujun Ma, Daniel Henrique Barboza, qemu-ppc, groug, david

There is a reliable way to make a CPU hotunplug fail in the pseries
machine. Hotplug a CPU A, then offline all other CPUs inside the guest
but A. When trying to hotunplug A the guest kernel will refuse to do
it, because A is now the last online CPU of the guest. PAPR has no
'error callback' in this situation to report back to the platform,
so the guest kernel will deny the unplug in silent and QEMU will never
know what happened. The unplug pending state of A will remain until
the guest is shutdown or rebooted.

Previous attempts of fixing it (see [1] and [2]) were aimed at trying to
mitigate the effects of the problem. In [1] we were trying to guess which
guest CPUs were online to forbid hotunplug of the last online CPU in the QEMU
layer, avoiding the scenario described above because QEMU is now failing
in behalf of the guest. This is not robust because the last online CPU of
the guest can change while we're in the middle of the unplug process, and
our initial assumptions are now invalid. In [2] we were accepting that our
unplug process is uncertain and the user should be allowed to spam the IRQ
hotunplug queue of the guest in case the CPU hotunplug fails.

This patch presents another alternative, using the timeout infrastructure
introduced in the previous patch. CPU hotunplugs in the pSeries machine will
now timeout after 15 seconds. This is a long time for a single CPU unplug
to occur, regardless of guest load - although the user is *strongly* encouraged
to *not* hotunplug devices from a guest under high load - and we can be sure
that something went wrong if it takes longer than that for the guest to release
the CPU (the same can't be said about memory hotunplug - more on that in the
next patch).

Timing out the unplug operation will reset the unplug state of the CPU and
allow the user to try it again, regardless of the error situation that
prevented the hotunplug to occur. Of all the not so pretty fixes/mitigations
for CPU hotunplug errors in pSeries, timing out the operation is an admission
that we have no control in the process, and must assume the worst case if
the operation doesn't succeed in a sensible time frame.

[1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html

Reported-by: Xujun Ma <xuma@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c             |  4 ++++
 hw/ppc/spapr_drc.c         | 17 +++++++++++++++++
 include/hw/ppc/spapr_drc.h |  3 +++
 3 files changed, 24 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index b066df68cb..ecce8abf14 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3724,6 +3724,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
     if (!spapr_drc_unplug_requested(drc)) {
         spapr_drc_unplug_request(drc);
         spapr_hotplug_req_remove_by_index(drc);
+    } else {
+        error_setg(errp, "core-id %d unplug is still pending, %d seconds "
+                   "timeout remaining",
+                   cc->core_id, spapr_drc_unplug_timeout_remaining_sec(drc));
     }
 }
 
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index c88bb524c5..c143bfb6d3 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -398,6 +398,12 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
 
     drc->unplug_requested = true;
 
+    if (drck->unplug_timeout_seconds != 0) {
+        timer_mod(drc->unplug_timeout_timer,
+                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
+                  drck->unplug_timeout_seconds * 1000);
+    }
+
     if (drc->state != drck->empty_state) {
         trace_spapr_drc_awaiting_quiesce(spapr_drc_index(drc));
         return;
@@ -406,6 +412,16 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
     spapr_drc_release(drc);
 }
 
+int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc)
+{
+    if (drc->unplug_requested && timer_pending(drc->unplug_timeout_timer)) {
+        return (qemu_timeout_ns_to_ms(drc->unplug_timeout_timer->expire_time) -
+                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)) / 1000;
+    }
+
+    return 0;
+}
+
 bool spapr_drc_reset(SpaprDrc *drc)
 {
     SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
@@ -706,6 +722,7 @@ static void spapr_drc_cpu_class_init(ObjectClass *k, void *data)
     drck->drc_name_prefix = "CPU ";
     drck->release = spapr_core_release;
     drck->dt_populate = spapr_core_dt_populate;
+    drck->unplug_timeout_seconds = 15;
 }
 
 static void spapr_drc_pci_class_init(ObjectClass *k, void *data)
diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
index b2e6222d09..26599c385a 100644
--- a/include/hw/ppc/spapr_drc.h
+++ b/include/hw/ppc/spapr_drc.h
@@ -211,6 +211,8 @@ typedef struct SpaprDrcClass {
 
     int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
                        void *fdt, int *fdt_start_offset, Error **errp);
+
+    int unplug_timeout_seconds;
 } SpaprDrcClass;
 
 typedef struct SpaprDrcPhysical {
@@ -246,6 +248,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
  */
 void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
 void spapr_drc_unplug_request(SpaprDrc *drc);
+int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc);
 
 /*
  * Reset all DRCs, causing pending hot-plug/unplug requests to complete.
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
                   ` (5 preceding siblings ...)
  2021-02-11 22:52 ` [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs Daniel Henrique Barboza
@ 2021-02-11 22:52 ` Daniel Henrique Barboza
  2021-02-17  2:31   ` David Gibson
  2021-02-17  2:33 ` [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration David Gibson
  7 siblings, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-11 22:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

Handling errors in memory hotunplug in the pSeries machine is more complex
than any other device type, because there are all the complications that other
devices has, and more.

For instance, determining a timeout for a DIMM hotunplug must consider if it's a
Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs.
The size of the DIMM is also a factor, given that longer DIMMs naturally takes
longer to be hotunplugged from the kernel. And there's also the guest memory usage to
be considered: if there's a process that is consuming memory that would be lost by
the DIMM unplug, the kernel will postpone the unplug process until the process
finishes, and then initiate the regular hotunplug process. The first two
considerations are manageable, but the last one is a deal breaker.

There is no sane way for the pSeries machine to determine the memory load in the guest
when attempting a DIMM hotunplug - and even if there was a way, the guest can start
using all the RAM in the middle of the unplug process and invalidate our previous
assumptions - and in result we can't even begin to calculate a timeout for the
operation. This means that we can't implement a viable timeout mechanism for memory
unplug in pSeries.

Going back to why we would consider an unplug timeout, the reason is that we can't
know if the kernel is giving up the unplug. Turns out that, sometimes, we can.
Consider a failed memory hotunplug attempt where the kernel will error out with
the following message:

'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'

This happens when there is a LMB that the kernel gave up in removing, and the LMBs
marked for removal of the same DIMM are now being added back. This process happens
in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and
after that update_lmb_associativity_index(). In this function, the kernel is configuring
the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in
section "ibm,configure-connector RTAS Call":

'A subsequent sequence of calls to ibm,configure-connector with the same entry from
the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of
devices which were not completely configured.'

We can use this kernel behavior in our favor. If a DRC connector reconfiguration
for a LMB that we marked as unplug pending happens, this indicates that the kernel
changed its mind about the unplug and is reasserting that it will keep using the
DIMM. In this case, it's safe to assume that the whole DIMM unplug was cancelled.

This patch hops into rtas_ibm_configure_connector() and, in the scenario described
above, clear the unplug state for the DIMM device. This will not solve all the
problems we still have with memory unplug, but it will cover this case where the
kernel reconfigures LMBs after a failed unplug. We are a bit more resilient,
without using an unreliable timeout, and we didn't make the remaining error cases
any worse.

[1] arch/powerpc/platforms/pseries/hotplug-memory.c

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c         | 30 ++++++++++++++++++++++++++++++
 hw/ppc/spapr_drc.c     | 14 ++++++++++++++
 include/hw/ppc/spapr.h |  2 ++
 3 files changed, 46 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ecce8abf14..4bcded4a1a 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
     return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
 }
 
+void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
+                                           PCDIMMDevice *dimm)
+{
+    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
+    SpaprDrc *drc;
+    uint32_t nr_lmbs;
+    uint64_t size, addr_start, addr;
+    int i;
+
+    if (ds) {
+        spapr_pending_dimm_unplugs_remove(spapr, ds);
+    }
+
+    size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
+    nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
+
+    addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
+                                          &error_abort);
+
+    addr = addr_start;
+    for (i = 0; i < nr_lmbs; i++) {
+        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
+                              addr / SPAPR_MEMORY_BLOCK_SIZE);
+        g_assert(drc);
+
+        drc->unplug_requested = false;
+        addr += SPAPR_MEMORY_BLOCK_SIZE;
+    }
+}
+
 /* Callback to be called during DRC release. */
 void spapr_lmb_release(DeviceState *dev)
 {
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index c143bfb6d3..eae941233a 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -1230,6 +1230,20 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
 
     drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
 
+    /*
+     * This indicates that the kernel is reconfiguring a LMB due to
+     * a failed hotunplug. Clear the pending unplug state for the whole
+     * DIMM.
+     */
+    if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB &&
+        drc->unplug_requested) {
+
+        /* This really shouldn't happen in this point, but ... */
+        g_assert(drc->dev);
+
+        spapr_clear_pending_dimm_unplug_state(spapr, PC_DIMM(drc->dev));
+    }
+
     if (!drc->fdt) {
         void *fdt;
         int fdt_size;
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index ccbeeca1de..5bcc8f3bb8 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -847,6 +847,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
 int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
 void spapr_clear_pending_events(SpaprMachineState *spapr);
 void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
+void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
+                                           PCDIMMDevice *dimm);
 int spapr_max_server_number(SpaprMachineState *spapr);
 void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
                       uint64_t pte0, uint64_t pte1);
-- 
2.29.2



^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
  2021-02-11 22:52 ` [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical() Daniel Henrique Barboza
@ 2021-02-15 10:40   ` Greg Kurz
  2021-02-17  0:51     ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Greg Kurz @ 2021-02-15 10:40 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, david

On Thu, 11 Feb 2021 19:52:40 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> drc_isolate_logical() is used to move the DRC from the "Configured" to the
> "Available" state, erroring out if the DRC is in the unexpected "Unisolate"
> state and doing nothing (with RTAS_OUT_SUCCESS) if the DRC is already in
> "Available" or in "Unusable" state.
> 
> When moving from "Configured" to "Available", the DRC is moved to the
> LOGICAL_AVAILABLE state, a drc->unplug_requested check is done and, if true,
> spapr_drc_detach() is called.
> 
> What spapr_drc_detach() does then is:
> 
> - set drc->unplug_requested to true. In fact, this is the only place where
> unplug_request is set to true;
> - does nothing else if drc->state != drck->empty_state. If the DRC state
> is equal to drck->empty_state, spapr_drc_release() is called. For logical
> DRCs, drck->empty_state = LOGICAL_UNUSABLE.
> 
> In short, calling spapr_drc_detach() in drc_isolate_logical() does nothing. It'll
> set unplug_request to true again ('again' since it was already true - otherwise the
> function wouldn't be called), and will return without calling spapr_drc_release()
> because the DRC is not in LOGICAL_UNUSABLE, since drc_isolate_logical() just
> moved it to LOGICAL_AVAILABLE. The only place where the logical DRC is released
> is when called from drc_set_unusable(), when it is moved to the "Unusable" state.
> As it should, according to PAPR.
> 
> Even though calling spapr_drc_detach() in drc_isolate_logical() is benign, removing
> it will avoid further thought about the matter. So let's go ahead and do that.
> 

Good catch. This path looks useless for logical DRCs indeed.

> As a note, this logic was introduced in commit bbf5c878ab76. Since then, the DRC
> handling code was refactored and enhanced, and PAPR itself went through some
> changes in the DRC area as well. It is expected that some assumptions we had back
> then are now deprecated.
> 

As specified in [1]:

Please do not use lines that are longer than 76 characters in your
commit message (so that the text still shows up nicely with "git show"
in a 80-columns terminal window).

[1] https://wiki.qemu.org/Contribute/SubmitAPatch#Write_a_meaningful_commit_message

> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr_drc.c | 13 -------------
>  1 file changed, 13 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 8571d5bafe..84bd3c881f 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -132,19 +132,6 @@ static uint32_t drc_isolate_logical(SpaprDrc *drc)
>  
>      drc->state = SPAPR_DRC_STATE_LOGICAL_AVAILABLE;
>  
> -    /* if we're awaiting release, but still in an unconfigured state,
> -     * it's likely the guest is still in the process of configuring
> -     * the device and is transitioning the devices to an ISOLATED
> -     * state as a part of that process. so we only complete the
> -     * removal when this transition happens for a device in a
> -     * configured state, as suggested by the state diagram from PAPR+
> -     * 2.7, 13.4
> -     */
> -    if (drc->unplug_requested) {
> -        uint32_t drc_index = spapr_drc_index(drc);
> -        trace_spapr_drc_set_isolation_state_finalizing(drc_index);

I was expecting a change in hw/ppc/trace-events to ditch this trace,
but it is still called by drc_isolate_physical(), so we're good.

Reviewed-by: Greg Kurz <groug@kaod.org>

> -        spapr_drc_detach(drc);
> -    }
>      return RTAS_OUT_SUCCESS;
>  }
>  



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  2021-02-11 22:52 ` [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling Daniel Henrique Barboza
@ 2021-02-16 15:50   ` Greg Kurz
  2021-02-16 16:09     ` Daniel Henrique Barboza
  0 siblings, 1 reply; 28+ messages in thread
From: Greg Kurz @ 2021-02-16 15:50 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, david

On Thu, 11 Feb 2021 19:52:41 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> When hotunplugging a PCI function we'll branch out the logic in two cases,
> function zero and non-zero. If non-zero, we'll call spapr_drc_detach() and
> nothing else. If it's function zero, we'll loop it once between all the
> functions in the slot to call spapr_drc_detach() on them, and afterwards
> we'll do another backwards loop where we'll signal the event to the guest.
> 
> We can simplify this logic. We can ignore all the DRC handling for non-zero
> functions, since we'll end up doing that regardless when unplugging function
> zero. And for function zero, everything can be done in a single loop, since
> tt doesn't matter if we end up marking the function DRCs as unplug pending in
> backwards order or not, as long as we call spapr_drc_detach() before issuing
> the hotunplug event to the guest.
> 
> This will also avoid a possible scenario where the user starts to hotunplug
> the slot, starting with a non-zero function, and then delays/forgets to
> hotunplug function zero afterwards. This would keep the function DRC marked
> as unplug requested indefinitely.
> 

... or until the guest is reset, which will no longer happen with this
patch applied, i.e. breaks the long standing policy that machine reset
causes pending hot-unplug requests to succeed. I don't see an obvious
reason to special case non-zero PCI functions.

> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr_pci.c | 44 ++++++++++++++++----------------------------
>  1 file changed, 16 insertions(+), 28 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index f1c7479816..1791d98a49 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1709,38 +1709,26 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
>              return;
>          }
>  
> -        /* ensure any other present functions are pending unplug */
> -        if (PCI_FUNC(pdev->devfn) == 0) {
> -            for (i = 1; i < 8; i++) {
> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
> -                state = func_drck->dr_entity_sense(func_drc);
> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT
> -                    && !spapr_drc_unplug_requested(func_drc)) {
> -                    /*
> -                     * Attempting to remove function 0 of a multifunction
> -                     * device will will cascade into removing all child
> -                     * functions, even if their unplug weren't requested
> -                     * beforehand.
> -                     */
> -                    spapr_drc_detach(func_drc);
> -                }
> -            }
> +        /*
> +         * The hotunplug itself will occur when unplugging function 0,
> +         * regardless of marking any other functions DRCs as pending
> +         * unplug beforehand (since 02a1536eee33).
> +         */
> +        if (PCI_FUNC(pdev->devfn) != 0) {
> +            return;
>          }
>  
> -        spapr_drc_detach(drc);
> +        for (i = 7; i >= 0; i--) {
> +            func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
> +            func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
> +            state = func_drck->dr_entity_sense(func_drc);
>  
> -        /* if this isn't func 0, defer unplug event. otherwise signal removal
> -         * for all present functions
> -         */
> -        if (PCI_FUNC(pdev->devfn) == 0) {
> -            for (i = 7; i >= 0; i--) {
> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
> -                state = func_drck->dr_entity_sense(func_drc);
> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
> -                    spapr_hotplug_req_remove_by_index(func_drc);
> +            if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
> +                /* Mark the DRC as requested unplug if needed. */
> +                if (!spapr_drc_unplug_requested(func_drc)) {
> +                    spapr_drc_detach(func_drc);
>                  }
> +                spapr_hotplug_req_remove_by_index(func_drc);
>              }
>          }
>      }



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  2021-02-16 15:50   ` Greg Kurz
@ 2021-02-16 16:09     ` Daniel Henrique Barboza
  2021-02-16 17:16       ` Greg Kurz
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-16 16:09 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, qemu-devel, david



On 2/16/21 12:50 PM, Greg Kurz wrote:
> On Thu, 11 Feb 2021 19:52:41 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>> When hotunplugging a PCI function we'll branch out the logic in two cases,
>> function zero and non-zero. If non-zero, we'll call spapr_drc_detach() and
>> nothing else. If it's function zero, we'll loop it once between all the
>> functions in the slot to call spapr_drc_detach() on them, and afterwards
>> we'll do another backwards loop where we'll signal the event to the guest.
>>
>> We can simplify this logic. We can ignore all the DRC handling for non-zero
>> functions, since we'll end up doing that regardless when unplugging function
>> zero. And for function zero, everything can be done in a single loop, since
>> tt doesn't matter if we end up marking the function DRCs as unplug pending in
>> backwards order or not, as long as we call spapr_drc_detach() before issuing
>> the hotunplug event to the guest.
>>
>> This will also avoid a possible scenario where the user starts to hotunplug
>> the slot, starting with a non-zero function, and then delays/forgets to
>> hotunplug function zero afterwards. This would keep the function DRC marked
>> as unplug requested indefinitely.
>>
> 
> ... or until the guest is reset, which will no longer happen with this
> patch applied, i.e. breaks the long standing policy that machine reset
> causes pending hot-unplug requests to succeed. I don't see an obvious
> reason to special case non-zero PCI functions.

It's not possible to hotunplug the non-zero functions during machine reset for
multifunction PCI devices. We need to unplug the entire slot, and that will only
happen when function zero is unplugged. In fact, I think bad things will happen
in this case you mentioned if we are forcing the removal of non-zero functions
without function zero (spoiler: didn't test it).

What I'm doing in this patch is making it clearer that non-zero functions does
not matter for the unplug of multifunction PCI devices. We'll detach the whole
slot when function zero is unplugged, regardless of the unplug state of other
functions.

The only reason why I didn't make 'device_del' to error out when used with a
non-zero function is because we allowed this in the past and it would break user
ABI. Otherwise, FWIW, "device_del <non-zero function>" is doing nothing since
commit "spapr_pci: remove all child functions in function zero unplug".


Thanks,


DHB



> 
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   hw/ppc/spapr_pci.c | 44 ++++++++++++++++----------------------------
>>   1 file changed, 16 insertions(+), 28 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index f1c7479816..1791d98a49 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -1709,38 +1709,26 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
>>               return;
>>           }
>>   
>> -        /* ensure any other present functions are pending unplug */
>> -        if (PCI_FUNC(pdev->devfn) == 0) {
>> -            for (i = 1; i < 8; i++) {
>> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
>> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
>> -                state = func_drck->dr_entity_sense(func_drc);
>> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT
>> -                    && !spapr_drc_unplug_requested(func_drc)) {
>> -                    /*
>> -                     * Attempting to remove function 0 of a multifunction
>> -                     * device will will cascade into removing all child
>> -                     * functions, even if their unplug weren't requested
>> -                     * beforehand.
>> -                     */
>> -                    spapr_drc_detach(func_drc);
>> -                }
>> -            }
>> +        /*
>> +         * The hotunplug itself will occur when unplugging function 0,
>> +         * regardless of marking any other functions DRCs as pending
>> +         * unplug beforehand (since 02a1536eee33).
>> +         */
>> +        if (PCI_FUNC(pdev->devfn) != 0) {
>> +            return;
>>           }
>>   
>> -        spapr_drc_detach(drc);
>> +        for (i = 7; i >= 0; i--) {
>> +            func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
>> +            func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
>> +            state = func_drck->dr_entity_sense(func_drc);
>>   
>> -        /* if this isn't func 0, defer unplug event. otherwise signal removal
>> -         * for all present functions
>> -         */
>> -        if (PCI_FUNC(pdev->devfn) == 0) {
>> -            for (i = 7; i >= 0; i--) {
>> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
>> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
>> -                state = func_drck->dr_entity_sense(func_drc);
>> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
>> -                    spapr_hotplug_req_remove_by_index(func_drc);
>> +            if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
>> +                /* Mark the DRC as requested unplug if needed. */
>> +                if (!spapr_drc_unplug_requested(func_drc)) {
>> +                    spapr_drc_detach(func_drc);
>>                   }
>> +                spapr_hotplug_req_remove_by_index(func_drc);
>>               }
>>           }
>>       }
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  2021-02-16 16:09     ` Daniel Henrique Barboza
@ 2021-02-16 17:16       ` Greg Kurz
  2021-02-16 17:44         ` Daniel Henrique Barboza
  0 siblings, 1 reply; 28+ messages in thread
From: Greg Kurz @ 2021-02-16 17:16 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, david

On Tue, 16 Feb 2021 13:09:43 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> 
> 
> On 2/16/21 12:50 PM, Greg Kurz wrote:
> > On Thu, 11 Feb 2021 19:52:41 -0300
> > Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> > 
> >> When hotunplugging a PCI function we'll branch out the logic in two cases,
> >> function zero and non-zero. If non-zero, we'll call spapr_drc_detach() and
> >> nothing else. If it's function zero, we'll loop it once between all the
> >> functions in the slot to call spapr_drc_detach() on them, and afterwards
> >> we'll do another backwards loop where we'll signal the event to the guest.
> >>
> >> We can simplify this logic. We can ignore all the DRC handling for non-zero
> >> functions, since we'll end up doing that regardless when unplugging function
> >> zero. And for function zero, everything can be done in a single loop, since
> >> tt doesn't matter if we end up marking the function DRCs as unplug pending in
> >> backwards order or not, as long as we call spapr_drc_detach() before issuing
> >> the hotunplug event to the guest.
> >>
> >> This will also avoid a possible scenario where the user starts to hotunplug
> >> the slot, starting with a non-zero function, and then delays/forgets to
> >> hotunplug function zero afterwards. This would keep the function DRC marked
> >> as unplug requested indefinitely.
> >>
> > 
> > ... or until the guest is reset, which will no longer happen with this
> > patch applied, i.e. breaks the long standing policy that machine reset
> > causes pending hot-unplug requests to succeed. I don't see an obvious
> > reason to special case non-zero PCI functions.
> 
> It's not possible to hotunplug the non-zero functions during machine reset for
> multifunction PCI devices. We need to unplug the entire slot, and that will only
> happen when function zero is unplugged. In fact, I think bad things will happen
> in this case you mentioned if we are forcing the removal of non-zero functions
> without function zero (spoiler: didn't test it).
> 

I've tested with the aggregation of two e1000e emulated devices:

device_add e1000e,addr=10.1,id=netfn1
device_add e1000e,multifunction=on,addr=10.0,id=netfn0

And I don't quite see what "bad things" could happen. We're resetting the
machine to a stable state and the new OS instance will just not see the
removed function (just like only function netfn0 got added).

> What I'm doing in this patch is making it clearer that non-zero functions does
> not matter for the unplug of multifunction PCI devices. We'll detach the whole
> slot when function zero is unplugged, regardless of the unplug state of other
> functions.
> 

I understand that hot-unplug of non-zero functions is special cased while
the guest OS is running, but this doesn't really applies if the guest is
rebooted. Code simplification is not a good reason enough, at least for me,
to alter the "reset complete all pending hotplugs" general rule.

> The only reason why I didn't make 'device_del' to error out when used with a
> non-zero function is because we allowed this in the past and it would break user
> ABI. Otherwise, FWIW, "device_del <non-zero function>" is doing nothing since
> commit "spapr_pci: remove all child functions in function zero unplug".
> 
> 
> Thanks,
> 
> 
> DHB
> 
> 
> 
> > 
> >> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> >> ---
> >>   hw/ppc/spapr_pci.c | 44 ++++++++++++++++----------------------------
> >>   1 file changed, 16 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> >> index f1c7479816..1791d98a49 100644
> >> --- a/hw/ppc/spapr_pci.c
> >> +++ b/hw/ppc/spapr_pci.c
> >> @@ -1709,38 +1709,26 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
> >>               return;
> >>           }
> >>   
> >> -        /* ensure any other present functions are pending unplug */
> >> -        if (PCI_FUNC(pdev->devfn) == 0) {
> >> -            for (i = 1; i < 8; i++) {
> >> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
> >> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
> >> -                state = func_drck->dr_entity_sense(func_drc);
> >> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT
> >> -                    && !spapr_drc_unplug_requested(func_drc)) {
> >> -                    /*
> >> -                     * Attempting to remove function 0 of a multifunction
> >> -                     * device will will cascade into removing all child
> >> -                     * functions, even if their unplug weren't requested
> >> -                     * beforehand.
> >> -                     */
> >> -                    spapr_drc_detach(func_drc);
> >> -                }
> >> -            }
> >> +        /*
> >> +         * The hotunplug itself will occur when unplugging function 0,
> >> +         * regardless of marking any other functions DRCs as pending
> >> +         * unplug beforehand (since 02a1536eee33).
> >> +         */
> >> +        if (PCI_FUNC(pdev->devfn) != 0) {
> >> +            return;
> >>           }
> >>   
> >> -        spapr_drc_detach(drc);
> >> +        for (i = 7; i >= 0; i--) {
> >> +            func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
> >> +            func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
> >> +            state = func_drck->dr_entity_sense(func_drc);
> >>   
> >> -        /* if this isn't func 0, defer unplug event. otherwise signal removal
> >> -         * for all present functions
> >> -         */
> >> -        if (PCI_FUNC(pdev->devfn) == 0) {
> >> -            for (i = 7; i >= 0; i--) {
> >> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
> >> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
> >> -                state = func_drck->dr_entity_sense(func_drc);
> >> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
> >> -                    spapr_hotplug_req_remove_by_index(func_drc);
> >> +            if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
> >> +                /* Mark the DRC as requested unplug if needed. */
> >> +                if (!spapr_drc_unplug_requested(func_drc)) {
> >> +                    spapr_drc_detach(func_drc);
> >>                   }
> >> +                spapr_hotplug_req_remove_by_index(func_drc);
> >>               }
> >>           }
> >>       }
> > 



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  2021-02-16 17:16       ` Greg Kurz
@ 2021-02-16 17:44         ` Daniel Henrique Barboza
  2021-02-17  0:54           ` David Gibson
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-16 17:44 UTC (permalink / raw)
  To: Greg Kurz; +Cc: qemu-ppc, qemu-devel, david



On 2/16/21 2:16 PM, Greg Kurz wrote:
> On Tue, 16 Feb 2021 13:09:43 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
>>
>>
>> On 2/16/21 12:50 PM, Greg Kurz wrote:
>>> On Thu, 11 Feb 2021 19:52:41 -0300
>>> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
>>>
>>>> When hotunplugging a PCI function we'll branch out the logic in two cases,
>>>> function zero and non-zero. If non-zero, we'll call spapr_drc_detach() and
>>>> nothing else. If it's function zero, we'll loop it once between all the
>>>> functions in the slot to call spapr_drc_detach() on them, and afterwards
>>>> we'll do another backwards loop where we'll signal the event to the guest.
>>>>
>>>> We can simplify this logic. We can ignore all the DRC handling for non-zero
>>>> functions, since we'll end up doing that regardless when unplugging function
>>>> zero. And for function zero, everything can be done in a single loop, since
>>>> tt doesn't matter if we end up marking the function DRCs as unplug pending in
>>>> backwards order or not, as long as we call spapr_drc_detach() before issuing
>>>> the hotunplug event to the guest.
>>>>
>>>> This will also avoid a possible scenario where the user starts to hotunplug
>>>> the slot, starting with a non-zero function, and then delays/forgets to
>>>> hotunplug function zero afterwards. This would keep the function DRC marked
>>>> as unplug requested indefinitely.
>>>>
>>>
>>> ... or until the guest is reset, which will no longer happen with this
>>> patch applied, i.e. breaks the long standing policy that machine reset
>>> causes pending hot-unplug requests to succeed. I don't see an obvious
>>> reason to special case non-zero PCI functions.
>>
>> It's not possible to hotunplug the non-zero functions during machine reset for
>> multifunction PCI devices. We need to unplug the entire slot, and that will only
>> happen when function zero is unplugged. In fact, I think bad things will happen
>> in this case you mentioned if we are forcing the removal of non-zero functions
>> without function zero (spoiler: didn't test it).
>>
> 
> I've tested with the aggregation of two e1000e emulated devices:
> 
> device_add e1000e,addr=10.1,id=netfn1
> device_add e1000e,multifunction=on,addr=10.0,id=netfn0
> 
> And I don't quite see what "bad things" could happen. We're resetting the
> machine to a stable state and the new OS instance will just not see the
> removed function (just like only function netfn0 got added).


Interesting. Thanks for looking this up.

Given that the intention of this patch was a simplification of the existing
design, without changing what we currently do regarding PCI functions and unplug,
and apparently it just did that, let's drop it.



DHB


> 
>> What I'm doing in this patch is making it clearer that non-zero functions does
>> not matter for the unplug of multifunction PCI devices. We'll detach the whole
>> slot when function zero is unplugged, regardless of the unplug state of other
>> functions.
>>
> 
> I understand that hot-unplug of non-zero functions is special cased while
> the guest OS is running, but this doesn't really applies if the guest is
> rebooted. Code simplification is not a good reason enough, at least for me,
> to alter the "reset complete all pending hotplugs" general rule.
> 
>> The only reason why I didn't make 'device_del' to error out when used with a
>> non-zero function is because we allowed this in the past and it would break user
>> ABI. Otherwise, FWIW, "device_del <non-zero function>" is doing nothing since
>> commit "spapr_pci: remove all child functions in function zero unplug".
>>
>>
>> Thanks,
>>
>>
>> DHB
>>
>>
>>
>>>
>>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>>> ---
>>>>    hw/ppc/spapr_pci.c | 44 ++++++++++++++++----------------------------
>>>>    1 file changed, 16 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index f1c7479816..1791d98a49 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -1709,38 +1709,26 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
>>>>                return;
>>>>            }
>>>>    
>>>> -        /* ensure any other present functions are pending unplug */
>>>> -        if (PCI_FUNC(pdev->devfn) == 0) {
>>>> -            for (i = 1; i < 8; i++) {
>>>> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
>>>> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
>>>> -                state = func_drck->dr_entity_sense(func_drc);
>>>> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT
>>>> -                    && !spapr_drc_unplug_requested(func_drc)) {
>>>> -                    /*
>>>> -                     * Attempting to remove function 0 of a multifunction
>>>> -                     * device will will cascade into removing all child
>>>> -                     * functions, even if their unplug weren't requested
>>>> -                     * beforehand.
>>>> -                     */
>>>> -                    spapr_drc_detach(func_drc);
>>>> -                }
>>>> -            }
>>>> +        /*
>>>> +         * The hotunplug itself will occur when unplugging function 0,
>>>> +         * regardless of marking any other functions DRCs as pending
>>>> +         * unplug beforehand (since 02a1536eee33).
>>>> +         */
>>>> +        if (PCI_FUNC(pdev->devfn) != 0) {
>>>> +            return;
>>>>            }
>>>>    
>>>> -        spapr_drc_detach(drc);
>>>> +        for (i = 7; i >= 0; i--) {
>>>> +            func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
>>>> +            func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
>>>> +            state = func_drck->dr_entity_sense(func_drc);
>>>>    
>>>> -        /* if this isn't func 0, defer unplug event. otherwise signal removal
>>>> -         * for all present functions
>>>> -         */
>>>> -        if (PCI_FUNC(pdev->devfn) == 0) {
>>>> -            for (i = 7; i >= 0; i--) {
>>>> -                func_drc = drc_from_devfn(phb, chassis, PCI_DEVFN(slotnr, i));
>>>> -                func_drck = SPAPR_DR_CONNECTOR_GET_CLASS(func_drc);
>>>> -                state = func_drck->dr_entity_sense(func_drc);
>>>> -                if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
>>>> -                    spapr_hotplug_req_remove_by_index(func_drc);
>>>> +            if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
>>>> +                /* Mark the DRC as requested unplug if needed. */
>>>> +                if (!spapr_drc_unplug_requested(func_drc)) {
>>>> +                    spapr_drc_detach(func_drc);
>>>>                    }
>>>> +                spapr_hotplug_req_remove_by_index(func_drc);
>>>>                }
>>>>            }
>>>>        }
>>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
  2021-02-15 10:40   ` Greg Kurz
@ 2021-02-17  0:51     ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  0:51 UTC (permalink / raw)
  To: Greg Kurz; +Cc: Daniel Henrique Barboza, qemu-ppc, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 4102 bytes --]

On Mon, Feb 15, 2021 at 11:40:06AM +0100, Greg Kurz wrote:
> On Thu, 11 Feb 2021 19:52:40 -0300
> Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> 
> > drc_isolate_logical() is used to move the DRC from the "Configured" to the
> > "Available" state, erroring out if the DRC is in the unexpected "Unisolate"
> > state and doing nothing (with RTAS_OUT_SUCCESS) if the DRC is already in
> > "Available" or in "Unusable" state.
> > 
> > When moving from "Configured" to "Available", the DRC is moved to the
> > LOGICAL_AVAILABLE state, a drc->unplug_requested check is done and, if true,
> > spapr_drc_detach() is called.
> > 
> > What spapr_drc_detach() does then is:
> > 
> > - set drc->unplug_requested to true. In fact, this is the only place where
> > unplug_request is set to true;
> > - does nothing else if drc->state != drck->empty_state. If the DRC state
> > is equal to drck->empty_state, spapr_drc_release() is called. For logical
> > DRCs, drck->empty_state = LOGICAL_UNUSABLE.
> > 
> > In short, calling spapr_drc_detach() in drc_isolate_logical() does nothing. It'll
> > set unplug_request to true again ('again' since it was already true - otherwise the
> > function wouldn't be called), and will return without calling spapr_drc_release()
> > because the DRC is not in LOGICAL_UNUSABLE, since drc_isolate_logical() just
> > moved it to LOGICAL_AVAILABLE. The only place where the logical DRC is released
> > is when called from drc_set_unusable(), when it is moved to the "Unusable" state.
> > As it should, according to PAPR.
> > 
> > Even though calling spapr_drc_detach() in drc_isolate_logical() is benign, removing
> > it will avoid further thought about the matter. So let's go ahead and do that.
> > 
> 
> Good catch. This path looks useless for logical DRCs indeed.
> 
> > As a note, this logic was introduced in commit bbf5c878ab76. Since then, the DRC
> > handling code was refactored and enhanced, and PAPR itself went through some
> > changes in the DRC area as well. It is expected that some assumptions we had back
> > then are now deprecated.
> > 
> 
> As specified in [1]:
> 
> Please do not use lines that are longer than 76 characters in your
> commit message (so that the text still shows up nicely with "git show"
> in a 80-columns terminal window).
> 
> [1] https://wiki.qemu.org/Contribute/SubmitAPatch#Write_a_meaningful_commit_message

I've applied this patch, but re-wrapped the commit message.

> > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > ---
> >  hw/ppc/spapr_drc.c | 13 -------------
> >  1 file changed, 13 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> > index 8571d5bafe..84bd3c881f 100644
> > --- a/hw/ppc/spapr_drc.c
> > +++ b/hw/ppc/spapr_drc.c
> > @@ -132,19 +132,6 @@ static uint32_t drc_isolate_logical(SpaprDrc *drc)
> >  
> >      drc->state = SPAPR_DRC_STATE_LOGICAL_AVAILABLE;
> >  
> > -    /* if we're awaiting release, but still in an unconfigured state,
> > -     * it's likely the guest is still in the process of configuring
> > -     * the device and is transitioning the devices to an ISOLATED
> > -     * state as a part of that process. so we only complete the
> > -     * removal when this transition happens for a device in a
> > -     * configured state, as suggested by the state diagram from PAPR+
> > -     * 2.7, 13.4
> > -     */
> > -    if (drc->unplug_requested) {
> > -        uint32_t drc_index = spapr_drc_index(drc);
> > -        trace_spapr_drc_set_isolation_state_finalizing(drc_index);
> 
> I was expecting a change in hw/ppc/trace-events to ditch this trace,
> but it is still called by drc_isolate_physical(), so we're good.
> 
> Reviewed-by: Greg Kurz <groug@kaod.org>
> 
> > -        spapr_drc_detach(drc);
> > -    }
> >      return RTAS_OUT_SUCCESS;
> >  }
> >  
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  2021-02-16 17:44         ` Daniel Henrique Barboza
@ 2021-02-17  0:54           ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  0:54 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, Greg Kurz, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3477 bytes --]

On Tue, Feb 16, 2021 at 02:44:44PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 2/16/21 2:16 PM, Greg Kurz wrote:
> > On Tue, 16 Feb 2021 13:09:43 -0300
> > Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> > 
> > > 
> > > 
> > > On 2/16/21 12:50 PM, Greg Kurz wrote:
> > > > On Thu, 11 Feb 2021 19:52:41 -0300
> > > > Daniel Henrique Barboza <danielhb413@gmail.com> wrote:
> > > > 
> > > > > When hotunplugging a PCI function we'll branch out the logic in two cases,
> > > > > function zero and non-zero. If non-zero, we'll call spapr_drc_detach() and
> > > > > nothing else. If it's function zero, we'll loop it once between all the
> > > > > functions in the slot to call spapr_drc_detach() on them, and afterwards
> > > > > we'll do another backwards loop where we'll signal the event to the guest.
> > > > > 
> > > > > We can simplify this logic. We can ignore all the DRC handling for non-zero
> > > > > functions, since we'll end up doing that regardless when unplugging function
> > > > > zero. And for function zero, everything can be done in a single loop, since
> > > > > tt doesn't matter if we end up marking the function DRCs as unplug pending in
> > > > > backwards order or not, as long as we call spapr_drc_detach() before issuing
> > > > > the hotunplug event to the guest.
> > > > > 
> > > > > This will also avoid a possible scenario where the user starts to hotunplug
> > > > > the slot, starting with a non-zero function, and then delays/forgets to
> > > > > hotunplug function zero afterwards. This would keep the function DRC marked
> > > > > as unplug requested indefinitely.
> > > > > 
> > > > 
> > > > ... or until the guest is reset, which will no longer happen with this
> > > > patch applied, i.e. breaks the long standing policy that machine reset
> > > > causes pending hot-unplug requests to succeed. I don't see an obvious
> > > > reason to special case non-zero PCI functions.
> > > 
> > > It's not possible to hotunplug the non-zero functions during machine reset for
> > > multifunction PCI devices. We need to unplug the entire slot, and that will only
> > > happen when function zero is unplugged. In fact, I think bad things will happen
> > > in this case you mentioned if we are forcing the removal of non-zero functions
> > > without function zero (spoiler: didn't test it).
> > > 
> > 
> > I've tested with the aggregation of two e1000e emulated devices:
> > 
> > device_add e1000e,addr=10.1,id=netfn1
> > device_add e1000e,multifunction=on,addr=10.0,id=netfn0
> > 
> > And I don't quite see what "bad things" could happen. We're resetting the
> > machine to a stable state and the new OS instance will just not see the
> > removed function (just like only function netfn0 got added).
> 
> 
> Interesting. Thanks for looking this up.
> 
> Given that the intention of this patch was a simplification of the existing
> design, without changing what we currently do regarding PCI functions and unplug,
> and apparently it just did that, let's drop it.

I think that's best.  As Greg says, I think maintaining the behaviour
that reset completes pending hotplugs should be retained, and the
usual constraints on non-zero function hot-unplug don't apply at reset
time.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
  2021-02-11 22:52 ` [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable Daniel Henrique Barboza
@ 2021-02-17  0:57   ` David Gibson
  2021-02-17 10:58   ` Greg Kurz
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  0:57 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 3678 bytes --]

On Thu, Feb 11, 2021 at 07:52:42PM -0300, Daniel Henrique Barboza wrote:
> When moving a physical DRC to "Available", drc_isolate_physical() will
> move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked
> for unplug, call spapr_drc_detach(). For physical DRCs, drck->empty_state
> is STATE_PHYSICAL_POWERON, meaning that we're sure that spapr_drc_detach()
> will end up calling spapr_drc_release() in the end.
> 
> Likewise, for logical DRCs, drc_set_unusable will move the DRC to "Unusable"
> state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is the
> drck->empty_state for logical DRCs. spapr_drc_detach() will call
> spapr_drc_release() in this case as well.
> 
> In both scenarios, spapr_drc_detach() is being used as a spapr_drc_release(),
> wrapper, where we also set unplug_requested (which is already true, otherwise
> spapr_drc_detach() wouldn't be called in the first place) and check if
> drc->state == drck->empty_state, which we also know it's guaranteed to
> be true because we just set it.
> 
> Just use spapr_drc_release() in these functions to be clear of our intentions
> in both these functions.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr_drc.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 84bd3c881f..555a25517d 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -50,6 +50,20 @@ uint32_t spapr_drc_index(SpaprDrc *drc)
>          | (drc->id & DRC_INDEX_ID_MASK);
>  }
>  
> +static void spapr_drc_release(SpaprDrc *drc)
> +{
> +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    drck->release(drc->dev);
> +
> +    drc->unplug_requested = false;
> +    g_free(drc->fdt);
> +    drc->fdt = NULL;
> +    drc->fdt_start_offset = 0;
> +    object_property_del(OBJECT(drc), "device");
> +    drc->dev = NULL;
> +}
> +
>  static uint32_t drc_isolate_physical(SpaprDrc *drc)
>  {
>      switch (drc->state) {
> @@ -68,7 +82,7 @@ static uint32_t drc_isolate_physical(SpaprDrc *drc)
>      if (drc->unplug_requested) {
>          uint32_t drc_index = spapr_drc_index(drc);
>          trace_spapr_drc_set_isolation_state_finalizing(drc_index);
> -        spapr_drc_detach(drc);
> +        spapr_drc_release(drc);
>      }
>  
>      return RTAS_OUT_SUCCESS;
> @@ -209,7 +223,7 @@ static uint32_t drc_set_unusable(SpaprDrc *drc)
>      if (drc->unplug_requested) {
>          uint32_t drc_index = spapr_drc_index(drc);
>          trace_spapr_drc_set_allocation_state_finalizing(drc_index);
> -        spapr_drc_detach(drc);
> +        spapr_drc_release(drc);
>      }
>  
>      return RTAS_OUT_SUCCESS;
> @@ -372,20 +386,6 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>                               NULL, 0);
>  }
>  
> -static void spapr_drc_release(SpaprDrc *drc)
> -{
> -    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -
> -    drck->release(drc->dev);
> -
> -    drc->unplug_requested = false;
> -    g_free(drc->fdt);
> -    drc->fdt = NULL;
> -    drc->fdt_start_offset = 0;
> -    object_property_del(OBJECT(drc), "device");
> -    drc->dev = NULL;
> -}
> -
>  void spapr_drc_detach(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
  2021-02-11 22:52 ` [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() Daniel Henrique Barboza
@ 2021-02-17  0:58   ` David Gibson
  2021-02-17 11:01   ` Greg Kurz
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  0:58 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 5622 bytes --]

On Thu, Feb 11, 2021 at 07:52:43PM -0300, Daniel Henrique Barboza wrote:
> spapr_drc_detach() is not the best name for what the function does.
> The function does not detach the DRC, it makes an uncommited attempt
> to do it. It'll mark the DRC as pending unplug, via the 'unplug_request'
> flag, and only if the DRC state is drck->empty_state it will detach the
> DRC, via spapr_drc_release().
> 
> This is a contrast with its pair spapr_drc_attach(), where the function is
> indeed creating the DRC QOM object. If you know what spapr_drc_attach()
> does, you can be misled into thinking that spapr_drc_detach() is removing
> the DRC from QEMU internal state, which isn't true.
> 
> The current role of this function is better described as a request for
> detach, since there's no guarantee that we're going to detach the DRC in
> the end. Rename the function to spapr_drc_unplug_request to reflect what is is
> doing.
> 
> The initial idea was to change the name to spapr_drc_detach_request(), and
> later on change the unplug_request flag to detach_request. However,
> unplug_request is a migratable boolean for a long time now and renaming it
> is not worth the trouble. spapr_drc_unplug_request() setting drc->unplug_request
> is more natural than spapr_drc_detach_request setting drc->unplug_request.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Good reasoning.

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr.c             | 6 +++---
>  hw/ppc/spapr_drc.c         | 4 ++--
>  hw/ppc/spapr_pci.c         | 2 +-
>  hw/ppc/trace-events        | 2 +-
>  include/hw/ppc/spapr_drc.h | 2 +-
>  5 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 85fe65f894..b066df68cb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3654,7 +3654,7 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
>                                addr / SPAPR_MEMORY_BLOCK_SIZE);
>          g_assert(drc);
>  
> -        spapr_drc_detach(drc);
> +        spapr_drc_unplug_request(drc);
>          addr += SPAPR_MEMORY_BLOCK_SIZE;
>      }
>  
> @@ -3722,7 +3722,7 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>      g_assert(drc);
>  
>      if (!spapr_drc_unplug_requested(drc)) {
> -        spapr_drc_detach(drc);
> +        spapr_drc_unplug_request(drc);
>          spapr_hotplug_req_remove_by_index(drc);
>      }
>  }
> @@ -3985,7 +3985,7 @@ static void spapr_phb_unplug_request(HotplugHandler *hotplug_dev,
>      assert(drc);
>  
>      if (!spapr_drc_unplug_requested(drc)) {
> -        spapr_drc_detach(drc);
> +        spapr_drc_unplug_request(drc);
>          spapr_hotplug_req_remove_by_index(drc);
>      }
>  }
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 555a25517d..67041fb212 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -386,11 +386,11 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>                               NULL, 0);
>  }
>  
> -void spapr_drc_detach(SpaprDrc *drc)
> +void spapr_drc_unplug_request(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
> -    trace_spapr_drc_detach(spapr_drc_index(drc));
> +    trace_spapr_drc_unplug_request(spapr_drc_index(drc));
>  
>      g_assert(drc->dev);
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 1791d98a49..9334ba5dbb 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1726,7 +1726,7 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
>              if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
>                  /* Mark the DRC as requested unplug if needed. */
>                  if (!spapr_drc_unplug_requested(func_drc)) {
> -                    spapr_drc_detach(func_drc);
> +                    spapr_drc_unplug_request(func_drc);
>                  }
>                  spapr_hotplug_req_remove_by_index(func_drc);
>              }
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 1e91984526..b4bbfbb013 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -50,7 +50,7 @@ spapr_drc_set_allocation_state(uint32_t index, int state) "drc: 0x%"PRIx32", sta
>  spapr_drc_set_allocation_state_finalizing(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_set_configured(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_attach(uint32_t index) "drc: 0x%"PRIx32
> -spapr_drc_detach(uint32_t index) "drc: 0x%"PRIx32
> +spapr_drc_unplug_request(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_awaiting_quiesce(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_reset(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_realize(uint32_t index) "drc: 0x%"PRIx32
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 8982927d5c..02a63b3666 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -243,7 +243,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>   * beforehand (eg. check drc->dev at pre-plug).
>   */
>  void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
> -void spapr_drc_detach(SpaprDrc *drc);
> +void spapr_drc_unplug_request(SpaprDrc *drc);
>  
>  /*
>   * Reset all DRCs, causing pending hot-plug/unplug requests to complete.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer
  2021-02-11 22:52 ` [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer Daniel Henrique Barboza
@ 2021-02-17  1:14   ` David Gibson
  2021-02-17  1:20   ` David Gibson
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  1:14 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 6889 bytes --]

On Thu, Feb 11, 2021 at 07:52:44PM -0300, Daniel Henrique Barboza wrote:
> The LoPAR spec provides no way for the guest kernel to report failure of
> hotplug/hotunplug events. This wouldn't be bad if those operations were
> granted to always succeed, but that's far for the reality.
> 
> What ends up happening is that, in the case of a failed hotunplug,
> regardless of whether it was a QEMU error or a guest misbehavior, the pSeries
> machine is retaining the unplug state of the device in the running guest.
> This state is cleanup in machine reset, where it is assumed that this state
> represents a device that is pending unplug, and the device is hotunpluged
> from the board. Until the reset occurs, any hotunplug operation of the same
> device is forbid because there is a pending unplug state.
> 
> This behavior has at least one undesirable side effect. A long standing pending
> unplug state is, more often than not, the result of a hotunplug error. The user
> had to dealt with it, since retrying to unplug the device is noy allowed, and then
> in the machine reset we're removing the device from the guest. This means that
> we're failing the user twice - failed to hotunplug when asked, then hotunplugged
> without notice.
> 
> Solutions to this problem range between trying to predict when the hotunplug will
> fail and forbid the operation from the QEMU layer, from opening up the IRQ queue
> to allow for multiple hotunplug attempts, from telling the users to 'reboot the
> machine if something goes wrong'. The first solution is flawed because we can't
> fully predict guest behavior from QEMU, the second solution is a trial and error
> remediation that counts on a hope that the unplug will eventually succeed, and the
> third is ... well.
> 
> This patch introduces a crude, but effective solution to hotunplug errors in
> the pSeries machine. For each unplug done, we'll timeout after some time. If
> a certain amount of time passes, we'll cleanup the hotunplug state from the machine.
> During the timeout period, any unplug operations in the same device will still
> be blocked. After that, we'll assume that the guest failed the operation, and
> allow the user to try again. If the timeout is too short we'll prevent legitimate
> hotunplug situations to occur, so we'll need to overestimate the regular time
> an unplug operation takes to succeed to account that.
> 
> The true solution for the hotunplug errors in the pSeries machines is a PAPR change
> to allow for the guest to warn the platform about it. For now, the work done in this
> timeout design can be used for the new PAPR 'abort hcall' in the future, given that
> for both cases we'll need code to cleanup the existing unplug states of the DRCs.
> 
> At this moment we're adding the basic wiring of the timer into the DRC. Next patch
> will use the timer to timeout failed CPU hotunplugs.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr_drc.c         | 36 ++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h |  2 ++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 67041fb212..c88bb524c5 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -57,6 +57,8 @@ static void spapr_drc_release(SpaprDrc *drc)
>      drck->release(drc->dev);
>  
>      drc->unplug_requested = false;
> +    timer_del(drc->unplug_timeout_timer);
> +
>      g_free(drc->fdt);
>      drc->fdt = NULL;
>      drc->fdt_start_offset = 0;
> @@ -453,6 +455,24 @@ static const VMStateDescription vmstate_spapr_drc_unplug_requested = {
>      }
>  };
>  
> +static bool spapr_drc_unplug_timeout_timer_needed(void *opaque)
> +{
> +    SpaprDrc *drc = opaque;
> +
> +    return timer_pending(drc->unplug_timeout_timer);
> +}
> +
> +static const VMStateDescription vmstate_spapr_drc_unplug_timeout_timer = {
> +    .name = "DRC unplug timeout timer",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_drc_unplug_timeout_timer_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_TIMER_PTR(unplug_timeout_timer, SpaprDrc),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};

I think we can probably avoid adding extra data to the migration
stream.  Because the exact length of the timeout isn't super
important, so long as it's "long enough" I think it's acceptable if we
restart the timeout period after a migration.  That can be
accomplished with a post-load hook that just restarts the timer at the
initial duration if the DRC is in the unplug_requested state.

>  static bool spapr_drc_needed(void *opaque)
>  {
>      SpaprDrc *drc = opaque;
> @@ -486,10 +506,20 @@ static const VMStateDescription vmstate_spapr_drc = {
>      },
>      .subsections = (const VMStateDescription * []) {
>          &vmstate_spapr_drc_unplug_requested,
> +        &vmstate_spapr_drc_unplug_timeout_timer,
>          NULL
>      }
>  };
>  
> +static void drc_unplug_timeout_cb(void *opaque)
> +{
> +    SpaprDrc *drc = opaque;
> +
> +    if (drc->unplug_requested) {
> +        drc->unplug_requested = false;
> +    }
> +}
> +
>  static void drc_realize(DeviceState *d, Error **errp)
>  {
>      SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> @@ -512,6 +542,11 @@ static void drc_realize(DeviceState *d, Error **errp)
>      object_property_add_alias(root_container, link_name,
>                                drc->owner, child_name);
>      g_free(link_name);
> +
> +    drc->unplug_timeout_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
> +                                             drc_unplug_timeout_cb,
> +                                             drc);
> +
>      vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
>                       drc);
>      trace_spapr_drc_realize_complete(spapr_drc_index(drc));
> @@ -529,6 +564,7 @@ static void drc_unrealize(DeviceState *d)
>      name = g_strdup_printf("%x", spapr_drc_index(drc));
>      object_property_del(root_container, name);
>      g_free(name);
> +    timer_free(drc->unplug_timeout_timer);
>  }
>  
>  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 02a63b3666..b2e6222d09 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -187,6 +187,8 @@ typedef struct SpaprDrc {
>      bool unplug_requested;
>      void *fdt;
>      int fdt_start_offset;
> +
> +    QEMUTimer *unplug_timeout_timer;
>  } SpaprDrc;
>  
>  struct SpaprMachineState;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer
  2021-02-11 22:52 ` [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer Daniel Henrique Barboza
  2021-02-17  1:14   ` David Gibson
@ 2021-02-17  1:20   ` David Gibson
  1 sibling, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  1:20 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 6942 bytes --]

On Thu, Feb 11, 2021 at 07:52:44PM -0300, Daniel Henrique Barboza wrote:
> The LoPAR spec provides no way for the guest kernel to report failure of
> hotplug/hotunplug events. This wouldn't be bad if those operations were
> granted to always succeed, but that's far for the reality.
> 
> What ends up happening is that, in the case of a failed hotunplug,
> regardless of whether it was a QEMU error or a guest misbehavior, the pSeries
> machine is retaining the unplug state of the device in the running guest.
> This state is cleanup in machine reset, where it is assumed that this state
> represents a device that is pending unplug, and the device is hotunpluged
> from the board. Until the reset occurs, any hotunplug operation of the same
> device is forbid because there is a pending unplug state.
> 
> This behavior has at least one undesirable side effect. A long standing pending
> unplug state is, more often than not, the result of a hotunplug error. The user
> had to dealt with it, since retrying to unplug the device is noy allowed, and then
> in the machine reset we're removing the device from the guest. This means that
> we're failing the user twice - failed to hotunplug when asked, then hotunplugged
> without notice.
> 
> Solutions to this problem range between trying to predict when the hotunplug will
> fail and forbid the operation from the QEMU layer, from opening up the IRQ queue
> to allow for multiple hotunplug attempts, from telling the users to 'reboot the
> machine if something goes wrong'. The first solution is flawed because we can't
> fully predict guest behavior from QEMU, the second solution is a trial and error
> remediation that counts on a hope that the unplug will eventually succeed, and the
> third is ... well.
> 
> This patch introduces a crude, but effective solution to hotunplug errors in
> the pSeries machine. For each unplug done, we'll timeout after some time. If
> a certain amount of time passes, we'll cleanup the hotunplug state from the machine.
> During the timeout period, any unplug operations in the same device will still
> be blocked. After that, we'll assume that the guest failed the operation, and
> allow the user to try again. If the timeout is too short we'll prevent legitimate
> hotunplug situations to occur, so we'll need to overestimate the regular time
> an unplug operation takes to succeed to account that.
> 
> The true solution for the hotunplug errors in the pSeries machines is a PAPR change
> to allow for the guest to warn the platform about it. For now, the work done in this
> timeout design can be used for the new PAPR 'abort hcall' in the future, given that
> for both cases we'll need code to cleanup the existing unplug states of the DRCs.
> 
> At this moment we're adding the basic wiring of the timer into the DRC. Next patch
> will use the timer to timeout failed CPU hotunplugs.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr_drc.c         | 36 ++++++++++++++++++++++++++++++++++++
>  include/hw/ppc/spapr_drc.h |  2 ++
>  2 files changed, 38 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 67041fb212..c88bb524c5 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -57,6 +57,8 @@ static void spapr_drc_release(SpaprDrc *drc)
>      drck->release(drc->dev);
>  
>      drc->unplug_requested = false;
> +    timer_del(drc->unplug_timeout_timer);
> +
>      g_free(drc->fdt);
>      drc->fdt = NULL;
>      drc->fdt_start_offset = 0;
> @@ -453,6 +455,24 @@ static const VMStateDescription vmstate_spapr_drc_unplug_requested = {
>      }
>  };
>  
> +static bool spapr_drc_unplug_timeout_timer_needed(void *opaque)
> +{
> +    SpaprDrc *drc = opaque;
> +
> +    return timer_pending(drc->unplug_timeout_timer);
> +}
> +
> +static const VMStateDescription vmstate_spapr_drc_unplug_timeout_timer = {
> +    .name = "DRC unplug timeout timer",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .needed = spapr_drc_unplug_timeout_timer_needed,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_TIMER_PTR(unplug_timeout_timer, SpaprDrc),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static bool spapr_drc_needed(void *opaque)
>  {
>      SpaprDrc *drc = opaque;
> @@ -486,10 +506,20 @@ static const VMStateDescription vmstate_spapr_drc = {
>      },
>      .subsections = (const VMStateDescription * []) {
>          &vmstate_spapr_drc_unplug_requested,
> +        &vmstate_spapr_drc_unplug_timeout_timer,
>          NULL
>      }
>  };
>  
> +static void drc_unplug_timeout_cb(void *opaque)
> +{
> +    SpaprDrc *drc = opaque;
> +
> +    if (drc->unplug_requested) {
> +        drc->unplug_requested = false;
> +    }

Sorry, forgot to mention in first pass - I think we want some kind of
reporting here.  At least a trace, and maybe also a warn_report() or
the like.

Hrm.. looking wider, I wonder if we should add a DEVICE_DELETE_FAILED
message to QAPI to mirror the DEVICE_DELETED message.  Fixing PAPR is
pretty tedious, but fixing at least qemu's own interfaces is a bit
more approachable.  That could certainly be a follow up enhancement,
though.

> +}
> +
>  static void drc_realize(DeviceState *d, Error **errp)
>  {
>      SpaprDrc *drc = SPAPR_DR_CONNECTOR(d);
> @@ -512,6 +542,11 @@ static void drc_realize(DeviceState *d, Error **errp)
>      object_property_add_alias(root_container, link_name,
>                                drc->owner, child_name);
>      g_free(link_name);
> +
> +    drc->unplug_timeout_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
> +                                             drc_unplug_timeout_cb,
> +                                             drc);
> +
>      vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc,
>                       drc);
>      trace_spapr_drc_realize_complete(spapr_drc_index(drc));
> @@ -529,6 +564,7 @@ static void drc_unrealize(DeviceState *d)
>      name = g_strdup_printf("%x", spapr_drc_index(drc));
>      object_property_del(root_container, name);
>      g_free(name);
> +    timer_free(drc->unplug_timeout_timer);
>  }
>  
>  SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 02a63b3666..b2e6222d09 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -187,6 +187,8 @@ typedef struct SpaprDrc {
>      bool unplug_requested;
>      void *fdt;
>      int fdt_start_offset;
> +
> +    QEMUTimer *unplug_timeout_timer;
>  } SpaprDrc;
>  
>  struct SpaprMachineState;

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs
  2021-02-11 22:52 ` [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs Daniel Henrique Barboza
@ 2021-02-17  1:23   ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  1:23 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: Xujun Ma, qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 6354 bytes --]

On Thu, Feb 11, 2021 at 07:52:45PM -0300, Daniel Henrique Barboza wrote:
> There is a reliable way to make a CPU hotunplug fail in the pseries
> machine. Hotplug a CPU A, then offline all other CPUs inside the guest
> but A. When trying to hotunplug A the guest kernel will refuse to do
> it, because A is now the last online CPU of the guest. PAPR has no
> 'error callback' in this situation to report back to the platform,
> so the guest kernel will deny the unplug in silent and QEMU will never
> know what happened. The unplug pending state of A will remain until
> the guest is shutdown or rebooted.
> 
> Previous attempts of fixing it (see [1] and [2]) were aimed at trying to
> mitigate the effects of the problem. In [1] we were trying to guess which
> guest CPUs were online to forbid hotunplug of the last online CPU in the QEMU
> layer, avoiding the scenario described above because QEMU is now failing
> in behalf of the guest. This is not robust because the last online CPU of
> the guest can change while we're in the middle of the unplug process, and
> our initial assumptions are now invalid. In [2] we were accepting that our
> unplug process is uncertain and the user should be allowed to spam the IRQ
> hotunplug queue of the guest in case the CPU hotunplug fails.
> 
> This patch presents another alternative, using the timeout infrastructure
> introduced in the previous patch. CPU hotunplugs in the pSeries machine will
> now timeout after 15 seconds. This is a long time for a single CPU unplug
> to occur, regardless of guest load - although the user is *strongly* encouraged
> to *not* hotunplug devices from a guest under high load - and we can be sure
> that something went wrong if it takes longer than that for the guest to release
> the CPU (the same can't be said about memory hotunplug - more on that in the
> next patch).
> 
> Timing out the unplug operation will reset the unplug state of the CPU and
> allow the user to try it again, regardless of the error situation that
> prevented the hotunplug to occur. Of all the not so pretty fixes/mitigations
> for CPU hotunplug errors in pSeries, timing out the operation is an admission
> that we have no control in the process, and must assume the worst case if
> the operation doesn't succeed in a sensible time frame.
> 
> [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg03353.html
> [2] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html
> 
> Reported-by: Xujun Ma <xuma@redhat.com>
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1911414
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr.c             |  4 ++++
>  hw/ppc/spapr_drc.c         | 17 +++++++++++++++++
>  include/hw/ppc/spapr_drc.h |  3 +++
>  3 files changed, 24 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index b066df68cb..ecce8abf14 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3724,6 +3724,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>      if (!spapr_drc_unplug_requested(drc)) {
>          spapr_drc_unplug_request(drc);
>          spapr_hotplug_req_remove_by_index(drc);
> +    } else {
> +        error_setg(errp, "core-id %d unplug is still pending, %d seconds "
> +                   "timeout remaining",
> +                   cc->core_id, spapr_drc_unplug_timeout_remaining_sec(drc));

Reporting this information is a nice touch.

>      }
>  }
>  
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index c88bb524c5..c143bfb6d3 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -398,6 +398,12 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
>  
>      drc->unplug_requested = true;
>  
> +    if (drck->unplug_timeout_seconds != 0) {
> +        timer_mod(drc->unplug_timeout_timer,
> +                  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> +                  drck->unplug_timeout_seconds * 1000);
> +    }
> +
>      if (drc->state != drck->empty_state) {
>          trace_spapr_drc_awaiting_quiesce(spapr_drc_index(drc));
>          return;
> @@ -406,6 +412,16 @@ void spapr_drc_unplug_request(SpaprDrc *drc)
>      spapr_drc_release(drc);
>  }
>  
> +int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc)
> +{
> +    if (drc->unplug_requested && timer_pending(drc->unplug_timeout_timer)) {
> +        return (qemu_timeout_ns_to_ms(drc->unplug_timeout_timer->expire_time) -
> +                qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL)) / 1000;

Hmm.  Reaching into the timer's internal fields isn't ideal.  I wonder
if we should add a helper in the timer code for reporting this information.

> +    }
> +
> +    return 0;
> +}
> +
>  bool spapr_drc_reset(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> @@ -706,6 +722,7 @@ static void spapr_drc_cpu_class_init(ObjectClass *k, void *data)
>      drck->drc_name_prefix = "CPU ";
>      drck->release = spapr_core_release;
>      drck->dt_populate = spapr_core_dt_populate;
> +    drck->unplug_timeout_seconds = 15;
>  }
>  
>  static void spapr_drc_pci_class_init(ObjectClass *k, void *data)
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index b2e6222d09..26599c385a 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -211,6 +211,8 @@ typedef struct SpaprDrcClass {
>  
>      int (*dt_populate)(SpaprDrc *drc, struct SpaprMachineState *spapr,
>                         void *fdt, int *fdt_start_offset, Error **errp);
> +
> +    int unplug_timeout_seconds;
>  } SpaprDrcClass;
>  
>  typedef struct SpaprDrcPhysical {
> @@ -246,6 +248,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>   */
>  void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
>  void spapr_drc_unplug_request(SpaprDrc *drc);
> +int spapr_drc_unplug_timeout_remaining_sec(SpaprDrc *drc);
>  
>  /*
>   * Reset all DRCs, causing pending hot-plug/unplug requests to complete.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
  2021-02-11 22:52 ` [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state Daniel Henrique Barboza
@ 2021-02-17  2:31   ` David Gibson
  2021-02-19 20:04     ` Daniel Henrique Barboza
  2021-02-19 21:31     ` Daniel Henrique Barboza
  0 siblings, 2 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  2:31 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 7957 bytes --]

On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> Handling errors in memory hotunplug in the pSeries machine is more complex
> than any other device type, because there are all the complications that other
> devices has, and more.
> 
> For instance, determining a timeout for a DIMM hotunplug must consider if it's a
> Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs.
> The size of the DIMM is also a factor, given that longer DIMMs naturally takes
> longer to be hotunplugged from the kernel. And there's also the guest memory usage to
> be considered: if there's a process that is consuming memory that would be lost by
> the DIMM unplug, the kernel will postpone the unplug process until the process
> finishes, and then initiate the regular hotunplug process. The first two
> considerations are manageable, but the last one is a deal breaker.
> 
> There is no sane way for the pSeries machine to determine the memory load in the guest
> when attempting a DIMM hotunplug - and even if there was a way, the guest can start
> using all the RAM in the middle of the unplug process and invalidate our previous
> assumptions - and in result we can't even begin to calculate a timeout for the
> operation. This means that we can't implement a viable timeout mechanism for memory
> unplug in pSeries.
> 
> Going back to why we would consider an unplug timeout, the reason is that we can't
> know if the kernel is giving up the unplug. Turns out that, sometimes, we can.
> Consider a failed memory hotunplug attempt where the kernel will error out with
> the following message:
> 
> 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'
> 
> This happens when there is a LMB that the kernel gave up in removing, and the LMBs
> marked for removal of the same DIMM are now being added back. This process happens

We need to be a little careful about terminology here.  From the
guest's point of view, there's no such thing as a DIMM, only LMBs.
What the guest is doing here is essentially rejecting a single "index
+ number" DRC unplug request, which corresponds to one DIMM on the
qemu side.

> in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and
> after that update_lmb_associativity_index(). In this function, the kernel is configuring
> the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in
> section "ibm,configure-connector RTAS Call":
> 
> 'A subsequent sequence of calls to ibm,configure-connector with the same entry from
> the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of
> devices which were not completely configured.'
> 
> We can use this kernel behavior in our favor. If a DRC connector reconfiguration
> for a LMB that we marked as unplug pending happens, this indicates that the kernel
> changed its mind about the unplug and is reasserting that it will keep using the
> DIMM. In this case, it's safe to assume that the whole DIMM unplug was cancelled.
> 
> This patch hops into rtas_ibm_configure_connector() and, in the scenario described
> above, clear the unplug state for the DIMM device. This will not solve all the
> problems we still have with memory unplug, but it will cover this case where the
> kernel reconfigures LMBs after a failed unplug. We are a bit more resilient,
> without using an unreliable timeout, and we didn't make the remaining error cases
> any worse.

I wonder if we could use this as a beginning of a hotplug failure
reporting mechanism.  As noted, this is explicitly allowed by PAPR and
I think in general it makes sense that a configure-connector would
re-assert that the guest is using the resource and we can't unplug it.

Could we extend guests to do an indicative configure-connector on any
unplug it knows it can't complete?  Or if configure-connector is too
disruptive could we use an (extra) H_SET_INDICATOR to "UNISOLATE"
state? If I'm reading right, that should be both permitted and a no-op
for existing PAPR implementations, so it should be a pretty safe way
to add that indication.

> 
> [1] arch/powerpc/platforms/pseries/hotplug-memory.c
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c         | 30 ++++++++++++++++++++++++++++++
>  hw/ppc/spapr_drc.c     | 14 ++++++++++++++
>  include/hw/ppc/spapr.h |  2 ++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ecce8abf14..4bcded4a1a 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
>      return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
>  }
>  
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm)
> +{
> +    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
> +    SpaprDrc *drc;
> +    uint32_t nr_lmbs;
> +    uint64_t size, addr_start, addr;
> +    int i;
> +
> +    if (ds) {
> +        spapr_pending_dimm_unplugs_remove(spapr, ds);
> +    }

Hrm... how would !ds arise?  Could this just be an assert?

> +
> +    size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
> +    nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
> +
> +    addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
> +                                          &error_abort);
> +
> +    addr = addr_start;
> +    for (i = 0; i < nr_lmbs; i++) {
> +        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
> +                              addr / SPAPR_MEMORY_BLOCK_SIZE);
> +        g_assert(drc);
> +
> +        drc->unplug_requested = false;
> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
> +    }
> +}
> +
>  /* Callback to be called during DRC release. */
>  void spapr_lmb_release(DeviceState *dev)
>  {
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index c143bfb6d3..eae941233a 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -1230,6 +1230,20 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
>  
>      drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
> +    /*
> +     * This indicates that the kernel is reconfiguring a LMB due to
> +     * a failed hotunplug. Clear the pending unplug state for the whole
> +     * DIMM.
> +     */
> +    if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB &&
> +        drc->unplug_requested) {
> +
> +        /* This really shouldn't happen in this point, but ... */
> +        g_assert(drc->dev);

I'm a little worried that a buggy or malicious guest could trigger
this assert.

> +
> +        spapr_clear_pending_dimm_unplug_state(spapr, PC_DIMM(drc->dev));
> +    }
> +
>      if (!drc->fdt) {
>          void *fdt;
>          int fdt_size;
> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
> index ccbeeca1de..5bcc8f3bb8 100644
> --- a/include/hw/ppc/spapr.h
> +++ b/include/hw/ppc/spapr.h
> @@ -847,6 +847,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>  int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
>  void spapr_clear_pending_events(SpaprMachineState *spapr);
>  void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> +                                           PCDIMMDevice *dimm);
>  int spapr_max_server_number(SpaprMachineState *spapr);
>  void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>                        uint64_t pte0, uint64_t pte1);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration
  2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
                   ` (6 preceding siblings ...)
  2021-02-11 22:52 ` [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state Daniel Henrique Barboza
@ 2021-02-17  2:33 ` David Gibson
  7 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-17  2:33 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

On Thu, Feb 11, 2021 at 07:52:39PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This is marked as a v3 as it started as a result of discussions that
> followed the v2 [1]. 
> 
> The idea with this series is to add CPU hotunplug timeout to avoid the
> situations where the kernel refuses to release the CPU. The reasoning
> for a timeout approach is described in patch 05.
> 
> While investigating putting a timeout in memory hotunplug, I have found
> out that we have a way to determine, at least in some cases, when the kernel
> refuses to release the DIMM during a memory hotunplug. This alleviate one
> of the most common issues (at least AFAIK) with memory hotunplug and it
> made me gave up attempting to put a timeout in memory hotunplug altogether.
> 
> At this point I didn't add timeouts for PCI hotunplug operations, but it
> is trivial to do so if desirable.
> 
> The series goes as follows:
> 
> - Patches 1-4: DRC simplifications/cleanups. The idea with these
>   cleanups were to trim the spapr_drc_detach use as much as possible,
>   since the function would be used to start the timeout timer
> 
> - Patch 5: timeout timer infrastructure
> 
> - Patch 6: add cpu unplug timeout
> 
> - Patch 7: reset DIMM unplug state when the kernel reconfigures the DRC
>   connector

Very nice start.  More comments throughout.

> 
> 
> 
> v2 link: [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html
> 
> 
> Daniel Henrique Barboza (7):
>   spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
>   spapr_pci.c: simplify spapr_pci_unplug_request() function handling
>   spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
>   spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
>   spapr_drc.c: introduce unplug_timeout_timer
>   spapr_drc.c: add hotunplug timeout for CPUs
>   spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
> 
>  hw/ppc/spapr.c             |  40 ++++++++++++-
>  hw/ppc/spapr_drc.c         | 116 +++++++++++++++++++++++++++----------
>  hw/ppc/spapr_pci.c         |  44 +++++---------
>  hw/ppc/trace-events        |   2 +-
>  include/hw/ppc/spapr.h     |   2 +
>  include/hw/ppc/spapr_drc.h |   7 ++-
>  6 files changed, 147 insertions(+), 64 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
  2021-02-11 22:52 ` [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable Daniel Henrique Barboza
  2021-02-17  0:57   ` David Gibson
@ 2021-02-17 10:58   ` Greg Kurz
  1 sibling, 0 replies; 28+ messages in thread
From: Greg Kurz @ 2021-02-17 10:58 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, david

On Thu, 11 Feb 2021 19:52:42 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> When moving a physical DRC to "Available", drc_isolate_physical() will
> move the DRC state to STATE_PHYSICAL_POWERON and, if the DRC is marked
> for unplug, call spapr_drc_detach(). For physical DRCs, drck->empty_state
> is STATE_PHYSICAL_POWERON, meaning that we're sure that spapr_drc_detach()
> will end up calling spapr_drc_release() in the end.
> 
> Likewise, for logical DRCs, drc_set_unusable will move the DRC to "Unusable"
> state, setting drc->state to STATE_LOGICAL_UNUSABLE, which is the
> drck->empty_state for logical DRCs. spapr_drc_detach() will call
> spapr_drc_release() in this case as well.
> 
> In both scenarios, spapr_drc_detach() is being used as a spapr_drc_release(),
> wrapper, where we also set unplug_requested (which is already true, otherwise
> spapr_drc_detach() wouldn't be called in the first place) and check if
> drc->state == drck->empty_state, which we also know it's guaranteed to
> be true because we just set it.
> 
> Just use spapr_drc_release() in these functions to be clear of our intentions
> in both these functions.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/ppc/spapr_drc.c | 32 ++++++++++++++++----------------
>  1 file changed, 16 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 84bd3c881f..555a25517d 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -50,6 +50,20 @@ uint32_t spapr_drc_index(SpaprDrc *drc)
>          | (drc->id & DRC_INDEX_ID_MASK);
>  }
>  
> +static void spapr_drc_release(SpaprDrc *drc)
> +{
> +    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> +
> +    drck->release(drc->dev);
> +
> +    drc->unplug_requested = false;
> +    g_free(drc->fdt);
> +    drc->fdt = NULL;
> +    drc->fdt_start_offset = 0;
> +    object_property_del(OBJECT(drc), "device");
> +    drc->dev = NULL;
> +}
> +
>  static uint32_t drc_isolate_physical(SpaprDrc *drc)
>  {
>      switch (drc->state) {
> @@ -68,7 +82,7 @@ static uint32_t drc_isolate_physical(SpaprDrc *drc)
>      if (drc->unplug_requested) {
>          uint32_t drc_index = spapr_drc_index(drc);
>          trace_spapr_drc_set_isolation_state_finalizing(drc_index);
> -        spapr_drc_detach(drc);
> +        spapr_drc_release(drc);
>      }
>  
>      return RTAS_OUT_SUCCESS;
> @@ -209,7 +223,7 @@ static uint32_t drc_set_unusable(SpaprDrc *drc)
>      if (drc->unplug_requested) {
>          uint32_t drc_index = spapr_drc_index(drc);
>          trace_spapr_drc_set_allocation_state_finalizing(drc_index);
> -        spapr_drc_detach(drc);
> +        spapr_drc_release(drc);
>      }
>  
>      return RTAS_OUT_SUCCESS;
> @@ -372,20 +386,6 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>                               NULL, 0);
>  }
>  
> -static void spapr_drc_release(SpaprDrc *drc)
> -{
> -    SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> -
> -    drck->release(drc->dev);
> -
> -    drc->unplug_requested = false;
> -    g_free(drc->fdt);
> -    drc->fdt = NULL;
> -    drc->fdt_start_offset = 0;
> -    object_property_del(OBJECT(drc), "device");
> -    drc->dev = NULL;
> -}
> -
>  void spapr_drc_detach(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
  2021-02-11 22:52 ` [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() Daniel Henrique Barboza
  2021-02-17  0:58   ` David Gibson
@ 2021-02-17 11:01   ` Greg Kurz
  1 sibling, 0 replies; 28+ messages in thread
From: Greg Kurz @ 2021-02-17 11:01 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, david

On Thu, 11 Feb 2021 19:52:43 -0300
Daniel Henrique Barboza <danielhb413@gmail.com> wrote:

> spapr_drc_detach() is not the best name for what the function does.
> The function does not detach the DRC, it makes an uncommited attempt
> to do it. It'll mark the DRC as pending unplug, via the 'unplug_request'
> flag, and only if the DRC state is drck->empty_state it will detach the
> DRC, via spapr_drc_release().
> 
> This is a contrast with its pair spapr_drc_attach(), where the function is
> indeed creating the DRC QOM object. If you know what spapr_drc_attach()
> does, you can be misled into thinking that spapr_drc_detach() is removing
> the DRC from QEMU internal state, which isn't true.
> 
> The current role of this function is better described as a request for
> detach, since there's no guarantee that we're going to detach the DRC in
> the end. Rename the function to spapr_drc_unplug_request to reflect what is is
> doing.
> 
> The initial idea was to change the name to spapr_drc_detach_request(), and
> later on change the unplug_request flag to detach_request. However,
> unplug_request is a migratable boolean for a long time now and renaming it
> is not worth the trouble. spapr_drc_unplug_request() setting drc->unplug_request
> is more natural than spapr_drc_detach_request setting drc->unplug_request.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---

Reviewed-by: Greg Kurz <groug@kaod.org>

>  hw/ppc/spapr.c             | 6 +++---
>  hw/ppc/spapr_drc.c         | 4 ++--
>  hw/ppc/spapr_pci.c         | 2 +-
>  hw/ppc/trace-events        | 2 +-
>  include/hw/ppc/spapr_drc.h | 2 +-
>  5 files changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 85fe65f894..b066df68cb 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3654,7 +3654,7 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
>                                addr / SPAPR_MEMORY_BLOCK_SIZE);
>          g_assert(drc);
>  
> -        spapr_drc_detach(drc);
> +        spapr_drc_unplug_request(drc);
>          addr += SPAPR_MEMORY_BLOCK_SIZE;
>      }
>  
> @@ -3722,7 +3722,7 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>      g_assert(drc);
>  
>      if (!spapr_drc_unplug_requested(drc)) {
> -        spapr_drc_detach(drc);
> +        spapr_drc_unplug_request(drc);
>          spapr_hotplug_req_remove_by_index(drc);
>      }
>  }
> @@ -3985,7 +3985,7 @@ static void spapr_phb_unplug_request(HotplugHandler *hotplug_dev,
>      assert(drc);
>  
>      if (!spapr_drc_unplug_requested(drc)) {
> -        spapr_drc_detach(drc);
> +        spapr_drc_unplug_request(drc);
>          spapr_hotplug_req_remove_by_index(drc);
>      }
>  }
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 555a25517d..67041fb212 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -386,11 +386,11 @@ void spapr_drc_attach(SpaprDrc *drc, DeviceState *d)
>                               NULL, 0);
>  }
>  
> -void spapr_drc_detach(SpaprDrc *drc)
> +void spapr_drc_unplug_request(SpaprDrc *drc)
>  {
>      SpaprDrcClass *drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>  
> -    trace_spapr_drc_detach(spapr_drc_index(drc));
> +    trace_spapr_drc_unplug_request(spapr_drc_index(drc));
>  
>      g_assert(drc->dev);
>  
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 1791d98a49..9334ba5dbb 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1726,7 +1726,7 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
>              if (state == SPAPR_DR_ENTITY_SENSE_PRESENT) {
>                  /* Mark the DRC as requested unplug if needed. */
>                  if (!spapr_drc_unplug_requested(func_drc)) {
> -                    spapr_drc_detach(func_drc);
> +                    spapr_drc_unplug_request(func_drc);
>                  }
>                  spapr_hotplug_req_remove_by_index(func_drc);
>              }
> diff --git a/hw/ppc/trace-events b/hw/ppc/trace-events
> index 1e91984526..b4bbfbb013 100644
> --- a/hw/ppc/trace-events
> +++ b/hw/ppc/trace-events
> @@ -50,7 +50,7 @@ spapr_drc_set_allocation_state(uint32_t index, int state) "drc: 0x%"PRIx32", sta
>  spapr_drc_set_allocation_state_finalizing(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_set_configured(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_attach(uint32_t index) "drc: 0x%"PRIx32
> -spapr_drc_detach(uint32_t index) "drc: 0x%"PRIx32
> +spapr_drc_unplug_request(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_awaiting_quiesce(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_reset(uint32_t index) "drc: 0x%"PRIx32
>  spapr_drc_realize(uint32_t index) "drc: 0x%"PRIx32
> diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h
> index 8982927d5c..02a63b3666 100644
> --- a/include/hw/ppc/spapr_drc.h
> +++ b/include/hw/ppc/spapr_drc.h
> @@ -243,7 +243,7 @@ int spapr_dt_drc(void *fdt, int offset, Object *owner, uint32_t drc_type_mask);
>   * beforehand (eg. check drc->dev at pre-plug).
>   */
>  void spapr_drc_attach(SpaprDrc *drc, DeviceState *d);
> -void spapr_drc_detach(SpaprDrc *drc);
> +void spapr_drc_unplug_request(SpaprDrc *drc);
>  
>  /*
>   * Reset all DRCs, causing pending hot-plug/unplug requests to complete.



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
  2021-02-17  2:31   ` David Gibson
@ 2021-02-19 20:04     ` Daniel Henrique Barboza
  2021-02-22  5:53       ` David Gibson
  2021-02-19 21:31     ` Daniel Henrique Barboza
  1 sibling, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-19 20:04 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, groug



On 2/16/21 11:31 PM, David Gibson wrote:
> On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
>> Handling errors in memory hotunplug in the pSeries machine is more complex
>> than any other device type, because there are all the complications that other
>> devices has, and more.
>>
>> For instance, determining a timeout for a DIMM hotunplug must consider if it's a
>> Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs.
>> The size of the DIMM is also a factor, given that longer DIMMs naturally takes
>> longer to be hotunplugged from the kernel. And there's also the guest memory usage to
>> be considered: if there's a process that is consuming memory that would be lost by
>> the DIMM unplug, the kernel will postpone the unplug process until the process
>> finishes, and then initiate the regular hotunplug process. The first two
>> considerations are manageable, but the last one is a deal breaker.
>>
>> There is no sane way for the pSeries machine to determine the memory load in the guest
>> when attempting a DIMM hotunplug - and even if there was a way, the guest can start
>> using all the RAM in the middle of the unplug process and invalidate our previous
>> assumptions - and in result we can't even begin to calculate a timeout for the
>> operation. This means that we can't implement a viable timeout mechanism for memory
>> unplug in pSeries.
>>
>> Going back to why we would consider an unplug timeout, the reason is that we can't
>> know if the kernel is giving up the unplug. Turns out that, sometimes, we can.
>> Consider a failed memory hotunplug attempt where the kernel will error out with
>> the following message:
>>
>> 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'
>>
>> This happens when there is a LMB that the kernel gave up in removing, and the LMBs
>> marked for removal of the same DIMM are now being added back. This process happens
> 
> We need to be a little careful about terminology here.  From the
> guest's point of view, there's no such thing as a DIMM, only LMBs.
> What the guest is doing here is essentially rejecting a single "index
> + number" DRC unplug request, which corresponds to one DIMM on the
> qemu side.

I'll reword this paragraph to avoid using "DIMM" in the context of the guest
kernel.

> 
>> in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and
>> after that update_lmb_associativity_index(). In this function, the kernel is configuring
>> the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in
>> section "ibm,configure-connector RTAS Call":
>>
>> 'A subsequent sequence of calls to ibm,configure-connector with the same entry from
>> the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of
>> devices which were not completely configured.'
>>
>> We can use this kernel behavior in our favor. If a DRC connector reconfiguration
>> for a LMB that we marked as unplug pending happens, this indicates that the kernel
>> changed its mind about the unplug and is reasserting that it will keep using the
>> DIMM. In this case, it's safe to assume that the whole DIMM unplug was cancelled.
>>
>> This patch hops into rtas_ibm_configure_connector() and, in the scenario described
>> above, clear the unplug state for the DIMM device. This will not solve all the
>> problems we still have with memory unplug, but it will cover this case where the
>> kernel reconfigures LMBs after a failed unplug. We are a bit more resilient,
>> without using an unreliable timeout, and we didn't make the remaining error cases
>> any worse.
> 
> I wonder if we could use this as a beginning of a hotplug failure
> reporting mechanism.  As noted, this is explicitly allowed by PAPR and
> I think in general it makes sense that a configure-connector would
> re-assert that the guest is using the resource and we can't unplug it.
> 

I think it's worth looking into it. The kernel already does that in case of hotunplug
failure of LMBs (at least in this particular case), so it's a matter of evaluating
how hard it is to do the same for e.g. CPUs.


> Could we extend guests to do an indicative configure-connector on any
> unplug it knows it can't complete?  Or if configure-connector is too
> disruptive could we use an (extra) H_SET_INDICATOR to "UNISOLATE"
> state? If I'm reading right, that should be both permitted and a no-op
> for existing PAPR implementations, so it should be a pretty safe way
> to add that indication.

A quick look in LOPAR shows that set_indicator can be used to report
hotplug failures (which is a surprise to me, I wasn't aware of it):

-----
(Table 13.7, R1-13.5.3.4-4.)

For all DR options: If this is a DR operation that involves the user insert-
ing a DR entity, then if the firmware can determine that the inserted entity
would cause a system disturbance, then the set-indicator RTAS call must
not unisolate the entity and must return an error status which is unique to the
particular error.
-----

The wording 'would cause a system disturbance' seems generic on purpose, giving
the firmware/platform the prerrogative to refuse a hotplug for any given
reason.

I also read that there is no rule against using set_indicator with "unisolate" to
an already unisolated resource. It would be a no-op.

I don't think we would find fierce opposition if we propose an addendum such as:

"For all DR options: If this is a DR operation that involves the user removing
ing a DR entity, then if the partition operational system can determine that
removing the entity would cause a system disturbance, then the set-indicator RTAS
call can be used with the 'unisolate' value to inform the platform that the entity
can not be removed at that time."

This would be enough to accomplish what we're aiming for using set_indicator and
unisolate. Then we have 2 options to signal a failed unplug - configure-connector
and unisolating the device. The guest can use whichever is easier or makes more
sense.



Thanks,


DHB


> 
>>
>> [1] arch/powerpc/platforms/pseries/hotplug-memory.c
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   hw/ppc/spapr.c         | 30 ++++++++++++++++++++++++++++++
>>   hw/ppc/spapr_drc.c     | 14 ++++++++++++++
>>   include/hw/ppc/spapr.h |  2 ++
>>   3 files changed, 46 insertions(+)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index ecce8abf14..4bcded4a1a 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
>>       return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
>>   }
>>   
>> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
>> +                                           PCDIMMDevice *dimm)
>> +{
>> +    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
>> +    SpaprDrc *drc;
>> +    uint32_t nr_lmbs;
>> +    uint64_t size, addr_start, addr;
>> +    int i;
>> +
>> +    if (ds) {
>> +        spapr_pending_dimm_unplugs_remove(spapr, ds);
>> +    }
> 
> Hrm... how would !ds arise?  Could this just be an assert?
> 
>> +
>> +    size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
>> +    nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
>> +
>> +    addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
>> +                                          &error_abort);
>> +
>> +    addr = addr_start;
>> +    for (i = 0; i < nr_lmbs; i++) {
>> +        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
>> +                              addr / SPAPR_MEMORY_BLOCK_SIZE);
>> +        g_assert(drc);
>> +
>> +        drc->unplug_requested = false;
>> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
>> +    }
>> +}
>> +
>>   /* Callback to be called during DRC release. */
>>   void spapr_lmb_release(DeviceState *dev)
>>   {
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index c143bfb6d3..eae941233a 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -1230,6 +1230,20 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
>>   
>>       drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>>   
>> +    /*
>> +     * This indicates that the kernel is reconfiguring a LMB due to
>> +     * a failed hotunplug. Clear the pending unplug state for the whole
>> +     * DIMM.
>> +     */
>> +    if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB &&
>> +        drc->unplug_requested) {
>> +
>> +        /* This really shouldn't happen in this point, but ... */
>> +        g_assert(drc->dev);
> 
> I'm a little worried that a buggy or malicious guest could trigger
> this assert.
> 
>> +
>> +        spapr_clear_pending_dimm_unplug_state(spapr, PC_DIMM(drc->dev));
>> +    }
>> +
>>       if (!drc->fdt) {
>>           void *fdt;
>>           int fdt_size;
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index ccbeeca1de..5bcc8f3bb8 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -847,6 +847,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>>   int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
>>   void spapr_clear_pending_events(SpaprMachineState *spapr);
>>   void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
>> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
>> +                                           PCDIMMDevice *dimm);
>>   int spapr_max_server_number(SpaprMachineState *spapr);
>>   void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                         uint64_t pte0, uint64_t pte1);
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
  2021-02-17  2:31   ` David Gibson
  2021-02-19 20:04     ` Daniel Henrique Barboza
@ 2021-02-19 21:31     ` Daniel Henrique Barboza
  2021-02-22  5:54       ` David Gibson
  1 sibling, 1 reply; 28+ messages in thread
From: Daniel Henrique Barboza @ 2021-02-19 21:31 UTC (permalink / raw)
  To: David Gibson; +Cc: qemu-ppc, qemu-devel, groug



On 2/16/21 11:31 PM, David Gibson wrote:
> On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
>> Handling errors in memory hotunplug in the pSeries machine is more complex
>> than any other device type, because there are all the complications that other
>> devices has, and more.

[...]

>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index ecce8abf14..4bcded4a1a 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
>>       return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
>>   }
>>   
>> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
>> +                                           PCDIMMDevice *dimm)
>> +{
>> +    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
>> +    SpaprDrc *drc;
>> +    uint32_t nr_lmbs;
>> +    uint64_t size, addr_start, addr;
>> +    int i;
>> +
>> +    if (ds) {
>> +        spapr_pending_dimm_unplugs_remove(spapr, ds);
>> +    }
> 
> Hrm... how would !ds arise?  Could this just be an assert?

!ds would appear if we do not assert g_assert(drc->dev) down there, where you
suggested down below that a malicious/buggy code would trigger it, for example.
With that assert in place then this less likely to occcur.

I guess what I can do here is:

- remove the g_assert(drc->dev) from down below, since it's more related to the
logic of this function;

- here, check if drc->dev is NULL. Return doing nothing if that's the case (all the
function relies on drc->dev being valid);

- if drc->dev is not NULL, then we can g_assert(ds) and proceed with the rest of
the function

This way we become a little more tolerant on drc->dev being NULL, but if drc->dev
is valid we will expect a unplug dimm state to always exist and assert it.


Thanks,


DHB

> 
>> +
>> +    size = memory_device_get_region_size(MEMORY_DEVICE(dimm), &error_abort);
>> +    nr_lmbs = size / SPAPR_MEMORY_BLOCK_SIZE;
>> +
>> +    addr_start = object_property_get_uint(OBJECT(dimm), PC_DIMM_ADDR_PROP,
>> +                                          &error_abort);
>> +
>> +    addr = addr_start;
>> +    for (i = 0; i < nr_lmbs; i++) {
>> +        drc = spapr_drc_by_id(TYPE_SPAPR_DRC_LMB,
>> +                              addr / SPAPR_MEMORY_BLOCK_SIZE);
>> +        g_assert(drc);
>> +
>> +        drc->unplug_requested = false;
>> +        addr += SPAPR_MEMORY_BLOCK_SIZE;
>> +    }
>> +}
>> +
>>   /* Callback to be called during DRC release. */
>>   void spapr_lmb_release(DeviceState *dev)
>>   {
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index c143bfb6d3..eae941233a 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -1230,6 +1230,20 @@ static void rtas_ibm_configure_connector(PowerPCCPU *cpu,
>>   
>>       drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
>>   
>> +    /*
>> +     * This indicates that the kernel is reconfiguring a LMB due to
>> +     * a failed hotunplug. Clear the pending unplug state for the whole
>> +     * DIMM.
>> +     */
>> +    if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB &&
>> +        drc->unplug_requested) {
>> +
>> +        /* This really shouldn't happen in this point, but ... */
>> +        g_assert(drc->dev);
> 
> I'm a little worried that a buggy or malicious guest could trigger
> this assert.
> 
>> +
>> +        spapr_clear_pending_dimm_unplug_state(spapr, PC_DIMM(drc->dev));
>> +    }
>> +
>>       if (!drc->fdt) {
>>           void *fdt;
>>           int fdt_size;
>> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
>> index ccbeeca1de..5bcc8f3bb8 100644
>> --- a/include/hw/ppc/spapr.h
>> +++ b/include/hw/ppc/spapr.h
>> @@ -847,6 +847,8 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize);
>>   int spapr_reallocate_hpt(SpaprMachineState *spapr, int shift, Error **errp);
>>   void spapr_clear_pending_events(SpaprMachineState *spapr);
>>   void spapr_clear_pending_hotplug_events(SpaprMachineState *spapr);
>> +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
>> +                                           PCDIMMDevice *dimm);
>>   int spapr_max_server_number(SpaprMachineState *spapr);
>>   void spapr_store_hpte(PowerPCCPU *cpu, hwaddr ptex,
>>                         uint64_t pte0, uint64_t pte1);
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
  2021-02-19 20:04     ` Daniel Henrique Barboza
@ 2021-02-22  5:53       ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-22  5:53 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 7022 bytes --]

On Fri, Feb 19, 2021 at 05:04:23PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 2/16/21 11:31 PM, David Gibson wrote:
> > On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> > > Handling errors in memory hotunplug in the pSeries machine is more complex
> > > than any other device type, because there are all the complications that other
> > > devices has, and more.
> > > 
> > > For instance, determining a timeout for a DIMM hotunplug must consider if it's a
> > > Hash-MMU or a Radix-MMU guest, because Hash guests takes longer to hotunplug DIMMs.
> > > The size of the DIMM is also a factor, given that longer DIMMs naturally takes
> > > longer to be hotunplugged from the kernel. And there's also the guest memory usage to
> > > be considered: if there's a process that is consuming memory that would be lost by
> > > the DIMM unplug, the kernel will postpone the unplug process until the process
> > > finishes, and then initiate the regular hotunplug process. The first two
> > > considerations are manageable, but the last one is a deal breaker.
> > > 
> > > There is no sane way for the pSeries machine to determine the memory load in the guest
> > > when attempting a DIMM hotunplug - and even if there was a way, the guest can start
> > > using all the RAM in the middle of the unplug process and invalidate our previous
> > > assumptions - and in result we can't even begin to calculate a timeout for the
> > > operation. This means that we can't implement a viable timeout mechanism for memory
> > > unplug in pSeries.
> > > 
> > > Going back to why we would consider an unplug timeout, the reason is that we can't
> > > know if the kernel is giving up the unplug. Turns out that, sometimes, we can.
> > > Consider a failed memory hotunplug attempt where the kernel will error out with
> > > the following message:
> > > 
> > > 'pseries-hotplug-mem: Memory indexed-count-remove failed, adding any removed LMBs'
> > > 
> > > This happens when there is a LMB that the kernel gave up in removing, and the LMBs
> > > marked for removal of the same DIMM are now being added back. This process happens
> > 
> > We need to be a little careful about terminology here.  From the
> > guest's point of view, there's no such thing as a DIMM, only LMBs.
> > What the guest is doing here is essentially rejecting a single "index
> > + number" DRC unplug request, which corresponds to one DIMM on the
> > qemu side.
> 
> I'll reword this paragraph to avoid using "DIMM" in the context of the guest
> kernel.
> 
> > 
> > > in the pseries kernel in [1], dlpar_memory_remove_by_ic() into dlpar_add_lmb(), and
> > > after that update_lmb_associativity_index(). In this function, the kernel is configuring
> > > the LMB DRC connector again. Note that this is a valid usage in LOPAR, as stated in
> > > section "ibm,configure-connector RTAS Call":
> > > 
> > > 'A subsequent sequence of calls to ibm,configure-connector with the same entry from
> > > the “ibm,drc-indexes” or “ibm,drc-info” property will restart the configuration of
> > > devices which were not completely configured.'
> > > 
> > > We can use this kernel behavior in our favor. If a DRC connector reconfiguration
> > > for a LMB that we marked as unplug pending happens, this indicates that the kernel
> > > changed its mind about the unplug and is reasserting that it will keep using the
> > > DIMM. In this case, it's safe to assume that the whole DIMM unplug was cancelled.
> > > 
> > > This patch hops into rtas_ibm_configure_connector() and, in the scenario described
> > > above, clear the unplug state for the DIMM device. This will not solve all the
> > > problems we still have with memory unplug, but it will cover this case where the
> > > kernel reconfigures LMBs after a failed unplug. We are a bit more resilient,
> > > without using an unreliable timeout, and we didn't make the remaining error cases
> > > any worse.
> > 
> > I wonder if we could use this as a beginning of a hotplug failure
> > reporting mechanism.  As noted, this is explicitly allowed by PAPR and
> > I think in general it makes sense that a configure-connector would
> > re-assert that the guest is using the resource and we can't unplug it.
> > 
> 
> I think it's worth looking into it. The kernel already does that in case of hotunplug
> failure of LMBs (at least in this particular case), so it's a matter of evaluating
> how hard it is to do the same for e.g. CPUs.
> 
> 
> > Could we extend guests to do an indicative configure-connector on any
> > unplug it knows it can't complete?  Or if configure-connector is too
> > disruptive could we use an (extra) H_SET_INDICATOR to "UNISOLATE"
> > state? If I'm reading right, that should be both permitted and a no-op
> > for existing PAPR implementations, so it should be a pretty safe way
> > to add that indication.
> 
> A quick look in LOPAR shows that set_indicator can be used to report
> hotplug failures (which is a surprise to me, I wasn't aware of it):
> 
> -----
> (Table 13.7, R1-13.5.3.4-4.)
> 
> For all DR options: If this is a DR operation that involves the user insert-
> ing a DR entity, then if the firmware can determine that the inserted entity
> would cause a system disturbance, then the set-indicator RTAS call must
> not unisolate the entity and must return an error status which is unique to the
> particular error.
> -----
> 
> The wording 'would cause a system disturbance' seems generic on purpose, giving
> the firmware/platform the prerrogative to refuse a hotplug for any given
> reason.

Right.  This is about firmware/platform detected errors, which is of
less interest to us at the moment than OS detected errors.

> I also read that there is no rule against using set_indicator with "unisolate" to
> an already unisolated resource. It would be a no-op.
> 
> I don't think we would find fierce opposition if we propose an addendum such as:
> 
> "For all DR options: If this is a DR operation that involves the user removing
> ing a DR entity, then if the partition operational system can determine that
> removing the entity would cause a system disturbance, then the set-indicator RTAS
> call can be used with the 'unisolate' value to inform the platform that the entity
> can not be removed at that time."
> 
> This would be enough to accomplish what we're aiming for using set_indicator and
> unisolate. Then we have 2 options to signal a failed unplug - configure-connector
> and unisolating the device. The guest can use whichever is easier or makes more
> sense.

Sounds good, let's do it.  Because it's a no-op currently, I also
think we can go ahead with an implementation even without waiting for
a PAPR change to be approved.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state
  2021-02-19 21:31     ` Daniel Henrique Barboza
@ 2021-02-22  5:54       ` David Gibson
  0 siblings, 0 replies; 28+ messages in thread
From: David Gibson @ 2021-02-22  5:54 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 2316 bytes --]

On Fri, Feb 19, 2021 at 06:31:46PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 2/16/21 11:31 PM, David Gibson wrote:
> > On Thu, Feb 11, 2021 at 07:52:46PM -0300, Daniel Henrique Barboza wrote:
> > > Handling errors in memory hotunplug in the pSeries machine is more complex
> > > than any other device type, because there are all the complications that other
> > > devices has, and more.
> 
> [...]
> 
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index ecce8abf14..4bcded4a1a 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -3575,6 +3575,36 @@ static SpaprDimmState *spapr_recover_pending_dimm_state(SpaprMachineState *ms,
> > >       return spapr_pending_dimm_unplugs_add(ms, avail_lmbs, dimm);
> > >   }
> > > +void spapr_clear_pending_dimm_unplug_state(SpaprMachineState *spapr,
> > > +                                           PCDIMMDevice *dimm)
> > > +{
> > > +    SpaprDimmState *ds = spapr_pending_dimm_unplugs_find(spapr, dimm);
> > > +    SpaprDrc *drc;
> > > +    uint32_t nr_lmbs;
> > > +    uint64_t size, addr_start, addr;
> > > +    int i;
> > > +
> > > +    if (ds) {
> > > +        spapr_pending_dimm_unplugs_remove(spapr, ds);
> > > +    }
> > 
> > Hrm... how would !ds arise?  Could this just be an assert?
> 
> !ds would appear if we do not assert g_assert(drc->dev) down there, where you
> suggested down below that a malicious/buggy code would trigger it, for example.
> With that assert in place then this less likely to occcur.
> 
> I guess what I can do here is:
> 
> - remove the g_assert(drc->dev) from down below, since it's more related to the
> logic of this function;
> 
> - here, check if drc->dev is NULL. Return doing nothing if that's the case (all the
> function relies on drc->dev being valid);
> 
> - if drc->dev is not NULL, then we can g_assert(ds) and proceed with the rest of
> the function
> 
> This way we become a little more tolerant on drc->dev being NULL, but if drc->dev
> is valid we will expect a unplug dimm state to always exist and
> assert it.

Seems reasonable.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-02-22  6:01 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-11 22:52 [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration Daniel Henrique Barboza
2021-02-11 22:52 ` [PATCH v3 1/7] spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical() Daniel Henrique Barboza
2021-02-15 10:40   ` Greg Kurz
2021-02-17  0:51     ` David Gibson
2021-02-11 22:52 ` [PATCH v3 2/7] spapr_pci.c: simplify spapr_pci_unplug_request() function handling Daniel Henrique Barboza
2021-02-16 15:50   ` Greg Kurz
2021-02-16 16:09     ` Daniel Henrique Barboza
2021-02-16 17:16       ` Greg Kurz
2021-02-16 17:44         ` Daniel Henrique Barboza
2021-02-17  0:54           ` David Gibson
2021-02-11 22:52 ` [PATCH v3 3/7] spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable Daniel Henrique Barboza
2021-02-17  0:57   ` David Gibson
2021-02-17 10:58   ` Greg Kurz
2021-02-11 22:52 ` [PATCH v3 4/7] spapr: rename spapr_drc_detach() to spapr_drc_unplug_request() Daniel Henrique Barboza
2021-02-17  0:58   ` David Gibson
2021-02-17 11:01   ` Greg Kurz
2021-02-11 22:52 ` [PATCH v3 5/7] spapr_drc.c: introduce unplug_timeout_timer Daniel Henrique Barboza
2021-02-17  1:14   ` David Gibson
2021-02-17  1:20   ` David Gibson
2021-02-11 22:52 ` [PATCH v3 6/7] spapr_drc.c: add hotunplug timeout for CPUs Daniel Henrique Barboza
2021-02-17  1:23   ` David Gibson
2021-02-11 22:52 ` [PATCH v3 7/7] spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state Daniel Henrique Barboza
2021-02-17  2:31   ` David Gibson
2021-02-19 20:04     ` Daniel Henrique Barboza
2021-02-22  5:53       ` David Gibson
2021-02-19 21:31     ` Daniel Henrique Barboza
2021-02-22  5:54       ` David Gibson
2021-02-17  2:33 ` [PATCH v3 0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration David Gibson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.