* [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event
@ 2021-06-04 20:03 Daniel Henrique Barboza
  2021-06-04 20:03 ` [PATCH v2 1/2] qapi/machine.json: add " Daniel Henrique Barboza
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Daniel Henrique Barboza @ 2021-06-04 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, Daniel Henrique Barboza, qemu-ppc, groug, david

Hi,

This is the v2 of a series that started with 2 events,
DEVICE_UNPLUG_ERROR and DEVICE_NOT_DELETED [1]. After discussions in v1
we reached the conclusion that DEVICE_NOT_DELETED wasn't doing much
of anything. It was an event that was trying to say 'I think something
happened, but I'm not sure', forcing the QAPI listener to inspect the
guest itself to see what went wrong, or just wait for some sort of
internal timeout (as Libvirt will do) and fail the operation regardless.

Between v1 and this v2 the PowerPC kernel was changed to add a reliable
error reporting mechanism in the device_removal path of CPUs, which in
turn gave QEMU the opportunity to do the same. This made
DEVICE_UNPLUG_ERROR more relevant because now we can report CPU and
DIMM hotunplug errors.


changes from v1:
- former patches 1 and 2: dropped
- patch 1 (former 3): changed the version to '6.1'
- patch 2 (former 4): add a DEVICE_UNPLUG_ERROR event in the device
  unplug error path of CPUs and DIMMs

[1] v1 link: https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04682.html



Daniel Henrique Barboza (2):
  qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  spapr: use DEVICE_UNPLUG_ERROR to report unplug errors

 hw/ppc/spapr.c     |  2 +-
 hw/ppc/spapr_drc.c | 15 +++++++++------
 qapi/machine.json  | 23 +++++++++++++++++++++++
 3 files changed, 33 insertions(+), 7 deletions(-)

-- 
2.31.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  2021-06-04 20:03 [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event Daniel Henrique Barboza
@ 2021-06-04 20:03 ` Daniel Henrique Barboza
  2021-06-07  2:23   ` David Gibson
  2021-06-11 12:12   ` Markus Armbruster
  2021-06-04 20:03 ` [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors Daniel Henrique Barboza
  2021-06-07  2:25 ` [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event David Gibson
  2 siblings, 2 replies; 14+ messages in thread
From: Daniel Henrique Barboza @ 2021-06-04 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, Daniel Henrique Barboza, qemu-ppc, groug, david

At this moment we only provide one event to report a hotunplug error,
MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
machine is now able to report unplug errors for other device types, such
as CPUs.

Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
create a generic DEVICE_UNPLUG_ERROR event that can be used by all
unplug errors in the future.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 qapi/machine.json | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/qapi/machine.json b/qapi/machine.json
index 58a9c86b36..f0c7e56be0 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1274,3 +1274,26 @@
 ##
 { 'event': 'MEM_UNPLUG_ERROR',
   'data': { 'device': 'str', 'msg': 'str' } }
+
+##
+# @DEVICE_UNPLUG_ERROR:
+#
+# Emitted when a device hot unplug error occurs.
+#
+# @device: device name
+#
+# @msg: Informative message
+#
+# Since: 6.1
+#
+# Example:
+#
+# <- { "event": "DEVICE_UNPLUG_ERROR",
+#      "data": { "device": "dimm1",
+#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
+#      },
+#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
+#
+##
+{ 'event': 'DEVICE_UNPLUG_ERROR',
+  'data': { 'device': 'str', 'msg': 'str' } }
-- 
2.31.1




* [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors
  2021-06-04 20:03 [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event Daniel Henrique Barboza
  2021-06-04 20:03 ` [PATCH v2 1/2] qapi/machine.json: add " Daniel Henrique Barboza
@ 2021-06-04 20:03 ` Daniel Henrique Barboza
  2021-06-07  2:24   ` David Gibson
  2021-06-11 12:18   ` Markus Armbruster
  2021-06-07  2:25 ` [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event David Gibson
  2 siblings, 2 replies; 14+ messages in thread
From: Daniel Henrique Barboza @ 2021-06-04 20:03 UTC (permalink / raw)
  To: qemu-devel; +Cc: armbru, Daniel Henrique Barboza, qemu-ppc, groug, david

Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
error path, signalling that the hotunplug process wasn't successful.
This allows us to send a DEVICE_UNPLUG_ERROR in drc_unisolate_logical()
to signal this error to the management layer.

We also have another error path in spapr_memory_unplug_rollback() for
configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
in the hotunplug error path, but they will reconfigure them.  Let's send
the DEVICE_UNPLUG_ERROR event in that code path as well to cover the
case of older kernels.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr.c     |  2 +-
 hw/ppc/spapr_drc.c | 15 +++++++++------
 2 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index c23bcc4490..29aa2f467d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3639,7 +3639,7 @@ void spapr_memory_unplug_rollback(SpaprMachineState *spapr, DeviceState *dev)
      */
     qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
                                  "for device %s", dev->id);
-    qapi_event_send_mem_unplug_error(dev->id, qapi_error);
+    qapi_event_send_device_unplug_error(dev->id, qapi_error);
 }
 
 /* Callback to be called during DRC release. */
diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index a2f2634601..0e1a8733bc 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -17,6 +17,8 @@
 #include "hw/ppc/spapr_drc.h"
 #include "qom/object.h"
 #include "migration/vmstate.h"
+#include "qapi/error.h"
+#include "qapi/qapi-events-machine.h"
 #include "qapi/visitor.h"
 #include "qemu/error-report.h"
 #include "hw/ppc/spapr.h" /* for RTAS return codes */
@@ -160,6 +162,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
          * means that the kernel is refusing the removal.
          */
         if (drc->unplug_requested && drc->dev) {
+            const char qapi_error_fmt[] = "Device hotunplug rejected by the "
+                                          "guest for device %s";
+            g_autofree char *qapi_error = NULL;
+
             if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) {
                 spapr = SPAPR_MACHINE(qdev_get_machine());
 
@@ -167,13 +173,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
             }
 
             drc->unplug_requested = false;
-            error_report("Device hotunplug rejected by the guest "
-                         "for device %s", drc->dev->id);
+            error_report(qapi_error_fmt, drc->dev->id);
 
-            /*
-             * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
-             * it is implemented.
-             */
+            qapi_error = g_strdup_printf(qapi_error_fmt, drc->dev->id);
+            qapi_event_send_device_unplug_error(drc->dev->id, qapi_error);
         }
 
         return RTAS_OUT_SUCCESS; /* Nothing to do */
-- 
2.31.1




* Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  2021-06-04 20:03 ` [PATCH v2 1/2] qapi/machine.json: add " Daniel Henrique Barboza
@ 2021-06-07  2:23   ` David Gibson
  2021-06-07  2:23     ` David Gibson
  2021-06-11 12:12   ` Markus Armbruster
  1 sibling, 1 reply; 14+ messages in thread
From: David Gibson @ 2021-06-07  2:23 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: armbru, qemu-ppc, qemu-devel, groug

On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> At this moment we only provide one event to report a hotunplug error,
> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> machine is now able to report unplug errors for other device types, such
> as CPUs.
> 
> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> unplug errors in the future.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  qapi/machine.json | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 58a9c86b36..f0c7e56be0 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1274,3 +1274,26 @@
>  ##
>  { 'event': 'MEM_UNPLUG_ERROR',
>    'data': { 'device': 'str', 'msg': 'str' } }
> +
> +##
> +# @DEVICE_UNPLUG_ERROR:
> +#
> +# Emitted when a device hot unplug error occurs.
> +#
> +# @device: device name
> +#
> +# @msg: Informative message
> +#
> +# Since: 6.1
> +#
> +# Example:
> +#
> +# <- { "event": "DEVICE_UNPLUG_ERROR"
> +#      "data": { "device": "dimm1",
> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
> +#      },
> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> +#
> +##
> +{ 'event': 'DEVICE_UNPLUG_ERROR',
> +  'data': { 'device': 'str', 'msg': 'str' } }

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  2021-06-07  2:23   ` David Gibson
@ 2021-06-07  2:23     ` David Gibson
  2021-06-07 18:41       ` Eric Blake
  0 siblings, 1 reply; 14+ messages in thread
From: David Gibson @ 2021-06-07  2:23 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: armbru, qemu-ppc, qemu-devel, groug

On Mon, Jun 07, 2021 at 12:23:08PM +1000, David Gibson wrote:
> On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> > At this moment we only provide one event to report a hotunplug error,
> > MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> > machine is now able to report unplug errors for other device types, such
> > as CPUs.
> > 
> > Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> > create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> > unplug errors in the future.
> > 
> > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> 
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

Markus, I'm happy to take this through my tree if that's convenient
for you, but I'd like to get an ack.

> 
> > ---
> >  qapi/machine.json | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/qapi/machine.json b/qapi/machine.json
> > index 58a9c86b36..f0c7e56be0 100644
> > --- a/qapi/machine.json
> > +++ b/qapi/machine.json
> > @@ -1274,3 +1274,26 @@
> >  ##
> >  { 'event': 'MEM_UNPLUG_ERROR',
> >    'data': { 'device': 'str', 'msg': 'str' } }
> > +
> > +##
> > +# @DEVICE_UNPLUG_ERROR:
> > +#
> > +# Emitted when a device hot unplug error occurs.
> > +#
> > +# @device: device name
> > +#
> > +# @msg: Informative message
> > +#
> > +# Since: 6.1
> > +#
> > +# Example:
> > +#
> > +# <- { "event": "DEVICE_UNPLUG_ERROR"
> > +#      "data": { "device": "dimm1",
> > +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
> > +#      },
> > +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> > +#
> > +##
> > +{ 'event': 'DEVICE_UNPLUG_ERROR',
> > +  'data': { 'device': 'str', 'msg': 'str' } }
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors
  2021-06-04 20:03 ` [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors Daniel Henrique Barboza
@ 2021-06-07  2:24   ` David Gibson
  2021-06-11 12:18   ` Markus Armbruster
  1 sibling, 0 replies; 14+ messages in thread
From: David Gibson @ 2021-06-07  2:24 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: armbru, qemu-ppc, qemu-devel, groug

On Fri, Jun 04, 2021 at 05:03:53PM -0300, Daniel Henrique Barboza wrote:
> Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
> error path, signalling that the hotunplug process wasn't successful.
> This allow us to send a DEVICE_UNPLUG_ERROR in drc_unisolate_logical()
> to signal this error to the management layer.
> 
> We also have another error path in spapr_memory_unplug_rollback() for
> configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
> in the hotunplug error path, but it will reconfigure them.  Let's send
> the DEVICE_UNPLUG_ERROR event in that code path as well to cover the
> case of older kernels.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr.c     |  2 +-
>  hw/ppc/spapr_drc.c | 15 +++++++++------
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c23bcc4490..29aa2f467d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3639,7 +3639,7 @@ void spapr_memory_unplug_rollback(SpaprMachineState *spapr, DeviceState *dev)
>       */
>      qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>                                   "for device %s", dev->id);
> -    qapi_event_send_mem_unplug_error(dev->id, qapi_error);
> +    qapi_event_send_device_unplug_error(dev->id, qapi_error);
>  }
>  
>  /* Callback to be called during DRC release. */
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a2f2634601..0e1a8733bc 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -17,6 +17,8 @@
>  #include "hw/ppc/spapr_drc.h"
>  #include "qom/object.h"
>  #include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-events-machine.h"
>  #include "qapi/visitor.h"
>  #include "qemu/error-report.h"
>  #include "hw/ppc/spapr.h" /* for RTAS return codes */
> @@ -160,6 +162,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>           * means that the kernel is refusing the removal.
>           */
>          if (drc->unplug_requested && drc->dev) {
> +            const char qapi_error_fmt[] = "Device hotunplug rejected by the "
> +                                          "guest for device %s";
> +            g_autofree char *qapi_error = NULL;
> +
>              if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) {
>                  spapr = SPAPR_MACHINE(qdev_get_machine());
>  
> @@ -167,13 +173,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>              }
>  
>              drc->unplug_requested = false;
> -            error_report("Device hotunplug rejected by the guest "
> -                         "for device %s", drc->dev->id);
> +            error_report(qapi_error_fmt, drc->dev->id);
>  
> -            /*
> -             * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
> -             * it is implemented.
> -             */
> +            qapi_error = g_strdup_printf(qapi_error_fmt, drc->dev->id);
> +            qapi_event_send_device_unplug_error(drc->dev->id, qapi_error);
>          }
>  
>          return RTAS_OUT_SUCCESS; /* Nothing to do */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event
  2021-06-04 20:03 [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event Daniel Henrique Barboza
  2021-06-04 20:03 ` [PATCH v2 1/2] qapi/machine.json: add " Daniel Henrique Barboza
  2021-06-04 20:03 ` [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors Daniel Henrique Barboza
@ 2021-06-07  2:25 ` David Gibson
  2 siblings, 0 replies; 14+ messages in thread
From: David Gibson @ 2021-06-07  2:25 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: armbru, qemu-ppc, qemu-devel, groug

On Fri, Jun 04, 2021 at 05:03:51PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This is the v2 of a series that started with 2 events,
> DEVICE_UNPLUG_ERROR and DEVICE_NOT_DELETED [1]. After discussions in v1
> we reached the conclussion that the DEVICE_NOT_DELETED wasn't doing much
> of anything. It was an event that was trying to say 'I think something
> happen, but I'm not sure', forcing the QAPI listener to inspect the
> guest itself to see what went wrong, or just wait for some sort of
> internal timeout (as Libvirt will do) and fail the operation regardless.
> 
> During this period between v1 and this v2 the PowerPC kernel was changed
> to add a reliable error report mechanism in the device_removal path of
> CPUs, which in turn gave QEMU the opportunity to do the same. This made
> the DEVICE_UNPLUG_ERROR more relevant because now we can report CPU and
> DIMM hotunplug errors.
> 
> 
> changes from v1:
> - former patches 1 and 2: dropped
> - patch 1 (former 3): changed the version to '6.1'
> - patch 2 (former 4): add a DEVICE_UNPLUG_ERROR event in the device
>   unplug error path of CPUs and DIMMs
> 
> [1] v1 link:
> https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg04682.html

It would be nice to add a patch making x86 also issue the new error
format (as well as the old) on memory hot unplug errors.

> 
> 
> 
> Daniel Henrique Barboza (2):
>   qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
>   spapr: use DEVICE_UNPLUG_ERROR to report unplug errors
> 
>  hw/ppc/spapr.c     |  2 +-
>  hw/ppc/spapr_drc.c | 15 +++++++++------
>  qapi/machine.json  | 23 +++++++++++++++++++++++
>  3 files changed, 33 insertions(+), 7 deletions(-)
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


* Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  2021-06-07  2:23     ` David Gibson
@ 2021-06-07 18:41       ` Eric Blake
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Blake @ 2021-06-07 18:41 UTC (permalink / raw)
  To: David Gibson; +Cc: groug, Daniel Henrique Barboza, qemu-ppc, armbru, qemu-devel

On Mon, Jun 07, 2021 at 12:23:53PM +1000, David Gibson wrote:
> On Mon, Jun 07, 2021 at 12:23:08PM +1000, David Gibson wrote:
> > On Fri, Jun 04, 2021 at 05:03:52PM -0300, Daniel Henrique Barboza wrote:
> > > At this moment we only provide one event to report a hotunplug error,
> > > MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> > > machine is now able to report unplug errors for other device types, such
> > > as CPUs.
> > > 
> > > Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> > > create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> > > unplug errors in the future.
> > > 
> > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > 
> > Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> 
> Markus, I'm happy to take this through my tree if that's convenient
> for you, but I'd like to get an ack.

I'm not Markus, but if you're happy with my interface review,

Reviewed-by: Eric Blake <eblake@redhat.com>


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  2021-06-04 20:03 ` [PATCH v2 1/2] qapi/machine.json: add " Daniel Henrique Barboza
  2021-06-07  2:23   ` David Gibson
@ 2021-06-11 12:12   ` Markus Armbruster
  2021-06-16 16:41     ` Daniel Henrique Barboza
  1 sibling, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2021-06-11 12:12 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: david, qemu-ppc, qemu-devel, groug

Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> At this moment we only provide one event to report a hotunplug error,
> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
> machine is now able to report unplug errors for other device types, such
> as CPUs.
>
> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
> unplug errors in the future.
>
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  qapi/machine.json | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 58a9c86b36..f0c7e56be0 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1274,3 +1274,26 @@
>  ##
>  { 'event': 'MEM_UNPLUG_ERROR',
>    'data': { 'device': 'str', 'msg': 'str' } }
> +
> +##
> +# @DEVICE_UNPLUG_ERROR:
> +#
> +# Emitted when a device hot unplug error occurs.
> +#
> +# @device: device name
> +#
> +# @msg: Informative message
> +#
> +# Since: 6.1
> +#
> +# Example:
> +#
> +# <- { "event": "DEVICE_UNPLUG_ERROR"
> +#      "data": { "device": "dimm1",
> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
> +#      },
> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
> +#
> +##
> +{ 'event': 'DEVICE_UNPLUG_ERROR',
> +  'data': { 'device': 'str', 'msg': 'str' } }

Missing: update of device_del's doc comment in qdev.json:

    # Notes: When this command completes, the device may not be removed from the
    #        guest.  Hot removal is an operation that requires guest cooperation.
    #        This command merely requests that the guest begin the hot removal
    #        process.  Completion of the device removal process is signaled with a
    #        DEVICE_DELETED event. Guest reset will automatically complete removal
    #        for all devices.

This sure could use some polish.

If I understand things correctly, we're aiming for the following device
unplug protocol:

   Unplug the device with device_del (or possibly equivalent)

   If we know we can't unplug the device, fail immediately.  Also emit
   DEVICE_UNPLUG_ERROR.

   If possible, unplug the device synchronously and succeed.  Also emit
   DEVICE_DELETED.

   Else, initiate unplug and succeed.

   When unplug finishes, emit either DEVICE_DELETED or
   DEVICE_UNPLUG_ERROR.

   For some machines and devices, unplug may never finish.

Correct?
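
For illustration only, here is a rough C sketch of how a management
client might track one pending device_del under the protocol above.
The type and function names are invented for this example (they are not
QEMU or libvirt APIs), events are reduced to a name plus a device id,
and the timeout is assumed to be the client's own policy:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical client-side view of one pending device_del request. */
enum unplug_state {
    UNPLUG_PENDING,   /* device_del succeeded, no final event yet */
    UNPLUG_DONE,      /* DEVICE_DELETED seen for this device */
    UNPLUG_FAILED,    /* DEVICE_UNPLUG_ERROR seen for this device */
    UNPLUG_TIMED_OUT, /* neither event arrived before the client gave up */
};

struct unplug_request {
    const char *device;       /* id that was passed to device_del */
    enum unplug_state state;
};

/*
 * Feed one event into the tracker.  Events for other devices and
 * unrelated event names are ignored; a final state is never overwritten.
 */
void unplug_handle_event(struct unplug_request *req,
                         const char *event, const char *device)
{
    assert(req && event && device);

    if (req->state != UNPLUG_PENDING || strcmp(req->device, device) != 0) {
        return;
    }
    if (strcmp(event, "DEVICE_DELETED") == 0) {
        req->state = UNPLUG_DONE;
    } else if (strcmp(event, "DEVICE_UNPLUG_ERROR") == 0) {
        req->state = UNPLUG_FAILED;
    }
}

/* Called when the client's own timer fires: unplug may never finish. */
void unplug_handle_timeout(struct unplug_request *req)
{
    assert(req);

    if (req->state == UNPLUG_PENDING) {
        req->state = UNPLUG_TIMED_OUT;
    }
}
```

A client would keep one tracker per device_del it issued, pass every
incoming event through unplug_handle_event(), and call
unplug_handle_timeout() when it decides to stop waiting.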

Any particular reason for not putting event DEVICE_UNPLUG_ERROR next to
DEVICE_DELETED in qdev.json?




* Re: [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors
  2021-06-04 20:03 ` [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors Daniel Henrique Barboza
  2021-06-07  2:24   ` David Gibson
@ 2021-06-11 12:18   ` Markus Armbruster
  2021-06-16 16:58     ` Daniel Henrique Barboza
  1 sibling, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2021-06-11 12:18 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: david, qemu-ppc, qemu-devel, groug

Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
> error path, signalling that the hotunplug process wasn't successful.
> This allow us to send a DEVICE_UNPLUG_ERROR in drc_unisolate_logical()
> to signal this error to the management layer.
>
> We also have another error path in spapr_memory_unplug_rollback() for
> configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
> in the hotunplug error path, but it will reconfigure them.  Let's send
> the DEVICE_UNPLUG_ERROR event in that code path as well to cover the
> case of older kernels.
>
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> ---
>  hw/ppc/spapr.c     |  2 +-
>  hw/ppc/spapr_drc.c | 15 +++++++++------
>  2 files changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index c23bcc4490..29aa2f467d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3639,7 +3639,7 @@ void spapr_memory_unplug_rollback(SpaprMachineState *spapr, DeviceState *dev)
>       */
>      qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>                                   "for device %s", dev->id);
> -    qapi_event_send_mem_unplug_error(dev->id, qapi_error);
> +    qapi_event_send_device_unplug_error(dev->id, qapi_error);

Incompatible change: we now emit DEVICE_UNPLUG_ERROR instead of
MEM_UNPLUG_ERROR.  Intentional?

If yes, we need a release note.

To avoid the incompatibility, we can emit both, and deprecate
MEM_UNPLUG_ERROR.

What about the MEM_UNPLUG_ERROR in acpi_memory_hotplug_write()?
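
A minimal sketch of the "emit both" option, assuming stub senders in
place of the generated qapi_event_send_mem_unplug_error() and
qapi_event_send_device_unplug_error() so that it builds outside the
QEMU tree.  The helper name report_mem_unplug_error() and the
events_sent counter are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Counts emitted events; only here so the sketch can be checked. */
static int events_sent;

/* Stand-in for qapi_event_send_mem_unplug_error(). */
static void stub_send_mem_unplug_error(const char *device, const char *msg)
{
    printf("MEM_UNPLUG_ERROR device=%s msg=%s\n", device, msg);
    events_sent++;
}

/* Stand-in for qapi_event_send_device_unplug_error(). */
static void stub_send_device_unplug_error(const char *device, const char *msg)
{
    printf("DEVICE_UNPLUG_ERROR device=%s msg=%s\n", device, msg);
    events_sent++;
}

/*
 * Hypothetical helper: keep emitting the deprecated event for existing
 * clients, and also emit the new generic one for clients that have
 * switched over.
 */
void report_mem_unplug_error(const char *device, const char *msg)
{
    assert(device && msg);

    stub_send_mem_unplug_error(device, msg);    /* deprecated, kept for compat */
    stub_send_device_unplug_error(device, msg); /* new generic event */
}
```

With something along these lines, existing MEM_UNPLUG_ERROR listeners
keep working during the deprecation period.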

>  }
>  
>  /* Callback to be called during DRC release. */
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index a2f2634601..0e1a8733bc 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -17,6 +17,8 @@
>  #include "hw/ppc/spapr_drc.h"
>  #include "qom/object.h"
>  #include "migration/vmstate.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-events-machine.h"
>  #include "qapi/visitor.h"
>  #include "qemu/error-report.h"
>  #include "hw/ppc/spapr.h" /* for RTAS return codes */
> @@ -160,6 +162,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>           * means that the kernel is refusing the removal.
>           */
>          if (drc->unplug_requested && drc->dev) {
> +            const char qapi_error_fmt[] = "Device hotunplug rejected by the "
> +                                          "guest for device %s";
> +            g_autofree char *qapi_error = NULL;
> +
>              if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) {
>                  spapr = SPAPR_MACHINE(qdev_get_machine());
>  
> @@ -167,13 +173,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>              }
>  
>              drc->unplug_requested = false;
> -            error_report("Device hotunplug rejected by the guest "
> -                         "for device %s", drc->dev->id);
> +            error_report(qapi_error_fmt, drc->dev->id);
>  
> -            /*
> -             * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
> -             * it is implemented.
> -             */
> +            qapi_error = g_strdup_printf(qapi_error_fmt, drc->dev->id);
> +            qapi_event_send_device_unplug_error(drc->dev->id, qapi_error);
>          }
>  
>          return RTAS_OUT_SUCCESS; /* Nothing to do */

Reporting both to stderr and QMP is odd.  Can you describe a use case
where the report to stderr is useful?




* Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  2021-06-11 12:12   ` Markus Armbruster
@ 2021-06-16 16:41     ` Daniel Henrique Barboza
  2021-06-17  5:59       ` Markus Armbruster
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Henrique Barboza @ 2021-06-16 16:41 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: groug, qemu-ppc, qemu-devel, david



On 6/11/21 9:12 AM, Markus Armbruster wrote:
> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> 
>> At this moment we only provide one event to report a hotunplug error,
>> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
>> machine is now able to report unplug errors for other device types, such
>> as CPUs.
>>
>> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
>> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
>> unplug errors in the future.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   qapi/machine.json | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/qapi/machine.json b/qapi/machine.json
>> index 58a9c86b36..f0c7e56be0 100644
>> --- a/qapi/machine.json
>> +++ b/qapi/machine.json
>> @@ -1274,3 +1274,26 @@
>>   ##
>>   { 'event': 'MEM_UNPLUG_ERROR',
>>     'data': { 'device': 'str', 'msg': 'str' } }
>> +
>> +##
>> +# @DEVICE_UNPLUG_ERROR:
>> +#
>> +# Emitted when a device hot unplug error occurs.
>> +#
>> +# @device: device name
>> +#
>> +# @msg: Informative message
>> +#
>> +# Since: 6.1
>> +#
>> +# Example:
>> +#
>> +# <- { "event": "DEVICE_UNPLUG_ERROR"
>> +#      "data": { "device": "dimm1",
>> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
>> +#      },
>> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
>> +#
>> +##
>> +{ 'event': 'DEVICE_UNPLUG_ERROR',
>> +  'data': { 'device': 'str', 'msg': 'str' } }
> 
> Missing: update of device_add's doc comment in qdev.json:
> 
>      # Notes: When this command completes, the device may not be removed from the
>      #        guest.  Hot removal is an operation that requires guest cooperation.
>      #        This command merely requests that the guest begin the hot removal
>      #        process.  Completion of the device removal process is signaled with a
>      #        DEVICE_DELETED event. Guest reset will automatically complete removal
>      #        for all devices.

Ok

> 
> This sure could use some polish.
> 
> If I understand things correctly, we're aiming for the following device
> unplug protocol:

One thing to note is that DEVICE_UNPLUG_ERROR isn't guaranteed to be sent for
every hotunplug error. The event depends on machine/architecture support to
detect a guest side error.

> 
>     Unplug the device with device_del (or possibly equivalent)
> 
>     If we know we can't unplug the device, fail immediately.  Also emit
>     DEVICE_UNPLUG_ERROR.


I hadn't planned on using this event in those cases as well, although it
seems reasonable to do so now that you mention it.

> 
>     If possible, unplug the device synchronously and succeed.  Also emit
>     DEVICE_DELETED.
> 
>     Else, initiate unplug and succeed.
> 
>     When unplug finishes, emit either DEVICE_DELETED or
>     DEVICE_UNPLUG_ERROR.

Since there's no 100% guarantee that DEVICE_UNPLUG_ERROR will be emitted for
guest side errors, the wording here would be

"When unplug finishes, emit DEVICE_DELETED. A DEVICE_UNPLUG_ERROR can be
emitted if a guest side error was detected"


> 
>     For some machines and devices, unplug may never finish.
> 
> Correct?
> 
> Any particular reason for not putting event DEVICE_UNPLUG_ERROR next to
> DEVICE_DELETED in qdev.json?


Not really. I looked where MEM_UNPLUG_ERROR was declared and put it right
after it. I can change it to qdev.json near DEVICE_DELETED.



Thanks,


Daniel

> 
> 



* Re: [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors
  2021-06-11 12:18   ` Markus Armbruster
@ 2021-06-16 16:58     ` Daniel Henrique Barboza
  2021-06-16 17:58       ` Eric Blake
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Henrique Barboza @ 2021-06-16 16:58 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: david, qemu-ppc, qemu-devel, groug



On 6/11/21 9:18 AM, Markus Armbruster wrote:
> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
> 
>> Linux Kernel 5.12 is now unisolating CPU DRCs in the device_removal
>> error path, signalling that the hotunplug process wasn't successful.
>> This allows us to send a DEVICE_UNPLUG_ERROR in drc_unisolate_logical()
>> to signal this error to the management layer.
>>
>> We also have another error path in spapr_memory_unplug_rollback() for
>> configured LMB DRCs. Kernels older than 5.13 will not unisolate the LMBs
>> in the hotunplug error path, but they will reconfigure them.  Let's send
>> the DEVICE_UNPLUG_ERROR event in that code path as well to cover the
>> case of older kernels.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>> ---
>>   hw/ppc/spapr.c     |  2 +-
>>   hw/ppc/spapr_drc.c | 15 +++++++++------
>>   2 files changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>> index c23bcc4490..29aa2f467d 100644
>> --- a/hw/ppc/spapr.c
>> +++ b/hw/ppc/spapr.c
>> @@ -3639,7 +3639,7 @@ void spapr_memory_unplug_rollback(SpaprMachineState *spapr, DeviceState *dev)
>>        */
>>       qapi_error = g_strdup_printf("Memory hotunplug rejected by the guest "
>>                                    "for device %s", dev->id);
>> -    qapi_event_send_mem_unplug_error(dev->id, qapi_error);
>> +    qapi_event_send_device_unplug_error(dev->id, qapi_error);
> 
> Incompatible change: we now emit DEVICE_UNPLUG_ERROR instead of
> MEM_UNPLUG_ERROR.  Intentional?
> 
> If yes, we need a release note.
> 
> To avoid the incompatibility, we can emit both, and deprecate
> MEM_UNPLUG_ERROR.
> 
> What about the MEM_UNPLUG_ERROR in acpi_memory_hotplug_write()?

I'll emit DEVICE_UNPLUG_ERROR together with all MEM_UNPLUG_ERROR instances.
Then we can deprecate MEM_UNPLUG_ERROR.
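
For a QMP client, emitting both events during the deprecation period means
the same failure may be reported twice. A minimal sketch of coping with
that, assuming events arrive as already-decoded dicts shaped like the
example in patch 1. The helper name failed_unplugs is hypothetical and is
not part of QEMU or libvirt:

```python
# Hypothetical client-side helper; an illustration, not QEMU or libvirt code.
UNPLUG_ERROR_EVENTS = {"DEVICE_UNPLUG_ERROR", "MEM_UNPLUG_ERROR"}


def failed_unplugs(events):
    """Map device id -> first error message seen, across both event names."""
    failed = {}
    for ev in events:
        if ev.get("event") not in UNPLUG_ERROR_EVENTS:
            continue
        data = ev.get("data", {})
        device = data.get("device")
        if device is not None:
            # setdefault keeps the first report and ignores the duplicate
            # that arrives under the other event name.
            failed.setdefault(device, data.get("msg", ""))
    return failed


MSG = "Memory hotunplug rejected by the guest for device dimm1"
events = [
    {"event": "MEM_UNPLUG_ERROR", "data": {"device": "dimm1", "msg": MSG}},
    {"event": "DEVICE_UNPLUG_ERROR", "data": {"device": "dimm1", "msg": MSG}},
]
print(failed_unplugs(events))
```

A client written this way should keep working both on a QEMU that only
knows MEM_UNPLUG_ERROR and on one that emits the pair.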

By the way, how do I mark MEM_UNPLUG_ERROR as deprecated? I see examples
of command line options being documented as deprecated in
docs/system/deprecated.rst and some deprecated QOM/QDEV properties are
marked as deprecated directly in their .json files, but I didn't find
any case where a whole event is deprecated. Would something like this be
adequate?


$ git diff
diff --git a/qapi/machine.json b/qapi/machine.json
index 58a9c86b36..ce3d873c64 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1261,6 +1261,10 @@
  #
  # @msg: Informative message
  #
+#
+# @deprecated: Starting in 6.1 this event has been replaced by
+#              DEVICE_UNPLUG_ERROR.
+#
  # Since: 2.4
  #
  # Example:



Thanks,


Daniel


> 
>>   }
>>   
>>   /* Callback to be called during DRC release. */
>> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
>> index a2f2634601..0e1a8733bc 100644
>> --- a/hw/ppc/spapr_drc.c
>> +++ b/hw/ppc/spapr_drc.c
>> @@ -17,6 +17,8 @@
>>   #include "hw/ppc/spapr_drc.h"
>>   #include "qom/object.h"
>>   #include "migration/vmstate.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qapi-events-machine.h"
>>   #include "qapi/visitor.h"
>>   #include "qemu/error-report.h"
>>   #include "hw/ppc/spapr.h" /* for RTAS return codes */
>> @@ -160,6 +162,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>>            * means that the kernel is refusing the removal.
>>            */
>>           if (drc->unplug_requested && drc->dev) {
>> +            const char qapi_error_fmt[] = "Device hotunplug rejected by the "
>> +                                          "guest for device %s";
>> +            g_autofree char *qapi_error = NULL;
>> +
>>               if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) {
>>                   spapr = SPAPR_MACHINE(qdev_get_machine());
>>   
>> @@ -167,13 +173,10 @@ static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>>               }
>>   
>>               drc->unplug_requested = false;
>> -            error_report("Device hotunplug rejected by the guest "
>> -                         "for device %s", drc->dev->id);
>> +            error_report(qapi_error_fmt, drc->dev->id);
>>   
>> -            /*
>> -             * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
>> -             * it is implemented.
>> -             */
>> +            qapi_error = g_strdup_printf(qapi_error_fmt, drc->dev->id);
>> +            qapi_event_send_device_unplug_error(drc->dev->id, qapi_error);
>>           }
>>   
>>           return RTAS_OUT_SUCCESS; /* Nothing to do */
> 
> Reporting both to stderr and QMP is odd.  Can you describe a use case
> where the report to stderr is useful?
> 



* Re: [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors
  2021-06-16 16:58     ` Daniel Henrique Barboza
@ 2021-06-16 17:58       ` Eric Blake
  0 siblings, 0 replies; 14+ messages in thread
From: Eric Blake @ 2021-06-16 17:58 UTC (permalink / raw)
  To: Daniel Henrique Barboza
  Cc: qemu-devel, groug, qemu-ppc, Markus Armbruster, david

On Wed, Jun 16, 2021 at 01:58:04PM -0300, Daniel Henrique Barboza wrote:
> > Incompatible change: we now emit DEVICE_UNPLUG_ERROR instead of
> > MEM_UNPLUG_ERROR.  Intentional?
> > 
> > If yes, we need a release note.
> > 
> > To avoid the incompatibility, we can emit both, and deprecate
> > MEM_UNPLUG_ERROR.
> > 
> > What about the MEM_UNPLUG_ERROR in acpi_memory_hotplug_write()?
> 
> I'll emit DEVICE_UNPLUG_ERROR together with all MEM_UNPLUG_ERROR instances.
> Then we can deprecate MEM_UNPLUG_ERROR.
> 
> By the way, how do I mark MEM_UNPLUG_ERROR as deprecated? I see examples
> of command line options being documented as deprecated in
> docs/system/deprecated.rst and some deprecated QOM/QDEV properties are
> marked as deprecated directly in their .json files, but I didn't find
> any case where a whole event is deprecated. Would something like this be
> adequate?

Almost.  That documents the deprecation for readers, but you also need
to mark it for viewing by machine code...

> 
> 
> $ git diff
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 58a9c86b36..ce3d873c64 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -1261,6 +1261,10 @@
>  #
>  # @msg: Informative message
>  #
> +#
> +# @deprecated: Starting in 6.1 this event has been replaced by
> +#              DEVICE_UNPLUG_ERROR.
> +#
>  # Since: 2.4
>  #
>  # Example:

...do that by adding 'features':['deprecated'] to the QAPI event
definition.
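
A rough sketch of what the resulting definition could look like, and how a
tool might spot the marker. The schema text below is the MEM_UNPLUG_ERROR
definition quoted in patch 1 with the 'features' member added. Parsing it
with Python's ast.literal_eval is only an assumption made for this
illustration (QEMU's real parser lives in scripts/qapi), and is_deprecated
is a hypothetical helper:

```python
import ast

# MEM_UNPLUG_ERROR as quoted in the patch, plus the suggested 'features'
# member. Simple QAPI expressions like this one happen to be valid Python
# literals, which is what this sketch relies on.
SCHEMA_EXPR = """
{ 'event': 'MEM_UNPLUG_ERROR',
  'data': { 'device': 'str', 'msg': 'str' },
  'features': [ 'deprecated' ] }
"""


def is_deprecated(expr_text):
    """Return True if the schema expression carries the 'deprecated' feature."""
    expr = ast.literal_eval(expr_text.strip())
    return "deprecated" in expr.get("features", [])


print(is_deprecated(SCHEMA_EXPR))
```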

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org




* Re: [PATCH v2 1/2] qapi/machine.json: add DEVICE_UNPLUG_ERROR QAPI event
  2021-06-16 16:41     ` Daniel Henrique Barboza
@ 2021-06-17  5:59       ` Markus Armbruster
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Armbruster @ 2021-06-17  5:59 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: david, qemu-ppc, groug, qemu-devel

Daniel Henrique Barboza <danielhb413@gmail.com> writes:

> On 6/11/21 9:12 AM, Markus Armbruster wrote:
>> Daniel Henrique Barboza <danielhb413@gmail.com> writes:
>> 
>>> At this moment we only provide one event to report a hotunplug error,
>>> MEM_UNPLUG_ERROR. As of Linux kernel 5.12 and QEMU 6.0.0, the pseries
>>> machine is now able to report unplug errors for other device types, such
>>> as CPUs.
>>>
>>> Instead of creating a (device_type)_UNPLUG_ERROR for each new device,
>>> create a generic DEVICE_UNPLUG_ERROR event that can be used by all
>>> unplug errors in the future.
>>>
>>> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
>>> ---
>>>   qapi/machine.json | 23 +++++++++++++++++++++++
>>>   1 file changed, 23 insertions(+)
>>>
>>> diff --git a/qapi/machine.json b/qapi/machine.json
>>> index 58a9c86b36..f0c7e56be0 100644
>>> --- a/qapi/machine.json
>>> +++ b/qapi/machine.json
>>> @@ -1274,3 +1274,26 @@
>>>   ##
>>>   { 'event': 'MEM_UNPLUG_ERROR',
>>>     'data': { 'device': 'str', 'msg': 'str' } }
>>> +
>>> +##
>>> +# @DEVICE_UNPLUG_ERROR:
>>> +#
>>> +# Emitted when a device hot unplug error occurs.
>>> +#
>>> +# @device: device name
>>> +#
>>> +# @msg: Informative message
>>> +#
>>> +# Since: 6.1
>>> +#
>>> +# Example:
>>> +#
>>> +# <- { "event": "DEVICE_UNPLUG_ERROR",
>>> +#      "data": { "device": "dimm1",
>>> +#                "msg": "Memory hotunplug rejected by the guest for device dimm1"
>>> +#      },
>>> +#      "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
>>> +#
>>> +##
>>> +{ 'event': 'DEVICE_UNPLUG_ERROR',
>>> +  'data': { 'device': 'str', 'msg': 'str' } }
>> 
>> Missing: update of device_del's doc comment in qdev.json:
>> 
>>      # Notes: When this command completes, the device may not be removed from the
>>      #        guest.  Hot removal is an operation that requires guest cooperation.
>>      #        This command merely requests that the guest begin the hot removal
>>      #        process.  Completion of the device removal process is signaled with a
>>      #        DEVICE_DELETED event. Guest reset will automatically complete removal
>>      #        for all devices.
>
> Ok
>
>> 
>> This sure could use some polish.
>> 
>> If I understand things correctly, we're aiming for the following device
>> unplug protocol:
>
> One thing to note is that DEVICE_UNPLUG_ERROR isn't guaranteed to be sent for
> every hotunplug error. The event depends on machine/architecture support to
> detect a guest side error.

Yes.  I tried to provide for that in my description of the protocol.

>> 
>>     Unplug the device with device_del (or possibly equivalent)
>> 
>>     If we know we can't unplug the device, fail immediately.  Also emit
>>     DEVICE_UNPLUG_ERROR.
>
>
> I hadn't planned to use this event in those cases as well, although it
> seems reasonable to do so now that you mention it.

I think this is a matter of taste.  For what it's worth, we do emit
DEVICE_DELETED on immediate success (see next paragraph).

Emitting DEVICE_DELETED always lets QMP clients track device deletion
even when it's triggered by something else.  Feature.

No such tracking is needed for unplug failure.

>>     If possible, unplug the device synchronously and succeed.  Also emit
>>     DEVICE_DELETED.
>> 
>>     Else, initiate unplug and succeed.
>> 
>>     When unplug finishes, emit either DEVICE_DELETED or
>>     DEVICE_UNPLUG_ERROR.
>
> Since there's no 100% guarantee that DEVICE_UNPLUG_ERROR will be emitted for
> guest side errors, the wording here would be
>
> "When unplug finishes, emit DEVICE_DELETED. A DEVICE_UNPLUG_ERROR can be
> emitted if a guest side error was detected"

My assumption is that QEMU can detect asynchronous success reliably, but
not asynchronous failure.

From QEMU's point of view, "asynchronous unplug has not finished" is
indistinguishable from "asynchronous unplug finished unsuccessfully, but
we can't detect it".  So I lumped these two cases together in the next
paragraph:

>>     For some machines and devices, unplug may never finish.

My goal is a clear description of the state machine as it can be
observed in QMP.
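
As a rough client-side illustration of that observable state machine, here
is a minimal sketch. It assumes decoded event dicts and a hypothetical
track_unplug helper; neither is QEMU API, and the names are made up for
this example:

```python
from enum import Enum


class UnplugState(Enum):
    PENDING = "pending"   # no outcome seen; may stay this way forever
    DELETED = "deleted"   # DEVICE_DELETED observed
    FAILED = "failed"     # device_del failed or DEVICE_UNPLUG_ERROR observed


def track_unplug(device_id, command_ok, events):
    """Classify one unplug request from the device_del result and later events.

    command_ok: whether device_del itself returned success.
    events: decoded QMP events seen after issuing the command.
    """
    if not command_ok:
        # "If we know we can't unplug the device, fail immediately."
        return UnplugState.FAILED
    for ev in events:
        if ev.get("data", {}).get("device") != device_id:
            continue
        if ev.get("event") == "DEVICE_DELETED":
            return UnplugState.DELETED
        if ev.get("event") == "DEVICE_UNPLUG_ERROR":
            return UnplugState.FAILED
    # Neither event arrived: unfinished and undetected failure look the same.
    return UnplugState.PENDING


print(track_unplug("dimm1", True, [
    {"event": "DEVICE_UNPLUG_ERROR",
     "data": {"device": "dimm1", "msg": "rejected by the guest"}},
]))
```

The PENDING result is why a client still needs its own timeout: from the
QMP side, "not finished yet" and "failed without detection" cannot be told
apart.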

>> Correct?
>> 
>> Any particular reason for not putting event DEVICE_UNPLUG_ERROR next to
>> DEVICE_DELETED in qdev.json?
>
>
> Not really. I looked where MEM_UNPLUG_ERROR was declared and put it right
> after it. I can change it to qdev.json near DEVICE_DELETED.

Yes, please.




end of thread, other threads:[~2021-06-17  6:00 UTC | newest]

Thread overview: 14+ messages
2021-06-04 20:03 [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event Daniel Henrique Barboza
2021-06-04 20:03 ` [PATCH v2 1/2] qapi/machine.json: add " Daniel Henrique Barboza
2021-06-07  2:23   ` David Gibson
2021-06-07  2:23     ` David Gibson
2021-06-07 18:41       ` Eric Blake
2021-06-11 12:12   ` Markus Armbruster
2021-06-16 16:41     ` Daniel Henrique Barboza
2021-06-17  5:59       ` Markus Armbruster
2021-06-04 20:03 ` [PATCH v2 2/2] spapr: use DEVICE_UNPLUG_ERROR to report unplug errors Daniel Henrique Barboza
2021-06-07  2:24   ` David Gibson
2021-06-11 12:18   ` Markus Armbruster
2021-06-16 16:58     ` Daniel Henrique Barboza
2021-06-16 17:58       ` Eric Blake
2021-06-07  2:25 ` [PATCH v2 0/2] DEVICE_UNPLUG_ERROR QAPI event David Gibson
