qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] pSeries: handle hotunplug errors in drc_unisolate_logical()
@ 2021-04-16 21:09 Daniel Henrique Barboza
  2021-04-16 21:09 ` [PATCH 1/1] spapr_drc.c: " Daniel Henrique Barboza
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel Henrique Barboza @ 2021-04-16 21:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

Hi,

This is the QEMU side of a kernel change being proposed in [1],
where an attempt to implement a CPU hotunplug error report
mechanism was proposed.

The idea was discussed first in this mailing list [2], where the
RTAS set-indicator call would be used to signal QEMU when a kernel
side error happens during the unplug process.

Using the modified kernel and this patch, this is the result of a
failed CPU hotunplug attempt when trying to unplug the last online
CPU of the guest:

( QEMU command line: qemu-system-ppc64 -machine pseries,accel=kvm,usb=off
-smp 1,maxcpus=2,threads=1,cores=2,sockets=1 ... )

[root@localhost ~]# QEMU 5.2.92 monitor - type 'help' for more information
(qemu) device_add host-spapr-cpu-core,core-id=1,id=core1
(qemu) 

[root@localhost ~]# echo 0 > /sys/devices/system/cpu/cpu0/online
[   77.548442][   T13] IRQ 19: no longer affine to CPU0
[   77.548452][   T13] IRQ 20: no longer affine to CPU0
[   77.548458][   T13] IRQ 256: no longer affine to CPU0
[   77.548465][   T13] IRQ 258: no longer affine to CPU0
[   77.548472][   T13] IRQ 259: no longer affine to CPU0
[   77.548479][   T13] IRQ 260: no longer affine to CPU0
[   77.548485][   T13] IRQ 261: no longer affine to CPU0
[   77.548590][    T0] cpu 0 (hwid 0) Ready to die...
[root@localhost ~]# (qemu) 
(qemu) device_del core1
(qemu) [   83.214073][  T100] pseries-hotplug-cpu: Failed to offline CPU PowerPC,POWER9, rc: -16
qemu-system-ppc64: Device hotunplug rejected by the guest for device core1

(qemu) 


As mentioned in the kernel change, if this is accepted I'll push
for a PAPR change to make this an official device removal error
report mechanism.


[1] https://lore.kernel.org/linuxppc-dev/20210416210216.380291-3-danielhb413@gmail.com/
[2] https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg06395.html

Daniel Henrique Barboza (1):
  spapr_drc.c: handle hotunplug errors in drc_unisolate_logical()

 hw/ppc/spapr_drc.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

-- 
2.30.2



^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/1] spapr_drc.c: handle hotunplug errors in drc_unisolate_logical()
  2021-04-16 21:09 [PATCH 0/1] pSeries: handle hotunplug errors in drc_unisolate_logical() Daniel Henrique Barboza
@ 2021-04-16 21:09 ` Daniel Henrique Barboza
  2021-04-19  4:08   ` David Gibson
  0 siblings, 1 reply; 3+ messages in thread
From: Daniel Henrique Barboza @ 2021-04-16 21:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Daniel Henrique Barboza, qemu-ppc, groug, david

The Linux kernel will call set-indicator to move a DRC to 'unisolate' in
the case a device removal fails. Setting a DRC that is already
unisolated or configured to 'unisolate' is a no-op for the current
hypervisors that supports pSeries guests, namely QEMU and phyp, and is
being used to signal hotunplug errors if the hypervisor has the code for
it.

This patch changes drc_unisolate_logical() to implement in the pSeries
machine. For CPUs it's a simple matter of setting drc->unplug_requested
to 'false', while for LMBs the process is similar to the rollback that
is done in rtas_ibm_configure_connector(). Although at this moment the
Linux kernel is only reporting CPU removal errors, let's get the code
ready to handle LMBs as well.

Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
---
 hw/ppc/spapr_drc.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 9e16505fa1..6918e0c9d1 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -151,9 +151,32 @@ static uint32_t drc_isolate_logical(SpaprDrc *drc)
 
 static uint32_t drc_unisolate_logical(SpaprDrc *drc)
 {
+    SpaprMachineState *spapr = NULL;
+
     switch (drc->state) {
     case SPAPR_DRC_STATE_LOGICAL_UNISOLATE:
     case SPAPR_DRC_STATE_LOGICAL_CONFIGURED:
+        /*
+         * Unisolating a logical DRC that was marked for unplug
+         * means that the kernel is refusing the removal.
+         */
+        if (drc->unplug_requested && drc->dev) {
+            if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) {
+                spapr = SPAPR_MACHINE(qdev_get_machine());
+
+                spapr_memory_unplug_rollback(spapr, drc->dev);
+            }
+
+            drc->unplug_requested = false;
+            error_report("Device hotunplug rejected by the guest "
+                         "for device %s", drc->dev->id);
+
+            /*
+             * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
+             * it is implemented.
+             */
+        }
+
         return RTAS_OUT_SUCCESS; /* Nothing to do */
     case SPAPR_DRC_STATE_LOGICAL_AVAILABLE:
         break; /* see below */
-- 
2.30.2



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/1] spapr_drc.c: handle hotunplug errors in drc_unisolate_logical()
  2021-04-16 21:09 ` [PATCH 1/1] spapr_drc.c: " Daniel Henrique Barboza
@ 2021-04-19  4:08   ` David Gibson
  0 siblings, 0 replies; 3+ messages in thread
From: David Gibson @ 2021-04-19  4:08 UTC (permalink / raw)
  To: Daniel Henrique Barboza; +Cc: qemu-ppc, qemu-devel, groug

[-- Attachment #1: Type: text/plain, Size: 2992 bytes --]

On Fri, Apr 16, 2021 at 06:09:41PM -0300, Daniel Henrique Barboza wrote:
> The Linux kernel will call set-indicator to move a DRC to 'unisolate' in
> the case a device removal fails. Setting a DRC that is already

Only issue I have with this patch is that this isn't quite accurate
yet.  Can you reword this to include some of the rationale from the
cover letters about how a previous no-op is a safe choice to allow
guest->hv error signalling and that we expect the Linux kernel to use
it at some point.

> unisolated or configured to 'unisolate' is a no-op for the current
> hypervisors that supports pSeries guests, namely QEMU and phyp, and is
> being used to signal hotunplug errors if the hypervisor has the code for
> it.
> 
> This patch changes drc_unisolate_logical() to implement in the pSeries
> machine. For CPUs it's a simple matter of setting drc->unplug_requested
> to 'false', while for LMBs the process is similar to the rollback that
> is done in rtas_ibm_configure_connector(). Although at this moment the
> Linux kernel is only reporting CPU removal errors, let's get the code
> ready to handle LMBs as well.
> 
> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>

Otherwise,

Reviewed-by: David Gibson <david@gibson.dropbear.id.au>

> ---
>  hw/ppc/spapr_drc.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 9e16505fa1..6918e0c9d1 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -151,9 +151,32 @@ static uint32_t drc_isolate_logical(SpaprDrc *drc)
>  
>  static uint32_t drc_unisolate_logical(SpaprDrc *drc)
>  {
> +    SpaprMachineState *spapr = NULL;
> +
>      switch (drc->state) {
>      case SPAPR_DRC_STATE_LOGICAL_UNISOLATE:
>      case SPAPR_DRC_STATE_LOGICAL_CONFIGURED:
> +        /*
> +         * Unisolating a logical DRC that was marked for unplug
> +         * means that the kernel is refusing the removal.
> +         */
> +        if (drc->unplug_requested && drc->dev) {
> +            if (spapr_drc_type(drc) == SPAPR_DR_CONNECTOR_TYPE_LMB) {
> +                spapr = SPAPR_MACHINE(qdev_get_machine());
> +
> +                spapr_memory_unplug_rollback(spapr, drc->dev);
> +            }
> +
> +            drc->unplug_requested = false;
> +            error_report("Device hotunplug rejected by the guest "
> +                         "for device %s", drc->dev->id);
> +
> +            /*
> +             * TODO: send a QAPI DEVICE_UNPLUG_ERROR event when
> +             * it is implemented.
> +             */
> +        }
> +
>          return RTAS_OUT_SUCCESS; /* Nothing to do */
>      case SPAPR_DRC_STATE_LOGICAL_AVAILABLE:
>          break; /* see below */

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-04-19  4:10 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-16 21:09 [PATCH 0/1] pSeries: handle hotunplug errors in drc_unisolate_logical() Daniel Henrique Barboza
2021-04-16 21:09 ` [PATCH 1/1] spapr_drc.c: " Daniel Henrique Barboza
2021-04-19  4:08   ` David Gibson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).