On Wed, Mar 31, 2021 at 09:04:37PM -0300, Daniel Henrique Barboza wrote:
> Commit 47c8c915b162 fixed a problem where multiple spapr_drc_detach()
> requests were breaking QEMU. The solution was to just spapr_drc_detach()
> once, and use spapr_drc_unplug_requested() to filter whether we already
> detached it or not. The commit also tied the hotplug request to the
> guest in the same condition.
>
> Turns out that there is a reliable way for a CPU hotunplug to fail. If a
> guest with one CPU hotplugs a CPU1, then offline CPU0s via 'echo 0 >
> /sys/devices/system/cpu/cpu0/online', then attempts to hotunplug CPU1,
> the kernel will refuse it because it's the last online CPU of the
> system. Given that we're pulsing the IRQ only in the first try, in a
> failed attempt, all other CPU1 hotunplug attempts will fail, regardless
> of the online state of CPU1 in the kernel, because we're simply not
> letting the guest know that we want to hotunplug the device.
>
> Let's move spapr_hotplug_req_remove_by_index() back out of the "if
> (!spapr_drc_unplug_requested(drc))" conditional, allowing for multiple
> 'device_del' requests to the same CPU core to reach the guest, in case
> the CPU core didn't fully hotunplugged previously.
>
> Signed-off-by: Daniel Henrique Barboza

I've applied these to ppc-for-6.0, but..

> ---
>  hw/ppc/spapr.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 05a765fab4..e4be00b732 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3777,8 +3777,17 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>
>      if (!spapr_drc_unplug_requested(drc)) {
>          spapr_drc_unplug_request(drc);
> -        spapr_hotplug_req_remove_by_index(drc);
>      }
> +
> +    /*
> +     * spapr_hotplug_req_remove_by_index is left unguarded, out of the
> +     * "!spapr_drc_unplug_requested" check, to allow for multiple IRQ
> +     * pulses removing the same CPU. Otherwise, in an failed hotunplug
> +     * attempt (e.g. the kernel will refuse to remove the last online
> +     * CPU), we will never attempt it again because unplug_requested
> +     * will still be 'true' in that case.
> +     */
> +    spapr_hotplug_req_remove_by_index(drc);

I think we need similar changes for all the other unplug types (LMB,
PCI, PHB) - basically retries should always be allowed, and at worst
be a no-op, rather than generating an error like they do now.

>  }
>
>  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
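
For illustration only, the retry behaviour discussed in this thread can be modelled with a small standalone C sketch. This is not QEMU code: ToyDrc and every toy_* function below are hypothetical stand-ins, and the model assumes only what the commit message states, namely that the unplug_requested flag stays set after a refused hotunplug and that the guest is notified by an IRQ pulse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a DRC; NOT the real SpaprDrc from QEMU. */
typedef struct ToyDrc {
    bool unplug_requested; /* set by the first request, still set after a refusal */
    int irq_pulses;        /* number of times the guest was notified */
} ToyDrc;

/* Guarded variant: the guest is notified only on the first request. */
static void toy_unplug_request_guarded(ToyDrc *drc)
{
    if (!drc->unplug_requested) {
        drc->unplug_requested = true;
        drc->irq_pulses++;
    }
}

/* Retry-friendly variant: mark the DRC once, notify on every request. */
static void toy_unplug_request_retry(ToyDrc *drc)
{
    if (!drc->unplug_requested) {
        drc->unplug_requested = true;
    }
    drc->irq_pulses++;
}

/*
 * Simulate two 'device_del' requests where the guest refuses the first
 * one, so the flag is still set when the second request arrives.
 * Returns how many IRQ pulses reached the guest.
 */
static int toy_pulses_after_two_requests(void (*request)(ToyDrc *))
{
    ToyDrc drc = { .unplug_requested = false, .irq_pulses = 0 };

    request(&drc); /* first attempt: guest refuses, flag stays set */
    request(&drc); /* retry after the refusal */
    assert(drc.unplug_requested);
    return drc.irq_pulses;
}
```

In this toy model, toy_pulses_after_two_requests(toy_unplug_request_guarded) returns 1, so the retry never reaches the guest, while toy_pulses_after_two_requests(toy_unplug_request_retry) returns 2. Whether an extra pulse is harmless for the other unplug types (LMB, PCI, PHB) is exactly the open question raised above; the sketch does not answer it.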