On Thu, Feb 11, 2021 at 07:52:44PM -0300, Daniel Henrique Barboza wrote: > The LoPAR spec provides no way for the guest kernel to report failure of > hotplug/hotunplug events. This wouldn't be bad if those operations were > granted to always succeed, but that's far for the reality. > > What ends up happening is that, in the case of a failed hotunplug, > regardless of whether it was a QEMU error or a guest misbehavior, the pSeries > machine is retaining the unplug state of the device in the running guest. > This state is cleanup in machine reset, where it is assumed that this state > represents a device that is pending unplug, and the device is hotunpluged > from the board. Until the reset occurs, any hotunplug operation of the same > device is forbid because there is a pending unplug state. > > This behavior has at least one undesirable side effect. A long standing pending > unplug state is, more often than not, the result of a hotunplug error. The user > had to dealt with it, since retrying to unplug the device is noy allowed, and then > in the machine reset we're removing the device from the guest. This means that > we're failing the user twice - failed to hotunplug when asked, then hotunplugged > without notice. > > Solutions to this problem range between trying to predict when the hotunplug will > fail and forbid the operation from the QEMU layer, from opening up the IRQ queue > to allow for multiple hotunplug attempts, from telling the users to 'reboot the > machine if something goes wrong'. The first solution is flawed because we can't > fully predict guest behavior from QEMU, the second solution is a trial and error > remediation that counts on a hope that the unplug will eventually succeed, and the > third is ... well. > > This patch introduces a crude, but effective solution to hotunplug errors in > the pSeries machine. For each unplug done, we'll timeout after some time. If > a certain amount of time passes, we'll cleanup the hotunplug state from the machine. > During the timeout period, any unplug operations in the same device will still > be blocked. After that, we'll assume that the guest failed the operation, and > allow the user to try again. If the timeout is too short we'll prevent legitimate > hotunplug situations to occur, so we'll need to overestimate the regular time > an unplug operation takes to succeed to account that. > > The true solution for the hotunplug errors in the pSeries machines is a PAPR change > to allow for the guest to warn the platform about it. For now, the work done in this > timeout design can be used for the new PAPR 'abort hcall' in the future, given that > for both cases we'll need code to cleanup the existing unplug states of the DRCs. > > At this moment we're adding the basic wiring of the timer into the DRC. Next patch > will use the timer to timeout failed CPU hotunplugs. > > Signed-off-by: Daniel Henrique Barboza > --- > hw/ppc/spapr_drc.c | 36 ++++++++++++++++++++++++++++++++++++ > include/hw/ppc/spapr_drc.h | 2 ++ > 2 files changed, 38 insertions(+) > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > index 67041fb212..c88bb524c5 100644 > --- a/hw/ppc/spapr_drc.c > +++ b/hw/ppc/spapr_drc.c > @@ -57,6 +57,8 @@ static void spapr_drc_release(SpaprDrc *drc) > drck->release(drc->dev); > > drc->unplug_requested = false; > + timer_del(drc->unplug_timeout_timer); > + > g_free(drc->fdt); > drc->fdt = NULL; > drc->fdt_start_offset = 0; > @@ -453,6 +455,24 @@ static const VMStateDescription vmstate_spapr_drc_unplug_requested = { > } > }; > > +static bool spapr_drc_unplug_timeout_timer_needed(void *opaque) > +{ > + SpaprDrc *drc = opaque; > + > + return timer_pending(drc->unplug_timeout_timer); > +} > + > +static const VMStateDescription vmstate_spapr_drc_unplug_timeout_timer = { > + .name = "DRC unplug timeout timer", > + .version_id = 1, > + .minimum_version_id = 1, > + .needed = spapr_drc_unplug_timeout_timer_needed, > + .fields = (VMStateField[]) { > + VMSTATE_TIMER_PTR(unplug_timeout_timer, SpaprDrc), > + VMSTATE_END_OF_LIST() > + } > +}; > + > static bool spapr_drc_needed(void *opaque) > { > SpaprDrc *drc = opaque; > @@ -486,10 +506,20 @@ static const VMStateDescription vmstate_spapr_drc = { > }, > .subsections = (const VMStateDescription * []) { > &vmstate_spapr_drc_unplug_requested, > + &vmstate_spapr_drc_unplug_timeout_timer, > NULL > } > }; > > +static void drc_unplug_timeout_cb(void *opaque) > +{ > + SpaprDrc *drc = opaque; > + > + if (drc->unplug_requested) { > + drc->unplug_requested = false; > + } Sorry, forgot to mention in first pass - I think we want some kind of reporting here. At least a trace, and maybe also a warn_report() or the like. Hrm.. looking wider, I wonder if we should add a DEVICE_DELETE_FAILED message to QAPI to mirror the DEVICE_DELETED message. Fixing PAPR is pretty tedious, but fixing at least qemu's own interfaces is a bit more approachable. That could certainly be a follow up enhancement, though. > +} > + > static void drc_realize(DeviceState *d, Error **errp) > { > SpaprDrc *drc = SPAPR_DR_CONNECTOR(d); > @@ -512,6 +542,11 @@ static void drc_realize(DeviceState *d, Error **errp) > object_property_add_alias(root_container, link_name, > drc->owner, child_name); > g_free(link_name); > + > + drc->unplug_timeout_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL, > + drc_unplug_timeout_cb, > + drc); > + > vmstate_register(VMSTATE_IF(drc), spapr_drc_index(drc), &vmstate_spapr_drc, > drc); > trace_spapr_drc_realize_complete(spapr_drc_index(drc)); > @@ -529,6 +564,7 @@ static void drc_unrealize(DeviceState *d) > name = g_strdup_printf("%x", spapr_drc_index(drc)); > object_property_del(root_container, name); > g_free(name); > + timer_free(drc->unplug_timeout_timer); > } > > SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type, > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h > index 02a63b3666..b2e6222d09 100644 > --- a/include/hw/ppc/spapr_drc.h > +++ b/include/hw/ppc/spapr_drc.h > @@ -187,6 +187,8 @@ typedef struct SpaprDrc { > bool unplug_requested; > void *fdt; > int fdt_start_offset; > + > + QEMUTimer *unplug_timeout_timer; > } SpaprDrc; > > struct SpaprMachineState; -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson