From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44850) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alq3g-00071p-2s for qemu-devel@nongnu.org; Thu, 31 Mar 2016 23:47:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1alq3a-0001bD-W8 for qemu-devel@nongnu.org; Thu, 31 Mar 2016 23:47:40 -0400 Received: from e19.ny.us.ibm.com ([129.33.205.209]:52445) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1alq3a-0001ay-Qb for qemu-devel@nongnu.org; Thu, 31 Mar 2016 23:47:34 -0400 Received: from localhost by e19.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Thu, 31 Mar 2016 23:47:33 -0400 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable From: Michael Roth In-Reply-To: <20160401023325.GK416@voom.redhat.com> References: <1459463257-3137-1-git-send-email-mdroth@linux.vnet.ibm.com> <20160401023325.GK416@voom.redhat.com> Message-ID: <20160401034724.3884.29404@loki> Date: Thu, 31 Mar 2016 22:47:24 -0500 Subject: Re: [Qemu-devel] [PATCH for-2.6] spapr_drc: enable immediate detach for unsignalled devices List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: David Gibson Cc: qemu-ppc@nongnu.org, sbhat@linux.vnet.ibm.com, qemu-devel@nongnu.org, bharata@linux.vnet.ibm.com Quoting David Gibson (2016-03-31 21:33:25) > On Thu, Mar 31, 2016 at 05:27:37PM -0500, Michael Roth wrote: > > Currently spapr doesn't support "aborting" hotplug of PCI > > devices by allowing device_del to immediately remove the > > device if we haven't signalled the presence of the device > > to the guest. > > = > > In the past this wasn't an issue, since we always immediately > > signalled device attach and simply relied on full guest-aware > > add->remove path for device removal. However, as of 788d259, > > we now defer signalling for PCI functions until function 0 > > is attached, so now we need to deal with these "abort" operations > > for cases where a user hotplugs a non-0 function, then opts to > > remove it prior hotplugging function 0. Currently they'd have to > > reboot before the unplug completed. PCIe multifunction hotplug > > does not have this requirement however, so from a management > > implementation perspective it would be good to address this within > > the same release as 788d259. > > = > > We accomplish this by simply adding a 'signalled' flag to track > > whether a device hotplug event has been sent to the guest. If it > > hasn't, we allow immediate removal under the assumption that the > > guest will not be using the device. Devices present at boot/reset > > time are also assumed to be 'signalled'. > > = > > For CPU/memory/etc, signalling will still happen immediately > > as part of device_add, so only PCI functions should be affected. > > = > > Cc: bharata@linux.vnet.ibm.com > > Cc: david@gibson.dropbear.id.au > > Cc: sbhat@linux.vnet.ibm.com > > Cc: qemu-ppc@nongnu.org > > Signed-off-by: Michael Roth > = > So, I'm disinclined to take this during the hard freeze, without some > evidence that it fixes a problem case that's really being hit in > practice. The basic situation is that previously we had: device_add virtio-net-pci,id=3Dhp5.2,addr=3D0x5.2 a1) plug device into slot a2) signal guest of attach a3) guest adds device device_del hp5.2 = d1) signal guest of detach d2) wait for guest to clean up device and signal completion d3) unplug device from slot After 788d259 we have: device_add virtio-net-pci,id=3Dhp5.2,addr=3D0x5.2 a1) plug device into slot a2) defer signalling until we see 0x5.0 added device_del hp5.2 = d1) defer signalling removal until we see 0x5.0 deleted d2) wait for guest to clean up device and signal completion But d2) never happens because the guest was never signalled that the device was added in the first place. A real world situation where this might occur is admittedly a bit contrived, but in practice I can certainly see it happening: a) user hotplugs multifunction virtio-net-pci to slot 5, starting with function 3 at 0x5.3 b) user notices they made a mistake, for instance, they forget enable vhost in the accompanying netdev c) user attempts to unplug function and "abort" the operation, but device does not go away Shiva (on cc:) is working on the libvirt implementation for this and hit this in testing. Since it differs in behavior from the PCIe workflow it was based on (where aborts are explicitly allowed), and causes a minor regression from 2.5, I thought it might be worth considering for 2.6, but I can certainly understand if you opt to hold off till 2.7. > = > > --- > > hw/ppc/spapr_drc.c | 34 ++++++++++++++++++++++++++++++++++ > > hw/ppc/spapr_events.c | 11 +++++++++++ > > include/hw/ppc/spapr_drc.h | 2 ++ > > 3 files changed, 47 insertions(+) > > = > > diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c > > index ef063c0..5568e44 100644 > > --- a/hw/ppc/spapr_drc.c > > +++ b/hw/ppc/spapr_drc.c > > @@ -173,6 +173,12 @@ static void set_configured(sPAPRDRConnector *drc) > > drc->configured =3D true; > > } > > = > > +/* has the guest been notified of device attachment? */ > > +static void set_signalled(sPAPRDRConnector *drc) > > +{ > > + drc->signalled =3D true; > > +} > > + > > /* > > * dr-entity-sense sensor value > > * returned via get-sensor-state RTAS calls > > @@ -355,6 +361,7 @@ static void attach(sPAPRDRConnector *drc, DeviceSta= te *d, void *fdt, > > drc->fdt =3D fdt; > > drc->fdt_start_offset =3D fdt_start_offset; > > drc->configured =3D coldplug; > > + drc->signalled =3D coldplug; > > = > > object_property_add_link(OBJECT(drc), "device", > > object_get_typename(OBJECT(drc->dev)), > > @@ -371,6 +378,26 @@ static void detach(sPAPRDRConnector *drc, DeviceSt= ate *d, > > drc->detach_cb =3D detach_cb; > > drc->detach_cb_opaque =3D detach_cb_opaque; > > = > > + /* if we've signalled device presence to the guest, or if the guest > > + * has gone ahead and configured the device (via manually-executed > > + * device add via drmgr in guest, namely), we need to wait > > + * for the guest to quiesce the device before completing detach. > > + * Otherwise, we can assume the guest hasn't seen it and complete = the > > + * detach immediately. Note that there is a small race window > > + * just before, or during, configuration, which is this context > > + * refers mainly to fetching the device tree via RTAS. > > + * During this window the device access will be arbitrated by > > + * associated DRC, which will simply fail the RTAS calls as invali= d. > > + * This is recoverable within guest and current implementations of > > + * drmgr should be able to cope. > = > Sorry, I'm really not following this description of the race, or > seeing how it relates to allowing removal of "half plugged" devices. > = There's 2 stages we enter before the device is fully accessible within the guest: 1) The 'signalled' state introduced by this patch, which we enter as soon as we send a hotplug event to the guest. 2) The 'configuration' phase, which consists of RTAS calls against the DRC and power domain for the slot to modify things like hotplug LED indicator, allocation/isolation state, and fetching the device's device tree. Once the device is done fetching the device tree it's considered to be in 'configured' state according to state diagram in PAPR v2.6 13.4. (note there's no accesses to the device over PCI bus during this time, just RTAS calls). We only allow immediate unplug/abort if we haven't signalled the device to the guest, *and* if the device hasn't reached the 'configured' state. Normally we wouldn't reach the configuration phase until after we enter the signalled state, and since being in the signalled state prevents immediate unplug, there's no race there. The only race is if we haven't signalled the guest yet (for instance, because the above example where we've only hotplugged function 0x5.3 and are waiting for function 0x5.0 before sending the hotplug event), but someone manually ran 'drmgr -c pci -a -s ...' in the guest (perhaps because they were confused why the device didn't show up). Then, while that command was running through the process of bringing the device into the 'configured' state, they or someone else executed device_del on the device. Since it hasn't been signalled, and since it hasn't reached the configured state, it would be immediately unplugged by QEMU. But since drmgr might not have finished adding it yet, we have a case where the guest is attempting to access the device while it's being removed. This is the race window. It's pretty small and unlikely, but I figured it was worth mentioning. I reason in the comment that since the 'configuration' phase consists only of RTAS calls, and no actual accesses over the PCI bus, that this would still result in the guest failing gracefully via an RTAS error for executing an RTAS call against a DRC that no longer has a device associated with in. Current implementations of drmgr would simply abort the add operation and resume normal function without ever adding the device to PCI subsystem. So during the race window an 'abort' should be fairly safe. > > + */ > > + if (!drc->signalled && !drc->configured) { > > + /* if the guest hasn't seen the device we can't rely on it to > > + * set it back to an isolated state via RTAS, so do it here ma= nually > > + */ > > + drc->isolation_state =3D SPAPR_DR_ISOLATION_STATE_ISOLATED; > > + } > > + > > if (drc->isolation_state !=3D SPAPR_DR_ISOLATION_STATE_ISOLATED) { > > DPRINTFN("awaiting transition to isolated state before removal= "); > > drc->awaiting_release =3D true; > > @@ -409,6 +436,7 @@ static void reset(DeviceState *d) > > { > > sPAPRDRConnector *drc =3D SPAPR_DR_CONNECTOR(d); > > sPAPRDRConnectorClass *drck =3D SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > + sPAPRDREntitySense state; > > = > > DPRINTFN("drc reset: %x", drck->get_index(drc)); > > /* immediately upon reset we can safely assume DRCs whose devices > > @@ -436,6 +464,11 @@ static void reset(DeviceState *d) > > drck->set_allocation_state(drc, SPAPR_DR_ALLOCATION_STATE_= UNUSABLE); > > } > > } > > + > > + drck->entity_sense(drc, &state); > > + if (state =3D=3D SPAPR_DR_ENTITY_SENSE_PRESENT) { > > + drck->set_signalled(drc); > > + } > > } > > = > > static void realize(DeviceState *d, Error **errp) > > @@ -594,6 +627,7 @@ static void spapr_dr_connector_class_init(ObjectCla= ss *k, void *data) > > drck->attach =3D attach; > > drck->detach =3D detach; > > drck->release_pending =3D release_pending; > > + drck->set_signalled =3D set_signalled; > > /* > > * Reason: it crashes FIXME find and document the real reason > > */ > > diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c > > index 39f4682..be8de63 100644 > > --- a/hw/ppc/spapr_events.c > > +++ b/hw/ppc/spapr_events.c > > @@ -387,6 +387,13 @@ static void spapr_powerdown_req(Notifier *n, void = *opaque) > > qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_ir= q)); > > } > > = > > +static void spapr_hotplug_set_signalled(uint32_t drc_index) > > +{ > > + sPAPRDRConnector *drc =3D spapr_dr_connector_by_index(drc_index); > > + sPAPRDRConnectorClass *drck =3D SPAPR_DR_CONNECTOR_GET_CLASS(drc); > > + drck->set_signalled(drc); > > +} > > + > > static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action, > > sPAPRDRConnectorType drc_type, > > uint32_t drc) > > @@ -453,6 +460,10 @@ static void spapr_hotplug_req_event(uint8_t hp_id,= uint8_t hp_action, > > = > > rtas_event_log_queue(RTAS_LOG_TYPE_HOTPLUG, new_hp, true); > > = > > + if (hp->hotplug_action =3D=3D RTAS_LOG_V6_HP_ACTION_ADD) { > > + spapr_hotplug_set_signalled(drc); > > + } > > + > > qemu_irq_pulse(xics_get_qirq(spapr->icp, spapr->check_exception_ir= q)); > > } > > = > > diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h > > index 7e56347..fa21ba0 100644 > > --- a/include/hw/ppc/spapr_drc.h > > +++ b/include/hw/ppc/spapr_drc.h > > @@ -151,6 +151,7 @@ typedef struct sPAPRDRConnector { > > bool configured; > > = > > bool awaiting_release; > > + bool signalled; > > = > > /* device pointer, via link property */ > > DeviceState *dev; > > @@ -188,6 +189,7 @@ typedef struct sPAPRDRConnectorClass { > > spapr_drc_detach_cb *detach_cb, > > void *detach_cb_opaque, Error **errp); > > bool (*release_pending)(sPAPRDRConnector *drc); > > + void (*set_signalled)(sPAPRDRConnector *drc); > > } sPAPRDRConnectorClass; > > = > > sPAPRDRConnector *spapr_dr_connector_new(Object *owner, > = > -- = > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _othe= r_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson