From: David Gibson <david@gibson.dropbear.id.au>
To: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, groug@kaod.org
Subject: Re: [PATCH 2/2] spapr.c: always pulse guest IRQ in spapr_core_unplug_request()
Date: Tue, 20 Apr 2021 11:24:51 +1000
Message-ID: <YH4tY6iDkeGVD2sM@yekko.fritz.box>
In-Reply-To: <d8ef1891-6ec6-bacf-e29e-5a4891780c2e@gmail.com>

On Mon, Apr 12, 2021 at 04:27:43PM -0300, Daniel Henrique Barboza wrote:
>
>
> On 3/31/21 11:37 PM, David Gibson wrote:
> > On Wed, Mar 31, 2021 at 09:04:37PM -0300, Daniel Henrique Barboza wrote:
> > > Commit 47c8c915b162 fixed a problem where multiple spapr_drc_detach()
> > > requests were breaking QEMU. The solution was to just spapr_drc_detach()
> > > once, and use spapr_drc_unplug_requested() to filter whether we already
> > > detached it or not. The commit also tied the hotplug request to the
> > > guest in the same condition.
> > >
> > > Turns out that there is a reliable way for a CPU hotunplug to fail. If a
> > > guest with one CPU hotplugs a CPU1, then offlines CPU0 via 'echo 0 >
> > > /sys/devices/system/cpu/cpu0/online', then attempts to hotunplug CPU1,
> > > the kernel will refuse it because it's the last online CPU of the
> > > system. Given that we're pulsing the IRQ only in the first try, in a
> > > failed attempt, all other CPU1 hotunplug attempts will fail, regardless
> > > of the online state of CPU1 in the kernel, because we're simply not
> > > letting the guest know that we want to hotunplug the device.
> > >
> > > Let's move spapr_hotplug_req_remove_by_index() back out of the "if
> > > (!spapr_drc_unplug_requested(drc))" conditional, allowing for multiple
> > > 'device_del' requests to the same CPU core to reach the guest, in case
> > > the CPU core wasn't fully hotunplugged previously.
> > >
> > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> >
> > I've applied these to ppc-for-6.0, but..
> >
> > > ---
> > >  hw/ppc/spapr.c | 11 ++++++++++-
> > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 05a765fab4..e4be00b732 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -3777,8 +3777,17 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >      if (!spapr_drc_unplug_requested(drc)) {
> > >          spapr_drc_unplug_request(drc);
> > > -        spapr_hotplug_req_remove_by_index(drc);
> > >      }
> > > +
> > > +    /*
> > > +     * spapr_hotplug_req_remove_by_index is left unguarded, out of the
> > > +     * "!spapr_drc_unplug_requested" check, to allow for multiple IRQ
> > > +     * pulses removing the same CPU. Otherwise, in an failed hotunplug
> > > +     * attempt (e.g. the kernel will refuse to remove the last online
> > > +     * CPU), we will never attempt it again because unplug_requested
> > > +     * will still be 'true' in that case.
> > > +     */
> > > +    spapr_hotplug_req_remove_by_index(drc);
> >
> > I think we need similar changes for all the other unplug types (LMB,
> > PCI, PHB) - basically retries should always be allowed, and at worst
> > be a no-op, rather than generating an error like they do now.
>
>
> For PHBs it should be straightforward. Not so sure about PCI because there is
> all the PCI function logic around the hotunplug of function 0.
>
> As for LMBs, we block further attempts because there is no way we can tell
> if the hotunplug is being executed but it is taking some time (it is not
> uncommon for a DIMM unplug to take 20-30 seconds to complete), versus
> an error scenario.

I don't see why that prevents retries.  Can't you reissue the
index+size unplug request anyway?  The code you already have to fail
unplugs on a reconfigure should work for both the original request and
the retry, shouldn't it?

> What we do ATM is check if the pending DIMM unplug
> state exists, and if it does, assume that a hotunplug is pending. I have
> no idea what would happen if an unplug request for a LMB DRC reaches the
> kernel in the middle of an error rollback (when the kernel reconnects all
> the LMBs again) and the same DRC that was rolled back is disconnected
> again.
>
> We would need to check not only if the pending DIMM unplug state exists, but
> also if it partially exists. In other words, if there are DRCs of that DIMM that
> were unplugged already. That way we can avoid issuing a removal while
> the unplug is still running.
>
>
> Thanks,
>
>
> DHB
>
> >
> > >  }
> > >  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > >

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
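
For readers following the thread, here is a rough, self-contained C model of the behaviour change under discussion for the CPU core case. It is only a sketch: ToyDrc, its fields and the toy_* functions are hypothetical names made up for illustration, and none of this is QEMU code. The assumption is simply that "detach" must run at most once per DRC while the guest notification (the IRQ pulse) may be repeated on every unplug request.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a DRC; this is not QEMU's SpaprDrc. */
typedef struct ToyDrc {
    bool unplug_requested; /* set once the detach has been requested */
    int detach_count;      /* how many times the detach path ran */
    int irq_pulses;        /* how many unplug events reached the guest */
} ToyDrc;

/* Model of the old flow: detach and guest pulse share one guard. */
void toy_unplug_request_guarded(ToyDrc *drc)
{
    if (!drc->unplug_requested) {
        drc->unplug_requested = true;
        drc->detach_count++;
        drc->irq_pulses++;
    }
}

/* Model of the patched flow: detach once, pulse on every request. */
void toy_unplug_request_always_pulse(ToyDrc *drc)
{
    if (!drc->unplug_requested) {
        drc->unplug_requested = true;
        drc->detach_count++;
    }
    drc->irq_pulses++;
}

/* Two unplug requests against each model, e.g. a retry after the guest
 * refused the first attempt and unplug_requested stayed set. */
void toy_check(void)
{
    ToyDrc old_flow = {0};
    ToyDrc new_flow = {0};

    toy_unplug_request_guarded(&old_flow);
    toy_unplug_request_guarded(&old_flow);
    assert(old_flow.detach_count == 1);
    assert(old_flow.irq_pulses == 1); /* the retry never reaches the guest */

    toy_unplug_request_always_pulse(&new_flow);
    toy_unplug_request_always_pulse(&new_flow);
    assert(new_flow.detach_count == 1); /* still detached only once */
    assert(new_flow.irq_pulses == 2);   /* the retry is delivered */
}
```

Calling toy_check() from a small main() exercises both models. The same split (one-shot state change, repeatable notification) is what a retry-friendly path for the other unplug types would need, although the thread leaves open how that interacts with a partially completed LMB unplug.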
Thread overview: 6+ messages
2021-04-01  0:04 [PATCH 0/2] pSeries: revert CPU unplug timeout Daniel Henrique Barboza
2021-04-01  0:04 ` [PATCH 1/2] spapr: rollback 'unplug timeout' for CPU hotunplugs Daniel Henrique Barboza
2021-04-01  0:04 ` [PATCH 2/2] spapr.c: always pulse guest IRQ in spapr_core_unplug_request() Daniel Henrique Barboza
2021-04-01  2:37   ` David Gibson
2021-04-12 19:27     ` Daniel Henrique Barboza
2021-04-20  1:24       ` David Gibson [this message]