qemu-devel.nongnu.org archive mirror
From: David Gibson <david@gibson.dropbear.id.au>
To: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, groug@kaod.org
Subject: Re: [PATCH 2/2] spapr.c: always pulse guest IRQ in spapr_core_unplug_request()
Date: Tue, 20 Apr 2021 11:24:51 +1000
Message-ID: <YH4tY6iDkeGVD2sM@yekko.fritz.box>
In-Reply-To: <d8ef1891-6ec6-bacf-e29e-5a4891780c2e@gmail.com>


On Mon, Apr 12, 2021 at 04:27:43PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 3/31/21 11:37 PM, David Gibson wrote:
> > On Wed, Mar 31, 2021 at 09:04:37PM -0300, Daniel Henrique Barboza wrote:
> > > Commit 47c8c915b162 fixed a problem where multiple spapr_drc_detach()
> > > requests were breaking QEMU. The solution was to call spapr_drc_detach()
> > > only once, using spapr_drc_unplug_requested() to check whether the DRC
> > > had already been detached. The commit also placed the hotplug request
> > > to the guest under the same condition.
> > > 
> > > It turns out that there is a reliable way for a CPU hotunplug to fail. If
> > > a guest with one CPU hotplugs CPU1, then offlines CPU0 via 'echo 0 >
> > > /sys/devices/system/cpu/cpu0/online', and then attempts to hotunplug
> > > CPU1, the kernel will refuse because CPU1 is the last online CPU of the
> > > system. Since we pulse the IRQ only on the first try, after a failed
> > > attempt all subsequent CPU1 hotunplug attempts will fail, regardless
> > > of the online state of CPU1 in the kernel, because we are simply not
> > > letting the guest know that we want to hotunplug the device.
> > > 
> > > Let's move spapr_hotplug_req_remove_by_index() back out of the "if
> > > (!spapr_drc_unplug_requested(drc))" conditional, allowing multiple
> > > 'device_del' requests for the same CPU core to reach the guest in case
> > > a previous hotunplug of that core did not complete.
> > > 
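The before/after control flow of this change can be modelled in a few lines of C. This is a minimal, hypothetical sketch: `struct model_drc` and the `model_*` functions are illustrative stand-ins, not QEMU's `SpaprDrc` or its real API. It assumes, as in the scenario described above, that a refused unplug leaves the "unplug requested" flag set.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of one CPU core DRC (not QEMU's type). */
struct model_drc {
    bool unplug_requested; /* stays true until the guest completes the unplug */
    int irq_pulses;        /* how many times the guest was notified */
};

/* Stand-in for spapr_hotplug_req_remove_by_index(): notify the guest. */
static void model_pulse_guest_irq(struct model_drc *drc)
{
    drc->irq_pulses++;
}

/* Pre-patch flow: the guest is only notified on the first request. */
static void model_unplug_request_guarded(struct model_drc *drc)
{
    if (!drc->unplug_requested) {
        drc->unplug_requested = true;
        model_pulse_guest_irq(drc);
    }
}

/* Post-patch flow: mark the DRC once, but notify the guest on every request. */
static void model_unplug_request_unguarded(struct model_drc *drc)
{
    if (!drc->unplug_requested) {
        drc->unplug_requested = true;
    }
    model_pulse_guest_irq(drc);
}
```

Calling each variant twice on a fresh DRC (a refused first attempt followed by a retry) notifies the guest once with the guarded version and twice with the unguarded one, which is what lets the retry reach the guest.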
> > > Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
> > 
> > I've applied these to ppc-for-6.0, but...
> > 
> > > ---
> > >   hw/ppc/spapr.c | 11 ++++++++++-
> > >   1 file changed, 10 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index 05a765fab4..e4be00b732 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -3777,8 +3777,17 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >       if (!spapr_drc_unplug_requested(drc)) {
> > >           spapr_drc_unplug_request(drc);
> > > -        spapr_hotplug_req_remove_by_index(drc);
> > >       }
> > > +
> > > +    /*
> > > +     * spapr_hotplug_req_remove_by_index is left unguarded, out of the
> > > +     * "!spapr_drc_unplug_requested" check, to allow for multiple IRQ
> > > +     * pulses removing the same CPU. Otherwise, in a failed hotunplug
> > > +     * attempt (e.g. the kernel will refuse to remove the last online
> > > +     * CPU), we will never attempt it again because unplug_requested
> > > +     * will still be 'true' in that case.
> > > +     */
> > > +    spapr_hotplug_req_remove_by_index(drc);
> > 
> > I think we need similar changes for all the other unplug types (LMB,
> > PCI, PHB) - basically retries should always be allowed, and at worst
> > be a no-op, rather than generating an error like they do now.
> 
> 
> For PHBs it should be straightforward. I'm not so sure about PCI, because of
> all the PCI function logic around the hotunplug of function 0.
> 
> As for LMBs, we block further attempts because there is no way to tell
> whether the hotunplug is still being executed and just taking some time (it
> is not uncommon for a DIMM unplug to take 20-30 seconds to complete) or
> whether it has failed.

I don't see why that prevents retries.  Can't you reissue the
index+size unplug request anyway?  The code you already have to fail
unplugs on a reconfigure should work for both the original request and
the retry, shouldn't it?
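One way to read this suggestion, sketched under stated assumptions: an LMB unplug request for a DIMM that already has a pending unplug record is treated as a retry that simply re-sends the same index+size request, instead of returning an error. All names below (`struct model_dimm_unplug`, `model_lmb_unplug_request()`, `model_lmb_unplug_failed()`) are hypothetical and are not QEMU functions; the failure handler only stands in for the existing code that fails unplugs on a reconfigure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of an in-flight DIMM unplug (not QEMU's type). */
struct model_dimm_unplug {
    bool pending;         /* an unplug of this DIMM has been requested */
    uint32_t first_index; /* DRC index of the DIMM's first LMB */
    uint32_t nr_lmbs;     /* number of LMBs the DIMM spans */
    int guest_requests;   /* index+size removal requests sent to the guest */
};

/* Stand-in for notifying the guest with an "index + count" removal event. */
static void model_send_remove_indexed_count(struct model_dimm_unplug *u)
{
    u->guest_requests++;
}

/*
 * Retry-tolerant request: the first call records the pending unplug range;
 * every call, first or retry, re-sends the recorded index+size request
 * instead of returning an error.
 */
static void model_lmb_unplug_request(struct model_dimm_unplug *u,
                                     uint32_t first_index, uint32_t nr_lmbs)
{
    if (!u->pending) {
        u->pending = true;
        u->first_index = first_index;
        u->nr_lmbs = nr_lmbs;
    }
    model_send_remove_indexed_count(u);
}

/* Stand-in for the existing "fail the unplug on reconfigure" path. */
static void model_lmb_unplug_failed(struct model_dimm_unplug *u)
{
    u->pending = false;
}
```

Because the failure path only clears the pending record, it behaves the same whether the guest's refusal followed the original request or a retry.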


> What we do ATM is check if the pending DIMM unplug
> state exists, and if it does, assume that a hotunplug is pending. I have
> no idea what would happen if an unplug request for an LMB DRC reaches the
> kernel in the middle of an error rollback (when the kernel reconnects all
> the LMBs again) and the same DRC that was just rolled back is disconnected
> again.
> 
> We would need to check not only whether the pending DIMM unplug state exists,
> but also whether it partially exists. In other words, whether some DRCs of
> that DIMM were already unplugged. That way we can avoid issuing a removal
> while the unplug is still running.
> 
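The extra check proposed here could look roughly like the following hypothetical C helper, which refuses to re-send a removal once any LMB of the DIMM has already been detached. The per-LMB `attached` array and the function names are assumptions for illustration, not QEMU data structures. Note that this heuristic does not close the race where the guest is just about to detach the first LMB when the retry arrives.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count LMBs of one DIMM that are no longer attached (hypothetical view). */
static size_t model_count_detached(const bool *lmb_attached, size_t nr_lmbs)
{
    size_t detached = 0;

    for (size_t i = 0; i < nr_lmbs; i++) {
        if (!lmb_attached[i]) {
            detached++;
        }
    }
    return detached;
}

/*
 * Decide whether it looks safe to (re)send an unplug request to the guest:
 * either no unplug is pending, or one is pending but no LMB has been
 * detached yet (e.g. the guest refused it, or fully rolled it back).
 * A partially detached DIMM is treated as an unplug still in progress.
 */
static bool model_may_reissue_lmb_unplug(bool pending,
                                         const bool *lmb_attached,
                                         size_t nr_lmbs)
{
    if (!pending) {
        return true;
    }
    return model_count_detached(lmb_attached, nr_lmbs) == 0;
}
```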
> 
> Thanks,
> 
> 
> DHB
> 
> > 
> > >   }
> > >   int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson


Thread overview: 6+ messages
2021-04-01  0:04 [PATCH 0/2] pSeries: revert CPU unplug timeout Daniel Henrique Barboza
2021-04-01  0:04 ` [PATCH 1/2] spapr: rollback 'unplug timeout' for CPU hotunplugs Daniel Henrique Barboza
2021-04-01  0:04 ` [PATCH 2/2] spapr.c: always pulse guest IRQ in spapr_core_unplug_request() Daniel Henrique Barboza
2021-04-01  2:37   ` David Gibson
2021-04-12 19:27     ` Daniel Henrique Barboza
2021-04-20  1:24       ` David Gibson [this message]
