All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kurz <groug@kaod.org>
To: Igor Mammedov <imammedo@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Michael Roth <mdroth@linux.vnet.ibm.com>,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH 3/4] spapr: disable hotplugging without OS
Date: Wed, 24 May 2017 17:54:10 +0200	[thread overview]
Message-ID: <20170524175410.088f3285@bahia.ttt.fr.ibm.com> (raw)
In-Reply-To: <20170524121402.50e62a75@nial.brq.redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4344 bytes --]

On Wed, 24 May 2017 12:14:02 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Wed, 24 May 2017 11:28:57 +0200
> Greg Kurz <groug@kaod.org> wrote:
> 
> > On Wed, 24 May 2017 15:07:54 +1000
> > David Gibson <david@gibson.dropbear.id.au> wrote:
> >   
> > > On Tue, May 23, 2017 at 01:18:11PM +0200, Laurent Vivier wrote:    
> > > > If the OS is not started, QEMU sends an event to the OS
> > > > that is lost and cannot be recovered. An unplug is not
> > > > able to restore QEMU in a coherent state.
> > > > So, while the OS is not started, disable CPU and memory hotplug.
> > > > We use option vector 6 to know if the OS is started
> > > > 
> > > > Signed-off-by: Laurent Vivier <lvivier@redhat.com>      
> > > 
> > > Urgh.. I'm not terribly confident that this is really correct.  As
> > > discussed on the previous patch, you're essentially using OV6 as a
> > > flag that CAS is complete.
> > > 
> > > But while it undoubtedly makes the race window much smaller, I don't
> > > see that there's any guarantee the guest OS will really be able to
> > > handle hotplug events immediately after CAS.
> > > 
> > > In particular if the CAS process completes partially but then needs to
> > > trigger a reboot, I think that would end up setting the ov6 variable,
> > > but the OS would definitely not be in a state to accept events.  
> wouldn't guest on reboot pick up updated fdt and online hotplugged
> before crash cpu along with initial cpus?
> 

Yes and that's what actually happens with cpus.

But catching up with the background for this series, I have the
impression that the issue isn't the fact we loose an event if the OS
isn't started (which is not true), but more something wrong happening
when hotplugging+unplugging memory as described in this commit:

commit fe6824d12642b005c69123ecf8631f9b13553f8b
Author: Laurent Vivier <lvivier@redhat.com>
Date:   Tue Mar 28 14:09:34 2017 +0200

    spapr: fix memory hot-unplugging

> > We never have any guarantee that the OS will process an event that
> > we've sent actually (think of a kernel crash just after a successful
> > CAS negotiation for example, or any failure with the various guest
> > components involved in the process of hotplug).
> >   
> > > Mike, I really think we need some input from someone familiar with how
> > > these hotplug events are supposed to work.  What do we need to do to
> > > handle lost or stale events, such as those delivered when an OS is not
> > > booted.
> > >     
> > 
> > AFAIK, in the PowerVM world, the HMC exposes a user configurable timeout.
> > 
> > https://www.ibm.com/support/knowledgecenter/POWER8/p8hat/p8hat_dlparprocpoweraddp6.htm
> > 
> > I'm not sure we can do anything better than being able to "cancel" a previous
> > hotplug attempt if it takes too long, but I'm not necessarily the expert you're
> > looking for :)  
> From x86/ACPI world:
>  - if hotplug happens early at boot before guest OS is running
>    hotplug notification (SCI interrupt) stays pending and once guest
>    is up it will/should handle it and online CPU
>  - if guest crashed and is rebooted it will pickup updated apci tables (fdt equivalent)
>    with all present cpus (including hotplugged one before crash) and online
>    hotplugged cpu along with coldplugged ones
>  - if guest looses SCI somehow, it's considered guest issue and such cpu
>    stays unpluggable until guest picks it somehow (reboot, manually running cpus scan
>    method from ACPI or another cpu hotplug event) and explicitly ejects it.
> 
> Taking in account that CPUs don't support surprise removal and requires
> guest cooperation it's fine to leave CPU plugged in until guest ejects it.
> That's what I'd expect to happen on baremetal, 
> you hotplug CPU, hardware notifies OS about it and that's all,
> cpu won't suddenly pop out if OS isn't able to online it.
> 
> More over that hotplugged cpu might be executing some code or one of
> already present cpus might be executing initialization routines to online
> it (think of host overcommit and arbitrary delays) so it is not really safe
> to remove hotplugged but not onlined cpu without OS consent
> (i.e. explicit eject by OS/firmware). I think the lost event handling should be
> fixed on guest side and not in QEMU.
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

  reply	other threads:[~2017-05-24 15:54 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-23 11:18 [Qemu-devel] [PATCH 0/4] spapr: disable hotplugging without OS Laurent Vivier
2017-05-23 11:18 ` [Qemu-devel] [PATCH 1/4] spapr: add pre_plug function for memory Laurent Vivier
2017-05-23 15:28   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-05-23 16:09     ` Laurent Vivier
2017-05-24  4:52       ` David Gibson
2017-05-24  9:55         ` Greg Kurz
2017-05-24 10:27           ` David Gibson
2017-05-23 11:18 ` [Qemu-devel] [PATCH 2/4] spapr: add option vector 6 Laurent Vivier
2017-05-23 16:31   ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-05-24  4:58   ` [Qemu-devel] " David Gibson
2017-05-23 11:18 ` [Qemu-devel] [PATCH 3/4] spapr: disable hotplugging without OS Laurent Vivier
2017-05-24  5:07   ` David Gibson
2017-05-24  9:28     ` [Qemu-devel] [Qemu-ppc] " Greg Kurz
2017-05-24 10:14       ` Igor Mammedov
2017-05-24 15:54         ` Greg Kurz [this message]
2017-05-24 16:02           ` Laurent Vivier
2017-05-24 17:40             ` Michael Roth
2017-05-25  3:16               ` David Gibson
2017-05-30 17:15                 ` Michael Roth
2017-05-31  6:36                   ` David Gibson
2017-05-25  2:49         ` David Gibson
2017-05-25  2:45       ` David Gibson
2017-05-23 11:18 ` [Qemu-devel] [PATCH 4/4] Revert "spapr: fix memory hot-unplugging" Laurent Vivier
2017-05-23 17:52 ` [Qemu-devel] [Qemu-ppc] [PATCH 0/4] spapr: disable hotplugging without OS Daniel Henrique Barboza
2017-05-23 18:07   ` Daniel Henrique Barboza
2017-05-23 18:22     ` Daniel Henrique Barboza
2017-05-23 19:42       ` Laurent Vivier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170524175410.088f3285@bahia.ttt.fr.ibm.com \
    --to=groug@kaod.org \
    --cc=david@gibson.dropbear.id.au \
    --cc=imammedo@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.