Re: [PATCH] qmp: Stabilize preconfig

From: "Daniel P. Berrangé" <berrange@redhat.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: "Michal Prívozník" <mprivozn@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	qemu-devel@nongnu.org, "Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [PATCH] qmp: Stabilize preconfig
Date: Wed, 3 Nov 2021 09:27:11 +0000
Message-ID: <YYJV3ZIA7kNaqORB@redhat.com>
In-Reply-To: <87zgqlzmxi.fsf@dusky.pond.sub.org>

On Wed, Nov 03, 2021 at 09:02:49AM +0100, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Mon, Nov 01, 2021 at 03:37:58PM +0100, Michal Prívozník wrote:
> >> On 10/25/21 2:19 PM, Markus Armbruster wrote:
> >> > Michal Privoznik <mprivozn@redhat.com> writes:
> >> > 
> >> >> The -preconfig option and exit-preconfig command are around for
> >> >> quite some time now. However, they are still marked as unstable.
> >> >> This is suboptimal because it may block some upper layer in
> >> >> consuming it. In this specific case - Libvirt avoids using
> >> >> experimental features.
> >> >>
> >> >> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> >> > 
> >> > If I remember correctly, the motivation for -preconfig was NUMA
> >> > configuration via QMP.  More uses may have appeared since.
> >> > 
> >> > Back then, I questioned the need for yet another option and yet another
> >> > state: why not -S?
> >> > 
> >> > The answer boiled down to
> >> > 
> >> > 0. Yes, having just one would be a simpler and cleaner interface, but
> >> > 
> >> > 1. the godawful mess QEMU startup has become makes -S unsuitable for
> >> >    some things we want to do, so we need -preconfig,
> >> > 
> >> > 2. which is in turn unsuitable for other things we want to do, so we
> >> >    still need -S.
> >> > 
> >> > 3. Cleaning up the mess to the point where "simpler and cleaner" becomes
> >> >    viable again is not in the cards right now.
> >> 
> >> I see a difference between the two. -preconfig starts QEMU in such a way
> >> that its configuration can still be changed (in my particular use case
> >> vCPUs can be assigned to NUMA nodes), while -S does not allow that. If
> >> we had one state for both, then some commands must be forbidden from
> >> executing as soon as 'cont' is issued. Moreover, those commands would
> >> need to do much more than they are doing now (e.g. regenerate ACPI table
> >> after each run). Subsequently, validating configuration would need to be
> >> postponed until the first 'cont' because with just one state QEMU can't
> >> know when the last config command was issued.
> 
> Doesn't all this apply to x-exit-preconfig already?
> 
> * Some commands are only allowed before x-exit-preconfig,
>   e.g. set-numa-node.
> 
> * The complete (pre-)configuration is only available at
>   x-exit-preconfig.  In particular, ACPI tables can be fixed only then.
> 
> >> Having said all of that, I'm not sure if -preconfig is the way to go or
> >> we want to go the other way. I don't have a strong opinion.
> >
> > It feels like the scenario here is really just a specialization of the
> > more general problem we want to be able to solve. Namely, we want to be
> > able to start a bare QEMU and configure it entirely on the fly. IOW, we
> > are really aiming for -preconfig to be able to do /all/ configuration,
> > and with a new ELF binary, at which point -preconfig wouldn't exist, it
> > would be the implicit default.
> 
> Whether -preconfig is the default or an option doesn't matter for
> discussing the state machine.
> 
> > Libvirt primarily uses -S because it needs to query various aspects of
> > QEMU's config before CPUs start executing, while QEMU can still be
> > considered trustworthy (as it hasn't executed untrusted guest code 
> > yet). eg we query vCPU PIDs so that we can apply CPU pinning to them. 
> > We query the CPU model features so we can reflect what exact CPU 
> > features we got from KVM. There are various other examples.
> 
> Which of the queries you need work only between x-exit-preconfig and -S?

Well, before x-exit-preconfig, QMP only permits a very small number
of commands. QEMU has loosened that up a bit, but I don't think anyone
has checked yet whether there's enough to cover libvirt's current usage.

> Which of them could be made to work before x-exit-preconfig?

Quite a few, I expect.

> > The secondary reason we use -S is that sometimes the mgmt app does
> > not actually want the guest CPUs to start running - they actively
> > want it in a paused state initially and will manually start CPUs
> > later. One reason is to enable them to open the serial console
> > backend before CPUs start, to guarantee that no console output is
> > lost in that small startup window.  This is really the original
> > purpose of -S.  This doesn't imply a need for -S. I'd say that
> > -preconfig should essentially imply -S by default. If you're
> > already doing lots of things via QMP, being required to issue
> > a 'cont' command is no hardship.
> 
> I wonder whether we really have to step through three states
> 
>          x-exit-preconfig  cont
>     preconfig ---> pre run ---> run
> 
> and not two
> 
>             cont
>     pre run ---> run

Looking at it from the POV of configuration, we have two states, with
a unidirectional transition permitted:

  unconfigured   --->  configured

Then, from the POV of guest CPUs, we have two states, with a
bi-directional transition permitted:

   stopped   <----->  running

During the QEMU startup process we have two end goals we need to satisfy:

 *  configured + running (the 95+% common case)
 *  configured + stopped (the rarer case)

So in terms of QEMU internal state transitions, it feels like we do
likely need to distinguish pre-config separately from stopped, but
from a CLI arg POV I think it is redundant to distinguish them, as
"stopped" can reasonably be implied as the default.
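
To make the two-axis view above concrete, here is a minimal sketch in
Python. It is purely illustrative and is not QEMU code: the Machine
class, the enum names and the method names are all hypothetical, chosen
only to mirror the x-exit-preconfig, cont and stop commands discussed in
this thread. It assumes that CPUs may only be started once configuration
is complete, and that "stopped" is the implied initial CPU state.

```python
from dataclasses import dataclass
from enum import Enum


class Config(Enum):
    UNCONFIGURED = "unconfigured"
    CONFIGURED = "configured"


class Cpus(Enum):
    STOPPED = "stopped"
    RUNNING = "running"


@dataclass
class Machine:
    """Hypothetical model: two independent axes of state."""

    # Both defaults are assumptions: start unconfigured, CPUs stopped.
    config: Config = Config.UNCONFIGURED
    cpus: Cpus = Cpus.STOPPED

    def exit_preconfig(self) -> None:
        # Unidirectional: there is no way back to UNCONFIGURED.
        if self.config is Config.CONFIGURED:
            raise RuntimeError("already configured")
        self.config = Config.CONFIGURED

    def cont(self) -> None:
        # Assumed rule: CPUs cannot run until configuration is complete.
        if self.config is not Config.CONFIGURED:
            raise RuntimeError("configuration is not complete")
        self.cpus = Cpus.RUNNING

    def stop(self) -> None:
        # Bi-directional: stopped <-> running, any number of times.
        self.cpus = Cpus.STOPPED
```

In this model the common "configured + running" goal is exit_preconfig()
followed by cont(), and the rarer "configured + stopped" goal is
exit_preconfig() alone, with no extra CLI flag needed to ask for it.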

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|