From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D600CC433EF for ; Thu, 9 Dec 2021 19:13:24 +0000 (UTC) Received: from localhost ([::1]:56606 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mvOr5-00046V-PI for qemu-devel@archiver.kernel.org; Thu, 09 Dec 2021 14:13:23 -0500 Received: from eggs.gnu.org ([209.51.188.92]:51732) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvOpe-0003NM-3U for qemu-devel@nongnu.org; Thu, 09 Dec 2021 14:11:54 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]:39422) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mvOpZ-00021O-Ui for qemu-devel@nongnu.org; Thu, 09 Dec 2021 14:11:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1639077109; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:in-reply-to:in-reply-to: references:references; bh=cp65in747b7oAmJzg4YCkzJYYlRSEMhGHTsk+/kokyE=; b=UJ/c2l8cQW12JsWeAASyGhDbHCdoHrwY+dWMJQagDyLZIrP3tvW3mmgPMvPyJCiUmovows 8bGmliNSRYf3KHC6Eb+JSvkTa9l8ZWxcBkcRxW8+7Deq3fj3EQUQBWifWDrqARFO7IglHA tskknuRguAGxCtWf8m3sn32Nljmc3aM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-267-7hIU2sQgML6BqUKkct1Kdg-1; Thu, 09 Dec 2021 14:11:40 -0500 X-MC-Unique: 7hIU2sQgML6BqUKkct1Kdg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 92F321006AA1; Thu, 9 Dec 2021 19:11:39 +0000 (UTC) Received: from redhat.com (unknown [10.39.194.55]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 1161019729; Thu, 9 Dec 2021 19:11:36 +0000 (UTC) Date: Thu, 9 Dec 2021 19:11:34 +0000 From: Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?= To: Markus Armbruster Subject: Re: Redesign of QEMU startup & initial configuration Message-ID: References: <87lf13cx3x.fsf@dusky.pond.sub.org> MIME-Version: 1.0 In-Reply-To: <87lf13cx3x.fsf@dusky.pond.sub.org> User-Agent: Mutt/2.1.3 (2021-09-10) X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=berrange@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Received-SPF: pass client-ip=170.10.129.124; envelope-from=berrange@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -33 X-Spam_score: -3.4 X-Spam_bar: --- X-Spam_report: (-3.4 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.618, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: Daniel =?utf-8?B?UC4gQmVycmFuZ8Op?= Cc: Damien Hedde , "Edgar E. Iglesias" , Mark Burton , qemu-devel@nongnu.org, Mirela Grujic , =?utf-8?Q?Marc-Andr=C3=A9?= Lureau , Paolo Bonzini Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" On Thu, Dec 02, 2021 at 07:57:38AM +0100, Markus Armbruster wrote: > = Motivation = > > QEMU startup and initial configuration were designed many years ago for > a much, much simpler QEMU. They have since changed beyond recognition > to adapt to new needs. There was no real redesign. Adaption to new > needs has become more and more difficult. A recent example for > "difficult" is Damien's "[RFC PATCH v3 0/5] QMP support for > cold-plugging devices". > > I think it's time for a redesign. > > > = What users want for initial configuration = > > 1. QMP only > > Management applications need to use QMP for monitoring anyway. They > may want to use it for initial configuration, too. Libvirt does. Essentially, as soon as you need to deal with hotplug, you need QMP/QAPI. As soon as you need QMP/QAPI, it is horrible to also need to use something that isn't JSON for the coldplug configuration approach, as you've doubled the number of things to write + test. > They still need to bootstrap a QMP monitor, and for that, CLI is fine > as long as it's simple and stable. Since you mentioned 'simple', allow me to go down a slight tangent... Incidentally, I often wish we didn't use -chardevs anywhere near as much as we do. Chardev was OK when we used it for backing simple guest devices that just had a single host endpoint that was considered permanently connected. Everywhere we've used it for things that really want socket semantics we've created so much pain. The things we've done with chardevs for vhostuser in particular horrify me. I'm glad the block layer resisted the tempetation and just directly used the SocketAddress QAPI type with a QIOChannel objects. I would say the monitor would be better without chardevs too. Having to create multiple monitor instances so that you can have multiple clients is insane. The complexity of course is the need for 'mux' with HMP. If it were not for that, we could use QIOChannel and SocketAddress for the monitor code. Maybe if we one day get HMP fully separated from QMP such that it is independant code, we can simplify internally. Meanwhile, I think we should consider the QMP CLI at least to only use SocketAddress for config, and secretly turn it into a chardev internally. > 2. CLI and configuration files > > Human users want a CLI and configuration files. > > CLI is good for quick tweaks, and to explore. > > For more permanent, non-trivial configuration, configuration files > are more suitable, because they are easier to read, edit, and > document than long command lines. And you can start doing things like "includes" or templating to make the config more manageable / scalable. It is also nice to not have human and machines in completely separate worlds. Humans will often look at what machines do and then try to replicate that. eg people often ask to see the libvirt QEMU config and then want to run that directly themselves, or use libvirt CLI passthrough to add on features that libvirt does not native support yet. As libvirt makes more & more use of QAPI, we are increasing the divide betweeen what machines and humans target. I'd note that Kubernetes doesn't bother with a human CLI at all, and just expects everyone/everything to use JSON/YAML config files. So there's no divide between what syntax humans and machines use - humans just send the config via the CLI tool which uploads it to the REST API that machines use directly. > = What we have for initial configuration = > > Half of 1. and half of 2., satisfying nobody's needs. > > Management applications need to do a lot of non-trivial initial > configuration with the CLI. > > Human users struggle with inconsistent syntax, insufficiently expressive > configuration files, and huge command lines. The QEMU CLI was nice because you could historically do simple stuff really simply eg qemu-system-x86_64 mydisk.img or qemu-system-x86_64 -hda mydisk.img -cdrom mydistro.iso The challenge is how CLI usage evolves as you need to finese the config you're using. Using this simple CLI approach, our defaults still largely give you a VM from 1995. If people want a modern guest setup, the simple syntax quickly stops being simple. You very easily get to a point where passing stuff on the CLI gets out of control. IMHO if the CLI is over 100 characters long, a config file is the way to go. Given that I really wonder whether direct configuration of hardware on the CLI is worthwhile at all, and users should instead /always/ be using a config file, albeit indirectly. This doesn't mean simple things become harder. I'm thinking of a config file that supports a standard template language supporting variable substitution, loops, conditionals. The CLI then does not need to represent anything related to hardware config schemas at all. Instead it just needs to take the name of a config file and ability to set variables. We could then ship some standard configs for the simple cases, which would solve the problem of us being stuck on ancient defaults for the simple CLI configs. $QEMU virtiovm.cfg root=mydisk.img cdrom=mydistro.iso where virtiovm.cfg contains { "machine": { "type": "q35", "memory": "2G", }, "disks": [ { "type": "virtio-blk", "file": @@ VAR root@@ }, @@ IF defined cdrom @@ { "type": "sata", "media": "cdroM" "file": @@ VAR cdrom @@ } @@ END @@ ] } Don't pay close attention to my suggesting JSON syntax here. I just invented something for sake of illustration. A real config would be more complex than this and follow the QAPI schemas. The key is that the complexity of the config file does not matter so much, because the user isn't directly interacting with it for simple tasks. They just have a nice high level CLI taking a few useful variables. The simple CLI becomes massively more useful than our current simple CLI, because it can support modern VMs of arbitrary complexity while still remaining simple. > = Why startup matters for initial configuration = > > Startup and initial configuration are joined at the hip: the way startup > works limits how initial configuration can work. > > Consider the startup we had before -prefconfig: > > 0. Basic, configuration-independent initialization > > 1. Parse and partially process CLI > > 2. Startup with the remaining CLI processing mixed in > > 3. Startup complete, run main loop > > Note: > > * CLI is processed out of order. > > Few people understand the order, and only while starting at the code. > Pretty much every time we mess with the order, we break something. > > * QMP monitors become available only at 3, i.e. after startup is > complete. Precludes use of QMP for initial configuration. > > The current startup with -preconfig permits a bit of initial > configuration with QMP, at the cost of complicating startup code and > interfaces. More in "Appendix: Why further incremental change feels > impractical". > > > = Ways forward = Just to clarify or remind ourselves of scope. When I think about startup/cli in QEMU, I essentially consider this to mean all softmmu/vl.c code. Of course there are supporting pieces splattered around, but IMHO 90% of the pain is resulting from the code in the vl.c file. > 0. This is too hard, let's go shopping > > I can't really blame people for not wanting to mess with startup. It > *is* hard, and there's plenty of other useful stuff to do. > > However, I believe we have to mess with it, i.e. this is not actually > a way forward. Yeah, we hurt ourselves by not being able to move forward with a better approach to startup and configuration. > 1. Improve what we have step by step, managing incompatible changes > > We tried. It has eaten substantial time and energy, it has > complicated code and interfaces further, and we're still nowhere near > where we need to be. Does not feel like a practical way forward to > me. > > If you don't believe me, read the long version in "Appendix: Why > further incremental change feels impractical". Essentially death by 1000 cuts. > 2. Start over with a QMP-only QEMU, keep the traditional QEMU forever > > A QMP-only QEMU isn't just a matter of providing an alternate main() > where we ditch most of the CLI for QMP replacements. It takes a > rework of the startup code as "phase transitions" triggered by QMP > commands. Similar to how -preconfig triggers some startup work. > > To reduce code duplication, we'll probably want to rework the > traditional QEMU's startup code some, too. Refering back to my earlier note, when I've suggested this approach in the past, I've basically considered it to mean stop touching softmmu/vl.c, and create a softmmu/vl2.c. The existing binaries remain using vl.c, we get legacy free binaries built from vl2.c Overtime we might get confidence to refactor bits of vl.c to reduce duplication, but I would consider that not a priority at least in the short term. Part of the reason why vl.c is so painful is danger of breaking existing apps, especially mgmt tools like libvirt. If we did have a vl.c and vl2.c in parallel, creating separate binaries for each we could finese our support guarantees. ie once we consider the new vl2.c to be feature complete for typical production usage, we could declare that mgmt apps should exclusively use the new binaries, and the old binaries are no longer guaranteed CLI stable. This would unlock thue ability to make more risky changes to vl.c to reduce the duplication / maint burden of two binaries. On the other hand if we do a good job with the new binaries maybe there ceases to be any reason to keep the old binaries at all after a few years. > 3. Start over with the aim to replace the traditional QEMU > > I don't want an end game where we have two QEMUs, one with bad CLI > and limited QMP, and one without real CLI and decent QMP. I want one > QEMU with decent CLI and decent QMP. > > I propose that a QEMU with decent CLI and decent QMP isn't really > harder than a QMP-only QEMU. CLI is just another transport that > happens to be one-way, i.e. the QMP commands encoded as CLI options > can't return a value. Same for configuration files. I believe the > parts that are actually hard are shared with QMP-only: QAPIfying > initial configuration and reworking the startup code as phase > transitions. > > When that decent QEMU is ready, we can deprecate and later drop the > traditional one. Right, so this is essentially what I've just suggested as a possible evoluation of (2). The only difference here is that you're setting the death of the old QEMU as an explicit end gaol in (3), while in (2) it is merely a nice-to-have. In practical terms I don't think (2) and (3) are that different from each other for the first 2 years or so. A lot can change in that time, so I don't think we need to fixate on a choice of (2) vs (3) upfront, just make it clear that the death of the old binaries is "on the table" as an outcome if we do well. > = A design proposal = > > 0. Basic, configuration-independent initialization > > 1. Start main loop > > 2. Feed it CLI left to right. Each option runs a handler just like each > QMP command does. > > Options that read a configuration file inject the file into the feed. > > Options that create a monitor create it suspended. > > Options may advance the phase / run state, they may require certain > phase(s), and semantics may depend on the phase. > > 3. When we're done with CLI, resume any monitors we created. > > As a convenience, provide an easy way to auto-advance phase / run > state to "guest runs" right here. The traditional way would be to > auto-advance unless the user gives -S. > > 4. Monitors now feed commands to the main loop. Commands may advance > the phase / run state, and they may require certain phase(s), and > semantics may depend on the phase. > > In particular, command "cont" advances phase / run state to "guest > runs". > > Users can do as much or as little with the CLI as they want. A > management application might want to set up a QMP monitor and no more. As illustrated earlier, I'd really like us to consider being a bit more adventurous on the CLI side. I'm convinced that a CLI for directly configurable hardware is doomed to be horrible no matter what, if you try to directly expose all QAPI configuration flexibilty. Whether key/value, JSON, whatever, it will become unmanagable on the CLI because VM hardware config is inherantly complicated. Thus my though that config files or QMP should be the only two places where the full power of QAPI config is exposed. Use CLI as just a way to interact with config files in a simple way with templates. We can really think of QEMU's original simple CLI from 15 years ago as being a kind of templating frontend for a config file. The config file just happened to be written as C code, and so not user edittable if going outside the common cases. > = A way to get to the proposed design = > > Create an alternate QEMU binary with the CLI taken out and shot: > > 0. Basic, configuration-independent initialization > > 2. Startup > > 3. Startup complete, run main loop > > Maybe hardcode a few things so we can actually run something without any > configuration. To be ditched when we can configure again. I wouldn't bother trying to keep things working in the intermediate steps, as if we go with "new binary" we basically assume a clean slate and can just build things up in understandable chunks even if they're not runnable initially. > Rework 2. Startup as a sequence of state transitions we can later > trigger from commands / options. > > Create a quasi-monitor so we can feed QMP commands to the main loop. > > Make QMP command "cont" transition to "guest runs". > > Feed "cont" to the main loop, and drop 2. Startup. > > Create QAPI/CLI infrastructure, so we can define CLI options like QMP > commands. I have (dusty and likely quite bit-rotten) RFC patches. > > Feed CLI options to the main loop, and only then feed "cont". > > Create option -S to suppress feeding "cont". > > Create / reuse / rework QMP commands and CLI options to create QMP > monitors. Monitors created by CLI start suspended, and get resumed only > when we're done with the CLI. This is so monitor commands can't race > with the CLI. > > Create CLI option to read a configuration file, and feed its contents to > the main loop. Recursively. We'll likely want nicer syntax than CLI or > QMP for configuration files. > > At this point, we have all the infrastructure we need. What's still > missing is filling gaps in QMP and in CLI. > > I believe "QMP only" would only be marginally easier. Basically, it > processes (a minimal) CLI before the main loop, not in it. We could > leave it unQAPIfied. I can't quite see offhand how to do configuration > files sanely. > > Stretch goals: > > * Throw in QAPIfying HMP. > > * Dust with syntactic sugar (sparingly, please) to make CLI/HMP easier > to use for humans. > > > = Initial exploration, a.k.a. talk's cheap, show me the code = > > I'm going to post "[PATCH RFC 00/11] vl: Explore redesign of startup" > shortly. So in your RFC you've basically modified the existing startup code in vl.c, but I presume that was just expediant for the purpose of a quickish illustration. None the less I feel that all the patches cutting cruft are perhaps a distraction to understanding the proposal. Might have been clearer to just start from a vl2.c file and build up from nothing. Anyway, I don't want was our time arguing over that approach for an RFC, so I won't say more here at least :-) > = Appendix: Why further incremental change feels impractical = > > Recall from "Why startup matters for initial configuration" how the > traditional startup code made initial configuration with QMP impossible. snip > Crafting a big change in small steps sounds great. It isn't when we > have to make things worse before they can get better, and every step is > painfully slow because it's just too hard, and the end state we aim for > isn't really what we want. I can't disagree with this. If we carry on trying to evolve vl.c incrementally we are doomed to be stuck with a horrible starstup process for enternity (or at least as long as I'll still be working as QEMU maintainer). Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|