Date: Wed, 25 Aug 2021 11:39:34 +0100
Message-ID: <87fsuxq049.wl-maz@kernel.org>
From: Marc Zyngier
To: Oliver Upton
Cc: kvmarm@lists.cs.columbia.edu, pshier@google.com, ricarkol@google.com,
 rananta@google.com, reijiw@google.com, jingzhangos@google.com,
 kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 james.morse@arm.com, Alexandru.Elisei@arm.com, suzuki.poulose@arm.com,
 Drew Jones, Peter Maydell
Subject: Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
References: <87mtp5q3gx.wl-maz@kernel.org>
X-Mailing-List: kvm@vger.kernel.org

On Wed, 25 Aug 2021 11:02:28 +0100, Oliver Upton wrote:
>
> On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier wrote:
> > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > Any guests that discovered the 'widget' hypercall are likely going to
> > > get fussy _very_ quickly on the old kernel.
> >
> > This goes against what we decided to support for the *only* publicly
> > available VMM that cares about save/restore, which is that we only
> > move forward and don't rollback.
>
> Ah, I was definitely missing this context. Current behavior makes much
> more sense then.
>
> > Hypercalls are the least of your
> > worries, and there is a whole range of other architectural features
> > that will have also appeared/disappeared (your own CNTPOFF series is a
> > glaring example of this).
>
> Isn't that a tad bit different though? I'll admit, I'm just as guilty
> with my own series forgetting to add a KVM_CAP (oops), but it is in my
> queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> takes up a new KVM UAPI, it is up to the user to run on a new kernel.

The two are linked. Exposing a new register to userspace and/or guest
results in the same thing: you can't roll back. That's especially true
in the QEMU case, which *learns* from the kernel what registers are
available, and doesn't maintain a fixed list.

> My concerns are explicitly with the 'under the nose' changes, where
> KVM modifies the guest feature set without userspace opting in. Based
> on your comment, though, it would appear that other parts of KVM are
> affected too.

Any new system register that is exposed by a new kernel feature breaks
rollback. And so far, we only consider it a bug if the set of exposed
registers reduces. Anything can be added safely (as checked by one of
the selftests added by Drew).

> It doesn't have to be rollback safety, either. There may
> simply be a hypercall which an operator doesn't want to give its
> guests, and it needs a way to tell KVM to hide it.

Fair enough. But this has to be done in a scalable way, which
individual capabilities cannot provide.

> > > Have I missed something blatantly obvious, or do others see this as an
> > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > I'm sure other hypercalls could be handled similarly.
> >
> > Why do we need this? For future hypercalls, we could have some buy-in
> > capabilities. For existing ones, it is too late, and negative features
> > are just too horrible.
>
> Oh, agreed on the nastiness. Lazy hack to realize the intended
> functional change..

Well, you definitely achieved your goal of attracting my attention :).

> > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > bitmap of supported functions. That would be "less horrible". This
> > could be implemented using extra "firmware pseudo-registers" such as
> > the ones described in Documentation/virt/kvm/arm/psci.rst.
>
> This seems more reasonable, especially since we do this for migrating
> the guest's PSCI version.
>
> Alternatively, I had thought about using a VM attribute, given the
> fact that it is non-architectural information and we avoid ABI issues
> in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

The whole point is that these settings get exposed by
KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
we already have this for things like the spectre management state, we
can just as well expose the bitmaps that deal with the KVM-specific
hypercalls. After all, this falls into the realm of "KVM as VM
firmware".

For ARM-architected hypercalls (TRNG, stolen time), we may need a
similar extension.

	M.

-- 
Without deviation from the norm, progress is not possible.
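
Illustrative sketch (not part of the original message): the
save/restore of a hypercall bitmap through a firmware pseudo-register,
as proposed above, could be validated roughly as follows. Everything
here is an assumption made for illustration. The HYP_FN_* bit names,
their values, and both helper functions are hypothetical and do not
correspond to any existing KVM ABI. The subset rule (a restore is
accepted only if the destination implements every function the source
exposed) is one plausible policy, not something the thread specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical function bits for a KVM-specific hypercall bitmap.
 * Names and values are illustrative only, not a kernel ABI.
 */
#define HYP_FN_FEATURES (UINT64_C(1) << 0)
#define HYP_FN_PTP      (UINT64_C(1) << 1)
#define HYP_FN_WIDGET   (UINT64_C(1) << 2) /* the 'widget' call from the thread */

/*
 * True if every function the guest may have discovered on the source
 * host is also implemented by the destination host.
 */
static inline bool hyp_bitmap_restorable(uint64_t saved, uint64_t host_supported)
{
	return (saved & ~host_supported) == 0;
}

/*
 * Restore a saved bitmap into a VM. On success the VM exposes exactly
 * the saved set, which may be narrower than what the host supports.
 * On failure the VM bitmap is left untouched.
 */
static inline bool hyp_bitmap_restore(uint64_t saved, uint64_t host_supported,
				      uint64_t *vm_bitmap)
{
	assert(vm_bitmap != NULL);

	if (!hyp_bitmap_restorable(saved, host_supported))
		return false;

	*vm_bitmap = saved;
	return true;
}
```

Under these assumptions, migrating N -> N+1 restores the older, smaller
bitmap unchanged, so the guest never sees 'widget' unless the VMM opts
in. Migrating N+1 -> N with the 'widget' bit set is refused up front
instead of failing inside the guest. The same write path would let an
operator hide a hypercall by restoring a narrower bitmap.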