From mboxrd@z Thu Jan  1 00:00:00 1970
Date:   Wed, 25 Aug 2021 11:39:34 +0100
Message-ID: <87fsuxq049.wl-maz@kernel.org>
From:   Marc Zyngier <maz@kernel.org>
To:     Oliver Upton <oupton@google.com>
Cc:     kvmarm@lists.cs.columbia.edu, pshier@google.com,
        ricarkol@google.com, rananta@google.com, reijiw@google.com,
        jingzhangos@google.com, kvm@vger.kernel.org,
        linux-arm-kernel@lists.infradead.org, james.morse@arm.com,
        Alexandru.Elisei@arm.com, suzuki.poulose@arm.com,
        Drew Jones <drjones@redhat.com>,
        Peter Maydell <peter.maydell@linaro.org>
Subject: Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
In-Reply-To: <CAOQ_QshSaEm_cMYQfRTaXJwnVqeoN29rMLBej-snWd6_0HsgGw@mail.gmail.com>
References: <YSVhV+UIMY12u2PW@google.com>
        <87mtp5q3gx.wl-maz@kernel.org>
        <CAOQ_QshSaEm_cMYQfRTaXJwnVqeoN29rMLBej-snWd6_0HsgGw@mail.gmail.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1
 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
Content-Type: text/plain; charset=US-ASCII
Precedence: bulk
List-ID: <kvm.vger.kernel.org>
X-Mailing-List: kvm@vger.kernel.org

On Wed, 25 Aug 2021 11:02:28 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > Any guests that discovered the 'widget' hypercall are likely going to
> > > get fussy _very_ quickly on the old kernel.
> >
> > This goes against what we decided to support for the *only* publicly
> > available VMM that cares about save/restore, which is that we only
> > move forward and don't rollback.
> 
> Ah, I was definitely missing this context. Current behavior makes much
> more sense then.
> 
> > Hypercalls are the least of your
> > worries, and there is a whole range of other architectural features
> > that will have also appeared/disappeared (your own CNTPOFF series is a
> > glaring example of this).
> 
> Isn't that a tad bit different though? I'll admit, I'm just as guilty
> with my own series forgetting to add a KVM_CAP (oops), but it is in my
> queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> takes up a new KVM UAPI, it is up to the user to run on a new kernel.

The two are linked. Exposing a new register to userspace and/or the
guest results in the same thing: you can't roll back. That's especially
true in the QEMU case, which *learns* from the kernel what registers
are available, and doesn't maintain a fixed list.

> My concerns are explicitly with the 'under the nose' changes, where
> KVM modifies the guest feature set without userspace opting in. Based
> on your comment, though, it would appear that other parts of KVM are
> affected too.

Any new system register that is exposed by a new kernel feature breaks
rollback. And so far, we only consider it a bug if the set of exposed
registers shrinks. Anything can be added safely (as checked by one of
the selftests added by Drew).
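
To make the asymmetry concrete, here is a rough sketch of the rule (not
the actual selftest, and not QEMU code): state saved on one kernel can
be restored on another only if every saved register ID is also known to
the destination. The helper names are made up for illustration, and the
IDs are whatever KVM_GET_REG_LIST would return on each side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if 'id' appears in the list 'regs' of 'n' register IDs. */
static bool reg_list_contains(const uint64_t *regs, size_t n, uint64_t id)
{
	for (size_t i = 0; i < n; i++)
		if (regs[i] == id)
			return true;
	return false;
}

/*
 * Hypothetical helper: return true if every register saved on the
 * source is also exposed by the destination, i.e. the source list is
 * a subset of the destination list.
 */
static bool reg_list_migratable(const uint64_t *src, size_t nsrc,
				const uint64_t *dst, size_t ndst)
{
	assert(src || nsrc == 0);
	assert(dst || ndst == 0);

	for (size_t i = 0; i < nsrc; i++)
		if (!reg_list_contains(dst, ndst, src[i]))
			return false;
	return true;
}
```

With the list from kernel N as 'src' and the list from N+1 as 'dst' the
check passes; swap them and it fails as soon as N+1 has added a single
register, which is the rollback case.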

> It doesn't have to be rollback safety, either. There may
> simply be a hypercall which an operator doesn't want to give its
> guests, and it needs a way to tell KVM to hide it.

Fair enough. But this has to be done in a scalable way, which
individual capabilities cannot provide.

> > > Have I missed something blatantly obvious, or do others see this as an
> > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > I'm sure other hypercalls could be handled similarly.
> >
> > Why do we need this? For future hypercalls, we could have some buy-in
> > capabilities. For existing ones, it is too late, and negative features
> > are just too horrible.
> 
> Oh, agreed on the nastiness. Lazy hack to realize the intended
> functional change..

Well, you definitely achieved your goal of attracting my attention :).

> > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > bitmap of supported functions. That would be "less horrible". This
> > could be implemented using extra "firmware pseudo-registers" such as
> > the ones described in Documentation/virt/kvm/arm/psci.rst.
> 
> This seems more reasonable, especially since we do this for migrating
> the guest's PSCI version.
> 
> Alternatively, I had thought about using a VM attribute, given the
> fact that it is non-architectural information and we avoid ABI issues
> in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

The whole point is that these settings get exposed by
KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
we already have this for things like the spectre management state, we
can just as well expose the bitmaps that deal with the KVM-specific
hypercalls. After all, this falls into the realm of "KVM as VM
firmware".
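
As a strawman of what such a bitmap pseudo-register could look like,
here is a self-contained userspace model. Everything in it is
hypothetical: the bit assignments, the struct and the accessor names
are invented for illustration and match nothing in the tree. The only
idea it is meant to show is that the VMM reads the bitmap on the source
and writes it back on the destination, and that the write is refused if
it asks for a function the destination does not implement.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit assignments, not taken from any kernel header. */
#define HYP_FEAT_BASE	(UINT64_C(1) << 0)
#define HYP_FEAT_PTP	(UINT64_C(1) << 1)
#define HYP_FEAT_WIDGET	(UINT64_C(1) << 2)

/* Hypothetical per-VM firmware state. */
struct vm_fw_state {
	uint64_t supported;	/* functions this kernel implements */
	uint64_t enabled;	/* functions currently exposed to the guest */
};

/* Model of the "get register" path: report what the guest can see. */
static uint64_t fw_bitmap_get(const struct vm_fw_state *vm)
{
	assert(vm);
	return vm->enabled;
}

/*
 * Model of the "set register" path: accept any subset of what this
 * kernel supports, refuse anything else with -EINVAL.
 */
static int fw_bitmap_set(struct vm_fw_state *vm, uint64_t val)
{
	assert(vm);
	if (val & ~vm->supported)
		return -EINVAL;
	vm->enabled = val;
	return 0;
}
```

In this model, restoring a bitmap saved on N+1 (with the 'widget' bit
set) onto kernel N fails at restore time rather than confusing the
guest later, while a VMM that wants to hide a function simply writes
the bitmap with that bit cleared before running the guest.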

For ARM-architected hypercalls (TRNG, stolen time), we may need a
similar extension.

	M.

-- 
Without deviation from the norm, progress is not possible.