Re: KVM/arm64: Guest ABI changes do not appear rollback-safe

From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oupton@google.com>
Cc: kvmarm@lists.cs.columbia.edu, pshier@google.com,
	ricarkol@google.com, rananta@google.com, reijiw@google.com,
	jingzhangos@google.com, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, james.morse@arm.com,
	Alexandru.Elisei@arm.com, suzuki.poulose@arm.com,
	Drew Jones <drjones@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>
Subject: Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
Date: Wed, 25 Aug 2021 11:39:34 +0100	[thread overview]
Message-ID: <87fsuxq049.wl-maz@kernel.org> (raw)
In-Reply-To: <CAOQ_QshSaEm_cMYQfRTaXJwnVqeoN29rMLBej-snWd6_0HsgGw@mail.gmail.com>

On Wed, 25 Aug 2021 11:02:28 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > Any guests that discovered the 'widget' hypercall are likely going to
> > > get fussy _very_ quickly on the old kernel.
> >
> > This goes against what we decided to support for the *only* publicly
> > available VMM that cares about save/restore, which is that we only
> > move forward and don't rollback.
> 
> Ah, I was definitely missing this context. Current behavior makes much
> more sense then.
> 
> > Hypercalls are the least of your
> > worries, and there is a whole range of other architectural features
> > that will have also appeared/disappeared (your own CNTPOFF series is a
> > glaring example of this).
> 
> Isn't that a tad bit different though? I'll admit, I'm just as guilty
> with my own series forgetting to add a KVM_CAP (oops), but it is in my
> queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> takes up a new KVM UAPI, it is up to the user to run on a new kernel.

The two are linked. Exposing a new register to userspace and/or guest
result in the same thing: you can't rollback. That's specially true in
the QEMU case, which *learns* from the kernel what registers are
available, and doesn't maintain a fixed list.

> My concerns are explicitly with the 'under the nose' changes, where
> KVM modifies the guest feature set without userspace opting in. Based
> on your comment, though, it would appear that other parts of KVM are
> affected too.

Any new system register that is exposed by a new kernel feature breaks
rollback. And so far, we only consider it a bug if the set of exposed
registers reduces. Anything can be added safely (as checked by one of
the selftests added by Drew).

< It doesn't have to be rollback safety, either. There may
> simply be a hypercall which an operator doesn't want to give its
> guests, and it needs a way to tell KVM to hide it.

Fair enough. But this has to be done in a scalable way, which
individual capability cannot provide.

> > > Have I missed something blatantly obvious, or do others see this as an
> > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > I'm sure other hypercalls could be handled similarly.
> >
> > Why do we need this? For future hypercalls, we could have some buy-in
> > capabilities. For existing ones, it is too late, and negative features
> > are just too horrible.
> 
> Oh, agreed on the nastiness. Lazy hack to realize the intended
> functional change..

Well, you definitely achieved your goal of attracting my attention :).

> > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > bitmap of supported functions. That would be "less horrible". This
> > could be implemented using extra "firmware pseudo-registers" such as
> > the ones described in Documentation/virt/kvm/arm/psci.rst.
> 
> This seems more reasonable, especially since we do this for migrating
> the guest's PSCI version.
> 
> Alternatively, I had thought about using a VM attribute, given the
> fact that it is non-architectural information and we avoid ABI issues
> in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

The whole point is that these settings get exposed by
KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
we already have this for things like the spectre management state, we
can just as well expose the bitmaps that deal with the KVM-specific
hypercalls. After all, this falls into the realm of "KVM as VM
firmware".

For ARM-architected hypercalls (TRNG, stolen time), we may need a
similar extension.

	M.

-- 
Without deviation from the norm, progress is not possible.