KVM/arm64: Guest ABI changes do not appear rollback-safe

* KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-24 21:15 Oliver Upton
  2021-08-24 21:34 ` [RFC PATCH] KVM: arm64: Allow VMMs to opt-out of KVM_CAP_PTP_KVM Oliver Upton
  2021-08-25  9:27 ` KVM/arm64: Guest ABI changes do not appear rollback-safe Marc Zyngier
  0 siblings, 2 replies; 25+ messages in thread
From: Oliver Upton @ 2021-08-24 21:15 UTC
  To: kvmarm
  Cc: maz, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose

Hey folks,

Ricardo and I were discussing hypercall support in KVM/arm64 and
something seems to be a bit problematic. I do not see anywhere that KVM
requires opt-in from the VMM to expose new hypercalls to the guest. To
name a few, the TRNG and KVM PTP hypercalls are unconditionally provided
to the guest.

Exposing new hypercalls to guests in this manner seems very unsafe to
me. Suppose an operator is trying to upgrade from kernel N to kernel
N+1, which brings in the new 'widget' hypercall. Guests are live
migrated onto the N+1 kernel, but the operator finds a defect that
warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
Any guests that discovered the 'widget' hypercall are likely going to
get fussy _very_ quickly on the old kernel.
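
The rollback hazard above can be sketched as a small userspace model.
This is only an illustration, not kernel code: the 'widget' function ID,
struct host_kernel and struct guest are all made up for the example, and
the guest is assumed to probe once and cache the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical function ID for the made-up 'widget' hypercall. */
#define HVC_WIDGET        0x86000099u
#define HVC_NOT_SUPPORTED (-1L)	/* SMCCC-style "not supported" return */

/* Toy host kernel: version N lacks 'widget', version N+1 provides it. */
struct host_kernel {
	bool has_widget;
};

/* Toy guest that probes once and caches the answer (an assumption). */
struct guest {
	bool probed;
	bool widget_available;
};

static long host_handle_hvc(const struct host_kernel *host, uint32_t func_id)
{
	if (func_id == HVC_WIDGET && host->has_widget)
		return 0;
	return HVC_NOT_SUPPORTED;
}

/* Guest-side call: probe on first use, then trust the cached result. */
static long guest_call_widget(struct guest *g, const struct host_kernel *host)
{
	if (!g->probed) {
		g->widget_available = host_handle_hvc(host, HVC_WIDGET) == 0;
		g->probed = true;
	}
	if (!g->widget_available)
		return HVC_NOT_SUPPORTED;	/* guest never relies on it */
	return host_handle_hvc(host, HVC_WIDGET);
}

/* Returns true if the guest sees an unexpected failure after rollback. */
static bool rollback_breaks_guest(void)
{
	struct host_kernel kernel_n  = { .has_widget = false };
	struct host_kernel kernel_n1 = { .has_widget = true };
	struct guest g = { 0 };
	long on_new, on_old;

	on_new = guest_call_widget(&g, &kernel_n1);	/* discovers widget */
	assert(on_new == 0);
	on_old = guest_call_widget(&g, &kernel_n);	/* after N+1 -> N */

	return g.widget_available && on_old == HVC_NOT_SUPPORTED;
}
```

In this model a guest that booted on kernel N never caches the hypercall
as available, so only the N -> N+1 -> N path produces the failure.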

Really, we need to ensure that the exposed guest ABI is
backwards-compatible. Running a VMM that is blissfully unaware of the
'widget' hypercall should not implicitly expose it to its guest on a new
kernel.

This conversation was in the context of devising a new UAPI that allows
VMMs to trap hypercalls to userspace. While such an interface would
easily work around the issue, the onus of ABI compatibility lies with
the kernel.

So, this is all a long-winded way of asking: how do we dig ourselves out
of this situation, and how do we avoid it happening again in the future?
I believe the answer to both is to have new VM capabilities for sets of
hypercalls exposed to the guest. Unfortunately, the unconditional
exposure of TRNG and PTP hypercalls is ABI now, so we'd have to provide
an opt-out at this point. For anything new, we should require opt-in
from the VMM before we give it to the guest.
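
A minimal sketch of the policy described above, again as a userspace
model rather than a proposed UAPI. The HVC_SET_* bits, struct vm and the
vm_*() helpers are hypothetical names for illustration; none of them
exist in KVM. The sets that are already ABI start enabled and can only
be opted out of, while anything new starts disabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hypercall sets; names and bit layout are illustrative. */
#define HVC_SET_TRNG	(1u << 0)	/* already exposed: legacy ABI */
#define HVC_SET_PTP	(1u << 1)	/* already exposed: legacy ABI */
#define HVC_SET_WIDGET	(1u << 2)	/* new: VMM must opt in */

#define HVC_LEGACY_SETS	(HVC_SET_TRNG | HVC_SET_PTP)
#define HVC_KNOWN_SETS	(HVC_LEGACY_SETS | HVC_SET_WIDGET)

struct vm {
	uint32_t enabled_sets;
};

/* A new VM sees only what an unaware VMM on an old kernel would get. */
static void vm_init(struct vm *vm)
{
	vm->enabled_sets = HVC_LEGACY_SETS;
}

/* Opt-in: fails with -1 if the VMM names a set this kernel lacks. */
static int vm_enable_sets(struct vm *vm, uint32_t sets)
{
	if (sets & ~HVC_KNOWN_SETS)
		return -1;
	vm->enabled_sets |= sets;
	return 0;
}

/* Opt-out: used for legacy sets that are on by default. */
static int vm_disable_sets(struct vm *vm, uint32_t sets)
{
	if (sets & ~HVC_KNOWN_SETS)
		return -1;
	vm->enabled_sets &= ~sets;
	return 0;
}

/* Checked by the hypercall handler before servicing a call. */
static bool vm_hvc_allowed(const struct vm *vm, uint32_t set)
{
	return (vm->enabled_sets & set) == set;
}
```

With this shape, a VMM that knows nothing about 'widget' never exposes
it, so a guest it runs can move between kernels N and N+1 in either
direction without seeing a hypercall appear or disappear.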

Have I missed something blatantly obvious, or do others see this as an
issue as well? I'll reply with an example of adding opt-out for PTP.
I'm sure other hypercalls could be handled similarly.

--
Thanks,
Oliver
