All of lore.kernel.org
 help / color / mirror / Atom feed
* KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-24 21:15 ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-24 21:15 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose

Hey folks,

Ricardo and I were discussing hypercall support in KVM/arm64 and
something seems to be a bit problematic. I do not see anywhere that KVM
requires opt-in from the VMM to expose new hypercalls to the guest. To
name a few, the TRNG and KVM PTP hypercalls are unconditionally provided
to the guest.

Exposing new hypercalls to guests in this manner seems very unsafe to
me. Suppose an operator is trying to upgrade from kernel N to kernel
N+1, which brings in the new 'widget' hypercall. Guests are live
migrated onto the N+1 kernel, but the operator finds a defect that
warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
Any guests that discovered the 'widget' hypercall are likely going to
get fussy _very_ quickly on the old kernel.

Really, we need to ensure that the exposed guest ABI is
backwards-compatible. Running a VMM that is blissfully unaware of the
'widget' hypercall should not implicitly expose it to its guest on a new
kernel.

This conversation was in the context of devising a new UAPI that allows
VMMs to trap hypercalls to userspace. While such an interface would
easily work around the issue, the onus of ABI compatibility lies with
the kernel.

So, this is all a long-winded way of asking: how do we dig ourselves out
of this situation, and how to we avoid it happening again in the future?
I believe the answer to both is to have new VM capabilities for sets of
hypercalls exposed to the guest. Unfortunately, the unconditional
exposure of TRNG and PTP hypercalls is ABI now, so we'd have to provide
an opt-out at this point. For anything new, require opt-in from the VMM
before we give it to the guest.

Have I missed something blatantly obvious, or do others see this as an
issue as well? I'll reply with an example of adding opt-out for PTP.
I'm sure other hypercalls could be handled similarly.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-24 21:15 ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-24 21:15 UTC (permalink / raw)
  To: kvmarm; +Cc: kvm, maz, pshier, linux-arm-kernel

Hey folks,

Ricardo and I were discussing hypercall support in KVM/arm64 and
something seems to be a bit problematic. I do not see anywhere that KVM
requires opt-in from the VMM to expose new hypercalls to the guest. To
name a few, the TRNG and KVM PTP hypercalls are unconditionally provided
to the guest.

Exposing new hypercalls to guests in this manner seems very unsafe to
me. Suppose an operator is trying to upgrade from kernel N to kernel
N+1, which brings in the new 'widget' hypercall. Guests are live
migrated onto the N+1 kernel, but the operator finds a defect that
warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
Any guests that discovered the 'widget' hypercall are likely going to
get fussy _very_ quickly on the old kernel.

Really, we need to ensure that the exposed guest ABI is
backwards-compatible. Running a VMM that is blissfully unaware of the
'widget' hypercall should not implicitly expose it to its guest on a new
kernel.

This conversation was in the context of devising a new UAPI that allows
VMMs to trap hypercalls to userspace. While such an interface would
easily work around the issue, the onus of ABI compatibility lies with
the kernel.

So, this is all a long-winded way of asking: how do we dig ourselves out
of this situation, and how to we avoid it happening again in the future?
I believe the answer to both is to have new VM capabilities for sets of
hypercalls exposed to the guest. Unfortunately, the unconditional
exposure of TRNG and PTP hypercalls is ABI now, so we'd have to provide
an opt-out at this point. For anything new, require opt-in from the VMM
before we give it to the guest.

Have I missed something blatantly obvious, or do others see this as an
issue as well? I'll reply with an example of adding opt-out for PTP.
I'm sure other hypercalls could be handled similarly.

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-24 21:15 ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-24 21:15 UTC (permalink / raw)
  To: kvmarm
  Cc: maz, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose

Hey folks,

Ricardo and I were discussing hypercall support in KVM/arm64 and
something seems to be a bit problematic. I do not see anywhere that KVM
requires opt-in from the VMM to expose new hypercalls to the guest. To
name a few, the TRNG and KVM PTP hypercalls are unconditionally provided
to the guest.

Exposing new hypercalls to guests in this manner seems very unsafe to
me. Suppose an operator is trying to upgrade from kernel N to kernel
N+1, which brings in the new 'widget' hypercall. Guests are live
migrated onto the N+1 kernel, but the operator finds a defect that
warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
Any guests that discovered the 'widget' hypercall are likely going to
get fussy _very_ quickly on the old kernel.

Really, we need to ensure that the exposed guest ABI is
backwards-compatible. Running a VMM that is blissfully unaware of the
'widget' hypercall should not implicitly expose it to its guest on a new
kernel.

This conversation was in the context of devising a new UAPI that allows
VMMs to trap hypercalls to userspace. While such an interface would
easily work around the issue, the onus of ABI compatibility lies with
the kernel.

So, this is all a long-winded way of asking: how do we dig ourselves out
of this situation, and how to we avoid it happening again in the future?
I believe the answer to both is to have new VM capabilities for sets of
hypercalls exposed to the guest. Unfortunately, the unconditional
exposure of TRNG and PTP hypercalls is ABI now, so we'd have to provide
an opt-out at this point. For anything new, require opt-in from the VMM
before we give it to the guest.

Have I missed something blatantly obvious, or do others see this as an
issue as well? I'll reply with an example of adding opt-out for PTP.
I'm sure other hypercalls could be handled similarly.

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* [RFC PATCH] KVM: arm64: Allow VMMs to opt-out of KVM_CAP_PTP_KVM
  2021-08-24 21:15 ` Oliver Upton
@ 2021-08-24 21:34   ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-24 21:34 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, rejiw, Oliver Upton

commit 3bf725699bf6 ("KVM: arm64: Add support for the KVM PTP service")
introduced support for a hypercall-based interface through which a KVM
guest may query the host's walltime relative to its physical or virtual
counter. Unfortunately, KVM does not require opt-in for the feature, and
unconditionally provides it to guests when available.

This is extremely problematic for operators who want to ensure guest
migrations are rollback safe. If an operator were to live migrate guests
to a kernel with KVM_CAP_PTP_KVM and subsequently need to roll back the
kernel, guests that discovered the hypercall will get fussy *very*
quickly.

Plug the hazard by introducing a new capability,
KVM_CAP_DISABLE_PTP_KVM. To maintain ABI compatibility with the
aforementioned change, this cap is off by default. When enabled, hide
the KVM PTP hypercall from the guest.

Fixes: 3bf725699bf6 ("KVM: arm64: Add support for the KVM PTP service")
Signed-off-by: Oliver Upton <oupton@google.com>
---
Patch cleanly applies on v5.14-rc7. Delightfully untested beyond
building it :)

 Documentation/virt/kvm/api.rst    | 13 +++++++++++++
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/arm.c              |  5 +++++
 arch/arm64/kvm/hypercalls.c       |  7 +++++--
 include/uapi/linux/kvm.h          |  1 +
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..4866418a2bb6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7241,3 +7241,16 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
 of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
 the hypercalls whose corresponding bit is in the argument, and return
 ENOSYS for the others.
+
+8.35 KVM_CAP_DISABLE_PTP_KVM
+----------------------------
+
+:Architectures: arm64
+
+This capability indicates that a VMM may disable the KVM virtual PTP
+service for a guest. KVM_CAP_PTP_KVM introduced support for this
+hypercall interface, but it is unconditionally enabled without any
+opt-out.
+
+When this capability is enabled, KVM will hide the KVM virtual PTP
+service from the guest.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..8795228aa08e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* PTP KVM hypercall disabled for this guest */
+	bool ptp_kvm_disabled;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0ca72f5cda41..b8f3b2eafd45 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -103,6 +103,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_DISABLE_PTP_KVM:
+		kvm->arch.ptp_kvm_disabled = true;
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -217,6 +221,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_DISABLE_PTP_KVM:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..8e9f2e1329e7 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -62,6 +62,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
 	u32 func_id = smccc_get_function(vcpu);
 	u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
+	struct kvm *kvm = vcpu->kvm;
 	u32 feature;
 	gpa_t gpa;
 
@@ -128,10 +129,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
 		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
-		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+		if (!kvm->arch.ptp_kvm_disabled)
+			val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
-		kvm_ptp_get_time(vcpu, val);
+		if (!kvm->arch.ptp_kvm_disabled)
+			kvm_ptp_get_time(vcpu, val);
 		break;
 	case ARM_SMCCC_TRNG_VERSION:
 	case ARM_SMCCC_TRNG_FEATURES:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..d8419c336ec8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_DISABLE_PTP_KVM 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 74+ messages in thread

* [RFC PATCH] KVM: arm64: Allow VMMs to opt-out of KVM_CAP_PTP_KVM
@ 2021-08-24 21:34   ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-24 21:34 UTC (permalink / raw)
  To: kvm, kvmarm; +Cc: rejiw, Marc Zyngier, Peter Shier

commit 3bf725699bf6 ("KVM: arm64: Add support for the KVM PTP service")
introduced support for a hypercall-based interface through which a KVM
guest may query the host's walltime relative to its physical or virtual
counter. Unfortunately, KVM does not require opt-in for the feature, and
unconditionally provides it to guests when available.

This is extremely problematic for operators who want to ensure guest
migrations are rollback safe. If an operator were to live migrate guests
to a kernel with KVM_CAP_PTP_KVM and subsequently need to roll back the
kernel, guests that discovered the hypercall will get fussy *very*
quickly.

Plug the hazard by introducing a new capability,
KVM_CAP_DISABLE_PTP_KVM. To maintain ABI compatibility with the
aforementioned change, this cap is off by default. When enabled, hide
the KVM PTP hypercall from the guest.

Fixes: 3bf725699bf6 ("KVM: arm64: Add support for the KVM PTP service")
Signed-off-by: Oliver Upton <oupton@google.com>
---
Patch cleanly applies on v5.14-rc7. Delightfully untested beyond
building it :)

 Documentation/virt/kvm/api.rst    | 13 +++++++++++++
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/arm.c              |  5 +++++
 arch/arm64/kvm/hypercalls.c       |  7 +++++--
 include/uapi/linux/kvm.h          |  1 +
 5 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..4866418a2bb6 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7241,3 +7241,16 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
 of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
 the hypercalls whose corresponding bit is in the argument, and return
 ENOSYS for the others.
+
+8.35 KVM_CAP_DISABLE_PTP_KVM
+----------------------------
+
+:Architectures: arm64
+
+This capability indicates that a VMM may disable the KVM virtual PTP
+service for a guest. KVM_CAP_PTP_KVM introduced support for this
+hypercall interface, but it is unconditionally enabled without any
+opt-out.
+
+When this capability is enabled, KVM will hide the KVM virtual PTP
+service from the guest.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 41911585ae0c..8795228aa08e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* PTP KVM hypercall disabled for this guest */
+	bool ptp_kvm_disabled;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 0ca72f5cda41..b8f3b2eafd45 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -103,6 +103,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_DISABLE_PTP_KVM:
+		kvm->arch.ptp_kvm_disabled = true;
+		r = 0;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -217,6 +221,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_DISABLE_PTP_KVM:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..8e9f2e1329e7 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -62,6 +62,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 {
 	u32 func_id = smccc_get_function(vcpu);
 	u64 val[4] = {SMCCC_RET_NOT_SUPPORTED};
+	struct kvm *kvm = vcpu->kvm;
 	u32 feature;
 	gpa_t gpa;
 
@@ -128,10 +129,12 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
 		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
-		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+		if (!kvm->arch.ptp_kvm_disabled)
+			val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
-		kvm_ptp_get_time(vcpu, val);
+		if (!kvm->arch.ptp_kvm_disabled)
+			kvm_ptp_get_time(vcpu, val);
 		break;
 	case ARM_SMCCC_TRNG_VERSION:
 	case ARM_SMCCC_TRNG_FEATURES:
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..d8419c336ec8 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1112,6 +1112,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_DISABLE_PTP_KVM 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.33.0.rc2.250.ged5fa647cd-goog

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-24 21:15 ` Oliver Upton
  (?)
@ 2021-08-25  9:27   ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-25  9:27 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose,
	Drew Jones, Peter Maydell

Hi Oliver,

Adding Andrew and Peter to the discussion.

On Tue, 24 Aug 2021 22:15:03 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Hey folks,
> 
> Ricardo and I were discussing hypercall support in KVM/arm64 and
> something seems to be a bit problematic. I do not see anywhere that KVM
> requires opt-in from the VMM to expose new hypercalls to the guest. To
> name a few, the TRNG and KVM PTP hypercalls are unconditionally provided
> to the guest.
> 
> Exposing new hypercalls to guests in this manner seems very unsafe to
> me. Suppose an operator is trying to upgrade from kernel N to kernel
> N+1, which brings in the new 'widget' hypercall. Guests are live
> migrated onto the N+1 kernel, but the operator finds a defect that
> warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> Any guests that discovered the 'widget' hypercall are likely going to
> get fussy _very_ quickly on the old kernel.

This goes against what we decided to support for the *only* publicly
available VMM that cares about save/restore, which is that we only
move forward and don't rollback. Hypercalls are the least of your
worries, and there is a whole range of other architectural features
that will have also appeared/disappeared (your own CNTPOFF series is a
glaring example of this).

I appreciate that you may have different considerations, but I felt
that it was important to state *why* this is the way it is.

> 
> Really, we need to ensure that the exposed guest ABI is
> backwards-compatible. Running a VMM that is blissfully unaware of the
> 'widget' hypercall should not implicitly expose it to its guest on a new
> kernel.
> 
> This conversation was in the context of devising a new UAPI that allows
> VMMs to trap hypercalls to userspace. While such an interface would
> easily work around the issue, the onus of ABI compatibility lies with
> the kernel.
> 
> So, this is all a long-winded way of asking: how do we dig ourselves out
> of this situation, and how to we avoid it happening again in the future?
> I believe the answer to both is to have new VM capabilities for sets of
> hypercalls exposed to the guest. Unfortunately, the unconditional
> exposure of TRNG and PTP hypercalls is ABI now, so we'd have to provide
> an opt-out at this point. For anything new, require opt-in from the VMM
> before we give it to the guest.
>
> Have I missed something blatantly obvious, or do others see this as an
> issue as well? I'll reply with an example of adding opt-out for PTP.
> I'm sure other hypercalls could be handled similarly.

Why do we need this? For future hypercalls, we could have some buy-in
capabilities. For existing ones, it is too late, and negative features
are just too horrible.

For KVM-specific hypercalls, we could get the VMM to save/restore the
bitmap of supported functions. That would be "less horrible". This
could be implemented using extra "firmware pseudo-registers" such as
the ones described in Documentation/virt/kvm/arm/psci.rst.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25  9:27   ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-25  9:27 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

Hi Oliver,

Adding Andrew and Peter to the discussion.

On Tue, 24 Aug 2021 22:15:03 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Hey folks,
> 
> Ricardo and I were discussing hypercall support in KVM/arm64 and
> something seems to be a bit problematic. I do not see anywhere that KVM
> requires opt-in from the VMM to expose new hypercalls to the guest. To
> name a few, the TRNG and KVM PTP hypercalls are unconditionally provided
> to the guest.
> 
> Exposing new hypercalls to guests in this manner seems very unsafe to
> me. Suppose an operator is trying to upgrade from kernel N to kernel
> N+1, which brings in the new 'widget' hypercall. Guests are live
> migrated onto the N+1 kernel, but the operator finds a defect that
> warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> Any guests that discovered the 'widget' hypercall are likely going to
> get fussy _very_ quickly on the old kernel.

This goes against what we decided to support for the *only* publicly
available VMM that cares about save/restore, which is that we only
move forward and don't rollback. Hypercalls are the least of your
worries, and there is a whole range of other architectural features
that will have also appeared/disappeared (your own CNTPOFF series is a
glaring example of this).

I appreciate that you may have different considerations, but I felt
that it was important to state *why* this is the way it is.

> 
> Really, we need to ensure that the exposed guest ABI is
> backwards-compatible. Running a VMM that is blissfully unaware of the
> 'widget' hypercall should not implicitly expose it to its guest on a new
> kernel.
> 
> This conversation was in the context of devising a new UAPI that allows
> VMMs to trap hypercalls to userspace. While such an interface would
> easily work around the issue, the onus of ABI compatibility lies with
> the kernel.
> 
> So, this is all a long-winded way of asking: how do we dig ourselves out
> of this situation, and how to we avoid it happening again in the future?
> I believe the answer to both is to have new VM capabilities for sets of
> hypercalls exposed to the guest. Unfortunately, the unconditional
> exposure of TRNG and PTP hypercalls is ABI now, so we'd have to provide
> an opt-out at this point. For anything new, require opt-in from the VMM
> before we give it to the guest.
>
> Have I missed something blatantly obvious, or do others see this as an
> issue as well? I'll reply with an example of adding opt-out for PTP.
> I'm sure other hypercalls could be handled similarly.

Why do we need this? For future hypercalls, we could have some buy-in
capabilities. For existing ones, it is too late, and negative features
are just too horrible.

For KVM-specific hypercalls, we could get the VMM to save/restore the
bitmap of supported functions. That would be "less horrible". This
could be implemented using extra "firmware pseudo-registers" such as
the ones described in Documentation/virt/kvm/arm/psci.rst.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25  9:27   ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-25  9:27 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose,
	Drew Jones, Peter Maydell

Hi Oliver,

Adding Andrew and Peter to the discussion.

On Tue, 24 Aug 2021 22:15:03 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Hey folks,
> 
> Ricardo and I were discussing hypercall support in KVM/arm64 and
> something seems to be a bit problematic. I do not see anywhere that KVM
> requires opt-in from the VMM to expose new hypercalls to the guest. To
> name a few, the TRNG and KVM PTP hypercalls are unconditionally provided
> to the guest.
> 
> Exposing new hypercalls to guests in this manner seems very unsafe to
> me. Suppose an operator is trying to upgrade from kernel N to kernel
> N+1, which brings in the new 'widget' hypercall. Guests are live
> migrated onto the N+1 kernel, but the operator finds a defect that
> warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> Any guests that discovered the 'widget' hypercall are likely going to
> get fussy _very_ quickly on the old kernel.

This goes against what we decided to support for the *only* publicly
available VMM that cares about save/restore, which is that we only
move forward and don't rollback. Hypercalls are the least of your
worries, and there is a whole range of other architectural features
that will have also appeared/disappeared (your own CNTPOFF series is a
glaring example of this).

I appreciate that you may have different considerations, but I felt
that it was important to state *why* this is the way it is.

> 
> Really, we need to ensure that the exposed guest ABI is
> backwards-compatible. Running a VMM that is blissfully unaware of the
> 'widget' hypercall should not implicitly expose it to its guest on a new
> kernel.
> 
> This conversation was in the context of devising a new UAPI that allows
> VMMs to trap hypercalls to userspace. While such an interface would
> easily work around the issue, the onus of ABI compatibility lies with
> the kernel.
> 
> So, this is all a long-winded way of asking: how do we dig ourselves out
> of this situation, and how to we avoid it happening again in the future?
> I believe the answer to both is to have new VM capabilities for sets of
> hypercalls exposed to the guest. Unfortunately, the unconditional
> exposure of TRNG and PTP hypercalls is ABI now, so we'd have to provide
> an opt-out at this point. For anything new, require opt-in from the VMM
> before we give it to the guest.
>
> Have I missed something blatantly obvious, or do others see this as an
> issue as well? I'll reply with an example of adding opt-out for PTP.
> I'm sure other hypercalls could be handled similarly.

Why do we need this? For future hypercalls, we could have some buy-in
capabilities. For existing ones, it is too late, and negative features
are just too horrible.

For KVM-specific hypercalls, we could get the VMM to save/restore the
bitmap of supported functions. That would be "less horrible". This
could be implemented using extra "firmware pseudo-registers" such as
the ones described in Documentation/virt/kvm/arm/psci.rst.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-25  9:27   ` Marc Zyngier
  (?)
@ 2021-08-25 10:02     ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-25 10:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, Alexandru.Elisei, suzuki.poulose,
	Drew Jones, Peter Maydell

On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > Exposing new hypercalls to guests in this manner seems very unsafe to
> > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > N+1, which brings in the new 'widget' hypercall. Guests are live
> > migrated onto the N+1 kernel, but the operator finds a defect that
> > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > Any guests that discovered the 'widget' hypercall are likely going to
> > get fussy _very_ quickly on the old kernel.
>
> This goes against what we decided to support for the *only* publicly
> available VMM that cares about save/restore, which is that we only
> move forward and don't rollback.

Ah, I was definitely missing this context. Current behavior makes much
more sense then.

> Hypercalls are the least of your
> worries, and there is a whole range of other architectural features
> that will have also appeared/disappeared (your own CNTPOFF series is a
> glaring example of this).

Isn't that a tad bit different though? I'll admit, I'm just as guilty
with my own series forgetting to add a KVM_CAP (oops), but it is in my
queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
takes up a new KVM UAPI, it is up to the user to run on a new kernel.

My concerns are explicitly with the 'under the nose' changes, where
KVM modifies the guest feature set without userspace opting in. Based
on your comment, though, it would appear that other parts of KVM are
affected too. It doesn't have to be rollback safety, either. There may
simply be a hypercall which an operator doesn't want to give its
guests, and it needs a way to tell KVM to hide it.

> > Have I missed something blatantly obvious, or do others see this as an
> > issue as well? I'll reply with an example of adding opt-out for PTP.
> > I'm sure other hypercalls could be handled similarly.
>
> Why do we need this? For future hypercalls, we could have some buy-in
> capabilities. For existing ones, it is too late, and negative features
> are just too horrible.

Oh, agreed on the nastiness. Lazy hack to realize the intended
functional change..

> For KVM-specific hypercalls, we could get the VMM to save/restore the
> bitmap of supported functions. That would be "less horrible". This
> could be implemented using extra "firmware pseudo-registers" such as
> the ones described in Documentation/virt/kvm/arm/psci.rst.

This seems more reasonable, especially since we do this for migrating
the guest's PSCI version.

Alternatively, I had thought about using a VM attribute, given the
fact that it is non-architectural information and we avoid ABI issues
in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 10:02     ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-25 10:02 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > Exposing new hypercalls to guests in this manner seems very unsafe to
> > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > N+1, which brings in the new 'widget' hypercall. Guests are live
> > migrated onto the N+1 kernel, but the operator finds a defect that
> > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > Any guests that discovered the 'widget' hypercall are likely going to
> > get fussy _very_ quickly on the old kernel.
>
> This goes against what we decided to support for the *only* publicly
> available VMM that cares about save/restore, which is that we only
> move forward and don't rollback.

Ah, I was definitely missing this context. Current behavior makes much
more sense then.

> Hypercalls are the least of your
> worries, and there is a whole range of other architectural features
> that will have also appeared/disappeared (your own CNTPOFF series is a
> glaring example of this).

Isn't that a tad bit different though? I'll admit, I'm just as guilty
with my own series forgetting to add a KVM_CAP (oops), but it is in my
queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
takes up a new KVM UAPI, it is up to the user to run on a new kernel.

My concerns are explicitly with the 'under the nose' changes, where
KVM modifies the guest feature set without userspace opting in. Based
on your comment, though, it would appear that other parts of KVM are
affected too. It doesn't have to be rollback safety, either. There may
simply be a hypercall which an operator doesn't want to give its
guests, and it needs a way to tell KVM to hide it.

> > Have I missed something blatantly obvious, or do others see this as an
> > issue as well? I'll reply with an example of adding opt-out for PTP.
> > I'm sure other hypercalls could be handled similarly.
>
> Why do we need this? For future hypercalls, we could have some buy-in
> capabilities. For existing ones, it is too late, and negative features
> are just too horrible.

Oh, agreed on the nastiness. Lazy hack to realize the intended
functional change..

> For KVM-specific hypercalls, we could get the VMM to save/restore the
> bitmap of supported functions. That would be "less horrible". This
> could be implemented using extra "firmware pseudo-registers" such as
> the ones described in Documentation/virt/kvm/arm/psci.rst.

This seems more reasonable, especially since we do this for migrating
the guest's PSCI version.

Alternatively, I had thought about using a VM attribute, given the
fact that it is non-architectural information and we avoid ABI issues
in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 10:02     ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-25 10:02 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, Alexandru.Elisei, suzuki.poulose,
	Drew Jones, Peter Maydell

On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > Exposing new hypercalls to guests in this manner seems very unsafe to
> > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > N+1, which brings in the new 'widget' hypercall. Guests are live
> > migrated onto the N+1 kernel, but the operator finds a defect that
> > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > Any guests that discovered the 'widget' hypercall are likely going to
> > get fussy _very_ quickly on the old kernel.
>
> This goes against what we decided to support for the *only* publicly
> available VMM that cares about save/restore, which is that we only
> move forward and don't rollback.

Ah, I was definitely missing this context. Current behavior makes much
more sense then.

> Hypercalls are the least of your
> worries, and there is a whole range of other architectural features
> that will have also appeared/disappeared (your own CNTPOFF series is a
> glaring example of this).

Isn't that a tad bit different though? I'll admit, I'm just as guilty
with my own series forgetting to add a KVM_CAP (oops), but it is in my
queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
takes up a new KVM UAPI, it is up to the user to run on a new kernel.

My concerns are explicitly with the 'under the nose' changes, where
KVM modifies the guest feature set without userspace opting in. Based
on your comment, though, it would appear that other parts of KVM are
affected too. It doesn't have to be rollback safety, either. There may
simply be a hypercall which an operator doesn't want to give its
guests, and it needs a way to tell KVM to hide it.

> > Have I missed something blatantly obvious, or do others see this as an
> > issue as well? I'll reply with an example of adding opt-out for PTP.
> > I'm sure other hypercalls could be handled similarly.
>
> Why do we need this? For future hypercalls, we could have some buy-in
> capabilities. For existing ones, it is too late, and negative features
> are just too horrible.

Oh, agreed on the nastiness. Lazy hack to realize the intended
functional change..

> For KVM-specific hypercalls, we could get the VMM to save/restore the
> bitmap of supported functions. That would be "less horrible". This
> could be implemented using extra "firmware pseudo-registers" such as
> the ones described in Documentation/virt/kvm/arm/psci.rst.

This seems more reasonable, especially since we do this for migrating
the guest's PSCI version.

Alternatively, I had thought about using a VM attribute, given the
fact that it is non-architectural information and we avoid ABI issues
in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-25 10:02     ` Oliver Upton
  (?)
@ 2021-08-25 10:39       ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-25 10:39 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, Alexandru.Elisei, suzuki.poulose,
	Drew Jones, Peter Maydell

On Wed, 25 Aug 2021 11:02:28 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > Any guests that discovered the 'widget' hypercall are likely going to
> > > get fussy _very_ quickly on the old kernel.
> >
> > This goes against what we decided to support for the *only* publicly
> > available VMM that cares about save/restore, which is that we only
> > move forward and don't rollback.
> 
> Ah, I was definitely missing this context. Current behavior makes much
> more sense then.
> 
> > Hypercalls are the least of your
> > worries, and there is a whole range of other architectural features
> > that will have also appeared/disappeared (your own CNTPOFF series is a
> > glaring example of this).
> 
> Isn't that a tad bit different though? I'll admit, I'm just as guilty
> with my own series forgetting to add a KVM_CAP (oops), but it is in my
> queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> takes up a new KVM UAPI, it is up to the user to run on a new kernel.

The two are linked. Exposing a new register to userspace and/or guest
result in the same thing: you can't rollback. That's specially true in
the QEMU case, which *learns* from the kernel what registers are
available, and doesn't maintain a fixed list.

> My concerns are explicitly with the 'under the nose' changes, where
> KVM modifies the guest feature set without userspace opting in. Based
> on your comment, though, it would appear that other parts of KVM are
> affected too.

Any new system register that is exposed by a new kernel feature breaks
rollback. And so far, we only consider it a bug if the set of exposed
registers reduces. Anything can be added safely (as checked by one of
the selftests added by Drew).

< It doesn't have to be rollback safety, either. There may
> simply be a hypercall which an operator doesn't want to give its
> guests, and it needs a way to tell KVM to hide it.

Fair enough. But this has to be done in a scalable way, which
individual capability cannot provide.

> > > Have I missed something blatantly obvious, or do others see this as an
> > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > I'm sure other hypercalls could be handled similarly.
> >
> > Why do we need this? For future hypercalls, we could have some buy-in
> > capabilities. For existing ones, it is too late, and negative features
> > are just too horrible.
> 
> Oh, agreed on the nastiness. Lazy hack to realize the intended
> functional change..

Well, you definitely achieved your goal of attracting my attention :).

> > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > bitmap of supported functions. That would be "less horrible". This
> > could be implemented using extra "firmware pseudo-registers" such as
> > the ones described in Documentation/virt/kvm/arm/psci.rst.
> 
> This seems more reasonable, especially since we do this for migrating
> the guest's PSCI version.
> 
> Alternatively, I had thought about using a VM attribute, given the
> fact that it is non-architectural information and we avoid ABI issues
> in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

The whole point is that these settings get exposed by
KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
we already have this for things like the spectre management state, we
can just as well expose the bitmaps that deal with the KVM-specific
hypercalls. After all, this falls into the realm of "KVM as VM
firmware".

For ARM-architected hypercalls (TRNG, stolen time), we may need a
similar extension.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 10:39       ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-25 10:39 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Wed, 25 Aug 2021 11:02:28 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > Any guests that discovered the 'widget' hypercall are likely going to
> > > get fussy _very_ quickly on the old kernel.
> >
> > This goes against what we decided to support for the *only* publicly
> > available VMM that cares about save/restore, which is that we only
> > move forward and don't rollback.
> 
> Ah, I was definitely missing this context. Current behavior makes much
> more sense then.
> 
> > Hypercalls are the least of your
> > worries, and there is a whole range of other architectural features
> > that will have also appeared/disappeared (your own CNTPOFF series is a
> > glaring example of this).
> 
> Isn't that a tad bit different though? I'll admit, I'm just as guilty
> with my own series forgetting to add a KVM_CAP (oops), but it is in my
> queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> takes up a new KVM UAPI, it is up to the user to run on a new kernel.

The two are linked. Exposing a new register to userspace and/or guest
result in the same thing: you can't rollback. That's specially true in
the QEMU case, which *learns* from the kernel what registers are
available, and doesn't maintain a fixed list.

> My concerns are explicitly with the 'under the nose' changes, where
> KVM modifies the guest feature set without userspace opting in. Based
> on your comment, though, it would appear that other parts of KVM are
> affected too.

Any new system register that is exposed by a new kernel feature breaks
rollback. And so far, we only consider it a bug if the set of exposed
registers reduces. Anything can be added safely (as checked by one of
the selftests added by Drew).

< It doesn't have to be rollback safety, either. There may
> simply be a hypercall which an operator doesn't want to give its
> guests, and it needs a way to tell KVM to hide it.

Fair enough. But this has to be done in a scalable way, which
individual capability cannot provide.

> > > Have I missed something blatantly obvious, or do others see this as an
> > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > I'm sure other hypercalls could be handled similarly.
> >
> > Why do we need this? For future hypercalls, we could have some buy-in
> > capabilities. For existing ones, it is too late, and negative features
> > are just too horrible.
> 
> Oh, agreed on the nastiness. Lazy hack to realize the intended
> functional change..

Well, you definitely achieved your goal of attracting my attention :).

> > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > bitmap of supported functions. That would be "less horrible". This
> > could be implemented using extra "firmware pseudo-registers" such as
> > the ones described in Documentation/virt/kvm/arm/psci.rst.
> 
> This seems more reasonable, especially since we do this for migrating
> the guest's PSCI version.
> 
> Alternatively, I had thought about using a VM attribute, given the
> fact that it is non-architectural information and we avoid ABI issues
> in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

The whole point is that these settings get exposed by
KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
we already have this for things like the spectre management state, we
can just as well expose the bitmaps that deal with the KVM-specific
hypercalls. After all, this falls into the realm of "KVM as VM
firmware".

For ARM-architected hypercalls (TRNG, stolen time), we may need a
similar extension.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 10:39       ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-25 10:39 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, pshier, ricarkol, rananta, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, Alexandru.Elisei, suzuki.poulose,
	Drew Jones, Peter Maydell

On Wed, 25 Aug 2021 11:02:28 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > Any guests that discovered the 'widget' hypercall are likely going to
> > > get fussy _very_ quickly on the old kernel.
> >
> > This goes against what we decided to support for the *only* publicly
> > available VMM that cares about save/restore, which is that we only
> > move forward and don't rollback.
> 
> Ah, I was definitely missing this context. Current behavior makes much
> more sense then.
> 
> > Hypercalls are the least of your
> > worries, and there is a whole range of other architectural features
> > that will have also appeared/disappeared (your own CNTPOFF series is a
> > glaring example of this).
> 
> Isn't that a tad bit different though? I'll admit, I'm just as guilty
> with my own series forgetting to add a KVM_CAP (oops), but it is in my
> queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> takes up a new KVM UAPI, it is up to the user to run on a new kernel.

The two are linked. Exposing a new register to userspace and/or guest
result in the same thing: you can't rollback. That's specially true in
the QEMU case, which *learns* from the kernel what registers are
available, and doesn't maintain a fixed list.

> My concerns are explicitly with the 'under the nose' changes, where
> KVM modifies the guest feature set without userspace opting in. Based
> on your comment, though, it would appear that other parts of KVM are
> affected too.

Any new system register that is exposed by a new kernel feature breaks
rollback. And so far, we only consider it a bug if the set of exposed
registers reduces. Anything can be added safely (as checked by one of
the selftests added by Drew).

< It doesn't have to be rollback safety, either. There may
> simply be a hypercall which an operator doesn't want to give its
> guests, and it needs a way to tell KVM to hide it.

Fair enough. But this has to be done in a scalable way, which
individual capability cannot provide.

> > > Have I missed something blatantly obvious, or do others see this as an
> > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > I'm sure other hypercalls could be handled similarly.
> >
> > Why do we need this? For future hypercalls, we could have some buy-in
> > capabilities. For existing ones, it is too late, and negative features
> > are just too horrible.
> 
> Oh, agreed on the nastiness. Lazy hack to realize the intended
> functional change..

Well, you definitely achieved your goal of attracting my attention :).

> > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > bitmap of supported functions. That would be "less horrible". This
> > could be implemented using extra "firmware pseudo-registers" such as
> > the ones described in Documentation/virt/kvm/arm/psci.rst.
> 
> This seems more reasonable, especially since we do this for migrating
> the guest's PSCI version.
> 
> Alternatively, I had thought about using a VM attribute, given the
> fact that it is non-architectural information and we avoid ABI issues
> in KVM_GET_REG_LIST without buy-in through a KVM_CAP.

The whole point is that these settings get exposed by
KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
we already have this for things like the spectre management state, we
can just as well expose the bitmaps that deal with the KVM-specific
hypercalls. After all, this falls into the realm of "KVM as VM
firmware".

For ARM-architected hypercalls (TRNG, stolen time), we may need a
similar extension.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-25 10:39       ` Marc Zyngier
  (?)
@ 2021-08-25 15:07         ` Andrew Jones
  -1 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-25 15:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> On Wed, 25 Aug 2021 11:02:28 +0100,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > get fussy _very_ quickly on the old kernel.
> > >
> > > This goes against what we decided to support for the *only* publicly
> > > available VMM that cares about save/restore, which is that we only
> > > move forward and don't rollback.
> > 
> > Ah, I was definitely missing this context. Current behavior makes much
> > more sense then.
> > 
> > > Hypercalls are the least of your
> > > worries, and there is a whole range of other architectural features
> > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > glaring example of this).
> > 
> > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> 
> The two are linked. Exposing a new register to userspace and/or guest
> result in the same thing: you can't rollback. That's specially true in
> the QEMU case, which *learns* from the kernel what registers are
> available, and doesn't maintain a fixed list.
> 
> > My concerns are explicitly with the 'under the nose' changes, where
> > KVM modifies the guest feature set without userspace opting in. Based
> > on your comment, though, it would appear that other parts of KVM are
> > affected too.
> 
> Any new system register that is exposed by a new kernel feature breaks
> rollback. And so far, we only consider it a bug if the set of exposed
> registers reduces. Anything can be added safely (as checked by one of
> the selftests added by Drew).
> 
> < It doesn't have to be rollback safety, either. There may
> > simply be a hypercall which an operator doesn't want to give its
> > guests, and it needs a way to tell KVM to hide it.
> 
> Fair enough. But this has to be done in a scalable way, which
> individual capability cannot provide.
> 
> > > > Have I missed something blatantly obvious, or do others see this as an
> > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > I'm sure other hypercalls could be handled similarly.
> > >
> > > Why do we need this? For future hypercalls, we could have some buy-in
> > > capabilities. For existing ones, it is too late, and negative features
> > > are just too horrible.
> > 
> > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > functional change..
> 
> Well, you definitely achieved your goal of attracting my attention :).
> 
> > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > bitmap of supported functions. That would be "less horrible". This
> > > could be implemented using extra "firmware pseudo-registers" such as
> > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > 
> > This seems more reasonable, especially since we do this for migrating
> > the guest's PSCI version.
> > 
> > Alternatively, I had thought about using a VM attribute, given the
> > fact that it is non-architectural information and we avoid ABI issues
> > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> 
> The whole point is that these settings get exposed by
> KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> we already have this for things like the spectre management state, we
> can just as well expose the bitmaps that deal with the KVM-specific
> hypercalls. After all, this falls into the realm of "KVM as VM
> firmware".
> 
> For ARM-architected hypercalls (TRNG, stolen time), we may need a
> similar extension.
> 

Thanks for including me Marc. I think you've mentioned all the examples
of why we don't generally expect N+1 -> N migrations to work that I
can think of. While some of the examples like get-reg-list could
eventually be eliminated if we had CPU models to tighten our machine type
state, I think N+1 -> N migrations will always be best effort at most.

I agree with giving userspace control over the exposer of the hypercalls
though. Using pseudo-registers for that purpose rather than a pile of
CAPs also seems reasonable to me.

And, while I don't think this patch is going to proceed, I thought I'd
point out that the opt-out approach doesn't help much with expanding
our migration support unless we require the VMM to be upgraded first.
And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
work as expected, since the source enforce opt-out, but the destination
won't. Also, since the VMM doesn't key off the kernel version, for the
most part N+1 VMMs won't know when they're supposed to opt-out or not,
leaving it to the user to ensure they consider everything. opt-in
usually only needs the user to consider what machine type they want to
launch.

Thanks,
drew


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 15:07         ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-25 15:07 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> On Wed, 25 Aug 2021 11:02:28 +0100,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > get fussy _very_ quickly on the old kernel.
> > >
> > > This goes against what we decided to support for the *only* publicly
> > > available VMM that cares about save/restore, which is that we only
> > > move forward and don't rollback.
> > 
> > Ah, I was definitely missing this context. Current behavior makes much
> > more sense then.
> > 
> > > Hypercalls are the least of your
> > > worries, and there is a whole range of other architectural features
> > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > glaring example of this).
> > 
> > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> 
> The two are linked. Exposing a new register to userspace and/or guest
> result in the same thing: you can't rollback. That's specially true in
> the QEMU case, which *learns* from the kernel what registers are
> available, and doesn't maintain a fixed list.
> 
> > My concerns are explicitly with the 'under the nose' changes, where
> > KVM modifies the guest feature set without userspace opting in. Based
> > on your comment, though, it would appear that other parts of KVM are
> > affected too.
> 
> Any new system register that is exposed by a new kernel feature breaks
> rollback. And so far, we only consider it a bug if the set of exposed
> registers reduces. Anything can be added safely (as checked by one of
> the selftests added by Drew).
> 
> < It doesn't have to be rollback safety, either. There may
> > simply be a hypercall which an operator doesn't want to give its
> > guests, and it needs a way to tell KVM to hide it.
> 
> Fair enough. But this has to be done in a scalable way, which
> individual capability cannot provide.
> 
> > > > Have I missed something blatantly obvious, or do others see this as an
> > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > I'm sure other hypercalls could be handled similarly.
> > >
> > > Why do we need this? For future hypercalls, we could have some buy-in
> > > capabilities. For existing ones, it is too late, and negative features
> > > are just too horrible.
> > 
> > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > functional change..
> 
> Well, you definitely achieved your goal of attracting my attention :).
> 
> > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > bitmap of supported functions. That would be "less horrible". This
> > > could be implemented using extra "firmware pseudo-registers" such as
> > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > 
> > This seems more reasonable, especially since we do this for migrating
> > the guest's PSCI version.
> > 
> > Alternatively, I had thought about using a VM attribute, given the
> > fact that it is non-architectural information and we avoid ABI issues
> > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> 
> The whole point is that these settings get exposed by
> KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> we already have this for things like the spectre management state, we
> can just as well expose the bitmaps that deal with the KVM-specific
> hypercalls. After all, this falls into the realm of "KVM as VM
> firmware".
> 
> For ARM-architected hypercalls (TRNG, stolen time), we may need a
> similar extension.
> 

Thanks for including me Marc. I think you've mentioned all the examples
of why we don't generally expect N+1 -> N migrations to work that I
can think of. While some of the examples like get-reg-list could
eventually be eliminated if we had CPU models to tighten our machine type
state, I think N+1 -> N migrations will always be best effort at most.

I agree with giving userspace control over the exposer of the hypercalls
though. Using pseudo-registers for that purpose rather than a pile of
CAPs also seems reasonable to me.

And, while I don't think this patch is going to proceed, I thought I'd
point out that the opt-out approach doesn't help much with expanding
our migration support unless we require the VMM to be upgraded first.
And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
work as expected, since the source enforce opt-out, but the destination
won't. Also, since the VMM doesn't key off the kernel version, for the
most part N+1 VMMs won't know when they're supposed to opt-out or not,
leaving it to the user to ensure they consider everything. opt-in
usually only needs the user to consider what machine type they want to
launch.

Thanks,
drew

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 15:07         ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-25 15:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Oliver Upton, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> On Wed, 25 Aug 2021 11:02:28 +0100,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > get fussy _very_ quickly on the old kernel.
> > >
> > > This goes against what we decided to support for the *only* publicly
> > > available VMM that cares about save/restore, which is that we only
> > > move forward and don't rollback.
> > 
> > Ah, I was definitely missing this context. Current behavior makes much
> > more sense then.
> > 
> > > Hypercalls are the least of your
> > > worries, and there is a whole range of other architectural features
> > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > glaring example of this).
> > 
> > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> 
> The two are linked. Exposing a new register to userspace and/or guest
> result in the same thing: you can't rollback. That's specially true in
> the QEMU case, which *learns* from the kernel what registers are
> available, and doesn't maintain a fixed list.
> 
> > My concerns are explicitly with the 'under the nose' changes, where
> > KVM modifies the guest feature set without userspace opting in. Based
> > on your comment, though, it would appear that other parts of KVM are
> > affected too.
> 
> Any new system register that is exposed by a new kernel feature breaks
> rollback. And so far, we only consider it a bug if the set of exposed
> registers reduces. Anything can be added safely (as checked by one of
> the selftests added by Drew).
> 
> < It doesn't have to be rollback safety, either. There may
> > simply be a hypercall which an operator doesn't want to give its
> > guests, and it needs a way to tell KVM to hide it.
> 
> Fair enough. But this has to be done in a scalable way, which
> individual capability cannot provide.
> 
> > > > Have I missed something blatantly obvious, or do others see this as an
> > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > I'm sure other hypercalls could be handled similarly.
> > >
> > > Why do we need this? For future hypercalls, we could have some buy-in
> > > capabilities. For existing ones, it is too late, and negative features
> > > are just too horrible.
> > 
> > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > functional change..
> 
> Well, you definitely achieved your goal of attracting my attention :).
> 
> > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > bitmap of supported functions. That would be "less horrible". This
> > > could be implemented using extra "firmware pseudo-registers" such as
> > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > 
> > This seems more reasonable, especially since we do this for migrating
> > the guest's PSCI version.
> > 
> > Alternatively, I had thought about using a VM attribute, given the
> > fact that it is non-architectural information and we avoid ABI issues
> > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> 
> The whole point is that these settings get exposed by
> KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> we already have this for things like the spectre management state, we
> can just as well expose the bitmaps that deal with the KVM-specific
> hypercalls. After all, this falls into the realm of "KVM as VM
> firmware".
> 
> For ARM-architected hypercalls (TRNG, stolen time), we may need a
> similar extension.
> 

Thanks for including me Marc. I think you've mentioned all the examples
of why we don't generally expect N+1 -> N migrations to work that I
can think of. While some of the examples like get-reg-list could
eventually be eliminated if we had CPU models to tighten our machine type
state, I think N+1 -> N migrations will always be best effort at most.

I agree with giving userspace control over the exposer of the hypercalls
though. Using pseudo-registers for that purpose rather than a pile of
CAPs also seems reasonable to me.

And, while I don't think this patch is going to proceed, I thought I'd
point out that the opt-out approach doesn't help much with expanding
our migration support unless we require the VMM to be upgraded first.
And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
work as expected, since the source enforce opt-out, but the destination
won't. Also, since the VMM doesn't key off the kernel version, for the
most part N+1 VMMs won't know when they're supposed to opt-out or not,
leaving it to the user to ensure they consider everything. opt-in
usually only needs the user to consider what machine type they want to
launch.

Thanks,
drew


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-25 15:07         ` Andrew Jones
  (?)
@ 2021-08-25 18:14           ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-25 18:14 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
>
> On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > On Wed, 25 Aug 2021 11:02:28 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > get fussy _very_ quickly on the old kernel.
> > > >
> > > > This goes against what we decided to support for the *only* publicly
> > > > available VMM that cares about save/restore, which is that we only
> > > > move forward and don't rollback.
> > >
> > > Ah, I was definitely missing this context. Current behavior makes much
> > > more sense then.
> > >
> > > > Hypercalls are the least of your
> > > > worries, and there is a whole range of other architectural features
> > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > glaring example of this).
> > >
> > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> >
> > The two are linked. Exposing a new register to userspace and/or guest
> > result in the same thing: you can't rollback. That's specially true in
> > the QEMU case, which *learns* from the kernel what registers are
> > available, and doesn't maintain a fixed list.
> >
> > > My concerns are explicitly with the 'under the nose' changes, where
> > > KVM modifies the guest feature set without userspace opting in. Based
> > > on your comment, though, it would appear that other parts of KVM are
> > > affected too.
> >
> > Any new system register that is exposed by a new kernel feature breaks
> > rollback. And so far, we only consider it a bug if the set of exposed
> > registers reduces. Anything can be added safely (as checked by one of
> > the selftests added by Drew).
> >
> > < It doesn't have to be rollback safety, either. There may
> > > simply be a hypercall which an operator doesn't want to give its
> > > guests, and it needs a way to tell KVM to hide it.
> >
> > Fair enough. But this has to be done in a scalable way, which
> > individual capability cannot provide.
> >
> > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > I'm sure other hypercalls could be handled similarly.
> > > >
> > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > capabilities. For existing ones, it is too late, and negative features
> > > > are just too horrible.
> > >
> > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > functional change..
> >
> > Well, you definitely achieved your goal of attracting my attention :).
> >
> > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > bitmap of supported functions. That would be "less horrible". This
> > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > >
> > > This seems more reasonable, especially since we do this for migrating
> > > the guest's PSCI version.
> > >
> > > Alternatively, I had thought about using a VM attribute, given the
> > > fact that it is non-architectural information and we avoid ABI issues
> > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> >
> > The whole point is that these settings get exposed by
> > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > we already have this for things like the spectre management state, we
> > can just as well expose the bitmaps that deal with the KVM-specific
> > hypercalls. After all, this falls into the realm of "KVM as VM
> > firmware".
> >
> > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > similar extension.
> >
>
> Thanks for including me Marc. I think you've mentioned all the examples
> of why we don't generally expect N+1 -> N migrations to work that I
> can think of. While some of the examples like get-reg-list could
> eventually be eliminated if we had CPU models to tighten our machine type
> state, I think N+1 -> N migrations will always be best effort at most.
>
> I agree with giving userspace control over the exposer of the hypercalls
> though. Using pseudo-registers for that purpose rather than a pile of
> CAPs also seems reasonable to me.
>
> And, while I don't think this patch is going to proceed, I thought I'd
> point out that the opt-out approach doesn't help much with expanding
> our migration support unless we require the VMM to be upgraded first.
>
> And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> work as expected, since the source enforce opt-out, but the destination
> won't.

Right, there's going to need to be a fence in both kernel and VMM
versions. Before the fence, you can't rollback with either component.
Once on the other side of the fence, the user may freely migrate
between kernel + VMM combinations.

> Also, since the VMM doesn't key off the kernel version, for the
> most part N+1 VMMs won't know when they're supposed to opt-out or not,
> leaving it to the user to ensure they consider everything. opt-in
> usually only needs the user to consider what machine type they want to
> launch.

Going the register route will implicitly require opt-out for all old
hypercalls. We exposed them unconditionally to the guest before, and
we must uphold that behavior. The default value for the bitmap will
have those features set. Any hypercalls added after that register
interface will then require explicit opt-in from userspace.

With regards to the pseudoregister interface, how would a VMM discover
new bits? From my perspective, you need to have two bitmaps that the
VMM can get at: the set of supported feature bits and the active
bitmap of features for a running guest.

> Thanks,
> drew
>

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 18:14           ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-25 18:14 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, Marc Zyngier, pshier, kvmarm, linux-arm-kernel

On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
>
> On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > On Wed, 25 Aug 2021 11:02:28 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > get fussy _very_ quickly on the old kernel.
> > > >
> > > > This goes against what we decided to support for the *only* publicly
> > > > available VMM that cares about save/restore, which is that we only
> > > > move forward and don't rollback.
> > >
> > > Ah, I was definitely missing this context. Current behavior makes much
> > > more sense then.
> > >
> > > > Hypercalls are the least of your
> > > > worries, and there is a whole range of other architectural features
> > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > glaring example of this).
> > >
> > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> >
> > The two are linked. Exposing a new register to userspace and/or guest
> > result in the same thing: you can't rollback. That's specially true in
> > the QEMU case, which *learns* from the kernel what registers are
> > available, and doesn't maintain a fixed list.
> >
> > > My concerns are explicitly with the 'under the nose' changes, where
> > > KVM modifies the guest feature set without userspace opting in. Based
> > > on your comment, though, it would appear that other parts of KVM are
> > > affected too.
> >
> > Any new system register that is exposed by a new kernel feature breaks
> > rollback. And so far, we only consider it a bug if the set of exposed
> > registers reduces. Anything can be added safely (as checked by one of
> > the selftests added by Drew).
> >
> > < It doesn't have to be rollback safety, either. There may
> > > simply be a hypercall which an operator doesn't want to give its
> > > guests, and it needs a way to tell KVM to hide it.
> >
> > Fair enough. But this has to be done in a scalable way, which
> > individual capability cannot provide.
> >
> > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > I'm sure other hypercalls could be handled similarly.
> > > >
> > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > capabilities. For existing ones, it is too late, and negative features
> > > > are just too horrible.
> > >
> > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > functional change..
> >
> > Well, you definitely achieved your goal of attracting my attention :).
> >
> > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > bitmap of supported functions. That would be "less horrible". This
> > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > >
> > > This seems more reasonable, especially since we do this for migrating
> > > the guest's PSCI version.
> > >
> > > Alternatively, I had thought about using a VM attribute, given the
> > > fact that it is non-architectural information and we avoid ABI issues
> > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> >
> > The whole point is that these settings get exposed by
> > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > we already have this for things like the spectre management state, we
> > can just as well expose the bitmaps that deal with the KVM-specific
> > hypercalls. After all, this falls into the realm of "KVM as VM
> > firmware".
> >
> > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > similar extension.
> >
>
> Thanks for including me Marc. I think you've mentioned all the examples
> of why we don't generally expect N+1 -> N migrations to work that I
> can think of. While some of the examples like get-reg-list could
> eventually be eliminated if we had CPU models to tighten our machine type
> state, I think N+1 -> N migrations will always be best effort at most.
>
> I agree with giving userspace control over the exposer of the hypercalls
> though. Using pseudo-registers for that purpose rather than a pile of
> CAPs also seems reasonable to me.
>
> And, while I don't think this patch is going to proceed, I thought I'd
> point out that the opt-out approach doesn't help much with expanding
> our migration support unless we require the VMM to be upgraded first.
>
> And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> work as expected, since the source enforce opt-out, but the destination
> won't.

Right, there's going to need to be a fence in both kernel and VMM
versions. Before the fence, you can't rollback with either component.
Once on the other side of the fence, the user may freely migrate
between kernel + VMM combinations.

> Also, since the VMM doesn't key off the kernel version, for the
> most part N+1 VMMs won't know when they're supposed to opt-out or not,
> leaving it to the user to ensure they consider everything. opt-in
> usually only needs the user to consider what machine type they want to
> launch.

Going the register route will implicitly require opt-out for all old
hypercalls. We exposed them unconditionally to the guest before, and
we must uphold that behavior. The default value for the bitmap will
have those features set. Any hypercalls added after that register
interface will then require explicit opt-in from userspace.

With regards to the pseudoregister interface, how would a VMM discover
new bits? From my perspective, you need to have two bitmaps that the
VMM can get at: the set of supported feature bits and the active
bitmap of features for a running guest.

> Thanks,
> drew
>
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-25 18:14           ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-25 18:14 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
>
> On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > On Wed, 25 Aug 2021 11:02:28 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > get fussy _very_ quickly on the old kernel.
> > > >
> > > > This goes against what we decided to support for the *only* publicly
> > > > available VMM that cares about save/restore, which is that we only
> > > > move forward and don't rollback.
> > >
> > > Ah, I was definitely missing this context. Current behavior makes much
> > > more sense then.
> > >
> > > > Hypercalls are the least of your
> > > > worries, and there is a whole range of other architectural features
> > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > glaring example of this).
> > >
> > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> >
> > The two are linked. Exposing a new register to userspace and/or guest
> > result in the same thing: you can't rollback. That's specially true in
> > the QEMU case, which *learns* from the kernel what registers are
> > available, and doesn't maintain a fixed list.
> >
> > > My concerns are explicitly with the 'under the nose' changes, where
> > > KVM modifies the guest feature set without userspace opting in. Based
> > > on your comment, though, it would appear that other parts of KVM are
> > > affected too.
> >
> > Any new system register that is exposed by a new kernel feature breaks
> > rollback. And so far, we only consider it a bug if the set of exposed
> > registers reduces. Anything can be added safely (as checked by one of
> > the selftests added by Drew).
> >
> > < It doesn't have to be rollback safety, either. There may
> > > simply be a hypercall which an operator doesn't want to give its
> > > guests, and it needs a way to tell KVM to hide it.
> >
> > Fair enough. But this has to be done in a scalable way, which
> > individual capability cannot provide.
> >
> > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > I'm sure other hypercalls could be handled similarly.
> > > >
> > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > capabilities. For existing ones, it is too late, and negative features
> > > > are just too horrible.
> > >
> > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > functional change..
> >
> > Well, you definitely achieved your goal of attracting my attention :).
> >
> > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > bitmap of supported functions. That would be "less horrible". This
> > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > >
> > > This seems more reasonable, especially since we do this for migrating
> > > the guest's PSCI version.
> > >
> > > Alternatively, I had thought about using a VM attribute, given the
> > > fact that it is non-architectural information and we avoid ABI issues
> > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> >
> > The whole point is that these settings get exposed by
> > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > we already have this for things like the spectre management state, we
> > can just as well expose the bitmaps that deal with the KVM-specific
> > hypercalls. After all, this falls into the realm of "KVM as VM
> > firmware".
> >
> > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > similar extension.
> >
>
> Thanks for including me Marc. I think you've mentioned all the examples
> of why we don't generally expect N+1 -> N migrations to work that I
> can think of. While some of the examples like get-reg-list could
> eventually be eliminated if we had CPU models to tighten our machine type
> state, I think N+1 -> N migrations will always be best effort at most.
>
> I agree with giving userspace control over the exposer of the hypercalls
> though. Using pseudo-registers for that purpose rather than a pile of
> CAPs also seems reasonable to me.
>
> And, while I don't think this patch is going to proceed, I thought I'd
> point out that the opt-out approach doesn't help much with expanding
> our migration support unless we require the VMM to be upgraded first.
>
> And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> work as expected, since the source enforce opt-out, but the destination
> won't.

Right, there's going to need to be a fence in both kernel and VMM
versions. Before the fence, you can't rollback with either component.
Once on the other side of the fence, the user may freely migrate
between kernel + VMM combinations.

> Also, since the VMM doesn't key off the kernel version, for the
> most part N+1 VMMs won't know when they're supposed to opt-out or not,
> leaving it to the user to ensure they consider everything. opt-in
> usually only needs the user to consider what machine type they want to
> launch.

Going the register route will implicitly require opt-out for all old
hypercalls. We exposed them unconditionally to the guest before, and
we must uphold that behavior. The default value for the bitmap will
have those features set. Any hypercalls added after that register
interface will then require explicit opt-in from userspace.

With regards to the pseudoregister interface, how would a VMM discover
new bits? From my perspective, you need to have two bitmaps that the
VMM can get at: the set of supported feature bits and the active
bitmap of features for a running guest.

> Thanks,
> drew
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-25 18:14           ` Oliver Upton
  (?)
@ 2021-08-26  8:37             ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-26  8:37 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, 25 Aug 2021 19:14:59 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:

[...]

> > Thanks for including me Marc. I think you've mentioned all the examples
> > of why we don't generally expect N+1 -> N migrations to work that I
> > can think of. While some of the examples like get-reg-list could
> > eventually be eliminated if we had CPU models to tighten our machine type
> > state, I think N+1 -> N migrations will always be best effort at most.
> >
> > I agree with giving userspace control over the exposer of the hypercalls
> > though. Using pseudo-registers for that purpose rather than a pile of
> > CAPs also seems reasonable to me.
> >
> > And, while I don't think this patch is going to proceed, I thought I'd
> > point out that the opt-out approach doesn't help much with expanding
> > our migration support unless we require the VMM to be upgraded first.
> >
> > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > work as expected, since the source enforce opt-out, but the destination
> > won't.
> 
> Right, there's going to need to be a fence in both kernel and VMM
> versions. Before the fence, you can't rollback with either component.
> Once on the other side of the fence, the user may freely migrate
> between kernel + VMM combinations.
>
> > Also, since the VMM doesn't key off the kernel version, for the
> > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > leaving it to the user to ensure they consider everything. opt-in
> > usually only needs the user to consider what machine type they want to
> > launch.
> 
> Going the register route will implicitly require opt-out for all old
> hypercalls. We exposed them unconditionally to the guest before, and
> we must uphold that behavior. The default value for the bitmap will
> have those features set. Any hypercalls added after that register
> interface will then require explicit opt-in from userspace.

I disagree here. This makes the ABI inconsistent, and means that no
feature can be implemented without changing userspace. If you can deal
with the existing features, you should be able to deal with the next
lot.

> With regards to the pseudoregister interface, how would a VMM discover
> new bits? From my perspective, you need to have two bitmaps that the
> VMM can get at: the set of supported feature bits and the active
> bitmap of features for a running guest.

My proposal is that we have a single pseudo-register exposing the list
of implemented by the kernel. Clear the bits you don't want, and write
back the result. As long as you haven't written anything, you have the
full feature set. That's pretty similar to the virtio feature
negotiation.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  8:37             ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-26  8:37 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Wed, 25 Aug 2021 19:14:59 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:

[...]

> > Thanks for including me Marc. I think you've mentioned all the examples
> > of why we don't generally expect N+1 -> N migrations to work that I
> > can think of. While some of the examples like get-reg-list could
> > eventually be eliminated if we had CPU models to tighten our machine type
> > state, I think N+1 -> N migrations will always be best effort at most.
> >
> > I agree with giving userspace control over the exposer of the hypercalls
> > though. Using pseudo-registers for that purpose rather than a pile of
> > CAPs also seems reasonable to me.
> >
> > And, while I don't think this patch is going to proceed, I thought I'd
> > point out that the opt-out approach doesn't help much with expanding
> > our migration support unless we require the VMM to be upgraded first.
> >
> > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > work as expected, since the source enforce opt-out, but the destination
> > won't.
> 
> Right, there's going to need to be a fence in both kernel and VMM
> versions. Before the fence, you can't rollback with either component.
> Once on the other side of the fence, the user may freely migrate
> between kernel + VMM combinations.
>
> > Also, since the VMM doesn't key off the kernel version, for the
> > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > leaving it to the user to ensure they consider everything. opt-in
> > usually only needs the user to consider what machine type they want to
> > launch.
> 
> Going the register route will implicitly require opt-out for all old
> hypercalls. We exposed them unconditionally to the guest before, and
> we must uphold that behavior. The default value for the bitmap will
> have those features set. Any hypercalls added after that register
> interface will then require explicit opt-in from userspace.

I disagree here. This makes the ABI inconsistent, and means that no
feature can be implemented without changing userspace. If you can deal
with the existing features, you should be able to deal with the next
lot.

> With regards to the pseudoregister interface, how would a VMM discover
> new bits? From my perspective, you need to have two bitmaps that the
> VMM can get at: the set of supported feature bits and the active
> bitmap of features for a running guest.

My proposal is that we have a single pseudo-register exposing the list
of implemented by the kernel. Clear the bits you don't want, and write
back the result. As long as you haven't written anything, you have the
full feature set. That's pretty similar to the virtio feature
negotiation.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  8:37             ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-26  8:37 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, 25 Aug 2021 19:14:59 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:

[...]

> > Thanks for including me Marc. I think you've mentioned all the examples
> > of why we don't generally expect N+1 -> N migrations to work that I
> > can think of. While some of the examples like get-reg-list could
> > eventually be eliminated if we had CPU models to tighten our machine type
> > state, I think N+1 -> N migrations will always be best effort at most.
> >
> > I agree with giving userspace control over the exposer of the hypercalls
> > though. Using pseudo-registers for that purpose rather than a pile of
> > CAPs also seems reasonable to me.
> >
> > And, while I don't think this patch is going to proceed, I thought I'd
> > point out that the opt-out approach doesn't help much with expanding
> > our migration support unless we require the VMM to be upgraded first.
> >
> > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > work as expected, since the source enforce opt-out, but the destination
> > won't.
> 
> Right, there's going to need to be a fence in both kernel and VMM
> versions. Before the fence, you can't rollback with either component.
> Once on the other side of the fence, the user may freely migrate
> between kernel + VMM combinations.
>
> > Also, since the VMM doesn't key off the kernel version, for the
> > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > leaving it to the user to ensure they consider everything. opt-in
> > usually only needs the user to consider what machine type they want to
> > launch.
> 
> Going the register route will implicitly require opt-out for all old
> hypercalls. We exposed them unconditionally to the guest before, and
> we must uphold that behavior. The default value for the bitmap will
> have those features set. Any hypercalls added after that register
> interface will then require explicit opt-in from userspace.

I disagree here. This makes the ABI inconsistent, and means that no
feature can be implemented without changing userspace. If you can deal
with the existing features, you should be able to deal with the next
lot.

> With regards to the pseudoregister interface, how would a VMM discover
> new bits? From my perspective, you need to have two bitmaps that the
> VMM can get at: the set of supported feature bits and the active
> bitmap of features for a running guest.

My proposal is that we have a single pseudo-register exposing the list
of implemented by the kernel. Clear the bits you don't want, and write
back the result. As long as you haven't written anything, you have the
full feature set. That's pretty similar to the virtio feature
negotiation.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-25 18:14           ` Oliver Upton
  (?)
@ 2021-08-26  8:49             ` Andrew Jones
  -1 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-26  8:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, Aug 25, 2021 at 11:14:59AM -0700, Oliver Upton wrote:
> On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > > On Wed, 25 Aug 2021 11:02:28 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > > get fussy _very_ quickly on the old kernel.
> > > > >
> > > > > This goes against what we decided to support for the *only* publicly
> > > > > available VMM that cares about save/restore, which is that we only
> > > > > move forward and don't rollback.
> > > >
> > > > Ah, I was definitely missing this context. Current behavior makes much
> > > > more sense then.
> > > >
> > > > > Hypercalls are the least of your
> > > > > worries, and there is a whole range of other architectural features
> > > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > > glaring example of this).
> > > >
> > > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> > >
> > > The two are linked. Exposing a new register to userspace and/or guest
> > > result in the same thing: you can't rollback. That's specially true in
> > > the QEMU case, which *learns* from the kernel what registers are
> > > available, and doesn't maintain a fixed list.
> > >
> > > > My concerns are explicitly with the 'under the nose' changes, where
> > > > KVM modifies the guest feature set without userspace opting in. Based
> > > > on your comment, though, it would appear that other parts of KVM are
> > > > affected too.
> > >
> > > Any new system register that is exposed by a new kernel feature breaks
> > > rollback. And so far, we only consider it a bug if the set of exposed
> > > registers reduces. Anything can be added safely (as checked by one of
> > > the selftests added by Drew).
> > >
> > > < It doesn't have to be rollback safety, either. There may
> > > > simply be a hypercall which an operator doesn't want to give its
> > > > guests, and it needs a way to tell KVM to hide it.
> > >
> > > Fair enough. But this has to be done in a scalable way, which
> > > individual capability cannot provide.
> > >
> > > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > > I'm sure other hypercalls could be handled similarly.
> > > > >
> > > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > > capabilities. For existing ones, it is too late, and negative features
> > > > > are just too horrible.
> > > >
> > > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > > functional change..
> > >
> > > Well, you definitely achieved your goal of attracting my attention :).
> > >
> > > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > > bitmap of supported functions. That would be "less horrible". This
> > > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > > >
> > > > This seems more reasonable, especially since we do this for migrating
> > > > the guest's PSCI version.
> > > >
> > > > Alternatively, I had thought about using a VM attribute, given the
> > > > fact that it is non-architectural information and we avoid ABI issues
> > > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> > >
> > > The whole point is that these settings get exposed by
> > > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > > we already have this for things like the spectre management state, we
> > > can just as well expose the bitmaps that deal with the KVM-specific
> > > hypercalls. After all, this falls into the realm of "KVM as VM
> > > firmware".
> > >
> > > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > > similar extension.
> > >
> >
> > Thanks for including me Marc. I think you've mentioned all the examples
> > of why we don't generally expect N+1 -> N migrations to work that I
> > can think of. While some of the examples like get-reg-list could
> > eventually be eliminated if we had CPU models to tighten our machine type
> > state, I think N+1 -> N migrations will always be best effort at most.
> >
> > I agree with giving userspace control over the exposer of the hypercalls
> > though. Using pseudo-registers for that purpose rather than a pile of
> > CAPs also seems reasonable to me.
> >
> > And, while I don't think this patch is going to proceed, I thought I'd
> > point out that the opt-out approach doesn't help much with expanding
> > our migration support unless we require the VMM to be upgraded first.
> >
> > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > work as expected, since the source enforce opt-out, but the destination
> > won't.
> 
> Right, there's going to need to be a fence in both kernel and VMM
> versions. Before the fence, you can't rollback with either component.
> Once on the other side of the fence, the user may freely migrate
> between kernel + VMM combinations.
> 
> > Also, since the VMM doesn't key off the kernel version, for the
> > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > leaving it to the user to ensure they consider everything. opt-in
> > usually only needs the user to consider what machine type they want to
> > launch.
> 
> Going the register route will implicitly require opt-out for all old
> hypercalls. We exposed them unconditionally to the guest before, and
> we must uphold that behavior. The default value for the bitmap will
> have those features set. Any hypercalls added after that register
> interface will then require explicit opt-in from userspace.
> 
> With regards to the pseudoregister interface, how would a VMM discover
> new bits? From my perspective, you need to have two bitmaps that the
> VMM can get at: the set of supported feature bits and the active
> bitmap of features for a running guest.
>

I think we should model the pseudo-register approach off of x86's
CPUID approach. x86 has specific get/set ioctls for CPUIDs
(KVM_GET/SET_CPUID2), but I think we should get by with just
get/set-one-reg. However, it might be nice/necessary to have something
like x86's KVM_GET_SUPPORTED_CPUID which returns all the registers
at once as a bitmap and the set bits would inform userspace of what's
supported by the hardware and KVM.

So a new ioctl similar to KVM_GET_SUPPORTED_CPUID would be your
first bitmap that shows what's supported and then userpace can
determine what it wants to change and calculate the appropriate
pseudo-registers to set/clear bits on with set-one-reg, as x86
would do with kvm-set-cpuid2.

Thanks,
drew


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  8:49             ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-26  8:49 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, Marc Zyngier, pshier, kvmarm, linux-arm-kernel

On Wed, Aug 25, 2021 at 11:14:59AM -0700, Oliver Upton wrote:
> On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > > On Wed, 25 Aug 2021 11:02:28 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > > get fussy _very_ quickly on the old kernel.
> > > > >
> > > > > This goes against what we decided to support for the *only* publicly
> > > > > available VMM that cares about save/restore, which is that we only
> > > > > move forward and don't rollback.
> > > >
> > > > Ah, I was definitely missing this context. Current behavior makes much
> > > > more sense then.
> > > >
> > > > > Hypercalls are the least of your
> > > > > worries, and there is a whole range of other architectural features
> > > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > > glaring example of this).
> > > >
> > > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> > >
> > > The two are linked. Exposing a new register to userspace and/or guest
> > > result in the same thing: you can't rollback. That's specially true in
> > > the QEMU case, which *learns* from the kernel what registers are
> > > available, and doesn't maintain a fixed list.
> > >
> > > > My concerns are explicitly with the 'under the nose' changes, where
> > > > KVM modifies the guest feature set without userspace opting in. Based
> > > > on your comment, though, it would appear that other parts of KVM are
> > > > affected too.
> > >
> > > Any new system register that is exposed by a new kernel feature breaks
> > > rollback. And so far, we only consider it a bug if the set of exposed
> > > registers reduces. Anything can be added safely (as checked by one of
> > > the selftests added by Drew).
> > >
> > > < It doesn't have to be rollback safety, either. There may
> > > > simply be a hypercall which an operator doesn't want to give its
> > > > guests, and it needs a way to tell KVM to hide it.
> > >
> > > Fair enough. But this has to be done in a scalable way, which
> > > individual capability cannot provide.
> > >
> > > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > > I'm sure other hypercalls could be handled similarly.
> > > > >
> > > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > > capabilities. For existing ones, it is too late, and negative features
> > > > > are just too horrible.
> > > >
> > > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > > functional change..
> > >
> > > Well, you definitely achieved your goal of attracting my attention :).
> > >
> > > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > > bitmap of supported functions. That would be "less horrible". This
> > > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > > >
> > > > This seems more reasonable, especially since we do this for migrating
> > > > the guest's PSCI version.
> > > >
> > > > Alternatively, I had thought about using a VM attribute, given the
> > > > fact that it is non-architectural information and we avoid ABI issues
> > > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> > >
> > > The whole point is that these settings get exposed by
> > > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > > we already have this for things like the spectre management state, we
> > > can just as well expose the bitmaps that deal with the KVM-specific
> > > hypercalls. After all, this falls into the realm of "KVM as VM
> > > firmware".
> > >
> > > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > > similar extension.
> > >
> >
> > Thanks for including me Marc. I think you've mentioned all the examples
> > of why we don't generally expect N+1 -> N migrations to work that I
> > can think of. While some of the examples like get-reg-list could
> > eventually be eliminated if we had CPU models to tighten our machine type
> > state, I think N+1 -> N migrations will always be best effort at most.
> >
> > I agree with giving userspace control over the exposer of the hypercalls
> > though. Using pseudo-registers for that purpose rather than a pile of
> > CAPs also seems reasonable to me.
> >
> > And, while I don't think this patch is going to proceed, I thought I'd
> > point out that the opt-out approach doesn't help much with expanding
> > our migration support unless we require the VMM to be upgraded first.
> >
> > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > work as expected, since the source enforce opt-out, but the destination
> > won't.
> 
> Right, there's going to need to be a fence in both kernel and VMM
> versions. Before the fence, you can't rollback with either component.
> Once on the other side of the fence, the user may freely migrate
> between kernel + VMM combinations.
> 
> > Also, since the VMM doesn't key off the kernel version, for the
> > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > leaving it to the user to ensure they consider everything. opt-in
> > usually only needs the user to consider what machine type they want to
> > launch.
> 
> Going the register route will implicitly require opt-out for all old
> hypercalls. We exposed them unconditionally to the guest before, and
> we must uphold that behavior. The default value for the bitmap will
> have those features set. Any hypercalls added after that register
> interface will then require explicit opt-in from userspace.
> 
> With regards to the pseudoregister interface, how would a VMM discover
> new bits? From my perspective, you need to have two bitmaps that the
> VMM can get at: the set of supported feature bits and the active
> bitmap of features for a running guest.
>

I think we should model the pseudo-register approach off of x86's
CPUID approach. x86 has specific get/set ioctls for CPUIDs
(KVM_GET/SET_CPUID2), but I think we should get by with just
get/set-one-reg. However, it might be nice/necessary to have something
like x86's KVM_GET_SUPPORTED_CPUID which returns all the registers
at once as a bitmap and the set bits would inform userspace of what's
supported by the hardware and KVM.

So a new ioctl similar to KVM_GET_SUPPORTED_CPUID would be your
first bitmap that shows what's supported and then userpace can
determine what it wants to change and calculate the appropriate
pseudo-registers to set/clear bits on with set-one-reg, as x86
would do with kvm-set-cpuid2.

Thanks,
drew

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  8:49             ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-26  8:49 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Wed, Aug 25, 2021 at 11:14:59AM -0700, Oliver Upton wrote:
> On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > > On Wed, 25 Aug 2021 11:02:28 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > > get fussy _very_ quickly on the old kernel.
> > > > >
> > > > > This goes against what we decided to support for the *only* publicly
> > > > > available VMM that cares about save/restore, which is that we only
> > > > > move forward and don't rollback.
> > > >
> > > > Ah, I was definitely missing this context. Current behavior makes much
> > > > more sense then.
> > > >
> > > > > Hypercalls are the least of your
> > > > > worries, and there is a whole range of other architectural features
> > > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > > glaring example of this).
> > > >
> > > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> > >
> > > The two are linked. Exposing a new register to userspace and/or guest
> > > result in the same thing: you can't rollback. That's specially true in
> > > the QEMU case, which *learns* from the kernel what registers are
> > > available, and doesn't maintain a fixed list.
> > >
> > > > My concerns are explicitly with the 'under the nose' changes, where
> > > > KVM modifies the guest feature set without userspace opting in. Based
> > > > on your comment, though, it would appear that other parts of KVM are
> > > > affected too.
> > >
> > > Any new system register that is exposed by a new kernel feature breaks
> > > rollback. And so far, we only consider it a bug if the set of exposed
> > > registers reduces. Anything can be added safely (as checked by one of
> > > the selftests added by Drew).
> > >
> > > < It doesn't have to be rollback safety, either. There may
> > > > simply be a hypercall which an operator doesn't want to give its
> > > > guests, and it needs a way to tell KVM to hide it.
> > >
> > > Fair enough. But this has to be done in a scalable way, which
> > > individual capability cannot provide.
> > >
> > > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > > I'm sure other hypercalls could be handled similarly.
> > > > >
> > > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > > capabilities. For existing ones, it is too late, and negative features
> > > > > are just too horrible.
> > > >
> > > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > > functional change..
> > >
> > > Well, you definitely achieved your goal of attracting my attention :).
> > >
> > > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > > bitmap of supported functions. That would be "less horrible". This
> > > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > > >
> > > > This seems more reasonable, especially since we do this for migrating
> > > > the guest's PSCI version.
> > > >
> > > > Alternatively, I had thought about using a VM attribute, given the
> > > > fact that it is non-architectural information and we avoid ABI issues
> > > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> > >
> > > The whole point is that these settings get exposed by
> > > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > > we already have this for things like the spectre management state, we
> > > can just as well expose the bitmaps that deal with the KVM-specific
> > > hypercalls. After all, this falls into the realm of "KVM as VM
> > > firmware".
> > >
> > > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > > similar extension.
> > >
> >
> > Thanks for including me Marc. I think you've mentioned all the examples
> > of why we don't generally expect N+1 -> N migrations to work that I
> > can think of. While some of the examples like get-reg-list could
> > eventually be eliminated if we had CPU models to tighten our machine type
> > state, I think N+1 -> N migrations will always be best effort at most.
> >
> > I agree with giving userspace control over the exposer of the hypercalls
> > though. Using pseudo-registers for that purpose rather than a pile of
> > CAPs also seems reasonable to me.
> >
> > And, while I don't think this patch is going to proceed, I thought I'd
> > point out that the opt-out approach doesn't help much with expanding
> > our migration support unless we require the VMM to be upgraded first.
> >
> > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > work as expected, since the source enforce opt-out, but the destination
> > won't.
> 
> Right, there's going to need to be a fence in both kernel and VMM
> versions. Before the fence, you can't rollback with either component.
> Once on the other side of the fence, the user may freely migrate
> between kernel + VMM combinations.
> 
> > Also, since the VMM doesn't key off the kernel version, for the
> > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > leaving it to the user to ensure they consider everything. opt-in
> > usually only needs the user to consider what machine type they want to
> > launch.
> 
> Going the register route will implicitly require opt-out for all old
> hypercalls. We exposed them unconditionally to the guest before, and
> we must uphold that behavior. The default value for the bitmap will
> have those features set. Any hypercalls added after that register
> interface will then require explicit opt-in from userspace.
> 
> With regards to the pseudoregister interface, how would a VMM discover
> new bits? From my perspective, you need to have two bitmaps that the
> VMM can get at: the set of supported feature bits and the active
> bitmap of features for a running guest.
>

I think we should model the pseudo-register approach off of x86's
CPUID approach. x86 has specific get/set ioctls for CPUIDs
(KVM_GET/SET_CPUID2), but I think we should get by with just
get/set-one-reg. However, it might be nice/necessary to have something
like x86's KVM_GET_SUPPORTED_CPUID which returns all the registers
at once as a bitmap and the set bits would inform userspace of what's
supported by the hardware and KVM.

So a new ioctl similar to KVM_GET_SUPPORTED_CPUID would be your
first bitmap that shows what's supported and then userpace can
determine what it wants to change and calculate the appropriate
pseudo-registers to set/clear bits on with set-one-reg, as x86
would do with kvm-set-cpuid2.

Thanks,
drew


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-26  8:49             ` Andrew Jones
  (?)
@ 2021-08-26  8:54               ` Andrew Jones
  -1 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-26  8:54 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, Aug 26, 2021 at 10:49:45AM +0200, Andrew Jones wrote:
> On Wed, Aug 25, 2021 at 11:14:59AM -0700, Oliver Upton wrote:
> > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > > > On Wed, 25 Aug 2021 11:02:28 +0100,
> > > > Oliver Upton <oupton@google.com> wrote:
> > > > >
> > > > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > > > get fussy _very_ quickly on the old kernel.
> > > > > >
> > > > > > This goes against what we decided to support for the *only* publicly
> > > > > > available VMM that cares about save/restore, which is that we only
> > > > > > move forward and don't rollback.
> > > > >
> > > > > Ah, I was definitely missing this context. Current behavior makes much
> > > > > more sense then.
> > > > >
> > > > > > Hypercalls are the least of your
> > > > > > worries, and there is a whole range of other architectural features
> > > > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > > > glaring example of this).
> > > > >
> > > > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> > > >
> > > > The two are linked. Exposing a new register to userspace and/or guest
> > > > result in the same thing: you can't rollback. That's specially true in
> > > > the QEMU case, which *learns* from the kernel what registers are
> > > > available, and doesn't maintain a fixed list.
> > > >
> > > > > My concerns are explicitly with the 'under the nose' changes, where
> > > > > KVM modifies the guest feature set without userspace opting in. Based
> > > > > on your comment, though, it would appear that other parts of KVM are
> > > > > affected too.
> > > >
> > > > Any new system register that is exposed by a new kernel feature breaks
> > > > rollback. And so far, we only consider it a bug if the set of exposed
> > > > registers reduces. Anything can be added safely (as checked by one of
> > > > the selftests added by Drew).
> > > >
> > > > < It doesn't have to be rollback safety, either. There may
> > > > > simply be a hypercall which an operator doesn't want to give its
> > > > > guests, and it needs a way to tell KVM to hide it.
> > > >
> > > > Fair enough. But this has to be done in a scalable way, which
> > > > individual capability cannot provide.
> > > >
> > > > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > > > I'm sure other hypercalls could be handled similarly.
> > > > > >
> > > > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > > > capabilities. For existing ones, it is too late, and negative features
> > > > > > are just too horrible.
> > > > >
> > > > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > > > functional change..
> > > >
> > > > Well, you definitely achieved your goal of attracting my attention :).
> > > >
> > > > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > > > bitmap of supported functions. That would be "less horrible". This
> > > > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > > > >
> > > > > This seems more reasonable, especially since we do this for migrating
> > > > > the guest's PSCI version.
> > > > >
> > > > > Alternatively, I had thought about using a VM attribute, given the
> > > > > fact that it is non-architectural information and we avoid ABI issues
> > > > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> > > >
> > > > The whole point is that these settings get exposed by
> > > > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > > > we already have this for things like the spectre management state, we
> > > > can just as well expose the bitmaps that deal with the KVM-specific
> > > > hypercalls. After all, this falls into the realm of "KVM as VM
> > > > firmware".
> > > >
> > > > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > > > similar extension.
> > > >
> > >
> > > Thanks for including me Marc. I think you've mentioned all the examples
> > > of why we don't generally expect N+1 -> N migrations to work that I
> > > can think of. While some of the examples like get-reg-list could
> > > eventually be eliminated if we had CPU models to tighten our machine type
> > > state, I think N+1 -> N migrations will always be best effort at most.
> > >
> > > I agree with giving userspace control over the exposer of the hypercalls
> > > though. Using pseudo-registers for that purpose rather than a pile of
> > > CAPs also seems reasonable to me.
> > >
> > > And, while I don't think this patch is going to proceed, I thought I'd
> > > point out that the opt-out approach doesn't help much with expanding
> > > our migration support unless we require the VMM to be upgraded first.
> > >
> > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > work as expected, since the source enforce opt-out, but the destination
> > > won't.
> > 
> > Right, there's going to need to be a fence in both kernel and VMM
> > versions. Before the fence, you can't rollback with either component.
> > Once on the other side of the fence, the user may freely migrate
> > between kernel + VMM combinations.
> > 
> > > Also, since the VMM doesn't key off the kernel version, for the
> > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > leaving it to the user to ensure they consider everything. opt-in
> > > usually only needs the user to consider what machine type they want to
> > > launch.
> > 
> > Going the register route will implicitly require opt-out for all old
> > hypercalls. We exposed them unconditionally to the guest before, and
> > we must uphold that behavior. The default value for the bitmap will
> > have those features set. Any hypercalls added after that register
> > interface will then require explicit opt-in from userspace.
> > 
> > With regards to the pseudoregister interface, how would a VMM discover
> > new bits? From my perspective, you need to have two bitmaps that the
> > VMM can get at: the set of supported feature bits and the active
> > bitmap of features for a running guest.
> >
> 
> I think we should model the pseudo-register approach off of x86's
> CPUID approach. x86 has specific get/set ioctls for CPUIDs
> (KVM_GET/SET_CPUID2), but I think we should get by with just
> get/set-one-reg. However, it might be nice/necessary to have something
> like x86's KVM_GET_SUPPORTED_CPUID which returns all the registers
> at once as a bitmap and the set bits would inform userspace of what's
> supported by the hardware and KVM.
> 
> So a new ioctl similar to KVM_GET_SUPPORTED_CPUID would be your
> first bitmap that shows what's supported and then userpace can
> determine what it wants to change and calculate the appropriate
> pseudo-registers to set/clear bits on with set-one-reg, as x86
> would do with kvm-set-cpuid2.
>

I see Marc just replied stating we'll probably just have a single
register. The KVM_GET_SUPPORTED_CPUID type of ioctl would be
overkill in that case (get-one_reg is enough). Anyway it can
always be added later if we expand into more registers.

Thanks,
drew


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  8:54               ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-26  8:54 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, Marc Zyngier, pshier, kvmarm, linux-arm-kernel

On Thu, Aug 26, 2021 at 10:49:45AM +0200, Andrew Jones wrote:
> On Wed, Aug 25, 2021 at 11:14:59AM -0700, Oliver Upton wrote:
> > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > > > On Wed, 25 Aug 2021 11:02:28 +0100,
> > > > Oliver Upton <oupton@google.com> wrote:
> > > > >
> > > > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > > > get fussy _very_ quickly on the old kernel.
> > > > > >
> > > > > > This goes against what we decided to support for the *only* publicly
> > > > > > available VMM that cares about save/restore, which is that we only
> > > > > > move forward and don't rollback.
> > > > >
> > > > > Ah, I was definitely missing this context. Current behavior makes much
> > > > > more sense then.
> > > > >
> > > > > > Hypercalls are the least of your
> > > > > > worries, and there is a whole range of other architectural features
> > > > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > > > glaring example of this).
> > > > >
> > > > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> > > >
> > > > The two are linked. Exposing a new register to userspace and/or guest
> > > > result in the same thing: you can't rollback. That's specially true in
> > > > the QEMU case, which *learns* from the kernel what registers are
> > > > available, and doesn't maintain a fixed list.
> > > >
> > > > > My concerns are explicitly with the 'under the nose' changes, where
> > > > > KVM modifies the guest feature set without userspace opting in. Based
> > > > > on your comment, though, it would appear that other parts of KVM are
> > > > > affected too.
> > > >
> > > > Any new system register that is exposed by a new kernel feature breaks
> > > > rollback. And so far, we only consider it a bug if the set of exposed
> > > > registers reduces. Anything can be added safely (as checked by one of
> > > > the selftests added by Drew).
> > > >
> > > > < It doesn't have to be rollback safety, either. There may
> > > > > simply be a hypercall which an operator doesn't want to give its
> > > > > guests, and it needs a way to tell KVM to hide it.
> > > >
> > > > Fair enough. But this has to be done in a scalable way, which
> > > > individual capability cannot provide.
> > > >
> > > > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > > > I'm sure other hypercalls could be handled similarly.
> > > > > >
> > > > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > > > capabilities. For existing ones, it is too late, and negative features
> > > > > > are just too horrible.
> > > > >
> > > > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > > > functional change..
> > > >
> > > > Well, you definitely achieved your goal of attracting my attention :).
> > > >
> > > > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > > > bitmap of supported functions. That would be "less horrible". This
> > > > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > > > >
> > > > > This seems more reasonable, especially since we do this for migrating
> > > > > the guest's PSCI version.
> > > > >
> > > > > Alternatively, I had thought about using a VM attribute, given the
> > > > > fact that it is non-architectural information and we avoid ABI issues
> > > > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> > > >
> > > > The whole point is that these settings get exposed by
> > > > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > > > we already have this for things like the spectre management state, we
> > > > can just as well expose the bitmaps that deal with the KVM-specific
> > > > hypercalls. After all, this falls into the realm of "KVM as VM
> > > > firmware".
> > > >
> > > > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > > > similar extension.
> > > >
> > >
> > > Thanks for including me Marc. I think you've mentioned all the examples
> > > of why we don't generally expect N+1 -> N migrations to work that I
> > > can think of. While some of the examples like get-reg-list could
> > > eventually be eliminated if we had CPU models to tighten our machine type
> > > state, I think N+1 -> N migrations will always be best effort at most.
> > >
> > > I agree with giving userspace control over the exposer of the hypercalls
> > > though. Using pseudo-registers for that purpose rather than a pile of
> > > CAPs also seems reasonable to me.
> > >
> > > And, while I don't think this patch is going to proceed, I thought I'd
> > > point out that the opt-out approach doesn't help much with expanding
> > > our migration support unless we require the VMM to be upgraded first.
> > >
> > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > work as expected, since the source enforce opt-out, but the destination
> > > won't.
> > 
> > Right, there's going to need to be a fence in both kernel and VMM
> > versions. Before the fence, you can't rollback with either component.
> > Once on the other side of the fence, the user may freely migrate
> > between kernel + VMM combinations.
> > 
> > > Also, since the VMM doesn't key off the kernel version, for the
> > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > leaving it to the user to ensure they consider everything. opt-in
> > > usually only needs the user to consider what machine type they want to
> > > launch.
> > 
> > Going the register route will implicitly require opt-out for all old
> > hypercalls. We exposed them unconditionally to the guest before, and
> > we must uphold that behavior. The default value for the bitmap will
> > have those features set. Any hypercalls added after that register
> > interface will then require explicit opt-in from userspace.
> > 
> > With regards to the pseudoregister interface, how would a VMM discover
> > new bits? From my perspective, you need to have two bitmaps that the
> > VMM can get at: the set of supported feature bits and the active
> > bitmap of features for a running guest.
> >
> 
> I think we should model the pseudo-register approach off of x86's
> CPUID approach. x86 has specific get/set ioctls for CPUIDs
> (KVM_GET/SET_CPUID2), but I think we should get by with just
> get/set-one-reg. However, it might be nice/necessary to have something
> like x86's KVM_GET_SUPPORTED_CPUID which returns all the registers
> at once as a bitmap and the set bits would inform userspace of what's
> supported by the hardware and KVM.
> 
> So a new ioctl similar to KVM_GET_SUPPORTED_CPUID would be your
> first bitmap that shows what's supported and then userpace can
> determine what it wants to change and calculate the appropriate
> pseudo-registers to set/clear bits on with set-one-reg, as x86
> would do with kvm-set-cpuid2.
>

I see Marc just replied stating we'll probably just have a single
register. The KVM_GET_SUPPORTED_CPUID type of ioctl would be
overkill in that case (get-one_reg is enough). Anyway it can
always be added later if we expand into more registers.

Thanks,
drew

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  8:54               ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-26  8:54 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, Aug 26, 2021 at 10:49:45AM +0200, Andrew Jones wrote:
> On Wed, Aug 25, 2021 at 11:14:59AM -0700, Oliver Upton wrote:
> > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 11:39:34AM +0100, Marc Zyngier wrote:
> > > > On Wed, 25 Aug 2021 11:02:28 +0100,
> > > > Oliver Upton <oupton@google.com> wrote:
> > > > >
> > > > > On Wed, Aug 25, 2021 at 2:27 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > > > Exposing new hypercalls to guests in this manner seems very unsafe to
> > > > > > > me. Suppose an operator is trying to upgrade from kernel N to kernel
> > > > > > > N+1, which brings in the new 'widget' hypercall. Guests are live
> > > > > > > migrated onto the N+1 kernel, but the operator finds a defect that
> > > > > > > warrants a kernel rollback. VMs are then migrated from kernel N+1 -> N.
> > > > > > > Any guests that discovered the 'widget' hypercall are likely going to
> > > > > > > get fussy _very_ quickly on the old kernel.
> > > > > >
> > > > > > This goes against what we decided to support for the *only* publicly
> > > > > > available VMM that cares about save/restore, which is that we only
> > > > > > move forward and don't rollback.
> > > > >
> > > > > Ah, I was definitely missing this context. Current behavior makes much
> > > > > more sense then.
> > > > >
> > > > > > Hypercalls are the least of your
> > > > > > worries, and there is a whole range of other architectural features
> > > > > > that will have also appeared/disappeared (your own CNTPOFF series is a
> > > > > > glaring example of this).
> > > > >
> > > > > Isn't that a tad bit different though? I'll admit, I'm just as guilty
> > > > > with my own series forgetting to add a KVM_CAP (oops), but it is in my
> > > > > queue to kick out with the fix for nVHE/ptimer. Nonetheless, if a user
> > > > > takes up a new KVM UAPI, it is up to the user to run on a new kernel.
> > > >
> > > > The two are linked. Exposing a new register to userspace and/or guest
> > > > result in the same thing: you can't rollback. That's specially true in
> > > > the QEMU case, which *learns* from the kernel what registers are
> > > > available, and doesn't maintain a fixed list.
> > > >
> > > > > My concerns are explicitly with the 'under the nose' changes, where
> > > > > KVM modifies the guest feature set without userspace opting in. Based
> > > > > on your comment, though, it would appear that other parts of KVM are
> > > > > affected too.
> > > >
> > > > Any new system register that is exposed by a new kernel feature breaks
> > > > rollback. And so far, we only consider it a bug if the set of exposed
> > > > registers reduces. Anything can be added safely (as checked by one of
> > > > the selftests added by Drew).
> > > >
> > > > < It doesn't have to be rollback safety, either. There may
> > > > > simply be a hypercall which an operator doesn't want to give its
> > > > > guests, and it needs a way to tell KVM to hide it.
> > > >
> > > > Fair enough. But this has to be done in a scalable way, which
> > > > individual capability cannot provide.
> > > >
> > > > > > > Have I missed something blatantly obvious, or do others see this as an
> > > > > > > issue as well? I'll reply with an example of adding opt-out for PTP.
> > > > > > > I'm sure other hypercalls could be handled similarly.
> > > > > >
> > > > > > Why do we need this? For future hypercalls, we could have some buy-in
> > > > > > capabilities. For existing ones, it is too late, and negative features
> > > > > > are just too horrible.
> > > > >
> > > > > Oh, agreed on the nastiness. Lazy hack to realize the intended
> > > > > functional change..
> > > >
> > > > Well, you definitely achieved your goal of attracting my attention :).
> > > >
> > > > > > For KVM-specific hypercalls, we could get the VMM to save/restore the
> > > > > > bitmap of supported functions. That would be "less horrible". This
> > > > > > could be implemented using extra "firmware pseudo-registers" such as
> > > > > > the ones described in Documentation/virt/kvm/arm/psci.rst.
> > > > >
> > > > > This seems more reasonable, especially since we do this for migrating
> > > > > the guest's PSCI version.
> > > > >
> > > > > Alternatively, I had thought about using a VM attribute, given the
> > > > > fact that it is non-architectural information and we avoid ABI issues
> > > > > in KVM_GET_REG_LIST without buy-in through a KVM_CAP.
> > > >
> > > > The whole point is that these settings get exposed by
> > > > KVM_GET_REG_LIST, as this is QEMU's way to dump a VM state. Given that
> > > > we already have this for things like the spectre management state, we
> > > > can just as well expose the bitmaps that deal with the KVM-specific
> > > > hypercalls. After all, this falls into the realm of "KVM as VM
> > > > firmware".
> > > >
> > > > For ARM-architected hypercalls (TRNG, stolen time), we may need a
> > > > similar extension.
> > > >
> > >
> > > Thanks for including me Marc. I think you've mentioned all the examples
> > > of why we don't generally expect N+1 -> N migrations to work that I
> > > can think of. While some of the examples like get-reg-list could
> > > eventually be eliminated if we had CPU models to tighten our machine type
> > > state, I think N+1 -> N migrations will always be best effort at most.
> > >
> > > I agree with giving userspace control over the exposer of the hypercalls
> > > though. Using pseudo-registers for that purpose rather than a pile of
> > > CAPs also seems reasonable to me.
> > >
> > > And, while I don't think this patch is going to proceed, I thought I'd
> > > point out that the opt-out approach doesn't help much with expanding
> > > our migration support unless we require the VMM to be upgraded first.
> > >
> > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > work as expected, since the source enforce opt-out, but the destination
> > > won't.
> > 
> > Right, there's going to need to be a fence in both kernel and VMM
> > versions. Before the fence, you can't rollback with either component.
> > Once on the other side of the fence, the user may freely migrate
> > between kernel + VMM combinations.
> > 
> > > Also, since the VMM doesn't key off the kernel version, for the
> > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > leaving it to the user to ensure they consider everything. opt-in
> > > usually only needs the user to consider what machine type they want to
> > > launch.
> > 
> > Going the register route will implicitly require opt-out for all old
> > hypercalls. We exposed them unconditionally to the guest before, and
> > we must uphold that behavior. The default value for the bitmap will
> > have those features set. Any hypercalls added after that register
> > interface will then require explicit opt-in from userspace.
> > 
> > With regards to the pseudoregister interface, how would a VMM discover
> > new bits? From my perspective, you need to have two bitmaps that the
> > VMM can get at: the set of supported feature bits and the active
> > bitmap of features for a running guest.
> >
> 
> I think we should model the pseudo-register approach off of x86's
> CPUID approach. x86 has specific get/set ioctls for CPUIDs
> (KVM_GET/SET_CPUID2), but I think we should get by with just
> get/set-one-reg. However, it might be nice/necessary to have something
> like x86's KVM_GET_SUPPORTED_CPUID which returns all the registers
> at once as a bitmap and the set bits would inform userspace of what's
> supported by the hardware and KVM.
> 
> So a new ioctl similar to KVM_GET_SUPPORTED_CPUID would be your
> first bitmap that shows what's supported and then userpace can
> determine what it wants to change and calculate the appropriate
> pseudo-registers to set/clear bits on with set-one-reg, as x86
> would do with kvm-set-cpuid2.
>

I see Marc just replied stating we'll probably just have a single
register. The KVM_GET_SUPPORTED_CPUID type of ioctl would be
overkill in that case (get-one_reg is enough). Anyway it can
always be added later if we expand into more registers.

Thanks,
drew


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-26  8:54               ` Andrew Jones
  (?)
@ 2021-08-26  9:43                 ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-26  9:43 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Oliver Upton, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, 26 Aug 2021 09:54:29 +0100,
Andrew Jones <drjones@redhat.com> wrote:

[...]

> I see Marc just replied stating we'll probably just have a single
> register. The KVM_GET_SUPPORTED_CPUID type of ioctl would be
> overkill in that case (get-one_reg is enough). Anyway it can
> always be added later if we expand into more registers.

Here's what I propose for the KVM-specific hypercalls. Compile tested
only, plenty of nits to sort out. But you'll get the general idea.
Something similar could be implemented for ARM-architected hypercalls
such as TRNG.

	M.

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..3fcca7cc9668 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* Bitmap of enabled hypercalls */
+	unsigned long hc_bmap;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..8d7a1a93f12c 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -281,6 +281,9 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED	3
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED     	(1U << 4)
 
+#define KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0	KVM_REG_ARM_FW_REG(3)
+#define KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1	KVM_REG_ARM_FW_REG(4)
+
 /* SVE registers */
 #define KVM_REG_ARM64_SVE		(0x15 << KVM_REG_ARM_COPROC_SHIFT)
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..dbc136675b02 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -130,6 +130,13 @@ static void set_default_spectre(struct kvm *kvm)
 		kvm->arch.pfr0_csv3 = 1;
 }
 
+
+static void set_default_vendor_hypercalls(struct kvm *kvm)
+{
+	kvm->arch.hc_bmap = (BIT(ARM_SMCCC_KVM_FUNC_FEATURES) |
+			     BIT(ARM_SMCCC_KVM_FUNC_PTP));
+}
+
 /**
  * kvm_arch_init_vm - initializes a VM data structure
  * @kvm:	pointer to the KVM struct
@@ -156,6 +163,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
 
 	set_default_spectre(kvm);
+	set_default_vendor_hypercalls(kvm);
 
 	return ret;
 out_free_stage2_pgd:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..11f79e790f7d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -127,8 +127,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
-		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
-		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+		val[0] = vcpu->kvm->arch.hc_bmap;
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
 		kvm_ptp_get_time(vcpu, val);
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..ceef0c706d50 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -404,21 +404,26 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
 	};
 }
 
+static const u64 fw_reg_ids[] = {
+	KVM_REG_ARM_PSCI_VERSION,
+	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
+	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
+	KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0,
+};
+
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
 {
-	return 3;		/* PSCI version and two workaround registers */
+	return ARRAY_SIZE(fw_reg_ids);
 }
 
 int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
-	if (put_user(KVM_REG_ARM_PSCI_VERSION, uindices++))
-		return -EFAULT;
-
-	if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1, uindices++))
-		return -EFAULT;
+	int i;
 
-	if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2, uindices++))
-		return -EFAULT;
+	for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
+		if (put_user(fw_reg_ids[i], uindices++))
+			return -EFAULT;
+	}
 
 	return 0;
 }
@@ -477,6 +482,10 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
 		val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
 		break;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0:
+		val = vcpu->kvm->arch.hc_bmap;
+		break;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1:
 	default:
 		return -ENOENT;
 	}
@@ -515,8 +524,9 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 				return -EINVAL;
 			vcpu->kvm->arch.psci_version = val;
 			return 0;
+		default:
+			return -EINVAL;
 		}
-		break;
 	}
 
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
@@ -563,9 +573,16 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 			return -EINVAL;
 
 		return 0;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0:
+		/* FIXME: no vcpu should have run */
+		/* Exclude any unknown bit */
+		if (val & ~(BIT(ARM_SMCCC_KVM_FUNC_FEATURES) |
+			    BIT(ARM_SMCCC_KVM_FUNC_PTP)))
+			return -EINVAL;
+		vcpu->kvm->arch.hc_bmap = val;
+		return 0;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1:
 	default:
 		return -ENOENT;
 	}
-
-	return -EINVAL;
 }

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  9:43                 ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-26  9:43 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Thu, 26 Aug 2021 09:54:29 +0100,
Andrew Jones <drjones@redhat.com> wrote:

[...]

> I see Marc just replied stating we'll probably just have a single
> register. The KVM_GET_SUPPORTED_CPUID type of ioctl would be
> overkill in that case (get-one_reg is enough). Anyway it can
> always be added later if we expand into more registers.

Here's what I propose for the KVM-specific hypercalls. Compile tested
only, plenty of nits to sort out. But you'll get the general idea.
Something similar could be implemented for ARM-architected hypercalls
such as TRNG.

	M.

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..3fcca7cc9668 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* Bitmap of enabled hypercalls */
+	unsigned long hc_bmap;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..8d7a1a93f12c 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -281,6 +281,9 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED	3
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED     	(1U << 4)
 
+#define KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0	KVM_REG_ARM_FW_REG(3)
+#define KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1	KVM_REG_ARM_FW_REG(4)
+
 /* SVE registers */
 #define KVM_REG_ARM64_SVE		(0x15 << KVM_REG_ARM_COPROC_SHIFT)
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..dbc136675b02 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -130,6 +130,13 @@ static void set_default_spectre(struct kvm *kvm)
 		kvm->arch.pfr0_csv3 = 1;
 }
 
+
+static void set_default_vendor_hypercalls(struct kvm *kvm)
+{
+	kvm->arch.hc_bmap = (BIT(ARM_SMCCC_KVM_FUNC_FEATURES) |
+			     BIT(ARM_SMCCC_KVM_FUNC_PTP));
+}
+
 /**
  * kvm_arch_init_vm - initializes a VM data structure
  * @kvm:	pointer to the KVM struct
@@ -156,6 +163,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
 
 	set_default_spectre(kvm);
+	set_default_vendor_hypercalls(kvm);
 
 	return ret;
 out_free_stage2_pgd:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..11f79e790f7d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -127,8 +127,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
-		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
-		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+		val[0] = vcpu->kvm->arch.hc_bmap;
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
 		kvm_ptp_get_time(vcpu, val);
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..ceef0c706d50 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -404,21 +404,26 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
 	};
 }
 
+static const u64 fw_reg_ids[] = {
+	KVM_REG_ARM_PSCI_VERSION,
+	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
+	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
+	KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0,
+};
+
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
 {
-	return 3;		/* PSCI version and two workaround registers */
+	return ARRAY_SIZE(fw_reg_ids);
 }
 
 int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
-	if (put_user(KVM_REG_ARM_PSCI_VERSION, uindices++))
-		return -EFAULT;
-
-	if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1, uindices++))
-		return -EFAULT;
+	int i;
 
-	if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2, uindices++))
-		return -EFAULT;
+	for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
+		if (put_user(fw_reg_ids[i], uindices++))
+			return -EFAULT;
+	}
 
 	return 0;
 }
@@ -477,6 +482,10 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
 		val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
 		break;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0:
+		val = vcpu->kvm->arch.hc_bmap;
+		break;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1:
 	default:
 		return -ENOENT;
 	}
@@ -515,8 +524,9 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 				return -EINVAL;
 			vcpu->kvm->arch.psci_version = val;
 			return 0;
+		default:
+			return -EINVAL;
 		}
-		break;
 	}
 
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
@@ -563,9 +573,16 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 			return -EINVAL;
 
 		return 0;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0:
+		/* FIXME: no vcpu should have run */
+		/* Exclude any unknown bit */
+		if (val & ~(BIT(ARM_SMCCC_KVM_FUNC_FEATURES) |
+			    BIT(ARM_SMCCC_KVM_FUNC_PTP)))
+			return -EINVAL;
+		vcpu->kvm->arch.hc_bmap = val;
+		return 0;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1:
 	default:
 		return -ENOENT;
 	}
-
-	return -EINVAL;
 }

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26  9:43                 ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-08-26  9:43 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Oliver Upton, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, 26 Aug 2021 09:54:29 +0100,
Andrew Jones <drjones@redhat.com> wrote:

[...]

> I see Marc just replied stating we'll probably just have a single
> register. The KVM_GET_SUPPORTED_CPUID type of ioctl would be
> overkill in that case (get-one_reg is enough). Anyway it can
> always be added later if we expand into more registers.

Here's what I propose for the KVM-specific hypercalls. Compile tested
only, plenty of nits to sort out. But you'll get the general idea.
Something similar could be implemented for ARM-architected hypercalls
such as TRNG.

	M.

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f8be56d5342b..3fcca7cc9668 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* Bitmap of enabled hypercalls */
+	unsigned long hc_bmap;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index b3edde68bc3e..8d7a1a93f12c 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -281,6 +281,9 @@ struct kvm_arm_copy_mte_tags {
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_NOT_REQUIRED	3
 #define KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2_ENABLED     	(1U << 4)
 
+#define KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0	KVM_REG_ARM_FW_REG(3)
+#define KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1	KVM_REG_ARM_FW_REG(4)
+
 /* SVE registers */
 #define KVM_REG_ARM64_SVE		(0x15 << KVM_REG_ARM_COPROC_SHIFT)
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fe102cd2e518..dbc136675b02 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -130,6 +130,13 @@ static void set_default_spectre(struct kvm *kvm)
 		kvm->arch.pfr0_csv3 = 1;
 }
 
+
+static void set_default_vendor_hypercalls(struct kvm *kvm)
+{
+	kvm->arch.hc_bmap = (BIT(ARM_SMCCC_KVM_FUNC_FEATURES) |
+			     BIT(ARM_SMCCC_KVM_FUNC_PTP));
+}
+
 /**
  * kvm_arch_init_vm - initializes a VM data structure
  * @kvm:	pointer to the KVM struct
@@ -156,6 +163,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
 
 	set_default_spectre(kvm);
+	set_default_vendor_hypercalls(kvm);
 
 	return ret;
 out_free_stage2_pgd:
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 30da78f72b3b..11f79e790f7d 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -127,8 +127,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		val[3] = ARM_SMCCC_VENDOR_HYP_UID_KVM_REG_3;
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_FEATURES_FUNC_ID:
-		val[0] = BIT(ARM_SMCCC_KVM_FUNC_FEATURES);
-		val[0] |= BIT(ARM_SMCCC_KVM_FUNC_PTP);
+		val[0] = vcpu->kvm->arch.hc_bmap;
 		break;
 	case ARM_SMCCC_VENDOR_HYP_KVM_PTP_FUNC_ID:
 		kvm_ptp_get_time(vcpu, val);
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..ceef0c706d50 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -404,21 +404,26 @@ int kvm_psci_call(struct kvm_vcpu *vcpu)
 	};
 }
 
+static const u64 fw_reg_ids[] = {
+	KVM_REG_ARM_PSCI_VERSION,
+	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1,
+	KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2,
+	KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0,
+};
+
 int kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu)
 {
-	return 3;		/* PSCI version and two workaround registers */
+	return ARRAY_SIZE(fw_reg_ids);
 }
 
 int kvm_arm_copy_fw_reg_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
-	if (put_user(KVM_REG_ARM_PSCI_VERSION, uindices++))
-		return -EFAULT;
-
-	if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1, uindices++))
-		return -EFAULT;
+	int i;
 
-	if (put_user(KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2, uindices++))
-		return -EFAULT;
+	for (i = 0; i < ARRAY_SIZE(fw_reg_ids); i++) {
+		if (put_user(fw_reg_ids[i], uindices++))
+			return -EFAULT;
+	}
 
 	return 0;
 }
@@ -477,6 +482,10 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
 		val = get_kernel_wa_level(reg->id) & KVM_REG_FEATURE_LEVEL_MASK;
 		break;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0:
+		val = vcpu->kvm->arch.hc_bmap;
+		break;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1:
 	default:
 		return -ENOENT;
 	}
@@ -515,8 +524,9 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 				return -EINVAL;
 			vcpu->kvm->arch.psci_version = val;
 			return 0;
+		default:
+			return -EINVAL;
 		}
-		break;
 	}
 
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
@@ -563,9 +573,16 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 			return -EINVAL;
 
 		return 0;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_0:
+		/* FIXME: no vcpu should have run */
+		/* Exclude any unknown bit */
+		if (val & ~(BIT(ARM_SMCCC_KVM_FUNC_FEATURES) |
+			    BIT(ARM_SMCCC_KVM_FUNC_PTP)))
+			return -EINVAL;
+		vcpu->kvm->arch.hc_bmap = val;
+		return 0;
+	case KVM_REG_ARM_KVM_SPECIFIC_HYPERCALLS_1:
 	default:
 		return -ENOENT;
 	}
-
-	return -EINVAL;
 }

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-26  8:37             ` Marc Zyngier
  (?)
@ 2021-08-26 18:49               ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-26 18:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> On Wed, 25 Aug 2021 19:14:59 +0100,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> 
> [...]
> 
> > > Thanks for including me Marc. I think you've mentioned all the examples
> > > of why we don't generally expect N+1 -> N migrations to work that I
> > > can think of. While some of the examples like get-reg-list could
> > > eventually be eliminated if we had CPU models to tighten our machine type
> > > state, I think N+1 -> N migrations will always be best effort at most.
> > >
> > > I agree with giving userspace control over the exposer of the hypercalls
> > > though. Using pseudo-registers for that purpose rather than a pile of
> > > CAPs also seems reasonable to me.
> > >
> > > And, while I don't think this patch is going to proceed, I thought I'd
> > > point out that the opt-out approach doesn't help much with expanding
> > > our migration support unless we require the VMM to be upgraded first.
> > >
> > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > work as expected, since the source enforce opt-out, but the destination
> > > won't.
> > 
> > Right, there's going to need to be a fence in both kernel and VMM
> > versions. Before the fence, you can't rollback with either component.
> > Once on the other side of the fence, the user may freely migrate
> > between kernel + VMM combinations.
> >
> > > Also, since the VMM doesn't key off the kernel version, for the
> > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > leaving it to the user to ensure they consider everything. opt-in
> > > usually only needs the user to consider what machine type they want to
> > > launch.
> > 
> > Going the register route will implicitly require opt-out for all old
> > hypercalls. We exposed them unconditionally to the guest before, and
> > we must uphold that behavior. The default value for the bitmap will
> > have those features set. Any hypercalls added after that register
> > interface will then require explicit opt-in from userspace.
> 
> I disagree here. This makes the ABI inconsistent, and means that no
> feature can be implemented without changing userspace. If you can deal
> with the existing features, you should be able to deal with the next
> lot.
>
> > With regards to the pseudoregister interface, how would a VMM discover
> > new bits? From my perspective, you need to have two bitmaps that the
> > VMM can get at: the set of supported feature bits and the active
> > bitmap of features for a running guest.
> 
> My proposal is that we have a single pseudo-register exposing the list
> of implemented by the kernel. Clear the bits you don't want, and write
> back the result. As long as you haven't written anything, you have the
> full feature set. That's pretty similar to the virtio feature
> negotiation.

Ah, yes I agree. Thinking about it more we will not need something
similar to KVM_GET_SUPPORTED_CPUID.

So then, for any register where userspace/KVM need to negotiate
features, the default value will return the maximum feature set that is
supported. If userspace wants to constrain features, read out the
register, make sure everything you want is there, and write it back
blowing away the superfluous bits. Given this should we enforce ordering
on feature registers, such that a VMM can only write to the registers
before a VM is started?

Also, Reiji is working on making the identity registers writable for the
sake of feature restriction. The suggested negotiation interface would
be applicable there too, IMO.

Many thanks to both you and Drew for working this out with me.

--
Best,
Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26 18:49               ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-26 18:49 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> On Wed, 25 Aug 2021 19:14:59 +0100,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> 
> [...]
> 
> > > Thanks for including me Marc. I think you've mentioned all the examples
> > > of why we don't generally expect N+1 -> N migrations to work that I
> > > can think of. While some of the examples like get-reg-list could
> > > eventually be eliminated if we had CPU models to tighten our machine type
> > > state, I think N+1 -> N migrations will always be best effort at most.
> > >
> > > I agree with giving userspace control over the exposer of the hypercalls
> > > though. Using pseudo-registers for that purpose rather than a pile of
> > > CAPs also seems reasonable to me.
> > >
> > > And, while I don't think this patch is going to proceed, I thought I'd
> > > point out that the opt-out approach doesn't help much with expanding
> > > our migration support unless we require the VMM to be upgraded first.
> > >
> > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > work as expected, since the source enforce opt-out, but the destination
> > > won't.
> > 
> > Right, there's going to need to be a fence in both kernel and VMM
> > versions. Before the fence, you can't rollback with either component.
> > Once on the other side of the fence, the user may freely migrate
> > between kernel + VMM combinations.
> >
> > > Also, since the VMM doesn't key off the kernel version, for the
> > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > leaving it to the user to ensure they consider everything. opt-in
> > > usually only needs the user to consider what machine type they want to
> > > launch.
> > 
> > Going the register route will implicitly require opt-out for all old
> > hypercalls. We exposed them unconditionally to the guest before, and
> > we must uphold that behavior. The default value for the bitmap will
> > have those features set. Any hypercalls added after that register
> > interface will then require explicit opt-in from userspace.
> 
> I disagree here. This makes the ABI inconsistent, and means that no
> feature can be implemented without changing userspace. If you can deal
> with the existing features, you should be able to deal with the next
> lot.
>
> > With regards to the pseudoregister interface, how would a VMM discover
> > new bits? From my perspective, you need to have two bitmaps that the
> > VMM can get at: the set of supported feature bits and the active
> > bitmap of features for a running guest.
> 
> My proposal is that we have a single pseudo-register exposing the list
> of implemented by the kernel. Clear the bits you don't want, and write
> back the result. As long as you haven't written anything, you have the
> full feature set. That's pretty similar to the virtio feature
> negotiation.

Ah, yes I agree. Thinking about it more we will not need something
similar to KVM_GET_SUPPORTED_CPUID.

So then, for any register where userspace/KVM need to negotiate
features, the default value will return the maximum feature set that is
supported. If userspace wants to constrain features, read out the
register, make sure everything you want is there, and write it back
blowing away the superfluous bits. Given this should we enforce ordering
on feature registers, such that a VMM can only write to the registers
before a VM is started?

Also, Reiji is working on making the identity registers writable for the
sake of feature restriction. The suggested negotiation interface would
be applicable there too, IMO.

Many thanks to both you and Drew for working this out with me.

--
Best,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-26 18:49               ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-08-26 18:49 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> On Wed, 25 Aug 2021 19:14:59 +0100,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> 
> [...]
> 
> > > Thanks for including me Marc. I think you've mentioned all the examples
> > > of why we don't generally expect N+1 -> N migrations to work that I
> > > can think of. While some of the examples like get-reg-list could
> > > eventually be eliminated if we had CPU models to tighten our machine type
> > > state, I think N+1 -> N migrations will always be best effort at most.
> > >
> > > I agree with giving userspace control over the exposer of the hypercalls
> > > though. Using pseudo-registers for that purpose rather than a pile of
> > > CAPs also seems reasonable to me.
> > >
> > > And, while I don't think this patch is going to proceed, I thought I'd
> > > point out that the opt-out approach doesn't help much with expanding
> > > our migration support unless we require the VMM to be upgraded first.
> > >
> > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > work as expected, since the source enforce opt-out, but the destination
> > > won't.
> > 
> > Right, there's going to need to be a fence in both kernel and VMM
> > versions. Before the fence, you can't rollback with either component.
> > Once on the other side of the fence, the user may freely migrate
> > between kernel + VMM combinations.
> >
> > > Also, since the VMM doesn't key off the kernel version, for the
> > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > leaving it to the user to ensure they consider everything. opt-in
> > > usually only needs the user to consider what machine type they want to
> > > launch.
> > 
> > Going the register route will implicitly require opt-out for all old
> > hypercalls. We exposed them unconditionally to the guest before, and
> > we must uphold that behavior. The default value for the bitmap will
> > have those features set. Any hypercalls added after that register
> > interface will then require explicit opt-in from userspace.
> 
> I disagree here. This makes the ABI inconsistent, and means that no
> feature can be implemented without changing userspace. If you can deal
> with the existing features, you should be able to deal with the next
> lot.
>
> > With regards to the pseudoregister interface, how would a VMM discover
> > new bits? From my perspective, you need to have two bitmaps that the
> > VMM can get at: the set of supported feature bits and the active
> > bitmap of features for a running guest.
> 
> My proposal is that we have a single pseudo-register exposing the list
> of implemented by the kernel. Clear the bits you don't want, and write
> back the result. As long as you haven't written anything, you have the
> full feature set. That's pretty similar to the virtio feature
> negotiation.

Ah, yes I agree. Thinking about it more we will not need something
similar to KVM_GET_SUPPORTED_CPUID.

So then, for any register where userspace/KVM need to negotiate
features, the default value will return the maximum feature set that is
supported. If userspace wants to constrain features, read out the
register, make sure everything you want is there, and write it back
blowing away the superfluous bits. Given this should we enforce ordering
on feature registers, such that a VMM can only write to the registers
before a VM is started?

Also, Reiji is working on making the identity registers writable for the
sake of feature restriction. The suggested negotiation interface would
be applicable there too, IMO.

Many thanks to both you and Drew for working this out with me.

--
Best,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-26 18:49               ` Oliver Upton
  (?)
@ 2021-08-27  7:40                 ` Andrew Jones
  -1 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-27  7:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
> On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > On Wed, 25 Aug 2021 19:14:59 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > > 
> > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > 
> > [...]
> > 
> > > > Thanks for including me Marc. I think you've mentioned all the examples
> > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > can think of. While some of the examples like get-reg-list could
> > > > eventually be eliminated if we had CPU models to tighten our machine type
> > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > >
> > > > I agree with giving userspace control over the exposer of the hypercalls
> > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > CAPs also seems reasonable to me.
> > > >
> > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > point out that the opt-out approach doesn't help much with expanding
> > > > our migration support unless we require the VMM to be upgraded first.
> > > >
> > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > work as expected, since the source enforce opt-out, but the destination
> > > > won't.
> > > 
> > > Right, there's going to need to be a fence in both kernel and VMM
> > > versions. Before the fence, you can't rollback with either component.
> > > Once on the other side of the fence, the user may freely migrate
> > > between kernel + VMM combinations.
> > >
> > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > leaving it to the user to ensure they consider everything. opt-in
> > > > usually only needs the user to consider what machine type they want to
> > > > launch.
> > > 
> > > Going the register route will implicitly require opt-out for all old
> > > hypercalls. We exposed them unconditionally to the guest before, and
> > > we must uphold that behavior. The default value for the bitmap will
> > > have those features set. Any hypercalls added after that register
> > > interface will then require explicit opt-in from userspace.
> > 
> > I disagree here. This makes the ABI inconsistent, and means that no
> > feature can be implemented without changing userspace. If you can deal
> > with the existing features, you should be able to deal with the next
> > lot.
> >
> > > With regards to the pseudoregister interface, how would a VMM discover
> > > new bits? From my perspective, you need to have two bitmaps that the
> > > VMM can get at: the set of supported feature bits and the active
> > > bitmap of features for a running guest.
> > 
> > My proposal is that we have a single pseudo-register exposing the list
> > of implemented by the kernel. Clear the bits you don't want, and write
> > back the result. As long as you haven't written anything, you have the
> > full feature set. That's pretty similar to the virtio feature
> > negotiation.
> 
> Ah, yes I agree. Thinking about it more we will not need something
> similar to KVM_GET_SUPPORTED_CPUID.
> 
> So then, for any register where userspace/KVM need to negotiate
> features, the default value will return the maximum feature set that is
> supported. If userspace wants to constrain features, read out the
> register, make sure everything you want is there, and write it back
> blowing away the superfluous bits. Given this should we enforce ordering
> on feature registers, such that a VMM can only write to the registers
> before a VM is started?

That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
we can model the feature register control off that.

> 
> Also, Reiji is working on making the identity registers writable for the
> sake of feature restriction. The suggested negotiation interface would
> be applicable there too, IMO.

This this interesting news. I'll look forward to the posting.

> 
> Many thanks to both you and Drew for working this out with me.
>

Thanks,
drew 


^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-27  7:40                 ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-27  7:40 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, Marc Zyngier, pshier, kvmarm, linux-arm-kernel

On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
> On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > On Wed, 25 Aug 2021 19:14:59 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > > 
> > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > 
> > [...]
> > 
> > > > Thanks for including me Marc. I think you've mentioned all the examples
> > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > can think of. While some of the examples like get-reg-list could
> > > > eventually be eliminated if we had CPU models to tighten our machine type
> > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > >
> > > > I agree with giving userspace control over the exposer of the hypercalls
> > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > CAPs also seems reasonable to me.
> > > >
> > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > point out that the opt-out approach doesn't help much with expanding
> > > > our migration support unless we require the VMM to be upgraded first.
> > > >
> > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > work as expected, since the source enforce opt-out, but the destination
> > > > won't.
> > > 
> > > Right, there's going to need to be a fence in both kernel and VMM
> > > versions. Before the fence, you can't rollback with either component.
> > > Once on the other side of the fence, the user may freely migrate
> > > between kernel + VMM combinations.
> > >
> > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > leaving it to the user to ensure they consider everything. opt-in
> > > > usually only needs the user to consider what machine type they want to
> > > > launch.
> > > 
> > > Going the register route will implicitly require opt-out for all old
> > > hypercalls. We exposed them unconditionally to the guest before, and
> > > we must uphold that behavior. The default value for the bitmap will
> > > have those features set. Any hypercalls added after that register
> > > interface will then require explicit opt-in from userspace.
> > 
> > I disagree here. This makes the ABI inconsistent, and means that no
> > feature can be implemented without changing userspace. If you can deal
> > with the existing features, you should be able to deal with the next
> > lot.
> >
> > > With regards to the pseudoregister interface, how would a VMM discover
> > > new bits? From my perspective, you need to have two bitmaps that the
> > > VMM can get at: the set of supported feature bits and the active
> > > bitmap of features for a running guest.
> > 
> > My proposal is that we have a single pseudo-register exposing the list
> > of implemented by the kernel. Clear the bits you don't want, and write
> > back the result. As long as you haven't written anything, you have the
> > full feature set. That's pretty similar to the virtio feature
> > negotiation.
> 
> Ah, yes I agree. Thinking about it more we will not need something
> similar to KVM_GET_SUPPORTED_CPUID.
> 
> So then, for any register where userspace/KVM need to negotiate
> features, the default value will return the maximum feature set that is
> supported. If userspace wants to constrain features, read out the
> register, make sure everything you want is there, and write it back
> blowing away the superfluous bits. Given this should we enforce ordering
> on feature registers, such that a VMM can only write to the registers
> before a VM is started?

That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
we can model the feature register control off that.

> 
> Also, Reiji is working on making the identity registers writable for the
> sake of feature restriction. The suggested negotiation interface would
> be applicable there too, IMO.

This this interesting news. I'll look forward to the posting.

> 
> Many thanks to both you and Drew for working this out with me.
>

Thanks,
drew 

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-08-27  7:40                 ` Andrew Jones
  0 siblings, 0 replies; 74+ messages in thread
From: Andrew Jones @ 2021-08-27  7:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
> On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > On Wed, 25 Aug 2021 19:14:59 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > > 
> > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > 
> > [...]
> > 
> > > > Thanks for including me Marc. I think you've mentioned all the examples
> > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > can think of. While some of the examples like get-reg-list could
> > > > eventually be eliminated if we had CPU models to tighten our machine type
> > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > >
> > > > I agree with giving userspace control over the exposer of the hypercalls
> > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > CAPs also seems reasonable to me.
> > > >
> > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > point out that the opt-out approach doesn't help much with expanding
> > > > our migration support unless we require the VMM to be upgraded first.
> > > >
> > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > work as expected, since the source enforce opt-out, but the destination
> > > > won't.
> > > 
> > > Right, there's going to need to be a fence in both kernel and VMM
> > > versions. Before the fence, you can't rollback with either component.
> > > Once on the other side of the fence, the user may freely migrate
> > > between kernel + VMM combinations.
> > >
> > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > leaving it to the user to ensure they consider everything. opt-in
> > > > usually only needs the user to consider what machine type they want to
> > > > launch.
> > > 
> > > Going the register route will implicitly require opt-out for all old
> > > hypercalls. We exposed them unconditionally to the guest before, and
> > > we must uphold that behavior. The default value for the bitmap will
> > > have those features set. Any hypercalls added after that register
> > > interface will then require explicit opt-in from userspace.
> > 
> > I disagree here. This makes the ABI inconsistent, and means that no
> > feature can be implemented without changing userspace. If you can deal
> > with the existing features, you should be able to deal with the next
> > lot.
> >
> > > With regards to the pseudoregister interface, how would a VMM discover
> > > new bits? From my perspective, you need to have two bitmaps that the
> > > VMM can get at: the set of supported feature bits and the active
> > > bitmap of features for a running guest.
> > 
> > My proposal is that we have a single pseudo-register exposing the list
> > of implemented by the kernel. Clear the bits you don't want, and write
> > back the result. As long as you haven't written anything, you have the
> > full feature set. That's pretty similar to the virtio feature
> > negotiation.
> 
> Ah, yes I agree. Thinking about it more we will not need something
> similar to KVM_GET_SUPPORTED_CPUID.
> 
> So then, for any register where userspace/KVM need to negotiate
> features, the default value will return the maximum feature set that is
> supported. If userspace wants to constrain features, read out the
> register, make sure everything you want is there, and write it back
> blowing away the superfluous bits. Given this should we enforce ordering
> on feature registers, such that a VMM can only write to the registers
> before a VM is started?

That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
we can model the feature register control off that.

> 
> Also, Reiji is working on making the identity registers writable for the
> sake of feature restriction. The suggested negotiation interface would
> be applicable there too, IMO.

This this interesting news. I'll look forward to the posting.

> 
> Many thanks to both you and Drew for working this out with me.
>

Thanks,
drew 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-08-27  7:40                 ` Andrew Jones
  (?)
@ 2021-09-29 18:22                   ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-09-29 18:22 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Fri, Aug 27, 2021 at 12:40 AM Andrew Jones <drjones@redhat.com> wrote:
>
> On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
> > On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > > On Wed, 25 Aug 2021 19:14:59 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > >
> > > [...]
> > >
> > > > > Thanks for including me Marc. I think you've mentioned all the examples
> > > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > > can think of. While some of the examples like get-reg-list could
> > > > > eventually be eliminated if we had CPU models to tighten our machine type
> > > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > > >
> > > > > I agree with giving userspace control over the exposer of the hypercalls
> > > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > > CAPs also seems reasonable to me.
> > > > >
> > > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > > point out that the opt-out approach doesn't help much with expanding
> > > > > our migration support unless we require the VMM to be upgraded first.
> > > > >
> > > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > > work as expected, since the source enforce opt-out, but the destination
> > > > > won't.
> > > >
> > > > Right, there's going to need to be a fence in both kernel and VMM
> > > > versions. Before the fence, you can't rollback with either component.
> > > > Once on the other side of the fence, the user may freely migrate
> > > > between kernel + VMM combinations.
> > > >
> > > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > > leaving it to the user to ensure they consider everything. opt-in
> > > > > usually only needs the user to consider what machine type they want to
> > > > > launch.
> > > >
> > > > Going the register route will implicitly require opt-out for all old
> > > > hypercalls. We exposed them unconditionally to the guest before, and
> > > > we must uphold that behavior. The default value for the bitmap will
> > > > have those features set. Any hypercalls added after that register
> > > > interface will then require explicit opt-in from userspace.
> > >
> > > I disagree here. This makes the ABI inconsistent, and means that no
> > > feature can be implemented without changing userspace. If you can deal
> > > with the existing features, you should be able to deal with the next
> > > lot.
> > >
> > > > With regards to the pseudoregister interface, how would a VMM discover
> > > > new bits? From my perspective, you need to have two bitmaps that the
> > > > VMM can get at: the set of supported feature bits and the active
> > > > bitmap of features for a running guest.
> > >
> > > My proposal is that we have a single pseudo-register exposing the list
> > > of implemented by the kernel. Clear the bits you don't want, and write
> > > back the result. As long as you haven't written anything, you have the
> > > full feature set. That's pretty similar to the virtio feature
> > > negotiation.
> >
> > Ah, yes I agree. Thinking about it more we will not need something
> > similar to KVM_GET_SUPPORTED_CPUID.
> >
> > So then, for any register where userspace/KVM need to negotiate
> > features, the default value will return the maximum feature set that is
> > supported. If userspace wants to constrain features, read out the
> > register, make sure everything you want is there, and write it back
> > blowing away the superfluous bits. Given this should we enforce ordering
> > on feature registers, such that a VMM can only write to the registers
> > before a VM is started?
>
> That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
> we can model the feature register control off that.
>
> >
> > Also, Reiji is working on making the identity registers writable for the
> > sake of feature restriction. The suggested negotiation interface would
> > be applicable there too, IMO.
>
> This this interesting news. I'll look forward to the posting.
>
> >
> > Many thanks to both you and Drew for working this out with me.
> >
>
> Thanks,
> drew
>

Hey folks,

I have some lingering thoughts on this subject since we last spoke and
wanted to discuss.

I'm having a hard time figuring out how a VMM should handle a new
hypercall identity register introduced on a newer kernel. In order to
maintain guest ABI, the VMM would need to know about that register and
zero it when restoring an older guest.

Perhaps instead we could reserve a range of firmware registers as the
'hypercall identity' registers. Implement all of them as RAZ/WI by
default, encouraging userspace to zero these registers away for older
VMs but still allowing an old userspace to pick up new KVM features.
Doing so would align the hypercall identity registers with the feature
ID registers from the architecture.

Thoughts?

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-09-29 18:22                   ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-09-29 18:22 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, Marc Zyngier, pshier, kvmarm, linux-arm-kernel

On Fri, Aug 27, 2021 at 12:40 AM Andrew Jones <drjones@redhat.com> wrote:
>
> On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
> > On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > > On Wed, 25 Aug 2021 19:14:59 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > >
> > > [...]
> > >
> > > > > Thanks for including me Marc. I think you've mentioned all the examples
> > > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > > can think of. While some of the examples like get-reg-list could
> > > > > eventually be eliminated if we had CPU models to tighten our machine type
> > > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > > >
> > > > > I agree with giving userspace control over the exposer of the hypercalls
> > > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > > CAPs also seems reasonable to me.
> > > > >
> > > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > > point out that the opt-out approach doesn't help much with expanding
> > > > > our migration support unless we require the VMM to be upgraded first.
> > > > >
> > > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > > work as expected, since the source enforce opt-out, but the destination
> > > > > won't.
> > > >
> > > > Right, there's going to need to be a fence in both kernel and VMM
> > > > versions. Before the fence, you can't rollback with either component.
> > > > Once on the other side of the fence, the user may freely migrate
> > > > between kernel + VMM combinations.
> > > >
> > > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > > leaving it to the user to ensure they consider everything. opt-in
> > > > > usually only needs the user to consider what machine type they want to
> > > > > launch.
> > > >
> > > > Going the register route will implicitly require opt-out for all old
> > > > hypercalls. We exposed them unconditionally to the guest before, and
> > > > we must uphold that behavior. The default value for the bitmap will
> > > > have those features set. Any hypercalls added after that register
> > > > interface will then require explicit opt-in from userspace.
> > >
> > > I disagree here. This makes the ABI inconsistent, and means that no
> > > feature can be implemented without changing userspace. If you can deal
> > > with the existing features, you should be able to deal with the next
> > > lot.
> > >
> > > > With regards to the pseudoregister interface, how would a VMM discover
> > > > new bits? From my perspective, you need to have two bitmaps that the
> > > > VMM can get at: the set of supported feature bits and the active
> > > > bitmap of features for a running guest.
> > >
> > > My proposal is that we have a single pseudo-register exposing the list
> > > of implemented by the kernel. Clear the bits you don't want, and write
> > > back the result. As long as you haven't written anything, you have the
> > > full feature set. That's pretty similar to the virtio feature
> > > negotiation.
> >
> > Ah, yes I agree. Thinking about it more we will not need something
> > similar to KVM_GET_SUPPORTED_CPUID.
> >
> > So then, for any register where userspace/KVM need to negotiate
> > features, the default value will return the maximum feature set that is
> > supported. If userspace wants to constrain features, read out the
> > register, make sure everything you want is there, and write it back
> > blowing away the superfluous bits. Given this should we enforce ordering
> > on feature registers, such that a VMM can only write to the registers
> > before a VM is started?
>
> That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
> we can model the feature register control off that.
>
> >
> > Also, Reiji is working on making the identity registers writable for the
> > sake of feature restriction. The suggested negotiation interface would
> > be applicable there too, IMO.
>
> This this interesting news. I'll look forward to the posting.
>
> >
> > Many thanks to both you and Drew for working this out with me.
> >
>
> Thanks,
> drew
>

Hey folks,

I have some lingering thoughts on this subject since we last spoke and
wanted to discuss.

I'm having a hard time figuring out how a VMM should handle a new
hypercall identity register introduced on a newer kernel. In order to
maintain guest ABI, the VMM would need to know about that register and
zero it when restoring an older guest.

Perhaps instead we could reserve a range of firmware registers as the
'hypercall identity' registers. Implement all of them as RAZ/WI by
default, encouraging userspace to zero these registers away for older
VMs but still allowing an old userspace to pick up new KVM features.
Doing so would align the hypercall identity registers with the feature
ID registers from the architecture.

Thoughts?

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-09-29 18:22                   ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-09-29 18:22 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Marc Zyngier, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Fri, Aug 27, 2021 at 12:40 AM Andrew Jones <drjones@redhat.com> wrote:
>
> On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
> > On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
> > > On Wed, 25 Aug 2021 19:14:59 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones <drjones@redhat.com> wrote:
> > >
> > > [...]
> > >
> > > > > Thanks for including me Marc. I think you've mentioned all the examples
> > > > > of why we don't generally expect N+1 -> N migrations to work that I
> > > > > can think of. While some of the examples like get-reg-list could
> > > > > eventually be eliminated if we had CPU models to tighten our machine type
> > > > > state, I think N+1 -> N migrations will always be best effort at most.
> > > > >
> > > > > I agree with giving userspace control over the exposer of the hypercalls
> > > > > though. Using pseudo-registers for that purpose rather than a pile of
> > > > > CAPs also seems reasonable to me.
> > > > >
> > > > > And, while I don't think this patch is going to proceed, I thought I'd
> > > > > point out that the opt-out approach doesn't help much with expanding
> > > > > our migration support unless we require the VMM to be upgraded first.
> > > > >
> > > > > And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
> > > > > work as expected, since the source enforce opt-out, but the destination
> > > > > won't.
> > > >
> > > > Right, there's going to need to be a fence in both kernel and VMM
> > > > versions. Before the fence, you can't rollback with either component.
> > > > Once on the other side of the fence, the user may freely migrate
> > > > between kernel + VMM combinations.
> > > >
> > > > > Also, since the VMM doesn't key off the kernel version, for the
> > > > > most part N+1 VMMs won't know when they're supposed to opt-out or not,
> > > > > leaving it to the user to ensure they consider everything. opt-in
> > > > > usually only needs the user to consider what machine type they want to
> > > > > launch.
> > > >
> > > > Going the register route will implicitly require opt-out for all old
> > > > hypercalls. We exposed them unconditionally to the guest before, and
> > > > we must uphold that behavior. The default value for the bitmap will
> > > > have those features set. Any hypercalls added after that register
> > > > interface will then require explicit opt-in from userspace.
> > >
> > > I disagree here. This makes the ABI inconsistent, and means that no
> > > feature can be implemented without changing userspace. If you can deal
> > > with the existing features, you should be able to deal with the next
> > > lot.
> > >
> > > > With regards to the pseudoregister interface, how would a VMM discover
> > > > new bits? From my perspective, you need to have two bitmaps that the
> > > > VMM can get at: the set of supported feature bits and the active
> > > > bitmap of features for a running guest.
> > >
> > > My proposal is that we have a single pseudo-register exposing the list
> > > of implemented by the kernel. Clear the bits you don't want, and write
> > > back the result. As long as you haven't written anything, you have the
> > > full feature set. That's pretty similar to the virtio feature
> > > negotiation.
> >
> > Ah, yes I agree. Thinking about it more we will not need something
> > similar to KVM_GET_SUPPORTED_CPUID.
> >
> > So then, for any register where userspace/KVM need to negotiate
> > features, the default value will return the maximum feature set that is
> > supported. If userspace wants to constrain features, read out the
> > register, make sure everything you want is there, and write it back
> > blowing away the superfluous bits. Given this should we enforce ordering
> > on feature registers, such that a VMM can only write to the registers
> > before a VM is started?
>
> That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
> we can model the feature register control off that.
>
> >
> > Also, Reiji is working on making the identity registers writable for the
> > sake of feature restriction. The suggested negotiation interface would
> > be applicable there too, IMO.
>
> This this interesting news. I'll look forward to the posting.
>
> >
> > Many thanks to both you and Drew for working this out with me.
> >
>
> Thanks,
> drew
>

Hey folks,

I have some lingering thoughts on this subject since we last spoke and
wanted to discuss.

I'm having a hard time figuring out how a VMM should handle a new
hypercall identity register introduced on a newer kernel. In order to
maintain guest ABI, the VMM would need to know about that register and
zero it when restoring an older guest.

Perhaps instead we could reserve a range of firmware registers as the
'hypercall identity' registers. Implement all of them as RAZ/WI by
default, encouraging userspace to zero these registers away for older
VMs but still allowing an old userspace to pick up new KVM features.
Doing so would align the hypercall identity registers with the feature
ID registers from the architecture.

Thoughts?

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-09-29 18:22                   ` Oliver Upton
  (?)
@ 2021-09-30  7:32                     ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-09-30  7:32 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

Hi Oliver,

On Wed, 29 Sep 2021 19:22:05 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> I have some lingering thoughts on this subject since we last spoke and
> wanted to discuss.
> 
> I'm having a hard time figuring out how a VMM should handle a new
> hypercall identity register introduced on a newer kernel. In order to
> maintain guest ABI, the VMM would need to know about that register and
> zero it when restoring an older guest.

Just as it would need to be able to discover any new system register
exposed by default, as it happens at all times. Which is why we have a
way to discover all the registers, architected or not.

> Perhaps instead we could reserve a range of firmware registers as the
> 'hypercall identity' registers. Implement all of them as RAZ/WI by
> default, encouraging userspace to zero these registers away for older
> VMs but still allowing an old userspace to pick up new KVM features.
> Doing so would align the hypercall identity registers with the feature
> ID registers from the architecture.

The range already exists in the form of the "coprocessor" 0x14. I
don't see the need to expose it as RAZ/WI, however. If userspace
doesn't know about how this pseudo-register works, it won't be able to
program it anyway.

I don't buy the parallel with the ID-regs either. They are RAZ/WI by
default so that they don't UNDEF at runtime. The meaning of a RAZ
id-register is also well defined (feature not implemented), and the
CPU cannot write to them. In a way, the ID-regs *are* the enumeration
mechanism.

Our firmware registers don't follow the same rules. Userspace can
write to them, and there is no such "not implemented" rule (case in
point, PSCI). We also have a separate enumeration mechanism
(GET_ONE_REG), which is (more or less) designed for userspace to find
what is implemented.

For these reasons, I don't immediately see the point of advertising a
set of registers ahead of time, before userspace grows an
understanding of what these registers mean.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-09-30  7:32                     ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-09-30  7:32 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

Hi Oliver,

On Wed, 29 Sep 2021 19:22:05 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> I have some lingering thoughts on this subject since we last spoke and
> wanted to discuss.
> 
> I'm having a hard time figuring out how a VMM should handle a new
> hypercall identity register introduced on a newer kernel. In order to
> maintain guest ABI, the VMM would need to know about that register and
> zero it when restoring an older guest.

Just as it would need to be able to discover any new system register
exposed by default, as it happens at all times. Which is why we have a
way to discover all the registers, architected or not.

> Perhaps instead we could reserve a range of firmware registers as the
> 'hypercall identity' registers. Implement all of them as RAZ/WI by
> default, encouraging userspace to zero these registers away for older
> VMs but still allowing an old userspace to pick up new KVM features.
> Doing so would align the hypercall identity registers with the feature
> ID registers from the architecture.

The range already exists in the form of the "coprocessor" 0x14. I
don't see the need to expose it as RAZ/WI, however. If userspace
doesn't know about how this pseudo-register works, it won't be able to
program it anyway.

I don't buy the parallel with the ID-regs either. They are RAZ/WI by
default so that they don't UNDEF at runtime. The meaning of a RAZ
id-register is also well defined (feature not implemented), and the
CPU cannot write to them. In a way, the ID-regs *are* the enumeration
mechanism.

Our firmware registers don't follow the same rules. Userspace can
write to them, and there is no such "not implemented" rule (case in
point, PSCI). We also have a separate enumeration mechanism
(GET_ONE_REG), which is (more or less) designed for userspace to find
what is implemented.

For these reasons, I don't immediately see the point of advertising a
set of registers ahead of time, before userspace grows an
understanding of what these registers mean.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-09-30  7:32                     ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-09-30  7:32 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

Hi Oliver,

On Wed, 29 Sep 2021 19:22:05 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> I have some lingering thoughts on this subject since we last spoke and
> wanted to discuss.
> 
> I'm having a hard time figuring out how a VMM should handle a new
> hypercall identity register introduced on a newer kernel. In order to
> maintain guest ABI, the VMM would need to know about that register and
> zero it when restoring an older guest.

Just as it would need to be able to discover any new system register
exposed by default, as it happens at all times. Which is why we have a
way to discover all the registers, architected or not.

> Perhaps instead we could reserve a range of firmware registers as the
> 'hypercall identity' registers. Implement all of them as RAZ/WI by
> default, encouraging userspace to zero these registers away for older
> VMs but still allowing an old userspace to pick up new KVM features.
> Doing so would align the hypercall identity registers with the feature
> ID registers from the architecture.

The range already exists in the form of the "coprocessor" 0x14. I
don't see the need to expose it as RAZ/WI, however. If userspace
doesn't know about how this pseudo-register works, it won't be able to
program it anyway.

I don't buy the parallel with the ID-regs either. They are RAZ/WI by
default so that they don't UNDEF at runtime. The meaning of a RAZ
id-register is also well defined (feature not implemented), and the
CPU cannot write to them. In a way, the ID-regs *are* the enumeration
mechanism.

Our firmware registers don't follow the same rules. Userspace can
write to them, and there is no such "not implemented" rule (case in
point, PSCI). We also have a separate enumeration mechanism
(GET_ONE_REG), which is (more or less) designed for userspace to find
what is implemented.

For these reasons, I don't immediately see the point of advertising a
set of registers ahead of time, before userspace grows an
understanding of what these registers mean.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-09-30  7:32                     ` Marc Zyngier
  (?)
@ 2021-09-30 17:24                       ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-09-30 17:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

Hey Marc,

On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Oliver,
>
> On Wed, 29 Sep 2021 19:22:05 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > I have some lingering thoughts on this subject since we last spoke and
> > wanted to discuss.
> >
> > I'm having a hard time figuring out how a VMM should handle a new
> > hypercall identity register introduced on a newer kernel. In order to
> > maintain guest ABI, the VMM would need to know about that register and
> > zero it when restoring an older guest.
>
> Just as it would need to be able to discover any new system register
> exposed by default, as it happens at all times. Which is why we have a
> way to discover all the registers, architected or not.
>
> > Perhaps instead we could reserve a range of firmware registers as the
> > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > default, encouraging userspace to zero these registers away for older
> > VMs but still allowing an old userspace to pick up new KVM features.
> > Doing so would align the hypercall identity registers with the feature
> > ID registers from the architecture.
>
> The range already exists in the form of the "coprocessor" 0x14. I
> don't see the need to expose it as RAZ/WI, however. If userspace
> doesn't know about how this pseudo-register works, it won't be able to
> program it anyway.
>
> I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> default so that they don't UNDEF at runtime. The meaning of a RAZ
> id-register is also well defined (feature not implemented), and the
> CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> mechanism.
>
> Our firmware registers don't follow the same rules. Userspace can
> write to them, and there is no such "not implemented" rule (case in
> point, PSCI). We also have a separate enumeration mechanism
> (GET_ONE_REG), which is (more or less) designed for userspace to find
> what is implemented.
>
> For these reasons, I don't immediately see the point of advertising a
> set of registers ahead of time, before userspace grows an
> understanding of what these registers mean.

Supposing we don't preallocate some hypercall ID registers, how can we
safely migrate a guest from an older kernel to newer kernel? Ideally,
we would preserve the hypercall feature set across the migration which
could work for a while with the first set of registers that get
defined, but whenever a new hypercall firmware register comes along
then the VMM will be clueless to the new ABI.

Fundamentally, I don't think userspace should need a patch to preserve
ABI on a newer kernel. Despite that, it would seem that userspace will
need to learn of any firmware registers that control hypercall
features which come after the initial set that gets proposed. If
KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
architectural feature ID registers) from other parts of the vCPU
state, it would be clear to what registers to zero on a newer kernel.
Apologies if it is distracting to mention the feature ID registers
here, but both are on my mind currently and want to make sure there is
some consistency in how features get handled on newer kernels,
architected or not.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-09-30 17:24                       ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-09-30 17:24 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

Hey Marc,

On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Oliver,
>
> On Wed, 29 Sep 2021 19:22:05 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > I have some lingering thoughts on this subject since we last spoke and
> > wanted to discuss.
> >
> > I'm having a hard time figuring out how a VMM should handle a new
> > hypercall identity register introduced on a newer kernel. In order to
> > maintain guest ABI, the VMM would need to know about that register and
> > zero it when restoring an older guest.
>
> Just as it would need to be able to discover any new system register
> exposed by default, as it happens at all times. Which is why we have a
> way to discover all the registers, architected or not.
>
> > Perhaps instead we could reserve a range of firmware registers as the
> > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > default, encouraging userspace to zero these registers away for older
> > VMs but still allowing an old userspace to pick up new KVM features.
> > Doing so would align the hypercall identity registers with the feature
> > ID registers from the architecture.
>
> The range already exists in the form of the "coprocessor" 0x14. I
> don't see the need to expose it as RAZ/WI, however. If userspace
> doesn't know about how this pseudo-register works, it won't be able to
> program it anyway.
>
> I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> default so that they don't UNDEF at runtime. The meaning of a RAZ
> id-register is also well defined (feature not implemented), and the
> CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> mechanism.
>
> Our firmware registers don't follow the same rules. Userspace can
> write to them, and there is no such "not implemented" rule (case in
> point, PSCI). We also have a separate enumeration mechanism
> (GET_ONE_REG), which is (more or less) designed for userspace to find
> what is implemented.
>
> For these reasons, I don't immediately see the point of advertising a
> set of registers ahead of time, before userspace grows an
> understanding of what these registers mean.

Supposing we don't preallocate some hypercall ID registers, how can we
safely migrate a guest from an older kernel to newer kernel? Ideally,
we would preserve the hypercall feature set across the migration which
could work for a while with the first set of registers that get
defined, but whenever a new hypercall firmware register comes along
then the VMM will be clueless to the new ABI.

Fundamentally, I don't think userspace should need a patch to preserve
ABI on a newer kernel. Despite that, it would seem that userspace will
need to learn of any firmware registers that control hypercall
features which come after the initial set that gets proposed. If
KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
architectural feature ID registers) from other parts of the vCPU
state, it would be clear to what registers to zero on a newer kernel.
Apologies if it is distracting to mention the feature ID registers
here, but both are on my mind currently and want to make sure there is
some consistency in how features get handled on newer kernels,
architected or not.

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-09-30 17:24                       ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-09-30 17:24 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

Hey Marc,

On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Oliver,
>
> On Wed, 29 Sep 2021 19:22:05 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > I have some lingering thoughts on this subject since we last spoke and
> > wanted to discuss.
> >
> > I'm having a hard time figuring out how a VMM should handle a new
> > hypercall identity register introduced on a newer kernel. In order to
> > maintain guest ABI, the VMM would need to know about that register and
> > zero it when restoring an older guest.
>
> Just as it would need to be able to discover any new system register
> exposed by default, as it happens at all times. Which is why we have a
> way to discover all the registers, architected or not.
>
> > Perhaps instead we could reserve a range of firmware registers as the
> > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > default, encouraging userspace to zero these registers away for older
> > VMs but still allowing an old userspace to pick up new KVM features.
> > Doing so would align the hypercall identity registers with the feature
> > ID registers from the architecture.
>
> The range already exists in the form of the "coprocessor" 0x14. I
> don't see the need to expose it as RAZ/WI, however. If userspace
> doesn't know about how this pseudo-register works, it won't be able to
> program it anyway.
>
> I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> default so that they don't UNDEF at runtime. The meaning of a RAZ
> id-register is also well defined (feature not implemented), and the
> CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> mechanism.
>
> Our firmware registers don't follow the same rules. Userspace can
> write to them, and there is no such "not implemented" rule (case in
> point, PSCI). We also have a separate enumeration mechanism
> (GET_ONE_REG), which is (more or less) designed for userspace to find
> what is implemented.
>
> For these reasons, I don't immediately see the point of advertising a
> set of registers ahead of time, before userspace grows an
> understanding of what these registers mean.

Supposing we don't preallocate some hypercall ID registers, how can we
safely migrate a guest from an older kernel to newer kernel? Ideally,
we would preserve the hypercall feature set across the migration which
could work for a while with the first set of registers that get
defined, but whenever a new hypercall firmware register comes along
then the VMM will be clueless to the new ABI.

Fundamentally, I don't think userspace should need a patch to preserve
ABI on a newer kernel. Despite that, it would seem that userspace will
need to learn of any firmware registers that control hypercall
features which come after the initial set that gets proposed. If
KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
architectural feature ID registers) from other parts of the vCPU
state, it would be clear to what registers to zero on a newer kernel.
Apologies if it is distracting to mention the feature ID registers
here, but both are on my mind currently and want to make sure there is
some consistency in how features get handled on newer kernels,
architected or not.

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-09-30 17:24                       ` Oliver Upton
  (?)
@ 2021-10-01 11:43                         ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-10-01 11:43 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, 30 Sep 2021 18:24:23 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Hey Marc,
> 
> On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Oliver,
> >
> > On Wed, 29 Sep 2021 19:22:05 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > I have some lingering thoughts on this subject since we last spoke and
> > > wanted to discuss.
> > >
> > > I'm having a hard time figuring out how a VMM should handle a new
> > > hypercall identity register introduced on a newer kernel. In order to
> > > maintain guest ABI, the VMM would need to know about that register and
> > > zero it when restoring an older guest.
> >
> > Just as it would need to be able to discover any new system register
> > exposed by default, as it happens at all times. Which is why we have a
> > way to discover all the registers, architected or not.
> >
> > > Perhaps instead we could reserve a range of firmware registers as the
> > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > default, encouraging userspace to zero these registers away for older
> > > VMs but still allowing an old userspace to pick up new KVM features.
> > > Doing so would align the hypercall identity registers with the feature
> > > ID registers from the architecture.
> >
> > The range already exists in the form of the "coprocessor" 0x14. I
> > don't see the need to expose it as RAZ/WI, however. If userspace
> > doesn't know about how this pseudo-register works, it won't be able to
> > program it anyway.
> >
> > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > id-register is also well defined (feature not implemented), and the
> > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > mechanism.
> >
> > Our firmware registers don't follow the same rules. Userspace can
> > write to them, and there is no such "not implemented" rule (case in
> > point, PSCI). We also have a separate enumeration mechanism
> > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > what is implemented.
> >
> > For these reasons, I don't immediately see the point of advertising a
> > set of registers ahead of time, before userspace grows an
> > understanding of what these registers mean.
> 
> Supposing we don't preallocate some hypercall ID registers, how can we
> safely migrate a guest from an older kernel to newer kernel? Ideally,
> we would preserve the hypercall feature set across the migration which
> could work for a while with the first set of registers that get
> defined, but whenever a new hypercall firmware register comes along
> then the VMM will be clueless to the new ABI.

My expectations were that whenever userspace writes a set of firmware
register, all the defaults are invalidated. For example say that
host-A know about a single hypercall register, while host-B knows
about two. Upon writing to the first register, the host would clear
the set of available services in the second one. If userspace
eventually writes there, the value would stick if valid.

Also, remember that pseudo-registers don't have to be 64bit. We could
define a new class of hypercall-specific registers that would be much
wider, and thus have a larger write granularity (KVM supports anything
from 8 to 2048 bits). This would make it pretty easy to implement the
above.

> Fundamentally, I don't think userspace should need a patch to preserve
> ABI on a newer kernel. Despite that, it would seem that userspace will
> need to learn of any firmware registers that control hypercall
> features which come after the initial set that gets proposed. If
> KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> architectural feature ID registers) from other parts of the vCPU
> state, it would be clear to what registers to zero on a newer kernel.
> Apologies if it is distracting to mention the feature ID registers
> here, but both are on my mind currently and want to make sure there is
> some consistency in how features get handled on newer kernels,
> architected or not.

The problem I see is that we will always need to grow the HC register
space one way or another, no matter how many we reserve. Your approach
only works if we don't exceed that particular range. Maybe it will
never be a problem, but it remains that this is not scalable.

If we wanted to be safe, we'd reserve the whole of the possible space
as defined by the SMCCC spec. Given that we currently have two HC
spaces (the ARM-controlled one, and the KVM-specific one), the
function space being 16bits in each case, that's 16kB worth of zeroes
that userspace has to save/restore at all times... I'm not overly
enthusiastic.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-10-01 11:43                         ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-10-01 11:43 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Thu, 30 Sep 2021 18:24:23 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Hey Marc,
> 
> On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Oliver,
> >
> > On Wed, 29 Sep 2021 19:22:05 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > I have some lingering thoughts on this subject since we last spoke and
> > > wanted to discuss.
> > >
> > > I'm having a hard time figuring out how a VMM should handle a new
> > > hypercall identity register introduced on a newer kernel. In order to
> > > maintain guest ABI, the VMM would need to know about that register and
> > > zero it when restoring an older guest.
> >
> > Just as it would need to be able to discover any new system register
> > exposed by default, as it happens at all times. Which is why we have a
> > way to discover all the registers, architected or not.
> >
> > > Perhaps instead we could reserve a range of firmware registers as the
> > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > default, encouraging userspace to zero these registers away for older
> > > VMs but still allowing an old userspace to pick up new KVM features.
> > > Doing so would align the hypercall identity registers with the feature
> > > ID registers from the architecture.
> >
> > The range already exists in the form of the "coprocessor" 0x14. I
> > don't see the need to expose it as RAZ/WI, however. If userspace
> > doesn't know about how this pseudo-register works, it won't be able to
> > program it anyway.
> >
> > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > id-register is also well defined (feature not implemented), and the
> > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > mechanism.
> >
> > Our firmware registers don't follow the same rules. Userspace can
> > write to them, and there is no such "not implemented" rule (case in
> > point, PSCI). We also have a separate enumeration mechanism
> > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > what is implemented.
> >
> > For these reasons, I don't immediately see the point of advertising a
> > set of registers ahead of time, before userspace grows an
> > understanding of what these registers mean.
> 
> Supposing we don't preallocate some hypercall ID registers, how can we
> safely migrate a guest from an older kernel to newer kernel? Ideally,
> we would preserve the hypercall feature set across the migration which
> could work for a while with the first set of registers that get
> defined, but whenever a new hypercall firmware register comes along
> then the VMM will be clueless to the new ABI.

My expectations were that whenever userspace writes a set of firmware
register, all the defaults are invalidated. For example say that
host-A know about a single hypercall register, while host-B knows
about two. Upon writing to the first register, the host would clear
the set of available services in the second one. If userspace
eventually writes there, the value would stick if valid.

Also, remember that pseudo-registers don't have to be 64bit. We could
define a new class of hypercall-specific registers that would be much
wider, and thus have a larger write granularity (KVM supports anything
from 8 to 2048 bits). This would make it pretty easy to implement the
above.

> Fundamentally, I don't think userspace should need a patch to preserve
> ABI on a newer kernel. Despite that, it would seem that userspace will
> need to learn of any firmware registers that control hypercall
> features which come after the initial set that gets proposed. If
> KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> architectural feature ID registers) from other parts of the vCPU
> state, it would be clear to what registers to zero on a newer kernel.
> Apologies if it is distracting to mention the feature ID registers
> here, but both are on my mind currently and want to make sure there is
> some consistency in how features get handled on newer kernels,
> architected or not.

The problem I see is that we will always need to grow the HC register
space one way or another, no matter how many we reserve. Your approach
only works if we don't exceed that particular range. Maybe it will
never be a problem, but it remains that this is not scalable.

If we wanted to be safe, we'd reserve the whole of the possible space
as defined by the SMCCC spec. Given that we currently have two HC
spaces (the ARM-controlled one, and the KVM-specific one), the
function space being 16bits in each case, that's 16kB worth of zeroes
that userspace has to save/restore at all times... I'm not overly
enthusiastic.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-10-01 11:43                         ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2021-10-01 11:43 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Thu, 30 Sep 2021 18:24:23 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Hey Marc,
> 
> On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Oliver,
> >
> > On Wed, 29 Sep 2021 19:22:05 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > I have some lingering thoughts on this subject since we last spoke and
> > > wanted to discuss.
> > >
> > > I'm having a hard time figuring out how a VMM should handle a new
> > > hypercall identity register introduced on a newer kernel. In order to
> > > maintain guest ABI, the VMM would need to know about that register and
> > > zero it when restoring an older guest.
> >
> > Just as it would need to be able to discover any new system register
> > exposed by default, as it happens at all times. Which is why we have a
> > way to discover all the registers, architected or not.
> >
> > > Perhaps instead we could reserve a range of firmware registers as the
> > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > default, encouraging userspace to zero these registers away for older
> > > VMs but still allowing an old userspace to pick up new KVM features.
> > > Doing so would align the hypercall identity registers with the feature
> > > ID registers from the architecture.
> >
> > The range already exists in the form of the "coprocessor" 0x14. I
> > don't see the need to expose it as RAZ/WI, however. If userspace
> > doesn't know about how this pseudo-register works, it won't be able to
> > program it anyway.
> >
> > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > id-register is also well defined (feature not implemented), and the
> > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > mechanism.
> >
> > Our firmware registers don't follow the same rules. Userspace can
> > write to them, and there is no such "not implemented" rule (case in
> > point, PSCI). We also have a separate enumeration mechanism
> > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > what is implemented.
> >
> > For these reasons, I don't immediately see the point of advertising a
> > set of registers ahead of time, before userspace grows an
> > understanding of what these registers mean.
> 
> Supposing we don't preallocate some hypercall ID registers, how can we
> safely migrate a guest from an older kernel to newer kernel? Ideally,
> we would preserve the hypercall feature set across the migration which
> could work for a while with the first set of registers that get
> defined, but whenever a new hypercall firmware register comes along
> then the VMM will be clueless to the new ABI.

My expectations were that whenever userspace writes a set of firmware
register, all the defaults are invalidated. For example say that
host-A know about a single hypercall register, while host-B knows
about two. Upon writing to the first register, the host would clear
the set of available services in the second one. If userspace
eventually writes there, the value would stick if valid.

Also, remember that pseudo-registers don't have to be 64bit. We could
define a new class of hypercall-specific registers that would be much
wider, and thus have a larger write granularity (KVM supports anything
from 8 to 2048 bits). This would make it pretty easy to implement the
above.

> Fundamentally, I don't think userspace should need a patch to preserve
> ABI on a newer kernel. Despite that, it would seem that userspace will
> need to learn of any firmware registers that control hypercall
> features which come after the initial set that gets proposed. If
> KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> architectural feature ID registers) from other parts of the vCPU
> state, it would be clear to what registers to zero on a newer kernel.
> Apologies if it is distracting to mention the feature ID registers
> here, but both are on my mind currently and want to make sure there is
> some consistency in how features get handled on newer kernels,
> architected or not.

The problem I see is that we will always need to grow the HC register
space one way or another, no matter how many we reserve. Your approach
only works if we don't exceed that particular range. Maybe it will
never be a problem, but it remains that this is not scalable.

If we wanted to be safe, we'd reserve the whole of the possible space
as defined by the SMCCC spec. Given that we currently have two HC
spaces (the ARM-controlled one, and the KVM-specific one), the
function space being 16bits in each case, that's 16kB worth of zeroes
that userspace has to save/restore at all times... I'm not overly
enthusiastic.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-10-01 11:43                         ` Marc Zyngier
  (?)
@ 2021-10-01 15:38                           ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-10-01 15:38 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 30 Sep 2021 18:24:23 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > Hey Marc,
> >
> > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Hi Oliver,
> > >
> > > On Wed, 29 Sep 2021 19:22:05 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > I have some lingering thoughts on this subject since we last spoke and
> > > > wanted to discuss.
> > > >
> > > > I'm having a hard time figuring out how a VMM should handle a new
> > > > hypercall identity register introduced on a newer kernel. In order to
> > > > maintain guest ABI, the VMM would need to know about that register and
> > > > zero it when restoring an older guest.
> > >
> > > Just as it would need to be able to discover any new system register
> > > exposed by default, as it happens at all times. Which is why we have a
> > > way to discover all the registers, architected or not.
> > >
> > > > Perhaps instead we could reserve a range of firmware registers as the
> > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > > default, encouraging userspace to zero these registers away for older
> > > > VMs but still allowing an old userspace to pick up new KVM features.
> > > > Doing so would align the hypercall identity registers with the feature
> > > > ID registers from the architecture.
> > >
> > > The range already exists in the form of the "coprocessor" 0x14. I
> > > don't see the need to expose it as RAZ/WI, however. If userspace
> > > doesn't know about how this pseudo-register works, it won't be able to
> > > program it anyway.
> > >
> > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > > id-register is also well defined (feature not implemented), and the
> > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > > mechanism.
> > >
> > > Our firmware registers don't follow the same rules. Userspace can
> > > write to them, and there is no such "not implemented" rule (case in
> > > point, PSCI). We also have a separate enumeration mechanism
> > > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > > what is implemented.
> > >
> > > For these reasons, I don't immediately see the point of advertising a
> > > set of registers ahead of time, before userspace grows an
> > > understanding of what these registers mean.
> >
> > Supposing we don't preallocate some hypercall ID registers, how can we
> > safely migrate a guest from an older kernel to newer kernel? Ideally,
> > we would preserve the hypercall feature set across the migration which
> > could work for a while with the first set of registers that get
> > defined, but whenever a new hypercall firmware register comes along
> > then the VMM will be clueless to the new ABI.
>
> My expectations were that whenever userspace writes a set of firmware
> register, all the defaults are invalidated. For example say that
> host-A know about a single hypercall register, while host-B knows
> about two. Upon writing to the first register, the host would clear
> the set of available services in the second one. If userspace
> eventually writes there, the value would stick if valid.
>
> Also, remember that pseudo-registers don't have to be 64bit. We could
> define a new class of hypercall-specific registers that would be much
> wider, and thus have a larger write granularity (KVM supports anything
> from 8 to 2048 bits). This would make it pretty easy to implement the
> above.
>
> > Fundamentally, I don't think userspace should need a patch to preserve
> > ABI on a newer kernel. Despite that, it would seem that userspace will
> > need to learn of any firmware registers that control hypercall
> > features which come after the initial set that gets proposed. If
> > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> > architectural feature ID registers) from other parts of the vCPU
> > state, it would be clear to what registers to zero on a newer kernel.
> > Apologies if it is distracting to mention the feature ID registers
> > here, but both are on my mind currently and want to make sure there is
> > some consistency in how features get handled on newer kernels,
> > architected or not.
>
> The problem I see is that we will always need to grow the HC register
> space one way or another, no matter how many we reserve. Your approach
> only works if we don't exceed that particular range. Maybe it will
> never be a problem, but it remains that this is not scalable.
>
> If we wanted to be safe, we'd reserve the whole of the possible space
> as defined by the SMCCC spec. Given that we currently have two HC
> spaces (the ARM-controlled one, and the KVM-specific one), the
> function space being 16bits in each case, that's 16kB worth of zeroes
> that userspace has to save/restore at all times... I'm not overly
> enthusiastic.

Yeah, that's definitely overkill.

I agree with your position; we can solve the issue altogether by
enforcing ordering on the hypercall registers. Userspace should read
all hypercall registers to discover features, then write them. Given
your suggestion, I don't know if there is much need for the guesswork
to conclude on an appropriate register size for the hypercall bitmaps.
64 bits seems fine to me..

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-10-01 15:38                           ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-10-01 15:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 30 Sep 2021 18:24:23 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > Hey Marc,
> >
> > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Hi Oliver,
> > >
> > > On Wed, 29 Sep 2021 19:22:05 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > I have some lingering thoughts on this subject since we last spoke and
> > > > wanted to discuss.
> > > >
> > > > I'm having a hard time figuring out how a VMM should handle a new
> > > > hypercall identity register introduced on a newer kernel. In order to
> > > > maintain guest ABI, the VMM would need to know about that register and
> > > > zero it when restoring an older guest.
> > >
> > > Just as it would need to be able to discover any new system register
> > > exposed by default, as it happens at all times. Which is why we have a
> > > way to discover all the registers, architected or not.
> > >
> > > > Perhaps instead we could reserve a range of firmware registers as the
> > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > > default, encouraging userspace to zero these registers away for older
> > > > VMs but still allowing an old userspace to pick up new KVM features.
> > > > Doing so would align the hypercall identity registers with the feature
> > > > ID registers from the architecture.
> > >
> > > The range already exists in the form of the "coprocessor" 0x14. I
> > > don't see the need to expose it as RAZ/WI, however. If userspace
> > > doesn't know about how this pseudo-register works, it won't be able to
> > > program it anyway.
> > >
> > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > > id-register is also well defined (feature not implemented), and the
> > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > > mechanism.
> > >
> > > Our firmware registers don't follow the same rules. Userspace can
> > > write to them, and there is no such "not implemented" rule (case in
> > > point, PSCI). We also have a separate enumeration mechanism
> > > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > > what is implemented.
> > >
> > > For these reasons, I don't immediately see the point of advertising a
> > > set of registers ahead of time, before userspace grows an
> > > understanding of what these registers mean.
> >
> > Supposing we don't preallocate some hypercall ID registers, how can we
> > safely migrate a guest from an older kernel to newer kernel? Ideally,
> > we would preserve the hypercall feature set across the migration which
> > could work for a while with the first set of registers that get
> > defined, but whenever a new hypercall firmware register comes along
> > then the VMM will be clueless to the new ABI.
>
> My expectations were that whenever userspace writes a set of firmware
> register, all the defaults are invalidated. For example say that
> host-A know about a single hypercall register, while host-B knows
> about two. Upon writing to the first register, the host would clear
> the set of available services in the second one. If userspace
> eventually writes there, the value would stick if valid.
>
> Also, remember that pseudo-registers don't have to be 64bit. We could
> define a new class of hypercall-specific registers that would be much
> wider, and thus have a larger write granularity (KVM supports anything
> from 8 to 2048 bits). This would make it pretty easy to implement the
> above.
>
> > Fundamentally, I don't think userspace should need a patch to preserve
> > ABI on a newer kernel. Despite that, it would seem that userspace will
> > need to learn of any firmware registers that control hypercall
> > features which come after the initial set that gets proposed. If
> > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> > architectural feature ID registers) from other parts of the vCPU
> > state, it would be clear to what registers to zero on a newer kernel.
> > Apologies if it is distracting to mention the feature ID registers
> > here, but both are on my mind currently and want to make sure there is
> > some consistency in how features get handled on newer kernels,
> > architected or not.
>
> The problem I see is that we will always need to grow the HC register
> space one way or another, no matter how many we reserve. Your approach
> only works if we don't exceed that particular range. Maybe it will
> never be a problem, but it remains that this is not scalable.
>
> If we wanted to be safe, we'd reserve the whole of the possible space
> as defined by the SMCCC spec. Given that we currently have two HC
> spaces (the ARM-controlled one, and the KVM-specific one), the
> function space being 16bits in each case, that's 16kB worth of zeroes
> that userspace has to save/restore at all times... I'm not overly
> enthusiastic.

Yeah, that's definitely overkill.

I agree with your position; we can solve the issue altogether by
enforcing ordering on the hypercall registers. Userspace should read
all hypercall registers to discover features, then write them. Given
your suggestion, I don't know if there is much need for the guesswork
to conclude on an appropriate register size for the hypercall bitmaps.
64 bits seems fine to me..

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2021-10-01 15:38                           ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2021-10-01 15:38 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, rananta, reijiw,
	jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 30 Sep 2021 18:24:23 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > Hey Marc,
> >
> > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > Hi Oliver,
> > >
> > > On Wed, 29 Sep 2021 19:22:05 +0100,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > I have some lingering thoughts on this subject since we last spoke and
> > > > wanted to discuss.
> > > >
> > > > I'm having a hard time figuring out how a VMM should handle a new
> > > > hypercall identity register introduced on a newer kernel. In order to
> > > > maintain guest ABI, the VMM would need to know about that register and
> > > > zero it when restoring an older guest.
> > >
> > > Just as it would need to be able to discover any new system register
> > > exposed by default, as it happens at all times. Which is why we have a
> > > way to discover all the registers, architected or not.
> > >
> > > > Perhaps instead we could reserve a range of firmware registers as the
> > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > > default, encouraging userspace to zero these registers away for older
> > > > VMs but still allowing an old userspace to pick up new KVM features.
> > > > Doing so would align the hypercall identity registers with the feature
> > > > ID registers from the architecture.
> > >
> > > The range already exists in the form of the "coprocessor" 0x14. I
> > > don't see the need to expose it as RAZ/WI, however. If userspace
> > > doesn't know about how this pseudo-register works, it won't be able to
> > > program it anyway.
> > >
> > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > > id-register is also well defined (feature not implemented), and the
> > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > > mechanism.
> > >
> > > Our firmware registers don't follow the same rules. Userspace can
> > > write to them, and there is no such "not implemented" rule (case in
> > > point, PSCI). We also have a separate enumeration mechanism
> > > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > > what is implemented.
> > >
> > > For these reasons, I don't immediately see the point of advertising a
> > > set of registers ahead of time, before userspace grows an
> > > understanding of what these registers mean.
> >
> > Supposing we don't preallocate some hypercall ID registers, how can we
> > safely migrate a guest from an older kernel to newer kernel? Ideally,
> > we would preserve the hypercall feature set across the migration which
> > could work for a while with the first set of registers that get
> > defined, but whenever a new hypercall firmware register comes along
> > then the VMM will be clueless to the new ABI.
>
> My expectations were that whenever userspace writes a set of firmware
> register, all the defaults are invalidated. For example say that
> host-A know about a single hypercall register, while host-B knows
> about two. Upon writing to the first register, the host would clear
> the set of available services in the second one. If userspace
> eventually writes there, the value would stick if valid.
>
> Also, remember that pseudo-registers don't have to be 64bit. We could
> define a new class of hypercall-specific registers that would be much
> wider, and thus have a larger write granularity (KVM supports anything
> from 8 to 2048 bits). This would make it pretty easy to implement the
> above.
>
> > Fundamentally, I don't think userspace should need a patch to preserve
> > ABI on a newer kernel. Despite that, it would seem that userspace will
> > need to learn of any firmware registers that control hypercall
> > features which come after the initial set that gets proposed. If
> > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> > architectural feature ID registers) from other parts of the vCPU
> > state, it would be clear to what registers to zero on a newer kernel.
> > Apologies if it is distracting to mention the feature ID registers
> > here, but both are on my mind currently and want to make sure there is
> > some consistency in how features get handled on newer kernels,
> > architected or not.
>
> The problem I see is that we will always need to grow the HC register
> space one way or another, no matter how many we reserve. Your approach
> only works if we don't exceed that particular range. Maybe it will
> never be a problem, but it remains that this is not scalable.
>
> If we wanted to be safe, we'd reserve the whole of the possible space
> as defined by the SMCCC spec. Given that we currently have two HC
> spaces (the ARM-controlled one, and the KVM-specific one), the
> function space being 16bits in each case, that's 16kB worth of zeroes
> that userspace has to save/restore at all times... I'm not overly
> enthusiastic.

Yeah, that's definitely overkill.

I agree with your position; we can solve the issue altogether by
enforcing ordering on the hypercall registers. Userspace should read
all hypercall registers to discover features, then write them. Given
your suggestion, I don't know if there is much need for the guesswork
to conclude on an appropriate register size for the hypercall bitmaps.
64 bits seems fine to me..

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2021-10-01 15:38                           ` Oliver Upton
  (?)
@ 2022-01-25  3:47                             ` Raghavendra Rao Ananta
  -1 siblings, 0 replies; 74+ messages in thread
From: Raghavendra Rao Ananta @ 2022-01-25  3:47 UTC (permalink / raw)
  To: Marc Zyngier, Andrew Jones; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

Hello all,

Based on the recent discussion on the patch '[RFC PATCH v3 01/11] KVM:
Capture VM start' [1], Reiji, I (and others in the team) were
wondering, since the hypercall feature-map is technically per-VM and
not per-vCPU, would it make more sense to present it as a kvm_device,
rather than vCPU psuedo-registers to the userspace?

If I understand correctly, the original motivation for going with
pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
save/restore across migration might write the same values for every
vCPU.

Granted that we would be diverging from the existing implementation
(psci versioning and spectre WA registers), but this can be a cleaner
way forward for extending hypercall support. The kvm_device interface
can be made backward compatible with the way hypercalls are exposed
today, it can have the same registers/features discovery mechanisms
that we discussed above, and majorly the userspace has to configure it
only once per-VM. We can probably make the feature selection immutable
just before any vCPU is created.

Please let me know your thoughts or any disadvantages that I'm overlooking.

Thanks,
Raghavendra

[1]: https://lore.kernel.org/kvmarm/CAJHc60zhRyOad7AqtEFn-Ptro5BGVkfpB2wXWGw5EZMxOHUc=w@mail.gmail.com/



On Fri, Oct 1, 2021 at 8:38 AM Oliver Upton <oupton@google.com> wrote:
>
> On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 30 Sep 2021 18:24:23 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > Hey Marc,
> > >
> > > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > Hi Oliver,
> > > >
> > > > On Wed, 29 Sep 2021 19:22:05 +0100,
> > > > Oliver Upton <oupton@google.com> wrote:
> > > > >
> > > > > I have some lingering thoughts on this subject since we last spoke and
> > > > > wanted to discuss.
> > > > >
> > > > > I'm having a hard time figuring out how a VMM should handle a new
> > > > > hypercall identity register introduced on a newer kernel. In order to
> > > > > maintain guest ABI, the VMM would need to know about that register and
> > > > > zero it when restoring an older guest.
> > > >
> > > > Just as it would need to be able to discover any new system register
> > > > exposed by default, as it happens at all times. Which is why we have a
> > > > way to discover all the registers, architected or not.
> > > >
> > > > > Perhaps instead we could reserve a range of firmware registers as the
> > > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > > > default, encouraging userspace to zero these registers away for older
> > > > > VMs but still allowing an old userspace to pick up new KVM features.
> > > > > Doing so would align the hypercall identity registers with the feature
> > > > > ID registers from the architecture.
> > > >
> > > > The range already exists in the form of the "coprocessor" 0x14. I
> > > > don't see the need to expose it as RAZ/WI, however. If userspace
> > > > doesn't know about how this pseudo-register works, it won't be able to
> > > > program it anyway.
> > > >
> > > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > > > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > > > id-register is also well defined (feature not implemented), and the
> > > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > > > mechanism.
> > > >
> > > > Our firmware registers don't follow the same rules. Userspace can
> > > > write to them, and there is no such "not implemented" rule (case in
> > > > point, PSCI). We also have a separate enumeration mechanism
> > > > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > > > what is implemented.
> > > >
> > > > For these reasons, I don't immediately see the point of advertising a
> > > > set of registers ahead of time, before userspace grows an
> > > > understanding of what these registers mean.
> > >
> > > Supposing we don't preallocate some hypercall ID registers, how can we
> > > safely migrate a guest from an older kernel to newer kernel? Ideally,
> > > we would preserve the hypercall feature set across the migration which
> > > could work for a while with the first set of registers that get
> > > defined, but whenever a new hypercall firmware register comes along
> > > then the VMM will be clueless to the new ABI.
> >
> > My expectations were that whenever userspace writes a set of firmware
> > register, all the defaults are invalidated. For example say that
> > host-A know about a single hypercall register, while host-B knows
> > about two. Upon writing to the first register, the host would clear
> > the set of available services in the second one. If userspace
> > eventually writes there, the value would stick if valid.
> >
> > Also, remember that pseudo-registers don't have to be 64bit. We could
> > define a new class of hypercall-specific registers that would be much
> > wider, and thus have a larger write granularity (KVM supports anything
> > from 8 to 2048 bits). This would make it pretty easy to implement the
> > above.
> >
> > > Fundamentally, I don't think userspace should need a patch to preserve
> > > ABI on a newer kernel. Despite that, it would seem that userspace will
> > > need to learn of any firmware registers that control hypercall
> > > features which come after the initial set that gets proposed. If
> > > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> > > architectural feature ID registers) from other parts of the vCPU
> > > state, it would be clear to what registers to zero on a newer kernel.
> > > Apologies if it is distracting to mention the feature ID registers
> > > here, but both are on my mind currently and want to make sure there is
> > > some consistency in how features get handled on newer kernels,
> > > architected or not.
> >
> > The problem I see is that we will always need to grow the HC register
> > space one way or another, no matter how many we reserve. Your approach
> > only works if we don't exceed that particular range. Maybe it will
> > never be a problem, but it remains that this is not scalable.
> >
> > If we wanted to be safe, we'd reserve the whole of the possible space
> > as defined by the SMCCC spec. Given that we currently have two HC
> > spaces (the ARM-controlled one, and the KVM-specific one), the
> > function space being 16bits in each case, that's 16kB worth of zeroes
> > that userspace has to save/restore at all times... I'm not overly
> > enthusiastic.
>
> Yeah, that's definitely overkill.
>
> I agree with your position; we can solve the issue altogether by
> enforcing ordering on the hypercall registers. Userspace should read
> all hypercall registers to discover features, then write them. Given
> your suggestion, I don't know if there is much need for the guesswork
> to conclude on an appropriate register size for the hypercall bitmaps.
> 64 bits seems fine to me..
>
> --
> Thanks,
> Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-01-25  3:47                             ` Raghavendra Rao Ananta
  0 siblings, 0 replies; 74+ messages in thread
From: Raghavendra Rao Ananta @ 2022-01-25  3:47 UTC (permalink / raw)
  To: Marc Zyngier, Andrew Jones
  Cc: kvmarm, pshier, ricarkol, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose,
	Peter Maydell, Sean Christopherson, Oliver Upton

Hello all,

Based on the recent discussion on the patch '[RFC PATCH v3 01/11] KVM:
Capture VM start' [1], Reiji, I (and others in the team) were
wondering, since the hypercall feature-map is technically per-VM and
not per-vCPU, would it make more sense to present it as a kvm_device,
rather than vCPU psuedo-registers to the userspace?

If I understand correctly, the original motivation for going with
pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
save/restore across migration might write the same values for every
vCPU.

Granted that we would be diverging from the existing implementation
(psci versioning and spectre WA registers), but this can be a cleaner
way forward for extending hypercall support. The kvm_device interface
can be made backward compatible with the way hypercalls are exposed
today, it can have the same registers/features discovery mechanisms
that we discussed above, and majorly the userspace has to configure it
only once per-VM. We can probably make the feature selection immutable
just before any vCPU is created.

Please let me know your thoughts or any disadvantages that I'm overlooking.

Thanks,
Raghavendra

[1]: https://lore.kernel.org/kvmarm/CAJHc60zhRyOad7AqtEFn-Ptro5BGVkfpB2wXWGw5EZMxOHUc=w@mail.gmail.com/



On Fri, Oct 1, 2021 at 8:38 AM Oliver Upton <oupton@google.com> wrote:
>
> On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 30 Sep 2021 18:24:23 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > Hey Marc,
> > >
> > > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > Hi Oliver,
> > > >
> > > > On Wed, 29 Sep 2021 19:22:05 +0100,
> > > > Oliver Upton <oupton@google.com> wrote:
> > > > >
> > > > > I have some lingering thoughts on this subject since we last spoke and
> > > > > wanted to discuss.
> > > > >
> > > > > I'm having a hard time figuring out how a VMM should handle a new
> > > > > hypercall identity register introduced on a newer kernel. In order to
> > > > > maintain guest ABI, the VMM would need to know about that register and
> > > > > zero it when restoring an older guest.
> > > >
> > > > Just as it would need to be able to discover any new system register
> > > > exposed by default, as it happens at all times. Which is why we have a
> > > > way to discover all the registers, architected or not.
> > > >
> > > > > Perhaps instead we could reserve a range of firmware registers as the
> > > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > > > default, encouraging userspace to zero these registers away for older
> > > > > VMs but still allowing an old userspace to pick up new KVM features.
> > > > > Doing so would align the hypercall identity registers with the feature
> > > > > ID registers from the architecture.
> > > >
> > > > The range already exists in the form of the "coprocessor" 0x14. I
> > > > don't see the need to expose it as RAZ/WI, however. If userspace
> > > > doesn't know about how this pseudo-register works, it won't be able to
> > > > program it anyway.
> > > >
> > > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > > > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > > > id-register is also well defined (feature not implemented), and the
> > > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > > > mechanism.
> > > >
> > > > Our firmware registers don't follow the same rules. Userspace can
> > > > write to them, and there is no such "not implemented" rule (case in
> > > > point, PSCI). We also have a separate enumeration mechanism
> > > > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > > > what is implemented.
> > > >
> > > > For these reasons, I don't immediately see the point of advertising a
> > > > set of registers ahead of time, before userspace grows an
> > > > understanding of what these registers mean.
> > >
> > > Supposing we don't preallocate some hypercall ID registers, how can we
> > > safely migrate a guest from an older kernel to newer kernel? Ideally,
> > > we would preserve the hypercall feature set across the migration which
> > > could work for a while with the first set of registers that get
> > > defined, but whenever a new hypercall firmware register comes along
> > > then the VMM will be clueless to the new ABI.
> >
> > My expectations were that whenever userspace writes a set of firmware
> > register, all the defaults are invalidated. For example say that
> > host-A know about a single hypercall register, while host-B knows
> > about two. Upon writing to the first register, the host would clear
> > the set of available services in the second one. If userspace
> > eventually writes there, the value would stick if valid.
> >
> > Also, remember that pseudo-registers don't have to be 64bit. We could
> > define a new class of hypercall-specific registers that would be much
> > wider, and thus have a larger write granularity (KVM supports anything
> > from 8 to 2048 bits). This would make it pretty easy to implement the
> > above.
> >
> > > Fundamentally, I don't think userspace should need a patch to preserve
> > > ABI on a newer kernel. Despite that, it would seem that userspace will
> > > need to learn of any firmware registers that control hypercall
> > > features which come after the initial set that gets proposed. If
> > > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> > > architectural feature ID registers) from other parts of the vCPU
> > > state, it would be clear to what registers to zero on a newer kernel.
> > > Apologies if it is distracting to mention the feature ID registers
> > > here, but both are on my mind currently and want to make sure there is
> > > some consistency in how features get handled on newer kernels,
> > > architected or not.
> >
> > The problem I see is that we will always need to grow the HC register
> > space one way or another, no matter how many we reserve. Your approach
> > only works if we don't exceed that particular range. Maybe it will
> > never be a problem, but it remains that this is not scalable.
> >
> > If we wanted to be safe, we'd reserve the whole of the possible space
> > as defined by the SMCCC spec. Given that we currently have two HC
> > spaces (the ARM-controlled one, and the KVM-specific one), the
> > function space being 16bits in each case, that's 16kB worth of zeroes
> > that userspace has to save/restore at all times... I'm not overly
> > enthusiastic.
>
> Yeah, that's definitely overkill.
>
> I agree with your position; we can solve the issue altogether by
> enforcing ordering on the hypercall registers. Userspace should read
> all hypercall registers to discover features, then write them. Given
> your suggestion, I don't know if there is much need for the guesswork
> to conclude on an appropriate register size for the hypercall bitmaps.
> 64 bits seems fine to me..
>
> --
> Thanks,
> Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-01-25  3:47                             ` Raghavendra Rao Ananta
  0 siblings, 0 replies; 74+ messages in thread
From: Raghavendra Rao Ananta @ 2022-01-25  3:47 UTC (permalink / raw)
  To: Marc Zyngier, Andrew Jones
  Cc: kvmarm, pshier, ricarkol, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose,
	Peter Maydell, Sean Christopherson, Oliver Upton

Hello all,

Based on the recent discussion on the patch '[RFC PATCH v3 01/11] KVM:
Capture VM start' [1], Reiji, I (and others in the team) were
wondering, since the hypercall feature-map is technically per-VM and
not per-vCPU, would it make more sense to present it as a kvm_device,
rather than vCPU psuedo-registers to the userspace?

If I understand correctly, the original motivation for going with
pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
save/restore across migration might write the same values for every
vCPU.

Granted that we would be diverging from the existing implementation
(psci versioning and spectre WA registers), but this can be a cleaner
way forward for extending hypercall support. The kvm_device interface
can be made backward compatible with the way hypercalls are exposed
today, it can have the same registers/features discovery mechanisms
that we discussed above, and majorly the userspace has to configure it
only once per-VM. We can probably make the feature selection immutable
just before any vCPU is created.

Please let me know your thoughts or any disadvantages that I'm overlooking.

Thanks,
Raghavendra

[1]: https://lore.kernel.org/kvmarm/CAJHc60zhRyOad7AqtEFn-Ptro5BGVkfpB2wXWGw5EZMxOHUc=w@mail.gmail.com/



On Fri, Oct 1, 2021 at 8:38 AM Oliver Upton <oupton@google.com> wrote:
>
> On Fri, Oct 1, 2021 at 4:43 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 30 Sep 2021 18:24:23 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > Hey Marc,
> > >
> > > On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier <maz@kernel.org> wrote:
> > > >
> > > > Hi Oliver,
> > > >
> > > > On Wed, 29 Sep 2021 19:22:05 +0100,
> > > > Oliver Upton <oupton@google.com> wrote:
> > > > >
> > > > > I have some lingering thoughts on this subject since we last spoke and
> > > > > wanted to discuss.
> > > > >
> > > > > I'm having a hard time figuring out how a VMM should handle a new
> > > > > hypercall identity register introduced on a newer kernel. In order to
> > > > > maintain guest ABI, the VMM would need to know about that register and
> > > > > zero it when restoring an older guest.
> > > >
> > > > Just as it would need to be able to discover any new system register
> > > > exposed by default, as it happens at all times. Which is why we have a
> > > > way to discover all the registers, architected or not.
> > > >
> > > > > Perhaps instead we could reserve a range of firmware registers as the
> > > > > 'hypercall identity' registers. Implement all of them as RAZ/WI by
> > > > > default, encouraging userspace to zero these registers away for older
> > > > > VMs but still allowing an old userspace to pick up new KVM features.
> > > > > Doing so would align the hypercall identity registers with the feature
> > > > > ID registers from the architecture.
> > > >
> > > > The range already exists in the form of the "coprocessor" 0x14. I
> > > > don't see the need to expose it as RAZ/WI, however. If userspace
> > > > doesn't know about how this pseudo-register works, it won't be able to
> > > > program it anyway.
> > > >
> > > > I don't buy the parallel with the ID-regs either. They are RAZ/WI by
> > > > default so that they don't UNDEF at runtime. The meaning of a RAZ
> > > > id-register is also well defined (feature not implemented), and the
> > > > CPU cannot write to them. In a way, the ID-regs *are* the enumeration
> > > > mechanism.
> > > >
> > > > Our firmware registers don't follow the same rules. Userspace can
> > > > write to them, and there is no such "not implemented" rule (case in
> > > > point, PSCI). We also have a separate enumeration mechanism
> > > > (GET_ONE_REG), which is (more or less) designed for userspace to find
> > > > what is implemented.
> > > >
> > > > For these reasons, I don't immediately see the point of advertising a
> > > > set of registers ahead of time, before userspace grows an
> > > > understanding of what these registers mean.
> > >
> > > Supposing we don't preallocate some hypercall ID registers, how can we
> > > safely migrate a guest from an older kernel to newer kernel? Ideally,
> > > we would preserve the hypercall feature set across the migration which
> > > could work for a while with the first set of registers that get
> > > defined, but whenever a new hypercall firmware register comes along
> > > then the VMM will be clueless to the new ABI.
> >
> > My expectations were that whenever userspace writes a set of firmware
> > register, all the defaults are invalidated. For example say that
> > host-A know about a single hypercall register, while host-B knows
> > about two. Upon writing to the first register, the host would clear
> > the set of available services in the second one. If userspace
> > eventually writes there, the value would stick if valid.
> >
> > Also, remember that pseudo-registers don't have to be 64bit. We could
> > define a new class of hypercall-specific registers that would be much
> > wider, and thus have a larger write granularity (KVM supports anything
> > from 8 to 2048 bits). This would make it pretty easy to implement the
> > above.
> >
> > > Fundamentally, I don't think userspace should need a patch to preserve
> > > ABI on a newer kernel. Despite that, it would seem that userspace will
> > > need to learn of any firmware registers that control hypercall
> > > features which come after the initial set that gets proposed. If
> > > KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
> > > architectural feature ID registers) from other parts of the vCPU
> > > state, it would be clear to what registers to zero on a newer kernel.
> > > Apologies if it is distracting to mention the feature ID registers
> > > here, but both are on my mind currently and want to make sure there is
> > > some consistency in how features get handled on newer kernels,
> > > architected or not.
> >
> > The problem I see is that we will always need to grow the HC register
> > space one way or another, no matter how many we reserve. Your approach
> > only works if we don't exceed that particular range. Maybe it will
> > never be a problem, but it remains that this is not scalable.
> >
> > If we wanted to be safe, we'd reserve the whole of the possible space
> > as defined by the SMCCC spec. Given that we currently have two HC
> > spaces (the ARM-controlled one, and the KVM-specific one), the
> > function space being 16bits in each case, that's 16kB worth of zeroes
> > that userspace has to save/restore at all times... I'm not overly
> > enthusiastic.
>
> Yeah, that's definitely overkill.
>
> I agree with your position; we can solve the issue altogether by
> enforcing ordering on the hypercall registers. Userspace should read
> all hypercall registers to discover features, then write them. Given
> your suggestion, I don't know if there is much need for the guesswork
> to conclude on an appropriate register size for the hypercall bitmaps.
> 64 bits seems fine to me..
>
> --
> Thanks,
> Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2022-01-25  3:47                             ` Raghavendra Rao Ananta
  (?)
@ 2022-01-25  8:45                               ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-01-25  8:45 UTC (permalink / raw)
  To: Raghavendra Rao Ananta; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Tue, 25 Jan 2022 03:47:04 +0000,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> Hello all,
> 
> Based on the recent discussion on the patch '[RFC PATCH v3 01/11] KVM:
> Capture VM start' [1], Reiji, I (and others in the team) were
> wondering, since the hypercall feature-map is technically per-VM and
> not per-vCPU, would it make more sense to present it as a kvm_device,
> rather than vCPU psuedo-registers to the userspace?

I really don't think so.

> If I understand correctly, the original motivation for going with
> pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> save/restore across migration might write the same values for every
> vCPU.

KVM currently restricts the vcpu features to be unified across vcpus,
but that's only an implementation choice. The ARM architecture doesn't
mandate that these registers are all the same, and it isn't impossible
that we'd allow for the feature set to become per-vcpu at some point
in time. So this argument doesn't really hold.

Furthermore, compatibility with QEMU's save/restore model is
essential, and AFAICT, there is no open source alternative.

> Granted that we would be diverging from the existing implementation
> (psci versioning and spectre WA registers), but this can be a cleaner
> way forward for extending hypercall support.

Cleaner? Why? How? You'd have the exact same constraints, plus the
disadvantages of having to maintain a new interface concurrently with
the existing ones.

> The kvm_device interface
> can be made backward compatible with the way hypercalls are exposed
> today, it can have the same registers/features discovery mechanisms
> that we discussed above, and majorly the userspace has to configure it
> only once per-VM. We can probably make the feature selection immutable
> just before any vCPU is created.

What is the problem with immutability on first run? Or even with a
'finalize' ioctl (we already have one)?

> 
> Please let me know your thoughts or any disadvantages that I'm overlooking.

A device means yet another configuration and migration API. Don't you
think we have enough of those? The complexity of KVM/arm64 userspace
API is already insane, and extremely fragile. Adding to it will be a
validation nightmare (it already is, and I don't see anyone actively
helping with it).

So no, I have no plan to consider anything but the GET/SET_ONE_REG
interface, and until someone writes another open source VMM that has
the same save/restore capabilities and proves that this interface
isn't fit for purpose, I see no incentive in deviating from a model
that is known to work.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-01-25  8:45                               ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-01-25  8:45 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose,
	Peter Maydell, Sean Christopherson, Oliver Upton

On Tue, 25 Jan 2022 03:47:04 +0000,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> Hello all,
> 
> Based on the recent discussion on the patch '[RFC PATCH v3 01/11] KVM:
> Capture VM start' [1], Reiji, I (and others in the team) were
> wondering, since the hypercall feature-map is technically per-VM and
> not per-vCPU, would it make more sense to present it as a kvm_device,
> rather than vCPU psuedo-registers to the userspace?

I really don't think so.

> If I understand correctly, the original motivation for going with
> pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> save/restore across migration might write the same values for every
> vCPU.

KVM currently restricts the vcpu features to be unified across vcpus,
but that's only an implementation choice. The ARM architecture doesn't
mandate that these registers are all the same, and it isn't impossible
that we'd allow for the feature set to become per-vcpu at some point
in time. So this argument doesn't really hold.

Furthermore, compatibility with QEMU's save/restore model is
essential, and AFAICT, there is no open source alternative.

> Granted that we would be diverging from the existing implementation
> (psci versioning and spectre WA registers), but this can be a cleaner
> way forward for extending hypercall support.

Cleaner? Why? How? You'd have the exact same constraints, plus the
disadvantages of having to maintain a new interface concurrently with
the existing ones.

> The kvm_device interface
> can be made backward compatible with the way hypercalls are exposed
> today, it can have the same registers/features discovery mechanisms
> that we discussed above, and majorly the userspace has to configure it
> only once per-VM. We can probably make the feature selection immutable
> just before any vCPU is created.

What is the problem with immutability on first run? Or even with a
'finalize' ioctl (we already have one)?

> 
> Please let me know your thoughts or any disadvantages that I'm overlooking.

A device means yet another configuration and migration API. Don't you
think we have enough of those? The complexity of KVM/arm64 userspace
API is already insane, and extremely fragile. Adding to it will be a
validation nightmare (it already is, and I don't see anyone actively
helping with it).

So no, I have no plan to consider anything but the GET/SET_ONE_REG
interface, and until someone writes another open source VMM that has
the same save/restore capabilities and proves that this interface
isn't fit for purpose, I see no incentive in deviating from a model
that is known to work.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-01-25  8:45                               ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-01-25  8:45 UTC (permalink / raw)
  To: Raghavendra Rao Ananta
  Cc: Andrew Jones, kvmarm, pshier, ricarkol, reijiw, jingzhangos, kvm,
	linux-arm-kernel, james.morse, alexandru.elisei, suzuki.poulose,
	Peter Maydell, Sean Christopherson, Oliver Upton

On Tue, 25 Jan 2022 03:47:04 +0000,
Raghavendra Rao Ananta <rananta@google.com> wrote:
> 
> Hello all,
> 
> Based on the recent discussion on the patch '[RFC PATCH v3 01/11] KVM:
> Capture VM start' [1], Reiji, I (and others in the team) were
> wondering, since the hypercall feature-map is technically per-VM and
> not per-vCPU, would it make more sense to present it as a kvm_device,
> rather than vCPU psuedo-registers to the userspace?

I really don't think so.

> If I understand correctly, the original motivation for going with
> pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> save/restore across migration might write the same values for every
> vCPU.

KVM currently restricts the vcpu features to be unified across vcpus,
but that's only an implementation choice. The ARM architecture doesn't
mandate that these registers are all the same, and it isn't impossible
that we'd allow for the feature set to become per-vcpu at some point
in time. So this argument doesn't really hold.

Furthermore, compatibility with QEMU's save/restore model is
essential, and AFAICT, there is no open source alternative.

> Granted that we would be diverging from the existing implementation
> (psci versioning and spectre WA registers), but this can be a cleaner
> way forward for extending hypercall support.

Cleaner? Why? How? You'd have the exact same constraints, plus the
disadvantages of having to maintain a new interface concurrently with
the existing ones.

> The kvm_device interface
> can be made backward compatible with the way hypercalls are exposed
> today, it can have the same registers/features discovery mechanisms
> that we discussed above, and majorly the userspace has to configure it
> only once per-VM. We can probably make the feature selection immutable
> just before any vCPU is created.

What is the problem with immutability on first run? Or even with a
'finalize' ioctl (we already have one)?

> 
> Please let me know your thoughts or any disadvantages that I'm overlooking.

A device means yet another configuration and migration API. Don't you
think we have enough of those? The complexity of KVM/arm64 userspace
API is already insane, and extremely fragile. Adding to it will be a
validation nightmare (it already is, and I don't see anyone actively
helping with it).

So no, I have no plan to consider anything but the GET/SET_ONE_REG
interface, and until someone writes another open source VMM that has
the same save/restore capabilities and proves that this interface
isn't fit for purpose, I see no incentive in deviating from a model
that is known to work.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2022-01-25  8:45                               ` Marc Zyngier
  (?)
@ 2022-01-25 17:29                                 ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2022-01-25 17:29 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

Hi Marc,

On Tue, Jan 25, 2022 at 12:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > If I understand correctly, the original motivation for going with
> > pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> > and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> > save/restore across migration might write the same values for every
> > vCPU.
>
> KVM currently restricts the vcpu features to be unified across vcpus,
> but that's only an implementation choice.

But that implementation choice has become ABI, no? How could support
for asymmetry be added without requiring userspace opt-in or breaking
existing VMMs that depend on feature unification?

> The ARM architecture doesn't
> mandate that these registers are all the same, and it isn't impossible
> that we'd allow for the feature set to become per-vcpu at some point
> in time. So this argument doesn't really hold.

Accessing per-VM state N times is bound to increase VM blackout time
during migrations ~linearly as the number of vCPUs in a VM increases,
since a VM scoped lock is necessary to serialize guest accesses. It
could be tolerable at present scale, but seems like in the future it
could become a real problem.

> Furthermore, compatibility with QEMU's save/restore model is
> essential, and AFAICT, there is no open source alternative.

Agree fundamentally, but I believe it is entirely reasonable to
require a userspace change to adopt a new KVM feature. Otherwise, we
may be trying to shoehorn new features into existing UAPI that may not
be a precise fit..

In order to cure the serialization mentioned above, two options are
top of mind: accessing the VM state with the VM FD or informing
userspace that a set of registers need only be written once for an
entire VM. If we add support for asymmetry later down the road, that
would become an opt-in such that userspace will do the access
per-vCPU.

> A device means yet another configuration and migration API. Don't you
> think we have enough of those? The complexity of KVM/arm64 userspace
> API is already insane, and extremely fragile. Adding to it will be a
> validation nightmare (it already is, and I don't see anyone actively
> helping with it).

It seems equally fragile to introduce VM-wide serialization to vCPU
UAPI that we know is in the live migration critical path for _any_
VMM. Without requiring userspace changes for all the new widgets under
discussion we're effectively forcing VMMs to do something suboptimal.

--
Thanks,
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-01-25 17:29                                 ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2022-01-25 17:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Raghavendra Rao Ananta, Andrew Jones, kvmarm, pshier, ricarkol,
	reijiw, jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell,
	Sean Christopherson

Hi Marc,

On Tue, Jan 25, 2022 at 12:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > If I understand correctly, the original motivation for going with
> > pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> > and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> > save/restore across migration might write the same values for every
> > vCPU.
>
> KVM currently restricts the vcpu features to be unified across vcpus,
> but that's only an implementation choice.

But that implementation choice has become ABI, no? How could support
for asymmetry be added without requiring userspace opt-in or breaking
existing VMMs that depend on feature unification?

> The ARM architecture doesn't
> mandate that these registers are all the same, and it isn't impossible
> that we'd allow for the feature set to become per-vcpu at some point
> in time. So this argument doesn't really hold.

Accessing per-VM state N times is bound to increase VM blackout time
during migrations ~linearly as the number of vCPUs in a VM increases,
since a VM scoped lock is necessary to serialize guest accesses. It
could be tolerable at present scale, but seems like in the future it
could become a real problem.

> Furthermore, compatibility with QEMU's save/restore model is
> essential, and AFAICT, there is no open source alternative.

Agree fundamentally, but I believe it is entirely reasonable to
require a userspace change to adopt a new KVM feature. Otherwise, we
may be trying to shoehorn new features into existing UAPI that may not
be a precise fit..

In order to cure the serialization mentioned above, two options are
top of mind: accessing the VM state with the VM FD or informing
userspace that a set of registers need only be written once for an
entire VM. If we add support for asymmetry later down the road, that
would become an opt-in such that userspace will do the access
per-vCPU.

> A device means yet another configuration and migration API. Don't you
> think we have enough of those? The complexity of KVM/arm64 userspace
> API is already insane, and extremely fragile. Adding to it will be a
> validation nightmare (it already is, and I don't see anyone actively
> helping with it).

It seems equally fragile to introduce VM-wide serialization to vCPU
UAPI that we know is in the live migration critical path for _any_
VMM. Without requiring userspace changes for all the new widgets under
discussion we're effectively forcing VMMs to do something suboptimal.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-01-25 17:29                                 ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2022-01-25 17:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Raghavendra Rao Ananta, Andrew Jones, kvmarm, pshier, ricarkol,
	reijiw, jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell,
	Sean Christopherson

Hi Marc,

On Tue, Jan 25, 2022 at 12:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > If I understand correctly, the original motivation for going with
> > pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> > and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> > save/restore across migration might write the same values for every
> > vCPU.
>
> KVM currently restricts the vcpu features to be unified across vcpus,
> but that's only an implementation choice.

But that implementation choice has become ABI, no? How could support
for asymmetry be added without requiring userspace opt-in or breaking
existing VMMs that depend on feature unification?

> The ARM architecture doesn't
> mandate that these registers are all the same, and it isn't impossible
> that we'd allow for the feature set to become per-vcpu at some point
> in time. So this argument doesn't really hold.

Accessing per-VM state N times is bound to increase VM blackout time
during migrations ~linearly as the number of vCPUs in a VM increases,
since a VM scoped lock is necessary to serialize guest accesses. It
could be tolerable at present scale, but seems like in the future it
could become a real problem.

> Furthermore, compatibility with QEMU's save/restore model is
> essential, and AFAICT, there is no open source alternative.

Agree fundamentally, but I believe it is entirely reasonable to
require a userspace change to adopt a new KVM feature. Otherwise, we
may be trying to shoehorn new features into existing UAPI that may not
be a precise fit..

In order to cure the serialization mentioned above, two options are
top of mind: accessing the VM state with the VM FD or informing
userspace that a set of registers need only be written once for an
entire VM. If we add support for asymmetry later down the road, that
would become an opt-in such that userspace will do the access
per-vCPU.

> A device means yet another configuration and migration API. Don't you
> think we have enough of those? The complexity of KVM/arm64 userspace
> API is already insane, and extremely fragile. Adding to it will be a
> validation nightmare (it already is, and I don't see anyone actively
> helping with it).

It seems equally fragile to introduce VM-wide serialization to vCPU
UAPI that we know is in the live migration critical path for _any_
VMM. Without requiring userspace changes for all the new widgets under
discussion we're effectively forcing VMMs to do something suboptimal.

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2022-01-25 17:29                                 ` Oliver Upton
  (?)
@ 2022-02-08  9:46                                   ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-02-08  9:46 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

Huh, somewhat missed that email, apologies for the delay.

On Tue, 25 Jan 2022 17:29:13 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Hi Marc,
> 
> On Tue, Jan 25, 2022 at 12:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > If I understand correctly, the original motivation for going with
> > > pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> > > and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> > > save/restore across migration might write the same values for every
> > > vCPU.
> >
> > KVM currently restricts the vcpu features to be unified across vcpus,
> > but that's only an implementation choice.
> 
> But that implementation choice has become ABI, no? How could support
> for asymmetry be added without requiring userspace opt-in or breaking
> existing VMMs that depend on feature unification?

Of course, you'd need some sort of advertising of this new behaviour.

One thing I would like to add to the current state of thing is an
indication of whether the effects of a sysreg being written from
userspace are global or local to a vcpu. You'd need a new capability,
and an extra flag added to the encoding of each register.

> 
> > The ARM architecture doesn't
> > mandate that these registers are all the same, and it isn't impossible
> > that we'd allow for the feature set to become per-vcpu at some point
> > in time. So this argument doesn't really hold.
> 
> Accessing per-VM state N times is bound to increase VM blackout time
> during migrations ~linearly as the number of vCPUs in a VM increases,
> since a VM scoped lock is necessary to serialize guest accesses. It
> could be tolerable at present scale, but seems like in the future it
> could become a real problem.

I don't disagree. But I doubt switching to a different API altogether
is the solution to this.

> 
> > Furthermore, compatibility with QEMU's save/restore model is
> > essential, and AFAICT, there is no open source alternative.
> 
> Agree fundamentally, but I believe it is entirely reasonable to
> require a userspace change to adopt a new KVM feature. Otherwise, we
> may be trying to shoehorn new features into existing UAPI that may not
> be a precise fit..

But the very purpose of this API is to support discoverability. If we
can't support it, then we might as well declare the whole API
deprecated and restart from scratch.

No, I'm not seriously suggesting this :-/.

> In order to cure the serialization mentioned above, two options are
> top of mind: accessing the VM state with the VM FD or informing
> userspace that a set of registers need only be written once for an
> entire VM. If we add support for asymmetry later down the road, that
> would become an opt-in such that userspace will do the access
> per-vCPU.

It is the latter that I'm suggesting.

>
> > A device means yet another configuration and migration API. Don't you
> > think we have enough of those? The complexity of KVM/arm64 userspace
> > API is already insane, and extremely fragile. Adding to it will be a
> > validation nightmare (it already is, and I don't see anyone actively
> > helping with it).
> 
> It seems equally fragile to introduce VM-wide serialization to vCPU
> UAPI that we know is in the live migration critical path for _any_
> VMM. Without requiring userspace changes for all the new widgets under
> discussion we're effectively forcing VMMs to do something suboptimal.

I'm perfectly happy with suboptimal to start with, and find ways to
make it better once we have a clear path forward. I just don't want to
conflate the two.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08  9:46                                   ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-02-08  9:46 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Raghavendra Rao Ananta, Andrew Jones, kvmarm, pshier, ricarkol,
	reijiw, jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell,
	Sean Christopherson

Huh, somewhat missed that email, apologies for the delay.

On Tue, 25 Jan 2022 17:29:13 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Hi Marc,
> 
> On Tue, Jan 25, 2022 at 12:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > If I understand correctly, the original motivation for going with
> > > pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> > > and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> > > save/restore across migration might write the same values for every
> > > vCPU.
> >
> > KVM currently restricts the vcpu features to be unified across vcpus,
> > but that's only an implementation choice.
> 
> But that implementation choice has become ABI, no? How could support
> for asymmetry be added without requiring userspace opt-in or breaking
> existing VMMs that depend on feature unification?

Of course, you'd need some sort of advertising of this new behaviour.

One thing I would like to add to the current state of thing is an
indication of whether the effects of a sysreg being written from
userspace are global or local to a vcpu. You'd need a new capability,
and an extra flag added to the encoding of each register.

> 
> > The ARM architecture doesn't
> > mandate that these registers are all the same, and it isn't impossible
> > that we'd allow for the feature set to become per-vcpu at some point
> > in time. So this argument doesn't really hold.
> 
> Accessing per-VM state N times is bound to increase VM blackout time
> during migrations ~linearly as the number of vCPUs in a VM increases,
> since a VM scoped lock is necessary to serialize guest accesses. It
> could be tolerable at present scale, but seems like in the future it
> could become a real problem.

I don't disagree. But I doubt switching to a different API altogether
is the solution to this.

> 
> > Furthermore, compatibility with QEMU's save/restore model is
> > essential, and AFAICT, there is no open source alternative.
> 
> Agree fundamentally, but I believe it is entirely reasonable to
> require a userspace change to adopt a new KVM feature. Otherwise, we
> may be trying to shoehorn new features into existing UAPI that may not
> be a precise fit..

But the very purpose of this API is to support discoverability. If we
can't support it, then we might as well declare the whole API
deprecated and restart from scratch.

No, I'm not seriously suggesting this :-/.

> In order to cure the serialization mentioned above, two options are
> top of mind: accessing the VM state with the VM FD or informing
> userspace that a set of registers need only be written once for an
> entire VM. If we add support for asymmetry later down the road, that
> would become an opt-in such that userspace will do the access
> per-vCPU.

It is the latter that I'm suggesting.

>
> > A device means yet another configuration and migration API. Don't you
> > think we have enough of those? The complexity of KVM/arm64 userspace
> > API is already insane, and extremely fragile. Adding to it will be a
> > validation nightmare (it already is, and I don't see anyone actively
> > helping with it).
> 
> It seems equally fragile to introduce VM-wide serialization to vCPU
> UAPI that we know is in the live migration critical path for _any_
> VMM. Without requiring userspace changes for all the new widgets under
> discussion we're effectively forcing VMMs to do something suboptimal.

I'm perfectly happy with suboptimal to start with, and find ways to
make it better once we have a clear path forward. I just don't want to
conflate the two.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08  9:46                                   ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-02-08  9:46 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Raghavendra Rao Ananta, Andrew Jones, kvmarm, pshier, ricarkol,
	reijiw, jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell,
	Sean Christopherson

Huh, somewhat missed that email, apologies for the delay.

On Tue, 25 Jan 2022 17:29:13 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Hi Marc,
> 
> On Tue, Jan 25, 2022 at 12:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > If I understand correctly, the original motivation for going with
> > > pseudo-registers was to comply with QEMU, which uses KVM_GET_REG_LIST
> > > and KVM_[GET|SET]_ONE_REG interface, but I'm guessing the VMMs doing
> > > save/restore across migration might write the same values for every
> > > vCPU.
> >
> > KVM currently restricts the vcpu features to be unified across vcpus,
> > but that's only an implementation choice.
> 
> But that implementation choice has become ABI, no? How could support
> for asymmetry be added without requiring userspace opt-in or breaking
> existing VMMs that depend on feature unification?

Of course, you'd need some sort of advertising of this new behaviour.

One thing I would like to add to the current state of thing is an
indication of whether the effects of a sysreg being written from
userspace are global or local to a vcpu. You'd need a new capability,
and an extra flag added to the encoding of each register.

> 
> > The ARM architecture doesn't
> > mandate that these registers are all the same, and it isn't impossible
> > that we'd allow for the feature set to become per-vcpu at some point
> > in time. So this argument doesn't really hold.
> 
> Accessing per-VM state N times is bound to increase VM blackout time
> during migrations ~linearly as the number of vCPUs in a VM increases,
> since a VM scoped lock is necessary to serialize guest accesses. It
> could be tolerable at present scale, but seems like in the future it
> could become a real problem.

I don't disagree. But I doubt switching to a different API altogether
is the solution to this.

> 
> > Furthermore, compatibility with QEMU's save/restore model is
> > essential, and AFAICT, there is no open source alternative.
> 
> Agree fundamentally, but I believe it is entirely reasonable to
> require a userspace change to adopt a new KVM feature. Otherwise, we
> may be trying to shoehorn new features into existing UAPI that may not
> be a precise fit..

But the very purpose of this API is to support discoverability. If we
can't support it, then we might as well declare the whole API
deprecated and restart from scratch.

No, I'm not seriously suggesting this :-/.

> In order to cure the serialization mentioned above, two options are
> top of mind: accessing the VM state with the VM FD or informing
> userspace that a set of registers need only be written once for an
> entire VM. If we add support for asymmetry later down the road, that
> would become an opt-in such that userspace will do the access
> per-vCPU.

It is the latter that I'm suggesting.

>
> > A device means yet another configuration and migration API. Don't you
> > think we have enough of those? The complexity of KVM/arm64 userspace
> > API is already insane, and extremely fragile. Adding to it will be a
> > validation nightmare (it already is, and I don't see anyone actively
> > helping with it).
> 
> It seems equally fragile to introduce VM-wide serialization to vCPU
> UAPI that we know is in the live migration critical path for _any_
> VMM. Without requiring userspace changes for all the new widgets under
> discussion we're effectively forcing VMMs to do something suboptimal.

I'm perfectly happy with suboptimal to start with, and find ways to
make it better once we have a clear path forward. I just don't want to
conflate the two.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2022-02-08  9:46                                   ` Marc Zyngier
  (?)
@ 2022-02-08  9:56                                     ` Oliver Upton
  -1 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2022-02-08  9:56 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

Hi Marc,

On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > but that's only an implementation choice.
> >
> > But that implementation choice has become ABI, no? How could support
> > for asymmetry be added without requiring userspace opt-in or breaking
> > existing VMMs that depend on feature unification?
>
> Of course, you'd need some sort of advertising of this new behaviour.
>
> One thing I would like to add to the current state of thing is an
> indication of whether the effects of a sysreg being written from
> userspace are global or local to a vcpu. You'd need a new capability,
> and an extra flag added to the encoding of each register.

Ah. I think that is a much more reasonable fit then. VMMs unaware of
this can continue to migrate new bits (albeit at the cost of
potentially higher lock contention for the per-VM stuff), and those
that do can reap the benefits of writing such attributes exactly once.

[...]

> > > A device means yet another configuration and migration API. Don't you
> > > think we have enough of those? The complexity of KVM/arm64 userspace
> > > API is already insane, and extremely fragile. Adding to it will be a
> > > validation nightmare (it already is, and I don't see anyone actively
> > > helping with it).
> >
> > It seems equally fragile to introduce VM-wide serialization to vCPU
> > UAPI that we know is in the live migration critical path for _any_
> > VMM. Without requiring userspace changes for all the new widgets under
> > discussion we're effectively forcing VMMs to do something suboptimal.
>
> I'm perfectly happy with suboptimal to start with, and find ways to
> make it better once we have a clear path forward. I just don't want to
> conflate the two.

Fair. My initial concern was that it didn't seem as though there was
much room for growth/improvement with the one reg UAPI, but your
suggestion definitely provides a ramp out to handle VM state once per
VM.

Thanks for following up :)

--
Oliver
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08  9:56                                     ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2022-02-08  9:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Raghavendra Rao Ananta, Andrew Jones, kvmarm, pshier, ricarkol,
	reijiw, jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell,
	Sean Christopherson

Hi Marc,

On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > but that's only an implementation choice.
> >
> > But that implementation choice has become ABI, no? How could support
> > for asymmetry be added without requiring userspace opt-in or breaking
> > existing VMMs that depend on feature unification?
>
> Of course, you'd need some sort of advertising of this new behaviour.
>
> One thing I would like to add to the current state of thing is an
> indication of whether the effects of a sysreg being written from
> userspace are global or local to a vcpu. You'd need a new capability,
> and an extra flag added to the encoding of each register.

Ah. I think that is a much more reasonable fit then. VMMs unaware of
this can continue to migrate new bits (albeit at the cost of
potentially higher lock contention for the per-VM stuff), and those
that do can reap the benefits of writing such attributes exactly once.

[...]

> > > A device means yet another configuration and migration API. Don't you
> > > think we have enough of those? The complexity of KVM/arm64 userspace
> > > API is already insane, and extremely fragile. Adding to it will be a
> > > validation nightmare (it already is, and I don't see anyone actively
> > > helping with it).
> >
> > It seems equally fragile to introduce VM-wide serialization to vCPU
> > UAPI that we know is in the live migration critical path for _any_
> > VMM. Without requiring userspace changes for all the new widgets under
> > discussion we're effectively forcing VMMs to do something suboptimal.
>
> I'm perfectly happy with suboptimal to start with, and find ways to
> make it better once we have a clear path forward. I just don't want to
> conflate the two.

Fair. My initial concern was that it didn't seem as though there was
much room for growth/improvement with the one reg UAPI, but your
suggestion definitely provides a ramp out to handle VM state once per
VM.

Thanks for following up :)

--
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08  9:56                                     ` Oliver Upton
  0 siblings, 0 replies; 74+ messages in thread
From: Oliver Upton @ 2022-02-08  9:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Raghavendra Rao Ananta, Andrew Jones, kvmarm, pshier, ricarkol,
	reijiw, jingzhangos, kvm, linux-arm-kernel, james.morse,
	Alexandru.Elisei, suzuki.poulose, Peter Maydell,
	Sean Christopherson

Hi Marc,

On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > but that's only an implementation choice.
> >
> > But that implementation choice has become ABI, no? How could support
> > for asymmetry be added without requiring userspace opt-in or breaking
> > existing VMMs that depend on feature unification?
>
> Of course, you'd need some sort of advertising of this new behaviour.
>
> One thing I would like to add to the current state of thing is an
> indication of whether the effects of a sysreg being written from
> userspace are global or local to a vcpu. You'd need a new capability,
> and an extra flag added to the encoding of each register.

Ah. I think that is a much more reasonable fit then. VMMs unaware of
this can continue to migrate new bits (albeit at the cost of
potentially higher lock contention for the per-VM stuff), and those
that do can reap the benefits of writing such attributes exactly once.

[...]

> > > A device means yet another configuration and migration API. Don't you
> > > think we have enough of those? The complexity of KVM/arm64 userspace
> > > API is already insane, and extremely fragile. Adding to it will be a
> > > validation nightmare (it already is, and I don't see anyone actively
> > > helping with it).
> >
> > It seems equally fragile to introduce VM-wide serialization to vCPU
> > UAPI that we know is in the live migration critical path for _any_
> > VMM. Without requiring userspace changes for all the new widgets under
> > discussion we're effectively forcing VMMs to do something suboptimal.
>
> I'm perfectly happy with suboptimal to start with, and find ways to
> make it better once we have a clear path forward. I just don't want to
> conflate the two.

Fair. My initial concern was that it didn't seem as though there was
much room for growth/improvement with the one reg UAPI, but your
suggestion definitely provides a ramp out to handle VM state once per
VM.

Thanks for following up :)

--
Oliver

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2022-02-08  9:56                                     ` Oliver Upton
  (?)
@ 2022-02-08 16:58                                       ` Sean Christopherson
  -1 siblings, 0 replies; 74+ messages in thread
From: Sean Christopherson @ 2022-02-08 16:58 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, Raghavendra Rao Ananta, Andrew Jones, kvmarm,
	pshier, ricarkol, reijiw, jingzhangos, kvm, linux-arm-kernel,
	james.morse, Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Tue, Feb 08, 2022, Oliver Upton wrote:
> Hi Marc,
> 
> On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > > but that's only an implementation choice.
> > >
> > > But that implementation choice has become ABI, no? How could support
> > > for asymmetry be added without requiring userspace opt-in or breaking
> > > existing VMMs that depend on feature unification?
> >
> > Of course, you'd need some sort of advertising of this new behaviour.
> >
> > One thing I would like to add to the current state of thing is an
> > indication of whether the effects of a sysreg being written from
> > userspace are global or local to a vcpu. You'd need a new capability,
> > and an extra flag added to the encoding of each register.
> 
> Ah. I think that is a much more reasonable fit then. VMMs unaware of
> this can continue to migrate new bits (albeit at the cost of
> potentially higher lock contention for the per-VM stuff), and those
> that do can reap the benefits of writing such attributes exactly once.

But the "proper" usage is no different than adding support for VM-scoped variants
of KVM_{G,S}ET_ONE_REG and friends, and a VM-scoped variant is conceptually a lot
cleaner IMO.  And making them truly VM-scoped means KVM can do things like support
sysregs that are immutable after vCPUs are created.

So long as KVM defaults to '0' for all such registers, lack of migration support
in userspace that isn't aware of the new API, i.e. doesn't do KVM_GET_REG_LIST at
a VM-scope, is a nop because said userspace also won't modify the registers in the
first place.  The only "unsolvable" problem that is avoided by usurping the per-vCPU
ioctls is rollback to a userspace VMM that isn't aware of the per-VM ioctls, but it
doesn't seem too onerous to tell userspace "don't use these unless your entire fleet
has upgraded", especially since that requirement/advisement is true for the KVM side
with respect to new registers regardless of how those registers are accessed.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08 16:58                                       ` Sean Christopherson
  0 siblings, 0 replies; 74+ messages in thread
From: Sean Christopherson @ 2022-02-08 16:58 UTC (permalink / raw)
  To: Oliver Upton; +Cc: kvm, Marc Zyngier, pshier, kvmarm, linux-arm-kernel

On Tue, Feb 08, 2022, Oliver Upton wrote:
> Hi Marc,
> 
> On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > > but that's only an implementation choice.
> > >
> > > But that implementation choice has become ABI, no? How could support
> > > for asymmetry be added without requiring userspace opt-in or breaking
> > > existing VMMs that depend on feature unification?
> >
> > Of course, you'd need some sort of advertising of this new behaviour.
> >
> > One thing I would like to add to the current state of thing is an
> > indication of whether the effects of a sysreg being written from
> > userspace are global or local to a vcpu. You'd need a new capability,
> > and an extra flag added to the encoding of each register.
> 
> Ah. I think that is a much more reasonable fit then. VMMs unaware of
> this can continue to migrate new bits (albeit at the cost of
> potentially higher lock contention for the per-VM stuff), and those
> that do can reap the benefits of writing such attributes exactly once.

But the "proper" usage is no different than adding support for VM-scoped variants
of KVM_{G,S}ET_ONE_REG and friends, and a VM-scoped variant is conceptually a lot
cleaner IMO.  And making them truly VM-scoped means KVM can do things like support
sysregs that are immutable after vCPUs are created.

So long as KVM defaults to '0' for all such registers, lack of migration support
in userspace that isn't aware of the new API, i.e. doesn't do KVM_GET_REG_LIST at
a VM-scope, is a nop because said userspace also won't modify the registers in the
first place.  The only "unsolvable" problem that is avoided by usurping the per-vCPU
ioctls is rollback to a userspace VMM that isn't aware of the per-VM ioctls, but it
doesn't seem too onerous to tell userspace "don't use these unless your entire fleet
has upgraded", especially since that requirement/advisement is true for the KVM side
with respect to new registers regardless of how those registers are accessed.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08 16:58                                       ` Sean Christopherson
  0 siblings, 0 replies; 74+ messages in thread
From: Sean Christopherson @ 2022-02-08 16:58 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, Raghavendra Rao Ananta, Andrew Jones, kvmarm,
	pshier, ricarkol, reijiw, jingzhangos, kvm, linux-arm-kernel,
	james.morse, Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Tue, Feb 08, 2022, Oliver Upton wrote:
> Hi Marc,
> 
> On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > > but that's only an implementation choice.
> > >
> > > But that implementation choice has become ABI, no? How could support
> > > for asymmetry be added without requiring userspace opt-in or breaking
> > > existing VMMs that depend on feature unification?
> >
> > Of course, you'd need some sort of advertising of this new behaviour.
> >
> > One thing I would like to add to the current state of thing is an
> > indication of whether the effects of a sysreg being written from
> > userspace are global or local to a vcpu. You'd need a new capability,
> > and an extra flag added to the encoding of each register.
> 
> Ah. I think that is a much more reasonable fit then. VMMs unaware of
> this can continue to migrate new bits (albeit at the cost of
> potentially higher lock contention for the per-VM stuff), and those
> that do can reap the benefits of writing such attributes exactly once.

But the "proper" usage is no different than adding support for VM-scoped variants
of KVM_{G,S}ET_ONE_REG and friends, and a VM-scoped variant is conceptually a lot
cleaner IMO.  And making them truly VM-scoped means KVM can do things like support
sysregs that are immutable after vCPUs are created.

So long as KVM defaults to '0' for all such registers, lack of migration support
in userspace that isn't aware of the new API, i.e. doesn't do KVM_GET_REG_LIST at
a VM-scope, is a nop because said userspace also won't modify the registers in the
first place.  The only "unsolvable" problem that is avoided by usurping the per-vCPU
ioctls is rollback to a userspace VMM that isn't aware of the per-VM ioctls, but it
doesn't seem too onerous to tell userspace "don't use these unless your entire fleet
has upgraded", especially since that requirement/advisement is true for the KVM side
with respect to new registers regardless of how those registers are accessed.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
  2022-02-08 16:58                                       ` Sean Christopherson
  (?)
@ 2022-02-08 17:48                                         ` Marc Zyngier
  -1 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-02-08 17:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oliver Upton, Raghavendra Rao Ananta, Andrew Jones, kvmarm,
	pshier, ricarkol, reijiw, jingzhangos, kvm, linux-arm-kernel,
	james.morse, Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Tue, 08 Feb 2022 16:58:26 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, Feb 08, 2022, Oliver Upton wrote:
> > Hi Marc,
> > 
> > On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > > > but that's only an implementation choice.
> > > >
> > > > But that implementation choice has become ABI, no? How could support
> > > > for asymmetry be added without requiring userspace opt-in or breaking
> > > > existing VMMs that depend on feature unification?
> > >
> > > Of course, you'd need some sort of advertising of this new behaviour.
> > >
> > > One thing I would like to add to the current state of thing is an
> > > indication of whether the effects of a sysreg being written from
> > > userspace are global or local to a vcpu. You'd need a new capability,
> > > and an extra flag added to the encoding of each register.
> > 
> > Ah. I think that is a much more reasonable fit then. VMMs unaware of
> > this can continue to migrate new bits (albeit at the cost of
> > potentially higher lock contention for the per-VM stuff), and those
> > that do can reap the benefits of writing such attributes exactly once.
> 
> But the "proper" usage is no different than adding support for
> VM-scoped variants of KVM_{G,S}ET_ONE_REG and friends, and a
> VM-scoped variant is conceptually a lot cleaner IMO.  And making
> them truly VM-scoped means KVM can do things like support sysregs
> that are immutable after vCPUs are created.

It is different, because your approach requires you to update all the
existing VMMs to find out which register is of which kind. Not to
mention that global sysregs are an absolute oddity in the face of the
architecture (there is none in the base architecture).

> So long as KVM defaults to '0' for all such registers, lack of
> migration support in userspace that isn't aware of the new API,
> i.e. doesn't do KVM_GET_REG_LIST at a VM-scope, is a nop because
> said userspace also won't modify the registers in the first place.

We want any VMM that correctly uses the API today to seamlessly be
able to save/restore any new feature. QEMU does that, and it should
continue to work no matter what new feature we add to the list.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08 17:48                                         ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-02-08 17:48 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pshier, kvmarm, linux-arm-kernel

On Tue, 08 Feb 2022 16:58:26 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, Feb 08, 2022, Oliver Upton wrote:
> > Hi Marc,
> > 
> > On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > > > but that's only an implementation choice.
> > > >
> > > > But that implementation choice has become ABI, no? How could support
> > > > for asymmetry be added without requiring userspace opt-in or breaking
> > > > existing VMMs that depend on feature unification?
> > >
> > > Of course, you'd need some sort of advertising of this new behaviour.
> > >
> > > One thing I would like to add to the current state of thing is an
> > > indication of whether the effects of a sysreg being written from
> > > userspace are global or local to a vcpu. You'd need a new capability,
> > > and an extra flag added to the encoding of each register.
> > 
> > Ah. I think that is a much more reasonable fit then. VMMs unaware of
> > this can continue to migrate new bits (albeit at the cost of
> > potentially higher lock contention for the per-VM stuff), and those
> > that do can reap the benefits of writing such attributes exactly once.
> 
> But the "proper" usage is no different than adding support for
> VM-scoped variants of KVM_{G,S}ET_ONE_REG and friends, and a
> VM-scoped variant is conceptually a lot cleaner IMO.  And making
> them truly VM-scoped means KVM can do things like support sysregs
> that are immutable after vCPUs are created.

It is different, because your approach requires you to update all the
existing VMMs to find out which register is of which kind. Not to
mention that global sysregs are an absolute oddity in the face of the
architecture (there is none in the base architecture).

> So long as KVM defaults to '0' for all such registers, lack of
> migration support in userspace that isn't aware of the new API,
> i.e. doesn't do KVM_GET_REG_LIST at a VM-scope, is a nop because
> said userspace also won't modify the registers in the first place.

We want any VMM that correctly uses the API today to seamlessly be
able to save/restore any new feature. QEMU does that, and it should
continue to work no matter what new feature we add to the list.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 74+ messages in thread

* Re: KVM/arm64: Guest ABI changes do not appear rollback-safe
@ 2022-02-08 17:48                                         ` Marc Zyngier
  0 siblings, 0 replies; 74+ messages in thread
From: Marc Zyngier @ 2022-02-08 17:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Oliver Upton, Raghavendra Rao Ananta, Andrew Jones, kvmarm,
	pshier, ricarkol, reijiw, jingzhangos, kvm, linux-arm-kernel,
	james.morse, Alexandru.Elisei, suzuki.poulose, Peter Maydell

On Tue, 08 Feb 2022 16:58:26 +0000,
Sean Christopherson <seanjc@google.com> wrote:
> 
> On Tue, Feb 08, 2022, Oliver Upton wrote:
> > Hi Marc,
> > 
> > On Tue, Feb 8, 2022 at 1:46 AM Marc Zyngier <maz@kernel.org> wrote:
> > > > > KVM currently restricts the vcpu features to be unified across vcpus,
> > > > > but that's only an implementation choice.
> > > >
> > > > But that implementation choice has become ABI, no? How could support
> > > > for asymmetry be added without requiring userspace opt-in or breaking
> > > > existing VMMs that depend on feature unification?
> > >
> > > Of course, you'd need some sort of advertising of this new behaviour.
> > >
> > > One thing I would like to add to the current state of thing is an
> > > indication of whether the effects of a sysreg being written from
> > > userspace are global or local to a vcpu. You'd need a new capability,
> > > and an extra flag added to the encoding of each register.
> > 
> > Ah. I think that is a much more reasonable fit then. VMMs unaware of
> > this can continue to migrate new bits (albeit at the cost of
> > potentially higher lock contention for the per-VM stuff), and those
> > that do can reap the benefits of writing such attributes exactly once.
> 
> But the "proper" usage is no different than adding support for
> VM-scoped variants of KVM_{G,S}ET_ONE_REG and friends, and a
> VM-scoped variant is conceptually a lot cleaner IMO.  And making
> them truly VM-scoped means KVM can do things like support sysregs
> that are immutable after vCPUs are created.

It is different, because your approach requires you to update all the
existing VMMs to find out which register is of which kind. Not to
mention that global sysregs are an absolute oddity in the face of the
architecture (there is none in the base architecture).

> So long as KVM defaults to '0' for all such registers, lack of
> migration support in userspace that isn't aware of the new API,
> i.e. doesn't do KVM_GET_REG_LIST at a VM-scope, is a nop because
> said userspace also won't modify the registers in the first place.

We want any VMM that correctly uses the API today to seamlessly be
able to save/restore any new feature. QEMU does that, and it should
continue to work no matter what new feature we add to the list.

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 74+ messages in thread

end of thread, other threads:[~2022-02-08 17:50 UTC | newest]

Thread overview: 74+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-24 21:15 KVM/arm64: Guest ABI changes do not appear rollback-safe Oliver Upton
2021-08-24 21:15 ` Oliver Upton
2021-08-24 21:15 ` Oliver Upton
2021-08-24 21:34 ` [RFC PATCH] KVM: arm64: Allow VMMs to opt-out of KVM_CAP_PTP_KVM Oliver Upton
2021-08-24 21:34   ` Oliver Upton
2021-08-25  9:27 ` KVM/arm64: Guest ABI changes do not appear rollback-safe Marc Zyngier
2021-08-25  9:27   ` Marc Zyngier
2021-08-25  9:27   ` Marc Zyngier
2021-08-25 10:02   ` Oliver Upton
2021-08-25 10:02     ` Oliver Upton
2021-08-25 10:02     ` Oliver Upton
2021-08-25 10:39     ` Marc Zyngier
2021-08-25 10:39       ` Marc Zyngier
2021-08-25 10:39       ` Marc Zyngier
2021-08-25 15:07       ` Andrew Jones
2021-08-25 15:07         ` Andrew Jones
2021-08-25 15:07         ` Andrew Jones
2021-08-25 18:14         ` Oliver Upton
2021-08-25 18:14           ` Oliver Upton
2021-08-25 18:14           ` Oliver Upton
2021-08-26  8:37           ` Marc Zyngier
2021-08-26  8:37             ` Marc Zyngier
2021-08-26  8:37             ` Marc Zyngier
2021-08-26 18:49             ` Oliver Upton
2021-08-26 18:49               ` Oliver Upton
2021-08-26 18:49               ` Oliver Upton
2021-08-27  7:40               ` Andrew Jones
2021-08-27  7:40                 ` Andrew Jones
2021-08-27  7:40                 ` Andrew Jones
2021-09-29 18:22                 ` Oliver Upton
2021-09-29 18:22                   ` Oliver Upton
2021-09-29 18:22                   ` Oliver Upton
2021-09-30  7:32                   ` Marc Zyngier
2021-09-30  7:32                     ` Marc Zyngier
2021-09-30  7:32                     ` Marc Zyngier
2021-09-30 17:24                     ` Oliver Upton
2021-09-30 17:24                       ` Oliver Upton
2021-09-30 17:24                       ` Oliver Upton
2021-10-01 11:43                       ` Marc Zyngier
2021-10-01 11:43                         ` Marc Zyngier
2021-10-01 11:43                         ` Marc Zyngier
2021-10-01 15:38                         ` Oliver Upton
2021-10-01 15:38                           ` Oliver Upton
2021-10-01 15:38                           ` Oliver Upton
2022-01-25  3:47                           ` Raghavendra Rao Ananta
2022-01-25  3:47                             ` Raghavendra Rao Ananta
2022-01-25  3:47                             ` Raghavendra Rao Ananta
2022-01-25  8:45                             ` Marc Zyngier
2022-01-25  8:45                               ` Marc Zyngier
2022-01-25  8:45                               ` Marc Zyngier
2022-01-25 17:29                               ` Oliver Upton
2022-01-25 17:29                                 ` Oliver Upton
2022-01-25 17:29                                 ` Oliver Upton
2022-02-08  9:46                                 ` Marc Zyngier
2022-02-08  9:46                                   ` Marc Zyngier
2022-02-08  9:46                                   ` Marc Zyngier
2022-02-08  9:56                                   ` Oliver Upton
2022-02-08  9:56                                     ` Oliver Upton
2022-02-08  9:56                                     ` Oliver Upton
2022-02-08 16:58                                     ` Sean Christopherson
2022-02-08 16:58                                       ` Sean Christopherson
2022-02-08 16:58                                       ` Sean Christopherson
2022-02-08 17:48                                       ` Marc Zyngier
2022-02-08 17:48                                         ` Marc Zyngier
2022-02-08 17:48                                         ` Marc Zyngier
2021-08-26  8:49           ` Andrew Jones
2021-08-26  8:49             ` Andrew Jones
2021-08-26  8:49             ` Andrew Jones
2021-08-26  8:54             ` Andrew Jones
2021-08-26  8:54               ` Andrew Jones
2021-08-26  8:54               ` Andrew Jones
2021-08-26  9:43               ` Marc Zyngier
2021-08-26  9:43                 ` Marc Zyngier
2021-08-26  9:43                 ` Marc Zyngier

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.