kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
@ 2021-08-19 22:36 Oliver Upton
  2021-08-19 22:36 ` [PATCH 1/6] KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity() Oliver Upton
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 22:36 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

Certain VMMs/operators may wish to give their guests the ability to
initiate a system suspend that could result in the VM being saved to
persistent storage to be resumed at a later time. The PSCI v1.0
specification describes an SMC, SYSTEM_SUSPEND, that allows a kernel to
request a system suspend. This call is optional for v1.0, and KVM
elected to not support the call in its v1.0 implementation.

This series adds support for the SYSTEM_SUSPEND PSCI call to KVM/arm64.
Since this is a system-scoped event, KVM cannot quiesce the VM on its
own. We add a new system exit type in this series to clue in userspace
that a suspend was requested. Per the KVM_EXIT_SYSTEM_EVENT ABI, a VMM
that doesn't care about this event can simply resume the guest without
issue (we set up the calling vCPU to come out of reset correctly on next
KVM_RUN).

Patch 1 is unrelated, and is a fix for "KVM: arm64: Enforce reserved
bits for PSCI target affinities" on the kvmarm/next branch. Nothing
particularly hairy, just an unused param.

Patch 2 simplifies the function to check if KVM allows a particular PSCI
function. We can generally disallow any PSCI function that sets the
SMC64 bit in the PSCI function ID.

Patch 3 wraps up the PSCI reset logic used for CPU_ON, which will be
needed later to queue up a reset on the vCPU that requested the system
suspend.

Patch 4 brings in the new UAPI and PSCI call, guarded behind a VM
capability for backwards compatibility.

Patch 5 is indirectly related to this series, and avoids compiler
reordering on PSCI calls in the selftest introduced by "selftests: KVM:
Introduce psci_cpu_on_test".

Finally, patch 6 extends the PSCI selftest to verify the
SYSTEM_SUSPEND PSCI call behaves as intended.

These patches apply cleanly to kvmarm/next at the following commit:

f2267b87ecd5 ("Merge branch kvm-arm64/misc-5.15 into kvmarm-master/next")

The series is intentionally based on kvmarm/next for the sake of fixing
patches only present there in [1/6] and [5/6]. Tested on QEMU (ick)
since my Mt. Jade box is out to lunch at the moment and for some unknown
reason the toolchain on my work computer doesn't play nice with the FVP.

Oliver Upton (6):
  KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity()
  KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests
  KVM: arm64: Encapsulate reset request logic in a helper function
  KVM: arm64: Add support for SYSTEM_SUSPEND PSCI call
  selftests: KVM: Promote PSCI hypercalls to asm volatile
  selftests: KVM: Test SYSTEM_SUSPEND PSCI call

 arch/arm64/include/asm/kvm_host.h             |   3 +
 arch/arm64/kvm/arm.c                          |   5 +
 arch/arm64/kvm/psci.c                         | 134 +++++++++++++-----
 include/uapi/linux/kvm.h                      |   2 +
 .../selftests/kvm/aarch64/psci_cpu_on_test.c  | 126 +++++++++++-----
 5 files changed, 202 insertions(+), 68 deletions(-)

-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/6] KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity()
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
@ 2021-08-19 22:36 ` Oliver Upton
  2021-08-19 22:36 ` [PATCH 2/6] KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests Oliver Upton
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 22:36 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

The helper function does not need a pointer to the vCPU, as it only
consults a constant mask; drop the unused vcpu parameter.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 74c47d420253..d46842f45b0a 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -59,8 +59,7 @@ static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 	kvm_vcpu_kick(vcpu);
 }
 
-static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
-					   unsigned long affinity)
+static inline bool kvm_psci_valid_affinity(unsigned long affinity)
 {
 	return !(affinity & ~MPIDR_HWID_BITMASK);
 }
@@ -73,7 +72,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	unsigned long cpu_id;
 
 	cpu_id = smccc_get_arg1(source_vcpu);
-	if (!kvm_psci_valid_affinity(source_vcpu, cpu_id))
+	if (!kvm_psci_valid_affinity(cpu_id))
 		return PSCI_RET_INVALID_PARAMS;
 
 	vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
@@ -132,7 +131,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 	target_affinity = smccc_get_arg1(vcpu);
 	lowest_affinity_level = smccc_get_arg2(vcpu);
 
-	if (!kvm_psci_valid_affinity(vcpu, target_affinity))
+	if (!kvm_psci_valid_affinity(target_affinity))
 		return PSCI_RET_INVALID_PARAMS;
 
 	/* Determine target affinity mask */
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/6] KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
  2021-08-19 22:36 ` [PATCH 1/6] KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity() Oliver Upton
@ 2021-08-19 22:36 ` Oliver Upton
  2021-08-19 22:36 ` [PATCH 3/6] KVM: arm64: Encapsulate reset request logic in a helper function Oliver Upton
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 22:36 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

The only valid calling SMC calling convention from an AArch32 state is
SMC32. Disallow any PSCI function that sets the SMC64 function ID bit
when called from AArch32 rather than comparing against known SMC64 PSCI
functions.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index d46842f45b0a..310b9cb2b32b 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -208,15 +208,11 @@ static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
 
 static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32 fn)
 {
-	switch(fn) {
-	case PSCI_0_2_FN64_CPU_SUSPEND:
-	case PSCI_0_2_FN64_CPU_ON:
-	case PSCI_0_2_FN64_AFFINITY_INFO:
-		/* Disallow these functions for 32bit guests */
-		if (vcpu_mode_is_32bit(vcpu))
-			return PSCI_RET_NOT_SUPPORTED;
-		break;
-	}
+	/*
+	 * Prevent 32 bit guests from calling 64 bit PSCI functions.
+	 */
+	if ((fn & PSCI_0_2_64BIT) && vcpu_mode_is_32bit(vcpu))
+		return PSCI_RET_NOT_SUPPORTED;
 
 	return 0;
 }
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/6] KVM: arm64: Encapsulate reset request logic in a helper function
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
  2021-08-19 22:36 ` [PATCH 1/6] KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity() Oliver Upton
  2021-08-19 22:36 ` [PATCH 2/6] KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests Oliver Upton
@ 2021-08-19 22:36 ` Oliver Upton
  2021-08-19 22:36 ` [PATCH 4/6] KVM: arm64: Add support for SYSTEM_SUSPEND PSCI call Oliver Upton
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 22:36 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

In its implementation of the PSCI function, KVM needs to request that a
target vCPU resets before its next entry into the guest. Wrap the logic
for requesting a reset in a function for later use by other implemented
PSCI calls.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c | 59 +++++++++++++++++++++++++------------------
 1 file changed, 35 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 310b9cb2b32b..bb59b692998b 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -64,9 +64,40 @@ static inline bool kvm_psci_valid_affinity(unsigned long affinity)
 	return !(affinity & ~MPIDR_HWID_BITMASK);
 }
 
-static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
+static void kvm_psci_vcpu_request_reset(struct kvm_vcpu *vcpu,
+					unsigned long entry_addr,
+					unsigned long context_id,
+					bool big_endian)
 {
 	struct vcpu_reset_state *reset_state;
+
+	lockdep_assert_held(&vcpu->kvm->lock);
+
+	reset_state = &vcpu->arch.reset_state;
+	reset_state->pc = entry_addr;
+
+	/* Propagate caller endianness */
+	reset_state->be = big_endian;
+
+	/*
+	 * NOTE: We always update r0 (or x0) because for PSCI v0.1
+	 * the general purpose registers are undefined upon CPU_ON.
+	 */
+	reset_state->r0 = context_id;
+
+	WRITE_ONCE(reset_state->reset, true);
+	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
+
+	/*
+	 * Make sure the reset request is observed if the change to
+	 * power_state is observed.
+	 */
+	smp_wmb();
+	vcpu->arch.power_off = false;
+}
+
+static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
+{
 	struct kvm *kvm = source_vcpu->kvm;
 	struct kvm_vcpu *vcpu = NULL;
 	unsigned long cpu_id;
@@ -90,29 +121,9 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 			return PSCI_RET_INVALID_PARAMS;
 	}
 
-	reset_state = &vcpu->arch.reset_state;
-
-	reset_state->pc = smccc_get_arg2(source_vcpu);
-
-	/* Propagate caller endianness */
-	reset_state->be = kvm_vcpu_is_be(source_vcpu);
-
-	/*
-	 * NOTE: We always update r0 (or x0) because for PSCI v0.1
-	 * the general purpose registers are undefined upon CPU_ON.
-	 */
-	reset_state->r0 = smccc_get_arg3(source_vcpu);
-
-	WRITE_ONCE(reset_state->reset, true);
-	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
-
-	/*
-	 * Make sure the reset request is observed if the change to
-	 * power_state is observed.
-	 */
-	smp_wmb();
-
-	vcpu->arch.power_off = false;
+	kvm_psci_vcpu_request_reset(vcpu, smccc_get_arg2(source_vcpu),
+				    smccc_get_arg3(source_vcpu),
+				    kvm_vcpu_is_be(source_vcpu));
 	kvm_vcpu_wake_up(vcpu);
 
 	return PSCI_RET_SUCCESS;
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 4/6] KVM: arm64: Add support for SYSTEM_SUSPEND PSCI call
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (2 preceding siblings ...)
  2021-08-19 22:36 ` [PATCH 3/6] KVM: arm64: Encapsulate reset request logic in a helper function Oliver Upton
@ 2021-08-19 22:36 ` Oliver Upton
  2021-08-19 22:36 ` [PATCH 5/6] selftests: KVM: Promote PSCI hypercalls to asm volatile Oliver Upton
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 22:36 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

ARM DEN0022D 5.19 "SYSTEM_SUSPEND" describes a PSCI call that may be
used to request a system be suspended. This is optional for PSCI v1.0
and to date KVM has elected to not implement the call. However, a
VMM/operator may wish to provide their guests with the ability to
suspend/resume, necessitating this PSCI call.

Implement support for SYSTEM_SUSPEND according to the prescribed
behavior in the specification. Add a new system event exit type,
KVM_SYSTEM_EVENT_SUSPEND, to notify userspace when a VM has requested a
system suspend.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kvm/arm.c              |  5 +++
 arch/arm64/kvm/psci.c             | 60 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  2 ++
 4 files changed, 70 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index bdab1754c71f..b5cc83b938cb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -136,6 +136,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* PSCI SYSTEM_SUSPEND call enabled for the guest */
+	bool suspend_enabled;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3496cb0d014e..45890ba897ee 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -97,6 +97,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		kvm->arch.mte_enabled = true;
 		break;
+	case KVM_CAP_ARM_SYSTEM_SUSPEND:
+		r = 0;
+		kvm->arch.suspend_enabled = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -211,6 +215,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_ARM_SYSTEM_SUSPEND:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index bb59b692998b..8e30a0fb1c60 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -205,6 +205,46 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
 	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
 }
 
+static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
+{
+	unsigned long entry_addr, context_id;
+	struct kvm *kvm = vcpu->kvm;
+	unsigned long psci_ret = 0;
+	struct kvm_vcpu *tmp;
+	int ret = 0;
+	int i;
+
+	/*
+	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
+	 * calling vCPU) be in an OFF state, as determined by the
+	 * implementation.
+	 *
+	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
+	 */
+	mutex_lock(&kvm->lock);
+	kvm_for_each_vcpu(i, tmp, kvm) {
+		if (tmp != vcpu && !tmp->arch.power_off) {
+			psci_ret = PSCI_RET_DENIED;
+			ret = 1;
+			goto out;
+		}
+	}
+
+	entry_addr = smccc_get_arg1(vcpu);
+	context_id = smccc_get_arg2(vcpu);
+
+	kvm_psci_vcpu_request_reset(vcpu, entry_addr, context_id,
+				    kvm_vcpu_is_be(vcpu));
+
+	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
+	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SUSPEND;
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+out:
+	mutex_unlock(&kvm->lock);
+	smccc_set_retval(vcpu, psci_ret, 0, 0, 0);
+	return ret;
+}
+
 static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -225,6 +265,14 @@ static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32
 	if ((fn & PSCI_0_2_64BIT) && vcpu_mode_is_32bit(vcpu))
 		return PSCI_RET_NOT_SUPPORTED;
 
+	switch (fn) {
+	case PSCI_1_0_FN_SYSTEM_SUSPEND:
+	case PSCI_1_0_FN64_SYSTEM_SUSPEND:
+		if (!vcpu->kvm->arch.suspend_enabled)
+			return PSCI_RET_NOT_SUPPORTED;
+		break;
+	}
+
 	return 0;
 }
 
@@ -318,6 +366,10 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 	unsigned long val;
 	int ret = 1;
 
+	val = kvm_psci_check_allowed_function(vcpu, psci_fn);
+	if (val)
+		goto out;
+
 	switch(psci_fn) {
 	case PSCI_0_2_FN_PSCI_VERSION:
 		val = KVM_ARM_PSCI_1_0;
@@ -341,6 +393,8 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 		case PSCI_0_2_FN_SYSTEM_OFF:
 		case PSCI_0_2_FN_SYSTEM_RESET:
 		case PSCI_1_0_FN_PSCI_FEATURES:
+		case PSCI_1_0_FN_SYSTEM_SUSPEND:
+		case PSCI_1_0_FN64_SYSTEM_SUSPEND:
 		case ARM_SMCCC_VERSION_FUNC_ID:
 			val = 0;
 			break;
@@ -349,10 +403,16 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 			break;
 		}
 		break;
+	case PSCI_1_0_FN_SYSTEM_SUSPEND:
+		kvm_psci_narrow_to_32bit(vcpu);
+		fallthrough;
+	case PSCI_1_0_FN64_SYSTEM_SUSPEND:
+		return kvm_psci_system_suspend(vcpu);
 	default:
 		return kvm_psci_0_2_call(vcpu);
 	}
 
+out:
 	smccc_set_retval(vcpu, val, 0, 0, 0);
 	return ret;
 }
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index d9e4aabcb31a..8e97d9c11a1b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -433,6 +433,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
 #define KVM_SYSTEM_EVENT_CRASH          3
+#define KVM_SYSTEM_EVENT_SUSPEND        4
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -1112,6 +1113,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_BINARY_STATS_FD 203
 #define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
 #define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_ARM_SYSTEM_SUSPEND 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 5/6] selftests: KVM: Promote PSCI hypercalls to asm volatile
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (3 preceding siblings ...)
  2021-08-19 22:36 ` [PATCH 4/6] KVM: arm64: Add support for SYSTEM_SUSPEND PSCI call Oliver Upton
@ 2021-08-19 22:36 ` Oliver Upton
  2021-08-19 22:36 ` [PATCH 6/6] selftests: KVM: Test SYSTEM_SUSPEND PSCI call Oliver Upton
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 22:36 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

Prevent the compiler from reordering around PSCI hypercalls by promoting
to asm volatile.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 .../selftests/kvm/aarch64/psci_cpu_on_test.c     | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c b/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
index 018c269990e1..9c22374fc0f5 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
@@ -31,10 +31,10 @@ static uint64_t psci_cpu_on(uint64_t target_cpu, uint64_t entry_addr,
 	register uint64_t x2 asm("x2") = entry_addr;
 	register uint64_t x3 asm("x3") = context_id;
 
-	asm("hvc #0"
-	    : "=r"(x0)
-	    : "r"(x0), "r"(x1), "r"(x2), "r"(x3)
-	    : "memory");
+	asm volatile("hvc #0"
+		     : "=r"(x0)
+		     : "r"(x0), "r"(x1), "r"(x2), "r"(x3)
+		     : "memory");
 
 	return x0;
 }
@@ -46,10 +46,10 @@ static uint64_t psci_affinity_info(uint64_t target_affinity,
 	register uint64_t x1 asm("x1") = target_affinity;
 	register uint64_t x2 asm("x2") = lowest_affinity_level;
 
-	asm("hvc #0"
-	    : "=r"(x0)
-	    : "r"(x0), "r"(x1), "r"(x2)
-	    : "memory");
+	asm volatile("hvc #0"
+		     : "=r"(x0)
+		     : "r"(x0), "r"(x1), "r"(x2)
+		     : "memory");
 
 	return x0;
 }
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 6/6] selftests: KVM: Test SYSTEM_SUSPEND PSCI call
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (4 preceding siblings ...)
  2021-08-19 22:36 ` [PATCH 5/6] selftests: KVM: Promote PSCI hypercalls to asm volatile Oliver Upton
@ 2021-08-19 22:36 ` Oliver Upton
  2021-08-19 23:41 ` [PATCH] Documentation: kvm: Document KVM_SYSTEM_EVENT_SUSPEND exit type Oliver Upton
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 22:36 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

Test KVM's newly-added support for the SYSTEM_SUSPEND PSCI call. Since
it is ABI that the vCPUs remain runnable after a system exit (i.e. a VMM
can blissfully ignore this exit), assert that the exiting vCPU is reset
at the requested entrypoint.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 .../selftests/kvm/aarch64/psci_cpu_on_test.c  | 110 +++++++++++++-----
 1 file changed, 84 insertions(+), 26 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c b/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
index 9c22374fc0f5..b08b006cc4b4 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
@@ -54,7 +54,21 @@ static uint64_t psci_affinity_info(uint64_t target_affinity,
 	return x0;
 }
 
-static void guest_main(uint64_t target_cpu)
+static uint64_t psci_system_suspend(uint64_t entry_addr, uint64_t context_id)
+{
+	register uint64_t x0 asm("x0") = PSCI_1_0_FN64_SYSTEM_SUSPEND;
+	register uint64_t x1 asm("x1") = entry_addr;
+	register uint64_t x2 asm("x2") = context_id;
+
+	asm volatile("hvc #0"
+		     : "=r"(x0)
+		     : "r"(x0), "r"(x1), "r"(x2)
+		     : "memory");
+
+	return x0;
+}
+
+static void guest_test_cpu_on(uint64_t target_cpu)
 {
 	GUEST_ASSERT(!psci_cpu_on(target_cpu, CPU_ON_ENTRY_ADDR, CPU_ON_CONTEXT_ID));
 	uint64_t target_state;
@@ -69,52 +83,96 @@ static void guest_main(uint64_t target_cpu)
 	GUEST_DONE();
 }
 
+static void guest_test_system_suspend(uint64_t target_cpu)
+{
+	psci_system_suspend(CPU_ON_ENTRY_ADDR, CPU_ON_CONTEXT_ID);
+
+	/* should never be reached */
+	GUEST_ASSERT(0);
+}
+
+static void guest_main(uint64_t target_cpu)
+{
+	guest_test_cpu_on(target_cpu);
+	guest_test_system_suspend(target_cpu);
+}
+
 int main(void)
 {
+	struct kvm_mp_state target_mp_state = { .mp_state = KVM_MP_STATE_STOPPED };
 	uint64_t target_mpidr, obs_pc, obs_x0;
+	struct kvm_enable_cap cap = {0};
+	uint32_t vcpu_to_test = -1;
 	struct kvm_vcpu_init init;
 	struct kvm_vm *vm;
 	struct ucall uc;
+	int i;
+
+	if (!kvm_check_cap(KVM_CAP_ARM_SYSTEM_SUSPEND)) {
+		print_skip("KVM_CAP_ARM_SYSTEM_SUSPEND not supported");
+		exit(KSFT_SKIP);
+	}
 
 	vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
 	kvm_vm_elf_load(vm, program_invocation_name);
 	ucall_init(vm, NULL);
 
+	cap.cap = KVM_CAP_ARM_SYSTEM_SUSPEND;
+	vm_enable_cap(vm, &cap);
+
 	vm_ioctl(vm, KVM_ARM_PREFERRED_TARGET, &init);
 	init.features[0] |= (1 << KVM_ARM_VCPU_PSCI_0_2);
 
 	aarch64_vcpu_add_default(vm, VCPU_ID_SOURCE, &init, guest_main);
-
-	/*
-	 * make sure the target is already off when executing the test.
-	 */
-	init.features[0] |= (1 << KVM_ARM_VCPU_POWER_OFF);
 	aarch64_vcpu_add_default(vm, VCPU_ID_TARGET, &init, guest_main);
 
 	get_reg(vm, VCPU_ID_TARGET, ARM64_SYS_REG(MPIDR_EL1), &target_mpidr);
 	vcpu_args_set(vm, VCPU_ID_SOURCE, 1, target_mpidr & MPIDR_HWID_BITMASK);
-	vcpu_run(vm, VCPU_ID_SOURCE);
-
-	switch (get_ucall(vm, VCPU_ID_SOURCE, &uc)) {
-	case UCALL_DONE:
-		break;
-	case UCALL_ABORT:
-		TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0], __FILE__,
-			  uc.args[1]);
-		break;
-	default:
-		TEST_FAIL("Unhandled ucall: %lu", uc.cmd);
-	}
 
-	get_reg(vm, VCPU_ID_TARGET, ARM64_CORE_REG(regs.pc), &obs_pc);
-	get_reg(vm, VCPU_ID_TARGET, ARM64_CORE_REG(regs.regs[0]), &obs_x0);
+	for (i = 0; i < 2; i++) {
+		struct kvm_run *run = vcpu_state(vm, VCPU_ID_SOURCE);
+
+		/*
+		 * make sure the target is already off when executing the test.
+		 */
+		vcpu_set_mp_state(vm, VCPU_ID_TARGET, &target_mp_state);
+		vcpu_run(vm, VCPU_ID_SOURCE);
+		switch (run->exit_reason) {
+		case KVM_EXIT_MMIO:
+			switch (get_ucall(vm, VCPU_ID_SOURCE, &uc)) {
+			case UCALL_DONE:
+				vcpu_to_test = VCPU_ID_TARGET;
+				break;
+			case UCALL_ABORT:
+				TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0], __FILE__,
+					  uc.args[1]);
+				break;
+			default:
+				TEST_FAIL("Unhandled ucall: %lu", uc.cmd);
+			}
+			break;
+		case KVM_EXIT_SYSTEM_EVENT:
+			TEST_ASSERT(run->system_event.type == KVM_SYSTEM_EVENT_SUSPEND,
+				    "unhandled system event: %u (expected: %u)",
+				    run->system_event.type, KVM_SYSTEM_EVENT_SUSPEND);
+			vcpu_to_test = VCPU_ID_SOURCE;
+			break;
+		default:
+			TEST_FAIL("unhandled exit reason: %u (%s)",
+				  run->exit_reason, exit_reason_str(run->exit_reason));
+		}
+
+		get_reg(vm, vcpu_to_test, ARM64_CORE_REG(regs.pc), &obs_pc);
+		get_reg(vm, vcpu_to_test, ARM64_CORE_REG(regs.regs[0]), &obs_x0);
+
+		TEST_ASSERT(obs_pc == CPU_ON_ENTRY_ADDR,
+			    "unexpected target cpu pc: %lx (expected: %lx)",
+			    obs_pc, CPU_ON_ENTRY_ADDR);
+		TEST_ASSERT(obs_x0 == CPU_ON_CONTEXT_ID,
+			    "unexpected target context id: %lx (expected: %lx)",
+			    obs_x0, CPU_ON_CONTEXT_ID);
 
-	TEST_ASSERT(obs_pc == CPU_ON_ENTRY_ADDR,
-		    "unexpected target cpu pc: %lx (expected: %lx)",
-		    obs_pc, CPU_ON_ENTRY_ADDR);
-	TEST_ASSERT(obs_x0 == CPU_ON_CONTEXT_ID,
-		    "unexpected target context id: %lx (expected: %lx)",
-		    obs_x0, CPU_ON_CONTEXT_ID);
+	}
 
 	kvm_vm_free(vm);
 	return 0;
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH] Documentation: kvm: Document KVM_SYSTEM_EVENT_SUSPEND exit type
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (5 preceding siblings ...)
  2021-08-19 22:36 ` [PATCH 6/6] selftests: KVM: Test SYSTEM_SUSPEND PSCI call Oliver Upton
@ 2021-08-19 23:41 ` Oliver Upton
  2021-08-22 19:56 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-19 23:41 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones, Oliver Upton

KVM now has the capability to exit to userspace for VM suspend events.
Document the intended UAPI behavior, such that a VMM can simply ignore
the guest intentions and resume.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index dae68e68ca23..d4dfc6f84dfc 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5632,6 +5632,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
   #define KVM_SYSTEM_EVENT_SHUTDOWN       1
   #define KVM_SYSTEM_EVENT_RESET          2
   #define KVM_SYSTEM_EVENT_CRASH          3
+  #define KVM_SYSTEM_EVENT_SUSPEND        4
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -5656,6 +5657,10 @@ Valid values for 'type' are:
    has requested a crash condition maintenance. Userspace can choose
    to ignore the request, or to gather VM memory core dump and/or
    reset/shutdown of the VM.
+ - KVM_SYSTEM_EVENT_SUSPEND -- the guest has requested that the VM
+   suspends. Userspace is not obliged to honor this, and may call KVM_RUN
+   again. Doing so will cause the guest to resume at its requested entry
+   point.
 
 ::
 
-- 
2.33.0.rc2.250.ged5fa647cd-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (6 preceding siblings ...)
  2021-08-19 23:41 ` [PATCH] Documentation: kvm: Document KVM_SYSTEM_EVENT_SUSPEND exit type Oliver Upton
@ 2021-08-22 19:56 ` Oliver Upton
  2021-08-26 10:51   ` Marc Zyngier
  2021-08-27 21:58 ` [RFC kvmtool PATCH 0/2] " Oliver Upton
  2021-09-06  9:12 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Marc Zyngier
  9 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-08-22 19:56 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

Marc,

On Thu, Aug 19, 2021 at 3:36 PM Oliver Upton <oupton@google.com> wrote:
>
> Certain VMMs/operators may wish to give their guests the ability to
> initiate a system suspend that could result in the VM being saved to
> persistent storage to be resumed at a later time. The PSCI v1.0
> specification describes an SMC, SYSTEM_SUSPEND, that allows a kernel to
> request a system suspend. This call is optional for v1.0, and KVM
> elected to not support the call in its v1.0 implementation.
>
> This series adds support for the SYSTEM_SUSPEND PSCI call to KVM/arm64.
> Since this is a system-scoped event, KVM cannot quiesce the VM on its
> own. We add a new system exit type in this series to clue in userspace
> that a suspend was requested. Per the KVM_EXIT_SYSTEM_EVENT ABI, a VMM
> that doesn't care about this event can simply resume the guest without
> issue (we set up the calling vCPU to come out of reset correctly on next
> KVM_RUN).
>
> Patch 1 is unrelated, and is a fix for "KVM: arm64: Enforce reserved
> bits for PSCI target affinities" on the kvmarm/next branch. Nothing
> particularly hairy, just an unused param.

Title line may not have been clear on this series, Patch 1 is a fix
for the PSCI CPU_ON series that's in kvmarm/next to suppress a
compiler warning.

> Patch 2 simplifies the function to check if KVM allows a particular PSCI
> function. We can generally disallow any PSCI function that sets the
> SMC64 bit in the PSCI function ID.
>
> Patch 3 wraps up the PSCI reset logic used for CPU_ON, which will be
> needed later to queue up a reset on the vCPU that requested the system
> suspend.
>
> Patch 4 brings in the new UAPI and PSCI call, guarded behind a VM
> capability for backwards compatibility.
>
> Patch 5 is indirectly related to this series, and avoids compiler
> reordering on PSCI calls in the selftest introduced by "selftests: KVM:
> Introduce psci_cpu_on_test".

This too is a fix for the PSCI CPU_ON series. Just wanted to raise it
to your attention beyond the new feature I'm working on here.

--
Thanks,
Oliver

> Finally, patch 6 extends the PSCI selftest to verify the
> SYSTEM_SUSPEND PSCI call behaves as intended.
>
> These patches apply cleanly to kvmarm/next at the following commit:
>
> f2267b87ecd5 ("Merge branch kvm-arm64/misc-5.15 into kvmarm-master/next")
>
> The series is intentionally based on kvmarm/next for the sake of fixing
> patches only present there in [1/6] and [5/6]. Tested on QEMU (ick)
> since my Mt. Jade box is out to lunch at the moment and for some unknown
> reason the toolchain on my work computer doesn't play nice with the FVP.
>
> Oliver Upton (6):
>   KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity()
>   KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests
>   KVM: arm64: Encapsulate reset request logic in a helper function
>   KVM: arm64: Add support for SYSTEM_SUSPEND PSCI call
>   selftests: KVM: Promote PSCI hypercalls to asm volatile
>   selftests: KVM: Test SYSTEM_SUSPEND PSCI call
>
>  arch/arm64/include/asm/kvm_host.h             |   3 +
>  arch/arm64/kvm/arm.c                          |   5 +
>  arch/arm64/kvm/psci.c                         | 134 +++++++++++++-----
>  include/uapi/linux/kvm.h                      |   2 +
>  .../selftests/kvm/aarch64/psci_cpu_on_test.c  | 126 +++++++++++-----
>  5 files changed, 202 insertions(+), 68 deletions(-)
>
> --
> 2.33.0.rc2.250.ged5fa647cd-goog
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-08-22 19:56 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
@ 2021-08-26 10:51   ` Marc Zyngier
  2021-08-26 18:37     ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-08-26 10:51 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

On Sun, 22 Aug 2021 20:56:13 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Marc,
> 
> On Thu, Aug 19, 2021 at 3:36 PM Oliver Upton <oupton@google.com> wrote:
> >
> > Certain VMMs/operators may wish to give their guests the ability to
> > initiate a system suspend that could result in the VM being saved to
> > persistent storage to be resumed at a later time. The PSCI v1.0
> > specification describes an SMC, SYSTEM_SUSPEND, that allows a kernel to
> > request a system suspend. This call is optional for v1.0, and KVM
> > elected to not support the call in its v1.0 implementation.
> >
> > This series adds support for the SYSTEM_SUSPEND PSCI call to KVM/arm64.
> > Since this is a system-scoped event, KVM cannot quiesce the VM on its
> > own. We add a new system exit type in this series to clue in userspace
> > that a suspend was requested. Per the KVM_EXIT_SYSTEM_EVENT ABI, a VMM
> > that doesn't care about this event can simply resume the guest without
> > issue (we set up the calling vCPU to come out of reset correctly on next
> > KVM_RUN).
> >
> > Patch 1 is unrelated, and is a fix for "KVM: arm64: Enforce reserved
> > bits for PSCI target affinities" on the kvmarm/next branch. Nothing
> > particularly hairy, just an unused param.
> 
> Title line may not have been clear on this series, Patch 1 is a fix
> for the PSCI CPU_ON series that's in kvmarm/next to suppress a
> compiler warning.

I'm not getting this warning. What are you compiling with? In general,
the compiler should shout about unused function parameters.

> 
> > Patch 2 simplifies the function to check if KVM allows a particular PSCI
> > function. We can generally disallow any PSCI function that sets the
> > SMC64 bit in the PSCI function ID.
> >
> > Patch 3 wraps up the PSCI reset logic used for CPU_ON, which will be
> > needed later to queue up a reset on the vCPU that requested the system
> > suspend.
> >
> > Patch 4 brings in the new UAPI and PSCI call, guarded behind a VM
> > capability for backwards compatibility.
> >
> > Patch 5 is indirectly related to this series, and avoids compiler
> > reordering on PSCI calls in the selftest introduced by "selftests: KVM:
> > Introduce psci_cpu_on_test".
> 
> This too is a fix for the PSCI CPU_ON series. Just wanted to raise it
> to your attention beyond the new feature I'm working on here.

I'm not sure this actually need fixing. The dependencies on the input
and output will effectively prevent such reordering. That will
definitely be a good cleanup though, but maybe not worth taking out of
this series.

Thanks,

	M. (trying to find excuses to close the tree quickly).

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-08-26 10:51   ` Marc Zyngier
@ 2021-08-26 18:37     ` Oliver Upton
  0 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-26 18:37 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

On Thu, Aug 26, 2021 at 11:51:00AM +0100, Marc Zyngier wrote:
> > > Patch 1 is unrelated, and is a fix for "KVM: arm64: Enforce reserved
> > > bits for PSCI target affinities" on the kvmarm/next branch. Nothing
> > > particularly hairy, just an unused param.
> > 
> > Title line may not have been clear on this series, Patch 1 is a fix
> > for the PSCI CPU_ON series that's in kvmarm/next to suppress a
> > compiler warning.
> 
> I'm not getting this warning. What are you compiling with? In general,
> the compiler should shout about unused function parameters.

Gah, this is just with local tooling. I'm unable to repro using
GCC/Clang. I see that '-Wno-unused-parameter' is set alongside
'-Wunused' when W=1.

> > > Patch 5 is indirectly related to this series, and avoids compiler
> > > reordering on PSCI calls in the selftest introduced by "selftests: KVM:
> > > Introduce psci_cpu_on_test".
> > 
> > This too is a fix for the PSCI CPU_ON series. Just wanted to raise it
> > to your attention beyond the new feature I'm working on here.
> 
> I'm not sure this actually need fixing. The dependencies on the input
> and output will effectively prevent such reordering. That will
> definitely be a good cleanup though, but maybe not worth taking out of
> this series.

Yep, you're right. There's an obvious dependency in the test that
maintains program order. I realize that it is only the second test
(patch 6 in this series) where things get hairy.

Apologies for the noise.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC kvmtool PATCH 0/2] PSCI SYSTEM_SUSPEND support
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (7 preceding siblings ...)
  2021-08-22 19:56 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
@ 2021-08-27 21:58 ` Oliver Upton
  2021-08-27 21:58   ` [RFC kvmtool PATCH 1/2] TESTONLY: KVM: Update KVM headers Oliver Upton
  2021-08-27 21:58   ` [RFC kvmtool PATCH 2/2] arm64: Add support for KVM_CAP_ARM_SYSTEM_SUSPEND Oliver Upton
  2021-09-06  9:12 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Marc Zyngier
  9 siblings, 2 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-27 21:58 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, reijiw, Andrew Jones, Will Deacon,
	Julien Thierry, Oliver Upton

This series was developed to test support for PSCI SYSTEM_SUSPEND
currently proposed for KVM/arm64 [1]. Since there isn't much for kvmtool
to do for a suspend (we don't have save/restore), just restart the guest
immediately.

Tested on kvm-arm/next kernel with the aforementioned series applied.

[1] https://lore.kernel.org/r/20210819223640.3564975-1-oupton@google.com

Oliver Upton (2):
  TESTONLY: KVM: Update KVM headers
  KVM/arm64: Add support for KVM_CAP_ARM_SYSTEM_SUSPEND

 arm/kvm.c           |  12 ++
 include/kvm/kvm.h   |   1 +
 include/linux/kvm.h | 414 +++++++++++++++++++++++++++++++++++++++++++-
 kvm-cpu.c           |   5 +
 kvm.c               |   7 +
 5 files changed, 432 insertions(+), 7 deletions(-)

-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC kvmtool PATCH 1/2] TESTONLY: KVM: Update KVM headers
  2021-08-27 21:58 ` [RFC kvmtool PATCH 0/2] " Oliver Upton
@ 2021-08-27 21:58   ` Oliver Upton
  2021-08-27 21:58   ` [RFC kvmtool PATCH 2/2] arm64: Add support for KVM_CAP_ARM_SYSTEM_SUSPEND Oliver Upton
  1 sibling, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-27 21:58 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, reijiw, Andrew Jones, Will Deacon,
	Julien Thierry, Oliver Upton

Pick up headers from kvm-arm/next to test the SYSTEM_SUSPEND PSCI call.
This patch should not be submitted, and instead opt for a copy of
linux/kvm.h from Linus' tree when patches land.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 include/linux/kvm.h | 414 +++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 407 insertions(+), 7 deletions(-)

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 5e3f12d..8e97d9c 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -8,6 +8,7 @@
  * Note: you must update KVM_API_VERSION if you change this interface.
  */
 
+#include <linux/const.h>
 #include <linux/types.h>
 #include <linux/compiler.h>
 #include <linux/ioctl.h>
@@ -116,7 +117,7 @@ struct kvm_irq_level {
 	 * ACPI gsi notion of irq.
 	 * For IA-64 (APIC model) IOAPIC0: irq 0-23; IOAPIC1: irq 24-47..
 	 * For X86 (standard AT mode) PIC0/1: irq 0-15. IOAPIC0: 0-23..
-	 * For ARM: See Documentation/virt/kvm/api.txt
+	 * For ARM: See Documentation/virt/kvm/api.rst
 	 */
 	union {
 		__u32 irq;
@@ -188,10 +189,13 @@ struct kvm_s390_cmma_log {
 struct kvm_hyperv_exit {
 #define KVM_EXIT_HYPERV_SYNIC          1
 #define KVM_EXIT_HYPERV_HCALL          2
+#define KVM_EXIT_HYPERV_SYNDBG         3
 	__u32 type;
+	__u32 pad1;
 	union {
 		struct {
 			__u32 msr;
+			__u32 pad2;
 			__u64 control;
 			__u64 evt_page;
 			__u64 msg_page;
@@ -201,6 +205,29 @@ struct kvm_hyperv_exit {
 			__u64 result;
 			__u64 params[2];
 		} hcall;
+		struct {
+			__u32 msr;
+			__u32 pad2;
+			__u64 control;
+			__u64 status;
+			__u64 send_page;
+			__u64 recv_page;
+			__u64 pending_page;
+		} syndbg;
+	} u;
+};
+
+struct kvm_xen_exit {
+#define KVM_EXIT_XEN_HCALL          1
+	__u32 type;
+	union {
+		struct {
+			__u32 longmode;
+			__u32 cpl;
+			__u64 input;
+			__u64 result;
+			__u64 params[6];
+		} hcall;
 	} u;
 };
 
@@ -235,6 +262,13 @@ struct kvm_hyperv_exit {
 #define KVM_EXIT_S390_STSI        25
 #define KVM_EXIT_IOAPIC_EOI       26
 #define KVM_EXIT_HYPERV           27
+#define KVM_EXIT_ARM_NISV         28
+#define KVM_EXIT_X86_RDMSR        29
+#define KVM_EXIT_X86_WRMSR        30
+#define KVM_EXIT_DIRTY_RING_FULL  31
+#define KVM_EXIT_AP_RESET_HOLD    32
+#define KVM_EXIT_X86_BUS_LOCK     33
+#define KVM_EXIT_XEN              34
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -243,6 +277,11 @@ struct kvm_hyperv_exit {
 #define KVM_INTERNAL_ERROR_SIMUL_EX	2
 /* Encounter unexpected vm-exit due to delivery event. */
 #define KVM_INTERNAL_ERROR_DELIVERY_EV	3
+/* Encounter unexpected vm-exit reason */
+#define KVM_INTERNAL_ERROR_UNEXPECTED_EXIT_REASON	4
+
+/* Flags that describe what fields in emulation_failure hold valid data. */
+#define KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES (1ULL << 0)
 
 /* for KVM_RUN, returned by mmap(vcpu_fd, offset=0) */
 struct kvm_run {
@@ -274,6 +313,7 @@ struct kvm_run {
 		/* KVM_EXIT_FAIL_ENTRY */
 		struct {
 			__u64 hardware_entry_failure_reason;
+			__u32 cpu;
 		} fail_entry;
 		/* KVM_EXIT_EXCEPTION */
 		struct {
@@ -346,6 +386,25 @@ struct kvm_run {
 			__u32 ndata;
 			__u64 data[16];
 		} internal;
+		/*
+		 * KVM_INTERNAL_ERROR_EMULATION
+		 *
+		 * "struct emulation_failure" is an overlay of "struct internal"
+		 * that is used for the KVM_INTERNAL_ERROR_EMULATION sub-type of
+		 * KVM_EXIT_INTERNAL_ERROR.  Note, unlike other internal error
+		 * sub-types, this struct is ABI!  It also needs to be backwards
+		 * compatible with "struct internal".  Take special care that
+		 * "ndata" is correct, that new fields are enumerated in "flags",
+		 * and that each flag enumerates fields that are 64-bit aligned
+		 * and sized (so that ndata+internal.data[] is valid/accurate).
+		 */
+		struct {
+			__u32 suberror;
+			__u32 ndata;
+			__u64 flags;
+			__u8  insn_size;
+			__u8  insn_bytes[15];
+		} emulation_failure;
 		/* KVM_EXIT_OSI */
 		struct {
 			__u64 gprs[32];
@@ -374,6 +433,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
 #define KVM_SYSTEM_EVENT_CRASH          3
+#define KVM_SYSTEM_EVENT_SUSPEND        4
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -392,6 +452,24 @@ struct kvm_run {
 		} eoi;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
+		/* KVM_EXIT_ARM_NISV */
+		struct {
+			__u64 esr_iss;
+			__u64 fault_ipa;
+		} arm_nisv;
+		/* KVM_EXIT_X86_RDMSR / KVM_EXIT_X86_WRMSR */
+		struct {
+			__u8 error; /* user -> kernel */
+			__u8 pad[7];
+#define KVM_MSR_EXIT_REASON_INVAL	(1 << 0)
+#define KVM_MSR_EXIT_REASON_UNKNOWN	(1 << 1)
+#define KVM_MSR_EXIT_REASON_FILTER	(1 << 2)
+			__u32 reason; /* kernel -> user */
+			__u32 index; /* kernel -> user */
+			__u64 data; /* kernel <-> user */
+		} msr;
+		/* KVM_EXIT_XEN */
+		struct kvm_xen_exit xen;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -466,12 +544,17 @@ struct kvm_s390_mem_op {
 	__u32 size;		/* amount of bytes */
 	__u32 op;		/* type of operation */
 	__u64 buf;		/* buffer in userspace */
-	__u8 ar;		/* the access register number */
-	__u8 reserved[31];	/* should be set to 0 */
+	union {
+		__u8 ar;	/* the access register number */
+		__u32 sida_offset; /* offset into the sida */
+		__u8 reserved[32]; /* should be set to 0 */
+	};
 };
 /* types for kvm_s390_mem_op->op */
 #define KVM_S390_MEMOP_LOGICAL_READ	0
 #define KVM_S390_MEMOP_LOGICAL_WRITE	1
+#define KVM_S390_MEMOP_SIDA_READ	2
+#define KVM_S390_MEMOP_SIDA_WRITE	3
 /* flags for kvm_s390_mem_op->flags */
 #define KVM_S390_MEMOP_F_CHECK_ONLY		(1ULL << 0)
 #define KVM_S390_MEMOP_F_INJECT_EXCEPTION	(1ULL << 1)
@@ -533,6 +616,7 @@ struct kvm_vapic_addr {
 #define KVM_MP_STATE_CHECK_STOP        6
 #define KVM_MP_STATE_OPERATING         7
 #define KVM_MP_STATE_LOAD              8
+#define KVM_MP_STATE_AP_RESET_HOLD     9
 
 struct kvm_mp_state {
 	__u32 mp_state;
@@ -764,9 +848,10 @@ struct kvm_ppc_resize_hpt {
 #define KVM_VM_PPC_HV 1
 #define KVM_VM_PPC_PR 2
 
-/* on MIPS, 0 forces trap & emulate, 1 forces VZ ASE */
-#define KVM_VM_MIPS_TE		0
+/* on MIPS, 0 indicates auto, 1 forces VZ ASE, 2 forces trap & emulate */
+#define KVM_VM_MIPS_AUTO	0
 #define KVM_VM_MIPS_VZ		1
+#define KVM_VM_MIPS_TE		2
 
 #define KVM_S390_SIE_PAGE_OFFSET 1
 
@@ -996,6 +1081,39 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_ARM_PTRAUTH_ADDRESS 171
 #define KVM_CAP_ARM_PTRAUTH_GENERIC 172
 #define KVM_CAP_PMU_EVENT_FILTER 173
+#define KVM_CAP_ARM_IRQ_LINE_LAYOUT_2 174
+#define KVM_CAP_HYPERV_DIRECT_TLBFLUSH 175
+#define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176
+#define KVM_CAP_ARM_NISV_TO_USER 177
+#define KVM_CAP_ARM_INJECT_EXT_DABT 178
+#define KVM_CAP_S390_VCPU_RESETS 179
+#define KVM_CAP_S390_PROTECTED 180
+#define KVM_CAP_PPC_SECURE_GUEST 181
+#define KVM_CAP_HALT_POLL 182
+#define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_LAST_CPU 184
+#define KVM_CAP_SMALLER_MAXPHYADDR 185
+#define KVM_CAP_S390_DIAG318 186
+#define KVM_CAP_STEAL_TIME 187
+#define KVM_CAP_X86_USER_SPACE_MSR 188
+#define KVM_CAP_X86_MSR_FILTER 189
+#define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_SYS_HYPERV_CPUID 191
+#define KVM_CAP_DIRTY_LOG_RING 192
+#define KVM_CAP_X86_BUS_LOCK_EXIT 193
+#define KVM_CAP_PPC_DAWR1 194
+#define KVM_CAP_SET_GUEST_DEBUG2 195
+#define KVM_CAP_SGX_ATTRIBUTE 196
+#define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
+#define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_HYPERV_ENFORCE_CPUID 199
+#define KVM_CAP_SREGS2 200
+#define KVM_CAP_EXIT_HYPERCALL 201
+#define KVM_CAP_PPC_RPT_INVALIDATE 202
+#define KVM_CAP_BINARY_STATS_FD 203
+#define KVM_CAP_EXIT_ON_EMULATION_FAILURE 204
+#define KVM_CAP_ARM_MTE 205
+#define KVM_CAP_ARM_SYSTEM_SUSPEND 206
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1069,6 +1187,11 @@ struct kvm_x86_mce {
 #endif
 
 #ifdef KVM_CAP_XEN_HVM
+#define KVM_XEN_HVM_CONFIG_HYPERCALL_MSR	(1 << 0)
+#define KVM_XEN_HVM_CONFIG_INTERCEPT_HCALL	(1 << 1)
+#define KVM_XEN_HVM_CONFIG_SHARED_INFO		(1 << 2)
+#define KVM_XEN_HVM_CONFIG_RUNSTATE		(1 << 3)
+
 struct kvm_xen_hvm_config {
 	__u32 flags;
 	__u32 msr;
@@ -1086,7 +1209,7 @@ struct kvm_xen_hvm_config {
  *
  * KVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
  * the irqfd to operate in resampling mode for level triggered interrupt
- * emulation.  See Documentation/virt/kvm/api.txt.
+ * emulation.  See Documentation/virt/kvm/api.rst.
  */
 #define KVM_IRQFD_FLAG_RESAMPLE (1 << 1)
 
@@ -1142,6 +1265,7 @@ struct kvm_dirty_tlb {
 #define KVM_REG_S390		0x5000000000000000ULL
 #define KVM_REG_ARM64		0x6000000000000000ULL
 #define KVM_REG_MIPS		0x7000000000000000ULL
+#define KVM_REG_RISCV		0x8000000000000000ULL
 
 #define KVM_REG_SIZE_SHIFT	52
 #define KVM_REG_SIZE_MASK	0x00f0000000000000ULL
@@ -1222,6 +1346,8 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_ARM_VGIC_ITS	KVM_DEV_TYPE_ARM_VGIC_ITS
 	KVM_DEV_TYPE_XIVE,
 #define KVM_DEV_TYPE_XIVE		KVM_DEV_TYPE_XIVE
+	KVM_DEV_TYPE_ARM_PV_TIME,
+#define KVM_DEV_TYPE_ARM_PV_TIME	KVM_DEV_TYPE_ARM_PV_TIME
 	KVM_DEV_TYPE_MAX,
 };
 
@@ -1332,6 +1458,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_CPU_CHAR	  _IOR(KVMIO,  0xb1, struct kvm_ppc_cpu_char)
 /* Available with KVM_CAP_PMU_EVENT_FILTER */
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
+#define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
+#define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
@@ -1450,12 +1578,109 @@ struct kvm_enc_region {
 /* Available with KVM_CAP_MANUAL_DIRTY_LOG_PROTECT_2 */
 #define KVM_CLEAR_DIRTY_LOG          _IOWR(KVMIO, 0xc0, struct kvm_clear_dirty_log)
 
-/* Available with KVM_CAP_HYPERV_CPUID */
+/* Available with KVM_CAP_HYPERV_CPUID (vcpu) / KVM_CAP_SYS_HYPERV_CPUID (system) */
 #define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc1, struct kvm_cpuid2)
 
 /* Available with KVM_CAP_ARM_SVE */
 #define KVM_ARM_VCPU_FINALIZE	  _IOW(KVMIO,  0xc2, int)
 
+/* Available with  KVM_CAP_S390_VCPU_RESETS */
+#define KVM_S390_NORMAL_RESET	_IO(KVMIO,   0xc3)
+#define KVM_S390_CLEAR_RESET	_IO(KVMIO,   0xc4)
+
+struct kvm_s390_pv_sec_parm {
+	__u64 origin;
+	__u64 length;
+};
+
+struct kvm_s390_pv_unp {
+	__u64 addr;
+	__u64 size;
+	__u64 tweak;
+};
+
+enum pv_cmd_id {
+	KVM_PV_ENABLE,
+	KVM_PV_DISABLE,
+	KVM_PV_SET_SEC_PARMS,
+	KVM_PV_UNPACK,
+	KVM_PV_VERIFY,
+	KVM_PV_PREP_RESET,
+	KVM_PV_UNSHARE_ALL,
+};
+
+struct kvm_pv_cmd {
+	__u32 cmd;	/* Command to be executed */
+	__u16 rc;	/* Ultravisor return code */
+	__u16 rrc;	/* Ultravisor return reason code */
+	__u64 data;	/* Data or address */
+	__u32 flags;    /* flags for future extensions. Must be 0 for now */
+	__u32 reserved[3];
+};
+
+/* Available with KVM_CAP_S390_PROTECTED */
+#define KVM_S390_PV_COMMAND		_IOWR(KVMIO, 0xc5, struct kvm_pv_cmd)
+
+/* Available with KVM_CAP_X86_MSR_FILTER */
+#define KVM_X86_SET_MSR_FILTER	_IOW(KVMIO,  0xc6, struct kvm_msr_filter)
+
+/* Available with KVM_CAP_DIRTY_LOG_RING */
+#define KVM_RESET_DIRTY_RINGS		_IO(KVMIO, 0xc7)
+
+/* Per-VM Xen attributes */
+#define KVM_XEN_HVM_GET_ATTR	_IOWR(KVMIO, 0xc8, struct kvm_xen_hvm_attr)
+#define KVM_XEN_HVM_SET_ATTR	_IOW(KVMIO,  0xc9, struct kvm_xen_hvm_attr)
+
+struct kvm_xen_hvm_attr {
+	__u16 type;
+	__u16 pad[3];
+	union {
+		__u8 long_mode;
+		__u8 vector;
+		struct {
+			__u64 gfn;
+		} shared_info;
+		__u64 pad[8];
+	} u;
+};
+
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
+#define KVM_XEN_ATTR_TYPE_LONG_MODE		0x0
+#define KVM_XEN_ATTR_TYPE_SHARED_INFO		0x1
+#define KVM_XEN_ATTR_TYPE_UPCALL_VECTOR		0x2
+
+/* Per-vCPU Xen attributes */
+#define KVM_XEN_VCPU_GET_ATTR	_IOWR(KVMIO, 0xca, struct kvm_xen_vcpu_attr)
+#define KVM_XEN_VCPU_SET_ATTR	_IOW(KVMIO,  0xcb, struct kvm_xen_vcpu_attr)
+
+#define KVM_GET_SREGS2             _IOR(KVMIO,  0xcc, struct kvm_sregs2)
+#define KVM_SET_SREGS2             _IOW(KVMIO,  0xcd, struct kvm_sregs2)
+
+struct kvm_xen_vcpu_attr {
+	__u16 type;
+	__u16 pad[3];
+	union {
+		__u64 gpa;
+		__u64 pad[8];
+		struct {
+			__u64 state;
+			__u64 state_entry_time;
+			__u64 time_running;
+			__u64 time_runnable;
+			__u64 time_blocked;
+			__u64 time_offline;
+		} runstate;
+	} u;
+};
+
+/* Available with KVM_CAP_XEN_HVM / KVM_XEN_HVM_CONFIG_SHARED_INFO */
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_INFO	0x0
+#define KVM_XEN_VCPU_ATTR_TYPE_VCPU_TIME_INFO	0x1
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADDR	0x2
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT	0x3
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA	0x4
+#define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST	0x5
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
@@ -1484,6 +1709,10 @@ enum sev_cmd_id {
 	KVM_SEV_DBG_ENCRYPT,
 	/* Guest certificates commands */
 	KVM_SEV_CERT_EXPORT,
+	/* Attestation report */
+	KVM_SEV_GET_ATTESTATION_REPORT,
+	/* Guest Migration Extension */
+	KVM_SEV_SEND_CANCEL,
 
 	KVM_SEV_NR_MAX,
 };
@@ -1536,6 +1765,51 @@ struct kvm_sev_dbg {
 	__u32 len;
 };
 
+struct kvm_sev_attestation_report {
+	__u8 mnonce[16];
+	__u64 uaddr;
+	__u32 len;
+};
+
+struct kvm_sev_send_start {
+	__u32 policy;
+	__u64 pdh_cert_uaddr;
+	__u32 pdh_cert_len;
+	__u64 plat_certs_uaddr;
+	__u32 plat_certs_len;
+	__u64 amd_certs_uaddr;
+	__u32 amd_certs_len;
+	__u64 session_uaddr;
+	__u32 session_len;
+};
+
+struct kvm_sev_send_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
+struct kvm_sev_receive_start {
+	__u32 handle;
+	__u32 policy;
+	__u64 pdh_uaddr;
+	__u32 pdh_len;
+	__u64 session_uaddr;
+	__u32 session_len;
+};
+
+struct kvm_sev_receive_update_data {
+	__u64 hdr_uaddr;
+	__u32 hdr_len;
+	__u64 guest_uaddr;
+	__u32 guest_len;
+	__u64 trans_uaddr;
+	__u32 trans_len;
+};
+
 #define KVM_DEV_ASSIGN_ENABLE_IOMMU	(1 << 0)
 #define KVM_DEV_ASSIGN_PCI_2_3		(1 << 1)
 #define KVM_DEV_ASSIGN_MASK_INTX	(1 << 2)
@@ -1606,4 +1880,130 @@ struct kvm_hyperv_eventfd {
 #define KVM_HYPERV_CONN_ID_MASK		0x00ffffff
 #define KVM_HYPERV_EVENTFD_DEASSIGN	(1 << 0)
 
+#define KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE    (1 << 0)
+#define KVM_DIRTY_LOG_INITIALLY_SET            (1 << 1)
+
+/*
+ * Arch needs to define the macro after implementing the dirty ring
+ * feature.  KVM_DIRTY_LOG_PAGE_OFFSET should be defined as the
+ * starting page offset of the dirty ring structures.
+ */
+#ifndef KVM_DIRTY_LOG_PAGE_OFFSET
+#define KVM_DIRTY_LOG_PAGE_OFFSET 0
+#endif
+
+/*
+ * KVM dirty GFN flags, defined as:
+ *
+ * |---------------+---------------+--------------|
+ * | bit 1 (reset) | bit 0 (dirty) | Status       |
+ * |---------------+---------------+--------------|
+ * |             0 |             0 | Invalid GFN  |
+ * |             0 |             1 | Dirty GFN    |
+ * |             1 |             X | GFN to reset |
+ * |---------------+---------------+--------------|
+ *
+ * Lifecycle of a dirty GFN goes like:
+ *
+ *      dirtied         harvested        reset
+ * 00 -----------> 01 -------------> 1X -------+
+ *  ^                                          |
+ *  |                                          |
+ *  +------------------------------------------+
+ *
+ * The userspace program is only responsible for the 01->1X state
+ * conversion after harvesting an entry.  Also, it must not skip any
+ * dirty bits, so that dirty bits are always harvested in sequence.
+ */
+#define KVM_DIRTY_GFN_F_DIRTY           _BITUL(0)
+#define KVM_DIRTY_GFN_F_RESET           _BITUL(1)
+#define KVM_DIRTY_GFN_F_MASK            0x3
+
+/*
+ * KVM dirty rings should be mapped at KVM_DIRTY_LOG_PAGE_OFFSET of
+ * per-vcpu mmaped regions as an array of struct kvm_dirty_gfn.  The
+ * size of the gfn buffer is decided by the first argument when
+ * enabling KVM_CAP_DIRTY_LOG_RING.
+ */
+struct kvm_dirty_gfn {
+	__u32 flags;
+	__u32 slot;
+	__u64 offset;
+};
+
+#define KVM_BUS_LOCK_DETECTION_OFF             (1 << 0)
+#define KVM_BUS_LOCK_DETECTION_EXIT            (1 << 1)
+
+/**
+ * struct kvm_stats_header - Header of per vm/vcpu binary statistics data.
+ * @flags: Some extra information for header, always 0 for now.
+ * @name_size: The size in bytes of the memory which contains statistics
+ *             name string including trailing '\0'. The memory is allocated
+ *             at the send of statistics descriptor.
+ * @num_desc: The number of statistics the vm or vcpu has.
+ * @id_offset: The offset of the vm/vcpu stats' id string in the file pointed
+ *             by vm/vcpu stats fd.
+ * @desc_offset: The offset of the vm/vcpu stats' descriptor block in the file
+ *               pointd by vm/vcpu stats fd.
+ * @data_offset: The offset of the vm/vcpu stats' data block in the file
+ *               pointed by vm/vcpu stats fd.
+ *
+ * This is the header userspace needs to read from stats fd before any other
+ * readings. It is used by userspace to discover all the information about the
+ * vm/vcpu's binary statistics.
+ * Userspace reads this header from the start of the vm/vcpu's stats fd.
+ */
+struct kvm_stats_header {
+	__u32 flags;
+	__u32 name_size;
+	__u32 num_desc;
+	__u32 id_offset;
+	__u32 desc_offset;
+	__u32 data_offset;
+};
+
+#define KVM_STATS_TYPE_SHIFT		0
+#define KVM_STATS_TYPE_MASK		(0xF << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_CUMULATIVE	(0x0 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_INSTANT		(0x1 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_PEAK		(0x2 << KVM_STATS_TYPE_SHIFT)
+#define KVM_STATS_TYPE_MAX		KVM_STATS_TYPE_PEAK
+
+#define KVM_STATS_UNIT_SHIFT		4
+#define KVM_STATS_UNIT_MASK		(0xF << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_NONE		(0x0 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_BYTES		(0x1 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_SECONDS		(0x2 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_CYCLES		(0x3 << KVM_STATS_UNIT_SHIFT)
+#define KVM_STATS_UNIT_MAX		KVM_STATS_UNIT_CYCLES
+
+#define KVM_STATS_BASE_SHIFT		8
+#define KVM_STATS_BASE_MASK		(0xF << KVM_STATS_BASE_SHIFT)
+#define KVM_STATS_BASE_POW10		(0x0 << KVM_STATS_BASE_SHIFT)
+#define KVM_STATS_BASE_POW2		(0x1 << KVM_STATS_BASE_SHIFT)
+#define KVM_STATS_BASE_MAX		KVM_STATS_BASE_POW2
+
+/**
+ * struct kvm_stats_desc - Descriptor of a KVM statistics.
+ * @flags: Annotations of the stats, like type, unit, etc.
+ * @exponent: Used together with @flags to determine the unit.
+ * @size: The number of data items for this stats.
+ *        Every data item is of type __u64.
+ * @offset: The offset of the stats to the start of stat structure in
+ *          struture kvm or kvm_vcpu.
+ * @unused: Unused field for future usage. Always 0 for now.
+ * @name: The name string for the stats. Its size is indicated by the
+ *        &kvm_stats_header->name_size.
+ */
+struct kvm_stats_desc {
+	__u32 flags;
+	__s16 exponent;
+	__u16 size;
+	__u32 offset;
+	__u32 unused;
+	char name[];
+};
+
+#define KVM_GET_STATS_FD  _IO(KVMIO,  0xce)
+
 #endif /* __LINUX_KVM_H */
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC kvmtool PATCH 2/2] arm64: Add support for KVM_CAP_ARM_SYSTEM_SUSPEND
  2021-08-27 21:58 ` [RFC kvmtool PATCH 0/2] " Oliver Upton
  2021-08-27 21:58   ` [RFC kvmtool PATCH 1/2] TESTONLY: KVM: Update KVM headers Oliver Upton
@ 2021-08-27 21:58   ` Oliver Upton
  1 sibling, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-08-27 21:58 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Marc Zyngier, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, reijiw, Andrew Jones, Will Deacon,
	Julien Thierry, Oliver Upton

KVM now allows a guest to issue the SYSTEM_SUSPEND PSCI call with buy-in
from the VMM. Opt in to the new capability and handle the
KVM_SYSTEM_EVENT_SUSPEND exit subtype by ignoring the guest request and
entering KVM again. Since KVM has already configured the guest to
resume, this will result in the guest immediately coming out of reset.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arm/kvm.c         | 12 ++++++++++++
 include/kvm/kvm.h |  1 +
 kvm-cpu.c         |  5 +++++
 kvm.c             |  7 +++++++
 4 files changed, 25 insertions(+)

diff --git a/arm/kvm.c b/arm/kvm.c
index 5aea18f..0a53c46 100644
--- a/arm/kvm.c
+++ b/arm/kvm.c
@@ -59,6 +59,18 @@ void kvm__arch_set_cmdline(char *cmdline, bool video)
 
 void kvm__arch_init(struct kvm *kvm, const char *hugetlbfs_path, u64 ram_size)
 {
+	if (kvm__supports_vm_extension(kvm, KVM_CAP_ARM_SYSTEM_SUSPEND)) {
+		struct kvm_enable_cap cap = {
+			.cap = KVM_CAP_ARM_SYSTEM_SUSPEND
+		};
+		int err;
+
+		err = kvm__enable_vm_extension(kvm, &cap);
+		if (err)
+			die("Failed to enable KVM_CAP_ARM_SYSTEM_SUSPEND "
+			    "[err %d]", err);
+	}
+
 	/*
 	 * Allocate guest memory. We must align our buffer to 64K to
 	 * correlate with the maximum guest page size for virtio-mmio.
diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
index 56e9c8e..c797516 100644
--- a/include/kvm/kvm.h
+++ b/include/kvm/kvm.h
@@ -236,6 +236,7 @@ static inline bool host_ptr_in_ram(struct kvm *kvm, void *p)
 
 bool kvm__supports_extension(struct kvm *kvm, unsigned int extension);
 bool kvm__supports_vm_extension(struct kvm *kvm, unsigned int extension);
+int kvm__enable_vm_extension(struct kvm *kvm, struct kvm_enable_cap *cap);
 
 static inline void kvm__set_thread_name(const char *name)
 {
diff --git a/kvm-cpu.c b/kvm-cpu.c
index 7dec088..1fedacf 100644
--- a/kvm-cpu.c
+++ b/kvm-cpu.c
@@ -236,6 +236,11 @@ int kvm_cpu__start(struct kvm_cpu *cpu)
 				 */
 				kvm__reboot(cpu->kvm);
 				goto exit_kvm;
+			case KVM_SYSTEM_EVENT_SUSPEND:
+				/*
+				 * Ignore the guest; kvm will resume it normally
+				 */
+				break;
 			};
 			break;
 		default: {
diff --git a/kvm.c b/kvm.c
index e327541..66815b4 100644
--- a/kvm.c
+++ b/kvm.c
@@ -123,6 +123,13 @@ bool kvm__supports_vm_extension(struct kvm *kvm, unsigned int extension)
 	return ret;
 }
 
+int kvm__enable_vm_extension(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	int ret = ioctl(kvm->vm_fd, KVM_ENABLE_CAP, cap);
+
+	return ret ? errno : 0;
+}
+
 bool kvm__supports_extension(struct kvm *kvm, unsigned int extension)
 {
 	int ret;
-- 
2.33.0.259.gc128427fd7-goog


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
                   ` (8 preceding siblings ...)
  2021-08-27 21:58 ` [RFC kvmtool PATCH 0/2] " Oliver Upton
@ 2021-09-06  9:12 ` Marc Zyngier
  2021-09-07 16:30   ` Oliver Upton
  9 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-09-06  9:12 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

Hi Oliver,

On Thu, 19 Aug 2021 23:36:34 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> Certain VMMs/operators may wish to give their guests the ability to
> initiate a system suspend that could result in the VM being saved to
> persistent storage to be resumed at a later time. The PSCI v1.0
> specification describes an SMC, SYSTEM_SUSPEND, that allows a kernel to
> request a system suspend. This call is optional for v1.0, and KVM
> elected to not support the call in its v1.0 implementation.
> 
> This series adds support for the SYSTEM_SUSPEND PSCI call to KVM/arm64.
> Since this is a system-scoped event, KVM cannot quiesce the VM on its
> own. We add a new system exit type in this series to clue in userspace
> that a suspend was requested. Per the KVM_EXIT_SYSTEM_EVENT ABI, a VMM
> that doesn't care about this event can simply resume the guest without
> issue (we set up the calling vCPU to come out of reset correctly on next
> KVM_RUN).

More idle thoughts on this:

Although the definition of SYSTEM_SUSPEND is very simple from a PSCI
perspective, I don't think it is that simple at the system level,
because PSCI is only concerned with the CPU.

For example, what is a wake-up event? My first approach would be to
consider interrupts to be such events. However, this approach suffers
from at least two issues:

- How do you define which interrupts are actual wake-up events?
  Nothing in the GIC architecture defines what a wake-up is (let alone
  a wake-up event).

- Assuming you have a way to express the above, how do you handle
  wake-ups from interrupts that have their source in the kernel (such
  as timers, irqfd sources)? How do you cope with directly injected
  interrupts?

It looks to me that your implementation can only work with userspace
provided events, which is pretty limited.

Other items worth considering: ongoing DMA, state of the caches at
suspend time, device state in general All of this really needs to be
defined before we can move forward with this feature.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-09-06  9:12 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Marc Zyngier
@ 2021-09-07 16:30   ` Oliver Upton
  2021-09-07 17:43     ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-09-07 16:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

On Mon, Sep 6, 2021 at 4:12 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Oliver,
>
> On Thu, 19 Aug 2021 23:36:34 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > Certain VMMs/operators may wish to give their guests the ability to
> > initiate a system suspend that could result in the VM being saved to
> > persistent storage to be resumed at a later time. The PSCI v1.0
> > specification describes an SMC, SYSTEM_SUSPEND, that allows a kernel to
> > request a system suspend. This call is optional for v1.0, and KVM
> > elected to not support the call in its v1.0 implementation.
> >
> > This series adds support for the SYSTEM_SUSPEND PSCI call to KVM/arm64.
> > Since this is a system-scoped event, KVM cannot quiesce the VM on its
> > own. We add a new system exit type in this series to clue in userspace
> > that a suspend was requested. Per the KVM_EXIT_SYSTEM_EVENT ABI, a VMM
> > that doesn't care about this event can simply resume the guest without
> > issue (we set up the calling vCPU to come out of reset correctly on next
> > KVM_RUN).
>
> More idle thoughts on this:
>
> Although the definition of SYSTEM_SUSPEND is very simple from a PSCI
> perspective, I don't think it is that simple at the system level,
> because PSCI is only concerned with the CPU.
>
> For example, what is a wake-up event? My first approach would be to
> consider interrupts to be such events. However, this approach suffers
> from at least two issues:
>
> - How do you define which interrupts are actual wake-up events?
>   Nothing in the GIC architecture defines what a wake-up is (let alone
>   a wake-up event).

Good point.

One possible implementation of suspend could just be a `WFI` in a
higher EL. In this case, KVM could emulate WFI wake up events
according to D1.16.2 in DDI 0487G.a. But I agree, it isn't entirely
clear what constitutes a wakeup from powered down state.

> - Assuming you have a way to express the above, how do you handle
>   wake-ups from interrupts that have their source in the kernel (such
>   as timers, irqfd sources)?

I think this could be handled, so long as we allow userspace to
indicate it has woken a vCPU. Depending on this, in the next KVM_RUN
we'd say:

- Some IMP DEF event occurred; I'm waking this CPU now
- I've either chosen to ignore the guest or will defer to KVM's
suspend implementation

> How do you cope with directly injected interrupts?

No expert on this, I'll need to do a bit more reading to give a good
answer here.

> It looks to me that your implementation can only work with userspace
> provided events, which is pretty limited.

Right. I implemented this from the mindset that userspace may do
something heavyweight when a guest suspends, like save it to a
persistent store to resume later on. No matter what we do in KVM, I
think it's probably best to give userspace the right of first refusal
to handle the suspension.

> Other items worth considering: ongoing DMA, state of the caches at
> suspend time, device state in general All of this really needs to be
> defined before we can move forward with this feature.

I believe it is largely up to the caller to get devices in a quiesced
state appropriate for a system suspend, but PSCI is delightfully vague
on this topic. On the contrary, it is up to KVM's implementation to
guarantee caches are clean when servicing the guest request.

I'll crank on this a bit more and see if I have more thoughts.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-09-07 16:30   ` Oliver Upton
@ 2021-09-07 17:43     ` Marc Zyngier
  2021-09-07 18:14       ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-09-07 17:43 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

On Tue, 07 Sep 2021 17:30:33 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Mon, Sep 6, 2021 at 4:12 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Hi Oliver,
> >
> > On Thu, 19 Aug 2021 23:36:34 +0100,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > Certain VMMs/operators may wish to give their guests the ability to
> > > initiate a system suspend that could result in the VM being saved to
> > > persistent storage to be resumed at a later time. The PSCI v1.0
> > > specification describes an SMC, SYSTEM_SUSPEND, that allows a kernel to
> > > request a system suspend. This call is optional for v1.0, and KVM
> > > elected to not support the call in its v1.0 implementation.
> > >
> > > This series adds support for the SYSTEM_SUSPEND PSCI call to KVM/arm64.
> > > Since this is a system-scoped event, KVM cannot quiesce the VM on its
> > > own. We add a new system exit type in this series to clue in userspace
> > > that a suspend was requested. Per the KVM_EXIT_SYSTEM_EVENT ABI, a VMM
> > > that doesn't care about this event can simply resume the guest without
> > > issue (we set up the calling vCPU to come out of reset correctly on next
> > > KVM_RUN).
> >
> > More idle thoughts on this:
> >
> > Although the definition of SYSTEM_SUSPEND is very simple from a PSCI
> > perspective, I don't think it is that simple at the system level,
> > because PSCI is only concerned with the CPU.
> >
> > For example, what is a wake-up event? My first approach would be to
> > consider interrupts to be such events. However, this approach suffers
> > from at least two issues:
> >
> > - How do you define which interrupts are actual wake-up events?
> >   Nothing in the GIC architecture defines what a wake-up is (let alone
> >   a wake-up event).
> 
> Good point.
> 
> One possible implementation of suspend could just be a `WFI` in a
> higher EL. In this case, KVM could emulate WFI wake up events
> according to D1.16.2 in DDI 0487G.a. But I agree, it isn't entirely
> clear what constitutes a wakeup from powered down state.

It isn't, and it is actually IMPDEF (there isn't much in the ARM ARM
in terms of what constitutes a low power state). And even if you
wanted to emulate a WFI in userspace, the problem of interrupts that
have their source in the kernel remains. How to you tell userspace
that such an event has occurred if the vcpu thread isn't in the
kernel?

> 
> > - Assuming you have a way to express the above, how do you handle
> >   wake-ups from interrupts that have their source in the kernel (such
> >   as timers, irqfd sources)?
> 
> I think this could be handled, so long as we allow userspace to
> indicate it has woken a vCPU. Depending on this, in the next KVM_RUN
> we'd say:
> 
> - Some IMP DEF event occurred; I'm waking this CPU now

I'm seeing the problem from the other side. The vcpu has exited to
userspace on a SUSPEND event. How is the kernel supposed to tell
userspace that there is a pending interrupt? To do that, you'd have to
keep the vcpu in the kernel on SUSPEND. Which is *exactly* WFI.

> - I've either chosen to ignore the guest or will defer to KVM's
> suspend implementation
> 
> > How do you cope with directly injected interrupts?
> 
> No expert on this, I'll need to do a bit more reading to give a good
> answer here.
> 
> > It looks to me that your implementation can only work with userspace
> > provided events, which is pretty limited.
> 
> Right. I implemented this from the mindset that userspace may do
> something heavyweight when a guest suspends, like save it to a
> persistent store to resume later on. No matter what we do in KVM, I
> think it's probably best to give userspace the right of first refusal
> to handle the suspension.

Maybe. But if you want to handle wake-up from interrupts to actually
work, you must return to the kernel for the wake-up to occurs.

The problem is that you piggyback on an existing feature (suspend) to
implement something else (opportunistic save/restore?). Oddly enough
the stars don't exactly align! ;-)

I have the feeling that a solution to this problem would be to exit to
userspace with something indicating an *intent* to suspend. At this
stage, userspace can do two things:

- resume the guest: the guest may have been moved to some other
  machine, cold storage, whatever... The important thing is that the
  guest is directly runnable without any extra event

- confirm the suspension by returning to the kernel, which will
  execute a blocking WFI on behalf of the guest

With this, you end-up with something that is works from an interrupt
perspective (even for directly injected interrupts), and you can save
your guest on suspend.

>
> > Other items worth considering: ongoing DMA, state of the caches at
> > suspend time, device state in general All of this really needs to be
> > defined before we can move forward with this feature.
> 
> I believe it is largely up to the caller to get devices in a quiesced
> state appropriate for a system suspend, but PSCI is delightfully vague
> on this topic.

Indeed, it only deals with the CPU. Oh look, another opportunity to
write a new spec! :)

> On the contrary, it is up to KVM's implementation to
> guarantee caches are clean when servicing the guest request.

This last point is pretty unclear to me. If the guest doesn't clean to
the PoC (or even to one of the PoPs) when it calls into suspend,
that's a clear indication that it doesn't care about its data. Why
should KVM be more conservative here? It shouldn't be in the business
of working around guest bugs.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-09-07 17:43     ` Marc Zyngier
@ 2021-09-07 18:14       ` Oliver Upton
  2021-09-21  9:45         ` Marc Zyngier
  0 siblings, 1 reply; 20+ messages in thread
From: Oliver Upton @ 2021-09-07 18:14 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

On Tue, Sep 7, 2021 at 12:43 PM Marc Zyngier <maz@kernel.org> wrote:
> > > Although the definition of SYSTEM_SUSPEND is very simple from a PSCI
> > > perspective, I don't think it is that simple at the system level,
> > > because PSCI is only concerned with the CPU.
> > >
> > > For example, what is a wake-up event? My first approach would be to
> > > consider interrupts to be such events. However, this approach suffers
> > > from at least two issues:
> > >
> > > - How do you define which interrupts are actual wake-up events?
> > >   Nothing in the GIC architecture defines what a wake-up is (let alone
> > >   a wake-up event).
> >
> > Good point.
> >
> > One possible implementation of suspend could just be a `WFI` in a
> > higher EL. In this case, KVM could emulate WFI wake up events
> > according to D1.16.2 in DDI 0487G.a. But I agree, it isn't entirely
> > clear what constitutes a wakeup from powered down state.
>
> It isn't, and it is actually IMPDEF (there isn't much in the ARM ARM
> in terms of what constitutes a low power state). And even if you
> wanted to emulate a WFI in userspace, the problem of interrupts that
> have their source in the kernel remains. How to you tell userspace
> that such an event has occurred if the vcpu thread isn't in the
> kernel?

Well, are there any objections to saying for the KVM implementation we
observe the WFI wake-up events per the cited section of the ARM ARM?

> > > It looks to me that your implementation can only work with userspace
> > > provided events, which is pretty limited.
> >
> > Right. I implemented this from the mindset that userspace may do
> > something heavyweight when a guest suspends, like save it to a
> > persistent store to resume later on. No matter what we do in KVM, I
> > think it's probably best to give userspace the right of first refusal
> > to handle the suspension.
>
> Maybe. But if you want to handle wake-up from interrupts to actually
> work, you must return to the kernel for the wake-up to occurs.
>
> The problem is that you piggyback on an existing feature (suspend) to
> implement something else (opportunistic save/restore?). Oddly enough
> the stars don't exactly align! ;-)
>
> I have the feeling that a solution to this problem would be to exit to
> userspace with something indicating an *intent* to suspend. At this
> stage, userspace can do two things:
>
> - resume the guest: the guest may have been moved to some other
>   machine, cold storage, whatever... The important thing is that the
>   guest is directly runnable without any extra event
>
> - confirm the suspension by returning to the kernel, which will
>   execute a blocking WFI on behalf of the guest
>
> With this, you end-up with something that is works from an interrupt
> perspective (even for directly injected interrupts), and you can save
> your guest on suspend.

This is exactly what I was trying to get at with my last mail,
although not quite as eloquently stated. So I completely agree.

Just to check understanding for v2:

We agree that an exit to userspace is fine so it has the opportunity
to do something crazy when the guest attempts a suspend. If a VMM does
nothing and immediately re-enters the kernel, we emulate the suspend
there by waiting for some event to fire, which for our purposes we
will say is an interrupt originating from userspace or the kernel
(WFI). In all, the SUSPEND exit type does not indicate that emulation
terminates with the VMM. It only indicates we are about to block in
the kernel.

If there is some IMPDEF event specific to the VMM, it should signal
the vCPU thread to kick it out of the kernel, make it runnable, and
re-enter. No need to do anything special from the kernel perspective
for this. This is only for the case where we decide to block in the
kernel.

>
> >
> > > Other items worth considering: ongoing DMA, state of the caches at
> > > suspend time, device state in general All of this really needs to be
> > > defined before we can move forward with this feature.
> >
> > I believe it is largely up to the caller to get devices in a quiesced
> > state appropriate for a system suspend, but PSCI is delightfully vague
> > on this topic.
>
> Indeed, it only deals with the CPU. Oh look, another opportunity to
> write a new spec! :)
>
> > On the contrary, it is up to KVM's implementation to
> > guarantee caches are clean when servicing the guest request.
>
> This last point is pretty unclear to me. If the guest doesn't clean to
> the PoC (or even to one of the PoPs) when it calls into suspend,
> that's a clear indication that it doesn't care about its data. Why
> should KVM be more conservative here? It shouldn't be in the business
> of working around guest bugs.

PSCI is vague on this, sadly. DEN0022D.b, 5.4.8 "Implementation
responsibilities: Cache and coherency management states" that for
CPU_SUSPEND, the PSCI implementation must perform a cache clean
operation before entering the powerdown state. I don't see any reason
why SYSTEM_SUSPEND should be excluded from this requirement.

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-09-07 18:14       ` Oliver Upton
@ 2021-09-21  9:45         ` Marc Zyngier
  2021-09-21 18:22           ` Oliver Upton
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Zyngier @ 2021-09-21  9:45 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

Hi Oliver,

On Tue, 07 Sep 2021 19:14:00 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> On Tue, Sep 7, 2021 at 12:43 PM Marc Zyngier <maz@kernel.org> wrote:
> > > > Although the definition of SYSTEM_SUSPEND is very simple from a PSCI
> > > > perspective, I don't think it is that simple at the system level,
> > > > because PSCI is only concerned with the CPU.
> > > >
> > > > For example, what is a wake-up event? My first approach would be to
> > > > consider interrupts to be such events. However, this approach suffers
> > > > from at least two issues:
> > > >
> > > > - How do you define which interrupts are actual wake-up events?
> > > >   Nothing in the GIC architecture defines what a wake-up is (let alone
> > > >   a wake-up event).
> > >
> > > Good point.
> > >
> > > One possible implementation of suspend could just be a `WFI` in a
> > > higher EL. In this case, KVM could emulate WFI wake up events
> > > according to D1.16.2 in DDI 0487G.a. But I agree, it isn't entirely
> > > clear what constitutes a wakeup from powered down state.
> >
> > It isn't, and it is actually IMPDEF (there isn't much in the ARM ARM
> > in terms of what constitutes a low power state). And even if you
> > wanted to emulate a WFI in userspace, the problem of interrupts that
> > have their source in the kernel remains. How to you tell userspace
> > that such an event has occurred if the vcpu thread isn't in the
> > kernel?
> 
> Well, are there any objections to saying for the KVM implementation we
> observe the WFI wake-up events per the cited section of the ARM ARM?

These are fine. However, what of the GIC, for example? Can any GIC
interrupt wake-up the guest? I'm happy to say "yes" to this, but I
suspect others will have a different idea, and the thought of
introducing an IMPDEF wake-up interrupt controller doesn't fill me
with joy.

> > > > It looks to me that your implementation can only work with userspace
> > > > provided events, which is pretty limited.
> > >
> > > Right. I implemented this from the mindset that userspace may do
> > > something heavyweight when a guest suspends, like save it to a
> > > persistent store to resume later on. No matter what we do in KVM, I
> > > think it's probably best to give userspace the right of first refusal
> > > to handle the suspension.
> >
> > Maybe. But if you want to handle wake-up from interrupts to actually
> > work, you must return to the kernel for the wake-up to occurs.
> >
> > The problem is that you piggyback on an existing feature (suspend) to
> > implement something else (opportunistic save/restore?). Oddly enough
> > the stars don't exactly align! ;-)
> >
> > I have the feeling that a solution to this problem would be to exit to
> > userspace with something indicating an *intent* to suspend. At this
> > stage, userspace can do two things:
> >
> > - resume the guest: the guest may have been moved to some other
> >   machine, cold storage, whatever... The important thing is that the
> >   guest is directly runnable without any extra event
> >
> > - confirm the suspension by returning to the kernel, which will
> >   execute a blocking WFI on behalf of the guest
> >
> > With this, you end-up with something that is works from an interrupt
> > perspective (even for directly injected interrupts), and you can save
> > your guest on suspend.
> 
> This is exactly what I was trying to get at with my last mail,
> although not quite as eloquently stated. So I completely agree.

Ah! Good! :D

> Just to check understanding for v2:
> 
> We agree that an exit to userspace is fine so it has the opportunity
> to do something crazy when the guest attempts a suspend. If a VMM does
> nothing and immediately re-enters the kernel, we emulate the suspend
> there by waiting for some event to fire, which for our purposes we
> will say is an interrupt originating from userspace or the kernel
> (WFI). In all, the SUSPEND exit type does not indicate that emulation
> terminates with the VMM. It only indicates we are about to block in
> the kernel.
> 
> If there is some IMPDEF event specific to the VMM, it should signal
> the vCPU thread to kick it out of the kernel, make it runnable, and
> re-enter. No need to do anything special from the kernel perspective
> for this. This is only for the case where we decide to block in the
> kernel.

This looks sensible. One question though: I think there is an implicit
requirement that the guest should be "migratable" in that state. How
does the above handles it? If the "suspend state" is solely held in
the kernel, we need to be able to snapshot it, and I don't like the
sound of that...

We could instead keep the "suspend state" in the VMM:

On PSCI_SUSPEND, the guest exits to userspace. If the VMM wants to
honour the supend request, it reenters the guest with RUN+SUSPEND,
which results in a WFI. On each wake-up, the guest exits to userspace,
and it is the VMM responsibility to either perform the wake-up (RUN)
or stay in suspend (RUN+SUSPEND).

This ensures that the guest never transitions out of suspend without
the VMM knowing, and the VMM can always force a resume by kicking the
thread back to userspace.

Thoughts?

> > > > Other items worth considering: ongoing DMA, state of the caches at
> > > > suspend time, device state in general All of this really needs to be
> > > > defined before we can move forward with this feature.
> > >
> > > I believe it is largely up to the caller to get devices in a quiesced
> > > state appropriate for a system suspend, but PSCI is delightfully vague
> > > on this topic.
> >
> > Indeed, it only deals with the CPU. Oh look, another opportunity to
> > write a new spec! :)
> >
> > > On the contrary, it is up to KVM's implementation to
> > > guarantee caches are clean when servicing the guest request.
> >
> > This last point is pretty unclear to me. If the guest doesn't clean to
> > the PoC (or even to one of the PoPs) when it calls into suspend,
> > that's a clear indication that it doesn't care about its data. Why
> > should KVM be more conservative here? It shouldn't be in the business
> > of working around guest bugs.
> 
> PSCI is vague on this, sadly. DEN0022D.b, 5.4.8 "Implementation
> responsibilities: Cache and coherency management states" that for
> CPU_SUSPEND, the PSCI implementation must perform a cache clean
> operation before entering the powerdown state. I don't see any reason
> why SYSTEM_SUSPEND should be excluded from this requirement.

I'm not sure that's the case. CPU_SUSPEND may not use the resume
entry-point if the suspend results is a shallower state than expected
(i.e. the call just returns instead of behaving like a CPU boot).

However, a successful SYSTEM_SUSPEND always results in the deepest
possible state. The guest should know that. There is also the fact
that performing a full clean to the PoC is going to be pretty
expensive, and I'd like to avoid that.

I'll try and reach out to some of the ARM folks for clarification on
the matter.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support
  2021-09-21  9:45         ` Marc Zyngier
@ 2021-09-21 18:22           ` Oliver Upton
  0 siblings, 0 replies; 20+ messages in thread
From: Oliver Upton @ 2021-09-21 18:22 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, kvmarm, Peter Shier, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Andrew Jones

Hey Marc,

On Tue, Sep 21, 2021 at 10:45:22AM +0100, Marc Zyngier wrote:
> > > > > - How do you define which interrupts are actual wake-up events?
> > > > >   Nothing in the GIC architecture defines what a wake-up is (let alone
> > > > >   a wake-up event).
> > > >
> > > > Good point.
> > > >
> > > > One possible implementation of suspend could just be a `WFI` in a
> > > > higher EL. In this case, KVM could emulate WFI wake up events
> > > > according to D1.16.2 in DDI 0487G.a. But I agree, it isn't entirely
> > > > clear what constitutes a wakeup from powered down state.
> > >
> > > It isn't, and it is actually IMPDEF (there isn't much in the ARM ARM
> > > in terms of what constitutes a low power state). And even if you
> > > wanted to emulate a WFI in userspace, the problem of interrupts that
> > > have their source in the kernel remains. How to you tell userspace
> > > that such an event has occurred if the vcpu thread isn't in the
> > > kernel?
> > 
> > Well, are there any objections to saying for the KVM implementation we
> > observe the WFI wake-up events per the cited section of the ARM ARM?
> 
> These are fine. However, what of the GIC, for example? Can any GIC
> interrupt wake-up the guest? I'm happy to say "yes" to this, but I
> suspect others will have a different idea, and the thought of
> introducing an IMPDEF wake-up interrupt controller doesn't fill me
> with joy.
>

I'm planning to propose exactly this in the next series; any GIC
interrupt will wake the guest. I'd argue that if someone wants to do
anything else, their window of opportunity is the exit to userspace.

[...]

> > Just to check understanding for v2:
> > 
> > We agree that an exit to userspace is fine so it has the opportunity
> > to do something crazy when the guest attempts a suspend. If a VMM does
> > nothing and immediately re-enters the kernel, we emulate the suspend
> > there by waiting for some event to fire, which for our purposes we
> > will say is an interrupt originating from userspace or the kernel
> > (WFI). In all, the SUSPEND exit type does not indicate that emulation
> > terminates with the VMM. It only indicates we are about to block in
> > the kernel.
> > 
> > If there is some IMPDEF event specific to the VMM, it should signal
> > the vCPU thread to kick it out of the kernel, make it runnable, and
> > re-enter. No need to do anything special from the kernel perspective
> > for this. This is only for the case where we decide to block in the
> > kernel.
> 
> This looks sensible. One question though: I think there is an implicit
> requirement that the guest should be "migratable" in that state. How
> does the above handles it? If the "suspend state" is solely held in
> the kernel, we need to be able to snapshot it, and I don't like the
> sound of that...
> 
> We could instead keep the "suspend state" in the VMM:
> 
> On PSCI_SUSPEND, the guest exits to userspace. If the VMM wants to
> honour the supend request, it reenters the guest with RUN+SUSPEND,
> which results in a WFI. On each wake-up, the guest exits to userspace,
> and it is the VMM responsibility to either perform the wake-up (RUN)
> or stay in suspend (RUN+SUSPEND).
> 
> This ensures that the guest never transitions out of suspend without
> the VMM knowing, and the VMM can always force a resume by kicking the
> thread back to userspace.
> 
> Thoughts?

Agreed. I was mulling on exactly how to clue in the VMM about the
suspend state. What if we just encode it in KVM_{GET,SET}_MP_STATE? We'd
avoid the need for new UAPI that way. We could introduce a new state,
KVM_MP_STATE_SUSPENDED, which would clue KVM to do the suspend as we've
discussed. We would exit to userspace with KVM_MP_STATE_RUNNABLE,
meaning the VMM would need to set the MP state explicitly for the
in-kernel suspend to work.

[...]

> > > > On the contrary, it is up to KVM's implementation to
> > > > guarantee caches are clean when servicing the guest request.
> > >
> > > This last point is pretty unclear to me. If the guest doesn't clean to
> > > the PoC (or even to one of the PoPs) when it calls into suspend,
> > > that's a clear indication that it doesn't care about its data. Why
> > > should KVM be more conservative here? It shouldn't be in the business
> > > of working around guest bugs.
> > 
> > PSCI is vague on this, sadly. DEN0022D.b, 5.4.8 "Implementation
> > responsibilities: Cache and coherency management states" that for
> > CPU_SUSPEND, the PSCI implementation must perform a cache clean
> > operation before entering the powerdown state. I don't see any reason
> > why SYSTEM_SUSPEND should be excluded from this requirement.
> 
> I'm not sure that's the case. CPU_SUSPEND may not use the resume
> entry-point if the suspend results is a shallower state than expected
> (i.e. the call just returns instead of behaving like a CPU boot).
> 
> However, a successful SYSTEM_SUSPEND always results in the deepest
> possible state. The guest should know that. There is also the fact
> that performing a full clean to the PoC is going to be pretty
> expensive, and I'd like to avoid that.
> 
> I'll try and reach out to some of the ARM folks for clarification on
> the matter.

That'd be very helpful!

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2021-09-21 18:22 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-19 22:36 [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
2021-08-19 22:36 ` [PATCH 1/6] KVM: arm64: Drop unused vcpu param to kvm_psci_valid_affinity() Oliver Upton
2021-08-19 22:36 ` [PATCH 2/6] KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests Oliver Upton
2021-08-19 22:36 ` [PATCH 3/6] KVM: arm64: Encapsulate reset request logic in a helper function Oliver Upton
2021-08-19 22:36 ` [PATCH 4/6] KVM: arm64: Add support for SYSTEM_SUSPEND PSCI call Oliver Upton
2021-08-19 22:36 ` [PATCH 5/6] selftests: KVM: Promote PSCI hypercalls to asm volatile Oliver Upton
2021-08-19 22:36 ` [PATCH 6/6] selftests: KVM: Test SYSTEM_SUSPEND PSCI call Oliver Upton
2021-08-19 23:41 ` [PATCH] Documentation: kvm: Document KVM_SYSTEM_EVENT_SUSPEND exit type Oliver Upton
2021-08-22 19:56 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Oliver Upton
2021-08-26 10:51   ` Marc Zyngier
2021-08-26 18:37     ` Oliver Upton
2021-08-27 21:58 ` [RFC kvmtool PATCH 0/2] " Oliver Upton
2021-08-27 21:58   ` [RFC kvmtool PATCH 1/2] TESTONLY: KVM: Update KVM headers Oliver Upton
2021-08-27 21:58   ` [RFC kvmtool PATCH 2/2] arm64: Add support for KVM_CAP_ARM_SYSTEM_SUSPEND Oliver Upton
2021-09-06  9:12 ` [PATCH 0/6] KVM: arm64: Implement PSCI SYSTEM_SUSPEND support Marc Zyngier
2021-09-07 16:30   ` Oliver Upton
2021-09-07 17:43     ` Marc Zyngier
2021-09-07 18:14       ` Oliver Upton
2021-09-21  9:45         ` Marc Zyngier
2021-09-21 18:22           ` Oliver Upton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).