* [PATCH v3 00/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

The PSCI v1.0 specification describes a call, SYSTEM_SUSPEND, which
allows software to request that the system be placed into the lowest
possible power state and await an IMPLEMENTATION DEFINED wakeup event.
This call is optional in v1.0 and v1.1. KVM does not currently support
it in the v1.0 implementation.

This series adds support for the PSCI SYSTEM_SUSPEND call to KVM/arm64.
By default, KVM will treat the call as equivalent to CPU_SUSPEND,
wherein KVM handles the call as a guest WFI. However, this series also
introduces an opt-in for the SYSTEM_SUSPEND call to exit to userspace.
VMMs may use the event as a hint to save the VM to resume at a later
time, freeing up system resources. Userspace can decide at the time of
the exit whether or not to honor the SYSTEM_SUSPEND call.
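
As a rough illustration (not code taken from this series), a VMM's run
loop could recognize the new exit roughly as follows. KVM_RUN,
KVM_EXIT_SYSTEM_EVENT and struct kvm_run are existing UAPI;
KVM_SYSTEM_EVENT_SUSPEND is the event type added later in the series,
and error handling plus the opt-in capability enablement are omitted:

#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <stdbool.h>

/* Sketch only: returns true if the guest requested SYSTEM_SUSPEND. */
static bool vcpu_saw_suspend_request(int vcpu_fd, struct kvm_run *run)
{
	if (ioctl(vcpu_fd, KVM_RUN, 0) < 0)
		return false;

	return run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
	       run->system_event.type == KVM_SYSTEM_EVENT_SUSPEND;
}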

Patch 1 is a small cleanup already present in kvmarm/next, but the
series depends on it so it has been included to guarantee the series
builds.

Patches 2-3 add an additional check to the CPU_ON PSCI call. As Reiji
noted, PSCI implementations can return INVALID_ADDRESS if it is
determined that the provided entry address does not exist in the guest
address space.

Patch 4 is another small cleanup to generically filter SMC64 calls when
running an AArch32 EL1, avoiding the need to add a special case for the
new PSCI call introduced in this series.

Patches 5-6 add support for tracking a vCPU's power state using
KVM_MP_STATE_* values. This is significant as the series introduces an
additional power state, which cannot be represented by the
`vcpu->arch.power_off` boolean.

Patch 7 is a nitpick regarding the naming of a KVM_REQ_ handler.

Patches 8-9 provide the default implementation of PSCI SYSTEM_SUSPEND by
synchronously resetting the calling vCPU and entering WFI.

Patches 10-12 introduce a new MP state, KVM_MP_STATE_SUSPENDED, which
implements 'sticky' suspension. If userspace puts a vCPU in this state,
it will exit to userspace for every recognized wakeup event (pending
interrupt). When userspace is satisfied that a VM should resume, it must
explicitly unpark the vCPU by marking it runnable again. This is useful
for userspace to implement PSCI SYSTEM_SUSPEND if it decides to trap the
call.
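
To make the 'sticky' behavior concrete, the sketch below parks and
later resumes a vCPU purely through the existing KVM_SET_MP_STATE
ioctl; KVM_MP_STATE_SUSPENDED is the new value introduced by these
patches, everything else is existing UAPI, and error handling is left
to the reader:

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Park the vCPU; it will exit to userspace on each wakeup event. */
static int vcpu_suspend(int vcpu_fd)
{
	struct kvm_mp_state state = { .mp_state = KVM_MP_STATE_SUSPENDED };

	return ioctl(vcpu_fd, KVM_SET_MP_STATE, &state);
}

/* Unpark the vCPU once userspace decides the VM should resume. */
static int vcpu_resume(int vcpu_fd)
{
	struct kvm_mp_state state = { .mp_state = KVM_MP_STATE_RUNNABLE };

	return ioctl(vcpu_fd, KVM_SET_MP_STATE, &state);
}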

Patch 13 extends the implementation of PSCI SYSTEM_SUSPEND, granting
userspace the opt-in capability of exiting to userspace on such a call.
*NOTE* KVM_SYSTEM_EVENT_SUSPEND breaks away from the semantics of other
system events: userspace is required to manipulate the vCPU to either
reset it or reject the call. Other PSCI calls that exit to userspace
set an SMCCC return value before exiting, but doing so here would
clobber all of the pending reset state. I wanted to avoid adding
additional API to convey the reset context to userspace, so it is
instead expressed in the architected state.
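
For illustration only, one plausible way for userspace to reject the
call is sketched below. It assumes the SMCCC return value is conveyed
through x0 (as for other PSCI calls) and that marking the vCPU runnable
resumes it without consuming the pending reset state; the api.rst
changes in patch 13 are the authoritative contract:

#include <linux/kvm.h>
#include <linux/psci.h>
#include <sys/ioctl.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reject path: report PSCI_RET_DENIED and resume. */
static int vcpu_reject_suspend(int vcpu_fd)
{
	uint64_t denied = (uint64_t)PSCI_RET_DENIED;
	struct kvm_one_reg reg = {
		.id = KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |
		      KVM_REG_ARM_CORE_REG(regs.regs[0]),
		.addr = (uint64_t)(uintptr_t)&denied,
	};
	struct kvm_mp_state runnable = { .mp_state = KVM_MP_STATE_RUNNABLE };

	if (ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)
		return -1;

	return ioctl(vcpu_fd, KVM_SET_MP_STATE, &runnable);
}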

Patch 14 increments the reported PSCI version to 1.1, as KVM already
meets the requirements.

Patches 15-18 rework the PSCI selftest to make it amenable to additional
test cases.

Lastly, patch 19 tests that the KVM_SYSTEM_EVENT_SUSPEND exits are
working as intended, and that KVM rejects invalid calls to PSCI
SYSTEM_SUSPEND.

This series applies cleanly to v5.17-rc5. Testing was performed with the
included selftest and suspending a QEMU guest (i.e. no system event
exits) on an Ampere Altra machine.

v2: https://patchwork.kernel.org/project/kvm/cover/20210923191610.3814698-1-oupton@google.com/

v2 -> v3:
 - rebase to 5.17-rc5
 - Reject CPU_ON and SYSTEM_SUSPEND calls that provide an invalid IPA
   (Reiji)
 - do *not* defer WFI as a requested event (Marc)
 - Add support for userspace filtering of wakeup events if SUSPEND exits
   are enabled (Marc)
 - Bump the reported PSCI version to v1.1 (Marc)

Oliver Upton (19):
  KVM: arm64: Drop unused param from kvm_psci_version()
  KVM: arm64: Create a helper to check if IPA is valid
  KVM: arm64: Reject invalid addresses for CPU_ON PSCI call
  KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests
  KVM: arm64: Dedupe vCPU power off helpers
  KVM: arm64: Track vCPU power state using MP state values
  KVM: arm64: Rename the KVM_REQ_SLEEP handler
  KVM: arm64: Add reset helper that accepts caller-provided reset state
  KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  KVM: Create helper for setting a system event exit
  KVM: arm64: Return a value from check_vcpu_requests()
  KVM: arm64: Add support for userspace to suspend a vCPU
  KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI
    SYSTEM_SUSPEND
  KVM: arm64: Raise default PSCI version to v1.1
  selftests: KVM: Rename psci_cpu_on_test to psci_test
  selftests: KVM: Create helper for making SMCCC calls
  selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test
  selftests: KVM: Refactor psci_test to make it amenable to new tests
  selftests: KVM: Test SYSTEM_SUSPEND PSCI call

 Documentation/virt/kvm/api.rst                |  62 ++++-
 arch/arm64/include/asm/kvm_host.h             |  27 ++-
 arch/arm64/include/asm/kvm_mmu.h              |   9 +
 arch/arm64/kvm/arm.c                          |  88 +++++--
 arch/arm64/kvm/psci.c                         | 129 ++++++++---
 arch/arm64/kvm/reset.c                        |  45 ++--
 arch/arm64/kvm/vgic/vgic-kvm-device.c         |   2 +-
 arch/riscv/kvm/vcpu_sbi_v01.c                 |   4 +-
 arch/x86/kvm/x86.c                            |   6 +-
 include/kvm/arm_psci.h                        |   9 +-
 include/linux/kvm_host.h                      |   7 +
 include/uapi/linux/kvm.h                      |   4 +
 tools/testing/selftests/kvm/.gitignore        |   2 +-
 tools/testing/selftests/kvm/Makefile          |   2 +-
 .../selftests/kvm/aarch64/psci_cpu_on_test.c  | 121 ----------
 .../testing/selftests/kvm/aarch64/psci_test.c | 218 ++++++++++++++++++
 .../selftests/kvm/include/aarch64/processor.h |  22 ++
 .../selftests/kvm/lib/aarch64/processor.c     |  25 ++
 tools/testing/selftests/kvm/steal_time.c      |  13 +-
 19 files changed, 571 insertions(+), 224 deletions(-)
 delete mode 100644 tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
 create mode 100644 tools/testing/selftests/kvm/aarch64/psci_test.c

-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 01/19] KVM: arm64: Drop unused param from kvm_psci_version()
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

kvm_psci_version() consumes a pointer to struct kvm in addition to a
vcpu pointer. Drop the kvm pointer as it is unused. While the comment
suggests the explicit kvm pointer was useful for calling from hyp, no
such callsite exists in hyp.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220208012705.640444-1-oupton@google.com
---
 arch/arm64/kvm/psci.c  | 6 +++---
 include/kvm/arm_psci.h | 6 +-----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 3eae32876897..a0c10c11f40e 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -85,7 +85,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	if (!vcpu)
 		return PSCI_RET_INVALID_PARAMS;
 	if (!vcpu->arch.power_off) {
-		if (kvm_psci_version(source_vcpu, kvm) != KVM_ARM_PSCI_0_1)
+		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
 			return PSCI_RET_ALREADY_ON;
 		else
 			return PSCI_RET_INVALID_PARAMS;
@@ -392,7 +392,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
  */
 int kvm_psci_call(struct kvm_vcpu *vcpu)
 {
-	switch (kvm_psci_version(vcpu, vcpu->kvm)) {
+	switch (kvm_psci_version(vcpu)) {
 	case KVM_ARM_PSCI_1_0:
 		return kvm_psci_1_0_call(vcpu);
 	case KVM_ARM_PSCI_0_2:
@@ -471,7 +471,7 @@ int kvm_arm_get_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 
 	switch (reg->id) {
 	case KVM_REG_ARM_PSCI_VERSION:
-		val = kvm_psci_version(vcpu, vcpu->kvm);
+		val = kvm_psci_version(vcpu);
 		break;
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1:
 	case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2:
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 5b58bd2fe088..297645edcaff 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -16,11 +16,7 @@
 
 #define KVM_ARM_PSCI_LATEST	KVM_ARM_PSCI_1_0
 
-/*
- * We need the KVM pointer independently from the vcpu as we can call
- * this from HYP, and need to apply kern_hyp_va on it...
- */
-static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm)
+static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
 {
 	/*
 	 * Our PSCI implementation stays the same across versions from
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 02/19] KVM: arm64: Create a helper to check if IPA is valid
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

Create a helper that tests if a given IPA fits within the guest's
address space.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/include/asm/kvm_mmu.h      | 9 +++++++++
 arch/arm64/kvm/vgic/vgic-kvm-device.c | 2 +-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 81839e9a8a24..78e8be7ea627 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -111,6 +111,7 @@ alternative_cb_end
 #else
 
 #include <linux/pgtable.h>
+#include <linux/kvm_host.h>
 #include <asm/pgalloc.h>
 #include <asm/cache.h>
 #include <asm/cacheflush.h>
@@ -147,6 +148,14 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 #define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
 #define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
 
+/*
+ * Returns true if the provided IPA exists within the VM's IPA space.
+ */
+static inline bool kvm_ipa_valid(struct kvm *kvm, phys_addr_t guest_ipa)
+{
+	return !(guest_ipa & ~kvm_phys_mask(kvm));
+}
+
 #include <asm/kvm_pgtable.h>
 #include <asm/stage2_pgtable.h>
 
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index c6d52a1fd9c8..e3853a75cb00 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -27,7 +27,7 @@ int vgic_check_iorange(struct kvm *kvm, phys_addr_t ioaddr,
 	if (addr + size < addr)
 		return -EINVAL;
 
-	if (addr & ~kvm_phys_mask(kvm) || addr + size > kvm_phys_size(kvm))
+	if (!kvm_ipa_valid(kvm, addr) || addr + size > kvm_phys_size(kvm))
 		return -E2BIG;
 
 	return 0;
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 03/19] KVM: arm64: Reject invalid addresses for CPU_ON PSCI call
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

DEN0022D.b 5.6.2 "Caller responsibilities" states that a PSCI
implementation may return INVALID_ADDRESS for the CPU_ON call if the
provided entry address is known to be invalid. There is an additional
caveat to this rule. Prior to PSCI v1.0, the INVALID_PARAMETERS error
is returned instead. Check the guest's PSCI version and return the
appropriate error if the IPA is invalid.

Reported-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index a0c10c11f40e..de1cf554929d 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -12,6 +12,7 @@
 
 #include <asm/cputype.h>
 #include <asm/kvm_emulate.h>
+#include <asm/kvm_mmu.h>
 
 #include <kvm/arm_psci.h>
 #include <kvm/arm_hypercalls.h>
@@ -70,12 +71,31 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	struct vcpu_reset_state *reset_state;
 	struct kvm *kvm = source_vcpu->kvm;
 	struct kvm_vcpu *vcpu = NULL;
-	unsigned long cpu_id;
+	unsigned long cpu_id, entry_addr;
 
 	cpu_id = smccc_get_arg1(source_vcpu);
 	if (!kvm_psci_valid_affinity(source_vcpu, cpu_id))
 		return PSCI_RET_INVALID_PARAMS;
 
+	/*
+	 * Basic sanity check: ensure the requested entry address actually
+	 * exists within the guest's address space.
+	 */
+	entry_addr = smccc_get_arg2(source_vcpu);
+	if (!kvm_ipa_valid(kvm, entry_addr)) {
+
+		/*
+		 * Before PSCI v1.0, the INVALID_PARAMETERS error is returned
+		 * instead of INVALID_ADDRESS.
+		 *
+		 * For more details, see ARM DEN0022D.b 5.6 "CPU_ON".
+		 */
+		if (kvm_psci_version(source_vcpu) < KVM_ARM_PSCI_1_0)
+			return PSCI_RET_INVALID_PARAMS;
+		else
+			return PSCI_RET_INVALID_ADDRESS;
+	}
+
 	vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
 
 	/*
@@ -93,7 +113,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 
 	reset_state = &vcpu->arch.reset_state;
 
-	reset_state->pc = smccc_get_arg2(source_vcpu);
+	reset_state->pc = entry_addr;
 
 	/* Propagate caller endianness */
 	reset_state->be = kvm_vcpu_is_be(source_vcpu);
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 04/19] KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton, Andrew Jones

The only valid SMC calling convention from an AArch32 state is
SMC32. Disallow any PSCI function that sets the SMC64 function ID bit
when called from AArch32 rather than comparing against known SMC64 PSCI
functions.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm64/kvm/psci.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index de1cf554929d..4335cd5193b8 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -229,15 +229,11 @@ static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
 
 static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32 fn)
 {
-	switch(fn) {
-	case PSCI_0_2_FN64_CPU_SUSPEND:
-	case PSCI_0_2_FN64_CPU_ON:
-	case PSCI_0_2_FN64_AFFINITY_INFO:
-		/* Disallow these functions for 32bit guests */
-		if (vcpu_mode_is_32bit(vcpu))
-			return PSCI_RET_NOT_SUPPORTED;
-		break;
-	}
+	/*
+	 * Prevent 32 bit guests from calling 64 bit PSCI functions.
+	 */
+	if ((fn & PSCI_0_2_64BIT) && vcpu_mode_is_32bit(vcpu))
+		return PSCI_RET_NOT_SUPPORTED;
 
 	return 0;
 }
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 05/19] KVM: arm64: Dedupe vCPU power off helpers
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the
former to kvm_arm_vcpu_power_off() and replace all callsites of the
latter.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/kvm/arm.c              |  6 +++---
 arch/arm64/kvm/psci.c             | 11 ++---------
 3 files changed, 7 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 5bc01e62c08a..cacc9efd2e70 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -799,4 +799,6 @@ void __init kvm_hyp_reserve(void);
 static inline void kvm_hyp_reserve(void) { }
 #endif
 
+void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index ecc5958e27fe..07c6a176cdcc 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -426,7 +426,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	vcpu->cpu = -1;
 }
 
-static void vcpu_power_off(struct kvm_vcpu *vcpu)
+void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.power_off = true;
 	kvm_make_request(KVM_REQ_SLEEP, vcpu);
@@ -454,7 +454,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 		vcpu->arch.power_off = false;
 		break;
 	case KVM_MP_STATE_STOPPED:
-		vcpu_power_off(vcpu);
+		kvm_arm_vcpu_power_off(vcpu);
 		break;
 	default:
 		ret = -EINVAL;
@@ -1179,7 +1179,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	 * Handle the "start in power-off" case.
 	 */
 	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
-		vcpu_power_off(vcpu);
+		kvm_arm_vcpu_power_off(vcpu);
 	else
 		vcpu->arch.power_off = false;
 
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 4335cd5193b8..e3f93b7f8d38 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -53,13 +53,6 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 	return PSCI_RET_SUCCESS;
 }
 
-static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
-{
-	vcpu->arch.power_off = true;
-	kvm_make_request(KVM_REQ_SLEEP, vcpu);
-	kvm_vcpu_kick(vcpu);
-}
-
 static inline bool kvm_psci_valid_affinity(struct kvm_vcpu *vcpu,
 					   unsigned long affinity)
 {
@@ -262,7 +255,7 @@ static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu)
 		val = kvm_psci_vcpu_suspend(vcpu);
 		break;
 	case PSCI_0_2_FN_CPU_OFF:
-		kvm_psci_vcpu_off(vcpu);
+		kvm_arm_vcpu_power_off(vcpu);
 		val = PSCI_RET_SUCCESS;
 		break;
 	case PSCI_0_2_FN_CPU_ON:
@@ -375,7 +368,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
 
 	switch (psci_fn) {
 	case KVM_PSCI_FN_CPU_OFF:
-		kvm_psci_vcpu_off(vcpu);
+		kvm_arm_vcpu_power_off(vcpu);
 		val = PSCI_RET_SUCCESS;
 		break;
 	case KVM_PSCI_FN_CPU_ON:
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 06/19] KVM: arm64: Track vCPU power state using MP state values
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

A subsequent change to KVM will add support for additional power states.
Store the MP state by value rather than keeping track of it as a
boolean.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  5 +++--
 arch/arm64/kvm/arm.c              | 22 ++++++++++++----------
 arch/arm64/kvm/psci.c             | 10 +++++-----
 3 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index cacc9efd2e70..3e8bfecaa95b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -350,8 +350,8 @@ struct kvm_vcpu_arch {
 		u32	mdscr_el1;
 	} guest_debug_preserved;
 
-	/* vcpu power-off state */
-	bool power_off;
+	/* vcpu power state */
+	u32 mp_state;
 
 	/* Don't run the guest (internal implementation need) */
 	bool pause;
@@ -800,5 +800,6 @@ static inline void kvm_hyp_reserve(void) { }
 #endif
 
 void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
+bool kvm_arm_vcpu_powered_off(struct kvm_vcpu *vcpu);
 
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 07c6a176cdcc..b4987b891f38 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -428,18 +428,20 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 
 void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.power_off = true;
+	vcpu->arch.mp_state = KVM_MP_STATE_STOPPED;
 	kvm_make_request(KVM_REQ_SLEEP, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
+bool kvm_arm_vcpu_powered_off(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.mp_state == KVM_MP_STATE_STOPPED;
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	if (vcpu->arch.power_off)
-		mp_state->mp_state = KVM_MP_STATE_STOPPED;
-	else
-		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
+	mp_state->mp_state = vcpu->arch.mp_state;
 
 	return 0;
 }
@@ -451,7 +453,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 
 	switch (mp_state->mp_state) {
 	case KVM_MP_STATE_RUNNABLE:
-		vcpu->arch.power_off = false;
+		vcpu->arch.mp_state = mp_state->mp_state;
 		break;
 	case KVM_MP_STATE_STOPPED:
 		kvm_arm_vcpu_power_off(vcpu);
@@ -474,7 +476,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
 	bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
 	return ((irq_lines || kvm_vgic_vcpu_pending_irq(v))
-		&& !v->arch.power_off && !v->arch.pause);
+		&& !kvm_arm_vcpu_powered_off(v) && !v->arch.pause);
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
@@ -668,10 +670,10 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
 
 	rcuwait_wait_event(wait,
-			   (!vcpu->arch.power_off) &&(!vcpu->arch.pause),
+			   (!kvm_arm_vcpu_powered_off(vcpu)) && (!vcpu->arch.pause),
 			   TASK_INTERRUPTIBLE);
 
-	if (vcpu->arch.power_off || vcpu->arch.pause) {
+	if (kvm_arm_vcpu_powered_off(vcpu) || vcpu->arch.pause) {
 		/* Awaken to handle a signal, request we sleep again later. */
 		kvm_make_request(KVM_REQ_SLEEP, vcpu);
 	}
@@ -1181,7 +1183,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
 		kvm_arm_vcpu_power_off(vcpu);
 	else
-		vcpu->arch.power_off = false;
+		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 
 	return 0;
 }
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index e3f93b7f8d38..77a00913cdfd 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -97,7 +97,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 */
 	if (!vcpu)
 		return PSCI_RET_INVALID_PARAMS;
-	if (!vcpu->arch.power_off) {
+	if (!kvm_arm_vcpu_powered_off(vcpu)) {
 		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
 			return PSCI_RET_ALREADY_ON;
 		else
@@ -122,11 +122,11 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 
 	/*
 	 * Make sure the reset request is observed if the change to
-	 * power_off is observed.
+	 * mp_state is observed.
 	 */
 	smp_wmb();
 
-	vcpu->arch.power_off = false;
+	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 	kvm_vcpu_wake_up(vcpu);
 
 	return PSCI_RET_SUCCESS;
@@ -164,7 +164,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
 		if ((mpidr & target_affinity_mask) == target_affinity) {
 			matching_cpus++;
-			if (!tmp->arch.power_off)
+			if (!kvm_arm_vcpu_powered_off(tmp))
 				return PSCI_0_2_AFFINITY_LEVEL_ON;
 		}
 	}
@@ -190,7 +190,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 	 * re-initialized.
 	 */
 	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
-		tmp->arch.power_off = true;
+		tmp->arch.mp_state = KVM_MP_STATE_STOPPED;
 	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
 
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 07/19] KVM: arm64: Rename the KVM_REQ_SLEEP handler
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton, Andrew Jones

The naming of the vcpu_req_sleep() function is confusing: the function
itself puts the vCPU to sleep; it does not request such an event.
Rename the function to make its purpose clearer.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm64/kvm/arm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index b4987b891f38..6af680675810 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -665,7 +665,7 @@ void kvm_arm_resume_guest(struct kvm *kvm)
 	}
 }
 
-static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
+static void kvm_vcpu_sleep(struct kvm_vcpu *vcpu)
 {
 	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
 
@@ -723,7 +723,7 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 {
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
-			vcpu_req_sleep(vcpu);
+			kvm_vcpu_sleep(vcpu);
 
 		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
 			kvm_reset_vcpu(vcpu);
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 08/19] KVM: arm64: Add reset helper that accepts caller-provided reset state
From: Oliver Upton @ 2022-02-23  4:18 UTC
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

To date, struct vcpu_reset_state has been used to implement PSCI CPU_ON,
with the caller providing reset context for the targeted vCPU. A
subsequent change to KVM will require that a vCPU can populate its own
reset context.

Extract the vCPU reset implementation into a new function to separate
the locked read of shared data (vcpu->arch.reset_state) from the use of
the reset context.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/include/asm/kvm_host.h | 16 ++++++-----
 arch/arm64/kvm/reset.c            | 44 +++++++++++++++++++------------
 2 files changed, 36 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3e8bfecaa95b..33ecec755310 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -67,6 +67,15 @@ extern unsigned int kvm_sve_max_vl;
 int kvm_arm_init_sve(void);
 
 u32 __attribute_const__ kvm_target_cpu(void);
+
+struct vcpu_reset_state {
+	unsigned long	pc;
+	unsigned long	r0;
+	bool		be;
+	bool		reset;
+};
+
+int __kvm_reset_vcpu(struct kvm_vcpu *vcpu, struct vcpu_reset_state *reset_state);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
@@ -271,13 +280,6 @@ extern s64 kvm_nvhe_sym(hyp_physvirt_offset);
 extern u64 kvm_nvhe_sym(hyp_cpu_logical_map)[NR_CPUS];
 #define hyp_cpu_logical_map CHOOSE_NVHE_SYM(hyp_cpu_logical_map)
 
-struct vcpu_reset_state {
-	unsigned long	pc;
-	unsigned long	r0;
-	bool		be;
-	bool		reset;
-};
-
 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
 	void *sve_state;
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index ecc40c8cd6f6..f879a8f6a99c 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -205,35 +205,32 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
 }
 
 /**
- * kvm_reset_vcpu - sets core registers and sys_regs to reset value
+ * __kvm_reset_vcpu - sets core registers and sys_regs to reset value
  * @vcpu: The VCPU pointer
+ * @reset_state: Context to use to reset the vCPU
  *
  * This function sets the registers on the virtual CPU struct to their
  * architecturally defined reset values, except for registers whose reset is
  * deferred until kvm_arm_vcpu_finalize().
  *
- * Note: This function can be called from two paths: The KVM_ARM_VCPU_INIT
- * ioctl or as part of handling a request issued by another VCPU in the PSCI
- * handling code.  In the first case, the VCPU will not be loaded, and in the
- * second case the VCPU will be loaded.  Because this function operates purely
- * on the memory-backed values of system registers, we want to do a full put if
+ * Note: This function can be called from two paths:
+ *  - The KVM_ARM_VCPU_INIT ioctl
+ *  - handling a request issued by another VCPU in the PSCI handling code
+ *
+ * In the first case, the VCPU will not be loaded, and in the second case the
+ * VCPU will be loaded.  Because this function operates purely on the
+ * memory-backed values of system registers, we want to do a full put if
  * we were loaded (handling a request) and load the values back at the end of
  * the function.  Otherwise we leave the state alone.  In both cases, we
  * disable preemption around the vcpu reset as we would otherwise race with
  * preempt notifiers which also call put/load.
  */
-int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
+int __kvm_reset_vcpu(struct kvm_vcpu *vcpu, struct vcpu_reset_state *reset_state)
 {
-	struct vcpu_reset_state reset_state;
 	int ret;
 	bool loaded;
 	u32 pstate;
 
-	mutex_lock(&vcpu->kvm->lock);
-	reset_state = vcpu->arch.reset_state;
-	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
-	mutex_unlock(&vcpu->kvm->lock);
-
 	/* Reset PMU outside of the non-preemptible section */
 	kvm_pmu_vcpu_reset(vcpu);
 
@@ -296,8 +293,8 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 	 * Additional reset state handling that PSCI may have imposed on us.
 	 * Must be done after all the sys_reg reset.
 	 */
-	if (reset_state.reset) {
-		unsigned long target_pc = reset_state.pc;
+	if (reset_state->reset) {
+		unsigned long target_pc = reset_state->pc;
 
 		/* Gracefully handle Thumb2 entry point */
 		if (vcpu_mode_is_32bit(vcpu) && (target_pc & 1)) {
@@ -306,11 +303,11 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		}
 
 		/* Propagate caller endianness */
-		if (reset_state.be)
+		if (reset_state->be)
 			kvm_vcpu_set_be(vcpu);
 
 		*vcpu_pc(vcpu) = target_pc;
-		vcpu_set_reg(vcpu, 0, reset_state.r0);
+		vcpu_set_reg(vcpu, 0, reset_state->r0);
 	}
 
 	/* Reset timer */
@@ -320,6 +317,19 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
 		kvm_arch_vcpu_load(vcpu, smp_processor_id());
 	preempt_enable();
 	return ret;
+
+}
+
+int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_reset_state reset_state;
+
+	mutex_lock(&vcpu->kvm->lock);
+	reset_state = vcpu->arch.reset_state;
+	WRITE_ONCE(vcpu->arch.reset_state.reset, false);
+	mutex_unlock(&vcpu->kvm->lock);
+
+	return __kvm_reset_vcpu(vcpu, &reset_state);
 }
 
 u32 get_kvm_ipa_limit(void)
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM. Note that the semantics of this PSCI call are very similar to
CPU_SUSPEND, which is already implemented in KVM.

Implement SYSTEM_SUSPEND in KVM. As with CPU_SUSPEND, the low-power
state is implemented as a guest WFI. Synchronously reset the calling
vCPU before entering the WFI, such that it may immediately resume
execution when a wakeup event is recognized.
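
For reference, a guest can reach this handler with a plain SMCCC call. A
hypothetical bare-metal snippet (assuming an HVC conduit and the SMC64
function ID 0xC400000E from PSCI v1.0; clobber list trimmed for brevity):

  #define PSCI_1_0_FN64_SYSTEM_SUSPEND 0xC400000EUL

  /* Returns only on failure (e.g. PSCI_RET_DENIED). On success, execution
   * resumes at 'entry' with 'context_id' in x0 once a wakeup event occurs. */
  static long guest_system_suspend(unsigned long entry, unsigned long context_id)
  {
          register unsigned long x0 asm("x0") = PSCI_1_0_FN64_SYSTEM_SUSPEND;
          register unsigned long x1 asm("x1") = entry;
          register unsigned long x2 asm("x2") = context_id;

          asm volatile("hvc #0"
                       : "+r" (x0)
                       : "r" (x1), "r" (x2)
                       : "x3", "memory");

          return x0;
  }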

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
 arch/arm64/kvm/reset.c |  3 ++-
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 77a00913cdfd..41adaaf2234a 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
 	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
 }
 
+static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_reset_state reset_state;
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_vcpu *tmp;
+	bool denied = false;
+	unsigned long i;
+
+	reset_state.pc = smccc_get_arg1(vcpu);
+	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
+		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
+		return 1;
+	}
+
+	reset_state.r0 = smccc_get_arg2(vcpu);
+	reset_state.be = kvm_vcpu_is_be(vcpu);
+	reset_state.reset = true;
+
+	/*
+	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
+	 * calling vCPU) be in an OFF state, as determined by the
+	 * implementation.
+	 *
+	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
+	 */
+	mutex_lock(&kvm->lock);
+	kvm_for_each_vcpu(i, tmp, kvm) {
+		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
+			denied = true;
+			break;
+		}
+	}
+	mutex_unlock(&kvm->lock);
+
+	if (denied) {
+		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
+		return 1;
+	}
+
+	__kvm_reset_vcpu(vcpu, &reset_state);
+	kvm_vcpu_wfi(vcpu);
+	return 1;
+}
+
 static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu)
 {
 	int i;
@@ -343,6 +387,8 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 		case PSCI_0_2_FN_MIGRATE_INFO_TYPE:
 		case PSCI_0_2_FN_SYSTEM_OFF:
 		case PSCI_0_2_FN_SYSTEM_RESET:
+		case PSCI_1_0_FN_SYSTEM_SUSPEND:
+		case PSCI_1_0_FN64_SYSTEM_SUSPEND:
 		case PSCI_1_0_FN_PSCI_FEATURES:
 		case ARM_SMCCC_VERSION_FUNC_ID:
 			val = 0;
@@ -352,6 +398,11 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 			break;
 		}
 		break;
+	case PSCI_1_0_FN_SYSTEM_SUSPEND:
+		kvm_psci_narrow_to_32bit(vcpu);
+		fallthrough;
+	case PSCI_1_0_FN64_SYSTEM_SUSPEND:
+		return kvm_psci_system_suspend(vcpu);
 	default:
 		return kvm_psci_0_2_call(vcpu);
 	}
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index f879a8f6a99c..006e7a75ceba 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -215,7 +215,8 @@ static bool vcpu_allowed_register_width(struct kvm_vcpu *vcpu)
  *
  * Note: This function can be called from two paths:
  *  - The KVM_ARM_VCPU_INIT ioctl
- *  - handling a request issued by another VCPU in the PSCI handling code
+ *  - handling a request issued by possibly another VCPU in the PSCI handling
+ *    code
  *
  * In the first case, the VCPU will not be loaded, and in the second case the
  * VCPU will be loaded.  Because this function operates purely on the
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 10/19] KVM: Create helper for setting a system event exit
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

Create a helper that appropriately configures kvm_run for a system event
exit.

No functional change intended.

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c         | 4 +---
 arch/riscv/kvm/vcpu_sbi_v01.c | 4 +---
 arch/x86/kvm/x86.c            | 6 ++----
 include/linux/kvm_host.h      | 7 +++++++
 4 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 41adaaf2234a..2bb8d047cde4 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -193,9 +193,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 		tmp->arch.mp_state = KVM_MP_STATE_STOPPED;
 	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
 
-	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
-	vcpu->run->system_event.type = type;
-	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	kvm_vcpu_set_system_event_exit(vcpu, type);
 }
 
 static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
diff --git a/arch/riscv/kvm/vcpu_sbi_v01.c b/arch/riscv/kvm/vcpu_sbi_v01.c
index 07e2de14433a..7a197d5658d7 100644
--- a/arch/riscv/kvm/vcpu_sbi_v01.c
+++ b/arch/riscv/kvm/vcpu_sbi_v01.c
@@ -24,9 +24,7 @@ static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
 		tmp->arch.power_off = true;
 	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
 
-	memset(&run->system_event, 0, sizeof(run->system_event));
-	run->system_event.type = type;
-	run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	kvm_vcpu_set_system_event_exit(vcpu, type);
 }
 
 static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7131d735b1ef..109751f89ee3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9903,14 +9903,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
 			kvm_vcpu_reload_apic_access_page(vcpu);
 		if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
-			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
-			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
+			kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_CRASH);
 			r = 0;
 			goto out;
 		}
 		if (kvm_check_request(KVM_REQ_HV_RESET, vcpu)) {
-			vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
-			vcpu->run->system_event.type = KVM_SYSTEM_EVENT_RESET;
+			kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_RESET);
 			r = 0;
 			goto out;
 		}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f11039944c08..9085a1b1569a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2202,6 +2202,13 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
 
+static inline void kvm_vcpu_set_system_event_exit(struct kvm_vcpu *vcpu, u32 type)
+{
+	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
+	vcpu->run->system_event.type = type;
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+}
+
 /*
  * This defines how many reserved entries we want to keep before we
  * kick the vcpu to the userspace to avoid dirty ring full.  This
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 11/19] KVM: arm64: Return a value from check_vcpu_requests()
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

A subsequent change to KVM will introduce a vCPU request that could
result in an exit to userspace. Change check_vcpu_requests() to return a
value and document the function. Unconditionally return 1 for now.

Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/arm.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 6af680675810..f6ce97c0069c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -719,7 +719,16 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
 	preempt_enable();
 }
 
-static void check_vcpu_requests(struct kvm_vcpu *vcpu)
+/**
+ * check_vcpu_requests - check and handle pending vCPU requests
+ * @vcpu:	the VCPU pointer
+ *
+ * Return: 1 if we should enter the guest
+ *	   0 if we should exit to userspace
+ *	   < 0 if we should exit to userspace, where the return value indicates
+ *	   an error
+ */
+static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 {
 	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
@@ -749,6 +758,8 @@ static void check_vcpu_requests(struct kvm_vcpu *vcpu)
 			kvm_pmu_handle_pmcr(vcpu,
 					    __vcpu_sys_reg(vcpu, PMCR_EL0));
 	}
+
+	return 1;
 }
 
 static bool vcpu_mode_is_bad_32bit(struct kvm_vcpu *vcpu)
@@ -859,7 +870,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 
 		update_vmid(&vcpu->arch.hw_mmu->vmid);
 
-		check_vcpu_requests(vcpu);
+		if (ret > 0)
+			ret = check_vcpu_requests(vcpu);
 
 		/*
 		 * Preparing the interrupts to be injected also
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 12/19] KVM: arm64: Add support for userspace to suspend a vCPU
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
is in a suspended state. In the suspended state the vCPU will block
until a wakeup event (pending interrupt) is recognized.

Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
userspace that KVM has recognized one such wakeup event. It is the
responsibility of userspace to then make the vCPU runnable, or leave it
suspended until the next wakeup event.
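
A rough userspace sketch of the intended flow (assuming vcpu_fd and the
mmap'd kvm_run structure 'run' are already set up; error handling omitted):

  struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_SUSPENDED };

  /* Park the vCPU in the suspended state. */
  ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);

  for (;;) {
          ioctl(vcpu_fd, KVM_RUN, 0);

          if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
              run->system_event.type == KVM_SYSTEM_EVENT_WAKEUP)
                  break;
          /* Handling of other exit reasons elided. */
  }

  /* Honor the wakeup by making the vCPU runnable again. */
  mp.mp_state = KVM_MP_STATE_RUNNABLE;
  ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);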

Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst    | 23 ++++++++++++++++++--
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c              | 35 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  2 ++
 4 files changed, 59 insertions(+), 2 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a4267104db50..2b4bdbc2dcc0 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1482,14 +1482,29 @@ Possible values are:
                                  [s390]
    KVM_MP_STATE_LOAD             the vcpu is in a special load/startup state
                                  [s390]
+   KVM_MP_STATE_SUSPENDED        the vcpu is in a suspend state and is waiting
+                                 for a wakeup event [arm/arm64]
    ==========================    ===============================================
 
 On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
 in-kernel irqchip, the multiprocessing state must be maintained by userspace on
 these architectures.
 
-For arm/arm64/riscv:
-^^^^^^^^^^^^^^^^^^^^
+For arm/arm64:
+^^^^^^^^^^^^^^
+
+If a vCPU is in the KVM_MP_STATE_SUSPENDED state, KVM will block the vCPU
+thread and wait for a wakeup event. A wakeup event is defined as a pending
+interrupt for the guest.
+
+If a wakeup event is recognized, KVM will exit to userspace with a
+KVM_SYSTEM_EVENT exit, where the event type is KVM_SYSTEM_EVENT_WAKEUP. If
+userspace wants to honor the wakeup, it must set the vCPU's MP state to
+KVM_MP_STATE_RUNNABLE. If it does not, KVM will continue to await a wakeup
+event in subsequent calls to KVM_RUN.
+
+For riscv:
+^^^^^^^^^^
 
 The only states that are valid are KVM_MP_STATE_STOPPED and
 KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
@@ -5914,6 +5929,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
   #define KVM_SYSTEM_EVENT_SHUTDOWN       1
   #define KVM_SYSTEM_EVENT_RESET          2
   #define KVM_SYSTEM_EVENT_CRASH          3
+  #define KVM_SYSTEM_EVENT_WAKEUP         4
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -5938,6 +5954,9 @@ Valid values for 'type' are:
    has requested a crash condition maintenance. Userspace can choose
    to ignore the request, or to gather VM memory core dump and/or
    reset/shutdown of the VM.
+ - KVM_SYSTEM_EVENT_WAKEUP -- the guest is in a suspended state and KVM
+   has recognized a wakeup event. Userspace may honor this event by marking
+   the exiting vCPU as runnable, or deny it and call KVM_RUN again.
 
 ::
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 33ecec755310..d32cab0c9752 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -46,6 +46,7 @@
 #define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
 #define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
 #define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
+#define KVM_REQ_SUSPEND		KVM_ARCH_REQ(6)
 
 #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
 				     KVM_DIRTY_LOG_INITIALLY_SET)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f6ce97c0069c..d2b190f32651 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -438,6 +438,18 @@ bool kvm_arm_vcpu_powered_off(struct kvm_vcpu *vcpu)
 	return vcpu->arch.mp_state == KVM_MP_STATE_STOPPED;
 }
 
+static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.mp_state = KVM_MP_STATE_SUSPENDED;
+	kvm_make_request(KVM_REQ_SUSPEND, vcpu);
+	kvm_vcpu_kick(vcpu);
+}
+
+bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.mp_state == KVM_MP_STATE_SUSPENDED;
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
@@ -458,6 +470,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 	case KVM_MP_STATE_STOPPED:
 		kvm_arm_vcpu_power_off(vcpu);
 		break;
+	case KVM_MP_STATE_SUSPENDED:
+		kvm_arm_vcpu_suspend(vcpu);
+		break;
 	default:
 		ret = -EINVAL;
 	}
@@ -719,6 +734,23 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
 	preempt_enable();
 }
 
+static int kvm_vcpu_suspend(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_arm_vcpu_suspended(vcpu))
+		return 1;
+
+	kvm_vcpu_wfi(vcpu);
+
+	/*
+	 * The suspend state is sticky; we do not leave it until userspace
+	 * explicitly marks the vCPU as runnable. Request that we suspend again
+	 * later.
+	 */
+	kvm_make_request(KVM_REQ_SUSPEND, vcpu);
+	kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_WAKEUP);
+	return 0;
+}
+
 /**
  * check_vcpu_requests - check and handle pending vCPU requests
  * @vcpu:	the VCPU pointer
@@ -757,6 +789,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
 		if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
 			kvm_pmu_handle_pmcr(vcpu,
 					    __vcpu_sys_reg(vcpu, PMCR_EL0));
+
+		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
+			return kvm_vcpu_suspend(vcpu);
 	}
 
 	return 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5191b57e1562..babb16c2abe5 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -444,6 +444,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_SHUTDOWN       1
 #define KVM_SYSTEM_EVENT_RESET          2
 #define KVM_SYSTEM_EVENT_CRASH          3
+#define KVM_SYSTEM_EVENT_WAKEUP         4
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -634,6 +635,7 @@ struct kvm_vapic_addr {
 #define KVM_MP_STATE_OPERATING         7
 #define KVM_MP_STATE_LOAD              8
 #define KVM_MP_STATE_AP_RESET_HOLD     9
+#define KVM_MP_STATE_SUSPENDED         10
 
 struct kvm_mp_state {
 	__u32 mp_state;
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

Add a new system event type, KVM_SYSTEM_EVENT_SUSPEND, which indicates
to userspace that the guest has requested the VM be suspended. Userspace
can decide whether or not it wants to honor the guest's request by
changing the MP state of the vCPU. If it does not, userspace is
responsible for configuring the vCPU to return an error to the guest.
Document these expectations in the KVM API documentation.

To preserve ABI, this new exit requires explicit opt-in from userspace.
Add KVM_CAP_ARM_SYSTEM_SUSPEND which grants userspace the ability to
opt-in to these exits on a per-VM basis.
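
A hedged sketch of the expected userspace usage (vm_fd, vcpu_fd and the
mmap'd 'run' structure are assumed to exist; error handling omitted):

  struct kvm_enable_cap cap = { .cap = KVM_CAP_ARM_SYSTEM_SUSPEND };

  /* Opt in once, at VM setup time. */
  ioctl(vm_fd, KVM_ENABLE_CAP, &cap);

  /* Later, in the vCPU run loop: */
  if (run->exit_reason == KVM_EXIT_SYSTEM_EVENT &&
      run->system_event.type == KVM_SYSTEM_EVENT_SUSPEND) {
          /*
           * To honor the request: reset the vCPU, set PC to the
           * entry_address (guest x2) and x0 to the context_id (guest x3)
           * via KVM_SET_ONE_REG (elided here), then optionally ask for
           * in-kernel emulation of the suspension:
           */
          struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_SUSPENDED };

          ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
  }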

Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst    | 39 +++++++++++++++++++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/arm.c              |  5 ++++
 arch/arm64/kvm/psci.c             |  5 ++++
 include/uapi/linux/kvm.h          |  2 ++
 5 files changed, 54 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 2b4bdbc2dcc0..1e207bbc01f5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5930,6 +5930,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
   #define KVM_SYSTEM_EVENT_RESET          2
   #define KVM_SYSTEM_EVENT_CRASH          3
   #define KVM_SYSTEM_EVENT_WAKEUP         4
+  #define KVM_SYSTEM_EVENT_SUSPEND        5
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -5957,6 +5958,34 @@ Valid values for 'type' are:
  - KVM_SYSTEM_EVENT_WAKEUP -- the guest is in a suspended state and KVM
    has recognized a wakeup event. Userspace may honor this event by marking
    the exiting vCPU as runnable, or deny it and call KVM_RUN again.
+ - KVM_SYSTEM_EVENT_SUSPEND -- the guest has requested a suspension of
+   the VM.
+
+For arm/arm64:
+^^^^^^^^^^^^^^
+
+   KVM_SYSTEM_EVENT_SUSPEND exits are enabled with the
+   KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest successfully
+   invokes the PSCI SYSTEM_SUSPEND function, KVM will exit to userspace
+   with this event type.
+
+   The guest's x2 register contains the 'entry_address' where execution
+   should resume when the VM is brought out of suspend. The guest's x3
+   register contains the 'context_id' corresponding to the request. When
+   the guest resumes execution at 'entry_address', x0 should contain the
+   'context_id'. For more details on the SYSTEM_SUSPEND PSCI call, see
+   ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND".
+
+   Userspace is _required_ to take action for such an exit. It must
+   either:
+
+    - Honor the guest request to suspend the VM. Userspace must reset
+      the calling vCPU, then set PC to 'entry_address' and x0 to
+      'context_id'. Userspace may request in-kernel emulation of the
+      suspension by setting the vCPU's state to KVM_MP_STATE_SUSPENDED.
+
+    - Deny the guest request to suspend the VM. Userspace must set
+      registers x1-x3 to 0 and set x0 to PSCI_RET_INTERNAL_ERROR (-6).
 
 ::
 
@@ -7580,3 +7609,13 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
 of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
 the hypercalls whose corresponding bit is in the argument, and return
 ENOSYS for the others.
+
+8.35 KVM_CAP_ARM_SYSTEM_SUSPEND
+-------------------------------
+
+:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND
+:Architectures: arm64
+:Type: vm
+
+When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
+type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d32cab0c9752..e1c2ec18d1aa 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -146,6 +146,9 @@ struct kvm_arch {
 
 	/* Memory Tagging Extension enabled for the guest */
 	bool mte_enabled;
+
+	/* System Suspend Event exits enabled for the VM */
+	bool system_suspend_exits;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index d2b190f32651..ce3f14a77a49 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -101,6 +101,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		}
 		mutex_unlock(&kvm->lock);
 		break;
+	case KVM_CAP_ARM_SYSTEM_SUSPEND:
+		r = 0;
+		kvm->arch.system_suspend_exits = true;
+		break;
 	default:
 		r = -EINVAL;
 		break;
@@ -209,6 +213,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_ARM_SYSTEM_SUSPEND:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 2bb8d047cde4..a7de84cec2e4 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
 		return 1;
 	}
 
+	if (kvm->arch.system_suspend_exits) {
+		kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
+		return 0;
+	}
+
 	__kvm_reset_vcpu(vcpu, &reset_state);
 	kvm_vcpu_wfi(vcpu);
 	return 1;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index babb16c2abe5..e5bb5f15c0eb 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -445,6 +445,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_RESET          2
 #define KVM_SYSTEM_EVENT_CRASH          3
 #define KVM_SYSTEM_EVENT_WAKEUP         4
+#define KVM_SYSTEM_EVENT_SUSPEND        5
 			__u32 type;
 			__u64 flags;
 		} system_event;
@@ -1136,6 +1137,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_VM_GPA_BITS 207
 #define KVM_CAP_XSAVE2 208
 #define KVM_CAP_SYS_ATTRIBUTES 209
+#define KVM_CAP_ARM_SYSTEM_SUSPEND 210
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.35.1.473.g83b2b277ed-goog


* [PATCH v3 14/19] KVM: arm64: Raise default PSCI version to v1.1
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

As it turns out, KVM already implements the requirements of PSCI v1.1.
Raise the default PSCI version to v1.1 to actually advertise as such.
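
One way to observe the bump from userspace (a hedged sketch; error handling
omitted) is to read the KVM_REG_ARM_PSCI_VERSION firmware pseudo-register,
which encodes the major version in bits [31:16] and the minor in bits [15:0]:

  __u64 ver;
  struct kvm_one_reg reg = {
          .id   = KVM_REG_ARM_PSCI_VERSION,
          .addr = (__u64)&ver,
  };

  ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg);
  printf("PSCI v%llu.%llu\n", ver >> 16, ver & 0xffff);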

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/kvm/psci.c  | 4 +++-
 include/kvm/arm_psci.h | 3 ++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index a7de84cec2e4..0b8a603c471b 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -370,7 +370,7 @@ static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu)
 
 	switch(psci_fn) {
 	case PSCI_0_2_FN_PSCI_VERSION:
-		val = KVM_ARM_PSCI_1_0;
+		val = kvm_psci_version(vcpu);
 		break;
 	case PSCI_1_0_FN_PSCI_FEATURES:
 		feature = smccc_get_arg1(vcpu);
@@ -456,6 +456,7 @@ static int kvm_psci_0_1_call(struct kvm_vcpu *vcpu)
 int kvm_psci_call(struct kvm_vcpu *vcpu)
 {
 	switch (kvm_psci_version(vcpu)) {
+	case KVM_ARM_PSCI_1_1:
 	case KVM_ARM_PSCI_1_0:
 		return kvm_psci_1_0_call(vcpu);
 	case KVM_ARM_PSCI_0_2:
@@ -574,6 +575,7 @@ int kvm_arm_set_fw_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 			return 0;
 		case KVM_ARM_PSCI_0_2:
 		case KVM_ARM_PSCI_1_0:
+		case KVM_ARM_PSCI_1_1:
 			if (!wants_02)
 				return -EINVAL;
 			vcpu->kvm->arch.psci_version = val;
diff --git a/include/kvm/arm_psci.h b/include/kvm/arm_psci.h
index 297645edcaff..68b96c3826c3 100644
--- a/include/kvm/arm_psci.h
+++ b/include/kvm/arm_psci.h
@@ -13,8 +13,9 @@
 #define KVM_ARM_PSCI_0_1	PSCI_VERSION(0, 1)
 #define KVM_ARM_PSCI_0_2	PSCI_VERSION(0, 2)
 #define KVM_ARM_PSCI_1_0	PSCI_VERSION(1, 0)
+#define KVM_ARM_PSCI_1_1	PSCI_VERSION(1, 1)
 
-#define KVM_ARM_PSCI_LATEST	KVM_ARM_PSCI_1_0
+#define KVM_ARM_PSCI_LATEST	KVM_ARM_PSCI_1_1
 
 static inline int kvm_psci_version(struct kvm_vcpu *vcpu)
 {
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 15/19] selftests: KVM: Rename psci_cpu_on_test to psci_test
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton, Andrew Jones

There are other interactions with PSCI worth testing; rename the PSCI
test to make it more generic.

No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 tools/testing/selftests/kvm/.gitignore                          | 2 +-
 tools/testing/selftests/kvm/Makefile                            | 2 +-
 .../selftests/kvm/aarch64/{psci_cpu_on_test.c => psci_test.c}   | 0
 3 files changed, 2 insertions(+), 2 deletions(-)
 rename tools/testing/selftests/kvm/aarch64/{psci_cpu_on_test.c => psci_test.c} (100%)

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index dce7de7755e6..ac69108d9ffd 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -2,7 +2,7 @@
 /aarch64/arch_timer
 /aarch64/debug-exceptions
 /aarch64/get-reg-list
-/aarch64/psci_cpu_on_test
+/aarch64/psci_test
 /aarch64/vgic_init
 /aarch64/vgic_irq
 /s390x/memop
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 0e4926bc9a58..61e11e372366 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -103,7 +103,7 @@ TEST_GEN_PROGS_x86_64 += system_counter_offset_test
 TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
-TEST_GEN_PROGS_aarch64 += aarch64/psci_cpu_on_test
+TEST_GEN_PROGS_aarch64 += aarch64/psci_test
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_irq
 TEST_GEN_PROGS_aarch64 += demand_paging_test
diff --git a/tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
similarity index 100%
rename from tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
rename to tools/testing/selftests/kvm/aarch64/psci_test.c
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 16/19] selftests: KVM: Create helper for making SMCCC calls
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton, Andrew Jones

The PSCI and PV stolen time tests both need to make SMCCC calls within
the guest. Create a helper for making SMCCC calls and rework the
existing tests to use the library function.
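
For reference, new callers follow the same shape as the reworked helpers
below; e.g. a hypothetical wrapper for PSCI_FEATURES (function ID taken from
the kernel's PSCI UAPI header) would read:

	static uint64_t psci_features(uint32_t psci_func_id)
	{
		struct arm_smccc_res res;

		smccc_hvc(PSCI_1_0_FN_PSCI_FEATURES, psci_func_id, 0, 0, 0,
			  0, 0, 0, &res);

		return res.a0;
	}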

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 .../testing/selftests/kvm/aarch64/psci_test.c | 25 ++++++-------------
 .../selftests/kvm/include/aarch64/processor.h | 22 ++++++++++++++++
 .../selftests/kvm/lib/aarch64/processor.c     | 25 +++++++++++++++++++
 tools/testing/selftests/kvm/steal_time.c      | 13 +++-------
 4 files changed, 58 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
index 4c5f6814030f..8c998f0b802c 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_test.c
@@ -26,32 +26,23 @@
 static uint64_t psci_cpu_on(uint64_t target_cpu, uint64_t entry_addr,
 			    uint64_t context_id)
 {
-	register uint64_t x0 asm("x0") = PSCI_0_2_FN64_CPU_ON;
-	register uint64_t x1 asm("x1") = target_cpu;
-	register uint64_t x2 asm("x2") = entry_addr;
-	register uint64_t x3 asm("x3") = context_id;
+	struct arm_smccc_res res;
 
-	asm("hvc #0"
-	    : "=r"(x0)
-	    : "r"(x0), "r"(x1), "r"(x2), "r"(x3)
-	    : "memory");
+	smccc_hvc(PSCI_0_2_FN64_CPU_ON, target_cpu, entry_addr, context_id,
+		  0, 0, 0, 0, &res);
 
-	return x0;
+	return res.a0;
 }
 
 static uint64_t psci_affinity_info(uint64_t target_affinity,
 				   uint64_t lowest_affinity_level)
 {
-	register uint64_t x0 asm("x0") = PSCI_0_2_FN64_AFFINITY_INFO;
-	register uint64_t x1 asm("x1") = target_affinity;
-	register uint64_t x2 asm("x2") = lowest_affinity_level;
+	struct arm_smccc_res res;
 
-	asm("hvc #0"
-	    : "=r"(x0)
-	    : "r"(x0), "r"(x1), "r"(x2)
-	    : "memory");
+	smccc_hvc(PSCI_0_2_FN64_AFFINITY_INFO, target_affinity, lowest_affinity_level,
+		  0, 0, 0, 0, 0, &res);
 
-	return x0;
+	return res.a0;
 }
 
 static void guest_main(uint64_t target_cpu)
diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index 8f9f46979a00..59ece9d4e0d1 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -185,4 +185,26 @@ static inline void local_irq_disable(void)
 	asm volatile("msr daifset, #3" : : : "memory");
 }
 
+/**
+ * struct arm_smccc_res - Result from SMC/HVC call
+ * @a0-a3 result values from registers 0 to 3
+ */
+struct arm_smccc_res {
+	unsigned long a0;
+	unsigned long a1;
+	unsigned long a2;
+	unsigned long a3;
+};
+
+/**
+ * smccc_hvc - Invoke a SMCCC function using the hvc conduit
+ * @function_id: the SMCCC function to be called
+ * @arg0-arg6: SMCCC function arguments, corresponding to registers x1-x7
+ * @res: pointer to write the return values from registers x0-x3
+ *
+ */
+void smccc_hvc(uint32_t function_id, uint64_t arg0, uint64_t arg1,
+	       uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5,
+	       uint64_t arg6, struct arm_smccc_res *res);
+
 #endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
index 9343d82519b4..6a041289fa80 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
@@ -500,3 +500,28 @@ void __attribute__((constructor)) init_guest_modes(void)
 {
        guest_modes_append_default();
 }
+
+void smccc_hvc(uint32_t function_id, uint64_t arg0, uint64_t arg1,
+	       uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5,
+	       uint64_t arg6, struct arm_smccc_res *res)
+{
+	asm volatile("mov   w0, %w[function_id]\n"
+		     "mov   x1, %[arg0]\n"
+		     "mov   x2, %[arg1]\n"
+		     "mov   x3, %[arg2]\n"
+		     "mov   x4, %[arg3]\n"
+		     "mov   x5, %[arg4]\n"
+		     "mov   x6, %[arg5]\n"
+		     "mov   x7, %[arg6]\n"
+		     "hvc   #0\n"
+		     "mov   %[res0], x0\n"
+		     "mov   %[res1], x1\n"
+		     "mov   %[res2], x2\n"
+		     "mov   %[res3], x3\n"
+		     : [res0] "=r"(res->a0), [res1] "=r"(res->a1),
+		       [res2] "=r"(res->a2), [res3] "=r"(res->a3)
+		     : [function_id] "r"(function_id), [arg0] "r"(arg0),
+		       [arg1] "r"(arg1), [arg2] "r"(arg2), [arg3] "r"(arg3),
+		       [arg4] "r"(arg4), [arg5] "r"(arg5), [arg6] "r"(arg6)
+		     : "x0", "x1", "x2", "x3", "x4", "x5", "x6", "x7");
+}
diff --git a/tools/testing/selftests/kvm/steal_time.c b/tools/testing/selftests/kvm/steal_time.c
index 62f2eb9ee3d5..8c4e811bd586 100644
--- a/tools/testing/selftests/kvm/steal_time.c
+++ b/tools/testing/selftests/kvm/steal_time.c
@@ -118,17 +118,10 @@ struct st_time {
 
 static int64_t smccc(uint32_t func, uint64_t arg)
 {
-	unsigned long ret;
+	struct arm_smccc_res res;
 
-	asm volatile(
-		"mov	w0, %w1\n"
-		"mov	x1, %2\n"
-		"hvc	#0\n"
-		"mov	%0, x0\n"
-	: "=r" (ret) : "r" (func), "r" (arg) :
-	  "x0", "x1", "x2", "x3");
-
-	return ret;
+	smccc_hvc(func, arg, 0, 0, 0, 0, 0, 0, &res);
+	return res.a0;
 }
 
 static void check_status(struct st_time *st)
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 17/19] selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

Setting a vCPU's MP state to KVM_MP_STATE_STOPPED has the effect of
powering off the vCPU. Rather than using the vCPU init feature flag, use
the KVM_SET_MP_STATE ioctl to power off the target vCPU.
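
For illustration only, the effect could also be confirmed from userspace with
the complementary KVM_GET_MP_STATE ioctl, e.g. in raw form (vcpu_fd assumed to
be the vCPU's file descriptor):

	struct kvm_mp_state mp_state;

	ioctl(vcpu_fd, KVM_GET_MP_STATE, &mp_state);
	TEST_ASSERT(mp_state.mp_state == KVM_MP_STATE_STOPPED,
		    "vCPU was not powered off");

The existing test does not need such a check, as it observes the target
coming back online via AFFINITY_INFO after CPU_ON.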

Signed-off-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/aarch64/psci_test.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
index 8c998f0b802c..fe1d5d343a2f 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_test.c
@@ -60,6 +60,15 @@ static void guest_main(uint64_t target_cpu)
 	GUEST_DONE();
 }
 
+static void vcpu_power_off(struct kvm_vm *vm, uint32_t vcpuid)
+{
+	struct kvm_mp_state mp_state = {
+		.mp_state = KVM_MP_STATE_STOPPED,
+	};
+
+	vcpu_set_mp_state(vm, vcpuid, &mp_state);
+}
+
 int main(void)
 {
 	uint64_t target_mpidr, obs_pc, obs_x0;
@@ -75,12 +84,12 @@ int main(void)
 	init.features[0] |= (1 << KVM_ARM_VCPU_PSCI_0_2);
 
 	aarch64_vcpu_add_default(vm, VCPU_ID_SOURCE, &init, guest_main);
+	aarch64_vcpu_add_default(vm, VCPU_ID_TARGET, &init, guest_main);
 
 	/*
 	 * make sure the target is already off when executing the test.
 	 */
-	init.features[0] |= (1 << KVM_ARM_VCPU_POWER_OFF);
-	aarch64_vcpu_add_default(vm, VCPU_ID_TARGET, &init, guest_main);
+	vcpu_power_off(vm, VCPU_ID_TARGET);
 
 	get_reg(vm, VCPU_ID_TARGET, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &target_mpidr);
 	vcpu_args_set(vm, VCPU_ID_SOURCE, 1, target_mpidr & MPIDR_HWID_BITMASK);
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 18/19] selftests: KVM: Refactor psci_test to make it amenable to new tests
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton, Andrew Jones

Split up the current test into several helpers that will be useful to
subsequent test cases added to the PSCI test suite.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
---
 .../testing/selftests/kvm/aarch64/psci_test.c | 97 ++++++++++++-------
 1 file changed, 60 insertions(+), 37 deletions(-)

diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
index fe1d5d343a2f..535130d5e97f 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_test.c
@@ -45,21 +45,6 @@ static uint64_t psci_affinity_info(uint64_t target_affinity,
 	return res.a0;
 }
 
-static void guest_main(uint64_t target_cpu)
-{
-	GUEST_ASSERT(!psci_cpu_on(target_cpu, CPU_ON_ENTRY_ADDR, CPU_ON_CONTEXT_ID));
-	uint64_t target_state;
-
-	do {
-		target_state = psci_affinity_info(target_cpu, 0);
-
-		GUEST_ASSERT((target_state == PSCI_0_2_AFFINITY_LEVEL_ON) ||
-			     (target_state == PSCI_0_2_AFFINITY_LEVEL_OFF));
-	} while (target_state != PSCI_0_2_AFFINITY_LEVEL_ON);
-
-	GUEST_DONE();
-}
-
 static void vcpu_power_off(struct kvm_vm *vm, uint32_t vcpuid)
 {
 	struct kvm_mp_state mp_state = {
@@ -69,12 +54,10 @@ static void vcpu_power_off(struct kvm_vm *vm, uint32_t vcpuid)
 	vcpu_set_mp_state(vm, vcpuid, &mp_state);
 }
 
-int main(void)
+static struct kvm_vm *setup_vm(void *guest_code)
 {
-	uint64_t target_mpidr, obs_pc, obs_x0;
 	struct kvm_vcpu_init init;
 	struct kvm_vm *vm;
-	struct ucall uc;
 
 	vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
 	kvm_vm_elf_load(vm, program_invocation_name);
@@ -83,31 +66,28 @@ int main(void)
 	vm_ioctl(vm, KVM_ARM_PREFERRED_TARGET, &init);
 	init.features[0] |= (1 << KVM_ARM_VCPU_PSCI_0_2);
 
-	aarch64_vcpu_add_default(vm, VCPU_ID_SOURCE, &init, guest_main);
-	aarch64_vcpu_add_default(vm, VCPU_ID_TARGET, &init, guest_main);
+	aarch64_vcpu_add_default(vm, VCPU_ID_SOURCE, &init, guest_code);
+	aarch64_vcpu_add_default(vm, VCPU_ID_TARGET, &init, guest_code);
 
-	/*
-	 * make sure the target is already off when executing the test.
-	 */
-	vcpu_power_off(vm, VCPU_ID_TARGET);
+	return vm;
+}
 
-	get_reg(vm, VCPU_ID_TARGET, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &target_mpidr);
-	vcpu_args_set(vm, VCPU_ID_SOURCE, 1, target_mpidr & MPIDR_HWID_BITMASK);
-	vcpu_run(vm, VCPU_ID_SOURCE);
+static void enter_guest(struct kvm_vm *vm, uint32_t vcpuid)
+{
+	struct ucall uc;
 
-	switch (get_ucall(vm, VCPU_ID_SOURCE, &uc)) {
-	case UCALL_DONE:
-		break;
-	case UCALL_ABORT:
+	vcpu_run(vm, vcpuid);
+	if (get_ucall(vm, vcpuid, &uc) == UCALL_ABORT)
 		TEST_FAIL("%s at %s:%ld", (const char *)uc.args[0], __FILE__,
 			  uc.args[1]);
-		break;
-	default:
-		TEST_FAIL("Unhandled ucall: %lu", uc.cmd);
-	}
+}
+
+static void assert_vcpu_reset(struct kvm_vm *vm, uint32_t vcpuid)
+{
+	uint64_t obs_pc, obs_x0;
 
-	get_reg(vm, VCPU_ID_TARGET, ARM64_CORE_REG(regs.pc), &obs_pc);
-	get_reg(vm, VCPU_ID_TARGET, ARM64_CORE_REG(regs.regs[0]), &obs_x0);
+	get_reg(vm, vcpuid, ARM64_CORE_REG(regs.pc), &obs_pc);
+	get_reg(vm, vcpuid, ARM64_CORE_REG(regs.regs[0]), &obs_x0);
 
 	TEST_ASSERT(obs_pc == CPU_ON_ENTRY_ADDR,
 		    "unexpected target cpu pc: %lx (expected: %lx)",
@@ -115,7 +95,50 @@ int main(void)
 	TEST_ASSERT(obs_x0 == CPU_ON_CONTEXT_ID,
 		    "unexpected target context id: %lx (expected: %lx)",
 		    obs_x0, CPU_ON_CONTEXT_ID);
+}
+
+static void guest_test_cpu_on(uint64_t target_cpu)
+{
+	uint64_t target_state;
+
+	GUEST_ASSERT(!psci_cpu_on(target_cpu, CPU_ON_ENTRY_ADDR, CPU_ON_CONTEXT_ID));
+
+	do {
+		target_state = psci_affinity_info(target_cpu, 0);
+
+		GUEST_ASSERT((target_state == PSCI_0_2_AFFINITY_LEVEL_ON) ||
+			     (target_state == PSCI_0_2_AFFINITY_LEVEL_OFF));
+	} while (target_state != PSCI_0_2_AFFINITY_LEVEL_ON);
+
+	GUEST_DONE();
+}
+
+static void host_test_cpu_on(void)
+{
+	uint64_t target_mpidr;
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	vm = setup_vm(guest_test_cpu_on);
+
+	/*
+	 * make sure the target is already off when executing the test.
+	 */
+	vcpu_power_off(vm, VCPU_ID_TARGET);
+
+	get_reg(vm, VCPU_ID_TARGET, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &target_mpidr);
+	vcpu_args_set(vm, VCPU_ID_SOURCE, 1, target_mpidr & MPIDR_HWID_BITMASK);
+	enter_guest(vm, VCPU_ID_SOURCE);
+
+	if (get_ucall(vm, VCPU_ID_SOURCE, &uc) != UCALL_DONE)
+		TEST_FAIL("Unhandled ucall: %lu", uc.cmd);
 
+	assert_vcpu_reset(vm, VCPU_ID_TARGET);
 	kvm_vm_free(vm);
+}
+
+int main(void)
+{
+	host_test_cpu_on();
 	return 0;
 }
-- 
2.35.1.473.g83b2b277ed-goog



* [PATCH v3 19/19] selftests: KVM: Test SYSTEM_SUSPEND PSCI call
  2022-02-23  4:18 ` Oliver Upton
@ 2022-02-23  4:18   ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:18 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang, Oliver Upton

Assert that the vCPU exits to userspace with KVM_SYSTEM_EVENT_SUSPEND if
it correctly executes the SYSTEM_SUSPEND PSCI call. Additionally, assert
that the guest PSCI call fails if preconditions are not met (more than 1
running vCPU).

Signed-off-by: Oliver Upton <oupton@google.com>
---
 .../testing/selftests/kvm/aarch64/psci_test.c | 74 +++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/tools/testing/selftests/kvm/aarch64/psci_test.c b/tools/testing/selftests/kvm/aarch64/psci_test.c
index 535130d5e97f..ef7fd58af675 100644
--- a/tools/testing/selftests/kvm/aarch64/psci_test.c
+++ b/tools/testing/selftests/kvm/aarch64/psci_test.c
@@ -45,6 +45,16 @@ static uint64_t psci_affinity_info(uint64_t target_affinity,
 	return res.a0;
 }
 
+static uint64_t psci_system_suspend(uint64_t entry_addr, uint64_t context_id)
+{
+	struct arm_smccc_res res;
+
+	smccc_hvc(PSCI_1_0_FN64_SYSTEM_SUSPEND, entry_addr, context_id,
+		  0, 0, 0, 0, 0, &res);
+
+	return res.a0;
+}
+
 static void vcpu_power_off(struct kvm_vm *vm, uint32_t vcpuid)
 {
 	struct kvm_mp_state mp_state = {
@@ -137,8 +147,72 @@ static void host_test_cpu_on(void)
 	kvm_vm_free(vm);
 }
 
+static void enable_system_suspend(struct kvm_vm *vm)
+{
+	struct kvm_enable_cap cap = {
+		.cap = KVM_CAP_ARM_SYSTEM_SUSPEND,
+	};
+
+	vm_enable_cap(vm, &cap);
+}
+
+static void guest_test_system_suspend(void)
+{
+	uint64_t r = psci_system_suspend(CPU_ON_ENTRY_ADDR, CPU_ON_CONTEXT_ID);
+
+	GUEST_SYNC(r);
+}
+
+static void host_test_system_suspend(void)
+{
+	struct kvm_run *run;
+	struct kvm_vm *vm;
+
+	vm = setup_vm(guest_test_system_suspend);
+	enable_system_suspend(vm);
+
+	vcpu_power_off(vm, VCPU_ID_TARGET);
+	run = vcpu_state(vm, VCPU_ID_SOURCE);
+
+	enter_guest(vm, VCPU_ID_SOURCE);
+
+	TEST_ASSERT(run->exit_reason == KVM_EXIT_SYSTEM_EVENT,
+		    "Unhandled exit reason: %u (%s)",
+		    run->exit_reason, exit_reason_str(run->exit_reason));
+	TEST_ASSERT(run->system_event.type == KVM_SYSTEM_EVENT_SUSPEND,
+		    "Unhandled system event: %u (expected: %u)",
+		    run->system_event.type, KVM_SYSTEM_EVENT_SUSPEND);
+
+	kvm_vm_free(vm);
+}
+
+static void host_test_system_suspend_fails(void)
+{
+	struct kvm_vm *vm;
+	struct ucall uc;
+
+	vm = setup_vm(guest_test_system_suspend);
+	enable_system_suspend(vm);
+
+	enter_guest(vm, VCPU_ID_SOURCE);
+	TEST_ASSERT(get_ucall(vm, VCPU_ID_SOURCE, &uc) == UCALL_SYNC,
+		    "Unhandled ucall: %lu", uc.cmd);
+	TEST_ASSERT(uc.args[1] == PSCI_RET_DENIED,
+		    "Unrecognized PSCI return code: %lu (expected: %u)",
+		    uc.args[1], PSCI_RET_DENIED);
+
+	kvm_vm_free(vm);
+}
+
 int main(void)
 {
+	if (!kvm_check_cap(KVM_CAP_ARM_SYSTEM_SUSPEND)) {
+		print_skip("KVM_CAP_ARM_SYSTEM_SUSPEND not supported");
+		exit(KSFT_SKIP);
+	}
+
 	host_test_cpu_on();
+	host_test_system_suspend();
+	host_test_system_suspend_fails();
 	return 0;
 }
-- 
2.35.1.473.g83b2b277ed-goog



* Re: [PATCH v3 14/19] KVM: arm64: Raise default PSCI version to v1.1
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-23  4:26     ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-23  4:26 UTC (permalink / raw)
  To: kvmarm
  Cc: Paolo Bonzini, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Tue, Feb 22, 2022 at 8:19 PM Oliver Upton <oupton@google.com> wrote:
>
> As it turns out, KVM already implements the requirements of PSCI v1.1.
> Raise the default PSCI version to v1.1 to actually advertise as such.
>
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Oliver Upton <oupton@google.com>

Ah, looks like this is already in /next, courtesy of Will :-)

https://lore.kernel.org/all/20220221153524.15397-2-will@kernel.org/

--
Thanks,
Oliver


* Re: [PATCH v3 10/19] KVM: Create helper for setting a system event exit
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-23  6:37     ` Anup Patel
  -1 siblings, 0 replies; 94+ messages in thread
From: Anup Patel @ 2022-02-23  6:37 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Atish Patra,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, KVM General, kvm-riscv, Peter Shier,
	Reiji Watanabe, Ricardo Koller, Raghavendra Rao Ananta,
	Jing Zhang

On Wed, Feb 23, 2022 at 9:49 AM Oliver Upton <oupton@google.com> wrote:
>
> Create a helper that appropriately configures kvm_run for a system event
> exit.
>
> No functional change intended.
>
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Oliver Upton <oupton@google.com>

Looks good to me.

For KVM RISC-V:
Acked-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  arch/arm64/kvm/psci.c         | 4 +---
>  arch/riscv/kvm/vcpu_sbi_v01.c | 4 +---
>  arch/x86/kvm/x86.c            | 6 ++----
>  include/linux/kvm_host.h      | 7 +++++++
>  4 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 41adaaf2234a..2bb8d047cde4 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -193,9 +193,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
>                 tmp->arch.mp_state = KVM_MP_STATE_STOPPED;
>         kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
>
> -       memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> -       vcpu->run->system_event.type = type;
> -       vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +       kvm_vcpu_set_system_event_exit(vcpu, type);
>  }
>
>  static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
> diff --git a/arch/riscv/kvm/vcpu_sbi_v01.c b/arch/riscv/kvm/vcpu_sbi_v01.c
> index 07e2de14433a..7a197d5658d7 100644
> --- a/arch/riscv/kvm/vcpu_sbi_v01.c
> +++ b/arch/riscv/kvm/vcpu_sbi_v01.c
> @@ -24,9 +24,7 @@ static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
>                 tmp->arch.power_off = true;
>         kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
>
> -       memset(&run->system_event, 0, sizeof(run->system_event));
> -       run->system_event.type = type;
> -       run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +       kvm_vcpu_set_system_event_exit(vcpu, type);
>  }
>
>  static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 7131d735b1ef..109751f89ee3 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9903,14 +9903,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>                 if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
>                         kvm_vcpu_reload_apic_access_page(vcpu);
>                 if (kvm_check_request(KVM_REQ_HV_CRASH, vcpu)) {
> -                       vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> -                       vcpu->run->system_event.type = KVM_SYSTEM_EVENT_CRASH;
> +                       kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_CRASH);
>                         r = 0;
>                         goto out;
>                 }
>                 if (kvm_check_request(KVM_REQ_HV_RESET, vcpu)) {
> -                       vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> -                       vcpu->run->system_event.type = KVM_SYSTEM_EVENT_RESET;
> +                       kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_RESET);
>                         r = 0;
>                         goto out;
>                 }
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f11039944c08..9085a1b1569a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2202,6 +2202,13 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>  }
>  #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
>
> +static inline void kvm_vcpu_set_system_event_exit(struct kvm_vcpu *vcpu, u32 type)
> +{
> +       memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> +       vcpu->run->system_event.type = type;
> +       vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +}
> +
>  /*
>   * This defines how many reserved entries we want to keep before we
>   * kick the vcpu to the userspace to avoid dirty ring full.  This
> --
> 2.35.1.473.g83b2b277ed-goog
>


* Re: [PATCH v3 01/19] KVM: arm64: Drop unused param from kvm_psci_version()
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24  6:14     ` Reiji Watanabe
  -1 siblings, 0 replies; 94+ messages in thread
From: Reiji Watanabe @ 2022-02-24  6:14 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Anup Patel, Atish Patra,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, kvm-riscv, Peter Shier, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Tue, Feb 22, 2022 at 8:19 PM Oliver Upton <oupton@google.com> wrote:
>
> kvm_psci_version() consumes a pointer to struct kvm in addition to a
> vcpu pointer. Drop the kvm pointer as it is unused. While the comment
> suggests the explicit kvm pointer was useful for calling from hyp, there
> exist no such callsite in hyp.
>
> Signed-off-by: Oliver Upton <oupton@google.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>

Reviewed-by: Reiji Watanabe <reijiw@google.com>


* Re: [PATCH v3 02/19] KVM: arm64: Create a helper to check if IPA is valid
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24  6:32     ` Reiji Watanabe
  -1 siblings, 0 replies; 94+ messages in thread
From: Reiji Watanabe @ 2022-02-24  6:32 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Anup Patel, Atish Patra,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, kvm-riscv, Peter Shier, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Tue, Feb 22, 2022 at 8:19 PM Oliver Upton <oupton@google.com> wrote:
>
> Create a helper that tests if a given IPA fits within the guest's
> address space.
>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h      | 9 +++++++++
>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 81839e9a8a24..78e8be7ea627 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -111,6 +111,7 @@ alternative_cb_end
>  #else
>
>  #include <linux/pgtable.h>
> +#include <linux/kvm_host.h>
>  #include <asm/pgalloc.h>
>  #include <asm/cache.h>
>  #include <asm/cacheflush.h>
> @@ -147,6 +148,14 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
>  #define kvm_phys_size(kvm)             (_AC(1, ULL) << kvm_phys_shift(kvm))
>  #define kvm_phys_mask(kvm)             (kvm_phys_size(kvm) - _AC(1, ULL))
>
> +/*
> + * Returns true if the provided IPA exists within the VM's IPA space.
> + */
> +static inline bool kvm_ipa_valid(struct kvm *kvm, phys_addr_t guest_ipa)
> +{
> +       return !(guest_ipa & ~kvm_phys_mask(kvm));
> +}
> +
>  #include <asm/kvm_pgtable.h>
>  #include <asm/stage2_pgtable.h>
>
> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> index c6d52a1fd9c8..e3853a75cb00 100644
> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> @@ -27,7 +27,7 @@ int vgic_check_iorange(struct kvm *kvm, phys_addr_t ioaddr,
>         if (addr + size < addr)
>                 return -EINVAL;
>
> -       if (addr & ~kvm_phys_mask(kvm) || addr + size > kvm_phys_size(kvm))
> +       if (!kvm_ipa_valid(kvm, addr) || addr + size > kvm_phys_size(kvm))
>                 return -E2BIG;
>
>         return 0;

Reviewed-by: Reiji Watanabe <reijiw@google.com>

It looks like we can use the helper for kvm_handle_guest_abort()
in arch/arm64/kvm/mmu.c as well though.
----------
<...>
        /* Userspace should not be able to register out-of-bounds IPAs */
        VM_BUG_ON(fault_ipa >= kvm_phys_size(vcpu->kvm));
<...>
----------
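
i.e. that check could become something like this (untested, but it should
be equivalent, since kvm_phys_size() is always a power of two):
----------
<...>
        /* Userspace should not be able to register out-of-bounds IPAs */
        VM_BUG_ON(!kvm_ipa_valid(vcpu->kvm, fault_ipa));
<...>
----------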

Thanks,
Reiji

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 03/19] KVM: arm64: Reject invalid addresses for CPU_ON PSCI call
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24  6:55     ` Reiji Watanabe
  -1 siblings, 0 replies; 94+ messages in thread
From: Reiji Watanabe @ 2022-02-24  6:55 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Anup Patel, Atish Patra,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, kvm-riscv, Peter Shier, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Tue, Feb 22, 2022 at 8:19 PM Oliver Upton <oupton@google.com> wrote:
>
> DEN0022D.b 5.6.2 "Caller responsibilities" states that a PSCI
> implementation may return INVALID_ADDRESS for the CPU_ON call if the
> provided entry address is known to be invalid. There is an additional
> caveat to this rule. Prior to PSCI v1.0, the INVALID_PARAMETERS error
> is returned instead. Check the guest's PSCI version and return the
> appropriate error if the IPA is invalid.
>
> Reported-by: Reiji Watanabe <reijiw@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>

Reviewed-by: Reiji Watanabe <reijiw@google.com>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 05/19] KVM: arm64: Dedupe vCPU power off helpers
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24  7:07     ` Reiji Watanabe
  -1 siblings, 0 replies; 94+ messages in thread
From: Reiji Watanabe @ 2022-02-24  7:07 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, Marc Zyngier, James Morse,
	Alexandru Elisei, Suzuki K Poulose, Anup Patel, Atish Patra,
	Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, kvm-riscv, Peter Shier, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Tue, Feb 22, 2022 at 8:19 PM Oliver Upton <oupton@google.com> wrote:
>
> vcpu_power_off() and kvm_psci_vcpu_off() are equivalent; rename the
> former and replace all callsites to the latter.
>
> No functional change intended.
>
> Signed-off-by: Oliver Upton <oupton@google.com>

Reviewed-by: Reiji Watanabe <reijiw@google.com>

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 02/19] KVM: arm64: Create a helper to check if IPA is valid
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24 12:06     ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-24 12:06 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Wed, 23 Feb 2022 04:18:27 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Create a helper that tests if a given IPA fits within the guest's
> address space.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h      | 9 +++++++++
>  arch/arm64/kvm/vgic/vgic-kvm-device.c | 2 +-
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 81839e9a8a24..78e8be7ea627 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -111,6 +111,7 @@ alternative_cb_end
>  #else
>  
>  #include <linux/pgtable.h>
> +#include <linux/kvm_host.h>

I'd rather you avoid that. This sort of linux->asm->linux transitive
inclusion always leads to a terrible mess at some point, which is why
we use #defines below. And yes, the pgtable.h inclusion is a bad
precedent.

>  #include <asm/pgalloc.h>
>  #include <asm/cache.h>
>  #include <asm/cacheflush.h>
> @@ -147,6 +148,14 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
>  #define kvm_phys_size(kvm)		(_AC(1, ULL) << kvm_phys_shift(kvm))
>  #define kvm_phys_mask(kvm)		(kvm_phys_size(kvm) - _AC(1, ULL))
>  
> +/*
> + * Returns true if the provided IPA exists within the VM's IPA space.
> + */
> +static inline bool kvm_ipa_valid(struct kvm *kvm, phys_addr_t guest_ipa)
> +{
> +	return !(guest_ipa & ~kvm_phys_mask(kvm));
> +}
> +

I'm all for the helper, but just make it a #define to be consistent
with the rest of the code.
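
Something like this, perhaps (untested, simply reusing the existing
kvm_phys_mask() definition):

 #define kvm_ipa_valid(kvm, ipa)	(!((ipa) & ~kvm_phys_mask(kvm)))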

>  #include <asm/kvm_pgtable.h>
>  #include <asm/stage2_pgtable.h>
>  
> diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> index c6d52a1fd9c8..e3853a75cb00 100644
> --- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
> +++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
> @@ -27,7 +27,7 @@ int vgic_check_iorange(struct kvm *kvm, phys_addr_t ioaddr,
>  	if (addr + size < addr)
>  		return -EINVAL;
>  
> -	if (addr & ~kvm_phys_mask(kvm) || addr + size > kvm_phys_size(kvm))
> +	if (!kvm_ipa_valid(kvm, addr) || addr + size > kvm_phys_size(kvm))
>  		return -E2BIG;

I think you can pretty much use this helper everywhere something is
compared to kvm_phys_size(), and the above becomes:

 if (!kvm_ipa_valid(kvm, addr) || !kvm_ipa_valid(kvm, addr + size - 1))

The same goes for the couple of occurrences in arch/arm64/kvm/mmu.c.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 03/19] KVM: arm64: Reject invalid addresses for CPU_ON PSCI call
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24 12:30     ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-24 12:30 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Wed, 23 Feb 2022 04:18:28 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> DEN0022D.b 5.6.2 "Caller responsibilities" states that a PSCI
> implementation may return INVALID_ADDRESS for the CPU_ON call if the
> provided entry address is known to be invalid. There is an additional
> caveat to this rule. Prior to PSCI v1.0, the INVALID_PARAMETERS error
> is returned instead. Check the guest's PSCI version and return the
> appropriate error if the IPA is invalid.
> 
> Reported-by: Reiji Watanabe <reijiw@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/kvm/psci.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index a0c10c11f40e..de1cf554929d 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -12,6 +12,7 @@
>  
>  #include <asm/cputype.h>
>  #include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  
>  #include <kvm/arm_psci.h>
>  #include <kvm/arm_hypercalls.h>
> @@ -70,12 +71,31 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	struct vcpu_reset_state *reset_state;
>  	struct kvm *kvm = source_vcpu->kvm;
>  	struct kvm_vcpu *vcpu = NULL;
> -	unsigned long cpu_id;
> +	unsigned long cpu_id, entry_addr;
>  
>  	cpu_id = smccc_get_arg1(source_vcpu);
>  	if (!kvm_psci_valid_affinity(source_vcpu, cpu_id))
>  		return PSCI_RET_INVALID_PARAMS;
>  
> +	/*
> +	 * Basic sanity check: ensure the requested entry address actually
> +	 * exists within the guest's address space.
> +	 */
> +	entry_addr = smccc_get_arg2(source_vcpu);
> +	if (!kvm_ipa_valid(kvm, entry_addr)) {
> +
> +		/*
> +		 * Before PSCI v1.0, the INVALID_PARAMETERS error is returned
> +		 * instead of INVALID_ADDRESS.
> +		 *
> +		 * For more details, see ARM DEN0022D.b 5.6 "CPU_ON".
> +		 */
> +		if (kvm_psci_version(source_vcpu) < KVM_ARM_PSCI_1_0)
> +			return PSCI_RET_INVALID_PARAMS;
> +		else
> +			return PSCI_RET_INVALID_ADDRESS;
> +	}
> +

If you're concerned with this, should you also check for the PC
alignment, or the presence of a memslot covering the address you are
branching to? The latter is particularly hard to implement reliably.

So far, my position has been that the guest is free to shoot itself in
the foot if that's what it wants to do, and that babysitting it was a
waste of useful bits! ;-)

Or have you identified something that makes it a requirement to handle
this case (and possibly others)  in the hypervisor?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 06/19] KVM: arm64: Track vCPU power state using MP state values
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24 13:25     ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-24 13:25 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Wed, 23 Feb 2022 04:18:31 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> A subsequent change to KVM will add support for additional power states.
> Store the MP state by value rather than keeping track of it as a
> boolean.
> 
> No functional change intended.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  5 +++--
>  arch/arm64/kvm/arm.c              | 22 ++++++++++++----------
>  arch/arm64/kvm/psci.c             | 10 +++++-----
>  3 files changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index cacc9efd2e70..3e8bfecaa95b 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -350,8 +350,8 @@ struct kvm_vcpu_arch {
>  		u32	mdscr_el1;
>  	} guest_debug_preserved;
>  
> -	/* vcpu power-off state */
> -	bool power_off;
> +	/* vcpu power state */
> +	u32 mp_state;

nit: why don't you just carry a kvm_mp_state structure instead of
open-coding a u32? Same size, stronger typing.
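
i.e. something like this (untested):

        /* vcpu power state */
        struct kvm_mp_state mp_state;

with the helpers then checking vcpu->arch.mp_state.mp_state ==
KVM_MP_STATE_STOPPED, and so on.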

>  
>  	/* Don't run the guest (internal implementation need) */
>  	bool pause;
> @@ -800,5 +800,6 @@ static inline void kvm_hyp_reserve(void) { }
>  #endif
>  
>  void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu);
> +bool kvm_arm_vcpu_powered_off(struct kvm_vcpu *vcpu);
>  
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 07c6a176cdcc..b4987b891f38 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -428,18 +428,20 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  
>  void kvm_arm_vcpu_power_off(struct kvm_vcpu *vcpu)
>  {
> -	vcpu->arch.power_off = true;
> +	vcpu->arch.mp_state = KVM_MP_STATE_STOPPED;
>  	kvm_make_request(KVM_REQ_SLEEP, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  }
>  
> +bool kvm_arm_vcpu_powered_off(struct kvm_vcpu *vcpu)
> +{
> +	return vcpu->arch.mp_state == KVM_MP_STATE_STOPPED;

nit: if we're fully embracing the MP_STATE concept, just rename this
to kvm_arm_vcpu_stopped().

> +}
> +
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> -	if (vcpu->arch.power_off)
> -		mp_state->mp_state = KVM_MP_STATE_STOPPED;
> -	else
> -		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
> +	mp_state->mp_state = vcpu->arch.mp_state;
>
>  	return 0;
>  }
> @@ -451,7 +453,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  
>  	switch (mp_state->mp_state) {
>  	case KVM_MP_STATE_RUNNABLE:
> -		vcpu->arch.power_off = false;
> +		vcpu->arch.mp_state = mp_state->mp_state;
>  		break;
>  	case KVM_MP_STATE_STOPPED:
>  		kvm_arm_vcpu_power_off(vcpu);
> @@ -474,7 +476,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
>  	bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
>  	return ((irq_lines || kvm_vgic_vcpu_pending_irq(v))
> -		&& !v->arch.power_off && !v->arch.pause);
> +		&& !kvm_arm_vcpu_powered_off(v) && !v->arch.pause);
>  }
>  
>  bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
> @@ -668,10 +670,10 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
>  	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);
>  
>  	rcuwait_wait_event(wait,
> -			   (!vcpu->arch.power_off) &&(!vcpu->arch.pause),
> +			   (!kvm_arm_vcpu_powered_off(vcpu)) && (!vcpu->arch.pause),
>  			   TASK_INTERRUPTIBLE);
>  
> -	if (vcpu->arch.power_off || vcpu->arch.pause) {
> +	if (kvm_arm_vcpu_powered_off(vcpu) || vcpu->arch.pause) {
>  		/* Awaken to handle a signal, request we sleep again later. */
>  		kvm_make_request(KVM_REQ_SLEEP, vcpu);
>  	}
> @@ -1181,7 +1183,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>  	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
>  		kvm_arm_vcpu_power_off(vcpu);
>  	else
> -		vcpu->arch.power_off = false;
> +		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>  
>  	return 0;
>  }
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index e3f93b7f8d38..77a00913cdfd 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -97,7 +97,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	 */
>  	if (!vcpu)
>  		return PSCI_RET_INVALID_PARAMS;
> -	if (!vcpu->arch.power_off) {
> +	if (!kvm_arm_vcpu_powered_off(vcpu)) {
>  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
>  			return PSCI_RET_ALREADY_ON;
>  		else
> @@ -122,11 +122,11 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  
>  	/*
>  	 * Make sure the reset request is observed if the change to
> -	 * power_off is observed.
> +	 * mp_state is observed.

You want to expand this comment a bit, as this is not strictly a
binary state anymore.
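
Something along these lines, maybe:

        /*
         * Make sure the reset request is observed if the change to
         * mp_state is observed: the vCPU being brought online must see
         * the pending reset once it sees itself become RUNNABLE.
         */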

>  	 */
>  	smp_wmb();
>  
> -	vcpu->arch.power_off = false;
> +	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>  	kvm_vcpu_wake_up(vcpu);
>  
>  	return PSCI_RET_SUCCESS;
> @@ -164,7 +164,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>  		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
>  		if ((mpidr & target_affinity_mask) == target_affinity) {
>  			matching_cpus++;
> -			if (!tmp->arch.power_off)
> +			if (!kvm_arm_vcpu_powered_off(tmp))
>  				return PSCI_0_2_AFFINITY_LEVEL_ON;
>  		}
>  	}
> @@ -190,7 +190,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
>  	 * re-initialized.
>  	 */
>  	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> -		tmp->arch.power_off = true;
> +		tmp->arch.mp_state = KVM_MP_STATE_STOPPED;
>  	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
>  
>  	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));

You also may want to initialise the mp_state to RUNNABLE by default in
kvm_arch_vcpu_create(). We are currently relying on power_off to be
false thanks to the vcpu struct being zeroed, but we may as well make
it clearer (RUNNABLE is also 0, so there is no actual bug here).
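
e.g. (untested):

        /*
         * Default to a RUNNABLE vcpu; the KVM_ARM_VCPU_POWER_OFF feature
         * flag may later flip it to STOPPED at vcpu init time.
         */
        vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;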

Otherwise, looks good.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24 14:02     ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-24 14:02 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Wed, 23 Feb 2022 04:18:34 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> software to request that a system be placed in the deepest possible
> low-power state. Effectively, software can use this to suspend itself to
> RAM. Note that the semantics of this PSCI call are very similar to
> CPU_SUSPEND, which is already implemented in KVM.
> 
> Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> low-power state is implemented as a guest WFI. Synchronously reset the
> calling CPU before entering the WFI, such that the vCPU may immediately
> resume execution when a wakeup event is recognized.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/reset.c |  3 ++-
>  2 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 77a00913cdfd..41adaaf2234a 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
>  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
>  }
>  
> +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_reset_state reset_state;
> +	struct kvm *kvm = vcpu->kvm;
> +	struct kvm_vcpu *tmp;
> +	bool denied = false;
> +	unsigned long i;
> +
> +	reset_state.pc = smccc_get_arg1(vcpu);
> +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> +		return 1;
> +	}
> +
> +	reset_state.r0 = smccc_get_arg2(vcpu);
> +	reset_state.be = kvm_vcpu_is_be(vcpu);
> +	reset_state.reset = true;
> +
> +	/*
> +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> +	 * calling vCPU) be in an OFF state, as determined by the
> +	 * implementation.
> +	 *
> +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> +	 */
> +	mutex_lock(&kvm->lock);
> +	kvm_for_each_vcpu(i, tmp, kvm) {
> +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> +			denied = true;
> +			break;
> +		}
> +	}
> +	mutex_unlock(&kvm->lock);

This looks dodgy. Nothing seems to prevent userspace from setting the
mp_state to RUNNING in parallel with this, as only the vcpu mutex is
held when this ioctl is issued.

It looks to me that what you want is what lock_all_vcpus() does
(Alexandru has a patch moving it out of the vgic code as part of his
SPE series).
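
i.e. roughly (untested, and assuming lock_all_vcpus()/unlock_all_vcpus()
keep their current vgic semantics, returning true on success):

        if (!lock_all_vcpus(kvm)) {
                smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
                return 1;
        }

        kvm_for_each_vcpu(i, tmp, kvm) {
                if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
                        denied = true;
                        break;
                }
        }

        unlock_all_vcpus(kvm);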

It is also pretty unclear what the interaction with userspace is once
you have released the lock. If the VMM starts a vcpu other than the
suspending one, what is its state? The spec doesn't seem to help
here. I can see two options:

- either all the vcpus have the same reset state applied to them as
  they come up, unless they are started with CPU_ON by a vcpu that has
  already booted (but there is a single 'context_id' provided, and I
  fear this is going to confuse the OS)...

- or only the suspending vcpu can resume the system, and we must fail
  a change of mp_state for the other vcpus.

What do you think?

> +
> +	if (denied) {
> +		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
> +		return 1;
> +	}
> +
> +	__kvm_reset_vcpu(vcpu, &reset_state);
> +	kvm_vcpu_wfi(vcpu);

I have mixed feelings about this. The vcpu is reset before entering
WFI, while it really should be the other way around, and userspace
could rely on observing the transition.

What breaks if you change this?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 10/19] KVM: Create helper for setting a system event exit
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24 14:07     ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-24 14:07 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Wed, 23 Feb 2022 04:18:35 +0000,
Oliver Upton <oupton@google.com> wrote:

> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f11039944c08..9085a1b1569a 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2202,6 +2202,13 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
>  }
>  #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */
>  
> +static inline void kvm_vcpu_set_system_event_exit(struct kvm_vcpu *vcpu, u32 type)
> +{
> +	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> +	vcpu->run->system_event.type = type;
> +	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
> +}
> +

nit: does this really deserve an inline function? I'd stick that in
kvm_main.c, really. Or is that getting in the way of building KVM as a
module on 'the other architecture'?
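
i.e. just (untested):

        /* virt/kvm/kvm_main.c */
        void kvm_vcpu_set_system_event_exit(struct kvm_vcpu *vcpu, u32 type)
        {
                memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
                vcpu->run->system_event.type = type;
                vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
        }
        EXPORT_SYMBOL_GPL(kvm_vcpu_set_system_event_exit);

with only the prototype left in kvm_host.h.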

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 12/19] KVM: arm64: Add support for userspace to suspend a vCPU
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24 15:12     ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-24 15:12 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Wanpeng Li, kvm, Joerg Roedel, Peter Shier, kvm-riscv,
	Atish Patra, Paolo Bonzini, Vitaly Kuznetsov, kvmarm,
	Jim Mattson

On Wed, 23 Feb 2022 04:18:37 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
> is in a suspended state. In the suspended state the vCPU will block
> until a wakeup event (pending interrupt) is recognized.
> 
> Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
> userspace that KVM has recognized one such wakeup event. It is the
> responsibility of userspace to then make the vCPU runnable, or leave it
> suspended until the next wakeup event.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  Documentation/virt/kvm/api.rst    | 23 ++++++++++++++++++--
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/kvm/arm.c              | 35 +++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h          |  2 ++
>  4 files changed, 59 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index a4267104db50..2b4bdbc2dcc0 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -1482,14 +1482,29 @@ Possible values are:
>                                   [s390]
>     KVM_MP_STATE_LOAD             the vcpu is in a special load/startup state
>                                   [s390]
> +   KVM_MP_STATE_SUSPENDED        the vcpu is in a suspend state and is waiting
> +                                 for a wakeup event [arm/arm64]

nit: arm64 only (these are host architectures, not guest). Eventually,
someone needs to do a bit of cleanup in the docs to remove any trace
of ye olde 32bit stuff.

>     ==========================    ===============================================
>  
>  On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
>  in-kernel irqchip, the multiprocessing state must be maintained by userspace on
>  these architectures.
>  
> -For arm/arm64/riscv:
> -^^^^^^^^^^^^^^^^^^^^
> +For arm/arm64:
> +^^^^^^^^^^^^^^
> +
> +If a vCPU is in the KVM_MP_STATE_SUSPENDED state, KVM will block the vCPU
> +thread and wait for a wakeup event. A wakeup event is defined as a pending
> +interrupt for the guest.

nit: a pending interrupt that the guest can actually handle (a masked
interrupt can be pending). It'd be more accurate to describe this
state as the architectural execution of a WFI instruction.

> +
> +If a wakeup event is recognized, KVM will exit to userspace with a
> +KVM_SYSTEM_EVENT exit, where the event type is KVM_SYSTEM_EVENT_WAKEUP. If
> +userspace wants to honor the wakeup, it must set the vCPU's MP state to
> +KVM_MP_STATE_RUNNABLE. If it does not, KVM will continue to await a wakeup
> +event in subsequent calls to KVM_RUN.

I can see a potential 'gotcha' here. If the VMM doesn't want to set
the vcpu as runnable, but doesn't take action on the source of the
wake-up (masking the interrupt), you'll get an immediate wake-up event
again. The VMM is now eating 100% of the CPU and not making forward
progress. Luser error, but you may want to capture the failure mode
and make it crystal clear in the doc.

It also means that at the point where it decides to restart the guest
for real, it must restore the interrupt state as it initially found
it.
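
For the record, the VMM-side handling would look roughly like this
(illustrative only; vmm_wants_to_resume() stands in for whatever policy
the VMM applies):

        case KVM_EXIT_SYSTEM_EVENT:
                if (run->system_event.type == KVM_SYSTEM_EVENT_WAKEUP) {
                        if (vmm_wants_to_resume(vm)) {
                                struct kvm_mp_state mp = {
                                        .mp_state = KVM_MP_STATE_RUNNABLE,
                                };

                                ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
                        }
                        /*
                         * Otherwise, mask or handle the wakeup source before
                         * calling KVM_RUN again, or the next KVM_RUN exits
                         * immediately with another WAKEUP event.
                         */
                }
                break;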

> +
> +For riscv:
> +^^^^^^^^^^
>  
>  The only states that are valid are KVM_MP_STATE_STOPPED and
>  KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
> @@ -5914,6 +5929,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
>    #define KVM_SYSTEM_EVENT_SHUTDOWN       1
>    #define KVM_SYSTEM_EVENT_RESET          2
>    #define KVM_SYSTEM_EVENT_CRASH          3
> +  #define KVM_SYSTEM_EVENT_WAKEUP         4
>  			__u32 type;
>  			__u64 flags;
>  		} system_event;
> @@ -5938,6 +5954,9 @@ Valid values for 'type' are:
>     has requested a crash condition maintenance. Userspace can choose
>     to ignore the request, or to gather VM memory core dump and/or
>     reset/shutdown of the VM.
> + - KVM_SYSTEM_EVENT_WAKEUP -- the guest is in a suspended state and KVM
> +   has recognized a wakeup event. Userspace may honor this event by marking
> +   the exiting vCPU as runnable, or deny it and call KVM_RUN again.
>  
>  ::
>  
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 33ecec755310..d32cab0c9752 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -46,6 +46,7 @@
>  #define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
>  #define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
>  #define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
> +#define KVM_REQ_SUSPEND		KVM_ARCH_REQ(6)
>  
>  #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
>  				     KVM_DIRTY_LOG_INITIALLY_SET)
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index f6ce97c0069c..d2b190f32651 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -438,6 +438,18 @@ bool kvm_arm_vcpu_powered_off(struct kvm_vcpu *vcpu)
>  	return vcpu->arch.mp_state == KVM_MP_STATE_STOPPED;
>  }
>  
> +static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.mp_state = KVM_MP_STATE_SUSPENDED;
> +	kvm_make_request(KVM_REQ_SUSPEND, vcpu);
> +	kvm_vcpu_kick(vcpu);

I wonder whether this kvm_vcpu_kick() is simply cargo-culted. The
mp_state calls can only be done from the vcpu fd, and thus the vcpu
cannot be running, so there is nothing to kick. Not a big deal, but
something we may want to look at later on.

> +}
> +
> +bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)

static?

> +{
> +	return vcpu->arch.mp_state == KVM_MP_STATE_SUSPENDED;
> +}
> +
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> @@ -458,6 +470,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  	case KVM_MP_STATE_STOPPED:
>  		kvm_arm_vcpu_power_off(vcpu);
>  		break;
> +	case KVM_MP_STATE_SUSPENDED:
> +		kvm_arm_vcpu_suspend(vcpu);
> +		break;
>  	default:
>  		ret = -EINVAL;
>  	}
> @@ -719,6 +734,23 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
>  	preempt_enable();
>  }
>  
> +static int kvm_vcpu_suspend(struct kvm_vcpu *vcpu)
> +{
> +	if (!kvm_arm_vcpu_suspended(vcpu))
> +		return 1;
> +
> +	kvm_vcpu_wfi(vcpu);
> +
> +	/*
> +	 * The suspend state is sticky; we do not leave it until userspace
> +	 * explicitly marks the vCPU as runnable. Request that we suspend again
> +	 * later.
> +	 */
> +	kvm_make_request(KVM_REQ_SUSPEND, vcpu);
> +	kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_WAKEUP);
> +	return 0;
> +}
> +
>  /**
>   * check_vcpu_requests - check and handle pending vCPU requests
>   * @vcpu:	the VCPU pointer
> @@ -757,6 +789,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
>  		if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
>  			kvm_pmu_handle_pmcr(vcpu,
>  					    __vcpu_sys_reg(vcpu, PMCR_EL0));
> +
> +		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
> +			return kvm_vcpu_suspend(vcpu);
>  	}
>  
>  	return 1;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 5191b57e1562..babb16c2abe5 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -444,6 +444,7 @@ struct kvm_run {
>  #define KVM_SYSTEM_EVENT_SHUTDOWN       1
>  #define KVM_SYSTEM_EVENT_RESET          2
>  #define KVM_SYSTEM_EVENT_CRASH          3
> +#define KVM_SYSTEM_EVENT_WAKEUP         4
>  			__u32 type;
>  			__u64 flags;
>  		} system_event;
> @@ -634,6 +635,7 @@ struct kvm_vapic_addr {
>  #define KVM_MP_STATE_OPERATING         7
>  #define KVM_MP_STATE_LOAD              8
>  #define KVM_MP_STATE_AP_RESET_HOLD     9
> +#define KVM_MP_STATE_SUSPENDED         10
>  
>  struct kvm_mp_state {
>  	__u32 mp_state;

This patch looks OK as is, but it is the interactions with PSCI that
concern me. What we have here is per-CPU suspend triggered by
userspace. PSCI OTOH offers two variants of suspend triggered by the
guest. All of them get different implementations, and I have a hard
time figuring out how they all interact...

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread


* Re: [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
  2022-02-23  4:18   ` Oliver Upton
@ 2022-02-24 15:40     ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-24 15:40 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Wed, 23 Feb 2022 04:18:38 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Add a new system event type, KVM_SYSTEM_EVENT_SUSPEND, which indicates
> to userspace that the guest has requested the VM be suspended. Userspace
> can decide whether or not it wants to honor the guest's request by
> changing the MP state of the vCPU. If it does not, userspace is
> responsible for configuring the vCPU to return an error to the guest.
> Document these expectations in the KVM API documentation.
> 
> To preserve ABI, this new exit requires explicit opt-in from userspace.
> Add KVM_CAP_ARM_SYSTEM_SUSPEND which grants userspace the ability to
> opt-in to these exits on a per-VM basis.
> 
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  Documentation/virt/kvm/api.rst    | 39 +++++++++++++++++++++++++++++++
>  arch/arm64/include/asm/kvm_host.h |  3 +++
>  arch/arm64/kvm/arm.c              |  5 ++++
>  arch/arm64/kvm/psci.c             |  5 ++++
>  include/uapi/linux/kvm.h          |  2 ++
>  5 files changed, 54 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 2b4bdbc2dcc0..1e207bbc01f5 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -5930,6 +5930,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
>    #define KVM_SYSTEM_EVENT_RESET          2
>    #define KVM_SYSTEM_EVENT_CRASH          3
>    #define KVM_SYSTEM_EVENT_WAKEUP         4
> +  #define KVM_SYSTEM_EVENT_SUSPENDED      5
>  			__u32 type;
>  			__u64 flags;
>  		} system_event;
> @@ -5957,6 +5958,34 @@ Valid values for 'type' are:
>   - KVM_SYSTEM_EVENT_WAKEUP -- the guest is in a suspended state and KVM
>     has recognized a wakeup event. Userspace may honor this event by marking
>     the exiting vCPU as runnable, or deny it and call KVM_RUN again.
> + - KVM_SYSTEM_EVENT_SUSPENDED -- the guest has requested a suspension of
> +   the VM.
> +
> +For arm/arm64:
> +^^^^^^^^^^^^^^
> +
> +   KVM_SYSTEM_EVENT_SUSPENDED exits are enabled with the
> +   KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest successfully
> +   invokes the PSCI SYSTEM_SUSPEND function, KVM will exit to userspace
> +   with this event type.
> +
> +   The guest's x2 register contains the 'entry_address' where execution

x1?

> +   should resume when the VM is brought out of suspend. The guest's x3

x2?

> +   register contains the 'context_id' corresponding to the request. When
> +   the guest resumes execution at 'entry_address', x0 should contain the
> +   'context_id'. For more details on the SYSTEM_SUSPEND PSCI call, see
> +   ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND".

I'd refrain from paraphrasing too much of the spec, and direct the
user to it. It will also avoid introducing bugs... ;-)

Overall, "the guest" is super ambiguous, and echoes the questions I
had earlier about what this means for an SMP system. Only one vcpu can
restart the system, but which one?

> +
> +   Userspace is _required_ to take action for such an exit. It must
> +   either:
> +
> +    - Honor the guest request to suspend the VM. Userspace must reset
> +      the calling vCPU, then set PC to 'entry_address' and x0 to
> +      'context_id'. Userspace may request in-kernel emulation of the
> +      suspension by setting the vCPU's state to KVM_MP_STATE_SUSPENDED.

So here, you are actively saying that the calling vcpu should be the
one being resumed. If that's the case (and assuming that this is a
behaviour intended by the spec), something should prevent the other
vcpus from being started.

> +
> +    - Deny the guest request to suspend the VM. Userspace must set
> +      registers x1-x3 to 0 and set x0 to PSCI_RET_INTERNAL_ERROR (-6).

Do you have any sort of userspace code that demonstrates this? It'd be
super useful to see how that works on any publicly available VMM
(qemu, kvmtool, or any of the ferric oxide based monsters).
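
For the deny path, I'd expect it to boil down to something like the
raw-ioctl sketch below -- completely untested, error handling omitted,
and CORE_REG() is just an open-coded version of what the selftests
call ARM64_CORE_REG():

#include <linux/kvm.h>
#include <linux/psci.h>
#include <sys/ioctl.h>
#include <stddef.h>
#include <stdint.h>

/* untested sketch; error handling omitted for brevity */
#define CORE_REG(name)						\
	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |	\
	 (offsetof(struct kvm_regs, name) / sizeof(__u32)))

static void set_core_reg(int vcpu_fd, uint64_t id, uint64_t val)
{
	struct kvm_one_reg reg = { .id = id, .addr = (uint64_t)&val };

	ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}

/* Deny the suspend request: x0 = PSCI_RET_INTERNAL_ERROR, x1-x3 = 0 */
static void deny_system_suspend(int vcpu_fd)
{
	set_core_reg(vcpu_fd, CORE_REG(regs.regs[0]),
		     (uint64_t)PSCI_RET_INTERNAL_ERROR);
	set_core_reg(vcpu_fd, CORE_REG(regs.regs[1]), 0);
	set_core_reg(vcpu_fd, CORE_REG(regs.regs[2]), 0);
	set_core_reg(vcpu_fd, CORE_REG(regs.regs[3]), 0);
}

Seeing it wired into an actual VMM event loop would still help, though.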

>
>  ::
>  
> @@ -7580,3 +7609,13 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
>  of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
>  the hypercalls whose corresponding bit is in the argument, and return
>  ENOSYS for the others.
> +
> +8.35 KVM_CAP_ARM_SYSTEM_SUSPEND
> +-------------------------------
> +
> +:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND
> +:Architectures: arm64
> +:Type: vm
> +
> +When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
> +type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index d32cab0c9752..e1c2ec18d1aa 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -146,6 +146,9 @@ struct kvm_arch {
>  
>  	/* Memory Tagging Extension enabled for the guest */
>  	bool mte_enabled;
> +
> +	/* System Suspend Event exits enabled for the VM */
> +	bool system_suspend_exits;

Gah... More of these. Please pick this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/mmu/guest-MMIO-guard&id=7dd0a13a4217b870f2e83cdc6045e5ce482a5340

>  };
>  
>  struct kvm_vcpu_fault_info {
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index d2b190f32651..ce3f14a77a49 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -101,6 +101,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		}
>  		mutex_unlock(&kvm->lock);
>  		break;
> +	case KVM_CAP_ARM_SYSTEM_SUSPEND:
> +		r = 0;
> +		kvm->arch.system_suspend_exits = true;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> @@ -209,6 +213,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_SET_GUEST_DEBUG:
>  	case KVM_CAP_VCPU_ATTRIBUTES:
>  	case KVM_CAP_PTP_KVM:
> +	case KVM_CAP_ARM_SYSTEM_SUSPEND:
>  		r = 1;
>  		break;
>  	case KVM_CAP_SET_GUEST_DEBUG2:
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 2bb8d047cde4..a7de84cec2e4 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
>  		return 1;
>  	}
>  
> +	if (kvm->arch.system_suspend_exits) {
> +		kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> +		return 0;
> +	}
> +

So there really is a difference in behaviour here. Userspace sees the
WFI behaviour before reset (it implements it), while when not using
the SUSPEND event, reset occurs before anything else.

They really should behave in a similar way (WFI first, reset next).
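
i.e. something along these lines (sketch only, reusing the helpers
already in this patch):

	if (kvm->arch.system_suspend_exits) {
		kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
		return 0;
	}

	/* sketch: mirror the userspace flow -- WFI first, reset on wakeup */
	kvm_vcpu_wfi(vcpu);
	__kvm_reset_vcpu(vcpu, &reset_state);
	return 1;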

>  	__kvm_reset_vcpu(vcpu, &reset_state);
>  	kvm_vcpu_wfi(vcpu);
>  	return 1;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index babb16c2abe5..e5bb5f15c0eb 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -445,6 +445,7 @@ struct kvm_run {
>  #define KVM_SYSTEM_EVENT_RESET          2
>  #define KVM_SYSTEM_EVENT_CRASH          3
>  #define KVM_SYSTEM_EVENT_WAKEUP         4
> +#define KVM_SYSTEM_EVENT_SUSPEND        5
>  			__u32 type;
>  			__u64 flags;
>  		} system_event;
> @@ -1136,6 +1137,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_VM_GPA_BITS 207
>  #define KVM_CAP_XSAVE2 208
>  #define KVM_CAP_SYS_ATTRIBUTES 209
> +#define KVM_CAP_ARM_SYSTEM_SUSPEND 210
>  
>  #ifdef KVM_CAP_IRQ_ROUTING

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.



* Re: [PATCH v3 03/19] KVM: arm64: Reject invalid addresses for CPU_ON PSCI call
  2022-02-24 12:30     ` Marc Zyngier
@ 2022-02-24 19:21       ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-24 19:21 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

Hi Marc,

On Thu, Feb 24, 2022 at 12:30:49PM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 04:18:28 +0000,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > DEN0022D.b 5.6.2 "Caller responsibilities" states that a PSCI
> > implementation may return INVALID_ADDRESS for the CPU_ON call if the
> > provided entry address is known to be invalid. There is an additional
> > caveat to this rule. Prior to PSCI v1.0, the INVALID_PARAMETERS error
> > is returned instead. Check the guest's PSCI version and return the
> > appropriate error if the IPA is invalid.
> > 
> > Reported-by: Reiji Watanabe <reijiw@google.com>
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  arch/arm64/kvm/psci.c | 24 ++++++++++++++++++++++--
> >  1 file changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index a0c10c11f40e..de1cf554929d 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -12,6 +12,7 @@
> >  
> >  #include <asm/cputype.h>
> >  #include <asm/kvm_emulate.h>
> > +#include <asm/kvm_mmu.h>
> >  
> >  #include <kvm/arm_psci.h>
> >  #include <kvm/arm_hypercalls.h>
> > @@ -70,12 +71,31 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  	struct vcpu_reset_state *reset_state;
> >  	struct kvm *kvm = source_vcpu->kvm;
> >  	struct kvm_vcpu *vcpu = NULL;
> > -	unsigned long cpu_id;
> > +	unsigned long cpu_id, entry_addr;
> >  
> >  	cpu_id = smccc_get_arg1(source_vcpu);
> >  	if (!kvm_psci_valid_affinity(source_vcpu, cpu_id))
> >  		return PSCI_RET_INVALID_PARAMS;
> >  
> > +	/*
> > +	 * Basic sanity check: ensure the requested entry address actually
> > +	 * exists within the guest's address space.
> > +	 */
> > +	entry_addr = smccc_get_arg2(source_vcpu);
> > +	if (!kvm_ipa_valid(kvm, entry_addr)) {
> > +
> > +		/*
> > +		 * Before PSCI v1.0, the INVALID_PARAMETERS error is returned
> > +		 * instead of INVALID_ADDRESS.
> > +		 *
> > +		 * For more details, see ARM DEN0022D.b 5.6 "CPU_ON".
> > +		 */
> > +		if (kvm_psci_version(source_vcpu) < KVM_ARM_PSCI_1_0)
> > +			return PSCI_RET_INVALID_PARAMS;
> > +		else
> > +			return PSCI_RET_INVALID_ADDRESS;
> > +	}
> > +
> 
> If you're concerned with this, should you also check for the PC
> alignment, or the presence of a memslot covering the address you are
> branching to?  The latter is particularly hard to implement reliably.

Andrew, Reiji and I had a conversation regarding exactly this on the
last run of this series, and concluded that checking against the IPA is
probably the best KVM can do [1]. That said, alignment is also an easy
thing to check.
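
Something like the below, on top of this patch, is roughly what I have
in mind -- untested, and the 4-byte alignment check assumes an AArch64
target vCPU (an AArch32 target would need the Thumb bit considered):

	entry_addr = smccc_get_arg2(source_vcpu);

	/* untested: treat a misaligned (AArch64) entry point as invalid */
	if (!kvm_ipa_valid(kvm, entry_addr) || !IS_ALIGNED(entry_addr, 4)) {
		if (kvm_psci_version(source_vcpu) < KVM_ARM_PSCI_1_0)
			return PSCI_RET_INVALID_PARAMS;

		return PSCI_RET_INVALID_ADDRESS;
	}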

> So far, my position has been that the guest is free to shoot itself in
> the foot if that's what it wants to do, and that babysitting it was a
> waste of useful bits! ;-)
>

Agreed -- there are plenty of spectacular/hilarious ways in which the
guest can mess up :-)

> Or have you identified something that makes it a requirement to handle
> this case (and possibly others)  in the hypervisor?

It is a lot easier to tell a guest that their software is broken if they
get an error back from the hypercall, whereas a vCPU off in the weeds
might need to be looked at before concluding there's a guest issue.


[1]: http://lore.kernel.org/r/20211005190153.dc2befzcisvznxq5@gator.home

--
Oliver



* Re: [PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  2022-02-24 14:02     ` Marc Zyngier
@ 2022-02-24 19:35       ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-24 19:35 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

Hi Marc,

Thanks for reviewing the series. ACK to the nits and smaller comments
you've made; I'll incorporate that feedback in the next series.

On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 04:18:34 +0000,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> > software to request that a system be placed in the deepest possible
> > low-power state. Effectively, software can use this to suspend itself to
> > RAM. Note that the semantics of this PSCI call are very similar to
> > CPU_SUSPEND, which is already implemented in KVM.
> > 
> > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> > low-power state is implemented as a guest WFI. Synchronously reset the
> > calling CPU before entering the WFI, such that the vCPU may immediately
> > resume execution when a wakeup event is recognized.
> > 
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
> >  arch/arm64/kvm/reset.c |  3 ++-
> >  2 files changed, 53 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 77a00913cdfd..41adaaf2234a 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> >  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
> >  }
> >  
> > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_reset_state reset_state;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	struct kvm_vcpu *tmp;
> > +	bool denied = false;
> > +	unsigned long i;
> > +
> > +	reset_state.pc = smccc_get_arg1(vcpu);
> > +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> > +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> > +		return 1;
> > +	}
> > +
> > +	reset_state.r0 = smccc_get_arg2(vcpu);
> > +	reset_state.be = kvm_vcpu_is_be(vcpu);
> > +	reset_state.reset = true;
> > +
> > +	/*
> > +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> > +	 * calling vCPU) be in an OFF state, as determined by the
> > +	 * implementation.
> > +	 *
> > +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> > +	 */
> > +	mutex_lock(&kvm->lock);
> > +	kvm_for_each_vcpu(i, tmp, kvm) {
> > +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> > +			denied = true;
> > +			break;
> > +		}
> > +	}
> > +	mutex_unlock(&kvm->lock);
> 
> This looks dodgy. Nothing seems to prevent userspace from setting the
> mp_state to RUNNING in parallel with this, as only the vcpu mutex is
> held when this ioctl is issued.
> 
> It looks to me that what you want is what lock_all_vcpus() does
> (Alexandru has a patch moving it out of the vgic code as part of his
> SPE series).
> 
> It is also pretty unclear what the interaction with userspace is once
> you have released the lock. If the VMM starts a vcpu other than the
> suspending one, what is its state? The spec doesn't seem to help
> here. I can see two options:
> 
> - either all the vcpus have the same reset state applied to them as
>   they come up, unless they are started with CPU_ON by a vcpu that has
>   already booted (but there is a single 'context_id' provided, and I
>   fear this is going to confuse the OS)...
> 
> - or only the suspending vcpu can resume the system, and we must fail
>   a change of mp_state for the other vcpus.
> 
> What do you think?

Definitely the latter. The documentation of SYSTEM_SUSPEND is quite
shaky on this, but it would appear that the intention is for the caller
to be the first CPU to wake up.

> > +
> > +	if (denied) {
> > +		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
> > +		return 1;
> > +	}
> > +
> > +	__kvm_reset_vcpu(vcpu, &reset_state);
> > +	kvm_vcpu_wfi(vcpu);
> 
> I have mixed feelings about this. The vcpu has reset before being in
> WFI, while it really should be the other way around and userspace
> could rely on observing the transition.
> 
> What breaks if you change this?

I don't think that userspace would be able to observe the transition
even if we WFI before the reset. I imagine that would take the form
of setting KVM_REQ_VCPU_RESET, which we explicitly handle before
letting userspace access the vCPU's state as of commit
6826c6849b46 ("KVM: arm64: Handle PSCI resets before userspace
touches vCPU state").

Given this, I felt it was probably best to avoid all the indirection and
just do the vCPU reset in the handling of SYSTEM_SUSPEND. It does,
however, imply that we have slightly different behavior when userspace
exits are enabled, as that will happen pre-reset and pre-WFI.

--
Oliver



* Re: [PATCH v3 12/19] KVM: arm64: Add support for userspace to suspend a vCPU
  2022-02-24 15:12     ` Marc Zyngier
@ 2022-02-24 19:47       ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-24 19:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Thu, Feb 24, 2022 at 03:12:17PM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 04:18:37 +0000,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
> > is in a suspended state. In the suspended state the vCPU will block
> > until a wakeup event (pending interrupt) is recognized.
> > 
> > Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
> > userspace that KVM has recognized one such wakeup event. It is the
> > responsibility of userspace to then make the vCPU runnable, or leave it
> > suspended until the next wakeup event.
> > 
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst    | 23 ++++++++++++++++++--
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  arch/arm64/kvm/arm.c              | 35 +++++++++++++++++++++++++++++++
> >  include/uapi/linux/kvm.h          |  2 ++
> >  4 files changed, 59 insertions(+), 2 deletions(-)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index a4267104db50..2b4bdbc2dcc0 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -1482,14 +1482,29 @@ Possible values are:
> >                                   [s390]
> >     KVM_MP_STATE_LOAD             the vcpu is in a special load/startup state
> >                                   [s390]
> > +   KVM_MP_STATE_SUSPENDED        the vcpu is in a suspend state and is waiting
> > +                                 for a wakeup event [arm/arm64]
> 
> nit: arm64 only (these are host architectures, not guest).

Roger that.

> Eventually, someone needs to do a bit of cleanup in the docs to remove
> any trace of ye olde 32bit stuff.
>

I'm just going to act like I didn't read this ;-)

> 
> >     ==========================    ===============================================
> >  
> >  On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
> >  in-kernel irqchip, the multiprocessing state must be maintained by userspace on
> >  these architectures.
> >  
> > -For arm/arm64/riscv:
> > -^^^^^^^^^^^^^^^^^^^^
> > +For arm/arm64:
> > +^^^^^^^^^^^^^^
> > +
> > +If a vCPU is in the KVM_MP_STATE_SUSPENDED state, KVM will block the vCPU
> > +thread and wait for a wakeup event. A wakeup event is defined as a pending
> > +interrupt for the guest.
> 
> nit: a pending interrupt that the guest can actually handle (a masked
> interrupt can be pending). It'd be more accurate to describe this
> state as the architectural execution of a WFI instruction.
>

Yeah, probably better than paraphrasing.

> > +
> > +If a wakeup event is recognized, KVM will exit to userspace with a
> > +KVM_SYSTEM_EVENT exit, where the event type is KVM_SYSTEM_EVENT_WAKEUP. If
> > +userspace wants to honor the wakeup, it must set the vCPU's MP state to
> > +KVM_MP_STATE_RUNNABLE. If it does not, KVM will continue to await a wakeup
> > +event in subsequent calls to KVM_RUN.
> 
> I can see a potential 'gotcha' here. If the VMM doesn't want to set
> the vcpu as runnable, but doesn't take action on the source of the
> wake-up (masking the interrupt), you'll get an immediate wake-up event
> again. The VMM is now eating 100% of the CPU and not making forward
> progress. Luser error, but you may want to capture the failure mode
> and make it crystal clear in the doc.
> 
> It also means that at the point where it decides to restart the guest
> for real, it must restore the interrupt state as it initially found
> it.
> 

Yeah, I had realized this when working on the series, but lazily swept
it under the rug of user error. But, it is probably better to be more
descriptive in the documentation, so I'll adopt the suggestion. Thanks!
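
Maybe a short userspace sketch next to the doc text would make the
contract obvious. Something like the below, where resume_requested()
and handle_wakeup_source() are stand-ins for whatever the VMM actually
does, not real APIs:

	/*
	 * Sketch: run->exit_reason == KVM_EXIT_SYSTEM_EVENT and
	 * run->system_event.type == KVM_SYSTEM_EVENT_WAKEUP at this point.
	 */
	if (resume_requested()) {
		struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_RUNNABLE };

		ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
	} else {
		/*
		 * Leaving the vCPU suspended without quiescing the wakeup
		 * source (e.g. masking the interrupt) means the next KVM_RUN
		 * exits immediately and the VMM spins at 100% CPU.
		 */
		handle_wakeup_source();
	}

	/* ... and call KVM_RUN again either way. */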

> > +
> > +For riscv:
> > +^^^^^^^^^^
> >  
> >  The only states that are valid are KVM_MP_STATE_STOPPED and
> >  KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
> > @@ -5914,6 +5929,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
> >    #define KVM_SYSTEM_EVENT_SHUTDOWN       1
> >    #define KVM_SYSTEM_EVENT_RESET          2
> >    #define KVM_SYSTEM_EVENT_CRASH          3
> > +  #define KVM_SYSTEM_EVENT_WAKEUP         4
> >  			__u32 type;
> >  			__u64 flags;
> >  		} system_event;
> > @@ -5938,6 +5954,9 @@ Valid values for 'type' are:
> >     has requested a crash condition maintenance. Userspace can choose
> >     to ignore the request, or to gather VM memory core dump and/or
> >     reset/shutdown of the VM.
> > + - KVM_SYSTEM_EVENT_WAKEUP -- the guest is in a suspended state and KVM
> > +   has recognized a wakeup event. Userspace may honor this event by marking
> > +   the exiting vCPU as runnable, or deny it and call KVM_RUN again.
> >  
> >  ::
> >  
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 33ecec755310..d32cab0c9752 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -46,6 +46,7 @@
> >  #define KVM_REQ_RECORD_STEAL	KVM_ARCH_REQ(3)
> >  #define KVM_REQ_RELOAD_GICv4	KVM_ARCH_REQ(4)
> >  #define KVM_REQ_RELOAD_PMU	KVM_ARCH_REQ(5)
> > +#define KVM_REQ_SUSPEND		KVM_ARCH_REQ(6)
> >  
> >  #define KVM_DIRTY_LOG_MANUAL_CAPS   (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \
> >  				     KVM_DIRTY_LOG_INITIALLY_SET)
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index f6ce97c0069c..d2b190f32651 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -438,6 +438,18 @@ bool kvm_arm_vcpu_powered_off(struct kvm_vcpu *vcpu)
> >  	return vcpu->arch.mp_state == KVM_MP_STATE_STOPPED;
> >  }
> >  
> > +static void kvm_arm_vcpu_suspend(struct kvm_vcpu *vcpu)
> > +{
> > +	vcpu->arch.mp_state = KVM_MP_STATE_SUSPENDED;
> > +	kvm_make_request(KVM_REQ_SUSPEND, vcpu);
> > +	kvm_vcpu_kick(vcpu);
> 
> I wonder whether this kvm_vcpu_kick() is simply cargo-culted. The
> mp_state calls can only be done from the vcpu fd, and thus the vcpu
> cannot be running, so there is nothing to kick. Not a big deal, but
> something we may want to look at later on.
>

True, and hopefully this isn't an open invitation to add support for
vCPUs suspending other vCPUs, which would be a mess.

> > +}
> > +
> > +bool kvm_arm_vcpu_suspended(struct kvm_vcpu *vcpu)
> 
> static?
> 
> > +{
> > +	return vcpu->arch.mp_state == KVM_MP_STATE_SUSPENDED;
> > +}
> > +
> >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> >  				    struct kvm_mp_state *mp_state)
> >  {
> > @@ -458,6 +470,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> >  	case KVM_MP_STATE_STOPPED:
> >  		kvm_arm_vcpu_power_off(vcpu);
> >  		break;
> > +	case KVM_MP_STATE_SUSPENDED:
> > +		kvm_arm_vcpu_suspend(vcpu);
> > +		break;
> >  	default:
> >  		ret = -EINVAL;
> >  	}
> > @@ -719,6 +734,23 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
> >  	preempt_enable();
> >  }
> >  
> > +static int kvm_vcpu_suspend(struct kvm_vcpu *vcpu)
> > +{
> > +	if (!kvm_arm_vcpu_suspended(vcpu))
> > +		return 1;
> > +
> > +	kvm_vcpu_wfi(vcpu);
> > +
> > +	/*
> > +	 * The suspend state is sticky; we do not leave it until userspace
> > +	 * explicitly marks the vCPU as runnable. Request that we suspend again
> > +	 * later.
> > +	 */
> > +	kvm_make_request(KVM_REQ_SUSPEND, vcpu);
> > +	kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_WAKEUP);
> > +	return 0;
> > +}
> > +
> >  /**
> >   * check_vcpu_requests - check and handle pending vCPU requests
> >   * @vcpu:	the VCPU pointer
> > @@ -757,6 +789,9 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
> >  		if (kvm_check_request(KVM_REQ_RELOAD_PMU, vcpu))
> >  			kvm_pmu_handle_pmcr(vcpu,
> >  					    __vcpu_sys_reg(vcpu, PMCR_EL0));
> > +
> > +		if (kvm_check_request(KVM_REQ_SUSPEND, vcpu))
> > +			return kvm_vcpu_suspend(vcpu);
> >  	}
> >  
> >  	return 1;
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 5191b57e1562..babb16c2abe5 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -444,6 +444,7 @@ struct kvm_run {
> >  #define KVM_SYSTEM_EVENT_SHUTDOWN       1
> >  #define KVM_SYSTEM_EVENT_RESET          2
> >  #define KVM_SYSTEM_EVENT_CRASH          3
> > +#define KVM_SYSTEM_EVENT_WAKEUP         4
> >  			__u32 type;
> >  			__u64 flags;
> >  		} system_event;
> > @@ -634,6 +635,7 @@ struct kvm_vapic_addr {
> >  #define KVM_MP_STATE_OPERATING         7
> >  #define KVM_MP_STATE_LOAD              8
> >  #define KVM_MP_STATE_AP_RESET_HOLD     9
> > +#define KVM_MP_STATE_SUSPENDED         10
> >  
> >  struct kvm_mp_state {
> >  	__u32 mp_state;
> 
> This patch looks OK as is, but it is the interactions with PSCI that
> concern me. What we have here is per-CPU suspend triggered by
> userspace. PSCI OTOH offers two variants of suspend triggered by the
> guest. All of them get different implementations, and I have a hard
> time figuring out how they all interact...

Yeah, all of this suspend logic could become a tangle.

There is likely an opportunity to share some bits between CPU_SUSPEND
and SYSTEM_SUSPEND, but userspace-directed suspends are different enough
that they warrant a different implementation.

Thanks again for the review!

--
Oliver


and SYSTEM_SUSPEND, but userspace-directed suspends are different enough
that they warrant a different implementation.

Thanks again for the review!

--
Oliver

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
  2022-02-24 15:40     ` Marc Zyngier
@ 2022-02-24 20:05       ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-24 20:05 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Thu, Feb 24, 2022 at 03:40:15PM +0000, Marc Zyngier wrote:
> On Wed, 23 Feb 2022 04:18:38 +0000,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > Add a new system event type, KVM_SYSTEM_EVENT_SUSPEND, which indicates
> > to userspace that the guest has requested the VM be suspended. Userspace
> > can decide whether or not it wants to honor the guest's request by
> > changing the MP state of the vCPU. If it does not, userspace is
> > responsible for configuring the vCPU to return an error to the guest.
> > Document these expectations in the KVM API documentation.
> > 
> > To preserve ABI, this new exit requires explicit opt-in from userspace.
> > Add KVM_CAP_ARM_SYSTEM_SUSPEND which grants userspace the ability to
> > opt-in to these exits on a per-VM basis.
> > 
> > Signed-off-by: Oliver Upton <oupton@google.com>
> > ---
> >  Documentation/virt/kvm/api.rst    | 39 +++++++++++++++++++++++++++++++
> >  arch/arm64/include/asm/kvm_host.h |  3 +++
> >  arch/arm64/kvm/arm.c              |  5 ++++
> >  arch/arm64/kvm/psci.c             |  5 ++++
> >  include/uapi/linux/kvm.h          |  2 ++
> >  5 files changed, 54 insertions(+)
> > 
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 2b4bdbc2dcc0..1e207bbc01f5 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -5930,6 +5930,7 @@ should put the acknowledged interrupt vector into the 'epr' field.
> >    #define KVM_SYSTEM_EVENT_RESET          2
> >    #define KVM_SYSTEM_EVENT_CRASH          3
> >    #define KVM_SYSTEM_EVENT_WAKEUP         4
> > +  #define KVM_SYSTEM_EVENT_SUSPENDED      5
> >  			__u32 type;
> >  			__u64 flags;
> >  		} system_event;
> > @@ -5957,6 +5958,34 @@ Valid values for 'type' are:
> >   - KVM_SYSTEM_EVENT_WAKEUP -- the guest is in a suspended state and KVM
> >     has recognized a wakeup event. Userspace may honor this event by marking
> >     the exiting vCPU as runnable, or deny it and call KVM_RUN again.
> > + - KVM_SYSTEM_EVENT_SUSPENDED -- the guest has requested a suspension of
> > +   the VM.
> > +
> > +For arm/arm64:
> > +^^^^^^^^^^^^^^
> > +
> > +   KVM_SYSTEM_EVENT_SUSPENDED exits are enabled with the
> > +   KVM_CAP_ARM_SYSTEM_SUSPEND VM capability. If a guest successfully
> > +   invokes the PSCI SYSTEM_SUSPEND function, KVM will exit to userspace
> > +   with this event type.
> > +
> > +   The guest's x2 register contains the 'entry_address' where execution
> 
> x1?
> 
> > +   should resume when the VM is brought out of suspend. The guest's x3
> 
> x2?
> 
> > +   register contains the 'context_id' corresponding to the request. When
> > +   the guest resumes execution at 'entry_address', x0 should contain the
> > +   'context_id'. For more details on the SYSTEM_SUSPEND PSCI call, see
> > +   ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND".
> 
> I'd refrain from paraphrasing too much of the spec, and direct the
> user to it. It will also avoid introducing bugs... ;-)
> 
> Overall, "the guest" is super ambiguous, and echoes the questions I
> had earlier about what this means for an SMP system. Only one vcpu can
> restart the system, but which one?
> 
> > +
> > +   Userspace is _required_ to take action for such an exit. It must
> > +   either:
> > +
> > +    - Honor the guest request to suspend the VM. Userspace must reset
> > +      the calling vCPU, then set PC to 'entry_address' and x0 to
> > +      'context_id'. Userspace may request in-kernel emulation of the
> > +      suspension by setting the vCPU's state to KVM_MP_STATE_SUSPENDED.
> 
> So here, you are actively saying that the calling vcpu should be the
> one being resumed. If that's the case (and assuming that this is a
> behaviour intended by the spec), something should prevent the other
> vcpus from being started.
> 
> > +
> > +    - Deny the guest request to suspend the VM. Userspace must set
> > +      registers x1-x3 to 0 and set x0 to PSCI_RET_INTERNAL_ERROR (-6).
> 
> Do you have any sort of userspace code that demonstrates this? It'd be
> super useful to see how that works on any publicly available VMM
> (qemu, kvmtool, or any of the ferric oxide based monsters).
> 
> >
> >  ::
> >  
> > @@ -7580,3 +7609,13 @@ The argument to KVM_ENABLE_CAP is also a bitmask, and must be a subset
> >  of the result of KVM_CHECK_EXTENSION.  KVM will forward to userspace
> >  the hypercalls whose corresponding bit is in the argument, and return
> >  ENOSYS for the others.
> > +
> > +8.35 KVM_CAP_ARM_SYSTEM_SUSPEND
> > +-------------------------------
> > +
> > +:Capability: KVM_CAP_ARM_SYSTEM_SUSPEND
> > +:Architectures: arm64
> > +:Type: vm
> > +
> > +When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
> > +type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index d32cab0c9752..e1c2ec18d1aa 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -146,6 +146,9 @@ struct kvm_arch {
> >  
> >  	/* Memory Tagging Extension enabled for the guest */
> >  	bool mte_enabled;
> > +
> > +	/* System Suspend Event exits enabled for the VM */
> > +	bool system_suspend_exits;
> 
> Gah... More of these. Please pick this patch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=kvm-arm64/mmu/guest-MMIO-guard&id=7dd0a13a4217b870f2e83cdc6045e5ce482a5340
> 
> >  };
> >  
> >  struct kvm_vcpu_fault_info {
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index d2b190f32651..ce3f14a77a49 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -101,6 +101,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
> >  		}
> >  		mutex_unlock(&kvm->lock);
> >  		break;
> > +	case KVM_CAP_ARM_SYSTEM_SUSPEND:
> > +		r = 0;
> > +		kvm->arch.system_suspend_exits = true;
> > +		break;
> >  	default:
> >  		r = -EINVAL;
> >  		break;
> > @@ -209,6 +213,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> >  	case KVM_CAP_SET_GUEST_DEBUG:
> >  	case KVM_CAP_VCPU_ATTRIBUTES:
> >  	case KVM_CAP_PTP_KVM:
> > +	case KVM_CAP_ARM_SYSTEM_SUSPEND:
> >  		r = 1;
> >  		break;
> >  	case KVM_CAP_SET_GUEST_DEBUG2:
> > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > index 2bb8d047cde4..a7de84cec2e4 100644
> > --- a/arch/arm64/kvm/psci.c
> > +++ b/arch/arm64/kvm/psci.c
> > @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> >  		return 1;
> >  	}
> >  
> > +	if (kvm->arch.system_suspend_exits) {
> > +		kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> > +		return 0;
> > +	}
> > +
> 
> So there really is a difference in behaviour here. Userspace sees the
> WFI behaviour before reset (it implements it), while when not using
> the SUSPEND event, reset occurs before anything else.
> 
> They really should behave in a similar way (WFI first, reset next).

I mentioned this on the other patch, but I think the conversation should
continue here as UAPI context is in this one.

If SUSPEND exits are disabled and SYSTEM_SUSPEND is implemented in the
kernel, userspace cannot observe any intermediate state. I think it is
necessary for migration; otherwise, if userspace were to save the vCPU
post-WFI but pre-reset, the pending reset would get lost along the way.

As far as userspace is concerned, I think the WFI+reset operation is
atomic. SUSPEND exits just allow userspace to intervene before said
atomic operation.

Perhaps I'm missing something: assuming SUSPEND exits are disabled, what
value is provided to userspace if it can see WFI behavior before the
reset?
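
On the question above about userspace code demonstrating the 'deny'
path: nothing public to point at yet, but the shape I have in mind is
just a few KVM_SET_ONE_REG calls before re-entering the guest (sketch
only; vcpu_fd is assumed to be the vCPU fd, definitions come from
<linux/kvm.h> and <linux/psci.h>, and PSCI_RET_INTERNAL_FAILURE is the
-6 value the doc refers to):

  static void set_core_reg(int vcpu_fd, __u64 id, __u64 val)
  {
  	struct kvm_one_reg reg = { .id = id, .addr = (__u64)&val };

  	if (ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)
  		err(1, "KVM_SET_ONE_REG");
  }

  /* In the KVM_SYSTEM_EVENT_SUSPEND exit handler: */
  set_core_reg(vcpu_fd,
  	     KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |
  	     KVM_REG_ARM_CORE_REG(regs.regs[0]),
  	     (__u64)PSCI_RET_INTERNAL_FAILURE);
  /* ...likewise zero regs.regs[1] through regs.regs[3], then KVM_RUN. */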

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 06/19] KVM: arm64: Track vCPU power state using MP state values
  2022-02-24 13:25     ` Marc Zyngier
@ 2022-02-24 22:08       ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-24 22:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

Hi Marc,

On Thu, Feb 24, 2022 at 01:25:04PM +0000, Marc Zyngier wrote:

[...]

> > @@ -190,7 +190,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> >  	 * re-initialized.
> >  	 */
> >  	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> > -		tmp->arch.power_off = true;
> > +		tmp->arch.mp_state = KVM_MP_STATE_STOPPED;
> >  	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
> >  
> >  	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> 
> You also may want to initialise the mp_state to RUNNABLE by default in
> kvm_arch_vcpu_create(). We are currently relying on power_off to be
> false thanks to the vcpu struct being zeroed, but we may as well make
> it clearer (RUNNABLE is also 0, so there is no actual bug here).

We unconditionally initialize power_off in
kvm_arch_vcpu_ioctl_vcpu_init(), and do the same in this patch for mp_state,
depending on whether KVM_ARM_VCPU_POWER_OFF is set.

Any objections to leaving that as-is? I can move the RUNNABLE case into
kvm_arch_vcpu_create() as you've suggested, too.
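
For reference, the init path I'm describing ends up looking roughly
like this after the patch (paraphrased from memory, not a verbatim
quote of arm.c):

  /* kvm_arch_vcpu_ioctl_vcpu_init(), sketch */
  if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
  	kvm_arm_vcpu_power_off(vcpu);
  else
  	vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;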

--
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 03/19] KVM: arm64: Reject invalid addresses for CPU_ON PSCI call
  2022-02-24 19:21       ` Oliver Upton
@ 2022-02-25 15:35         ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-25 15:35 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Thu, 24 Feb 2022 19:21:50 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Hi Marc,
> 
> On Thu, Feb 24, 2022 at 12:30:49PM +0000, Marc Zyngier wrote:
> > On Wed, 23 Feb 2022 04:18:28 +0000,
> > Oliver Upton <oupton@google.com> wrote:
> > > 
> > > DEN0022D.b 5.6.2 "Caller responsibilities" states that a PSCI
> > > implementation may return INVALID_ADDRESS for the CPU_ON call if the
> > > provided entry address is known to be invalid. There is an additional
> > > caveat to this rule. Prior to PSCI v1.0, the INVALID_PARAMETERS error
> > > is returned instead. Check the guest's PSCI version and return the
> > > appropriate error if the IPA is invalid.
> > > 
> > > Reported-by: Reiji Watanabe <reijiw@google.com>
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > ---
> > >  arch/arm64/kvm/psci.c | 24 ++++++++++++++++++++++--
> > >  1 file changed, 22 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > index a0c10c11f40e..de1cf554929d 100644
> > > --- a/arch/arm64/kvm/psci.c
> > > +++ b/arch/arm64/kvm/psci.c
> > > @@ -12,6 +12,7 @@
> > >  
> > >  #include <asm/cputype.h>
> > >  #include <asm/kvm_emulate.h>
> > > +#include <asm/kvm_mmu.h>
> > >  
> > >  #include <kvm/arm_psci.h>
> > >  #include <kvm/arm_hypercalls.h>
> > > @@ -70,12 +71,31 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > >  	struct vcpu_reset_state *reset_state;
> > >  	struct kvm *kvm = source_vcpu->kvm;
> > >  	struct kvm_vcpu *vcpu = NULL;
> > > -	unsigned long cpu_id;
> > > +	unsigned long cpu_id, entry_addr;
> > >  
> > >  	cpu_id = smccc_get_arg1(source_vcpu);
> > >  	if (!kvm_psci_valid_affinity(source_vcpu, cpu_id))
> > >  		return PSCI_RET_INVALID_PARAMS;
> > >  
> > > +	/*
> > > +	 * Basic sanity check: ensure the requested entry address actually
> > > +	 * exists within the guest's address space.
> > > +	 */
> > > +	entry_addr = smccc_get_arg2(source_vcpu);
> > > +	if (!kvm_ipa_valid(kvm, entry_addr)) {
> > > +
> > > +		/*
> > > +		 * Before PSCI v1.0, the INVALID_PARAMETERS error is returned
> > > +		 * instead of INVALID_ADDRESS.
> > > +		 *
> > > +		 * For more details, see ARM DEN0022D.b 5.6 "CPU_ON".
> > > +		 */
> > > +		if (kvm_psci_version(source_vcpu) < KVM_ARM_PSCI_1_0)
> > > +			return PSCI_RET_INVALID_PARAMS;
> > > +		else
> > > +			return PSCI_RET_INVALID_ADDRESS;
> > > +	}
> > > +
> > 
> > If you're concerned with this, should you also check for the PC
> > alignment, or the presence of a memslot covering the address you are
> > branching to?  The latter is particularly hard to implement reliably.
> 
> Andrew, Reiji and I had a conversation regarding exactly this on the
> last run of this series, and concluded that checking against the IPA is
> probably the best KVM can do [1]. That said, alignment is also an easy
> thing to check.

Until you look at Thumb-2 ;-)

> 
> > So far, my position has been that the guest is free to shoot itself in
> > the foot if that's what it wants to do, and that babysitting it was a
> > waste of useful bits! ;-)
> >
> 
> Agreed -- there are plenty of spectacular/hilarious ways in which the
> guest can mess up :-)
> 
> > Or have you identified something that makes it a requirement to handle
> > this case (and possibly others)  in the hypervisor?
> 
> It is a lot easier to tell a guest that their software is broken if they
> get an error back from the hypercall, whereas a vCPU off in the weeds
> might need to be looked at before concluding there's a guest issue.

Fair enough. I'm not fundamentally against this patch. It is just a
bit out of context in this series.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 06/19] KVM: arm64: Track vCPU power state using MP state values
  2022-02-24 22:08       ` Oliver Upton
@ 2022-02-25 15:37         ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-25 15:37 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Thu, 24 Feb 2022 22:08:15 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Hi Marc,
> 
> On Thu, Feb 24, 2022 at 01:25:04PM +0000, Marc Zyngier wrote:
> 
> [...]
> 
> > > @@ -190,7 +190,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
> > >  	 * re-initialized.
> > >  	 */
> > >  	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
> > > -		tmp->arch.power_off = true;
> > > +		tmp->arch.mp_state = KVM_MP_STATE_STOPPED;
> > >  	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
> > >  
> > >  	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
> > 
> > You also may want to initialise the mp_state to RUNNABLE by default in
> > kvm_arch_vcpu_create(). We are currently relying on power_off to be
> > false thanks to the vcpu struct being zeroed, but we may as well make
> > it clearer (RUNNABLE is also 0, so there is no actual bug here).
> 
> We unconditionally initialize power_off in
> kvm_arch_vcpu_ioctl_vcpu_init(), and do the same in this patch for mp_state,
> depending on if KVM_ARM_VCPU_POWER_OFF is set.

Ah, I missed that. Thanks for the heads up.

> Any objections to leaving that as-is? I can move the RUNNABLE case into
> kvm_arch_vcpu_create() as you've suggested, too.

No, that's just a brain fart on my part. Leave it as is.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  2022-02-24 19:35       ` Oliver Upton
@ 2022-02-25 18:58         ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-25 18:58 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Thu, 24 Feb 2022 19:35:33 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> Hi Marc,
> 
> Thanks for reviewing the series. ACK to the nits and smaller comments
> you've made; I'll incorporate that feedback in the next series.
> 
> On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote:
> > On Wed, 23 Feb 2022 04:18:34 +0000,
> > Oliver Upton <oupton@google.com> wrote:
> > > 
> > > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> > > software to request that a system be placed in the deepest possible
> > > low-power state. Effectively, software can use this to suspend itself to
> > > RAM. Note that the semantics of this PSCI call are very similar to
> > > CPU_SUSPEND, which is already implemented in KVM.
> > > 
> > > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> > > low-power state is implemented as a guest WFI. Synchronously reset the
> > > calling CPU before entering the WFI, such that the vCPU may immediately
> > > resume execution when a wakeup event is recognized.
> > > 
> > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > ---
> > >  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
> > >  arch/arm64/kvm/reset.c |  3 ++-
> > >  2 files changed, 53 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > index 77a00913cdfd..41adaaf2234a 100644
> > > --- a/arch/arm64/kvm/psci.c
> > > +++ b/arch/arm64/kvm/psci.c
> > > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> > >  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
> > >  }
> > >  
> > > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct vcpu_reset_state reset_state;
> > > +	struct kvm *kvm = vcpu->kvm;
> > > +	struct kvm_vcpu *tmp;
> > > +	bool denied = false;
> > > +	unsigned long i;
> > > +
> > > +	reset_state.pc = smccc_get_arg1(vcpu);
> > > +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> > > +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> > > +		return 1;
> > > +	}
> > > +
> > > +	reset_state.r0 = smccc_get_arg2(vcpu);
> > > +	reset_state.be = kvm_vcpu_is_be(vcpu);
> > > +	reset_state.reset = true;
> > > +
> > > +	/*
> > > +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> > > +	 * calling vCPU) be in an OFF state, as determined by the
> > > +	 * implementation.
> > > +	 *
> > > +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> > > +	 */
> > > +	mutex_lock(&kvm->lock);
> > > +	kvm_for_each_vcpu(i, tmp, kvm) {
> > > +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> > > +			denied = true;
> > > +			break;
> > > +		}
> > > +	}
> > > +	mutex_unlock(&kvm->lock);
> > 
> > This looks dodgy. Nothing seems to prevent userspace from setting the
> > mp_state to RUNNING in parallel with this, as only the vcpu mutex is
> > held when this ioctl is issued.
> > 
> > It looks to me that what you want is what lock_all_vcpus() does
> > (Alexandru has a patch moving it out of the vgic code as part of his
> > SPE series).
> > 
> > It is also pretty unclear what the interaction with userspace is once
> > you have released the lock. If the VMM starts a vcpu other than the
> > suspending one, what is its state? The spec doesn't seem to help
> > here. I can see two options:
> > 
> > - either all the vcpus have the same reset state applied to them as
> >   they come up, unless they are started with CPU_ON by a vcpu that has
> >   already booted (but there is a single 'context_id' provided, and I
> >   fear this is going to confuse the OS)...
> > 
> > - or only the suspending vcpu can resume the system, and we must fail
> >   a change of mp_state for the other vcpus.
> > 
> > What do you think?
> 
> Definitely the latter. The documentation of SYSTEM_SUSPEND is quite
> shaky on this, but it would appear that the intention is for the caller
> to be the first CPU to wake up.

Yup. We now have clarification on the intent of the spec (only the
caller CPU can resume the system), and this needs to be tightened.

> 
> > > +
> > > +	if (denied) {
> > > +		smccc_set_retval(vcpu, PSCI_RET_DENIED, 0, 0, 0);
> > > +		return 1;
> > > +	}
> > > +
> > > +	__kvm_reset_vcpu(vcpu, &reset_state);
> > > +	kvm_vcpu_wfi(vcpu);
> > 
> > I have mixed feelings about this. The vcpu has reset before being in
> > WFI, while it really should be the other way around and userspace
> > could rely on observing the transition.
> > 
> > What breaks if you change this?
> 
> I don't think that userspace would be able to observe the transition
> even if we WFI before the reset.

I disagree. At any point, userspace can issue a signal which would
trigger a return from WFI and an exit to userspace, and I don't think
this should result in a reset being observed.

This also means that SYSTEM_SUSPEND must be robust wrt signal
delivery, which it doesn't seem to be.

> I imagine that would take the form
> of setting KVM_REQ_VCPU_RESET, which we explicitly handle before
> letting userspace access the vCPU's state as of commit
> 6826c6849b46 ("KVM: arm64: Handle PSCI resets before userspace
> touches vCPU state").

In that case, the vcpu is ready to run, and is not blocked by
anything, so this is quite different.

>
> Given this, I felt it was probably best to avoid all the indirection and
> just do the vCPU reset in the handling of SYSTEM_SUSPEND. It does,
> however, imply that we have slightly different behavior when userspace
> exits are enabled, as that will happen pre-reset and pre-WFI.

And that's exactly the sort of behaviour I'd like to avoid if at all
possible. But maybe we don't need to support the standalone version
that doesn't involve userspace?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
  2022-02-24 20:05       ` Oliver Upton
@ 2022-02-26 11:29         ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-02-26 11:29 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Thu, 24 Feb 2022 20:05:59 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> On Thu, Feb 24, 2022 at 03:40:15PM +0000, Marc Zyngier wrote:
> > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > index 2bb8d047cde4..a7de84cec2e4 100644
> > > --- a/arch/arm64/kvm/psci.c
> > > +++ b/arch/arm64/kvm/psci.c
> > > @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > >  		return 1;
> > >  	}
> > >  
> > > +	if (kvm->arch.system_suspend_exits) {
> > > +		kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> > > +		return 0;
> > > +	}
> > > +
> > 
> > So there really is a difference in behaviour here. Userspace sees the
> > WFI behaviour before reset (it implements it), while when not using
> > the SUSPEND event, reset occurs before anything else.
> > 
> > They really should behave in a similar way (WFI first, reset next).
> 
> I mentioned this on the other patch, but I think the conversation should
> continue here as UAPI context is in this one.
> 
> If SUSPEND exits are disabled and SYSTEM_SUSPEND is implemented in the
> kernel, userspace cannot observe any intermediate state. I think it is
> necessary for migration, otherwise if userspace were to save the vCPU
> post-WFI, pre-reset the pending reset would get lost along the way.
> 
> As far as userspace is concerned, I think the WFI+reset operation is
> atomic. SUSPEND exits just allow userspace to intervene before said
> atomic operation.
>
> Perhaps I'm missing something: assuming SUSPEND exits are disabled, what
> value is provided to userspace if it can see WFI behavior before the
> reset?

Signals get in the way, and break the notion of atomicity. Userspace
*will* observe this.

I agree that save/restore is an important point, and that snapshotting
the guest at this stage should capture the reset value. But it is the
asymmetry of the behaviours that I find jarring:

- if you ask for userspace exit, no reset value is applied and you
  need to implement the reset in userspace

- if you *don't* ask for a userspace exit, the reset values are
  applied, and a signal while in WFI will result in this reset being
  observed

Why can't the userspace exit path also apply the reset values *before*
exiting? After all, you can model this exit to userspace as
reset+WFI+'spurious exit from WFI'. This would at least unify the two
behaviours.
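
Concretely, something like this in kvm_psci_system_suspend()
(completely untested, just to spell out the ordering I have in mind):

	if (kvm->arch.system_suspend_exits) {
		__kvm_reset_vcpu(vcpu, &reset_state);
		kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
		return 0;
	}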

I still dislike the reset state being applied early, but consistency
(and save/restore) trumps taste here. I know I'm being pedantic here,
but we've been burned with loosely defined semantics in the past, and
I want to get this right. Or less wrong.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
  2022-02-26 11:29         ` Marc Zyngier
@ 2022-02-26 18:28           ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-02-26 18:28 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Wanpeng Li, kvm, Joerg Roedel, Peter Shier, kvm-riscv,
	Atish Patra, Paolo Bonzini, Vitaly Kuznetsov, kvmarm,
	Jim Mattson

On Sat, Feb 26, 2022 at 3:29 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Thu, 24 Feb 2022 20:05:59 +0000,
> Oliver Upton <oupton@google.com> wrote:
> >
> > On Thu, Feb 24, 2022 at 03:40:15PM +0000, Marc Zyngier wrote:
> > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > > index 2bb8d047cde4..a7de84cec2e4 100644
> > > > --- a/arch/arm64/kvm/psci.c
> > > > +++ b/arch/arm64/kvm/psci.c
> > > > @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > >           return 1;
> > > >   }
> > > >
> > > > + if (kvm->arch.system_suspend_exits) {
> > > > +         kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> > > > +         return 0;
> > > > + }
> > > > +
> > >
> > > So there really is a difference in behaviour here. Userspace sees the
> > > WFI behaviour before reset (it implements it), while when not using
> > > the SUSPEND event, reset occurs before anything else.
> > >
> > > They really should behave in a similar way (WFI first, reset next).
> >
> > I mentioned this on the other patch, but I think the conversation should
> > continue here as UAPI context is in this one.
> >
> > If SUSPEND exits are disabled and SYSTEM_SUSPEND is implemented in the
> > kernel, userspace cannot observe any intermediate state. I think it is
> > necessary for migration, otherwise if userspace were to save the vCPU
> > post-WFI, pre-reset the pending reset would get lost along the way.
> >
> > As far as userspace is concerned, I think the WFI+reset operation is
> > atomic. SUSPEND exits just allow userspace to intervene before said
> > atomic operation.
> >
> > Perhaps I'm missing something: assuming SUSPEND exits are disabled, what
> > value is provided to userspace if it can see WFI behavior before the
> > reset?
>
> Signals get in the way, and break the notion of atomicity. Userspace
> *will* observe this.
>
> > I agree that save/restore is an important point, and that snapshotting
> the guest at this stage should capture the reset value. But it is the
> asymmetry of the behaviours that I find jarring:
>
> - if you ask for userspace exit, no reset value is applied and you
>   need to implement the reset in userspace
>
> - if you *don't* ask for a userspace exit, the reset values are
>   applied, and a signal while in WFI will result in this reset being
>   observed
>
> Why can't the userspace exit path also apply the reset values *before*
> exiting? After all, you can model this exit to userspace as
> reset+WFI+'spurious exit from WFI'. This would at least unify the two
> behaviours.

I hesitated applying the reset context to the CPU before the userspace
exit because that would be wildly different from the other system
events. Userspace wouldn't have much choice but to comply with the
guest request at that point.

What about adopting the following:

 - Drop the in-kernel SYSTEM_SUSPEND emulation. I think you were
getting at this point in [1], and I'd certainly be open to it. Without
a userspace exit, I don't think there is anything meaningfully
different between this call and a WFI instruction.

 - Add data to the kvm_run structure to convey the reset state for a
SYSTEM_SUSPEND exit. There's plenty of room left in the structure for
more, and can be done generically (just an array of data) for future
expansion. We already are going to need a code change in userspace to
do this right, so may as well update its view of kvm_run along the
way.

 - Exit to userspace with PSCI_RET_INTERNAL_FAILURE queued up for the
guest. Doing so keeps the exits consistent with the other system
exits, and affords userspace the ability to deny the call when it
wants to.

[1]: http://lore.kernel.org/r/87fso63ha2.wl-maz@kernel.org
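
To make the second point above a bit more concrete, the rough shape I
have in mind is something like the below. Purely illustrative: the
field names are invented for this sketch and nothing of the sort is in
the series as posted (today's system_event member only carries a type
and flags).

/* Hypothetical layout, for discussion only. */
struct {
	__u32 type;	/* e.g. KVM_SYSTEM_EVENT_SUSPEND */
	__u32 ndata;	/* number of valid entries in data[] */
	__u64 data[16];	/* e.g. data[0] = entry PC, data[1] = context ID */
} system_event;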

> I still dislike the reset state being applied early, but consistency
> (and save/restore) trumps taste here. I know I'm being pedantic here,
> but we've been burned with loosely defined semantics in the past, and
> I want to get this right. Or less wrong.

I completely agree with you. The semantics are a bit funky, and I
really do wonder if the easiest way around that is to just make the
implementation a userspace problem.

--
Oliver

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
  2022-02-26 18:28           ` Oliver Upton
@ 2022-03-02  9:52             ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-03-02  9:52 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Sat, 26 Feb 2022 18:28:21 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
> On Sat, Feb 26, 2022 at 3:29 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > On Thu, 24 Feb 2022 20:05:59 +0000,
> > Oliver Upton <oupton@google.com> wrote:
> > >
> > > On Thu, Feb 24, 2022 at 03:40:15PM +0000, Marc Zyngier wrote:
> > > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > > > index 2bb8d047cde4..a7de84cec2e4 100644
> > > > > --- a/arch/arm64/kvm/psci.c
> > > > > +++ b/arch/arm64/kvm/psci.c
> > > > > @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > > >           return 1;
> > > > >   }
> > > > >
> > > > > + if (kvm->arch.system_suspend_exits) {
> > > > > +         kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> > > > > +         return 0;
> > > > > + }
> > > > > +
> > > >
> > > > So there really is a difference in behaviour here. Userspace sees the
> > > > WFI behaviour before reset (it implements it), while when not using
> > > > the SUSPEND event, reset occurs before anything else.
> > > >
> > > > They really should behave in a similar way (WFI first, reset next).
> > >
> > > I mentioned this on the other patch, but I think the conversation should
> > > continue here as UAPI context is in this one.
> > >
> > > If SUSPEND exits are disabled and SYSTEM_SUSPEND is implemented in the
> > > kernel, userspace cannot observe any intermediate state. I think it is
> > > necessary for migration, otherwise if userspace were to save the vCPU
> > > post-WFI, pre-reset the pending reset would get lost along the way.
> > >
> > > As far as userspace is concerned, I think the WFI+reset operation is
> > > atomic. SUSPEND exits just allow userspace to intervene before said
> > > atomic operation.
> > >
> > > Perhaps I'm missing something: assuming SUSPEND exits are disabled, what
> > > value is provided to userspace if it can see WFI behavior before the
> > > reset?
> >
> > Signals get in the way, and break the notion of atomicity. Userspace
> > *will* observe this.
> >
> > I agree that save/restore is an important point, and that snapshotting
> > the guest at this stage should capture the reset value. But it is the
> > asymmetry of the behaviours that I find jarring:
> >
> > - if you ask for userspace exit, no reset value is applied and you
> >   need to implement the reset in userspace
> >
> > - if you *don't* ask for a userspace exit, the reset values are
> >   applied, and a signal while in WFI will result in this reset being
> >   observed
> >
> > Why can't the userspace exit path also apply the reset values *before*
> > exiting? After all, you can model this exit to userspace as
> > reset+WFI+'spurious exit from WFI'. This would at least unify the two
> > behaviours.
> 
> I hesitated applying the reset context to the CPU before the userspace
> exit because that would be wildly different from the other system
> events. Userspace wouldn't have much choice but to comply with the
> guest request at that point.
> 
> What about adopting the following:
> 
>  - Drop the in-kernel SYSTEM_SUSPEND emulation. I think you were
> getting at this point in [1], and I'd certainly be open to it. Without
> a userspace exit, I don't think there is anything meaningfully
> different between this call and a WFI instruction.

The only difference is the reset part. And I agree, it only makes the
kernel part more complicated than we strictly need it to be. It also
slightly clashes with the rest of the system events, in the sense that
it is the only one that would have an in-kernel implementation (both
reboot and power-off are entirely implemented in userspace).

So I definitely agree about dropping this.

> 
>  - Add data to the kvm_run structure to convey the reset state for a
> SYSTEM_SUSPEND exit. There's plenty of room left in the structure for
> more, and can be done generically (just an array of data) for future
> expansion. We already are going to need a code change in userspace to
> do this right, so may as well update its view of kvm_run along the
> way.

The reset state is already available in the guest registers, which are
available to userspace. What else do we need to expose?

>  - Exit to userspace with PSCI_RET_INTERNAL_FAILURE queued up for the
> guest. Doing so keeps the exits consistent with the other system
> exits, and affords userspace the ability to deny the call when it
> wants to.

Yup, that's what I like about pushing this completely to userspace.

> 
> [1]: http://lore.kernel.org/r/87fso63ha2.wl-maz@kernel.org
> 
> > I still dislike the reset state being applied early, but consistency
> > (and save/restore) trumps taste here. I know I'm being pedantic here,
> > but we've been burned with loosely defined semantics in the past, and
> > I want to get this right. Or less wrong.
> 
> I completely agree with you. The semantics are a bit funky, and I
> really do wonder if the easiest way around that is to just make the
> implementation a userspace problem.

We're in violent agreement. It means that we only need the MP_STATE
part to implement WFI from userspace.

Could you try and respin this? Also, it'd be good to see a prototype
of userspace code using this, as this is a new API.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND
  2022-03-02  9:52             ` Marc Zyngier
@ 2022-03-02  9:57               ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-03-02  9:57 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Wanpeng Li, kvm, Joerg Roedel, Peter Shier, kvm-riscv,
	Atish Patra, Paolo Bonzini, Vitaly Kuznetsov, kvmarm,
	Jim Mattson

On Wed, Mar 2, 2022 at 1:52 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On Sat, 26 Feb 2022 18:28:21 +0000,
> Oliver Upton <oupton@google.com> wrote:
> >
> > On Sat, Feb 26, 2022 at 3:29 AM Marc Zyngier <maz@kernel.org> wrote:
> > >
> > > On Thu, 24 Feb 2022 20:05:59 +0000,
> > > Oliver Upton <oupton@google.com> wrote:
> > > >
> > > > On Thu, Feb 24, 2022 at 03:40:15PM +0000, Marc Zyngier wrote:
> > > > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > > > > index 2bb8d047cde4..a7de84cec2e4 100644
> > > > > > --- a/arch/arm64/kvm/psci.c
> > > > > > +++ b/arch/arm64/kvm/psci.c
> > > > > > @@ -245,6 +245,11 @@ static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > > > >           return 1;
> > > > > >   }
> > > > > >
> > > > > > + if (kvm->arch.system_suspend_exits) {
> > > > > > +         kvm_vcpu_set_system_event_exit(vcpu, KVM_SYSTEM_EVENT_SUSPEND);
> > > > > > +         return 0;
> > > > > > + }
> > > > > > +
> > > > >
> > > > > So there really is a difference in behaviour here. Userspace sees the
> > > > > WFI behaviour before reset (it implements it), while when not using
> > > > > the SUSPEND event, reset occurs before anything else.
> > > > >
> > > > > They really should behave in a similar way (WFI first, reset next).
> > > >
> > > > I mentioned this on the other patch, but I think the conversation should
> > > > continue here as UAPI context is in this one.
> > > >
> > > > If SUSPEND exits are disabled and SYSTEM_SUSPEND is implemented in the
> > > > kernel, userspace cannot observe any intermediate state. I think it is
> > > > necessary for migration, otherwise if userspace were to save the vCPU
> > > > post-WFI, pre-reset the pending reset would get lost along the way.
> > > >
> > > > As far as userspace is concerned, I think the WFI+reset operation is
> > > > atomic. SUSPEND exits just allow userspace to intervene before said
> > > > atomic operation.
> > > >
> > > > Perhaps I'm missing something: assuming SUSPEND exits are disabled, what
> > > > value is provided to userspace if it can see WFI behavior before the
> > > > reset?
> > >
> > > Signals get in the way, and break the notion of atomicity. Userspace
> > > *will* observe this.
> > >
> > > I agree that save/restore is an important point, and that snapshotting
> > > the guest at this stage should capture the reset value. But it is the
> > > asymmetry of the behaviours that I find jarring:
> > >
> > > - if you ask for userspace exit, no reset value is applied and you
> > >   need to implement the reset in userspace
> > >
> > > - if you *don't* ask for a userspace exit, the reset values are
> > >   applied, and a signal while in WFI will result in this reset being
> > >   observed
> > >
> > > Why can't the userspace exit path also apply the reset values *before*
> > > exiting? After all, you can model this exit to userspace as
> > > reset+WFI+'spurious exit from WFI'. This would at least unify the two
> > > behaviours.
> >
> > I hesitated applying the reset context to the CPU before the userspace
> > exit because that would be wildly different from the other system
> > events. Userspace wouldn't have much choice but to comply with the
> > guest request at that point.
> >
> > What about adopting the following:
> >
> >  - Drop the in-kernel SYSTEM_SUSPEND emulation. I think you were
> > getting at this point in [1], and I'd certainly be open to it. Without
> > a userspace exit, I don't think there is anything meaningfully
> > different between this call and a WFI instruction.
>
> The only difference is the reset part. And I agree, it only makes the
> kernel part more complicated than we strictly need it to be. It also
> slightly clashes with the rest of the system events, in the sense that
> it is the only one that would have an in-kernel implementation (both
> reboot and power-off are entirely implemented in userspace).
>
> So I definitely agree about dropping this.
>
> >
> >  - Add data to the kvm_run structure to convey the reset state for a
> > SYSTEM_SUSPEND exit. There's plenty of room left in the structure for
> > more, and can be done generically (just an array of data) for future
> > expansion. We already are going to need a code change in userspace to
> > do this right, so may as well update its view of kvm_run along the
> > way.
>
> The reset state is already available in the guest registers, which are
> available to userspace. What else do we need to expose?

Nothing. It is just a slight nitnoid thing for me where
KVM_EXIT_SYSTEM_SUSPEND behaves a bit differently than the others. If
a VMM wants to reject the call, it needs to manually set up the SMCCC
return value, whereas on the others a naive call to KVM_RUN will do
the job since KVM already sets up the failure value.

Unsure if this warrants a kvm_run change, leaning towards no if it is
well documented.
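
Spelling out what "manually set up the SMCCC return value" means for a
VMM, roughly (untested sketch; ARM64_CORE_REG() below mirrors the
helper from the KVM selftests rather than a uapi macro):

#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>
#include <linux/psci.h>

#define ARM64_CORE_REG(x)						\
	(KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE |		\
	 KVM_REG_ARM_CORE_REG(x))

/* Deny the SYSTEM_SUSPEND call by writing the PSCI error into x0. */
static int reject_system_suspend(int vcpu_fd)
{
	__u64 x0 = (__u64)(__s64)PSCI_RET_DENIED;
	struct kvm_one_reg reg = {
		.id	= ARM64_CORE_REG(regs.regs[0]),
		.addr	= (__u64)&x0,
	};

	/* The next KVM_RUN resumes the guest past the SMC/HVC. */
	return ioctl(vcpu_fd, KVM_SET_ONE_REG, &reg);
}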

> >  - Exit to userspace with PSCI_RET_INTERNAL_FAILURE queued up for the
> > guest. Doing so keeps the exits consistent with the other system
> > exits, and affords userspace the ability to deny the call when it
> > wants to.
>
> Yup, that's what I like about pushing this completely to userspace.
>
> >
> > [1]: http://lore.kernel.org/r/87fso63ha2.wl-maz@kernel.org
> >
> > > I still dislike the reset state being applied early, but consistency
> > > (and save/restore) trumps taste here. I know I'm being pedantic here,
> > > but we've been burned with loosely defined semantics in the past, and
> > > I want to get this right. Or less wrong.
> >
> > I completely agree with you. The semantics are a bit funky, and I
> > really do wonder if the easiest way around that is to just make the
> > implementation a userspace problem.
>
> We're in violent agreement.

Lol

> It means that we only need the MP_STATE
> part to implement WFI from userspace.
>
> Could you try and respin this? Also, it'd be good to see a prototype
> of userspace code using this, as this is a new API.

Sure thing. I'll keep it to kvmtool, since that's the most familiar to
me. Also, I think I had an RFC for kvmtool many moons ago...
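
The core of the prototype would presumably boil down to the MP_STATE
dance (hand-wavy sketch, not actual kvmtool code; error handling and
the reset/denial of the call are elided):

#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * On a KVM_EXIT_SYSTEM_EVENT of type KVM_SYSTEM_EVENT_SUSPEND: park
 * the caller. Each subsequent KVM_RUN exits back to userspace on a
 * pending wakeup event until the vCPU is made runnable again.
 */
static void vcpu_suspend(int vcpu_fd)
{
	struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_SUSPENDED };

	ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
}

/*
 * Once the VMM decides the wakeup event should resume the VM (or
 * after restoring a saved guest), unpark the caller.
 */
static void vcpu_resume(int vcpu_fd)
{
	struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_RUNNABLE };

	ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
}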

--
Oliver

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  2022-02-25 18:58         ` Marc Zyngier
@ 2022-03-03  1:01           ` Oliver Upton
  -1 siblings, 0 replies; 94+ messages in thread
From: Oliver Upton @ 2022-03-03  1:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Fri, Feb 25, 2022 at 06:58:13PM +0000, Marc Zyngier wrote:
> On Thu, 24 Feb 2022 19:35:33 +0000,
> Oliver Upton <oupton@google.com> wrote:
> > 
> > Hi Marc,
> > 
> > Thanks for reviewing the series. ACK to the nits and smaller comments
> > you've made, I'll incorporate that feedback in the next series.
> > 
> > On Thu, Feb 24, 2022 at 02:02:34PM +0000, Marc Zyngier wrote:
> > > On Wed, 23 Feb 2022 04:18:34 +0000,
> > > Oliver Upton <oupton@google.com> wrote:
> > > > 
> > > > ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
> > > > software to request that a system be placed in the deepest possible
> > > > low-power state. Effectively, software can use this to suspend itself to
> > > > RAM. Note that the semantics of this PSCI call are very similar to
> > > > CPU_SUSPEND, which is already implemented in KVM.
> > > > 
> > > > Implement the SYSTEM_SUSPEND in KVM. Similar to CPU_SUSPEND, the
> > > > low-power state is implemented as a guest WFI. Synchronously reset the
> > > > calling CPU before entering the WFI, such that the vCPU may immediately
> > > > resume execution when a wakeup event is recognized.
> > > > 
> > > > Signed-off-by: Oliver Upton <oupton@google.com>
> > > > ---
> > > >  arch/arm64/kvm/psci.c  | 51 ++++++++++++++++++++++++++++++++++++++++++
> > > >  arch/arm64/kvm/reset.c |  3 ++-
> > > >  2 files changed, 53 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> > > > index 77a00913cdfd..41adaaf2234a 100644
> > > > --- a/arch/arm64/kvm/psci.c
> > > > +++ b/arch/arm64/kvm/psci.c
> > > > @@ -208,6 +208,50 @@ static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
> > > >  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET);
> > > >  }
> > > >  
> > > > +static int kvm_psci_system_suspend(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	struct vcpu_reset_state reset_state;
> > > > +	struct kvm *kvm = vcpu->kvm;
> > > > +	struct kvm_vcpu *tmp;
> > > > +	bool denied = false;
> > > > +	unsigned long i;
> > > > +
> > > > +	reset_state.pc = smccc_get_arg1(vcpu);
> > > > +	if (!kvm_ipa_valid(kvm, reset_state.pc)) {
> > > > +		smccc_set_retval(vcpu, PSCI_RET_INVALID_ADDRESS, 0, 0, 0);
> > > > +		return 1;
> > > > +	}
> > > > +
> > > > +	reset_state.r0 = smccc_get_arg2(vcpu);
> > > > +	reset_state.be = kvm_vcpu_is_be(vcpu);
> > > > +	reset_state.reset = true;
> > > > +
> > > > +	/*
> > > > +	 * The SYSTEM_SUSPEND PSCI call requires that all vCPUs (except the
> > > > +	 * calling vCPU) be in an OFF state, as determined by the
> > > > +	 * implementation.
> > > > +	 *
> > > > +	 * See ARM DEN0022D, 5.19 "SYSTEM_SUSPEND" for more details.
> > > > +	 */
> > > > +	mutex_lock(&kvm->lock);
> > > > +	kvm_for_each_vcpu(i, tmp, kvm) {
> > > > +		if (tmp != vcpu && !kvm_arm_vcpu_powered_off(tmp)) {
> > > > +			denied = true;
> > > > +			break;
> > > > +		}
> > > > +	}
> > > > +	mutex_unlock(&kvm->lock);
> > > 
> > > This looks dodgy. Nothing seems to prevent userspace from setting the
> > > mp_state to RUNNING in parallel with this, as only the vcpu mutex is
> > > held when this ioctl is issued.
> > > 
> > > It looks to me that what you want is what lock_all_vcpus() does
> > > (Alexandru has a patch moving it out of the vgic code as part of his
> > > SPE series).
> > > 
> > > It is also pretty unclear what the interaction with userspace is once
> > > you have released the lock. If the VMM starts a vcpu other than the
> > > suspending one, what is its state? The spec doesn't seem to help
> > > here. I can see two options:
> > > 
> > > - either all the vcpus have the same reset state applied to them as
> > >   they come up, unless they are started with CPU_ON by a vcpu that has
> > >   already booted (but there is a single 'context_id' provided, and I
> > >   fear this is going to confuse the OS)...
> > > 
> > > - or only the suspending vcpu can resume the system, and we must fail
> > >   a change of mp_state for the other vcpus.
> > > 
> > > What do you think?
> > 
> > Definitely the latter. The documentation of SYSTEM_SUSPEND is quite
> > shaky on this, but it would appear that the intention is for the caller
> > to be the first CPU to wake up.
> 
> Yup. We now have clarification on the intent of the spec (only the
> caller CPU can resume the system), and this needs to be tightened.
> 

I'm beginning to wonder if the VMM/KVM split implementation of
system-scoped PSCI calls can ever be right. There exists a critical
section in all system-wide PSCI calls that currently spans an exit to
userspace. I cannot devise a sane way to guard such a critical section
when we are returning control to userspace.

For example, KVM offlines all of the CPUs except for the exiting CPU
when handling SYSTEM_RESET or SYSTEM_OFF, but nothing prevents an
interleaving KVM_ARM_VCPU_INIT or KVM_SET_MP_STATE from disturbing the
state of the VM. Couldn't even say its a userspace bug, either, because
a different vCPU could do something before the caller has exited. Even
if we grab all the vCPU mutexes, we'd need to drop them before exiting
to userspace.

If userspace decides to reject the PSCI call, we're giving control
back to the guest in a wildly different state than it had making the
PSCI call. Again, the PSCI spec is vague on this matter, but I believe
the intuitive answer is that we should not change the VM state if the call
is rejected. This could upset an otherwise well-behaved KVM guest.

Doing SYSTEM_SUSPEND in userspace is better, as KVM avoids mucking with
the VM state before the PSCI call is actually accepted. However, any of
the consistency checks in the kernel for SYSTEM_SUSPEND are entirely
moot. Anything can happen between the exit to userspace and the moment
userspace actually recognizes the SYSTEM_SUSPEND call on the exiting
CPU.

KVM rejecting attempts to resume vCPUs besides the caller will break
a correct userspace, given the inherent race that crops up when exiting.
Blocking attempts to resume other vCPUs could have unintended
consequences as well. It seems that we'd need to prevent
KVM_ARM_VCPU_INIT calls as well as KVM_SET_MP_STATE, even though the
former could be used in a valid SYSTEM_SUSPEND implementation.

I really do hate to go back to the drawing board on the PSCI stuff
again, but there seems to be a fundamental issue in how system-scoped
calls are handled. Userspace is probably the only place where we could
quiesce the VM state, assess if the PSCI call should be accepted, and
change the VM state.

Do you think all of this is an issue as well?

--
Oliver

^ permalink raw reply	[flat|nested] 94+ messages in thread

* Re: [PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND
  2022-03-03  1:01           ` Oliver Upton
@ 2022-03-03 11:37             ` Marc Zyngier
  -1 siblings, 0 replies; 94+ messages in thread
From: Marc Zyngier @ 2022-03-03 11:37 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Paolo Bonzini, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Anup Patel, Atish Patra, Sean Christopherson,
	Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	kvm-riscv, Peter Shier, Reiji Watanabe, Ricardo Koller,
	Raghavendra Rao Ananta, Jing Zhang

On Thu, 03 Mar 2022 01:01:40 +0000,
Oliver Upton <oupton@google.com> wrote:
> 
>
> I'm beginning to wonder if the VMM/KVM split implementation of
> system-scoped PSCI calls can ever be right. There exists a critical
> section in all system-wide PSCI calls that currently spans an exit to
> userspace. I cannot devise a sane way to guard such a critical section
> when we are returning control to userspace.
> 
> For example, KVM offlines all of the CPUs except for the exiting CPU
> when handling SYSTEM_RESET or SYSTEM_OFF, but nothing prevents an
> interleaving KVM_ARM_VCPU_INIT or KVM_SET_MP_STATE from disturbing the
> state of the VM. Couldn't even say it's a userspace bug, either, because
> a different vCPU could do something before the caller has exited. Even
> if we grab all the vCPU mutexes, we'd need to drop them before exiting
> to userspace.
> 
> If userspace decides to reject the PSCI call, we're giving control
> back to the guest in a wildly different state than it had making the
> PSCI call. Again, the PSCI spec is vague on this matter, but I believe
> the intuitive answer is that we should not change the VM state if the call
> is rejected. This could upset an otherwise well-behaved KVM guest.

Sure. But this is the equivalent of buggy firmware/hardware, and a
failing PSCI reboot is likely to have had destructive effects. Is it
nice? Absolutely not. Is it a problem in practice? It hasn't been one
in the 10+ years this API has been implemented.

The alternative is to be able to forward all the PSCI events to
userspace and let it deal with it. It has long been at the back of my
mind to allow userspace to request ranges of hypercalls to be
forwarded directly, without any in-kernel handling. I'm all for it,
but this must be a buy-in from the VMM.

> Doing SYSTEM_SUSPEND in userspace is better, as KVM avoids mucking with
> the VM state before the PSCI call is actually accepted. However, any of
> the consistency checks in the kernel for SYSTEM_SUSPEND are entirely
> moot. Anything can happen between the exit to userspace and the moment
> userspace actually recognizes the SYSTEM_SUSPEND call on the exiting
> CPU.

I agree. Maybe we just don't do any of these checks and only exit to
userspace on the calling vcpu. It then becomes the responsibility of
userspace to take the other vcpus out of the kernel and change their
state if required.
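
If the VMM wants to keep the spec's "all other CPUs are OFF" check, it
is trivial to do on its side, something like (rough sketch, not from
the series; assumes the VMM has already kicked the other vCPU threads
out of KVM_RUN):

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static bool other_vcpus_are_off(int *vcpu_fds, int nr_vcpus, int caller)
{
	struct kvm_mp_state mp;
	int i;

	for (i = 0; i < nr_vcpus; i++) {
		if (i == caller)
			continue;
		/* Powered-off vCPUs report KVM_MP_STATE_STOPPED. */
		if (ioctl(vcpu_fds[i], KVM_GET_MP_STATE, &mp) ||
		    mp.mp_state != KVM_MP_STATE_STOPPED)
			return false;
	}

	return true;
}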

> 
> KVM rejecting attempts to resume vCPUs besides the caller will break
> a correct userspace, given the inherent race that crops up when exiting.
> Blocking attempts to resume other vCPUs could have unintended
> consequences as well. It seems that we'd need to prevent
> KVM_ARM_VCPU_INIT calls as well as KVM_SET_MP_STATE, even though the
> former could be used in a valid SYSTEM_SUSPEND implementation.

I don't think we need to enforce this if we leave suspend entirely to
userspace. At the end of the day, we rely on the VMM not to screw up
the guest. If the VMM restarts the wrong vcpu, that's bad behaviour,
but there are a million other ways for the VMM to mess the guest up.

> I really do hate to go back to the drawing board on the PSCI stuff
> again, but there seems to be a fundamental issue in how system-scoped
> calls are handled. Userspace is probably the only place where we could
> quiesce the VM state, assess if the PSCI call should be accepted, and
> change the VM state.
>
> Do you think all of this is an issue as well?

I don't think we should worry too much about the other system events.
They are now ABI, and changing them is tricky. For suspend, I think
punting the whole thing to userspace is doable. Otherwise, the
alternative is to implement full userspace PSCI support, which is
going to be a lot of work (and a lot of ABI discussions...).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 94+ messages in thread

Thread overview: 94+ messages
2022-02-23  4:18 [PATCH v3 00/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND Oliver Upton
2022-02-23  4:18 ` [PATCH v3 01/19] KVM: arm64: Drop unused param from kvm_psci_version() Oliver Upton
2022-02-24  6:14   ` Reiji Watanabe
2022-02-23  4:18 ` [PATCH v3 02/19] KVM: arm64: Create a helper to check if IPA is valid Oliver Upton
2022-02-24  6:32   ` Reiji Watanabe
2022-02-24 12:06   ` Marc Zyngier
2022-02-23  4:18 ` [PATCH v3 03/19] KVM: arm64: Reject invalid addresses for CPU_ON PSCI call Oliver Upton
2022-02-24  6:55   ` Reiji Watanabe
2022-02-24 12:30   ` Marc Zyngier
2022-02-24 19:21     ` Oliver Upton
2022-02-25 15:35       ` Marc Zyngier
2022-02-23  4:18 ` [PATCH v3 04/19] KVM: arm64: Clean up SMC64 PSCI filtering for AArch32 guests Oliver Upton
2022-02-23  4:18 ` [PATCH v3 05/19] KVM: arm64: Dedupe vCPU power off helpers Oliver Upton
2022-02-24  7:07   ` Reiji Watanabe
2022-02-23  4:18 ` [PATCH v3 06/19] KVM: arm64: Track vCPU power state using MP state values Oliver Upton
2022-02-24 13:25   ` Marc Zyngier
2022-02-24 22:08     ` Oliver Upton
2022-02-25 15:37       ` Marc Zyngier
2022-02-23  4:18 ` [PATCH v3 07/19] KVM: arm64: Rename the KVM_REQ_SLEEP handler Oliver Upton
2022-02-23  4:18 ` [PATCH v3 08/19] KVM: arm64: Add reset helper that accepts caller-provided reset state Oliver Upton
2022-02-23  4:18 ` [PATCH v3 09/19] KVM: arm64: Implement PSCI SYSTEM_SUSPEND Oliver Upton
2022-02-24 14:02   ` Marc Zyngier
2022-02-24 19:35     ` Oliver Upton
2022-02-25 18:58       ` Marc Zyngier
2022-03-03  1:01         ` Oliver Upton
2022-03-03 11:37           ` Marc Zyngier
2022-02-23  4:18 ` [PATCH v3 10/19] KVM: Create helper for setting a system event exit Oliver Upton
2022-02-23  6:37   ` Anup Patel
2022-02-24 14:07   ` Marc Zyngier
2022-02-23  4:18 ` [PATCH v3 11/19] KVM: arm64: Return a value from check_vcpu_requests() Oliver Upton
2022-02-23  4:18 ` [PATCH v3 12/19] KVM: arm64: Add support for userspace to suspend a vCPU Oliver Upton
2022-02-24 15:12   ` Marc Zyngier
2022-02-24 19:47     ` Oliver Upton
2022-02-23  4:18 ` [PATCH v3 13/19] KVM: arm64: Add support KVM_SYSTEM_EVENT_SUSPEND to PSCI SYSTEM_SUSPEND Oliver Upton
2022-02-24 15:40   ` Marc Zyngier
2022-02-24 20:05     ` Oliver Upton
2022-02-26 11:29       ` Marc Zyngier
2022-02-26 18:28         ` Oliver Upton
2022-03-02  9:52           ` Marc Zyngier
2022-03-02  9:57             ` Oliver Upton
2022-02-23  4:18 ` [PATCH v3 14/19] KVM: arm64: Raise default PSCI version to v1.1 Oliver Upton
2022-02-23  4:26   ` Oliver Upton
2022-02-23  4:18 ` [PATCH v3 15/19] selftests: KVM: Rename psci_cpu_on_test to psci_test Oliver Upton
2022-02-23  4:18 ` [PATCH v3 16/19] selftests: KVM: Create helper for making SMCCC calls Oliver Upton
2022-02-23  4:18 ` [PATCH v3 17/19] selftests: KVM: Use KVM_SET_MP_STATE to power off vCPU in psci_test Oliver Upton
2022-02-23  4:18 ` [PATCH v3 18/19] selftests: KVM: Refactor psci_test to make it amenable to new tests Oliver Upton
2022-02-23  4:18 ` [PATCH v3 19/19] selftests: KVM: Test SYSTEM_SUSPEND PSCI call Oliver Upton
