kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] kvm/arm64: Initial pKVM user ABI
@ 2021-06-03 18:33 Will Deacon
  2021-06-03 18:33 ` [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE Will Deacon
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Will Deacon @ 2021-06-03 18:33 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Mark Rutland, Christoffer Dall, Paolo Bonzini,
	Fuad Tabba, Quentin Perret, Sean Christopherson, David Brazdil,
	kvm, linux-arm-kernel

Hi folks,

These patches implement support for userspace to request a "Protected VM"
using KVM on arm64 when configured in Protected Mode (see the existing
kvm-arm.mode=protected command-line option).

The final patch documents the new ABI and its behaviour, so I won't
reproduce that here. Please go and have a look there instead!

Note that this series _doesn't_ implement the actual isolation of guest
memory; it's more about setting the groundwork for subsequent patches
and getting feedback on the user-facing side of things. The final patch
is marked RFC accordingly.

Cheers,

Will

Cc: Marc Zyngier <maz@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com> 
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com> 
Cc: Fuad Tabba <tabba@google.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: David Brazdil <dbrazdil@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org

--->8

Will Deacon (4):
  KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE
  KVM: arm64: Extend comment in has_vhe()
  KVM: arm64: Parse reserved-memory node for pkvm guest firmware region
  KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM

 .../admin-guide/kernel-parameters.txt         |   1 -
 Documentation/virt/kvm/api.rst                |  69 ++++++++
 arch/arm64/include/asm/kvm_host.h             |  10 ++
 arch/arm64/include/asm/virt.h                 |   3 +
 arch/arm64/include/uapi/asm/kvm.h             |   9 +
 arch/arm64/kernel/cpufeature.c                |  10 +-
 arch/arm64/kvm/Makefile                       |   2 +-
 arch/arm64/kvm/arm.c                          |  24 +--
 arch/arm64/kvm/mmu.c                          |   3 +
 arch/arm64/kvm/pkvm.c                         | 156 ++++++++++++++++++
 include/uapi/linux/kvm.h                      |   1 +
 11 files changed, 267 insertions(+), 21 deletions(-)
 create mode 100644 arch/arm64/kvm/pkvm.c

-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE
  2021-06-03 18:33 [PATCH 0/4] kvm/arm64: Initial pKVM user ABI Will Deacon
@ 2021-06-03 18:33 ` Will Deacon
  2021-06-04 14:01   ` Mark Rutland
  2021-06-03 18:33 ` [PATCH 2/4] KVM: arm64: Extend comment in has_vhe() Will Deacon
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2021-06-03 18:33 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Mark Rutland, Christoffer Dall, Paolo Bonzini,
	Fuad Tabba, Quentin Perret, Sean Christopherson, David Brazdil,
	kvm, linux-arm-kernel

Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
only returns KVM_MODE_PROTECTED on systems where the feature is available.

Cc: David Brazdil <dbrazdil@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt |  1 -
 arch/arm64/kernel/cpufeature.c                  | 10 +---------
 arch/arm64/kvm/arm.c                            |  6 +++++-
 3 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..e85dbdf1ee8e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2300,7 +2300,6 @@
 
 			protected: nVHE-based mode with support for guests whose
 				   state is kept private from the host.
-				   Not valid if the kernel is running in EL2.
 
 			Defaults to VHE/nVHE based on hardware support.
 
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index efed2830d141..dc1f2e747828 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1773,15 +1773,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
 #ifdef CONFIG_KVM
 static bool is_kvm_protected_mode(const struct arm64_cpu_capabilities *entry, int __unused)
 {
-	if (kvm_get_mode() != KVM_MODE_PROTECTED)
-		return false;
-
-	if (is_kernel_in_hyp_mode()) {
-		pr_warn("Protected KVM not available with VHE\n");
-		return false;
-	}
-
-	return true;
+	return kvm_get_mode() == KVM_MODE_PROTECTED;
 }
 #endif /* CONFIG_KVM */
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1cb39c0803a4..8d5e23198dfd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2121,7 +2121,11 @@ static int __init early_kvm_mode_cfg(char *arg)
 		return -EINVAL;
 
 	if (strcmp(arg, "protected") == 0) {
-		kvm_mode = KVM_MODE_PROTECTED;
+		if (!is_kernel_in_hyp_mode())
+			kvm_mode = KVM_MODE_PROTECTED;
+		else
+			pr_warn_once("Protected KVM not available with VHE\n");
+
 		return 0;
 	}
 
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] KVM: arm64: Extend comment in has_vhe()
  2021-06-03 18:33 [PATCH 0/4] kvm/arm64: Initial pKVM user ABI Will Deacon
  2021-06-03 18:33 ` [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE Will Deacon
@ 2021-06-03 18:33 ` Will Deacon
  2021-06-04 14:09   ` Mark Rutland
  2021-06-03 18:33 ` [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region Will Deacon
  2021-06-03 18:33 ` [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM Will Deacon
  3 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2021-06-03 18:33 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Mark Rutland, Christoffer Dall, Paolo Bonzini,
	Fuad Tabba, Quentin Perret, Sean Christopherson, David Brazdil,
	kvm, linux-arm-kernel

has_vhe() expands to a compile-time constant when evaluated from the VHE
or nVHE code, alternatively checking a static key when called from
elsewhere in the kernel. On face value, this looks like a case of
premature optimization, but in fact this allows symbol references on
VHE-specific code paths to be dropped from the nVHE object.

Expand the comment in has_vhe() to make this clearer, hopefully
discouraging anybody from simplifying the code.

Cc: David Brazdil <dbrazdil@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/virt.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
index 7379f35ae2c6..3218ca17f819 100644
--- a/arch/arm64/include/asm/virt.h
+++ b/arch/arm64/include/asm/virt.h
@@ -111,6 +111,9 @@ static __always_inline bool has_vhe(void)
 	/*
 	 * Code only run in VHE/NVHE hyp context can assume VHE is present or
 	 * absent. Otherwise fall back to caps.
+	 * This allows the compiler to discard VHE-specific code from the
+	 * nVHE object, reducing the number of external symbol references
+	 * needed to link.
 	 */
 	if (is_vhe_hyp_code())
 		return true;
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region
  2021-06-03 18:33 [PATCH 0/4] kvm/arm64: Initial pKVM user ABI Will Deacon
  2021-06-03 18:33 ` [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE Will Deacon
  2021-06-03 18:33 ` [PATCH 2/4] KVM: arm64: Extend comment in has_vhe() Will Deacon
@ 2021-06-03 18:33 ` Will Deacon
  2021-06-04 14:21   ` Mark Rutland
  2021-06-03 18:33 ` [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM Will Deacon
  3 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2021-06-03 18:33 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Mark Rutland, Christoffer Dall, Paolo Bonzini,
	Fuad Tabba, Quentin Perret, Sean Christopherson, David Brazdil,
	kvm, linux-arm-kernel

Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
region, which can be used to identify a firmware image for protected
VMs.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/Makefile |  2 +-
 arch/arm64/kvm/pkvm.c   | 52 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kvm/pkvm.c

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 589921392cb1..61e054411831 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -14,7 +14,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
 	 $(KVM)/vfio.o $(KVM)/irqchip.o \
 	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
 	 inject_fault.o va_layout.o handle_exit.o \
-	 guest.o debug.o reset.o sys_regs.o \
+	 guest.o debug.o pkvm.o reset.o sys_regs.o \
 	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
 	 arch_timer.o trng.o\
 	 vgic/vgic.o vgic/vgic-init.o \
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
new file mode 100644
index 000000000000..7af5d03a3941
--- /dev/null
+++ b/arch/arm64/kvm/pkvm.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM host (EL1) interface to Protected KVM (pkvm) code at EL2.
+ *
+ * Copyright (C) 2021 Google LLC
+ * Author: Will Deacon <will@kernel.org>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/mm.h>
+#include <linux/of_fdt.h>
+#include <linux/of_reserved_mem.h>
+
+static struct reserved_mem *pkvm_firmware_mem;
+
+static int __init pkvm_firmware_rmem_err(struct reserved_mem *rmem,
+					 const char *reason)
+{
+	phys_addr_t end = rmem->base + rmem->size;
+
+	kvm_err("Ignoring pkvm guest firmware memory reservation [%pa - %pa]: %s\n",
+		&rmem->base, &end, reason);
+	return -EINVAL;
+}
+
+static int __init pkvm_firmware_rmem_init(struct reserved_mem *rmem)
+{
+	unsigned long node = rmem->fdt_node;
+
+	if (kvm_get_mode() != KVM_MODE_PROTECTED)
+		return pkvm_firmware_rmem_err(rmem, "protected mode not enabled");
+
+	if (pkvm_firmware_mem)
+		return pkvm_firmware_rmem_err(rmem, "duplicate reservation");
+
+	if (!of_get_flat_dt_prop(node, "no-map", NULL))
+		return pkvm_firmware_rmem_err(rmem, "missing \"no-map\" property");
+
+	if (of_get_flat_dt_prop(node, "reusable", NULL))
+		return pkvm_firmware_rmem_err(rmem, "\"reusable\" property unsupported");
+
+	if (!PAGE_ALIGNED(rmem->base))
+		return pkvm_firmware_rmem_err(rmem, "base is not page-aligned");
+
+	if (!PAGE_ALIGNED(rmem->size))
+		return pkvm_firmware_rmem_err(rmem, "size is not page-aligned");
+
+	pkvm_firmware_mem = rmem;
+	return 0;
+}
+RESERVEDMEM_OF_DECLARE(pkvm_firmware, "linux,pkvm-guest-firmware-memory",
+		       pkvm_firmware_rmem_init);
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM
  2021-06-03 18:33 [PATCH 0/4] kvm/arm64: Initial pKVM user ABI Will Deacon
                   ` (2 preceding siblings ...)
  2021-06-03 18:33 ` [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region Will Deacon
@ 2021-06-03 18:33 ` Will Deacon
  2021-06-03 20:15   ` Sean Christopherson
  2021-06-04 14:41   ` Mark Rutland
  3 siblings, 2 replies; 15+ messages in thread
From: Will Deacon @ 2021-06-03 18:33 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Mark Rutland, Christoffer Dall, Paolo Bonzini,
	Fuad Tabba, Quentin Perret, Sean Christopherson, David Brazdil,
	kvm, linux-arm-kernel

Introduce a new VM capability, KVM_CAP_ARM_PROTECTED_VM, which can be
used to isolate guest memory from the host. For now, the EL2 portion is
missing, so this documents and exposes the user ABI for the host.

Signed-off-by: Will Deacon <will@kernel.org>
---
 Documentation/virt/kvm/api.rst    |  69 ++++++++++++++++++++
 arch/arm64/include/asm/kvm_host.h |  10 +++
 arch/arm64/include/uapi/asm/kvm.h |   9 +++
 arch/arm64/kvm/arm.c              |  18 +++---
 arch/arm64/kvm/mmu.c              |   3 +
 arch/arm64/kvm/pkvm.c             | 104 ++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |   1 +
 7 files changed, 205 insertions(+), 9 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..dfbaf905c435 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6362,6 +6362,75 @@ default.
 
 See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
 
+7.26 KVM_CAP_ARM_PROTECTED_VM
+-----------------------------
+
+:Architectures: arm64
+:Target: VM
+:Parameters: flags is a single KVM_CAP_ARM_PROTECTED_VM_FLAGS_* value
+
+The presence of this capability indicates that KVM supports running in a
+configuration where the host Linux kernel does not have access to guest memory.
+On such a system, a small hypervisor layer at EL2 can configure the stage-2
+page tables for both the CPU and any DMA-capable devices to protect guest
+memory pages so that they are inaccessible to the host unless access is granted
+explicitly by the guest.
+
+The 'flags' parameter is defined as follows:
+
+7.26.1 KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE
+--------------------------------------------
+
+:Capability: 'flag' parameter to KVM_CAP_ARM_PROTECTED_VM
+:Architectures: arm64
+:Target: VM
+:Parameters: args[0] contains memory slot ID to hold guest firmware
+:Returns: 0 on success; negative error code on failure
+
+Enabling this capability causes all memory slots of the specified VM to be
+unmapped from the host system and put into a state where they are no longer
+configurable. The memory slot corresponding to the ID passed in args[0] is
+populated with the guest firmware image provided by the host firmware.
+
+The first vCPU to enter the guest is defined to be the primary vCPU. All other
+vCPUs belonging to the VM are secondary vCPUs.
+
+All vCPUs belonging to a VM with this capability enabled are initialised to a
+pre-determined reset state irrespective of any prior configuration according to
+the KVM_ARM_VCPU_INIT ioctl, with the following exceptions for the primary
+vCPU:
+
+	===========	===========
+	Register(s)	Reset value
+	===========	===========
+	X0-X14:		Preserved (see KVM_SET_ONE_REG)
+	X15:		Boot protocol version (0)
+	X16-X30:	Reserved (0)
+	PC:		IPA base of firmware memory slot
+	SP:		IPA end of firmware memory slot
+	===========	===========
+
+Secondary vCPUs belonging to a VM with this capability enabled will return
+-EPERM in response to a KVM_RUN ioctl() if the vCPU was not initialised with
+the KVM_ARM_VCPU_POWER_OFF feature.
+
+There is no support for AArch32 at any exception level.
+
+It is an error to enable this capability on a VM after issuing a KVM_RUN
+ioctl() on one of its vCPUs.
+
+7.26.2 KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO
+------------------------------------------
+
+:Capability: 'flag' parameter to KVM_CAP_ARM_PROTECTED_VM
+:Architectures: arm64
+:Target: VM
+:Parameters: args[0] contains pointer to 'struct kvm_protected_vm_info'
+:Returns: 0 on success; negative error code on failure
+
+Populates the 'struct kvm_protected_vm_info' pointed to by args[0] with
+information about the protected environment for the VM.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cd7d5c8c4bc..5645af2a1431 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -100,6 +100,11 @@ struct kvm_s2_mmu {
 struct kvm_arch_memory_slot {
 };
 
+struct kvm_protected_vm {
+	bool enabled;
+	struct kvm_memory_slot *firmware_slot;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -132,6 +137,8 @@ struct kvm_arch {
 
 	u8 pfr0_csv2;
 	u8 pfr0_csv3;
+
+	struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
@@ -763,6 +770,9 @@ void kvm_arch_free_vm(struct kvm *kvm);
 
 int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
 
+int kvm_arm_vm_ioctl_pkvm(struct kvm *kvm, struct kvm_enable_cap *cap);
+#define kvm_vm_is_protected(kvm) (kvm->arch.pkvm.enabled)
+
 int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
 bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 24223adae150..cdb3298ba8ae 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -402,6 +402,15 @@ struct kvm_vcpu_events {
 #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
 #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
 
+/* Protected KVM */
+#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE	0
+#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO	1
+
+struct kvm_protected_vm_info {
+	__u64 firmware_size;
+	__u64 __reserved[7];
+};
+
 #endif
 
 #endif /* __ARM_KVM_H__ */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 8d5e23198dfd..186a0adf6391 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -83,22 +83,19 @@ int kvm_arch_check_processor_compat(void *opaque)
 int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
-	int r;
-
-	if (cap->flags)
-		return -EINVAL;
-
 	switch (cap->cap) {
 	case KVM_CAP_ARM_NISV_TO_USER:
-		r = 0;
+		if (cap->flags)
+			return -EINVAL;
 		kvm->arch.return_nisv_io_abort_to_user = true;
 		break;
+	case KVM_CAP_ARM_PROTECTED_VM:
+		return kvm_arm_vm_ioctl_pkvm(kvm, cap);
 	default:
-		r = -EINVAL;
-		break;
+		return -EINVAL;
 	}
 
-	return r;
+	return 0;
 }
 
 static int kvm_arm_default_max_vcpus(void)
@@ -265,6 +262,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ARM_PTRAUTH_GENERIC:
 		r = system_has_full_ptr_auth();
 		break;
+	case KVM_CAP_ARM_PROTECTED_VM:
+		r = is_protected_kvm_enabled();
+		break;
 	default:
 		r = 0;
 	}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c5d1f3c87dbd..e1d4a87d18e4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1349,6 +1349,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
 	bool writable = !(mem->flags & KVM_MEM_READONLY);
 	int ret = 0;
 
+	if (kvm_vm_is_protected(kvm))
+		return -EPERM;
+
 	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
 			change != KVM_MR_FLAGS_ONLY)
 		return 0;
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 7af5d03a3941..cf624350fb27 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -50,3 +50,107 @@ static int __init pkvm_firmware_rmem_init(struct reserved_mem *rmem)
 }
 RESERVEDMEM_OF_DECLARE(pkvm_firmware, "linux,pkvm-guest-firmware-memory",
 		       pkvm_firmware_rmem_init);
+
+static int pkvm_init_el2_context(struct kvm *kvm)
+{
+	kvm_pr_unimpl("Stage-2 protection is not yet implemented\n");
+	return -EINVAL;
+}
+
+static int pkvm_init_firmware_slot(struct kvm *kvm, u64 slotid)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+
+	if (slotid >= KVM_MEM_SLOTS_NUM || !pkvm_firmware_mem)
+		return -EINVAL;
+
+	slots = kvm_memslots(kvm);
+	if (!slots)
+		return -ENOENT;
+
+	slot = id_to_memslot(slots, slotid);
+	if (!slot)
+		return -ENOENT;
+
+	if (slot->flags)
+		return -EINVAL;
+
+	if ((slot->npages << PAGE_SHIFT) < pkvm_firmware_mem->size)
+		return -ENOMEM;
+
+	kvm->arch.pkvm.firmware_slot = slot;
+	return 0;
+}
+
+static void pkvm_teardown_firmware_slot(struct kvm *kvm)
+{
+	kvm->arch.pkvm.firmware_slot = NULL;
+}
+
+static int pkvm_enable(struct kvm *kvm, u64 slotid)
+{
+	int ret;
+
+	ret = pkvm_init_firmware_slot(kvm, slotid);
+	if (ret)
+		return ret;
+
+	ret = pkvm_init_el2_context(kvm);
+	if (ret)
+		pkvm_teardown_firmware_slot(kvm);
+
+	return ret;
+}
+
+static int pkvm_vm_ioctl_enable(struct kvm *kvm, u64 slotid)
+{
+	int ret = 0;
+
+	mutex_lock(&kvm->lock);
+	if (kvm_vm_is_protected(kvm)) {
+		ret = -EPERM;
+		goto out_kvm_unlock;
+	}
+
+	mutex_lock(&kvm->slots_lock);
+	ret = pkvm_enable(kvm, slotid);
+	if (ret)
+		goto out_slots_unlock;
+
+	kvm->arch.pkvm.enabled = true;
+out_slots_unlock:
+	mutex_unlock(&kvm->slots_lock);
+out_kvm_unlock:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+static int pkvm_vm_ioctl_info(struct kvm *kvm,
+			      struct kvm_protected_vm_info __user *info)
+{
+	struct kvm_protected_vm_info kinfo = {
+		.firmware_size = pkvm_firmware_mem ?
+				 pkvm_firmware_mem->size :
+				 0,
+	};
+
+	return copy_to_user(info, &kinfo, sizeof(kinfo)) ? -EFAULT : 0;
+}
+
+int kvm_arm_vm_ioctl_pkvm(struct kvm *kvm, struct kvm_enable_cap *cap)
+{
+	if (cap->args[1] || cap->args[2] || cap->args[3])
+		return -EINVAL;
+
+	switch (cap->flags) {
+	case KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE:
+		return pkvm_vm_ioctl_enable(kvm, cap->args[0]);
+	case KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO:
+		return pkvm_vm_ioctl_info(kvm, (void __user *)cap->args[0]);
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..58ab8508be5e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_ARM_PROTECTED_VM 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.32.0.rc0.204.g9fa02ecfa5-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM
  2021-06-03 18:33 ` [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM Will Deacon
@ 2021-06-03 20:15   ` Sean Christopherson
  2021-06-08 12:08     ` Will Deacon
  2021-06-04 14:41   ` Mark Rutland
  1 sibling, 1 reply; 15+ messages in thread
From: Sean Christopherson @ 2021-06-03 20:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Mark Rutland, Christoffer Dall, Paolo Bonzini,
	Fuad Tabba, Quentin Perret, David Brazdil, kvm, linux-arm-kernel

On Thu, Jun 03, 2021, Will Deacon wrote:
> +Enabling this capability causes all memory slots of the specified VM to be
> +unmapped from the host system and put into a state where they are no longer
> +configurable.

What happens if userspace remaps the hva of the memslot?  In other words, how
will the implementation ensure the physical memory backing the guest is truly
inaccessible?

And how will the guest/host share memory?  E.g. what if the guest wants to set
up a new shared memory region and the host wants that to happen in a new memslot?

On a related topic, would the concept of guest-only memory be useful[*]?  The
idea is to require userspace to map guest-private memory with a dedicated VMA
flag.  That will allow the host kernel to prevent userspace (or the kernel) from
mapping guest-private memory, and will also allow KVM to verify that memory that
the guest wants to be private is indeed private.

[*] https://lkml.kernel.org/r/20210416154106.23721-14-kirill.shutemov@linux.intel.com

> The memory slot corresponding to the ID passed in args[0] is
> +populated with the guest firmware image provided by the host firmware.

Why take a slot instead of an address range?  I assume copying the host firmware
into guest memory will utilize existing APIs, i.e. the memslot lookups to resolve
the GPA->HVA will Just Work (TM).

> +The first vCPU to enter the guest is defined to be the primary vCPU. All other
> +vCPUs belonging to the VM are secondary vCPUs.
> +
> +All vCPUs belonging to a VM with this capability enabled are initialised to a
> +pre-determined reset state irrespective of any prior configuration according to
> +the KVM_ARM_VCPU_INIT ioctl, with the following exceptions for the primary
> +vCPU:
> +
> +	===========	===========
> +	Register(s)	Reset value
> +	===========	===========
> +	X0-X14:		Preserved (see KVM_SET_ONE_REG)

What's the intended use case for allowing userspace to set a pile of registers?
Giving userspace control of vCPU state is tricky because the state is untrusted.
Is the state going to be attested in any way, or is the VMM trusted while the
VM is being created and only de-privileged later on?

> +	X15:		Boot protocol version (0)
> +	X16-X30:	Reserved (0)
> +	PC:		IPA base of firmware memory slot
> +	SP:		IPA end of firmware memory slot
> +	===========	===========
> +

...

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index c5d1f3c87dbd..e1d4a87d18e4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1349,6 +1349,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
>  	bool writable = !(mem->flags & KVM_MEM_READONLY);
>  	int ret = 0;
>  
> +	if (kvm_vm_is_protected(kvm))
> +		return -EPERM;

FWIW, this will miss the benign corner case where KVM_SET_USER_MEMORY_REGION
does not actualy change anything about an existing region.

A more practical problem is that this check won't happen until KVM has marked
the existing region as invalid in the DELETE or MOVE case.  So userspace can't
completely delete a region, but it can temporarily delete a region by briefly
making it in invalid.  No idea what the ramifications of that are on arm64.

>  	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
>  			change != KVM_MR_FLAGS_ONLY)
>  		return 0;

...

> +static int pkvm_vm_ioctl_enable(struct kvm *kvm, u64 slotid)
> +{
> +	int ret = 0;

Deep into the nits: technically unnecessary since ret is guaranteed to be written
before being consumed.

> +	mutex_lock(&kvm->lock);
> +	if (kvm_vm_is_protected(kvm)) {
> +		ret = -EPERM;
> +		goto out_kvm_unlock;
> +	}
> +
> +	mutex_lock(&kvm->slots_lock);
> +	ret = pkvm_enable(kvm, slotid);
> +	if (ret)
> +		goto out_slots_unlock;
> +
> +	kvm->arch.pkvm.enabled = true;
> +out_slots_unlock:
> +	mutex_unlock(&kvm->slots_lock);
> +out_kvm_unlock:
> +	mutex_unlock(&kvm->lock);
> +	return ret;
> +}

...

> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 3fd9a7e9d90c..58ab8508be5e 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_SGX_ATTRIBUTE 196
>  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
>  #define KVM_CAP_PTP_KVM 198
> +#define KVM_CAP_ARM_PROTECTED_VM 199

Rather than a dedicated (and arm-only) capability for protected VMs, what about
adding a capability to retrieve the supported VM types?  And obviously make
protected VMs a different type that must be specified at VM creation (there's
already plumbing for this).  Even if there's no current need to know at creation
time that a VM will be protected, it's also unlikely to be a burden on userspace
and could be nice to have down the road.  The late "activation" ioctl() can still
be supported, e.g. to start disallowing memslot updates.

x86 with TDX would look like:

        case KVM_CAP_VM_TYPES:
                r = BIT(KVM_X86_LEGACY_VM);
                if (kvm_x86_ops.is_vm_type_supported(KVM_X86_TDX_VM))
                        r |= BIT(KVM_X86_TDX_VM);
                break;

and arm64 would look like?

	case KVM_CAP_VM_TYPES:
		r = BIT(KVM_ARM64_LEGACY_VM);
		if (kvm_get_mode() == KVM_MODE_PROTECTED)
			r |= BIT(KVM_ARM64_PROTECTED_VM);
		break;

>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.32.0.rc0.204.g9fa02ecfa5-goog
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE
  2021-06-03 18:33 ` [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE Will Deacon
@ 2021-06-04 14:01   ` Mark Rutland
  2021-06-07 19:28     ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2021-06-04 14:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, Sean Christopherson, David Brazdil, kvm,
	linux-arm-kernel

On Thu, Jun 03, 2021 at 07:33:44PM +0100, Will Deacon wrote:
> Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
> only returns KVM_MODE_PROTECTED on systems where the feature is available.

IIUC, since the introduction of the idreg-override code, and the
mutate_to_vhe stuff, passing 'kvm-arm.mode=protected' should make the
kernel stick to EL1, right? So this should only affect M1 (or other HW
with a similar impediment).

One minor comment below; otherwise:

Acked-by: Mark Rutland <mark.rutland@arm.com>

> 
> Cc: David Brazdil <dbrazdil@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  1 -
>  arch/arm64/kernel/cpufeature.c                  | 10 +---------
>  arch/arm64/kvm/arm.c                            |  6 +++++-
>  3 files changed, 6 insertions(+), 11 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index cb89dbdedc46..e85dbdf1ee8e 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2300,7 +2300,6 @@
>  
>  			protected: nVHE-based mode with support for guests whose
>  				   state is kept private from the host.
> -				   Not valid if the kernel is running in EL2.
>  
>  			Defaults to VHE/nVHE based on hardware support.
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index efed2830d141..dc1f2e747828 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1773,15 +1773,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
>  #ifdef CONFIG_KVM
>  static bool is_kvm_protected_mode(const struct arm64_cpu_capabilities *entry, int __unused)
>  {
> -	if (kvm_get_mode() != KVM_MODE_PROTECTED)
> -		return false;
> -
> -	if (is_kernel_in_hyp_mode()) {
> -		pr_warn("Protected KVM not available with VHE\n");
> -		return false;
> -	}
> -
> -	return true;
> +	return kvm_get_mode() == KVM_MODE_PROTECTED;
>  }
>  #endif /* CONFIG_KVM */
>  
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 1cb39c0803a4..8d5e23198dfd 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2121,7 +2121,11 @@ static int __init early_kvm_mode_cfg(char *arg)
>  		return -EINVAL;
>  
>  	if (strcmp(arg, "protected") == 0) {
> -		kvm_mode = KVM_MODE_PROTECTED;
> +		if (!is_kernel_in_hyp_mode())
> +			kvm_mode = KVM_MODE_PROTECTED;
> +		else
> +			pr_warn_once("Protected KVM not available with VHE\n");

... assuming this is only for M1, it might be better to say:

	Protected KVM not available on this hardware

... since that doesn't suggest that other VHE-capable HW is also not
PKVM-capable.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/4] KVM: arm64: Extend comment in has_vhe()
  2021-06-03 18:33 ` [PATCH 2/4] KVM: arm64: Extend comment in has_vhe() Will Deacon
@ 2021-06-04 14:09   ` Mark Rutland
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Rutland @ 2021-06-04 14:09 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, Sean Christopherson, David Brazdil, kvm,
	linux-arm-kernel

On Thu, Jun 03, 2021 at 07:33:45PM +0100, Will Deacon wrote:
> has_vhe() expands to a compile-time constant when evaluated from the VHE
> or nVHE code, alternatively checking a static key when called from
> elsewhere in the kernel. On face value, this looks like a case of
> premature optimization, but in fact this allows symbol references on
> VHE-specific code paths to be dropped from the nVHE object.
> 
> Expand the comment in has_vhe() to make this clearer, hopefully
> discouraging anybody from simplifying the code.
> 
> Cc: David Brazdil <dbrazdil@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>

Acked-by: Mark Rutland <mark.rutland@arm.com>

Mark.

> ---
>  arch/arm64/include/asm/virt.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/virt.h b/arch/arm64/include/asm/virt.h
> index 7379f35ae2c6..3218ca17f819 100644
> --- a/arch/arm64/include/asm/virt.h
> +++ b/arch/arm64/include/asm/virt.h
> @@ -111,6 +111,9 @@ static __always_inline bool has_vhe(void)
>  	/*
>  	 * Code only run in VHE/NVHE hyp context can assume VHE is present or
>  	 * absent. Otherwise fall back to caps.
> +	 * This allows the compiler to discard VHE-specific code from the
> +	 * nVHE object, reducing the number of external symbol references
> +	 * needed to link.
>  	 */
>  	if (is_vhe_hyp_code())
>  		return true;
> -- 
> 2.32.0.rc0.204.g9fa02ecfa5-goog
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region
  2021-06-03 18:33 ` [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region Will Deacon
@ 2021-06-04 14:21   ` Mark Rutland
  2021-06-08 12:03     ` Will Deacon
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2021-06-04 14:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, Sean Christopherson, David Brazdil, kvm,
	linux-arm-kernel

On Thu, Jun 03, 2021 at 07:33:46PM +0100, Will Deacon wrote:
> Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
> region, which can be used to identify a firmware image for protected
> VMs.

The idea that the guest's FW comes from the host's FW strikes me as
unusual; what's the rationale for this coming from the host FW? IIUC
other confidential compute VM environments allow you to load up whatever
virtual FW you want, but this is measured such that the virtual FW used
can be attested.

Thanks,
Mark.

> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/Makefile |  2 +-
>  arch/arm64/kvm/pkvm.c   | 52 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 53 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kvm/pkvm.c
> 
> diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
> index 589921392cb1..61e054411831 100644
> --- a/arch/arm64/kvm/Makefile
> +++ b/arch/arm64/kvm/Makefile
> @@ -14,7 +14,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o \
>  	 $(KVM)/vfio.o $(KVM)/irqchip.o \
>  	 arm.o mmu.o mmio.o psci.o perf.o hypercalls.o pvtime.o \
>  	 inject_fault.o va_layout.o handle_exit.o \
> -	 guest.o debug.o reset.o sys_regs.o \
> +	 guest.o debug.o pkvm.o reset.o sys_regs.o \
>  	 vgic-sys-reg-v3.o fpsimd.o pmu.o \
>  	 arch_timer.o trng.o\
>  	 vgic/vgic.o vgic/vgic-init.o \
> diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> new file mode 100644
> index 000000000000..7af5d03a3941
> --- /dev/null
> +++ b/arch/arm64/kvm/pkvm.c
> @@ -0,0 +1,52 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * KVM host (EL1) interface to Protected KVM (pkvm) code at EL2.
> + *
> + * Copyright (C) 2021 Google LLC
> + * Author: Will Deacon <will@kernel.org>
> + */
> +
> +#include <linux/kvm_host.h>
> +#include <linux/mm.h>
> +#include <linux/of_fdt.h>
> +#include <linux/of_reserved_mem.h>
> +
> +static struct reserved_mem *pkvm_firmware_mem;
> +
> +static int __init pkvm_firmware_rmem_err(struct reserved_mem *rmem,
> +					 const char *reason)
> +{
> +	phys_addr_t end = rmem->base + rmem->size;
> +
> +	kvm_err("Ignoring pkvm guest firmware memory reservation [%pa - %pa]: %s\n",
> +		&rmem->base, &end, reason);
> +	return -EINVAL;
> +}
> +
> +static int __init pkvm_firmware_rmem_init(struct reserved_mem *rmem)
> +{
> +	unsigned long node = rmem->fdt_node;
> +
> +	if (kvm_get_mode() != KVM_MODE_PROTECTED)
> +		return pkvm_firmware_rmem_err(rmem, "protected mode not enabled");
> +
> +	if (pkvm_firmware_mem)
> +		return pkvm_firmware_rmem_err(rmem, "duplicate reservation");
> +
> +	if (!of_get_flat_dt_prop(node, "no-map", NULL))
> +		return pkvm_firmware_rmem_err(rmem, "missing \"no-map\" property");
> +
> +	if (of_get_flat_dt_prop(node, "reusable", NULL))
> +		return pkvm_firmware_rmem_err(rmem, "\"reusable\" property unsupported");
> +
> +	if (!PAGE_ALIGNED(rmem->base))
> +		return pkvm_firmware_rmem_err(rmem, "base is not page-aligned");
> +
> +	if (!PAGE_ALIGNED(rmem->size))
> +		return pkvm_firmware_rmem_err(rmem, "size is not page-aligned");
> +
> +	pkvm_firmware_mem = rmem;
> +	return 0;
> +}
> +RESERVEDMEM_OF_DECLARE(pkvm_firmware, "linux,pkvm-guest-firmware-memory",
> +		       pkvm_firmware_rmem_init);
> -- 
> 2.32.0.rc0.204.g9fa02ecfa5-goog
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM
  2021-06-03 18:33 ` [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM Will Deacon
  2021-06-03 20:15   ` Sean Christopherson
@ 2021-06-04 14:41   ` Mark Rutland
  2021-06-08 12:06     ` Will Deacon
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2021-06-04 14:41 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, Sean Christopherson, David Brazdil, kvm,
	linux-arm-kernel

On Thu, Jun 03, 2021 at 07:33:47PM +0100, Will Deacon wrote:
> Introduce a new VM capability, KVM_CAP_ARM_PROTECTED_VM, which can be
> used to isolate guest memory from the host. For now, the EL2 portion is
> missing, so this documents and exposes the user ABI for the host.
> 
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  Documentation/virt/kvm/api.rst    |  69 ++++++++++++++++++++
>  arch/arm64/include/asm/kvm_host.h |  10 +++
>  arch/arm64/include/uapi/asm/kvm.h |   9 +++
>  arch/arm64/kvm/arm.c              |  18 +++---
>  arch/arm64/kvm/mmu.c              |   3 +
>  arch/arm64/kvm/pkvm.c             | 104 ++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h          |   1 +
>  7 files changed, 205 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 7fcb2fd38f42..dfbaf905c435 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6362,6 +6362,75 @@ default.
>  
>  See Documentation/x86/sgx/2.Kernel-internals.rst for more details.
>  
> +7.26 KVM_CAP_ARM_PROTECTED_VM
> +-----------------------------
> +
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: flags is a single KVM_CAP_ARM_PROTECTED_VM_FLAGS_* value
> +
> +The presence of this capability indicates that KVM supports running in a
> +configuration where the host Linux kernel does not have access to guest memory.
> +On such a system, a small hypervisor layer at EL2 can configure the stage-2
> +page tables for both the CPU and any DMA-capable devices to protect guest
> +memory pages so that they are inaccessible to the host unless access is granted
> +explicitly by the guest.
> +
> +The 'flags' parameter is defined as follows:
> +
> +7.26.1 KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE
> +--------------------------------------------
> +
> +:Capability: 'flag' parameter to KVM_CAP_ARM_PROTECTED_VM
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains memory slot ID to hold guest firmware
> +:Returns: 0 on success; negative error code on failure
> +
> +Enabling this capability causes all memory slots of the specified VM to be
> +unmapped from the host system and put into a state where they are no longer
> +configurable. The memory slot corresponding to the ID passed in args[0] is
> +populated with the guest firmware image provided by the host firmware.

As on the prior patch, I don't quite follow the rationale for the guest
fw coming from the host fw, and it seems to go against the usual design
for VM contents, so I fear it could be a problem in future (even if not
in android's specific model for usage).

> +The first vCPU to enter the guest is defined to be the primary vCPU. All other
> +vCPUs belonging to the VM are secondary vCPUs.
> +
> +All vCPUs belonging to a VM with this capability enabled are initialised to a
> +pre-determined reset state

What is that "pre-determined reset state"? e.g. is that just what KVM
does today, or is there something more specific (e.g. that might change
with the "Boot protocol version" below)?

> irrespective of any prior configuration according to
> +the KVM_ARM_VCPU_INIT ioctl, with the following exceptions for the primary
> +vCPU:
> +
> +	===========	===========
> +	Register(s)	Reset value
> +	===========	===========
> +	X0-X14:		Preserved (see KVM_SET_ONE_REG)
> +	X15:		Boot protocol version (0)

What's the "Boot protocol" in this context? Is that just referring to
this handover state, or is that something more involved?

> +	X16-X30:	Reserved (0)
> +	PC:		IPA base of firmware memory slot
> +	SP:		IPA end of firmware memory slot
> +	===========	===========
> +
> +Secondary vCPUs belonging to a VM with this capability enabled will return
> +-EPERM in response to a KVM_RUN ioctl() if the vCPU was not initialised with
> +the KVM_ARM_VCPU_POWER_OFF feature.

I assume this means that protected VMs always get a trusted PSCI
implementation? It might be worth mentioning so (and worth consdiering
if that should always have the SMCCC bits too).

> +
> +There is no support for AArch32 at any exception level.

Is this only going to run on CPUs without AArch32 EL0? ... or does this
mean behaviour will be erratic if someone tries to run AArch32 EL0?

> +
> +It is an error to enable this capability on a VM after issuing a KVM_RUN
> +ioctl() on one of its vCPUs.
> +
> +7.26.2 KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO
> +------------------------------------------
> +
> +:Capability: 'flag' parameter to KVM_CAP_ARM_PROTECTED_VM
> +:Architectures: arm64
> +:Target: VM
> +:Parameters: args[0] contains pointer to 'struct kvm_protected_vm_info'
> +:Returns: 0 on success; negative error code on failure
> +
> +Populates the 'struct kvm_protected_vm_info' pointed to by args[0] with
> +information about the protected environment for the VM.
> +
>  8. Other capabilities.
>  ======================
>  
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7cd7d5c8c4bc..5645af2a1431 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -100,6 +100,11 @@ struct kvm_s2_mmu {
>  struct kvm_arch_memory_slot {
>  };
>  
> +struct kvm_protected_vm {
> +	bool enabled;
> +	struct kvm_memory_slot *firmware_slot;
> +};
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -132,6 +137,8 @@ struct kvm_arch {
>  
>  	u8 pfr0_csv2;
>  	u8 pfr0_csv3;
> +
> +	struct kvm_protected_vm pkvm;
>  };
>  
>  struct kvm_vcpu_fault_info {
> @@ -763,6 +770,9 @@ void kvm_arch_free_vm(struct kvm *kvm);
>  
>  int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
>  
> +int kvm_arm_vm_ioctl_pkvm(struct kvm *kvm, struct kvm_enable_cap *cap);
> +#define kvm_vm_is_protected(kvm) (kvm->arch.pkvm.enabled)
> +
>  int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature);
>  bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
>  
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 24223adae150..cdb3298ba8ae 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -402,6 +402,15 @@ struct kvm_vcpu_events {
>  #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
>  #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
>  
> +/* Protected KVM */
> +#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE	0
> +#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO	1
> +
> +struct kvm_protected_vm_info {
> +	__u64 firmware_size;
> +	__u64 __reserved[7];
> +};

Do you have anything in mind for those 7 fields, or was this just a
number that sounded big enough?

I wonder if it's worth adding an size field, and having a size argument
to the ioctl, so that you can discover the full size if we need to grow
this, but you can always get a truncated copy if you just need the early
fields.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE
  2021-06-04 14:01   ` Mark Rutland
@ 2021-06-07 19:28     ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2021-06-07 19:28 UTC (permalink / raw)
  To: Mark Rutland
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, Sean Christopherson, David Brazdil, kvm,
	linux-arm-kernel

On Fri, Jun 04, 2021 at 03:01:17PM +0100, Mark Rutland wrote:
> On Thu, Jun 03, 2021 at 07:33:44PM +0100, Will Deacon wrote:
> > Ignore 'kvm-arm.mode=protected' when using VHE so that kvm_get_mode()
> > only returns KVM_MODE_PROTECTED on systems where the feature is available.
> 
> IIUC, since the introduction of the idreg-override code, and the
> mutate_to_vhe stuff, passing 'kvm-arm.mode=protected' should make the
> kernel stick to EL1, right? So this should only affect M1 (or other HW
> with a similar impediment).

It's not just about the M1, unfortunately. You can boot with:

	"kvm-arm.mode=protected id_aa64mmfr1.vh=1"

which will force VHE mode, so we should fail protected mode in that case.

> One minor comment below; otherwise:
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks, I'll keep the tag but please yell if you want me to drop it.

> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index efed2830d141..dc1f2e747828 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -1773,15 +1773,7 @@ static void cpu_enable_mte(struct arm64_cpu_capabilities const *cap)
> >  #ifdef CONFIG_KVM
> >  static bool is_kvm_protected_mode(const struct arm64_cpu_capabilities *entry, int __unused)
> >  {
> > -	if (kvm_get_mode() != KVM_MODE_PROTECTED)
> > -		return false;
> > -
> > -	if (is_kernel_in_hyp_mode()) {
> > -		pr_warn("Protected KVM not available with VHE\n");
> > -		return false;
> > -	}
> > -
> > -	return true;
> > +	return kvm_get_mode() == KVM_MODE_PROTECTED;
> >  }
> >  #endif /* CONFIG_KVM */
> >  
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 1cb39c0803a4..8d5e23198dfd 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -2121,7 +2121,11 @@ static int __init early_kvm_mode_cfg(char *arg)
> >  		return -EINVAL;
> >  
> >  	if (strcmp(arg, "protected") == 0) {
> > -		kvm_mode = KVM_MODE_PROTECTED;
> > +		if (!is_kernel_in_hyp_mode())
> > +			kvm_mode = KVM_MODE_PROTECTED;
> > +		else
> > +			pr_warn_once("Protected KVM not available with VHE\n");
> 
> ... assuming this is only for M1, it might be better to say:
> 
> 	Protected KVM not available on this hardware
> 
> ... since that doesn't suggest that other VHE-capable HW is also not
> PKVM-capable.

I'm just moving the existing string here, but as above, it's not M1
specific.

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region
  2021-06-04 14:21   ` Mark Rutland
@ 2021-06-08 12:03     ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2021-06-08 12:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, Sean Christopherson, David Brazdil, kvm,
	linux-arm-kernel

Hi Mark,

On Fri, Jun 04, 2021 at 03:21:41PM +0100, Mark Rutland wrote:
> On Thu, Jun 03, 2021 at 07:33:46PM +0100, Will Deacon wrote:
> > Add support for a "linux,pkvm-guest-firmware-memory" reserved memory
> > region, which can be used to identify a firmware image for protected
> > VMs.
> 
> The idea that the guest's FW comes from the host's FW strikes me as
> unusual; what's the rationale for this coming from the host FW? IIUC
> other confidential compute VM environments allow you to load up whatever
> virtual FW you want, but this is measured such that the virtual FW used
> can be attested.

The rationale is that, as far as possible, we're trying to keep the EL2
code simple and agnostic of the guest and the SoC. We therefore assign
validation of the guest payload to this firmware image which is executed
when first entering the guest and made inaccessible to the host kernel
as part of the deprivilege operation during boot. The VMM could still
provide its own virtual firmware, which would then be measured by the
firmware here and chainloaded. We just deprivilege that logic from EL2
to EL1.

For pKVM on Android, it is the Android Bootloader which loads both the
host kernel and the guest firmware (which is actually just u-boot).
Before entering the host, it verifies and measures the guest firmware,
appending secrets to the reserved memory region which are later used by
the firmware to generate per-VM identities. These certificates are then
used by the guest to establish a communication channel with Android's
"Keymint" [1] HAL on the host and get access to hardware-backed key
resources. That way we have a certificate chain which ties directly into
Android Verified Boot [2] and extends to the guest payload without KVM
having to be aware of any of it.

Since this is all pretty specific to Android, delegating it to the
firmware allows others to use their own mechanisms without bloating the
privileged code at EL2 or enforcing a specific flow.

A straightforward extension in future would be to make this firmware
optional when spawning a protected VM, but since we have no need for
that in Android (where we require the firmware), we elected to keep
things minimal at first.

Cheers,

Will

[1] https://source.android.com/security/keystore
[2] https://source.android.com/security/verifiedboot

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM
  2021-06-04 14:41   ` Mark Rutland
@ 2021-06-08 12:06     ` Will Deacon
  0 siblings, 0 replies; 15+ messages in thread
From: Will Deacon @ 2021-06-08 12:06 UTC (permalink / raw)
  To: Mark Rutland
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, Sean Christopherson, David Brazdil, kvm,
	linux-arm-kernel

On Fri, Jun 04, 2021 at 03:41:10PM +0100, Mark Rutland wrote:
> On Thu, Jun 03, 2021 at 07:33:47PM +0100, Will Deacon wrote:
> > +7.26.1 KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE
> > +--------------------------------------------
> > +
> > +:Capability: 'flag' parameter to KVM_CAP_ARM_PROTECTED_VM
> > +:Architectures: arm64
> > +:Target: VM
> > +:Parameters: args[0] contains memory slot ID to hold guest firmware
> > +:Returns: 0 on success; negative error code on failure
> > +
> > +Enabling this capability causes all memory slots of the specified VM to be
> > +unmapped from the host system and put into a state where they are no longer
> > +configurable. The memory slot corresponding to the ID passed in args[0] is
> > +populated with the guest firmware image provided by the host firmware.
> 
> As on the prior patch, I don't quite follow the rationale for the guest
> fw coming from the host fw, and it seems to go against the usual design
> for VM contents, so I fear it could be a problem in future (even if not
> in android's specific model for usage).

Hopefully my other reply helps here. As I say, it would be trivial to extend
the interface to make this optional.

> > +The first vCPU to enter the guest is defined to be the primary vCPU. All other
> > +vCPUs belonging to the VM are secondary vCPUs.
> > +
> > +All vCPUs belonging to a VM with this capability enabled are initialised to a
> > +pre-determined reset state
> 
> What is that "pre-determined reset state"? e.g. is that just what KVM
> does today, or is there something more specific (e.g. that might change
> with the "Boot protocol version" below)?

Yes, it's the KVM reset state as described by KVM_ARM_VCPU_INIT, as I state
later in the sentence:

> > irrespective of any prior configuration according to
> > +the KVM_ARM_VCPU_INIT ioctl

    here ^^^

So I should probably reword it. How about:

  | All vCPUs belonging to a VM with this capability enabled are initialised to
  | a pre-determined reset state according to the KVM_ARM_VCPU_INIT ioctl,
  | irrespective of any prior configuration, with the following exceptions
  | for the primary vCPU:

?

> > with the following exceptions for the primary
> > +vCPU:
> > +
> > +	===========	===========
> > +	Register(s)	Reset value
> > +	===========	===========
> > +	X0-X14:		Preserved (see KVM_SET_ONE_REG)
> > +	X15:		Boot protocol version (0)
> 
> What's the "Boot protocol" in this context? Is that just referring to
> this handover state, or is that something more involved?

It's just versioning this state, in case we have to change/extend it in
future. If you prefer, I can just say:

	X15-X30		Reserved (0)

instead?

> > +	X16-X30:	Reserved (0)
> > +	PC:		IPA base of firmware memory slot
> > +	SP:		IPA end of firmware memory slot
> > +	===========	===========
> > +
> > +Secondary vCPUs belonging to a VM with this capability enabled will return
> > +-EPERM in response to a KVM_RUN ioctl() if the vCPU was not initialised with
> > +the KVM_ARM_VCPU_POWER_OFF feature.
> 
> I assume this means that protected VMs always get a trusted PSCI
> implementation? It might be worth mentioning so (and worth consdiering
> if that should always have the SMCCC bits too).

"trusted PSCI implementation" might be stretching it, but the idea is that
you can at least ensure that vCPUs won't start executing until the guest is
ready for them. I don't think we need any particular SMCCC revision for
that.

> > +
> > +There is no support for AArch32 at any exception level.
> 
> Is this only going to run on CPUs without AArch32 EL0? ... or does this
> mean behaviour will be erratic if someone tries to run AArch32 EL0?

It means that we don't permit creation of 32-bit vCPUs, or exception return
to an AArch32 guest from EL2. Of course, if the guest itself decides to try
this and the hardware happens to support it then there's not a lot we can
about it. But that's the guest being silly, so we don't need to care.

> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index 24223adae150..cdb3298ba8ae 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -402,6 +402,15 @@ struct kvm_vcpu_events {
> >  #define KVM_PSCI_RET_INVAL		PSCI_RET_INVALID_PARAMS
> >  #define KVM_PSCI_RET_DENIED		PSCI_RET_DENIED
> >  
> > +/* Protected KVM */
> > +#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE	0
> > +#define KVM_CAP_ARM_PROTECTED_VM_FLAGS_INFO	1
> > +
> > +struct kvm_protected_vm_info {
> > +	__u64 firmware_size;
> > +	__u64 __reserved[7];
> > +};
> 
> Do you have anything in mind for those 7 fields, or was this just a
> number that sounded big enough?

I just padded it to a cacheline, as that's plenty enough space to version
the thing if we need to.

> I wonder if it's worth adding an size field, and having a size argument
> to the ioctl, so that you can discover the full size if we need to grow
> this, but you can always get a truncated copy if you just need the early
> fields.

Possibly, but the rest of the KVM UAPI doesn't seem to be designed like
that, so I followed what was there already. I defer to the KVM maintainers
on this one...

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM
  2021-06-03 20:15   ` Sean Christopherson
@ 2021-06-08 12:08     ` Will Deacon
  2021-06-11 13:25       ` Alexandru Elisei
  0 siblings, 1 reply; 15+ messages in thread
From: Will Deacon @ 2021-06-08 12:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvmarm, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Mark Rutland, Christoffer Dall, Paolo Bonzini,
	Fuad Tabba, Quentin Perret, David Brazdil, kvm, linux-arm-kernel

Hi Sean,

Thanks for having a look.

On Thu, Jun 03, 2021 at 08:15:55PM +0000, Sean Christopherson wrote:
> On Thu, Jun 03, 2021, Will Deacon wrote:
> > +Enabling this capability causes all memory slots of the specified VM to be
> > +unmapped from the host system and put into a state where they are no longer
> > +configurable.
> 
> What happens if userspace remaps the hva of the memslot?  In other words, how
> will the implementation ensure the physical memory backing the guest is truly
> inaccessible?

The protection is done in the host stage-2 page-table, so it doesn't matter
how the host configures its stage-1 page-table, the underlying physical page
will be inaccessible via any mapping it can create. Does that answer your
question?

> And how will the guest/host share memory?  E.g. what if the guest wants to set
> up a new shared memory region and the host wants that to happen in a new memslot?

The way we're currently dealing with sharing is that, by default, all guest
memory is inaccessible to the host. The guest can then issue hypercalls to
open windows back to the host, so these would necessarily happen in
pre-existing memslots.

> On a related topic, would the concept of guest-only memory be useful[*]?  The
> idea is to require userspace to map guest-private memory with a dedicated VMA
> flag.  That will allow the host kernel to prevent userspace (or the kernel) from
> mapping guest-private memory, and will also allow KVM to verify that memory that
> the guest wants to be private is indeed private.
> 
> [*] https://lkml.kernel.org/r/20210416154106.23721-14-kirill.shutemov@linux.intel.com

Interesting. That certainly looks like it could make things more robust, as
we're in a pretty nasty position if the host tries to access a guest page.
We can either panic the host (which will clearly kill the system) or blow
away the guest (which hasn't done anything wrong). Consequently, whatever we
can do to avoid the host trying to access guest pages in the first place
will make the whole thing more stable.

> > The memory slot corresponding to the ID passed in args[0] is
> > +populated with the guest firmware image provided by the host firmware.
> 
> Why take a slot instead of an address range?  I assume copying the host firmware
> into guest memory will utilize existing APIs, i.e. the memslot lookups to resolve
> the GPA->HVA will Just Work (TM).

Because... initially we tried this funky idea where the firmware would
execute in-place from a read-only memslot and then get unmapped when it was
done. That didn't work out, so now that we have the copying we absolutely
can use an address range. I'll do that for the next version, thanks!

> > +The first vCPU to enter the guest is defined to be the primary vCPU. All other
> > +vCPUs belonging to the VM are secondary vCPUs.
> > +
> > +All vCPUs belonging to a VM with this capability enabled are initialised to a
> > +pre-determined reset state irrespective of any prior configuration according to
> > +the KVM_ARM_VCPU_INIT ioctl, with the following exceptions for the primary
> > +vCPU:
> > +
> > +	===========	===========
> > +	Register(s)	Reset value
> > +	===========	===========
> > +	X0-X14:		Preserved (see KVM_SET_ONE_REG)
> 
> What's the intended use case for allowing userspace to set a pile of registers?
> Giving userspace control of vCPU state is tricky because the state is untrusted.
> Is the state going to be attested in any way, or is the VMM trusted while the
> VM is being created and only de-privileged later on?

I touched a bit on this in my reply to Mark, but the idea is that we have a
firmware image which the host cannot access and which is installed by the
pKVM hypervisor when entering a guest for the first time. The state
documented here is the state in which the firmware is entered, so having
some registers populated by the VMM allows the VMM to communicate with both
the guest firmware and whatever the real VM payload is. We don't trust the
VMM to ensure the integrity of the VM's memory.

For example, let's say the VM payload is an arm64 Linux kernel and we have
u-boot as the firmware image. When the primary vCPU enters the guest for the
first time, the pKVM hypervisor will copy u-boot into the
memslot/address-range specified by KVM_CAP_ARM_PROTECTED_VM_FLAGS_ENABLE and
set the registers accordingly. U-boot will then locate the guest payload
according to what the VMM has put in the preserved registers. Let's say that
x4 points to the start of the guest kernel and x5 to the end; u-boot will
verify that the memory range is correctly signed and, if so, will then jump
to the kernel entry point. The kernel boot protocol then interprets X0-X3
(See Documentation/arm64/booting.rst), with X0 pointing to the devicetree
blob provided by the VMM.

In this example, if the VMM doesn't set X4/X5 correctly, then the signature
check will fail. The devicetree blob is untrusted and, in subsequent
patches, we will harden the guest against that (by offering, for example,
hypercalls so that the guest can opt-in to exiting back to the host on a
stage-2 translation fault).

We'd prefer not to spell out the contract between the VM, firmware and guest
payload, since this is really specific to what you're actually running.
Instead, just carving out a block of preserved registers means KVM doesn't
need to care what they're for.

> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c5d1f3c87dbd..e1d4a87d18e4 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1349,6 +1349,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> >  	bool writable = !(mem->flags & KVM_MEM_READONLY);
> >  	int ret = 0;
> >  
> > +	if (kvm_vm_is_protected(kvm))
> > +		return -EPERM;
> 
> FWIW, this will miss the benign corner case where KVM_SET_USER_MEMORY_REGION
> does not actualy change anything about an existing region.

Yeah, I think that's ok though because nothing happened.

> A more practical problem is that this check won't happen until KVM has marked
> the existing region as invalid in the DELETE or MOVE case.  So userspace can't
> completely delete a region, but it can temporarily delete a region by briefly
> making it in invalid.  No idea what the ramifications of that are on arm64.

It's probably worth stating that the current patches I have here are only
the host kernel part of the bargain. The real work is going to be happening
in the hypervisor, which will be in the charge of the page-tables and
blissfully ignorant of memslot flags. In other words, the permission checks
here are really for slapping the VMM on the wrist if it does something daft
rather then enforcing anything for security -- all that has to happen in the
hypervisor. In the case of KVM_MEMSLOT_INVALID being set when a guest exits
to the host due to a page-fault, it looks like we'll return to the VMM.

I'd just like to get some feedback on the user ABI before we commit to the
hypervisor side of things.

> >  	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
> >  			change != KVM_MR_FLAGS_ONLY)
> >  		return 0;
> 
> ...
> 
> > +static int pkvm_vm_ioctl_enable(struct kvm *kvm, u64 slotid)
> > +{
> > +	int ret = 0;
> 
> Deep into the nits: technically unnecessary since ret is guaranteed to be written
> before being consumed.

Sure thing, I can drop that assignment.

> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index 3fd9a7e9d90c..58ab8508be5e 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
> >  #define KVM_CAP_SGX_ATTRIBUTE 196
> >  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
> >  #define KVM_CAP_PTP_KVM 198
> > +#define KVM_CAP_ARM_PROTECTED_VM 199
> 
> Rather than a dedicated (and arm-only) capability for protected VMs, what about
> adding a capability to retrieve the supported VM types?  And obviously make
> protected VMs a different type that must be specified at VM creation (there's
> already plumbing for this).  Even if there's no current need to know at creation
> time that a VM will be protected, it's also unlikely to be a burden on userspace
> and could be nice to have down the road.  The late "activation" ioctl() can still
> be supported, e.g. to start disallowing memslot updates.
> 
> x86 with TDX would look like:
> 
>         case KVM_CAP_VM_TYPES:
>                 r = BIT(KVM_X86_LEGACY_VM);
>                 if (kvm_x86_ops.is_vm_type_supported(KVM_X86_TDX_VM))
>                         r |= BIT(KVM_X86_TDX_VM);
>                 break;
> 
> and arm64 would look like?
> 
> 	case KVM_CAP_VM_TYPES:
> 		r = BIT(KVM_ARM64_LEGACY_VM);
> 		if (kvm_get_mode() == KVM_MODE_PROTECTED)
> 			r |= BIT(KVM_ARM64_PROTECTED_VM);
> 		break;

You're not the first person to mention this, so I'll look into it. We'll
still need the cap, not just for deferring activation, but also for querying
the firmware parameters.

Cheers,

Will

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM
  2021-06-08 12:08     ` Will Deacon
@ 2021-06-11 13:25       ` Alexandru Elisei
  0 siblings, 0 replies; 15+ messages in thread
From: Alexandru Elisei @ 2021-06-11 13:25 UTC (permalink / raw)
  To: Will Deacon, Sean Christopherson
  Cc: kvmarm, Marc Zyngier, James Morse, Suzuki K Poulose,
	Mark Rutland, Christoffer Dall, Paolo Bonzini, Fuad Tabba,
	Quentin Perret, David Brazdil, kvm, linux-arm-kernel

Hi Will,

On 6/8/21 1:08 PM, Will Deacon wrote:
> Hi Sean,
>
> Thanks for having a look.
>
> On Thu, Jun 03, 2021 at 08:15:55PM +0000, Sean Christopherson wrote:
>> On Thu, Jun 03, 2021, Will Deacon wrote:
>>> [..]
>>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 3fd9a7e9d90c..58ab8508be5e 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
>>>  #define KVM_CAP_SGX_ATTRIBUTE 196
>>>  #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
>>>  #define KVM_CAP_PTP_KVM 198
>>> +#define KVM_CAP_ARM_PROTECTED_VM 199
>> Rather than a dedicated (and arm-only) capability for protected VMs, what about
>> adding a capability to retrieve the supported VM types?  And obviously make
>> protected VMs a different type that must be specified at VM creation (there's
>> already plumbing for this).  Even if there's no current need to know at creation
>> time that a VM will be protected, it's also unlikely to be a burden on userspace
>> and could be nice to have down the road.  The late "activation" ioctl() can still
>> be supported, e.g. to start disallowing memslot updates.
>>
>> x86 with TDX would look like:
>>
>>         case KVM_CAP_VM_TYPES:
>>                 r = BIT(KVM_X86_LEGACY_VM);
>>                 if (kvm_x86_ops.is_vm_type_supported(KVM_X86_TDX_VM))
>>                         r |= BIT(KVM_X86_TDX_VM);
>>                 break;
>>
>> and arm64 would look like?
>>
>> 	case KVM_CAP_VM_TYPES:
>> 		r = BIT(KVM_ARM64_LEGACY_VM);
>> 		if (kvm_get_mode() == KVM_MODE_PROTECTED)
>> 			r |= BIT(KVM_ARM64_PROTECTED_VM);
>> 		break;
> You're not the first person to mention this, so I'll look into it. We'll
> still need the cap, not just for deferring activation, but also for querying
> the firmware parameters.

I had a look at the series that rejects unsupported VCPU features from a protected
VM [1], and it got me thinking. With this ABI in mind, it will be permitted for an
userspace VMM to do the following:

1. Create a VM -> success.

2. Check the KVM_CAP_ARM_PMU_V3 (cap chosen randomly) on the **VM fd** -> it's
available.

3. Set the VCPU feature and create the VCPU -> success.

4. <do other initialization stuff>

5. Turn the VM into a protected VM -> failure, because KVM_ARM_VCPU_PMU_V3 was
part of the VCPU features.

To me, that looks a bit strange. On the other hand, if KVM knew right from VM
creation time that the VM is protected, KVM could have told userspace that that
VCPU feature is not supported **for this particular type of VM**. I think that
makes for a much cleaner userspace API. Yes, it would still be possible to check
the cap on the KVM fd and get success, but we could make the argument that those
capabilities represent the capabilities that KVM supports for any type of VM, not
this particular type.

This can also be used for discovering new features added to pKVM. How do you
intend to do that with the current ABI? By adding a version field to struct
kvm_protected_vm_info, where each version comes with a fixed set of features that
are supported?

[1] https://www.spinics.net/lists/kvm/msg245882.html

Thanks,

Alex


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-06-11 13:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-03 18:33 [PATCH 0/4] kvm/arm64: Initial pKVM user ABI Will Deacon
2021-06-03 18:33 ` [PATCH 1/4] KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE Will Deacon
2021-06-04 14:01   ` Mark Rutland
2021-06-07 19:28     ` Will Deacon
2021-06-03 18:33 ` [PATCH 2/4] KVM: arm64: Extend comment in has_vhe() Will Deacon
2021-06-04 14:09   ` Mark Rutland
2021-06-03 18:33 ` [PATCH 3/4] KVM: arm64: Parse reserved-memory node for pkvm guest firmware region Will Deacon
2021-06-04 14:21   ` Mark Rutland
2021-06-08 12:03     ` Will Deacon
2021-06-03 18:33 ` [RFC PATCH 4/4] KVM: arm64: Introduce KVM_CAP_ARM_PROTECTED_VM Will Deacon
2021-06-03 20:15   ` Sean Christopherson
2021-06-08 12:08     ` Will Deacon
2021-06-11 13:25       ` Alexandru Elisei
2021-06-04 14:41   ` Mark Rutland
2021-06-08 12:06     ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).