linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Hi everyone,

This is version four of the patches previously posted at:

  Mega-patch: https://lore.kernel.org/kvmarm/20220519134204.5379-1-will@kernel.org/
  v2: https://lore.kernel.org/all/20220630135747.26983-1-will@kernel.org/
  v3: https://lore.kernel.org/kvmarm/20220914083500.5118-1-will@kernel.org/

This series extends the pKVM EL2 code so that it can dynamically
instantiate and manage VM data structures without the host being able to
access them directly. These structures consist of a hyp VM, a set of hyp
vCPUs and the stage-2 page-table for the MMU. The pages used to hold the
hypervisor structures are returned to the host when the VM is destroyed.

There are only a few small changes for v4:

  * Fixed missing cache maintenance when reclaiming guest pages on a system
    with the FWB ("Forced Write-Back") CPU feature
  * Added a comment about locking requirements for refcount manipulation
  * Fixed a kbuild robot complaint when using 52-bit physical addresses
  * Added Vincent's Tested-by for the series
  * Added Oliver's Reviewed-by for the first patch
  * Rebased onto 6.1-rc1

One big change since v3 is that Quentin's pKVM "technical deep dive" talk
from this year's KVM Forum is now online and hopefully provides an
enjoyable narrative to accompany this series:

  https://www.youtube.com/watch?v=9npebeVFbFw

There are still a bunch of extra patches needed to achieve guest/host
isolation, but that follow-up work is largely stalled pending resolution
of the guest private memory API (although KVM Forum provided an
excellent venue to iron some of those details out!):

  https://lore.kernel.org/kvm/20220915142913.2213336-1-chao.p.peng@linux.intel.com/T/#t

The last patch remains "RFC" as it's primarily intended for testing and
I couldn't think of a better way to flag it.

Cheers,

Will, Quentin, Fuad and Marc

Cc: Sean Christopherson <seanjc@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Cc: Quentin Perret <qperret@google.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Fuad Tabba <tabba@google.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>

Cc: kernel-team@android.com
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org

--->8

Fuad Tabba (3):
  KVM: arm64: Add hyp_spinlock_t static initializer
  KVM: arm64: Add infrastructure to create and track pKVM instances at
    EL2
  KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from
    EL1

Quentin Perret (15):
  KVM: arm64: Move hyp refcount manipulation helpers to common header
    file
  KVM: arm64: Allow attaching of non-coalescable pages to a hyp pool
  KVM: arm64: Back the hypervisor 'struct hyp_page' array for all memory
  KVM: arm64: Fix-up hyp stage-1 refcounts for all pages mapped at EL2
  KVM: arm64: Implement do_donate() helper for donating memory
  KVM: arm64: Prevent the donation of no-map pages
  KVM: arm64: Add helpers to pin memory shared with the hypervisor at
    EL2
  KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  KVM: arm64: Add generic hyp_memcache helpers
  KVM: arm64: Consolidate stage-2 initialisation into a single function
  KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  KVM: arm64: Return guest memory from EL2 via dedicated teardown
    memcache
  KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host
  KVM: arm64: Explicitly map 'kvm_vgic_global_state' at EL2
  KVM: arm64: Don't unnecessarily map host kernel sections at EL2

Will Deacon (7):
  KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
  KVM: arm64: Rename 'host_kvm' to 'host_mmu'
  KVM: arm64: Initialise hypervisor copies of host symbols
    unconditionally
  KVM: arm64: Provide I-cache invalidation by virtual address at EL2
  KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2
  KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()

 arch/arm64/include/asm/kvm_arm.h              |   2 +-
 arch/arm64/include/asm/kvm_asm.h              |   7 +-
 arch/arm64/include/asm/kvm_host.h             |  73 ++-
 arch/arm64/include/asm/kvm_hyp.h              |   3 +
 arch/arm64/include/asm/kvm_mmu.h              |   2 +-
 arch/arm64/include/asm/kvm_pgtable.h          |  22 +
 arch/arm64/include/asm/kvm_pkvm.h             |  38 ++
 arch/arm64/kernel/image-vars.h                |  15 -
 arch/arm64/kvm/arm.c                          |  61 ++-
 arch/arm64/kvm/hyp/hyp-constants.c            |   3 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  25 +-
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  27 +
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |  18 +-
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  74 +++
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |  10 +-
 arch/arm64/kvm/hyp/nvhe/cache.S               |  11 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            | 110 +++-
 arch/arm64/kvm/hyp/nvhe/hyp-smp.c             |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 502 ++++++++++++++++--
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 158 +++++-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          |  28 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 444 ++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |  96 ++--
 arch/arm64/kvm/hyp/pgtable.c                  |  21 +-
 arch/arm64/kvm/mmu.c                          |  55 +-
 arch/arm64/kvm/pkvm.c                         | 138 ++++-
 arch/arm64/kvm/reset.c                        |  29 -
 27 files changed, 1758 insertions(+), 216 deletions(-)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 01/25] KVM: arm64: Move hyp refcount manipulation helpers to common header file
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

We will soon need to manipulate 'struct hyp_page' refcounts from outside
page_alloc.c, so move the helpers to a common header file to allow them
to be reused easily.
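
As a rough illustration (a sketch only, with hypothetical helper names,
not part of this patch), a caller outside page_alloc.c can now do:

	/* Take a reference on a hyp-mapped page (hyp_pool::lock held). */
	static inline void example_get_page(void *addr)
	{
		hyp_page_ref_inc(hyp_virt_to_page(addr));
	}

	/* Drop a reference; returns non-zero when it reaches zero. */
	static inline int example_put_page(void *addr)
	{
		return hyp_page_ref_dec_and_test(hyp_virt_to_page(addr));
	}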

Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/memory.h | 22 ++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 19 -------------------
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 592b7edb3edb..9422900e5c6a 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -38,6 +38,10 @@ static inline phys_addr_t hyp_virt_to_phys(void *addr)
 #define hyp_page_to_virt(page)	__hyp_va(hyp_page_to_phys(page))
 #define hyp_page_to_pool(page)	(((struct hyp_page *)page)->pool)
 
+/*
+ * Refcounting for 'struct hyp_page'.
+ * hyp_pool::lock must be held if atomic access to the refcount is required.
+ */
 static inline int hyp_page_count(void *addr)
 {
 	struct hyp_page *p = hyp_virt_to_page(addr);
@@ -45,4 +49,22 @@ static inline int hyp_page_count(void *addr)
 	return p->refcount;
 }
 
+static inline void hyp_page_ref_inc(struct hyp_page *p)
+{
+	BUG_ON(p->refcount == USHRT_MAX);
+	p->refcount++;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+	BUG_ON(!p->refcount);
+	p->refcount--;
+	return (p->refcount == 0);
+}
+
+static inline void hyp_set_page_refcounted(struct hyp_page *p)
+{
+	BUG_ON(p->refcount);
+	p->refcount = 1;
+}
 #endif /* __KVM_HYP_MEMORY_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index d40f0b30b534..1ded09fc9b10 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -144,25 +144,6 @@ static struct hyp_page *__hyp_extract_page(struct hyp_pool *pool,
 	return p;
 }
 
-static inline void hyp_page_ref_inc(struct hyp_page *p)
-{
-	BUG_ON(p->refcount == USHRT_MAX);
-	p->refcount++;
-}
-
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
-{
-	BUG_ON(!p->refcount);
-	p->refcount--;
-	return (p->refcount == 0);
-}
-
-static inline void hyp_set_page_refcounted(struct hyp_page *p)
-{
-	BUG_ON(p->refcount);
-	p->refcount = 1;
-}
-
 static void __hyp_put_page(struct hyp_pool *pool, struct hyp_page *p)
 {
 	if (hyp_page_ref_dec_and_test(p))
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 02/25] KVM: arm64: Allow attaching of non-coalescable pages to a hyp pool
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

All the contiguous pages used to initialize a 'struct hyp_pool' are
considered coalescable, which means that the hyp page allocator will
actively try to merge them with their buddies on the hyp_put_page() path.
However, using hyp_put_page() on a page that is not part of the initial
memory range given to a 'struct hyp_pool' is currently unsupported.

In order to allow dynamically extending hyp pools at run-time, add a
check to __hyp_attach_page() to allow inserting 'external' pages into
the free-list of order 0. This will be necessary to allow lazy donation
of pages from the host to the hypervisor when allocating guest stage-2
page-table pages at EL2.
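
As a sketch of the intended use (hypothetical function name; 'pool' is
a hyp_pool that was not initialised with this page):

	/*
	 * Return a page donated by the host to an existing pool;
	 * __hyp_attach_page() now skips the buddy-merging logic for such
	 * 'external' pages and inserts them in the order-0 free-list.
	 */
	static void example_refill_pool(struct hyp_pool *pool, void *donated_va)
	{
		hyp_put_page(pool, donated_va);
	}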

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/page_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 1ded09fc9b10..0d15227aced8 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -93,11 +93,15 @@ static inline struct hyp_page *node_to_page(struct list_head *node)
 static void __hyp_attach_page(struct hyp_pool *pool,
 			      struct hyp_page *p)
 {
+	phys_addr_t phys = hyp_page_to_phys(p);
 	unsigned short order = p->order;
 	struct hyp_page *buddy;
 
 	memset(hyp_page_to_virt(p), 0, PAGE_SIZE << p->order);
 
+	if (phys < pool->range_start || phys >= pool->range_end)
+		goto insert;
+
 	/*
 	 * Only the first struct hyp_page of a high-order page (otherwise known
 	 * as the 'head') should have p->order set. The non-head pages should
@@ -116,6 +120,7 @@ static void __hyp_attach_page(struct hyp_pool *pool,
 		p = min(p, buddy);
 	}
 
+insert:
 	/* Mark the new head, and insert it */
 	p->order = order;
 	page_add_to_list(p, &pool->free_area[order]);
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 03/25] KVM: arm64: Back the hypervisor 'struct hyp_page' array for all memory
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

The EL2 'vmemmap' array in nVHE Protected mode is currently very sparse:
only memory pages owned by the hypervisor itself have a matching 'struct
hyp_page'. However, as the size of this struct has been reduced
significantly since its introduction, it appears that we can now afford
to back the vmemmap for all of memory.

Having an easily accessible 'struct hyp_page' for every physical page in
memory provides the hypervisor with a simple mechanism to store metadata
(e.g. a refcount) that wouldn't otherwise fit in the very limited number
of software bits available in the host stage-2 page-table entries. This
will be used in subsequent patches when pinning host memory pages for
use by the hypervisor at EL2.
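
To get a feel for the overhead (with hypothetical numbers): assuming
4KiB pages and a 4-byte 'struct hyp_page', backing 1GiB of memory needs

	(1GiB / 4KiB) * 4 bytes = 1MiB

of vmemmap, i.e. 256 pages, or roughly 0.1% of the memory it describes.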

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pkvm.h    | 26 +++++++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mm.h | 14 +------------
 arch/arm64/kvm/hyp/nvhe/mm.c         | 31 ++++++++++++++++++++++++----
 arch/arm64/kvm/hyp/nvhe/page_alloc.c |  4 +---
 arch/arm64/kvm/hyp/nvhe/setup.c      |  7 +++----
 arch/arm64/kvm/pkvm.c                | 18 ++--------------
 6 files changed, 60 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 9f4ad2a8df59..8f7b8a2314bb 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -14,6 +14,32 @@
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
+static inline unsigned long
+hyp_vmemmap_memblock_size(struct memblock_region *reg, size_t vmemmap_entry_size)
+{
+	unsigned long nr_pages = reg->size >> PAGE_SHIFT;
+	unsigned long start, end;
+
+	start = (reg->base >> PAGE_SHIFT) * vmemmap_entry_size;
+	end = start + nr_pages * vmemmap_entry_size;
+	start = ALIGN_DOWN(start, PAGE_SIZE);
+	end = ALIGN(end, PAGE_SIZE);
+
+	return end - start;
+}
+
+static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
+{
+	unsigned long res = 0, i;
+
+	for (i = 0; i < kvm_nvhe_sym(hyp_memblock_nr); i++) {
+		res += hyp_vmemmap_memblock_size(&kvm_nvhe_sym(hyp_memory)[i],
+						 vmemmap_entry_size);
+	}
+
+	return res >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
 	unsigned long total = 0, i;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index 42d8eb9bfe72..b2ee6d5df55b 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -15,7 +15,7 @@ extern hyp_spinlock_t pkvm_pgd_lock;
 
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back);
+int hyp_back_vmemmap(phys_addr_t back);
 int pkvm_cpu_set_vector(enum arm64_hyp_spectre_vector slot);
 int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int pkvm_create_mappings_locked(void *from, void *to, enum kvm_pgtable_prot prot);
@@ -24,16 +24,4 @@ int __pkvm_create_private_mapping(phys_addr_t phys, size_t size,
 				  unsigned long *haddr);
 int pkvm_alloc_private_va_range(size_t size, unsigned long *haddr);
 
-static inline void hyp_vmemmap_range(phys_addr_t phys, unsigned long size,
-				     unsigned long *start, unsigned long *end)
-{
-	unsigned long nr_pages = size >> PAGE_SHIFT;
-	struct hyp_page *p = hyp_phys_to_page(phys);
-
-	*start = (unsigned long)p;
-	*end = *start + nr_pages * sizeof(struct hyp_page);
-	*start = ALIGN_DOWN(*start, PAGE_SIZE);
-	*end = ALIGN(*end, PAGE_SIZE);
-}
-
 #endif /* __KVM_HYP_MM_H */
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index 96193cb31a39..d3a3b47181de 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -129,13 +129,36 @@ int pkvm_create_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 	return ret;
 }
 
-int hyp_back_vmemmap(phys_addr_t phys, unsigned long size, phys_addr_t back)
+int hyp_back_vmemmap(phys_addr_t back)
 {
-	unsigned long start, end;
+	unsigned long i, start, size, end = 0;
+	int ret;
 
-	hyp_vmemmap_range(phys, size, &start, &end);
+	for (i = 0; i < hyp_memblock_nr; i++) {
+		start = hyp_memory[i].base;
+		start = ALIGN_DOWN((u64)hyp_phys_to_page(start), PAGE_SIZE);
+		/*
+	 * The beginning of the hyp_vmemmap region for the current
+	 * memblock may already be backed by the page backing the end
+	 * of the previous region, so avoid mapping it twice.
+		 */
+		start = max(start, end);
+
+		end = hyp_memory[i].base + hyp_memory[i].size;
+		end = PAGE_ALIGN((u64)hyp_phys_to_page(end));
+		if (start >= end)
+			continue;
+
+		size = end - start;
+		ret = __pkvm_create_mappings(start, size, back, PAGE_HYP);
+		if (ret)
+			return ret;
+
+		memset(hyp_phys_to_virt(back), 0, size);
+		back += size;
+	}
 
-	return __pkvm_create_mappings(start, end - start, back, PAGE_HYP);
+	return 0;
 }
 
 static void *__hyp_bp_vect_base;
diff --git a/arch/arm64/kvm/hyp/nvhe/page_alloc.c b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
index 0d15227aced8..7804da89e55d 100644
--- a/arch/arm64/kvm/hyp/nvhe/page_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/page_alloc.c
@@ -235,10 +235,8 @@ int hyp_pool_init(struct hyp_pool *pool, u64 pfn, unsigned int nr_pages,
 
 	/* Init the vmemmap portion */
 	p = hyp_phys_to_page(phys);
-	for (i = 0; i < nr_pages; i++) {
-		p[i].order = 0;
+	for (i = 0; i < nr_pages; i++)
 		hyp_set_page_refcounted(&p[i]);
-	}
 
 	/* Attach the unused pages to the buddy tree */
 	for (i = reserved_pages; i < nr_pages; i++)
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index e8d4ea2fcfa0..579eb4f73476 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -31,12 +31,11 @@ static struct hyp_pool hpool;
 
 static int divide_memory_pool(void *virt, unsigned long size)
 {
-	unsigned long vstart, vend, nr_pages;
+	unsigned long nr_pages;
 
 	hyp_early_alloc_init(virt, size);
 
-	hyp_vmemmap_range(__hyp_pa(virt), size, &vstart, &vend);
-	nr_pages = (vend - vstart) >> PAGE_SHIFT;
+	nr_pages = hyp_vmemmap_pages(sizeof(struct hyp_page));
 	vmemmap_base = hyp_early_alloc_contig(nr_pages);
 	if (!vmemmap_base)
 		return -ENOMEM;
@@ -78,7 +77,7 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	if (ret)
 		return ret;
 
-	ret = hyp_back_vmemmap(phys, size, hyp_virt_to_phys(vmemmap_base));
+	ret = hyp_back_vmemmap(hyp_virt_to_phys(vmemmap_base));
 	if (ret)
 		return ret;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index ebecb7c045f4..34229425b25d 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -53,7 +53,7 @@ static int __init register_memblock_regions(void)
 
 void __init kvm_hyp_reserve(void)
 {
-	u64 nr_pages, prev, hyp_mem_pages = 0;
+	u64 hyp_mem_pages = 0;
 	int ret;
 
 	if (!is_hyp_mode_available() || is_kernel_in_hyp_mode())
@@ -71,21 +71,7 @@ void __init kvm_hyp_reserve(void)
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
-
-	/*
-	 * The hyp_vmemmap needs to be backed by pages, but these pages
-	 * themselves need to be present in the vmemmap, so compute the number
-	 * of pages needed by looking for a fixed point.
-	 */
-	nr_pages = 0;
-	do {
-		prev = nr_pages;
-		nr_pages = hyp_mem_pages + prev;
-		nr_pages = DIV_ROUND_UP(nr_pages * STRUCT_HYP_PAGE_SIZE,
-					PAGE_SIZE);
-		nr_pages += __hyp_pgtable_max_pages(nr_pages);
-	} while (nr_pages != prev);
-	hyp_mem_pages += nr_pages;
+	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
 
 	/*
 	 * Try to allocate a PMD-aligned region to reduce TLB pressure once
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 04/25] KVM: arm64: Fix-up hyp stage-1 refcounts for all pages mapped at EL2
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

In order to allow unmapping arbitrary memory pages from the hypervisor
stage-1 page-table, fix-up the initial refcount for pages that have been
mapped before the 'vmemmap' array was up and running so that it
accurately accounts for all existing hypervisor mappings.

This is achieved by traversing the entire hypervisor stage-1 page-table
during initialisation of EL2 and updating the corresponding
'struct hyp_page' for each valid mapping.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 62 +++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 579eb4f73476..8f2726d7e201 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -185,12 +185,11 @@ static void hpool_put_page(void *addr)
 	hyp_put_page(&hpool, addr);
 }
 
-static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
-					 kvm_pte_t *ptep,
-					 enum kvm_pgtable_walk_flags flag,
-					 void * const arg)
+static int fix_host_ownership_walker(u64 addr, u64 end, u32 level,
+				     kvm_pte_t *ptep,
+				     enum kvm_pgtable_walk_flags flag,
+				     void * const arg)
 {
-	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	kvm_pte_t pte = *ptep;
@@ -199,15 +198,6 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	if (!kvm_pte_valid(pte))
 		return 0;
 
-	/*
-	 * Fix-up the refcount for the page-table pages as the early allocator
-	 * was unable to access the hyp_vmemmap and so the buddy allocator has
-	 * initialised the refcount to '1'.
-	 */
-	mm_ops->get_page(ptep);
-	if (flag != KVM_PGTABLE_WALK_LEAF)
-		return 0;
-
 	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
@@ -236,12 +226,30 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	return host_stage2_idmap_locked(phys, PAGE_SIZE, prot);
 }
 
-static int finalize_host_mappings(void)
+static int fix_hyp_pgtable_refcnt_walker(u64 addr, u64 end, u32 level,
+					 kvm_pte_t *ptep,
+					 enum kvm_pgtable_walk_flags flag,
+					 void * const arg)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	kvm_pte_t pte = *ptep;
+
+	/*
+	 * Fix-up the refcount for the page-table pages as the early allocator
+	 * was unable to access the hyp_vmemmap and so the buddy allocator has
+	 * initialised the refcount to '1'.
+	 */
+	if (kvm_pte_valid(pte))
+		mm_ops->get_page(ptep);
+
+	return 0;
+}
+
+static int fix_host_ownership(void)
 {
 	struct kvm_pgtable_walker walker = {
-		.cb	= finalize_host_mappings_walker,
-		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
-		.arg	= pkvm_pgtable.mm_ops,
+		.cb	= fix_host_ownership_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
 	};
 	int i, ret;
 
@@ -257,6 +265,18 @@ static int finalize_host_mappings(void)
 	return 0;
 }
 
+static int fix_hyp_pgtable_refcnt(void)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= fix_hyp_pgtable_refcnt_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pkvm_pgtable.mm_ops,
+	};
+
+	return kvm_pgtable_walk(&pkvm_pgtable, 0, BIT(pkvm_pgtable.ia_bits),
+				&walker);
+}
+
 void __noreturn __pkvm_init_finalise(void)
 {
 	struct kvm_host_data *host_data = this_cpu_ptr(&kvm_host_data);
@@ -286,7 +306,11 @@ void __noreturn __pkvm_init_finalise(void)
 	};
 	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
 
-	ret = finalize_host_mappings();
+	ret = fix_host_ownership();
+	if (ret)
+		goto out;
+
+	ret = fix_hyp_pgtable_refcnt();
 	if (ret)
 		goto out;
 
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 05/25] KVM: arm64: Unify identifiers used to distinguish host and hypervisor
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

The 'pkvm_component_id' enum type provides constants to refer to the
host and the hypervisor, yet this information is duplicated by the
'pkvm_hyp_id' constant.

Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
type definition to 'mem_protect.h' so that it can be used outside of
the memory protection code, for example when initialising the owner for
hypervisor-owned pages.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +++++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 8 --------
 arch/arm64/kvm/hyp/nvhe/setup.c               | 2 +-
 3 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 80e99836eac7..f5705a1e972f 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -51,7 +51,11 @@ struct host_kvm {
 };
 extern struct host_kvm host_kvm;
 
-extern const u8 pkvm_hyp_id;
+/* This corresponds to page-table locking order */
+enum pkvm_component_id {
+	PKVM_ID_HOST,
+	PKVM_ID_HYP,
+};
 
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1e78acf9662e..ff86f5bd230f 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -26,8 +26,6 @@ struct host_kvm host_kvm;
 
 static struct hyp_pool host_s2_pool;
 
-const u8 pkvm_hyp_id = 1;
-
 static void host_lock_component(void)
 {
 	hyp_spin_lock(&host_kvm.lock);
@@ -380,12 +378,6 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 	BUG_ON(ret && ret != -EAGAIN);
 }
 
-/* This corresponds to locking order */
-enum pkvm_component_id {
-	PKVM_ID_HOST,
-	PKVM_ID_HYP,
-};
-
 struct pkvm_mem_transition {
 	u64				nr_pages;
 
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 8f2726d7e201..0312c9c74a5a 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -212,7 +212,7 @@ static int fix_host_ownership_walker(u64 addr, u64 end, u32 level,
 	state = pkvm_getstate(kvm_pgtable_hyp_pte_prot(pte));
 	switch (state) {
 	case PKVM_PAGE_OWNED:
-		return host_stage2_set_owner_locked(phys, PAGE_SIZE, pkvm_hyp_id);
+		return host_stage2_set_owner_locked(phys, PAGE_SIZE, PKVM_ID_HYP);
 	case PKVM_PAGE_SHARED_OWNED:
 		prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_BORROWED);
 		break;
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 06/25] KVM: arm64: Implement do_donate() helper for donating memory
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Transferring ownership of a memory region from one component to
another can be achieved using a "donate" operation, which results
in the previous owner losing access to the underlying pages entirely
and the new owner gaining exclusive access to them.

Implement a do_donate() helper, along the same lines as do_{un,}share,
and provide this functionality for the host-{to,from}-hyp cases as this
will later be used to donate/reclaim memory pages to store VM metadata
at EL2.

In a similar manner to the sharing transitions, permission checks are
performed by the hypervisor to ensure that the component initiating the
transition really is the owner of the page and also that the completer
does not currently have a page mapped at the target address.
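
For example, a hypothetical EL2 user wanting a page of host memory for
its own purposes could do something like (sketch only):

	static int example_with_host_page(u64 pfn)
	{
		int ret;

		/* Take exclusive ownership of one host page. */
		ret = __pkvm_host_donate_hyp(pfn, 1);
		if (ret)
			return ret;

		/* ... use the page at EL2 ... */

		/* Hand it back to the host once done. */
		return __pkvm_hyp_donate_host(pfn, 1);
	}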

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   2 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 239 ++++++++++++++++++
 2 files changed, 241 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index f5705a1e972f..c87b19b2d468 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -60,6 +60,8 @@ enum pkvm_component_id {
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages);
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index ff86f5bd230f..c30402737548 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -391,6 +391,9 @@ struct pkvm_mem_transition {
 				/* Address in the completer's address space */
 				u64	completer_addr;
 			} host;
+			struct {
+				u64	completer_addr;
+			} hyp;
 		};
 	} initiator;
 
@@ -404,6 +407,10 @@ struct pkvm_mem_share {
 	const enum kvm_pgtable_prot		completer_prot;
 };
 
+struct pkvm_mem_donation {
+	const struct pkvm_mem_transition	tx;
+};
+
 struct check_walk_data {
 	enum pkvm_page_state	desired;
 	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
@@ -503,6 +510,46 @@ static int host_initiate_unshare(u64 *completer_addr,
 	return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
 }
 
+static int host_initiate_donation(u64 *completer_addr,
+				  const struct pkvm_mem_transition *tx)
+{
+	u8 owner_id = tx->completer.id;
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return host_stage2_set_owner_locked(tx->initiator.addr, size, owner_id);
+}
+
+static bool __host_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
+{
+	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
+		 tx->initiator.id != PKVM_ID_HYP);
+}
+
+static int __host_ack_transition(u64 addr, const struct pkvm_mem_transition *tx,
+				 enum pkvm_page_state state)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__host_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __host_check_page_state_range(addr, size, state);
+}
+
+static int host_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	return __host_ack_transition(addr, tx, PKVM_NOPAGE);
+}
+
+static int host_complete_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u8 host_id = tx->completer.id;
+
+	return host_stage2_set_owner_locked(addr, size, host_id);
+}
+
 static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
 {
 	if (!kvm_pte_valid(pte))
@@ -523,6 +570,27 @@ static int __hyp_check_page_state_range(u64 addr, u64 size,
 	return check_page_state_range(&pkvm_pgtable, addr, size, &d);
 }
 
+static int hyp_request_donation(u64 *completer_addr,
+				const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	return __hyp_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
+static int hyp_initiate_donation(u64 *completer_addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	int ret;
+
+	*completer_addr = tx->initiator.hyp.completer_addr;
+	ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, tx->initiator.addr, size);
+	return (ret != size) ? -EFAULT : 0;
+}
+
 static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
 {
 	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
@@ -554,6 +622,16 @@ static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 					    PKVM_PAGE_SHARED_BORROWED);
 }
 
+static int hyp_ack_donation(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__hyp_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
+}
+
 static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
 			      enum kvm_pgtable_prot perms)
 {
@@ -572,6 +650,15 @@ static int hyp_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 	return (ret != size) ? -EFAULT : 0;
 }
 
+static int hyp_complete_donation(u64 addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	void *start = (void *)addr, *end = start + (tx->nr_pages * PAGE_SIZE);
+	enum kvm_pgtable_prot prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_OWNED);
+
+	return pkvm_create_mappings_locked(start, end, prot);
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
 	const struct pkvm_mem_transition *tx = &share->tx;
@@ -724,6 +811,94 @@ static int do_unshare(struct pkvm_mem_share *share)
 	return WARN_ON(__do_unshare(share));
 }
 
+static int check_donation(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_request_owned_transition(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_request_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_ack_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_ack_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static int __do_donate(struct pkvm_mem_donation *donation)
+{
+	const struct pkvm_mem_transition *tx = &donation->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_initiate_donation(&completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_initiate_donation(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HOST:
+		ret = host_complete_donation(completer_addr, tx);
+		break;
+	case PKVM_ID_HYP:
+		ret = hyp_complete_donation(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+/*
+ * do_donate():
+ *
+ * The page owner transfers ownership to another component, losing access
+ * as a consequence.
+ *
+ * Initiator: OWNED	=> NOPAGE
+ * Completer: NOPAGE	=> OWNED
+ */
+static int do_donate(struct pkvm_mem_donation *donation)
+{
+	int ret;
+
+	ret = check_donation(donation);
+	if (ret)
+		return ret;
+
+	return WARN_ON(__do_donate(donation));
+}
+
 int __pkvm_host_share_hyp(u64 pfn)
 {
 	int ret;
@@ -789,3 +964,67 @@ int __pkvm_host_unshare_hyp(u64 pfn)
 
 	return ret;
 }
+
+int __pkvm_host_donate_hyp(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = hyp_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HYP,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_donation donation = {
+		.tx	= {
+			.nr_pages	= nr_pages,
+			.initiator	= {
+				.id	= PKVM_ID_HYP,
+				.addr	= hyp_addr,
+				.hyp	= {
+					.completer_addr = host_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HOST,
+			},
+		},
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_donate(&donation);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 07/25] KVM: arm64: Prevent the donation of no-map pages
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Memory regions marked as "no-map" in the host device-tree routinely
include TrustZone carve-outs and DMA pools. Although donating such pages
to the hypervisor may not breach confidentiality, it could be used to
corrupt its state in uncontrollable ways. To prevent this, let's block
host-initiated memory transitions targeting "no-map" pages altogether in
nVHE protected mode, as there should be no valid reason to do this
during normal operation.

Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list
of memblock regions, so we can easily check for the presence of the
MEMBLOCK_NOMAP flag on a region containing pages being donated from the
host.
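
For example (with 'tz_pfn' hypothetically naming a page in a "no-map"
region), a donation such as:

	int ret = __pkvm_host_donate_hyp(tz_pfn, 1);

now fails the transition checks before any change of ownership is made.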

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c30402737548..a7156fd13bc8 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -193,7 +193,7 @@ struct kvm_mem_range {
 	u64 end;
 };
 
-static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
+static struct memblock_region *find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
 {
 	int cur, left = 0, right = hyp_memblock_nr;
 	struct memblock_region *reg;
@@ -216,18 +216,28 @@ static bool find_mem_range(phys_addr_t addr, struct kvm_mem_range *range)
 		} else {
 			range->start = reg->base;
 			range->end = end;
-			return true;
+			return reg;
 		}
 	}
 
-	return false;
+	return NULL;
 }
 
 bool addr_is_memory(phys_addr_t phys)
 {
 	struct kvm_mem_range range;
 
-	return find_mem_range(phys, &range);
+	return !!find_mem_range(phys, &range);
+}
+
+static bool addr_is_allowed_memory(phys_addr_t phys)
+{
+	struct memblock_region *reg;
+	struct kvm_mem_range range;
+
+	reg = find_mem_range(phys, &range);
+
+	return reg && !(reg->flags & MEMBLOCK_NOMAP);
 }
 
 static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
@@ -346,7 +356,7 @@ static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot pr
 static int host_stage2_idmap(u64 addr)
 {
 	struct kvm_mem_range range;
-	bool is_memory = find_mem_range(addr, &range);
+	bool is_memory = !!find_mem_range(addr, &range);
 	enum kvm_pgtable_prot prot;
 	int ret;
 
@@ -424,7 +434,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
 	struct check_walk_data *d = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+	if (kvm_pte_valid(pte) && !addr_is_allowed_memory(kvm_pte_to_phys(pte)))
 		return -EINVAL;
 
 	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 08/25] KVM: arm64: Add helpers to pin memory shared with the hypervisor at EL2
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Add helpers allowing the hypervisor to check whether a range of pages
are currently shared by the host, and 'pin' them if so by blocking host
unshare operations until the memory has been unpinned.

This will allow the hypervisor to take references on host-provided
data-structures (e.g. 'struct kvm') with the guarantee that these pages
will remain in a stable state until the hypervisor decides to release
them, for example during guest teardown.
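
A minimal sketch of the intended pattern (hypothetical function; 'kvm'
points into pages previously shared with EL2 by the host):

	static int example_use_host_struct(struct kvm *kvm)
	{
		int ret;

		ret = hyp_pin_shared_mem(kvm, kvm + 1);
		if (ret)
			return ret;

		/* The host cannot unshare 'kvm' while it is pinned. */

		hyp_unpin_shared_mem(kvm, kvm + 1);
		return 0;
	}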

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  3 ++
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |  7 ++-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 48 +++++++++++++++++++
 3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index c87b19b2d468..998bf165af71 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -69,6 +69,9 @@ int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
+int hyp_pin_shared_mem(void *from, void *to);
+void hyp_unpin_shared_mem(void *from, void *to);
+
 static __always_inline void __load_host_stage2(void)
 {
 	if (static_branch_likely(&kvm_protected_mode_initialized))
diff --git a/arch/arm64/kvm/hyp/include/nvhe/memory.h b/arch/arm64/kvm/hyp/include/nvhe/memory.h
index 9422900e5c6a..ab205c4d6774 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/memory.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/memory.h
@@ -55,10 +55,15 @@ static inline void hyp_page_ref_inc(struct hyp_page *p)
 	p->refcount++;
 }
 
-static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+static inline void hyp_page_ref_dec(struct hyp_page *p)
 {
 	BUG_ON(!p->refcount);
 	p->refcount--;
+}
+
+static inline int hyp_page_ref_dec_and_test(struct hyp_page *p)
+{
+	hyp_page_ref_dec(p);
 	return (p->refcount == 0);
 }
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index a7156fd13bc8..1262dbae7f06 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -625,6 +625,9 @@ static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
 {
 	u64 size = tx->nr_pages * PAGE_SIZE;
 
+	if (tx->initiator.id == PKVM_ID_HOST && hyp_page_count((void *)addr))
+		return -EBUSY;
+
 	if (__hyp_ack_skip_pgtable_check(tx))
 		return 0;
 
@@ -1038,3 +1041,48 @@ int __pkvm_hyp_donate_host(u64 pfn, u64 nr_pages)
 
 	return ret;
 }
+
+int hyp_pin_shared_mem(void *from, void *to)
+{
+	u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+	u64 end = PAGE_ALIGN((u64)to);
+	u64 size = end - start;
+	int ret;
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = __host_check_page_state_range(__hyp_pa(start), size,
+					    PKVM_PAGE_SHARED_OWNED);
+	if (ret)
+		goto unlock;
+
+	ret = __hyp_check_page_state_range(start, size,
+					   PKVM_PAGE_SHARED_BORROWED);
+	if (ret)
+		goto unlock;
+
+	for (cur = start; cur < end; cur += PAGE_SIZE)
+		hyp_page_ref_inc(hyp_virt_to_page(cur));
+
+unlock:
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
+
+void hyp_unpin_shared_mem(void *from, void *to)
+{
+	u64 cur, start = ALIGN_DOWN((u64)from, PAGE_SIZE);
+	u64 end = PAGE_ALIGN((u64)to);
+
+	host_lock_component();
+	hyp_lock_component();
+
+	for (cur = start; cur < end; cur += PAGE_SIZE)
+		hyp_page_ref_dec(hyp_virt_to_page(cur));
+
+	hyp_unlock_component();
+	host_unlock_component();
+}
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 09/25] KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

nvhe/mem_protect.h refers to __load_stage2() in the definition of
__load_host_stage2() but doesn't include the relevant header.

Include asm/kvm_mmu.h in nvhe/mem_protect.h so that users of the latter
don't have to do this themselves.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 998bf165af71..3bea816296dc 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -8,6 +8,7 @@
 #define __KVM_NVHE_MEM_PROTECT__
 #include <linux/kvm_host.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
 #include <nvhe/spinlock.h>
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 10/25] KVM: arm64: Add hyp_spinlock_t static initializer
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Fuad Tabba <tabba@google.com>

Introduce a static initializer macro for 'hyp_spinlock_t' so that it is
straightforward to instantiate global locks at EL2. This will be later
utilised for locking the VM table in the hypervisor.
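
This allows, for instance (hypothetical lock name):

	static DEFINE_HYP_SPINLOCK(example_vm_table_lock);

instead of declaring the lock separately and initialising it at
run-time with hyp_spin_lock_init().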

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
index 4652fd04bdbe..7c7ea8c55405 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/spinlock.h
@@ -28,9 +28,17 @@ typedef union hyp_spinlock {
 	};
 } hyp_spinlock_t;
 
+#define __HYP_SPIN_LOCK_INITIALIZER \
+	{ .__val = 0 }
+
+#define __HYP_SPIN_LOCK_UNLOCKED \
+	((hyp_spinlock_t) __HYP_SPIN_LOCK_INITIALIZER)
+
+#define DEFINE_HYP_SPINLOCK(x)	hyp_spinlock_t x = __HYP_SPIN_LOCK_UNLOCKED
+
 #define hyp_spin_lock_init(l)						\
 do {									\
-	*(l) = (hyp_spinlock_t){ .__val = 0 };				\
+	*(l) = __HYP_SPIN_LOCK_UNLOCKED;				\
 } while (0)
 
 static inline void hyp_spin_lock(hyp_spinlock_t *lock)
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 11/25] KVM: arm64: Rename 'host_kvm' to 'host_mmu'
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

In preparation for introducing VM and vCPU state at EL2, rename the
existing 'struct host_kvm' and its singleton 'host_kvm' instance to
'host_mmu' so as to avoid confusion between the structure tracking the
host stage-2 MMU state and the host instance of a 'struct kvm' for a
protected guest.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  6 +--
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 46 +++++++++----------
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 3bea816296dc..0a6d3e7f2a43 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -44,13 +44,13 @@ static inline enum pkvm_page_state pkvm_getstate(enum kvm_pgtable_prot prot)
 	return prot & PKVM_PAGE_STATE_PROT_MASK;
 }
 
-struct host_kvm {
+struct host_mmu {
 	struct kvm_arch arch;
 	struct kvm_pgtable pgt;
 	struct kvm_pgtable_mm_ops mm_ops;
 	hyp_spinlock_t lock;
 };
-extern struct host_kvm host_kvm;
+extern struct host_mmu host_mmu;
 
 /* This corresponds to page-table locking order */
 enum pkvm_component_id {
@@ -76,7 +76,7 @@ void hyp_unpin_shared_mem(void *from, void *to);
 static __always_inline void __load_host_stage2(void)
 {
 	if (static_branch_likely(&kvm_protected_mode_initialized))
-		__load_stage2(&host_kvm.arch.mmu, &host_kvm.arch);
+		__load_stage2(&host_mmu.arch.mmu, &host_mmu.arch);
 	else
 		write_sysreg(0, vttbr_el2);
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1262dbae7f06..bec8306c2392 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -22,18 +22,18 @@
 #define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_NOFWB | KVM_PGTABLE_S2_IDMAP)
 
 extern unsigned long hyp_nr_cpus;
-struct host_kvm host_kvm;
+struct host_mmu host_mmu;
 
 static struct hyp_pool host_s2_pool;
 
 static void host_lock_component(void)
 {
-	hyp_spin_lock(&host_kvm.lock);
+	hyp_spin_lock(&host_mmu.lock);
 }
 
 static void host_unlock_component(void)
 {
-	hyp_spin_unlock(&host_kvm.lock);
+	hyp_spin_unlock(&host_mmu.lock);
 }
 
 static void hyp_lock_component(void)
@@ -88,7 +88,7 @@ static int prepare_s2_pool(void *pgt_pool_base)
 	if (ret)
 		return ret;
 
-	host_kvm.mm_ops = (struct kvm_pgtable_mm_ops) {
+	host_mmu.mm_ops = (struct kvm_pgtable_mm_ops) {
 		.zalloc_pages_exact = host_s2_zalloc_pages_exact,
 		.zalloc_page = host_s2_zalloc_page,
 		.phys_to_virt = hyp_phys_to_virt,
@@ -109,7 +109,7 @@ static void prepare_host_vtcr(void)
 	parange = kvm_get_parange(id_aa64mmfr0_el1_sys_val);
 	phys_shift = id_aa64mmfr0_parange_to_phys_shift(parange);
 
-	host_kvm.arch.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
+	host_mmu.arch.vtcr = kvm_get_vtcr(id_aa64mmfr0_el1_sys_val,
 					  id_aa64mmfr1_el1_sys_val, phys_shift);
 }
 
@@ -117,25 +117,25 @@ static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot pr
 
 int kvm_host_prepare_stage2(void *pgt_pool_base)
 {
-	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
+	struct kvm_s2_mmu *mmu = &host_mmu.arch.mmu;
 	int ret;
 
 	prepare_host_vtcr();
-	hyp_spin_lock_init(&host_kvm.lock);
-	mmu->arch = &host_kvm.arch;
+	hyp_spin_lock_init(&host_mmu.lock);
+	mmu->arch = &host_mmu.arch;
 
 	ret = prepare_s2_pool(pgt_pool_base);
 	if (ret)
 		return ret;
 
-	ret = __kvm_pgtable_stage2_init(&host_kvm.pgt, mmu,
-					&host_kvm.mm_ops, KVM_HOST_S2_FLAGS,
+	ret = __kvm_pgtable_stage2_init(&host_mmu.pgt, mmu,
+					&host_mmu.mm_ops, KVM_HOST_S2_FLAGS,
 					host_stage2_force_pte_cb);
 	if (ret)
 		return ret;
 
-	mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd);
-	mmu->pgt = &host_kvm.pgt;
+	mmu->pgd_phys = __hyp_pa(host_mmu.pgt.pgd);
+	mmu->pgt = &host_mmu.pgt;
 	atomic64_set(&mmu->vmid.id, 0);
 
 	return 0;
@@ -143,19 +143,19 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 
 int __pkvm_prot_finalize(void)
 {
-	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
+	struct kvm_s2_mmu *mmu = &host_mmu.arch.mmu;
 	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
 
 	if (params->hcr_el2 & HCR_VM)
 		return -EPERM;
 
 	params->vttbr = kvm_get_vttbr(mmu);
-	params->vtcr = host_kvm.arch.vtcr;
+	params->vtcr = host_mmu.arch.vtcr;
 	params->hcr_el2 |= HCR_VM;
 	kvm_flush_dcache_to_poc(params, sizeof(*params));
 
 	write_sysreg(params->hcr_el2, hcr_el2);
-	__load_stage2(&host_kvm.arch.mmu, &host_kvm.arch);
+	__load_stage2(&host_mmu.arch.mmu, &host_mmu.arch);
 
 	/*
 	 * Make sure to have an ISB before the TLB maintenance below but only
@@ -173,7 +173,7 @@ int __pkvm_prot_finalize(void)
 
 static int host_stage2_unmap_dev_all(void)
 {
-	struct kvm_pgtable *pgt = &host_kvm.pgt;
+	struct kvm_pgtable *pgt = &host_mmu.pgt;
 	struct memblock_region *reg;
 	u64 addr = 0;
 	int i, ret;
@@ -258,7 +258,7 @@ static bool range_is_memory(u64 start, u64 end)
 static inline int __host_stage2_idmap(u64 start, u64 end,
 				      enum kvm_pgtable_prot prot)
 {
-	return kvm_pgtable_stage2_map(&host_kvm.pgt, start, end - start, start,
+	return kvm_pgtable_stage2_map(&host_mmu.pgt, start, end - start, start,
 				      prot, &host_s2_pool);
 }
 
@@ -271,7 +271,7 @@ static inline int __host_stage2_idmap(u64 start, u64 end,
 #define host_stage2_try(fn, ...)					\
 	({								\
 		int __ret;						\
-		hyp_assert_lock_held(&host_kvm.lock);			\
+		hyp_assert_lock_held(&host_mmu.lock);			\
 		__ret = fn(__VA_ARGS__);				\
 		if (__ret == -ENOMEM) {					\
 			__ret = host_stage2_unmap_dev_all();		\
@@ -294,8 +294,8 @@ static int host_stage2_adjust_range(u64 addr, struct kvm_mem_range *range)
 	u32 level;
 	int ret;
 
-	hyp_assert_lock_held(&host_kvm.lock);
-	ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, &level);
+	hyp_assert_lock_held(&host_mmu.lock);
+	ret = kvm_pgtable_get_leaf(&host_mmu.pgt, addr, &pte, &level);
 	if (ret)
 		return ret;
 
@@ -327,7 +327,7 @@ int host_stage2_idmap_locked(phys_addr_t addr, u64 size,
 
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id)
 {
-	return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_kvm.pgt,
+	return host_stage2_try(kvm_pgtable_stage2_set_owner, &host_mmu.pgt,
 			       addr, size, &host_s2_pool, owner_id);
 }
 
@@ -468,8 +468,8 @@ static int __host_check_page_state_range(u64 addr, u64 size,
 		.get_page_state	= host_get_page_state,
 	};
 
-	hyp_assert_lock_held(&host_kvm.lock);
-	return check_page_state_range(&host_kvm.pgt, addr, size, &d);
+	hyp_assert_lock_held(&host_mmu.lock);
+	return check_page_state_range(&host_mmu.pgt, addr, size, &d);
 }
 
 static int __host_set_page_state_range(u64 addr, u64 size,
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (10 preceding siblings ...)
  2022-10-17 11:51 ` [PATCH v4 11/25] KVM: arm64: Rename 'host_kvm' to 'host_mmu' Will Deacon
@ 2022-10-17 11:51 ` Will Deacon
  2022-10-18 15:13   ` Quentin Perret
                     ` (4 more replies)
  2022-10-17 11:51 ` [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 Will Deacon
                   ` (12 subsequent siblings)
  24 siblings, 5 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Fuad Tabba <tabba@google.com>

Introduce a global table (and lock) to track pKVM instances at EL2, and
provide hypercalls that can be used by the untrusted host to create and
destroy pKVM VMs and their vCPUs. pKVM VM/vCPU state is directly
accessible only by the trusted hypervisor (EL2).

Each pKVM VM is directly associated with an untrusted host KVM instance,
and is referenced by the host using an opaque handle. Future patches
will provide hypercalls to allow the host to initialize/set/get pKVM
VM/vCPU state using the opaque handle.
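
To illustrate the handle scheme below: handles are simply table indices
offset by a constant (0x1000) so that a valid handle is never zero, and
the VMID is derived from the index with VMID 0 reserved for the host. A
sketch of the resulting invariants (the function is hypothetical; the
conversion helpers are the ones added in this patch):

  static void example_handle_invariants(void)
  {
  	/* The first table slot yields handle 0x1000 (HANDLE_OFFSET)... */
  	WARN_ON(idx_to_vm_handle(0) != HANDLE_OFFSET);

  	/* ...and the conversion round-trips for any valid index. */
  	WARN_ON(vm_handle_to_idx(idx_to_vm_handle(5)) != 5);
  }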

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h              |   3 +
 arch/arm64/include/asm/kvm_host.h             |   8 +
 arch/arm64/include/asm/kvm_pgtable.h          |   8 +
 arch/arm64/include/asm/kvm_pkvm.h             |   8 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   3 +
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  64 +++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  31 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  14 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 389 ++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |   8 +
 arch/arm64/kvm/hyp/pgtable.c                  |   9 +
 arch/arm64/kvm/pkvm.c                         |   1 +
 12 files changed, 546 insertions(+)
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 53035763e48e..de52ba775d48 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -76,6 +76,9 @@ enum __kvm_host_smccc_func {
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
 	__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs,
 	__KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vm,
+	__KVM_HOST_SMCCC_FUNC___pkvm_init_vcpu,
+	__KVM_HOST_SMCCC_FUNC___pkvm_teardown_vm,
 };
 
 #define DECLARE_KVM_VHE_SYM(sym)	extern char sym[]
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 45e2136322ba..d3dd7ab9c79e 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -115,6 +115,8 @@ struct kvm_smccc_features {
 	unsigned long vendor_hyp_bmap;
 };
 
+typedef unsigned int pkvm_handle_t;
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -166,6 +168,12 @@ struct kvm_arch {
 
 	/* Hypercall features firmware registers' descriptor */
 	struct kvm_smccc_features smccc_feat;
+
+	/*
+	 * For an untrusted host VM, 'pkvm_handle' is used to lookup
+	 * the associated pKVM instance in the hypervisor.
+	 */
+	pkvm_handle_t pkvm_handle;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 1b098bd4cd37..4f6d79fe4352 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -288,6 +288,14 @@ u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
  */
 u64 kvm_get_vtcr(u64 mmfr0, u64 mmfr1, u32 phys_shift);
 
+/**
+ * kvm_pgtable_stage2_pgd_size() - Helper to compute size of a stage-2 PGD
+ * @vtcr:	Content of the VTCR register.
+ *
+ * Return: the size (in bytes) of the stage-2 PGD
+ */
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr);
+
 /**
  * __kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
  * @pgt:	Uninitialised page-table structure to initialise.
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 8f7b8a2314bb..e22856cf4207 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -9,6 +9,9 @@
 #include <linux/memblock.h>
 #include <asm/kvm_pgtable.h>
 
+/* Maximum number of protected VMs that can be created. */
+#define KVM_MAX_PVMS 255
+
 #define HYP_MEMBLOCK_REGIONS 128
 
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
@@ -40,6 +43,11 @@ static inline unsigned long hyp_vmemmap_pages(size_t vmemmap_entry_size)
 	return res >> PAGE_SHIFT;
 }
 
+static inline unsigned long hyp_vm_table_pages(void)
+{
+	return PAGE_ALIGN(KVM_MAX_PVMS * sizeof(void *)) >> PAGE_SHIFT;
+}
+
 static inline unsigned long __hyp_pgtable_max_pages(unsigned long nr_pages)
 {
 	unsigned long total = 0, i;
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 0a6d3e7f2a43..ce9a796a85ee 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -11,6 +11,7 @@
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_pgtable.h>
 #include <asm/virt.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/spinlock.h>
 
 /*
@@ -68,10 +69,12 @@ bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
 int host_stage2_set_owner_locked(phys_addr_t addr, u64 size, u8 owner_id);
 int kvm_host_prepare_stage2(void *pgt_pool_base);
+int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd);
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
+void reclaim_guest_pages(struct pkvm_hyp_vm *vm);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
new file mode 100644
index 000000000000..d176399dbfb5
--- /dev/null
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Google LLC
+ * Author: Fuad Tabba <tabba@google.com>
+ */
+
+#ifndef __ARM64_KVM_NVHE_PKVM_H__
+#define __ARM64_KVM_NVHE_PKVM_H__
+
+#include <asm/kvm_pkvm.h>
+
+/*
+ * Holds the relevant data for maintaining the vcpu state completely at hyp.
+ */
+struct pkvm_hyp_vcpu {
+	struct kvm_vcpu vcpu;
+
+	/* Backpointer to the host's (untrusted) vCPU instance. */
+	struct kvm_vcpu *host_vcpu;
+};
+
+/*
+ * Holds the relevant data for running a protected vm.
+ */
+struct pkvm_hyp_vm {
+	struct kvm kvm;
+
+	/* Backpointer to the host's (untrusted) KVM instance. */
+	struct kvm *host_kvm;
+
+	/*
+	 * Total amount of memory donated by the host for maintaining
+	 * this 'struct pkvm_hyp_vm' in the hypervisor.
+	 */
+	size_t donated_memory_size;
+
+	/* The guest's stage-2 page-table managed by the hypervisor. */
+	struct kvm_pgtable pgt;
+
+	/*
+	 * The number of vcpus initialized and ready to run.
+	 * Modifying this is protected by 'vm_table_lock'.
+	 */
+	unsigned int nr_vcpus;
+
+	/* Array of the hyp vCPU structures for this VM. */
+	struct pkvm_hyp_vcpu *vcpus[];
+};
+
+static inline struct pkvm_hyp_vm *
+pkvm_hyp_vcpu_to_hyp_vm(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	return container_of(hyp_vcpu->vcpu.kvm, struct pkvm_hyp_vm, kvm);
+}
+
+void pkvm_hyp_vm_table_init(void *tbl);
+
+int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
+		   unsigned long pgd_hva);
+int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
+		     unsigned long vcpu_hva);
+int __pkvm_teardown_vm(pkvm_handle_t handle);
+
+#endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 3cea4b6ac23e..b5f3fcfe9135 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -15,6 +15,7 @@
 
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
@@ -191,6 +192,33 @@ static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
 	__pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
 }
 
+static void handle___pkvm_init_vm(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(struct kvm *, host_kvm, host_ctxt, 1);
+	DECLARE_REG(unsigned long, vm_hva, host_ctxt, 2);
+	DECLARE_REG(unsigned long, pgd_hva, host_ctxt, 3);
+
+	host_kvm = kern_hyp_va(host_kvm);
+	cpu_reg(host_ctxt, 1) = __pkvm_init_vm(host_kvm, vm_hva, pgd_hva);
+}
+
+static void handle___pkvm_init_vcpu(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 2);
+	DECLARE_REG(unsigned long, vcpu_hva, host_ctxt, 3);
+
+	host_vcpu = kern_hyp_va(host_vcpu);
+	cpu_reg(host_ctxt, 1) = __pkvm_init_vcpu(handle, host_vcpu, vcpu_hva);
+}
+
+static void handle___pkvm_teardown_vm(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(pkvm_handle_t, handle, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_teardown_vm(handle);
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);
 
 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -220,6 +248,9 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__vgic_v3_save_aprs),
 	HANDLE_FUNC(__vgic_v3_restore_aprs),
 	HANDLE_FUNC(__pkvm_vcpu_init_traps),
+	HANDLE_FUNC(__pkvm_init_vm),
+	HANDLE_FUNC(__pkvm_init_vcpu),
+	HANDLE_FUNC(__pkvm_teardown_vm),
 };
 
 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index bec8306c2392..2ef6aaa21ba5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -141,6 +141,20 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	return 0;
 }
 
+int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
+{
+	vm->pgt.pgd = pgd;
+	return 0;
+}
+
+void reclaim_guest_pages(struct pkvm_hyp_vm *vm)
+{
+	unsigned long nr_pages;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
+}
+
 int __pkvm_prot_finalize(void)
 {
 	struct kvm_s2_mmu *mmu = &host_mmu.arch.mmu;
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 85d3b7ae720f..7e6f7d2f7713 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -7,6 +7,9 @@
 #include <linux/kvm_host.h>
 #include <linux/mm.h>
 #include <nvhe/fixed_config.h>
+#include <nvhe/mem_protect.h>
+#include <nvhe/memory.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 /*
@@ -183,3 +186,389 @@ void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
 	pvm_init_traps_aa64mmfr0(vcpu);
 	pvm_init_traps_aa64mmfr1(vcpu);
 }
+
+/*
+ * Start the VM table handle at the offset defined instead of at 0.
+ * Mainly for sanity checking and debugging.
+ */
+#define HANDLE_OFFSET 0x1000
+
+static unsigned int vm_handle_to_idx(pkvm_handle_t handle)
+{
+	return handle - HANDLE_OFFSET;
+}
+
+static pkvm_handle_t idx_to_vm_handle(unsigned int idx)
+{
+	return idx + HANDLE_OFFSET;
+}
+
+/*
+ * Spinlock for protecting state related to the VM table. Protects writes
+ * to 'vm_table' and 'nr_table_entries' as well as reads and writes to
+ * 'last_hyp_vcpu_lookup'.
+ */
+static DEFINE_HYP_SPINLOCK(vm_table_lock);
+
+/*
+ * The table of VM entries for protected VMs in hyp.
+ * Allocated at hyp initialization and setup.
+ */
+static struct pkvm_hyp_vm **vm_table;
+
+void pkvm_hyp_vm_table_init(void *tbl)
+{
+	WARN_ON(vm_table);
+	vm_table = tbl;
+}
+
+/*
+ * Return the hyp vm structure corresponding to the handle.
+ */
+static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
+{
+	unsigned int idx = vm_handle_to_idx(handle);
+
+	if (unlikely(idx >= KVM_MAX_PVMS))
+		return NULL;
+
+	return vm_table[idx];
+}
+
+static void unpin_host_vcpu(struct kvm_vcpu *host_vcpu)
+{
+	if (host_vcpu)
+		hyp_unpin_shared_mem(host_vcpu, host_vcpu + 1);
+}
+
+static void unpin_host_vcpus(struct pkvm_hyp_vcpu *hyp_vcpus[],
+			     unsigned int nr_vcpus)
+{
+	int i;
+
+	for (i = 0; i < nr_vcpus; i++)
+		unpin_host_vcpu(hyp_vcpus[i]->host_vcpu);
+}
+
+static void init_pkvm_hyp_vm(struct kvm *host_kvm, struct pkvm_hyp_vm *hyp_vm,
+			     unsigned int nr_vcpus)
+{
+	hyp_vm->host_kvm = host_kvm;
+	hyp_vm->kvm.created_vcpus = nr_vcpus;
+	hyp_vm->kvm.arch.vtcr = host_mmu.arch.vtcr;
+}
+
+static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
+			      struct pkvm_hyp_vm *hyp_vm,
+			      struct kvm_vcpu *host_vcpu,
+			      unsigned int vcpu_idx)
+{
+	int ret = 0;
+
+	if (hyp_pin_shared_mem(host_vcpu, host_vcpu + 1))
+		return -EBUSY;
+
+	if (host_vcpu->vcpu_idx != vcpu_idx) {
+		ret = -EINVAL;
+		goto done;
+	}
+
+	hyp_vcpu->host_vcpu = host_vcpu;
+
+	hyp_vcpu->vcpu.kvm = &hyp_vm->kvm;
+	hyp_vcpu->vcpu.vcpu_id = READ_ONCE(host_vcpu->vcpu_id);
+	hyp_vcpu->vcpu.vcpu_idx = vcpu_idx;
+
+	hyp_vcpu->vcpu.arch.hw_mmu = &hyp_vm->kvm.arch.mmu;
+done:
+	if (ret)
+		unpin_host_vcpu(host_vcpu);
+	return ret;
+}
+
+static int find_free_vm_table_entry(struct kvm *host_kvm)
+{
+	int i, ret = -ENOMEM;
+
+	for (i = 0; i < KVM_MAX_PVMS; ++i) {
+		struct pkvm_hyp_vm *vm = vm_table[i];
+
+		if (!vm) {
+			if (ret < 0)
+				ret = i;
+			continue;
+		}
+
+		if (unlikely(vm->host_kvm == host_kvm)) {
+			ret = -EEXIST;
+			break;
+		}
+	}
+
+	return ret;
+}
+
+/*
+ * Allocate a VM table entry and insert a pointer to the new vm.
+ *
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
+					   struct pkvm_hyp_vm *hyp_vm,
+					   size_t donated_memory_size)
+{
+	struct kvm_s2_mmu *mmu = &hyp_vm->kvm.arch.mmu;
+	int idx;
+
+	hyp_assert_lock_held(&vm_table_lock);
+
+	/*
+	 * Initializing protected state might have failed, yet a malicious
+	 * host could trigger this function. Thus, ensure that 'vm_table'
+	 * exists.
+	 */
+	if (unlikely(!vm_table))
+		return -EINVAL;
+
+	idx = find_free_vm_table_entry(host_kvm);
+	if (idx < 0)
+		return idx;
+
+	hyp_vm->kvm.arch.pkvm_handle = idx_to_vm_handle(idx);
+	hyp_vm->donated_memory_size = donated_memory_size;
+
+	/* VMID 0 is reserved for the host */
+	atomic64_set(&mmu->vmid.id, idx + 1);
+
+	mmu->arch = &hyp_vm->kvm.arch;
+	mmu->pgt = &hyp_vm->pgt;
+
+	vm_table[idx] = hyp_vm;
+	return hyp_vm->kvm.arch.pkvm_handle;
+}
+
+/*
+ * Deallocate and remove the VM table entry corresponding to the handle.
+ */
+static void remove_vm_table_entry(pkvm_handle_t handle)
+{
+	hyp_assert_lock_held(&vm_table_lock);
+	vm_table[vm_handle_to_idx(handle)] = NULL;
+}
+
+static size_t pkvm_get_hyp_vm_size(unsigned int nr_vcpus)
+{
+	return size_add(sizeof(struct pkvm_hyp_vm),
+		size_mul(sizeof(struct pkvm_hyp_vcpu *), nr_vcpus));
+}
+
+static void *map_donated_memory_noclear(unsigned long host_va, size_t size)
+{
+	void *va = (void *)kern_hyp_va(host_va);
+
+	if (!PAGE_ALIGNED(va))
+		return NULL;
+
+	if (__pkvm_host_donate_hyp(hyp_virt_to_pfn(va),
+				   PAGE_ALIGN(size) >> PAGE_SHIFT))
+		return NULL;
+
+	return va;
+}
+
+static void *map_donated_memory(unsigned long host_va, size_t size)
+{
+	void *va = map_donated_memory_noclear(host_va, size);
+
+	if (va)
+		memset(va, 0, size);
+
+	return va;
+}
+
+static void __unmap_donated_memory(void *va, size_t size)
+{
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
+				       PAGE_ALIGN(size) >> PAGE_SHIFT));
+}
+
+static void unmap_donated_memory(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	memset(va, 0, size);
+	__unmap_donated_memory(va, size);
+}
+
+static void unmap_donated_memory_noclear(void *va, size_t size)
+{
+	if (!va)
+		return;
+
+	__unmap_donated_memory(va, size);
+}
+
+/*
+ * Initialize the hypervisor copy of the protected VM state using the
+ * memory donated by the host.
+ *
+ * Unmaps the donated memory from the host at stage 2.
+ *
+ * kvm: A pointer to the host's struct kvm.
+ * vm_hva: The host va of the area being donated for the VM state.
+ *	   Must be page aligned.
+ * pgd_hva: The host va of the area being donated for the stage-2 PGD for
+ *	    the VM. Must be page aligned. Its size is implied by the VM's
+ *	    VTCR.
+ *
+ * Return a unique handle to the protected VM on success,
+ * negative error code on failure.
+ */
+int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
+		   unsigned long pgd_hva)
+{
+	struct pkvm_hyp_vm *hyp_vm = NULL;
+	size_t vm_size, pgd_size;
+	unsigned int nr_vcpus;
+	void *pgd = NULL;
+	int ret;
+
+	ret = hyp_pin_shared_mem(host_kvm, host_kvm + 1);
+	if (ret)
+		return ret;
+
+	nr_vcpus = READ_ONCE(host_kvm->created_vcpus);
+	if (nr_vcpus < 1) {
+		ret = -EINVAL;
+		goto err_unpin_kvm;
+	}
+
+	vm_size = pkvm_get_hyp_vm_size(nr_vcpus);
+	pgd_size = kvm_pgtable_stage2_pgd_size(host_mmu.arch.vtcr);
+
+	ret = -ENOMEM;
+
+	hyp_vm = map_donated_memory(vm_hva, vm_size);
+	if (!hyp_vm)
+		goto err_remove_mappings;
+
+	pgd = map_donated_memory_noclear(pgd_hva, pgd_size);
+	if (!pgd)
+		goto err_remove_mappings;
+
+	init_pkvm_hyp_vm(host_kvm, hyp_vm, nr_vcpus);
+
+	hyp_spin_lock(&vm_table_lock);
+	ret = insert_vm_table_entry(host_kvm, hyp_vm, vm_size);
+	if (ret < 0)
+		goto err_unlock;
+
+	ret = kvm_guest_prepare_stage2(hyp_vm, pgd);
+	if (ret)
+		goto err_remove_vm_table_entry;
+	hyp_spin_unlock(&vm_table_lock);
+
+	return hyp_vm->kvm.arch.pkvm_handle;
+
+err_remove_vm_table_entry:
+	remove_vm_table_entry(hyp_vm->kvm.arch.pkvm_handle);
+err_unlock:
+	hyp_spin_unlock(&vm_table_lock);
+err_remove_mappings:
+	unmap_donated_memory(hyp_vm, vm_size);
+	unmap_donated_memory(pgd, pgd_size);
+err_unpin_kvm:
+	hyp_unpin_shared_mem(host_kvm, host_kvm + 1);
+	return ret;
+}
+
+/*
+ * Initialize the hypervisor copy of the protected vCPU state using the
+ * memory donated by the host.
+ *
+ * handle: The handle for the protected vm.
+ * host_vcpu: A pointer to the corresponding host vcpu.
+ * vcpu_hva: The host va of the area being donated for the vcpu state.
+ *	     Must be page aligned. The size of the area must be equal to
+ *	     the page-aligned size of 'struct pkvm_hyp_vcpu'.
+ * Return 0 on success, negative error code on failure.
+ */
+int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
+		     unsigned long vcpu_hva)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu;
+	struct pkvm_hyp_vm *hyp_vm;
+	unsigned int idx;
+	int ret;
+
+	hyp_vcpu = map_donated_memory(vcpu_hva, sizeof(*hyp_vcpu));
+	if (!hyp_vcpu)
+		return -ENOMEM;
+
+	hyp_spin_lock(&vm_table_lock);
+
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm) {
+		ret = -ENOENT;
+		goto unlock;
+	}
+
+	idx = hyp_vm->nr_vcpus;
+	if (idx >= hyp_vm->kvm.created_vcpus) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	ret = init_pkvm_hyp_vcpu(hyp_vcpu, hyp_vm, host_vcpu, idx);
+	if (ret)
+		goto unlock;
+
+	hyp_vm->vcpus[idx] = hyp_vcpu;
+	hyp_vm->nr_vcpus++;
+unlock:
+	hyp_spin_unlock(&vm_table_lock);
+
+	if (ret)
+		unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
+
+	return ret;
+}
+
+int __pkvm_teardown_vm(pkvm_handle_t handle)
+{
+	struct pkvm_hyp_vm *hyp_vm;
+	int err;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm) {
+		err = -ENOENT;
+		goto err_unlock;
+	}
+
+	if (WARN_ON(hyp_page_count(hyp_vm))) {
+		err = -EBUSY;
+		goto err_unlock;
+	}
+
+	/* Ensure the VMID is clean before it can be reallocated */
+	__kvm_tlb_flush_vmid(&hyp_vm->kvm.arch.mmu);
+	remove_vm_table_entry(handle);
+	hyp_spin_unlock(&vm_table_lock);
+
+	/* Reclaim guest pages (including page-table pages) */
+	reclaim_guest_pages(hyp_vm);
+	unpin_host_vcpus(hyp_vm->vcpus, hyp_vm->nr_vcpus);
+
+	/* Push the metadata pages to the teardown memcache */
+	hyp_unpin_shared_mem(hyp_vm->host_kvm, hyp_vm->host_kvm + 1);
+
+	unmap_donated_memory(hyp_vm, hyp_vm->donated_memory_size);
+	return 0;
+
+err_unlock:
+	hyp_spin_unlock(&vm_table_lock);
+	return err;
+}
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 0312c9c74a5a..2be72fbe7279 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -16,6 +16,7 @@
 #include <nvhe/memory.h>
 #include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
+#include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
 unsigned long hyp_nr_cpus;
@@ -24,6 +25,7 @@ unsigned long hyp_nr_cpus;
 			 (unsigned long)__per_cpu_start)
 
 static void *vmemmap_base;
+static void *vm_table_base;
 static void *hyp_pgt_base;
 static void *host_s2_pgt_base;
 static struct kvm_pgtable_mm_ops pkvm_pgtable_mm_ops;
@@ -40,6 +42,11 @@ static int divide_memory_pool(void *virt, unsigned long size)
 	if (!vmemmap_base)
 		return -ENOMEM;
 
+	nr_pages = hyp_vm_table_pages();
+	vm_table_base = hyp_early_alloc_contig(nr_pages);
+	if (!vm_table_base)
+		return -ENOMEM;
+
 	nr_pages = hyp_s1_pgtable_pages();
 	hyp_pgt_base = hyp_early_alloc_contig(nr_pages);
 	if (!hyp_pgt_base)
@@ -314,6 +321,7 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	pkvm_hyp_vm_table_init(vm_table_base);
 out:
 	/*
 	 * We tail-called to here from handle___pkvm_init() and will not return,
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index cdf8e76b0be1..a1a27f88a312 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1200,6 +1200,15 @@ int __kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 	return 0;
 }
 
+size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
+{
+	u32 ia_bits = VTCR_EL2_IPA(vtcr);
+	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+
+	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
+}
+
 static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			      enum kvm_pgtable_walk_flags flag,
 			      void * const arg)
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 34229425b25d..71493136e59c 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -71,6 +71,7 @@ void __init kvm_hyp_reserve(void)
 
 	hyp_mem_pages += hyp_s1_pgtable_pages();
 	hyp_mem_pages += host_s2_pgtable_pages();
+	hyp_mem_pages += hyp_vm_table_pages();
 	hyp_mem_pages += hyp_vmemmap_pages(STRUCT_HYP_PAGE_SIZE);
 
 	/*
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (11 preceding siblings ...)
  2022-10-17 11:51 ` [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Will Deacon
@ 2022-10-17 11:51 ` Will Deacon
  2022-10-19 15:46   ` Quentin Perret
  2022-10-19 16:00   ` Quentin Perret
  2022-10-17 11:51 ` [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2 Will Deacon
                   ` (11 subsequent siblings)
  24 siblings, 2 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Fuad Tabba <tabba@google.com>

With the pKVM hypervisor at EL2 now offering hypercalls to the host for
creating and destroying VM and vCPU structures, plumb these into the
existing arm64 KVM backend to ensure that the hypervisor data structures
are allocated and initialised on first vCPU run for a pKVM guest.

In the host, 'struct kvm_protected_vm' is introduced to hold the handle
of the pKVM VM instance as well as to track references to the memory
donated to the hypervisor so that it can be freed back to the host
allocator following VM teardown. The stage-2 page-table, hypervisor VM
and vCPU structures are allocated separately so as to avoid the need for
a large physically-contiguous allocation in the host at run-time.
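
Condensed, the allocation and donation flow added below looks roughly
as follows (a hypothetical sketch: error handling, size computation and
the locking around first vCPU run are elided):

  static int example_create_flow(struct kvm *host_kvm, size_t pgd_sz,
  				 size_t hyp_vm_sz, size_t hyp_vcpu_sz)
  {
  	struct kvm_vcpu *host_vcpu;
  	void *pgd, *hyp_vm;
  	unsigned long idx;
  	int handle;

  	pgd    = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT);
  	hyp_vm = alloc_pages_exact(hyp_vm_sz, GFP_KERNEL_ACCOUNT);

  	/* Ownership of the donated pages transfers to EL2 on success. */
  	handle = kvm_call_hyp_nvhe(__pkvm_init_vm, host_kvm, hyp_vm, pgd);

  	kvm_for_each_vcpu(idx, host_vcpu, host_kvm) {
  		void *hyp_vcpu = alloc_pages_exact(hyp_vcpu_sz,
  						   GFP_KERNEL_ACCOUNT);

  		kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, host_vcpu,
  				  hyp_vcpu);
  	}

  	return handle;
  }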

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h  |  15 +++-
 arch/arm64/include/asm/kvm_pkvm.h  |   4 +
 arch/arm64/kvm/arm.c               |  14 +++
 arch/arm64/kvm/hyp/hyp-constants.c |   3 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c     |  15 +++-
 arch/arm64/kvm/pkvm.c              | 138 +++++++++++++++++++++++++++++
 6 files changed, 183 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index d3dd7ab9c79e..f09b422ba1fe 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -117,6 +117,17 @@ struct kvm_smccc_features {
 
 typedef unsigned int pkvm_handle_t;
 
+struct kvm_protected_vm {
+	pkvm_handle_t handle;
+	struct mutex vm_lock;
+
+	struct {
+		void *pgd;
+		void *vm;
+		void *vcpus[KVM_MAX_VCPUS];
+	} hyp_donations;
+};
+
 struct kvm_arch {
 	struct kvm_s2_mmu mmu;
 
@@ -170,10 +181,10 @@ struct kvm_arch {
 	struct kvm_smccc_features smccc_feat;
 
 	/*
-	 * For an untrusted host VM, 'pkvm_handle' is used to lookup
+	 * For an untrusted host VM, 'pkvm.handle' is used to lookup
 	 * the associated pKVM instance in the hypervisor.
 	 */
-	pkvm_handle_t pkvm_handle;
+	struct kvm_protected_vm pkvm;
 };
 
 struct kvm_vcpu_fault_info {
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index e22856cf4207..8340798b2289 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -14,6 +14,10 @@
 
 #define HYP_MEMBLOCK_REGIONS 128
 
+int pkvm_init_host_vm(struct kvm *kvm);
+int pkvm_create_hyp_vm(struct kvm *kvm);
+void pkvm_destroy_hyp_vm(struct kvm *kvm);
+
 extern struct memblock_region kvm_nvhe_sym(hyp_memory)[];
 extern unsigned int kvm_nvhe_sym(hyp_memblock_nr);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 94d33e296e10..30d6fc5d3a93 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -37,6 +37,7 @@
 #include <asm/kvm_arm.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_pkvm.h>
 #include <asm/kvm_emulate.h>
 #include <asm/sections.h>
 
@@ -150,6 +151,10 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		goto out_free_stage2_pgd;
 
+	ret = pkvm_init_host_vm(kvm);
+	if (ret)
+		goto out_free_stage2_pgd;
+
 	if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL)) {
 		ret = -ENOMEM;
 		goto out_free_stage2_pgd;
@@ -187,6 +192,9 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 
 	kvm_vgic_destroy(kvm);
 
+	if (is_protected_kvm_enabled())
+		pkvm_destroy_hyp_vm(kvm);
+
 	kvm_destroy_vcpus(kvm);
 
 	kvm_unshare_hyp(kvm, kvm + 1);
@@ -569,6 +577,12 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 	if (ret)
 		return ret;
 
+	if (is_protected_kvm_enabled()) {
+		ret = pkvm_create_hyp_vm(kvm);
+		if (ret)
+			return ret;
+	}
+
 	if (!irqchip_in_kernel(kvm)) {
 		/*
 		 * Tell the rest of the code that there are userspace irqchip
diff --git a/arch/arm64/kvm/hyp/hyp-constants.c b/arch/arm64/kvm/hyp/hyp-constants.c
index b3742a6691e8..b257a3b4bfc5 100644
--- a/arch/arm64/kvm/hyp/hyp-constants.c
+++ b/arch/arm64/kvm/hyp/hyp-constants.c
@@ -2,9 +2,12 @@
 
 #include <linux/kbuild.h>
 #include <nvhe/memory.h>
+#include <nvhe/pkvm.h>
 
 int main(void)
 {
 	DEFINE(STRUCT_HYP_PAGE_SIZE,	sizeof(struct hyp_page));
+	DEFINE(PKVM_HYP_VM_SIZE,	sizeof(struct pkvm_hyp_vm));
+	DEFINE(PKVM_HYP_VCPU_SIZE,	sizeof(struct pkvm_hyp_vcpu));
 	return 0;
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 7e6f7d2f7713..5970fd1adbfb 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -335,7 +335,7 @@ static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
 	if (idx < 0)
 		return idx;
 
-	hyp_vm->kvm.arch.pkvm_handle = idx_to_vm_handle(idx);
+	hyp_vm->kvm.arch.pkvm.handle = idx_to_vm_handle(idx);
 	hyp_vm->donated_memory_size = donated_memory_size;
 
 	/* VMID 0 is reserved for the host */
@@ -345,7 +345,7 @@ static pkvm_handle_t insert_vm_table_entry(struct kvm *host_kvm,
 	mmu->pgt = &hyp_vm->pgt;
 
 	vm_table[idx] = hyp_vm;
-	return hyp_vm->kvm.arch.pkvm_handle;
+	return hyp_vm->kvm.arch.pkvm.handle;
 }
 
 /*
@@ -470,10 +470,10 @@ int __pkvm_init_vm(struct kvm *host_kvm, unsigned long vm_hva,
 		goto err_remove_vm_table_entry;
 	hyp_spin_unlock(&vm_table_lock);
 
-	return hyp_vm->kvm.arch.pkvm_handle;
+	return hyp_vm->kvm.arch.pkvm.handle;
 
 err_remove_vm_table_entry:
-	remove_vm_table_entry(hyp_vm->kvm.arch.pkvm_handle);
+	remove_vm_table_entry(hyp_vm->kvm.arch.pkvm.handle);
 err_unlock:
 	hyp_spin_unlock(&vm_table_lock);
 err_remove_mappings:
@@ -539,6 +539,7 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 int __pkvm_teardown_vm(pkvm_handle_t handle)
 {
 	struct pkvm_hyp_vm *hyp_vm;
+	unsigned int idx;
 	int err;
 
 	hyp_spin_lock(&vm_table_lock);
@@ -565,6 +566,12 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
 	/* Push the metadata pages to the teardown memcache */
 	hyp_unpin_shared_mem(hyp_vm->host_kvm, hyp_vm->host_kvm + 1);
 
+	for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
+		struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
+
+		unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
+	}
+
 	unmap_donated_memory(hyp_vm, hyp_vm->donated_memory_size);
 	return 0;
 
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 71493136e59c..754632a608e3 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -6,6 +6,7 @@
 
 #include <linux/kvm_host.h>
 #include <linux/memblock.h>
+#include <linux/mutex.h>
 #include <linux/sort.h>
 
 #include <asm/kvm_pkvm.h>
@@ -94,3 +95,140 @@ void __init kvm_hyp_reserve(void)
 	kvm_info("Reserved %lld MiB at 0x%llx\n", hyp_mem_size >> 20,
 		 hyp_mem_base);
 }
+
+/*
+ * Allocates and donates memory for hypervisor VM structs at EL2.
+ *
+ * Allocates space for the VM state, which includes the hyp vm as well as
+ * the hyp vcpus.
+ *
+ * Stores an opaque handle in the kvm struct for future reference.
+ *
+ * Return 0 on success, negative error code on failure.
+ */
+static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
+{
+	size_t pgd_sz, hyp_vm_sz, hyp_vcpu_sz;
+	struct kvm_vcpu *host_vcpu;
+	pkvm_handle_t handle;
+	void *pgd, *hyp_vm;
+	unsigned long idx;
+	int ret;
+
+	if (host_kvm->created_vcpus < 1)
+		return -EINVAL;
+
+	pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.vtcr);
+
+	/*
+	 * The PGD pages will be reclaimed using a hyp_memcache which implies
+	 * page granularity. So, use alloc_pages_exact() to get individual
+	 * refcounts.
+	 */
+	pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL_ACCOUNT);
+	if (!pgd)
+		return -ENOMEM;
+
+	/* Allocate memory to donate to hyp for vm and vcpu pointers. */
+	hyp_vm_sz = PAGE_ALIGN(size_add(PKVM_HYP_VM_SIZE,
+					size_mul(sizeof(void *),
+						 host_kvm->created_vcpus)));
+	hyp_vm = alloc_pages_exact(hyp_vm_sz, GFP_KERNEL_ACCOUNT);
+	if (!hyp_vm) {
+		ret = -ENOMEM;
+		goto free_pgd;
+	}
+
+	/* Donate the VM memory to hyp and let hyp initialize it. */
+	ret = kvm_call_hyp_nvhe(__pkvm_init_vm, host_kvm, hyp_vm, pgd);
+	if (ret < 0)
+		goto free_vm;
+
+	handle = ret;
+
+	host_kvm->arch.pkvm.handle = handle;
+	host_kvm->arch.pkvm.hyp_donations.pgd = pgd;
+	host_kvm->arch.pkvm.hyp_donations.vm = hyp_vm;
+
+	/* Donate memory for the vcpus at hyp and initialize it. */
+	hyp_vcpu_sz = PAGE_ALIGN(PKVM_HYP_VCPU_SIZE);
+	kvm_for_each_vcpu(idx, host_vcpu, host_kvm) {
+		void *hyp_vcpu;
+
+		/* Indexing of the vcpus to be sequential starting at 0. */
+		if (WARN_ON(host_vcpu->vcpu_idx != idx)) {
+			ret = -EINVAL;
+			goto destroy_vm;
+		}
+
+		hyp_vcpu = alloc_pages_exact(hyp_vcpu_sz, GFP_KERNEL_ACCOUNT);
+		if (!hyp_vcpu) {
+			ret = -ENOMEM;
+			goto destroy_vm;
+		}
+
+		host_kvm->arch.pkvm.hyp_donations.vcpus[idx] = hyp_vcpu;
+
+		ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, host_vcpu,
+					hyp_vcpu);
+		if (ret)
+			goto destroy_vm;
+	}
+
+	return 0;
+
+destroy_vm:
+	pkvm_destroy_hyp_vm(host_kvm);
+	return ret;
+free_vm:
+	free_pages_exact(hyp_vm, hyp_vm_sz);
+free_pgd:
+	free_pages_exact(pgd, pgd_sz);
+	return ret;
+}
+
+int pkvm_create_hyp_vm(struct kvm *host_kvm)
+{
+	int ret = 0;
+
+	mutex_lock(&host_kvm->arch.pkvm.vm_lock);
+	if (!host_kvm->arch.pkvm.handle)
+		ret = __pkvm_create_hyp_vm(host_kvm);
+	mutex_unlock(&host_kvm->arch.pkvm.vm_lock);
+
+	return ret;
+}
+
+void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
+{
+	unsigned long idx, nr_vcpus = host_kvm->created_vcpus;
+	size_t pgd_sz, hyp_vm_sz;
+
+	if (host_kvm->arch.pkvm.handle)
+		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
+					  host_kvm->arch.pkvm.handle));
+
+	host_kvm->arch.pkvm.handle = 0;
+
+	for (idx = 0; idx < nr_vcpus; ++idx) {
+		void *hyp_vcpu = host_kvm->arch.pkvm.hyp_donations.vcpus[idx];
+
+		if (!hyp_vcpu)
+			break;
+
+		free_pages_exact(hyp_vcpu, PAGE_ALIGN(PKVM_HYP_VCPU_SIZE));
+	}
+
+	hyp_vm_sz = PAGE_ALIGN(size_add(PKVM_HYP_VM_SIZE,
+					size_mul(sizeof(void *), nr_vcpus)));
+	pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.vtcr);
+
+	free_pages_exact(host_kvm->arch.pkvm.hyp_donations.vm, hyp_vm_sz);
+	free_pages_exact(host_kvm->arch.pkvm.hyp_donations.pgd, pgd_sz);
+}
+
+int pkvm_init_host_vm(struct kvm *host_kvm)
+{
+	mutex_init(&host_kvm->arch.pkvm.vm_lock);
+	return 0;
+}
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (12 preceding siblings ...)
  2022-10-17 11:51 ` [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 Will Deacon
@ 2022-10-17 11:51 ` Will Deacon
  2022-10-18 11:06   ` Mark Rutland
  2022-10-17 11:51 ` [PATCH v4 15/25] KVM: arm64: Initialise hypervisor copies of host symbols unconditionally Will Deacon
                   ` (10 subsequent siblings)
  24 siblings, 1 reply; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Mapping pages in a guest page-table from within the pKVM hypervisor at
EL2 may require cache maintenance to ensure that the initialised page
contents are visible even to non-cacheable (e.g. MMU-off) accesses from
the guest.

In preparation for performing this maintenance at EL2, introduce a
per-vCPU fixmap which allows the pKVM hypervisor to map guest pages
temporarily into its stage-1 page-table for the purposes of cache
maintenance and, in future, poisoning on the reclaim path. The use of a
fixmap avoids the need for memory allocation or locking on the map()
path.
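
The resulting API is deliberately small. A sketch of the expected
pattern from an EL2 path (the function is hypothetical; real callers
arrive in later patches):

  /* Map a guest page, clean+invalidate it to the PoC, then unmap.
   * The slot is per-cpu, so this must run without migrating. */
  static void example_clean_guest_page(phys_addr_t phys)
  {
  	void *va = hyp_fixmap_map(phys);

  	dcache_clean_inval_poc((unsigned long)va,
  			       (unsigned long)va + PAGE_SIZE);
  	hyp_fixmap_unmap();
  }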

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h          | 14 +++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |  4 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  1 -
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 94 +++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/setup.c               |  4 +
 arch/arm64/kvm/hyp/pgtable.c                  | 12 ---
 7 files changed, 118 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4f6d79fe4352..b2a886c9e78d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -30,6 +30,8 @@ typedef u64 kvm_pte_t;
 #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
 #define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
 
+#define KVM_PHYS_INVALID		(-1ULL)
+
 static inline bool kvm_pte_valid(kvm_pte_t pte)
 {
 	return pte & KVM_PTE_VALID;
@@ -45,6 +47,18 @@ static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
 	return pa;
 }
 
+static inline kvm_pte_t kvm_phys_to_pte(u64 pa)
+{
+	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
+
+	if (PAGE_SHIFT == 16) {
+		pa &= GENMASK(51, 48);
+		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
+	}
+
+	return pte;
+}
+
 static inline u64 kvm_granule_shift(u32 level)
 {
 	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index ce9a796a85ee..ef31a1872c93 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -59,6 +59,8 @@ enum pkvm_component_id {
 	PKVM_ID_HYP,
 };
 
+extern unsigned long hyp_nr_cpus;
+
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
 int __pkvm_host_unshare_hyp(u64 pfn);
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
index b2ee6d5df55b..d5ec972b5c1e 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
@@ -13,6 +13,10 @@
 extern struct kvm_pgtable pkvm_pgtable;
 extern hyp_spinlock_t pkvm_pgd_lock;
 
+int hyp_create_pcpu_fixmap(void);
+void *hyp_fixmap_map(phys_addr_t phys);
+void hyp_fixmap_unmap(void);
+
 int hyp_create_idmap(u32 hyp_va_bits);
 int hyp_map_vectors(void);
 int hyp_back_vmemmap(phys_addr_t back);
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 2ef6aaa21ba5..1c38451050e5 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -21,7 +21,6 @@
 
 #define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_NOFWB | KVM_PGTABLE_S2_IDMAP)
 
-extern unsigned long hyp_nr_cpus;
 struct host_mmu host_mmu;
 
 static struct hyp_pool host_s2_pool;
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index d3a3b47181de..b77215630d5c 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -14,6 +14,7 @@
 #include <nvhe/early_alloc.h>
 #include <nvhe/gfp.h>
 #include <nvhe/memory.h>
+#include <nvhe/mem_protect.h>
 #include <nvhe/mm.h>
 #include <nvhe/spinlock.h>
 
@@ -25,6 +26,12 @@ unsigned int hyp_memblock_nr;
 
 static u64 __io_map_base;
 
+struct hyp_fixmap_slot {
+	u64 addr;
+	kvm_pte_t *ptep;
+};
+static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);
+
 static int __pkvm_create_mappings(unsigned long start, unsigned long size,
 				  unsigned long phys, enum kvm_pgtable_prot prot)
 {
@@ -212,6 +219,93 @@ int hyp_map_vectors(void)
 	return 0;
 }
 
+void *hyp_fixmap_map(phys_addr_t phys)
+{
+	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
+	kvm_pte_t pte, *ptep = slot->ptep;
+
+	pte = *ptep;
+	pte &= ~kvm_phys_to_pte(KVM_PHYS_INVALID);
+	pte |= kvm_phys_to_pte(phys) | KVM_PTE_VALID;
+	WRITE_ONCE(*ptep, pte);
+	dsb(nshst);
+
+	return (void *)slot->addr;
+}
+
+static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
+{
+	kvm_pte_t *ptep = slot->ptep;
+	u64 addr = slot->addr;
+
+	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
+	dsb(nshst);
+	__tlbi_level(vale2, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
+	dsb(nsh);
+	isb();
+}
+
+void hyp_fixmap_unmap(void)
+{
+	fixmap_clear_slot(this_cpu_ptr(&fixmap_slots));
+}
+
+static int __create_fixmap_slot_cb(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+				   enum kvm_pgtable_walk_flags flag,
+				   void * const arg)
+{
+	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)arg);
+
+	if (!kvm_pte_valid(*ptep) || level != KVM_PGTABLE_MAX_LEVELS - 1)
+		return -EINVAL;
+
+	slot->addr = addr;
+	slot->ptep = ptep;
+
+	/*
+	 * Clear the PTE, but keep the page-table page refcount elevated to
+	 * prevent it from ever being freed. This lets us manipulate the PTEs
+	 * by hand safely without ever needing to allocate memory.
+	 */
+	fixmap_clear_slot(slot);
+
+	return 0;
+}
+
+static int create_fixmap_slot(u64 addr, u64 cpu)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= __create_fixmap_slot_cb,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.arg = (void *)cpu,
+	};
+
+	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
+}
+
+int hyp_create_pcpu_fixmap(void)
+{
+	unsigned long addr, i;
+	int ret;
+
+	for (i = 0; i < hyp_nr_cpus; i++) {
+		ret = pkvm_alloc_private_va_range(PAGE_SIZE, &addr);
+		if (ret)
+			return ret;
+
+		ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PAGE_SIZE,
+					  __hyp_pa(__hyp_bss_start), PAGE_HYP);
+		if (ret)
+			return ret;
+
+		ret = create_fixmap_slot(addr, i);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
 int hyp_create_idmap(u32 hyp_va_bits)
 {
 	unsigned long start, end;
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 2be72fbe7279..0f69c1393416 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -321,6 +321,10 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	ret = hyp_create_pcpu_fixmap();
+	if (ret)
+		goto out;
+
 	pkvm_hyp_vm_table_init(vm_table_base);
 out:
 	/*
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index a1a27f88a312..2bcb2d5903ba 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -57,8 +57,6 @@ struct kvm_pgtable_walk_data {
 	u64				end;
 };
 
-#define KVM_PHYS_INVALID (-1ULL)
-
 static bool kvm_phys_is_valid(u64 phys)
 {
 	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
@@ -122,16 +120,6 @@ static bool kvm_pte_table(kvm_pte_t pte, u32 level)
 	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
 }
 
-static kvm_pte_t kvm_phys_to_pte(u64 pa)
-{
-	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
-
-	if (PAGE_SHIFT == 16)
-		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
-
-	return pte;
-}
-
 static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
 {
 	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 15/25] KVM: arm64: Initialise hypervisor copies of host symbols unconditionally
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (13 preceding siblings ...)
  2022-10-17 11:51 ` [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2 Will Deacon
@ 2022-10-17 11:51 ` Will Deacon
  2022-10-17 20:26   ` Philippe Mathieu-Daudé
  2022-10-17 11:52 ` [PATCH v4 16/25] KVM: arm64: Provide I-cache invalidation by virtual address at EL2 Will Deacon
                   ` (9 subsequent siblings)
  24 siblings, 1 reply; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:51 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

The nVHE object at EL2 maintains its own copies of some host variables
so that, when pKVM is enabled, the host cannot directly modify the
hypervisor state. When running in normal nVHE mode, however, these
variables are still mirrored at EL2 but are not initialised.

Initialise the hypervisor symbols from the host copies regardless of
pKVM, ensuring that any reference to this data at EL2 with normal nVHE
will return a sensibly initialised value.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/arm.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 30d6fc5d3a93..584626e11797 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1884,11 +1884,8 @@ static int do_pkvm_init(u32 hyp_va_bits)
 	return ret;
 }
 
-static int kvm_hyp_init_protection(u32 hyp_va_bits)
+static void kvm_hyp_init_symbols(void)
 {
-	void *addr = phys_to_virt(hyp_mem_base);
-	int ret;
-
 	kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
 	kvm_nvhe_sym(id_aa64pfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
 	kvm_nvhe_sym(id_aa64isar0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR0_EL1);
@@ -1897,6 +1894,12 @@ static int kvm_hyp_init_protection(u32 hyp_va_bits)
 	kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+}
+
+static int kvm_hyp_init_protection(u32 hyp_va_bits)
+{
+	void *addr = phys_to_virt(hyp_mem_base);
+	int ret;
 
 	ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
 	if (ret)
@@ -2071,6 +2074,8 @@ static int init_hyp_mode(void)
 		cpu_prepare_hyp_mode(cpu);
 	}
 
+	kvm_hyp_init_symbols();
+
 	if (is_protected_kvm_enabled()) {
 		init_cpu_logical_map();
 
@@ -2078,9 +2083,7 @@ static int init_hyp_mode(void)
 			err = -ENODEV;
 			goto out_err;
 		}
-	}
 
-	if (is_protected_kvm_enabled()) {
 		err = kvm_hyp_init_protection(hyp_va_bits);
 		if (err) {
 			kvm_err("Failed to init hyp memory protection\n");
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 16/25] KVM: arm64: Provide I-cache invalidation by virtual address at EL2
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (14 preceding siblings ...)
  2022-10-17 11:51 ` [PATCH v4 15/25] KVM: arm64: Initialise hypervisor copies of host symbols unconditionally Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 17/25] KVM: arm64: Add generic hyp_memcache helpers Will Deacon
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

In preparation for handling cache maintenance of guest pages from within
the pKVM hypervisor at EL2, introduce an EL2 copy of icache_inval_pou()
which will later be plumbed into the stage-2 page-table cache
maintenance callbacks, ensuring that the initial contents of pages
mapped as executable into the guest stage-2 page-table are visible to the
instruction fetcher.
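
As at EL1, the routine takes a [start, end) virtual address range; a
hypothetical EL2 caller would look like:

  /* Make a freshly-written executable page visible to the I-side. */
  static void example_sync_icache(void *va)
  {
  	icache_inval_pou((unsigned long)va, (unsigned long)va + PAGE_SIZE);
  }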

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h |  1 +
 arch/arm64/kernel/image-vars.h   |  3 ---
 arch/arm64/kvm/arm.c             |  1 +
 arch/arm64/kvm/hyp/nvhe/cache.S  | 11 +++++++++++
 arch/arm64/kvm/hyp/nvhe/pkvm.c   |  3 +++
 5 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index aa7fa2a08f06..fd99cf09972d 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -123,4 +123,5 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
+extern unsigned long kvm_nvhe_sym(__icache_flags);
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 8151412653de..7f4e43bfaade 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -71,9 +71,6 @@ KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
 /* Vectors installed by hyp-init on reset HVC. */
 KVM_NVHE_ALIAS(__hyp_stub_vectors);
 
-/* Kernel symbol used by icache_is_vpipt(). */
-KVM_NVHE_ALIAS(__icache_flags);
-
 /* VMID bits set by the KVM VMID allocator */
 KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 584626e11797..d99e93e6ddf7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1894,6 +1894,7 @@ static void kvm_hyp_init_symbols(void)
 	kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
+	kvm_nvhe_sym(__icache_flags) = __icache_flags;
 }
 
 static int kvm_hyp_init_protection(u32 hyp_va_bits)
diff --git a/arch/arm64/kvm/hyp/nvhe/cache.S b/arch/arm64/kvm/hyp/nvhe/cache.S
index 0c367eb5f4e2..85936c17ae40 100644
--- a/arch/arm64/kvm/hyp/nvhe/cache.S
+++ b/arch/arm64/kvm/hyp/nvhe/cache.S
@@ -12,3 +12,14 @@ SYM_FUNC_START(__pi_dcache_clean_inval_poc)
 	ret
 SYM_FUNC_END(__pi_dcache_clean_inval_poc)
 SYM_FUNC_ALIAS(dcache_clean_inval_poc, __pi_dcache_clean_inval_poc)
+
+SYM_FUNC_START(__pi_icache_inval_pou)
+alternative_if ARM64_HAS_CACHE_DIC
+	isb
+	ret
+alternative_else_nop_endif
+
+	invalidate_icache_by_line x0, x1, x2, x3
+	ret
+SYM_FUNC_END(__pi_icache_inval_pou)
+SYM_FUNC_ALIAS(icache_inval_pou, __pi_icache_inval_pou)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 5970fd1adbfb..2156ce546d90 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -12,6 +12,9 @@
 #include <nvhe/pkvm.h>
 #include <nvhe/trap_handler.h>
 
+/* Used by icache_is_vpipt(). */
+unsigned long __icache_flags;
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 17/25] KVM: arm64: Add generic hyp_memcache helpers
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (15 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 16/25] KVM: arm64: Provide I-cache invalidation by virtual address at EL2 Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 18/25] KVM: arm64: Consolidate stage-2 initialisation into a single function Will Deacon
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

The host at EL1 and the pKVM hypervisor at EL2 will soon need to
exchange memory pages dynamically for creating and destroying VM state.

Indeed, the hypervisor will rely on the host to donate memory pages it
can use to create guest stage-2 page-tables and to store VM and vCPU
metadata. In order to ease this process, introduce a
'struct kvm_hyp_memcache' which is essentially a linked list of
available pages, indexed by physical addresses so that it can be passed
meaningfully between the different virtual address spaces configured at
EL1 and EL2.
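
To illustrate the layout, here is a stand-alone user-space sketch of
the push/pop operations (not kernel code: the virt/phys conversion is
an identity mapping, the 4KiB page size is assumed and error handling
is elided):

#include <stdint.h>
#include <stdlib.h>

struct memcache {
	uintptr_t head;		/* "physical" address of the first page */
	unsigned long nr_pages;
};

/* Identity conversions, standing in for the real linear-map helpers. */
static uintptr_t to_pa(void *va) { return (uintptr_t)va; }
static void *to_va(uintptr_t pa) { return (void *)pa; }

static void mc_push(struct memcache *mc, void *page)
{
	uintptr_t *p = page;

	*p = mc->head;		/* each page links to its successor by PA */
	mc->head = to_pa(p);
	mc->nr_pages++;
}

static void *mc_pop(struct memcache *mc)
{
	uintptr_t *p;

	if (!mc->nr_pages)
		return NULL;

	p = to_va(mc->head);
	mc->head = *p;
	mc->nr_pages--;
	return p;
}

int main(void)
{
	struct memcache mc = { 0, 0 };
	void *page;

	mc_push(&mc, aligned_alloc(4096, 4096));
	mc_push(&mc, aligned_alloc(4096, 4096));

	while ((page = mc_pop(&mc)))
		free(page);

	return 0;
}

Because only { head, nr_pages } cross the EL1/EL2 boundary, neither
side ever needs to interpret the other's virtual addresses.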

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             | 57 +++++++++++++++++++
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
 arch/arm64/kvm/hyp/nvhe/mm.c                  | 33 +++++++++++
 arch/arm64/kvm/mmu.c                          | 26 +++++++++
 4 files changed, 118 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f09b422ba1fe..2ec268cc2031 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -73,6 +73,63 @@ u32 __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu);
 
+struct kvm_hyp_memcache {
+	phys_addr_t head;
+	unsigned long nr_pages;
+};
+
+static inline void push_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     phys_addr_t *p,
+				     phys_addr_t (*to_pa)(void *virt))
+{
+	*p = mc->head;
+	mc->head = to_pa(p);
+	mc->nr_pages++;
+}
+
+static inline void *pop_hyp_memcache(struct kvm_hyp_memcache *mc,
+				     void *(*to_va)(phys_addr_t phys))
+{
+	phys_addr_t *p = to_va(mc->head);
+
+	if (!mc->nr_pages)
+		return NULL;
+
+	mc->head = *p;
+	mc->nr_pages--;
+
+	return p;
+}
+
+static inline int __topup_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       unsigned long min_pages,
+				       void *(*alloc_fn)(void *arg),
+				       phys_addr_t (*to_pa)(void *virt),
+				       void *arg)
+{
+	while (mc->nr_pages < min_pages) {
+		phys_addr_t *p = alloc_fn(arg);
+
+		if (!p)
+			return -ENOMEM;
+		push_hyp_memcache(mc, p, to_pa);
+	}
+
+	return 0;
+}
+
+static inline void __free_hyp_memcache(struct kvm_hyp_memcache *mc,
+				       void (*free_fn)(void *virt, void *arg),
+				       void *(*to_va)(phys_addr_t phys),
+				       void *arg)
+{
+	while (mc->nr_pages)
+		free_fn(pop_hyp_memcache(mc, to_va), arg);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc);
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages);
+
 struct kvm_vmid {
 	atomic64_t id;
 };
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index ef31a1872c93..420b87e755a4 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -77,6 +77,8 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
 void reclaim_guest_pages(struct pkvm_hyp_vm *vm);
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc);
 
 static __always_inline void __load_host_stage2(void)
 {
diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
index b77215630d5c..c73f87f13e29 100644
--- a/arch/arm64/kvm/hyp/nvhe/mm.c
+++ b/arch/arm64/kvm/hyp/nvhe/mm.c
@@ -330,3 +330,36 @@ int hyp_create_idmap(u32 hyp_va_bits)
 
 	return __pkvm_create_mappings(start, end - start, start, PAGE_HYP_EXEC);
 }
+
+static void *admit_host_page(void *arg)
+{
+	struct kvm_hyp_memcache *host_mc = arg;
+
+	if (!host_mc->nr_pages)
+		return NULL;
+
+	/*
+	 * The host still owns the pages in its memcache, so we need to go
+	 * through a full host-to-hyp donation cycle to change it. Fortunately,
+	 * __pkvm_host_donate_hyp() takes care of races for us, so if it
+	 * succeeds we're good to go.
+	 */
+	if (__pkvm_host_donate_hyp(hyp_phys_to_pfn(host_mc->head), 1))
+		return NULL;
+
+	return pop_hyp_memcache(host_mc, hyp_phys_to_virt);
+}
+
+/* Refill our local memcache by popping pages from the one provided by the host. */
+int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
+		    struct kvm_hyp_memcache *host_mc)
+{
+	struct kvm_hyp_memcache tmp = *host_mc;
+	int ret;
+
+	ret =  __topup_hyp_memcache(mc, min_pages, admit_host_page,
+				    hyp_virt_to_phys, &tmp);
+	*host_mc = tmp;
+
+	return ret;
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 34c5feed9dc1..8b52566d1cb9 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -800,6 +800,32 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 }
 
+static void hyp_mc_free_fn(void *addr, void *unused)
+{
+	free_page((unsigned long)addr);
+}
+
+static void *hyp_mc_alloc_fn(void *unused)
+{
+	return (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+}
+
+void free_hyp_memcache(struct kvm_hyp_memcache *mc)
+{
+	if (is_protected_kvm_enabled())
+		__free_hyp_memcache(mc, hyp_mc_free_fn,
+				    kvm_host_va, NULL);
+}
+
+int topup_hyp_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages)
+{
+	if (!is_protected_kvm_enabled())
+		return 0;
+
+	return __topup_hyp_memcache(mc, min_pages, hyp_mc_alloc_fn,
+				    kvm_host_pa, NULL);
+}
+
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 18/25] KVM: arm64: Consolidate stage-2 initialisation into a single function
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (16 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 17/25] KVM: arm64: Add generic hyp_memcache helpers Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 19/25] KVM: arm64: Instantiate guest stage-2 page-tables at EL2 Will Deacon
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

The initialisation of guest stage-2 page-tables is currently split
across two functions: kvm_init_stage2_mmu() and kvm_arm_setup_stage2().
That is presumably for historical reasons as kvm_arm_setup_stage2()
originates from the (now defunct) KVM port for 32-bit Arm.

Simplify this code path by merging both functions into one, taking care
to map the 'struct kvm' into the hypervisor stage-1 early on in order to
simplify the failure path.
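
The failure path then follows the usual acquire-in-order,
release-in-reverse shape, roughly as below (a hedged stand-alone sketch
with stub functions, not the actual KVM APIs):

#include <errno.h>

static int  share_kvm(void)     { return 0; }	/* kvm_share_hyp() stand-in */
static void unshare_kvm(void)   { }
static int  alloc_cpumask(void) { return 0; }
static void free_cpumask(void)  { }
static int  init_mmu(void)      { return -EINVAL; }	/* fails for demo */

static int init_vm_sketch(void)
{
	int ret;

	ret = share_kvm();
	if (ret)
		return ret;

	ret = alloc_cpumask();
	if (ret)
		goto err_unshare;

	ret = init_mmu();	/* now also validates the VM type */
	if (ret)
		goto err_free_cpumask;

	return 0;

err_free_cpumask:
	free_cpumask();
err_unshare:
	unshare_kvm();
	return ret;
}

int main(void)
{
	return init_vm_sketch() ? 1 : 0;
}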

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_arm.h  |  2 +-
 arch/arm64/include/asm/kvm_host.h |  2 --
 arch/arm64/include/asm/kvm_mmu.h  |  2 +-
 arch/arm64/kvm/arm.c              | 27 +++++++++++++--------------
 arch/arm64/kvm/mmu.c              | 27 ++++++++++++++++++++++++++-
 arch/arm64/kvm/reset.c            | 29 -----------------------------
 6 files changed, 41 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8aa8492dafc0..89e63585dae4 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -135,7 +135,7 @@
  * 40 bits wide (T0SZ = 24).  Systems with a PARange smaller than 40 bits are
  * not known to exist and will break with this configuration.
  *
- * The VTCR_EL2 is configured per VM and is initialised in kvm_arm_setup_stage2().
+ * The VTCR_EL2 is configured per VM and is initialised in kvm_init_stage2_mmu.
  *
  * Note that when using 4K pages, we concatenate two first level page tables
  * together. With 16K pages, we concatenate 16 first level page tables.
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2ec268cc2031..3fa973237b90 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -991,8 +991,6 @@ int kvm_set_ipa_limit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 struct kvm *kvm_arch_alloc_vm(void);
 
-int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
-
 static inline bool kvm_vm_is_protected(struct kvm *kvm)
 {
 	return false;
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 7784081088e7..e4a7e6369499 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -166,7 +166,7 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
 void free_hyp_pgds(void);
 
 void stage2_unmap_vm(struct kvm *kvm);
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu);
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type);
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index d99e93e6ddf7..f78eefa02f6b 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -139,28 +139,24 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 {
 	int ret;
 
-	ret = kvm_arm_setup_stage2(kvm, type);
-	if (ret)
-		return ret;
-
-	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu);
-	if (ret)
-		return ret;
-
 	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
-		goto out_free_stage2_pgd;
+		return ret;
 
 	ret = pkvm_init_host_vm(kvm);
 	if (ret)
-		goto out_free_stage2_pgd;
+		goto err_unshare_kvm;
 
 	if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL)) {
 		ret = -ENOMEM;
-		goto out_free_stage2_pgd;
+		goto err_unshare_kvm;
 	}
 	cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask);
 
+	ret = kvm_init_stage2_mmu(kvm, &kvm->arch.mmu, type);
+	if (ret)
+		goto err_free_cpumask;
+
 	kvm_vgic_early_init(kvm);
 
 	/* The maximum number of VCPUs is limited by the host's GIC model */
@@ -169,9 +165,12 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	set_default_spectre(kvm);
 	kvm_arm_init_hypercalls(kvm);
 
-	return ret;
-out_free_stage2_pgd:
-	kvm_free_stage2_pgd(&kvm->arch.mmu);
+	return 0;
+
+err_free_cpumask:
+	free_cpumask_var(kvm->arch.supported_cpus);
+err_unshare_kvm:
+	kvm_unshare_hyp(kvm, kvm + 1);
 	return ret;
 }
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8b52566d1cb9..43761d31f763 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -668,15 +668,40 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
  * kvm_init_stage2_mmu - Initialise a S2 MMU structure
  * @kvm:	The pointer to the KVM structure
  * @mmu:	The pointer to the s2 MMU structure
+ * @type:	The machine type of the virtual machine
  *
  * Allocates only the stage-2 HW PGD level table(s).
  * Note we don't need locking here as this is only called when the VM is
  * created, which can only be done once.
  */
-int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
+int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long type)
 {
+	u32 kvm_ipa_limit = get_kvm_ipa_limit();
 	int cpu, err;
 	struct kvm_pgtable *pgt;
+	u64 mmfr0, mmfr1;
+	u32 phys_shift;
+
+	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+		return -EINVAL;
+
+	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
+	if (phys_shift) {
+		if (phys_shift > kvm_ipa_limit ||
+		    phys_shift < ARM64_MIN_PARANGE_BITS)
+			return -EINVAL;
+	} else {
+		phys_shift = KVM_PHYS_SHIFT;
+		if (phys_shift > kvm_ipa_limit) {
+			pr_warn_once("%s using unsupported default IPA limit, upgrade your VMM\n",
+				     current->comm);
+			return -EINVAL;
+		}
+	}
+
+	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+	kvm->arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
 
 	if (mmu->pgt != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 5ae18472205a..e0267f672b8a 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -395,32 +395,3 @@ int kvm_set_ipa_limit(void)
 
 	return 0;
 }
-
-int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type)
-{
-	u64 mmfr0, mmfr1;
-	u32 phys_shift;
-
-	if (type & ~KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
-		return -EINVAL;
-
-	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
-	if (phys_shift) {
-		if (phys_shift > kvm_ipa_limit ||
-		    phys_shift < ARM64_MIN_PARANGE_BITS)
-			return -EINVAL;
-	} else {
-		phys_shift = KVM_PHYS_SHIFT;
-		if (phys_shift > kvm_ipa_limit) {
-			pr_warn_once("%s using unsupported default IPA limit, upgrade your VMM\n",
-				     current->comm);
-			return -EINVAL;
-		}
-	}
-
-	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
-	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
-	kvm->arch.vtcr = kvm_get_vtcr(mmfr0, mmfr1, phys_shift);
-
-	return 0;
-}
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 19/25] KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (17 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 18/25] KVM: arm64: Consolidate stage-2 initialisation into a single function Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 20/25] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache Will Deacon
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Extend the initialisation of guest data structures within the pKVM
hypervisor at EL2 so that we instantiate a memory pool and a full
'struct kvm_s2_mmu' structure for each VM, with a stage-2 page-table
entirely independent from the one managed by the host at EL1.

The 'struct kvm_pgtable_mm_ops' used by the page-table code is populated
with a set of callbacks that can manage guest pages in the hypervisor
without any direct intervention from the host, allocating page-table
pages from the provided pool and returning these to the host on VM
teardown. To keep things simple, the stage-2 MMU for the guest is
configured identically to the host stage-2 in the VTCR register and so
the IPA size of the guest must match the PA size of the host.

For now, the new page-table is unused as there is no way for the host
to map anything into it. Yet.
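
The mm_ops structure is effectively a vtable: the generic page-table
walker is written purely against it, so the same code can run at EL1
and EL2 with different allocators behind it. A minimal stand-alone
sketch of the pattern (the struct and the heap-backed implementation
are invented for illustration, not the kernel's definitions):

#include <stdint.h>
#include <stdlib.h>

struct mm_ops {
	void *(*zalloc_page)(void *cookie);
	void (*put_page)(void *addr);
	uintptr_t (*virt_to_phys)(void *va);
};

/* A trivial "host-like" backend using the C heap. */
static void *heap_zalloc(void *cookie) { (void)cookie; return calloc(1, 4096); }
static void heap_put(void *addr) { free(addr); }
static uintptr_t ident_v2p(void *va) { return (uintptr_t)va; }

static const struct mm_ops host_like_ops = {
	.zalloc_page	= heap_zalloc,
	.put_page	= heap_put,
	.virt_to_phys	= ident_v2p,
};

/* "Generic" code only ever sees the ops pointer. */
static uintptr_t alloc_table_page(const struct mm_ops *ops)
{
	void *pg = ops->zalloc_page(NULL);

	return pg ? ops->virt_to_phys(pg) : 0;
}

int main(void)
{
	uintptr_t pa = alloc_table_page(&host_like_ops);

	if (!pa)
		return 1;

	host_like_ops.put_page((void *)pa);	/* identity map: PA == VA here */
	return 0;
}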

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |   6 ++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c  | 125 ++++++++++++++++++++++++-
 arch/arm64/kvm/mmu.c                   |   4 +-
 3 files changed, 132 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index d176399dbfb5..5d456438445c 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -9,6 +9,9 @@
 
 #include <asm/kvm_pkvm.h>
 
+#include <nvhe/gfp.h>
+#include <nvhe/spinlock.h>
+
 /*
  * Holds the relevant data for maintaining the vcpu state completely at hyp.
  */
@@ -36,6 +39,9 @@ struct pkvm_hyp_vm {
 
 	/* The guest's stage-2 page-table managed by the hypervisor. */
 	struct kvm_pgtable pgt;
+	struct kvm_pgtable_mm_ops mm_ops;
+	struct hyp_pool pool;
+	hyp_spinlock_t lock;
 
 	/*
 	 * The number of vcpus initialized and ready to run.
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1c38451050e5..27b16a6b85bb 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -25,6 +25,21 @@ struct host_mmu host_mmu;
 
 static struct hyp_pool host_s2_pool;
 
+static DEFINE_PER_CPU(struct pkvm_hyp_vm *, __current_vm);
+#define current_vm (*this_cpu_ptr(&__current_vm))
+
+static void guest_lock_component(struct pkvm_hyp_vm *vm)
+{
+	hyp_spin_lock(&vm->lock);
+	current_vm = vm;
+}
+
+static void guest_unlock_component(struct pkvm_hyp_vm *vm)
+{
+	current_vm = NULL;
+	hyp_spin_unlock(&vm->lock);
+}
+
 static void host_lock_component(void)
 {
 	hyp_spin_lock(&host_mmu.lock);
@@ -140,18 +155,124 @@ int kvm_host_prepare_stage2(void *pgt_pool_base)
 	return 0;
 }
 
+static bool guest_stage2_force_pte_cb(u64 addr, u64 end,
+				      enum kvm_pgtable_prot prot)
+{
+	return true;
+}
+
+static void *guest_s2_zalloc_pages_exact(size_t size)
+{
+	void *addr = hyp_alloc_pages(&current_vm->pool, get_order(size));
+
+	WARN_ON(size != (PAGE_SIZE << get_order(size)));
+	hyp_split_page(hyp_virt_to_page(addr));
+
+	return addr;
+}
+
+static void guest_s2_free_pages_exact(void *addr, unsigned long size)
+{
+	u8 order = get_order(size);
+	unsigned int i;
+
+	for (i = 0; i < (1 << order); i++)
+		hyp_put_page(&current_vm->pool, addr + (i * PAGE_SIZE));
+}
+
+static void *guest_s2_zalloc_page(void *mc)
+{
+	struct hyp_page *p;
+	void *addr;
+
+	addr = hyp_alloc_pages(&current_vm->pool, 0);
+	if (addr)
+		return addr;
+
+	addr = pop_hyp_memcache(mc, hyp_phys_to_virt);
+	if (!addr)
+		return addr;
+
+	memset(addr, 0, PAGE_SIZE);
+	p = hyp_virt_to_page(addr);
+	memset(p, 0, sizeof(*p));
+	p->refcount = 1;
+
+	return addr;
+}
+
+static void guest_s2_get_page(void *addr)
+{
+	hyp_get_page(&current_vm->pool, addr);
+}
+
+static void guest_s2_put_page(void *addr)
+{
+	hyp_put_page(&current_vm->pool, addr);
+}
+
+static void clean_dcache_guest_page(void *va, size_t size)
+{
+	__clean_dcache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
+static void invalidate_icache_guest_page(void *va, size_t size)
+{
+	__invalidate_icache_guest_page(hyp_fixmap_map(__hyp_pa(va)), size);
+	hyp_fixmap_unmap();
+}
+
 int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
 {
-	vm->pgt.pgd = pgd;
+	struct kvm_s2_mmu *mmu = &vm->kvm.arch.mmu;
+	unsigned long nr_pages;
+	int ret;
+
+	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	ret = hyp_pool_init(&vm->pool, hyp_virt_to_pfn(pgd), nr_pages, 0);
+	if (ret)
+		return ret;
+
+	hyp_spin_lock_init(&vm->lock);
+	vm->mm_ops = (struct kvm_pgtable_mm_ops) {
+		.zalloc_pages_exact	= guest_s2_zalloc_pages_exact,
+		.free_pages_exact	= guest_s2_free_pages_exact,
+		.zalloc_page		= guest_s2_zalloc_page,
+		.phys_to_virt		= hyp_phys_to_virt,
+		.virt_to_phys		= hyp_virt_to_phys,
+		.page_count		= hyp_page_count,
+		.get_page		= guest_s2_get_page,
+		.put_page		= guest_s2_put_page,
+		.dcache_clean_inval_poc	= clean_dcache_guest_page,
+		.icache_inval_pou	= invalidate_icache_guest_page,
+	};
+
+	guest_lock_component(vm);
+	ret = __kvm_pgtable_stage2_init(mmu->pgt, mmu, &vm->mm_ops, 0,
+					guest_stage2_force_pte_cb);
+	guest_unlock_component(vm);
+	if (ret)
+		return ret;
+
+	vm->kvm.arch.mmu.pgd_phys = __hyp_pa(vm->pgt.pgd);
+
 	return 0;
 }
 
 void reclaim_guest_pages(struct pkvm_hyp_vm *vm)
 {
+	void *pgd = vm->pgt.pgd;
 	unsigned long nr_pages;
 
 	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
-	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(vm->pgt.pgd), nr_pages));
+
+	guest_lock_component(vm);
+	kvm_pgtable_stage2_destroy(&vm->pgt);
+	vm->kvm.arch.mmu.pgd_phys = 0ULL;
+	guest_unlock_component(vm);
+
+	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(pgd), nr_pages));
 }
 
 int __pkvm_prot_finalize(void)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 43761d31f763..301fc7275062 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -686,7 +686,9 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 		return -EINVAL;
 
 	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
-	if (phys_shift) {
+	if (is_protected_kvm_enabled()) {
+		phys_shift = kvm_ipa_limit;
+	} else if (phys_shift) {
 		if (phys_shift > kvm_ipa_limit ||
 		    phys_shift < ARM64_MIN_PARANGE_BITS)
 			return -EINVAL;
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 20/25] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (18 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 19/25] KVM: arm64: Instantiate guest stage-2 page-tables at EL2 Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-19 15:52   ` Quentin Perret
  2022-10-17 11:52 ` [PATCH v4 21/25] KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host Will Deacon
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Rather than relying on the host to free the previously-donated pKVM
hypervisor VM pages explicitly on teardown, introduce a dedicated
teardown memcache which allows the host to reclaim guest memory
resources without having to keep track of all of the allocations made by
the pKVM hypervisor at EL2.
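
In outline, EL2 drains every page it still holds into a single
memcache, and the host then reclaims the lot in one walk. A
stand-alone sketch of that flow (an array-backed pool and memcache
stand in for hyp_alloc_pages() and the real PA-linked list):

#include <stdio.h>

#define NR_PAGES 4

static char backing[NR_PAGES][4096];	/* stand-in for donated pages */
static void *pool[NR_PAGES];
static int pool_top;

static void *pool_pop(void)		/* hyp_alloc_pages() stand-in */
{
	return pool_top ? pool[--pool_top] : NULL;
}

struct memcache { void *pages[NR_PAGES]; int nr; };

static void mc_push(struct memcache *mc, void *p) { mc->pages[mc->nr++] = p; }
static void *mc_pop(struct memcache *mc)
{
	return mc->nr ? mc->pages[--mc->nr] : NULL;
}

int main(void)
{
	struct memcache teardown = { .nr = 0 };
	void *addr;

	for (pool_top = 0; pool_top < NR_PAGES; pool_top++)
		pool[pool_top] = backing[pool_top];

	/* EL2 side: drain whatever the VM's pool still holds. */
	while ((addr = pool_pop()))
		mc_push(&teardown, addr);

	/* Host side: reclaim everything with a single walk. */
	while ((addr = mc_pop(&teardown)))
		printf("reclaimed %p\n", addr);

	return 0;
}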

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h             |  7 +----
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +-
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 17 ++++++----
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 20 ++++++++++--
 arch/arm64/kvm/pkvm.c                         | 31 ++++---------------
 5 files changed, 36 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3fa973237b90..22a07823b0a9 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -177,12 +177,7 @@ typedef unsigned int pkvm_handle_t;
 struct kvm_protected_vm {
 	pkvm_handle_t handle;
 	struct mutex vm_lock;
-
-	struct {
-		void *pgd;
-		void *vm;
-		void *vcpus[KVM_MAX_VCPUS];
-	} hyp_donations;
+	struct kvm_hyp_memcache teardown_mc;
 };
 
 struct kvm_arch {
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 420b87e755a4..b7bdbe63deed 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -76,7 +76,7 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt);
 
 int hyp_pin_shared_mem(void *from, void *to);
 void hyp_unpin_shared_mem(void *from, void *to);
-void reclaim_guest_pages(struct pkvm_hyp_vm *vm);
+void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc);
 int refill_memcache(struct kvm_hyp_memcache *mc, unsigned long min_pages,
 		    struct kvm_hyp_memcache *host_mc);
 
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 27b16a6b85bb..ffa56a89acdb 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -260,19 +260,24 @@ int kvm_guest_prepare_stage2(struct pkvm_hyp_vm *vm, void *pgd)
 	return 0;
 }
 
-void reclaim_guest_pages(struct pkvm_hyp_vm *vm)
+void reclaim_guest_pages(struct pkvm_hyp_vm *vm, struct kvm_hyp_memcache *mc)
 {
-	void *pgd = vm->pgt.pgd;
-	unsigned long nr_pages;
-
-	nr_pages = kvm_pgtable_stage2_pgd_size(vm->kvm.arch.vtcr) >> PAGE_SHIFT;
+	void *addr;
 
+	/* Dump all pgtable pages in the hyp_pool */
 	guest_lock_component(vm);
 	kvm_pgtable_stage2_destroy(&vm->pgt);
 	vm->kvm.arch.mmu.pgd_phys = 0ULL;
 	guest_unlock_component(vm);
 
-	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(pgd), nr_pages));
+	/* Drain the hyp_pool into the memcache */
+	addr = hyp_alloc_pages(&vm->pool, 0);
+	while (addr) {
+		memset(hyp_virt_to_page(addr), 0, sizeof(struct hyp_page));
+		push_hyp_memcache(mc, addr, hyp_virt_to_phys);
+		WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(addr), 1));
+		addr = hyp_alloc_pages(&vm->pool, 0);
+	}
 }
 
 int __pkvm_prot_finalize(void)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 2156ce546d90..f5f300cc5864 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -539,8 +539,21 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 	return ret;
 }
 
+static void
+teardown_donated_memory(struct kvm_hyp_memcache *mc, void *addr, size_t size)
+{
+	size = PAGE_ALIGN(size);
+	memset(addr, 0, size);
+
+	for (void *start = addr; start < addr + size; start += PAGE_SIZE)
+		push_hyp_memcache(mc, start, hyp_virt_to_phys);
+
+	unmap_donated_memory_noclear(addr, size);
+}
+
 int __pkvm_teardown_vm(pkvm_handle_t handle)
 {
+	struct kvm_hyp_memcache *mc;
 	struct pkvm_hyp_vm *hyp_vm;
 	unsigned int idx;
 	int err;
@@ -563,7 +576,8 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
 	hyp_spin_unlock(&vm_table_lock);
 
 	/* Reclaim guest pages (including page-table pages) */
-	reclaim_guest_pages(hyp_vm);
+	mc = &hyp_vm->host_kvm->arch.pkvm.teardown_mc;
+	reclaim_guest_pages(hyp_vm, mc);
 	unpin_host_vcpus(hyp_vm->vcpus, hyp_vm->nr_vcpus);
 
 	/* Push the metadata pages to the teardown memcache */
@@ -572,10 +586,10 @@ int __pkvm_teardown_vm(pkvm_handle_t handle)
 	for (idx = 0; idx < hyp_vm->nr_vcpus; ++idx) {
 		struct pkvm_hyp_vcpu *hyp_vcpu = hyp_vm->vcpus[idx];
 
-		unmap_donated_memory(hyp_vcpu, sizeof(*hyp_vcpu));
+		teardown_donated_memory(mc, hyp_vcpu, sizeof(*hyp_vcpu));
 	}
 
-	unmap_donated_memory(hyp_vm, hyp_vm->donated_memory_size);
+	teardown_donated_memory(mc, hyp_vm, hyp_vm->donated_memory_size);
 	return 0;
 
 err_unlock:
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 754632a608e3..a9953db08592 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -147,8 +147,6 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
 	handle = ret;
 
 	host_kvm->arch.pkvm.handle = handle;
-	host_kvm->arch.pkvm.hyp_donations.pgd = pgd;
-	host_kvm->arch.pkvm.hyp_donations.vm = hyp_vm;
 
 	/* Donate memory for the vcpus at hyp and initialize it. */
 	hyp_vcpu_sz = PAGE_ALIGN(PKVM_HYP_VCPU_SIZE);
@@ -167,12 +165,12 @@ static int __pkvm_create_hyp_vm(struct kvm *host_kvm)
 			goto destroy_vm;
 		}
 
-		host_kvm->arch.pkvm.hyp_donations.vcpus[idx] = hyp_vcpu;
-
 		ret = kvm_call_hyp_nvhe(__pkvm_init_vcpu, handle, host_vcpu,
 					hyp_vcpu);
-		if (ret)
+		if (ret) {
+			free_pages_exact(hyp_vcpu, hyp_vcpu_sz);
 			goto destroy_vm;
+		}
 	}
 
 	return 0;
@@ -201,30 +199,13 @@ int pkvm_create_hyp_vm(struct kvm *host_kvm)
 
 void pkvm_destroy_hyp_vm(struct kvm *host_kvm)
 {
-	unsigned long idx, nr_vcpus = host_kvm->created_vcpus;
-	size_t pgd_sz, hyp_vm_sz;
-
-	if (host_kvm->arch.pkvm.handle)
+	if (host_kvm->arch.pkvm.handle) {
 		WARN_ON(kvm_call_hyp_nvhe(__pkvm_teardown_vm,
 					  host_kvm->arch.pkvm.handle));
-
-	host_kvm->arch.pkvm.handle = 0;
-
-	for (idx = 0; idx < nr_vcpus; ++idx) {
-		void *hyp_vcpu = host_kvm->arch.pkvm.hyp_donations.vcpus[idx];
-
-		if (!hyp_vcpu)
-			break;
-
-		free_pages_exact(hyp_vcpu, PAGE_ALIGN(PKVM_HYP_VCPU_SIZE));
 	}
 
-	hyp_vm_sz = PAGE_ALIGN(size_add(PKVM_HYP_VM_SIZE,
-					size_mul(sizeof(void *), nr_vcpus)));
-	pgd_sz = kvm_pgtable_stage2_pgd_size(host_kvm->arch.vtcr);
-
-	free_pages_exact(host_kvm->arch.pkvm.hyp_donations.vm, hyp_vm_sz);
-	free_pages_exact(host_kvm->arch.pkvm.hyp_donations.pgd, pgd_sz);
+	host_kvm->arch.pkvm.handle = 0;
+	free_hyp_memcache(&host_kvm->arch.pkvm.teardown_mc);
 }
 
 int pkvm_init_host_vm(struct kvm *host_kvm)
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 21/25] KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (19 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 20/25] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 22/25] KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2 Will Deacon
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

When pKVM is enabled, the hypervisor at EL2 does not trust the host at
EL1 and must therefore prevent it from having unrestricted access to
internal hypervisor state.

The 'kvm_arm_hyp_percpu_base' array holds the base addresses of the
hypervisor per-cpu allocations, so move it into the nVHE code where it
cannot be modified by the untrusted host at EL1.
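
For reference, the lookup performed by per_cpu_ptr_nvhe_sym() is
unchanged: a per-cpu copy of a symbol lives at (per-cpu region base) +
(symbol offset within the original section). A sketch with invented
addresses:

#include <stdint.h>
#include <stdio.h>

#define NR_CPUS 2

/* Invented addresses, for illustration only. */
static uintptr_t percpu_base[NR_CPUS] = { 0x40000000, 0x40010000 };
static const uintptr_t per_cpu_start = 0x1000;

static uintptr_t per_cpu_addr(uintptr_t sym, unsigned int cpu)
{
	uintptr_t off = sym - per_cpu_start;

	return percpu_base[cpu] ? percpu_base[cpu] + off : 0;
}

int main(void)
{
	/* A symbol 0x40 bytes into the per-cpu section, on CPU 1. */
	printf("%#lx\n", (unsigned long)per_cpu_addr(per_cpu_start + 0x40, 1));
	return 0;
}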

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_asm.h  | 4 ++--
 arch/arm64/kernel/image-vars.h    | 3 ---
 arch/arm64/kvm/arm.c              | 9 ++++-----
 arch/arm64/kvm/hyp/nvhe/hyp-smp.c | 2 ++
 4 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index de52ba775d48..43c3bc0f9544 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -109,7 +109,7 @@ enum __kvm_host_smccc_func {
 #define per_cpu_ptr_nvhe_sym(sym, cpu)						\
 	({									\
 		unsigned long base, off;					\
-		base = kvm_arm_hyp_percpu_base[cpu];				\
+		base = kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];		\
 		off = (unsigned long)&CHOOSE_NVHE_SYM(sym) -			\
 		      (unsigned long)&CHOOSE_NVHE_SYM(__per_cpu_start);		\
 		base ? (typeof(CHOOSE_NVHE_SYM(sym))*)(base + off) : NULL;	\
@@ -214,7 +214,7 @@ DECLARE_KVM_HYP_SYM(__kvm_hyp_vector);
 #define __kvm_hyp_init		CHOOSE_NVHE_SYM(__kvm_hyp_init)
 #define __kvm_hyp_vector	CHOOSE_HYP_SYM(__kvm_hyp_vector)
 
-extern unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
+extern unsigned long kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[];
 DECLARE_KVM_NVHE_SYM(__per_cpu_start);
 DECLARE_KVM_NVHE_SYM(__per_cpu_end);
 
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 7f4e43bfaade..ae8f37f4aa8c 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -89,9 +89,6 @@ KVM_NVHE_ALIAS(gic_nonsecure_priorities);
 KVM_NVHE_ALIAS(__start___kvm_ex_table);
 KVM_NVHE_ALIAS(__stop___kvm_ex_table);
 
-/* Array containing bases of nVHE per-CPU memory regions. */
-KVM_NVHE_ALIAS(kvm_arm_hyp_percpu_base);
-
 /* PMU available static key */
 #ifdef CONFIG_HW_PERF_EVENTS
 KVM_NVHE_ALIAS(kvm_arm_pmu_available);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index f78eefa02f6b..25467f24803d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -51,7 +51,6 @@ DEFINE_STATIC_KEY_FALSE(kvm_protected_mode_initialized);
 DECLARE_KVM_HYP_PER_CPU(unsigned long, kvm_hyp_vector);
 
 DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
-unsigned long kvm_arm_hyp_percpu_base[NR_CPUS];
 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 static bool vgic_present;
@@ -1857,13 +1856,13 @@ static void teardown_hyp_mode(void)
 	free_hyp_pgds();
 	for_each_possible_cpu(cpu) {
 		free_page(per_cpu(kvm_arm_hyp_stack_page, cpu));
-		free_pages(kvm_arm_hyp_percpu_base[cpu], nvhe_percpu_order());
+		free_pages(kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu], nvhe_percpu_order());
 	}
 }
 
 static int do_pkvm_init(u32 hyp_va_bits)
 {
-	void *per_cpu_base = kvm_ksym_ref(kvm_arm_hyp_percpu_base);
+	void *per_cpu_base = kvm_ksym_ref(kvm_nvhe_sym(kvm_arm_hyp_percpu_base));
 	int ret;
 
 	preempt_disable();
@@ -1967,7 +1966,7 @@ static int init_hyp_mode(void)
 
 		page_addr = page_address(page);
 		memcpy(page_addr, CHOOSE_NVHE_SYM(__per_cpu_start), nvhe_percpu_size());
-		kvm_arm_hyp_percpu_base[cpu] = (unsigned long)page_addr;
+		kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu] = (unsigned long)page_addr;
 	}
 
 	/*
@@ -2060,7 +2059,7 @@ static int init_hyp_mode(void)
 	}
 
 	for_each_possible_cpu(cpu) {
-		char *percpu_begin = (char *)kvm_arm_hyp_percpu_base[cpu];
+		char *percpu_begin = (char *)kvm_nvhe_sym(kvm_arm_hyp_percpu_base)[cpu];
 		char *percpu_end = percpu_begin + nvhe_percpu_size();
 
 		/* Map Hyp percpu pages */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
index 9f54833af400..04d194583f1e 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-smp.c
@@ -23,6 +23,8 @@ u64 cpu_logical_map(unsigned int cpu)
 	return hyp_cpu_logical_map[cpu];
 }
 
+unsigned long __ro_after_init kvm_arm_hyp_percpu_base[NR_CPUS];
+
 unsigned long __hyp_per_cpu_offset(unsigned int cpu)
 {
 	unsigned long *cpu_base_array;
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 22/25] KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (20 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 21/25] KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 23/25] KVM: arm64: Explicitly map 'kvm_vgic_global_state' " Will Deacon
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Sharing 'kvm_arm_vmid_bits' between EL1 and EL2 allows the host to
modify the variable arbitrarily, potentially leading to all sorts of
shenanigans as this is used to configure the VTTBR register for the
guest stage-2.

In preparation for unmapping host sections entirely from EL2, maintain
a copy of 'kvm_arm_vmid_bits' in the pKVM hypervisor and initialise it
from the host value while it is still trusted.
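
The pattern is simply "snapshot while trusted": EL2 keeps a private
copy, seeded once from the host value during early init and never
consulted from the host side again. As a trivial sketch (names
invented, not the kernel's):

static unsigned int host_vmid_bits = 16;	/* host-controlled */
static unsigned int hyp_vmid_bits;		/* EL2-private copy */

/* Called once, before the host is deprivileged. */
static void hyp_init_symbols_sketch(void)
{
	hyp_vmid_bits = host_vmid_bits;
}

int main(void)
{
	hyp_init_symbols_sketch();
	return hyp_vmid_bits == 16 ? 0 : 1;
}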

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_hyp.h | 2 ++
 arch/arm64/kernel/image-vars.h   | 3 ---
 arch/arm64/kvm/arm.c             | 1 +
 arch/arm64/kvm/hyp/nvhe/pkvm.c   | 3 +++
 4 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index fd99cf09972d..6797eafe7890 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -124,4 +124,6 @@ extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val);
 extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val);
 
 extern unsigned long kvm_nvhe_sym(__icache_flags);
+extern unsigned int kvm_nvhe_sym(kvm_arm_vmid_bits);
+
 #endif /* __ARM64_KVM_HYP_H__ */
diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index ae8f37f4aa8c..31ad75da4d58 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -71,9 +71,6 @@ KVM_NVHE_ALIAS(nvhe_hyp_panic_handler);
 /* Vectors installed by hyp-init on reset HVC. */
 KVM_NVHE_ALIAS(__hyp_stub_vectors);
 
-/* VMID bits set by the KVM VMID allocator */
-KVM_NVHE_ALIAS(kvm_arm_vmid_bits);
-
 /* Static keys which are set if a vGIC trap should be handled in hyp. */
 KVM_NVHE_ALIAS(vgic_v2_cpuif_trap);
 KVM_NVHE_ALIAS(vgic_v3_cpuif_trap);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 25467f24803d..1d4b8122d010 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1893,6 +1893,7 @@ static void kvm_hyp_init_symbols(void)
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);
 	kvm_nvhe_sym(__icache_flags) = __icache_flags;
+	kvm_nvhe_sym(kvm_arm_vmid_bits) = kvm_arm_vmid_bits;
 }
 
 static int kvm_hyp_init_protection(u32 hyp_va_bits)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index f5f300cc5864..826335ccad9d 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -15,6 +15,9 @@
 /* Used by icache_is_vpipt(). */
 unsigned long __icache_flags;
 
+/* Used by kvm_get_vttbr(). */
+unsigned int kvm_arm_vmid_bits;
+
 /*
  * Set trap register values based on features in ID_AA64PFR0.
  */
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 23/25] KVM: arm64: Explicitly map 'kvm_vgic_global_state' at EL2
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (21 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 22/25] KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2 Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 24/25] KVM: arm64: Don't unnecessarily map host kernel sections " Will Deacon
  2022-10-17 11:52 ` [PATCH v4 25/25] KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run() Will Deacon
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

The pKVM hypervisor at EL2 may need to read the 'kvm_vgic_global_state'
variable from the host, for example when saving and restoring the state
of the virtual GIC.

Explicitly map 'kvm_vgic_global_state' in the stage-1 page-table of the
pKVM hypervisor rather than relying on mapping all of the host '.rodata'
section.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 0f69c1393416..5a371ab236db 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -161,6 +161,11 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	if (ret)
 		return ret;
 
+	ret = pkvm_create_mappings(&kvm_vgic_global_state,
+				   &kvm_vgic_global_state + 1, prot);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 24/25] KVM: arm64: Don't unnecessarily map host kernel sections at EL2
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (22 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 23/25] KVM: arm64: Explicitly map 'kvm_vgic_global_state' " Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  2022-10-17 11:52 ` [PATCH v4 25/25] KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run() Will Deacon
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

From: Quentin Perret <qperret@google.com>

We no longer need to map the host's '.rodata' and '.bss' sections in the
stage-1 page-table of the pKVM hypervisor at EL2, so remove those
mappings and avoid creating any future dependencies at EL2 on
host-controlled data structures.

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kernel/image-vars.h  |  6 ------
 arch/arm64/kvm/hyp/nvhe/setup.c | 14 +++-----------
 2 files changed, 3 insertions(+), 17 deletions(-)

diff --git a/arch/arm64/kernel/image-vars.h b/arch/arm64/kernel/image-vars.h
index 31ad75da4d58..e3f88b5836a2 100644
--- a/arch/arm64/kernel/image-vars.h
+++ b/arch/arm64/kernel/image-vars.h
@@ -102,12 +102,6 @@ KVM_NVHE_ALIAS_HYP(__memcpy, __pi_memcpy);
 KVM_NVHE_ALIAS_HYP(__memset, __pi_memset);
 #endif
 
-/* Kernel memory sections */
-KVM_NVHE_ALIAS(__start_rodata);
-KVM_NVHE_ALIAS(__end_rodata);
-KVM_NVHE_ALIAS(__bss_start);
-KVM_NVHE_ALIAS(__bss_stop);
-
 /* Hyp memory sections */
 KVM_NVHE_ALIAS(__hyp_idmap_text_start);
 KVM_NVHE_ALIAS(__hyp_idmap_text_end);
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 5a371ab236db..5cdf3fb09bb4 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -144,23 +144,15 @@ static int recreate_hyp_mappings(phys_addr_t phys, unsigned long size,
 	}
 
 	/*
-	 * Map the host's .bss and .rodata sections RO in the hypervisor, but
-	 * transfer the ownership from the host to the hypervisor itself to
-	 * make sure it can't be donated or shared with another entity.
+	 * Map the host sections RO in the hypervisor, but transfer the
+	 * ownership from the host to the hypervisor itself to make sure they
+	 * can't be donated or shared with another entity.
 	 *
 	 * The ownership transition requires matching changes in the host
 	 * stage-2. This will be done later (see finalize_host_mappings()) once
 	 * the hyp_vmemmap is addressable.
 	 */
 	prot = pkvm_mkstate(PAGE_HYP_RO, PKVM_PAGE_SHARED_OWNED);
-	ret = pkvm_create_mappings(__start_rodata, __end_rodata, prot);
-	if (ret)
-		return ret;
-
-	ret = pkvm_create_mappings(__hyp_bss_end, __bss_stop, prot);
-	if (ret)
-		return ret;
-
 	ret = pkvm_create_mappings(&kvm_vgic_global_state,
 				   &kvm_vgic_global_state + 1, prot);
 	if (ret)
-- 
2.38.0.413.g74048e4d9e-goog



* [PATCH v4 25/25] KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run()
  2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
                   ` (23 preceding siblings ...)
  2022-10-17 11:52 ` [PATCH v4 24/25] KVM: arm64: Don't unnecessarily map host kernel sections " Will Deacon
@ 2022-10-17 11:52 ` Will Deacon
  24 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-17 11:52 UTC (permalink / raw)
  To: kvmarm
  Cc: Will Deacon, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Quentin Perret, Suzuki K Poulose, Mark Rutland, Fuad Tabba,
	Oliver Upton, Marc Zyngier, kernel-team, kvm, linux-arm-kernel

As a stepping stone towards deprivileging the host's access to the
guest's vCPU structures, introduce some naive flush/sync routines to
copy most of the host vCPU into the hyp vCPU on vCPU run and back
again on return to EL1.

This allows us to run using the pKVM hyp structures when KVM is
initialised in protected mode.
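
In outline, the run path becomes: flush (copy the untrusted host state
into the EL2-private shadow), run the shadow vCPU, then sync (copy
back only what the host may observe). A stand-alone sketch of that
shape (the structure and fields are invented; the real code copies
many more fields):

#include <stdio.h>

struct vcpu_state {
	unsigned long regs[4];
	unsigned long hcr;
};

static void flush_state(struct vcpu_state *shadow,
			const struct vcpu_state *host)
{
	*shadow = *host;	/* take an EL2-private copy */
}

static void sync_state(struct vcpu_state *host,
		       const struct vcpu_state *shadow)
{
	int i;

	/* Only copy back what the host is allowed to see. */
	for (i = 0; i < 4; i++)
		host->regs[i] = shadow->regs[i];
}

static int run_vcpu(struct vcpu_state *v)
{
	v->regs[0]++;		/* pretend the guest did some work */
	return 0;
}

int main(void)
{
	struct vcpu_state host = { { 1, 2, 3, 4 }, 0 };
	struct vcpu_state shadow;

	flush_state(&shadow, &host);
	run_vcpu(&shadow);
	sync_state(&host, &shadow);

	printf("x0 = %lu\n", host.regs[0]);
	return 0;
}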

Tested-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  4 ++
 arch/arm64/kvm/hyp/nvhe/hyp-main.c     | 79 +++++++++++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/pkvm.c         | 28 +++++++++
 3 files changed, 109 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 5d456438445c..38424b98ed84 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -67,4 +67,8 @@ int __pkvm_init_vcpu(pkvm_handle_t handle, struct kvm_vcpu *host_vcpu,
 		     unsigned long vcpu_hva);
 int __pkvm_teardown_vm(pkvm_handle_t handle);
 
+struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
+					 unsigned int vcpu_idx);
+void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index b5f3fcfe9135..728e01d4536b 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -22,11 +22,86 @@ DEFINE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params);
 
 void __kvm_hyp_host_forward_smc(struct kvm_cpu_context *host_ctxt);
 
+static void flush_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+
+	hyp_vcpu->vcpu.arch.ctxt	= host_vcpu->arch.ctxt;
+
+	hyp_vcpu->vcpu.arch.sve_state	= kern_hyp_va(host_vcpu->arch.sve_state);
+	hyp_vcpu->vcpu.arch.sve_max_vl	= host_vcpu->arch.sve_max_vl;
+
+	hyp_vcpu->vcpu.arch.hw_mmu	= host_vcpu->arch.hw_mmu;
+
+	hyp_vcpu->vcpu.arch.hcr_el2	= host_vcpu->arch.hcr_el2;
+	hyp_vcpu->vcpu.arch.mdcr_el2	= host_vcpu->arch.mdcr_el2;
+	hyp_vcpu->vcpu.arch.cptr_el2	= host_vcpu->arch.cptr_el2;
+
+	hyp_vcpu->vcpu.arch.iflags	= host_vcpu->arch.iflags;
+	hyp_vcpu->vcpu.arch.fp_state	= host_vcpu->arch.fp_state;
+
+	hyp_vcpu->vcpu.arch.debug_ptr	= kern_hyp_va(host_vcpu->arch.debug_ptr);
+	hyp_vcpu->vcpu.arch.host_fpsimd_state = host_vcpu->arch.host_fpsimd_state;
+
+	hyp_vcpu->vcpu.arch.vsesr_el2	= host_vcpu->arch.vsesr_el2;
+
+	hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3 = host_vcpu->arch.vgic_cpu.vgic_v3;
+}
+
+static void sync_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct kvm_vcpu *host_vcpu = hyp_vcpu->host_vcpu;
+	struct vgic_v3_cpu_if *hyp_cpu_if = &hyp_vcpu->vcpu.arch.vgic_cpu.vgic_v3;
+	struct vgic_v3_cpu_if *host_cpu_if = &host_vcpu->arch.vgic_cpu.vgic_v3;
+	unsigned int i;
+
+	host_vcpu->arch.ctxt		= hyp_vcpu->vcpu.arch.ctxt;
+
+	host_vcpu->arch.hcr_el2		= hyp_vcpu->vcpu.arch.hcr_el2;
+	host_vcpu->arch.cptr_el2	= hyp_vcpu->vcpu.arch.cptr_el2;
+
+	host_vcpu->arch.fault		= hyp_vcpu->vcpu.arch.fault;
+
+	host_vcpu->arch.iflags		= hyp_vcpu->vcpu.arch.iflags;
+	host_vcpu->arch.fp_state	= hyp_vcpu->vcpu.arch.fp_state;
+
+	host_cpu_if->vgic_hcr		= hyp_cpu_if->vgic_hcr;
+	for (i = 0; i < hyp_cpu_if->used_lrs; ++i)
+		host_cpu_if->vgic_lr[i] = hyp_cpu_if->vgic_lr[i];
+}
+
 static void handle___kvm_vcpu_run(struct kvm_cpu_context *host_ctxt)
 {
-	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+	DECLARE_REG(struct kvm_vcpu *, host_vcpu, host_ctxt, 1);
+	int ret;
+
+	host_vcpu = kern_hyp_va(host_vcpu);
+
+	if (unlikely(is_protected_kvm_enabled())) {
+		struct pkvm_hyp_vcpu *hyp_vcpu;
+		struct kvm *host_kvm;
+
+		host_kvm = kern_hyp_va(host_vcpu->kvm);
+		hyp_vcpu = pkvm_load_hyp_vcpu(host_kvm->arch.pkvm.handle,
+					      host_vcpu->vcpu_idx);
+		if (!hyp_vcpu) {
+			ret = -EINVAL;
+			goto out;
+		}
+
+		flush_hyp_vcpu(hyp_vcpu);
+
+		ret = __kvm_vcpu_run(&hyp_vcpu->vcpu);
+
+		sync_hyp_vcpu(hyp_vcpu);
+		pkvm_put_hyp_vcpu(hyp_vcpu);
+	} else {
+		/* The host is fully trusted, run its vCPU directly. */
+		ret = __kvm_vcpu_run(host_vcpu);
+	}
 
-	cpu_reg(host_ctxt, 1) =  __kvm_vcpu_run(kern_hyp_va(vcpu));
+out:
+	cpu_reg(host_ctxt, 1) =  ret;
 }
 
 static void handle___kvm_adjust_pc(struct kvm_cpu_context *host_ctxt)
diff --git a/arch/arm64/kvm/hyp/nvhe/pkvm.c b/arch/arm64/kvm/hyp/nvhe/pkvm.c
index 826335ccad9d..d4d2ed07a481 100644
--- a/arch/arm64/kvm/hyp/nvhe/pkvm.c
+++ b/arch/arm64/kvm/hyp/nvhe/pkvm.c
@@ -241,6 +241,33 @@ static struct pkvm_hyp_vm *get_vm_by_handle(pkvm_handle_t handle)
 	return vm_table[idx];
 }
 
+struct pkvm_hyp_vcpu *pkvm_load_hyp_vcpu(pkvm_handle_t handle,
+					 unsigned int vcpu_idx)
+{
+	struct pkvm_hyp_vcpu *hyp_vcpu = NULL;
+	struct pkvm_hyp_vm *hyp_vm;
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_vm = get_vm_by_handle(handle);
+	if (!hyp_vm || hyp_vm->nr_vcpus <= vcpu_idx)
+		goto unlock;
+
+	hyp_vcpu = hyp_vm->vcpus[vcpu_idx];
+	hyp_page_ref_inc(hyp_virt_to_page(hyp_vm));
+unlock:
+	hyp_spin_unlock(&vm_table_lock);
+	return hyp_vcpu;
+}
+
+void pkvm_put_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu)
+{
+	struct pkvm_hyp_vm *hyp_vm = pkvm_hyp_vcpu_to_hyp_vm(hyp_vcpu);
+
+	hyp_spin_lock(&vm_table_lock);
+	hyp_page_ref_dec(hyp_virt_to_page(hyp_vm));
+	hyp_spin_unlock(&vm_table_lock);
+}
+
 static void unpin_host_vcpu(struct kvm_vcpu *host_vcpu)
 {
 	if (host_vcpu)
@@ -286,6 +313,7 @@ static int init_pkvm_hyp_vcpu(struct pkvm_hyp_vcpu *hyp_vcpu,
 	hyp_vcpu->vcpu.vcpu_idx = vcpu_idx;
 
 	hyp_vcpu->vcpu.arch.hw_mmu = &hyp_vm->kvm.arch.mmu;
+	hyp_vcpu->vcpu.arch.cflags = READ_ONCE(host_vcpu->arch.cflags);
 done:
 	if (ret)
 		unpin_host_vcpu(host_vcpu);
-- 
2.38.0.413.g74048e4d9e-goog



* Re: [PATCH v4 05/25] KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  2022-10-17 11:51 ` [PATCH v4 05/25] KVM: arm64: Unify identifiers used to distinguish host and hypervisor Will Deacon
@ 2022-10-17 20:21   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-17 20:21 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Mark Rutland, Fuad Tabba, Oliver Upton,
	Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On 17/10/22 13:51, Will Deacon wrote:
> The 'pkvm_component_id' enum type provides constants to refer to the
> host and the hypervisor, yet this information is duplicated by the
> 'pkvm_hyp_id' constant.
> 
> Remove the definition of 'pkvm_hyp_id' and move the 'pkvm_component_id'
> type definition to 'mem_protect.h' so that it can be used outside of
> the memory protection code, for example when initialising the owner for
> hypervisor-owned pages.
> 
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 6 +++++-
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 8 --------
>   arch/arm64/kvm/hyp/nvhe/setup.c               | 2 +-
>   3 files changed, 6 insertions(+), 10 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>


* Re: [PATCH v4 09/25] KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
  2022-10-17 11:51 ` [PATCH v4 09/25] KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h Will Deacon
@ 2022-10-17 20:22   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-17 20:22 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Mark Rutland, Fuad Tabba, Oliver Upton,
	Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On 17/10/22 13:51, Will Deacon wrote:
> nvhe/mem_protect.h refers to __load_stage2() in the definition of
> __load_host_stage2() but doesn't include the relevant header.
> 
> Include asm/kvm_mmu.h in nvhe/mem_protect.h so that users of the latter
> don't have to do this themselves.
> 
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 1 +
>   1 file changed, 1 insertion(+)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>


* Re: [PATCH v4 15/25] KVM: arm64: Initialise hypervisor copies of host symbols unconditionally
  2022-10-17 11:51 ` [PATCH v4 15/25] KVM: arm64: Initialise hypervisor copies of host symbols unconditionally Will Deacon
@ 2022-10-17 20:26   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-17 20:26 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Mark Rutland, Fuad Tabba, Oliver Upton,
	Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On 17/10/22 13:51, Will Deacon wrote:
> The nVHE object at EL2 maintains its own copies of some host variables
> so that, when pKVM is enabled, the host cannot directly modify the
> hypervisor state. When running in normal nVHE mode, however, these
> variables are still mirrored at EL2 but are not initialised.
> 
> Initialise the hypervisor symbols from the host copies regardless of
> pKVM, ensuring that any reference to this data at EL2 with normal nVHE
> will return a sensibly initialised value.
> 
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/arm.c | 15 +++++++++------
>   1 file changed, 9 insertions(+), 6 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
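
As a sketch of the pattern being applied (the function name and the exact set
of symbols are illustrative; kvm_nvhe_sym() is the existing way to name the
EL2 copy of a host symbol):

	/* Mirror host values into the nVHE object unconditionally,
	 * not only when pKVM is enabled.
	 */
	static void init_hyp_symbols(void)
	{
		kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) =
			read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
		kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) =
			read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
		kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) =
			read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
	}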


* Re: [PATCH v4 01/25] KVM: arm64: Move hyp refcount manipulation helpers to common header file
  2022-10-17 11:51 ` [PATCH v4 01/25] KVM: arm64: Move hyp refcount manipulation helpers to common header file Will Deacon
@ 2022-10-17 20:29   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-17 20:29 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Mark Rutland, Fuad Tabba, Oliver Upton,
	Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On 17/10/22 13:51, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> We will soon need to manipulate 'struct hyp_page' refcounts from outside
> page_alloc.c, so move the helpers to a common header file to allow them
> to be reused easily.
> 
> Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/hyp/include/nvhe/memory.h | 22 ++++++++++++++++++++++
>   arch/arm64/kvm/hyp/nvhe/page_alloc.c     | 19 -------------------
>   2 files changed, 22 insertions(+), 19 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>


* Re: [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  2022-10-17 11:51 ` [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2 Will Deacon
@ 2022-10-18 11:06   ` Mark Rutland
  2022-10-18 14:05     ` Will Deacon
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2022-10-18 11:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Mon, Oct 17, 2022 at 12:51:58PM +0100, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> Mapping pages in a guest page-table from within the pKVM hypervisor at
> EL2 may require cache maintenance to ensure that the initialised page
> contents are visible even to non-cacheable (e.g. MMU-off) accesses from
> the guest.
> 
> In preparation for performing this maintenance at EL2, introduce a
> per-vCPU fixmap which allows the pKVM hypervisor to map guest pages
> temporarily into its stage-1 page-table for the purposes of cache
> maintenance and, in future, poisoning on the reclaim path. The use of a
> fixmap avoids the need for memory allocation or locking on the map()
> path.
> 
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h          | 14 +++
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  2 +
>  arch/arm64/kvm/hyp/include/nvhe/mm.h          |  4 +
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  1 -
>  arch/arm64/kvm/hyp/nvhe/mm.c                  | 94 +++++++++++++++++++
>  arch/arm64/kvm/hyp/nvhe/setup.c               |  4 +
>  arch/arm64/kvm/hyp/pgtable.c                  | 12 ---
>  7 files changed, 118 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 4f6d79fe4352..b2a886c9e78d 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -30,6 +30,8 @@ typedef u64 kvm_pte_t;
>  #define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
>  #define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
>  
> +#define KVM_PHYS_INVALID		(-1ULL)
> +
>  static inline bool kvm_pte_valid(kvm_pte_t pte)
>  {
>  	return pte & KVM_PTE_VALID;
> @@ -45,6 +47,18 @@ static inline u64 kvm_pte_to_phys(kvm_pte_t pte)
>  	return pa;
>  }
>  
> +static inline kvm_pte_t kvm_phys_to_pte(u64 pa)
> +{
> +	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
> +
> +	if (PAGE_SHIFT == 16) {
> +		pa &= GENMASK(51, 48);
> +		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
> +	}
> +
> +	return pte;
> +}
> +
>  static inline u64 kvm_granule_shift(u32 level)
>  {
>  	/* Assumes KVM_PGTABLE_MAX_LEVELS is 4 */
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index ce9a796a85ee..ef31a1872c93 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -59,6 +59,8 @@ enum pkvm_component_id {
>  	PKVM_ID_HYP,
>  };
>  
> +extern unsigned long hyp_nr_cpus;
> +
>  int __pkvm_prot_finalize(void);
>  int __pkvm_host_share_hyp(u64 pfn);
>  int __pkvm_host_unshare_hyp(u64 pfn);
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mm.h b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> index b2ee6d5df55b..d5ec972b5c1e 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mm.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mm.h
> @@ -13,6 +13,10 @@
>  extern struct kvm_pgtable pkvm_pgtable;
>  extern hyp_spinlock_t pkvm_pgd_lock;
>  
> +int hyp_create_pcpu_fixmap(void);
> +void *hyp_fixmap_map(phys_addr_t phys);
> +void hyp_fixmap_unmap(void);
> +
>  int hyp_create_idmap(u32 hyp_va_bits);
>  int hyp_map_vectors(void);
>  int hyp_back_vmemmap(phys_addr_t back);
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 2ef6aaa21ba5..1c38451050e5 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -21,7 +21,6 @@
>  
>  #define KVM_HOST_S2_FLAGS (KVM_PGTABLE_S2_NOFWB | KVM_PGTABLE_S2_IDMAP)
>  
> -extern unsigned long hyp_nr_cpus;
>  struct host_mmu host_mmu;
>  
>  static struct hyp_pool host_s2_pool;
> diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> index d3a3b47181de..b77215630d5c 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> @@ -14,6 +14,7 @@
>  #include <nvhe/early_alloc.h>
>  #include <nvhe/gfp.h>
>  #include <nvhe/memory.h>
> +#include <nvhe/mem_protect.h>
>  #include <nvhe/mm.h>
>  #include <nvhe/spinlock.h>
>  
> @@ -25,6 +26,12 @@ unsigned int hyp_memblock_nr;
>  
>  static u64 __io_map_base;
>  
> +struct hyp_fixmap_slot {
> +	u64 addr;
> +	kvm_pte_t *ptep;
> +};
> +static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);
> +
>  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
>  				  unsigned long phys, enum kvm_pgtable_prot prot)
>  {
> @@ -212,6 +219,93 @@ int hyp_map_vectors(void)
>  	return 0;
>  }
>  
> +void *hyp_fixmap_map(phys_addr_t phys)
> +{
> +	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
> +	kvm_pte_t pte, *ptep = slot->ptep;
> +
> +	pte = *ptep;
> +	pte &= ~kvm_phys_to_pte(KVM_PHYS_INVALID);
> +	pte |= kvm_phys_to_pte(phys) | KVM_PTE_VALID;
> +	WRITE_ONCE(*ptep, pte);
> +	dsb(nshst);
> +
> +	return (void *)slot->addr;
> +}
> +
> +static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
> +{
> +	kvm_pte_t *ptep = slot->ptep;
> +	u64 addr = slot->addr;
> +
> +	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
> +	dsb(nshst);
> +	__tlbi_level(vale2, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
> +	dsb(nsh);
> +	isb();
> +}

Does each CPU have independent Stage-1 tables at EL2? i.e. each has a distinct
root table?

If the tables are shared, you need broadcast maintenance and ISH barriers here,
or you risk the usual issues with asynchronous MMU behaviour.

If those are per-cpu, sorry for the noise!

Thanks,
Mark.

> +
> +void hyp_fixmap_unmap(void)
> +{
> +	fixmap_clear_slot(this_cpu_ptr(&fixmap_slots));
> +}
> +
> +static int __create_fixmap_slot_cb(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +				   enum kvm_pgtable_walk_flags flag,
> +				   void * const arg)
> +{
> +	struct hyp_fixmap_slot *slot = per_cpu_ptr(&fixmap_slots, (u64)arg);
> +
> +	if (!kvm_pte_valid(*ptep) || level != KVM_PGTABLE_MAX_LEVELS - 1)
> +		return -EINVAL;
> +
> +	slot->addr = addr;
> +	slot->ptep = ptep;
> +
> +	/*
> +	 * Clear the PTE, but keep the page-table page refcount elevated to
> +	 * prevent it from ever being freed. This lets us manipulate the PTEs
> +	 * by hand safely without ever needing to allocate memory.
> +	 */
> +	fixmap_clear_slot(slot);
> +
> +	return 0;
> +}
> +
> +static int create_fixmap_slot(u64 addr, u64 cpu)
> +{
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= __create_fixmap_slot_cb,
> +		.flags	= KVM_PGTABLE_WALK_LEAF,
> +		.arg = (void *)cpu,
> +	};
> +
> +	return kvm_pgtable_walk(&pkvm_pgtable, addr, PAGE_SIZE, &walker);
> +}
> +
> +int hyp_create_pcpu_fixmap(void)
> +{
> +	unsigned long addr, i;
> +	int ret;
> +
> +	for (i = 0; i < hyp_nr_cpus; i++) {
> +		ret = pkvm_alloc_private_va_range(PAGE_SIZE, &addr);
> +		if (ret)
> +			return ret;
> +
> +		ret = kvm_pgtable_hyp_map(&pkvm_pgtable, addr, PAGE_SIZE,
> +					  __hyp_pa(__hyp_bss_start), PAGE_HYP);
> +		if (ret)
> +			return ret;
> +
> +		ret = create_fixmap_slot(addr, i);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	return 0;
> +}
> +
>  int hyp_create_idmap(u32 hyp_va_bits)
>  {
>  	unsigned long start, end;
> diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
> index 2be72fbe7279..0f69c1393416 100644
> --- a/arch/arm64/kvm/hyp/nvhe/setup.c
> +++ b/arch/arm64/kvm/hyp/nvhe/setup.c
> @@ -321,6 +321,10 @@ void __noreturn __pkvm_init_finalise(void)
>  	if (ret)
>  		goto out;
>  
> +	ret = hyp_create_pcpu_fixmap();
> +	if (ret)
> +		goto out;
> +
>  	pkvm_hyp_vm_table_init(vm_table_base);
>  out:
>  	/*
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index a1a27f88a312..2bcb2d5903ba 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -57,8 +57,6 @@ struct kvm_pgtable_walk_data {
>  	u64				end;
>  };
>  
> -#define KVM_PHYS_INVALID (-1ULL)
> -
>  static bool kvm_phys_is_valid(u64 phys)
>  {
>  	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_EL1_PARANGE_MAX));
> @@ -122,16 +120,6 @@ static bool kvm_pte_table(kvm_pte_t pte, u32 level)
>  	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
>  }
>  
> -static kvm_pte_t kvm_phys_to_pte(u64 pa)
> -{
> -	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
> -
> -	if (PAGE_SHIFT == 16)
> -		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
> -
> -	return pte;
> -}
> -
>  static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte, struct kvm_pgtable_mm_ops *mm_ops)
>  {
>  	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
> -- 
> 2.38.0.413.g74048e4d9e-goog
> 
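
As a worked example of the 52-bit packing in kvm_phys_to_pte() above: with
64KiB pages (PAGE_SHIFT == 16), PA bits [51:48] are carried in PTE bits
[15:12], and kvm_pte_to_phys() undoes the packing, so the two helpers are
exact inverses (the value below is arbitrary):

	static void pte_pack_demo(void)
	{
		u64 pa = 0x000f000012340000UL;	/* PA bits [51:48] = 0xf */
		kvm_pte_t pte = kvm_phys_to_pte(pa);

		/* pte[47:16] hold pa[47:16], pte[15:12] hold pa[51:48] */
		WARN_ON(kvm_pte_to_phys(pte) != pa);
	}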


* Re: [PATCH v4 07/25] KVM: arm64: Prevent the donation of no-map pages
  2022-10-17 11:51 ` [PATCH v4 07/25] KVM: arm64: Prevent the donation of no-map pages Will Deacon
@ 2022-10-18 13:42   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-18 13:42 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Mark Rutland, Fuad Tabba, Oliver Upton,
	Marc Zyngier, kernel-team, kvm, linux-arm-kernel

Hi Will & Quentin,

On 17/10/22 13:51, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> Memory regions marked as "no-map" in the host device-tree routinely
> include TrustZone carevouts and DMA pools. Although donating such pages

Typo "carve-outs"?

> to the hypervisor may not breach confidentiality, it could be used to
> corrupt its state in uncontrollable ways. To prevent this, let's block
> host-initiated memory transitions targeting "no-map" pages altogether in
> nVHE protected mode as there should be no valid reason to do this in
> current operation.
> 
> Thankfully, the pKVM EL2 hypervisor has a full copy of the host's list
> of memblock regions, so we can easily check for the presence of the
> MEMBLOCK_NOMAP flag on a region containing pages being donated from the
> host.
> 
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c | 22 ++++++++++++++++------
>   1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index c30402737548..a7156fd13bc8 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -193,7 +193,7 @@ struct kvm_mem_range {
>   	u64 end;
>   };

>   bool addr_is_memory(phys_addr_t phys)
>   {
>   	struct kvm_mem_range range;
>   
> -	return find_mem_range(phys, &range);
> +	return !!find_mem_range(phys, &range);
> +}
> +
> +static bool addr_is_allowed_memory(phys_addr_t phys)
> +{
> +	struct memblock_region *reg;
> +	struct kvm_mem_range range;
> +
> +	reg = find_mem_range(phys, &range);
> +
> +	return reg && !(reg->flags & MEMBLOCK_NOMAP);
>   }
>   
>   static bool is_in_mem_range(u64 addr, struct kvm_mem_range *range)
> @@ -346,7 +356,7 @@ static bool host_stage2_force_pte_cb(u64 addr, u64 end, enum kvm_pgtable_prot pr
>   static int host_stage2_idmap(u64 addr)
>   {
>   	struct kvm_mem_range range;
> -	bool is_memory = find_mem_range(addr, &range);
> +	bool is_memory = !!find_mem_range(addr, &range);

We don't replace this with addr_is_allowed_memory() because we still
use &range, OK.

>   	enum kvm_pgtable_prot prot;
>   	int ret;
>   
> @@ -424,7 +434,7 @@ static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
>   	struct check_walk_data *d = arg;
>   	kvm_pte_t pte = *ptep;
>   
> -	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
> +	if (kvm_pte_valid(pte) && !addr_is_allowed_memory(kvm_pte_to_phys(pte)))
>   		return -EINVAL;

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>


* Re: [PATCH v4 11/25] KVM: arm64: Rename 'host_kvm' to 'host_mmu'
  2022-10-17 11:51 ` [PATCH v4 11/25] KVM: arm64: Rename 'host_kvm' to 'host_mmu' Will Deacon
@ 2022-10-18 13:47   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-18 13:47 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Mark Rutland, Fuad Tabba, Oliver Upton,
	Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On 17/10/22 13:51, Will Deacon wrote:
> In preparation for introducing VM and vCPU state at EL2, rename the
> existing 'struct host_kvm' and its singleton 'host_kvm' instance to
> 'host_mmu' so as to avoid confusion between the structure tracking the
> host stage-2 MMU state and the host instance of a 'struct kvm' for a
> protected guest.
> 
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  6 +--
>   arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 46 +++++++++----------
>   2 files changed, 26 insertions(+), 26 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>


* Re: [PATCH v4 10/25] KVM: arm64: Add hyp_spinlock_t static initializer
  2022-10-17 11:51 ` [PATCH v4 10/25] KVM: arm64: Add hyp_spinlock_t static initializer Will Deacon
@ 2022-10-18 13:51   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 53+ messages in thread
From: Philippe Mathieu-Daudé @ 2022-10-18 13:51 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Mark Rutland, Fuad Tabba, Oliver Upton,
	Marc Zyngier, kernel-team, kvm, linux-arm-kernel

On 17/10/22 13:51, Will Deacon wrote:
> From: Fuad Tabba <tabba@google.com>
> 
> Introduce a static initializer macro for 'hyp_spinlock_t' so that it is
> straightforward to instantiate global locks at EL2. This will later be
> used for locking the VM table in the hypervisor.
> 
> Tested-by: Vincent Donnefort <vdonnefort@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/hyp/include/nvhe/spinlock.h | 10 +++++++++-
>   1 file changed, 9 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
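
A plausible shape for such an initializer, assuming the lock is a single
word that is unlocked when zero (a sketch, not necessarily the exact macros
from the patch):

	#define __HYP_SPIN_LOCK_INITIALIZER	{ .__val = 0 }

	#define __HYP_SPIN_LOCK_UNLOCKED \
		((hyp_spinlock_t) __HYP_SPIN_LOCK_INITIALIZER)

	#define DEFINE_HYP_SPINLOCK(x)	hyp_spinlock_t x = __HYP_SPIN_LOCK_UNLOCKED

	/* ...which permits a global lock with no runtime init call: */
	static DEFINE_HYP_SPINLOCK(vm_table_lock);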


* Re: [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  2022-10-18 11:06   ` Mark Rutland
@ 2022-10-18 14:05     ` Will Deacon
  2022-10-18 16:52       ` Mark Rutland
  0 siblings, 1 reply; 53+ messages in thread
From: Will Deacon @ 2022-10-18 14:05 UTC (permalink / raw)
  To: Mark Rutland
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

Hi Mark,

Cheers for having a look.

On Tue, Oct 18, 2022 at 12:06:14PM +0100, Mark Rutland wrote:
> On Mon, Oct 17, 2022 at 12:51:58PM +0100, Will Deacon wrote:
> > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > index d3a3b47181de..b77215630d5c 100644
> > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > @@ -14,6 +14,7 @@
> >  #include <nvhe/early_alloc.h>
> >  #include <nvhe/gfp.h>
> >  #include <nvhe/memory.h>
> > +#include <nvhe/mem_protect.h>
> >  #include <nvhe/mm.h>
> >  #include <nvhe/spinlock.h>
> >  
> > @@ -25,6 +26,12 @@ unsigned int hyp_memblock_nr;
> >  
> >  static u64 __io_map_base;
> >  
> > +struct hyp_fixmap_slot {
> > +	u64 addr;
> > +	kvm_pte_t *ptep;
> > +};
> > +static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);
> > +
> >  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> >  				  unsigned long phys, enum kvm_pgtable_prot prot)
> >  {
> > @@ -212,6 +219,93 @@ int hyp_map_vectors(void)
> >  	return 0;
> >  }
> >  
> > +void *hyp_fixmap_map(phys_addr_t phys)
> > +{
> > +	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
> > +	kvm_pte_t pte, *ptep = slot->ptep;
> > +
> > +	pte = *ptep;
> > +	pte &= ~kvm_phys_to_pte(KVM_PHYS_INVALID);
> > +	pte |= kvm_phys_to_pte(phys) | KVM_PTE_VALID;
> > +	WRITE_ONCE(*ptep, pte);
> > +	dsb(nshst);
> > +
> > +	return (void *)slot->addr;
> > +}
> > +
> > +static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
> > +{
> > +	kvm_pte_t *ptep = slot->ptep;
> > +	u64 addr = slot->addr;
> > +
> > +	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
> > +	dsb(nshst);
> > +	__tlbi_level(vale2, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
> > +	dsb(nsh);
> > +	isb();
> > +}
> 
> Does each CPU have independent Stage-1 tables at EL2? i.e. each has a distinct
> root table?

No, the CPUs share the same stage-1 table at EL2.

> If the tables are shared, you need broadcast maintenance and ISH barriers here,
> or you risk the usual issues with asynchronous MMU behaviour.

Can you elaborate a bit, please? What we're trying to do is reserve a page
of VA space for each CPU, which is only ever accessed explicitly by that
CPU using a normal memory mapping. The fixmap code therefore just updates
the relevant leaf entry for the CPU on which we're running and the TLBI
is there to ensure that the new mapping takes effect.

If another CPU speculatively walks another CPU's fixmap slot, then I agree
that it could access that page after the slot had been cleared. Although
I can see theoretical security arguments around avoiding that situation,
there's a very real performance cost to broadcast invalidation that we
were hoping to avoid on this fast path.

Of course, in the likely event that I've purged "the usual issues" from
my head and we need broadcasting for _correctness_, then we'll just have
to suck it up!

Cheers,

Will
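
To make the intended access pattern concrete, a hedged sketch of a caller
(the cache-maintenance helper here is an assumption for illustration):

	/* Runs at EL2; the fixmap slot belongs to the current CPU only. */
	static void clean_guest_page(phys_addr_t phys)
	{
		void *va = hyp_fixmap_map(phys);	/* install PTE, local DSB */

		__clean_dcache_guest_page(va, PAGE_SIZE);
		hyp_fixmap_unmap();	/* clear PTE, local TLBI for this VA */
	}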


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-17 11:51 ` [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Will Deacon
@ 2022-10-18 15:13   ` Quentin Perret
  2022-10-19 12:35     ` Will Deacon
  2022-10-18 16:21   ` Quentin Perret
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 53+ messages in thread
From: Quentin Perret @ 2022-10-18 15:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> +/* Maximum number of protected VMs that can be created. */
> +#define KVM_MAX_PVMS 255

Nit: I think that limit will apply to non-protected VMs too, at least
initially, so it'd be worth rewording the comment.

Cheers,
Quentin


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-17 11:51 ` [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Will Deacon
  2022-10-18 15:13   ` Quentin Perret
@ 2022-10-18 16:21   ` Quentin Perret
  2022-10-19 12:45     ` Will Deacon
  2022-10-18 16:33   ` Quentin Perret
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 53+ messages in thread
From: Quentin Perret @ 2022-10-18 16:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> +struct pkvm_hyp_vm {
> +	struct kvm kvm;
> +
> +	/* Backpointer to the host's (untrusted) KVM instance. */
> +	struct kvm *host_kvm;
> +
> +	/*
> +	 * Total amount of memory donated by the host for maintaining
> +	 * this 'struct pkvm_hyp_vm' in the hypervisor.
> +	 */
> +	size_t donated_memory_size;

I think you could get rid of that member. IIUC, all you need to
re-compute it in the teardown path is the number of created vCPUs on
the host, which we should have safely stored in
pkvm_hyp_vm::kvm::created_vcpus.

> +	/* The guest's stage-2 page-table managed by the hypervisor. */
> +	struct kvm_pgtable pgt;
> +
> +	/*
> +	 * The number of vcpus initialized and ready to run.
> +	 * Modifying this is protected by 'vm_table_lock'.
> +	 */
> +	unsigned int nr_vcpus;
> +
> +	/* Array of the hyp vCPU structures for this VM. */
> +	struct pkvm_hyp_vcpu *vcpus[];
> +};

Cheers,
Quentin


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-17 11:51 ` [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Will Deacon
  2022-10-18 15:13   ` Quentin Perret
  2022-10-18 16:21   ` Quentin Perret
@ 2022-10-18 16:33   ` Quentin Perret
  2022-10-19 11:57     ` Will Deacon
  2022-10-18 16:40   ` Quentin Perret
  2022-10-18 16:45   ` Quentin Perret
  4 siblings, 1 reply; 53+ messages in thread
From: Quentin Perret @ 2022-10-18 16:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> +void pkvm_hyp_vm_table_init(void *tbl)
> +{
> +	WARN_ON(vm_table);
> +	vm_table = tbl;
> +}

Uh, why does this one need to be exposed outside pkvm.c?


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-17 11:51 ` [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Will Deacon
                     ` (2 preceding siblings ...)
  2022-10-18 16:33   ` Quentin Perret
@ 2022-10-18 16:40   ` Quentin Perret
  2022-10-19 12:44     ` Will Deacon
  2022-10-18 16:45   ` Quentin Perret
  4 siblings, 1 reply; 53+ messages in thread
From: Quentin Perret @ 2022-10-18 16:40 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> +static void *map_donated_memory_noclear(unsigned long host_va, size_t size)
> +{
> +	void *va = (void *)kern_hyp_va(host_va);
> +
> +	if (!PAGE_ALIGNED(va))
> +		return NULL;
> +
> +	if (__pkvm_host_donate_hyp(hyp_virt_to_pfn(va),
> +				   PAGE_ALIGN(size) >> PAGE_SHIFT))
> +		return NULL;
> +
> +	return va;
> +}
> +
> +static void *map_donated_memory(unsigned long host_va, size_t size)
> +{
> +	void *va = map_donated_memory_noclear(host_va, size);
> +
> +	if (va)
> +		memset(va, 0, size);
> +
> +	return va;
> +}
> +
> +static void __unmap_donated_memory(void *va, size_t size)
> +{
> +	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
> +				       PAGE_ALIGN(size) >> PAGE_SHIFT));
> +}
> +
> +static void unmap_donated_memory(void *va, size_t size)
> +{
> +	if (!va)
> +		return;
> +
> +	memset(va, 0, size);
> +	__unmap_donated_memory(va, size);
> +}
> +
> +static void unmap_donated_memory_noclear(void *va, size_t size)
> +{
> +	if (!va)
> +		return;
> +
> +	__unmap_donated_memory(va, size);
> +}

Nit: I'm not a huge fan of the naming here, these do more than just
map/unmap random pages. This only works for host pages, the donation
path has permission checks, etc. Maybe {admit,return}_host_memory()?


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-17 11:51 ` [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Will Deacon
                     ` (3 preceding siblings ...)
  2022-10-18 16:40   ` Quentin Perret
@ 2022-10-18 16:45   ` Quentin Perret
  2022-10-19 12:18     ` Fuad Tabba
  4 siblings, 1 reply; 53+ messages in thread
From: Quentin Perret @ 2022-10-18 16:45 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> +static int find_free_vm_table_entry(struct kvm *host_kvm)
> +{
> +	int i, ret = -ENOMEM;
> +
> +	for (i = 0; i < KVM_MAX_PVMS; ++i) {
> +		struct pkvm_hyp_vm *vm = vm_table[i];
> +
> +		if (!vm) {
> +			if (ret < 0)
> +				ret = i;
> +			continue;
> +		}
> +
> +		if (unlikely(vm->host_kvm == host_kvm)) {
> +			ret = -EEXIST;
> +			break;
> +		}

That would be funny if the host passed the same struct twice, but do we
care? If the host wants to shoot itself in the foot, it's not our
problem I guess :) ? Also, we don't do the same check for vCPUs so ...


* Re: [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  2022-10-18 14:05     ` Will Deacon
@ 2022-10-18 16:52       ` Mark Rutland
  2022-10-19 12:01         ` Will Deacon
  0 siblings, 1 reply; 53+ messages in thread
From: Mark Rutland @ 2022-10-18 16:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Tue, Oct 18, 2022 at 03:05:14PM +0100, Will Deacon wrote:
> Hi Mark,
> 
> Cheers for having a look.
> 
> On Tue, Oct 18, 2022 at 12:06:14PM +0100, Mark Rutland wrote:
> > On Mon, Oct 17, 2022 at 12:51:58PM +0100, Will Deacon wrote:
> > > diff --git a/arch/arm64/kvm/hyp/nvhe/mm.c b/arch/arm64/kvm/hyp/nvhe/mm.c
> > > index d3a3b47181de..b77215630d5c 100644
> > > --- a/arch/arm64/kvm/hyp/nvhe/mm.c
> > > +++ b/arch/arm64/kvm/hyp/nvhe/mm.c
> > > @@ -14,6 +14,7 @@
> > >  #include <nvhe/early_alloc.h>
> > >  #include <nvhe/gfp.h>
> > >  #include <nvhe/memory.h>
> > > +#include <nvhe/mem_protect.h>
> > >  #include <nvhe/mm.h>
> > >  #include <nvhe/spinlock.h>
> > >  
> > > @@ -25,6 +26,12 @@ unsigned int hyp_memblock_nr;
> > >  
> > >  static u64 __io_map_base;
> > >  
> > > +struct hyp_fixmap_slot {
> > > +	u64 addr;
> > > +	kvm_pte_t *ptep;
> > > +};
> > > +static DEFINE_PER_CPU(struct hyp_fixmap_slot, fixmap_slots);
> > > +
> > >  static int __pkvm_create_mappings(unsigned long start, unsigned long size,
> > >  				  unsigned long phys, enum kvm_pgtable_prot prot)
> > >  {
> > > @@ -212,6 +219,93 @@ int hyp_map_vectors(void)
> > >  	return 0;
> > >  }
> > >  
> > > +void *hyp_fixmap_map(phys_addr_t phys)
> > > +{
> > > +	struct hyp_fixmap_slot *slot = this_cpu_ptr(&fixmap_slots);
> > > +	kvm_pte_t pte, *ptep = slot->ptep;
> > > +
> > > +	pte = *ptep;
> > > +	pte &= ~kvm_phys_to_pte(KVM_PHYS_INVALID);
> > > +	pte |= kvm_phys_to_pte(phys) | KVM_PTE_VALID;
> > > +	WRITE_ONCE(*ptep, pte);
> > > +	dsb(nshst);
> > > +
> > > +	return (void *)slot->addr;
> > > +}
> > > +
> > > +static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
> > > +{
> > > +	kvm_pte_t *ptep = slot->ptep;
> > > +	u64 addr = slot->addr;
> > > +
> > > +	WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
> > > +	dsb(nshst);
> > > +	__tlbi_level(vale2, __TLBI_VADDR(addr, 0), (KVM_PGTABLE_MAX_LEVELS - 1));
> > > +	dsb(nsh);
> > > +	isb();
> > > +}
> > 
> > Does each CPU have independent Stage-1 tables at EL2? i.e. each has a distinct
> > root table?
> 
> No, the CPUs share the same stage-1 table at EL2.

Ah, then I think there's a problem here.

> > If the tables are shared, you need broadcast maintenance and ISH barriers here,
> > or you risk the usual issues with asynchronous MMU behaviour.
> 
> Can you elaborate a bit, please? What we're trying to do is reserve a page
> of VA space for each CPU, which is only ever accessed explicitly by that
> CPU using a normal memory mapping. The fixmap code therefore just updates
> the relevant leaf entry for the CPU on which we're running and the TLBI
> is there to ensure that the new mapping takes effect.
> 
> If another CPU speculatively walks another CPU's fixmap slot, then I agree
> that it could access that page after the slot had been cleared. Although
> I can see theoretical security arguments around avoiding that situation,
> there's a very real performance cost to broadcast invalidation that we
> were hoping to avoid on this fast path.

The issue is that any CPU could walk any of these entries at any time
for any reason, and without broadcast maintenance we'd be violating the
Break-Before-Make requirements. That permits a number of things,
including "amalgamation", which would permit the CPU to consume some
arbitrary function of the old+new entries. Among other things, that can
permit accesses to entirely bogus physical addresses that weren't in
either entry (e.g. making speculative accesses to arbitrary device
addresses).

For correctness, you need the maintenance to be broadcast to all PEs
which could observe the old and new entries.

> Of course, in the likely event that I've purged "the usual issues" from
> my head and we need broadcasting for _correctness_, then we'll just have
> to suck it up!

As above, I believe you need this for correctness.

I'm not sure if FEAT_BBM level 2 gives you the necessary properties to
relax this on some HW.

Thanks,
Mark.


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-18 16:33   ` Quentin Perret
@ 2022-10-19 11:57     ` Will Deacon
  2022-10-19 13:35       ` Quentin Perret
  0 siblings, 1 reply; 53+ messages in thread
From: Will Deacon @ 2022-10-19 11:57 UTC (permalink / raw)
  To: Quentin Perret
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Tue, Oct 18, 2022 at 04:33:45PM +0000, Quentin Perret wrote:
> On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> > +void pkvm_hyp_vm_table_init(void *tbl)
> > +{
> > +	WARN_ON(vm_table);
> > +	vm_table = tbl;
> > +}
> 
> Uh, why does this one need to be exposed outside pkvm.c?

We need to initialise the table using the memory donated by the host
on the __pkvm_init path. That's all private to nvhe/setup.c, so rather
than expose the raw pointers (of either the table or the donated memory),
we've got this initialisation function instead which is invoked by
__pkvm_init_finalise() on the deprivilege path.

Happy to repaint it if you have a patch?

Will


* Re: [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2
  2022-10-18 16:52       ` Mark Rutland
@ 2022-10-19 12:01         ` Will Deacon
  0 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-19 12:01 UTC (permalink / raw)
  To: Mark Rutland
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Quentin Perret,
	Suzuki K Poulose, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Tue, Oct 18, 2022 at 05:52:18PM +0100, Mark Rutland wrote:
> On Tue, Oct 18, 2022 at 03:05:14PM +0100, Will Deacon wrote:
> > On Tue, Oct 18, 2022 at 12:06:14PM +0100, Mark Rutland wrote:
> > > If the tables are shared, you need broadcast maintenance and ISH barriers here,
> > > or you risk the usual issues with asynchronous MMU behaviour.
> > 
> > Can you elaborate a bit, please? What we're trying to do is reserve a page
> > of VA space for each CPU, which is only ever accessed explicitly by that
> > CPU using a normal memory mapping. The fixmap code therefore just updates
> > the relevant leaf entry for the CPU on which we're running and the TLBI
> > is there to ensure that the new mapping takes effect.
> > 
> > If another CPU speculatively walks another CPU's fixmap slot, then I agree
> > that it could access that page after the slot had been cleared. Although
> > I can see theoretical security arguments around avoiding that situation,
> > there's a very real performance cost to broadcast invalidation that we
> > were hoping to avoid on this fast path.
> 
> The issue is that any CPU could walk any of these entries at any time
> for any reason, and without broadcast maintenance we'd be violating the
> Break-Before-Make requirements. That permits a number of things,
> including "amalgamation", which would permit the CPU to consume some
> arbitrary function of the old+new entries. Among other things, that can
> permit accesses to entirely bogus physical addresses that weren't in
> either entry (e.g. making speculative accesses to arbitrary device
> addresses).
> 
> For correctness, you need the maintenance to be broadcast to all PEs
> which could observe the old and new entries.

Urgh, I had definitely purged that one. Thanks for the refresher.

> > Of course, in the likely event that I've purged "the usual issues" from
> > my head and we need broadcasting for _correctness_, then we'll just have
> > to suck it up!
> 
> As above, I believe you need this for correctness.
> 
> I'm not sure if FEAT_BBM level 2 gives you the necessary properties to
> relax this on some HW.

I'll seek clarification on this, as that amalgamation text feels a bit
over-reaching to me (particularly when combined with the fact that the
CPU still has to respect things like the NS bit) and I suspect that we
wouldn't see it in practice for this case.

But for now, I'll add the broadcasting so we don't block the series.

Will
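
For reference, a sketch of the broadcast variant discussed above: the
Inner-Shareable TLBI and ISH barriers replace the local (non-shareable)
forms so that all PEs sharing the EL2 stage-1 tables observe the
invalidation, keeping the sequence Break-Before-Make compliant:

	static void fixmap_clear_slot(struct hyp_fixmap_slot *slot)
	{
		kvm_pte_t *ptep = slot->ptep;
		u64 addr = slot->addr;

		WRITE_ONCE(*ptep, *ptep & ~KVM_PTE_VALID);
		dsb(ishst);					/* was nshst */
		__tlbi_level(vale2is, __TLBI_VADDR(addr, 0),	/* was vale2 */
			     KVM_PGTABLE_MAX_LEVELS - 1);
		dsb(ish);					/* was nsh */
		isb();
	}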


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-18 16:45   ` Quentin Perret
@ 2022-10-19 12:18     ` Fuad Tabba
  0 siblings, 0 replies; 53+ messages in thread
From: Fuad Tabba @ 2022-10-19 12:18 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Will Deacon, kvmarm, Sean Christopherson, Vincent Donnefort,
	Alexandru Elisei, Catalin Marinas, James Morse, Chao Peng,
	Suzuki K Poulose, Mark Rutland, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

Hi,

On Tue, Oct 18, 2022 at 5:45 PM Quentin Perret <qperret@google.com> wrote:
>
> On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> > +static int find_free_vm_table_entry(struct kvm *host_kvm)
> > +{
> > +     int i, ret = -ENOMEM;
> > +
> > +     for (i = 0; i < KVM_MAX_PVMS; ++i) {
> > +             struct pkvm_hyp_vm *vm = vm_table[i];
> > +
> > +             if (!vm) {
> > +                     if (ret < 0)
> > +                             ret = i;
> > +                     continue;
> > +             }
> > +
> > +             if (unlikely(vm->host_kvm == host_kvm)) {
> > +                     ret = -EEXIST;
> > +                     break;
> > +             }
>
> That would be funny if the host passed the same struct twice, but do we
> care? If the host wants to shoot itself in the foot, it's not our
> problem I guess :) ? Also, we don't do the same check for vCPUs so ...

You're right, the host can shoot itself in the foot if it wants to.
The reason we don't have it for vCPUs is that the code was factored
out later and, as you said, a similar check isn't necessary.

TLDR: we'll remove it :)

Thanks,
/fuad
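
With the -EEXIST check dropped as agreed, the helper reduces to a
first-free scan of the table; a sketch:

	static int find_free_vm_table_entry(void)
	{
		int i;

		for (i = 0; i < KVM_MAX_PVMS; ++i) {
			if (!vm_table[i])
				return i;
		}

		return -ENOMEM;
	}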


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-18 15:13   ` Quentin Perret
@ 2022-10-19 12:35     ` Will Deacon
  0 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-19 12:35 UTC (permalink / raw)
  To: Quentin Perret
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Tue, Oct 18, 2022 at 03:13:36PM +0000, Quentin Perret wrote:
> On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> > +/* Maximum number of protected VMs that can be created. */
> > +#define KVM_MAX_PVMS 255
> 
> Nit: I think that limit will apply to non-protected VMs too, at least
> initially, so it'd be worth rewording the comment.

Good point, I've changed this to:

  /* Maximum number of VMs that can co-exist under pKVM. */

Will


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-18 16:40   ` Quentin Perret
@ 2022-10-19 12:44     ` Will Deacon
  0 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-19 12:44 UTC (permalink / raw)
  To: Quentin Perret
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Tue, Oct 18, 2022 at 04:40:11PM +0000, Quentin Perret wrote:
> On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> > +static void *map_donated_memory_noclear(unsigned long host_va, size_t size)
> > +{
> > +	void *va = (void *)kern_hyp_va(host_va);
> > +
> > +	if (!PAGE_ALIGNED(va))
> > +		return NULL;
> > +
> > +	if (__pkvm_host_donate_hyp(hyp_virt_to_pfn(va),
> > +				   PAGE_ALIGN(size) >> PAGE_SHIFT))
> > +		return NULL;
> > +
> > +	return va;
> > +}
> > +
> > +static void *map_donated_memory(unsigned long host_va, size_t size)
> > +{
> > +	void *va = map_donated_memory_noclear(host_va, size);
> > +
> > +	if (va)
> > +		memset(va, 0, size);
> > +
> > +	return va;
> > +}
> > +
> > +static void __unmap_donated_memory(void *va, size_t size)
> > +{
> > +	WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn(va),
> > +				       PAGE_ALIGN(size) >> PAGE_SHIFT));
> > +}
> > +
> > +static void unmap_donated_memory(void *va, size_t size)
> > +{
> > +	if (!va)
> > +		return;
> > +
> > +	memset(va, 0, size);
> > +	__unmap_donated_memory(va, size);
> > +}
> > +
> > +static void unmap_donated_memory_noclear(void *va, size_t size)
> > +{
> > +	if (!va)
> > +		return;
> > +
> > +	__unmap_donated_memory(va, size);
> > +}
> 
> Nit: I'm not a huge fan of the naming here, these do more than just
> map/unmap random pages. This only works for host pages, the donation
> path has permission checks, etc. Maybe {admit,return}_host_memory()?

Hmm, so I made this change locally and I found return_host_memory() to be
really hard to read, particularly on error paths where we 'goto' an error
label and have calls to 'return_host_memory' before an actual 'return'
statement.

So I've left as-is, not because I'm tied to the existing names, but because
I did struggle with your suggestion. At the end of the day, these are static
functions so it's really easy to change the name later on if we can think
of something better.

Will


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-18 16:21   ` Quentin Perret
@ 2022-10-19 12:45     ` Will Deacon
  0 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-19 12:45 UTC (permalink / raw)
  To: Quentin Perret
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Tue, Oct 18, 2022 at 04:21:11PM +0000, Quentin Perret wrote:
> On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> > +struct pkvm_hyp_vm {
> > +	struct kvm kvm;
> > +
> > +	/* Backpointer to the host's (untrusted) KVM instance. */
> > +	struct kvm *host_kvm;
> > +
> > +	/*
> > +	 * Total amount of memory donated by the host for maintaining
> > +	 * this 'struct pkvm_hyp_vm' in the hypervisor.
> > +	 */
> > +	size_t donated_memory_size;
> 
> I think you could get rid of that member. IIUC, all you need to
> re-compute it in the teardown path is the number of created vCPUs on
> the host, which we should have safely stored in
> pkvm_hyp_vm::kvm::created_vcpus.

Oh, well spotted! I've dropped this as you have suggested.

Will
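
The recomputation is simple because the donated allocation is just the VM
structure plus one vCPU pointer per created vCPU; a sketch (the helper name
is assumed):

	static size_t pkvm_get_hyp_vm_size(unsigned int nr_vcpus)
	{
		return size_add(sizeof(struct pkvm_hyp_vm),
				size_mul(sizeof(struct pkvm_hyp_vcpu *),
					 nr_vcpus));
	}

	/* Called with hyp_vm->kvm.created_vcpus on the teardown path. */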


* Re: [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2
  2022-10-19 11:57     ` Will Deacon
@ 2022-10-19 13:35       ` Quentin Perret
  0 siblings, 0 replies; 53+ messages in thread
From: Quentin Perret @ 2022-10-19 13:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Wednesday 19 Oct 2022 at 12:57:24 (+0100), Will Deacon wrote:
> On Tue, Oct 18, 2022 at 04:33:45PM +0000, Quentin Perret wrote:
> > On Monday 17 Oct 2022 at 12:51:56 (+0100), Will Deacon wrote:
> > > +void pkvm_hyp_vm_table_init(void *tbl)
> > > +{
> > > +	WARN_ON(vm_table);
> > > +	vm_table = tbl;
> > > +}
> > 
> > Uh, why does this one need to be exposed outside pkvm.c?
> 
> We need to initialise the table using the memory donated by the host
> on the __pkvm_init path. That's all private to nvhe/setup.c, so rather
> than expose the raw pointers (of either the table or the donated memory),
> we've got this initialisation function instead which is invoked by
> __pkvm_init_finalise() on the deprivilege path.
> 
> Happy to repaint it if you have a patch?

I don't, I just got confused, maybe because in an older version of this
(possibly quite old) the table was statically allocated? Anyways, it's
all fine as-is, thanks for the reply.
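
For completeness, a sketch of how a private vm_table is typically indexed:
handles are table indices plus a fixed offset so that zero is never a valid
handle (the offset value here is an assumption):

	#define HANDLE_OFFSET	0x1000

	static unsigned int vm_handle_to_idx(pkvm_handle_t handle)
	{
		return handle - HANDLE_OFFSET;
	}

	static pkvm_handle_t idx_to_vm_handle(unsigned int idx)
	{
		return idx + HANDLE_OFFSET;
	}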


* Re: [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1
  2022-10-17 11:51 ` [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 Will Deacon
@ 2022-10-19 15:46   ` Quentin Perret
  2022-10-19 16:00   ` Quentin Perret
  1 sibling, 0 replies; 53+ messages in thread
From: Quentin Perret @ 2022-10-19 15:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:51:57 (+0100), Will Deacon wrote:
> +struct kvm_protected_vm {
> +	pkvm_handle_t handle;
> +	struct mutex vm_lock;
> +
> +	struct {
> +		void *pgd;
> +		void *vm;
> +		void *vcpus[KVM_MAX_VCPUS];

That's memory that's 'wasted' for everyone :/

> +	} hyp_donations;
> +};
> +
>  struct kvm_arch {
>  	struct kvm_s2_mmu mmu;
>  
> @@ -170,10 +181,10 @@ struct kvm_arch {
>  	struct kvm_smccc_features smccc_feat;
>  
>  	/*
> -	 * For an untrusted host VM, 'pkvm_handle' is used to lookup
> +	 * For an untrusted host VM, 'pkvm.handle' is used to lookup
>  	 * the associated pKVM instance in the hypervisor.
>  	 */
> -	pkvm_handle_t pkvm_handle;
> +	struct kvm_protected_vm pkvm;

Maybe make this a pointer that will be !NULL only when pKVM is enabled?

>  };


* Re: [PATCH v4 20/25] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
  2022-10-17 11:52 ` [PATCH v4 20/25] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache Will Deacon
@ 2022-10-19 15:52   ` Quentin Perret
  2022-10-19 16:24     ` Will Deacon
  0 siblings, 1 reply; 53+ messages in thread
From: Quentin Perret @ 2022-10-19 15:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:52:04 (+0100), Will Deacon wrote:
>  struct kvm_protected_vm {
>  	pkvm_handle_t handle;
>  	struct mutex vm_lock;
> -
> -	struct {
> -		void *pgd;
> -		void *vm;
> -		void *vcpus[KVM_MAX_VCPUS];
> -	} hyp_donations;
> +	struct kvm_hyp_memcache teardown_mc;
>  };

Argh, I guess that somewhat invalidates my previous comment. Oh well :-)


* Re: [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1
  2022-10-17 11:51 ` [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 Will Deacon
  2022-10-19 15:46   ` Quentin Perret
@ 2022-10-19 16:00   ` Quentin Perret
  2022-10-19 16:34     ` Will Deacon
  1 sibling, 1 reply; 53+ messages in thread
From: Quentin Perret @ 2022-10-19 16:00 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Monday 17 Oct 2022 at 12:51:57 (+0100), Will Deacon wrote:
> +struct kvm_protected_vm {
> +	pkvm_handle_t handle;
> +	struct mutex vm_lock;

Why is this lock needed btw? Isn't kvm->lock good enough?

> +
> +	struct {
> +		void *pgd;
> +		void *vm;
> +		void *vcpus[KVM_MAX_VCPUS];
> +	} hyp_donations;
> +};


* Re: [PATCH v4 20/25] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
  2022-10-19 15:52   ` Quentin Perret
@ 2022-10-19 16:24     ` Will Deacon
  0 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-19 16:24 UTC (permalink / raw)
  To: Quentin Perret
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Wed, Oct 19, 2022 at 03:52:10PM +0000, Quentin Perret wrote:
> On Monday 17 Oct 2022 at 12:52:04 (+0100), Will Deacon wrote:
> >  struct kvm_protected_vm {
> >  	pkvm_handle_t handle;
> >  	struct mutex vm_lock;
> > -
> > -	struct {
> > -		void *pgd;
> > -		void *vm;
> > -		void *vcpus[KVM_MAX_VCPUS];
> > -	} hyp_donations;
> > +	struct kvm_hyp_memcache teardown_mc;
> >  };
> 
> Argh, I guess that somewhat invalidates my previous comment. Oh well :-)

Yup! There's a slight chicken-and-egg problem here, so we introduce
explicit tracking of the donations in the host to start with, but then
once we have the teardown memcache we can stop tracking things explicitly
and remove the temporary pointers.

Will
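
For background, a minimal sketch of the memcache shape referred to here: a
chain of donated pages linked through the pages themselves, so the host
needs no separate bookkeeping (field and helper names assumed):

	struct kvm_hyp_memcache {
		phys_addr_t head;	/* PA of the first page in the chain */
		unsigned long nr_pages;
	};

	static void hyp_mc_push(struct kvm_hyp_memcache *mc, void *va)
	{
		*(phys_addr_t *)va = mc->head;	/* link via the page itself */
		mc->head = __hyp_pa(va);
		mc->nr_pages++;
	}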


* Re: [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1
  2022-10-19 16:00   ` Quentin Perret
@ 2022-10-19 16:34     ` Will Deacon
  0 siblings, 0 replies; 53+ messages in thread
From: Will Deacon @ 2022-10-19 16:34 UTC (permalink / raw)
  To: Quentin Perret
  Cc: kvmarm, Sean Christopherson, Vincent Donnefort, Alexandru Elisei,
	Catalin Marinas, James Morse, Chao Peng, Suzuki K Poulose,
	Mark Rutland, Fuad Tabba, Oliver Upton, Marc Zyngier,
	kernel-team, kvm, linux-arm-kernel

On Wed, Oct 19, 2022 at 04:00:06PM +0000, Quentin Perret wrote:
> On Monday 17 Oct 2022 at 12:51:57 (+0100), Will Deacon wrote:
> > +struct kvm_protected_vm {
> > +	pkvm_handle_t handle;
> > +	struct mutex vm_lock;
> 
> Why is this lock needed btw? Isn't kvm->lock good enough?

Good question. I can't see why kvm->lock wouldn't work for this series.
I have vague recollections that we ran into lock-ordering issues with
vCPU locking in an early implementation but, for now, let's drop it.

Cheers,

Will


End of thread (newest message: 2022-10-19 16:35 UTC)

Thread overview: 53+ messages
2022-10-17 11:51 [PATCH v4 00/25] KVM: arm64: Introduce pKVM hyp VM and vCPU state at EL2 Will Deacon
2022-10-17 11:51 ` [PATCH v4 01/25] KVM: arm64: Move hyp refcount manipulation helpers to common header file Will Deacon
2022-10-17 20:29   ` Philippe Mathieu-Daudé
2022-10-17 11:51 ` [PATCH v4 02/25] KVM: arm64: Allow attaching of non-coalescable pages to a hyp pool Will Deacon
2022-10-17 11:51 ` [PATCH v4 03/25] KVM: arm64: Back the hypervisor 'struct hyp_page' array for all memory Will Deacon
2022-10-17 11:51 ` [PATCH v4 04/25] KVM: arm64: Fix-up hyp stage-1 refcounts for all pages mapped at EL2 Will Deacon
2022-10-17 11:51 ` [PATCH v4 05/25] KVM: arm64: Unify identifiers used to distinguish host and hypervisor Will Deacon
2022-10-17 20:21   ` Philippe Mathieu-Daudé
2022-10-17 11:51 ` [PATCH v4 06/25] KVM: arm64: Implement do_donate() helper for donating memory Will Deacon
2022-10-17 11:51 ` [PATCH v4 07/25] KVM: arm64: Prevent the donation of no-map pages Will Deacon
2022-10-18 13:42   ` Philippe Mathieu-Daudé
2022-10-17 11:51 ` [PATCH v4 08/25] KVM: arm64: Add helpers to pin memory shared with the hypervisor at EL2 Will Deacon
2022-10-17 11:51 ` [PATCH v4 09/25] KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h Will Deacon
2022-10-17 20:22   ` Philippe Mathieu-Daudé
2022-10-17 11:51 ` [PATCH v4 10/25] KVM: arm64: Add hyp_spinlock_t static initializer Will Deacon
2022-10-18 13:51   ` Philippe Mathieu-Daudé
2022-10-17 11:51 ` [PATCH v4 11/25] KVM: arm64: Rename 'host_kvm' to 'host_mmu' Will Deacon
2022-10-18 13:47   ` Philippe Mathieu-Daudé
2022-10-17 11:51 ` [PATCH v4 12/25] KVM: arm64: Add infrastructure to create and track pKVM instances at EL2 Will Deacon
2022-10-18 15:13   ` Quentin Perret
2022-10-19 12:35     ` Will Deacon
2022-10-18 16:21   ` Quentin Perret
2022-10-19 12:45     ` Will Deacon
2022-10-18 16:33   ` Quentin Perret
2022-10-19 11:57     ` Will Deacon
2022-10-19 13:35       ` Quentin Perret
2022-10-18 16:40   ` Quentin Perret
2022-10-19 12:44     ` Will Deacon
2022-10-18 16:45   ` Quentin Perret
2022-10-19 12:18     ` Fuad Tabba
2022-10-17 11:51 ` [PATCH v4 13/25] KVM: arm64: Instantiate pKVM hypervisor VM and vCPU structures from EL1 Will Deacon
2022-10-19 15:46   ` Quentin Perret
2022-10-19 16:00   ` Quentin Perret
2022-10-19 16:34     ` Will Deacon
2022-10-17 11:51 ` [PATCH v4 14/25] KVM: arm64: Add per-cpu fixmap infrastructure at EL2 Will Deacon
2022-10-18 11:06   ` Mark Rutland
2022-10-18 14:05     ` Will Deacon
2022-10-18 16:52       ` Mark Rutland
2022-10-19 12:01         ` Will Deacon
2022-10-17 11:51 ` [PATCH v4 15/25] KVM: arm64: Initialise hypervisor copies of host symbols unconditionally Will Deacon
2022-10-17 20:26   ` Philippe Mathieu-Daudé
2022-10-17 11:52 ` [PATCH v4 16/25] KVM: arm64: Provide I-cache invalidation by virtual address at EL2 Will Deacon
2022-10-17 11:52 ` [PATCH v4 17/25] KVM: arm64: Add generic hyp_memcache helpers Will Deacon
2022-10-17 11:52 ` [PATCH v4 18/25] KVM: arm64: Consolidate stage-2 initialisation into a single function Will Deacon
2022-10-17 11:52 ` [PATCH v4 19/25] KVM: arm64: Instantiate guest stage-2 page-tables at EL2 Will Deacon
2022-10-17 11:52 ` [PATCH v4 20/25] KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache Will Deacon
2022-10-19 15:52   ` Quentin Perret
2022-10-19 16:24     ` Will Deacon
2022-10-17 11:52 ` [PATCH v4 21/25] KVM: arm64: Unmap 'kvm_arm_hyp_percpu_base' from the host Will Deacon
2022-10-17 11:52 ` [PATCH v4 22/25] KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2 Will Deacon
2022-10-17 11:52 ` [PATCH v4 23/25] KVM: arm64: Explicitly map 'kvm_vgic_global_state' " Will Deacon
2022-10-17 11:52 ` [PATCH v4 24/25] KVM: arm64: Don't unnecessarily map host kernel sections " Will Deacon
2022-10-17 11:52 ` [PATCH v4 25/25] KVM: arm64: Use the pKVM hyp vCPU structure in handle___kvm_vcpu_run() Will Deacon
