linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp()
@ 2021-12-01 17:03 Quentin Perret
  2021-12-01 17:03 ` [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings() Quentin Perret
                   ` (14 more replies)
  0 siblings, 15 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:03 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

Hi all,

This is v3 of the series previously posted here:

  https://lore.kernel.org/kvmarm/20211019121304.2732332-1-qperret@google.com/

This series implements an unshare hypercall at EL2 in nVHE protected
mode, and makes use of it to unmap guest-specific data-structures from
EL2 stage-1 during guest tear-down. Crucially, the implementation of the
share and unshare routines uses page refcounts in the host kernel to
avoid accidentally unmapping data-structures that overlap a common page.
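
To make the refcounting rationale concrete, here is a rough host-side
usage sketch (not lifted from the series itself; kvm_unshare_hyp() only
appears in the final patch):

	/* Two objects that happen to live in the same physical page: */
	kvm_share_hyp(obj_a, obj_a + 1);    /* page refcount 0 -> 1, share hcall */
	kvm_share_hyp(obj_b, obj_b + 1);    /* same page, 1 -> 2, no new hcall   */
	...
	kvm_unshare_hyp(obj_a, obj_a + 1);  /* 2 -> 1, EL2 mapping kept          */
	kvm_unshare_hyp(obj_b, obj_b + 1);  /* 1 -> 0, unshare hcall issued      */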

This series has two main benefits. Firstly, it allows EL2 to track the
state of shared pages cleanly, as they can now transition from SHARED
back to OWNED. This will simplify permission checks once e.g. pkvm
implements a donation hcall to provide memory to protected guests, as
there should then be no reason for the host to donate a page that is
currently marked shared. Secondly, it avoids leaving dangling mappings
in the hypervisor's stage-1, which is desirable from a security
perspective as the hypervisor is obviously running with elevated
privileges. It is also worth noting that this refactors the EL2
page-tracking checks in a more scalable way, which should make it much
easier to implement other memory transitions (host donating memory to a
guest, a guest sharing back with the host, ...) in the future.

Changes since v2:

 - Added a check in kvm_share_hyp() to prevent sharing vmalloc pages;

 - Rebased on kvmarm/next, which contains Marc's rework of FPSIMD/SVE
   tracking [1].

Thanks!
Quentin

[1] https://lore.kernel.org/kvmarm/20211201120436.389756-1-maz@kernel.org/

Quentin Perret (7):
  KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings()
  KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator
  KVM: arm64: Refcount hyp stage-1 pgtable pages
  KVM: arm64: Fixup hyp stage-1 refcount
  KVM: arm64: Introduce kvm_share_hyp()
  KVM: arm64: pkvm: Refcount the pages shared with EL2
  KVM: arm64: pkvm: Unshare guest structs during teardown

Will Deacon (8):
  KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table
  KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
  KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages
  KVM: arm64: Introduce wrappers for host and hyp spin lock accessors
  KVM: arm64: Implement do_share() helper for sharing memory
  KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()
  KVM: arm64: Implement do_unshare() helper for unsharing memory
  KVM: arm64: Expose unshare hypercall to the host

 arch/arm64/include/asm/kvm_asm.h              |   1 +
 arch/arm64/include/asm/kvm_host.h             |   2 +
 arch/arm64/include/asm/kvm_mmu.h              |   2 +
 arch/arm64/include/asm/kvm_pgtable.h          |  21 +
 arch/arm64/kvm/arm.c                          |   6 +-
 arch/arm64/kvm/fpsimd.c                       |  36 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   6 +
 arch/arm64/kvm/hyp/nvhe/early_alloc.c         |   5 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |   8 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 500 +++++++++++++++---
 arch/arm64/kvm/hyp/nvhe/setup.c               |  22 +-
 arch/arm64/kvm/hyp/pgtable.c                  |  80 ++-
 arch/arm64/kvm/mmu.c                          | 140 ++++-
 arch/arm64/kvm/reset.c                        |  10 +-
 14 files changed, 737 insertions(+), 102 deletions(-)

-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings()
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
@ 2021-12-01 17:03 ` Quentin Perret
  2021-12-09 10:10   ` Will Deacon
  2021-12-01 17:03 ` [PATCH v3 02/15] KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator Quentin Perret
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:03 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

The kvm_host_owns_hyp_mappings() function should return true if and only
if the host kernel is responsible for creating the hypervisor stage-1
mappings. That is only possible in standard non-VHE mode, or during boot
in protected nVHE mode. Either way, none of this makes sense in VHE, so
make sure to catch this case as well, hence making the function return
sensible values in any context (VHE or not).

Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 326cdfec74a1..f8f1096a297f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -239,6 +239,9 @@ void free_hyp_pgds(void)
 
 static bool kvm_host_owns_hyp_mappings(void)
 {
+	if (is_kernel_in_hyp_mode())
+		return false;
+
 	if (static_branch_likely(&kvm_protected_mode_initialized))
 		return false;
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 02/15] KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
  2021-12-01 17:03 ` [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings() Quentin Perret
@ 2021-12-01 17:03 ` Quentin Perret
  2021-12-09 10:12   ` Will Deacon
  2021-12-01 17:03 ` [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages Quentin Perret
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:03 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

In nVHE protected mode, the EL2 code uses a temporary allocator during
boot while re-creating its stage-1 page-table. Unfortunately, the
hyp_vmemmap is not ready to use at this stage, so refcounting pages
is not possible. That is not currently a problem because hyp stage-1
mappings are never removed, which implies refcounting of page-table
pages is unnecessary.

In preparation for allowing hypervisor stage-1 mappings to be removed,
provide stub implementations for {get,put}_page() in the early allocator.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/early_alloc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/early_alloc.c b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
index 1306c430ab87..00de04153cc6 100644
--- a/arch/arm64/kvm/hyp/nvhe/early_alloc.c
+++ b/arch/arm64/kvm/hyp/nvhe/early_alloc.c
@@ -43,6 +43,9 @@ void *hyp_early_alloc_page(void *arg)
 	return hyp_early_alloc_contig(1);
 }
 
+static void hyp_early_alloc_get_page(void *addr) { }
+static void hyp_early_alloc_put_page(void *addr) { }
+
 void hyp_early_alloc_init(void *virt, unsigned long size)
 {
 	base = cur = (unsigned long)virt;
@@ -51,4 +54,6 @@ void hyp_early_alloc_init(void *virt, unsigned long size)
 	hyp_early_alloc_mm_ops.zalloc_page = hyp_early_alloc_page;
 	hyp_early_alloc_mm_ops.phys_to_virt = hyp_phys_to_virt;
 	hyp_early_alloc_mm_ops.virt_to_phys = hyp_virt_to_phys;
+	hyp_early_alloc_mm_ops.get_page = hyp_early_alloc_get_page;
+	hyp_early_alloc_mm_ops.put_page = hyp_early_alloc_put_page;
 }
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
  2021-12-01 17:03 ` [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings() Quentin Perret
  2021-12-01 17:03 ` [PATCH v3 02/15] KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator Quentin Perret
@ 2021-12-01 17:03 ` Quentin Perret
  2021-12-09 10:29   ` Will Deacon
  2021-12-01 17:03 ` [PATCH v3 04/15] KVM: arm64: Fixup hyp stage-1 refcount Quentin Perret
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:03 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

To prepare the ground for allowing hyp stage-1 mappings to be removed at
run-time, update the KVM page-table code to maintain a correct refcount
using the ->{get,put}_page() function callbacks.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 17 ++++++++++++++---
 1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f8ceebe4982e..768a58835153 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -408,8 +408,10 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 		return false;
 
 	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
-	if (hyp_pte_needs_update(old, new))
+	if (hyp_pte_needs_update(old, new)) {
 		smp_store_release(ptep, new);
+		data->mm_ops->get_page(ptep);
+	}
 
 	data->phys += granule;
 	return true;
@@ -433,6 +435,7 @@ static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 		return -ENOMEM;
 
 	kvm_set_table_pte(ptep, childp, mm_ops);
+	mm_ops->get_page(ptep);
 	return 0;
 }
 
@@ -482,8 +485,16 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			   enum kvm_pgtable_walk_flags flag, void * const arg)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = arg;
+	kvm_pte_t pte = *ptep;
+
+	if (!kvm_pte_valid(pte))
+		return 0;
+
+	mm_ops->put_page(ptep);
+
+	if (kvm_pte_table(pte, level))
+		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
 
-	mm_ops->put_page((void *)kvm_pte_follow(*ptep, mm_ops));
 	return 0;
 }
 
@@ -491,7 +502,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= hyp_free_walker,
-		.flags	= KVM_PGTABLE_WALK_TABLE_POST,
+		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
 		.arg	= pgt->mm_ops,
 	};
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 04/15] KVM: arm64: Fixup hyp stage-1 refcount
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (2 preceding siblings ...)
  2021-12-01 17:03 ` [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages Quentin Perret
@ 2021-12-01 17:03 ` Quentin Perret
  2021-12-09 11:09   ` Will Deacon
  2021-12-01 17:03 ` [PATCH v3 05/15] KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table Quentin Perret
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:03 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
due to the lack of refcount support in the early allocator. Fix up the
refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 578f71798c2e..875b5174342f 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -165,6 +165,7 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 					 enum kvm_pgtable_walk_flags flag,
 					 void * const arg)
 {
+	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	enum kvm_pgtable_prot prot;
 	enum pkvm_page_state state;
 	kvm_pte_t pte = *ptep;
@@ -173,6 +174,15 @@ static int finalize_host_mappings_walker(u64 addr, u64 end, u32 level,
 	if (!kvm_pte_valid(pte))
 		return 0;
 
+	/*
+	 * Fix-up the refcount for the page-table pages as the early allocator
+	 * was unable to access the hyp_vmemmap and so the buddy allocator has
+	 * initialised the refcount to '1'.
+	 */
+	mm_ops->get_page(ptep);
+	if (flag != KVM_PGTABLE_WALK_LEAF)
+		return 0;
+
 	if (level != (KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
@@ -205,7 +215,8 @@ static int finalize_host_mappings(void)
 {
 	struct kvm_pgtable_walker walker = {
 		.cb	= finalize_host_mappings_walker,
-		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+		.arg	= pkvm_pgtable.mm_ops,
 	};
 	int i, ret;
 
@@ -240,10 +251,6 @@ void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
-	ret = finalize_host_mappings();
-	if (ret)
-		goto out;
-
 	pkvm_pgtable_mm_ops = (struct kvm_pgtable_mm_ops) {
 		.zalloc_page = hyp_zalloc_hyp_page,
 		.phys_to_virt = hyp_phys_to_virt,
@@ -253,6 +260,10 @@ void __noreturn __pkvm_init_finalise(void)
 	};
 	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
 
+	ret = finalize_host_mappings();
+	if (ret)
+		goto out;
+
 out:
 	/*
 	 * We tail-called to here from handle___pkvm_init() and will not return,
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 05/15] KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (3 preceding siblings ...)
  2021-12-01 17:03 ` [PATCH v3 04/15] KVM: arm64: Fixup hyp stage-1 refcount Quentin Perret
@ 2021-12-01 17:03 ` Quentin Perret
  2021-12-01 17:04 ` [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2 Quentin Perret
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:03 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

kvm_pgtable_hyp_unmap() relies on the ->page_count() function callback
being provided by the memory-management operations for the page-table.

Wire up this callback for the hypervisor stage-1 page-table.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/setup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 875b5174342f..855a19056627 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -257,6 +257,7 @@ void __noreturn __pkvm_init_finalise(void)
 		.virt_to_phys = hyp_virt_to_phys,
 		.get_page = hpool_get_page,
 		.put_page = hpool_put_page,
+		.page_count = hyp_page_count,
 	};
 	pkvm_pgtable.mm_ops = &pkvm_pgtable_mm_ops;
 
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (4 preceding siblings ...)
  2021-12-01 17:03 ` [PATCH v3 05/15] KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-07 14:47   ` Andrew Walbran
  2021-12-01 17:04 ` [PATCH v3 07/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (8 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
stage-1 mappings at EL2.
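
A hedged usage sketch (mirroring how a later patch in this series
consumes the new API): callers are expected to treat a short unmap as a
failure.

	/* addr/size assumed page-aligned; pkvm_pgtable is the EL2 stage-1. */
	u64 unmapped = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, size);

	if (unmapped != size)
		return -EFAULT;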

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h | 21 ++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 63 ++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 027783829584..9d076f36401d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -251,6 +251,27 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
 int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 			enum kvm_pgtable_prot prot);
 
+/**
+ * kvm_pgtable_hyp_unmap() - Remove a mapping from a hypervisor stage-1 page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_hyp_init().
+ * @addr:	Virtual address from which to remove the mapping.
+ * @size:	Size of the mapping.
+ *
+ * The offset of @addr within a page is ignored and @size is rounded-up
+ * to the next page boundary (there is no physical address argument to
+ * round, unlike for kvm_pgtable_hyp_map()).
+ *
+ * TLB invalidation is performed for each page-table entry cleared during the
+ * unmapping operation and the reference count for the page-table page
+ * containing the cleared entry is decremented, with unreferenced pages being
+ * freed. The unmapping operation will stop early if it encounters either an
+ * invalid page-table entry or a valid block mapping which maps beyond the range
+ * being unmapped.
+ *
+ * Return: Number of bytes unmapped, which may be 0.
+ */
+u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
 /**
  * kvm_get_vtcr() - Helper to construct VTCR_EL2
  * @mmfr0:	Sanitized value of SYS_ID_AA64MMFR0_EL1 register.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 768a58835153..6ad4cb2d6947 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -463,6 +463,69 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 	return ret;
 }
 
+struct hyp_unmap_data {
+	u64				unmapped;
+	struct kvm_pgtable_mm_ops	*mm_ops;
+};
+
+static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			    enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+	kvm_pte_t pte = *ptep, *childp = NULL;
+	u64 granule = kvm_granule_size(level);
+	struct hyp_unmap_data *data = arg;
+	struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
+
+	if (!kvm_pte_valid(pte))
+		return -EINVAL;
+
+	if (kvm_pte_table(pte, level)) {
+		childp = kvm_pte_follow(pte, mm_ops);
+
+		if (mm_ops->page_count(childp) != 1)
+			return 0;
+
+		kvm_clear_pte(ptep);
+		dsb(ishst);
+		__tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
+	} else {
+		if (end - addr < granule)
+			return -EINVAL;
+
+		kvm_clear_pte(ptep);
+		dsb(ishst);
+		__tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
+		data->unmapped += granule;
+	}
+
+	dsb(ish);
+	isb();
+	mm_ops->put_page(ptep);
+
+	if (childp)
+		mm_ops->put_page(childp);
+
+	return 0;
+}
+
+u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct hyp_unmap_data unmap_data = {
+		.mm_ops	= pgt->mm_ops,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb	= hyp_unmap_walker,
+		.arg	= &unmap_data,
+		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+	};
+
+	if (!pgt->mm_ops->page_count)
+		return 0;
+
+	kvm_pgtable_walk(pgt, addr, size, &walker);
+	return unmap_data.unmapped;
+}
+
 int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
 			 struct kvm_pgtable_mm_ops *mm_ops)
 {
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 07/15] KVM: arm64: Introduce kvm_share_hyp()
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (5 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2 Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-09 11:13   ` Will Deacon
  2021-12-01 17:04 ` [PATCH v3 08/15] KVM: arm64: pkvm: Refcount the pages shared with EL2 Quentin Perret
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

The create_hyp_mappings() function can currently be called at any point
in time. However, its behaviour in protected mode differs significantly
depending on when it is called. Prior to KVM init, it is used to
create the temporary page-table used to bring-up the hypervisor, and
later on it is transparently turned into a 'share' hypercall when the
kernel has lost control over the hypervisor stage-1. In order to prepare
the ground for also unsharing pages with the hypervisor during guest
teardown, introduce a kvm_share_hyp() function to make it clear in which
places a share hypercall should be expected, as we will soon need a
matching unshare hypercall in all those places.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_mmu.h |  1 +
 arch/arm64/kvm/arm.c             |  4 ++--
 arch/arm64/kvm/fpsimd.c          |  2 +-
 arch/arm64/kvm/mmu.c             | 27 +++++++++++++++++++++------
 arch/arm64/kvm/reset.c           |  2 +-
 5 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 02d378887743..185d0f62b724 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -150,6 +150,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 #include <asm/kvm_pgtable.h>
 #include <asm/stage2_pgtable.h>
 
+int kvm_share_hyp(void *from, void *to);
 int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **kaddr,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9b745d2bc89a..c202abb448b1 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -146,7 +146,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		return ret;
 
-	ret = create_hyp_mappings(kvm, kvm + 1, PAGE_HYP);
+	ret = kvm_share_hyp(kvm, kvm + 1);
 	if (ret)
 		goto out_free_stage2_pgd;
 
@@ -342,7 +342,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	if (err)
 		return err;
 
-	return create_hyp_mappings(vcpu, vcpu + 1, PAGE_HYP);
+	return kvm_share_hyp(vcpu, vcpu + 1);
 }
 
 void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 5526d79c7b47..86899d3aa9a9 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -30,7 +30,7 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 	struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
 
 	/* Make sure the host task fpsimd state is visible to hyp: */
-	ret = create_hyp_mappings(fpsimd, fpsimd + 1, PAGE_HYP);
+	ret = kvm_share_hyp(fpsimd, fpsimd + 1);
 	if (!ret)
 		vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f8f1096a297f..fd868fb9d922 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -299,6 +299,25 @@ static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
 	return 0;
 }
 
+int kvm_share_hyp(void *from, void *to)
+{
+	if (is_kernel_in_hyp_mode())
+		return 0;
+
+	/*
+	 * The share hcall maps things in the 'fixed-offset' region of the hyp
+	 * VA space, so we can only share physically contiguous data-structures
+	 * for now.
+	 */
+	if (is_vmalloc_addr(from) || is_vmalloc_addr(to))
+		return -EINVAL;
+
+	if (kvm_host_owns_hyp_mappings())
+		return create_hyp_mappings(from, to, PAGE_HYP);
+
+	return pkvm_share_hyp(__pa(from), __pa(to));
+}
+
 /**
  * create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
  * @from:	The virtual kernel start address of the range
@@ -319,12 +338,8 @@ int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 	if (is_kernel_in_hyp_mode())
 		return 0;
 
-	if (!kvm_host_owns_hyp_mappings()) {
-		if (WARN_ON(prot != PAGE_HYP))
-			return -EPERM;
-		return pkvm_share_hyp(kvm_kaddr_to_phys(from),
-				      kvm_kaddr_to_phys(to));
-	}
+	if (!kvm_host_owns_hyp_mappings())
+		return -EPERM;
 
 	start = start & PAGE_MASK;
 	end = PAGE_ALIGN(end);
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index c7a0249df840..e3e2a79fbd75 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -113,7 +113,7 @@ static int kvm_vcpu_finalize_sve(struct kvm_vcpu *vcpu)
 	if (!buf)
 		return -ENOMEM;
 
-	ret = create_hyp_mappings(buf, buf + reg_sz, PAGE_HYP);
+	ret = kvm_share_hyp(buf, buf + reg_sz);
 	if (ret) {
 		kfree(buf);
 		return ret;
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 08/15] KVM: arm64: pkvm: Refcount the pages shared with EL2
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (6 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 07/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-09 11:16   ` Will Deacon
  2021-12-01 17:04 ` [PATCH v3 09/15] KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages Quentin Perret
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

In order to simplify the page tracking infrastructure at EL2 in nVHE
protected mode, move to the host the responsibility for refcounting
pages that are shared multiple times. In order to do so, let's create a
red-black tree tracking all the PFNs that have been shared, along with
a refcount.
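
For reference, a rough sketch of what the complementary unshare-side
helper could look like (it is not part of this patch; the actual
teardown path only lands later in the series):

	static int unshare_pfn_hyp(u64 pfn)
	{
		struct rb_node **node, *parent;
		struct hyp_shared_pfn *this;
		int ret = 0;

		mutex_lock(&hyp_shared_pfns_lock);
		this = find_shared_pfn(pfn, &node, &parent);
		if (WARN_ON(!this)) {
			ret = -ENOENT;
			goto unlock;
		}

		/* Only issue the unshare hcall once the last user is gone. */
		if (--this->count)
			goto unlock;

		rb_erase(&this->node, &hyp_shared_pfns);
		kfree(this);
		ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn);
	unlock:
		mutex_unlock(&hyp_shared_pfns_lock);

		return ret;
	}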

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/mmu.c | 78 ++++++++++++++++++++++++++++++++++++++------
 1 file changed, 68 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fd868fb9d922..d72566896755 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -284,23 +284,72 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
 	}
 }
 
-static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
+struct hyp_shared_pfn {
+	u64 pfn;
+	int count;
+	struct rb_node node;
+};
+
+static DEFINE_MUTEX(hyp_shared_pfns_lock);
+static struct rb_root hyp_shared_pfns = RB_ROOT;
+
+static struct hyp_shared_pfn *find_shared_pfn(u64 pfn, struct rb_node ***node,
+					      struct rb_node **parent)
 {
-	phys_addr_t addr;
-	int ret;
+	struct hyp_shared_pfn *this;
+
+	*node = &hyp_shared_pfns.rb_node;
+	*parent = NULL;
+	while (**node) {
+		this = container_of(**node, struct hyp_shared_pfn, node);
+		*parent = **node;
+		if (this->pfn < pfn)
+			*node = &((**node)->rb_left);
+		else if (this->pfn > pfn)
+			*node = &((**node)->rb_right);
+		else
+			return this;
+	}
 
-	for (addr = ALIGN_DOWN(start, PAGE_SIZE); addr < end; addr += PAGE_SIZE) {
-		ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
-					__phys_to_pfn(addr));
-		if (ret)
-			return ret;
+	return NULL;
+}
+
+static int share_pfn_hyp(u64 pfn)
+{
+	struct rb_node **node, *parent;
+	struct hyp_shared_pfn *this;
+	int ret = 0;
+
+	mutex_lock(&hyp_shared_pfns_lock);
+	this = find_shared_pfn(pfn, &node, &parent);
+	if (this) {
+		this->count++;
+		goto unlock;
 	}
 
-	return 0;
+	this = kzalloc(sizeof(*this), GFP_KERNEL);
+	if (!this) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	this->pfn = pfn;
+	this->count = 1;
+	rb_link_node(&this->node, parent, node);
+	rb_insert_color(&this->node, &hyp_shared_pfns);
+	ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp, pfn, 1);
+unlock:
+	mutex_unlock(&hyp_shared_pfns_lock);
+
+	return ret;
 }
 
 int kvm_share_hyp(void *from, void *to)
 {
+	phys_addr_t start, end, cur;
+	u64 pfn;
+	int ret;
+
 	if (is_kernel_in_hyp_mode())
 		return 0;
 
@@ -315,7 +364,16 @@ int kvm_share_hyp(void *from, void *to)
 	if (kvm_host_owns_hyp_mappings())
 		return create_hyp_mappings(from, to, PAGE_HYP);
 
-	return pkvm_share_hyp(__pa(from), __pa(to));
+	start = ALIGN_DOWN(__pa(from), PAGE_SIZE);
+	end = PAGE_ALIGN(__pa(to));
+	for (cur = start; cur < end; cur += PAGE_SIZE) {
+		pfn = __phys_to_pfn(cur);
+		ret = share_pfn_hyp(pfn);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
 }
 
 /**
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 09/15] KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (7 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 08/15] KVM: arm64: pkvm: Refcount the pages shared with EL2 Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-14 14:47   ` Andrew Walbran
  2021-12-01 17:04 ` [PATCH v3 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors Quentin Perret
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

Explicitly name the combination of SW0 | SW1 as reserved in the pte and
introduce a new PKVM_NOPAGE meta-state which, although not directly
stored in the software bits of the pte, can be used to represent an
entry for which there is no underlying page. This is distinct from an
invalid pte, as stage-2 identity mappings for the host are created
lazily and so an invalid pte there is the same as a valid mapping for
the purposes of ownership information.

This state will be used for permission checking during page transitions
in later patches.
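
For example (hedged sketch; the real helpers arrive with the do_share()
patch later in the series), a host stage-2 walker can distinguish the
meta-state from the states encoded in the SW bits like so:

	static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
	{
		/* Invalid but annotated PTEs denote pages with no mapping. */
		if (!kvm_pte_valid(pte) && pte)
			return PKVM_NOPAGE;

		return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
	}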

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index b58c910babaf..56445586c755 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -24,6 +24,11 @@ enum pkvm_page_state {
 	PKVM_PAGE_OWNED			= 0ULL,
 	PKVM_PAGE_SHARED_OWNED		= KVM_PGTABLE_PROT_SW0,
 	PKVM_PAGE_SHARED_BORROWED	= KVM_PGTABLE_PROT_SW1,
+	__PKVM_PAGE_RESERVED		= KVM_PGTABLE_PROT_SW0 |
+					  KVM_PGTABLE_PROT_SW1,
+
+	/* Meta-states which aren't encoded directly in the PTE's SW bits */
+	PKVM_NOPAGE,
 };
 
 #define PKVM_PAGE_STATE_PROT_MASK	(KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (8 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 09/15] KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-14 14:48   ` Andrew Walbran
  2021-12-01 17:04 ` [PATCH v3 11/15] KVM: arm64: Implement do_share() helper for sharing memory Quentin Perret
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

In preparation for adding additional locked sections for manipulating
page-tables at EL2, introduce some simple wrappers around the host and
hypervisor locks so that it's a bit easier to read and a bit more difficult
to take the wrong lock (or even take them in the wrong order).

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 32 ++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index c1a90dd022b8..757dfefe3aeb 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -27,6 +27,26 @@ static struct hyp_pool host_s2_pool;
 
 const u8 pkvm_hyp_id = 1;
 
+static void host_lock_component(void)
+{
+	hyp_spin_lock(&host_kvm.lock);
+}
+
+static void host_unlock_component(void)
+{
+	hyp_spin_unlock(&host_kvm.lock);
+}
+
+static void hyp_lock_component(void)
+{
+	hyp_spin_lock(&pkvm_pgd_lock);
+}
+
+static void hyp_unlock_component(void)
+{
+	hyp_spin_unlock(&pkvm_pgd_lock);
+}
+
 static void *host_s2_zalloc_pages_exact(size_t size)
 {
 	void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
@@ -338,14 +358,14 @@ static int host_stage2_idmap(u64 addr)
 
 	prot = is_memory ? PKVM_HOST_MEM_PROT : PKVM_HOST_MMIO_PROT;
 
-	hyp_spin_lock(&host_kvm.lock);
+	host_lock_component();
 	ret = host_stage2_adjust_range(addr, &range);
 	if (ret)
 		goto unlock;
 
 	ret = host_stage2_idmap_locked(range.start, range.end - range.start, prot);
 unlock:
-	hyp_spin_unlock(&host_kvm.lock);
+	host_unlock_component();
 
 	return ret;
 }
@@ -369,8 +389,8 @@ int __pkvm_host_share_hyp(u64 pfn)
 	if (!addr_is_memory(addr))
 		return -EINVAL;
 
-	hyp_spin_lock(&host_kvm.lock);
-	hyp_spin_lock(&pkvm_pgd_lock);
+	host_lock_component();
+	hyp_lock_component();
 
 	ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, NULL);
 	if (ret)
@@ -432,8 +452,8 @@ int __pkvm_host_share_hyp(u64 pfn)
 	BUG_ON(ret);
 
 unlock:
-	hyp_spin_unlock(&pkvm_pgd_lock);
-	hyp_spin_unlock(&host_kvm.lock);
+	hyp_unlock_component();
+	host_unlock_component();
 
 	return ret;
 }
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 11/15] KVM: arm64: Implement do_share() helper for sharing memory
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (9 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-10 15:18   ` Andrew Walbran
  2021-12-01 17:04 ` [PATCH v3 12/15] KVM: arm64: Implement __pkvm_host_share_hyp() using do_share() Quentin Perret
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

By default, protected KVM isolates memory pages so that they are
accessible only to their owner: be it the host kernel, the hypervisor
at EL2 or (in future) the guest. Establishing shared-memory regions
between these components therefore involves a transition for each page
so that the owner can share memory with a borrower under a certain set
of permissions.

Introduce a do_share() helper for safely sharing a memory region between
two components. Currently, only host-to-hyp sharing is implemented, but
the code is easily extended to handle other combinations and the
permission checks for each component are reusable.
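
As a hedged illustration of that extensibility (none of this is in the
patch below; PKVM_ID_GUEST and guest_ack_share() are hypothetical), a
new completer would simply gain an extra case in the dispatch switches:

	switch (tx->completer.id) {
	case PKVM_ID_HYP:
		ret = hyp_ack_share(completer_addr, tx, share->prot);
		break;
	case PKVM_ID_GUEST:	/* hypothetical future completer */
		ret = guest_ack_share(completer_addr, tx, share->prot);
		break;
	default:
		ret = -EINVAL;
	}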

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 237 ++++++++++++++++++++++++++
 1 file changed, 237 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 757dfefe3aeb..74ca4043b08a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -471,3 +471,240 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 	ret = host_stage2_idmap(addr);
 	BUG_ON(ret && ret != -EAGAIN);
 }
+
+/* This corresponds to locking order */
+enum pkvm_component_id {
+	PKVM_ID_HOST,
+	PKVM_ID_HYP,
+};
+
+struct pkvm_mem_transition {
+	u64				nr_pages;
+
+	struct {
+		enum pkvm_component_id	id;
+		/* Address in the initiator's address space */
+		u64			addr;
+
+		union {
+			struct {
+				/* Address in the completer's address space */
+				u64	completer_addr;
+			} host;
+		};
+	} initiator;
+
+	struct {
+		enum pkvm_component_id	id;
+	} completer;
+};
+
+struct pkvm_mem_share {
+	const struct pkvm_mem_transition	tx;
+	const enum kvm_pgtable_prot		prot;
+};
+
+struct check_walk_data {
+	enum pkvm_page_state	desired;
+	enum pkvm_page_state	(*get_page_state)(kvm_pte_t pte);
+};
+
+static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
+				      kvm_pte_t *ptep,
+				      enum kvm_pgtable_walk_flags flag,
+				      void * const arg)
+{
+	struct check_walk_data *d = arg;
+	kvm_pte_t pte = *ptep;
+
+	if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
+		return -EINVAL;
+
+	return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
+}
+
+static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				  struct check_walk_data *data)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= __check_page_state_visitor,
+		.arg	= data,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+	};
+
+	return kvm_pgtable_walk(pgt, addr, size, &walker);
+}
+
+static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
+{
+	if (!kvm_pte_valid(pte) && pte)
+		return PKVM_NOPAGE;
+
+	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+}
+
+static int __host_check_page_state_range(u64 addr, u64 size,
+					 enum pkvm_page_state state)
+{
+	struct check_walk_data d = {
+		.desired	= state,
+		.get_page_state	= host_get_page_state,
+	};
+
+	hyp_assert_lock_held(&host_kvm.lock);
+	return check_page_state_range(&host_kvm.pgt, addr, size, &d);
+}
+
+static int __host_set_page_state_range(u64 addr, u64 size,
+				       enum pkvm_page_state state)
+{
+	enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
+
+	return host_stage2_idmap_locked(addr, size, prot);
+}
+
+static int host_request_owned_transition(u64 *completer_addr,
+					 const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
+static int host_initiate_share(u64 *completer_addr,
+			       const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return __host_set_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
+}
+
+static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
+{
+	if (!kvm_pte_valid(pte))
+		return PKVM_NOPAGE;
+
+	return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
+}
+
+static int __hyp_check_page_state_range(u64 addr, u64 size,
+					enum pkvm_page_state state)
+{
+	struct check_walk_data d = {
+		.desired	= state,
+		.get_page_state	= hyp_get_page_state,
+	};
+
+	hyp_assert_lock_held(&pkvm_pgd_lock);
+	return check_page_state_range(&pkvm_pgtable, addr, size, &d);
+}
+
+static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
+{
+	return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
+		 tx->initiator.id != PKVM_ID_HOST);
+}
+
+static int hyp_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
+			 enum kvm_pgtable_prot perms)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (perms != PAGE_HYP)
+		return -EPERM;
+
+	if (__hyp_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
+}
+
+static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
+			      enum kvm_pgtable_prot perms)
+{
+	void *start = (void *)addr, *end = start + (tx->nr_pages * PAGE_SIZE);
+	enum kvm_pgtable_prot prot;
+
+	prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
+	return pkvm_create_mappings_locked(start, end, prot);
+}
+
+static int check_share(struct pkvm_mem_share *share)
+{
+	const struct pkvm_mem_transition *tx = &share->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_request_owned_transition(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HYP:
+		ret = hyp_ack_share(completer_addr, tx, share->prot);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static int __do_share(struct pkvm_mem_share *share)
+{
+	const struct pkvm_mem_transition *tx = &share->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_initiate_share(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HYP:
+		ret = hyp_complete_share(completer_addr, tx, share->prot);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+/*
+ * do_share():
+ *
+ * The page owner grants access to another component with a given set
+ * of permissions.
+ *
+ * Initiator: OWNED	=> SHARED_OWNED
+ * Completer: NOPAGE	=> SHARED_BORROWED
+ */
+static int do_share(struct pkvm_mem_share *share)
+{
+	int ret;
+
+	ret = check_share(share);
+	if (ret)
+		return ret;
+
+	return WARN_ON(__do_share(share));
+}
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 12/15] KVM: arm64: Implement __pkvm_host_share_hyp() using do_share()
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (10 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 11/15] KVM: arm64: Implement do_share() helper for sharing memory Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-01 17:04 ` [PATCH v3 13/15] KVM: arm64: Implement do_unshare() helper for unsharing memory Quentin Perret
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

__pkvm_host_share_hyp() shares memory between the host and the
hypervisor so implement it as an invocation of the new do_share()
mechanism.

Note that double-sharing is no longer permitted (as this allows us to
reduce the number of page-table walks significantly), but is thankfully
no longer relied upon by the host.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 121 +++++++-------------------
 1 file changed, 33 insertions(+), 88 deletions(-)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 74ca4043b08a..1282cbd6b9b3 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -370,94 +370,6 @@ static int host_stage2_idmap(u64 addr)
 	return ret;
 }
 
-static inline bool check_prot(enum kvm_pgtable_prot prot,
-			      enum kvm_pgtable_prot required,
-			      enum kvm_pgtable_prot denied)
-{
-	return (prot & (required | denied)) == required;
-}
-
-int __pkvm_host_share_hyp(u64 pfn)
-{
-	phys_addr_t addr = hyp_pfn_to_phys(pfn);
-	enum kvm_pgtable_prot prot, cur;
-	void *virt = __hyp_va(addr);
-	enum pkvm_page_state state;
-	kvm_pte_t pte;
-	int ret;
-
-	if (!addr_is_memory(addr))
-		return -EINVAL;
-
-	host_lock_component();
-	hyp_lock_component();
-
-	ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, NULL);
-	if (ret)
-		goto unlock;
-	if (!pte)
-		goto map_shared;
-
-	/*
-	 * Check attributes in the host stage-2 PTE. We need the page to be:
-	 *  - mapped RWX as we're sharing memory;
-	 *  - not borrowed, as that implies absence of ownership.
-	 * Otherwise, we can't let it got through
-	 */
-	cur = kvm_pgtable_stage2_pte_prot(pte);
-	prot = pkvm_mkstate(0, PKVM_PAGE_SHARED_BORROWED);
-	if (!check_prot(cur, PKVM_HOST_MEM_PROT, prot)) {
-		ret = -EPERM;
-		goto unlock;
-	}
-
-	state = pkvm_getstate(cur);
-	if (state == PKVM_PAGE_OWNED)
-		goto map_shared;
-
-	/*
-	 * Tolerate double-sharing the same page, but this requires
-	 * cross-checking the hypervisor stage-1.
-	 */
-	if (state != PKVM_PAGE_SHARED_OWNED) {
-		ret = -EPERM;
-		goto unlock;
-	}
-
-	ret = kvm_pgtable_get_leaf(&pkvm_pgtable, (u64)virt, &pte, NULL);
-	if (ret)
-		goto unlock;
-
-	/*
-	 * If the page has been shared with the hypervisor, it must be
-	 * already mapped as SHARED_BORROWED in its stage-1.
-	 */
-	cur = kvm_pgtable_hyp_pte_prot(pte);
-	prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
-	if (!check_prot(cur, prot, ~prot))
-		ret = -EPERM;
-	goto unlock;
-
-map_shared:
-	/*
-	 * If the page is not yet shared, adjust mappings in both page-tables
-	 * while both locks are held.
-	 */
-	prot = pkvm_mkstate(PAGE_HYP, PKVM_PAGE_SHARED_BORROWED);
-	ret = pkvm_create_mappings_locked(virt, virt + PAGE_SIZE, prot);
-	BUG_ON(ret);
-
-	prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, PKVM_PAGE_SHARED_OWNED);
-	ret = host_stage2_idmap_locked(addr, PAGE_SIZE, prot);
-	BUG_ON(ret);
-
-unlock:
-	hyp_unlock_component();
-	host_unlock_component();
-
-	return ret;
-}
-
 void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
 {
 	struct kvm_vcpu_fault_info fault;
@@ -708,3 +620,36 @@ static int do_share(struct pkvm_mem_share *share)
 
 	return WARN_ON(__do_share(share));
 }
+
+int __pkvm_host_share_hyp(u64 pfn)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = hyp_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HYP,
+			},
+		},
+		.prot	= PAGE_HYP,
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_share(&share);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 13/15] KVM: arm64: Implement do_unshare() helper for unsharing memory
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (11 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 12/15] KVM: arm64: Implement __pkvm_host_share_hyp() using do_share() Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-10 15:08   ` Andrew Walbran
  2021-12-01 17:04 ` [PATCH v3 14/15] KVM: arm64: Expose unshare hypercall to the host Quentin Perret
  2021-12-01 17:04 ` [PATCH v3 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown Quentin Perret
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

Tearing down a previously shared memory region results in the borrower
losing access to the underlying pages and returning them to the "owned"
state in the owner.

Implement a do_unshare() helper, along the same lines as do_share(), to
provide this functionality for the host-to-hyp case.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/kvm/hyp/nvhe/mem_protect.c | 115 ++++++++++++++++++++++++++
 1 file changed, 115 insertions(+)

diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 1282cbd6b9b3..43b25e2de780 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -485,6 +485,16 @@ static int host_request_owned_transition(u64 *completer_addr,
 	return __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
 }
 
+static int host_request_unshare(u64 *completer_addr,
+				const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return __host_check_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
+}
+
 static int host_initiate_share(u64 *completer_addr,
 			       const struct pkvm_mem_transition *tx)
 {
@@ -495,6 +505,16 @@ static int host_initiate_share(u64 *completer_addr,
 	return __host_set_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
 }
 
+static int host_initiate_unshare(u64 *completer_addr,
+				 const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	u64 addr = tx->initiator.addr;
+
+	*completer_addr = tx->initiator.host.completer_addr;
+	return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
+}
+
 static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
 {
 	if (!kvm_pte_valid(pte))
@@ -535,6 +555,17 @@ static int hyp_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
 	return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
 }
 
+static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+
+	if (__hyp_ack_skip_pgtable_check(tx))
+		return 0;
+
+	return __hyp_check_page_state_range(addr, size,
+					    PKVM_PAGE_SHARED_BORROWED);
+}
+
 static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
 			      enum kvm_pgtable_prot perms)
 {
@@ -545,6 +576,14 @@ static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
 	return pkvm_create_mappings_locked(start, end, prot);
 }
 
+static int hyp_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
+{
+	u64 size = tx->nr_pages * PAGE_SIZE;
+	int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, size);
+
+	return (ret != size) ? -EFAULT : 0;
+}
+
 static int check_share(struct pkvm_mem_share *share)
 {
 	const struct pkvm_mem_transition *tx = &share->tx;
@@ -621,6 +660,82 @@ static int do_share(struct pkvm_mem_share *share)
 	return WARN_ON(__do_share(share));
 }
 
+static int check_unshare(struct pkvm_mem_share *share)
+{
+	const struct pkvm_mem_transition *tx = &share->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_request_unshare(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HYP:
+		ret = hyp_ack_unshare(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static int __do_unshare(struct pkvm_mem_share *share)
+{
+	const struct pkvm_mem_transition *tx = &share->tx;
+	u64 completer_addr;
+	int ret;
+
+	switch (tx->initiator.id) {
+	case PKVM_ID_HOST:
+		ret = host_initiate_unshare(&completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	if (ret)
+		return ret;
+
+	switch (tx->completer.id) {
+	case PKVM_ID_HYP:
+		ret = hyp_complete_unshare(completer_addr, tx);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+/*
+ * do_unshare():
+ *
+ * The page owner revokes access from another component for a range of
+ * pages which were previously shared using do_share().
+ *
+ * Initiator: SHARED_OWNED	=> OWNED
+ * Completer: SHARED_BORROWED	=> NOPAGE
+ */
+static int do_unshare(struct pkvm_mem_share *share)
+{
+	int ret;
+
+	ret = check_unshare(share);
+	if (ret)
+		return ret;
+
+	return WARN_ON(__do_unshare(share));
+}
+
 int __pkvm_host_share_hyp(u64 pfn)
 {
 	int ret;
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 14/15] KVM: arm64: Expose unshare hypercall to the host
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (12 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 13/15] KVM: arm64: Implement do_unshare() helper for unsharing memory Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-01 17:04 ` [PATCH v3 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown Quentin Perret
  14 siblings, 0 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

From: Will Deacon <will@kernel.org>

Introduce an unshare hypercall which can be used to unmap memory from
the hypervisor stage-1 in nVHE protected mode. This will be useful to
update the EL2 ownership state of pages during guest teardown, and
avoids keeping dangling mappings to unreferenced portions of memory.
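
From the host's perspective, invoking the new hypercall is the mirror
image of the existing share call, e.g. (sketch only):

	ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn);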

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_asm.h              |  1 +
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |  1 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  8 +++++
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 33 +++++++++++++++++++
 4 files changed, 43 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 50d5e4de244c..d5b0386ef765 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -63,6 +63,7 @@ enum __kvm_host_smccc_func {
 
 	/* Hypercalls available after pKVM finalisation */
 	__KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp,
+	__KVM_HOST_SMCCC_FUNC___pkvm_host_unshare_hyp,
 	__KVM_HOST_SMCCC_FUNC___kvm_adjust_pc,
 	__KVM_HOST_SMCCC_FUNC___kvm_vcpu_run,
 	__KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context,
diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
index 56445586c755..80e99836eac7 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
@@ -55,6 +55,7 @@ extern const u8 pkvm_hyp_id;
 
 int __pkvm_prot_finalize(void);
 int __pkvm_host_share_hyp(u64 pfn);
+int __pkvm_host_unshare_hyp(u64 pfn);
 
 bool addr_is_memory(phys_addr_t phys);
 int host_stage2_idmap_locked(phys_addr_t addr, u64 size, enum kvm_pgtable_prot prot);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index b096bf009144..5e2197db0d32 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -147,6 +147,13 @@ static void handle___pkvm_host_share_hyp(struct kvm_cpu_context *host_ctxt)
 	cpu_reg(host_ctxt, 1) = __pkvm_host_share_hyp(pfn);
 }
 
+static void handle___pkvm_host_unshare_hyp(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(u64, pfn, host_ctxt, 1);
+
+	cpu_reg(host_ctxt, 1) = __pkvm_host_unshare_hyp(pfn);
+}
+
 static void handle___pkvm_create_private_mapping(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(phys_addr_t, phys, host_ctxt, 1);
@@ -184,6 +191,7 @@ static const hcall_t host_hcall[] = {
 	HANDLE_FUNC(__pkvm_prot_finalize),
 
 	HANDLE_FUNC(__pkvm_host_share_hyp),
+	HANDLE_FUNC(__pkvm_host_unshare_hyp),
 	HANDLE_FUNC(__kvm_adjust_pc),
 	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
index 43b25e2de780..640234d3896a 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -768,3 +768,36 @@ int __pkvm_host_share_hyp(u64 pfn)
 
 	return ret;
 }
+
+int __pkvm_host_unshare_hyp(u64 pfn)
+{
+	int ret;
+	u64 host_addr = hyp_pfn_to_phys(pfn);
+	u64 hyp_addr = (u64)__hyp_va(host_addr);
+	struct pkvm_mem_share share = {
+		.tx	= {
+			.nr_pages	= 1,
+			.initiator	= {
+				.id	= PKVM_ID_HOST,
+				.addr	= host_addr,
+				.host	= {
+					.completer_addr = hyp_addr,
+				},
+			},
+			.completer	= {
+				.id	= PKVM_ID_HYP,
+			},
+		},
+		.prot	= PAGE_HYP,
+	};
+
+	host_lock_component();
+	hyp_lock_component();
+
+	ret = do_unshare(&share);
+
+	hyp_unlock_component();
+	host_unlock_component();
+
+	return ret;
+}
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown
  2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
                   ` (13 preceding siblings ...)
  2021-12-01 17:04 ` [PATCH v3 14/15] KVM: arm64: Expose unshare hypercall to the host Quentin Perret
@ 2021-12-01 17:04 ` Quentin Perret
  2021-12-09 11:22   ` Will Deacon
  14 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-01 17:04 UTC (permalink / raw)
  To: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon
  Cc: linux-arm-kernel, kvmarm, linux-kernel, kernel-team

Make use of the newly introduced unshare hypercall during guest teardown
to unmap guest-related data structures from the hyp stage-1.
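
As a hedged illustration of the resulting contract (hypothetical caller,
not one of the hunks below): anything shared with EL2 via kvm_share_hyp()
should now be unshared again before it is freed, e.g.:

  /* Hypothetical example of the share/unshare pairing for a kzalloc'd object. */
  struct my_obj {
          int dummy;
  };

  static void example_obj_lifetime(void)
  {
          struct my_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

          if (!obj)
                  return;

          if (!kvm_share_hyp(obj, obj + 1)) {
                  /* The object is mapped in the hyp stage-1 here... */
                  kvm_unshare_hyp(obj, obj + 1);
          }

          kfree(obj);
  }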

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/include/asm/kvm_mmu.h  |  1 +
 arch/arm64/kvm/arm.c              |  2 ++
 arch/arm64/kvm/fpsimd.c           | 34 ++++++++++++++++++++++---
 arch/arm64/kvm/mmu.c              | 42 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/reset.c            |  8 +++++-
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index cf858a7e3533..9360a2804df1 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -321,6 +321,7 @@ struct kvm_vcpu_arch {
 	struct kvm_guest_debug_arch external_debug_state;
 
 	struct user_fpsimd_state *host_fpsimd_state;	/* hyp VA */
+	struct task_struct *parent_task;
 
 	struct {
 		/* {Break,watch}point registers */
@@ -737,6 +738,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_put_fp(struct kvm_vcpu *vcpu);
+void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu);
 
 static inline bool kvm_pmu_counter_deferred(struct perf_event_attr *attr)
 {
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 185d0f62b724..81839e9a8a24 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -151,6 +151,7 @@ static __always_inline unsigned long __kern_hyp_va(unsigned long v)
 #include <asm/stage2_pgtable.h>
 
 int kvm_share_hyp(void *from, void *to);
+void kvm_unshare_hyp(void *from, void *to);
 int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **kaddr,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index c202abb448b1..6057f3c5aafe 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -188,6 +188,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 		}
 	}
 	atomic_set(&kvm->online_vcpus, 0);
+
+	kvm_unshare_hyp(kvm, kvm + 1);
 }
 
 int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 86899d3aa9a9..2f48fd362a8c 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -14,6 +14,19 @@
 #include <asm/kvm_mmu.h>
 #include <asm/sysreg.h>
 
+void kvm_vcpu_unshare_task_fp(struct kvm_vcpu *vcpu)
+{
+	struct task_struct *p = vcpu->arch.parent_task;
+	struct user_fpsimd_state *fpsimd;
+
+	if (!is_protected_kvm_enabled() || !p)
+		return;
+
+	fpsimd = &p->thread.uw.fpsimd_state;
+	kvm_unshare_hyp(fpsimd, fpsimd + 1);
+	put_task_struct(p);
+}
+
 /*
  * Called on entry to KVM_RUN unless this vcpu previously ran at least
  * once and the most recent prior KVM_RUN for this vcpu was called from
@@ -29,12 +42,27 @@ int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu)
 
 	struct user_fpsimd_state *fpsimd = &current->thread.uw.fpsimd_state;
 
+	kvm_vcpu_unshare_task_fp(vcpu);
+
 	/* Make sure the host task fpsimd state is visible to hyp: */
 	ret = kvm_share_hyp(fpsimd, fpsimd + 1);
-	if (!ret)
-		vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+	if (ret)
+		return ret;
+
+	vcpu->arch.host_fpsimd_state = kern_hyp_va(fpsimd);
+
+	/*
+	 * We need to keep current's task_struct pinned until its data has been
+	 * unshared with the hypervisor to make sure it is not re-used by the
+	 * kernel and donated to someone else while already shared -- see
+	 * kvm_vcpu_unshare_task_fp() for the matching put_task_struct().
+	 */
+	if (is_protected_kvm_enabled()) {
+		get_task_struct(current);
+		vcpu->arch.parent_task = current;
+	}
 
-	return ret;
+	return 0;
 }
 
 /*
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d72566896755..8e506ba8988e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -344,6 +344,32 @@ static int share_pfn_hyp(u64 pfn)
 	return ret;
 }
 
+static int unshare_pfn_hyp(u64 pfn)
+{
+	struct rb_node **node, *parent;
+	struct hyp_shared_pfn *this;
+	int ret = 0;
+
+	mutex_lock(&hyp_shared_pfns_lock);
+	this = find_shared_pfn(pfn, &node, &parent);
+	if (WARN_ON(!this)) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+
+	this->count--;
+	if (this->count)
+		goto unlock;
+
+	rb_erase(&this->node, &hyp_shared_pfns);
+	kfree(this);
+	ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn, 1);
+unlock:
+	mutex_unlock(&hyp_shared_pfns_lock);
+
+	return ret;
+}
+
 int kvm_share_hyp(void *from, void *to)
 {
 	phys_addr_t start, end, cur;
@@ -376,6 +402,22 @@ int kvm_share_hyp(void *from, void *to)
 	return 0;
 }
 
+void kvm_unshare_hyp(void *from, void *to)
+{
+	phys_addr_t start, end, cur;
+	u64 pfn;
+
+	if (is_kernel_in_hyp_mode() || kvm_host_owns_hyp_mappings() || !from)
+		return;
+
+	start = ALIGN_DOWN(__pa(from), PAGE_SIZE);
+	end = PAGE_ALIGN(__pa(to));
+	for (cur = start; cur < end; cur += PAGE_SIZE) {
+		pfn = __phys_to_pfn(cur);
+		WARN_ON(unshare_pfn_hyp(pfn));
+	}
+}
+
 /**
  * create_hyp_mappings - duplicate a kernel virtual address range in Hyp mode
  * @from:	The virtual kernel start address of the range
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index e3e2a79fbd75..798a84eddbde 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -150,7 +150,13 @@ bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu)
 
 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
-	kfree(vcpu->arch.sve_state);
+	void *sve_state = vcpu->arch.sve_state;
+
+	kvm_vcpu_unshare_task_fp(vcpu);
+	kvm_unshare_hyp(vcpu, vcpu + 1);
+	if (sve_state)
+		kvm_unshare_hyp(sve_state, sve_state + vcpu_sve_state_size(vcpu));
+	kfree(sve_state);
 }
 
 static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
-- 
2.34.0.rc2.393.gf8c9666880-goog


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
  2021-12-01 17:04 ` [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2 Quentin Perret
@ 2021-12-07 14:47   ` Andrew Walbran
  2021-12-08  9:51     ` Quentin Perret
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Walbran @ 2021-12-07 14:47 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Wed, 1 Dec 2021 at 17:04, 'Quentin Perret' via kernel-team
<kernel-team@android.com> wrote:
>
> From: Will Deacon <will@kernel.org>
>
> Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
> stage-1 mappings at EL2.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 21 ++++++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 63 ++++++++++++++++++++++++++++
>  2 files changed, 84 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 027783829584..9d076f36401d 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -251,6 +251,27 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
>  int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>                         enum kvm_pgtable_prot prot);
>
> +/**
> + * kvm_pgtable_hyp_unmap() - Remove a mapping from a hypervisor stage-1 page-table.
> + * @pgt:       Page-table structure initialised by kvm_pgtable_hyp_init().
> + * @addr:      Virtual address from which to remove the mapping.
> + * @size:      Size of the mapping.
> + *
> + * The offset of @addr within a page is ignored, @size is rounded-up to
> + * the next page boundary and @phys is rounded-down to the previous page
> + * boundary.
> + *
> + * TLB invalidation is performed for each page-table entry cleared during the
> + * unmapping operation and the reference count for the page-table page
> + * containing the cleared entry is decremented, with unreferenced pages being
> + * freed. The unmapping operation will stop early if it encounters either an
> + * invalid page-table entry or a valid block mapping which maps beyond the range
> + * being unmapped.

How is the caller expected to break up the block mapping? Why not
handle that within this function?

>
> + *
> + * Return: Number of bytes unmapped, which may be 0.
> + */
> +u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +
>  /**
>   * kvm_get_vtcr() - Helper to construct VTCR_EL2
>   * @mmfr0:     Sanitized value of SYS_ID_AA64MMFR0_EL1 register.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 768a58835153..6ad4cb2d6947 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -463,6 +463,69 @@ int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>         return ret;
>  }
>
> +struct hyp_unmap_data {
> +       u64                             unmapped;
> +       struct kvm_pgtable_mm_ops       *mm_ops;
> +};
> +
> +static int hyp_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +                           enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> +       kvm_pte_t pte = *ptep, *childp = NULL;
> +       u64 granule = kvm_granule_size(level);
> +       struct hyp_unmap_data *data = arg;
> +       struct kvm_pgtable_mm_ops *mm_ops = data->mm_ops;
> +
> +       if (!kvm_pte_valid(pte))
> +               return -EINVAL;
> +
> +       if (kvm_pte_table(pte, level)) {
> +               childp = kvm_pte_follow(pte, mm_ops);
> +
> +               if (mm_ops->page_count(childp) != 1)
> +                       return 0;
> +
> +               kvm_clear_pte(ptep);
> +               dsb(ishst);
> +               __tlbi_level(vae2is, __TLBI_VADDR(addr, 0), level);
> +       } else {
> +               if (end - addr < granule)
> +                       return -EINVAL;
> +
> +               kvm_clear_pte(ptep);
> +               dsb(ishst);
> +               __tlbi_level(vale2is, __TLBI_VADDR(addr, 0), level);
> +               data->unmapped += granule;
> +       }
> +
> +       dsb(ish);
> +       isb();
> +       mm_ops->put_page(ptep);
> +
> +       if (childp)
> +               mm_ops->put_page(childp);
> +
> +       return 0;
> +}
> +
> +u64 kvm_pgtable_hyp_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> +{
> +       struct hyp_unmap_data unmap_data = {
> +               .mm_ops = pgt->mm_ops,
> +       };
> +       struct kvm_pgtable_walker walker = {
> +               .cb     = hyp_unmap_walker,
> +               .arg    = &unmap_data,
> +               .flags  = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> +       };
> +
> +       if (!pgt->mm_ops->page_count)
> +               return 0;
> +
> +       kvm_pgtable_walk(pgt, addr, size, &walker);
> +       return unmap_data.unmapped;
> +}
> +
>  int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits,
>                          struct kvm_pgtable_mm_ops *mm_ops)
>  {
> --
> 2.34.0.rc2.393.gf8c9666880-goog
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
  2021-12-07 14:47   ` Andrew Walbran
@ 2021-12-08  9:51     ` Quentin Perret
  2021-12-08 14:40       ` Andrew Walbran
  0 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-08  9:51 UTC (permalink / raw)
  To: Andrew Walbran
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

Hi Andrew,

On Tuesday 07 Dec 2021 at 14:47:14 (+0000), Andrew Walbran wrote:
> On Wed, 1 Dec 2021 at 17:04, 'Quentin Perret' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > From: Will Deacon <will@kernel.org>
> >
> > Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
> > stage-1 mappings at EL2.
> >
> > Signed-off-by: Will Deacon <will@kernel.org>
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 21 ++++++++++
> >  arch/arm64/kvm/hyp/pgtable.c         | 63 ++++++++++++++++++++++++++++
> >  2 files changed, 84 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > index 027783829584..9d076f36401d 100644
> > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > @@ -251,6 +251,27 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
> >  int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
> >                         enum kvm_pgtable_prot prot);
> >
> > +/**
> > + * kvm_pgtable_hyp_unmap() - Remove a mapping from a hypervisor stage-1 page-table.
> > + * @pgt:       Page-table structure initialised by kvm_pgtable_hyp_init().
> > + * @addr:      Virtual address from which to remove the mapping.
> > + * @size:      Size of the mapping.
> > + *
> > + * The offset of @addr within a page is ignored, @size is rounded-up to
> > + * the next page boundary and @phys is rounded-down to the previous page
> > + * boundary.
> > + *
> > + * TLB invalidation is performed for each page-table entry cleared during the
> > + * unmapping operation and the reference count for the page-table page
> > + * containing the cleared entry is decremented, with unreferenced pages being
> > + * freed. The unmapping operation will stop early if it encounters either an
> > + * invalid page-table entry or a valid block mapping which maps beyond the range
> > + * being unmapped.
> 
> How is the caller expected to break up the block mapping? Why not
> handle that within this function?

We haven't really used block mappings for the hyp stage-1 in pretty much
forever (see the loop in pkvm_create_mappings_locked() for example), so
handling it here would add somewhat unnecessary complexity. Handling this
in the pgtable code itself (which I assume would mean proactively
re-mapping the rest of the range with page-granularity mappings or
something along those lines) is tricky because of BBM and concurrency,
so I'd rather avoid handling same-level aborts at EL2 and all that mess
unless we have a good reason. Is there a use-case where you think that'd
be needed?
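
Just to make the first point concrete, a minimal sketch of what the
page-granularity approach boils down to (hypothetical helper name, loosely
modelled on the pkvm_create_mappings_locked() loop) would be something like:

  /*
   * Sketch only: map a range one page at a time, so the hyp stage-1 never
   * ends up containing block mappings that would need breaking up later.
   */
  static int sketch_hyp_map_range(struct kvm_pgtable *pgt, u64 addr, u64 phys,
                                  u64 size, enum kvm_pgtable_prot prot)
  {
          u64 offset;
          int ret;

          for (offset = 0; offset < size; offset += PAGE_SIZE) {
                  ret = kvm_pgtable_hyp_map(pgt, addr + offset, PAGE_SIZE,
                                            phys + offset, prot);
                  if (ret)
                          return ret;
          }

          return 0;
  }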

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
  2021-12-08  9:51     ` Quentin Perret
@ 2021-12-08 14:40       ` Andrew Walbran
  2021-12-15 16:02         ` Quentin Perret
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Walbran @ 2021-12-08 14:40 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Wed, 8 Dec 2021 at 09:51, Quentin Perret <qperret@google.com> wrote:
>
> Hi Andrew,
>
> On Tuesday 07 Dec 2021 at 14:47:14 (+0000), Andrew Walbran wrote:
> > On Wed, 1 Dec 2021 at 17:04, 'Quentin Perret' via kernel-team
> > <kernel-team@android.com> wrote:
> > >
> > > From: Will Deacon <will@kernel.org>
> > >
> > > Implement kvm_pgtable_hyp_unmap() which can be used to remove hypervisor
> > > stage-1 mappings at EL2.
> > >
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> > > ---
> > >  arch/arm64/include/asm/kvm_pgtable.h | 21 ++++++++++
> > >  arch/arm64/kvm/hyp/pgtable.c         | 63 ++++++++++++++++++++++++++++
> > >  2 files changed, 84 insertions(+)
> > >
> > > diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> > > index 027783829584..9d076f36401d 100644
> > > --- a/arch/arm64/include/asm/kvm_pgtable.h
> > > +++ b/arch/arm64/include/asm/kvm_pgtable.h
> > > @@ -251,6 +251,27 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
> > >  int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
> > >                         enum kvm_pgtable_prot prot);
> > >
> > > +/**
> > > + * kvm_pgtable_hyp_unmap() - Remove a mapping from a hypervisor stage-1 page-table.
> > > + * @pgt:       Page-table structure initialised by kvm_pgtable_hyp_init().
> > > + * @addr:      Virtual address from which to remove the mapping.
> > > + * @size:      Size of the mapping.
> > > + *
> > > + * The offset of @addr within a page is ignored, @size is rounded-up to
> > > + * the next page boundary and @phys is rounded-down to the previous page
> > > + * boundary.
> > > + *
> > > + * TLB invalidation is performed for each page-table entry cleared during the
> > > + * unmapping operation and the reference count for the page-table page
> > > + * containing the cleared entry is decremented, with unreferenced pages being
> > > + * freed. The unmapping operation will stop early if it encounters either an
> > > + * invalid page-table entry or a valid block mapping which maps beyond the range
> > > + * being unmapped.
> >
> > How is the caller expected to break up the block mapping? Why not
> > handle that within this function?
>
> We haven't really used block mappings for the hyp stage-1 in pretty much
> forever (see the loop in pkvm_create_mappings_locked() for example), so
> handling it here would add somewhat unnecessary complexity. Handling this
> in the pgtable code itself (which I assume would mean proactively
> re-mapping the rest of the range with page-granularity mappings or
> something along those lines) is tricky because of BBM and concurrency,
> so I'd rather avoid handling same-level aborts at EL2 and all that mess
> unless we have a good reason. Is there a use-case where you think that'd
> be needed?

Aha, I didn't realise that block mappings weren't used there, but it makes
sense in that case. How about adding a note to the function comment here explaining
that reasoning? Otherwise it sounds like the caller has to handle it
somehow.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings()
  2021-12-01 17:03 ` [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings() Quentin Perret
@ 2021-12-09 10:10   ` Will Deacon
  2021-12-10 13:37     ` Quentin Perret
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2021-12-09 10:10 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Wed, Dec 01, 2021 at 05:03:55PM +0000, Quentin Perret wrote:
> The kvm_host_owns_hyp_mappings() function should return true if and only
> if the host kernel is responsible for creating the hypervisor stage-1
> mappings. That is only possible in standard non-VHE mode, or during boot
> in protected nVHE mode. But either way, none of this makes sense in VHE,
> so make sure to catch this case as well, hence making the function
> return sensible values in any context (VHE or not).
> 
> Suggested-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/mmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 326cdfec74a1..f8f1096a297f 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -239,6 +239,9 @@ void free_hyp_pgds(void)
>  
>  static bool kvm_host_owns_hyp_mappings(void)
>  {
> +	if (is_kernel_in_hyp_mode())
> +		return false;

This looks identical to:

https://lore.kernel.org/r/20211208152300.2478542-7-qperret@google.com

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 02/15] KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator
  2021-12-01 17:03 ` [PATCH v3 02/15] KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator Quentin Perret
@ 2021-12-09 10:12   ` Will Deacon
  0 siblings, 0 replies; 37+ messages in thread
From: Will Deacon @ 2021-12-09 10:12 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Wed, Dec 01, 2021 at 05:03:56PM +0000, Quentin Perret wrote:
> In nVHE protected mode, the EL2 code uses a temporary allocator during
> boot while re-creating its stage-1 page-table. Unfortunately, the
> hyp_vmemmap is not ready to use at this stage, so refcounting pages
> is not possible. That is not currently a problem because hyp stage-1
> mappings are never removed, which implies refcounting of page-table
> pages is unnecessary.
> 
> In preparation for allowing hypervisor stage-1 mappings to be removed,
> provide stub implementations for {get,put}_page() in the early allocator.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/early_alloc.c | 5 +++++
>  1 file changed, 5 insertions(+)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages
  2021-12-01 17:03 ` [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages Quentin Perret
@ 2021-12-09 10:29   ` Will Deacon
  2021-12-10 14:34     ` Quentin Perret
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2021-12-09 10:29 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Wed, Dec 01, 2021 at 05:03:57PM +0000, Quentin Perret wrote:
> To prepare the ground for allowing hyp stage-1 mappings to be removed at
> run-time, update the KVM page-table code to maintain a correct refcount
> using the ->{get,put}_page() function callbacks.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index f8ceebe4982e..768a58835153 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -408,8 +408,10 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>  		return false;
>  
>  	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> -	if (hyp_pte_needs_update(old, new))
> +	if (hyp_pte_needs_update(old, new)) {
>  		smp_store_release(ptep, new);
> +		data->mm_ops->get_page(ptep);

In the case where we're just updating software bits for a valid pte, doesn't
this result in us taking a spurious reference to the page?

> @@ -482,8 +485,16 @@ static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  			   enum kvm_pgtable_walk_flags flag, void * const arg)
>  {
>  	struct kvm_pgtable_mm_ops *mm_ops = arg;
> +	kvm_pte_t pte = *ptep;
> +
> +	if (!kvm_pte_valid(pte))
> +		return 0;
> +
> +	mm_ops->put_page(ptep);
> +
> +	if (kvm_pte_table(pte, level))
> +		mm_ops->put_page(kvm_pte_follow(pte, mm_ops));
>  
> -	mm_ops->put_page((void *)kvm_pte_follow(*ptep, mm_ops));
>  	return 0;

This looks pretty similar to the stage-2 walker now, but given how small the
functions are, I'm not sure we'd really gain much by abstracting the "pte
counted" check.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 04/15] KVM: arm64: Fixup hyp stage-1 refcount
  2021-12-01 17:03 ` [PATCH v3 04/15] KVM: arm64: Fixup hyp stage-1 refcount Quentin Perret
@ 2021-12-09 11:09   ` Will Deacon
  0 siblings, 0 replies; 37+ messages in thread
From: Will Deacon @ 2021-12-09 11:09 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Wed, Dec 01, 2021 at 05:03:58PM +0000, Quentin Perret wrote:
> In nVHE-protected mode, the hyp stage-1 page-table refcount is broken
> due to the lack of refcount support in the early allocator. Fix-up the
> refcount in the finalize walker, once the 'hyp_vmemmap' is up and running.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/setup.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/15] KVM: arm64: Introduce kvm_share_hyp()
  2021-12-01 17:04 ` [PATCH v3 07/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
@ 2021-12-09 11:13   ` Will Deacon
  2021-12-10 14:37     ` Quentin Perret
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2021-12-09 11:13 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Wed, Dec 01, 2021 at 05:04:01PM +0000, Quentin Perret wrote:
> The create_hyp_mappings() function can currently be called at any point
> in time. However, its behaviour in protected mode changes widely
> depending on when it is being called. Prior to KVM init, it is used to
> create the temporary page-table used to bring-up the hypervisor, and
> later on it is transparently turned into a 'share' hypercall when the
> kernel has lost control over the hypervisor stage-1. In order to prepare
> the ground for also unsharing pages with the hypervisor during guest
> teardown, introduce a kvm_share_hyp() function to make it clear in which
> places a share hypercall should be expected, as we will soon need a
> matching unshare hypercall in all those places.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_mmu.h |  1 +
>  arch/arm64/kvm/arm.c             |  4 ++--
>  arch/arm64/kvm/fpsimd.c          |  2 +-
>  arch/arm64/kvm/mmu.c             | 27 +++++++++++++++++++++------
>  arch/arm64/kvm/reset.c           |  2 +-
>  5 files changed, 26 insertions(+), 10 deletions(-)

[...]

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index f8f1096a297f..fd868fb9d922 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -299,6 +299,25 @@ static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
>  	return 0;
>  }
>  
> +int kvm_share_hyp(void *from, void *to)
> +{
> +	if (is_kernel_in_hyp_mode())
> +		return 0;
> +
> +	/*
> +	 * The share hcall maps things in the 'fixed-offset' region of the hyp
> +	 * VA space, so we can only share physically contiguous data-structures
> +	 * for now.
> +	 */
> +	if (is_vmalloc_addr(from) || is_vmalloc_addr(to))
> +		return -EINVAL;

If we're adding these sanity checks, perhaps is_vmalloc_or_module_addr()
would be worth using instead?

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 08/15] KVM: arm64: pkvm: Refcount the pages shared with EL2
  2021-12-01 17:04 ` [PATCH v3 08/15] KVM: arm64: pkvm: Refcount the pages shared with EL2 Quentin Perret
@ 2021-12-09 11:16   ` Will Deacon
  0 siblings, 0 replies; 37+ messages in thread
From: Will Deacon @ 2021-12-09 11:16 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Wed, Dec 01, 2021 at 05:04:02PM +0000, Quentin Perret wrote:
> In order to simplify the page tracking infrastructure at EL2 in nVHE
> protected mode, move to the host the responsibility of refcounting pages
> that are shared multiple times. In order to do so, let's create a
> red-black tree tracking all the PFNs that have been shared, along with
> a refcount.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/mmu.c | 78 ++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 68 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index fd868fb9d922..d72566896755 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -284,23 +284,72 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
>  	}
>  }
>  
> -static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
> +struct hyp_shared_pfn {
> +	u64 pfn;
> +	int count;
> +	struct rb_node node;
> +};
> +
> +static DEFINE_MUTEX(hyp_shared_pfns_lock);
> +static struct rb_root hyp_shared_pfns = RB_ROOT;
> +
> +static struct hyp_shared_pfn *find_shared_pfn(u64 pfn, struct rb_node ***node,
> +					      struct rb_node **parent)
>  {
> -	phys_addr_t addr;
> -	int ret;
> +	struct hyp_shared_pfn *this;
> +
> +	*node = &hyp_shared_pfns.rb_node;
> +	*parent = NULL;
> +	while (**node) {
> +		this = container_of(**node, struct hyp_shared_pfn, node);
> +		*parent = **node;
> +		if (this->pfn < pfn)
> +			*node = &((**node)->rb_left);
> +		else if (this->pfn > pfn)
> +			*node = &((**node)->rb_right);
> +		else
> +			return this;
> +	}
>  
> -	for (addr = ALIGN_DOWN(start, PAGE_SIZE); addr < end; addr += PAGE_SIZE) {
> -		ret = kvm_call_hyp_nvhe(__pkvm_host_share_hyp,
> -					__phys_to_pfn(addr));
> -		if (ret)
> -			return ret;
> +	return NULL;
> +}
> +
> +static int share_pfn_hyp(u64 pfn)
> +{
> +	struct rb_node **node, *parent;
> +	struct hyp_shared_pfn *this;
> +	int ret = 0;
> +
> +	mutex_lock(&hyp_shared_pfns_lock);
> +	this = find_shared_pfn(pfn, &node, &parent);

I don't think this is a fast-path at the moment, but in the future we might
consider using RCU to do the lookup outside of the mutex.

But as-is:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown
  2021-12-01 17:04 ` [PATCH v3 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown Quentin Perret
@ 2021-12-09 11:22   ` Will Deacon
  2021-12-10 14:48     ` Quentin Perret
  0 siblings, 1 reply; 37+ messages in thread
From: Will Deacon @ 2021-12-09 11:22 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Wed, Dec 01, 2021 at 05:04:09PM +0000, Quentin Perret wrote:
> Make use of the newly introduced unshare hypercall during guest teardown
> to unmap guest-related data structures from the hyp stage-1.
> 
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  2 ++
>  arch/arm64/include/asm/kvm_mmu.h  |  1 +
>  arch/arm64/kvm/arm.c              |  2 ++
>  arch/arm64/kvm/fpsimd.c           | 34 ++++++++++++++++++++++---
>  arch/arm64/kvm/mmu.c              | 42 +++++++++++++++++++++++++++++++
>  arch/arm64/kvm/reset.c            |  8 +++++-
>  6 files changed, 85 insertions(+), 4 deletions(-)

[...]

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d72566896755..8e506ba8988e 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -344,6 +344,32 @@ static int share_pfn_hyp(u64 pfn)
>  	return ret;
>  }
>  
> +static int unshare_pfn_hyp(u64 pfn)
> +{
> +	struct rb_node **node, *parent;
> +	struct hyp_shared_pfn *this;
> +	int ret = 0;
> +
> +	mutex_lock(&hyp_shared_pfns_lock);
> +	this = find_shared_pfn(pfn, &node, &parent);
> +	if (WARN_ON(!this)) {
> +		ret = -EINVAL;

-ENOENT?

> +		goto unlock;
> +	}
> +
> +	this->count--;
> +	if (this->count)
> +		goto unlock;

Again, if we did an RCU lookup then this could be converted to a refcount_t
to take the mutex only when it hits zero. But for now I think it's fine.

> +
> +	rb_erase(&this->node, &hyp_shared_pfns);
> +	kfree(this);
> +	ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn, 1);
> +unlock:
> +	mutex_unlock(&hyp_shared_pfns_lock);
> +
> +	return ret;
> +}
> +
>  int kvm_share_hyp(void *from, void *to)
>  {
>  	phys_addr_t start, end, cur;
> @@ -376,6 +402,22 @@ int kvm_share_hyp(void *from, void *to)
>  	return 0;
>  }
>  
> +void kvm_unshare_hyp(void *from, void *to)
> +{
> +	phys_addr_t start, end, cur;
> +	u64 pfn;
> +
> +	if (is_kernel_in_hyp_mode() || kvm_host_owns_hyp_mappings() || !from)

I don't think you need the is_kernel_in_hyp_mode() check any more now that
you've moved that into kvm_host_owns_hyp_mappings().

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings()
  2021-12-09 10:10   ` Will Deacon
@ 2021-12-10 13:37     ` Quentin Perret
  0 siblings, 0 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-10 13:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Thursday 09 Dec 2021 at 10:10:54 (+0000), Will Deacon wrote:
> On Wed, Dec 01, 2021 at 05:03:55PM +0000, Quentin Perret wrote:
> > The kvm_host_owns_hyp_mappings() function should return true if and only
> > if the host kernel is responsible for creating the hypervisor stage-1
> > mappings. That is only possible in standard non-VHE mode, or during boot
> > in protected nVHE mode. But either way, none of this makes sense in VHE,
> > so make sure to catch this case as well, hence making the function
> > return sensible values in any context (VHE or not).
> > 
> > Suggested-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/kvm/mmu.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 326cdfec74a1..f8f1096a297f 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -239,6 +239,9 @@ void free_hyp_pgds(void)
> >  
> >  static bool kvm_host_owns_hyp_mappings(void)
> >  {
> > +	if (is_kernel_in_hyp_mode())
> > +		return false;
> 
> This looks identical to:
> 
> https://lore.kernel.org/r/20211208152300.2478542-7-qperret@google.com

Yep, I figured it made more sense in the other series as it's not
strictly related to this one, so ... :)

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages
  2021-12-09 10:29   ` Will Deacon
@ 2021-12-10 14:34     ` Quentin Perret
  2021-12-13 12:53       ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: Quentin Perret @ 2021-12-10 14:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Thursday 09 Dec 2021 at 10:29:24 (+0000), Will Deacon wrote:
> On Wed, Dec 01, 2021 at 05:03:57PM +0000, Quentin Perret wrote:
> > To prepare the ground for allowing hyp stage-1 mappings to be removed at
> > run-time, update the KVM page-table code to maintain a correct refcount
> > using the ->{get,put}_page() function callbacks.
> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 17 ++++++++++++++---
> >  1 file changed, 14 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index f8ceebe4982e..768a58835153 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -408,8 +408,10 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> >  		return false;
> >  
> >  	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > -	if (hyp_pte_needs_update(old, new))
> > +	if (hyp_pte_needs_update(old, new)) {
> >  		smp_store_release(ptep, new);
> > +		data->mm_ops->get_page(ptep);
> 
> In the case where we're just updating software bits for a valid pte, doesn't
> this result in us taking a spurious reference to the page?

Ahem, yes, that is the case. I ended up with the below diff to fix it,
which I intend to fold into the next version:

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 6ad4cb2d6947..e2047d3f05a2 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -383,21 +383,6 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
        return prot;
 }

-static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
-{
-       /*
-        * Tolerate KVM recreating the exact same mapping, or changing software
-        * bits if the existing mapping was valid.
-        */
-       if (old == new)
-               return false;
-
-       if (!kvm_pte_valid(old))
-               return true;
-
-       return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW);
-}
-
 static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
                                    kvm_pte_t *ptep, struct hyp_map_data *data)
 {
@@ -407,13 +392,16 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
        if (!kvm_block_mapping_supported(addr, end, phys, level))
                return false;

+       data->phys += granule;
        new = kvm_init_valid_leaf_pte(phys, data->attr, level);
-       if (hyp_pte_needs_update(old, new)) {
-               smp_store_release(ptep, new);
+       if (old == new)
+               return true;
+       else if (!kvm_pte_valid(old))
                data->mm_ops->get_page(ptep);
-       }
+       else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
+               return false;

-       data->phys += granule;
+       smp_store_release(ptep, new);
        return true;
 }

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 07/15] KVM: arm64: Introduce kvm_share_hyp()
  2021-12-09 11:13   ` Will Deacon
@ 2021-12-10 14:37     ` Quentin Perret
  0 siblings, 0 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-10 14:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Thursday 09 Dec 2021 at 11:13:10 (+0000), Will Deacon wrote:
> On Wed, Dec 01, 2021 at 05:04:01PM +0000, Quentin Perret wrote:
> > The create_hyp_mappings() function can currently be called at any point
> > in time. However, its behaviour in protected mode changes widely
> > depending on when it is being called. Prior to KVM init, it is used to
> > create the temporary page-table used to bring-up the hypervisor, and
> > later on it is transparently turned into a 'share' hypercall when the
> > kernel has lost control over the hypervisor stage-1. In order to prepare
> > the ground for also unsharing pages with the hypervisor during guest
> > teardown, introduce a kvm_share_hyp() function to make it clear in which
> > places a share hypercall should be expected, as we will soon need a
> > matching unshare hypercall in all those places.
> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_mmu.h |  1 +
> >  arch/arm64/kvm/arm.c             |  4 ++--
> >  arch/arm64/kvm/fpsimd.c          |  2 +-
> >  arch/arm64/kvm/mmu.c             | 27 +++++++++++++++++++++------
> >  arch/arm64/kvm/reset.c           |  2 +-
> >  5 files changed, 26 insertions(+), 10 deletions(-)
> 
> [...]
> 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index f8f1096a297f..fd868fb9d922 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -299,6 +299,25 @@ static int pkvm_share_hyp(phys_addr_t start, phys_addr_t end)
> >  	return 0;
> >  }
> >  
> > +int kvm_share_hyp(void *from, void *to)
> > +{
> > +	if (is_kernel_in_hyp_mode())
> > +		return 0;
> > +
> > +	/*
> > +	 * The share hcall maps things in the 'fixed-offset' region of the hyp
> > +	 * VA space, so we can only share physically contiguous data-structures
> > +	 * for now.
> > +	 */
> > +	if (is_vmalloc_addr(from) || is_vmalloc_addr(to))
> > +		return -EINVAL;
> 
> If we're adding these sanity checks, perhaps is_vmalloc_or_module_addr()
> would be worth using instead?

Ack, I'll fix that up.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown
  2021-12-09 11:22   ` Will Deacon
@ 2021-12-10 14:48     ` Quentin Perret
  0 siblings, 0 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-10 14:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Thursday 09 Dec 2021 at 11:22:33 (+0000), Will Deacon wrote:
> On Wed, Dec 01, 2021 at 05:04:09PM +0000, Quentin Perret wrote:
> > Make use of the newly introduced unshare hypercall during guest teardown
> > to unmap guest-related data structures from the hyp stage-1.
> > 
> > Signed-off-by: Quentin Perret <qperret@google.com>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  2 ++
> >  arch/arm64/include/asm/kvm_mmu.h  |  1 +
> >  arch/arm64/kvm/arm.c              |  2 ++
> >  arch/arm64/kvm/fpsimd.c           | 34 ++++++++++++++++++++++---
> >  arch/arm64/kvm/mmu.c              | 42 +++++++++++++++++++++++++++++++
> >  arch/arm64/kvm/reset.c            |  8 +++++-
> >  6 files changed, 85 insertions(+), 4 deletions(-)
> 
> [...]
> 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index d72566896755..8e506ba8988e 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -344,6 +344,32 @@ static int share_pfn_hyp(u64 pfn)
> >  	return ret;
> >  }
> >  
> > +static int unshare_pfn_hyp(u64 pfn)
> > +{
> > +	struct rb_node **node, *parent;
> > +	struct hyp_shared_pfn *this;
> > +	int ret = 0;
> > +
> > +	mutex_lock(&hyp_shared_pfns_lock);
> > +	this = find_shared_pfn(pfn, &node, &parent);
> > +	if (WARN_ON(!this)) {
> > +		ret = -EINVAL;
> 
> -ENOENT?

Sure.

> > +		goto unlock;
> > +	}
> > +
> > +	this->count--;
> > +	if (this->count)
> > +		goto unlock;
> 
> Again, if we did an RCU lookup then this could be converted to a refcount_t
> to take the mutex only when it hits zero. But for now I think it's fine.

No objection to do this in the future, but yeah I think we might as well
start simple :)
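
For the record, the kind of thing that future conversion would look like
(very much a sketch: it assumes struct hyp_shared_pfn grows a refcount_t
member, and leaves the RCU-friendly lookup side out entirely):

  /* Sketch only: drop a reference and only take the mutex on the last put. */
  static int sketch_unshare_pfn_hyp(struct hyp_shared_pfn *this, u64 pfn)
  {
          int ret;

          if (!refcount_dec_and_mutex_lock(&this->refcount, &hyp_shared_pfns_lock))
                  return 0;

          rb_erase(&this->node, &hyp_shared_pfns);
          kfree(this);
          ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn);
          mutex_unlock(&hyp_shared_pfns_lock);

          return ret;
  }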

> > +
> > +	rb_erase(&this->node, &hyp_shared_pfns);
> > +	kfree(this);
> > +	ret = kvm_call_hyp_nvhe(__pkvm_host_unshare_hyp, pfn, 1);
> > +unlock:
> > +	mutex_unlock(&hyp_shared_pfns_lock);
> > +
> > +	return ret;
> > +}
> > +
> >  int kvm_share_hyp(void *from, void *to)
> >  {
> >  	phys_addr_t start, end, cur;
> > @@ -376,6 +402,22 @@ int kvm_share_hyp(void *from, void *to)
> >  	return 0;
> >  }
> >  
> > +void kvm_unshare_hyp(void *from, void *to)
> > +{
> > +	phys_addr_t start, end, cur;
> > +	u64 pfn;
> > +
> > +	if (is_kernel_in_hyp_mode() || kvm_host_owns_hyp_mappings() || !from)
> 
> I don't think you need the is_kernel_in_hyp_mode() check any more now that
> you've moved that into kvm_host_owns_hyp_mappings().

The logic is a little odd, but I think I still do: kvm_host_owns_hyp_mappings()
will return false when is_kernel_in_hyp_mode() is true, so without the explicit
check here we wouldn't bail out early in VHE.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 13/15] KVM: arm64: Implement do_unshare() helper for unsharing memory
  2021-12-01 17:04 ` [PATCH v3 13/15] KVM: arm64: Implement do_unshare() helper for unsharing memory Quentin Perret
@ 2021-12-10 15:08   ` Andrew Walbran
  0 siblings, 0 replies; 37+ messages in thread
From: Andrew Walbran @ 2021-12-10 15:08 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

Reviewed-by: Andrew Walbran <qwandor@google.com>


On Wed, 1 Dec 2021 at 17:05, 'Quentin Perret' via kernel-team
<kernel-team@android.com> wrote:
>
> From: Will Deacon <will@kernel.org>
>
> Tearing down a previously shared memory region results in the borrower
> losing access to the underlying pages and returning them to the "owned"
> state in the owner.
>
> Implement a do_unshare() helper, along the same lines as do_share(), to
> provide this functionality for the host-to-hyp case.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 115 ++++++++++++++++++++++++++
>  1 file changed, 115 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 1282cbd6b9b3..43b25e2de780 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -485,6 +485,16 @@ static int host_request_owned_transition(u64 *completer_addr,
>         return __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
>  }
>
> +static int host_request_unshare(u64 *completer_addr,
> +                               const struct pkvm_mem_transition *tx)
> +{
> +       u64 size = tx->nr_pages * PAGE_SIZE;
> +       u64 addr = tx->initiator.addr;
> +
> +       *completer_addr = tx->initiator.host.completer_addr;
> +       return __host_check_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
> +}
> +
>  static int host_initiate_share(u64 *completer_addr,
>                                const struct pkvm_mem_transition *tx)
>  {
> @@ -495,6 +505,16 @@ static int host_initiate_share(u64 *completer_addr,
>         return __host_set_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
>  }
>
> +static int host_initiate_unshare(u64 *completer_addr,
> +                                const struct pkvm_mem_transition *tx)
> +{
> +       u64 size = tx->nr_pages * PAGE_SIZE;
> +       u64 addr = tx->initiator.addr;
> +
> +       *completer_addr = tx->initiator.host.completer_addr;
> +       return __host_set_page_state_range(addr, size, PKVM_PAGE_OWNED);
> +}
> +
>  static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
>  {
>         if (!kvm_pte_valid(pte))
> @@ -535,6 +555,17 @@ static int hyp_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
>         return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
>  }
>
> +static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
> +{
> +       u64 size = tx->nr_pages * PAGE_SIZE;
> +
> +       if (__hyp_ack_skip_pgtable_check(tx))
> +               return 0;
> +
> +       return __hyp_check_page_state_range(addr, size,
> +                                           PKVM_PAGE_SHARED_BORROWED);
> +}
> +
>  static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
>                               enum kvm_pgtable_prot perms)
>  {
> @@ -545,6 +576,14 @@ static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
>         return pkvm_create_mappings_locked(start, end, prot);
>  }
>
> +static int hyp_complete_unshare(u64 addr, const struct pkvm_mem_transition *tx)
> +{
> +       u64 size = tx->nr_pages * PAGE_SIZE;
> +       int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, size);
> +
> +       return (ret != size) ? -EFAULT : 0;
> +}
> +
>  static int check_share(struct pkvm_mem_share *share)
>  {
>         const struct pkvm_mem_transition *tx = &share->tx;
> @@ -621,6 +660,82 @@ static int do_share(struct pkvm_mem_share *share)
>         return WARN_ON(__do_share(share));
>  }
>
> +static int check_unshare(struct pkvm_mem_share *share)
> +{
> +       const struct pkvm_mem_transition *tx = &share->tx;
> +       u64 completer_addr;
> +       int ret;
> +
> +       switch (tx->initiator.id) {
> +       case PKVM_ID_HOST:
> +               ret = host_request_unshare(&completer_addr, tx);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       if (ret)
> +               return ret;
> +
> +       switch (tx->completer.id) {
> +       case PKVM_ID_HYP:
> +               ret = hyp_ack_unshare(completer_addr, tx);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       return ret;
> +}
> +
> +static int __do_unshare(struct pkvm_mem_share *share)
> +{
> +       const struct pkvm_mem_transition *tx = &share->tx;
> +       u64 completer_addr;
> +       int ret;
> +
> +       switch (tx->initiator.id) {
> +       case PKVM_ID_HOST:
> +               ret = host_initiate_unshare(&completer_addr, tx);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       if (ret)
> +               return ret;
> +
> +       switch (tx->completer.id) {
> +       case PKVM_ID_HYP:
> +               ret = hyp_complete_unshare(completer_addr, tx);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       return ret;
> +}
> +
> +/*
> + * do_unshare():
> + *
> + * The page owner revokes access from another component for a range of
> + * pages which were previously shared using do_share().
> + *
> + * Initiator: SHARED_OWNED     => OWNED
> + * Completer: SHARED_BORROWED  => NOPAGE
> + */
> +static int do_unshare(struct pkvm_mem_share *share)
> +{
> +       int ret;
> +
> +       ret = check_unshare(share);
> +       if (ret)
> +               return ret;
> +
> +       return WARN_ON(__do_unshare(share));
> +}
> +
>  int __pkvm_host_share_hyp(u64 pfn)
>  {
>         int ret;
> --
> 2.34.0.rc2.393.gf8c9666880-goog
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 11/15] KVM: arm64: Implement do_share() helper for sharing memory
  2021-12-01 17:04 ` [PATCH v3 11/15] KVM: arm64: Implement do_share() helper for sharing memory Quentin Perret
@ 2021-12-10 15:18   ` Andrew Walbran
  0 siblings, 0 replies; 37+ messages in thread
From: Andrew Walbran @ 2021-12-10 15:18 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

Reviewed-by: Andrew Walbran <qwandor@google.com>

On Wed, 1 Dec 2021 at 17:04, 'Quentin Perret' via kernel-team
<kernel-team@android.com> wrote:
>
> From: Will Deacon <will@kernel.org>
>
> By default, protected KVM isolates memory pages so that they are
> accessible only to their owner: be it the host kernel, the hypervisor
> at EL2 or (in future) the guest. Establishing shared-memory regions
> between these components therefore involves a transition for each page
> so that the owner can share memory with a borrower under a certain set
> of permissions.
>
> Introduce a do_share() helper for safely sharing a memory region between
> two components. Currently, only host-to-hyp sharing is implemented, but
> the code is easily extended to handle other combinations and the
> permission checks for each component are reusable.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 237 ++++++++++++++++++++++++++
>  1 file changed, 237 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 757dfefe3aeb..74ca4043b08a 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -471,3 +471,240 @@ void handle_host_mem_abort(struct kvm_cpu_context *host_ctxt)
>         ret = host_stage2_idmap(addr);
>         BUG_ON(ret && ret != -EAGAIN);
>  }
> +
> +/* This corresponds to locking order */
> +enum pkvm_component_id {
> +       PKVM_ID_HOST,
> +       PKVM_ID_HYP,
> +};
> +
> +struct pkvm_mem_transition {
> +       u64                             nr_pages;
> +
> +       struct {
> +               enum pkvm_component_id  id;
> +               /* Address in the initiator's address space */
> +               u64                     addr;
> +
> +               union {
> +                       struct {
> +                               /* Address in the completer's address space */
> +                               u64     completer_addr;
> +                       } host;
> +               };
> +       } initiator;
> +
> +       struct {
> +               enum pkvm_component_id  id;
> +       } completer;
> +};
> +
> +struct pkvm_mem_share {
> +       const struct pkvm_mem_transition        tx;
> +       const enum kvm_pgtable_prot             prot;
It would be helpful to add a comment documenting what this is used for
(i.e. whether it is for the initiator or completer). Or even rename it
to something like completer_prot to make that clear.
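
E.g. something along these lines (illustrative only, pick whichever of the
two options reads better):

  struct pkvm_mem_share {
          const struct pkvm_mem_transition        tx;
          /* Permissions the completer's new mapping will be created with. */
          const enum kvm_pgtable_prot             completer_prot;
  };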

> +};
> +
> +struct check_walk_data {
> +       enum pkvm_page_state    desired;
> +       enum pkvm_page_state    (*get_page_state)(kvm_pte_t pte);
> +};
> +
> +static int __check_page_state_visitor(u64 addr, u64 end, u32 level,
> +                                     kvm_pte_t *ptep,
> +                                     enum kvm_pgtable_walk_flags flag,
> +                                     void * const arg)
> +{
> +       struct check_walk_data *d = arg;
> +       kvm_pte_t pte = *ptep;
> +
> +       if (kvm_pte_valid(pte) && !addr_is_memory(kvm_pte_to_phys(pte)))
> +               return -EINVAL;
> +
> +       return d->get_page_state(pte) == d->desired ? 0 : -EPERM;
> +}
> +
> +static int check_page_state_range(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +                                 struct check_walk_data *data)
> +{
> +       struct kvm_pgtable_walker walker = {
> +               .cb     = __check_page_state_visitor,
> +               .arg    = data,
> +               .flags  = KVM_PGTABLE_WALK_LEAF,
> +       };
> +
> +       return kvm_pgtable_walk(pgt, addr, size, &walker);
> +}
> +
> +static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
> +{
> +       if (!kvm_pte_valid(pte) && pte)
> +               return PKVM_NOPAGE;
> +
> +       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> +}
> +
> +static int __host_check_page_state_range(u64 addr, u64 size,
> +                                        enum pkvm_page_state state)
> +{
> +       struct check_walk_data d = {
> +               .desired        = state,
> +               .get_page_state = host_get_page_state,
> +       };
> +
> +       hyp_assert_lock_held(&host_kvm.lock);
> +       return check_page_state_range(&host_kvm.pgt, addr, size, &d);
> +}
> +
> +static int __host_set_page_state_range(u64 addr, u64 size,
> +                                      enum pkvm_page_state state)
> +{
> +       enum kvm_pgtable_prot prot = pkvm_mkstate(PKVM_HOST_MEM_PROT, state);
> +
> +       return host_stage2_idmap_locked(addr, size, prot);
> +}
> +
> +static int host_request_owned_transition(u64 *completer_addr,
> +                                        const struct pkvm_mem_transition *tx)
> +{
> +       u64 size = tx->nr_pages * PAGE_SIZE;
> +       u64 addr = tx->initiator.addr;
> +
> +       *completer_addr = tx->initiator.host.completer_addr;
> +       return __host_check_page_state_range(addr, size, PKVM_PAGE_OWNED);
> +}
> +
> +static int host_initiate_share(u64 *completer_addr,
> +                              const struct pkvm_mem_transition *tx)
> +{
> +       u64 size = tx->nr_pages * PAGE_SIZE;
> +       u64 addr = tx->initiator.addr;
> +
> +       *completer_addr = tx->initiator.host.completer_addr;
> +       return __host_set_page_state_range(addr, size, PKVM_PAGE_SHARED_OWNED);
> +}
> +
> +static enum pkvm_page_state hyp_get_page_state(kvm_pte_t pte)
> +{
> +       if (!kvm_pte_valid(pte))
> +               return PKVM_NOPAGE;
> +
> +       return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
> +}
> +
> +static int __hyp_check_page_state_range(u64 addr, u64 size,
> +                                       enum pkvm_page_state state)
> +{
> +       struct check_walk_data d = {
> +               .desired        = state,
> +               .get_page_state = hyp_get_page_state,
> +       };
> +
> +       hyp_assert_lock_held(&pkvm_pgd_lock);
> +       return check_page_state_range(&pkvm_pgtable, addr, size, &d);
> +}
> +
> +static bool __hyp_ack_skip_pgtable_check(const struct pkvm_mem_transition *tx)
> +{
> +       return !(IS_ENABLED(CONFIG_NVHE_EL2_DEBUG) ||
> +                tx->initiator.id != PKVM_ID_HOST);
> +}
> +
> +static int hyp_ack_share(u64 addr, const struct pkvm_mem_transition *tx,
> +                        enum kvm_pgtable_prot perms)
> +{
> +       u64 size = tx->nr_pages * PAGE_SIZE;
> +
> +       if (perms != PAGE_HYP)
> +               return -EPERM;
> +
> +       if (__hyp_ack_skip_pgtable_check(tx))
> +               return 0;
> +
> +       return __hyp_check_page_state_range(addr, size, PKVM_NOPAGE);
> +}
> +
> +static int hyp_complete_share(u64 addr, const struct pkvm_mem_transition *tx,
> +                             enum kvm_pgtable_prot perms)
> +{
> +       void *start = (void *)addr, *end = start + (tx->nr_pages * PAGE_SIZE);
> +       enum kvm_pgtable_prot prot;
> +
> +       prot = pkvm_mkstate(perms, PKVM_PAGE_SHARED_BORROWED);
> +       return pkvm_create_mappings_locked(start, end, prot);
> +}
> +
> +static int check_share(struct pkvm_mem_share *share)
> +{
> +       const struct pkvm_mem_transition *tx = &share->tx;
> +       u64 completer_addr;
> +       int ret;
> +
> +       switch (tx->initiator.id) {
> +       case PKVM_ID_HOST:
> +               ret = host_request_owned_transition(&completer_addr, tx);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       if (ret)
> +               return ret;
> +
> +       switch (tx->completer.id) {
> +       case PKVM_ID_HYP:
> +               ret = hyp_ack_share(completer_addr, tx, share->prot);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       return ret;
> +}
> +
> +static int __do_share(struct pkvm_mem_share *share)
> +{
> +       const struct pkvm_mem_transition *tx = &share->tx;
> +       u64 completer_addr;
> +       int ret;
> +
> +       switch (tx->initiator.id) {
> +       case PKVM_ID_HOST:
> +               ret = host_initiate_share(&completer_addr, tx);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       if (ret)
> +               return ret;
> +
> +       switch (tx->completer.id) {
> +       case PKVM_ID_HYP:
> +               ret = hyp_complete_share(completer_addr, tx, share->prot);
> +               break;
> +       default:
> +               ret = -EINVAL;
> +       }
> +
> +       return ret;
> +}
> +
> +/*
> + * do_share():
> + *
> + * The page owner grants access to another component with a given set
> + * of permissions.
> + *
> + * Initiator: OWNED    => SHARED_OWNED
> + * Completer: NOPAGE   => SHARED_BORROWED
> + */
> +static int do_share(struct pkvm_mem_share *share)
> +{
> +       int ret;
> +
> +       ret = check_share(share);
> +       if (ret)
> +               return ret;
> +
> +       return WARN_ON(__do_share(share));
> +}
> --
> 2.34.0.rc2.393.gf8c9666880-goog
>
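FWIW, with this in place I'd expect the __pkvm_host_share_hyp() rework in
the next patch to boil down to filling in one of these transitions and
calling do_share() with both locks held -- roughly the below, given
host_addr and its EL2 alias hyp_addr (just guessing at the exact
initialiser from the structs above, so take the details with a grain of
salt):

	int ret;
	struct pkvm_mem_share share = {
		.tx	= {
			.nr_pages	= 1,
			.initiator	= {
				.id	= PKVM_ID_HOST,
				.addr	= host_addr,
				.host	= {
					.completer_addr = hyp_addr,
				},
			},
			.completer	= {
				.id	= PKVM_ID_HYP,
			},
		},
		.prot	= PAGE_HYP,	/* or completer_prot, per the comment above */
	};

	host_lock_component();
	hyp_lock_component();

	ret = do_share(&share);

	hyp_unlock_component();
	host_unlock_component();

	return ret;

which reads pretty nicely.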

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages
  2021-12-10 14:34     ` Quentin Perret
@ 2021-12-13 12:53       ` Will Deacon
  0 siblings, 0 replies; 37+ messages in thread
From: Will Deacon @ 2021-12-13 12:53 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, linux-arm-kernel, kvmarm, linux-kernel,
	kernel-team

On Fri, Dec 10, 2021 at 02:34:16PM +0000, Quentin Perret wrote:
> On Thursday 09 Dec 2021 at 10:29:24 (+0000), Will Deacon wrote:
> > On Wed, Dec 01, 2021 at 05:03:57PM +0000, Quentin Perret wrote:
> > > To prepare the ground for allowing hyp stage-1 mappings to be removed at
> > > run-time, update the KVM page-table code to maintain a correct refcount
> > > using the ->{get,put}_page() function callbacks.
> > > 
> > > Signed-off-by: Quentin Perret <qperret@google.com>
> > > ---
> > >  arch/arm64/kvm/hyp/pgtable.c | 17 ++++++++++++++---
> > >  1 file changed, 14 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > > index f8ceebe4982e..768a58835153 100644
> > > --- a/arch/arm64/kvm/hyp/pgtable.c
> > > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > > @@ -408,8 +408,10 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > >  		return false;
> > >  
> > >  	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> > > -	if (hyp_pte_needs_update(old, new))
> > > +	if (hyp_pte_needs_update(old, new)) {
> > >  		smp_store_release(ptep, new);
> > > +		data->mm_ops->get_page(ptep);
> > 
> > In the case where we're just updating software bits for a valid pte, doesn't
> > this result in us taking a spurious reference to the page?
> 
> Ahem, yes, that is the case. I ended up with the below diff to fix it,
> which I intend to fold in the next version:
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 6ad4cb2d6947..e2047d3f05a2 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -383,21 +383,6 @@ enum kvm_pgtable_prot kvm_pgtable_hyp_pte_prot(kvm_pte_t pte)
>         return prot;
>  }
> 
> -static bool hyp_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
> -{
> -       /*
> -        * Tolerate KVM recreating the exact same mapping, or changing software
> -        * bits if the existing mapping was valid.
> -        */
> -       if (old == new)
> -               return false;
> -
> -       if (!kvm_pte_valid(old))
> -               return true;
> -
> -       return !WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW);
> -}
> -
>  static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>                                     kvm_pte_t *ptep, struct hyp_map_data *data)
>  {
> @@ -407,13 +392,16 @@ static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>         if (!kvm_block_mapping_supported(addr, end, phys, level))
>                 return false;
> 
> +       data->phys += granule;
>         new = kvm_init_valid_leaf_pte(phys, data->attr, level);
> -       if (hyp_pte_needs_update(old, new)) {
> -               smp_store_release(ptep, new);
> +       if (old == new)
> +               return true;
> +       else if (!kvm_pte_valid(old))

(nit: clearer to drop the 'else' part here)

>                 data->mm_ops->get_page(ptep);

Ok, so this works because new is always valid.
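To spell that out for myself, I'm assuming the tail of the function ends up
looking roughly like the below (the quoted diff stops short of the
valid->valid case, so I'm guessing the old WARN_ON is kept for that):

	data->phys += granule;
	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
	if (old == new)
		return true;			/* Same mapping recreated: no refcount change */
	if (!kvm_pte_valid(old))
		data->mm_ops->get_page(ptep);	/* First valid mapping of this PTE: take a reference */
	else if (WARN_ON((old ^ new) & ~KVM_PTE_LEAF_ATTR_HI_SW))
		return false;			/* Only SW bits may change on a valid PTE */

	smp_store_release(ptep, new);
	return true;

so the reference is only taken when a valid pte is installed over an
invalid one.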

LGTM.

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 09/15] KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages
  2021-12-01 17:04 ` [PATCH v3 09/15] KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages Quentin Perret
@ 2021-12-14 14:47   ` Andrew Walbran
  0 siblings, 0 replies; 37+ messages in thread
From: Andrew Walbran @ 2021-12-14 14:47 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Wed, 1 Dec 2021 at 17:04, 'Quentin Perret' via kernel-team
<kernel-team@android.com> wrote:
>
> From: Will Deacon <will@kernel.org>
>
> Explicitly name the combination of SW0 | SW1 as reserved in the pte and
> introduce a new PKVM_NOPAGE meta-state which, although not directly
> stored in the software bits of the pte, can be used to represent an
> entry for which there is no underlying page. This is distinct from an
> invalid pte, as stage-2 identity mappings for the host are created
> lazily and so an invalid pte there is the same as a valid mapping for
> the purposes of ownership information.
>
> This state will be used for permission checking during page transitions
> in later patches.
>
> Signed-off-by: Will Deacon <will@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> index b58c910babaf..56445586c755 100644
> --- a/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> +++ b/arch/arm64/kvm/hyp/include/nvhe/mem_protect.h
> @@ -24,6 +24,11 @@ enum pkvm_page_state {
>         PKVM_PAGE_OWNED                 = 0ULL,
>         PKVM_PAGE_SHARED_OWNED          = KVM_PGTABLE_PROT_SW0,
>         PKVM_PAGE_SHARED_BORROWED       = KVM_PGTABLE_PROT_SW1,
> +       __PKVM_PAGE_RESERVED            = KVM_PGTABLE_PROT_SW0 |
> +                                         KVM_PGTABLE_PROT_SW1,
> +
> +       /* Meta-states which aren't encoded directly in the PTE's SW bits */
> +       PKVM_NOPAGE,
>  };
>
>  #define PKVM_PAGE_STATE_PROT_MASK      (KVM_PGTABLE_PROT_SW0 | KVM_PGTABLE_PROT_SW1)
> --
> 2.34.0.rc2.393.gf8c9666880-goog
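For anyone following along: this is the subtlety that host_get_page_state()
in the later do_share patch relies on; quoting it here with comments
spelling that out:

	static enum pkvm_page_state host_get_page_state(kvm_pte_t pte)
	{
		/*
		 * A non-zero invalid PTE is an ownership annotation, so the
		 * host has no page there (PKVM_NOPAGE). An empty PTE just
		 * means the lazy identity mapping hasn't been created yet,
		 * which still counts as the default PKVM_PAGE_OWNED.
		 */
		if (!kvm_pte_valid(pte) && pte)
			return PKVM_NOPAGE;

		return pkvm_getstate(kvm_pgtable_stage2_pte_prot(pte));
	}

whereas the hyp counterpart treats any invalid PTE as PKVM_NOPAGE.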

Reviewed-by: Andrew Walbran <qwandor@google.com>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors
  2021-12-01 17:04 ` [PATCH v3 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors Quentin Perret
@ 2021-12-14 14:48   ` Andrew Walbran
  2021-12-14 14:52     ` Will Deacon
  0 siblings, 1 reply; 37+ messages in thread
From: Andrew Walbran @ 2021-12-14 14:48 UTC (permalink / raw)
  To: Quentin Perret
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Wed, 1 Dec 2021 at 17:04, 'Quentin Perret' via kernel-team
<kernel-team@android.com> wrote:
>
> From: Will Deacon <will@kernel.org>
>
> In preparation for adding additional locked sections for manipulating
> page-tables at EL2, introduce some simple wrappers around the host and
> hypervisor locks so that it's a bit easier to read and a bit more difficult
> to take the wrong lock (or even take them in the wrong order).
Looks good, but how does this help prevent taking locks in the wrong order?

>
> Signed-off-by: Will Deacon <will@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> ---
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c | 32 ++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index c1a90dd022b8..757dfefe3aeb 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -27,6 +27,26 @@ static struct hyp_pool host_s2_pool;
>
>  const u8 pkvm_hyp_id = 1;
>
> +static void host_lock_component(void)
> +{
> +       hyp_spin_lock(&host_kvm.lock);
> +}
> +
> +static void host_unlock_component(void)
> +{
> +       hyp_spin_unlock(&host_kvm.lock);
> +}
> +
> +static void hyp_lock_component(void)
> +{
> +       hyp_spin_lock(&pkvm_pgd_lock);
> +}
> +
> +static void hyp_unlock_component(void)
> +{
> +       hyp_spin_unlock(&pkvm_pgd_lock);
> +}
> +
>  static void *host_s2_zalloc_pages_exact(size_t size)
>  {
>         void *addr = hyp_alloc_pages(&host_s2_pool, get_order(size));
> @@ -338,14 +358,14 @@ static int host_stage2_idmap(u64 addr)
>
>         prot = is_memory ? PKVM_HOST_MEM_PROT : PKVM_HOST_MMIO_PROT;
>
> -       hyp_spin_lock(&host_kvm.lock);
> +       host_lock_component();
>         ret = host_stage2_adjust_range(addr, &range);
>         if (ret)
>                 goto unlock;
>
>         ret = host_stage2_idmap_locked(range.start, range.end - range.start, prot);
>  unlock:
> -       hyp_spin_unlock(&host_kvm.lock);
> +       host_unlock_component();
>
>         return ret;
>  }
> @@ -369,8 +389,8 @@ int __pkvm_host_share_hyp(u64 pfn)
>         if (!addr_is_memory(addr))
>                 return -EINVAL;
>
> -       hyp_spin_lock(&host_kvm.lock);
> -       hyp_spin_lock(&pkvm_pgd_lock);
> +       host_lock_component();
> +       hyp_lock_component();
>
>         ret = kvm_pgtable_get_leaf(&host_kvm.pgt, addr, &pte, NULL);
>         if (ret)
> @@ -432,8 +452,8 @@ int __pkvm_host_share_hyp(u64 pfn)
>         BUG_ON(ret);
>
>  unlock:
> -       hyp_spin_unlock(&pkvm_pgd_lock);
> -       hyp_spin_unlock(&host_kvm.lock);
> +       hyp_unlock_component();
> +       host_unlock_component();
>
>         return ret;
>  }
> --
> 2.34.0.rc2.393.gf8c9666880-goog
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors
  2021-12-14 14:48   ` Andrew Walbran
@ 2021-12-14 14:52     ` Will Deacon
  0 siblings, 0 replies; 37+ messages in thread
From: Will Deacon @ 2021-12-14 14:52 UTC (permalink / raw)
  To: Andrew Walbran
  Cc: Quentin Perret, Marc Zyngier, James Morse, Alexandru Elisei,
	Suzuki K Poulose, Catalin Marinas, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

On Tue, Dec 14, 2021 at 02:48:30PM +0000, Andrew Walbran wrote:
> On Wed, 1 Dec 2021 at 17:04, 'Quentin Perret' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > From: Will Deacon <will@kernel.org>
> >
> > In preparation for adding additional locked sections for manipulating
> > page-tables at EL2, introduce some simple wrappers around the host and
> > hypervisor locks so that it's a bit easier to read and a bit more difficult
> > to take the wrong lock (or even take them in the wrong order).
> Looks good, but how does this help prevent taking locks in the wrong order?

I just found that I would easily forget what exactly was protected by
"pkvm_pgd_lock" and so relating that back to "take host before hyp" was
error-prone. Having helpers with "host" and "hyp" in the name helps me with
that, at least.
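
And the nice side effect is that every call site then reads the ordering
directly, e.g.:

	host_lock_component();
	hyp_lock_component();

	/* ... manipulate the host stage-2 and hyp stage-1 page-tables ... */

	hyp_unlock_component();
	host_unlock_component();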

Will

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2
  2021-12-08 14:40       ` Andrew Walbran
@ 2021-12-15 16:02         ` Quentin Perret
  0 siblings, 0 replies; 37+ messages in thread
From: Quentin Perret @ 2021-12-15 16:02 UTC (permalink / raw)
  To: Andrew Walbran
  Cc: Marc Zyngier, James Morse, Alexandru Elisei, Suzuki K Poulose,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, kernel-team

Hey Andrew,

On Wednesday 08 Dec 2021 at 14:40:54 (+0000), Andrew Walbran wrote:
> Aha, I didn't realise that was because of block mappings, but it makes sense in that
> case. How about adding a note to the function comment here explaining
> that reasoning? Otherwise it sounds like the caller has to handle it
> somehow.

Well, in fact the caller does have to handle it if it decided to use
block mappings. We just decided not to use them in pkvm for now for
simplicity, but I guess that might change at some point. The page-table
library is meant to be architecturally compliant (it can create block
mappings or not, ...) but the decision of which mappings to create is
left to the caller and the handling logic belongs there. At least that's
my view of it. So, I actually like the comment the way it is, it
describes clearly what the function does, and what the caller should
expect. I have a bunch of changes for v4 queued locally, so I'll send it
now with that comment left untouched, but please shout if you have an
idea on how to make it better.
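
To illustrate: since pkvm only installs PAGE_SIZE mappings in the hyp
stage-1 for now, the completer side of the unshare path can simply treat a
partial unmap as an error, something along these lines (sketch only, the
actual helper lands in a later patch):

	u64 size = tx->nr_pages * PAGE_SIZE;
	int ret = kvm_pgtable_hyp_unmap(&pkvm_pgtable, addr, size);

	return (ret != size) ? -EFAULT : 0;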

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2021-12-15 16:02 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-01 17:03 [PATCH v3 00/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
2021-12-01 17:03 ` [PATCH v3 01/15] KVM: arm64: Check if running in VHE from kvm_host_owns_hyp_mappings() Quentin Perret
2021-12-09 10:10   ` Will Deacon
2021-12-10 13:37     ` Quentin Perret
2021-12-01 17:03 ` [PATCH v3 02/15] KVM: arm64: Provide {get,put}_page() stubs for early hyp allocator Quentin Perret
2021-12-09 10:12   ` Will Deacon
2021-12-01 17:03 ` [PATCH v3 03/15] KVM: arm64: Refcount hyp stage-1 pgtable pages Quentin Perret
2021-12-09 10:29   ` Will Deacon
2021-12-10 14:34     ` Quentin Perret
2021-12-13 12:53       ` Will Deacon
2021-12-01 17:03 ` [PATCH v3 04/15] KVM: arm64: Fixup hyp stage-1 refcount Quentin Perret
2021-12-09 11:09   ` Will Deacon
2021-12-01 17:03 ` [PATCH v3 05/15] KVM: arm64: Hook up ->page_count() for hypervisor stage-1 page-table Quentin Perret
2021-12-01 17:04 ` [PATCH v3 06/15] KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2 Quentin Perret
2021-12-07 14:47   ` Andrew Walbran
2021-12-08  9:51     ` Quentin Perret
2021-12-08 14:40       ` Andrew Walbran
2021-12-15 16:02         ` Quentin Perret
2021-12-01 17:04 ` [PATCH v3 07/15] KVM: arm64: Introduce kvm_share_hyp() Quentin Perret
2021-12-09 11:13   ` Will Deacon
2021-12-10 14:37     ` Quentin Perret
2021-12-01 17:04 ` [PATCH v3 08/15] KVM: arm64: pkvm: Refcount the pages shared with EL2 Quentin Perret
2021-12-09 11:16   ` Will Deacon
2021-12-01 17:04 ` [PATCH v3 09/15] KVM: arm64: Extend pkvm_page_state enumeration to handle absent pages Quentin Perret
2021-12-14 14:47   ` Andrew Walbran
2021-12-01 17:04 ` [PATCH v3 10/15] KVM: arm64: Introduce wrappers for host and hyp spin lock accessors Quentin Perret
2021-12-14 14:48   ` Andrew Walbran
2021-12-14 14:52     ` Will Deacon
2021-12-01 17:04 ` [PATCH v3 11/15] KVM: arm64: Implement do_share() helper for sharing memory Quentin Perret
2021-12-10 15:18   ` Andrew Walbran
2021-12-01 17:04 ` [PATCH v3 12/15] KVM: arm64: Implement __pkvm_host_share_hyp() using do_share() Quentin Perret
2021-12-01 17:04 ` [PATCH v3 13/15] KVM: arm64: Implement do_unshare() helper for unsharing memory Quentin Perret
2021-12-10 15:08   ` Andrew Walbran
2021-12-01 17:04 ` [PATCH v3 14/15] KVM: arm64: Expose unshare hypercall to the host Quentin Perret
2021-12-01 17:04 ` [PATCH v3 15/15] KVM: arm64: pkvm: Unshare guest structs during teardown Quentin Perret
2021-12-09 11:22   ` Will Deacon
2021-12-10 14:48     ` Quentin Perret

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).