linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Quentin Perret <qperret@google.com>
To: catalin.marinas@arm.com, will@kernel.org, maz@kernel.org,
	james.morse@arm.com, julien.thierry.kdev@gmail.com,
	suzuki.poulose@arm.com
Cc: android-kvm@google.com, seanjc@google.com, mate.toth-pal@arm.com,
	linux-kernel@vger.kernel.org, robh+dt@kernel.org,
	linux-arm-kernel@lists.infradead.org, kernel-team@android.com,
	kvmarm@lists.cs.columbia.edu, tabba@google.com, ardb@kernel.org,
	mark.rutland@arm.com, dbrazdil@google.com, qperret@google.com
Subject: [PATCH v5 29/36] KVM: arm64: Use page-table to track page ownership
Date: Mon, 15 Mar 2021 14:35:29 +0000	[thread overview]
Message-ID: <20210315143536.214621-30-qperret@google.com> (raw)
In-Reply-To: <20210315143536.214621-1-qperret@google.com>

As the host stage 2 will be identity mapped, all the .hyp memory regions
and/or memory pages donated to protected guestis will have to marked
invalid in the host stage 2 page-table. At the same time, the hypervisor
will need a way to track the ownership of each physical page to ensure
memory sharing or donation between entities (host, guests, hypervisor) is
legal.

In order to enable this tracking at EL2, let's use the host stage 2
page-table itself. The idea is to use the top bits of invalid mappings
to store the unique identifier of the page owner. The page-table owner
(the host) gets identifier 0 such that, at boot time, it owns the entire
IPA space as the pgd starts zeroed.

Provide kvm_pgtable_stage2_set_owner() which allows to modify the
ownership of pages in the host stage 2. It re-uses most of the map()
logic, but ends up creating invalid mappings instead. This impacts
how we do refcount as we now need to count invalid mappings when they
are used for ownership tracking.

Signed-off-by: Quentin Perret <qperret@google.com>
---
 arch/arm64/include/asm/kvm_pgtable.h |  21 +++++
 arch/arm64/kvm/hyp/pgtable.c         | 127 ++++++++++++++++++++++-----
 2 files changed, 124 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 4ae19247837b..683e96abdc24 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -238,6 +238,27 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 			   u64 phys, enum kvm_pgtable_prot prot,
 			   void *mc);
 
+/**
+ * kvm_pgtable_stage2_set_owner() - Annotate invalid mappings with metadata
+ *				    encoding the ownership of a page in the
+ *				    IPA space.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Base intermediate physical address to annotate.
+ * @size:	Size of the annotated range.
+ * @mc:		Cache of pre-allocated and zeroed memory from which to allocate
+ *		page-table pages.
+ * @owner_id:	Unique identifier for the owner of the page.
+ *
+ * By default, all page-tables are owned by identifier 0. This function can be
+ * used to mark portions of the IPA space as owned by other entities. When a
+ * stage 2 is used with identity-mappings, these annotations allow to use the
+ * page-table data structure as a simple rmap.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				 void *mc, u8 owner_id);
+
 /**
  * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index f37b4179b880..bd44e84dedc4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -48,6 +48,9 @@
 					 KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W | \
 					 KVM_PTE_LEAF_ATTR_HI_S2_XN)
 
+#define KVM_INVALID_PTE_OWNER_MASK	GENMASK(63, 56)
+#define KVM_MAX_OWNER_ID		1
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -67,6 +70,13 @@ static u64 kvm_granule_size(u32 level)
 	return BIT(kvm_granule_shift(level));
 }
 
+#define KVM_PHYS_INVALID (-1ULL)
+
+static bool kvm_phys_is_valid(u64 phys)
+{
+	return phys < BIT(id_aa64mmfr0_parange_to_phys_shift(ID_AA64MMFR0_PARANGE_MAX));
+}
+
 static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
 {
 	u64 granule = kvm_granule_size(level);
@@ -81,7 +91,10 @@ static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
 	if (granule > (end - addr))
 		return false;
 
-	return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
+	if (kvm_phys_is_valid(phys) && !IS_ALIGNED(phys, granule))
+		return false;
+
+	return IS_ALIGNED(addr, granule);
 }
 
 static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
@@ -186,6 +199,11 @@ static kvm_pte_t kvm_init_valid_leaf_pte(u64 pa, kvm_pte_t attr, u32 level)
 	return pte;
 }
 
+static kvm_pte_t kvm_init_invalid_leaf_owner(u8 owner_id)
+{
+	return FIELD_PREP(KVM_INVALID_PTE_OWNER_MASK, owner_id);
+}
+
 static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
 				  u32 level, kvm_pte_t *ptep,
 				  enum kvm_pgtable_walk_flags flag)
@@ -440,6 +458,7 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 struct stage2_map_data {
 	u64				phys;
 	kvm_pte_t			attr;
+	u8				owner_id;
 
 	kvm_pte_t			*anchor;
 	kvm_pte_t			*childp;
@@ -506,6 +525,39 @@ static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 	return 0;
 }
 
+static bool stage2_pte_needs_update(kvm_pte_t old, kvm_pte_t new)
+{
+	if (!kvm_pte_valid(old) || !kvm_pte_valid(new))
+		return true;
+
+	return ((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS));
+}
+
+static bool stage2_pte_is_counted(kvm_pte_t pte)
+{
+	/*
+	 * The refcount tracks valid entries as well as invalid entries if they
+	 * encode ownership of a page to another entity than the page-table
+	 * owner, whose id is 0.
+	 */
+	return !!pte;
+}
+
+static void stage2_put_pte(kvm_pte_t *ptep, struct kvm_s2_mmu *mmu, u64 addr,
+			   u32 level, struct kvm_pgtable_mm_ops *mm_ops)
+{
+	/*
+	 * Clear the existing PTE, and perform break-before-make with
+	 * TLB maintenance if it was valid.
+	 */
+	if (kvm_pte_valid(*ptep)) {
+		kvm_clear_pte(ptep);
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
+	}
+
+	mm_ops->put_page(ptep);
+}
+
 static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 				      kvm_pte_t *ptep,
 				      struct stage2_map_data *data)
@@ -517,29 +569,29 @@ static int stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
 	if (!kvm_block_mapping_supported(addr, end, phys, level))
 		return -E2BIG;
 
-	new = kvm_init_valid_leaf_pte(phys, data->attr, level);
-	if (kvm_pte_valid(old)) {
+	if (kvm_phys_is_valid(phys))
+		new = kvm_init_valid_leaf_pte(phys, data->attr, level);
+	else
+		new = kvm_init_invalid_leaf_owner(data->owner_id);
+
+	if (stage2_pte_is_counted(old)) {
 		/*
 		 * Skip updating the PTE if we are trying to recreate the exact
 		 * same mapping or only change the access permissions. Instead,
 		 * the vCPU will exit one more time from guest if still needed
 		 * and then go through the path of relaxing permissions.
 		 */
-		if (!((old ^ new) & (~KVM_PTE_LEAF_ATTR_S2_PERMS)))
+		if (!stage2_pte_needs_update(old, new))
 			return -EAGAIN;
 
-		/*
-		 * There's an existing different valid leaf entry, so perform
-		 * break-before-make.
-		 */
-		kvm_clear_pte(ptep);
-		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-		mm_ops->put_page(ptep);
+		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
 	}
 
 	smp_store_release(ptep, new);
-	mm_ops->get_page(ptep);
-	data->phys += granule;
+	if (stage2_pte_is_counted(new))
+		mm_ops->get_page(ptep);
+	if (kvm_phys_is_valid(phys))
+		data->phys += granule;
 	return 0;
 }
 
@@ -574,7 +626,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	int ret;
 
 	if (data->anchor) {
-		if (kvm_pte_valid(pte))
+		if (stage2_pte_is_counted(pte))
 			mm_ops->put_page(ptep);
 
 		return 0;
@@ -599,11 +651,8 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * a table. Accesses beyond 'end' that fall within the new table
 	 * will be mapped lazily.
 	 */
-	if (kvm_pte_valid(pte)) {
-		kvm_clear_pte(ptep);
-		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
-		mm_ops->put_page(ptep);
-	}
+	if (stage2_pte_is_counted(pte))
+		stage2_put_pte(ptep, data->mmu, addr, level, mm_ops);
 
 	kvm_set_table_pte(ptep, childp, mm_ops);
 	mm_ops->get_page(ptep);
@@ -701,6 +750,33 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
 	return ret;
 }
 
+int kvm_pgtable_stage2_set_owner(struct kvm_pgtable *pgt, u64 addr, u64 size,
+				 void *mc, u8 owner_id)
+{
+	int ret;
+	struct stage2_map_data map_data = {
+		.phys		= KVM_PHYS_INVALID,
+		.mmu		= pgt->mmu,
+		.memcache	= mc,
+		.mm_ops		= pgt->mm_ops,
+		.owner_id	= owner_id,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_map_walker,
+		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
+				  KVM_PGTABLE_WALK_LEAF |
+				  KVM_PGTABLE_WALK_TABLE_POST,
+		.arg		= &map_data,
+	};
+
+	if (owner_id > KVM_MAX_OWNER_ID)
+		return -EINVAL;
+
+	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+	dsb(ishst);
+	return ret;
+}
+
 static void stage2_flush_dcache(void *addr, u64 size)
 {
 	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
@@ -725,8 +801,13 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	kvm_pte_t pte = *ptep, *childp = NULL;
 	bool need_flush = false;
 
-	if (!kvm_pte_valid(pte))
+	if (!kvm_pte_valid(pte)) {
+		if (stage2_pte_is_counted(pte)) {
+			kvm_clear_pte(ptep);
+			mm_ops->put_page(ptep);
+		}
 		return 0;
+	}
 
 	if (kvm_pte_table(pte, level)) {
 		childp = kvm_pte_follow(pte, mm_ops);
@@ -742,9 +823,7 @@ static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	 * block entry and rely on the remaining portions being faulted
 	 * back lazily.
 	 */
-	kvm_clear_pte(ptep);
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
-	mm_ops->put_page(ptep);
+	stage2_put_pte(ptep, mmu, addr, level, mm_ops);
 
 	if (need_flush) {
 		stage2_flush_dcache(kvm_pte_follow(pte, mm_ops),
@@ -948,7 +1027,7 @@ static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	struct kvm_pgtable_mm_ops *mm_ops = arg;
 	kvm_pte_t pte = *ptep;
 
-	if (!kvm_pte_valid(pte))
+	if (!stage2_pte_is_counted(pte))
 		return 0;
 
 	mm_ops->put_page(ptep);
-- 
2.31.0.rc2.261.g7f71774620-goog


  parent reply	other threads:[~2021-03-15 14:53 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-15 14:35 [PATCH v5 00/36] KVM: arm64: A stage 2 for the host Quentin Perret
2021-03-15 14:35 ` [PATCH v5 01/36] arm64: lib: Annotate {clear,copy}_page() as position-independent Quentin Perret
2021-03-15 14:35 ` [PATCH v5 02/36] KVM: arm64: Link position-independent string routines into .hyp.text Quentin Perret
2021-03-15 14:35 ` [PATCH v5 03/36] arm64: kvm: Add standalone ticket spinlock implementation for use at hyp Quentin Perret
2021-03-15 14:35 ` [PATCH v5 04/36] KVM: arm64: Initialize kvm_nvhe_init_params early Quentin Perret
2021-03-15 14:35 ` [PATCH v5 05/36] KVM: arm64: Avoid free_page() in page-table allocator Quentin Perret
2021-03-15 14:35 ` [PATCH v5 06/36] KVM: arm64: Factor memory allocation out of pgtable.c Quentin Perret
2021-03-15 14:35 ` [PATCH v5 07/36] KVM: arm64: Introduce a BSS section for use at Hyp Quentin Perret
2021-03-15 14:35 ` [PATCH v5 08/36] KVM: arm64: Make kvm_call_hyp() a function call " Quentin Perret
2021-03-15 14:35 ` [PATCH v5 09/36] KVM: arm64: Allow using kvm_nvhe_sym() in hyp code Quentin Perret
2021-03-15 14:35 ` [PATCH v5 10/36] KVM: arm64: Introduce an early Hyp page allocator Quentin Perret
2021-03-15 14:35 ` [PATCH v5 11/36] KVM: arm64: Stub CONFIG_DEBUG_LIST at Hyp Quentin Perret
2021-03-15 14:35 ` [PATCH v5 12/36] KVM: arm64: Introduce a Hyp buddy page allocator Quentin Perret
2021-03-15 14:35 ` [PATCH v5 13/36] KVM: arm64: Enable access to sanitized CPU features at EL2 Quentin Perret
2021-03-15 14:35 ` [PATCH v5 14/36] KVM: arm64: Provide __flush_dcache_area " Quentin Perret
2021-03-15 16:33   ` Will Deacon
2021-03-15 16:56     ` Quentin Perret
2021-03-15 17:03       ` Will Deacon
2021-03-15 14:35 ` [PATCH v5 15/36] KVM: arm64: Factor out vector address calculation Quentin Perret
2021-03-15 14:35 ` [PATCH v5 16/36] arm64: asm: Provide set_sctlr_el2 macro Quentin Perret
2021-03-15 14:35 ` [PATCH v5 17/36] KVM: arm64: Prepare the creation of s1 mappings at EL2 Quentin Perret
2021-03-15 14:35 ` [PATCH v5 18/36] KVM: arm64: Elevate hypervisor mappings creation " Quentin Perret
2021-03-15 14:35 ` [PATCH v5 19/36] KVM: arm64: Use kvm_arch for stage 2 pgtable Quentin Perret
2021-03-15 14:35 ` [PATCH v5 20/36] KVM: arm64: Use kvm_arch in kvm_s2_mmu Quentin Perret
2021-03-15 14:35 ` [PATCH v5 21/36] KVM: arm64: Set host stage 2 using kvm_nvhe_init_params Quentin Perret
2021-03-15 14:35 ` [PATCH v5 22/36] KVM: arm64: Refactor kvm_arm_setup_stage2() Quentin Perret
2021-03-15 14:35 ` [PATCH v5 23/36] KVM: arm64: Refactor __load_guest_stage2() Quentin Perret
2021-03-15 14:35 ` [PATCH v5 24/36] KVM: arm64: Refactor __populate_fault_info() Quentin Perret
2021-03-15 14:35 ` [PATCH v5 25/36] KVM: arm64: Make memcache anonymous in pgtable allocator Quentin Perret
2021-03-15 14:35 ` [PATCH v5 26/36] KVM: arm64: Reserve memory for host stage 2 Quentin Perret
2021-03-15 14:35 ` [PATCH v5 27/36] KVM: arm64: Sort the hypervisor memblocks Quentin Perret
2021-03-15 14:35 ` [PATCH v5 28/36] KVM: arm64: Always zero invalid PTEs Quentin Perret
2021-03-15 14:35 ` Quentin Perret [this message]
2021-03-15 16:36   ` [PATCH v5 29/36] KVM: arm64: Use page-table to track page ownership Will Deacon
2021-03-15 16:53     ` Quentin Perret
2021-03-15 17:01       ` Will Deacon
2021-03-15 14:35 ` [PATCH v5 30/36] KVM: arm64: Refactor the *_map_set_prot_attr() helpers Quentin Perret
2021-03-15 14:35 ` [PATCH v5 31/36] KVM: arm64: Add kvm_pgtable_stage2_find_range() Quentin Perret
2021-03-15 16:31   ` Will Deacon
2021-03-15 14:35 ` [PATCH v5 32/36] KVM: arm64: Provide sanitized mmfr* registers at EL2 Quentin Perret
2021-03-15 16:31   ` Will Deacon
2021-03-15 14:35 ` [PATCH v5 33/36] KVM: arm64: Wrap the host with a stage 2 Quentin Perret
2021-03-16 12:28   ` Mate Toth-Pal
2021-03-16 12:53     ` Quentin Perret
2021-03-16 14:29       ` Quentin Perret
2021-03-16 15:16         ` Mate Toth-Pal
2021-03-16 17:46           ` Quentin Perret
2021-03-17  8:41             ` Mate Toth-Pal
2021-03-17  9:02               ` Quentin Perret
2021-03-17 14:57                 ` Mate Toth-Pal
2021-03-17 14:17   ` [PATCH 0/2] Fixes for FWB Quentin Perret
2021-03-17 14:17     ` [PATCH 1/2] KVM: arm64: Introduce KVM_PGTABLE_S2_NOFWB Stage-2 flag Quentin Perret
2021-03-17 14:41       ` Marc Zyngier
2021-03-17 14:47         ` Quentin Perret
2021-03-17 14:42       ` Will Deacon
2021-03-17 14:51         ` Quentin Perret
2021-03-17 14:17     ` [PATCH 2/2] KVM: arm64: Disable FWB in host stage-2 Quentin Perret
2021-03-15 14:35 ` [PATCH v5 34/36] KVM: arm64: Page-align the .hyp sections Quentin Perret
2021-03-15 14:35 ` [PATCH v5 35/36] KVM: arm64: Disable PMU support in protected mode Quentin Perret
2021-03-15 14:35 ` [PATCH v5 36/36] KVM: arm64: Protect the .hyp sections from the host Quentin Perret

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210315143536.214621-30-qperret@google.com \
    --to=qperret@google.com \
    --cc=android-kvm@google.com \
    --cc=ardb@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=dbrazdil@google.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kernel-team@android.com \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mate.toth-pal@arm.com \
    --cc=maz@kernel.org \
    --cc=robh+dt@kernel.org \
    --cc=seanjc@google.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tabba@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).