linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
@ 2020-08-25  9:39 Will Deacon
  2020-08-25  9:39 ` [PATCH v3 01/21] KVM: arm64: Remove kvm_mmu_free_memory_caches() Will Deacon
                   ` (23 more replies)
  0 siblings, 24 replies; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Hello folks,

This is version three of the KVM page-table rework that I previously posted
here:

  v1: https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
  v2: https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org

Changes since v2 include:

  * Rebased onto -rc2, which includes the conflicting OOM blocking fixes
  * Dropped the patch trying to "fix" the memcache in kvm_phys_addr_ioremap()

Cheers,

Will

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Gavin Shan <gshan@redhat.com>
Cc: kernel-team@android.com
Cc: linux-arm-kernel@lists.infradead.org

--->8

Quentin Perret (4):
  KVM: arm64: Add support for stage-2 write-protect in generic
    page-table
  KVM: arm64: Convert write-protect operation to generic page-table API
  KVM: arm64: Add support for stage-2 cache flushing in generic
    page-table
  KVM: arm64: Convert memslot cache-flushing code to generic page-table
    API

Will Deacon (17):
  KVM: arm64: Remove kvm_mmu_free_memory_caches()
  KVM: arm64: Add stand-alone page-table walker infrastructure
  KVM: arm64: Add support for creating kernel-agnostic stage-1 page
    tables
  KVM: arm64: Use generic allocator for hyp stage-1 page-tables
  KVM: arm64: Add support for creating kernel-agnostic stage-2 page
    tables
  KVM: arm64: Add support for stage-2 map()/unmap() in generic
    page-table
  KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API
  KVM: arm64: Convert kvm_set_spte_hva() to generic page-table API
  KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  KVM: arm64: Add support for stage-2 page-aging in generic page-table
  KVM: arm64: Convert page-aging and access faults to generic page-table
    API
  KVM: arm64: Add support for relaxing stage-2 perms in generic
    page-table code
  KVM: arm64: Convert user_mem_abort() to generic page-table API
  KVM: arm64: Check the pgt instead of the pgd when modifying page-table
  KVM: arm64: Remove unused page-table code
  KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
  KVM: arm64: Don't constrain maximum IPA size based on host
    configuration

 arch/arm64/include/asm/kvm_host.h       |    2 +-
 arch/arm64/include/asm/kvm_mmu.h        |  221 +---
 arch/arm64/include/asm/kvm_pgtable.h    |  279 ++++
 arch/arm64/include/asm/pgtable-hwdef.h  |   23 -
 arch/arm64/include/asm/pgtable-prot.h   |   19 -
 arch/arm64/include/asm/stage2_pgtable.h |  215 ----
 arch/arm64/kvm/arm.c                    |    2 +-
 arch/arm64/kvm/hyp/Makefile             |    2 +-
 arch/arm64/kvm/hyp/pgtable.c            |  860 +++++++++++++
 arch/arm64/kvm/mmu.c                    | 1566 +++--------------------
 arch/arm64/kvm/reset.c                  |   38 +-
 11 files changed, 1326 insertions(+), 1901 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
 create mode 100644 arch/arm64/kvm/hyp/pgtable.c

-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH v3 01/21] KVM: arm64: Remove kvm_mmu_free_memory_caches()
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-08-25  9:39 ` [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure Will Deacon
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

kvm_mmu_free_memory_caches() is only called by kvm_arch_vcpu_destroy(),
so inline the implementation and get rid of the extra function.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_mmu.h | 2 --
 arch/arm64/kvm/arm.c             | 2 +-
 arch/arm64/kvm/mmu.c             | 5 -----
 3 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 189839c3706a..0f078b1920ff 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -141,8 +141,6 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
-
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(void);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 46dc3d75cf13..262a0afbcc27 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -283,7 +283,7 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.has_run_once && unlikely(!irqchip_in_kernel(vcpu->kvm)))
 		static_branch_dec(&userspace_irqchip_in_use);
 
-	kvm_mmu_free_memory_caches(vcpu);
+	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
 	kvm_timer_vcpu_terminate(vcpu);
 	kvm_pmu_vcpu_destroy(vcpu);
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ba00bcc0c884..935f8f689433 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2324,11 +2324,6 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
 				 kvm_test_age_hva_handler, NULL);
 }
 
-void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
-{
-	kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
-}
-
 phys_addr_t kvm_mmu_get_httbr(void)
 {
 	if (__kvm_cpu_uses_extended_idmap())
-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
  2020-08-25  9:39 ` [PATCH v3 01/21] KVM: arm64: Remove kvm_mmu_free_memory_caches() Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-08-27 16:27   ` Alexandru Elisei
                     ` (2 more replies)
  2020-08-25  9:39 ` [PATCH v3 03/21] KVM: arm64: Add support for creating kernel-agnostic stage-1 page tables Will Deacon
                   ` (21 subsequent siblings)
  23 siblings, 3 replies; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

The KVM page-table code is intricately tied into the kernel page-table
code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
to reduce code duplication. Unfortunately, the reality is that there is
an awful lot of code required to make this work, and at the end of the
day you're limited to creating page-tables with the same configuration
as the host kernel. Furthermore, lifting the page-table code to run
directly at EL2 on a non-VHE system (as we plan to do in future
patches) is practically impossible due to the number of dependencies it
has on the core kernel.

Introduce a framework for walking Armv8 page-tables configured
independently from the host kernel.
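
To give a feel for the interface, here is a minimal, hypothetical user of
the walker (not part of this series; the function names are illustrative
only, and the valid-bit check simply open-codes KVM_PTE_VALID, i.e. bit 0
of the descriptor):

  /* Count the valid leaf entries in a range of the page-table. */
  static int count_leaf_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
                               enum kvm_pgtable_walk_flags flag,
                               void * const arg)
  {
          u64 *count = arg;

          if (*ptep & BIT(0))     /* KVM_PTE_VALID */
                  (*count)++;

          return 0;
  }

  static u64 count_valid_leaves(struct kvm_pgtable *pgt, u64 addr, u64 size)
  {
          u64 count = 0;
          struct kvm_pgtable_walker walker = {
                  .cb     = count_leaf_walker,
                  .arg    = &count,
                  .flags  = KVM_PGTABLE_WALK_LEAF,
          };

          kvm_pgtable_walk(pgt, addr, size, &walker);
          return count;
  }

Note that KVM_PGTABLE_WALK_LEAF visits invalid entries as well, which is why
the callback filters on the valid bit itself.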

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
 arch/arm64/kvm/hyp/Makefile          |   2 +-
 arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
 3 files changed, 392 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
 create mode 100644 arch/arm64/kvm/hyp/pgtable.c

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
new file mode 100644
index 000000000000..51ccbbb0efae
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -0,0 +1,101 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Google LLC
+ * Author: Will Deacon <will@kernel.org>
+ */
+
+#ifndef __ARM64_KVM_PGTABLE_H__
+#define __ARM64_KVM_PGTABLE_H__
+
+#include <linux/bits.h>
+#include <linux/kvm_host.h>
+#include <linux/types.h>
+
+typedef u64 kvm_pte_t;
+
+/**
+ * struct kvm_pgtable - KVM page-table.
+ * @ia_bits:		Maximum input address size, in bits.
+ * @start_level:	Level at which the page-table walk starts.
+ * @pgd:		Pointer to the first top-level entry of the page-table.
+ * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
+ */
+struct kvm_pgtable {
+	u32					ia_bits;
+	u32					start_level;
+	kvm_pte_t				*pgd;
+
+	/* Stage-2 only */
+	struct kvm_s2_mmu			*mmu;
+};
+
+/**
+ * enum kvm_pgtable_prot - Page-table permissions and attributes.
+ * @KVM_PGTABLE_PROT_R:		Read permission.
+ * @KVM_PGTABLE_PROT_W:		Write permission.
+ * @KVM_PGTABLE_PROT_X:		Execute permission.
+ * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
+ */
+enum kvm_pgtable_prot {
+	KVM_PGTABLE_PROT_R			= BIT(0),
+	KVM_PGTABLE_PROT_W			= BIT(1),
+	KVM_PGTABLE_PROT_X			= BIT(2),
+
+	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
+};
+
+/**
+ * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
+ * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
+ *					entries.
+ * @KVM_PGTABLE_WALK_TABLE_PRE:		Visit table entries before their
+ *					children.
+ * @KVM_PGTABLE_WALK_TABLE_POST:	Visit table entries after their
+ *					children.
+ */
+enum kvm_pgtable_walk_flags {
+	KVM_PGTABLE_WALK_LEAF			= BIT(0),
+	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
+	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
+};
+
+typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
+					kvm_pte_t *ptep,
+					enum kvm_pgtable_walk_flags flag,
+					void * const arg);
+
+/**
+ * struct kvm_pgtable_walker - Hook into a page-table walk.
+ * @cb:		Callback function to invoke during the walk.
+ * @arg:	Argument passed to the callback function.
+ * @flags:	Bitwise-OR of flags to identify the entry types on which to
+ *		invoke the callback function.
+ */
+struct kvm_pgtable_walker {
+	const kvm_pgtable_visitor_fn_t		cb;
+	void * const				arg;
+	const enum kvm_pgtable_walk_flags	flags;
+};
+
+/**
+ * kvm_pgtable_walk() - Walk a page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
+ * @addr:	Input address for the start of the walk.
+ * @size:	Size of the range to walk.
+ * @walker:	Walker callback description.
+ *
+ * The walker will walk the page-table entries corresponding to the input
+ * address range specified, visiting entries according to the walker flags.
+ * Invalid entries are treated as leaf entries. Leaf entries are reloaded
+ * after invoking the walker callback, allowing the walker to descend into
+ * a newly installed table.
+ *
+ * Returning a negative error code from the walker callback function will
+ * terminate the walk immediately with the same error code.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
+		     struct kvm_pgtable_walker *walker);
+
+#endif	/* __ARM64_KVM_PGTABLE_H__ */
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index f54f0e89a71c..607b8a898826 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -10,5 +10,5 @@ subdir-ccflags-y := -I$(incdir)				\
 		    -DDISABLE_BRANCH_PROFILING		\
 		    $(DISABLE_STACKLEAK_PLUGIN)
 
-obj-$(CONFIG_KVM) += vhe/ nvhe/
+obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o
 obj-$(CONFIG_KVM_INDIRECT_VECTORS) += smccc_wa.o
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
new file mode 100644
index 000000000000..462001bbe028
--- /dev/null
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Stand-alone page-table allocator for hyp stage-1 and guest stage-2.
+ * No bombay mix was harmed in the writing of this file.
+ *
+ * Copyright (C) 2020 Google LLC
+ * Author: Will Deacon <will@kernel.org>
+ */
+
+#include <linux/bitfield.h>
+#include <asm/kvm_pgtable.h>
+
+#define KVM_PGTABLE_MAX_LEVELS		4U
+
+#define KVM_PTE_VALID			BIT(0)
+
+#define KVM_PTE_TYPE			BIT(1)
+#define KVM_PTE_TYPE_BLOCK		0
+#define KVM_PTE_TYPE_PAGE		1
+#define KVM_PTE_TYPE_TABLE		1
+
+#define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
+#define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
+
+#define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
+
+#define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
+
+struct kvm_pgtable_walk_data {
+	struct kvm_pgtable		*pgt;
+	struct kvm_pgtable_walker	*walker;
+
+	u64				addr;
+	u64				end;
+};
+
+static u64 kvm_granule_shift(u32 level)
+{
+	return (KVM_PGTABLE_MAX_LEVELS - level) * (PAGE_SHIFT - 3) + 3;
+}
+
+static u64 kvm_granule_size(u32 level)
+{
+	return BIT(kvm_granule_shift(level));
+}
+
+static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
+{
+	u64 granule = kvm_granule_size(level);
+
+	/*
+	 * Reject invalid block mappings and don't bother with 4TB mappings for
+	 * 52-bit PAs.
+	 */
+	if (level == 0 || (PAGE_SIZE != SZ_4K && level == 1))
+		return false;
+
+	if (granule > (end - addr))
+		return false;
+
+	return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
+}
+
+static u32 kvm_start_level(u64 ia_bits)
+{
+	u64 levels = DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3);
+	return KVM_PGTABLE_MAX_LEVELS - levels;
+}
+
+static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
+{
+	u64 shift = kvm_granule_shift(level);
+	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
+
+	return (data->addr >> shift) & mask;
+}
+
+static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
+{
+	u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
+	u64 mask = BIT(pgt->ia_bits) - 1;
+
+	return (addr & mask) >> shift;
+}
+
+static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
+{
+	return __kvm_pgd_page_idx(data->pgt, data->addr);
+}
+
+static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
+{
+	struct kvm_pgtable pgt = {
+		.ia_bits	= ia_bits,
+		.start_level	= start_level,
+	};
+
+	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
+}
+
+static bool kvm_pte_valid(kvm_pte_t pte)
+{
+	return pte & KVM_PTE_VALID;
+}
+
+static bool kvm_pte_table(kvm_pte_t pte, u32 level)
+{
+	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
+		return false;
+
+	if (!kvm_pte_valid(pte))
+		return false;
+
+	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
+}
+
+static u64 kvm_pte_to_phys(kvm_pte_t pte)
+{
+	u64 pa = pte & KVM_PTE_ADDR_MASK;
+
+	if (PAGE_SHIFT == 16)
+		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
+
+	return pa;
+}
+
+static kvm_pte_t kvm_phys_to_pte(u64 pa)
+{
+	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
+
+	if (PAGE_SHIFT == 16)
+		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
+
+	return pte;
+}
+
+static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
+{
+	return __va(kvm_pte_to_phys(pte));
+}
+
+static void kvm_set_invalid_pte(kvm_pte_t *ptep)
+{
+	kvm_pte_t pte = 0;
+	WRITE_ONCE(*ptep, pte);
+}
+
+static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
+{
+	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
+
+	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
+	pte |= KVM_PTE_VALID;
+
+	WARN_ON(kvm_pte_valid(old));
+	smp_store_release(ptep, pte);
+}
+
+static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
+				   u32 level)
+{
+	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
+	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
+							   KVM_PTE_TYPE_BLOCK;
+
+	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
+	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
+	pte |= KVM_PTE_VALID;
+
+	/* Tolerate KVM recreating the exact same mapping. */
+	if (kvm_pte_valid(old))
+		return old == pte;
+
+	smp_store_release(ptep, pte);
+	return true;
+}
+
+static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
+				  u32 level, kvm_pte_t *ptep,
+				  enum kvm_pgtable_walk_flags flag)
+{
+	struct kvm_pgtable_walker *walker = data->walker;
+	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
+}
+
+static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
+			      kvm_pte_t *pgtable, u32 level);
+
+static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
+				      kvm_pte_t *ptep, u32 level)
+{
+	int ret = 0;
+	u64 addr = data->addr;
+	kvm_pte_t *childp, pte = *ptep;
+	bool table = kvm_pte_table(pte, level);
+	enum kvm_pgtable_walk_flags flags = data->walker->flags;
+
+	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
+		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
+					     KVM_PGTABLE_WALK_TABLE_PRE);
+	}
+
+	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
+		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
+					     KVM_PGTABLE_WALK_LEAF);
+		pte = *ptep;
+		table = kvm_pte_table(pte, level);
+	}
+
+	if (ret)
+		goto out;
+
+	if (!table) {
+		data->addr += kvm_granule_size(level);
+		goto out;
+	}
+
+	childp = kvm_pte_follow(pte);
+	ret = __kvm_pgtable_walk(data, childp, level + 1);
+	if (ret)
+		goto out;
+
+	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
+		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
+					     KVM_PGTABLE_WALK_TABLE_POST);
+	}
+
+out:
+	return ret;
+}
+
+static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
+			      kvm_pte_t *pgtable, u32 level)
+{
+	u32 idx;
+	int ret = 0;
+
+	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
+		return -EINVAL;
+
+	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
+		kvm_pte_t *ptep = &pgtable[idx];
+
+		if (data->addr >= data->end)
+			break;
+
+		ret = __kvm_pgtable_visit(data, ptep, level);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
+{
+	u32 idx;
+	int ret = 0;
+	struct kvm_pgtable *pgt = data->pgt;
+	u64 limit = BIT(pgt->ia_bits);
+
+	if (data->addr > limit || data->end > limit)
+		return -ERANGE;
+
+	if (!pgt->pgd)
+		return -EINVAL;
+
+	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
+		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
+
+		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
+		     struct kvm_pgtable_walker *walker)
+{
+	struct kvm_pgtable_walk_data walk_data = {
+		.pgt	= pgt,
+		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
+		.end	= PAGE_ALIGN(walk_data.addr + size),
+		.walker	= walker,
+	};
+
+	return _kvm_pgtable_walk(&walk_data);
+}
-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH v3 03/21] KVM: arm64: Add support for creating kernel-agnostic stage-1 page tables
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
  2020-08-25  9:39 ` [PATCH v3 01/21] KVM: arm64: Remove kvm_mmu_free_memory_caches() Will Deacon
  2020-08-25  9:39 ` [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-08-28 15:35   ` Alexandru Elisei
  2020-08-25  9:39 ` [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables Will Deacon
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

The generic page-table walker is pretty useless as it stands, because it
doesn't understand enough to allocate anything. Teach it about stage-1
page-tables, and hook up an API for allocating these for the hypervisor
at EL2.
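
As an illustration, a hypothetical caller would pair the new init/map/destroy
operations roughly as below (the 48-bit VA size and the helper name are
assumptions made for the sketch, not something this patch mandates):

  /* Build a hyp stage-1 table and install a single read/write mapping. */
  static int hyp_s1_map_example(u64 va, u64 pa)
  {
          struct kvm_pgtable pgt;
          int ret;

          ret = kvm_pgtable_hyp_init(&pgt, 48);   /* assume 48-bit VA space */
          if (ret)
                  return ret;

          ret = kvm_pgtable_hyp_map(&pgt, va, PAGE_SIZE, pa,
                                    KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W);
          if (ret)
                  kvm_pgtable_hyp_destroy(&pgt);

          return ret;
  }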

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h |  34 +++++++
 arch/arm64/kvm/hyp/pgtable.c         | 131 +++++++++++++++++++++++++++
 2 files changed, 165 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 51ccbbb0efae..ec9f98527dcc 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -77,6 +77,40 @@ struct kvm_pgtable_walker {
 	const enum kvm_pgtable_walk_flags	flags;
 };
 
+/**
+ * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
+ * @pgt:	Uninitialised page-table structure to initialise.
+ * @va_bits:	Maximum virtual address bits.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits);
+
+/**
+ * kvm_pgtable_hyp_destroy() - Destroy an unused hypervisor stage-1 page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_hyp_init().
+ *
+ * The page-table is assumed to be unreachable by any hardware walkers prior
+ * to freeing and therefore no TLB invalidation is performed.
+ */
+void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
+
+/**
+ * kvm_pgtable_hyp_map() - Install a mapping in a hypervisor stage-1 page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_hyp_init().
+ * @addr:	Virtual address at which to place the mapping.
+ * @size:	Size of the mapping.
+ * @phys:	Physical address of the memory to map.
+ * @prot:	Permissions and attributes for the mapping.
+ *
+ * If device attributes are not explicitly requested in @prot, then the
+ * mapping will be normal, cacheable.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
+			enum kvm_pgtable_prot prot);
+
 /**
  * kvm_pgtable_walk() - Walk a page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 462001bbe028..d75166823ad9 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -24,8 +24,18 @@
 
 #define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
 
+#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP	GENMASK(7, 6)
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO	3
+#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW	1
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH	GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
+#define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
+
 #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
 
+#define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -288,3 +298,124 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
 
 	return _kvm_pgtable_walk(&walk_data);
 }
+
+struct hyp_map_data {
+	u64		phys;
+	kvm_pte_t	attr;
+};
+
+static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
+				 struct hyp_map_data *data)
+{
+	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
+	u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
+	kvm_pte_t attr = FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX, mtype);
+	u32 sh = KVM_PTE_LEAF_ATTR_LO_S1_SH_IS;
+	u32 ap = (prot & KVM_PGTABLE_PROT_W) ? KVM_PTE_LEAF_ATTR_LO_S1_AP_RW :
+					       KVM_PTE_LEAF_ATTR_LO_S1_AP_RO;
+
+	if (!(prot & KVM_PGTABLE_PROT_R))
+		return -EINVAL;
+
+	if (prot & KVM_PGTABLE_PROT_X) {
+		if (prot & KVM_PGTABLE_PROT_W)
+			return -EINVAL;
+
+		if (device)
+			return -EINVAL;
+	} else {
+		attr |= KVM_PTE_LEAF_ATTR_HI_S1_XN;
+	}
+
+	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
+	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
+	attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
+	data->attr = attr;
+	return 0;
+}
+
+static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+				    kvm_pte_t *ptep, struct hyp_map_data *data)
+{
+	u64 granule = kvm_granule_size(level), phys = data->phys;
+
+	if (!kvm_block_mapping_supported(addr, end, phys, level))
+		return false;
+
+	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
+	data->phys += granule;
+	return true;
+}
+
+static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			  enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+	kvm_pte_t *childp;
+
+	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
+		return 0;
+
+	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
+		return -EINVAL;
+
+	childp = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+	if (!childp)
+		return -ENOMEM;
+
+	kvm_set_table_pte(ptep, childp);
+	return 0;
+}
+
+int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
+			enum kvm_pgtable_prot prot)
+{
+	int ret;
+	struct hyp_map_data map_data = {
+		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb	= hyp_map_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+		.arg	= &map_data,
+	};
+
+	ret = hyp_map_set_prot_attr(prot, &map_data);
+	if (ret)
+		return ret;
+
+	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+	dsb(ishst);
+	isb();
+	return ret;
+}
+
+int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
+{
+	pgt->pgd = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
+	if (!pgt->pgd)
+		return -ENOMEM;
+
+	pgt->ia_bits		= va_bits;
+	pgt->start_level	= kvm_start_level(va_bits);
+	pgt->mmu		= NULL;
+	return 0;
+}
+
+static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			   enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+	free_page((unsigned long)kvm_pte_follow(*ptep));
+	return 0;
+}
+
+void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= hyp_free_walker,
+		.flags	= KVM_PGTABLE_WALK_TABLE_POST,
+	};
+
+	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
+	free_page((unsigned long)pgt->pgd);
+	pgt->pgd = NULL;
+}
-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (2 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 03/21] KVM: arm64: Add support for creating kernel-agnostic stage-1 page tables Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-08-28 16:32   ` Alexandru Elisei
  2020-08-25  9:39 ` [PATCH v3 05/21] KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables Will Deacon
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Now that we have a shiny new page-table allocator, replace the hyp
page-table code with calls into the new API. This also allows us to
remove the extended idmap code, as we can now simply ensure that the
VA size is large enough to map everything we need.
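
Callers keep their shape: only the prot argument changes, from a pgprot_t to
one of the kvm_pgtable_prot-based PAGE_HYP* values introduced here. A rough
sketch of a caller after this change (the helper name and object are made up
for illustration):

  /* Map a kernel object read/write into the hyp stage-1 table. */
  static int share_obj_with_hyp(void *obj, size_t size)
  {
          return create_hyp_mappings(obj, (char *)obj + size, PAGE_HYP);
  }

The hyp VA size itself is now derived from idmap_t0sz in kvm_mmu_init()
(e.g. T0SZ = 16 gives a 48-bit VA space), which is what lets us drop the
extended-idmap machinery.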

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_mmu.h       |  78 +----
 arch/arm64/include/asm/kvm_pgtable.h   |   5 +
 arch/arm64/include/asm/pgtable-hwdef.h |   6 -
 arch/arm64/include/asm/pgtable-prot.h  |   6 -
 arch/arm64/kvm/mmu.c                   | 414 +++----------------------
 5 files changed, 45 insertions(+), 464 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 0f078b1920ff..42fb50cfe0d8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -43,16 +43,6 @@
  *	HYP_VA_MIN = 1 << (VA_BITS - 1)
  * HYP_VA_MAX = HYP_VA_MIN + (1 << (VA_BITS - 1)) - 1
  *
- * This of course assumes that the trampoline page exists within the
- * VA_BITS range. If it doesn't, then it means we're in the odd case
- * where the kernel idmap (as well as HYP) uses more levels than the
- * kernel runtime page tables (as seen when the kernel is configured
- * for 4k pages, 39bits VA, and yet memory lives just above that
- * limit, forcing the idmap to use 4 levels of page tables while the
- * kernel itself only uses 3). In this particular case, it doesn't
- * matter which side of VA_BITS we use, as we're guaranteed not to
- * conflict with anything.
- *
  * When using VHE, there are no separate hyp mappings and all KVM
  * functionality is already mapped as part of the main kernel
  * mappings, and none of this applies in that case.
@@ -123,9 +113,10 @@ static inline bool kvm_page_empty(void *ptr)
 	return page_count(ptr_page) == 1;
 }
 
+#include <asm/kvm_pgtable.h>
 #include <asm/stage2_pgtable.h>
 
-int create_hyp_mappings(void *from, void *to, pgprot_t prot);
+int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot);
 int create_hyp_io_mappings(phys_addr_t phys_addr, size_t size,
 			   void __iomem **kaddr,
 			   void __iomem **haddr);
@@ -144,8 +135,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(void);
-void kvm_clear_hyp_idmap(void);
-
 #define kvm_mk_pmd(ptep)					\
 	__pmd(__phys_to_pmd_val(__pa(ptep)) | PMD_TYPE_TABLE)
 #define kvm_mk_pud(pmdp)					\
@@ -263,25 +252,6 @@ static inline bool kvm_s2pud_young(pud_t pud)
 	return pud_young(pud);
 }
 
-#define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
-
-#ifdef __PAGETABLE_PMD_FOLDED
-#define hyp_pmd_table_empty(pmdp) (0)
-#else
-#define hyp_pmd_table_empty(pmdp) kvm_page_empty(pmdp)
-#endif
-
-#ifdef __PAGETABLE_PUD_FOLDED
-#define hyp_pud_table_empty(pudp) (0)
-#else
-#define hyp_pud_table_empty(pudp) kvm_page_empty(pudp)
-#endif
-
-#ifdef __PAGETABLE_P4D_FOLDED
-#define hyp_p4d_table_empty(p4dp) (0)
-#else
-#define hyp_p4d_table_empty(p4dp) kvm_page_empty(p4dp)
-#endif
 
 struct kvm;
 
@@ -350,50 +320,6 @@ static inline void __kvm_flush_dcache_pud(pud_t pud)
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
-static inline bool __kvm_cpu_uses_extended_idmap(void)
-{
-	return __cpu_uses_extended_idmap_level();
-}
-
-static inline unsigned long __kvm_idmap_ptrs_per_pgd(void)
-{
-	return idmap_ptrs_per_pgd;
-}
-
-/*
- * Can't use pgd_populate here, because the extended idmap adds an extra level
- * above CONFIG_PGTABLE_LEVELS (which is 2 or 3 if we're using the extended
- * idmap), and pgd_populate is only available if CONFIG_PGTABLE_LEVELS = 4.
- */
-static inline void __kvm_extend_hypmap(pgd_t *boot_hyp_pgd,
-				       pgd_t *hyp_pgd,
-				       pgd_t *merged_hyp_pgd,
-				       unsigned long hyp_idmap_start)
-{
-	int idmap_idx;
-	u64 pgd_addr;
-
-	/*
-	 * Use the first entry to access the HYP mappings. It is
-	 * guaranteed to be free, otherwise we wouldn't use an
-	 * extended idmap.
-	 */
-	VM_BUG_ON(pgd_val(merged_hyp_pgd[0]));
-	pgd_addr = __phys_to_pgd_val(__pa(hyp_pgd));
-	merged_hyp_pgd[0] = __pgd(pgd_addr | PMD_TYPE_TABLE);
-
-	/*
-	 * Create another extended level entry that points to the boot HYP map,
-	 * which contains an ID mapping of the HYP init code. We essentially
-	 * merge the boot and runtime HYP maps by doing so, but they don't
-	 * overlap anyway, so this is fine.
-	 */
-	idmap_idx = hyp_idmap_start >> VA_BITS;
-	VM_BUG_ON(pgd_val(merged_hyp_pgd[idmap_idx]));
-	pgd_addr = __phys_to_pgd_val(__pa(boot_hyp_pgd));
-	merged_hyp_pgd[idmap_idx] = __pgd(pgd_addr | PMD_TYPE_TABLE);
-}
-
 static inline unsigned int kvm_get_vmid_bits(void)
 {
 	int reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index ec9f98527dcc..2af84ab78cb8 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -44,6 +44,11 @@ enum kvm_pgtable_prot {
 	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
 };
 
+#define PAGE_HYP		(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W)
+#define PAGE_HYP_EXEC		(KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_X)
+#define PAGE_HYP_RO		(KVM_PGTABLE_PROT_R)
+#define PAGE_HYP_DEVICE		(PAGE_HYP | KVM_PGTABLE_PROT_DEVICE)
+
 /**
  * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
  * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index d400a4d9aee2..1a989353144e 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -194,12 +194,6 @@
  */
 #define PTE_S2_MEMATTR(t)	(_AT(pteval_t, (t)) << 2)
 
-/*
- * EL2/HYP PTE/PMD definitions
- */
-#define PMD_HYP			PMD_SECT_USER
-#define PTE_HYP			PTE_USER
-
 /*
  * Highest possible physical address supported.
  */
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 4d867c6446c4..88acd7e1cd05 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -56,7 +56,6 @@ extern bool arm64_use_ng_mappings;
 #define PROT_SECT_NORMAL_EXEC	(PROT_SECT_DEFAULT | PMD_SECT_UXN | PMD_ATTRINDX(MT_NORMAL))
 
 #define _PAGE_DEFAULT		(_PROT_DEFAULT | PTE_ATTRINDX(MT_NORMAL))
-#define _HYP_PAGE_DEFAULT	_PAGE_DEFAULT
 
 #define PAGE_KERNEL		__pgprot(PROT_NORMAL)
 #define PAGE_KERNEL_RO		__pgprot((PROT_NORMAL & ~PTE_WRITE) | PTE_RDONLY)
@@ -64,11 +63,6 @@ extern bool arm64_use_ng_mappings;
 #define PAGE_KERNEL_EXEC	__pgprot(PROT_NORMAL & ~PTE_PXN)
 #define PAGE_KERNEL_EXEC_CONT	__pgprot((PROT_NORMAL & ~PTE_PXN) | PTE_CONT)
 
-#define PAGE_HYP		__pgprot(_HYP_PAGE_DEFAULT | PTE_HYP | PTE_HYP_XN)
-#define PAGE_HYP_EXEC		__pgprot(_HYP_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY)
-#define PAGE_HYP_RO		__pgprot(_HYP_PAGE_DEFAULT | PTE_HYP | PTE_RDONLY | PTE_HYP_XN)
-#define PAGE_HYP_DEVICE		__pgprot(_PROT_DEFAULT | PTE_ATTRINDX(MT_DEVICE_nGnRE) | PTE_HYP | PTE_HYP_XN)
-
 #define PAGE_S2_MEMATTR(attr)						\
 	({								\
 		u64 __val;						\
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 935f8f689433..fabd72b0c8a4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -14,6 +14,7 @@
 #include <asm/cacheflush.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/kvm_pgtable.h>
 #include <asm/kvm_ras.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
@@ -21,9 +22,7 @@
 
 #include "trace.h"
 
-static pgd_t *boot_hyp_pgd;
-static pgd_t *hyp_pgd;
-static pgd_t *merged_hyp_pgd;
+static struct kvm_pgtable *hyp_pgtable;
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
 static unsigned long hyp_idmap_start;
@@ -32,8 +31,6 @@ static phys_addr_t hyp_idmap_vector;
 
 static unsigned long io_map_base;
 
-#define hyp_pgd_order get_order(PTRS_PER_PGD * sizeof(pgd_t))
-
 #define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
 #define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
 
@@ -489,338 +486,28 @@ static void stage2_flush_vm(struct kvm *kvm)
 	srcu_read_unlock(&kvm->srcu, idx);
 }
 
-static void clear_hyp_pgd_entry(pgd_t *pgd)
-{
-	p4d_t *p4d_table __maybe_unused = p4d_offset(pgd, 0UL);
-	pgd_clear(pgd);
-	p4d_free(NULL, p4d_table);
-	put_page(virt_to_page(pgd));
-}
-
-static void clear_hyp_p4d_entry(p4d_t *p4d)
-{
-	pud_t *pud_table __maybe_unused = pud_offset(p4d, 0UL);
-	VM_BUG_ON(p4d_huge(*p4d));
-	p4d_clear(p4d);
-	pud_free(NULL, pud_table);
-	put_page(virt_to_page(p4d));
-}
-
-static void clear_hyp_pud_entry(pud_t *pud)
-{
-	pmd_t *pmd_table __maybe_unused = pmd_offset(pud, 0);
-	VM_BUG_ON(pud_huge(*pud));
-	pud_clear(pud);
-	pmd_free(NULL, pmd_table);
-	put_page(virt_to_page(pud));
-}
-
-static void clear_hyp_pmd_entry(pmd_t *pmd)
-{
-	pte_t *pte_table = pte_offset_kernel(pmd, 0);
-	VM_BUG_ON(pmd_thp_or_huge(*pmd));
-	pmd_clear(pmd);
-	pte_free_kernel(NULL, pte_table);
-	put_page(virt_to_page(pmd));
-}
-
-static void unmap_hyp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
-{
-	pte_t *pte, *start_pte;
-
-	start_pte = pte = pte_offset_kernel(pmd, addr);
-	do {
-		if (!pte_none(*pte)) {
-			kvm_set_pte(pte, __pte(0));
-			put_page(virt_to_page(pte));
-		}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-
-	if (hyp_pte_table_empty(start_pte))
-		clear_hyp_pmd_entry(pmd);
-}
-
-static void unmap_hyp_pmds(pud_t *pud, phys_addr_t addr, phys_addr_t end)
-{
-	phys_addr_t next;
-	pmd_t *pmd, *start_pmd;
-
-	start_pmd = pmd = pmd_offset(pud, addr);
-	do {
-		next = pmd_addr_end(addr, end);
-		/* Hyp doesn't use huge pmds */
-		if (!pmd_none(*pmd))
-			unmap_hyp_ptes(pmd, addr, next);
-	} while (pmd++, addr = next, addr != end);
-
-	if (hyp_pmd_table_empty(start_pmd))
-		clear_hyp_pud_entry(pud);
-}
-
-static void unmap_hyp_puds(p4d_t *p4d, phys_addr_t addr, phys_addr_t end)
-{
-	phys_addr_t next;
-	pud_t *pud, *start_pud;
-
-	start_pud = pud = pud_offset(p4d, addr);
-	do {
-		next = pud_addr_end(addr, end);
-		/* Hyp doesn't use huge puds */
-		if (!pud_none(*pud))
-			unmap_hyp_pmds(pud, addr, next);
-	} while (pud++, addr = next, addr != end);
-
-	if (hyp_pud_table_empty(start_pud))
-		clear_hyp_p4d_entry(p4d);
-}
-
-static void unmap_hyp_p4ds(pgd_t *pgd, phys_addr_t addr, phys_addr_t end)
-{
-	phys_addr_t next;
-	p4d_t *p4d, *start_p4d;
-
-	start_p4d = p4d = p4d_offset(pgd, addr);
-	do {
-		next = p4d_addr_end(addr, end);
-		/* Hyp doesn't use huge p4ds */
-		if (!p4d_none(*p4d))
-			unmap_hyp_puds(p4d, addr, next);
-	} while (p4d++, addr = next, addr != end);
-
-	if (hyp_p4d_table_empty(start_p4d))
-		clear_hyp_pgd_entry(pgd);
-}
-
-static unsigned int kvm_pgd_index(unsigned long addr, unsigned int ptrs_per_pgd)
-{
-	return (addr >> PGDIR_SHIFT) & (ptrs_per_pgd - 1);
-}
-
-static void __unmap_hyp_range(pgd_t *pgdp, unsigned long ptrs_per_pgd,
-			      phys_addr_t start, u64 size)
-{
-	pgd_t *pgd;
-	phys_addr_t addr = start, end = start + size;
-	phys_addr_t next;
-
-	/*
-	 * We don't unmap anything from HYP, except at the hyp tear down.
-	 * Hence, we don't have to invalidate the TLBs here.
-	 */
-	pgd = pgdp + kvm_pgd_index(addr, ptrs_per_pgd);
-	do {
-		next = pgd_addr_end(addr, end);
-		if (!pgd_none(*pgd))
-			unmap_hyp_p4ds(pgd, addr, next);
-	} while (pgd++, addr = next, addr != end);
-}
-
-static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size)
-{
-	__unmap_hyp_range(pgdp, PTRS_PER_PGD, start, size);
-}
-
-static void unmap_hyp_idmap_range(pgd_t *pgdp, phys_addr_t start, u64 size)
-{
-	__unmap_hyp_range(pgdp, __kvm_idmap_ptrs_per_pgd(), start, size);
-}
-
 /**
  * free_hyp_pgds - free Hyp-mode page tables
- *
- * Assumes hyp_pgd is a page table used strictly in Hyp-mode and
- * therefore contains either mappings in the kernel memory area (above
- * PAGE_OFFSET), or device mappings in the idmap range.
- *
- * boot_hyp_pgd should only map the idmap range, and is only used in
- * the extended idmap case.
  */
 void free_hyp_pgds(void)
 {
-	pgd_t *id_pgd;
-
 	mutex_lock(&kvm_hyp_pgd_mutex);
-
-	id_pgd = boot_hyp_pgd ? boot_hyp_pgd : hyp_pgd;
-
-	if (id_pgd) {
-		/* In case we never called hyp_mmu_init() */
-		if (!io_map_base)
-			io_map_base = hyp_idmap_start;
-		unmap_hyp_idmap_range(id_pgd, io_map_base,
-				      hyp_idmap_start + PAGE_SIZE - io_map_base);
-	}
-
-	if (boot_hyp_pgd) {
-		free_pages((unsigned long)boot_hyp_pgd, hyp_pgd_order);
-		boot_hyp_pgd = NULL;
-	}
-
-	if (hyp_pgd) {
-		unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET),
-				(uintptr_t)high_memory - PAGE_OFFSET);
-
-		free_pages((unsigned long)hyp_pgd, hyp_pgd_order);
-		hyp_pgd = NULL;
+	if (hyp_pgtable) {
+		kvm_pgtable_hyp_destroy(hyp_pgtable);
+		kfree(hyp_pgtable);
 	}
-	if (merged_hyp_pgd) {
-		clear_page(merged_hyp_pgd);
-		free_page((unsigned long)merged_hyp_pgd);
-		merged_hyp_pgd = NULL;
-	}
-
 	mutex_unlock(&kvm_hyp_pgd_mutex);
 }
 
-static void create_hyp_pte_mappings(pmd_t *pmd, unsigned long start,
-				    unsigned long end, unsigned long pfn,
-				    pgprot_t prot)
-{
-	pte_t *pte;
-	unsigned long addr;
-
-	addr = start;
-	do {
-		pte = pte_offset_kernel(pmd, addr);
-		kvm_set_pte(pte, kvm_pfn_pte(pfn, prot));
-		get_page(virt_to_page(pte));
-		pfn++;
-	} while (addr += PAGE_SIZE, addr != end);
-}
-
-static int create_hyp_pmd_mappings(pud_t *pud, unsigned long start,
-				   unsigned long end, unsigned long pfn,
-				   pgprot_t prot)
+static int __create_hyp_mappings(unsigned long start, unsigned long size,
+				 unsigned long phys, enum kvm_pgtable_prot prot)
 {
-	pmd_t *pmd;
-	pte_t *pte;
-	unsigned long addr, next;
-
-	addr = start;
-	do {
-		pmd = pmd_offset(pud, addr);
-
-		BUG_ON(pmd_sect(*pmd));
-
-		if (pmd_none(*pmd)) {
-			pte = pte_alloc_one_kernel(NULL);
-			if (!pte) {
-				kvm_err("Cannot allocate Hyp pte\n");
-				return -ENOMEM;
-			}
-			kvm_pmd_populate(pmd, pte);
-			get_page(virt_to_page(pmd));
-		}
-
-		next = pmd_addr_end(addr, end);
-
-		create_hyp_pte_mappings(pmd, addr, next, pfn, prot);
-		pfn += (next - addr) >> PAGE_SHIFT;
-	} while (addr = next, addr != end);
-
-	return 0;
-}
-
-static int create_hyp_pud_mappings(p4d_t *p4d, unsigned long start,
-				   unsigned long end, unsigned long pfn,
-				   pgprot_t prot)
-{
-	pud_t *pud;
-	pmd_t *pmd;
-	unsigned long addr, next;
-	int ret;
-
-	addr = start;
-	do {
-		pud = pud_offset(p4d, addr);
-
-		if (pud_none_or_clear_bad(pud)) {
-			pmd = pmd_alloc_one(NULL, addr);
-			if (!pmd) {
-				kvm_err("Cannot allocate Hyp pmd\n");
-				return -ENOMEM;
-			}
-			kvm_pud_populate(pud, pmd);
-			get_page(virt_to_page(pud));
-		}
-
-		next = pud_addr_end(addr, end);
-		ret = create_hyp_pmd_mappings(pud, addr, next, pfn, prot);
-		if (ret)
-			return ret;
-		pfn += (next - addr) >> PAGE_SHIFT;
-	} while (addr = next, addr != end);
-
-	return 0;
-}
-
-static int create_hyp_p4d_mappings(pgd_t *pgd, unsigned long start,
-				   unsigned long end, unsigned long pfn,
-				   pgprot_t prot)
-{
-	p4d_t *p4d;
-	pud_t *pud;
-	unsigned long addr, next;
-	int ret;
-
-	addr = start;
-	do {
-		p4d = p4d_offset(pgd, addr);
-
-		if (p4d_none(*p4d)) {
-			pud = pud_alloc_one(NULL, addr);
-			if (!pud) {
-				kvm_err("Cannot allocate Hyp pud\n");
-				return -ENOMEM;
-			}
-			kvm_p4d_populate(p4d, pud);
-			get_page(virt_to_page(p4d));
-		}
-
-		next = p4d_addr_end(addr, end);
-		ret = create_hyp_pud_mappings(p4d, addr, next, pfn, prot);
-		if (ret)
-			return ret;
-		pfn += (next - addr) >> PAGE_SHIFT;
-	} while (addr = next, addr != end);
-
-	return 0;
-}
-
-static int __create_hyp_mappings(pgd_t *pgdp, unsigned long ptrs_per_pgd,
-				 unsigned long start, unsigned long end,
-				 unsigned long pfn, pgprot_t prot)
-{
-	pgd_t *pgd;
-	p4d_t *p4d;
-	unsigned long addr, next;
-	int err = 0;
+	int err;
 
 	mutex_lock(&kvm_hyp_pgd_mutex);
-	addr = start & PAGE_MASK;
-	end = PAGE_ALIGN(end);
-	do {
-		pgd = pgdp + kvm_pgd_index(addr, ptrs_per_pgd);
-
-		if (pgd_none(*pgd)) {
-			p4d = p4d_alloc_one(NULL, addr);
-			if (!p4d) {
-				kvm_err("Cannot allocate Hyp p4d\n");
-				err = -ENOMEM;
-				goto out;
-			}
-			kvm_pgd_populate(pgd, p4d);
-			get_page(virt_to_page(pgd));
-		}
-
-		next = pgd_addr_end(addr, end);
-		err = create_hyp_p4d_mappings(pgd, addr, next, pfn, prot);
-		if (err)
-			goto out;
-		pfn += (next - addr) >> PAGE_SHIFT;
-	} while (addr = next, addr != end);
-out:
+	err = kvm_pgtable_hyp_map(hyp_pgtable, start, size, phys, prot);
 	mutex_unlock(&kvm_hyp_pgd_mutex);
+
 	return err;
 }
 
@@ -845,7 +532,7 @@ static phys_addr_t kvm_kaddr_to_phys(void *kaddr)
  * in Hyp-mode mapping (modulo HYP_PAGE_OFFSET) to the same underlying
  * physical pages.
  */
-int create_hyp_mappings(void *from, void *to, pgprot_t prot)
+int create_hyp_mappings(void *from, void *to, enum kvm_pgtable_prot prot)
 {
 	phys_addr_t phys_addr;
 	unsigned long virt_addr;
@@ -862,9 +549,7 @@ int create_hyp_mappings(void *from, void *to, pgprot_t prot)
 		int err;
 
 		phys_addr = kvm_kaddr_to_phys(from + virt_addr - start);
-		err = __create_hyp_mappings(hyp_pgd, PTRS_PER_PGD,
-					    virt_addr, virt_addr + PAGE_SIZE,
-					    __phys_to_pfn(phys_addr),
+		err = __create_hyp_mappings(virt_addr, PAGE_SIZE, phys_addr,
 					    prot);
 		if (err)
 			return err;
@@ -874,9 +559,9 @@ int create_hyp_mappings(void *from, void *to, pgprot_t prot)
 }
 
 static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
-					unsigned long *haddr, pgprot_t prot)
+					unsigned long *haddr,
+					enum kvm_pgtable_prot prot)
 {
-	pgd_t *pgd = hyp_pgd;
 	unsigned long base;
 	int ret = 0;
 
@@ -908,17 +593,11 @@ static int __create_hyp_private_mapping(phys_addr_t phys_addr, size_t size,
 	if (ret)
 		goto out;
 
-	if (__kvm_cpu_uses_extended_idmap())
-		pgd = boot_hyp_pgd;
-
-	ret = __create_hyp_mappings(pgd, __kvm_idmap_ptrs_per_pgd(),
-				    base, base + size,
-				    __phys_to_pfn(phys_addr), prot);
+	ret = __create_hyp_mappings(base, size, phys_addr, prot);
 	if (ret)
 		goto out;
 
 	*haddr = base + offset_in_page(phys_addr);
-
 out:
 	return ret;
 }
@@ -2326,10 +2005,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
 
 phys_addr_t kvm_mmu_get_httbr(void)
 {
-	if (__kvm_cpu_uses_extended_idmap())
-		return virt_to_phys(merged_hyp_pgd);
-	else
-		return virt_to_phys(hyp_pgd);
+	return __pa(hyp_pgtable->pgd);
 }
 
 phys_addr_t kvm_get_idmap_vector(void)
@@ -2337,15 +2013,11 @@ phys_addr_t kvm_get_idmap_vector(void)
 	return hyp_idmap_vector;
 }
 
-static int kvm_map_idmap_text(pgd_t *pgd)
+static int kvm_map_idmap_text(void)
 {
-	int err;
-
-	/* Create the idmap in the boot page tables */
-	err = 	__create_hyp_mappings(pgd, __kvm_idmap_ptrs_per_pgd(),
-				      hyp_idmap_start, hyp_idmap_end,
-				      __phys_to_pfn(hyp_idmap_start),
-				      PAGE_HYP_EXEC);
+	unsigned long size = hyp_idmap_end - hyp_idmap_start;
+	int err = __create_hyp_mappings(hyp_idmap_start, size, hyp_idmap_start,
+					PAGE_HYP_EXEC);
 	if (err)
 		kvm_err("Failed to idmap %lx-%lx\n",
 			hyp_idmap_start, hyp_idmap_end);
@@ -2356,6 +2028,7 @@ static int kvm_map_idmap_text(pgd_t *pgd)
 int kvm_mmu_init(void)
 {
 	int err;
+	u32 hyp_va_bits;
 
 	hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
 	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
@@ -2369,6 +2042,8 @@ int kvm_mmu_init(void)
 	 */
 	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
 
+	hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
+	kvm_debug("Using %u-bit virtual addresses at EL2\n", hyp_va_bits);
 	kvm_debug("IDMAP page: %lx\n", hyp_idmap_start);
 	kvm_debug("HYP VA range: %lx:%lx\n",
 		  kern_hyp_va(PAGE_OFFSET),
@@ -2386,43 +2061,30 @@ int kvm_mmu_init(void)
 		goto out;
 	}
 
-	hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, hyp_pgd_order);
-	if (!hyp_pgd) {
-		kvm_err("Hyp mode PGD not allocated\n");
+	hyp_pgtable = kzalloc(sizeof(*hyp_pgtable), GFP_KERNEL);
+	if (!hyp_pgtable) {
+		kvm_err("Hyp mode page-table not allocated\n");
 		err = -ENOMEM;
 		goto out;
 	}
 
-	if (__kvm_cpu_uses_extended_idmap()) {
-		boot_hyp_pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-							 hyp_pgd_order);
-		if (!boot_hyp_pgd) {
-			kvm_err("Hyp boot PGD not allocated\n");
-			err = -ENOMEM;
-			goto out;
-		}
-
-		err = kvm_map_idmap_text(boot_hyp_pgd);
-		if (err)
-			goto out;
+	err = kvm_pgtable_hyp_init(hyp_pgtable, hyp_va_bits);
+	if (err)
+		goto out_free_pgtable;
 
-		merged_hyp_pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
-		if (!merged_hyp_pgd) {
-			kvm_err("Failed to allocate extra HYP pgd\n");
-			goto out;
-		}
-		__kvm_extend_hypmap(boot_hyp_pgd, hyp_pgd, merged_hyp_pgd,
-				    hyp_idmap_start);
-	} else {
-		err = kvm_map_idmap_text(hyp_pgd);
-		if (err)
-			goto out;
-	}
+	err = kvm_map_idmap_text();
+	if (err)
+		goto out_destroy_pgtable;
 
 	io_map_base = hyp_idmap_start;
 	return 0;
+
+out_destroy_pgtable:
+	kvm_pgtable_hyp_destroy(hyp_pgtable);
+out_free_pgtable:
+	kfree(hyp_pgtable);
+	hyp_pgtable = NULL;
 out:
-	free_hyp_pgds();
 	return err;
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog


* [PATCH v3 05/21] KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (3 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-02  6:40   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table Will Deacon
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Introduce alloc() and free() functions to the generic page-table code
for guest stage-2 page-tables and plumb these into the existing KVM
page-table allocator. Subsequent patches will convert other operations
within the KVM allocator over to the generic code.
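
The size of the stage-2 PGD allocation falls out of kvm_pgd_pages() from the
walker code. As a worked example (assuming 4KiB pages, a 40-bit IPA space and
a start level of 1): kvm_granule_shift(start_level - 1) = (4 - 0) * 9 + 3 = 39,
so __kvm_pgd_page_idx(pgt, -1ULL) = (2^40 - 1) >> 39 = 1 and kvm_pgd_pages()
returns 2, i.e. the PGD is two concatenated pages for the initial level of
lookup.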

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h    |  1 +
 arch/arm64/include/asm/kvm_pgtable.h | 18 +++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 51 ++++++++++++++++++++++++++
 arch/arm64/kvm/mmu.c                 | 55 +++++++++++++++-------------
 4 files changed, 99 insertions(+), 26 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e52c927aade5..0b7c702b2151 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -81,6 +81,7 @@ struct kvm_s2_mmu {
 	 */
 	pgd_t		*pgd;
 	phys_addr_t	pgd_phys;
+	struct kvm_pgtable *pgt;
 
 	/* The last vcpu id that ran on each physical CPU */
 	int __percpu *last_vcpu_ran;
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 2af84ab78cb8..3389f978d573 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -116,6 +116,24 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
 int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 			enum kvm_pgtable_prot prot);
 
+/**
+ * kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
+ * @pgt:	Uninitialised page-table structure to initialise.
+ * @kvm:	KVM structure representing the guest virtual machine.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
+
+/**
+ * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ *
+ * The page-table is assumed to be unreachable by any hardware walkers prior
+ * to freeing and therefore no TLB invalidation is performed.
+ */
+void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
+
 /**
  * kvm_pgtable_walk() - Walk a page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index d75166823ad9..b8550ccaef4d 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -419,3 +419,54 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	free_page((unsigned long)pgt->pgd);
 	pgt->pgd = NULL;
 }
+
+int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
+{
+	size_t pgd_sz;
+	u64 vtcr = kvm->arch.vtcr;
+	u32 ia_bits = VTCR_EL2_IPA(vtcr);
+	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
+	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
+
+	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
+	pgt->pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL | __GFP_ZERO);
+	if (!pgt->pgd)
+		return -ENOMEM;
+
+	pgt->ia_bits		= ia_bits;
+	pgt->start_level	= start_level;
+	pgt->mmu		= &kvm->arch.mmu;
+	return 0;
+}
+
+static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			      enum kvm_pgtable_walk_flags flag,
+			      void * const arg)
+{
+	kvm_pte_t pte = *ptep;
+
+	if (!kvm_pte_valid(pte))
+		return 0;
+
+	put_page(virt_to_page(ptep));
+
+	if (kvm_pte_table(pte, level))
+		free_page((unsigned long)kvm_pte_follow(pte));
+
+	return 0;
+}
+
+void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+{
+	size_t pgd_sz;
+	struct kvm_pgtable_walker walker = {
+		.cb	= stage2_free_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF |
+			  KVM_PGTABLE_WALK_TABLE_POST,
+	};
+
+	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
+	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
+	free_pages_exact(pgt->pgd, pgd_sz);
+	pgt->pgd = NULL;
+}
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index fabd72b0c8a4..4607e9ca60a2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -668,47 +668,49 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
  * @kvm:	The pointer to the KVM structure
  * @mmu:	The pointer to the s2 MMU structure
  *
- * Allocates only the stage-2 HW PGD level table(s) of size defined by
- * stage2_pgd_size(mmu->kvm).
- *
+ * Allocates only the stage-2 HW PGD level table(s).
  * Note we don't need locking here as this is only called when the VM is
  * created, which can only be done once.
  */
 int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 {
-	phys_addr_t pgd_phys;
-	pgd_t *pgd;
-	int cpu;
+	int cpu, err;
+	struct kvm_pgtable *pgt;
 
-	if (mmu->pgd != NULL) {
+	if (mmu->pgt != NULL) {
 		kvm_err("kvm_arch already initialized?\n");
 		return -EINVAL;
 	}
 
-	/* Allocate the HW PGD, making sure that each page gets its own refcount */
-	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
-	if (!pgd)
+	pgt = kzalloc(sizeof(*pgt), GFP_KERNEL);
+	if (!pgt)
 		return -ENOMEM;
 
-	pgd_phys = virt_to_phys(pgd);
-	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
-		return -EINVAL;
+	err = kvm_pgtable_stage2_init(pgt, kvm);
+	if (err)
+		goto out_free_pgtable;
 
 	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
 	if (!mmu->last_vcpu_ran) {
-		free_pages_exact(pgd, stage2_pgd_size(kvm));
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto out_destroy_pgtable;
 	}
 
 	for_each_possible_cpu(cpu)
 		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
 
 	mmu->kvm = kvm;
-	mmu->pgd = pgd;
-	mmu->pgd_phys = pgd_phys;
+	mmu->pgt = pgt;
+	mmu->pgd_phys = __pa(pgt->pgd);
+	mmu->pgd = (void *)pgt->pgd;
 	mmu->vmid.vmid_gen = 0;
-
 	return 0;
+
+out_destroy_pgtable:
+	kvm_pgtable_stage2_destroy(pgt);
+out_free_pgtable:
+	kfree(pgt);
+	return err;
 }
 
 static void stage2_unmap_memslot(struct kvm *kvm,
@@ -781,20 +783,21 @@ void stage2_unmap_vm(struct kvm *kvm)
 void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 {
 	struct kvm *kvm = mmu->kvm;
-	void *pgd = NULL;
+	struct kvm_pgtable *pgt = NULL;
 
 	spin_lock(&kvm->mmu_lock);
-	if (mmu->pgd) {
-		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
-		pgd = READ_ONCE(mmu->pgd);
+	pgt = mmu->pgt;
+	if (pgt) {
 		mmu->pgd = NULL;
+		mmu->pgd_phys = 0;
+		mmu->pgt = NULL;
+		free_percpu(mmu->last_vcpu_ran);
 	}
 	spin_unlock(&kvm->mmu_lock);
 
-	/* Free the HW pgd, one page at a time */
-	if (pgd) {
-		free_pages_exact(pgd, stage2_pgd_size(kvm));
-		free_percpu(mmu->last_vcpu_ran);
+	if (pgt) {
+		kvm_pgtable_stage2_destroy(pgt);
+		kfree(pgt);
 	}
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (4 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 05/21] KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-01 16:24   ` Alexandru Elisei
                     ` (2 more replies)
  2020-08-25  9:39 ` [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API Will Deacon
                   ` (17 subsequent siblings)
  23 siblings, 3 replies; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Add stage-2 map() and unmap() operations to the generic page-table code.
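
For illustration, a caller of the new API might look roughly like the
sketch below. This is not part of the patch: 'pgt', 'ipa', 'pa' and
'memcache' are hypothetical, and the table is assumed to have been set
up with kvm_pgtable_stage2_init() and the memcache topped up beforehand.

  enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R | KVM_PGTABLE_PROT_W;
  int ret;

  /* Install a single writable, non-executable page at 'ipa'. */
  spin_lock(&kvm->mmu_lock);
  ret = kvm_pgtable_stage2_map(pgt, ipa, PAGE_SIZE, pa, prot, memcache);
  spin_unlock(&kvm->mmu_lock);

  /* ... and tear it down again, freeing any unreferenced table pages. */
  if (!ret) {
          spin_lock(&kvm->mmu_lock);
          ret = kvm_pgtable_stage2_unmap(pgt, ipa, PAGE_SIZE);
          spin_unlock(&kvm->mmu_lock);
  }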

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
 arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
 2 files changed, 301 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 3389f978d573..8ab0d5f43817 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -134,6 +134,45 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
  */
 void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
 
+/**
+ * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address at which to place the mapping.
+ * @size:	Size of the mapping.
+ * @phys:	Physical address of the memory to map.
+ * @prot:	Permissions and attributes for the mapping.
+ * @mc:		Cache of pre-allocated GFP_PGTABLE_USER memory from which to
+ *		allocate page-table pages.
+ *
+ * If device attributes are not explicitly requested in @prot, then the
+ * mapping will be normal, cacheable.
+ *
+ * Note that this function will both coalesce existing table entries and split
+ * existing block mappings, relying on page-faults to fault back areas outside
+ * of the new mapping lazily.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
+			   u64 phys, enum kvm_pgtable_prot prot,
+			   struct kvm_mmu_memory_cache *mc);
+
+/**
+ * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address from which to remove the mapping.
+ * @size:	Size of the mapping.
+ *
+ * TLB invalidation is performed for each page-table entry cleared during the
+ * unmapping operation and the reference count for the page-table page
+ * containing the cleared entry is decremented, with unreferenced pages being
+ * freed. Unmapping a cacheable page will ensure that it is clean to the PoC if
+ * FWB is not supported by the CPU.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
 /**
  * kvm_pgtable_walk() - Walk a page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index b8550ccaef4d..41ee8f3c0369 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -32,10 +32,19 @@
 #define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
 #define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
 
+#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
+#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
+#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
+#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
+
 #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
 
 #define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
 
+#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
+
 struct kvm_pgtable_walk_data {
 	struct kvm_pgtable		*pgt;
 	struct kvm_pgtable_walker	*walker;
@@ -420,6 +429,259 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
 	pgt->pgd = NULL;
 }
 
+struct stage2_map_data {
+	u64				phys;
+	kvm_pte_t			attr;
+
+	kvm_pte_t			*anchor;
+
+	struct kvm_s2_mmu		*mmu;
+	struct kvm_mmu_memory_cache	*memcache;
+};
+
+static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
+{
+	kvm_pte_t *ptep = NULL;
+	struct kvm_mmu_memory_cache *mc = data->memcache;
+
+	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
+	if (mc && mc->nobjs)
+		ptep = mc->objects[--mc->nobjs];
+
+	return ptep;
+}
+
+static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
+				    struct stage2_map_data *data)
+{
+	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
+	kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
+			    PAGE_S2_MEMATTR(NORMAL);
+	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
+
+	if (!(prot & KVM_PGTABLE_PROT_X))
+		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
+	else if (device)
+		return -EINVAL;
+
+	if (prot & KVM_PGTABLE_PROT_R)
+		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
+
+	if (prot & KVM_PGTABLE_PROT_W)
+		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
+
+	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
+	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
+	data->attr = attr;
+	return 0;
+}
+
+static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
+				       kvm_pte_t *ptep,
+				       struct stage2_map_data *data)
+{
+	u64 granule = kvm_granule_size(level), phys = data->phys;
+
+	if (!kvm_block_mapping_supported(addr, end, phys, level))
+		return false;
+
+	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
+		goto out;
+
+	kvm_set_invalid_pte(ptep);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
+out:
+	data->phys += granule;
+	return true;
+}
+
+static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
+				     kvm_pte_t *ptep,
+				     struct stage2_map_data *data)
+{
+	if (data->anchor)
+		return 0;
+
+	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
+		return 0;
+
+	kvm_set_invalid_pte(ptep);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
+	data->anchor = ptep;
+	return 0;
+}
+
+static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+				struct stage2_map_data *data)
+{
+	kvm_pte_t *childp, pte = *ptep;
+	struct page *page = virt_to_page(ptep);
+
+	if (data->anchor) {
+		if (kvm_pte_valid(pte))
+			put_page(page);
+
+		return 0;
+	}
+
+	if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
+		goto out_get_page;
+
+	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
+		return -EINVAL;
+
+	childp = stage2_memcache_alloc_page(data);
+	if (!childp)
+		return -ENOMEM;
+
+	/*
+	 * If we've run into an existing block mapping then replace it with
+	 * a table. Accesses beyond 'end' that fall within the new table
+	 * will be mapped lazily.
+	 */
+	if (kvm_pte_valid(pte)) {
+		kvm_set_invalid_pte(ptep);
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
+		put_page(page);
+	}
+
+	kvm_set_table_pte(ptep, childp);
+
+out_get_page:
+	get_page(page);
+	return 0;
+}
+
+static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
+				      kvm_pte_t *ptep,
+				      struct stage2_map_data *data)
+{
+	int ret = 0;
+
+	if (!data->anchor)
+		return 0;
+
+	free_page((unsigned long)kvm_pte_follow(*ptep));
+	put_page(virt_to_page(ptep));
+
+	if (data->anchor == ptep) {
+		data->anchor = NULL;
+		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
+	}
+
+	return ret;
+}
+
+static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			     enum kvm_pgtable_walk_flags flag, void * const arg)
+{
+	struct stage2_map_data *data = arg;
+
+	switch (flag) {
+	case KVM_PGTABLE_WALK_TABLE_PRE:
+		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
+	case KVM_PGTABLE_WALK_LEAF:
+		return stage2_map_walk_leaf(addr, end, level, ptep, data);
+	case KVM_PGTABLE_WALK_TABLE_POST:
+		return stage2_map_walk_table_post(addr, end, level, ptep, data);
+	}
+
+	return -EINVAL;
+}
+
+int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
+			   u64 phys, enum kvm_pgtable_prot prot,
+			   struct kvm_mmu_memory_cache *mc)
+{
+	int ret;
+	struct stage2_map_data map_data = {
+		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
+		.mmu		= pgt->mmu,
+		.memcache	= mc,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_map_walker,
+		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
+				  KVM_PGTABLE_WALK_LEAF |
+				  KVM_PGTABLE_WALK_TABLE_POST,
+		.arg		= &map_data,
+	};
+
+	ret = stage2_map_set_prot_attr(prot, &map_data);
+	if (ret)
+		return ret;
+
+	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+	dsb(ishst);
+	return ret;
+}
+
+static void stage2_flush_dcache(void *addr, u64 size)
+{
+	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+		return;
+
+	__flush_dcache_area(addr, size);
+}
+
+static bool stage2_pte_cacheable(kvm_pte_t pte)
+{
+	u64 memattr = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR, pte);
+	return memattr == PAGE_S2_MEMATTR(NORMAL);
+}
+
+static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			       enum kvm_pgtable_walk_flags flag,
+			       void * const arg)
+{
+	struct kvm_s2_mmu *mmu = arg;
+	kvm_pte_t pte = *ptep, *childp = NULL;
+	bool need_flush = false;
+
+	if (!kvm_pte_valid(pte))
+		return 0;
+
+	if (kvm_pte_table(pte, level)) {
+		childp = kvm_pte_follow(pte);
+
+		if (page_count(virt_to_page(childp)) != 1)
+			return 0;
+	} else if (stage2_pte_cacheable(pte)) {
+		need_flush = true;
+	}
+
+	/*
+	 * This is similar to the map() path in that we unmap the entire
+	 * block entry and rely on the remaining portions being faulted
+	 * back lazily.
+	 */
+	kvm_set_invalid_pte(ptep);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
+	put_page(virt_to_page(ptep));
+
+	if (need_flush) {
+		stage2_flush_dcache(kvm_pte_follow(pte),
+				    kvm_granule_size(level));
+	}
+
+	if (childp)
+		free_page((unsigned long)childp);
+
+	return 0;
+}
+
+int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= stage2_unmap_walker,
+		.arg	= pgt->mmu,
+		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
+	};
+
+	return kvm_pgtable_walk(pgt, addr, size, &walker);
+}
+
 int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
 {
 	size_t pgd_sz;
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (5 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-01 17:08   ` Alexandru Elisei
  2020-09-03  3:57   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() " Will Deacon
                   ` (16 subsequent siblings)
  23 siblings, 2 replies; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Convert kvm_phys_addr_ioremap() to use kvm_pgtable_stage2_map() instead
of stage2_set_pte().
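
The conversion follows the usual memcache pattern: page-table pages may
be needed while the mmu_lock is held, so the cache is topped up (which
may sleep) outside the lock and then handed to the map call. A rough
sketch of a single iteration, with hypothetical 'cache', 'pgt', 'addr'
and 'pa' variables:

  ret = kvm_mmu_topup_memory_cache(&cache, kvm_mmu_cache_min_pages(kvm));
  if (!ret) {
          spin_lock(&kvm->mmu_lock);
          ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa,
                                       KVM_PGTABLE_PROT_DEVICE |
                                       KVM_PGTABLE_PROT_R, &cache);
          spin_unlock(&kvm->mmu_lock);
  }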

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/hyp/pgtable.c | 14 +-------------
 arch/arm64/kvm/mmu.c         | 29 ++++++++++++-----------------
 2 files changed, 13 insertions(+), 30 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 41ee8f3c0369..6f65d3841ec9 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -439,18 +439,6 @@ struct stage2_map_data {
 	struct kvm_mmu_memory_cache	*memcache;
 };
 
-static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
-{
-	kvm_pte_t *ptep = NULL;
-	struct kvm_mmu_memory_cache *mc = data->memcache;
-
-	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
-	if (mc && mc->nobjs)
-		ptep = mc->objects[--mc->nobjs];
-
-	return ptep;
-}
-
 static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
 				    struct stage2_map_data *data)
 {
@@ -531,7 +519,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
 		return -EINVAL;
 
-	childp = stage2_memcache_alloc_page(data);
+	childp = kvm_mmu_memory_cache_alloc(data->memcache);
 	if (!childp)
 		return -ENOMEM;
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4607e9ca60a2..33146d3dc93a 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1154,35 +1154,30 @@ static int stage2_pudp_test_and_clear_young(pud_t *pud)
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 			  phys_addr_t pa, unsigned long size, bool writable)
 {
-	phys_addr_t addr, end;
+	phys_addr_t addr;
 	int ret = 0;
-	unsigned long pfn;
 	struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
+	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
+				     KVM_PGTABLE_PROT_R |
+				     (writable ? KVM_PGTABLE_PROT_W : 0);
 
-	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
-	pfn = __phys_to_pfn(pa);
-
-	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
-		pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE);
-
-		if (writable)
-			pte = kvm_s2pte_mkwrite(pte);
-
+	for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
 		ret = kvm_mmu_topup_memory_cache(&cache,
 						 kvm_mmu_cache_min_pages(kvm));
 		if (ret)
-			goto out;
+			break;
+
 		spin_lock(&kvm->mmu_lock);
-		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
-				     KVM_S2PTE_FLAG_IS_IOMAP);
+		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
+					     &cache);
 		spin_unlock(&kvm->mmu_lock);
 		if (ret)
-			goto out;
+			break;
 
-		pfn++;
+		pa += PAGE_SIZE;
 	}
 
-out:
 	kvm_mmu_free_memory_cache(&cache);
 	return ret;
 }
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() to generic page-table API
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (6 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-02 15:37   ` Alexandru Elisei
  2020-09-03  4:13   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() " Will Deacon
                   ` (15 subsequent siblings)
  23 siblings, 2 replies; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Convert kvm_set_spte_hva() to use kvm_pgtable_stage2_map() instead
of stage2_set_pte().

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 33146d3dc93a..704b471a48ce 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1911,28 +1911,27 @@ int kvm_unmap_hva_range(struct kvm *kvm,
 
 static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-	pte_t *pte = (pte_t *)data;
+	kvm_pfn_t *pfn = (kvm_pfn_t *)data;
 
 	WARN_ON(size != PAGE_SIZE);
+
 	/*
-	 * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
-	 * flag clear because MMU notifiers will have unmapped a huge PMD before
-	 * calling ->change_pte() (which in turn calls kvm_set_spte_hva()) and
-	 * therefore stage2_set_pte() never needs to clear out a huge PMD
-	 * through this calling path.
+	 * The MMU notifiers will have unmapped a huge PMD before calling
+	 * ->change_pte() (which in turn calls kvm_set_spte_hva()), so we
+	 * never need to clear out a huge PMD through this calling path
+	 * and a memcache is not required.
 	 */
-	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
+	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, gpa, PAGE_SIZE,
+			       __pfn_to_phys(*pfn), KVM_PGTABLE_PROT_R, NULL);
 	return 0;
 }
 
-
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 {
 	unsigned long end = hva + PAGE_SIZE;
 	kvm_pfn_t pfn = pte_pfn(pte);
-	pte_t stage2_pte;
 
-	if (!kvm->arch.mmu.pgd)
+	if (!kvm->arch.mmu.pgt)
 		return 0;
 
 	trace_kvm_set_spte_hva(hva);
@@ -1942,9 +1941,7 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	 * just like a translation fault and clean the cache to the PoC.
 	 */
 	clean_dcache_guest_page(pfn, PAGE_SIZE);
-	stage2_pte = kvm_pfn_pte(pfn, PAGE_S2);
-	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
-
+	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
 	return 0;
 }
 
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (7 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() " Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-02 16:23   ` Alexandru Elisei
  2020-09-03  4:19   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table Will Deacon
                   ` (14 subsequent siblings)
  23 siblings, 2 replies; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
of walking the page-table directly.
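
For readability, a hypothetical non-macro rendering of the
stage2_apply_range() helper introduced below is shown here; it is only
meant to make the control flow easier to follow and is not part of the
patch:

  static int __stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
                                  phys_addr_t end,
                                  int (*fn)(struct kvm_pgtable *, u64, u64),
                                  bool resched)
  {
          int ret = 0;

          while (addr < end) {
                  struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
                  phys_addr_t next;

                  /* The table may have been freed while the lock was dropped. */
                  if (!pgt)
                          break;

                  next = stage2_pgd_addr_end(kvm, addr, end);
                  ret = fn(pgt, addr, next - addr);
                  if (ret)
                          break;

                  if (resched && next != end)
                          cond_resched_lock(&kvm->mmu_lock);

                  addr = next;
          }

          return ret;
  }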

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
 1 file changed, 32 insertions(+), 25 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 704b471a48ce..751ce2462765 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
 	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
 }
 
+/*
+ * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
+ * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
+ * CONFIG_LOCKUP_DETECTOR or CONFIG_LOCKDEP enabled. Additionally, holding the
+ * lock for too long will also starve other vCPUs. We also have to make sure
+ * that the page tables are not freed while the lock is released.
+ */
+#define stage2_apply_range(kvm, addr, end, fn, resched)			\
+({									\
+	int ret;							\
+	struct kvm *__kvm = (kvm);					\
+	bool __resched = (resched);					\
+	u64 next, __addr = (addr), __end = (end);			\
+	do {								\
+		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
+		if (!pgt)						\
+			break;						\
+		next = stage2_pgd_addr_end(__kvm, __addr, __end);	\
+		ret = fn(pgt, __addr, next - __addr);			\
+		if (ret)						\
+			break;						\
+		if (__resched && next != __end)				\
+			cond_resched_lock(&__kvm->mmu_lock);		\
+	} while (__addr = next, __addr != __end);			\
+	ret;								\
+})
+
 static bool memslot_is_logging(struct kvm_memory_slot *memslot)
 {
 	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
@@ -220,8 +247,8 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
  * end up writing old data to disk.
  *
  * This is why right after unmapping a page/section and invalidating
- * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
- * the IO subsystem will never hit in the cache.
+ * the corresponding TLBs, we flush to make sure the IO subsystem will
+ * never hit in the cache.
  *
  * This is all avoided on systems that have ARM64_HAS_STAGE2_FWB, as
  * we then fully enforce cacheability of RAM, no matter what the guest
@@ -344,32 +371,12 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
 				 bool may_block)
 {
 	struct kvm *kvm = mmu->kvm;
-	pgd_t *pgd;
-	phys_addr_t addr = start, end = start + size;
-	phys_addr_t next;
+	phys_addr_t end = start + size;
 
 	assert_spin_locked(&kvm->mmu_lock);
 	WARN_ON(size & ~PAGE_MASK);
-
-	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
-	do {
-		/*
-		 * Make sure the page table is still active, as another thread
-		 * could have possibly freed the page table, while we released
-		 * the lock.
-		 */
-		if (!READ_ONCE(mmu->pgd))
-			break;
-		next = stage2_pgd_addr_end(kvm, addr, end);
-		if (!stage2_pgd_none(kvm, *pgd))
-			unmap_stage2_p4ds(mmu, pgd, addr, next);
-		/*
-		 * If the range is too large, release the kvm->mmu_lock
-		 * to prevent starvation and lockup detector warnings.
-		 */
-		if (may_block && next != end)
-			cond_resched_lock(&kvm->mmu_lock);
-	} while (pgd++, addr = next, addr != end);
+	WARN_ON(stage2_apply_range(kvm, start, end, kvm_pgtable_stage2_unmap,
+				   may_block));
 }
 
 static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (8 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() " Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  4:33   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 11/21] KVM: arm64: Convert page-aging and access faults to generic page-table API Will Deacon
                   ` (13 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Add stage-2 mkyoung(), mkold() and is_young() operations to the generic
page-table code.
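
As a rough usage sketch (hypothetical caller, mirroring what a later
patch does in the MMU notifier handlers), the raw PTE returned by these
helpers can be converted and tested with the usual pte accessors:

  /* Test-and-clear the access flag for the page translating 'gpa'. */
  kvm_pte_t kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt, gpa);
  pte_t pte = __pte(kpte);
  bool was_young = pte_valid(pte) && pte_young(pte);

  /* TLB invalidation is deliberately left to the MMU notifier caller. */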

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 86 ++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8ab0d5f43817..ae56534f87a0 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -173,6 +173,44 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
  */
 int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
 
+/**
+ * kvm_pgtable_stage2_mkyoung() - Set the access flag in a page-table entry.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address to identify the page-table entry.
+ *
+ * If there is a valid, leaf page-table entry used to translate @addr, then
+ * set the access flag in that entry.
+ *
+ * Return: The old page-table entry prior to setting the flag, 0 on failure.
+ */
+kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
+
+/**
+ * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address to identify the page-table entry.
+ *
+ * If there is a valid, leaf page-table entry used to translate @addr, then
+ * clear the access flag in that entry.
+ *
+ * Note that it is the caller's responsibility to invalidate the TLB after
+ * calling this function to ensure that the updated permissions are visible
+ * to the CPUs.
+ *
+ * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ */
+kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+
+/**
+ * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
+ *				   access flag set.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address to identify the page-table entry.
+ *
+ * Return: True if the page-table entry has the access flag set, false otherwise.
+ */
+bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
+
 /**
  * kvm_pgtable_walk() - Walk a page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 6f65d3841ec9..30713eb773e0 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -670,6 +670,92 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
 	return kvm_pgtable_walk(pgt, addr, size, &walker);
 }
 
+struct stage2_attr_data {
+	kvm_pte_t	attr_set;
+	kvm_pte_t	attr_clr;
+	kvm_pte_t	pte;
+};
+
+static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			      enum kvm_pgtable_walk_flags flag,
+			      void * const arg)
+{
+	kvm_pte_t pte = *ptep;
+	struct stage2_attr_data *data = arg;
+
+	if (!kvm_pte_valid(pte))
+		return 0;
+
+	data->pte = pte;
+	pte &= ~data->attr_clr;
+	pte |= data->attr_set;
+
+	/*
+	 * We may race with the CPU trying to set the access flag here,
+	 * but worst-case the access flag update gets lost and will be
+	 * set on the next access instead.
+	 */
+	if (data->pte != pte)
+		WRITE_ONCE(*ptep, pte);
+
+	return 0;
+}
+
+static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
+				    u64 size, kvm_pte_t attr_set,
+				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte)
+{
+	int ret;
+	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
+	struct stage2_attr_data data = {
+		.attr_set	= attr_set & attr_mask,
+		.attr_clr	= attr_clr & attr_mask,
+	};
+	struct kvm_pgtable_walker walker = {
+		.cb		= stage2_attr_walker,
+		.arg		= &data,
+		.flags		= KVM_PGTABLE_WALK_LEAF,
+	};
+
+	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
+	if (ret)
+		return ret;
+
+	if (orig_pte)
+		*orig_pte = data.pte;
+	return 0;
+}
+
+kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
+{
+	kvm_pte_t pte = 0;
+	stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
+				 &pte);
+	dsb(ishst);
+	return pte;
+}
+
+kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+{
+	kvm_pte_t pte = 0;
+	stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
+				 &pte);
+	/*
+	 * "But where's the TLBI?!", you scream.
+	 * "Over in the core code", I sigh.
+	 *
+	 * See the '->clear_flush_young()' callback on the KVM mmu notifier.
+	 */
+	return pte;
+}
+
+bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+{
+	kvm_pte_t pte = 0;
+	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte);
+	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+}
+
 int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
 {
 	size_t pgd_sz;
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 11/21] KVM: arm64: Convert page-aging and access faults to generic page-table API
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (9 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  4:37   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 12/21] KVM: arm64: Add support for stage-2 write-protect in generic page-table Will Deacon
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Convert the page-aging functions and access fault handler to use the
generic page-table code instead of walking the page-table directly.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 74 ++++++++++----------------------------------
 1 file changed, 16 insertions(+), 58 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 751ce2462765..d3db8e00ce0a 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1698,46 +1698,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	return ret;
 }
 
-/*
- * Resolve the access fault by making the page young again.
- * Note that because the faulting entry is guaranteed not to be
- * cached in the TLB, we don't need to invalidate anything.
- * Only the HW Access Flag updates are supported for Stage 2 (no DBM),
- * so there is no need for atomic (pte|pmd)_mkyoung operations.
- */
+/* Resolve the access fault by making the page young again. */
 static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 {
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-	kvm_pfn_t pfn;
-	bool pfn_valid = false;
+	pte_t pte;
+	kvm_pte_t kpte;
+	struct kvm_s2_mmu *mmu;
 
 	trace_kvm_access_fault(fault_ipa);
 
 	spin_lock(&vcpu->kvm->mmu_lock);
-
-	if (!stage2_get_leaf_entry(vcpu->arch.hw_mmu, fault_ipa, &pud, &pmd, &pte))
-		goto out;
-
-	if (pud) {		/* HugeTLB */
-		*pud = kvm_s2pud_mkyoung(*pud);
-		pfn = kvm_pud_pfn(*pud);
-		pfn_valid = true;
-	} else	if (pmd) {	/* THP, HugeTLB */
-		*pmd = pmd_mkyoung(*pmd);
-		pfn = pmd_pfn(*pmd);
-		pfn_valid = true;
-	} else {
-		*pte = pte_mkyoung(*pte);	/* Just a page... */
-		pfn = pte_pfn(*pte);
-		pfn_valid = true;
-	}
-
-out:
+	mmu = vcpu->arch.hw_mmu;
+	kpte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
 	spin_unlock(&vcpu->kvm->mmu_lock);
-	if (pfn_valid)
-		kvm_set_pfn_accessed(pfn);
+
+	pte = __pte(kpte);
+	if (pte_valid(pte))
+		kvm_set_pfn_accessed(pte_pfn(pte));
 }
 
 /**
@@ -1954,38 +1931,19 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 
 static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
+	pte_t pte;
+	kvm_pte_t kpte;
 
 	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
-		return 0;
-
-	if (pud)
-		return stage2_pudp_test_and_clear_young(pud);
-	else if (pmd)
-		return stage2_pmdp_test_and_clear_young(pmd);
-	else
-		return stage2_ptep_test_and_clear_young(pte);
+	kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt, gpa);
+	pte = __pte(kpte);
+	return pte_valid(pte) && pte_young(pte);
 }
 
 static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
 {
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
-
 	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
-		return 0;
-
-	if (pud)
-		return kvm_s2pud_young(*pud);
-	else if (pmd)
-		return pmd_young(*pmd);
-	else
-		return pte_young(*pte);
+	return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt, gpa);
 }
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 12/21] KVM: arm64: Add support for stage-2 write-protect in generic page-table
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (10 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 11/21] KVM: arm64: Convert page-aging and access faults to generic page-table API Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  4:47   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 13/21] KVM: arm64: Convert write-protect operation to generic page-table API Will Deacon
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Add a stage-2 wrprotect() operation to the generic page-table code.
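
A hypothetical caller sketch (not part of this patch): because the
helper performs no TLB invalidation itself, a caller such as the dirty
logging path is expected to flush afterwards, e.g. with the existing
kvm_flush_remote_tlbs() helper.

  spin_lock(&kvm->mmu_lock);
  kvm_pgtable_stage2_wrprotect(pgt, addr, end - addr);
  spin_unlock(&kvm->mmu_lock);

  /* The entries are now read-only; make that visible to the CPUs. */
  kvm_flush_remote_tlbs(kvm);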

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h | 15 +++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         |  6 ++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index ae56534f87a0..0c96b78d791d 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -173,6 +173,21 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
  */
 int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
 
+/**
+ * kvm_pgtable_stage2_wrprotect() - Write-protect guest stage-2 address range
+ *                                  without TLB invalidation.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address from which to write-protect.
+ * @size:	Size of the range.
+ *
+ * Note that it is the caller's responsibility to invalidate the TLB after
+ * calling this function to ensure that the updated permissions are visible
+ * to the CPUs.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
 /**
  * kvm_pgtable_stage2_mkyoung() - Set the access flag in a page-table entry.
  * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 30713eb773e0..c218651f8eba 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -726,6 +726,12 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
 	return 0;
 }
 
+int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	return stage2_update_leaf_attrs(pgt, addr, size, 0,
+					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W, NULL);
+}
+
 kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
 {
 	kvm_pte_t pte = 0;
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 13/21] KVM: arm64: Convert write-protect operation to generic page-table API
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (11 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 12/21] KVM: arm64: Add support for stage-2 write-protect in generic page-table Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  4:48   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 14/21] KVM: arm64: Add support for stage-2 cache flushing in generic page-table Will Deacon
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Convert stage2_wp_range() to call the kvm_pgtable_stage2_wrprotect()
function of the generic page-table code instead of walking the page-table
directly.

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d3db8e00ce0a..ca2c37c91e0b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -66,6 +66,9 @@ static bool is_iomap(unsigned long flags)
 	ret;								\
 })
 
+#define stage2_apply_range_resched(kvm, addr, end, fn)			\
+	stage2_apply_range(kvm, addr, end, fn, true)
+
 static bool memslot_is_logging(struct kvm_memory_slot *memslot)
 {
 	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
@@ -1294,27 +1297,7 @@ static void  stage2_wp_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
 {
 	struct kvm *kvm = mmu->kvm;
-	pgd_t *pgd;
-	phys_addr_t next;
-
-	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
-	do {
-		/*
-		 * Release kvm_mmu_lock periodically if the memory region is
-		 * large. Otherwise, we may see kernel panics with
-		 * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
-		 * CONFIG_LOCKDEP. Additionally, holding the lock too long
-		 * will also starve other vCPUs. We have to also make sure
-		 * that the page tables are not freed while we released
-		 * the lock.
-		 */
-		cond_resched_lock(&kvm->mmu_lock);
-		if (!READ_ONCE(mmu->pgd))
-			break;
-		next = stage2_pgd_addr_end(kvm, addr, end);
-		if (stage2_pgd_present(kvm, *pgd))
-			stage2_wp_p4ds(mmu, pgd, addr, next);
-	} while (pgd++, addr = next, addr != end);
+	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
 }
 
 /**
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 14/21] KVM: arm64: Add support for stage-2 cache flushing in generic page-table
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (12 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 13/21] KVM: arm64: Convert write-protect operation to generic page-table API Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  4:51   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 15/21] KVM: arm64: Convert memslot cache-flushing code to generic page-table API Will Deacon
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Add support for cache flushing a range of the stage-2 address space to
the generic page-table code.
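
A simplified usage sketch (hypothetical variables, without the lock
rescheduling that the real caller uses): flush the range backing a
memslot, with the walk becoming a no-op on FWB-capable CPUs.

  phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
  phys_addr_t size = PAGE_SIZE * memslot->npages;

  spin_lock(&kvm->mmu_lock);
  ret = kvm_pgtable_stage2_flush(kvm->arch.mmu.pgt, addr, size);
  spin_unlock(&kvm->mmu_lock);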

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h | 12 ++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 26 ++++++++++++++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 0c96b78d791d..ea823fe31913 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -226,6 +226,18 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
  */
 bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
 
+/**
+ * kvm_pgtable_stage2_flush() - Clean and invalidate data cache to Point
+ *				 of Coherency for guest stage-2 address
+ *				 range.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address from which to flush.
+ * @size:	Size of the range.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
+
 /**
  * kvm_pgtable_walk() - Walk a page-table.
  * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index c218651f8eba..75887185f1e2 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -762,6 +762,32 @@ bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
 	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
 }
 
+static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
+			       enum kvm_pgtable_walk_flags flag,
+			       void * const arg)
+{
+	kvm_pte_t pte = *ptep;
+
+	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pte))
+		return 0;
+
+	stage2_flush_dcache(kvm_pte_follow(pte), kvm_granule_size(level));
+	return 0;
+}
+
+int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
+{
+	struct kvm_pgtable_walker walker = {
+		.cb	= stage2_flush_walker,
+		.flags	= KVM_PGTABLE_WALK_LEAF,
+	};
+
+	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+		return 0;
+
+	return kvm_pgtable_walk(pgt, addr, size, &walker);
+}
+
 int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
 {
 	size_t pgd_sz;
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 15/21] KVM: arm64: Convert memslot cache-flushing code to generic page-table API
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (13 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 14/21] KVM: arm64: Add support for stage-2 cache flushing in generic page-table Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  4:52   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 16/21] KVM: arm64: Add support for relaxing stage-2 perms in generic page-table code Will Deacon
                   ` (8 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

From: Quentin Perret <qperret@google.com>

Convert stage2_flush_memslot() to call the kvm_pgtable_stage2_flush()
function of the generic page-table code instead of walking the page-table
directly.

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ca2c37c91e0b..d4b0716a6ab4 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -455,21 +455,10 @@ static void stage2_flush_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
 static void stage2_flush_memslot(struct kvm *kvm,
 				 struct kvm_memory_slot *memslot)
 {
-	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
 	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
-	phys_addr_t next;
-	pgd_t *pgd;
-
-	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
-	do {
-		next = stage2_pgd_addr_end(kvm, addr, end);
-		if (!stage2_pgd_none(kvm, *pgd))
-			stage2_flush_p4ds(mmu, pgd, addr, next);
 
-		if (next != end)
-			cond_resched_lock(&kvm->mmu_lock);
-	} while (pgd++, addr = next, addr != end);
+	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_flush);
 }
 
 /**
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 16/21] KVM: arm64: Add support for relaxing stage-2 perms in generic page-table code
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (14 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 15/21] KVM: arm64: Convert memslot cache-flushing code to generic page-table API Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  4:55   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 17/21] KVM: arm64: Convert user_mem_abort() to generic page-table API Will Deacon
                   ` (7 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Add support for relaxing the permissions of a stage-2 mapping (i.e.
adding additional permissions) to the generic page-table code.
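
As a sketch of the intended use (hypothetical caller; a later patch
wires this into user_mem_abort()), a stage-2 permission fault can be
resolved by granting the missing permissions without rebuilding the
mapping:

  enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;

  if (write_fault)
          prot |= KVM_PGTABLE_PROT_W;
  if (exec_fault)
          prot |= KVM_PGTABLE_PROT_X;

  spin_lock(&kvm->mmu_lock);
  ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
  spin_unlock(&kvm->mmu_lock);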

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_pgtable.h | 17 +++++++++++++++++
 arch/arm64/kvm/hyp/pgtable.c         | 20 ++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index ea823fe31913..0d7077c34152 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -216,6 +216,23 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
  */
 kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
 
+/**
+ * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
+ *				      page-table entry.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
+ * @addr:	Intermediate physical address to identify the page-table entry.
+ * @prot:	Additional permissions to grant for the mapping.
+ *
+ * If there is a valid, leaf page-table entry used to translate @addr, then
+ * relax the permissions in that entry according to the read, write and
+ * execute permissions specified by @prot. No permissions are removed, and
+ * TLB invalidation is performed after updating the entry.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
+				   enum kvm_pgtable_prot prot);
+
 /**
  * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
  *				   access flag set.
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 75887185f1e2..6e8ca1ec12b4 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -762,6 +762,26 @@ bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
 	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
 }
 
+int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
+				   enum kvm_pgtable_prot prot)
+{
+	int ret;
+	kvm_pte_t set = 0, clr = 0;
+
+	if (prot & KVM_PGTABLE_PROT_R)
+		set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
+
+	if (prot & KVM_PGTABLE_PROT_W)
+		set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
+
+	if (prot & KVM_PGTABLE_PROT_X)
+		clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
+
+	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL);
+	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, 0);
+	return ret;
+}
+
 static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
 			       enum kvm_pgtable_walk_flags flag,
 			       void * const arg)
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 17/21] KVM: arm64: Convert user_mem_abort() to generic page-table API
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (15 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 16/21] KVM: arm64: Add support for relaxing stage-2 perms in generic page-table code Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  6:05   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 18/21] KVM: arm64: Check the pgt instead of the pgd when modifying page-table Will Deacon
                   ` (6 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Convert user_mem_abort() to call kvm_pgtable_stage2_relax_perms() when
handling a stage-2 permission fault and kvm_pgtable_stage2_map() when
handling a stage-2 translation fault, rather than walking the page-table
manually.
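
The core of the conversion is the following decision (a simplified
sketch of the new logic, with the surrounding details elided):

  if (fault_status == FSC_PERM) {
          /* Permission fault: just widen the existing mapping. */
          ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
  } else {
          /* Translation fault: install a mapping of vma_pagesize bytes. */
          ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
                                       __pfn_to_phys(pfn), prot, memcache);
  }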

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 112 +++++++++++++------------------------------
 1 file changed, 34 insertions(+), 78 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d4b0716a6ab4..cfbf32cae3a5 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1491,7 +1491,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 {
 	int ret;
 	bool write_fault, writable, force_pte = false;
-	bool exec_fault, needs_exec;
+	bool exec_fault;
+	bool device = false;
 	unsigned long mmu_seq;
 	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
 	struct kvm *kvm = vcpu->kvm;
@@ -1499,10 +1500,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	struct vm_area_struct *vma;
 	short vma_shift;
 	kvm_pfn_t pfn;
-	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
-	unsigned long vma_pagesize, flags = 0;
-	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
+	unsigned long vma_pagesize;
+	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
+	struct kvm_pgtable *pgt;
 
 	write_fault = kvm_is_write_fault(vcpu);
 	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
@@ -1535,22 +1536,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		vma_pagesize = PAGE_SIZE;
 	}
 
-	/*
-	 * The stage2 has a minimum of 2 level table (For arm64 see
-	 * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
-	 * use PMD_SIZE huge mappings (even when the PMD is folded into PGD).
-	 * As for PUD huge maps, we must make sure that we have at least
-	 * 3 levels, i.e, PMD is not folded.
-	 */
-	if (vma_pagesize == PMD_SIZE ||
-	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
+	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
 		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
 	mmap_read_unlock(current->mm);
 
-	/* We need minimum second+third level pages */
-	ret = kvm_mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
-	if (ret)
-		return ret;
+	if (fault_status != FSC_PERM) {
+		ret = kvm_mmu_topup_memory_cache(memcache,
+						 kvm_mmu_cache_min_pages(kvm));
+		if (ret)
+			return ret;
+	}
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	/*
@@ -1573,28 +1568,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 		return -EFAULT;
 
 	if (kvm_is_device_pfn(pfn)) {
-		mem_type = PAGE_S2_DEVICE;
-		flags |= KVM_S2PTE_FLAG_IS_IOMAP;
-	} else if (logging_active) {
-		/*
-		 * Faults on pages in a memslot with logging enabled
-		 * should not be mapped with huge pages (it introduces churn
-		 * and performance degradation), so force a pte mapping.
-		 */
-		flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
-
+		device = true;
+	} else if (logging_active && !write_fault) {
 		/*
 		 * Only actually map the page as writable if this was a write
 		 * fault.
 		 */
-		if (!write_fault)
-			writable = false;
+		writable = false;
 	}
 
-	if (exec_fault && is_iomap(flags))
+	if (exec_fault && device)
 		return -ENOEXEC;
 
 	spin_lock(&kvm->mmu_lock);
+	pgt = vcpu->arch.hw_mmu->pgt;
 	if (mmu_notifier_retry(kvm, mmu_seq))
 		goto out_unlock;
 
@@ -1605,62 +1592,31 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	if (vma_pagesize == PAGE_SIZE && !force_pte)
 		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
 							   &pfn, &fault_ipa);
-	if (writable)
+	if (writable) {
+		prot |= KVM_PGTABLE_PROT_W;
 		kvm_set_pfn_dirty(pfn);
+		mark_page_dirty(kvm, gfn);
+	}
 
-	if (fault_status != FSC_PERM && !is_iomap(flags))
+	if (fault_status != FSC_PERM && !device)
 		clean_dcache_guest_page(pfn, vma_pagesize);
 
-	if (exec_fault)
+	if (exec_fault) {
+		prot |= KVM_PGTABLE_PROT_X;
 		invalidate_icache_guest_page(pfn, vma_pagesize);
+	}
 
-	/*
-	 * If we took an execution fault we have made the
-	 * icache/dcache coherent above and should now let the s2
-	 * mapping be executable.
-	 *
-	 * Write faults (!exec_fault && FSC_PERM) are orthogonal to
-	 * execute permissions, and we preserve whatever we have.
-	 */
-	needs_exec = exec_fault ||
-		(fault_status == FSC_PERM &&
-		 stage2_is_exec(mmu, fault_ipa, vma_pagesize));
-
-	if (vma_pagesize == PUD_SIZE) {
-		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
-
-		new_pud = kvm_pud_mkhuge(new_pud);
-		if (writable)
-			new_pud = kvm_s2pud_mkwrite(new_pud);
-
-		if (needs_exec)
-			new_pud = kvm_s2pud_mkexec(new_pud);
-
-		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
-	} else if (vma_pagesize == PMD_SIZE) {
-		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
-
-		new_pmd = kvm_pmd_mkhuge(new_pmd);
-
-		if (writable)
-			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
-
-		if (needs_exec)
-			new_pmd = kvm_s2pmd_mkexec(new_pmd);
+	if (device)
+		prot |= KVM_PGTABLE_PROT_DEVICE;
+	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
+		prot |= KVM_PGTABLE_PROT_X;
 
-		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
+	if (fault_status == FSC_PERM) {
+		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
 	} else {
-		pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
-
-		if (writable) {
-			new_pte = kvm_s2pte_mkwrite(new_pte);
-			mark_page_dirty(kvm, gfn);
-		}
-
-		if (needs_exec)
-			new_pte = kvm_s2pte_mkexec(new_pte);
-
-		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
+		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
+					     __pfn_to_phys(pfn), prot,
+					     memcache);
 	}
 
 out_unlock:
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 18/21] KVM: arm64: Check the pgt instead of the pgd when modifying page-table
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (16 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 17/21] KVM: arm64: Convert user_mem_abort() to generic page-table API Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  5:00   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 19/21] KVM: arm64: Remove unused page-table code Will Deacon
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

In preparation for removing the 'pgd' field of 'struct kvm_s2_mmu',
update the few remaining users to check the 'pgt' field instead.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/mmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index cfbf32cae3a5..050eab71de31 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1813,7 +1813,7 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *dat
 int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end, unsigned flags)
 {
-	if (!kvm->arch.mmu.pgd)
+	if (!kvm->arch.mmu.pgt)
 		return 0;
 
 	trace_kvm_unmap_hva_range(start, end);
@@ -1876,7 +1876,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
 
 int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 {
-	if (!kvm->arch.mmu.pgd)
+	if (!kvm->arch.mmu.pgt)
 		return 0;
 	trace_kvm_age_hva(start, end);
 	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
@@ -1884,7 +1884,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
 
 int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
 {
-	if (!kvm->arch.mmu.pgd)
+	if (!kvm->arch.mmu.pgt)
 		return 0;
 	trace_kvm_test_age_hva(hva);
 	return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE,
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 19/21] KVM: arm64: Remove unused page-table code
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (17 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 18/21] KVM: arm64: Check the pgt instead of the pgd when modifying page-table Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  6:02   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu' Will Deacon
                   ` (4 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Now that KVM is using the generic page-table code to manage the guest
stage-2 page-tables, we can remove a bunch of unused macros, #defines
and static inline functions from the old implementation.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_mmu.h        | 141 -----
 arch/arm64/include/asm/pgtable-hwdef.h  |  17 -
 arch/arm64/include/asm/pgtable-prot.h   |  13 -
 arch/arm64/include/asm/stage2_pgtable.h | 215 -------
 arch/arm64/kvm/mmu.c                    | 755 ------------------------
 5 files changed, 1141 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 42fb50cfe0d8..13ff00d9f16d 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -135,123 +135,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
 phys_addr_t kvm_mmu_get_httbr(void);
 phys_addr_t kvm_get_idmap_vector(void);
 int kvm_mmu_init(void);
-#define kvm_mk_pmd(ptep)					\
-	__pmd(__phys_to_pmd_val(__pa(ptep)) | PMD_TYPE_TABLE)
-#define kvm_mk_pud(pmdp)					\
-	__pud(__phys_to_pud_val(__pa(pmdp)) | PMD_TYPE_TABLE)
-#define kvm_mk_p4d(pmdp)					\
-	__p4d(__phys_to_p4d_val(__pa(pmdp)) | PUD_TYPE_TABLE)
-
-#define kvm_set_pud(pudp, pud)		set_pud(pudp, pud)
-
-#define kvm_pfn_pte(pfn, prot)		pfn_pte(pfn, prot)
-#define kvm_pfn_pmd(pfn, prot)		pfn_pmd(pfn, prot)
-#define kvm_pfn_pud(pfn, prot)		pfn_pud(pfn, prot)
-
-#define kvm_pud_pfn(pud)		pud_pfn(pud)
-
-#define kvm_pmd_mkhuge(pmd)		pmd_mkhuge(pmd)
-#define kvm_pud_mkhuge(pud)		pud_mkhuge(pud)
-
-static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
-{
-	pte_val(pte) |= PTE_S2_RDWR;
-	return pte;
-}
-
-static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
-{
-	pmd_val(pmd) |= PMD_S2_RDWR;
-	return pmd;
-}
-
-static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
-{
-	pud_val(pud) |= PUD_S2_RDWR;
-	return pud;
-}
-
-static inline pte_t kvm_s2pte_mkexec(pte_t pte)
-{
-	pte_val(pte) &= ~PTE_S2_XN;
-	return pte;
-}
-
-static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
-{
-	pmd_val(pmd) &= ~PMD_S2_XN;
-	return pmd;
-}
-
-static inline pud_t kvm_s2pud_mkexec(pud_t pud)
-{
-	pud_val(pud) &= ~PUD_S2_XN;
-	return pud;
-}
-
-static inline void kvm_set_s2pte_readonly(pte_t *ptep)
-{
-	pteval_t old_pteval, pteval;
-
-	pteval = READ_ONCE(pte_val(*ptep));
-	do {
-		old_pteval = pteval;
-		pteval &= ~PTE_S2_RDWR;
-		pteval |= PTE_S2_RDONLY;
-		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
-	} while (pteval != old_pteval);
-}
-
-static inline bool kvm_s2pte_readonly(pte_t *ptep)
-{
-	return (READ_ONCE(pte_val(*ptep)) & PTE_S2_RDWR) == PTE_S2_RDONLY;
-}
-
-static inline bool kvm_s2pte_exec(pte_t *ptep)
-{
-	return !(READ_ONCE(pte_val(*ptep)) & PTE_S2_XN);
-}
-
-static inline void kvm_set_s2pmd_readonly(pmd_t *pmdp)
-{
-	kvm_set_s2pte_readonly((pte_t *)pmdp);
-}
-
-static inline bool kvm_s2pmd_readonly(pmd_t *pmdp)
-{
-	return kvm_s2pte_readonly((pte_t *)pmdp);
-}
-
-static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
-{
-	return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
-}
-
-static inline void kvm_set_s2pud_readonly(pud_t *pudp)
-{
-	kvm_set_s2pte_readonly((pte_t *)pudp);
-}
-
-static inline bool kvm_s2pud_readonly(pud_t *pudp)
-{
-	return kvm_s2pte_readonly((pte_t *)pudp);
-}
-
-static inline bool kvm_s2pud_exec(pud_t *pudp)
-{
-	return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
-}
-
-static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
-{
-	return pud_mkyoung(pud);
-}
-
-static inline bool kvm_s2pud_young(pud_t pud)
-{
-	return pud_young(pud);
-}
-
 
 struct kvm;
 
@@ -293,30 +176,6 @@ static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
 	}
 }
 
-static inline void __kvm_flush_dcache_pte(pte_t pte)
-{
-	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
-		struct page *page = pte_page(pte);
-		kvm_flush_dcache_to_poc(page_address(page), PAGE_SIZE);
-	}
-}
-
-static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
-{
-	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
-		struct page *page = pmd_page(pmd);
-		kvm_flush_dcache_to_poc(page_address(page), PMD_SIZE);
-	}
-}
-
-static inline void __kvm_flush_dcache_pud(pud_t pud)
-{
-	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
-		struct page *page = pud_page(pud);
-		kvm_flush_dcache_to_poc(page_address(page), PUD_SIZE);
-	}
-}
-
 void kvm_set_way_flush(struct kvm_vcpu *vcpu);
 void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
 
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index 1a989353144e..bb97d464f42b 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -172,23 +172,6 @@
 #define PTE_ATTRINDX(t)		(_AT(pteval_t, (t)) << 2)
 #define PTE_ATTRINDX_MASK	(_AT(pteval_t, 7) << 2)
 
-/*
- * 2nd stage PTE definitions
- */
-#define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
-#define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
-#define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
-#define PTE_S2_SW_RESVD		(_AT(pteval_t, 15) << 55) /* Reserved for SW */
-
-#define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
-#define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
-#define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
-#define PMD_S2_SW_RESVD		(_AT(pmdval_t, 15) << 55) /* Reserved for SW */
-
-#define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
-#define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
-#define PUD_S2_XN		(_AT(pudval_t, 2) << 53)  /* XN[1:0] */
-
 /*
  * Memory Attribute override for Stage-2 (MemAttr[3:0])
  */
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 88acd7e1cd05..8f094c43072a 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -73,19 +73,6 @@ extern bool arm64_use_ng_mappings;
 		__val;							\
 	 })
 
-#define PAGE_S2_XN							\
-	({								\
-		u64 __val;						\
-		if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))		\
-			__val = 0;					\
-		else							\
-			__val = PTE_S2_XN;				\
-		__val;							\
-	})
-
-#define PAGE_S2			__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PAGE_S2_XN)
-#define PAGE_S2_DEVICE		__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
-
 #define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
 /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
 #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
index 996bf98f0cab..fe341a6578c3 100644
--- a/arch/arm64/include/asm/stage2_pgtable.h
+++ b/arch/arm64/include/asm/stage2_pgtable.h
@@ -8,7 +8,6 @@
 #ifndef __ARM64_S2_PGTABLE_H_
 #define __ARM64_S2_PGTABLE_H_
 
-#include <linux/hugetlb.h>
 #include <linux/pgtable.h>
 
 /*
@@ -36,21 +35,6 @@
 #define stage2_pgdir_size(kvm)		(1ULL << stage2_pgdir_shift(kvm))
 #define stage2_pgdir_mask(kvm)		~(stage2_pgdir_size(kvm) - 1)
 
-/*
- * The number of PTRS across all concatenated stage2 tables given by the
- * number of bits resolved at the initial level.
- * If we force more levels than necessary, we may have (stage2_pgdir_shift > IPA),
- * in which case, stage2_pgd_ptrs will have one entry.
- */
-#define pgd_ptrs_shift(ipa, pgdir_shift)	\
-	((ipa) > (pgdir_shift) ? ((ipa) - (pgdir_shift)) : 0)
-#define __s2_pgd_ptrs(ipa, lvls)		\
-	(1 << (pgd_ptrs_shift((ipa), pt_levels_pgdir_shift(lvls))))
-#define __s2_pgd_size(ipa, lvls)	(__s2_pgd_ptrs((ipa), (lvls)) * sizeof(pgd_t))
-
-#define stage2_pgd_ptrs(kvm)		__s2_pgd_ptrs(kvm_phys_shift(kvm), kvm_stage2_levels(kvm))
-#define stage2_pgd_size(kvm)		__s2_pgd_size(kvm_phys_shift(kvm), kvm_stage2_levels(kvm))
-
 /*
  * kvm_mmmu_cache_min_pages() is the number of pages required to install
  * a stage-2 translation. We pre-allocate the entry level page table at
@@ -58,196 +42,6 @@
  */
 #define kvm_mmu_cache_min_pages(kvm)	(kvm_stage2_levels(kvm) - 1)
 
-/* Stage2 PUD definitions when the level is present */
-static inline bool kvm_stage2_has_pud(struct kvm *kvm)
-{
-	return (CONFIG_PGTABLE_LEVELS > 3) && (kvm_stage2_levels(kvm) > 3);
-}
-
-#define S2_PUD_SHIFT			ARM64_HW_PGTABLE_LEVEL_SHIFT(1)
-#define S2_PUD_SIZE			(1UL << S2_PUD_SHIFT)
-#define S2_PUD_MASK			(~(S2_PUD_SIZE - 1))
-
-#define stage2_pgd_none(kvm, pgd)		pgd_none(pgd)
-#define stage2_pgd_clear(kvm, pgd)		pgd_clear(pgd)
-#define stage2_pgd_present(kvm, pgd)		pgd_present(pgd)
-#define stage2_pgd_populate(kvm, pgd, p4d)	pgd_populate(NULL, pgd, p4d)
-
-static inline p4d_t *stage2_p4d_offset(struct kvm *kvm,
-				       pgd_t *pgd, unsigned long address)
-{
-	return p4d_offset(pgd, address);
-}
-
-static inline void stage2_p4d_free(struct kvm *kvm, p4d_t *p4d)
-{
-}
-
-static inline bool stage2_p4d_table_empty(struct kvm *kvm, p4d_t *p4dp)
-{
-	return false;
-}
-
-static inline phys_addr_t stage2_p4d_addr_end(struct kvm *kvm,
-					      phys_addr_t addr, phys_addr_t end)
-{
-	return end;
-}
-
-static inline bool stage2_p4d_none(struct kvm *kvm, p4d_t p4d)
-{
-	if (kvm_stage2_has_pud(kvm))
-		return p4d_none(p4d);
-	else
-		return 0;
-}
-
-static inline void stage2_p4d_clear(struct kvm *kvm, p4d_t *p4dp)
-{
-	if (kvm_stage2_has_pud(kvm))
-		p4d_clear(p4dp);
-}
-
-static inline bool stage2_p4d_present(struct kvm *kvm, p4d_t p4d)
-{
-	if (kvm_stage2_has_pud(kvm))
-		return p4d_present(p4d);
-	else
-		return 1;
-}
-
-static inline void stage2_p4d_populate(struct kvm *kvm, p4d_t *p4d, pud_t *pud)
-{
-	if (kvm_stage2_has_pud(kvm))
-		p4d_populate(NULL, p4d, pud);
-}
-
-static inline pud_t *stage2_pud_offset(struct kvm *kvm,
-				       p4d_t *p4d, unsigned long address)
-{
-	if (kvm_stage2_has_pud(kvm))
-		return pud_offset(p4d, address);
-	else
-		return (pud_t *)p4d;
-}
-
-static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud)
-{
-	if (kvm_stage2_has_pud(kvm))
-		free_page((unsigned long)pud);
-}
-
-static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp)
-{
-	if (kvm_stage2_has_pud(kvm))
-		return kvm_page_empty(pudp);
-	else
-		return false;
-}
-
-static inline phys_addr_t
-stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
-{
-	if (kvm_stage2_has_pud(kvm)) {
-		phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK;
-
-		return (boundary - 1 < end - 1) ? boundary : end;
-	} else {
-		return end;
-	}
-}
-
-/* Stage2 PMD definitions when the level is present */
-static inline bool kvm_stage2_has_pmd(struct kvm *kvm)
-{
-	return (CONFIG_PGTABLE_LEVELS > 2) && (kvm_stage2_levels(kvm) > 2);
-}
-
-#define S2_PMD_SHIFT			ARM64_HW_PGTABLE_LEVEL_SHIFT(2)
-#define S2_PMD_SIZE			(1UL << S2_PMD_SHIFT)
-#define S2_PMD_MASK			(~(S2_PMD_SIZE - 1))
-
-static inline bool stage2_pud_none(struct kvm *kvm, pud_t pud)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		return pud_none(pud);
-	else
-		return 0;
-}
-
-static inline void stage2_pud_clear(struct kvm *kvm, pud_t *pud)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		pud_clear(pud);
-}
-
-static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		return pud_present(pud);
-	else
-		return 1;
-}
-
-static inline void stage2_pud_populate(struct kvm *kvm, pud_t *pud, pmd_t *pmd)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		pud_populate(NULL, pud, pmd);
-}
-
-static inline pmd_t *stage2_pmd_offset(struct kvm *kvm,
-				       pud_t *pud, unsigned long address)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		return pmd_offset(pud, address);
-	else
-		return (pmd_t *)pud;
-}
-
-static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		free_page((unsigned long)pmd);
-}
-
-static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		return pud_huge(pud);
-	else
-		return 0;
-}
-
-static inline bool stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp)
-{
-	if (kvm_stage2_has_pmd(kvm))
-		return kvm_page_empty(pmdp);
-	else
-		return 0;
-}
-
-static inline phys_addr_t
-stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
-{
-	if (kvm_stage2_has_pmd(kvm)) {
-		phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK;
-
-		return (boundary - 1 < end - 1) ? boundary : end;
-	} else {
-		return end;
-	}
-}
-
-static inline bool stage2_pte_table_empty(struct kvm *kvm, pte_t *ptep)
-{
-	return kvm_page_empty(ptep);
-}
-
-static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr)
-{
-	return (((addr) >> stage2_pgdir_shift(kvm)) & (stage2_pgd_ptrs(kvm) - 1));
-}
-
 static inline phys_addr_t
 stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 {
@@ -256,13 +50,4 @@ stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
 	return (boundary - 1 < end - 1) ? boundary : end;
 }
 
-/*
- * Level values for the ARMv8.4-TTL extension, mapping PUD/PMD/PTE and
- * the architectural page-table level.
- */
-#define S2_NO_LEVEL_HINT	0
-#define S2_PUD_LEVEL		1
-#define S2_PMD_LEVEL		2
-#define S2_PTE_LEVEL		3
-
 #endif	/* __ARM64_S2_PGTABLE_H_ */
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 050eab71de31..ddeec0b03666 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -31,13 +31,6 @@ static phys_addr_t hyp_idmap_vector;
 
 static unsigned long io_map_base;
 
-#define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
-#define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
-
-static bool is_iomap(unsigned long flags)
-{
-	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
-}
 
 /*
  * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
@@ -85,154 +78,11 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
 	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
 }
 
-static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
-				   int level)
-{
-	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);
-}
-
-/*
- * D-Cache management functions. They take the page table entries by
- * value, as they are flushing the cache using the kernel mapping (or
- * kmap on 32bit).
- */
-static void kvm_flush_dcache_pte(pte_t pte)
-{
-	__kvm_flush_dcache_pte(pte);
-}
-
-static void kvm_flush_dcache_pmd(pmd_t pmd)
-{
-	__kvm_flush_dcache_pmd(pmd);
-}
-
-static void kvm_flush_dcache_pud(pud_t pud)
-{
-	__kvm_flush_dcache_pud(pud);
-}
-
 static bool kvm_is_device_pfn(unsigned long pfn)
 {
 	return !pfn_valid(pfn);
 }
 
-/**
- * stage2_dissolve_pmd() - clear and flush huge PMD entry
- * @mmu:	pointer to mmu structure to operate on
- * @addr:	IPA
- * @pmd:	pmd pointer for IPA
- *
- * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
- */
-static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
-{
-	if (!pmd_thp_or_huge(*pmd))
-		return;
-
-	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
-	put_page(virt_to_page(pmd));
-}
-
-/**
- * stage2_dissolve_pud() - clear and flush huge PUD entry
- * @mmu:	pointer to mmu structure to operate on
- * @addr:	IPA
- * @pud:	pud pointer for IPA
- *
- * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs.
- */
-static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
-{
-	struct kvm *kvm = mmu->kvm;
-
-	if (!stage2_pud_huge(kvm, *pudp))
-		return;
-
-	stage2_pud_clear(kvm, pudp);
-	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
-	put_page(virt_to_page(pudp));
-}
-
-static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu, pgd_t *pgd, phys_addr_t addr)
-{
-	struct kvm *kvm = mmu->kvm;
-	p4d_t *p4d_table __maybe_unused = stage2_p4d_offset(kvm, pgd, 0UL);
-	stage2_pgd_clear(kvm, pgd);
-	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
-	stage2_p4d_free(kvm, p4d_table);
-	put_page(virt_to_page(pgd));
-}
-
-static void clear_stage2_p4d_entry(struct kvm_s2_mmu *mmu, p4d_t *p4d, phys_addr_t addr)
-{
-	struct kvm *kvm = mmu->kvm;
-	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, p4d, 0);
-	stage2_p4d_clear(kvm, p4d);
-	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
-	stage2_pud_free(kvm, pud_table);
-	put_page(virt_to_page(p4d));
-}
-
-static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr_t addr)
-{
-	struct kvm *kvm = mmu->kvm;
-	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
-
-	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
-	stage2_pud_clear(kvm, pud);
-	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
-	stage2_pmd_free(kvm, pmd_table);
-	put_page(virt_to_page(pud));
-}
-
-static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr_t addr)
-{
-	pte_t *pte_table = pte_offset_kernel(pmd, 0);
-	VM_BUG_ON(pmd_thp_or_huge(*pmd));
-	pmd_clear(pmd);
-	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
-	free_page((unsigned long)pte_table);
-	put_page(virt_to_page(pmd));
-}
-
-static inline void kvm_set_pte(pte_t *ptep, pte_t new_pte)
-{
-	WRITE_ONCE(*ptep, new_pte);
-	dsb(ishst);
-}
-
-static inline void kvm_set_pmd(pmd_t *pmdp, pmd_t new_pmd)
-{
-	WRITE_ONCE(*pmdp, new_pmd);
-	dsb(ishst);
-}
-
-static inline void kvm_pmd_populate(pmd_t *pmdp, pte_t *ptep)
-{
-	kvm_set_pmd(pmdp, kvm_mk_pmd(ptep));
-}
-
-static inline void kvm_pud_populate(pud_t *pudp, pmd_t *pmdp)
-{
-	WRITE_ONCE(*pudp, kvm_mk_pud(pmdp));
-	dsb(ishst);
-}
-
-static inline void kvm_p4d_populate(p4d_t *p4dp, pud_t *pudp)
-{
-	WRITE_ONCE(*p4dp, kvm_mk_p4d(pudp));
-	dsb(ishst);
-}
-
-static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
-{
-#ifndef __PAGETABLE_P4D_FOLDED
-	WRITE_ONCE(*pgdp, kvm_mk_pgd(p4dp));
-	dsb(ishst);
-#endif
-}
-
 /*
  * Unmapping vs dcache management:
  *
@@ -257,108 +107,6 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
  * we then fully enforce cacheability of RAM, no matter what the guest
  * does.
  */
-static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
-		       phys_addr_t addr, phys_addr_t end)
-{
-	phys_addr_t start_addr = addr;
-	pte_t *pte, *start_pte;
-
-	start_pte = pte = pte_offset_kernel(pmd, addr);
-	do {
-		if (!pte_none(*pte)) {
-			pte_t old_pte = *pte;
-
-			kvm_set_pte(pte, __pte(0));
-			kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PTE_LEVEL);
-
-			/* No need to invalidate the cache for device mappings */
-			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
-				kvm_flush_dcache_pte(old_pte);
-
-			put_page(virt_to_page(pte));
-		}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-
-	if (stage2_pte_table_empty(mmu->kvm, start_pte))
-		clear_stage2_pmd_entry(mmu, pmd, start_addr);
-}
-
-static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
-		       phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	phys_addr_t next, start_addr = addr;
-	pmd_t *pmd, *start_pmd;
-
-	start_pmd = pmd = stage2_pmd_offset(kvm, pud, addr);
-	do {
-		next = stage2_pmd_addr_end(kvm, addr, end);
-		if (!pmd_none(*pmd)) {
-			if (pmd_thp_or_huge(*pmd)) {
-				pmd_t old_pmd = *pmd;
-
-				pmd_clear(pmd);
-				kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
-
-				kvm_flush_dcache_pmd(old_pmd);
-
-				put_page(virt_to_page(pmd));
-			} else {
-				unmap_stage2_ptes(mmu, pmd, addr, next);
-			}
-		}
-	} while (pmd++, addr = next, addr != end);
-
-	if (stage2_pmd_table_empty(kvm, start_pmd))
-		clear_stage2_pud_entry(mmu, pud, start_addr);
-}
-
-static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, p4d_t *p4d,
-		       phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	phys_addr_t next, start_addr = addr;
-	pud_t *pud, *start_pud;
-
-	start_pud = pud = stage2_pud_offset(kvm, p4d, addr);
-	do {
-		next = stage2_pud_addr_end(kvm, addr, end);
-		if (!stage2_pud_none(kvm, *pud)) {
-			if (stage2_pud_huge(kvm, *pud)) {
-				pud_t old_pud = *pud;
-
-				stage2_pud_clear(kvm, pud);
-				kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
-				kvm_flush_dcache_pud(old_pud);
-				put_page(virt_to_page(pud));
-			} else {
-				unmap_stage2_pmds(mmu, pud, addr, next);
-			}
-		}
-	} while (pud++, addr = next, addr != end);
-
-	if (stage2_pud_table_empty(kvm, start_pud))
-		clear_stage2_p4d_entry(mmu, p4d, start_addr);
-}
-
-static void unmap_stage2_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
-		       phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	phys_addr_t next, start_addr = addr;
-	p4d_t *p4d, *start_p4d;
-
-	start_p4d = p4d = stage2_p4d_offset(kvm, pgd, addr);
-	do {
-		next = stage2_p4d_addr_end(kvm, addr, end);
-		if (!stage2_p4d_none(kvm, *p4d))
-			unmap_stage2_puds(mmu, p4d, addr, next);
-	} while (p4d++, addr = next, addr != end);
-
-	if (stage2_p4d_table_empty(kvm, start_p4d))
-		clear_stage2_pgd_entry(mmu, pgd, start_addr);
-}
-
 /**
  * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
  * @kvm:   The VM pointer
@@ -387,71 +135,6 @@ static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 si
 	__unmap_stage2_range(mmu, start, size, true);
 }
 
-static void stage2_flush_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
-			      phys_addr_t addr, phys_addr_t end)
-{
-	pte_t *pte;
-
-	pte = pte_offset_kernel(pmd, addr);
-	do {
-		if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
-			kvm_flush_dcache_pte(*pte);
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-}
-
-static void stage2_flush_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
-			      phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	pmd_t *pmd;
-	phys_addr_t next;
-
-	pmd = stage2_pmd_offset(kvm, pud, addr);
-	do {
-		next = stage2_pmd_addr_end(kvm, addr, end);
-		if (!pmd_none(*pmd)) {
-			if (pmd_thp_or_huge(*pmd))
-				kvm_flush_dcache_pmd(*pmd);
-			else
-				stage2_flush_ptes(mmu, pmd, addr, next);
-		}
-	} while (pmd++, addr = next, addr != end);
-}
-
-static void stage2_flush_puds(struct kvm_s2_mmu *mmu, p4d_t *p4d,
-			      phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	pud_t *pud;
-	phys_addr_t next;
-
-	pud = stage2_pud_offset(kvm, p4d, addr);
-	do {
-		next = stage2_pud_addr_end(kvm, addr, end);
-		if (!stage2_pud_none(kvm, *pud)) {
-			if (stage2_pud_huge(kvm, *pud))
-				kvm_flush_dcache_pud(*pud);
-			else
-				stage2_flush_pmds(mmu, pud, addr, next);
-		}
-	} while (pud++, addr = next, addr != end);
-}
-
-static void stage2_flush_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
-			      phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	p4d_t *p4d;
-	phys_addr_t next;
-
-	p4d = stage2_p4d_offset(kvm, pgd, addr);
-	do {
-		next = stage2_p4d_addr_end(kvm, addr, end);
-		if (!stage2_p4d_none(kvm, *p4d))
-			stage2_flush_puds(mmu, p4d, addr, next);
-	} while (p4d++, addr = next, addr != end);
-}
-
 static void stage2_flush_memslot(struct kvm *kvm,
 				 struct kvm_memory_slot *memslot)
 {
@@ -800,348 +483,6 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	}
 }
 
-static p4d_t *stage2_get_p4d(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
-			     phys_addr_t addr)
-{
-	struct kvm *kvm = mmu->kvm;
-	pgd_t *pgd;
-	p4d_t *p4d;
-
-	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
-	if (stage2_pgd_none(kvm, *pgd)) {
-		if (!cache)
-			return NULL;
-		p4d = kvm_mmu_memory_cache_alloc(cache);
-		stage2_pgd_populate(kvm, pgd, p4d);
-		get_page(virt_to_page(pgd));
-	}
-
-	return stage2_p4d_offset(kvm, pgd, addr);
-}
-
-static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
-			     phys_addr_t addr)
-{
-	struct kvm *kvm = mmu->kvm;
-	p4d_t *p4d;
-	pud_t *pud;
-
-	p4d = stage2_get_p4d(mmu, cache, addr);
-	if (stage2_p4d_none(kvm, *p4d)) {
-		if (!cache)
-			return NULL;
-		pud = kvm_mmu_memory_cache_alloc(cache);
-		stage2_p4d_populate(kvm, p4d, pud);
-		get_page(virt_to_page(p4d));
-	}
-
-	return stage2_pud_offset(kvm, p4d, addr);
-}
-
-static pmd_t *stage2_get_pmd(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
-			     phys_addr_t addr)
-{
-	struct kvm *kvm = mmu->kvm;
-	pud_t *pud;
-	pmd_t *pmd;
-
-	pud = stage2_get_pud(mmu, cache, addr);
-	if (!pud || stage2_pud_huge(kvm, *pud))
-		return NULL;
-
-	if (stage2_pud_none(kvm, *pud)) {
-		if (!cache)
-			return NULL;
-		pmd = kvm_mmu_memory_cache_alloc(cache);
-		stage2_pud_populate(kvm, pud, pmd);
-		get_page(virt_to_page(pud));
-	}
-
-	return stage2_pmd_offset(kvm, pud, addr);
-}
-
-static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
-			       struct kvm_mmu_memory_cache *cache,
-			       phys_addr_t addr, const pmd_t *new_pmd)
-{
-	pmd_t *pmd, old_pmd;
-
-retry:
-	pmd = stage2_get_pmd(mmu, cache, addr);
-	VM_BUG_ON(!pmd);
-
-	old_pmd = *pmd;
-	/*
-	 * Multiple vcpus faulting on the same PMD entry, can
-	 * lead to them sequentially updating the PMD with the
-	 * same value. Following the break-before-make
-	 * (pmd_clear() followed by tlb_flush()) process can
-	 * hinder forward progress due to refaults generated
-	 * on missing translations.
-	 *
-	 * Skip updating the page table if the entry is
-	 * unchanged.
-	 */
-	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
-		return 0;
-
-	if (pmd_present(old_pmd)) {
-		/*
-		 * If we already have PTE level mapping for this block,
-		 * we must unmap it to avoid inconsistent TLB state and
-		 * leaking the table page. We could end up in this situation
-		 * if the memory slot was marked for dirty logging and was
-		 * reverted, leaving PTE level mappings for the pages accessed
-		 * during the period. So, unmap the PTE level mapping for this
-		 * block and retry, as we could have released the upper level
-		 * table in the process.
-		 *
-		 * Normal THP split/merge follows mmu_notifier callbacks and do
-		 * get handled accordingly.
-		 */
-		if (!pmd_thp_or_huge(old_pmd)) {
-			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
-			goto retry;
-		}
-		/*
-		 * Mapping in huge pages should only happen through a
-		 * fault.  If a page is merged into a transparent huge
-		 * page, the individual subpages of that huge page
-		 * should be unmapped through MMU notifiers before we
-		 * get here.
-		 *
-		 * Merging of CompoundPages is not supported; they
-		 * should become splitting first, unmapped, merged,
-		 * and mapped back in on-demand.
-		 */
-		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
-		pmd_clear(pmd);
-		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
-	} else {
-		get_page(virt_to_page(pmd));
-	}
-
-	kvm_set_pmd(pmd, *new_pmd);
-	return 0;
-}
-
-static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
-			       struct kvm_mmu_memory_cache *cache,
-			       phys_addr_t addr, const pud_t *new_pudp)
-{
-	struct kvm *kvm = mmu->kvm;
-	pud_t *pudp, old_pud;
-
-retry:
-	pudp = stage2_get_pud(mmu, cache, addr);
-	VM_BUG_ON(!pudp);
-
-	old_pud = *pudp;
-
-	/*
-	 * A large number of vcpus faulting on the same stage 2 entry,
-	 * can lead to a refault due to the stage2_pud_clear()/tlb_flush().
-	 * Skip updating the page tables if there is no change.
-	 */
-	if (pud_val(old_pud) == pud_val(*new_pudp))
-		return 0;
-
-	if (stage2_pud_present(kvm, old_pud)) {
-		/*
-		 * If we already have table level mapping for this block, unmap
-		 * the range for this block and retry.
-		 */
-		if (!stage2_pud_huge(kvm, old_pud)) {
-			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
-			goto retry;
-		}
-
-		WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp));
-		stage2_pud_clear(kvm, pudp);
-		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
-	} else {
-		get_page(virt_to_page(pudp));
-	}
-
-	kvm_set_pud(pudp, *new_pudp);
-	return 0;
-}
-
-/*
- * stage2_get_leaf_entry - walk the stage2 VM page tables and return
- * true if a valid and present leaf-entry is found. A pointer to the
- * leaf-entry is returned in the appropriate level variable - pudpp,
- * pmdpp, ptepp.
- */
-static bool stage2_get_leaf_entry(struct kvm_s2_mmu *mmu, phys_addr_t addr,
-				  pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
-{
-	struct kvm *kvm = mmu->kvm;
-	pud_t *pudp;
-	pmd_t *pmdp;
-	pte_t *ptep;
-
-	*pudpp = NULL;
-	*pmdpp = NULL;
-	*ptepp = NULL;
-
-	pudp = stage2_get_pud(mmu, NULL, addr);
-	if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
-		return false;
-
-	if (stage2_pud_huge(kvm, *pudp)) {
-		*pudpp = pudp;
-		return true;
-	}
-
-	pmdp = stage2_pmd_offset(kvm, pudp, addr);
-	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
-		return false;
-
-	if (pmd_thp_or_huge(*pmdp)) {
-		*pmdpp = pmdp;
-		return true;
-	}
-
-	ptep = pte_offset_kernel(pmdp, addr);
-	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
-		return false;
-
-	*ptepp = ptep;
-	return true;
-}
-
-static bool stage2_is_exec(struct kvm_s2_mmu *mmu, phys_addr_t addr, unsigned long sz)
-{
-	pud_t *pudp;
-	pmd_t *pmdp;
-	pte_t *ptep;
-	bool found;
-
-	found = stage2_get_leaf_entry(mmu, addr, &pudp, &pmdp, &ptep);
-	if (!found)
-		return false;
-
-	if (pudp)
-		return sz <= PUD_SIZE && kvm_s2pud_exec(pudp);
-	else if (pmdp)
-		return sz <= PMD_SIZE && kvm_s2pmd_exec(pmdp);
-	else
-		return sz == PAGE_SIZE && kvm_s2pte_exec(ptep);
-}
-
-static int stage2_set_pte(struct kvm_s2_mmu *mmu,
-			  struct kvm_mmu_memory_cache *cache,
-			  phys_addr_t addr, const pte_t *new_pte,
-			  unsigned long flags)
-{
-	struct kvm *kvm = mmu->kvm;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte, old_pte;
-	bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
-	bool logging_active = flags & KVM_S2_FLAG_LOGGING_ACTIVE;
-
-	VM_BUG_ON(logging_active && !cache);
-
-	/* Create stage-2 page table mapping - Levels 0 and 1 */
-	pud = stage2_get_pud(mmu, cache, addr);
-	if (!pud) {
-		/*
-		 * Ignore calls from kvm_set_spte_hva for unallocated
-		 * address ranges.
-		 */
-		return 0;
-	}
-
-	/*
-	 * While dirty page logging - dissolve huge PUD, then continue
-	 * on to allocate page.
-	 */
-	if (logging_active)
-		stage2_dissolve_pud(mmu, addr, pud);
-
-	if (stage2_pud_none(kvm, *pud)) {
-		if (!cache)
-			return 0; /* ignore calls from kvm_set_spte_hva */
-		pmd = kvm_mmu_memory_cache_alloc(cache);
-		stage2_pud_populate(kvm, pud, pmd);
-		get_page(virt_to_page(pud));
-	}
-
-	pmd = stage2_pmd_offset(kvm, pud, addr);
-	if (!pmd) {
-		/*
-		 * Ignore calls from kvm_set_spte_hva for unallocated
-		 * address ranges.
-		 */
-		return 0;
-	}
-
-	/*
-	 * While dirty page logging - dissolve huge PMD, then continue on to
-	 * allocate page.
-	 */
-	if (logging_active)
-		stage2_dissolve_pmd(mmu, addr, pmd);
-
-	/* Create stage-2 page mappings - Level 2 */
-	if (pmd_none(*pmd)) {
-		if (!cache)
-			return 0; /* ignore calls from kvm_set_spte_hva */
-		pte = kvm_mmu_memory_cache_alloc(cache);
-		kvm_pmd_populate(pmd, pte);
-		get_page(virt_to_page(pmd));
-	}
-
-	pte = pte_offset_kernel(pmd, addr);
-
-	if (iomap && pte_present(*pte))
-		return -EFAULT;
-
-	/* Create 2nd stage page table mapping - Level 3 */
-	old_pte = *pte;
-	if (pte_present(old_pte)) {
-		/* Skip page table update if there is no change */
-		if (pte_val(old_pte) == pte_val(*new_pte))
-			return 0;
-
-		kvm_set_pte(pte, __pte(0));
-		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PTE_LEVEL);
-	} else {
-		get_page(virt_to_page(pte));
-	}
-
-	kvm_set_pte(pte, *new_pte);
-	return 0;
-}
-
-#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
-static int stage2_ptep_test_and_clear_young(pte_t *pte)
-{
-	if (pte_young(*pte)) {
-		*pte = pte_mkold(*pte);
-		return 1;
-	}
-	return 0;
-}
-#else
-static int stage2_ptep_test_and_clear_young(pte_t *pte)
-{
-	return __ptep_test_and_clear_young(pte);
-}
-#endif
-
-static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
-{
-	return stage2_ptep_test_and_clear_young((pte_t *)pmd);
-}
-
-static int stage2_pudp_test_and_clear_young(pud_t *pud)
-{
-	return stage2_ptep_test_and_clear_young((pte_t *)pud);
-}
-
 /**
  * kvm_phys_addr_ioremap - map a device range to guest IPA
  *
@@ -1181,102 +522,6 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
 	return ret;
 }
 
-/**
- * stage2_wp_ptes - write protect PMD range
- * @pmd:	pointer to pmd entry
- * @addr:	range start address
- * @end:	range end address
- */
-static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
-{
-	pte_t *pte;
-
-	pte = pte_offset_kernel(pmd, addr);
-	do {
-		if (!pte_none(*pte)) {
-			if (!kvm_s2pte_readonly(pte))
-				kvm_set_s2pte_readonly(pte);
-		}
-	} while (pte++, addr += PAGE_SIZE, addr != end);
-}
-
-/**
- * stage2_wp_pmds - write protect PUD range
- * kvm:		kvm instance for the VM
- * @pud:	pointer to pud entry
- * @addr:	range start address
- * @end:	range end address
- */
-static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
-			   phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	pmd_t *pmd;
-	phys_addr_t next;
-
-	pmd = stage2_pmd_offset(kvm, pud, addr);
-
-	do {
-		next = stage2_pmd_addr_end(kvm, addr, end);
-		if (!pmd_none(*pmd)) {
-			if (pmd_thp_or_huge(*pmd)) {
-				if (!kvm_s2pmd_readonly(pmd))
-					kvm_set_s2pmd_readonly(pmd);
-			} else {
-				stage2_wp_ptes(pmd, addr, next);
-			}
-		}
-	} while (pmd++, addr = next, addr != end);
-}
-
-/**
- * stage2_wp_puds - write protect P4D range
- * @p4d:	pointer to p4d entry
- * @addr:	range start address
- * @end:	range end address
- */
-static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, p4d_t *p4d,
-			    phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	pud_t *pud;
-	phys_addr_t next;
-
-	pud = stage2_pud_offset(kvm, p4d, addr);
-	do {
-		next = stage2_pud_addr_end(kvm, addr, end);
-		if (!stage2_pud_none(kvm, *pud)) {
-			if (stage2_pud_huge(kvm, *pud)) {
-				if (!kvm_s2pud_readonly(pud))
-					kvm_set_s2pud_readonly(pud);
-			} else {
-				stage2_wp_pmds(mmu, pud, addr, next);
-			}
-		}
-	} while (pud++, addr = next, addr != end);
-}
-
-/**
- * stage2_wp_p4ds - write protect PGD range
- * @pgd:	pointer to pgd entry
- * @addr:	range start address
- * @end:	range end address
- */
-static void  stage2_wp_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
-			    phys_addr_t addr, phys_addr_t end)
-{
-	struct kvm *kvm = mmu->kvm;
-	p4d_t *p4d;
-	phys_addr_t next;
-
-	p4d = stage2_p4d_offset(kvm, pgd, addr);
-	do {
-		next = stage2_p4d_addr_end(kvm, addr, end);
-		if (!stage2_p4d_none(kvm, *p4d))
-			stage2_wp_puds(mmu, p4d, addr, next);
-	} while (p4d++, addr = next, addr != end);
-}
-
 /**
  * stage2_wp_range() - write protect stage2 memory region range
  * @kvm:	The KVM pointer
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (18 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 19/21] KVM: arm64: Remove unused page-table code Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  5:07   ` Gavin Shan
  2020-08-25  9:39 ` [PATCH v3 21/21] KVM: arm64: Don't constrain maximum IPA size based on host configuration Will Deacon
                   ` (3 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
'struct kvm_s2_mmu', so remove the unused 'pgd' field.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h | 1 -
 arch/arm64/kvm/mmu.c              | 2 --
 2 files changed, 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0b7c702b2151..41caf29bd93c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -79,7 +79,6 @@ struct kvm_s2_mmu {
 	 * for vEL1/EL0 with vHCR_EL2.VM == 0.  In that case, we use the
 	 * canonical stage-2 page tables.
 	 */
-	pgd_t		*pgd;
 	phys_addr_t	pgd_phys;
 	struct kvm_pgtable *pgt;
 
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ddeec0b03666..f28e03dcb897 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -384,7 +384,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
 	mmu->kvm = kvm;
 	mmu->pgt = pgt;
 	mmu->pgd_phys = __pa(pgt->pgd);
-	mmu->pgd = (void *)pgt->pgd;
 	mmu->vmid.vmid_gen = 0;
 	return 0;
 
@@ -470,7 +469,6 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	spin_lock(&kvm->mmu_lock);
 	pgt = mmu->pgt;
 	if (pgt) {
-		mmu->pgd = NULL;
 		mmu->pgd_phys = 0;
 		mmu->pgt = NULL;
 		free_percpu(mmu->last_vcpu_ran);
-- 
2.28.0.297.g1956fa8f8d-goog



* [PATCH v3 21/21] KVM: arm64: Don't constrain maximum IPA size based on host configuration
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (19 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu' Will Deacon
@ 2020-08-25  9:39 ` Will Deacon
  2020-09-03  5:09   ` Gavin Shan
  2020-08-27 16:26 ` [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Alexandru Elisei
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-08-25  9:39 UTC (permalink / raw)
  To: kvmarm
  Cc: kernel-team, Gavin Shan, Suzuki Poulose, Marc Zyngier,
	Quentin Perret, James Morse, Catalin Marinas, Will Deacon,
	linux-arm-kernel

Now that the guest stage-2 page-tables are managed independently from
the host stage-1 page-tables, we can avoid constraining the IPA size
based on the host and instead limit it only based on the PARange field
of the ID_AA64MMFR0 register.
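
For reference, a rough sketch of the PARange decoding this relies on (the
real code uses the existing id_aa64mmfr0_parange_to_phys_shift() helper; the
mapping below is my reading of the Arm ARM encoding, so treat it as
illustrative only):

/* Illustrative only: ID_AA64MMFR0_EL1.PARange -> physical address bits. */
static unsigned int parange_to_phys_shift(unsigned int parange)
{
	switch (parange) {
	case 0: return 32;
	case 1: return 36;
	case 2: return 40;
	case 3: return 42;
	case 4: return 44;
	case 5: return 48;
	case 6: return 52;	/* 52-bit PA (ARMv8.2-LPA) */
	default: return 48;	/* assumption: clamp unknown/reserved values */
	}
}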

Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arch/arm64/kvm/reset.c | 38 +++++---------------------------------
 1 file changed, 5 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index ee33875c5c2a..471ee9234e40 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -339,7 +339,7 @@ u32 get_kvm_ipa_limit(void)
 
 int kvm_set_ipa_limit(void)
 {
-	unsigned int ipa_max, pa_max, va_max, parange, tgran_2;
+	unsigned int parange, tgran_2;
 	u64 mmfr0;
 
 	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
@@ -376,38 +376,10 @@ int kvm_set_ipa_limit(void)
 		break;
 	}
 
-	pa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
-
-	/* Clamp the IPA limit to the PA size supported by the kernel */
-	ipa_max = (pa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : pa_max;
-	/*
-	 * Since our stage2 table is dependent on the stage1 page table code,
-	 * we must always honor the following condition:
-	 *
-	 *  Number of levels in Stage1 >= Number of levels in Stage2.
-	 *
-	 * So clamp the ipa limit further down to limit the number of levels.
-	 * Since we can concatenate upto 16 tables at entry level, we could
-	 * go upto 4bits above the maximum VA addressable with the current
-	 * number of levels.
-	 */
-	va_max = PGDIR_SHIFT + PAGE_SHIFT - 3;
-	va_max += 4;
-
-	if (va_max < ipa_max)
-		ipa_max = va_max;
-
-	/*
-	 * If the final limit is lower than the real physical address
-	 * limit of the CPUs, report the reason.
-	 */
-	if (ipa_max < pa_max)
-		pr_info("kvm: Limiting the IPA size due to kernel %s Address limit\n",
-			(va_max < pa_max) ? "Virtual" : "Physical");
-
-	WARN(ipa_max < KVM_PHYS_SHIFT,
-	     "KVM IPA limit (%d bit) is smaller than default size\n", ipa_max);
-	kvm_ipa_limit = ipa_max;
+	kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
+	WARN(kvm_ipa_limit < KVM_PHYS_SHIFT,
+	     "KVM IPA limit (%d bit) is smaller than default size\n",
+	     kvm_ipa_limit);
 	kvm_info("IPA Size Limit: %dbits\n", kvm_ipa_limit);
 
 	return 0;
-- 
2.28.0.297.g1956fa8f8d-goog



* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (20 preceding siblings ...)
  2020-08-25  9:39 ` [PATCH v3 21/21] KVM: arm64: Don't constrain maximum IPA size based on host configuration Will Deacon
@ 2020-08-27 16:26 ` Alexandru Elisei
  2020-09-01 16:15   ` Will Deacon
  2020-09-03  7:34 ` Gavin Shan
  2020-09-03 18:52 ` Will Deacon
  23 siblings, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-08-27 16:26 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

I've been looking into pinning guest memory for KVM SPE, so I like to think that
the stage 2 page table code is not entirely alien to me. I'll do my best to
review the series; I hope you'll find it useful.

Thanks,

Alex

On 8/25/20 10:39 AM, Will Deacon wrote:
> [...]


* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-08-25  9:39 ` [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure Will Deacon
@ 2020-08-27 16:27   ` Alexandru Elisei
  2020-08-28 15:43     ` Alexandru Elisei
  2020-09-02 10:36     ` Will Deacon
  2020-08-28 15:51   ` Alexandru Elisei
  2020-09-02  6:31   ` Gavin Shan
  2 siblings, 2 replies; 86+ messages in thread
From: Alexandru Elisei @ 2020-08-27 16:27 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

It looks to me like the code doesn't take into account the fact that we can
have concatenated page tables at the initial level of lookup. Am I missing
something? Is it added in later patches and I missed it? I've commented below
in a few places where I noticed this.

On 8/25/20 10:39 AM, Will Deacon wrote:
> The KVM page-table code is intricately tied into the kernel page-table
> code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
> to reduce code duplication. Unfortunately, the reality is that there is
> an awful lot of code required to make this work, and at the end of the
> day you're limited to creating page-tables with the same configuration
> as the host kernel. Furthermore, lifting the page-table code to run
> directly at EL2 on a non-VHE system (as we plan to to do in future
> patches) is practically impossible due to the number of dependencies it
> has on the core kernel.
>
> Introduce a framework for walking Armv8 page-tables configured
> independently from the host kernel.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
>  arch/arm64/kvm/hyp/Makefile          |   2 +-
>  arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
>  3 files changed, 392 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
>  create mode 100644 arch/arm64/kvm/hyp/pgtable.c
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> new file mode 100644
> index 000000000000..51ccbbb0efae
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Will Deacon <will@kernel.org>
> + */
> +
> +#ifndef __ARM64_KVM_PGTABLE_H__
> +#define __ARM64_KVM_PGTABLE_H__
> +
> +#include <linux/bits.h>
> +#include <linux/kvm_host.h>
> +#include <linux/types.h>
> +
> +typedef u64 kvm_pte_t;
> +
> +/**
> + * struct kvm_pgtable - KVM page-table.
> + * @ia_bits:		Maximum input address size, in bits.
> + * @start_level:	Level at which the page-table walk starts.
> + * @pgd:		Pointer to the first top-level entry of the page-table.
> + * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
> + */
> +struct kvm_pgtable {
> +	u32					ia_bits;
> +	u32					start_level;
> +	kvm_pte_t				*pgd;
> +
> +	/* Stage-2 only */
> +	struct kvm_s2_mmu			*mmu;
> +};
> +
> +/**
> + * enum kvm_pgtable_prot - Page-table permissions and attributes.
> + * @KVM_PGTABLE_PROT_R:		Read permission.
> + * @KVM_PGTABLE_PROT_W:		Write permission.
> + * @KVM_PGTABLE_PROT_X:		Execute permission.
> + * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
> + */
> +enum kvm_pgtable_prot {
> +	KVM_PGTABLE_PROT_R			= BIT(0),
> +	KVM_PGTABLE_PROT_W			= BIT(1),
> +	KVM_PGTABLE_PROT_X			= BIT(2),
> +
> +	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
> +};
> +
> +/**
> + * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
> + * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
> + *					entries.
> + * @KVM_PGTABLE_WALK_TABLE_PRE:		Visit table entries before their
> + *					children.
> + * @KVM_PGTABLE_WALK_TABLE_POST:	Visit table entries after their
> + *					children.
> + */
> +enum kvm_pgtable_walk_flags {
> +	KVM_PGTABLE_WALK_LEAF			= BIT(0),
> +	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
> +	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
> +};
> +
> +typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
> +					kvm_pte_t *ptep,
> +					enum kvm_pgtable_walk_flags flag,
> +					void * const arg);
> +
> +/**
> + * struct kvm_pgtable_walker - Hook into a page-table walk.
> + * @cb:		Callback function to invoke during the walk.
> + * @arg:	Argument passed to the callback function.
> + * @flags:	Bitwise-OR of flags to identify the entry types on which to
> + *		invoke the callback function.
> + */
> +struct kvm_pgtable_walker {
> +	const kvm_pgtable_visitor_fn_t		cb;
> +	void * const				arg;
> +	const enum kvm_pgtable_walk_flags	flags;
> +};
> +
> +/**
> + * kvm_pgtable_walk() - Walk a page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> + * @addr:	Input address for the start of the walk.
> + * @size:	Size of the range to walk.
> + * @walker:	Walker callback description.
> + *
> + * The walker will walk the page-table entries corresponding to the input
> + * address range specified, visiting entries according to the walker flags.
> + * Invalid entries are treated as leaf entries. Leaf entries are reloaded
> + * after invoking the walker callback, allowing the walker to descend into
> + * a newly installed table.
> + *
> + * Returning a negative error code from the walker callback function will
> + * terminate the walk immediately with the same error code.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +		     struct kvm_pgtable_walker *walker);
> +
> +#endif	/* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index f54f0e89a71c..607b8a898826 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -10,5 +10,5 @@ subdir-ccflags-y := -I$(incdir)				\
>  		    -DDISABLE_BRANCH_PROFILING		\
>  		    $(DISABLE_STACKLEAK_PLUGIN)
>  
> -obj-$(CONFIG_KVM) += vhe/ nvhe/
> +obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o
>  obj-$(CONFIG_KVM_INDIRECT_VECTORS) += smccc_wa.o
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> new file mode 100644
> index 000000000000..462001bbe028
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -0,0 +1,290 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Stand-alone page-table allocator for hyp stage-1 and guest stage-2.
> + * No bombay mix was harmed in the writing of this file.
> + *
> + * Copyright (C) 2020 Google LLC
> + * Author: Will Deacon <will@kernel.org>
> + */
> +
> +#include <linux/bitfield.h>
> +#include <asm/kvm_pgtable.h>
> +
> +#define KVM_PGTABLE_MAX_LEVELS		4U
> +
> +#define KVM_PTE_VALID			BIT(0)
> +
> +#define KVM_PTE_TYPE			BIT(1)
> +#define KVM_PTE_TYPE_BLOCK		0
> +#define KVM_PTE_TYPE_PAGE		1
> +#define KVM_PTE_TYPE_TABLE		1
> +
> +#define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
> +#define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
> +
> +#define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
> +
> +#define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
> +
> +struct kvm_pgtable_walk_data {
> +	struct kvm_pgtable		*pgt;
> +	struct kvm_pgtable_walker	*walker;
> +
> +	u64				addr;
> +	u64				end;
> +};
> +
> +static u64 kvm_granule_shift(u32 level)
> +{
> +	return (KVM_PGTABLE_MAX_LEVELS - level) * (PAGE_SHIFT - 3) + 3;

Isn't that the same thing as the macro ARM64_HW_PGTABLE_LEVEL_SHIFT(n) from
pgtable-hwdef.h? I think the header is already included, as this file uses
PTRS_PER_PTE and that's the only place I found it defined.
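
For comparison, a sketch of what I mean (the macro below is how I remember it
being defined in pgtable-hwdef.h, so worth double-checking):

#define ARM64_HW_PGTABLE_LEVEL_SHIFT(n)	((PAGE_SHIFT - 3) * (4 - (n)) + 3)

/*
 * With KVM_PGTABLE_MAX_LEVELS == 4, kvm_granule_shift(level) expands to
 * (4 - level) * (PAGE_SHIFT - 3) + 3, which looks like the same expression.
 */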

> +}
> +
> +static u64 kvm_granule_size(u32 level)
> +{
> +	return BIT(kvm_granule_shift(level));
> +}
> +
> +static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> +{
> +	u64 granule = kvm_granule_size(level);
> +
> +	/*
> +	 * Reject invalid block mappings and don't bother with 4TB mappings for
> +	 * 52-bit PAs.
> +	 */
> +	if (level == 0 || (PAGE_SIZE != SZ_4K && level == 1))
> +		return false;
> +
> +	if (granule > (end - addr))
> +		return false;
> +
> +	return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
> +}

This is a very nice rewrite of fault_supports_stage2_huge_mapping, definitely
easier to understand.

> +
> +static u32 kvm_start_level(u64 ia_bits)
> +{
> +	u64 levels = DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3);

Isn't that the same thing as the macro ARM64_HW_PGTABLE_LEVELS from
pgtable-hwdef.h?

> +	return KVM_PGTABLE_MAX_LEVELS - levels;

I tried to verify this formula and I think there's something that I don't
understand or I'm missing. For the default KVM setup, where the user doesn't
specify an IPA size different from the 40 bits default: ia_bits = 40 (IPA =
[39:0]), 4KB pages, translation starting at level 1 with 2 concatenated level 1
tables (VTCR_EL2.T0SZ = 24, VTCR_EL2.SL0 = 1, VTCR_EL2.TG0 = 0, starting level
from table D5-13 at page D5-2566, ARM DDI 0487F.b), according to the formula I get:

levels = DIV_ROUND_UP(40 - 12, 12 - 3) = DIV_ROUND_UP(28, 9) = 4
return 4 - 4 = 0

which means the resulting starting level is 0 instead of 1.

> +}
> +
> +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> +{
> +	u64 shift = kvm_granule_shift(level);
> +	u64 mask = BIT(PAGE_SHIFT - 3) - 1;

This doesn't seem to take into account the fact that we can have concatenated
initial page tables.

> +
> +	return (data->addr >> shift) & mask;
> +}
> +
> +static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
> +{
> +	u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
> +	u64 mask = BIT(pgt->ia_bits) - 1;
> +
> +	return (addr & mask) >> shift;
> +}
> +
> +static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
> +{
> +	return __kvm_pgd_page_idx(data->pgt, data->addr);
> +}
> +
> +static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
> +{
> +	struct kvm_pgtable pgt = {
> +		.ia_bits	= ia_bits,
> +		.start_level	= start_level,
> +	};
> +
> +	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
> +}
> +
> +static bool kvm_pte_valid(kvm_pte_t pte)
> +{
> +	return pte & KVM_PTE_VALID;
> +}
> +
> +static bool kvm_pte_table(kvm_pte_t pte, u32 level)
> +{
> +	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
> +		return false;
> +
> +	if (!kvm_pte_valid(pte))
> +		return false;
> +
> +	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
> +}
> +
> +static u64 kvm_pte_to_phys(kvm_pte_t pte)
> +{
> +	u64 pa = pte & KVM_PTE_ADDR_MASK;
> +
> +	if (PAGE_SHIFT == 16)
> +		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
> +
> +	return pa;
> +}
> +
> +static kvm_pte_t kvm_phys_to_pte(u64 pa)
> +{
> +	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
> +
> +	if (PAGE_SHIFT == 16)
> +		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
> +
> +	return pte;
> +}
> +
> +static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
> +{
> +	return __va(kvm_pte_to_phys(pte));
> +}
> +
> +static void kvm_set_invalid_pte(kvm_pte_t *ptep)
> +{
> +	kvm_pte_t pte = 0;
> +	WRITE_ONCE(*ptep, pte);
> +}
> +
> +static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> +{
> +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
> +
> +	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
> +	pte |= KVM_PTE_VALID;
> +
> +	WARN_ON(kvm_pte_valid(old));
> +	smp_store_release(ptep, pte);
> +}
> +
> +static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> +				   u32 level)
> +{
> +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> +	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> +							   KVM_PTE_TYPE_BLOCK;
> +
> +	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> +	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> +	pte |= KVM_PTE_VALID;
> +
> +	/* Tolerate KVM recreating the exact same mapping. */
> +	if (kvm_pte_valid(old))
> +		return old == pte;
> +
> +	smp_store_release(ptep, pte);
> +	return true;
> +}
> +
> +static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> +				  u32 level, kvm_pte_t *ptep,
> +				  enum kvm_pgtable_walk_flags flag)
> +{
> +	struct kvm_pgtable_walker *walker = data->walker;
> +	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
> +}
> +
> +static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> +			      kvm_pte_t *pgtable, u32 level);
> +
> +static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +				      kvm_pte_t *ptep, u32 level)
> +{
> +	int ret = 0;
> +	u64 addr = data->addr;
> +	kvm_pte_t *childp, pte = *ptep;
> +	bool table = kvm_pte_table(pte, level);
> +	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +
> +	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> +					     KVM_PGTABLE_WALK_TABLE_PRE);

I see that below we check whether the visitor modified the leaf entry and turned it
into a table. Is a visitor not allowed to turn a table into a block mapping?

> +	}
> +
> +	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> +					     KVM_PGTABLE_WALK_LEAF);
> +		pte = *ptep;
> +		table = kvm_pte_table(pte, level);
> +	}
> +
> +	if (ret)
> +		goto out;
> +
> +	if (!table) {
> +		data->addr += kvm_granule_size(level);
> +		goto out;
> +	}
> +
> +	childp = kvm_pte_follow(pte);
> +	ret = __kvm_pgtable_walk(data, childp, level + 1);
> +	if (ret)
> +		goto out;
> +
> +	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {

We check that ptep is a valid table when we test the KVM_PGTABLE_WALK_TABLE_PRE
flag, why aren't we doing that here?

> +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> +					     KVM_PGTABLE_WALK_TABLE_POST);
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> +			      kvm_pte_t *pgtable, u32 level)
> +{
> +	u32 idx;
> +	int ret = 0;
> +
> +	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
> +		return -EINVAL;
> +
> +	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
> +		kvm_pte_t *ptep = &pgtable[idx];
> +
> +		if (data->addr >= data->end)
> +			break;
> +
> +		ret = __kvm_pgtable_visit(data, ptep, level);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
> +{
> +	u32 idx;
> +	int ret = 0;
> +	struct kvm_pgtable *pgt = data->pgt;
> +	u64 limit = BIT(pgt->ia_bits);
> +
> +	if (data->addr > limit || data->end > limit)
> +		return -ERANGE;
> +
> +	if (!pgt->pgd)
> +		return -EINVAL;
> +
> +	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
> +		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];

I'm sorry, but I just don't understand this part:

- Why do we skip over PTRS_PER_PTE instead of visiting each idx?

- Why do we use PTRS_PER_PTE instead of PTRS_PER_PGD?

Would you mind explaining what the loop is doing?

I also don't see anywhere in the page table walking code where we take into
account that we can have concatenated tables at level 1 or 2, which means we have
more entries than PTRS_PER_P{U,M}D.

> +
> +		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +		     struct kvm_pgtable_walker *walker)
> +{
> +	struct kvm_pgtable_walk_data walk_data = {
> +		.pgt	= pgt,
> +		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
> +		.end	= PAGE_ALIGN(walk_data.addr + size),

Shouldn't that be .end = PAGE_ALIGN(addr + size)? For example, for addr =
2 * PAGE_SIZE - 1 and size = PAGE_SIZE, PAGE_ALIGN(addr + size) = 3 * PAGE_SIZE,
but PAGE_ALIGN(walk_data.addr + size) = 2 * PAGE_SIZE.

What happens if addr < PAGE_SIZE - 1? It looks to me that according to the
definition of ALIGN_DOWN, addr will wrap around.

Thanks,

Alex

> +		.walker	= walker,
> +	};
> +
> +	return _kvm_pgtable_walk(&walk_data);
> +}


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/21] KVM: arm64: Add support for creating kernel-agnostic stage-1 page tables
  2020-08-25  9:39 ` [PATCH v3 03/21] KVM: arm64: Add support for creating kernel-agnostic stage-1 page tables Will Deacon
@ 2020-08-28 15:35   ` Alexandru Elisei
  2020-09-02 10:06     ` Will Deacon
  0 siblings, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-08-28 15:35 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

On 8/25/20 10:39 AM, Will Deacon wrote:
> The generic page-table walker is pretty useless as it stands, because it
> doesn't understand enough to allocate anything. Teach it about stage-1
> page-tables, and hook up an API for allocating these for the hypervisor
> at EL2.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  34 +++++++
>  arch/arm64/kvm/hyp/pgtable.c         | 131 +++++++++++++++++++++++++++
>  2 files changed, 165 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 51ccbbb0efae..ec9f98527dcc 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -77,6 +77,40 @@ struct kvm_pgtable_walker {
>  	const enum kvm_pgtable_walk_flags	flags;
>  };
>  
> +/**
> + * kvm_pgtable_hyp_init() - Initialise a hypervisor stage-1 page-table.
> + * @pgt:	Uninitialised page-table structure to initialise.
> + * @va_bits:	Maximum virtual address bits.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits);
> +
> +/**
> + * kvm_pgtable_hyp_destroy() - Destroy an unused hypervisor stage-1 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_hyp_init().
> + *
> + * The page-table is assumed to be unreachable by any hardware walkers prior
> + * to freeing and therefore no TLB invalidation is performed.
> + */
> +void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
> +
> +/**
> + * kvm_pgtable_hyp_map() - Install a mapping in a hypervisor stage-1 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_hyp_init().
> + * @addr:	Virtual address at which to place the mapping.
> + * @size:	Size of the mapping.
> + * @phys:	Physical address of the memory to map.
> + * @prot:	Permissions and attributes for the mapping.
> + *
> + * If device attributes are not explicitly requested in @prot, then the
> + * mapping will be normal, cacheable.
> + *
> + * Return: 0 on success, negative error code on failure.

From my understanding of the code, when the caller replaces an existing leaf entry
or a table with a different one, KVM will print a warning instead of using
break-before-make (if necessary). It might be worth pointing out that callers are
expected not to do that, because it's not immediately obvious.
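
Something along these lines in the kerneldoc is roughly what I had in mind (my
wording, just a sketch):

	 * The mapping is installed without break-before-make, so this function
	 * must not be used to change an existing mapping: if a valid entry is
	 * found which disagrees with the new mapping, KVM will WARN. Recreating
	 * the exact same mapping is tolerated.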

> + */
> +int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
> +			enum kvm_pgtable_prot prot);
> +
>  /**
>   * kvm_pgtable_walk() - Walk a page-table.
>   * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 462001bbe028..d75166823ad9 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -24,8 +24,18 @@
>  
>  #define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
>  
> +#define KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX	GENMASK(4, 2)
> +#define KVM_PTE_LEAF_ATTR_LO_S1_AP	GENMASK(7, 6)
> +#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RO	3
> +#define KVM_PTE_LEAF_ATTR_LO_S1_AP_RW	1
> +#define KVM_PTE_LEAF_ATTR_LO_S1_SH	GENMASK(9, 8)
> +#define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
> +#define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
> +
>  #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
>  
> +#define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)

I compared the macros to the Arm ARM attribute fields in stage 1 VMSAv8-64 block
and page descriptors, and they match.

I looked at the algorithm below, and for what it's worth it looks alright to me.

Thanks,

Alex

> +
>  struct kvm_pgtable_walk_data {
>  	struct kvm_pgtable		*pgt;
>  	struct kvm_pgtable_walker	*walker;
> @@ -288,3 +298,124 @@ int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>  
>  	return _kvm_pgtable_walk(&walk_data);
>  }
> +
> +struct hyp_map_data {
> +	u64		phys;
> +	kvm_pte_t	attr;
> +};
> +
> +static int hyp_map_set_prot_attr(enum kvm_pgtable_prot prot,
> +				 struct hyp_map_data *data)
> +{
> +	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> +	u32 mtype = device ? MT_DEVICE_nGnRE : MT_NORMAL;
> +	kvm_pte_t attr = FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_ATTRIDX, mtype);
> +	u32 sh = KVM_PTE_LEAF_ATTR_LO_S1_SH_IS;
> +	u32 ap = (prot & KVM_PGTABLE_PROT_W) ? KVM_PTE_LEAF_ATTR_LO_S1_AP_RW :
> +					       KVM_PTE_LEAF_ATTR_LO_S1_AP_RO;
> +
> +	if (!(prot & KVM_PGTABLE_PROT_R))
> +		return -EINVAL;
> +
> +	if (prot & KVM_PGTABLE_PROT_X) {
> +		if (prot & KVM_PGTABLE_PROT_W)
> +			return -EINVAL;
> +
> +		if (device)
> +			return -EINVAL;
> +	} else {
> +		attr |= KVM_PTE_LEAF_ATTR_HI_S1_XN;
> +	}
> +
> +	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_AP, ap);
> +	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S1_SH, sh);
> +	attr |= KVM_PTE_LEAF_ATTR_LO_S1_AF;
> +	data->attr = attr;
> +	return 0;
> +}
> +
> +static bool hyp_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> +				    kvm_pte_t *ptep, struct hyp_map_data *data)
> +{
> +	u64 granule = kvm_granule_size(level), phys = data->phys;
> +
> +	if (!kvm_block_mapping_supported(addr, end, phys, level))
> +		return false;
> +
> +	WARN_ON(!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level));
> +	data->phys += granule;
> +	return true;
> +}
> +
> +static int hyp_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			  enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> +	kvm_pte_t *childp;
> +
> +	if (hyp_map_walker_try_leaf(addr, end, level, ptep, arg))
> +		return 0;
> +
> +	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +		return -EINVAL;
> +
> +	childp = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
> +	if (!childp)
> +		return -ENOMEM;
> +
> +	kvm_set_table_pte(ptep, childp);
> +	return 0;
> +}
> +
> +int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
> +			enum kvm_pgtable_prot prot)
> +{
> +	int ret;
> +	struct hyp_map_data map_data = {
> +		.phys	= ALIGN_DOWN(phys, PAGE_SIZE),
> +	};
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= hyp_map_walker,
> +		.flags	= KVM_PGTABLE_WALK_LEAF,
> +		.arg	= &map_data,
> +	};
> +
> +	ret = hyp_map_set_prot_attr(prot, &map_data);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	dsb(ishst);
> +	isb();
> +	return ret;
> +}
> +
> +int kvm_pgtable_hyp_init(struct kvm_pgtable *pgt, u32 va_bits)
> +{
> +	pgt->pgd = (kvm_pte_t *)get_zeroed_page(GFP_KERNEL);
> +	if (!pgt->pgd)
> +		return -ENOMEM;
> +
> +	pgt->ia_bits		= va_bits;
> +	pgt->start_level	= kvm_start_level(va_bits);
> +	pgt->mmu		= NULL;
> +	return 0;
> +}
> +
> +static int hyp_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			   enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> +	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	return 0;
> +}
> +
> +void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
> +{
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= hyp_free_walker,
> +		.flags	= KVM_PGTABLE_WALK_TABLE_POST,
> +	};
> +
> +	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> +	free_page((unsigned long)pgt->pgd);
> +	pgt->pgd = NULL;
> +}


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-08-27 16:27   ` Alexandru Elisei
@ 2020-08-28 15:43     ` Alexandru Elisei
  2020-09-02 10:36     ` Will Deacon
  1 sibling, 0 replies; 86+ messages in thread
From: Alexandru Elisei @ 2020-08-28 15:43 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi,

I've had another good look at the code, and now I can answer some of my own
questions. Sorry for the noise!

On 8/27/20 5:27 PM, Alexandru Elisei wrote:
> [..]
> +
> +	if (!table) {
> +		data->addr += kvm_granule_size(level);
> +		goto out;
> +	}
> +
> +	childp = kvm_pte_follow(pte);
> +	ret = __kvm_pgtable_walk(data, childp, level + 1);
> +	if (ret)
> +		goto out;
> +
> +	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
> We check that ptep is a valid table when we test the KVM_PGTABLE_WALK_TABLE_PRE
> flag, why aren't we doing that here?

That's because the function goes to out if the leaf visitor didn't turn the leaf
entry into a table.

>
>> +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
>> +					     KVM_PGTABLE_WALK_TABLE_POST);
>> +	}
>> +
>> +out:
>> +	return ret;
>> +}
>> +
>> [..]
>> +}
>> +
>> +static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>> +{
>> +	u32 idx;
>> +	int ret = 0;
>> +	struct kvm_pgtable *pgt = data->pgt;
>> +	u64 limit = BIT(pgt->ia_bits);
>> +
>> +	if (data->addr > limit || data->end > limit)
>> +		return -ERANGE;
>> +
>> +	if (!pgt->pgd)
>> +		return -EINVAL;
>> +
>> +	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>> +		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
> I'm sorry, but I just don't understand this part:
>
> - Why do we skip over PTRS_PER_PTE instead of visiting each idx?
>
> - Why do we use PTRS_PER_PTE instead of PTRS_PER_PGD?
>
> Would you mind explaining what the loop is doing?
>
> I also don't see anywhere in the page table walking code where we take into
> account that we can have concatenated tables at level 1 or 2, which means we have
> more entries than PTRS_PER_P{U,M}D.

I think I understand the code better now: __kvm_pgtable_walk will visit all
entries in the range ptep[0..PTRS_PER_PTE-1], which is why we advance by
PTRS_PER_PTE on each iteration.
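
To spell out what I was missing: with 4K pages, a 40-bit IPA and start_level == 1
(so kvm_pgd_pages() == 2), the concatenated PGD looks roughly like this (my own
sketch, assuming I've now read __kvm_pgd_page_idx() correctly):

	pgd[0]   .. pgd[511]		/* level-1 page covering IPA bit 39 == 0 */
	pgd[512] .. pgd[1023]		/* level-1 page covering IPA bit 39 == 1 */

so &pgt->pgd[idx * PTRS_PER_PTE] is the base of the idx'th concatenated page, and
each call to __kvm_pgtable_walk() handles the PTRS_PER_PTE entries within it.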

>
>> +
>> +		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
>> +		if (ret)
>> +			break;
>> +	}
>> +
>> +	return ret;
>> +}
>> +
>> +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
>> +		     struct kvm_pgtable_walker *walker)
>> +{
>> +	struct kvm_pgtable_walk_data walk_data = {
>> +		.pgt	= pgt,
>> +		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
>> +		.end	= PAGE_ALIGN(walk_data.addr + size),
> [..]
>
> What happens if addr < PAGE_SIZE - 1? It looks to me that according to the
> definition of ALIGN_DOWN, addr will wrap around.

My mistake again: ALIGN_DOWN will subtract PAGE_SIZE - 1, but __ALIGN_KERNEL will
add PAGE_SIZE - 1 back, and the result is what we expect (no wrapping around).
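
Writing out the expansion (quoting the macros from include/linux/kernel.h from
memory, so please double-check):

	ALIGN_DOWN(addr, PAGE_SIZE)
		= __ALIGN_KERNEL(addr - (PAGE_SIZE - 1), PAGE_SIZE)
		= ((addr - (PAGE_SIZE - 1)) + (PAGE_SIZE - 1)) & ~(PAGE_SIZE - 1)
		= addr & PAGE_MASK

so even if the intermediate subtraction wraps for addr < PAGE_SIZE - 1, adding
PAGE_SIZE - 1 back undoes it before the mask is applied.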

Thanks,

Alex



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-08-25  9:39 ` [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure Will Deacon
  2020-08-27 16:27   ` Alexandru Elisei
@ 2020-08-28 15:51   ` Alexandru Elisei
  2020-09-02 10:49     ` Will Deacon
  2020-09-02  6:31   ` Gavin Shan
  2 siblings, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-08-28 15:51 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

On 8/25/20 10:39 AM, Will Deacon wrote:
> [..]
> +static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> +{
> +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
> +
> +	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
> +	pte |= KVM_PTE_VALID;
> +
> +	WARN_ON(kvm_pte_valid(old));
> +	smp_store_release(ptep, pte);
> +}
> +
> +static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> +				   u32 level)
> +{
> +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> +	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> +							   KVM_PTE_TYPE_BLOCK;
> +
> +	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> +	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> +	pte |= KVM_PTE_VALID;
> +
> +	/* Tolerate KVM recreating the exact same mapping. */
> +	if (kvm_pte_valid(old))
> +		return old == pte;
> +
> +	smp_store_release(ptep, pte);
> +	return true;
> +}

These two functions look inconsistent to me - we refuse to update a valid leaf
entry with a new value, but we allow updating a valid table. Is there something
that I'm not taking into account?

Thanks,

Alex



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables
  2020-08-25  9:39 ` [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables Will Deacon
@ 2020-08-28 16:32   ` Alexandru Elisei
  2020-09-02 11:35     ` Will Deacon
  0 siblings, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-08-28 16:32 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

The code looks much nicer with the EL2 page-table allocator. One minor nitpick below.

On 8/25/20 10:39 AM, Will Deacon wrote:
> Now that we have a shiny new page-table allocator, replace the hyp
> page-table code with calls into the new API. This also allows us to
> remove the extended idmap code, as we can now simply ensure that the
> VA size is large enough to map everything we need.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_mmu.h       |  78 +----
>  arch/arm64/include/asm/kvm_pgtable.h   |   5 +
>  arch/arm64/include/asm/pgtable-hwdef.h |   6 -
>  arch/arm64/include/asm/pgtable-prot.h  |   6 -
>  arch/arm64/kvm/mmu.c                   | 414 +++----------------------
>  5 files changed, 45 insertions(+), 464 deletions(-)
>
> [..]
> @@ -2356,6 +2028,7 @@ static int kvm_map_idmap_text(pgd_t *pgd)
>  int kvm_mmu_init(void)
>  {
>  	int err;
> +	u32 hyp_va_bits;
>  
>  	hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
>  	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
> @@ -2369,6 +2042,8 @@ int kvm_mmu_init(void)
>  	 */
>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>  
> +	hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);

idmap_t0sz is defined in mm/mmu.c as TCR_T0SZ(VA_BITS) = (UL(64) - VA_BITS) <<
TCR_T0SZ_OFFSET, so the expression above works out to 64 - (64 - VA_BITS) =
VA_BITS. Looks to me like hyp_va_bits == VA_BITS.

Thanks,
Alex


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-08-27 16:26 ` [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Alexandru Elisei
@ 2020-09-01 16:15   ` Will Deacon
  0 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-01 16:15 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

Hi Alex,

On Thu, Aug 27, 2020 at 05:26:01PM +0100, Alexandru Elisei wrote:
> I've been looking into pinning guest memory for KVM SPE, so I like to think that
> the stage 2 page table code is not entirely alien to me. I'll do my best to review
> the series, I hope you'll find it useful.

Just wanted to say a huge "thank you!" for having a look. I'll get to your
comments later this week, but I'm a bit snowed under after LPC and the
bank holiday at the moment, so please bear with me.

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-08-25  9:39 ` [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table Will Deacon
@ 2020-09-01 16:24   ` Alexandru Elisei
  2020-09-02 11:46     ` Will Deacon
  2020-09-03  2:57   ` Gavin Shan
  2020-09-03 11:18   ` Gavin Shan
  2 siblings, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-01 16:24 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

On 8/25/20 10:39 AM, Will Deacon wrote:
> Add stage-2 map() and unmap() operations to the generic page-table code.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
>  arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
>  2 files changed, 301 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3389f978d573..8ab0d5f43817 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -134,6 +134,45 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
>   */
>  void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>  
> +/**
> + * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address at which to place the mapping.
> + * @size:	Size of the mapping.
> + * @phys:	Physical address of the memory to map.
> + * @prot:	Permissions and attributes for the mapping.
> + * @mc:		Cache of pre-allocated GFP_PGTABLE_USER memory from which to
> + *		allocate page-table pages.
> + *
> + * If device attributes are not explicitly requested in @prot, then the
> + * mapping will be normal, cacheable.
> + *
> + * Note that this function will both coalesce existing table entries and split
> + * existing block mappings, relying on page-faults to fault back areas outside
> + * of the new mapping lazily.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +			   u64 phys, enum kvm_pgtable_prot prot,
> +			   struct kvm_mmu_memory_cache *mc);
> +
> +/**
> + * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address from which to remove the mapping.
> + * @size:	Size of the mapping.
> + *
> + * TLB invalidation is performed for each page-table entry cleared during the
> + * unmapping operation and the reference count for the page-table page
> + * containing the cleared entry is decremented, with unreferenced pages being
> + * freed. Unmapping a cacheable page will ensure that it is clean to the PoC if
> + * FWB is not supported by the CPU.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +
>  /**
>   * kvm_pgtable_walk() - Walk a page-table.
>   * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b8550ccaef4d..41ee8f3c0369 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -32,10 +32,19 @@
>  #define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
>  #define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
>  
> +#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
> +#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
> +
>  #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
>  
>  #define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
>  
> +#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)

Checked the bitfields against ARM DDI 0487F.b, they match.

> +
>  struct kvm_pgtable_walk_data {
>  	struct kvm_pgtable		*pgt;
>  	struct kvm_pgtable_walker	*walker;
> @@ -420,6 +429,259 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>  	pgt->pgd = NULL;
>  }
>  
> +struct stage2_map_data {
> +	u64				phys;
> +	kvm_pte_t			attr;
> +
> +	kvm_pte_t			*anchor;
> +
> +	struct kvm_s2_mmu		*mmu;
> +	struct kvm_mmu_memory_cache	*memcache;
> +};
> +
> +static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
> +{
> +	kvm_pte_t *ptep = NULL;
> +	struct kvm_mmu_memory_cache *mc = data->memcache;
> +
> +	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
> +	if (mc && mc->nobjs)
> +		ptep = mc->objects[--mc->nobjs];
> +
> +	return ptep;
> +}
> +
> +static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> +				    struct stage2_map_data *data)
> +{
> +	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> +	kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> +			    PAGE_S2_MEMATTR(NORMAL);
> +	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
> +
> +	if (!(prot & KVM_PGTABLE_PROT_X))
> +		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
> +	else if (device)
> +		return -EINVAL;
> +
> +	if (prot & KVM_PGTABLE_PROT_R)
> +		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
> +
> +	if (prot & KVM_PGTABLE_PROT_W)
> +		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
> +
> +	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> +	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
> +	data->attr = attr;
> +	return 0;
> +}
> +
> +static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> +				       kvm_pte_t *ptep,
> +				       struct stage2_map_data *data)
> +{
> +	u64 granule = kvm_granule_size(level), phys = data->phys;
> +
> +	if (!kvm_block_mapping_supported(addr, end, phys, level))
> +		return false;
> +
> +	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> +		goto out;
> +
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);

One has to read the kvm_set_valid_leaf_pte code very carefully to understand why
we're doing the above (we found an old, valid entry in the stage 2 table and the
page tables are in use, so we're doing break-before-make to replace it with the
new one), especially since we don't do this for the hyp tables. Perhaps a comment
explaining what's happening would be useful.
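
Something like the below is roughly what I had in mind (only the comment is new,
and the wording is just a sketch):

	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
		goto out;

	/*
	 * There is an existing, valid leaf entry here which disagrees with the
	 * new mapping and the page-table may be live, so replace the entry with
	 * break-before-make: invalidate it and flush the TLB before installing
	 * the new mapping.
	 */
	kvm_set_invalid_pte(ptep);
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);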

> +out:
> +	data->phys += granule;
> +	return true;
> +}
> +
> +static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> +				     kvm_pte_t *ptep,
> +				     struct stage2_map_data *data)
> +{
> +	if (data->anchor)
> +		return 0;
> +
> +	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
> +		return 0;
> +
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	data->anchor = ptep;
> +	return 0;
> +}
> +
> +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +				struct stage2_map_data *data)
> +{
> +	kvm_pte_t *childp, pte = *ptep;
> +	struct page *page = virt_to_page(ptep);
> +
> +	if (data->anchor) {
> +		if (kvm_pte_valid(pte))
> +			put_page(page);
> +
> +		return 0;
> +	}
> +
> +	if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
> +		goto out_get_page;
> +
> +	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +		return -EINVAL;
> +
> +	childp = stage2_memcache_alloc_page(data);
> +	if (!childp)
> +		return -ENOMEM;
> +
> +	/*
> +	 * If we've run into an existing block mapping then replace it with
> +	 * a table. Accesses beyond 'end' that fall within the new table
> +	 * will be mapped lazily.
> +	 */
> +	if (kvm_pte_valid(pte)) {
> +		kvm_set_invalid_pte(ptep);
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +		put_page(page);
> +	}
> +
> +	kvm_set_table_pte(ptep, childp);
> +
> +out_get_page:
> +	get_page(page);
> +	return 0;
> +}
> +
> +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> +				      kvm_pte_t *ptep,
> +				      struct stage2_map_data *data)
> +{
> +	int ret = 0;
> +
> +	if (!data->anchor)
> +		return 0;
> +
> +	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	put_page(virt_to_page(ptep));
> +
> +	if (data->anchor == ptep) {
> +		data->anchor = NULL;
> +		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +	}
> +
> +	return ret;
> +}
> +
> +static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			     enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> +	struct stage2_map_data *data = arg;
> +
> +	switch (flag) {
> +	case KVM_PGTABLE_WALK_TABLE_PRE:
> +		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +	case KVM_PGTABLE_WALK_LEAF:
> +		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +	case KVM_PGTABLE_WALK_TABLE_POST:
> +		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +	}
> +
> +	return -EINVAL;
> +}

As I understand the algorithm, each of the pre, leaf and post functions does two
different things: 1. free/invalidate the tables/leaf entries if we can create a
block mapping at a previously visited level (stage2_map_data->anchor != NULL); and
2. create an entry for the range at the correct level. To be honest, this hasn't
been obvious to me from the code, and I think some comments on the functions, and
especially on the anchor field of stage2_map_data, would go a long way towards
making it easier for others to understand the code.
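
For the anchor field in particular, something along these lines (my wording, only
a sketch) would have saved me some head-scratching:

	/*
	 * Set by the TABLE_PRE visitor to the table entry it invalidated because
	 * the whole range covered by that entry can be replaced with a block
	 * mapping. While non-NULL, the LEAF visitor only drops references on the
	 * entries of the soon-to-be-freed tables underneath it; the matching
	 * TABLE_POST visitor then installs the block mapping and clears the
	 * anchor.
	 */
	kvm_pte_t			*anchor;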

With that in mind, the functions look solid to me: every get_page has a
corresponding put_page in stage2_map_walk_leaf or in the unmap walker, and the
algorithm looks sound. I still want to re-read the functions a few times (probably
in the next iteration) because they're definitely not trivial and I don't want to
miss something.

One nitpick below.

> +
> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +			   u64 phys, enum kvm_pgtable_prot prot,
> +			   struct kvm_mmu_memory_cache *mc)
> +{
> +	int ret;
> +	struct stage2_map_data map_data = {
> +		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
> +		.mmu		= pgt->mmu,
> +		.memcache	= mc,
> +	};
> +	struct kvm_pgtable_walker walker = {
> +		.cb		= stage2_map_walker,
> +		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
> +				  KVM_PGTABLE_WALK_LEAF |
> +				  KVM_PGTABLE_WALK_TABLE_POST,
> +		.arg		= &map_data,
> +	};
> +
> +	ret = stage2_map_set_prot_attr(prot, &map_data);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	dsb(ishst);
> +	return ret;
> +}
> +
> +static void stage2_flush_dcache(void *addr, u64 size)
> +{
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +		return;
> +
> +	__flush_dcache_area(addr, size);
> +}
> +
> +static bool stage2_pte_cacheable(kvm_pte_t pte)
> +{
> +	u64 memattr = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR, pte);
> +	return memattr == PAGE_S2_MEMATTR(NORMAL);
> +}
> +
> +static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			       enum kvm_pgtable_walk_flags flag,
> +			       void * const arg)
> +{
> +	struct kvm_s2_mmu *mmu = arg;
> +	kvm_pte_t pte = *ptep, *childp = NULL;
> +	bool need_flush = false;
> +
> +	if (!kvm_pte_valid(pte))
> +		return 0;
> +
> +	if (kvm_pte_table(pte, level)) {
> +		childp = kvm_pte_follow(pte);
> +
> +		if (page_count(virt_to_page(childp)) != 1)
> +			return 0;
> +	} else if (stage2_pte_cacheable(pte)) {
> +		need_flush = true;
> +	}
> +
> +	/*
> +	 * This is similar to the map() path in that we unmap the entire
> +	 * block entry and rely on the remaining portions being faulted
> +	 * back lazily.
> +	 */
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +	put_page(virt_to_page(ptep));
> +
> +	if (need_flush) {
> +		stage2_flush_dcache(kvm_pte_follow(pte),
> +				    kvm_granule_size(level));
> +	}

The curly braces are unnecessary; I'm only mentioning it because you don't use
them for the rest of the one-line if statements in this function.

Thanks,

Alex

> +
> +	if (childp)
> +		free_page((unsigned long)childp);
> +
> +	return 0;
> +}
> +
> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> +{
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= stage2_unmap_walker,
> +		.arg	= pgt->mmu,
> +		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> +	};
> +
> +	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +}
> +
>  int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
>  {
>  	size_t pgd_sz;


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API Will Deacon
@ 2020-09-01 17:08   ` Alexandru Elisei
  2020-09-02 11:48     ` Will Deacon
  2020-09-03  3:57   ` Gavin Shan
  1 sibling, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-01 17:08 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

The patch looks correct to me. I also had another look at the pre-order visitor
for kvm_pgtable_stage2_map, and it will not try to map the address range using a
block mapping (each call here only covers PAGE_SIZE, so kvm_block_mapping_supported
returns false at every block level).

One nitpick below.

On 8/25/20 10:39 AM, Will Deacon wrote:
> Convert kvm_phys_addr_ioremap() to use kvm_pgtable_stage2_map() instead
> of stage2_set_pte().
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 14 +-------------
>  arch/arm64/kvm/mmu.c         | 29 ++++++++++++-----------------
>  2 files changed, 13 insertions(+), 30 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 41ee8f3c0369..6f65d3841ec9 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -439,18 +439,6 @@ struct stage2_map_data {
>  	struct kvm_mmu_memory_cache	*memcache;
>  };
>  
> -static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
> -{
> -	kvm_pte_t *ptep = NULL;
> -	struct kvm_mmu_memory_cache *mc = data->memcache;
> -
> -	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
> -	if (mc && mc->nobjs)
> -		ptep = mc->objects[--mc->nobjs];
> -
> -	return ptep;
> -}
> -
>  static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
>  				    struct stage2_map_data *data)
>  {
> @@ -531,7 +519,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>  	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
>  		return -EINVAL;
>  
> -	childp = stage2_memcache_alloc_page(data);
> +	childp = kvm_mmu_memory_cache_alloc(data->memcache);

I think this hunk and the one above could have been squashed into the previous
patch; we could have used kvm_mmu_memory_cache_alloc directly from the start.

Thanks,

Alex

>  	if (!childp)
>  		return -ENOMEM;
>  
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 4607e9ca60a2..33146d3dc93a 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1154,35 +1154,30 @@ static int stage2_pudp_test_and_clear_young(pud_t *pud)
>  int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  			  phys_addr_t pa, unsigned long size, bool writable)
>  {
> -	phys_addr_t addr, end;
> +	phys_addr_t addr;
>  	int ret = 0;
> -	unsigned long pfn;
>  	struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> +	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
> +				     KVM_PGTABLE_PROT_R |
> +				     (writable ? KVM_PGTABLE_PROT_W : 0);
>  
> -	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
> -	pfn = __phys_to_pfn(pa);
> -
> -	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
> -		pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE);
> -
> -		if (writable)
> -			pte = kvm_s2pte_mkwrite(pte);
> -
> +	for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
>  		ret = kvm_mmu_topup_memory_cache(&cache,
>  						 kvm_mmu_cache_min_pages(kvm));
>  		if (ret)
> -			goto out;
> +			break;
> +
>  		spin_lock(&kvm->mmu_lock);
> -		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
> -				     KVM_S2PTE_FLAG_IS_IOMAP);
> +		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
> +					     &cache);
>  		spin_unlock(&kvm->mmu_lock);
>  		if (ret)
> -			goto out;
> +			break;
>  
> -		pfn++;
> +		pa += PAGE_SIZE;
>  	}
>  
> -out:
>  	kvm_mmu_free_memory_cache(&cache);
>  	return ret;
>  }


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-08-25  9:39 ` [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure Will Deacon
  2020-08-27 16:27   ` Alexandru Elisei
  2020-08-28 15:51   ` Alexandru Elisei
@ 2020-09-02  6:31   ` Gavin Shan
  2020-09-02 11:02     ` Will Deacon
  2 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-02  6:31 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> The KVM page-table code is intricately tied into the kernel page-table
> code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
> to reduce code duplication. Unfortunately, the reality is that there is
> an awful lot of code required to make this work, and at the end of the
> day you're limited to creating page-tables with the same configuration
> as the host kernel. Furthermore, lifting the page-table code to run
> directly at EL2 on a non-VHE system (as we plan to to do in future
> patches) is practically impossible due to the number of dependencies it
> has on the core kernel.
> 
> Introduce a framework for walking Armv8 page-tables configured
> independently from the host kernel.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
>   arch/arm64/kvm/hyp/Makefile          |   2 +-
>   arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
>   3 files changed, 392 insertions(+), 1 deletion(-)
>   create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
>   create mode 100644 arch/arm64/kvm/hyp/pgtable.c
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> new file mode 100644
> index 000000000000..51ccbbb0efae
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -0,0 +1,101 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2020 Google LLC
> + * Author: Will Deacon <will@kernel.org>
> + */
> +
> +#ifndef __ARM64_KVM_PGTABLE_H__
> +#define __ARM64_KVM_PGTABLE_H__
> +
> +#include <linux/bits.h>
> +#include <linux/kvm_host.h>
> +#include <linux/types.h>
> +
> +typedef u64 kvm_pte_t;
> +
> +/**
> + * struct kvm_pgtable - KVM page-table.
> + * @ia_bits:		Maximum input address size, in bits.
> + * @start_level:	Level at which the page-table walk starts.
> + * @pgd:		Pointer to the first top-level entry of the page-table.
> + * @mmu:		Stage-2 KVM MMU struct. Unused for stage-1 page-tables.
> + */
> +struct kvm_pgtable {
> +	u32					ia_bits;
> +	u32					start_level;
> +	kvm_pte_t				*pgd;
> +
> +	/* Stage-2 only */
> +	struct kvm_s2_mmu			*mmu;
> +};
> +
> +/**
> + * enum kvm_pgtable_prot - Page-table permissions and attributes.
> + * @KVM_PGTABLE_PROT_R:		Read permission.
> + * @KVM_PGTABLE_PROT_W:		Write permission.
> + * @KVM_PGTABLE_PROT_X:		Execute permission.
> + * @KVM_PGTABLE_PROT_DEVICE:	Device attributes.
> + */
> +enum kvm_pgtable_prot {
> +	KVM_PGTABLE_PROT_R			= BIT(0),
> +	KVM_PGTABLE_PROT_W			= BIT(1),
> +	KVM_PGTABLE_PROT_X			= BIT(2),
> +
> +	KVM_PGTABLE_PROT_DEVICE			= BIT(3),
> +};
> +
> +/**
> + * enum kvm_pgtable_walk_flags - Flags to control a depth-first page-table walk.
> + * @KVM_PGTABLE_WALK_LEAF:		Visit leaf entries, including invalid
> + *					entries.
> + * @KVM_PGTABLE_WALK_TABLE_PRE:		Visit table entries before their
> + *					children.
> + * @KVM_PGTABLE_WALK_TABLE_POST:	Visit table entries after their
> + *					children.
> + */
> +enum kvm_pgtable_walk_flags {
> +	KVM_PGTABLE_WALK_LEAF			= BIT(0),
> +	KVM_PGTABLE_WALK_TABLE_PRE		= BIT(1),
> +	KVM_PGTABLE_WALK_TABLE_POST		= BIT(2),
> +};
> +
> +typedef int (*kvm_pgtable_visitor_fn_t)(u64 addr, u64 end, u32 level,
> +					kvm_pte_t *ptep,
> +					enum kvm_pgtable_walk_flags flag,
> +					void * const arg);
> +
> +/**
> + * struct kvm_pgtable_walker - Hook into a page-table walk.
> + * @cb:		Callback function to invoke during the walk.
> + * @arg:	Argument passed to the callback function.
> + * @flags:	Bitwise-OR of flags to identify the entry types on which to
> + *		invoke the callback function.
> + */
> +struct kvm_pgtable_walker {
> +	const kvm_pgtable_visitor_fn_t		cb;
> +	void * const				arg;
> +	const enum kvm_pgtable_walk_flags	flags;
> +};
> +
> +/**
> + * kvm_pgtable_walk() - Walk a page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> + * @addr:	Input address for the start of the walk.
> + * @size:	Size of the range to walk.
> + * @walker:	Walker callback description.
> + *
> + * The walker will walk the page-table entries corresponding to the input
> + * address range specified, visiting entries according to the walker flags.
> + * Invalid entries are treated as leaf entries. Leaf entries are reloaded
> + * after invoking the walker callback, allowing the walker to descend into
> + * a newly installed table.
> + *
> + * Returning a negative error code from the walker callback function will
> + * terminate the walk immediately with the same error code.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +		     struct kvm_pgtable_walker *walker);
> +
> +#endif	/* __ARM64_KVM_PGTABLE_H__ */
> diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
> index f54f0e89a71c..607b8a898826 100644
> --- a/arch/arm64/kvm/hyp/Makefile
> +++ b/arch/arm64/kvm/hyp/Makefile
> @@ -10,5 +10,5 @@ subdir-ccflags-y := -I$(incdir)				\
>   		    -DDISABLE_BRANCH_PROFILING		\
>   		    $(DISABLE_STACKLEAK_PLUGIN)
>   
> -obj-$(CONFIG_KVM) += vhe/ nvhe/
> +obj-$(CONFIG_KVM) += vhe/ nvhe/ pgtable.o
>   obj-$(CONFIG_KVM_INDIRECT_VECTORS) += smccc_wa.o
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> new file mode 100644
> index 000000000000..462001bbe028
> --- /dev/null
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -0,0 +1,290 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Stand-alone page-table allocator for hyp stage-1 and guest stage-2.
> + * No bombay mix was harmed in the writing of this file.
> + *
> + * Copyright (C) 2020 Google LLC
> + * Author: Will Deacon <will@kernel.org>
> + */
> +
> +#include <linux/bitfield.h>
> +#include <asm/kvm_pgtable.h>
> +
> +#define KVM_PGTABLE_MAX_LEVELS		4U
> +
> +#define KVM_PTE_VALID			BIT(0)
> +
> +#define KVM_PTE_TYPE			BIT(1)
> +#define KVM_PTE_TYPE_BLOCK		0
> +#define KVM_PTE_TYPE_PAGE		1
> +#define KVM_PTE_TYPE_TABLE		1
> +
> +#define KVM_PTE_ADDR_MASK		GENMASK(47, PAGE_SHIFT)
> +#define KVM_PTE_ADDR_51_48		GENMASK(15, 12)
> +
> +#define KVM_PTE_LEAF_ATTR_LO		GENMASK(11, 2)
> +
> +#define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
> +
> +struct kvm_pgtable_walk_data {
> +	struct kvm_pgtable		*pgt;
> +	struct kvm_pgtable_walker	*walker;
> +
> +	u64				addr;
> +	u64				end;
> +};
> +

Some of the following functions might be worth inlining, considering their
complexity :)

> +static u64 kvm_granule_shift(u32 level)
> +{
> +	return (KVM_PGTABLE_MAX_LEVELS - level) * (PAGE_SHIFT - 3) + 3;
> +}
> +
> +static u64 kvm_granule_size(u32 level)
> +{
> +	return BIT(kvm_granule_shift(level));
> +}
> +
> +static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> +{
> +	u64 granule = kvm_granule_size(level);
> +
> +	/*
> +	 * Reject invalid block mappings and don't bother with 4TB mappings for
> +	 * 52-bit PAs.
> +	 */
> +	if (level == 0 || (PAGE_SIZE != SZ_4K && level == 1))
> +		return false;
> +
> +	if (granule > (end - addr))
> +		return false;
> +
> +	return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
> +}
> +
> +static u32 kvm_start_level(u64 ia_bits)
> +{
> +	u64 levels = DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3);
> +	return KVM_PGTABLE_MAX_LEVELS - levels;
> +}
> +
> +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> +{
> +	u64 shift = kvm_granule_shift(level);
> +	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
> +
> +	return (data->addr >> shift) & mask;
> +}
> +
> +static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
> +{
> +	u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
> +	u64 mask = BIT(pgt->ia_bits) - 1;
> +
> +	return (addr & mask) >> shift;
> +}
> +
> +static u32 kvm_pgd_page_idx(struct kvm_pgtable_walk_data *data)
> +{
> +	return __kvm_pgd_page_idx(data->pgt, data->addr);
> +}
> +
> +static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
> +{
> +	struct kvm_pgtable pgt = {
> +		.ia_bits	= ia_bits,
> +		.start_level	= start_level,
> +	};
> +
> +	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
> +}
> +

It seems @pgt.start_level is assigned the wrong value here.
For example, @start_level is 2 when @ia_bits and PAGE_SIZE
are 40 and 64KB respectively. In this case, __kvm_pgd_page_idx()
always returns zero. However, the extra page covers up the
issue. I think something like the below might be needed:

	struct kvm_pgtable pgt = {
		.ia_bits	= ia_bits,
		.start_level	= KVM_PGTABLE_MAX_LEVELS - start_level + 1,
	};


> +static bool kvm_pte_valid(kvm_pte_t pte)
> +{
> +	return pte & KVM_PTE_VALID;
> +}
> +
> +static bool kvm_pte_table(kvm_pte_t pte, u32 level)
> +{
> +	if (level == KVM_PGTABLE_MAX_LEVELS - 1)
> +		return false;
> +
> +	if (!kvm_pte_valid(pte))
> +		return false;
> +
> +	return FIELD_GET(KVM_PTE_TYPE, pte) == KVM_PTE_TYPE_TABLE;
> +}
> +
> +static u64 kvm_pte_to_phys(kvm_pte_t pte)
> +{
> +	u64 pa = pte & KVM_PTE_ADDR_MASK;
> +
> +	if (PAGE_SHIFT == 16)
> +		pa |= FIELD_GET(KVM_PTE_ADDR_51_48, pte) << 48;
> +
> +	return pa;
> +}
> +
> +static kvm_pte_t kvm_phys_to_pte(u64 pa)
> +{
> +	kvm_pte_t pte = pa & KVM_PTE_ADDR_MASK;
> +
> +	if (PAGE_SHIFT == 16)
> +		pte |= FIELD_PREP(KVM_PTE_ADDR_51_48, pa >> 48);
> +
> +	return pte;
> +}
> +
> +static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
> +{
> +	return __va(kvm_pte_to_phys(pte));
> +}
> +
> +static void kvm_set_invalid_pte(kvm_pte_t *ptep)
> +{
> +	kvm_pte_t pte = 0;
> +	WRITE_ONCE(*ptep, pte);
> +}
> +
> +static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> +{
> +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
> +
> +	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
> +	pte |= KVM_PTE_VALID;
> +
> +	WARN_ON(kvm_pte_valid(old));
> +	smp_store_release(ptep, pte);
> +}
> +
> +static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> +				   u32 level)
> +{
> +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> +	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> +							   KVM_PTE_TYPE_BLOCK;
> +
> +	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> +	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> +	pte |= KVM_PTE_VALID;
> +
> +	/* Tolerate KVM recreating the exact same mapping. */
> +	if (kvm_pte_valid(old))
> +		return old == pte;
> +
> +	smp_store_release(ptep, pte);
> +	return true;
> +}
> +
> +static int kvm_pgtable_visitor_cb(struct kvm_pgtable_walk_data *data, u64 addr,
> +				  u32 level, kvm_pte_t *ptep,
> +				  enum kvm_pgtable_walk_flags flag)
> +{
> +	struct kvm_pgtable_walker *walker = data->walker;
> +	return walker->cb(addr, data->end, level, ptep, flag, walker->arg);
> +}
> +
> +static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> +			      kvm_pte_t *pgtable, u32 level);
> +
> +static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> +				      kvm_pte_t *ptep, u32 level)
> +{
> +	int ret = 0;
> +	u64 addr = data->addr;
> +	kvm_pte_t *childp, pte = *ptep;
> +	bool table = kvm_pte_table(pte, level);
> +	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> +
> +	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> +					     KVM_PGTABLE_WALK_TABLE_PRE);
> +	}
> +
> +	if (!table && (flags & KVM_PGTABLE_WALK_LEAF)) {
> +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> +					     KVM_PGTABLE_WALK_LEAF);
> +		pte = *ptep;
> +		table = kvm_pte_table(pte, level);
> +	}
> +
> +	if (ret)
> +		goto out;
> +
> +	if (!table) {
> +		data->addr += kvm_granule_size(level);
> +		goto out;
> +	}
> +
> +	childp = kvm_pte_follow(pte);
> +	ret = __kvm_pgtable_walk(data, childp, level + 1);
> +	if (ret)
> +		goto out;
> +
> +	if (flags & KVM_PGTABLE_WALK_TABLE_POST) {
> +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> +					     KVM_PGTABLE_WALK_TABLE_POST);
> +	}
> +
> +out:
> +	return ret;
> +}
> +
> +static int __kvm_pgtable_walk(struct kvm_pgtable_walk_data *data,
> +			      kvm_pte_t *pgtable, u32 level)
> +{
> +	u32 idx;
> +	int ret = 0;
> +
> +	if (WARN_ON_ONCE(level >= KVM_PGTABLE_MAX_LEVELS))
> +		return -EINVAL;
> +
> +	for (idx = kvm_pgtable_idx(data, level); idx < PTRS_PER_PTE; ++idx) {
> +		kvm_pte_t *ptep = &pgtable[idx];
> +
> +		if (data->addr >= data->end)
> +			break;
> +
> +		ret = __kvm_pgtable_visit(data, ptep, level);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +
> +static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
> +{
> +	u32 idx;
> +	int ret = 0;
> +	struct kvm_pgtable *pgt = data->pgt;
> +	u64 limit = BIT(pgt->ia_bits);
> +
> +	if (data->addr > limit || data->end > limit)
> +		return -ERANGE;
> +
> +	if (!pgt->pgd)
> +		return -EINVAL;
> +
> +	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
> +		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
> +
> +		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> +		if (ret)
> +			break;
> +	}
> +
> +	return ret;
> +}
> +

I guess we need to bail on the following condition:

         if (data->addr >= limit || data->end >= limit)
             return -ERANGE;

> +int kvm_pgtable_walk(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +		     struct kvm_pgtable_walker *walker)
> +{
> +	struct kvm_pgtable_walk_data walk_data = {
> +		.pgt	= pgt,
> +		.addr	= ALIGN_DOWN(addr, PAGE_SIZE),
> +		.end	= PAGE_ALIGN(walk_data.addr + size),
> +		.walker	= walker,
> +	};
> +
> +	return _kvm_pgtable_walk(&walk_data);
> +}
> 

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 05/21] KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables
  2020-08-25  9:39 ` [PATCH v3 05/21] KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables Will Deacon
@ 2020-09-02  6:40   ` Gavin Shan
  2020-09-02 11:30     ` Will Deacon
  0 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-02  6:40 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Introduce alloc() and free() functions to the generic page-table code
> for guest stage-2 page-tables and plumb these into the existing KVM
> page-table allocator. Subsequent patches will convert other operations
> within the KVM allocator over to the generic code.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_host.h    |  1 +
>   arch/arm64/include/asm/kvm_pgtable.h | 18 +++++++++
>   arch/arm64/kvm/hyp/pgtable.c         | 51 ++++++++++++++++++++++++++
>   arch/arm64/kvm/mmu.c                 | 55 +++++++++++++++-------------
>   4 files changed, 99 insertions(+), 26 deletions(-)
> 

With the following one question resolved:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e52c927aade5..0b7c702b2151 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -81,6 +81,7 @@ struct kvm_s2_mmu {
>   	 */
>   	pgd_t		*pgd;
>   	phys_addr_t	pgd_phys;
> +	struct kvm_pgtable *pgt;
>   
>   	/* The last vcpu id that ran on each physical CPU */
>   	int __percpu *last_vcpu_ran;
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 2af84ab78cb8..3389f978d573 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -116,6 +116,24 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt);
>   int kvm_pgtable_hyp_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
>   			enum kvm_pgtable_prot prot);
>   
> +/**
> + * kvm_pgtable_stage2_init() - Initialise a guest stage-2 page-table.
> + * @pgt:	Uninitialised page-table structure to initialise.
> + * @kvm:	KVM structure representing the guest virtual machine.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
> +
> +/**
> + * kvm_pgtable_stage2_destroy() - Destroy an unused guest stage-2 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + *
> + * The page-table is assumed to be unreachable by any hardware walkers prior
> + * to freeing and therefore no TLB invalidation is performed.
> + */
> +void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
> +
>   /**
>    * kvm_pgtable_walk() - Walk a page-table.
>    * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index d75166823ad9..b8550ccaef4d 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -419,3 +419,54 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>   	free_page((unsigned long)pgt->pgd);
>   	pgt->pgd = NULL;
>   }
> +
> +int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
> +{
> +	size_t pgd_sz;
> +	u64 vtcr = kvm->arch.vtcr;
> +	u32 ia_bits = VTCR_EL2_IPA(vtcr);
> +	u32 sl0 = FIELD_GET(VTCR_EL2_SL0_MASK, vtcr);
> +	u32 start_level = VTCR_EL2_TGRAN_SL0_BASE - sl0;
> +
> +	pgd_sz = kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
> +	pgt->pgd = alloc_pages_exact(pgd_sz, GFP_KERNEL | __GFP_ZERO);
> +	if (!pgt->pgd)
> +		return -ENOMEM;
> +
> +	pgt->ia_bits		= ia_bits;
> +	pgt->start_level	= start_level;
> +	pgt->mmu		= &kvm->arch.mmu;
> +	return 0;
> +}
> +
> +static int stage2_free_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			      enum kvm_pgtable_walk_flags flag,
> +			      void * const arg)
> +{
> +	kvm_pte_t pte = *ptep;
> +
> +	if (!kvm_pte_valid(pte))
> +		return 0;
> +
> +	put_page(virt_to_page(ptep));
> +
> +	if (kvm_pte_table(pte, level))
> +		free_page((unsigned long)kvm_pte_follow(pte));
> +
> +	return 0;
> +}
> +
> +void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
> +{
> +	size_t pgd_sz;
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= stage2_free_walker,
> +		.flags	= KVM_PGTABLE_WALK_LEAF |
> +			  KVM_PGTABLE_WALK_TABLE_POST,
> +	};
> +
> +	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
> +	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
> +	free_pages_exact(pgt->pgd, pgd_sz);
> +	pgt->pgd = NULL;
> +}
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index fabd72b0c8a4..4607e9ca60a2 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -668,47 +668,49 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
>    * @kvm:	The pointer to the KVM structure
>    * @mmu:	The pointer to the s2 MMU structure
>    *
> - * Allocates only the stage-2 HW PGD level table(s) of size defined by
> - * stage2_pgd_size(mmu->kvm).
> - *
> + * Allocates only the stage-2 HW PGD level table(s).
>    * Note we don't need locking here as this is only called when the VM is
>    * created, which can only be done once.
>    */
>   int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>   {
> -	phys_addr_t pgd_phys;
> -	pgd_t *pgd;
> -	int cpu;
> +	int cpu, err;
> +	struct kvm_pgtable *pgt;
>   
> -	if (mmu->pgd != NULL) {
> +	if (mmu->pgt != NULL) {
>   		kvm_err("kvm_arch already initialized?\n");
>   		return -EINVAL;
>   	}
>   
> -	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> -	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
> -	if (!pgd)
> +	pgt = kzalloc(sizeof(*pgt), GFP_KERNEL);
> +	if (!pgt)
>   		return -ENOMEM;
>   
> -	pgd_phys = virt_to_phys(pgd);
> -	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
> -		return -EINVAL;
> +	err = kvm_pgtable_stage2_init(pgt, kvm);
> +	if (err)
> +		goto out_free_pgtable;
>   
>   	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
>   	if (!mmu->last_vcpu_ran) {
> -		free_pages_exact(pgd, stage2_pgd_size(kvm));
> -		return -ENOMEM;
> +		err = -ENOMEM;
> +		goto out_destroy_pgtable;
>   	}
>   
>   	for_each_possible_cpu(cpu)
>   		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
>   
>   	mmu->kvm = kvm;
> -	mmu->pgd = pgd;
> -	mmu->pgd_phys = pgd_phys;
> +	mmu->pgt = pgt;
> +	mmu->pgd_phys = __pa(pgt->pgd);
> +	mmu->pgd = (void *)pgt->pgd;
>   	mmu->vmid.vmid_gen = 0;
> -
>   	return 0;
> +
> +out_destroy_pgtable:
> +	kvm_pgtable_stage2_destroy(pgt);
> +out_free_pgtable:
> +	kfree(pgt);
> +	return err;
>   }
>

kvm_pgtable_stage2_destroy() might not be needed here because
the stage-2 page-table is still empty at this point. However, it should
be rare to hit this case. If I'm correct, all we need to do
here is free the PGDs.

    
>   static void stage2_unmap_memslot(struct kvm *kvm,
> @@ -781,20 +783,21 @@ void stage2_unmap_vm(struct kvm *kvm)
>   void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>   {
>   	struct kvm *kvm = mmu->kvm;
> -	void *pgd = NULL;
> +	struct kvm_pgtable *pgt = NULL;
>   
>   	spin_lock(&kvm->mmu_lock);
> -	if (mmu->pgd) {
> -		unmap_stage2_range(mmu, 0, kvm_phys_size(kvm));
> -		pgd = READ_ONCE(mmu->pgd);
> +	pgt = mmu->pgt;
> +	if (pgt) {
>   		mmu->pgd = NULL;
> +		mmu->pgd_phys = 0;
> +		mmu->pgt = NULL;
> +		free_percpu(mmu->last_vcpu_ran);
>   	}
>   	spin_unlock(&kvm->mmu_lock);
>   
> -	/* Free the HW pgd, one page at a time */
> -	if (pgd) {
> -		free_pages_exact(pgd, stage2_pgd_size(kvm));
> -		free_percpu(mmu->last_vcpu_ran);
> +	if (pgt) {
> +		kvm_pgtable_stage2_destroy(pgt);
> +		kfree(pgt);
>   	}
>   }
>   

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 03/21] KVM: arm64: Add support for creating kernel-agnostic stage-1 page tables
  2020-08-28 15:35   ` Alexandru Elisei
@ 2020-09-02 10:06     ` Will Deacon
  0 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-02 10:06 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

Hi Alex,

On Fri, Aug 28, 2020 at 04:35:24PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > The generic page-table walker is pretty useless as it stands, because it
> > doesn't understand enough to allocate anything. Teach it about stage-1
> > page-tables, and hook up an API for allocating these for the hypervisor
> > at EL2.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h |  34 +++++++
> >  arch/arm64/kvm/hyp/pgtable.c         | 131 +++++++++++++++++++++++++++
> >  2 files changed, 165 insertions(+)

[...]

> > +/**
> > + * kvm_pgtable_hyp_map() - Install a mapping in a hypervisor stage-1 page-table.
> > + * @pgt:	Page-table structure initialised by kvm_pgtable_hyp_init().
> > + * @addr:	Virtual address at which to place the mapping.
> > + * @size:	Size of the mapping.
> > + * @phys:	Physical address of the memory to map.
> > + * @prot:	Permissions and attributes for the mapping.
> > + *
> > + * If device attributes are not explicitly requested in @prot, then the
> > + * mapping will be normal, cacheable.
> > + *
> > + * Return: 0 on success, negative error code on failure.
> 
> From my understanding of the code, when the caller replaces an existing leaf entry
> or a table with a different one, KVM will print a warning instead of using
> break-before-make (if necessary). It might be worth pointing out that
> callers are expected not to do that, because it's not immediately obvious.

For hypervisor stage-1 mappings, we WARN() and ignore the mapping request
if we run into an existing valid leaf entry for a different mapping. That
shouldn't happen, as hyp mappings are typically static and performed only
for the .hyp.* sections and the GICv2 memory-mapped bits. We don't even
provide an unmap interface. But yes, I can mention this in the comment:

  | Attempts to install a mapping for a virtual address that is already
  | mapped will be rejected with an error and a WARN().

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-08-27 16:27   ` Alexandru Elisei
  2020-08-28 15:43     ` Alexandru Elisei
@ 2020-09-02 10:36     ` Will Deacon
  1 sibling, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-02 10:36 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

On Thu, Aug 27, 2020 at 05:27:13PM +0100, Alexandru Elisei wrote:
> It looks to me like the code doesn't take into account the fact that we
> can have concatenated page tables at the initial level of lookup. Am I missing
> something? Is it added in later patches and I missed it? I've commented below in a
> few places where I noticed that.

(seems like you figured some of this out in a later reply).

> On 8/25/20 10:39 AM, Will Deacon wrote:
> > The KVM page-table code is intricately tied into the kernel page-table
> > code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
> > to reduce code duplication. Unfortunately, the reality is that there is
> > an awful lot of code required to make this work, and at the end of the
> > day you're limited to creating page-tables with the same configuration
> > as the host kernel. Furthermore, lifting the page-table code to run
> > directly at EL2 on a non-VHE system (as we plan to do in future
> > patches) is practically impossible due to the number of dependencies it
> > has on the core kernel.
> >
> > Introduce a framework for walking Armv8 page-tables configured
> > independently from the host kernel.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
> >  arch/arm64/kvm/hyp/Makefile          |   2 +-
> >  arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
> >  3 files changed, 392 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
> >  create mode 100644 arch/arm64/kvm/hyp/pgtable.c

[...]

> > +static u64 kvm_granule_shift(u32 level)
> > +{
> > +	return (KVM_PGTABLE_MAX_LEVELS - level) * (PAGE_SHIFT - 3) + 3;
> 
> Isn't that the same thing as the macro ARM64_HW_PGTABLE_LEVEL_SHIFT(n) from
> pgtable-hwdef.h? I think the header is already included, as this file uses
> PTRS_PER_PTE and that's the only place I found it defined.

Hmm, that's an interesting one. If we ever want to adjust KVM_PGTABLE_MAX_LEVELS,
things will break, so we just need to take that into account should future
architecture extensions add an extra level. I suppose I can add a comment
to that effect and use ARM64_HW_PGTABLE_LEVEL_SHIFT() instead.
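
For reference (a sketch of the equivalence, not part of the patch; the macro
definition below is as I remember it from pgtable-hwdef.h):

	#define ARM64_HW_PGTABLE_LEVEL_SHIFT(n)	((PAGE_SHIFT - 3) * (4 - (n)) + 3)

	/* With KVM_PGTABLE_MAX_LEVELS == 4, kvm_granule_shift(level) is: */
	(4 - level) * (PAGE_SHIFT - 3) + 3

	/* e.g. 4KB pages (PAGE_SHIFT == 12): level 3 -> 12, 2 -> 21, 1 -> 30, 0 -> 39 */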

> 
> > +}
> > +
> > +static u64 kvm_granule_size(u32 level)
> > +{
> > +	return BIT(kvm_granule_shift(level));
> > +}
> > +
> > +static bool kvm_block_mapping_supported(u64 addr, u64 end, u64 phys, u32 level)
> > +{
> > +	u64 granule = kvm_granule_size(level);
> > +
> > +	/*
> > +	 * Reject invalid block mappings and don't bother with 4TB mappings for
> > +	 * 52-bit PAs.
> > +	 */
> > +	if (level == 0 || (PAGE_SIZE != SZ_4K && level == 1))
> > +		return false;
> > +
> > +	if (granule > (end - addr))
> > +		return false;
> > +
> > +	return IS_ALIGNED(addr, granule) && IS_ALIGNED(phys, granule);
> > +}
> 
> This is a very nice rewrite of fault_supports_stage2_huge_mapping, definitely
> easier to understand.

Thanks!

> > +static u32 kvm_start_level(u64 ia_bits)
> > +{
> > +	u64 levels = DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3);
> 
> Isn't that the same thing as the macro ARM64_HW_PGTABLE_LEVELS from
> pgtable-hwdef.h?

Yes, although this is slightly more idiomatic due to its use of
DIV_ROUND_UP imo. But happy to replace it.
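
(For the record, the two expressions compute the same thing; a sketch of the
arithmetic, not taken from the patch:

	DIV_ROUND_UP(ia_bits - PAGE_SHIFT, PAGE_SHIFT - 3)
		== (ia_bits - PAGE_SHIFT + (PAGE_SHIFT - 3) - 1) / (PAGE_SHIFT - 3)
		== (ia_bits - 4) / (PAGE_SHIFT - 3)
		== ARM64_HW_PGTABLE_LEVELS(ia_bits)

so the swap doesn't change the computed number of levels.)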

> 
> > +	return KVM_PGTABLE_MAX_LEVELS - levels;
> 
> I tried to verify this formula and I think there's something that I don't
> understand or I'm missing. For the default KVM setup, where the user doesn't
> specify an IPA size different from the 40 bits default: ia_bits = 40 (IPA =
> [39:0]), 4KB pages, translation starting at level 1 with 2 concatenated level 1
> tables (VTCR_EL2.T0SZ = 24, VTCR_EL2.SL0 = 1, VTCR_EL2.TG0 = 0, starting level
> from table D5-13 at page D5-2566, ARM DDI 0487F.b), according to the formula I get:
> 
> levels = DIV_ROUND_UP(40 - 12, 12 -3) = DIV_ROUND_UP(28, 9) = 4
> return 4 - 4 = 0
> 
> which means the resulting starting level is 0 instead of 1.

Yeah, this is fiddly. kvm_start_level() doesn't cater for concatenation at
all and it's only used to determine the start level for the hypervisor
stage-1 table. For the stage-2 page-tables, we actually extract the start
level back out of the vtcr, as that gets configured separately and so we
just parameterise ourselves around that.

I think I'll remove kvm_start_level() entirely, and just inlined it into
its single call site (which will be neater using ARM64_HW_PGTABLE_LEVELS).

> 
> > +}
> > +
> > +static u32 kvm_pgtable_idx(struct kvm_pgtable_walk_data *data, u32 level)
> > +{
> > +	u64 shift = kvm_granule_shift(level);
> > +	u64 mask = BIT(PAGE_SHIFT - 3) - 1;
> 
> This doesn't seem to take into account the fact that we can have concatenated
> initial page tables.

This is ok, as we basically process the PGD one page at a time so that the
details of concatenation only really need to be exposed to the iterator.
See the use of kvm_pgd_page_idx() in _kvm_pgtable_walk().

> > +static inline int __kvm_pgtable_visit(struct kvm_pgtable_walk_data *data,
> > +				      kvm_pte_t *ptep, u32 level)
> > +{
> > +	int ret = 0;
> > +	u64 addr = data->addr;
> > +	kvm_pte_t *childp, pte = *ptep;
> > +	bool table = kvm_pte_table(pte, level);
> > +	enum kvm_pgtable_walk_flags flags = data->walker->flags;
> > +
> > +	if (table && (flags & KVM_PGTABLE_WALK_TABLE_PRE)) {
> > +		ret = kvm_pgtable_visitor_cb(data, addr, level, ptep,
> > +					     KVM_PGTABLE_WALK_TABLE_PRE);
> 
> I see that below we check if the visitor modified the leaf entry and turned it
> into a table. Is it not allowed for a visitor to turn a table into a block mapping?

It is allowed, but in that case we don't revisit the block entry, as there's
really no need. Compare that with installing a table, where you may well
want to descend into the new table to initialise the new entries in there.

The kerneldoc for kvm_pgtable_walk() talks a bit about this. (aside: that
function isn't actually used, but it felt useful to expose it as an
interface).

Thanks for the review,

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-08-28 15:51   ` Alexandru Elisei
@ 2020-09-02 10:49     ` Will Deacon
  0 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-02 10:49 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

On Fri, Aug 28, 2020 at 04:51:02PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > [..]
> > +static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> > +{
> > +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(__pa(childp));
> > +
> > +	pte |= FIELD_PREP(KVM_PTE_TYPE, KVM_PTE_TYPE_TABLE);
> > +	pte |= KVM_PTE_VALID;
> > +
> > +	WARN_ON(kvm_pte_valid(old));
> > +	smp_store_release(ptep, pte);
> > +}
> > +
> > +static bool kvm_set_valid_leaf_pte(kvm_pte_t *ptep, u64 pa, kvm_pte_t attr,
> > +				   u32 level)
> > +{
> > +	kvm_pte_t old = *ptep, pte = kvm_phys_to_pte(pa);
> > +	u64 type = (level == KVM_PGTABLE_MAX_LEVELS - 1) ? KVM_PTE_TYPE_PAGE :
> > +							   KVM_PTE_TYPE_BLOCK;
> > +
> > +	pte |= attr & (KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI);
> > +	pte |= FIELD_PREP(KVM_PTE_TYPE, type);
> > +	pte |= KVM_PTE_VALID;
> > +
> > +	/* Tolerate KVM recreating the exact same mapping. */
> > +	if (kvm_pte_valid(old))
> > +		return old == pte;
> > +
> > +	smp_store_release(ptep, pte);
> > +	return true;
> > +}
> 
> These two functions look inconsistent to me - we refuse to update a valid leaf
> entry with a new value, but we allow updating a valid table. Is there something
> that I'm not taking into account?

Well, the table code will WARN() so it's not like we do it quietly. I could
try to propagate the error, but I don't see what that gains us other than
complexity and code that likely won't get tested.

The leaf case is different, because some callers will handle the failure
and perform break-before-make with a TLBI (e.g. because of an MMU notifier
changing the PTE).

Take a look at how stage2_map_walker_try_leaf() ends up being called from
kvm_set_spte_handler().
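
The shape of the fallback is roughly the following (simplified sketch; the
real thing lives in stage2_map_walker_try_leaf() in the map/unmap patch):

	if (!kvm_set_valid_leaf_pte(ptep, phys, data->attr, level)) {
		/* Old entry is valid but different: break-before-make. */
		kvm_set_invalid_pte(ptep);
		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
		kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
	}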

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-09-02  6:31   ` Gavin Shan
@ 2020-09-02 11:02     ` Will Deacon
  2020-09-03  1:11       ` Gavin Shan
  0 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-02 11:02 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

Hi Gavin,

On Wed, Sep 02, 2020 at 04:31:32PM +1000, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
> > The KVM page-table code is intricately tied into the kernel page-table
> > code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
> > to reduce code duplication. Unfortunately, the reality is that there is
> > an awful lot of code required to make this work, and at the end of the
> > day you're limited to creating page-tables with the same configuration
> > as the host kernel. Furthermore, lifting the page-table code to run
> > directly at EL2 on a non-VHE system (as we plan to do in future
> > patches) is practically impossible due to the number of dependencies it
> > has on the core kernel.
> > 
> > Introduce a framework for walking Armv8 page-tables configured
> > independently from the host kernel.
> > 
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >   arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
> >   arch/arm64/kvm/hyp/Makefile          |   2 +-
> >   arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
> >   3 files changed, 392 insertions(+), 1 deletion(-)
> >   create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
> >   create mode 100644 arch/arm64/kvm/hyp/pgtable.c

[...]

> > +struct kvm_pgtable_walk_data {
> > +	struct kvm_pgtable		*pgt;
> > +	struct kvm_pgtable_walker	*walker;
> > +
> > +	u64				addr;
> > +	u64				end;
> > +};
> > +
> 
> Some of the following functions might be worth inlining, considering
> their complexity :)

I'll leave that for the compiler to figure out :)

> > +static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
> > +{
> > +	struct kvm_pgtable pgt = {
> > +		.ia_bits	= ia_bits,
> > +		.start_level	= start_level,
> > +	};
> > +
> > +	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
> > +}
> > +
> 
> It seems @pgt.start_level is assigned the wrong value here.
> For example, @start_level is 2 when @ia_bits and PAGE_SIZE
> are 40 and 64KB respectively. In this case, __kvm_pgd_page_idx()
> always returns zero. However, the extra page covers up the
> issue. I think something like below might be needed:
> 
> 	struct kvm_pgtable pgt = {
> 		.ia_bits	= ia_bits,
> 		.start_level	= KVM_PGTABLE_MAX_LEVELS - start_level + 1,
> 	};

Hmm, we're pulling the start_level right out of the vtcr, so I don't see
how it can be wrong. In your example, a start_level of 2 seems correct to
me, as we'll translate 13 bits there, then 13 bits at level 3 which covers
the 24 bits you need (with a 16-bit offset within the page).
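
Spelling out the arithmetic for that configuration (64KB pages, 40-bit IPA):

	page offset:    bits [15:0]   (16 bits)
	level 3 index:  bits [28:16]  (13 bits)
	level 2 index:  bits [39:29]  (11 of the 13 bits available at level 2)

so a start level of 2 is enough to cover the whole 40-bit input range.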

Your suggestion would give us a start_level of 1, which has a redundant
level of translation. Maybe you're looking at the levels upside-down? The
top level is level 0 and each time you walk to a new level, that number
increases.

But perhaps I'm missing something. Please could you elaborate if you think
there's a problem here?

> > +static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
> > +{
> > +	u32 idx;
> > +	int ret = 0;
> > +	struct kvm_pgtable *pgt = data->pgt;
> > +	u64 limit = BIT(pgt->ia_bits);
> > +
> > +	if (data->addr > limit || data->end > limit)
> > +		return -ERANGE;
> > +
> > +	if (!pgt->pgd)
> > +		return -EINVAL;
> > +
> > +	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
> > +		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
> > +
> > +		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
> > +		if (ret)
> > +			break;
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> 
> I guess we need to bail on the following condition:
> 
>         if (data->addr >= limit || data->end >= limit)
>             return -ERANGE;

What's wrong with the existing check? In particular, I think we _want_
to support data->end == limit (it's exclusive). If data->addr == limit,
then we'll have a size of zero and the loop won't run.

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 05/21] KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables
  2020-09-02  6:40   ` Gavin Shan
@ 2020-09-02 11:30     ` Will Deacon
  0 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-02 11:30 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

On Wed, Sep 02, 2020 at 04:40:03PM +1000, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
> > Introduce alloc() and free() functions to the generic page-table code
> > for guest stage-2 page-tables and plumb these into the existing KVM
> > page-table allocator. Subsequent patches will convert other operations
> > within the KVM allocator over to the generic code.
> > 
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >   arch/arm64/include/asm/kvm_host.h    |  1 +
> >   arch/arm64/include/asm/kvm_pgtable.h | 18 +++++++++
> >   arch/arm64/kvm/hyp/pgtable.c         | 51 ++++++++++++++++++++++++++
> >   arch/arm64/kvm/mmu.c                 | 55 +++++++++++++++-------------
> >   4 files changed, 99 insertions(+), 26 deletions(-)
> > 
> 
> With the following one question resolved:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>

Thanks!

> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index fabd72b0c8a4..4607e9ca60a2 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -668,47 +668,49 @@ int create_hyp_exec_mappings(phys_addr_t phys_addr, size_t size,
> >    * @kvm:	The pointer to the KVM structure
> >    * @mmu:	The pointer to the s2 MMU structure
> >    *
> > - * Allocates only the stage-2 HW PGD level table(s) of size defined by
> > - * stage2_pgd_size(mmu->kvm).
> > - *
> > + * Allocates only the stage-2 HW PGD level table(s).
> >    * Note we don't need locking here as this is only called when the VM is
> >    * created, which can only be done once.
> >    */
> >   int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
> >   {
> > -	phys_addr_t pgd_phys;
> > -	pgd_t *pgd;
> > -	int cpu;
> > +	int cpu, err;
> > +	struct kvm_pgtable *pgt;
> > -	if (mmu->pgd != NULL) {
> > +	if (mmu->pgt != NULL) {
> >   		kvm_err("kvm_arch already initialized?\n");
> >   		return -EINVAL;
> >   	}
> > -	/* Allocate the HW PGD, making sure that each page gets its own refcount */
> > -	pgd = alloc_pages_exact(stage2_pgd_size(kvm), GFP_KERNEL | __GFP_ZERO);
> > -	if (!pgd)
> > +	pgt = kzalloc(sizeof(*pgt), GFP_KERNEL);
> > +	if (!pgt)
> >   		return -ENOMEM;
> > -	pgd_phys = virt_to_phys(pgd);
> > -	if (WARN_ON(pgd_phys & ~kvm_vttbr_baddr_mask(kvm)))
> > -		return -EINVAL;
> > +	err = kvm_pgtable_stage2_init(pgt, kvm);
> > +	if (err)
> > +		goto out_free_pgtable;
> >   	mmu->last_vcpu_ran = alloc_percpu(typeof(*mmu->last_vcpu_ran));
> >   	if (!mmu->last_vcpu_ran) {
> > -		free_pages_exact(pgd, stage2_pgd_size(kvm));
> > -		return -ENOMEM;
> > +		err = -ENOMEM;
> > +		goto out_destroy_pgtable;
> >   	}
> >   	for_each_possible_cpu(cpu)
> >   		*per_cpu_ptr(mmu->last_vcpu_ran, cpu) = -1;
> >   	mmu->kvm = kvm;
> > -	mmu->pgd = pgd;
> > -	mmu->pgd_phys = pgd_phys;
> > +	mmu->pgt = pgt;
> > +	mmu->pgd_phys = __pa(pgt->pgd);
> > +	mmu->pgd = (void *)pgt->pgd;
> >   	mmu->vmid.vmid_gen = 0;
> > -
> >   	return 0;
> > +
> > +out_destroy_pgtable:
> > +	kvm_pgtable_stage2_destroy(pgt);
> > +out_free_pgtable:
> > +	kfree(pgt);
> > +	return err;
> >   }
> > 
> 
> kvm_pgtable_stage2_destroy() might not be needed here because
> the stage-2 page-table is still empty at this point. However, it should
> be rare to hit this case. If I'm correct, all we need to do
> here is free the PGDs.

Right, but kvm_pgtable_stage2_destroy() also frees the PGDs because it
knows how many pages there are and they were allocated by
kvm_pgtable_stage2_init().

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables
  2020-08-28 16:32   ` Alexandru Elisei
@ 2020-09-02 11:35     ` Will Deacon
  2020-09-02 14:48       ` Alexandru Elisei
  0 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-02 11:35 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

On Fri, Aug 28, 2020 at 05:32:16PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > Now that we have a shiny new page-table allocator, replace the hyp
> > page-table code with calls into the new API. This also allows us to
> > remove the extended idmap code, as we can now simply ensure that the
> > VA size is large enough to map everything we need.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_mmu.h       |  78 +----
> >  arch/arm64/include/asm/kvm_pgtable.h   |   5 +
> >  arch/arm64/include/asm/pgtable-hwdef.h |   6 -
> >  arch/arm64/include/asm/pgtable-prot.h  |   6 -
> >  arch/arm64/kvm/mmu.c                   | 414 +++----------------------
> >  5 files changed, 45 insertions(+), 464 deletions(-)
> >
> > [..]
> > @@ -2356,6 +2028,7 @@ static int kvm_map_idmap_text(pgd_t *pgd)
> >  int kvm_mmu_init(void)
> >  {
> >  	int err;
> > +	u32 hyp_va_bits;
> >  
> >  	hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
> >  	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
> > @@ -2369,6 +2042,8 @@ int kvm_mmu_init(void)
> >  	 */
> >  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
> >  
> > +	hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
> 
> idmap_t0sz is defined in mm/mmu.c as: TCR_T0SZ(VA_BITS) = (UL(64) - VA_BITS) <<
> TCR_T0SZ_OFFSET. Looks to me like hyp_va_bits == VA_BITS.

Careful! It can get rewritten in head.S if we determine that physical memory
is in an awkward place and not covered by VA_BITS in an identity mapping.

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-09-01 16:24   ` Alexandru Elisei
@ 2020-09-02 11:46     ` Will Deacon
  0 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-02 11:46 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

On Tue, Sep 01, 2020 at 05:24:58PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > Add stage-2 map() and unmap() operations to the generic page-table code.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
> >  arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
> >  2 files changed, 301 insertions(+)

[...]

> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index b8550ccaef4d..41ee8f3c0369 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -32,10 +32,19 @@
> >  #define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
> >  #define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
> >  
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
> > +#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
> > +
> >  #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
> >  
> >  #define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
> >  
> > +#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
> 
> Checked the bitfields against ARM DDI 0487F.b, they match.

Phew! ;)

> > +static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> > +				       kvm_pte_t *ptep,
> > +				       struct stage2_map_data *data)
> > +{
> > +	u64 granule = kvm_granule_size(level), phys = data->phys;
> > +
> > +	if (!kvm_block_mapping_supported(addr, end, phys, level))
> > +		return false;
> > +
> > +	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> > +		goto out;
> > +
> > +	kvm_set_invalid_pte(ptep);
> > +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> > +	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> 
> One has to read the kvm_set_valid_leaf_pte code very carefully to understand why
> we're doing the above (we found an old, valid entry in the stage-2 table, the page
> tables are in use, so we're doing break-before-make to replace it with the new
> one), especially since we don't do this with the hyp tables. Perhaps a comment
> explaining what's happening would be useful.

Sure, I can add something here, but it sounds like you figured it out.
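
Something along these lines (exact wording to be polished for v4):

	/*
	 * There is an existing, valid leaf entry for a different mapping, so
	 * follow break-before-make: invalidate the old entry and flush the
	 * TLB for this IPA before installing the new one.
	 */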

> > +static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> > +			     enum kvm_pgtable_walk_flags flag, void * const arg)
> > +{
> > +	struct stage2_map_data *data = arg;
> > +
> > +	switch (flag) {
> > +	case KVM_PGTABLE_WALK_TABLE_PRE:
> > +		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> > +	case KVM_PGTABLE_WALK_LEAF:
> > +		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> > +	case KVM_PGTABLE_WALK_TABLE_POST:
> > +		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> > +	}
> > +
> > +	return -EINVAL;
> > +}
> 
> As I understood the algorithm, each of the pre, leaf and post functions does two
> different things: 1. free/invalidate the tables/leaf entries if we can create a
> block mapping at a previously visited level (stage2_map_data->anchor != NULL); and
> 2. create an entry for the range at the correct level. To be honest, this wasn't
> obvious to me from the code, and I think some comments on the functions, and
> especially on the anchor field of stage2_map_data, would go a long way towards
> making it easier for others to understand the code.

I can also add something here as the anchor thing is quite unusual. We
basically use it to mark an existing table entry which we want to replace
with a block entry, but before we can do that we have to descend into the
page table under that table entry freeing everything as we go. Then we'll
see the marked entry on the way back up and install the block entry then.
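
In (very) rough pseudo-code, reusing the helper names from the series but
eliding the TLB invalidation, refcounting and error handling:

	/* TABLE_PRE: a block mapping can replace this table entry. */
	if (kvm_block_mapping_supported(addr, end, data->phys, level)) {
		kvm_set_invalid_pte(ptep);	/* unhook the old table */
		data->anchor = ptep;		/* remember where we were */
	}

	/* LEAF (while data->anchor is set): just drop the old entries. */

	/* TABLE_POST: free the old table pages on the way back up... */
	if (data->anchor == ptep) {
		/* ...and install the block entry in the anchored slot. */
		data->anchor = NULL;
		stage2_map_walk_leaf(addr, end, level, ptep, data);
	}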

I had a few goes at implementing this with only LEAF and TABLE_POST but
it was really ugly, and actually it turns out TABLE_PRE is really useful
for debugging if you just want to print out the page-table.

> With that in mind, the functions look solid to me: every get_page has a
> corresponding put_page in stage2_map_walk_leaf or in the unmap walker, and the
> algorithm looks sound. I still want to re-read the functions a few times (probably
> in the next iteration) because they're definitely not trivial and I don't want to
> miss something.

Thanks. I'll post a v4 with some comments, so maybe that will help.

> > +	/*
> > +	 * This is similar to the map() path in that we unmap the entire
> > +	 * block entry and rely on the remaining portions being faulted
> > +	 * back lazily.
> > +	 */
> > +	kvm_set_invalid_pte(ptep);
> > +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> > +	put_page(virt_to_page(ptep));
> > +
> > +	if (need_flush) {
> > +		stage2_flush_dcache(kvm_pte_follow(pte),
> > +				    kvm_granule_size(level));
> > +	}
> 
> The curly braces are unnecessary; I'm only mentioning it because you don't use
> them in this function for the rest of the one line if statements.

Hmm, but this is a two-line statement so I think it reads better.

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API
  2020-09-01 17:08   ` Alexandru Elisei
@ 2020-09-02 11:48     ` Will Deacon
  0 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-02 11:48 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

On Tue, Sep 01, 2020 at 06:08:01PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > Convert kvm_phys_addr_ioremap() to use kvm_pgtable_stage2_map() instead
> > of stage2_set_pte().
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/hyp/pgtable.c | 14 +-------------
> >  arch/arm64/kvm/mmu.c         | 29 ++++++++++++-----------------
> >  2 files changed, 13 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> > index 41ee8f3c0369..6f65d3841ec9 100644
> > --- a/arch/arm64/kvm/hyp/pgtable.c
> > +++ b/arch/arm64/kvm/hyp/pgtable.c
> > @@ -439,18 +439,6 @@ struct stage2_map_data {
> >  	struct kvm_mmu_memory_cache	*memcache;
> >  };
> >  
> > -static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
> > -{
> > -	kvm_pte_t *ptep = NULL;
> > -	struct kvm_mmu_memory_cache *mc = data->memcache;
> > -
> > -	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
> > -	if (mc && mc->nobjs)
> > -		ptep = mc->objects[--mc->nobjs];
> > -
> > -	return ptep;
> > -}
> > -
> >  static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> >  				    struct stage2_map_data *data)
> >  {
> > @@ -531,7 +519,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> >  	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> >  		return -EINVAL;
> >  
> > -	childp = stage2_memcache_alloc_page(data);
> > +	childp = kvm_mmu_memory_cache_alloc(data->memcache);
> 
> I think this hunk and the one above could have been squashed into the previous
> patch; we could have used kvm_mmu_memory_cache_alloc() directly from the start.

Urgh, looks like I squashed into the wrong patch when I rebased this before.
Thanks, I'll fix that (but damn, rebasing this series sucks rocks).

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables
  2020-09-02 11:35     ` Will Deacon
@ 2020-09-02 14:48       ` Alexandru Elisei
  0 siblings, 0 replies; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-02 14:48 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

Hi Will,

On 9/2/20 12:35 PM, Will Deacon wrote:
> On Fri, Aug 28, 2020 at 05:32:16PM +0100, Alexandru Elisei wrote:
>> On 8/25/20 10:39 AM, Will Deacon wrote:
>>> Now that we have a shiny new page-table allocator, replace the hyp
>>> page-table code with calls into the new API. This also allows us to
>>> remove the extended idmap code, as we can now simply ensure that the
>>> VA size is large enough to map everything we need.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Quentin Perret <qperret@google.com>
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>  arch/arm64/include/asm/kvm_mmu.h       |  78 +----
>>>  arch/arm64/include/asm/kvm_pgtable.h   |   5 +
>>>  arch/arm64/include/asm/pgtable-hwdef.h |   6 -
>>>  arch/arm64/include/asm/pgtable-prot.h  |   6 -
>>>  arch/arm64/kvm/mmu.c                   | 414 +++----------------------
>>>  5 files changed, 45 insertions(+), 464 deletions(-)
>>>
>>> [..]
>>> @@ -2356,6 +2028,7 @@ static int kvm_map_idmap_text(pgd_t *pgd)
>>>  int kvm_mmu_init(void)
>>>  {
>>>  	int err;
>>> +	u32 hyp_va_bits;
>>>  
>>>  	hyp_idmap_start = __pa_symbol(__hyp_idmap_text_start);
>>>  	hyp_idmap_start = ALIGN_DOWN(hyp_idmap_start, PAGE_SIZE);
>>> @@ -2369,6 +2042,8 @@ int kvm_mmu_init(void)
>>>  	 */
>>>  	BUG_ON((hyp_idmap_start ^ (hyp_idmap_end - 1)) & PAGE_MASK);
>>>  
>>> +	hyp_va_bits = 64 - ((idmap_t0sz & TCR_T0SZ_MASK) >> TCR_T0SZ_OFFSET);
>> idmap_t0sz is defined in mm/mmu.c as: TCR_T0SZ(VA_BITS) = (UL(64) - VA_BITS) <<
>> TCR_T0SZ_OFFSET. Looks to me like hyp_va_bits == VA_BITS.
> Careful! It can get rewritten in head.S if we determine that physical memory
> is in an awkward place and not covered by VA_BITS in an identity mapping.

I wasn't aware of that. I see now that in head.S we can change idmap_t0sz if
VA_BITS is not enough, thank you for pointing it out!

Thanks,
Alex


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() " Will Deacon
@ 2020-09-02 15:37   ` Alexandru Elisei
  2020-09-03 16:37     ` Will Deacon
  2020-09-03  4:13   ` Gavin Shan
  1 sibling, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-02 15:37 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

There are still a few comments and code paths in stage2_set_pte referring to
kvm_set_spte_hva.

On 8/25/20 10:39 AM, Will Deacon wrote:
> Convert kvm_set_spte_hva() to use kvm_pgtable_stage2_map() instead
> of stage2_set_pte().
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 23 ++++++++++-------------
>  1 file changed, 10 insertions(+), 13 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 33146d3dc93a..704b471a48ce 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1911,28 +1911,27 @@ int kvm_unmap_hva_range(struct kvm *kvm,
>  
>  static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>  {
> -	pte_t *pte = (pte_t *)data;
> +	kvm_pfn_t *pfn = (kvm_pfn_t *)data;
>  
>  	WARN_ON(size != PAGE_SIZE);
> +
>  	/*
> -	 * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
> -	 * flag clear because MMU notifiers will have unmapped a huge PMD before
> -	 * calling ->change_pte() (which in turn calls kvm_set_spte_hva()) and
> -	 * therefore stage2_set_pte() never needs to clear out a huge PMD
> -	 * through this calling path.
> +	 * The MMU notifiers will have unmapped a huge PMD before calling
> +	 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and
> +	 * therefore we never need to clear out a huge PMD through this
> +	 * calling path and a memcache is not required.
>  	 */
> -	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
> +	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, gpa, PAGE_SIZE,
> +			       __pfn_to_phys(*pfn), KVM_PGTABLE_PROT_R, NULL);

I have to admit that I managed to confuse myself.

According to the comment, this is called after unmapping a huge PMD.
__unmap_stage2_range() -> .. -> unmap_stage2_pmd() calls pmd_clear(), which means
the PMD entry is now 0.

In __kvm_pgtable_visit(), kvm_pte_table() returns false, because the entry is
invalid, and so we call stage2_map_walk_leaf(). Here, stage2_map_walker_try_leaf()
will return false, because kvm_block_mapping_supported() returns false (PMD
granule is larger than PAGE_SIZE), and then we end up allocating a table from the
memcache, which will be NULL here, so kvm_mmu_memory_cache_alloc() will
dereference the NULL pointer.

I'm pretty sure there's something that I'm missing here; I would really appreciate
someone pointing out where I'm making a mistake.

Thanks,

Alex

>  	return 0;
>  }
>  
> -
>  int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>  {
>  	unsigned long end = hva + PAGE_SIZE;
>  	kvm_pfn_t pfn = pte_pfn(pte);
> -	pte_t stage2_pte;
>  
> -	if (!kvm->arch.mmu.pgd)
> +	if (!kvm->arch.mmu.pgt)
>  		return 0;
>  
>  	trace_kvm_set_spte_hva(hva);
> @@ -1942,9 +1941,7 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>  	 * just like a translation fault and clean the cache to the PoC.
>  	 */
>  	clean_dcache_guest_page(pfn, PAGE_SIZE);
> -	stage2_pte = kvm_pfn_pte(pfn, PAGE_S2);
> -	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
> -
> +	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
>  	return 0;
>  }
>  


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() " Will Deacon
@ 2020-09-02 16:23   ` Alexandru Elisei
  2020-09-02 18:44     ` Alexandru Elisei
  2020-09-03 17:57     ` Will Deacon
  2020-09-03  4:19   ` Gavin Shan
  1 sibling, 2 replies; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-02 16:23 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hello,

On 8/25/20 10:39 AM, Will Deacon wrote:
> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
> of walking the page-table directly.
>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>  1 file changed, 32 insertions(+), 25 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 704b471a48ce..751ce2462765 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>  }
>  
> +/*
> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
> + * long will also starve other vCPUs. We have to also make sure that the page
> + * tables are not freed while we released the lock.
> + */
> +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
> +({									\
> +	int ret;							\
> +	struct kvm *__kvm = (kvm);					\
> +	bool __resched = (resched);					\
> +	u64 next, __addr = (addr), __end = (end);			\
> +	do {								\
> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
> +		if (!pgt)						\
> +			break;						\

I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
to me like the compiler might decide to optimize by reading pgt once at the start
of the loop and stashing it in a register. Would you mind explaining what I am
missing?

> +		next = stage2_pgd_addr_end(__kvm, __addr, __end);	\
> +		ret = fn(pgt, __addr, next - __addr);			\
> +		if (ret)						\
> +			break;						\
> +		if (__resched && next != __end)				\
> +			cond_resched_lock(&__kvm->mmu_lock);		\
> +	} while (__addr = next, __addr != __end);			\
> +	ret;								\
> +})

This seems unusual to me. We have a non-trivial, multiline macro which calls
cond_resched(), has 6 local variables, and is called from exactly one place. I am
curious why we are not open coding the loop in __unmap_stage2_range() or using a
function.
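
For example, something along these lines (just a sketch of what I mean, not
even compile-tested; the fn() signature of (pgt, addr, size) returning int is
an assumption):

	static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
				      phys_addr_t end,
				      int (*fn)(struct kvm_pgtable *, u64, u64),
				      bool resched)
	{
		int ret = 0;
		u64 next;

		do {
			struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;

			if (!pgt)
				break;

			next = stage2_pgd_addr_end(kvm, addr, end);
			ret = fn(pgt, addr, next - addr);
			if (ret)
				break;

			if (resched && next != end)
				cond_resched_lock(&kvm->mmu_lock);
		} while (addr = next, addr != end);

		return ret;
	}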

> +
>  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>  {
>  	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> @@ -220,8 +247,8 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
>   * end up writing old data to disk.
>   *
>   * This is why right after unmapping a page/section and invalidating
> - * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
> - * the IO subsystem will never hit in the cache.
> + * the corresponding TLBs, we flush to make sure the IO subsystem will
> + * never hit in the cache.
>   *
>   * This is all avoided on systems that have ARM64_HAS_STAGE2_FWB, as
>   * we then fully enforce cacheability of RAM, no matter what the guest
> @@ -344,32 +371,12 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>  				 bool may_block)
>  {
>  	struct kvm *kvm = mmu->kvm;
> -	pgd_t *pgd;
> -	phys_addr_t addr = start, end = start + size;
> -	phys_addr_t next;
> +	phys_addr_t end = start + size;
>  
>  	assert_spin_locked(&kvm->mmu_lock);
>  	WARN_ON(size & ~PAGE_MASK);
> -
> -	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
> -	do {
> -		/*
> -		 * Make sure the page table is still active, as another thread
> -		 * could have possibly freed the page table, while we released
> -		 * the lock.
> -		 */
> -		if (!READ_ONCE(mmu->pgd))
> -			break;
> -		next = stage2_pgd_addr_end(kvm, addr, end);
> -		if (!stage2_pgd_none(kvm, *pgd))
> -			unmap_stage2_p4ds(mmu, pgd, addr, next);
> -		/*
> -		 * If the range is too large, release the kvm->mmu_lock
> -		 * to prevent starvation and lockup detector warnings.
> -		 */
> -		if (may_block && next != end)
> -			cond_resched_lock(&kvm->mmu_lock);
> -	} while (pgd++, addr = next, addr != end);
> +	WARN_ON(stage2_apply_range(kvm, start, end, kvm_pgtable_stage2_unmap,
> +				   may_block));
>  }
>  
>  static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  2020-09-02 16:23   ` Alexandru Elisei
@ 2020-09-02 18:44     ` Alexandru Elisei
  2020-09-03 17:57     ` Will Deacon
  1 sibling, 0 replies; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-02 18:44 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Marc Zyngier, kernel-team, linux-arm-kernel, Catalin Marinas

Hi Will,

I think I have answered my own question (again).

On 9/2/20 5:23 PM, Alexandru Elisei wrote:
> Hello,
>
> On 8/25/20 10:39 AM, Will Deacon wrote:
>> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
>> of walking the page-table directly.
>>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Cc: Quentin Perret <qperret@google.com>
>> Signed-off-by: Will Deacon <will@kernel.org>
>> ---
>>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>>  1 file changed, 32 insertions(+), 25 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 704b471a48ce..751ce2462765 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>>  }
>>  
>> +/*
>> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
>> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
>> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
>> + * long will also starve other vCPUs. We have to also make sure that the page
>> + * tables are not freed while we released the lock.
>> + */
>> +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
>> +({									\
>> +	int ret;							\
>> +	struct kvm *__kvm = (kvm);					\
>> +	bool __resched = (resched);					\
>> +	u64 next, __addr = (addr), __end = (end);			\
>> +	do {								\
>> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
>> +		if (!pgt)						\
>> +			break;						\
> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
> to me like the compiler might decide to optimize by reading pgt once at the start
> of the loop and stashing it in a register. Would you mind explaining what I am
> missing?

I think the reason is that kvm_pgtable_stage2_unmap() has access to pgt via the
back pointer to mmu, and the function is in a different compilation unit now, so
the compiler cannot infer anything about pgt staying the same between function
calls. Is that correct?

Thanks,

Alex



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure
  2020-09-02 11:02     ` Will Deacon
@ 2020-09-03  1:11       ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  1:11 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

Hi Will,

On 9/2/20 9:02 PM, Will Deacon wrote:
> On Wed, Sep 02, 2020 at 04:31:32PM +1000, Gavin Shan wrote:
>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>> The KVM page-table code is intricately tied into the kernel page-table
>>> code and re-uses the pte/pmd/pud/p4d/pgd macros directly in an attempt
>>> to reduce code duplication. Unfortunately, the reality is that there is
>>> an awful lot of code required to make this work, and at the end of the
>>> day you're limited to creating page-tables with the same configuration
>>> as the host kernel. Furthermore, lifting the page-table code to run
>>> directly at EL2 on a non-VHE system (as we plan to do in future
>>> patches) is practically impossible due to the number of dependencies it
>>> has on the core kernel.
>>>
>>> Introduce a framework for walking Armv8 page-tables configured
>>> independently from the host kernel.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Quentin Perret <qperret@google.com>
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>    arch/arm64/include/asm/kvm_pgtable.h | 101 ++++++++++
>>>    arch/arm64/kvm/hyp/Makefile          |   2 +-
>>>    arch/arm64/kvm/hyp/pgtable.c         | 290 +++++++++++++++++++++++++++
>>>    3 files changed, 392 insertions(+), 1 deletion(-)
>>>    create mode 100644 arch/arm64/include/asm/kvm_pgtable.h
>>>    create mode 100644 arch/arm64/kvm/hyp/pgtable.c
> 
> [...]
> 
>>> +struct kvm_pgtable_walk_data {
>>> +	struct kvm_pgtable		*pgt;
>>> +	struct kvm_pgtable_walker	*walker;
>>> +
>>> +	u64				addr;
>>> +	u64				end;
>>> +};
>>> +
>>
>> Some of the following functions might be worth inlining, considering
>> their complexity :)
> 
> I'll leave that for the compiler to figure out :)
> 

Ok :)

>>> +static u32 kvm_pgd_pages(u32 ia_bits, u32 start_level)
>>> +{
>>> +	struct kvm_pgtable pgt = {
>>> +		.ia_bits	= ia_bits,
>>> +		.start_level	= start_level,
>>> +	};
>>> +
>>> +	return __kvm_pgd_page_idx(&pgt, -1ULL) + 1;
>>> +}
>>> +
>>
>> It seems @pgt.start_level is assigned the wrong value here.
>> For example, @start_level is 2 when @ia_bits and PAGE_SIZE
>> are 40 and 64KB respectively. In this case, __kvm_pgd_page_idx()
>> always returns zero. However, the extra page covers up the
>> issue. I think something like below might be needed:
>>
>> 	struct kvm_pgtable pgt = {
>> 		.ia_bits	= ia_bits,
>> 		.start_level	= KVM_PGTABLE_MAX_LEVELS - start_level + 1,
>> 	};
> 
> Hmm, we're pulling the start_level right out of the vtcr, so I don't see
> how it can be wrong. In your example, a start_level of 2 seems correct to
> me, as we'll translate 13 bits there, then 13 bits at level 3 which covers
> the 24 bits you need (with a 16-bit offset within the page).
> 
> Your suggestion would give us a start_level of 1, which has a redundant
> level of translation. Maybe you're looking at the levels upside-down? The
> top level is level 0 and each time you walk to a new level, that number
> increases.
> 
> But perhaps I'm missing something. Please could you elaborate if you think
> there's a problem here?
> 

Thanks for the explanation. I think I was understanding the code in the wrong
way. In this particular path, __kvm_pgd_page_idx() is used to calculate
how many pages are needed to hold the PGDs. If I'm correct, at most
16 pages are used for the PGDs. So the current implementation looks
correct to me.

There is another question, which might not be relevant. I added some logs
and hopefully my calculation makes sense. I have the following
configuration (values) in my experiment. I'm including the kernel log
to make the information complete:

    [ 5089.107147] kvm_arch_init_vm: kvm@0xfffffe0028460000, type=0x0
    [ 5089.112973] kvm_arm_setup_stage2: kvm@0xfffffe0028460000, type=0x0
    [ 5089.119157]    kvm_ipa_limit=0x2c, phys_shift=0x28
    [ 5089.123936]    kvm->arch.vtcr=0x00000000802c7558
    [ 5089.128552] kvm_init_stage2_mmu: kvm@0xfffffe0028460000
    [ 5089.133765] kvm_pgtable_stage2_init: kvm@0xfffffe0028460000, ia_bits=0x28,start_level=0x2

    PAGE_SIZE:       64KB
    @kvm->arch.vtcr: 0x00000000_802c7558
    @ipa_bits:       40
    @start_level:    2

    #define KVM_PGTABLE_MAX_LEVELS            4U

    static u64 kvm_granule_shift(u32 level)
    {
         return (KVM_PGTABLE_MAX_LEVELS - level) * (PAGE_SHIFT - 3) + 3;
    }

    static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
    {
         u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
         u64 mask = BIT(pgt->ia_bits) - 1;

         return (addr & mask) >> shift;

         // shift = kvm_granule_shift(2 - 1) = ((3 * 13) + 3) = 42
         // mask  = ((1UL << 40) - 1)
         // return (0x000000ff_ffffffff >> 42) = 0
         //
         // QUESTION: Since @ipa_bits is 40 bits, why do we need to shift by 42 bits here?
    }

I was also thinking about the following case, which makes sense
to me. Note I didn't add debug logs for this case.

    PAGE_SIZE:     4KB
    @ipa_bits:     40
    @start_level:  1

    static u32 __kvm_pgd_page_idx(struct kvm_pgtable *pgt, u64 addr)
    {
         u64 shift = kvm_granule_shift(pgt->start_level - 1); /* May underflow */
         u64 mask = BIT(pgt->ia_bits) - 1;

         return (addr & mask) >> shift;

         // shift = kvm_granule_shift(1 - 1) = ((4 * 9) + 3) = 39
         // mask  = ((1UL << 40) - 1)
         // return (0x000000ff_ffffffff >> 39) = 1
    }
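
To double-check the arithmetic, here is a stand-alone user-space sketch
that just mirrors kvm_granule_shift()/__kvm_pgd_page_idx() above, with
PAGE_SHIFT passed in explicitly since it doesn't build against the
kernel headers. It prints 1 PGD page for the 64KB/40-bit case above and
2 for the 4KB/40-bit case:

    #include <stdio.h>
    #include <stdint.h>

    #define KVM_PGTABLE_MAX_LEVELS  4U

    static uint64_t granule_shift(uint32_t page_shift, uint32_t level)
    {
            return (KVM_PGTABLE_MAX_LEVELS - level) * (page_shift - 3) + 3;
    }

    /* Mirrors __kvm_pgd_page_idx(&pgt, -1ULL) + 1 from kvm_pgd_pages() */
    static uint32_t pgd_pages(uint32_t page_shift, uint32_t ia_bits,
                              uint32_t start_level)
    {
            uint64_t shift = granule_shift(page_shift, start_level - 1);
            uint64_t mask = (1ULL << ia_bits) - 1;

            return (uint32_t)((-1ULL & mask) >> shift) + 1;
    }

    int main(void)
    {
            printf("64KB, 40-bit IPA, start_level 2: %u\n", pgd_pages(16, 40, 2));
            printf(" 4KB, 40-bit IPA, start_level 1: %u\n", pgd_pages(12, 40, 1));
            return 0;
    }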

>>> +static int _kvm_pgtable_walk(struct kvm_pgtable_walk_data *data)
>>> +{
>>> +	u32 idx;
>>> +	int ret = 0;
>>> +	struct kvm_pgtable *pgt = data->pgt;
>>> +	u64 limit = BIT(pgt->ia_bits);
>>> +
>>> +	if (data->addr > limit || data->end > limit)
>>> +		return -ERANGE;
>>> +
>>> +	if (!pgt->pgd)
>>> +		return -EINVAL;
>>> +
>>> +	for (idx = kvm_pgd_page_idx(data); data->addr < data->end; ++idx) {
>>> +		kvm_pte_t *ptep = &pgt->pgd[idx * PTRS_PER_PTE];
>>> +
>>> +		ret = __kvm_pgtable_walk(data, ptep, pgt->start_level);
>>> +		if (ret)
>>> +			break;
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>>
>> I guess we need to bail on the following condition:
>>
>>          if (data->addr >= limit || data->end >= limit)
>>              return -ERANGE;
> 
> What's wrong with the existing check? In particular, I think we _want_
> to support data->end == limit (it's exclusive). If data->addr == limit,
> then we'll have a size of zero and the loop won't run.
> 

I was thinking @limit is exclusive, so we would need to bail when hitting
the ceiling. @limit is derived from @ia_bits. For example, it's
0x00000100_00000000 when @ia_bits is 40, which is an invalid address
for the guest, but I was still wrong in this case :)
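
For example, with @ia_bits = 40 the largest legal walk is [0x0, 1UL << 40):
data->end == limit is fine because the end is exclusive, so the last
address actually visited is (limit - 1), while data->addr == limit simply
gives a zero-sized walk.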

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-08-25  9:39 ` [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table Will Deacon
  2020-09-01 16:24   ` Alexandru Elisei
@ 2020-09-03  2:57   ` Gavin Shan
  2020-09-03  5:27     ` Gavin Shan
  2020-09-03 11:18   ` Gavin Shan
  2 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  2:57 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Add stage-2 map() and unmap() operations to the generic page-table code.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
>   arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
>   2 files changed, 301 insertions(+)
> 

With the following questions resolved:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3389f978d573..8ab0d5f43817 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -134,6 +134,45 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
>    */
>   void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>   
> +/**
> + * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address at which to place the mapping.
> + * @size:	Size of the mapping.
> + * @phys:	Physical address of the memory to map.
> + * @prot:	Permissions and attributes for the mapping.
> + * @mc:		Cache of pre-allocated GFP_PGTABLE_USER memory from which to
> + *		allocate page-table pages.
> + *
> + * If device attributes are not explicitly requested in @prot, then the
> + * mapping will be normal, cacheable.
> + *
> + * Note that this function will both coalesce existing table entries and split
> + * existing block mappings, relying on page-faults to fault back areas outside
> + * of the new mapping lazily.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +			   u64 phys, enum kvm_pgtable_prot prot,
> +			   struct kvm_mmu_memory_cache *mc);
> +
> +/**
> + * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address from which to remove the mapping.
> + * @size:	Size of the mapping.
> + *
> + * TLB invalidation is performed for each page-table entry cleared during the
> + * unmapping operation and the reference count for the page-table page
> + * containing the cleared entry is decremented, with unreferenced pages being
> + * freed. Unmapping a cacheable page will ensure that it is clean to the PoC if
> + * FWB is not supported by the CPU.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +
>   /**
>    * kvm_pgtable_walk() - Walk a page-table.
>    * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b8550ccaef4d..41ee8f3c0369 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -32,10 +32,19 @@
>   #define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
>   #define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
>   
> +#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
> +#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
> +
>   #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
>   
>   #define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
>   
> +#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
> +
>   struct kvm_pgtable_walk_data {
>   	struct kvm_pgtable		*pgt;
>   	struct kvm_pgtable_walker	*walker;
> @@ -420,6 +429,259 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>   	pgt->pgd = NULL;
>   }
>   
> +struct stage2_map_data {
> +	u64				phys;
> +	kvm_pte_t			attr;
> +
> +	kvm_pte_t			*anchor;
> +
> +	struct kvm_s2_mmu		*mmu;
> +	struct kvm_mmu_memory_cache	*memcache;
> +};
> +
> +static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
> +{
> +	kvm_pte_t *ptep = NULL;
> +	struct kvm_mmu_memory_cache *mc = data->memcache;
> +
> +	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
> +	if (mc && mc->nobjs)
> +		ptep = mc->objects[--mc->nobjs];
> +
> +	return ptep;
> +}
> +

This function is introduced here in PATCH[6], but is then replaced by the
generic one (kvm_mmu_memory_cache_alloc()) in PATCH[7]. I think we could
just use the generic one from the start.
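That would mean ending up with something like the hunk PATCH[7] applies
anyway:

    childp = kvm_mmu_memory_cache_alloc(data->memcache);
    if (!childp)
            return -ENOMEM;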

> +static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> +				    struct stage2_map_data *data)
> +{
> +	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> +	kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> +			    PAGE_S2_MEMATTR(NORMAL);
> +	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
> +
> +	if (!(prot & KVM_PGTABLE_PROT_X))
> +		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
> +	else if (device)
> +		return -EINVAL;
> +
> +	if (prot & KVM_PGTABLE_PROT_R)
> +		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
> +
> +	if (prot & KVM_PGTABLE_PROT_W)
> +		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
> +
> +	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> +	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
> +	data->attr = attr;
> +	return 0;
> +}
> +
> +static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> +				       kvm_pte_t *ptep,
> +				       struct stage2_map_data *data)
> +{
> +	u64 granule = kvm_granule_size(level), phys = data->phys;
> +
> +	if (!kvm_block_mapping_supported(addr, end, phys, level))
> +		return false;
> +
> +	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> +		goto out;
> +
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> +out:
> +	data->phys += granule;
> +	return true;
> +}
> +
> +static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> +				     kvm_pte_t *ptep,
> +				     struct stage2_map_data *data)
> +{
> +	if (data->anchor)
> +		return 0;
> +
> +	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
> +		return 0;
> +
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	data->anchor = ptep;
> +	return 0;
> +}
> +
> +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +				struct stage2_map_data *data)
> +{
> +	kvm_pte_t *childp, pte = *ptep;
> +	struct page *page = virt_to_page(ptep);
> +
> +	if (data->anchor) {
> +		if (kvm_pte_valid(pte))
> +			put_page(page);
> +
> +		return 0;
> +	}
> +
> +	if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
> +		goto out_get_page;
> +
> +	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +		return -EINVAL;
> +
> +	childp = stage2_memcache_alloc_page(data);
> +	if (!childp)
> +		return -ENOMEM;
> +
> +	/*
> +	 * If we've run into an existing block mapping then replace it with
> +	 * a table. Accesses beyond 'end' that fall within the new table
> +	 * will be mapped lazily.
> +	 */
> +	if (kvm_pte_valid(pte)) {
> +		kvm_set_invalid_pte(ptep);
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +		put_page(page);
> +	}
> +
> +	kvm_set_table_pte(ptep, childp);
> +
> +out_get_page:
> +	get_page(page);
> +	return 0;
> +}
> +
> +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> +				      kvm_pte_t *ptep,
> +				      struct stage2_map_data *data)
> +{
> +	int ret = 0;
> +
> +	if (!data->anchor)
> +		return 0;
> +
> +	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	put_page(virt_to_page(ptep));
> +
> +	if (data->anchor == ptep) {
> +		data->anchor = NULL;
> +		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +	}
> +
> +	return ret;
> +}
> +

stage2_map_walk_leaf() tries to build the huge (block?) mapping and then
populates the next-level page table if that fails. So it does more than
what we want here. I think a call to stage2_map_walker_try_leaf() might
be enough, as sketched below. However, nothing looks wrong to me :)
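
If it helps, a rough sketch of that idea (assuming the anchor is only ever
installed where kvm_block_mapping_supported() already succeeded, so the
stage2_map_walker_try_leaf() call cannot fail):

    static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
                                          kvm_pte_t *ptep,
                                          struct stage2_map_data *data)
    {
            if (!data->anchor)
                    return 0;

            /* The subtree under the anchor is no longer reachable; free it */
            free_page((unsigned long)kvm_pte_follow(*ptep));
            put_page(virt_to_page(ptep));

            if (data->anchor == ptep) {
                    data->anchor = NULL;

                    /* Install the block entry directly, as walk_leaf() would */
                    if (WARN_ON(!stage2_map_walker_try_leaf(addr, end, level,
                                                            ptep, data)))
                            return -EINVAL;

                    get_page(virt_to_page(ptep));
            }

            return 0;
    }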

> +static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			     enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> +	struct stage2_map_data *data = arg;
> +
> +	switch (flag) {
> +	case KVM_PGTABLE_WALK_TABLE_PRE:
> +		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +	case KVM_PGTABLE_WALK_LEAF:
> +		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +	case KVM_PGTABLE_WALK_TABLE_POST:
> +		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +			   u64 phys, enum kvm_pgtable_prot prot,
> +			   struct kvm_mmu_memory_cache *mc)
> +{
> +	int ret;
> +	struct stage2_map_data map_data = {
> +		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
> +		.mmu		= pgt->mmu,
> +		.memcache	= mc,
> +	};
> +	struct kvm_pgtable_walker walker = {
> +		.cb		= stage2_map_walker,
> +		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
> +				  KVM_PGTABLE_WALK_LEAF |
> +				  KVM_PGTABLE_WALK_TABLE_POST,
> +		.arg		= &map_data,
> +	};
> +
> +	ret = stage2_map_set_prot_attr(prot, &map_data);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	dsb(ishst);
> +	return ret;
> +}
> +
> +static void stage2_flush_dcache(void *addr, u64 size)
> +{
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +		return;
> +
> +	__flush_dcache_area(addr, size);
> +}
> +
> +static bool stage2_pte_cacheable(kvm_pte_t pte)
> +{
> +	u64 memattr = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR, pte);
> +	return memattr == PAGE_S2_MEMATTR(NORMAL);
> +}
> +
> +static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			       enum kvm_pgtable_walk_flags flag,
> +			       void * const arg)
> +{
> +	struct kvm_s2_mmu *mmu = arg;
> +	kvm_pte_t pte = *ptep, *childp = NULL;
> +	bool need_flush = false;
> +
> +	if (!kvm_pte_valid(pte))
> +		return 0;
> +
> +	if (kvm_pte_table(pte, level)) {
> +		childp = kvm_pte_follow(pte);
> +
> +		if (page_count(virt_to_page(childp)) != 1)
> +			return 0;
> +	} else if (stage2_pte_cacheable(pte)) {
> +		need_flush = true;
> +	}
> +
> +	/*
> +	 * This is similar to the map() path in that we unmap the entire
> +	 * block entry and rely on the remaining portions being faulted
> +	 * back lazily.
> +	 */
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +	put_page(virt_to_page(ptep));
> +
> +	if (need_flush) {
> +		stage2_flush_dcache(kvm_pte_follow(pte),
> +				    kvm_granule_size(level));
> +	}
> +
> +	if (childp)
> +		free_page((unsigned long)childp);
> +
> +	return 0;
> +}
> +
> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> +{
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= stage2_unmap_walker,
> +		.arg	= pgt->mmu,
> +		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> +	};
> +
> +	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +}
> +
>   int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
>   {
>   	size_t pgd_sz;
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API Will Deacon
  2020-09-01 17:08   ` Alexandru Elisei
@ 2020-09-03  3:57   ` Gavin Shan
  1 sibling, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  3:57 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

On 8/25/20 7:39 PM, Will Deacon wrote:
> Convert kvm_phys_addr_ioremap() to use kvm_pgtable_stage2_map() instead
> of stage2_set_pte().
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/hyp/pgtable.c | 14 +-------------
>   arch/arm64/kvm/mmu.c         | 29 ++++++++++++-----------------
>   2 files changed, 13 insertions(+), 30 deletions(-)
> 
Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 41ee8f3c0369..6f65d3841ec9 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -439,18 +439,6 @@ struct stage2_map_data {
>   	struct kvm_mmu_memory_cache	*memcache;
>   };
>   
> -static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
> -{
> -	kvm_pte_t *ptep = NULL;
> -	struct kvm_mmu_memory_cache *mc = data->memcache;
> -
> -	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
> -	if (mc && mc->nobjs)
> -		ptep = mc->objects[--mc->nobjs];
> -
> -	return ptep;
> -}
> -
>   static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
>   				    struct stage2_map_data *data)
>   {
> @@ -531,7 +519,7 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
>   		return -EINVAL;
>   
> -	childp = stage2_memcache_alloc_page(data);
> +	childp = kvm_mmu_memory_cache_alloc(data->memcache);
>   	if (!childp)
>   		return -ENOMEM;
>   
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 4607e9ca60a2..33146d3dc93a 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1154,35 +1154,30 @@ static int stage2_pudp_test_and_clear_young(pud_t *pud)
>   int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>   			  phys_addr_t pa, unsigned long size, bool writable)
>   {
> -	phys_addr_t addr, end;
> +	phys_addr_t addr;
>   	int ret = 0;
> -	unsigned long pfn;
>   	struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> +	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
> +				     KVM_PGTABLE_PROT_R |
> +				     (writable ? KVM_PGTABLE_PROT_W : 0);
>   
> -	end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
> -	pfn = __phys_to_pfn(pa);
> -
> -	for (addr = guest_ipa; addr < end; addr += PAGE_SIZE) {
> -		pte_t pte = kvm_pfn_pte(pfn, PAGE_S2_DEVICE);
> -
> -		if (writable)
> -			pte = kvm_s2pte_mkwrite(pte);
> -
> +	for (addr = guest_ipa; addr < guest_ipa + size; addr += PAGE_SIZE) {
>   		ret = kvm_mmu_topup_memory_cache(&cache,
>   						 kvm_mmu_cache_min_pages(kvm));
>   		if (ret)
> -			goto out;
> +			break;
> +
>   		spin_lock(&kvm->mmu_lock);
> -		ret = stage2_set_pte(&kvm->arch.mmu, &cache, addr, &pte,
> -				     KVM_S2PTE_FLAG_IS_IOMAP);
> +		ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot,
> +					     &cache);
>   		spin_unlock(&kvm->mmu_lock);
>   		if (ret)
> -			goto out;
> +			break;
>   
> -		pfn++;
> +		pa += PAGE_SIZE;
>   	}
>   
> -out:
>   	kvm_mmu_free_memory_cache(&cache);
>   	return ret;
>   }
> 


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() " Will Deacon
  2020-09-02 15:37   ` Alexandru Elisei
@ 2020-09-03  4:13   ` Gavin Shan
  1 sibling, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:13 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Convert kvm_set_spte_hva() to use kvm_pgtable_stage2_map() instead
> of stage2_set_pte().
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/mmu.c | 23 ++++++++++-------------
>   1 file changed, 10 insertions(+), 13 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 33146d3dc93a..704b471a48ce 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1911,28 +1911,27 @@ int kvm_unmap_hva_range(struct kvm *kvm,
>   
>   static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>   {
> -	pte_t *pte = (pte_t *)data;
> +	kvm_pfn_t *pfn = (kvm_pfn_t *)data;
>   
>   	WARN_ON(size != PAGE_SIZE);
> +
>   	/*
> -	 * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
> -	 * flag clear because MMU notifiers will have unmapped a huge PMD before
> -	 * calling ->change_pte() (which in turn calls kvm_set_spte_hva()) and
> -	 * therefore stage2_set_pte() never needs to clear out a huge PMD
> -	 * through this calling path.
> +	 * The MMU notifiers will have unmapped a huge PMD before calling
> +	 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and
> +	 * therefore we never need to clear out a huge PMD through this
> +	 * calling path and a memcache is not required.
>   	 */
> -	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
> +	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, gpa, PAGE_SIZE,
> +			       __pfn_to_phys(*pfn), KVM_PGTABLE_PROT_R, NULL);
>   	return 0;
>   }
>   
> -
>   int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>   {
>   	unsigned long end = hva + PAGE_SIZE;
>   	kvm_pfn_t pfn = pte_pfn(pte);
> -	pte_t stage2_pte;
>   
> -	if (!kvm->arch.mmu.pgd)
> +	if (!kvm->arch.mmu.pgt)
>   		return 0;
>   
>   	trace_kvm_set_spte_hva(hva);
> @@ -1942,9 +1941,7 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>   	 * just like a translation fault and clean the cache to the PoC.
>   	 */
>   	clean_dcache_guest_page(pfn, PAGE_SIZE);
> -	stage2_pte = kvm_pfn_pte(pfn, PAGE_S2);
> -	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
> -
> +	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &pfn);
>   	return 0;
>   }
>   
> 


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() " Will Deacon
  2020-09-02 16:23   ` Alexandru Elisei
@ 2020-09-03  4:19   ` Gavin Shan
  1 sibling, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:19 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
> of walking the page-table directly.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>   1 file changed, 32 insertions(+), 25 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 704b471a48ce..751ce2462765 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>   	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>   }
>   
> +/*
> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
> + * long will also starve other vCPUs. We have to also make sure that the page
> + * tables are not freed while we released the lock.
> + */
> +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
> +({									\
> +	int ret;							\
> +	struct kvm *__kvm = (kvm);					\
> +	bool __resched = (resched);					\
> +	u64 next, __addr = (addr), __end = (end);			\
> +	do {								\
> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
> +		if (!pgt)						\
> +			break;						\
> +		next = stage2_pgd_addr_end(__kvm, __addr, __end);	\
> +		ret = fn(pgt, __addr, next - __addr);			\
> +		if (ret)						\
> +			break;						\
> +		if (__resched && next != __end)				\
> +			cond_resched_lock(&__kvm->mmu_lock);		\
> +	} while (__addr = next, __addr != __end);			\
> +	ret;								\
> +})
> +
>   static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   {
>   	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> @@ -220,8 +247,8 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
>    * end up writing old data to disk.
>    *
>    * This is why right after unmapping a page/section and invalidating
> - * the corresponding TLBs, we call kvm_flush_dcache_p*() to make sure
> - * the IO subsystem will never hit in the cache.
> + * the corresponding TLBs, we flush to make sure the IO subsystem will
> + * never hit in the cache.
>    *
>    * This is all avoided on systems that have ARM64_HAS_STAGE2_FWB, as
>    * we then fully enforce cacheability of RAM, no matter what the guest
> @@ -344,32 +371,12 @@ static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64
>   				 bool may_block)
>   {
>   	struct kvm *kvm = mmu->kvm;
> -	pgd_t *pgd;
> -	phys_addr_t addr = start, end = start + size;
> -	phys_addr_t next;
> +	phys_addr_t end = start + size;
>   
>   	assert_spin_locked(&kvm->mmu_lock);
>   	WARN_ON(size & ~PAGE_MASK);
> -
> -	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
> -	do {
> -		/*
> -		 * Make sure the page table is still active, as another thread
> -		 * could have possibly freed the page table, while we released
> -		 * the lock.
> -		 */
> -		if (!READ_ONCE(mmu->pgd))
> -			break;
> -		next = stage2_pgd_addr_end(kvm, addr, end);
> -		if (!stage2_pgd_none(kvm, *pgd))
> -			unmap_stage2_p4ds(mmu, pgd, addr, next);
> -		/*
> -		 * If the range is too large, release the kvm->mmu_lock
> -		 * to prevent starvation and lockup detector warnings.
> -		 */
> -		if (may_block && next != end)
> -			cond_resched_lock(&kvm->mmu_lock);
> -	} while (pgd++, addr = next, addr != end);
> +	WARN_ON(stage2_apply_range(kvm, start, end, kvm_pgtable_stage2_unmap,
> +				   may_block));
>   }
>   
>   static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size)
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table
  2020-08-25  9:39 ` [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table Will Deacon
@ 2020-09-03  4:33   ` Gavin Shan
  2020-09-03 16:48     ` Will Deacon
  0 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:33 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Add stage-2 mkyoung(), mkold() and is_young() operations to the generic
> page-table code.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++++
>   arch/arm64/kvm/hyp/pgtable.c         | 86 ++++++++++++++++++++++++++++
>   2 files changed, 124 insertions(+)
> 

With the following question resolved:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 8ab0d5f43817..ae56534f87a0 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -173,6 +173,44 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>    */
>   int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
>   
> +/**
> + * kvm_pgtable_stage2_mkyoung() - Set the access flag in a page-table entry.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address to identify the page-table entry.
> + *
> + * If there is a valid, leaf page-table entry used to translate @addr, then
> + * set the access flag in that entry.
> + *
> + * Return: The old page-table entry prior to setting the flag, 0 on failure.
> + */
> +kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
> +
> +/**
> + * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address to identify the page-table entry.
> + *
> + * If there is a valid, leaf page-table entry used to translate @addr, then
> + * clear the access flag in that entry.
> + *
> + * Note that it is the caller's responsibility to invalidate the TLB after
> + * calling this function to ensure that the updated permissions are visible
> + * to the CPUs.
> + *
> + * Return: The old page-table entry prior to clearing the flag, 0 on failure.
> + */
> +kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
> +
> +/**
> + * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
> + *				   access flag set.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address to identify the page-table entry.
> + *
> + * Return: True if the page-table entry has the access flag set, false otherwise.
> + */
> +bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
> +
>   /**
>    * kvm_pgtable_walk() - Walk a page-table.
>    * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 6f65d3841ec9..30713eb773e0 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -670,6 +670,92 @@ int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>   	return kvm_pgtable_walk(pgt, addr, size, &walker);
>   }
>   
> +struct stage2_attr_data {
> +	kvm_pte_t	attr_set;
> +	kvm_pte_t	attr_clr;
> +	kvm_pte_t	pte;
> +};
> +
> +static int stage2_attr_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			      enum kvm_pgtable_walk_flags flag,
> +			      void * const arg)
> +{
> +	kvm_pte_t pte = *ptep;
> +	struct stage2_attr_data *data = arg;
> +
> +	if (!kvm_pte_valid(pte))
> +		return 0;
> +
> +	data->pte = pte;
> +	pte &= ~data->attr_clr;
> +	pte |= data->attr_set;
> +
> +	/*
> +	 * We may race with the CPU trying to set the access flag here,
> +	 * but worst-case the access flag update gets lost and will be
> +	 * set on the next access instead.
> +	 */
> +	if (data->pte != pte)
> +		WRITE_ONCE(*ptep, pte);
> +
> +	return 0;
> +}
> +
> +static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
> +				    u64 size, kvm_pte_t attr_set,
> +				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte)
> +{
> +	int ret;
> +	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
> +	struct stage2_attr_data data = {
> +		.attr_set	= attr_set & attr_mask,
> +		.attr_clr	= attr_clr & attr_mask,
> +	};
> +	struct kvm_pgtable_walker walker = {
> +		.cb		= stage2_attr_walker,
> +		.arg		= &data,
> +		.flags		= KVM_PGTABLE_WALK_LEAF,
> +	};
> +
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	if (ret)
> +		return ret;
> +
> +	if (orig_pte)
> +		*orig_pte = data.pte;
> +	return 0;
> +}
> +

@size is always 1 from these callers, which suggests the parameter could
be dropped from stage2_update_leaf_attrs(). At the same time, we don't
know whether the page is mapped by a PUD, PMD or PTE, so a fixed value
("1") looks meaningless.

> +kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
> +{
> +	kvm_pte_t pte = 0;
> +	stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0,
> +				 &pte);
> +	dsb(ishst);
> +	return pte;
> +}
> +
> +kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
> +{
> +	kvm_pte_t pte = 0;
> +	stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
> +				 &pte);
> +	/*
> +	 * "But where's the TLBI?!", you scream.
> +	 * "Over in the core code", I sigh.
> +	 *
> +	 * See the '->clear_flush_young()' callback on the KVM mmu notifier.
> +	 */
> +	return pte;
> +}
> +
> +bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
> +{
> +	kvm_pte_t pte = 0;
> +	stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte);
> +	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
> +}
> +
>   int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
>   {
>   	size_t pgd_sz;
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 11/21] KVM: arm64: Convert page-aging and access faults to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 11/21] KVM: arm64: Convert page-aging and access faults to generic page-table API Will Deacon
@ 2020-09-03  4:37   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:37 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Convert the page-aging functions and access fault handler to use the
> generic page-table code instead of walking the page-table directly.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/mmu.c | 74 ++++++++++----------------------------------
>   1 file changed, 16 insertions(+), 58 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 751ce2462765..d3db8e00ce0a 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1698,46 +1698,23 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	return ret;
>   }
>   
> -/*
> - * Resolve the access fault by making the page young again.
> - * Note that because the faulting entry is guaranteed not to be
> - * cached in the TLB, we don't need to invalidate anything.
> - * Only the HW Access Flag updates are supported for Stage 2 (no DBM),
> - * so there is no need for atomic (pte|pmd)_mkyoung operations.
> - */
> +/* Resolve the access fault by making the page young again. */
>   static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
>   {
> -	pud_t *pud;
> -	pmd_t *pmd;
> -	pte_t *pte;
> -	kvm_pfn_t pfn;
> -	bool pfn_valid = false;
> +	pte_t pte;
> +	kvm_pte_t kpte;
> +	struct kvm_s2_mmu *mmu;
>   
>   	trace_kvm_access_fault(fault_ipa);
>   
>   	spin_lock(&vcpu->kvm->mmu_lock);
> -
> -	if (!stage2_get_leaf_entry(vcpu->arch.hw_mmu, fault_ipa, &pud, &pmd, &pte))
> -		goto out;
> -
> -	if (pud) {		/* HugeTLB */
> -		*pud = kvm_s2pud_mkyoung(*pud);
> -		pfn = kvm_pud_pfn(*pud);
> -		pfn_valid = true;
> -	} else	if (pmd) {	/* THP, HugeTLB */
> -		*pmd = pmd_mkyoung(*pmd);
> -		pfn = pmd_pfn(*pmd);
> -		pfn_valid = true;
> -	} else {
> -		*pte = pte_mkyoung(*pte);	/* Just a page... */
> -		pfn = pte_pfn(*pte);
> -		pfn_valid = true;
> -	}
> -
> -out:
> +	mmu = vcpu->arch.hw_mmu;
> +	kpte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa);
>   	spin_unlock(&vcpu->kvm->mmu_lock);
> -	if (pfn_valid)
> -		kvm_set_pfn_accessed(pfn);
> +
> +	pte = __pte(kpte);
> +	if (pte_valid(pte))
> +		kvm_set_pfn_accessed(pte_pfn(pte));
>   }
>   
>   /**
> @@ -1954,38 +1931,19 @@ int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>   
>   static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>   {
> -	pud_t *pud;
> -	pmd_t *pmd;
> -	pte_t *pte;
> +	pte_t pte;
> +	kvm_pte_t kpte;
>   
>   	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
> -	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
> -		return 0;
> -
> -	if (pud)
> -		return stage2_pudp_test_and_clear_young(pud);
> -	else if (pmd)
> -		return stage2_pmdp_test_and_clear_young(pmd);
> -	else
> -		return stage2_ptep_test_and_clear_young(pte);
> +	kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt, gpa);
> +	pte = __pte(kpte);
> +	return pte_valid(pte) && pte_young(pte);
>   }
>   
>   static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
>   {
> -	pud_t *pud;
> -	pmd_t *pmd;
> -	pte_t *pte;
> -
>   	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
> -	if (!stage2_get_leaf_entry(&kvm->arch.mmu, gpa, &pud, &pmd, &pte))
> -		return 0;
> -
> -	if (pud)
> -		return kvm_s2pud_young(*pud);
> -	else if (pmd)
> -		return pmd_young(*pmd);
> -	else
> -		return pte_young(*pte);
> +	return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt, gpa);
>   }
>   
>   int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 12/21] KVM: arm64: Add support for stage-2 write-protect in generic page-table
  2020-08-25  9:39 ` [PATCH v3 12/21] KVM: arm64: Add support for stage-2 write-protect in generic page-table Will Deacon
@ 2020-09-03  4:47   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:47 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> Add a stage-2 wrprotect() operation to the generic page-table code.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---

Reviewed-by: Gavin Shan <gshan@redhat.com>

>   arch/arm64/include/asm/kvm_pgtable.h | 15 +++++++++++++++
>   arch/arm64/kvm/hyp/pgtable.c         |  6 ++++++
>   2 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index ae56534f87a0..0c96b78d791d 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -173,6 +173,21 @@ int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>    */
>   int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
>   
> +/**
> + * kvm_pgtable_stage2_wrprotect() - Write-protect guest stage-2 address range
> + *                                  without TLB invalidation.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address from which to write-protect,
> + * @size:	Size of the range.
> + *
> + * Note that it is the caller's responsibility to invalidate the TLB after
> + * calling this function to ensure that the updated permissions are visible
> + * to the CPUs.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +
>   /**
>    * kvm_pgtable_stage2_mkyoung() - Set the access flag in a page-table entry.
>    * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 30713eb773e0..c218651f8eba 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -726,6 +726,12 @@ static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>   	return 0;
>   }
>   
> +int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
> +{
> +	return stage2_update_leaf_attrs(pgt, addr, size, 0,
> +					KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W, NULL);
> +}
> +
>   kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
>   {
>   	kvm_pte_t pte = 0;
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 13/21] KVM: arm64: Convert write-protect operation to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 13/21] KVM: arm64: Convert write-protect operation to generic page-table API Will Deacon
@ 2020-09-03  4:48   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:48 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> Convert stage2_wp_range() to call the kvm_pgtable_stage2_wrprotect()
> function of the generic page-table code instead of walking the page-table
> directly.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/mmu.c | 25 ++++---------------------
>   1 file changed, 4 insertions(+), 21 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d3db8e00ce0a..ca2c37c91e0b 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -66,6 +66,9 @@ static bool is_iomap(unsigned long flags)
>   	ret;								\
>   })
>   
> +#define stage2_apply_range_resched(kvm, addr, end, fn)			\
> +	stage2_apply_range(kvm, addr, end, fn, true)
> +
>   static bool memslot_is_logging(struct kvm_memory_slot *memslot)
>   {
>   	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> @@ -1294,27 +1297,7 @@ static void  stage2_wp_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>   static void stage2_wp_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
>   {
>   	struct kvm *kvm = mmu->kvm;
> -	pgd_t *pgd;
> -	phys_addr_t next;
> -
> -	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
> -	do {
> -		/*
> -		 * Release kvm_mmu_lock periodically if the memory region is
> -		 * large. Otherwise, we may see kernel panics with
> -		 * CONFIG_DETECT_HUNG_TASK, CONFIG_LOCKUP_DETECTOR,
> -		 * CONFIG_LOCKDEP. Additionally, holding the lock too long
> -		 * will also starve other vCPUs. We have to also make sure
> -		 * that the page tables are not freed while we released
> -		 * the lock.
> -		 */
> -		cond_resched_lock(&kvm->mmu_lock);
> -		if (!READ_ONCE(mmu->pgd))
> -			break;
> -		next = stage2_pgd_addr_end(kvm, addr, end);
> -		if (stage2_pgd_present(kvm, *pgd))
> -			stage2_wp_p4ds(mmu, pgd, addr, next);
> -	} while (pgd++, addr = next, addr != end);
> +	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_wrprotect);
>   }
>   
>   /**
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 14/21] KVM: arm64: Add support for stage-2 cache flushing in generic page-table
  2020-08-25  9:39 ` [PATCH v3 14/21] KVM: arm64: Add support for stage-2 cache flushing in generic page-table Will Deacon
@ 2020-09-03  4:51   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:51 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> Add support for cache flushing a range of the stage-2 address space to
> the generic page-table code.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---

Reviewed-by: Gavin Shan <gshan@redhat.com>

>   arch/arm64/include/asm/kvm_pgtable.h | 12 ++++++++++++
>   arch/arm64/kvm/hyp/pgtable.c         | 26 ++++++++++++++++++++++++++
>   2 files changed, 38 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 0c96b78d791d..ea823fe31913 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -226,6 +226,18 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
>    */
>   bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
>   
> +/**
> + * kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
> + * 				      of Coherency for guest stage-2 address
> + *				      range.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address from which to flush.
> + * @size:	Size of the range.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +
>   /**
>    * kvm_pgtable_walk() - Walk a page-table.
>    * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index c218651f8eba..75887185f1e2 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -762,6 +762,32 @@ bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
>   	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
>   }
>   
> +static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			       enum kvm_pgtable_walk_flags flag,
> +			       void * const arg)
> +{
> +	kvm_pte_t pte = *ptep;
> +
> +	if (!kvm_pte_valid(pte) || !stage2_pte_cacheable(pte))
> +		return 0;
> +
> +	stage2_flush_dcache(kvm_pte_follow(pte), kvm_granule_size(level));
> +	return 0;
> +}
> +
> +int kvm_pgtable_stage2_flush(struct kvm_pgtable *pgt, u64 addr, u64 size)
> +{
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= stage2_flush_walker,
> +		.flags	= KVM_PGTABLE_WALK_LEAF,
> +	};
> +
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +		return 0;
> +
> +	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +}
> +
>   int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
>   {
>   	size_t pgd_sz;
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 15/21] KVM: arm64: Convert memslot cache-flushing code to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 15/21] KVM: arm64: Convert memslot cache-flushing code to generic page-table API Will Deacon
@ 2020-09-03  4:52   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:52 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> From: Quentin Perret <qperret@google.com>
> 
> Convert stage2_flush_memslot() to call the kvm_pgtable_stage2_flush()
> function of the generic page-table code instead of walking the page-table
> directly.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/mmu.c | 13 +------------
>   1 file changed, 1 insertion(+), 12 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ca2c37c91e0b..d4b0716a6ab4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -455,21 +455,10 @@ static void stage2_flush_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
>   static void stage2_flush_memslot(struct kvm *kvm,
>   				 struct kvm_memory_slot *memslot)
>   {
> -	struct kvm_s2_mmu *mmu = &kvm->arch.mmu;
>   	phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
>   	phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
> -	phys_addr_t next;
> -	pgd_t *pgd;
> -
> -	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
> -	do {
> -		next = stage2_pgd_addr_end(kvm, addr, end);
> -		if (!stage2_pgd_none(kvm, *pgd))
> -			stage2_flush_p4ds(mmu, pgd, addr, next);
>   
> -		if (next != end)
> -			cond_resched_lock(&kvm->mmu_lock);
> -	} while (pgd++, addr = next, addr != end);
> +	stage2_apply_range_resched(kvm, addr, end, kvm_pgtable_stage2_flush);
>   }
>   
>   /**
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 16/21] KVM: arm64: Add support for relaxing stage-2 perms in generic page-table code
  2020-08-25  9:39 ` [PATCH v3 16/21] KVM: arm64: Add support for relaxing stage-2 perms in generic page-table code Will Deacon
@ 2020-09-03  4:55   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  4:55 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Add support for relaxing the permissions of a stage-2 mapping (i.e.
> adding additional permissions) to the generic page-table code.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h | 17 +++++++++++++++++
>   arch/arm64/kvm/hyp/pgtable.c         | 20 ++++++++++++++++++++
>   2 files changed, 37 insertions(+)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index ea823fe31913..0d7077c34152 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -216,6 +216,23 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
>    */
>   kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
>   
> +/**
> + * kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
> + *				      page-table entry.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address to identify the page-table entry.
> + * @prot:	Additional permissions to grant for the mapping.
> + *
> + * If there is a valid, leaf page-table entry used to translate @addr, then
> + * relax the permissions in that entry according to the read, write and
> + * execute permissions specified by @prot. No permissions are removed, and
> + * TLB invalidation is performed after updating the entry.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
> +				   enum kvm_pgtable_prot prot);
> +
>   /**
>    * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
>    *				   access flag set.
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 75887185f1e2..6e8ca1ec12b4 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -762,6 +762,26 @@ bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
>   	return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
>   }
>   
> +int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
> +				   enum kvm_pgtable_prot prot)
> +{
> +	int ret;
> +	kvm_pte_t set = 0, clr = 0;
> +
> +	if (prot & KVM_PGTABLE_PROT_R)
> +		set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
> +
> +	if (prot & KVM_PGTABLE_PROT_W)
> +		set |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
> +
> +	if (prot & KVM_PGTABLE_PROT_X)
> +		clr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
> +
> +	ret = stage2_update_leaf_attrs(pgt, addr, 1, set, clr, NULL);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, pgt->mmu, addr, 0);
> +	return ret;
> +}
> +
>   static int stage2_flush_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>   			       enum kvm_pgtable_walk_flags flag,
>   			       void * const arg)
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 18/21] KVM: arm64: Check the pgt instead of the pgd when modifying page-table
  2020-08-25  9:39 ` [PATCH v3 18/21] KVM: arm64: Check the pgt instead of the pgd when modifying page-table Will Deacon
@ 2020-09-03  5:00   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  5:00 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> In preparation for removing the 'pgd' field of 'struct kvm_s2_mmu',
> update the few remaining users to check the 'pgt' field instead.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/mmu.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index cfbf32cae3a5..050eab71de31 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1813,7 +1813,7 @@ static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *dat
>   int kvm_unmap_hva_range(struct kvm *kvm,
>   			unsigned long start, unsigned long end, unsigned flags)
>   {
> -	if (!kvm->arch.mmu.pgd)
> +	if (!kvm->arch.mmu.pgt)
>   		return 0;
>   
>   	trace_kvm_unmap_hva_range(start, end);
> @@ -1876,7 +1876,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
>   
>   int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
>   {
> -	if (!kvm->arch.mmu.pgd)
> +	if (!kvm->arch.mmu.pgt)
>   		return 0;
>   	trace_kvm_age_hva(start, end);
>   	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
> @@ -1884,7 +1884,7 @@ int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
>   
>   int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
>   {
> -	if (!kvm->arch.mmu.pgd)
> +	if (!kvm->arch.mmu.pgt)
>   		return 0;
>   	trace_kvm_test_age_hva(hva);
>   	return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE,
> 

Thanks,
Gavin


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
  2020-08-25  9:39 ` [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu' Will Deacon
@ 2020-09-03  5:07   ` Gavin Shan
  2020-09-03 16:50     ` Will Deacon
  0 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  5:07 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
> 'struct kvm_s2_mmu', so remove the unused 'pgd' field.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_host.h | 1 -
>   arch/arm64/kvm/mmu.c              | 2 --
>   2 files changed, 3 deletions(-)
> 

I think this might be folded into PATCH[18] as both patches are
simple enough, though I'm not sure whether the changes introduced
in PATCH[19] prevent us from doing this.

There is another question below.

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 0b7c702b2151..41caf29bd93c 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -79,7 +79,6 @@ struct kvm_s2_mmu {
>   	 * for vEL1/EL0 with vHCR_EL2.VM == 0.  In that case, we use the
>   	 * canonical stage-2 page tables.
>   	 */
> -	pgd_t		*pgd;
>   	phys_addr_t	pgd_phys;
>   	struct kvm_pgtable *pgt;
>   
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index ddeec0b03666..f28e03dcb897 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -384,7 +384,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>   	mmu->kvm = kvm;
>   	mmu->pgt = pgt;
>   	mmu->pgd_phys = __pa(pgt->pgd);
> -	mmu->pgd = (void *)pgt->pgd;
>   	mmu->vmid.vmid_gen = 0;
>   	return 0;
>   
> @@ -470,7 +469,6 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>   	spin_lock(&kvm->mmu_lock);
>   	pgt = mmu->pgt;
>   	if (pgt) {
> -		mmu->pgd = NULL;
>   		mmu->pgd_phys = 0;
>   		mmu->pgt = NULL;
>   		free_percpu(mmu->last_vcpu_ran);
> 

I guess mmu->pgd_phys might be removed as well, since kvm_get_vttbr()
is its only consumer.
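
Something like the below is what I have in mind (just an untested
sketch to illustrate the idea; I'm approximating the current body of
kvm_get_vttbr() from memory, so the exact field and helper names may
differ), deriving the base address from the pgtable structure so that
pgd_phys isn't needed any more:

static __always_inline u64 kvm_get_vttbr(struct kvm_s2_mmu *mmu)
{
	struct kvm_vmid *vmid = &mmu->vmid;
	u64 vmid_field, baddr;
	u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0;

	/* Compute the baddr from the pgtable instead of mmu->pgd_phys */
	baddr = __pa(mmu->pgt->pgd);
	vmid_field = (u64)READ_ONCE(vmid->vmid) << VTTBR_VMID_SHIFT;
	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;
}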

Thanks,
Gavin


* Re: [PATCH v3 21/21] KVM: arm64: Don't constrain maximum IPA size based on host configuration
  2020-08-25  9:39 ` [PATCH v3 21/21] KVM: arm64: Don't constrain maximum IPA size based on host configuration Will Deacon
@ 2020-09-03  5:09   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  5:09 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Now that the guest stage-2 page-tables are managed independently from
> the host stage-1 page-tables, we can avoid constraining the IPA size
> based on the host and instead limit it only based on the PARange field
> of the ID_AA64MMFR0 register.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/reset.c | 38 +++++---------------------------------
>   1 file changed, 5 insertions(+), 33 deletions(-)
> 

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index ee33875c5c2a..471ee9234e40 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -339,7 +339,7 @@ u32 get_kvm_ipa_limit(void)
>   
>   int kvm_set_ipa_limit(void)
>   {
> -	unsigned int ipa_max, pa_max, va_max, parange, tgran_2;
> +	unsigned int parange, tgran_2;
>   	u64 mmfr0;
>   
>   	mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> @@ -376,38 +376,10 @@ int kvm_set_ipa_limit(void)
>   		break;
>   	}
>   
> -	pa_max = id_aa64mmfr0_parange_to_phys_shift(parange);
> -
> -	/* Clamp the IPA limit to the PA size supported by the kernel */
> -	ipa_max = (pa_max > PHYS_MASK_SHIFT) ? PHYS_MASK_SHIFT : pa_max;
> -	/*
> -	 * Since our stage2 table is dependent on the stage1 page table code,
> -	 * we must always honor the following condition:
> -	 *
> -	 *  Number of levels in Stage1 >= Number of levels in Stage2.
> -	 *
> -	 * So clamp the ipa limit further down to limit the number of levels.
> -	 * Since we can concatenate upto 16 tables at entry level, we could
> -	 * go upto 4bits above the maximum VA addressable with the current
> -	 * number of levels.
> -	 */
> -	va_max = PGDIR_SHIFT + PAGE_SHIFT - 3;
> -	va_max += 4;
> -
> -	if (va_max < ipa_max)
> -		ipa_max = va_max;
> -
> -	/*
> -	 * If the final limit is lower than the real physical address
> -	 * limit of the CPUs, report the reason.
> -	 */
> -	if (ipa_max < pa_max)
> -		pr_info("kvm: Limiting the IPA size due to kernel %s Address limit\n",
> -			(va_max < pa_max) ? "Virtual" : "Physical");
> -
> -	WARN(ipa_max < KVM_PHYS_SHIFT,
> -	     "KVM IPA limit (%d bit) is smaller than default size\n", ipa_max);
> -	kvm_ipa_limit = ipa_max;
> +	kvm_ipa_limit = id_aa64mmfr0_parange_to_phys_shift(parange);
> +	WARN(kvm_ipa_limit < KVM_PHYS_SHIFT,
> +	     "KVM IPA limit (%d bit) is smaller than default size\n",
> +	     kvm_ipa_limit);
>   	kvm_info("IPA Size Limit: %dbits\n", kvm_ipa_limit);
>   
>   	return 0;
> 

Thanks,
Gavin


* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-09-03  2:57   ` Gavin Shan
@ 2020-09-03  5:27     ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  5:27 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 9/3/20 12:57 PM, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
>> Add stage-2 map() and unmap() operations to the generic page-table code.
>>
>> Cc: Marc Zyngier <maz@kernel.org>
>> Cc: Quentin Perret <qperret@google.com>
>> Signed-off-by: Will Deacon <will@kernel.org>
>> ---
>>   arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
>>   arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
>>   2 files changed, 301 insertions(+)
>>
> 
> With the following questions resolved:
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
>> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
>> index 3389f978d573..8ab0d5f43817 100644
>> --- a/arch/arm64/include/asm/kvm_pgtable.h
>> +++ b/arch/arm64/include/asm/kvm_pgtable.h
>> @@ -134,6 +134,45 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
>>    */
>>   void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>> +/**
>> + * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
>> + * @pgt:    Page-table structure initialised by kvm_pgtable_stage2_init().
>> + * @addr:    Intermediate physical address at which to place the mapping.
>> + * @size:    Size of the mapping.
>> + * @phys:    Physical address of the memory to map.
>> + * @prot:    Permissions and attributes for the mapping.
>> + * @mc:        Cache of pre-allocated GFP_PGTABLE_USER memory from which to
>> + *        allocate page-table pages.
>> + *
>> + * If device attributes are not explicitly requested in @prot, then the
>> + * mapping will be normal, cacheable.
>> + *
>> + * Note that this function will both coalesce existing table entries and split
>> + * existing block mappings, relying on page-faults to fault back areas outside
>> + * of the new mapping lazily.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>> +               u64 phys, enum kvm_pgtable_prot prot,
>> +               struct kvm_mmu_memory_cache *mc);
>> +
>> +/**
>> + * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
>> + * @pgt:    Page-table structure initialised by kvm_pgtable_stage2_init().
>> + * @addr:    Intermediate physical address from which to remove the mapping.
>> + * @size:    Size of the mapping.
>> + *
>> + * TLB invalidation is performed for each page-table entry cleared during the
>> + * unmapping operation and the reference count for the page-table page
>> + * containing the cleared entry is decremented, with unreferenced pages being
>> + * freed. Unmapping a cacheable page will ensure that it is clean to the PoC if
>> + * FWB is not supported by the CPU.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
>> +
>>   /**
>>    * kvm_pgtable_walk() - Walk a page-table.
>>    * @pgt:    Page-table structure initialised by kvm_pgtable_*_init().
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index b8550ccaef4d..41ee8f3c0369 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -32,10 +32,19 @@
>>   #define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS    3
>>   #define KVM_PTE_LEAF_ATTR_LO_S1_AF    BIT(10)
>> +#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR    GENMASK(5, 2)
>> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R    BIT(6)
>> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W    BIT(7)
>> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH    GENMASK(9, 8)
>> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS    3
>> +#define KVM_PTE_LEAF_ATTR_LO_S2_AF    BIT(10)
>> +
>>   #define KVM_PTE_LEAF_ATTR_HI        GENMASK(63, 51)
>>   #define KVM_PTE_LEAF_ATTR_HI_S1_XN    BIT(54)
>> +#define KVM_PTE_LEAF_ATTR_HI_S2_XN    BIT(54)
>> +
>>   struct kvm_pgtable_walk_data {
>>       struct kvm_pgtable        *pgt;
>>       struct kvm_pgtable_walker    *walker;
>> @@ -420,6 +429,259 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>>       pgt->pgd = NULL;
>>   }
>> +struct stage2_map_data {
>> +    u64                phys;
>> +    kvm_pte_t            attr;
>> +
>> +    kvm_pte_t            *anchor;
>> +
>> +    struct kvm_s2_mmu        *mmu;
>> +    struct kvm_mmu_memory_cache    *memcache;
>> +};
>> +
>> +static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
>> +{
>> +    kvm_pte_t *ptep = NULL;
>> +    struct kvm_mmu_memory_cache *mc = data->memcache;
>> +
>> +    /* Allocated with GFP_PGTABLE_USER, so no need to zero */
>> +    if (mc && mc->nobjs)
>> +        ptep = mc->objects[--mc->nobjs];
>> +
>> +    return ptep;
>> +}
>> +
> 
> This function is introduced by this patch (PATCH[6]), but replaced by
> the generic one (kvm_mmu_memory_cache_alloc()) in PATCH[7]. I
> think we might use the generic one from PATCH[7].
> 

Correction: I think we might use the generic function from the beginning,
i.e. in this patch (PATCH[6]).
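
Something like the below in stage2_map_walk_leaf() is what I mean
(untested sketch only), dropping the local wrapper in favour of the
generic cache helper:

	/* The memcache can still be NULL for callers without one */
	if (!data->memcache)
		return -ENOMEM;

	childp = kvm_mmu_memory_cache_alloc(data->memcache);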

>> +static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
>> +                    struct stage2_map_data *data)
>> +{
>> +    bool device = prot & KVM_PGTABLE_PROT_DEVICE;
>> +    kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
>> +                PAGE_S2_MEMATTR(NORMAL);
>> +    u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
>> +
>> +    if (!(prot & KVM_PGTABLE_PROT_X))
>> +        attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
>> +    else if (device)
>> +        return -EINVAL;
>> +
>> +    if (prot & KVM_PGTABLE_PROT_R)
>> +        attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
>> +
>> +    if (prot & KVM_PGTABLE_PROT_W)
>> +        attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
>> +
>> +    attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
>> +    attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
>> +    data->attr = attr;
>> +    return 0;
>> +}
>> +
>> +static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
>> +                       kvm_pte_t *ptep,
>> +                       struct stage2_map_data *data)
>> +{
>> +    u64 granule = kvm_granule_size(level), phys = data->phys;
>> +
>> +    if (!kvm_block_mapping_supported(addr, end, phys, level))
>> +        return false;
>> +
>> +    if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
>> +        goto out;
>> +
>> +    kvm_set_invalid_pte(ptep);
>> +    kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>> +    kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
>> +out:
>> +    data->phys += granule;
>> +    return true;
>> +}
>> +
>> +static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>> +                     kvm_pte_t *ptep,
>> +                     struct stage2_map_data *data)
>> +{
>> +    if (data->anchor)
>> +        return 0;
>> +
>> +    if (!kvm_block_mapping_supported(addr, end, data->phys, level))
>> +        return 0;
>> +
>> +    kvm_set_invalid_pte(ptep);
>> +    kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
>> +    data->anchor = ptep;
>> +    return 0;
>> +}
>> +
>> +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> +                struct stage2_map_data *data)
>> +{
>> +    kvm_pte_t *childp, pte = *ptep;
>> +    struct page *page = virt_to_page(ptep);
>> +
>> +    if (data->anchor) {
>> +        if (kvm_pte_valid(pte))
>> +            put_page(page);
>> +
>> +        return 0;
>> +    }
>> +
>> +    if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
>> +        goto out_get_page;
>> +
>> +    if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
>> +        return -EINVAL;
>> +
>> +    childp = stage2_memcache_alloc_page(data);
>> +    if (!childp)
>> +        return -ENOMEM;
>> +
>> +    /*
>> +     * If we've run into an existing block mapping then replace it with
>> +     * a table. Accesses beyond 'end' that fall within the new table
>> +     * will be mapped lazily.
>> +     */
>> +    if (kvm_pte_valid(pte)) {
>> +        kvm_set_invalid_pte(ptep);
>> +        kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
>> +        put_page(page);
>> +    }
>> +
>> +    kvm_set_table_pte(ptep, childp);
>> +
>> +out_get_page:
>> +    get_page(page);
>> +    return 0;
>> +}
>> +
>> +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>> +                      kvm_pte_t *ptep,
>> +                      struct stage2_map_data *data)
>> +{
>> +    int ret = 0;
>> +
>> +    if (!data->anchor)
>> +        return 0;
>> +
>> +    free_page((unsigned long)kvm_pte_follow(*ptep));
>> +    put_page(virt_to_page(ptep));
>> +
>> +    if (data->anchor == ptep) {
>> +        data->anchor = NULL;
>> +        ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
>> +    }
>> +
>> +    return ret;
>> +}
>> +
> 
> stage2_map_walk_leaf() tries to build the huge (block?) mapping and
> then populates the next-level page table if that fails. So it does
> more than what we want here. I think we might only need a call to
> stage2_map_walker_try_leaf() here. However, nothing looks wrong to me :)
> 
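To illustrate what I meant above about calling stage2_map_walker_try_leaf()
directly in stage2_map_walk_table_post() (untested sketch, error handling
kept simple):

	if (data->anchor == ptep) {
		data->anchor = NULL;
		/*
		 * The pre-order walker only installed the anchor when a
		 * block mapping is supported at this level, so try_leaf
		 * is expected to succeed here.
		 */
		if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
			get_page(virt_to_page(ptep));
		else
			ret = -EINVAL;
	}
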
>> +static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> +                 enum kvm_pgtable_walk_flags flag, void * const arg)
>> +{
>> +    struct stage2_map_data *data = arg;
>> +
>> +    switch (flag) {
>> +    case KVM_PGTABLE_WALK_TABLE_PRE:
>> +        return stage2_map_walk_table_pre(addr, end, level, ptep, data);
>> +    case KVM_PGTABLE_WALK_LEAF:
>> +        return stage2_map_walk_leaf(addr, end, level, ptep, data);
>> +    case KVM_PGTABLE_WALK_TABLE_POST:
>> +        return stage2_map_walk_table_post(addr, end, level, ptep, data);
>> +    }
>> +
>> +    return -EINVAL;
>> +}
>> +
>> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
>> +               u64 phys, enum kvm_pgtable_prot prot,
>> +               struct kvm_mmu_memory_cache *mc)
>> +{
>> +    int ret;
>> +    struct stage2_map_data map_data = {
>> +        .phys        = ALIGN_DOWN(phys, PAGE_SIZE),
>> +        .mmu        = pgt->mmu,
>> +        .memcache    = mc,
>> +    };
>> +    struct kvm_pgtable_walker walker = {
>> +        .cb        = stage2_map_walker,
>> +        .flags        = KVM_PGTABLE_WALK_TABLE_PRE |
>> +                  KVM_PGTABLE_WALK_LEAF |
>> +                  KVM_PGTABLE_WALK_TABLE_POST,
>> +        .arg        = &map_data,
>> +    };
>> +
>> +    ret = stage2_map_set_prot_attr(prot, &map_data);
>> +    if (ret)
>> +        return ret;
>> +
>> +    ret = kvm_pgtable_walk(pgt, addr, size, &walker);
>> +    dsb(ishst);
>> +    return ret;
>> +}
>> +
>> +static void stage2_flush_dcache(void *addr, u64 size)
>> +{
>> +    if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
>> +        return;
>> +
>> +    __flush_dcache_area(addr, size);
>> +}
>> +
>> +static bool stage2_pte_cacheable(kvm_pte_t pte)
>> +{
>> +    u64 memattr = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR, pte);
>> +    return memattr == PAGE_S2_MEMATTR(NORMAL);
>> +}
>> +
>> +static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>> +                   enum kvm_pgtable_walk_flags flag,
>> +                   void * const arg)
>> +{
>> +    struct kvm_s2_mmu *mmu = arg;
>> +    kvm_pte_t pte = *ptep, *childp = NULL;
>> +    bool need_flush = false;
>> +
>> +    if (!kvm_pte_valid(pte))
>> +        return 0;
>> +
>> +    if (kvm_pte_table(pte, level)) {
>> +        childp = kvm_pte_follow(pte);
>> +
>> +        if (page_count(virt_to_page(childp)) != 1)
>> +            return 0;
>> +    } else if (stage2_pte_cacheable(pte)) {
>> +        need_flush = true;
>> +    }
>> +
>> +    /*
>> +     * This is similar to the map() path in that we unmap the entire
>> +     * block entry and rely on the remaining portions being faulted
>> +     * back lazily.
>> +     */
>> +    kvm_set_invalid_pte(ptep);
>> +    kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
>> +    put_page(virt_to_page(ptep));
>> +
>> +    if (need_flush) {
>> +        stage2_flush_dcache(kvm_pte_follow(pte),
>> +                    kvm_granule_size(level));
>> +    }
>> +
>> +    if (childp)
>> +        free_page((unsigned long)childp);
>> +
>> +    return 0;
>> +}
>> +
>> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
>> +{
>> +    struct kvm_pgtable_walker walker = {
>> +        .cb    = stage2_unmap_walker,
>> +        .arg    = pgt->mmu,
>> +        .flags    = KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
>> +    };
>> +
>> +    return kvm_pgtable_walk(pgt, addr, size, &walker);
>> +}
>> +
>>   int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
>>   {
>>       size_t pgd_sz;
>>

Thanks,
Gavin


* Re: [PATCH v3 19/21] KVM: arm64: Remove unused page-table code
  2020-08-25  9:39 ` [PATCH v3 19/21] KVM: arm64: Remove unused page-table code Will Deacon
@ 2020-09-03  6:02   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  6:02 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Now that KVM is using the generic page-table code to manage the guest
> stage-2 page-tables, we can remove a bunch of unused macros, #defines
> and static inline functions from the old implementation.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_mmu.h        | 141 -----
>   arch/arm64/include/asm/pgtable-hwdef.h  |  17 -
>   arch/arm64/include/asm/pgtable-prot.h   |  13 -
>   arch/arm64/include/asm/stage2_pgtable.h | 215 -------
>   arch/arm64/kvm/mmu.c                    | 755 ------------------------
>   5 files changed, 1141 deletions(-)
> 

With the following questions resolved:

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
> index 42fb50cfe0d8..13ff00d9f16d 100644
> --- a/arch/arm64/include/asm/kvm_mmu.h
> +++ b/arch/arm64/include/asm/kvm_mmu.h
> @@ -135,123 +135,6 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu);
>   phys_addr_t kvm_mmu_get_httbr(void);
>   phys_addr_t kvm_get_idmap_vector(void);
>   int kvm_mmu_init(void);
> -#define kvm_mk_pmd(ptep)					\
> -	__pmd(__phys_to_pmd_val(__pa(ptep)) | PMD_TYPE_TABLE)
> -#define kvm_mk_pud(pmdp)					\
> -	__pud(__phys_to_pud_val(__pa(pmdp)) | PMD_TYPE_TABLE)
> -#define kvm_mk_p4d(pmdp)					\
> -	__p4d(__phys_to_p4d_val(__pa(pmdp)) | PUD_TYPE_TABLE)
> -
> -#define kvm_set_pud(pudp, pud)		set_pud(pudp, pud)
> -
> -#define kvm_pfn_pte(pfn, prot)		pfn_pte(pfn, prot)
> -#define kvm_pfn_pmd(pfn, prot)		pfn_pmd(pfn, prot)
> -#define kvm_pfn_pud(pfn, prot)		pfn_pud(pfn, prot)
> -
> -#define kvm_pud_pfn(pud)		pud_pfn(pud)
> -
> -#define kvm_pmd_mkhuge(pmd)		pmd_mkhuge(pmd)
> -#define kvm_pud_mkhuge(pud)		pud_mkhuge(pud)
> -
> -static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
> -{
> -	pte_val(pte) |= PTE_S2_RDWR;
> -	return pte;
> -}
> -
> -static inline pmd_t kvm_s2pmd_mkwrite(pmd_t pmd)
> -{
> -	pmd_val(pmd) |= PMD_S2_RDWR;
> -	return pmd;
> -}
> -
> -static inline pud_t kvm_s2pud_mkwrite(pud_t pud)
> -{
> -	pud_val(pud) |= PUD_S2_RDWR;
> -	return pud;
> -}
> -
> -static inline pte_t kvm_s2pte_mkexec(pte_t pte)
> -{
> -	pte_val(pte) &= ~PTE_S2_XN;
> -	return pte;
> -}
> -
> -static inline pmd_t kvm_s2pmd_mkexec(pmd_t pmd)
> -{
> -	pmd_val(pmd) &= ~PMD_S2_XN;
> -	return pmd;
> -}
> -
> -static inline pud_t kvm_s2pud_mkexec(pud_t pud)
> -{
> -	pud_val(pud) &= ~PUD_S2_XN;
> -	return pud;
> -}
> -
> -static inline void kvm_set_s2pte_readonly(pte_t *ptep)
> -{
> -	pteval_t old_pteval, pteval;
> -
> -	pteval = READ_ONCE(pte_val(*ptep));
> -	do {
> -		old_pteval = pteval;
> -		pteval &= ~PTE_S2_RDWR;
> -		pteval |= PTE_S2_RDONLY;
> -		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
> -	} while (pteval != old_pteval);
> -}
> -
> -static inline bool kvm_s2pte_readonly(pte_t *ptep)
> -{
> -	return (READ_ONCE(pte_val(*ptep)) & PTE_S2_RDWR) == PTE_S2_RDONLY;
> -}
> -
> -static inline bool kvm_s2pte_exec(pte_t *ptep)
> -{
> -	return !(READ_ONCE(pte_val(*ptep)) & PTE_S2_XN);
> -}
> -
> -static inline void kvm_set_s2pmd_readonly(pmd_t *pmdp)
> -{
> -	kvm_set_s2pte_readonly((pte_t *)pmdp);
> -}
> -
> -static inline bool kvm_s2pmd_readonly(pmd_t *pmdp)
> -{
> -	return kvm_s2pte_readonly((pte_t *)pmdp);
> -}
> -
> -static inline bool kvm_s2pmd_exec(pmd_t *pmdp)
> -{
> -	return !(READ_ONCE(pmd_val(*pmdp)) & PMD_S2_XN);
> -}
> -
> -static inline void kvm_set_s2pud_readonly(pud_t *pudp)
> -{
> -	kvm_set_s2pte_readonly((pte_t *)pudp);
> -}
> -
> -static inline bool kvm_s2pud_readonly(pud_t *pudp)
> -{
> -	return kvm_s2pte_readonly((pte_t *)pudp);
> -}
> -
> -static inline bool kvm_s2pud_exec(pud_t *pudp)
> -{
> -	return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
> -}
> -
> -static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
> -{
> -	return pud_mkyoung(pud);
> -}
> -
> -static inline bool kvm_s2pud_young(pud_t pud)
> -{
> -	return pud_young(pud);
> -}
> -
>   
>   struct kvm;
>   
> @@ -293,30 +176,6 @@ static inline void __invalidate_icache_guest_page(kvm_pfn_t pfn,
>   	}
>   }
>   
> -static inline void __kvm_flush_dcache_pte(pte_t pte)
> -{
> -	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
> -		struct page *page = pte_page(pte);
> -		kvm_flush_dcache_to_poc(page_address(page), PAGE_SIZE);
> -	}
> -}
> -
> -static inline void __kvm_flush_dcache_pmd(pmd_t pmd)
> -{
> -	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
> -		struct page *page = pmd_page(pmd);
> -		kvm_flush_dcache_to_poc(page_address(page), PMD_SIZE);
> -	}
> -}
> -
> -static inline void __kvm_flush_dcache_pud(pud_t pud)
> -{
> -	if (!cpus_have_const_cap(ARM64_HAS_STAGE2_FWB)) {
> -		struct page *page = pud_page(pud);
> -		kvm_flush_dcache_to_poc(page_address(page), PUD_SIZE);
> -	}
> -}
> -
>   void kvm_set_way_flush(struct kvm_vcpu *vcpu);
>   void kvm_toggle_cache(struct kvm_vcpu *vcpu, bool was_enabled);
>   

There is more stuff in arch/arm64/include/asm/kvm_mmu.h that could be dropped:

static inline bool kvm_page_empty(void *ptr)
static inline int arm64_vttbr_x(u32 ipa_shift, u32 levels)
static inline u64 vttbr_baddr_mask(u32 ipa_shift, u32 levels)
static inline u64 kvm_vttbr_baddr_mask(struct kvm *kvm)


> diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
> index 1a989353144e..bb97d464f42b 100644
> --- a/arch/arm64/include/asm/pgtable-hwdef.h
> +++ b/arch/arm64/include/asm/pgtable-hwdef.h
> @@ -172,23 +172,6 @@
>   #define PTE_ATTRINDX(t)		(_AT(pteval_t, (t)) << 2)
>   #define PTE_ATTRINDX_MASK	(_AT(pteval_t, 7) << 2)
>   
> -/*
> - * 2nd stage PTE definitions
> - */
> -#define PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[2:1] */
> -#define PTE_S2_RDWR		(_AT(pteval_t, 3) << 6)   /* HAP[2:1] */
> -#define PTE_S2_XN		(_AT(pteval_t, 2) << 53)  /* XN[1:0] */
> -#define PTE_S2_SW_RESVD		(_AT(pteval_t, 15) << 55) /* Reserved for SW */
> -
> -#define PMD_S2_RDONLY		(_AT(pmdval_t, 1) << 6)   /* HAP[2:1] */
> -#define PMD_S2_RDWR		(_AT(pmdval_t, 3) << 6)   /* HAP[2:1] */
> -#define PMD_S2_XN		(_AT(pmdval_t, 2) << 53)  /* XN[1:0] */
> -#define PMD_S2_SW_RESVD		(_AT(pmdval_t, 15) << 55) /* Reserved for SW */
> -
> -#define PUD_S2_RDONLY		(_AT(pudval_t, 1) << 6)   /* HAP[2:1] */
> -#define PUD_S2_RDWR		(_AT(pudval_t, 3) << 6)   /* HAP[2:1] */
> -#define PUD_S2_XN		(_AT(pudval_t, 2) << 53)  /* XN[1:0] */
> -
>   /*
>    * Memory Attribute override for Stage-2 (MemAttr[3:0])
>    */
> diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
> index 88acd7e1cd05..8f094c43072a 100644
> --- a/arch/arm64/include/asm/pgtable-prot.h
> +++ b/arch/arm64/include/asm/pgtable-prot.h
> @@ -73,19 +73,6 @@ extern bool arm64_use_ng_mappings;
>   		__val;							\
>   	 })
>   
> -#define PAGE_S2_XN							\
> -	({								\
> -		u64 __val;						\
> -		if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))		\
> -			__val = 0;					\
> -		else							\
> -			__val = PTE_S2_XN;				\
> -		__val;							\
> -	})
> -
> -#define PAGE_S2			__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(NORMAL) | PTE_S2_RDONLY | PAGE_S2_XN)
> -#define PAGE_S2_DEVICE		__pgprot(_PROT_DEFAULT | PAGE_S2_MEMATTR(DEVICE_nGnRE) | PTE_S2_RDONLY | PTE_S2_XN)
> -
>   #define PAGE_NONE		__pgprot(((_PAGE_DEFAULT) & ~PTE_VALID) | PTE_PROT_NONE | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN)
>   /* shared+writable pages are clean by default, hence PTE_RDONLY|PTE_WRITE */
>   #define PAGE_SHARED		__pgprot(_PAGE_DEFAULT | PTE_USER | PTE_RDONLY | PTE_NG | PTE_PXN | PTE_UXN | PTE_WRITE)
> diff --git a/arch/arm64/include/asm/stage2_pgtable.h b/arch/arm64/include/asm/stage2_pgtable.h
> index 996bf98f0cab..fe341a6578c3 100644
> --- a/arch/arm64/include/asm/stage2_pgtable.h
> +++ b/arch/arm64/include/asm/stage2_pgtable.h
> @@ -8,7 +8,6 @@
>   #ifndef __ARM64_S2_PGTABLE_H_
>   #define __ARM64_S2_PGTABLE_H_
>   
> -#include <linux/hugetlb.h>
>   #include <linux/pgtable.h>
>   
>   /*
> @@ -36,21 +35,6 @@
>   #define stage2_pgdir_size(kvm)		(1ULL << stage2_pgdir_shift(kvm))
>   #define stage2_pgdir_mask(kvm)		~(stage2_pgdir_size(kvm) - 1)
>   
> -/*
> - * The number of PTRS across all concatenated stage2 tables given by the
> - * number of bits resolved at the initial level.
> - * If we force more levels than necessary, we may have (stage2_pgdir_shift > IPA),
> - * in which case, stage2_pgd_ptrs will have one entry.
> - */
> -#define pgd_ptrs_shift(ipa, pgdir_shift)	\
> -	((ipa) > (pgdir_shift) ? ((ipa) - (pgdir_shift)) : 0)
> -#define __s2_pgd_ptrs(ipa, lvls)		\
> -	(1 << (pgd_ptrs_shift((ipa), pt_levels_pgdir_shift(lvls))))
> -#define __s2_pgd_size(ipa, lvls)	(__s2_pgd_ptrs((ipa), (lvls)) * sizeof(pgd_t))
> -
> -#define stage2_pgd_ptrs(kvm)		__s2_pgd_ptrs(kvm_phys_shift(kvm), kvm_stage2_levels(kvm))
> -#define stage2_pgd_size(kvm)		__s2_pgd_size(kvm_phys_shift(kvm), kvm_stage2_levels(kvm))
> -
>   /*
>    * kvm_mmmu_cache_min_pages() is the number of pages required to install
>    * a stage-2 translation. We pre-allocate the entry level page table at
> @@ -58,196 +42,6 @@
>    */
>   #define kvm_mmu_cache_min_pages(kvm)	(kvm_stage2_levels(kvm) - 1)
>   
> -/* Stage2 PUD definitions when the level is present */
> -static inline bool kvm_stage2_has_pud(struct kvm *kvm)
> -{
> -	return (CONFIG_PGTABLE_LEVELS > 3) && (kvm_stage2_levels(kvm) > 3);
> -}
> -
> -#define S2_PUD_SHIFT			ARM64_HW_PGTABLE_LEVEL_SHIFT(1)
> -#define S2_PUD_SIZE			(1UL << S2_PUD_SHIFT)
> -#define S2_PUD_MASK			(~(S2_PUD_SIZE - 1))
> -
> -#define stage2_pgd_none(kvm, pgd)		pgd_none(pgd)
> -#define stage2_pgd_clear(kvm, pgd)		pgd_clear(pgd)
> -#define stage2_pgd_present(kvm, pgd)		pgd_present(pgd)
> -#define stage2_pgd_populate(kvm, pgd, p4d)	pgd_populate(NULL, pgd, p4d)
> -
> -static inline p4d_t *stage2_p4d_offset(struct kvm *kvm,
> -				       pgd_t *pgd, unsigned long address)
> -{
> -	return p4d_offset(pgd, address);
> -}
> -
> -static inline void stage2_p4d_free(struct kvm *kvm, p4d_t *p4d)
> -{
> -}
> -
> -static inline bool stage2_p4d_table_empty(struct kvm *kvm, p4d_t *p4dp)
> -{
> -	return false;
> -}
> -
> -static inline phys_addr_t stage2_p4d_addr_end(struct kvm *kvm,
> -					      phys_addr_t addr, phys_addr_t end)
> -{
> -	return end;
> -}
> -
> -static inline bool stage2_p4d_none(struct kvm *kvm, p4d_t p4d)
> -{
> -	if (kvm_stage2_has_pud(kvm))
> -		return p4d_none(p4d);
> -	else
> -		return 0;
> -}
> -
> -static inline void stage2_p4d_clear(struct kvm *kvm, p4d_t *p4dp)
> -{
> -	if (kvm_stage2_has_pud(kvm))
> -		p4d_clear(p4dp);
> -}
> -
> -static inline bool stage2_p4d_present(struct kvm *kvm, p4d_t p4d)
> -{
> -	if (kvm_stage2_has_pud(kvm))
> -		return p4d_present(p4d);
> -	else
> -		return 1;
> -}
> -
> -static inline void stage2_p4d_populate(struct kvm *kvm, p4d_t *p4d, pud_t *pud)
> -{
> -	if (kvm_stage2_has_pud(kvm))
> -		p4d_populate(NULL, p4d, pud);
> -}
> -
> -static inline pud_t *stage2_pud_offset(struct kvm *kvm,
> -				       p4d_t *p4d, unsigned long address)
> -{
> -	if (kvm_stage2_has_pud(kvm))
> -		return pud_offset(p4d, address);
> -	else
> -		return (pud_t *)p4d;
> -}
> -
> -static inline void stage2_pud_free(struct kvm *kvm, pud_t *pud)
> -{
> -	if (kvm_stage2_has_pud(kvm))
> -		free_page((unsigned long)pud);
> -}
> -
> -static inline bool stage2_pud_table_empty(struct kvm *kvm, pud_t *pudp)
> -{
> -	if (kvm_stage2_has_pud(kvm))
> -		return kvm_page_empty(pudp);
> -	else
> -		return false;
> -}
> -
> -static inline phys_addr_t
> -stage2_pud_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> -{
> -	if (kvm_stage2_has_pud(kvm)) {
> -		phys_addr_t boundary = (addr + S2_PUD_SIZE) & S2_PUD_MASK;
> -
> -		return (boundary - 1 < end - 1) ? boundary : end;
> -	} else {
> -		return end;
> -	}
> -}
> -
> -/* Stage2 PMD definitions when the level is present */
> -static inline bool kvm_stage2_has_pmd(struct kvm *kvm)
> -{
> -	return (CONFIG_PGTABLE_LEVELS > 2) && (kvm_stage2_levels(kvm) > 2);
> -}
> -
> -#define S2_PMD_SHIFT			ARM64_HW_PGTABLE_LEVEL_SHIFT(2)
> -#define S2_PMD_SIZE			(1UL << S2_PMD_SHIFT)
> -#define S2_PMD_MASK			(~(S2_PMD_SIZE - 1))
> -
> -static inline bool stage2_pud_none(struct kvm *kvm, pud_t pud)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		return pud_none(pud);
> -	else
> -		return 0;
> -}
> -
> -static inline void stage2_pud_clear(struct kvm *kvm, pud_t *pud)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		pud_clear(pud);
> -}
> -
> -static inline bool stage2_pud_present(struct kvm *kvm, pud_t pud)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		return pud_present(pud);
> -	else
> -		return 1;
> -}
> -
> -static inline void stage2_pud_populate(struct kvm *kvm, pud_t *pud, pmd_t *pmd)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		pud_populate(NULL, pud, pmd);
> -}
> -
> -static inline pmd_t *stage2_pmd_offset(struct kvm *kvm,
> -				       pud_t *pud, unsigned long address)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		return pmd_offset(pud, address);
> -	else
> -		return (pmd_t *)pud;
> -}
> -
> -static inline void stage2_pmd_free(struct kvm *kvm, pmd_t *pmd)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		free_page((unsigned long)pmd);
> -}
> -
> -static inline bool stage2_pud_huge(struct kvm *kvm, pud_t pud)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		return pud_huge(pud);
> -	else
> -		return 0;
> -}
> -
> -static inline bool stage2_pmd_table_empty(struct kvm *kvm, pmd_t *pmdp)
> -{
> -	if (kvm_stage2_has_pmd(kvm))
> -		return kvm_page_empty(pmdp);
> -	else
> -		return 0;
> -}
> -
> -static inline phys_addr_t
> -stage2_pmd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
> -{
> -	if (kvm_stage2_has_pmd(kvm)) {
> -		phys_addr_t boundary = (addr + S2_PMD_SIZE) & S2_PMD_MASK;
> -
> -		return (boundary - 1 < end - 1) ? boundary : end;
> -	} else {
> -		return end;
> -	}
> -}
> -
> -static inline bool stage2_pte_table_empty(struct kvm *kvm, pte_t *ptep)
> -{
> -	return kvm_page_empty(ptep);
> -}
> -
> -static inline unsigned long stage2_pgd_index(struct kvm *kvm, phys_addr_t addr)
> -{
> -	return (((addr) >> stage2_pgdir_shift(kvm)) & (stage2_pgd_ptrs(kvm) - 1));
> -}
> -
>   static inline phys_addr_t
>   stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>   {
> @@ -256,13 +50,4 @@ stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
>   	return (boundary - 1 < end - 1) ? boundary : end;
>   }
>   
> -/*
> - * Level values for the ARMv8.4-TTL extension, mapping PUD/PMD/PTE and
> - * the architectural page-table level.
> - */
> -#define S2_NO_LEVEL_HINT	0
> -#define S2_PUD_LEVEL		1
> -#define S2_PMD_LEVEL		2
> -#define S2_PTE_LEVEL		3
> -
>   #endif	/* __ARM64_S2_PGTABLE_H_ */

In arch/arm64/include/asm/stage2_pgtable.h, there are several
macros that are only used by stage2_pgd_addr_end(), which is defined
in the same header file. I guess they might be moved into that
function to make them function-scoped (see the sketch after the
macros below):

/* stage2_pgdir_shift() is the size mapped by top-level stage2 entry for the VM */
#define stage2_pgdir_shift(kvm)         pt_levels_pgdir_shift(kvm_stage2_levels(kvm))
#define stage2_pgdir_size(kvm)          (1ULL << stage2_pgdir_shift(kvm))
#define stage2_pgdir_mask(kvm)          ~(stage2_pgdir_size(kvm) - 1)
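
Something like the below is what I have in mind (untested sketch;
pt_levels_pgdir_shift() and kvm_stage2_levels() are the existing
helpers in this header):

static inline phys_addr_t
stage2_pgd_addr_end(struct kvm *kvm, phys_addr_t addr, phys_addr_t end)
{
	/* Size mapped by a top-level stage2 entry for this VM */
	u64 shift = pt_levels_pgdir_shift(kvm_stage2_levels(kvm));
	u64 size = 1ULL << shift;
	phys_addr_t boundary = (addr + size) & ~(size - 1);

	return (boundary - 1 < end - 1) ? boundary : end;
}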


> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 050eab71de31..ddeec0b03666 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -31,13 +31,6 @@ static phys_addr_t hyp_idmap_vector;
>   
>   static unsigned long io_map_base;
>   
> -#define KVM_S2PTE_FLAG_IS_IOMAP		(1UL << 0)
> -#define KVM_S2_FLAG_LOGGING_ACTIVE	(1UL << 1)
> -
> -static bool is_iomap(unsigned long flags)
> -{
> -	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
> -}
>   
>   /*
>    * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> @@ -85,154 +78,11 @@ void kvm_flush_remote_tlbs(struct kvm *kvm)
>   	kvm_call_hyp(__kvm_tlb_flush_vmid, &kvm->arch.mmu);
>   }
>   
> -static void kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa,
> -				   int level)
> -{
> -	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, ipa, level);
> -}
> -
> -/*
> - * D-Cache management functions. They take the page table entries by
> - * value, as they are flushing the cache using the kernel mapping (or
> - * kmap on 32bit).
> - */
> -static void kvm_flush_dcache_pte(pte_t pte)
> -{
> -	__kvm_flush_dcache_pte(pte);
> -}
> -
> -static void kvm_flush_dcache_pmd(pmd_t pmd)
> -{
> -	__kvm_flush_dcache_pmd(pmd);
> -}
> -
> -static void kvm_flush_dcache_pud(pud_t pud)
> -{
> -	__kvm_flush_dcache_pud(pud);
> -}
> -
>   static bool kvm_is_device_pfn(unsigned long pfn)
>   {
>   	return !pfn_valid(pfn);
>   }
>   
> -/**
> - * stage2_dissolve_pmd() - clear and flush huge PMD entry
> - * @mmu:	pointer to mmu structure to operate on
> - * @addr:	IPA
> - * @pmd:	pmd pointer for IPA
> - *
> - * Function clears a PMD entry, flushes addr 1st and 2nd stage TLBs.
> - */
> -static void stage2_dissolve_pmd(struct kvm_s2_mmu *mmu, phys_addr_t addr, pmd_t *pmd)
> -{
> -	if (!pmd_thp_or_huge(*pmd))
> -		return;
> -
> -	pmd_clear(pmd);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
> -	put_page(virt_to_page(pmd));
> -}
> -
> -/**
> - * stage2_dissolve_pud() - clear and flush huge PUD entry
> - * @mmu:	pointer to mmu structure to operate on
> - * @addr:	IPA
> - * @pud:	pud pointer for IPA
> - *
> - * Function clears a PUD entry, flushes addr 1st and 2nd stage TLBs.
> - */
> -static void stage2_dissolve_pud(struct kvm_s2_mmu *mmu, phys_addr_t addr, pud_t *pudp)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -
> -	if (!stage2_pud_huge(kvm, *pudp))
> -		return;
> -
> -	stage2_pud_clear(kvm, pudp);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
> -	put_page(virt_to_page(pudp));
> -}
> -
> -static void clear_stage2_pgd_entry(struct kvm_s2_mmu *mmu, pgd_t *pgd, phys_addr_t addr)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	p4d_t *p4d_table __maybe_unused = stage2_p4d_offset(kvm, pgd, 0UL);
> -	stage2_pgd_clear(kvm, pgd);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
> -	stage2_p4d_free(kvm, p4d_table);
> -	put_page(virt_to_page(pgd));
> -}
> -
> -static void clear_stage2_p4d_entry(struct kvm_s2_mmu *mmu, p4d_t *p4d, phys_addr_t addr)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, p4d, 0);
> -	stage2_p4d_clear(kvm, p4d);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
> -	stage2_pud_free(kvm, pud_table);
> -	put_page(virt_to_page(p4d));
> -}
> -
> -static void clear_stage2_pud_entry(struct kvm_s2_mmu *mmu, pud_t *pud, phys_addr_t addr)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pmd_t *pmd_table __maybe_unused = stage2_pmd_offset(kvm, pud, 0);
> -
> -	VM_BUG_ON(stage2_pud_huge(kvm, *pud));
> -	stage2_pud_clear(kvm, pud);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
> -	stage2_pmd_free(kvm, pmd_table);
> -	put_page(virt_to_page(pud));
> -}
> -
> -static void clear_stage2_pmd_entry(struct kvm_s2_mmu *mmu, pmd_t *pmd, phys_addr_t addr)
> -{
> -	pte_t *pte_table = pte_offset_kernel(pmd, 0);
> -	VM_BUG_ON(pmd_thp_or_huge(*pmd));
> -	pmd_clear(pmd);
> -	kvm_tlb_flush_vmid_ipa(mmu, addr, S2_NO_LEVEL_HINT);
> -	free_page((unsigned long)pte_table);
> -	put_page(virt_to_page(pmd));
> -}
> -
> -static inline void kvm_set_pte(pte_t *ptep, pte_t new_pte)
> -{
> -	WRITE_ONCE(*ptep, new_pte);
> -	dsb(ishst);
> -}
> -
> -static inline void kvm_set_pmd(pmd_t *pmdp, pmd_t new_pmd)
> -{
> -	WRITE_ONCE(*pmdp, new_pmd);
> -	dsb(ishst);
> -}
> -
> -static inline void kvm_pmd_populate(pmd_t *pmdp, pte_t *ptep)
> -{
> -	kvm_set_pmd(pmdp, kvm_mk_pmd(ptep));
> -}
> -
> -static inline void kvm_pud_populate(pud_t *pudp, pmd_t *pmdp)
> -{
> -	WRITE_ONCE(*pudp, kvm_mk_pud(pmdp));
> -	dsb(ishst);
> -}
> -
> -static inline void kvm_p4d_populate(p4d_t *p4dp, pud_t *pudp)
> -{
> -	WRITE_ONCE(*p4dp, kvm_mk_p4d(pudp));
> -	dsb(ishst);
> -}
> -
> -static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
> -{
> -#ifndef __PAGETABLE_P4D_FOLDED
> -	WRITE_ONCE(*pgdp, kvm_mk_pgd(p4dp));
> -	dsb(ishst);
> -#endif
> -}
> -
>   /*
>    * Unmapping vs dcache management:
>    *
> @@ -257,108 +107,6 @@ static inline void kvm_pgd_populate(pgd_t *pgdp, p4d_t *p4dp)
>    * we then fully enforce cacheability of RAM, no matter what the guest
>    * does.
>    */
> -static void unmap_stage2_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
> -		       phys_addr_t addr, phys_addr_t end)
> -{
> -	phys_addr_t start_addr = addr;
> -	pte_t *pte, *start_pte;
> -
> -	start_pte = pte = pte_offset_kernel(pmd, addr);
> -	do {
> -		if (!pte_none(*pte)) {
> -			pte_t old_pte = *pte;
> -
> -			kvm_set_pte(pte, __pte(0));
> -			kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PTE_LEVEL);
> -
> -			/* No need to invalidate the cache for device mappings */
> -			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
> -				kvm_flush_dcache_pte(old_pte);
> -
> -			put_page(virt_to_page(pte));
> -		}
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> -
> -	if (stage2_pte_table_empty(mmu->kvm, start_pte))
> -		clear_stage2_pmd_entry(mmu, pmd, start_addr);
> -}
> -
> -static void unmap_stage2_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
> -		       phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	phys_addr_t next, start_addr = addr;
> -	pmd_t *pmd, *start_pmd;
> -
> -	start_pmd = pmd = stage2_pmd_offset(kvm, pud, addr);
> -	do {
> -		next = stage2_pmd_addr_end(kvm, addr, end);
> -		if (!pmd_none(*pmd)) {
> -			if (pmd_thp_or_huge(*pmd)) {
> -				pmd_t old_pmd = *pmd;
> -
> -				pmd_clear(pmd);
> -				kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
> -
> -				kvm_flush_dcache_pmd(old_pmd);
> -
> -				put_page(virt_to_page(pmd));
> -			} else {
> -				unmap_stage2_ptes(mmu, pmd, addr, next);
> -			}
> -		}
> -	} while (pmd++, addr = next, addr != end);
> -
> -	if (stage2_pmd_table_empty(kvm, start_pmd))
> -		clear_stage2_pud_entry(mmu, pud, start_addr);
> -}
> -
> -static void unmap_stage2_puds(struct kvm_s2_mmu *mmu, p4d_t *p4d,
> -		       phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	phys_addr_t next, start_addr = addr;
> -	pud_t *pud, *start_pud;
> -
> -	start_pud = pud = stage2_pud_offset(kvm, p4d, addr);
> -	do {
> -		next = stage2_pud_addr_end(kvm, addr, end);
> -		if (!stage2_pud_none(kvm, *pud)) {
> -			if (stage2_pud_huge(kvm, *pud)) {
> -				pud_t old_pud = *pud;
> -
> -				stage2_pud_clear(kvm, pud);
> -				kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
> -				kvm_flush_dcache_pud(old_pud);
> -				put_page(virt_to_page(pud));
> -			} else {
> -				unmap_stage2_pmds(mmu, pud, addr, next);
> -			}
> -		}
> -	} while (pud++, addr = next, addr != end);
> -
> -	if (stage2_pud_table_empty(kvm, start_pud))
> -		clear_stage2_p4d_entry(mmu, p4d, start_addr);
> -}
> -
> -static void unmap_stage2_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
> -		       phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	phys_addr_t next, start_addr = addr;
> -	p4d_t *p4d, *start_p4d;
> -
> -	start_p4d = p4d = stage2_p4d_offset(kvm, pgd, addr);
> -	do {
> -		next = stage2_p4d_addr_end(kvm, addr, end);
> -		if (!stage2_p4d_none(kvm, *p4d))
> -			unmap_stage2_puds(mmu, p4d, addr, next);
> -	} while (p4d++, addr = next, addr != end);
> -
> -	if (stage2_p4d_table_empty(kvm, start_p4d))
> -		clear_stage2_pgd_entry(mmu, pgd, start_addr);
> -}
> -
>   /**
>    * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
>    * @kvm:   The VM pointer
> @@ -387,71 +135,6 @@ static void unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 si
>   	__unmap_stage2_range(mmu, start, size, true);
>   }
>   
> -static void stage2_flush_ptes(struct kvm_s2_mmu *mmu, pmd_t *pmd,
> -			      phys_addr_t addr, phys_addr_t end)
> -{
> -	pte_t *pte;
> -
> -	pte = pte_offset_kernel(pmd, addr);
> -	do {
> -		if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
> -			kvm_flush_dcache_pte(*pte);
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> -}
> -
> -static void stage2_flush_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
> -			      phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pmd_t *pmd;
> -	phys_addr_t next;
> -
> -	pmd = stage2_pmd_offset(kvm, pud, addr);
> -	do {
> -		next = stage2_pmd_addr_end(kvm, addr, end);
> -		if (!pmd_none(*pmd)) {
> -			if (pmd_thp_or_huge(*pmd))
> -				kvm_flush_dcache_pmd(*pmd);
> -			else
> -				stage2_flush_ptes(mmu, pmd, addr, next);
> -		}
> -	} while (pmd++, addr = next, addr != end);
> -}
> -
> -static void stage2_flush_puds(struct kvm_s2_mmu *mmu, p4d_t *p4d,
> -			      phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pud_t *pud;
> -	phys_addr_t next;
> -
> -	pud = stage2_pud_offset(kvm, p4d, addr);
> -	do {
> -		next = stage2_pud_addr_end(kvm, addr, end);
> -		if (!stage2_pud_none(kvm, *pud)) {
> -			if (stage2_pud_huge(kvm, *pud))
> -				kvm_flush_dcache_pud(*pud);
> -			else
> -				stage2_flush_pmds(mmu, pud, addr, next);
> -		}
> -	} while (pud++, addr = next, addr != end);
> -}
> -
> -static void stage2_flush_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
> -			      phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	p4d_t *p4d;
> -	phys_addr_t next;
> -
> -	p4d = stage2_p4d_offset(kvm, pgd, addr);
> -	do {
> -		next = stage2_p4d_addr_end(kvm, addr, end);
> -		if (!stage2_p4d_none(kvm, *p4d))
> -			stage2_flush_puds(mmu, p4d, addr, next);
> -	} while (p4d++, addr = next, addr != end);
> -}
> -
>   static void stage2_flush_memslot(struct kvm *kvm,
>   				 struct kvm_memory_slot *memslot)
>   {
> @@ -800,348 +483,6 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>   	}
>   }
>   
> -static p4d_t *stage2_get_p4d(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
> -			     phys_addr_t addr)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pgd_t *pgd;
> -	p4d_t *p4d;
> -
> -	pgd = mmu->pgd + stage2_pgd_index(kvm, addr);
> -	if (stage2_pgd_none(kvm, *pgd)) {
> -		if (!cache)
> -			return NULL;
> -		p4d = kvm_mmu_memory_cache_alloc(cache);
> -		stage2_pgd_populate(kvm, pgd, p4d);
> -		get_page(virt_to_page(pgd));
> -	}
> -
> -	return stage2_p4d_offset(kvm, pgd, addr);
> -}
> -
> -static pud_t *stage2_get_pud(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
> -			     phys_addr_t addr)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	p4d_t *p4d;
> -	pud_t *pud;
> -
> -	p4d = stage2_get_p4d(mmu, cache, addr);
> -	if (stage2_p4d_none(kvm, *p4d)) {
> -		if (!cache)
> -			return NULL;
> -		pud = kvm_mmu_memory_cache_alloc(cache);
> -		stage2_p4d_populate(kvm, p4d, pud);
> -		get_page(virt_to_page(p4d));
> -	}
> -
> -	return stage2_pud_offset(kvm, p4d, addr);
> -}
> -
> -static pmd_t *stage2_get_pmd(struct kvm_s2_mmu *mmu, struct kvm_mmu_memory_cache *cache,
> -			     phys_addr_t addr)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pud_t *pud;
> -	pmd_t *pmd;
> -
> -	pud = stage2_get_pud(mmu, cache, addr);
> -	if (!pud || stage2_pud_huge(kvm, *pud))
> -		return NULL;
> -
> -	if (stage2_pud_none(kvm, *pud)) {
> -		if (!cache)
> -			return NULL;
> -		pmd = kvm_mmu_memory_cache_alloc(cache);
> -		stage2_pud_populate(kvm, pud, pmd);
> -		get_page(virt_to_page(pud));
> -	}
> -
> -	return stage2_pmd_offset(kvm, pud, addr);
> -}
> -
> -static int stage2_set_pmd_huge(struct kvm_s2_mmu *mmu,
> -			       struct kvm_mmu_memory_cache *cache,
> -			       phys_addr_t addr, const pmd_t *new_pmd)
> -{
> -	pmd_t *pmd, old_pmd;
> -
> -retry:
> -	pmd = stage2_get_pmd(mmu, cache, addr);
> -	VM_BUG_ON(!pmd);
> -
> -	old_pmd = *pmd;
> -	/*
> -	 * Multiple vcpus faulting on the same PMD entry, can
> -	 * lead to them sequentially updating the PMD with the
> -	 * same value. Following the break-before-make
> -	 * (pmd_clear() followed by tlb_flush()) process can
> -	 * hinder forward progress due to refaults generated
> -	 * on missing translations.
> -	 *
> -	 * Skip updating the page table if the entry is
> -	 * unchanged.
> -	 */
> -	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> -		return 0;
> -
> -	if (pmd_present(old_pmd)) {
> -		/*
> -		 * If we already have PTE level mapping for this block,
> -		 * we must unmap it to avoid inconsistent TLB state and
> -		 * leaking the table page. We could end up in this situation
> -		 * if the memory slot was marked for dirty logging and was
> -		 * reverted, leaving PTE level mappings for the pages accessed
> -		 * during the period. So, unmap the PTE level mapping for this
> -		 * block and retry, as we could have released the upper level
> -		 * table in the process.
> -		 *
> -		 * Normal THP split/merge follows mmu_notifier callbacks and do
> -		 * get handled accordingly.
> -		 */
> -		if (!pmd_thp_or_huge(old_pmd)) {
> -			unmap_stage2_range(mmu, addr & S2_PMD_MASK, S2_PMD_SIZE);
> -			goto retry;
> -		}
> -		/*
> -		 * Mapping in huge pages should only happen through a
> -		 * fault.  If a page is merged into a transparent huge
> -		 * page, the individual subpages of that huge page
> -		 * should be unmapped through MMU notifiers before we
> -		 * get here.
> -		 *
> -		 * Merging of CompoundPages is not supported; they
> -		 * should become splitting first, unmapped, merged,
> -		 * and mapped back in on-demand.
> -		 */
> -		WARN_ON_ONCE(pmd_pfn(old_pmd) != pmd_pfn(*new_pmd));
> -		pmd_clear(pmd);
> -		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PMD_LEVEL);
> -	} else {
> -		get_page(virt_to_page(pmd));
> -	}
> -
> -	kvm_set_pmd(pmd, *new_pmd);
> -	return 0;
> -}
> -
> -static int stage2_set_pud_huge(struct kvm_s2_mmu *mmu,
> -			       struct kvm_mmu_memory_cache *cache,
> -			       phys_addr_t addr, const pud_t *new_pudp)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pud_t *pudp, old_pud;
> -
> -retry:
> -	pudp = stage2_get_pud(mmu, cache, addr);
> -	VM_BUG_ON(!pudp);
> -
> -	old_pud = *pudp;
> -
> -	/*
> -	 * A large number of vcpus faulting on the same stage 2 entry,
> -	 * can lead to a refault due to the stage2_pud_clear()/tlb_flush().
> -	 * Skip updating the page tables if there is no change.
> -	 */
> -	if (pud_val(old_pud) == pud_val(*new_pudp))
> -		return 0;
> -
> -	if (stage2_pud_present(kvm, old_pud)) {
> -		/*
> -		 * If we already have table level mapping for this block, unmap
> -		 * the range for this block and retry.
> -		 */
> -		if (!stage2_pud_huge(kvm, old_pud)) {
> -			unmap_stage2_range(mmu, addr & S2_PUD_MASK, S2_PUD_SIZE);
> -			goto retry;
> -		}
> -
> -		WARN_ON_ONCE(kvm_pud_pfn(old_pud) != kvm_pud_pfn(*new_pudp));
> -		stage2_pud_clear(kvm, pudp);
> -		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PUD_LEVEL);
> -	} else {
> -		get_page(virt_to_page(pudp));
> -	}
> -
> -	kvm_set_pud(pudp, *new_pudp);
> -	return 0;
> -}
> -
> -/*
> - * stage2_get_leaf_entry - walk the stage2 VM page tables and return
> - * true if a valid and present leaf-entry is found. A pointer to the
> - * leaf-entry is returned in the appropriate level variable - pudpp,
> - * pmdpp, ptepp.
> - */
> -static bool stage2_get_leaf_entry(struct kvm_s2_mmu *mmu, phys_addr_t addr,
> -				  pud_t **pudpp, pmd_t **pmdpp, pte_t **ptepp)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pud_t *pudp;
> -	pmd_t *pmdp;
> -	pte_t *ptep;
> -
> -	*pudpp = NULL;
> -	*pmdpp = NULL;
> -	*ptepp = NULL;
> -
> -	pudp = stage2_get_pud(mmu, NULL, addr);
> -	if (!pudp || stage2_pud_none(kvm, *pudp) || !stage2_pud_present(kvm, *pudp))
> -		return false;
> -
> -	if (stage2_pud_huge(kvm, *pudp)) {
> -		*pudpp = pudp;
> -		return true;
> -	}
> -
> -	pmdp = stage2_pmd_offset(kvm, pudp, addr);
> -	if (!pmdp || pmd_none(*pmdp) || !pmd_present(*pmdp))
> -		return false;
> -
> -	if (pmd_thp_or_huge(*pmdp)) {
> -		*pmdpp = pmdp;
> -		return true;
> -	}
> -
> -	ptep = pte_offset_kernel(pmdp, addr);
> -	if (!ptep || pte_none(*ptep) || !pte_present(*ptep))
> -		return false;
> -
> -	*ptepp = ptep;
> -	return true;
> -}
> -
> -static bool stage2_is_exec(struct kvm_s2_mmu *mmu, phys_addr_t addr, unsigned long sz)
> -{
> -	pud_t *pudp;
> -	pmd_t *pmdp;
> -	pte_t *ptep;
> -	bool found;
> -
> -	found = stage2_get_leaf_entry(mmu, addr, &pudp, &pmdp, &ptep);
> -	if (!found)
> -		return false;
> -
> -	if (pudp)
> -		return sz <= PUD_SIZE && kvm_s2pud_exec(pudp);
> -	else if (pmdp)
> -		return sz <= PMD_SIZE && kvm_s2pmd_exec(pmdp);
> -	else
> -		return sz == PAGE_SIZE && kvm_s2pte_exec(ptep);
> -}
> -
> -static int stage2_set_pte(struct kvm_s2_mmu *mmu,
> -			  struct kvm_mmu_memory_cache *cache,
> -			  phys_addr_t addr, const pte_t *new_pte,
> -			  unsigned long flags)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pud_t *pud;
> -	pmd_t *pmd;
> -	pte_t *pte, old_pte;
> -	bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
> -	bool logging_active = flags & KVM_S2_FLAG_LOGGING_ACTIVE;
> -
> -	VM_BUG_ON(logging_active && !cache);
> -
> -	/* Create stage-2 page table mapping - Levels 0 and 1 */
> -	pud = stage2_get_pud(mmu, cache, addr);
> -	if (!pud) {
> -		/*
> -		 * Ignore calls from kvm_set_spte_hva for unallocated
> -		 * address ranges.
> -		 */
> -		return 0;
> -	}
> -
> -	/*
> -	 * While dirty page logging - dissolve huge PUD, then continue
> -	 * on to allocate page.
> -	 */
> -	if (logging_active)
> -		stage2_dissolve_pud(mmu, addr, pud);
> -
> -	if (stage2_pud_none(kvm, *pud)) {
> -		if (!cache)
> -			return 0; /* ignore calls from kvm_set_spte_hva */
> -		pmd = kvm_mmu_memory_cache_alloc(cache);
> -		stage2_pud_populate(kvm, pud, pmd);
> -		get_page(virt_to_page(pud));
> -	}
> -
> -	pmd = stage2_pmd_offset(kvm, pud, addr);
> -	if (!pmd) {
> -		/*
> -		 * Ignore calls from kvm_set_spte_hva for unallocated
> -		 * address ranges.
> -		 */
> -		return 0;
> -	}
> -
> -	/*
> -	 * While dirty page logging - dissolve huge PMD, then continue on to
> -	 * allocate page.
> -	 */
> -	if (logging_active)
> -		stage2_dissolve_pmd(mmu, addr, pmd);
> -
> -	/* Create stage-2 page mappings - Level 2 */
> -	if (pmd_none(*pmd)) {
> -		if (!cache)
> -			return 0; /* ignore calls from kvm_set_spte_hva */
> -		pte = kvm_mmu_memory_cache_alloc(cache);
> -		kvm_pmd_populate(pmd, pte);
> -		get_page(virt_to_page(pmd));
> -	}
> -
> -	pte = pte_offset_kernel(pmd, addr);
> -
> -	if (iomap && pte_present(*pte))
> -		return -EFAULT;
> -
> -	/* Create 2nd stage page table mapping - Level 3 */
> -	old_pte = *pte;
> -	if (pte_present(old_pte)) {
> -		/* Skip page table update if there is no change */
> -		if (pte_val(old_pte) == pte_val(*new_pte))
> -			return 0;
> -
> -		kvm_set_pte(pte, __pte(0));
> -		kvm_tlb_flush_vmid_ipa(mmu, addr, S2_PTE_LEVEL);
> -	} else {
> -		get_page(virt_to_page(pte));
> -	}
> -
> -	kvm_set_pte(pte, *new_pte);
> -	return 0;
> -}
> -
> -#ifndef __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG
> -static int stage2_ptep_test_and_clear_young(pte_t *pte)
> -{
> -	if (pte_young(*pte)) {
> -		*pte = pte_mkold(*pte);
> -		return 1;
> -	}
> -	return 0;
> -}
> -#else
> -static int stage2_ptep_test_and_clear_young(pte_t *pte)
> -{
> -	return __ptep_test_and_clear_young(pte);
> -}
> -#endif
> -
> -static int stage2_pmdp_test_and_clear_young(pmd_t *pmd)
> -{
> -	return stage2_ptep_test_and_clear_young((pte_t *)pmd);
> -}
> -
> -static int stage2_pudp_test_and_clear_young(pud_t *pud)
> -{
> -	return stage2_ptep_test_and_clear_young((pte_t *)pud);
> -}
> -
>   /**
>    * kvm_phys_addr_ioremap - map a device range to guest IPA
>    *
> @@ -1181,102 +522,6 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>   	return ret;
>   }
>   
> -/**
> - * stage2_wp_ptes - write protect PMD range
> - * @pmd:	pointer to pmd entry
> - * @addr:	range start address
> - * @end:	range end address
> - */
> -static void stage2_wp_ptes(pmd_t *pmd, phys_addr_t addr, phys_addr_t end)
> -{
> -	pte_t *pte;
> -
> -	pte = pte_offset_kernel(pmd, addr);
> -	do {
> -		if (!pte_none(*pte)) {
> -			if (!kvm_s2pte_readonly(pte))
> -				kvm_set_s2pte_readonly(pte);
> -		}
> -	} while (pte++, addr += PAGE_SIZE, addr != end);
> -}
> -
> -/**
> - * stage2_wp_pmds - write protect PUD range
> - * kvm:		kvm instance for the VM
> - * @pud:	pointer to pud entry
> - * @addr:	range start address
> - * @end:	range end address
> - */
> -static void stage2_wp_pmds(struct kvm_s2_mmu *mmu, pud_t *pud,
> -			   phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pmd_t *pmd;
> -	phys_addr_t next;
> -
> -	pmd = stage2_pmd_offset(kvm, pud, addr);
> -
> -	do {
> -		next = stage2_pmd_addr_end(kvm, addr, end);
> -		if (!pmd_none(*pmd)) {
> -			if (pmd_thp_or_huge(*pmd)) {
> -				if (!kvm_s2pmd_readonly(pmd))
> -					kvm_set_s2pmd_readonly(pmd);
> -			} else {
> -				stage2_wp_ptes(pmd, addr, next);
> -			}
> -		}
> -	} while (pmd++, addr = next, addr != end);
> -}
> -
> -/**
> - * stage2_wp_puds - write protect P4D range
> - * @p4d:	pointer to p4d entry
> - * @addr:	range start address
> - * @end:	range end address
> - */
> -static void  stage2_wp_puds(struct kvm_s2_mmu *mmu, p4d_t *p4d,
> -			    phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	pud_t *pud;
> -	phys_addr_t next;
> -
> -	pud = stage2_pud_offset(kvm, p4d, addr);
> -	do {
> -		next = stage2_pud_addr_end(kvm, addr, end);
> -		if (!stage2_pud_none(kvm, *pud)) {
> -			if (stage2_pud_huge(kvm, *pud)) {
> -				if (!kvm_s2pud_readonly(pud))
> -					kvm_set_s2pud_readonly(pud);
> -			} else {
> -				stage2_wp_pmds(mmu, pud, addr, next);
> -			}
> -		}
> -	} while (pud++, addr = next, addr != end);
> -}
> -
> -/**
> - * stage2_wp_p4ds - write protect PGD range
> - * @pgd:	pointer to pgd entry
> - * @addr:	range start address
> - * @end:	range end address
> - */
> -static void  stage2_wp_p4ds(struct kvm_s2_mmu *mmu, pgd_t *pgd,
> -			    phys_addr_t addr, phys_addr_t end)
> -{
> -	struct kvm *kvm = mmu->kvm;
> -	p4d_t *p4d;
> -	phys_addr_t next;
> -
> -	p4d = stage2_p4d_offset(kvm, pgd, addr);
> -	do {
> -		next = stage2_p4d_addr_end(kvm, addr, end);
> -		if (!stage2_p4d_none(kvm, *p4d))
> -			stage2_wp_puds(mmu, p4d, addr, next);
> -	} while (p4d++, addr = next, addr != end);
> -}
> -
>   /**
>    * stage2_wp_range() - write protect stage2 memory region range
>    * @kvm:	The KVM pointer
> 

Thanks,
Gavin


* Re: [PATCH v3 17/21] KVM: arm64: Convert user_mem_abort() to generic page-table API
  2020-08-25  9:39 ` [PATCH v3 17/21] KVM: arm64: Convert user_mem_abort() to generic page-table API Will Deacon
@ 2020-09-03  6:05   ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  6:05 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Convert user_mem_abort() to call kvm_pgtable_stage2_relax_perms() when
> handling a stage-2 permission fault and kvm_pgtable_stage2_map() when
> handling a stage-2 translation fault, rather than walking the page-table
> manually.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/kvm/mmu.c | 112 +++++++++++++------------------------------
>   1 file changed, 34 insertions(+), 78 deletions(-)
> 

It looks good to me. As it changes the stage-2 page-table management
mechanism completely, I will test this series with various configurations
on different machines and update with the results once that's finished.

Reviewed-by: Gavin Shan <gshan@redhat.com>

> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d4b0716a6ab4..cfbf32cae3a5 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1491,7 +1491,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   {
>   	int ret;
>   	bool write_fault, writable, force_pte = false;
> -	bool exec_fault, needs_exec;
> +	bool exec_fault;
> +	bool device = false;
>   	unsigned long mmu_seq;
>   	gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>   	struct kvm *kvm = vcpu->kvm;
> @@ -1499,10 +1500,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	struct vm_area_struct *vma;
>   	short vma_shift;
>   	kvm_pfn_t pfn;
> -	pgprot_t mem_type = PAGE_S2;
>   	bool logging_active = memslot_is_logging(memslot);
> -	unsigned long vma_pagesize, flags = 0;
> -	struct kvm_s2_mmu *mmu = vcpu->arch.hw_mmu;
> +	unsigned long vma_pagesize;
> +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> +	struct kvm_pgtable *pgt;
>   
>   	write_fault = kvm_is_write_fault(vcpu);
>   	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
> @@ -1535,22 +1536,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   		vma_pagesize = PAGE_SIZE;
>   	}
>   
> -	/*
> -	 * The stage2 has a minimum of 2 level table (For arm64 see
> -	 * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
> -	 * use PMD_SIZE huge mappings (even when the PMD is folded into PGD).
> -	 * As for PUD huge maps, we must make sure that we have at least
> -	 * 3 levels, i.e, PMD is not folded.
> -	 */
> -	if (vma_pagesize == PMD_SIZE ||
> -	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
> +	if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
>   		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
>   	mmap_read_unlock(current->mm);
>   
> -	/* We need minimum second+third level pages */
> -	ret = kvm_mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
> -	if (ret)
> -		return ret;
> +	if (fault_status != FSC_PERM) {
> +		ret = kvm_mmu_topup_memory_cache(memcache,
> +						 kvm_mmu_cache_min_pages(kvm));
> +		if (ret)
> +			return ret;
> +	}
>   
>   	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>   	/*
> @@ -1573,28 +1568,20 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   		return -EFAULT;
>   
>   	if (kvm_is_device_pfn(pfn)) {
> -		mem_type = PAGE_S2_DEVICE;
> -		flags |= KVM_S2PTE_FLAG_IS_IOMAP;
> -	} else if (logging_active) {
> -		/*
> -		 * Faults on pages in a memslot with logging enabled
> -		 * should not be mapped with huge pages (it introduces churn
> -		 * and performance degradation), so force a pte mapping.
> -		 */
> -		flags |= KVM_S2_FLAG_LOGGING_ACTIVE;
> -
> +		device = true;
> +	} else if (logging_active && !write_fault) {
>   		/*
>   		 * Only actually map the page as writable if this was a write
>   		 * fault.
>   		 */
> -		if (!write_fault)
> -			writable = false;
> +		writable = false;
>   	}
>   
> -	if (exec_fault && is_iomap(flags))
> +	if (exec_fault && device)
>   		return -ENOEXEC;
>   
>   	spin_lock(&kvm->mmu_lock);
> +	pgt = vcpu->arch.hw_mmu->pgt;
>   	if (mmu_notifier_retry(kvm, mmu_seq))
>   		goto out_unlock;
>   
> @@ -1605,62 +1592,31 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>   	if (vma_pagesize == PAGE_SIZE && !force_pte)
>   		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>   							   &pfn, &fault_ipa);
> -	if (writable)
> +	if (writable) {
> +		prot |= KVM_PGTABLE_PROT_W;
>   		kvm_set_pfn_dirty(pfn);
> +		mark_page_dirty(kvm, gfn);
> +	}
>   
> -	if (fault_status != FSC_PERM && !is_iomap(flags))
> +	if (fault_status != FSC_PERM && !device)
>   		clean_dcache_guest_page(pfn, vma_pagesize);
>   
> -	if (exec_fault)
> +	if (exec_fault) {
> +		prot |= KVM_PGTABLE_PROT_X;
>   		invalidate_icache_guest_page(pfn, vma_pagesize);
> +	}
>   
> -	/*
> -	 * If we took an execution fault we have made the
> -	 * icache/dcache coherent above and should now let the s2
> -	 * mapping be executable.
> -	 *
> -	 * Write faults (!exec_fault && FSC_PERM) are orthogonal to
> -	 * execute permissions, and we preserve whatever we have.
> -	 */
> -	needs_exec = exec_fault ||
> -		(fault_status == FSC_PERM &&
> -		 stage2_is_exec(mmu, fault_ipa, vma_pagesize));
> -
> -	if (vma_pagesize == PUD_SIZE) {
> -		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
> -
> -		new_pud = kvm_pud_mkhuge(new_pud);
> -		if (writable)
> -			new_pud = kvm_s2pud_mkwrite(new_pud);
> -
> -		if (needs_exec)
> -			new_pud = kvm_s2pud_mkexec(new_pud);
> -
> -		ret = stage2_set_pud_huge(mmu, memcache, fault_ipa, &new_pud);
> -	} else if (vma_pagesize == PMD_SIZE) {
> -		pmd_t new_pmd = kvm_pfn_pmd(pfn, mem_type);
> -
> -		new_pmd = kvm_pmd_mkhuge(new_pmd);
> -
> -		if (writable)
> -			new_pmd = kvm_s2pmd_mkwrite(new_pmd);
> -
> -		if (needs_exec)
> -			new_pmd = kvm_s2pmd_mkexec(new_pmd);
> +	if (device)
> +		prot |= KVM_PGTABLE_PROT_DEVICE;
> +	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
> +		prot |= KVM_PGTABLE_PROT_X;
>   
> -		ret = stage2_set_pmd_huge(mmu, memcache, fault_ipa, &new_pmd);
> +	if (fault_status == FSC_PERM) {
> +		ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
>   	} else {
> -		pte_t new_pte = kvm_pfn_pte(pfn, mem_type);
> -
> -		if (writable) {
> -			new_pte = kvm_s2pte_mkwrite(new_pte);
> -			mark_page_dirty(kvm, gfn);
> -		}
> -
> -		if (needs_exec)
> -			new_pte = kvm_s2pte_mkexec(new_pte);
> -
> -		ret = stage2_set_pte(mmu, memcache, fault_ipa, &new_pte, flags);
> +		ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
> +					     __pfn_to_phys(pfn), prot,
> +					     memcache);
>   	}
>   
>   out_unlock:
> 

Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (21 preceding siblings ...)
  2020-08-27 16:26 ` [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Alexandru Elisei
@ 2020-09-03  7:34 ` Gavin Shan
  2020-09-03 11:13   ` Gavin Shan
  2020-09-03 18:52 ` Will Deacon
  23 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-03  7:34 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Hello folks,
> 
> This is version three of the KVM page-table rework that I previously posted
> here:
> 
>    v1: https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
>    v2: https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org
> 
> Changes since v2 include:
> 
>    * Rebased onto -rc2, which includes the conflicting OOM blocking fixes
>    * Dropped the patch trying to "fix" the memcache in kvm_phys_addr_ioremap()
> 

It's really nice work, unifying and simplifying the code greatly.
However, it seems it doesn't work well with HugeTLBfs. Please refer
to the following test results and see if you have a quick idea, or I
can debug it a bit :)

Note: I think the failing cases (FAIL[1] and FAIL[2]) are caused by
the same issue.

Machine	     Host                     Guest              Result
===============================================================
ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
              PAGE_SIZE: 64KB                    64KB     passed
              THP:       disabled
              HugeTLB:   disabled
---------------------------------------------------------------
ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
              PAGE_SIZE: 64KB                    64KB     passed
              THP:       enabled
              HugeTLB:   disabled
----------------------------------------------------------------
ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Fail[1]
              PAGE_SIZE: 64KB                    64KB     Fail[1]
              THP:       disabled
              HugeTLB:   enabled
---------------------------------------------------------------
ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
              PAGE_SIZE: 4KB                     64KB     Passed
              THP:       disabled
              HugeTLB:   disabled
---------------------------------------------------------------
ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
              PAGE_SIZE: 4KB                     64KB     Passed
              THP:       enabled
              HugeTLB:   disabled
--------------------------------------------------------------
ThunderX2    VA_BITS:   39           PAGE_SIZE: 4KB     Fail[2]
              PAGE_SIZE: 4KB                    64KB     Fail[2]
              THP:       disabled
              HugeTLB:   enabled

NOTE: The commands used to start the VM are the same for FAIL[1] and
FAIL[2], and the host kernel logs are similar, so I don't provide the
kernel log for FAIL[2]. I guess they're caused by the same issue.

Fail[1]
===============================================================

start_vm_aarch64_hugetlbfs() {
    echo 16 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages

    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
    --enable-kvm -machine virt,gic-version=host                                 \
    -cpu host -smp 8,sockets=8,cores=1,threads=1                                \
    -m 4G -mem-prealloc -mem-path /dev/hugepages                                \
    -monitor none -serial mon:stdio -nographic -s                               \
    -bios /home/gavin/sandbox/qemu.main/pc-bios/edk2-aarch64-code.fd            \
    -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
    -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
    -append "earlycon=pl011,mmio,0x9000000"                                     \
    -device virtio-net-pci,netdev=unet,mac=52:54:00:f1:26:a6                    \
    -netdev user,id=unet,hostfwd=tcp::50959-:22                                 \
    -drive file=/home/gavin/sandbox/images/vm.img,if=none,format=raw,id=nvme0   \
    -device nvme,drive=nvme0,serial=foo                                         \
    -drive file=/home/gavin/sandbox/images/vm1.img,if=none,format=raw,id=nvme1  \
    -device nvme,drive=nvme1,serial=foo1
}

[  160.889802] Unable to handle kernel paging request at virtual address 003fffff7fc00034
[  160.897712] Mem abort info:
[  160.900507]   ESR = 0x96000004
[  160.903550]   EC = 0x25: DABT (current EL), IL = 32 bits
[  160.908848]   SET = 0, FnV = 0
[  160.911896]   EA = 0, S1PTW = 0
[  160.915024] Data abort info:
[  160.917891]   ISV = 0, ISS = 0x00000004
[  160.921722]   CM = 0, WnR = 0
[  160.924678] [003fffff7fc00034] address between user and kernel address ranges
[  160.931808] Internal error: Oops: 96000004 [#1] SMP
[  160.936676] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib vfat fat ib_umad rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi ipmi_ssif qedr ib_uverbs crct10dif_ce i2c_smbus ghash_ce sha2_ce sha256_arm64 ib_core sha1_ce ipmi_devintf ipmi_msghandler thunderx2_pmu ip_tables xfs libcrc32c sg ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper qede ttm mpt3sas qed drm raid_class e1000e scsi_transport_sas crc8 gpio_xlp i2c_xlp9xx dm_mirror dm_region_hash dm_log dm_mod
[  161.007565] CPU: 222 PID: 4559 Comm: qemu-system-aar Not tainted 5.9.0-rc3-gavin+ #4
[  161.015293] Hardware name: Default string MT91-FS1/MT91-FS1, BIOS 28m 12/14/2019
[  161.022676] pstate: 60400009 (nZCv daif +PAN -UAO BTYPE=--)
[  161.028250] pc : __free_pages+0x24/0x60
[  161.032074] lr : free_pages.part.102+0x2c/0x38
[  161.036504] sp : fffffe0031b2f8a0
[  161.039805] x29: fffffe0031b2f8a0 x28: 0000000040000000
[  161.045104] x27: fffffe0031b2f9c8 x26: 0000000000000007
[  161.050402] x25: 0000000000000003 x24: fffffe0010f16000
[  161.055700] x23: 0000000020000000 x22: 0000000040000000
[  161.060998] x21: 0000000000000002 x20: 0000000060000000
[  161.066296] x19: fffffc0f1b050010 x18: 0000000000000000
[  161.071595] x17: 0000000000000000 x16: 0000000000000000
[  161.076893] x15: 0000000000000000 x14: 0000000000000000
[  161.082191] x13: 0000000000000000 x12: 0000000000000001
[  161.087489] x11: 0000000000000003 x10: 0000000000000002
[  161.092787] x9 : fffffe001035fca4 x8 : 0000000000000007
[  161.098085] x7 : 00000000fffffff3 x6 : fffffe0010126370
[  161.103383] x5 : fffffe0031b2f9e8 x4 : 0000040080000000
[  161.108681] x3 : 003fffff7fc00034 x2 : 00000000ffffffff
[  161.113979] x1 : 0000000000000000 x0 : 003fffff7fc00000
[  161.119277] Call trace:
[  161.121713]  __free_pages+0x24/0x60
[  161.125189]  free_pages.part.102+0x2c/0x38
[  161.129272]  free_pages+0x1c/0x28
[  161.132586]  stage2_map_walker+0xbc/0x218
[  161.136584]  __kvm_pgtable_walk+0xec/0x1c8
[  161.140667]  _kvm_pgtable_walk+0xa4/0xe0
[  161.144578]  kvm_pgtable_stage2_map+0xa4/0x118
[  161.149022]  kvm_handle_guest_abort+0x48c/0xa08
[  161.153543]  handle_exit+0x134/0x198
[  161.157107]  kvm_arch_vcpu_ioctl_run+0x4f0/0x880
[  161.161721]  kvm_vcpu_ioctl+0x3a8/0x808
[  161.165546]  __arm64_sys_ioctl+0x1dc/0xcf8
[  161.169642]  do_el0_svc+0xf4/0x1b8
[  161.173039]  el0_sync_handler+0xf8/0x124
[  161.176949]  el0_sync+0x140/0x180
[  161.180254] Code: d503201f 9100d003 52800022 4b0203e2 (b8e20064)
[  161.186408] ---[ end trace d0b1b117875f8fcd ]---
[  161.191012] Kernel panic - not syncing: Fatal exception
[  161.196247] SMP: stopping secondary CPUs
[  161.200206] Kernel Offset: 0xc0000 from 0xfffffe0010000000
[  161.205677] PHYS_OFFSET: 0x80000000
[  161.209154] CPU features: 0x0046002,22800c38
[  161.213410] Memory Limit: none
[  161.216474] ---[ end Kernel panic - not syncing: Fatal exception ]---


FAIL[2]
================================================================
start_vm_aarch64_hugetlbfs() {
    echo 4096 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
    --enable-kvm -machine virt,gic-version=host                                 \
    -cpu host -smp 8,sockets=8,cores=1,threads=1                                \
    -m 4G -mem-prealloc -mem-path /dev/hugepages                                \
    -monitor none -serial mon:stdio -nographic -s                               \
    -bios /home/gavin/sandbox/qemu.main/pc-bios/edk2-aarch64-code.fd            \
    -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
    -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
    -append "earlycon=pl011,mmio,0x9000000"                                     \
    -device virtio-net-pci,netdev=unet,mac=52:54:00:f1:26:a6                    \
    -netdev user,id=unet,hostfwd=tcp::50959-:22                                 \
    -drive file=/home/gavin/sandbox/images/vm.img,if=none,format=raw,id=nvme0   \
    -device nvme,drive=nvme0,serial=foo                                         \
    -drive file=/home/gavin/sandbox/images/vm1.img,if=none,format=raw,id=nvme1  \
    -device nvme,drive=nvme1,serial=foo1
}

[  666.278391] Unable to handle kernel paging request at virtual address 03fffffefde00034
[  666.286304] Mem abort info:
[  666.289086]   ESR = 0x96000004
[  666.292142]   EC = 0x25: DABT (current EL), IL = 32 bits
[  666.297440]   SET = 0, FnV = 0
[  666.300481]   EA = 0, S1PTW = 0
[  666.303616] Data abort info:
[  666.306484]   ISV = 0, ISS = 0x00000004
[  666.310306]   CM = 0, WnR = 0
[  666.313269] [03fffffefde00034] address between user and kernel address ranges
[  666.320393] Internal error: Oops: 96000004 [#1] SMP
[  666.325259] Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink tun bridge stp llc rfkill ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad vfat fat rpcrdma sunrpc rdma_ucm ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi qedr ib_uverbs ipmi_ssif ib_core crct10dif_ce i2c_smbus ghash_ce sha2_ce sha256_arm64 sha1_ce ipmi_devintf ipmi_msghandler thunderx2_pmu ip_tables xfs libcrc32c sg ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper qede ttm qed mpt3sas drm e1000e raid_class crc8 scsi_transport_sas gpio_xlp i2c_xlp9xx dm_mirror dm_region_hash dm_log dm_mod
[  666.396150] CPU: 168 PID: 42112 Comm: qemu-system-aar Not tainted 5.9.0-rc3-gavin+ #5
[  666.403965] Hardware name: Default string MT91-FS1/MT91-FS1, BIOS 28m 12/14/2019
[  666.411348] pstate: 60400009 (nZCv daif +PAN -UAO BTYPE=--)
[  666.416922] pc : __free_pages+0x24/0x60
[  666.420746] lr : free_pages.part.102+0x2c/0x38
[  666.425176] sp : ffffffc024a23840
[  666.428477] x29: ffffffc024a23840 x28: 0000000040000000
[  666.433776] x27: ffffffc024a239c8 x26: 0000000000000007
[  666.439074] x25: 0000000000000003 x24: ffffffc010ec0000
[  666.444373] x23: 0000000000200000 x22: 0000000040200000
[  666.449671] x21: 0000000040000000 x20: ffffff8f34576000
[  666.454969] x19: 0000000000000002 x18: 0000000000000000
[  666.460267] x17: 0000000000000000 x16: 0000000000000000
[  666.465565] x15: 0000000000000000 x14: 0000000000000000
[  666.470863] x13: 0000000000000000 x12: 0000000000000001
[  666.476161] x11: 0000000000000003 x10: 0000000000000002
[  666.481459] x9 : ffffffc0103522f4 x8 : 0000000000000007
[  666.486757] x7 : ffffffc0249960f8 x6 : ffffffc0101162f8
[  666.492055] x5 : ffffffc024a239e8 x4 : 0000000000000004
[  666.497353] x3 : 03fffffefde00034 x2 : 00000000ffffffff
[  666.502651] x1 : 0000000000000000 x0 : 03fffffefde00000
[  666.507950] Call trace:
[  666.510385]  __free_pages+0x24/0x60
[  666.513861]  free_pages.part.102+0x2c/0x38
[  666.517945]  free_pages+0x1c/0x28
[  666.521260]  stage2_map_walker+0xb0/0x208
[  666.525257]  __kvm_pgtable_walk+0xe0/0x1b8
[  666.529340]  __kvm_pgtable_walk+0xb8/0x1b8
[  666.533424]  _kvm_pgtable_walk+0xa4/0xe0
[  666.537334]  kvm_pgtable_stage2_map+0xa0/0x118
[  666.541779]  kvm_handle_guest_abort+0x48c/0xa38
[  666.546300]  handle_exit+0x134/0x198
[  666.549864]  kvm_arch_vcpu_ioctl_run+0x4f0/0x880
[  666.554479]  kvm_vcpu_ioctl+0x3a8/0x808
[  666.558304]  __arm64_sys_ioctl+0x1dc/0xcf8
[  666.562402]  do_el0_svc+0xf4/0x1b8
[  666.565799]  el0_sync_handler+0xf8/0x124
[  666.569709]  el0_sync+0x140/0x180
[  666.573014] Code: d503201f 9100d003 52800022 4b0203e2 (b8e20064)
[  666.579197] ---[ end trace 52b60e2f408396b6 ]---
[  666.583801] Kernel panic - not syncing: Fatal exception
[  666.589035] SMP: stopping secondary CPUs
[  666.592996] Kernel Offset: 0xb0000 from 0xffffffc010000000
[  666.598467] PHYS_OFFSET: 0x80000000
[  666.601944] CPU features: 0x0046002,22800c38
[  666.606200] Memory Limit: none
[  666.609264] ---[ end Kernel panic - not syncing: Fatal exception ]---

Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-03  7:34 ` Gavin Shan
@ 2020-09-03 11:13   ` Gavin Shan
  2020-09-03 11:48     ` Gavin Shan
  0 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-03 11:13 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 9/3/20 5:34 PM, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
>> Hello folks,
>>
>> This is version three of the KVM page-table rework that I previously posted
>> here:
>>
>>    v1: https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
>>    v2: https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org
>>
>> Changes since v2 include:
>>
>>    * Rebased onto -rc2, which includes the conflicting OOM blocking fixes
>>    * Dropped the patch trying to "fix" the memcache in kvm_phys_addr_ioremap()
>>
> 
> It's really nice work, unifying and simplifying the code greatly.
> However, it seems it doesn't work well with HugeTLBfs. Please refer
> to the following test results and see if you have a quick idea, or I
> can debug it a bit :)
> 
> 
> Machine         Host                     Guest              Result
> ===============================================================
> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>               PAGE_SIZE: 64KB                    64KB     passed
>               THP:       disabled
>               HugeTLB:   disabled
> ---------------------------------------------------------------
> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>               PAGE_SIZE: 64KB                    64KB     passed
>               THP:       enabled
>               HugeTLB:   disabled
> ----------------------------------------------------------------
> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Fail[1]
>               PAGE_SIZE: 64KB                    64KB     Fail[1]
>               THP:       disabled
>               HugeTLB:   enabled
> ---------------------------------------------------------------
> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>               PAGE_SIZE: 4KB                     64KB     Passed
>               THP:       disabled
>               HugeTLB:   disabled
> ---------------------------------------------------------------
> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>               PAGE_SIZE: 4KB                     64KB     Passed
>               THP:       enabled
>               HugeTLB:   disabled
> --------------------------------------------------------------
> ThunderX2    VA_BITS:   39           PAGE_SIZE: 4KB     Fail[2]
>               PAGE_SIZE: 4KB                    64KB     Fail[2]
>               THP:       disabled
>               HugeTLB:   enabled
> 

I debugged the code and found the issue is caused by the following
patch:

[PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table

With the following code changes applied on top of this series, no
host kernel crash is seen and hugetlbfs works for me. However, I don't
think it's the correct fix to have. I guess we still want to invalidate
the page table entry (at level#2 when PAGE_SIZE is 64KB on the host) in
stage2_map_walk_table_pre(), as we're going to cut off the branch to
the subordinate tables/entries. However, stage2_map_walk_table_post()
still needs the original page table entry to release the subordinate
page properly. So I guess the proper fix would be to cache the original
page table entry in advance, unless you have a better idea :)

I will also reply to PATCH[06/21] to make the reply chain complete.

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 6e8ca1ec12b4..f4eacfdd73cb 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -494,8 +494,8 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
         if (!kvm_block_mapping_supported(addr, end, data->phys, level))
                 return 0;
  
-       kvm_set_invalid_pte(ptep);
-       kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
+       //kvm_set_invalid_pte(ptep);
+       //kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
         data->anchor = ptep;
         return 0;
  }
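
A rough, untested sketch of the "cache the original entry" idea mentioned
above, reusing the names from this series (the anchor_childp field is made
up here just for illustration), might look something like this:

struct stage2_map_data {
	u64				phys;
	kvm_pte_t			attr;

	kvm_pte_t			*anchor;
	kvm_pte_t			*anchor_childp;	/* made-up field: table that hung off the anchor */

	struct kvm_s2_mmu		*mmu;
	struct kvm_mmu_memory_cache	*memcache;
};

static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
				     kvm_pte_t *ptep,
				     struct stage2_map_data *data)
{
	if (data->anchor)
		return 0;

	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
		return 0;

	/* Remember the subordinate table page before zapping the entry... */
	data->anchor_childp = kvm_pte_follow(*ptep);

	/* ...so the entry can still be invalidated here, as before */
	kvm_set_invalid_pte(ptep);
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
	data->anchor = ptep;
	return 0;
}

static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
				      kvm_pte_t *ptep,
				      struct stage2_map_data *data)
{
	kvm_pte_t *childp;
	int ret = 0;

	if (!data->anchor)
		return 0;

	/*
	 * At the anchor itself *ptep was invalidated by the pre-order
	 * callback, so use the cached pointer; entries below the anchor
	 * are still valid and can be followed as before.
	 */
	if (data->anchor == ptep)
		childp = data->anchor_childp;
	else
		childp = kvm_pte_follow(*ptep);

	free_page((unsigned long)childp);
	put_page(virt_to_page(ptep));

	if (data->anchor == ptep) {
		data->anchor = NULL;
		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
	}

	return ret;
}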

For the initial debugging, I added some printks and got the following
output, FYI. It indicates we're releasing the page at physical address
0x0, which is obviously incorrect.

    [  111.586180] stage2_map_walk_table_post: addr=0x40000000, end=0x60000000, level=2, anchor@0xfffffc0f191c0010, ptep@0xfffffc0f191c0010

    static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
         if (!data->anchor)
                 return 0;
  
+       if (*ptep == 0x0) {
+               pr_warn("%s: addr=0x%llx, end=0x%llx, level=%d, anchor@0x%lx, ptep@0x%lx\n",
+                        __func__, addr, end, level, (unsigned long)(data->anchor),
+                       (unsigned long)ptep);
+       }
+
         free_page((unsigned long)kvm_pte_follow(*ptep));
         put_page(virt_to_page(ptep));

By the way, I've finished the code review; I'll leave the nVHE patches to Alex for
his review. I think the testing is also finished, unless you need me to do more
testing. With the issue fixed, feel free to add the following to this series:

Tested-by: Gavin Shan <gshan@redhat.com>

Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-08-25  9:39 ` [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table Will Deacon
  2020-09-01 16:24   ` Alexandru Elisei
  2020-09-03  2:57   ` Gavin Shan
@ 2020-09-03 11:18   ` Gavin Shan
  2020-09-03 12:30     ` Will Deacon
  2 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-03 11:18 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 8/25/20 7:39 PM, Will Deacon wrote:
> Add stage-2 map() and unmap() operations to the generic page-table code.
> 
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Quentin Perret <qperret@google.com>
> Signed-off-by: Will Deacon <will@kernel.org>
> ---
>   arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
>   arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
>   2 files changed, 301 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
> index 3389f978d573..8ab0d5f43817 100644
> --- a/arch/arm64/include/asm/kvm_pgtable.h
> +++ b/arch/arm64/include/asm/kvm_pgtable.h
> @@ -134,6 +134,45 @@ int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm);
>    */
>   void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
>   
> +/**
> + * kvm_pgtable_stage2_map() - Install a mapping in a guest stage-2 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address at which to place the mapping.
> + * @size:	Size of the mapping.
> + * @phys:	Physical address of the memory to map.
> + * @prot:	Permissions and attributes for the mapping.
> + * @mc:		Cache of pre-allocated GFP_PGTABLE_USER memory from which to
> + *		allocate page-table pages.
> + *
> + * If device attributes are not explicitly requested in @prot, then the
> + * mapping will be normal, cacheable.
> + *
> + * Note that this function will both coalesce existing table entries and split
> + * existing block mappings, relying on page-faults to fault back areas outside
> + * of the new mapping lazily.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +			   u64 phys, enum kvm_pgtable_prot prot,
> +			   struct kvm_mmu_memory_cache *mc);
> +
> +/**
> + * kvm_pgtable_stage2_unmap() - Remove a mapping from a guest stage-2 page-table.
> + * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init().
> + * @addr:	Intermediate physical address from which to remove the mapping.
> + * @size:	Size of the mapping.
> + *
> + * TLB invalidation is performed for each page-table entry cleared during the
> + * unmapping operation and the reference count for the page-table page
> + * containing the cleared entry is decremented, with unreferenced pages being
> + * freed. Unmapping a cacheable page will ensure that it is clean to the PoC if
> + * FWB is not supported by the CPU.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size);
> +
>   /**
>    * kvm_pgtable_walk() - Walk a page-table.
>    * @pgt:	Page-table structure initialised by kvm_pgtable_*_init().
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index b8550ccaef4d..41ee8f3c0369 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -32,10 +32,19 @@
>   #define KVM_PTE_LEAF_ATTR_LO_S1_SH_IS	3
>   #define KVM_PTE_LEAF_ATTR_LO_S1_AF	BIT(10)
>   
> +#define KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR	GENMASK(5, 2)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R	BIT(6)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W	BIT(7)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH	GENMASK(9, 8)
> +#define KVM_PTE_LEAF_ATTR_LO_S2_SH_IS	3
> +#define KVM_PTE_LEAF_ATTR_LO_S2_AF	BIT(10)
> +
>   #define KVM_PTE_LEAF_ATTR_HI		GENMASK(63, 51)
>   
>   #define KVM_PTE_LEAF_ATTR_HI_S1_XN	BIT(54)
>   
> +#define KVM_PTE_LEAF_ATTR_HI_S2_XN	BIT(54)
> +
>   struct kvm_pgtable_walk_data {
>   	struct kvm_pgtable		*pgt;
>   	struct kvm_pgtable_walker	*walker;
> @@ -420,6 +429,259 @@ void kvm_pgtable_hyp_destroy(struct kvm_pgtable *pgt)
>   	pgt->pgd = NULL;
>   }
>   
> +struct stage2_map_data {
> +	u64				phys;
> +	kvm_pte_t			attr;
> +
> +	kvm_pte_t			*anchor;
> +
> +	struct kvm_s2_mmu		*mmu;
> +	struct kvm_mmu_memory_cache	*memcache;
> +};
> +
> +static kvm_pte_t *stage2_memcache_alloc_page(struct stage2_map_data *data)
> +{
> +	kvm_pte_t *ptep = NULL;
> +	struct kvm_mmu_memory_cache *mc = data->memcache;
> +
> +	/* Allocated with GFP_PGTABLE_USER, so no need to zero */
> +	if (mc && mc->nobjs)
> +		ptep = mc->objects[--mc->nobjs];
> +
> +	return ptep;
> +}
> +
> +static int stage2_map_set_prot_attr(enum kvm_pgtable_prot prot,
> +				    struct stage2_map_data *data)
> +{
> +	bool device = prot & KVM_PGTABLE_PROT_DEVICE;
> +	kvm_pte_t attr = device ? PAGE_S2_MEMATTR(DEVICE_nGnRE) :
> +			    PAGE_S2_MEMATTR(NORMAL);
> +	u32 sh = KVM_PTE_LEAF_ATTR_LO_S2_SH_IS;
> +
> +	if (!(prot & KVM_PGTABLE_PROT_X))
> +		attr |= KVM_PTE_LEAF_ATTR_HI_S2_XN;
> +	else if (device)
> +		return -EINVAL;
> +
> +	if (prot & KVM_PGTABLE_PROT_R)
> +		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_R;
> +
> +	if (prot & KVM_PGTABLE_PROT_W)
> +		attr |= KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W;
> +
> +	attr |= FIELD_PREP(KVM_PTE_LEAF_ATTR_LO_S2_SH, sh);
> +	attr |= KVM_PTE_LEAF_ATTR_LO_S2_AF;
> +	data->attr = attr;
> +	return 0;
> +}
> +
> +static bool stage2_map_walker_try_leaf(u64 addr, u64 end, u32 level,
> +				       kvm_pte_t *ptep,
> +				       struct stage2_map_data *data)
> +{
> +	u64 granule = kvm_granule_size(level), phys = data->phys;
> +
> +	if (!kvm_block_mapping_supported(addr, end, phys, level))
> +		return false;
> +
> +	if (kvm_set_valid_leaf_pte(ptep, phys, data->attr, level))
> +		goto out;
> +
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +	kvm_set_valid_leaf_pte(ptep, phys, data->attr, level);
> +out:
> +	data->phys += granule;
> +	return true;
> +}
> +
> +static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
> +				     kvm_pte_t *ptep,
> +				     struct stage2_map_data *data)
> +{
> +	if (data->anchor)
> +		return 0;
> +
> +	if (!kvm_block_mapping_supported(addr, end, data->phys, level))
> +		return 0;
> +
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +	data->anchor = ptep;
> +	return 0;
> +}
> +
> +static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +				struct stage2_map_data *data)
> +{
> +	kvm_pte_t *childp, pte = *ptep;
> +	struct page *page = virt_to_page(ptep);
> +
> +	if (data->anchor) {
> +		if (kvm_pte_valid(pte))
> +			put_page(page);
> +
> +		return 0;
> +	}
> +
> +	if (stage2_map_walker_try_leaf(addr, end, level, ptep, data))
> +		goto out_get_page;
> +
> +	if (WARN_ON(level == KVM_PGTABLE_MAX_LEVELS - 1))
> +		return -EINVAL;
> +
> +	childp = stage2_memcache_alloc_page(data);
> +	if (!childp)
> +		return -ENOMEM;
> +
> +	/*
> +	 * If we've run into an existing block mapping then replace it with
> +	 * a table. Accesses beyond 'end' that fall within the new table
> +	 * will be mapped lazily.
> +	 */
> +	if (kvm_pte_valid(pte)) {
> +		kvm_set_invalid_pte(ptep);
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, level);
> +		put_page(page);
> +	}
> +
> +	kvm_set_table_pte(ptep, childp);
> +
> +out_get_page:
> +	get_page(page);
> +	return 0;
> +}
> +
> +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> +				      kvm_pte_t *ptep,
> +				      struct stage2_map_data *data)
> +{
> +	int ret = 0;
> +
> +	if (!data->anchor)
> +		return 0;
> +
> +	free_page((unsigned long)kvm_pte_follow(*ptep));
> +	put_page(virt_to_page(ptep));
> +
> +	if (data->anchor == ptep) {
> +		data->anchor = NULL;
> +		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> +	}
> +
> +	return ret;
> +}
> +

As discussed in another thread, *ptep has been invalidated in stage2_map_walk_table_pre(),
which means *ptep now has a value of zero. The following call to free_page() is therefore
going to release the page frame corresponding to physical address 0x0, which is not
correct. We might cache the original value of this page table entry so that it can be
used here.
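
To illustrate (assuming kvm_pte_follow() boils down to something like
__va(pte & KVM_PTE_ADDR_MASK)), once the anchor entry has been zeroed the
post-order callback effectively ends up doing:

	kvm_pte_t pte = *ptep;	/* == 0, invalidated in stage2_map_walk_table_pre() */
	free_page((unsigned long)__va(pte & KVM_PTE_ADDR_MASK));	/* __va(0): the page frame at PA 0x0 */

which is consistent with the __free_pages() crashes seen in the hugetlbfs tests.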

> +static int stage2_map_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			     enum kvm_pgtable_walk_flags flag, void * const arg)
> +{
> +	struct stage2_map_data *data = arg;
> +
> +	switch (flag) {
> +	case KVM_PGTABLE_WALK_TABLE_PRE:
> +		return stage2_map_walk_table_pre(addr, end, level, ptep, data);
> +	case KVM_PGTABLE_WALK_LEAF:
> +		return stage2_map_walk_leaf(addr, end, level, ptep, data);
> +	case KVM_PGTABLE_WALK_TABLE_POST:
> +		return stage2_map_walk_table_post(addr, end, level, ptep, data);
> +	}
> +
> +	return -EINVAL;
> +}
> +
> +int kvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
> +			   u64 phys, enum kvm_pgtable_prot prot,
> +			   struct kvm_mmu_memory_cache *mc)
> +{
> +	int ret;
> +	struct stage2_map_data map_data = {
> +		.phys		= ALIGN_DOWN(phys, PAGE_SIZE),
> +		.mmu		= pgt->mmu,
> +		.memcache	= mc,
> +	};
> +	struct kvm_pgtable_walker walker = {
> +		.cb		= stage2_map_walker,
> +		.flags		= KVM_PGTABLE_WALK_TABLE_PRE |
> +				  KVM_PGTABLE_WALK_LEAF |
> +				  KVM_PGTABLE_WALK_TABLE_POST,
> +		.arg		= &map_data,
> +	};
> +
> +	ret = stage2_map_set_prot_attr(prot, &map_data);
> +	if (ret)
> +		return ret;
> +
> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> +	dsb(ishst);
> +	return ret;
> +}
> +
> +static void stage2_flush_dcache(void *addr, u64 size)
> +{
> +	if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +		return;
> +
> +	__flush_dcache_area(addr, size);
> +}
> +
> +static bool stage2_pte_cacheable(kvm_pte_t pte)
> +{
> +	u64 memattr = FIELD_GET(KVM_PTE_LEAF_ATTR_LO_S2_MEMATTR, pte);
> +	return memattr == PAGE_S2_MEMATTR(NORMAL);
> +}
> +
> +static int stage2_unmap_walker(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
> +			       enum kvm_pgtable_walk_flags flag,
> +			       void * const arg)
> +{
> +	struct kvm_s2_mmu *mmu = arg;
> +	kvm_pte_t pte = *ptep, *childp = NULL;
> +	bool need_flush = false;
> +
> +	if (!kvm_pte_valid(pte))
> +		return 0;
> +
> +	if (kvm_pte_table(pte, level)) {
> +		childp = kvm_pte_follow(pte);
> +
> +		if (page_count(virt_to_page(childp)) != 1)
> +			return 0;
> +	} else if (stage2_pte_cacheable(pte)) {
> +		need_flush = true;
> +	}
> +
> +	/*
> +	 * This is similar to the map() path in that we unmap the entire
> +	 * block entry and rely on the remaining portions being faulted
> +	 * back lazily.
> +	 */
> +	kvm_set_invalid_pte(ptep);
> +	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
> +	put_page(virt_to_page(ptep));
> +
> +	if (need_flush) {
> +		stage2_flush_dcache(kvm_pte_follow(pte),
> +				    kvm_granule_size(level));
> +	}
> +
> +	if (childp)
> +		free_page((unsigned long)childp);
> +
> +	return 0;
> +}
> +
> +int kvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 addr, u64 size)
> +{
> +	struct kvm_pgtable_walker walker = {
> +		.cb	= stage2_unmap_walker,
> +		.arg	= pgt->mmu,
> +		.flags	= KVM_PGTABLE_WALK_LEAF | KVM_PGTABLE_WALK_TABLE_POST,
> +	};
> +
> +	return kvm_pgtable_walk(pgt, addr, size, &walker);
> +}
> +
>   int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm *kvm)
>   {
>   	size_t pgd_sz;
> 

Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-03 11:13   ` Gavin Shan
@ 2020-09-03 11:48     ` Gavin Shan
  2020-09-03 12:16       ` Will Deacon
  0 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-03 11:48 UTC (permalink / raw)
  To: Will Deacon, kvmarm
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Hi Will,

On 9/3/20 9:13 PM, Gavin Shan wrote:
> On 9/3/20 5:34 PM, Gavin Shan wrote:
>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>> Hello folks,
>>>
>>> This is version three of the KVM page-table rework that I previously posted
>>> here:
>>>
>>>    v1: https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
>>>    v2: https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org
>>>
>>> Changes since v2 include:
>>>
>>>    * Rebased onto -rc2, which includes the conflicting OOM blocking fixes
>>>    * Dropped the patch trying to "fix" the memcache in kvm_phys_addr_ioremap()
>>>
>>
>> It's really nice work, unifying and simplifying the code greatly.
>> However, it seems it doesn't work well with HugeTLBfs. Please refer
>> to the following test results and see if you have a quick idea, or I
>> can debug it a bit :)
>>
>>
>> Machine         Host                     Guest              Result
>> ===============================================================
>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>>               PAGE_SIZE: 64KB                    64KB     passed
>>               THP:       disabled
>>               HugeTLB:   disabled
>> ---------------------------------------------------------------
>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>>               PAGE_SIZE: 64KB                    64KB     passed
>>               THP:       enabled
>>               HugeTLB:   disabled
>> ----------------------------------------------------------------
>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Fail[1]
>>               PAGE_SIZE: 64KB                    64KB     Fail[1]
>>               THP:       disabled
>>               HugeTLB:   enabled
>> ---------------------------------------------------------------
>> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>>               PAGE_SIZE: 4KB                     64KB     Passed
>>               THP:       disabled
>>               HugeTLB:   disabled
>> ---------------------------------------------------------------
>> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>>               PAGE_SIZE: 4KB                     64KB     Passed
>>               THP:       enabled
>>               HugeTLB:   disabled
>> --------------------------------------------------------------
>> ThunderX2    VA_BITS:   39           PAGE_SIZE: 4KB     Fail[2]
>>               PAGE_SIZE: 4KB                    64KB     Fail[2]
>>               THP:       disabled
>>               HugeTLB:   enabled
>>
> 
> I debugged the code and found the issue is caused by the following
> patch:
> 
> [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
> 
> With the following code changes applied on top of this series, no
> host kernel crash is seen and hugetlbfs works for me. However, I don't
> think it's the correct fix to have. I guess we still want to invalidate
> the page table entry (at level#2 when PAGE_SIZE is 64KB on the host) in
> stage2_map_walk_table_pre(), as we're going to cut off the branch to
> the subordinate tables/entries. However, stage2_map_walk_table_post()
> still needs the original page table entry to release the subordinate
> page properly. So I guess the proper fix would be to cache the original
> page table entry in advance, unless you have a better idea :)
> 
> I will also reply to PATCH[06/21] to make the reply chain complete.
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 6e8ca1ec12b4..f4eacfdd73cb 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -494,8 +494,8 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>          if (!kvm_block_mapping_supported(addr, end, data->phys, level))
>                  return 0;
> 
> -       kvm_set_invalid_pte(ptep);
> -       kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
> +       //kvm_set_invalid_pte(ptep);
> +       //kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu, addr, 0);
>          data->anchor = ptep;
>          return 0;
>   }
> 

[...]

Sorry, the guest can sometimes hang with the above changes. I have no idea yet
what is happening; I'm going to debug it further. I'm pasting the command used
and the output from the guest below.

host: VA_BITS=42, PAGE_SIZE=64KB

start_vm_aarch64_hugetlbfs() {
    local size=512
    local base="/sys/kernel/mm/hugepages"
    local file="$base/hugepages-$((size*1024))kB/nr_hugepages"

    if [ ! -f $file ]; then
       echo "Huage page file <$file> not existing"
       return -1
    fi

    echo 16 > $file

    /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64                     \
    --enable-kvm -machine virt,gic-version=host                                 \
    -cpu host -smp 8,sockets=8,cores=1,threads=1                                \
    -m 4G -mem-prealloc -mem-path /dev/hugepages                                \
    -monitor none -serial mon:stdio -nographic -s                               \
    -bios /home/gavin/sandbox/qemu.main/pc-bios/edk2-aarch64-code.fd            \
    -kernel /home/gavin/sandbox/linux.guest/arch/arm64/boot/Image               \
    -initrd /home/gavin/sandbox/images/rootfs.cpio.xz                           \
    -append "earlycon=pl011,mmio,0x9000000"                                     \
    -device virtio-net-pci,netdev=unet,mac=52:54:00:f1:26:a6                    \
    -netdev user,id=unet,hostfwd=tcp::50959-:22                                 \
    -drive file=/home/gavin/sandbox/images/vm.img,if=none,format=raw,id=nvme0   \
    -device nvme,drive=nvme0,serial=foo                                         \
    -drive file=/home/gavin/sandbox/images/vm1.img,if=none,format=raw,id=nvme1  \
    -device nvme,drive=nvme1,serial=foo1
}


[root@virtlab-arm01 ~]# start_vm_aarch64_hugetlbfs
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore/DEBUG/ArmPlatformPrePeiCore.dll 0x1800
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x7180
Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
The 0th FV start address is 0x00000001000, size is 0x001FF000, handle is 0x1000
Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
Install PPI: 6847CC74-E9EC-4F8F-A29D-AB44E754A8FC
DiscoverPeimsAndOrderWithApriori(): Found 0x7 PEI FFS files in the 0th FV
Loading PEIM 9B3ADA4F-AE56-4C24-8DEA-F03B7558AE50
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Pei/Pcd/DEBUG/PcdPeim.dll 0x1F520
Loading PEIM at 0x0000001F440 EntryPoint=0x00000020000 PcdPeim.efi
Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
Register PPI Notify: 605EA650-C65C-42E1-BA80-91A52AB618C6
Loading PEIM C61EF796-B50D-4F98-9F78-4F6F79D800D5
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/MemoryInitPei/MemoryInitPeim/DEBUG/MemoryInit.dll 0x18000
Loading PEIM at 0x00000017F20 EntryPoint=0x00000018000 MemoryInit.efi
QemuVirtMemInfoPeiLibConstructor: System RAM @ 0x40000000 - 0x13FFFFFFF
Memory Init PEIM Loaded
PeiInstallPeiMemory MemoryBegin 0x13C000000, MemoryLength 0x4000000
ArmVirtGetMemoryMap: Dumping System DRAM Memory Map:
         PhysicalBase: 0x40000000
         VirtualBase: 0x40000000
         Length: 0x100000000
Temp Stack : BaseAddress=0x4007E020 Length=0x1FE0
Temp Heap  : BaseAddress=0x4007C030 Length=0x1FF0
Total temporary memory:    16336 bytes.
   temporary memory stack ever used:       4208 bytes.
   temporary memory heap used for HobList: 3248 bytes.
   temporary memory heap occupied by memory pages: 0 bytes.
Memory Allocation 0x00000004 0x13FFFF000 - 0x13FFFFFFF
Memory Allocation 0x00000004 0x13FFFE000 - 0x13FFFEFFF
Memory Allocation 0x00000004 0x13FFFD000 - 0x13FFFDFFF
Memory Allocation 0x00000004 0x13FFFC000 - 0x13FFFCFFF
Old Stack size 8160, New stack size 131072
Stack Hob: BaseAddress=0x13C000000 Length=0x20000
Heap Offset = 0xFBFA3FD0 Stack Offset = 0xFBFA0000
Loading PEIM 52C05B14-0B98-496C-BC3B-04B50211D680
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x13FFEE240
Loading PEIM at 0x0013FFEE160 EntryPoint=0x0013FFF8B2C PeiCore.efi
Reinstall PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Reinstall PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
Reinstall PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: F894643D-C449-42D1-8EA8-85BDD8C65BDE
Loading PEIM 2FD8B7AD-F8FA-4021-9FC0-0AA572147CDC
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/CpuPei/CpuPei/DEBUG/CpuPei.dll 0x13FFEB240
Loading PEIM at 0x0013FFEB160 EntryPoint=0x0013FFEBEAC CpuPei.efi
Loading PEIM 2AD0FC59-2314-4BF3-8633-13FA22A624A0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/PlatformPei/PlatformPeim/DEBUG/PlatformPei.dll 0x13FFE7240
Loading PEIM at 0x0013FFE7160 EntryPoint=0x0013FFE773C PlatformPei.efi
Platform PEIM Loaded
PlatformPeim: PL011 UART @ 0x9000000
Install PPI: 7408D748-FC8C-4EE6-9288-C4BEC092A410
Loading PEIM 86D70125-BAA3-4296-A62F-602BEBBB9081
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/DxeIplPeim/DxeIpl/DEBUG/DxeIpl.dll 0x13FEDE240
Loading PEIM at 0x0013FEDE160 EntryPoint=0x0013FEDEDB4 DxeIpl.efi
Install PPI: EE4E5898-3914-4259-9D6E-DC7BD79403CF
Install PPI: 1A36E4E7-FAB6-476A-8E75-695A0576FDD7
Install PPI: 0AE8CE5D-E448-4437-A8D7-EBF5F194F731
Customized Guided section Memory Size required is 0x7023D0 and address is 0x13F7CB000
ProcessFvFile() FV at 0x3F7CB010, FvAlignment required is 0x10
Install PPI: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Notify: PPI Guid: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38, Peim notify entry point: AA8C
The 1th FV start address is 0x0013F7CB010, size is 0x007023C0, handle is 0x13F7CB010
Install PPI: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Notify: PPI Guid: 49EDB1C1-BF21-4761-BB12-EB0031AABB39, Peim notify entry point: AA8C
The Fv 13F7CB010 has already been processed!
DiscoverPeimsAndOrderWithApriori(): Found 0x0 PEI FFS files in the 1th FV
DXE IPL Entry
Loading PEIM D6A2CB7F-6A18-4E2F-B43B-9920A733700A
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll 0x13F787000
Loading PEIM at 0x0013F786000 EntryPoint=0x0013F787000 DxeCore.efi
Loading DXE CORE at 0x0013F786000 EntryPoint=0x0013F787000
Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6
Notify: PPI Guid: 605EA650-C65C-42E1-BA80-91A52AB618C6, Peim notify entry point: 1FBE4
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll 0x13F787000
HOBLIST address in DXE = 0x13F1A5018
Memory Allocation 0x00000004 0x13FFFF000 - 0x13FFFFFFF
Memory Allocation 0x00000004 0x13FFFE000 - 0x13FFFEFFF
Memory Allocation 0x00000004 0x13FFFD000 - 0x13FFFDFFF
Memory Allocation 0x00000004 0x13FFFC000 - 0x13FFFCFFF
Memory Allocation 0x00000004 0x13F766000 - 0x13F785FFF
Memory Allocation 0x00000003 0x13FFEE000 - 0x13FFFBFFF
Memory Allocation 0x00000003 0x13FFEB000 - 0x13FFEDFFF
Memory Allocation 0x00000003 0x13FFE7000 - 0x13FFEAFFF
Memory Allocation 0x00000004 0x13FEE6000 - 0x13FFE6FFF
Memory Allocation 0x00000003 0x13FEDE000 - 0x13FEE5FFF
Memory Allocation 0x00000004 0x13FECE000 - 0x13FEDDFFF
Memory Allocation 0x00000004 0x13F7CB000 - 0x13FECDFFF
Memory Allocation 0x00000003 0x13F786000 - 0x13F7CAFFF
Memory Allocation 0x00000003 0x13F786000 - 0x13F7CAFFF
Memory Allocation 0x00000004 0x13F766000 - 0x13F785FFF
Memory Allocation 0x00000004 0x13F765000 - 0x13F765FFF
Memory Allocation 0x00000004 0x13F764000 - 0x13F764FFF
Memory Allocation 0x00000004 0x13C000000 - 0x13C01FFFF
FV Hob            0x1000 - 0x1FFFFF
FV Hob            0x13F7CB010 - 0x13FECD3CF
FV2 Hob           0x13F7CB010 - 0x13FECD3CF
                   00000000-0000-0000-0000-000000000000 - 9E21FD93-9C72-4C15-8C4B-E77F1DB2D792
FV3 Hob           0x13F7CB010 - 0x13FECD3CF - 0x0 - 0x1
                   00000000-0000-0000-0000-000000000000 - 9E21FD93-9C72-4C15-8C4B-E77F1DB2D792
InstallProtocolInterface: D8117CFE-94A6-11D4-9A3A-0090273FC14D 13F7AF000
InstallProtocolInterface: 8F644FA9-E850-4DB1-9CE2-0B44698E8DA4 13F1A0030
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F1A0F98
InstallProtocolInterface: 8F644FA9-E850-4DB1-9CE2-0B44698E8DA4 13F1A0CB0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F1A0D98
InstallProtocolInterface: 220E73B6-6BDB-4413-8405-B974B108619A 13F1A0630
InstallProtocolInterface: 220E73B6-6BDB-4413-8405-B974B108619A 13F19C0B0
InstallProtocolInterface: FC1BCDB0-7D31-49AA-936A-A4600D9DD083 13F7AF3B0
Loading driver 9B680FCE-AD6B-4F3A-B60B-F59899003443
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AC040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/DevicePathDxe/DevicePathDxe/DEBUG/DevicePathDxe.dll 0x13F581000
Loading driver at 0x0013F580000 EntryPoint=0x0013F581AD0 DevicePathDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9F0498
ProtectUefiImageCommon - 0x3E9AC040
   - 0x000000013F580000 - 0x0000000000010000
InstallProtocolInterface: 0379BE4E-D706-437D-B037-EDB82FB772A4 13F58A250
InstallProtocolInterface: 8B843E20-8132-4852-90CC-551A4E4A7F1C 13F58A240
InstallProtocolInterface: 05C99A21-C70F-4AD2-8A5F-35DF3343F51E 13F58A230
Loading driver 80CF7257-87AB-47F9-A3FE-D50B76D89541
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AC340
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Dxe/Pcd/DEBUG/PcdDxe.dll 0x13F577000
Loading driver at 0x0013F576000 EntryPoint=0x0013F57A29C PcdDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AC918
ProtectUefiImageCommon - 0x3E9AC340
   - 0x000000013F576000 - 0x000000000000A000
InstallProtocolInterface: 11B34006-D85B-4D0A-A290-D5A571310EF7 13F57E198
InstallProtocolInterface: 13A3F0F6-264A-3EF0-F2E0-DEC512342F34 13F57E0F0
InstallProtocolInterface: 5BE40F57-FA68-4610-BBBF-E9C5FCDAD365 13F57E180
InstallProtocolInterface: FD0F4478-0EFD-461D-BA2D-E58C45FD5F5E 13F57E0E0
Loading driver 9A871B00-1C16-4F61-8D2C-93B6654B5AD6
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BD2C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmVirtPkg/FdtClientDxe/FdtClientDxe/DEBUG/FdtClientDxe.dll 0x13F56F000
Loading driver at 0x0013F56E000 EntryPoint=0x0013F570080 FdtClientDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B3018
ProtectUefiImageCommon - 0x3E9BD2C0
   - 0x000000013F56E000 - 0x0000000000008000
InitializeFdtClientDxe: DTB @ 0x13FEE6000
InstallProtocolInterface: E11FACA0-4710-4C8E-A7A2-01BAA2591B4C 13F574060
Loading driver B601F8C4-43B7-4784-95B1-F4226CB40CEE
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B30C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/RuntimeDxe/RuntimeDxe/DEBUG/RuntimeDxe.dll 0x13F5E0000
Loading driver at 0x0013F5D0000 EntryPoint=0x0013F5E0910 RuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B3798
ProtectUefiImageCommon - 0x3E9B30C0
   - 0x000000013F5D0000 - 0x0000000000040000
InstallProtocolInterface: B7DFB4E1-052F-449F-87BE-9818FC91B733 13F5F0060
Loading driver F80697E9-7FD6-4665-8646-88E33EF71DFC
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B4040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SecurityStubDxe/SecurityStubDxe/DEBUG/SecurityStubDxe.dll 0x13F568000
Loading driver at 0x0013F567000 EntryPoint=0x0013F568978 SecurityStubDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B4D18
ProtectUefiImageCommon - 0x3E9B4040
   - 0x000000013F567000 - 0x0000000000007000
InstallProtocolInterface: 94AB2F58-1438-4EF1-9152-18941A3A0E68 13F56C0B8
InstallProtocolInterface: A46423E3-4617-49F1-B9FF-D1BFA9115839 13F56C0C0
InstallProtocolInterface: 15853D7C-3DDF-43E0-A1CB-EBF85B8F872C 13F56C0B0
Loading driver 4C6E0267-C77D-410D-8100-1495911A989D
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B4440
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/EmbeddedPkg/MetronomeDxe/MetronomeDxe/DEBUG/MetronomeDxe.dll 0x13F562000
Loading driver at 0x0013F561000 EntryPoint=0x0013F562544 MetronomeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B4798
ProtectUefiImageCommon - 0x3E9B4440
   - 0x000000013F561000 - 0x0000000000006000
InstallProtocolInterface: 26BACCB2-6F42-11D4-BCE7-0080C73C8881 13F565040
Loading driver 348C4D62-BFBD-4882-9ECE-C80BB1C4783B
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B5C40
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/HiiDatabaseDxe/HiiDatabaseDxe/DEBUG/HiiDatabase.dll 0x13F542000
Loading driver at 0x0013F541000 EntryPoint=0x0013F544978 HiiDatabase.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B5798
ProtectUefiImageCommon - 0x3E9B5C40
   - 0x000000013F541000 - 0x0000000000020000
InstallProtocolInterface: E9CA4775-8657-47FC-97E7-7ED65A084324 13F55F150
InstallProtocolInterface: 0FD96974-23AA-4CDC-B9CB-98D17750322A 13F55F1C8
InstallProtocolInterface: EF9FC172-A1B2-4693-B327-6D32FC416042 13F55F1F0
InstallProtocolInterface: 587E72D7-CC50-4F79-8209-CA291FC1A10F 13F55F248
InstallProtocolInterface: 0A8BADD5-03B8-4D19-B128-7B8F0EDAA596 13F55F278
InstallProtocolInterface: 31A6406A-6BDF-4E46-B2A2-EBAA89C40920 13F55F170
InstallProtocolInterface: 1A1241E6-8F19-41A9-BC0E-E8EF39E06546 13F55F198
Loading driver D3987D4B-971A-435F-8CAF-4967EB627241
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BCBC0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SerialDxe/SerialDxe/DEBUG/SerialDxe.dll 0x13F53C000
Loading driver at 0x0013F53B000 EntryPoint=0x0013F53DA98 SerialDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BCA18
ProtectUefiImageCommon - 0x3E9BCBC0
   - 0x000000013F53B000 - 0x0000000000006000
InstallProtocolInterface: BB25CF6F-F1D4-11D2-9A0C-0090273FC1FD 13F53F090
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F53F040
Loading driver D93CE3D8-A7EB-4730-8C8E-CC466A9ECC3C
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BC1C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/ReportStatusCodeRouter/RuntimeDxe/ReportStatusCodeRouterRuntimeDxe/DEBUG/ReportStatusCodeRouterRuntimeDxe.dll 0x13C270000
Loading driver at 0x0013C260000 EntryPoint=0x0013C270808 ReportStatusCodeRouterRuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BB018
ProtectUefiImageCommon - 0x3E9BC1C0
   - 0x000000013C260000 - 0x0000000000040000
InstallProtocolInterface: 86212936-0E76-41C8-A03A-2AF2FC1C39E2 13C280060
InstallProtocolInterface: D2B2B828-0826-48A7-B3DF-983C006024F0 13C280070
Loading driver A210F973-229D-4F4D-AA37-9895E6C9EABA
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BB7C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/NetworkPkg/DpcDxe/DpcDxe/DEBUG/DpcDxe.dll 0x13F536000
Loading driver at 0x0013F535000 EntryPoint=0x0013F5368F0 DpcDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BB618
ProtectUefiImageCommon - 0x3E9BB7C0
   - 0x000000013F535000 - 0x0000000000006000
InstallProtocolInterface: 480F8AE9-0C46-4AA9-BC89-DB9FBA619806 13F539030
Loading driver 22EA234F-E72A-11E4-91F9-28D2447C4829
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BA040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/NetworkPkg/HttpUtilitiesDxe/HttpUtilitiesDxe/DEBUG/HttpUtilitiesDxe.dll 0x13F52F000
Loading driver at 0x0013F52E000 EntryPoint=0x0013F52FF44 HttpUtilitiesDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BAD18
ProtectUefiImageCommon - 0x3E9BA040
   - 0x000000013F52E000 - 0x0000000000007000
InstallProtocolInterface: 3E35C163-4074-45DD-431E-23989DD86B32 13F533050
Loading driver 13AC6DD0-73D0-11D4-B06B-00AA00BD6DE7
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BA9C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/EbcDxe/EbcDxe/DEBUG/EbcDxe.dll 0x13F525000
Loading driver at 0x0013F524000 EntryPoint=0x0013F5258A8 EbcDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BA818
ProtectUefiImageCommon - 0x3E9BA9C0
   - 0x000000013F524000 - 0x000000000000A000
InstallProtocolInterface: 13AC6DD1-73D0-11D4-B06B-00AA00BD6DE7 13E9BA418
InstallProtocolInterface: 96F46153-97A7-4793-ACC1-FA19BF78EA97 13F52C050
InstallProtocolInterface: 2755590C-6F3C-42FA-9EA4-A3BA543CDA25 13E9B9F18
InstallProtocolInterface: AAEACCFD-F27B-4C17-B610-75CA1F2DFB52 13E9B9B98
Loading driver 0049858F-8CA7-4CCD-918B-D952CBF32975
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B90C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmVirtPkg/VirtioFdtDxe/VirtioFdtDxe/DEBUG/VirtioFdtDxe.dll 0x13F51E000
Loading driver at 0x0013F51D000 EntryPoint=0x0013F51FEAC VirtioFdtDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B9A98
ProtectUefiImageCommon - 0x3E9B90C0
   - 0x000000013F51D000 - 0x0000000000007000
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B9898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B9420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B9598
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8F20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8E18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8D98
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B87A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8698
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8520
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8498
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7020
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8418
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7D18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B79A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7818
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B72A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7798
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6018
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6B20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6C18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6D20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6A18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6820
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6998
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B65A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6198
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6220
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6398
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2F20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2E18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2D98
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B27A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2698
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2520
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2498
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1020
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2418
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1D18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B19A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1818
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B12A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1798
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0018
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0B20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0C18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0D20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0A18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0820
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0998
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B05A0
Loading driver FE5CEA76-4F72-49E8-986F-2CD899DFFE5D
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AF240
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe/DEBUG/FaultTolerantWriteDxe.dll 0x13F514000
Loading driver at 0x0013F513000 EntryPoint=0x0013F514C48 FaultTolerantWriteDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AE018
ProtectUefiImageCommon - 0x3E9AF240
   - 0x000000013F513000 - 0x000000000000A000
Loading driver 4B28E4C7-FF36-4E10-93CF-A82159E777C5
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AE0C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/ResetSystemRuntimeDxe/ResetSystemRuntimeDxe/DEBUG/ResetSystemRuntimeDxe.dll 0x13C220000
Loading driver at 0x0013C210000 EntryPoint=0x0013C22091C ResetSystemRuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AEA98
ProtectUefiImageCommon - 0x3E9AE0C0
   - 0x000000013C210000 - 0x0000000000040000
InstallProtocolInterface: 27CFAC88-46CC-11D4-9A38-0090273FC14D 0
InstallProtocolInterface: 9DA34AE0-EAF9-4BBF-8EC3-FD60226C44BE 13C2300D8
InstallProtocolInterface: 695D7835-8D47-4C11-AB22-FA8ACCE7AE7A 13C230088
InstallProtocolInterface: 2DF6BA0B-7092-440D-BD04-FB091EC3F3C1 13C2300B0
Loading driver DE371F7C-DEC4-4D21-ADF1-593ABCC15882
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9ADB40
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/ArmGic/ArmGicDxe/DEBUG/ArmGicDxe.dll 0x13F50C000
Loading driver at 0x0013F50B000 EntryPoint=0x0013F50CE94 ArmGicDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AD098
ProtectUefiImageCommon - 0x3E9ADB40
   - 0x000000013F50B000 - 0x0000000000008000
Found GIC v3 (re)distributor @ 0x8000000 (0x80A0000)
InstallProtocolInterface: 2890B3EA-053D-1643-AD0C-D64808DA3FF1 13F5110E8
InstallProtocolInterface: 32898322-2DA1-474A-BAAA-F3F7CF569470 13F511078
Loading driver A487A478-51EF-48AA-8794-7BEE2A0562F1
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AD140
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ShellPkg/DynamicCommand/TftpDynamicCommand/TftpDynamicCommand/DEBUG/tftpDynamicCommand.dll 0x13F4FD000
Loading driver at 0x0013F4FC000 EntryPoint=0x0013F4FD6AC tftpDynamicCommand.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E56AD98
InstallProtocolInterface: 6A1EE763-D47A-43B4-AABE-EF1DE2AB56FC 13F508070
ProtectUefiImageCommon - 0x3E9AD140
   - 0x000000013F4FC000 - 0x000000000000F000
InstallProtocolInterface: 3C7200E9-005F-4EA4-87DE-A3DFAC8A27C3 13F5070A0
Loading driver EBF342FE-B1D3-4EF8-957C-8048606FF671
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E569140
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SetupBrowserDxe/SetupBrowserDxe/DEBUG/SetupBrowser.dll 0x13F4E1000
Loading driver at 0x0013F4E0000 EntryPoint=0x0013F4EEFB0 SetupBrowser.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E569B18
ProtectUefiImageCommon - 0x3E569140
   - 0x000000013F4E0000 - 0x000000000001C000
InstallProtocolInterface: B9D4C360-BCFB-4F9B-9298-53C136982258 13F4FA0C8
InstallProtocolInterface: A770C357-B693-4E6D-A6CF-D21C728E550B 13F4FA0F8
InstallProtocolInterface: 1F73B18D-4630-43C1-A1DE-6F80855D7DA4 13F4FA0D8
Loading driver F9D88642-0737-49BC-81B5-6889CD57D9EA
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E566BC0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SmbiosDxe/SmbiosDxe/DEBUG/SmbiosDxe.dll 0x13F4D7000
Loading driver at 0x0013F4D6000 EntryPoint=0x0013F4D9D74 SmbiosDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E566F18
ProtectUefiImageCommon - 0x3E566BC0
   - 0x000000013F4D6000 - 0x000000000000A000
Found FwCfg @ 0x9020008/0x9020000
Found FwCfg DMA @ 0x9020010
QEMU 5.1.50 monitor - type 'help' for more information
(qemu) q
[root@virtlab-arm01 ~]# start_vm_aarch64_hugetlbfs
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore/DEBUG/ArmPlatformPrePeiCore.dll 0x1800
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x7180
Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
The 0th FV start address is 0x00000001000, size is 0x001FF000, handle is 0x1000
Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
Install PPI: 6847CC74-E9EC-4F8F-A29D-AB44E754A8FC
DiscoverPeimsAndOrderWithApriori(): Found 0x7 PEI FFS files in the 0th FV
Loading PEIM 9B3ADA4F-AE56-4C24-8DEA-F03B7558AE50
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Pei/Pcd/DEBUG/PcdPeim.dll 0x1F520
Loading PEIM at 0x0000001F440 EntryPoint=0x00000020000 PcdPeim.efi
Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
Register PPI Notify: 605EA650-C65C-42E1-BA80-91A52AB618C6
Loading PEIM C61EF796-B50D-4F98-9F78-4F6F79D800D5
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/MemoryInitPei/MemoryInitPeim/DEBUG/MemoryInit.dll 0x18000
Loading PEIM at 0x00000017F20 EntryPoint=0x00000018000 MemoryInit.efi
QemuVirtMemInfoPeiLibConstructor: System RAM @ 0x40000000 - 0x13FFFFFFF
Memory Init PEIM Loaded
PeiInstallPeiMemory MemoryBegin 0x13C000000, MemoryLength 0x4000000
ArmVirtGetMemoryMap: Dumping System DRAM Memory Map:
         PhysicalBase: 0x40000000
         VirtualBase: 0x40000000
         Length: 0x100000000
Temp Stack : BaseAddress=0x4007E020 Length=0x1FE0
Temp Heap  : BaseAddress=0x4007C030 Length=0x1FF0
Total temporary memory:    16336 bytes.
   temporary memory stack ever used:       4208 bytes.
   temporary memory heap used for HobList: 3248 bytes.
   temporary memory heap occupied by memory pages: 0 bytes.
Memory Allocation 0x00000004 0x13FFFF000 - 0x13FFFFFFF
Memory Allocation 0x00000004 0x13FFFE000 - 0x13FFFEFFF
Memory Allocation 0x00000004 0x13FFFD000 - 0x13FFFDFFF
Memory Allocation 0x00000004 0x13FFFC000 - 0x13FFFCFFF
Old Stack size 8160, New stack size 131072
Stack Hob: BaseAddress=0x13C000000 Length=0x20000
Heap Offset = 0xFBFA3FD0 Stack Offset = 0xFBFA0000
Loading PEIM 52C05B14-0B98-496C-BC3B-04B50211D680
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x13FFEE240
Loading PEIM at 0x0013FFEE160 EntryPoint=0x0013FFF8B2C PeiCore.efi
Reinstall PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Reinstall PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
Reinstall PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: F894643D-C449-42D1-8EA8-85BDD8C65BDE
Loading PEIM 2FD8B7AD-F8FA-4021-9FC0-0AA572147CDC
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/CpuPei/CpuPei/DEBUG/CpuPei.dll 0x13FFEB240
Loading PEIM at 0x0013FFEB160 EntryPoint=0x0013FFEBEAC CpuPei.efi
Loading PEIM 2AD0FC59-2314-4BF3-8633-13FA22A624A0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/PlatformPei/PlatformPeim/DEBUG/PlatformPei.dll 0x13FFE7240
Loading PEIM at 0x0013FFE7160 EntryPoint=0x0013FFE773C PlatformPei.efi
Platform PEIM Loaded
PlatformPeim: PL011 UART @ 0x9000000
Install PPI: 7408D748-FC8C-4EE6-9288-C4BEC092A410
Loading PEIM 86D70125-BAA3-4296-A62F-602BEBBB9081
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/DxeIplPeim/DxeIpl/DEBUG/DxeIpl.dll 0x13FEDE240
Loading PEIM at 0x0013FEDE160 EntryPoint=0x0013FEDEDB4 DxeIpl.efi
Install PPI: EE4E5898-3914-4259-9D6E-DC7BD79403CF
Install PPI: 1A36E4E7-FAB6-476A-8E75-695A0576FDD7
Install PPI: 0AE8CE5D-E448-4437-A8D7-EBF5F194F731
Customized Guided section Memory Size required is 0x7023D0 and address is 0x13F7CB000
ProcessFvFile() FV at 0x3F7CB010, FvAlignment required is 0x10
Install PPI: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Notify: PPI Guid: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38, Peim notify entry point: AA8C
The 1th FV start address is 0x0013F7CB010, size is 0x007023C0, handle is 0x13F7CB010
Install PPI: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Notify: PPI Guid: 49EDB1C1-BF21-4761-BB12-EB0031AABB39, Peim notify entry point: AA8C
The Fv 13F7CB010 has already been processed!
DiscoverPeimsAndOrderWithApriori(): Found 0x0 PEI FFS files in the 1th FV
DXE IPL Entry
Loading PEIM D6A2CB7F-6A18-4E2F-B43B-9920A733700A
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll 0x13F787000
Loading PEIM at 0x0013F786000 EntryPoint=0x0013F787000 DxeCore.efi
Loading DXE CORE at 0x0013F786000 EntryPoint=0x0013F787000
Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6
Notify: PPI Guid: 605EA650-C65C-42E1-BA80-91A52AB618C6, Peim notify entry point: 1FBE4
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll 0x13F787000
HOBLIST address in DXE = 0x13F1A5018
Memory Allocation 0x00000004 0x13FFFF000 - 0x13FFFFFFF
Memory Allocation 0x00000004 0x13FFFE000 - 0x13FFFEFFF
Memory Allocation 0x00000004 0x13FFFD000 - 0x13FFFDFFF
Memory Allocation 0x00000004 0x13FFFC000 - 0x13FFFCFFF
Memory Allocation 0x00000004 0x13F766000 - 0x13F785FFF
Memory Allocation 0x00000003 0x13FFEE000 - 0x13FFFBFFF
Memory Allocation 0x00000003 0x13FFEB000 - 0x13FFEDFFF
Memory Allocation 0x00000003 0x13FFE7000 - 0x13FFEAFFF
Memory Allocation 0x00000004 0x13FEE6000 - 0x13FFE6FFF
Memory Allocation 0x00000003 0x13FEDE000 - 0x13FEE5FFF
Memory Allocation 0x00000004 0x13FECE000 - 0x13FEDDFFF
Memory Allocation 0x00000004 0x13F7CB000 - 0x13FECDFFF
Memory Allocation 0x00000003 0x13F786000 - 0x13F7CAFFF
Memory Allocation 0x00000003 0x13F786000 - 0x13F7CAFFF
Memory Allocation 0x00000004 0x13F766000 - 0x13F785FFF
Memory Allocation 0x00000004 0x13F765000 - 0x13F765FFF
Memory Allocation 0x00000004 0x13F764000 - 0x13F764FFF
Memory Allocation 0x00000004 0x13C000000 - 0x13C01FFFF
FV Hob            0x1000 - 0x1FFFFF
FV Hob            0x13F7CB010 - 0x13FECD3CF
FV2 Hob           0x13F7CB010 - 0x13FECD3CF
                   00000000-0000-0000-0000-000000000000 - 9E21FD93-9C72-4C15-8C4B-E77F1DB2D792
FV3 Hob           0x13F7CB010 - 0x13FECD3CF - 0x0 - 0x1
                   00000000-0000-0000-0000-000000000000 - 9E21FD93-9C72-4C15-8C4B-E77F1DB2D792
InstallProtocolInterface: D8117CFE-94A6-11D4-9A3A-0090273FC14D 13F7AF000
InstallProtocolInterface: 8F644FA9-E850-4DB1-9CE2-0B44698E8DA4 13F1A0030
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F1A0F98
InstallProtocolInterface: 8F644FA9-E850-4DB1-9CE2-0B44698E8DA4 13F1A0CB0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F1A0D98
InstallProtocolInterface: 220E73B6-6BDB-4413-8405-B974B108619A 13F1A0630
InstallProtocolInterface: 220E73B6-6BDB-4413-8405-B974B108619A 13F19C0B0
InstallProtocolInterface: FC1BCDB0-7D31-49AA-936A-A4600D9DD083 13F7AF3B0
Loading driver 9B680FCE-AD6B-4F3A-B60B-F59899003443
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AC040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/DevicePathDxe/DevicePathDxe/DEBUG/DevicePathDxe.dll 0x13F581000
Loading driver at 0x0013F580000 EntryPoint=0x0013F581AD0 DevicePathDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9F0498
ProtectUefiImageCommon - 0x3E9AC040
   - 0x000000013F580000 - 0x0000000000010000
InstallProtocolInterface: 0379BE4E-D706-437D-B037-EDB82FB772A4 13F58A250
InstallProtocolInterface: 8B843E20-8132-4852-90CC-551A4E4A7F1C 13F58A240
InstallProtocolInterface: 05C99A21-C70F-4AD2-8A5F-35DF3343F51E 13F58A230
Loading driver 80CF7257-87AB-47F9-A3FE-D50B76D89541
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AC340
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Dxe/Pcd/DEBUG/PcdDxe.dll 0x13F577000
Loading driver at 0x0013F576000 EntryPoint=0x0013F57A29C PcdDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AC918
ProtectUefiImageCommon - 0x3E9AC340
   - 0x000000013F576000 - 0x000000000000A000
InstallProtocolInterface: 11B34006-D85B-4D0A-A290-D5A571310EF7 13F57E198
InstallProtocolInterface: 13A3F0F6-264A-3EF0-F2E0-DEC512342F34 13F57E0F0
InstallProtocolInterface: 5BE40F57-FA68-4610-BBBF-E9C5FCDAD365 13F57E180
InstallProtocolInterface: FD0F4478-0EFD-461D-BA2D-E58C45FD5F5E 13F57E0E0
Loading driver 9A871B00-1C16-4F61-8D2C-93B6654B5AD6
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BD2C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmVirtPkg/FdtClientDxe/FdtClientDxe/DEBUG/FdtClientDxe.dll 0x13F56F000
Loading driver at 0x0013F56E000 EntryPoint=0x0013F570080 FdtClientDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B3018
ProtectUefiImageCommon - 0x3E9BD2C0
   - 0x000000013F56E000 - 0x0000000000008000
InitializeFdtClientDxe: DTB @ 0x13FEE6000
InstallProtocolInterface: E11FACA0-4710-4C8E-A7A2-01BAA2591B4C 13F574060
Loading driver B601F8C4-43B7-4784-95B1-F4226CB40CEE
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B30C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/RuntimeDxe/RuntimeDxe/DEBUG/RuntimeDxe.dll 0x13F5E0000
Loading driver at 0x0013F5D0000 EntryPoint=0x0013F5E0910 RuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B3798
ProtectUefiImageCommon - 0x3E9B30C0
   - 0x000000013F5D0000 - 0x0000000000040000
InstallProtocolInterface: B7DFB4E1-052F-449F-87BE-9818FC91B733 13F5F0060
Loading driver F80697E9-7FD6-4665-8646-88E33EF71DFC
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B4040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SecurityStubDxe/SecurityStubDxe/DEBUG/SecurityStubDxe.dll 0x13F568000
Loading driver at 0x0013F567000 EntryPoint=0x0013F568978 SecurityStubDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B4D18
ProtectUefiImageCommon - 0x3E9B4040
   - 0x000000013F567000 - 0x0000000000007000
InstallProtocolInterface: 94AB2F58-1438-4EF1-9152-18941A3A0E68 13F56C0B8
InstallProtocolInterface: A46423E3-4617-49F1-B9FF-D1BFA9115839 13F56C0C0
InstallProtocolInterface: 15853D7C-3DDF-43E0-A1CB-EBF85B8F872C 13F56C0B0
Loading driver 4C6E0267-C77D-410D-8100-1495911A989D
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B4440
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/EmbeddedPkg/MetronomeDxe/MetronomeDxe/DEBUG/MetronomeDxe.dll 0x13F562000
Loading driver at 0x0013F561000 EntryPoint=0x0013F562544 MetronomeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B4798
ProtectUefiImageCommon - 0x3E9B4440
   - 0x000000013F561000 - 0x0000000000006000
InstallProtocolInterface: 26BACCB2-6F42-11D4-BCE7-0080C73C8881 13F565040
Loading driver 348C4D62-BFBD-4882-9ECE-C80BB1C4783B
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B5C40
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/HiiDatabaseDxe/HiiDatabaseDxe/DEBUG/HiiDatabase.dll 0x13F542000
Loading driver at 0x0013F541000 EntryPoint=0x0013F544978 HiiDatabase.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B5798
ProtectUefiImageCommon - 0x3E9B5C40
   - 0x000000013F541000 - 0x0000000000020000
InstallProtocolInterface: E9CA4775-8657-47FC-97E7-7ED65A084324 13F55F150
InstallProtocolInterface: 0FD96974-23AA-4CDC-B9CB-98D17750322A 13F55F1C8
InstallProtocolInterface: EF9FC172-A1B2-4693-B327-6D32FC416042 13F55F1F0
InstallProtocolInterface: 587E72D7-CC50-4F79-8209-CA291FC1A10F 13F55F248
InstallProtocolInterface: 0A8BADD5-03B8-4D19-B128-7B8F0EDAA596 13F55F278
InstallProtocolInterface: 31A6406A-6BDF-4E46-B2A2-EBAA89C40920 13F55F170
InstallProtocolInterface: 1A1241E6-8F19-41A9-BC0E-E8EF39E06546 13F55F198
Loading driver D3987D4B-971A-435F-8CAF-4967EB627241
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BCBC0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SerialDxe/SerialDxe/DEBUG/SerialDxe.dll 0x13F53C000
Loading driver at 0x0013F53B000 EntryPoint=0x0013F53DA98 SerialDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BCA18
ProtectUefiImageCommon - 0x3E9BCBC0
   - 0x000000013F53B000 - 0x0000000000006000
InstallProtocolInterface: BB25CF6F-F1D4-11D2-9A0C-0090273FC1FD 13F53F090
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F53F040
Loading driver D93CE3D8-A7EB-4730-8C8E-CC466A9ECC3C
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BC1C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/ReportStatusCodeRouter/RuntimeDxe/ReportStatusCodeRouterRuntimeDxe/DEBUG/ReportStatusCodeRouterRuntimeDxe.dll 0x13C270000
Loading driver at 0x0013C260000 EntryPoint=0x0013C270808 ReportStatusCodeRouterRuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BB018
ProtectUefiImageCommon - 0x3E9BC1C0
   - 0x000000013C260000 - 0x0000000000040000
InstallProtocolInterface: 86212936-0E76-41C8-A03A-2AF2FC1C39E2 13C280060
InstallProtocolInterface: D2B2B828-0826-48A7-B3DF-983C006024F0 13C280070
Loading driver A210F973-229D-4F4D-AA37-9895E6C9EABA
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BB7C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/NetworkPkg/DpcDxe/DpcDxe/DEBUG/DpcDxe.dll 0x13F536000
Loading driver at 0x0013F535000 EntryPoint=0x0013F5368F0 DpcDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BB618
ProtectUefiImageCommon - 0x3E9BB7C0
   - 0x000000013F535000 - 0x0000000000006000
InstallProtocolInterface: 480F8AE9-0C46-4AA9-BC89-DB9FBA619806 13F539030
Loading driver 22EA234F-E72A-11E4-91F9-28D2447C4829
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BA040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/NetworkPkg/HttpUtilitiesDxe/HttpUtilitiesDxe/DEBUG/HttpUtilitiesDxe.dll 0x13F52F000
Loading driver at 0x0013F52E000 EntryPoint=0x0013F52FF44 HttpUtilitiesDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BAD18
ProtectUefiImageCommon - 0x3E9BA040
   - 0x000000013F52E000 - 0x0000000000007000
> For the initial debugging, I added some printks and got the following
> output, FYI. It indicates we're releasing the page at physical address
> 0x0, which is obviously incorrect.
> 
>     [  111.586180] stage2_map_walk_table_post: addr=0x40000000, end=0x60000000, level=2, anchor@0xfffffc0f191c0010, ptep@0xfffffc0f191c0010
> 
>     static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>          if (!data->anchor)
>                  return 0;
> 
> +       if (*ptep == 0x0) {
> +               pr_warn("%s: addr=0x%llx, end=0x%llx, level=%d, anchor@0x%lx, ptep@0x%lx\n",
> +                        __func__, addr, end, level, (unsigned long)(data->anchor),
> +                       (unsigned long)ptep);
> +       }
> +
>          free_page((unsigned long)kvm_pte_follow(*ptep));
>          put_page(virt_to_page(ptep));
>
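As a point of reference, a minimal (untested) sketch of the guard that printk
is probing, i.e. skipping the free when the descriptor is empty, could look
like the below. This is only an illustration, not the actual fix; the
completed function signature and the kvm_pte_valid() helper are assumed from
the rest of the series:

    static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
                                          kvm_pte_t *ptep,
                                          struct stage2_map_data *data)
    {
            if (!data->anchor)
                    return 0;

            /*
             * Only follow and free the child table when the descriptor is
             * valid; an empty (zero) entry has no page to release, which is
             * what the debug printk above appears to be hitting.
             */
            if (kvm_pte_valid(*ptep)) {
                    free_page((unsigned long)kvm_pte_follow(*ptep));
                    put_page(virt_to_page(ptep));
            }

            /* ... rest of the original post-order handling unchanged ... */
            return 0;
    }
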
InstallProtocolInterface: 3E35C163-4074-45DD-431E-23989DD86B32 13F533050
Loading driver 13AC6DD0-73D0-11D4-B06B-00AA00BD6DE7
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BA9C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/EbcDxe/EbcDxe/DEBUG/EbcDxe.dll 0x13F525000
Loading driver at 0x0013F524000 EntryPoint=0x0013F5258A8 EbcDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BA818
ProtectUefiImageCommon - 0x3E9BA9C0
   - 0x000000013F524000 - 0x000000000000A000
InstallProtocolInterface: 13AC6DD1-73D0-11D4-B06B-00AA00BD6DE7 13E9BA418
InstallProtocolInterface: 96F46153-97A7-4793-ACC1-FA19BF78EA97 13F52C050
InstallProtocolInterface: 2755590C-6F3C-42FA-9EA4-A3BA543CDA25 13E9B9F18
InstallProtocolInterface: AAEACCFD-F27B-4C17-B610-75CA1F2DFB52 13E9B9B98
Loading driver 0049858F-8CA7-4CCD-918B-D952CBF32975
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B90C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmVirtPkg/VirtioFdtDxe/VirtioFdtDxe/DEBUG/VirtioFdtDxe.dll 0x13F51E000
Loading driver at 0x0013F51D000 EntryPoint=0x0013F51FEAC VirtioFdtDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B9A98
ProtectUefiImageCommon - 0x3E9B90C0
   - 0x000000013F51D000 - 0x0000000000007000
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B9898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B9420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B9598
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8F20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8E18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8D98
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B87A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8698
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8520
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8498
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7020
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8418
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7D18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B79A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7818
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B72A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7798
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6018
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6B20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6C18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6D20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6A18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6820
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6998
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B65A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6198
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6220
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6398
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2F20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2E18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2D98
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B27A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2698
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2520
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2498
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1020
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2418
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1D18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B19A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1818
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B12A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1798
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0018
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0B20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0C18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0D20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0A18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0820
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0998
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B05A0
Loading driver FE5CEA76-4F72-49E8-986F-2CD899DFFE5D
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AF240
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe/DEBUG/FaultTolerantWriteDxe.dll 0x13F514000
Loading driver at 0x0013F513000 EntryPoint=0x0013F514C48 FaultTolerantWriteDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AE018
ProtectUefiImageCommon - 0x3E9AF240
   - 0x000000013F513000 - 0x000000000000A000
Loading driver 4B28E4C7-FF36-4E10-93CF-A82159E777C5
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AE0C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/ResetSystemRuntimeDxe/ResetSystemRuntimeDxe/DEBUG/ResetSystemRuntimeDxe.dll 0x13C220000
Loading driver at 0x0013C210000 EntryPoint=0x0013C22091C ResetSystemRuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AEA98
ProtectUefiImageCommon - 0x3E9AE0C0
   - 0x000000013C210000 - 0x0000000000040000
InstallProtocolInterface: 27CFAC88-46CC-11D4-9A38-0090273FC14D 0
InstallProtocolInterface: 9DA34AE0-EAF9-4BBF-8EC3-FD60226C44BE 13C2300D8
InstallProtocolInterface: 695D7835-8D47-4C11-AB22-FA8ACCE7AE7A 13C230088
InstallProtocolInterface: 2DF6BA0B-7092-440D-BD04-FB091EC3F3C1 13C2300B0
Loading driver DE371F7C-DEC4-4D21-ADF1-593ABCC15882
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9ADB40
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/ArmGic/ArmGicDxe/DEBUG/ArmGicDxe.dll 0x13F50C000
Loading driver at 0x0013F50B000 EntryPoint=0x0013F50CE94 ArmGicDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AD098
ProtectUefiImageCommon - 0x3E9ADB40
   - 0x000000013F50B000 - 0x0000000000008000
Found GIC v3 (re)distributor @ 0x8000000 (0x80A0000)
InstallProtocolInterface: 2890B3EA-053D-1643-AD0C-D64808DA3FF1 13F5110E8
InstallProtocolInterface: 32898322-2DA1-474A-BAAA-F3F7CF569470 13F511078
Loading driver A487A478-51EF-48AA-8794-7BEE2A0562F1
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AD140
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ShellPkg/DynamicCommand/TftpDynamicCommand/TftpDynamicCommand/DEBUG/tftpDynamicCommand.dll 0x13F4FD000
Loading driver at 0x0013F4FC000 EntryPoint=0x0013F4FD6AC tftpDynamicCommand.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E56AD98
InstallProtocolInterface: 6A1EE763-D47A-43B4-AABE-EF1DE2AB56FC 13F508070
ProtectUefiImageCommon - 0x3E9AD140
   - 0x000000013F4FC000 - 0x000000000000F000
InstallProtocolInterface: 3C7200E9-005F-4EA4-87DE-A3DFAC8A27C3 13F5070A0
Loading driver EBF342FE-B1D3-4EF8-957C-8048606FF671
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E569140
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SetupBrowserDxe/SetupBrowserDxe/DEBUG/SetupBrowser.dll 0x13F4E1000
Loading driver at 0x0013F4E0000 EntryPoint=0x0013F4EEFB0 SetupBrowser.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E569B18
ProtectUefiImageCommon - 0x3E569140
   - 0x000000013F4E0000 - 0x000000000001C000
InstallProtocolInterface: B9D4C360-BCFB-4F9B-9298-53C136982258 13F4FA0C8
InstallProtocolInterface: A770C357-B693-4E6D-A6CF-D21C728E550B 13F4FA0F8
InstallProtocolInterface: 1F73B18D-4630-43C1-A1DE-6F80855D7DA4 13F4FA0D8
Loading driver F9D88642-0737-49BC-81B5-6889CD57D9EA
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E566BC0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SmbiosDxe/SmbiosDxe/DEBUG/SmbiosDxe.dll 0x13F4D7000
Loading driver at 0x0013F4D6000 EntryPoint=0x0013F4D9D74 SmbiosDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E566F18
ProtectUefiImageCommon - 0x3E566BC0
   - 0x000000013F4D6000 - 0x000000000000A000
Found FwCfg @ 0x9020008/0x9020000
Found FwCfg DMA @ 0x9020010
QEMU 5.1.50 monitor - type 'help' for more information
(qemu) q
[root@virtlab-arm01 ~]# Connection to virtlab-arm01.virt.lab.eng.bos.redhat.com closed by remote host.
Connection to virtlab-arm01.virt.lab.eng.bos.redhat.com closed.
[gwshan@gshan ~]$ to_arm
Activate the web console with: systemctl enable --now cockpit.socket

This system is not registered to Red Hat Insights. See https://cloud.redhat.com/
To register this system, run: insights-client --register

Last login: Thu Sep  3 07:29:54 2020 from 10.64.54.159
[root@virtlab-arm01 ~]# start_vm_aarch64_hugetlbfs
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/PrePeiCore/PrePeiCoreUniCore/DEBUG/ArmPlatformPrePeiCore.dll 0x1800
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x7180
Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
The 0th FV start address is 0x00000001000, size is 0x001FF000, handle is 0x1000
Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
Install PPI: 6847CC74-E9EC-4F8F-A29D-AB44E754A8FC
DiscoverPeimsAndOrderWithApriori(): Found 0x7 PEI FFS files in the 0th FV
Loading PEIM 9B3ADA4F-AE56-4C24-8DEA-F03B7558AE50
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Pei/Pcd/DEBUG/PcdPeim.dll 0x1F520
Loading PEIM at 0x0000001F440 EntryPoint=0x00000020000 PcdPeim.efi
Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
Register PPI Notify: 605EA650-C65C-42E1-BA80-91A52AB618C6
Loading PEIM C61EF796-B50D-4F98-9F78-4F6F79D800D5
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/MemoryInitPei/MemoryInitPeim/DEBUG/MemoryInit.dll 0x18000
Loading PEIM at 0x00000017F20 EntryPoint=0x00000018000 MemoryInit.efi
QemuVirtMemInfoPeiLibConstructor: System RAM @ 0x40000000 - 0x13FFFFFFF
Memory Init PEIM Loaded
PeiInstallPeiMemory MemoryBegin 0x13C000000, MemoryLength 0x4000000
ArmVirtGetMemoryMap: Dumping System DRAM Memory Map:
         PhysicalBase: 0x40000000
         VirtualBase: 0x40000000
         Length: 0x100000000
Temp Stack : BaseAddress=0x4007E020 Length=0x1FE0
Temp Heap  : BaseAddress=0x4007C030 Length=0x1FF0
Total temporary memory:    16336 bytes.
   temporary memory stack ever used:       4208 bytes.
   temporary memory heap used for HobList: 3248 bytes.
   temporary memory heap occupied by memory pages: 0 bytes.
Memory Allocation 0x00000004 0x13FFFF000 - 0x13FFFFFFF
Memory Allocation 0x00000004 0x13FFFE000 - 0x13FFFEFFF
Memory Allocation 0x00000004 0x13FFFD000 - 0x13FFFDFFF
Memory Allocation 0x00000004 0x13FFFC000 - 0x13FFFCFFF
Old Stack size 8160, New stack size 131072
Stack Hob: BaseAddress=0x13C000000 Length=0x20000
Heap Offset = 0xFBFA3FD0 Stack Offset = 0xFBFA0000
Loading PEIM 52C05B14-0B98-496C-BC3B-04B50211D680
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Pei/PeiMain/DEBUG/PeiCore.dll 0x13FFEE240
Loading PEIM at 0x0013FFEE160 EntryPoint=0x0013FFF8B2C PeiCore.efi
Reinstall PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
Reinstall PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
Reinstall PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
Install PPI: F894643D-C449-42D1-8EA8-85BDD8C65BDE
Loading PEIM 2FD8B7AD-F8FA-4021-9FC0-0AA572147CDC
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/CpuPei/CpuPei/DEBUG/CpuPei.dll 0x13FFEB240
Loading PEIM at 0x0013FFEB160 EntryPoint=0x0013FFEBEAC CpuPei.efi
Loading PEIM 2AD0FC59-2314-4BF3-8633-13FA22A624A0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPlatformPkg/PlatformPei/PlatformPeim/DEBUG/PlatformPei.dll 0x13FFE7240
Loading PEIM at 0x0013FFE7160 EntryPoint=0x0013FFE773C PlatformPei.efi
Platform PEIM Loaded
PlatformPeim: PL011 UART @ 0x9000000
Install PPI: 7408D748-FC8C-4EE6-9288-C4BEC092A410
Loading PEIM 86D70125-BAA3-4296-A62F-602BEBBB9081
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/DxeIplPeim/DxeIpl/DEBUG/DxeIpl.dll 0x13FEDE240
Loading PEIM at 0x0013FEDE160 EntryPoint=0x0013FEDEDB4 DxeIpl.efi
Install PPI: EE4E5898-3914-4259-9D6E-DC7BD79403CF
Install PPI: 1A36E4E7-FAB6-476A-8E75-695A0576FDD7
Install PPI: 0AE8CE5D-E448-4437-A8D7-EBF5F194F731
Customized Guided section Memory Size required is 0x7023D0 and address is 0x13F7CB000
ProcessFvFile() FV at 0x3F7CB010, FvAlignment required is 0x10
Install PPI: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
Notify: PPI Guid: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38, Peim notify entry point: AA8C
The 1th FV start address is 0x0013F7CB010, size is 0x007023C0, handle is 0x13F7CB010
Install PPI: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
Notify: PPI Guid: 49EDB1C1-BF21-4761-BB12-EB0031AABB39, Peim notify entry point: AA8C
The Fv 13F7CB010 has already been processed!
DiscoverPeimsAndOrderWithApriori(): Found 0x0 PEI FFS files in the 1th FV
DXE IPL Entry
Loading PEIM D6A2CB7F-6A18-4E2F-B43B-9920A733700A
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll 0x13F787000
Loading PEIM at 0x0013F786000 EntryPoint=0x0013F787000 DxeCore.efi
Loading DXE CORE at 0x0013F786000 EntryPoint=0x0013F787000
Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6
Notify: PPI Guid: 605EA650-C65C-42E1-BA80-91A52AB618C6, Peim notify entry point: 1FBE4
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll 0x13F787000
HOBLIST address in DXE = 0x13F1A5018
Memory Allocation 0x00000004 0x13FFFF000 - 0x13FFFFFFF
Memory Allocation 0x00000004 0x13FFFE000 - 0x13FFFEFFF
Memory Allocation 0x00000004 0x13FFFD000 - 0x13FFFDFFF
Memory Allocation 0x00000004 0x13FFFC000 - 0x13FFFCFFF
Memory Allocation 0x00000004 0x13F766000 - 0x13F785FFF
Memory Allocation 0x00000003 0x13FFEE000 - 0x13FFFBFFF
Memory Allocation 0x00000003 0x13FFEB000 - 0x13FFEDFFF
Memory Allocation 0x00000003 0x13FFE7000 - 0x13FFEAFFF
Memory Allocation 0x00000004 0x13FEE6000 - 0x13FFE6FFF
Memory Allocation 0x00000003 0x13FEDE000 - 0x13FEE5FFF
Memory Allocation 0x00000004 0x13FECE000 - 0x13FEDDFFF
Memory Allocation 0x00000004 0x13F7CB000 - 0x13FECDFFF
Memory Allocation 0x00000003 0x13F786000 - 0x13F7CAFFF
Memory Allocation 0x00000003 0x13F786000 - 0x13F7CAFFF
Memory Allocation 0x00000004 0x13F766000 - 0x13F785FFF
Memory Allocation 0x00000004 0x13F765000 - 0x13F765FFF
Memory Allocation 0x00000004 0x13F764000 - 0x13F764FFF
Memory Allocation 0x00000004 0x13C000000 - 0x13C01FFFF
FV Hob            0x1000 - 0x1FFFFF
FV Hob            0x13F7CB010 - 0x13FECD3CF
FV2 Hob           0x13F7CB010 - 0x13FECD3CF
                   00000000-0000-0000-0000-000000000000 - 9E21FD93-9C72-4C15-8C4B-E77F1DB2D792
FV3 Hob           0x13F7CB010 - 0x13FECD3CF - 0x0 - 0x1
                   00000000-0000-0000-0000-000000000000 - 9E21FD93-9C72-4C15-8C4B-E77F1DB2D792
InstallProtocolInterface: D8117CFE-94A6-11D4-9A3A-0090273FC14D 13F7AF000
InstallProtocolInterface: 8F644FA9-E850-4DB1-9CE2-0B44698E8DA4 13F1A0030
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F1A0F98
InstallProtocolInterface: 8F644FA9-E850-4DB1-9CE2-0B44698E8DA4 13F1A0CB0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F1A0D98
InstallProtocolInterface: 220E73B6-6BDB-4413-8405-B974B108619A 13F1A0630
InstallProtocolInterface: 220E73B6-6BDB-4413-8405-B974B108619A 13F19C0B0
InstallProtocolInterface: FC1BCDB0-7D31-49AA-936A-A4600D9DD083 13F7AF3B0
Loading driver 9B680FCE-AD6B-4F3A-B60B-F59899003443
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AC040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/DevicePathDxe/DevicePathDxe/DEBUG/DevicePathDxe.dll 0x13F581000
Loading driver at 0x0013F580000 EntryPoint=0x0013F581AD0 DevicePathDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9F0498
ProtectUefiImageCommon - 0x3E9AC040
   - 0x000000013F580000 - 0x0000000000010000
InstallProtocolInterface: 0379BE4E-D706-437D-B037-EDB82FB772A4 13F58A250
InstallProtocolInterface: 8B843E20-8132-4852-90CC-551A4E4A7F1C 13F58A240
InstallProtocolInterface: 05C99A21-C70F-4AD2-8A5F-35DF3343F51E 13F58A230
Loading driver 80CF7257-87AB-47F9-A3FE-D50B76D89541
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AC340
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/PCD/Dxe/Pcd/DEBUG/PcdDxe.dll 0x13F577000
Loading driver at 0x0013F576000 EntryPoint=0x0013F57A29C PcdDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AC918
ProtectUefiImageCommon - 0x3E9AC340
   - 0x000000013F576000 - 0x000000000000A000
InstallProtocolInterface: 11B34006-D85B-4D0A-A290-D5A571310EF7 13F57E198
InstallProtocolInterface: 13A3F0F6-264A-3EF0-F2E0-DEC512342F34 13F57E0F0
InstallProtocolInterface: 5BE40F57-FA68-4610-BBBF-E9C5FCDAD365 13F57E180
InstallProtocolInterface: FD0F4478-0EFD-461D-BA2D-E58C45FD5F5E 13F57E0E0
Loading driver 9A871B00-1C16-4F61-8D2C-93B6654B5AD6
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BD2C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmVirtPkg/FdtClientDxe/FdtClientDxe/DEBUG/FdtClientDxe.dll 0x13F56F000
Loading driver at 0x0013F56E000 EntryPoint=0x0013F570080 FdtClientDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B3018
ProtectUefiImageCommon - 0x3E9BD2C0
   - 0x000000013F56E000 - 0x0000000000008000
InitializeFdtClientDxe: DTB @ 0x13FEE6000
InstallProtocolInterface: E11FACA0-4710-4C8E-A7A2-01BAA2591B4C 13F574060
Loading driver B601F8C4-43B7-4784-95B1-F4226CB40CEE
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B30C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Core/RuntimeDxe/RuntimeDxe/DEBUG/RuntimeDxe.dll 0x13F5E0000
Loading driver at 0x0013F5D0000 EntryPoint=0x0013F5E0910 RuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B3798
ProtectUefiImageCommon - 0x3E9B30C0
   - 0x000000013F5D0000 - 0x0000000000040000
InstallProtocolInterface: B7DFB4E1-052F-449F-87BE-9818FC91B733 13F5F0060
Loading driver F80697E9-7FD6-4665-8646-88E33EF71DFC
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B4040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SecurityStubDxe/SecurityStubDxe/DEBUG/SecurityStubDxe.dll 0x13F568000
Loading driver at 0x0013F567000 EntryPoint=0x0013F568978 SecurityStubDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B4D18
ProtectUefiImageCommon - 0x3E9B4040
   - 0x000000013F567000 - 0x0000000000007000
InstallProtocolInterface: 94AB2F58-1438-4EF1-9152-18941A3A0E68 13F56C0B8
InstallProtocolInterface: A46423E3-4617-49F1-B9FF-D1BFA9115839 13F56C0C0
InstallProtocolInterface: 15853D7C-3DDF-43E0-A1CB-EBF85B8F872C 13F56C0B0
Loading driver 4C6E0267-C77D-410D-8100-1495911A989D
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B4440
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/EmbeddedPkg/MetronomeDxe/MetronomeDxe/DEBUG/MetronomeDxe.dll 0x13F562000
Loading driver at 0x0013F561000 EntryPoint=0x0013F562544 MetronomeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B4798
ProtectUefiImageCommon - 0x3E9B4440
   - 0x000000013F561000 - 0x0000000000006000
InstallProtocolInterface: 26BACCB2-6F42-11D4-BCE7-0080C73C8881 13F565040
Loading driver 348C4D62-BFBD-4882-9ECE-C80BB1C4783B
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B5C40
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/HiiDatabaseDxe/HiiDatabaseDxe/DEBUG/HiiDatabase.dll 0x13F542000
Loading driver at 0x0013F541000 EntryPoint=0x0013F544978 HiiDatabase.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B5798
ProtectUefiImageCommon - 0x3E9B5C40
   - 0x000000013F541000 - 0x0000000000020000
InstallProtocolInterface: E9CA4775-8657-47FC-97E7-7ED65A084324 13F55F150
InstallProtocolInterface: 0FD96974-23AA-4CDC-B9CB-98D17750322A 13F55F1C8
InstallProtocolInterface: EF9FC172-A1B2-4693-B327-6D32FC416042 13F55F1F0
InstallProtocolInterface: 587E72D7-CC50-4F79-8209-CA291FC1A10F 13F55F248
InstallProtocolInterface: 0A8BADD5-03B8-4D19-B128-7B8F0EDAA596 13F55F278
InstallProtocolInterface: 31A6406A-6BDF-4E46-B2A2-EBAA89C40920 13F55F170
InstallProtocolInterface: 1A1241E6-8F19-41A9-BC0E-E8EF39E06546 13F55F198
Loading driver D3987D4B-971A-435F-8CAF-4967EB627241
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BCBC0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SerialDxe/SerialDxe/DEBUG/SerialDxe.dll 0x13F53C000
Loading driver at 0x0013F53B000 EntryPoint=0x0013F53DA98 SerialDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BCA18
ProtectUefiImageCommon - 0x3E9BCBC0
   - 0x000000013F53B000 - 0x0000000000006000
InstallProtocolInterface: BB25CF6F-F1D4-11D2-9A0C-0090273FC1FD 13F53F090
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13F53F040
Loading driver D93CE3D8-A7EB-4730-8C8E-CC466A9ECC3C
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BC1C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/ReportStatusCodeRouter/RuntimeDxe/ReportStatusCodeRouterRuntimeDxe/DEBUG/ReportStatusCodeRouterRuntimeDxe.dll 0x13C270000
Loading driver at 0x0013C260000 EntryPoint=0x0013C270808 ReportStatusCodeRouterRuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BB018
ProtectUefiImageCommon - 0x3E9BC1C0
   - 0x000000013C260000 - 0x0000000000040000
InstallProtocolInterface: 86212936-0E76-41C8-A03A-2AF2FC1C39E2 13C280060
InstallProtocolInterface: D2B2B828-0826-48A7-B3DF-983C006024F0 13C280070
Loading driver A210F973-229D-4F4D-AA37-9895E6C9EABA
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BB7C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/NetworkPkg/DpcDxe/DpcDxe/DEBUG/DpcDxe.dll 0x13F536000
Loading driver at 0x0013F535000 EntryPoint=0x0013F5368F0 DpcDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BB618
ProtectUefiImageCommon - 0x3E9BB7C0
   - 0x000000013F535000 - 0x0000000000006000
InstallProtocolInterface: 480F8AE9-0C46-4AA9-BC89-DB9FBA619806 13F539030
Loading driver 22EA234F-E72A-11E4-91F9-28D2447C4829
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BA040
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/NetworkPkg/HttpUtilitiesDxe/HttpUtilitiesDxe/DEBUG/HttpUtilitiesDxe.dll 0x13F52F000
Loading driver at 0x0013F52E000 EntryPoint=0x0013F52FF44 HttpUtilitiesDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BAD18
ProtectUefiImageCommon - 0x3E9BA040
   - 0x000000013F52E000 - 0x0000000000007000
InstallProtocolInterface: 3E35C163-4074-45DD-431E-23989DD86B32 13F533050
Loading driver 13AC6DD0-73D0-11D4-B06B-00AA00BD6DE7
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9BA9C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/EbcDxe/EbcDxe/DEBUG/EbcDxe.dll 0x13F525000
Loading driver at 0x0013F524000 EntryPoint=0x0013F5258A8 EbcDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9BA818
ProtectUefiImageCommon - 0x3E9BA9C0
   - 0x000000013F524000 - 0x000000000000A000
InstallProtocolInterface: 13AC6DD1-73D0-11D4-B06B-00AA00BD6DE7 13E9BA418
InstallProtocolInterface: 96F46153-97A7-4793-ACC1-FA19BF78EA97 13F52C050
InstallProtocolInterface: 2755590C-6F3C-42FA-9EA4-A3BA543CDA25 13E9B9F18
InstallProtocolInterface: AAEACCFD-F27B-4C17-B610-75CA1F2DFB52 13E9B9B98
Loading driver 0049858F-8CA7-4CCD-918B-D952CBF32975
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9B90C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmVirtPkg/VirtioFdtDxe/VirtioFdtDxe/DEBUG/VirtioFdtDxe.dll 0x13F51E000
Loading driver at 0x0013F51D000 EntryPoint=0x0013F51FEAC VirtioFdtDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9B9A98
ProtectUefiImageCommon - 0x3E9B90C0
   - 0x000000013F51D000 - 0x0000000000007000
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B9898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B9420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B9598
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8F20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8E18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8D98
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B87A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8698
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B8520
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8498
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7020
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B8418
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7D18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B79A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7818
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B72A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B7798
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B7420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6018
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6B20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6C18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6D20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6A18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6820
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6998
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B65A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6198
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B6220
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B6398
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2F20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2E18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2D98
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B27A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2698
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B2520
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2498
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1020
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B2418
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1C20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1D18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1120
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1898
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B19A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1818
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B12A0
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B1798
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B1420
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0018
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0B20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0C18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0D20
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0A18
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B0820
InstallProtocolInterface: 09576E91-6D3F-11D2-8E39-00A0C969723B 13E9B0998
VirtioMmioInit: Warning: The VendorId (0x554D4551) does not match the VirtIo VendorId (0x1AF4).
InstallProtocolInterface: FA920010-6785-4941-B6EC-498C579F160A 13E9B05A0
Loading driver FE5CEA76-4F72-49E8-986F-2CD899DFFE5D
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AF240
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/FaultTolerantWriteDxe/FaultTolerantWriteDxe/DEBUG/FaultTolerantWriteDxe.dll 0x13F514000
Loading driver at 0x0013F513000 EntryPoint=0x0013F514C48 FaultTolerantWriteDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AE018
ProtectUefiImageCommon - 0x3E9AF240
   - 0x000000013F513000 - 0x000000000000A000
Loading driver 4B28E4C7-FF36-4E10-93CF-A82159E777C5
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AE0C0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/ResetSystemRuntimeDxe/ResetSystemRuntimeDxe/DEBUG/ResetSystemRuntimeDxe.dll 0x13C220000
Loading driver at 0x0013C210000 EntryPoint=0x0013C22091C ResetSystemRuntimeDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AEA98
ProtectUefiImageCommon - 0x3E9AE0C0
   - 0x000000013C210000 - 0x0000000000040000
InstallProtocolInterface: 27CFAC88-46CC-11D4-9A38-0090273FC14D 0
InstallProtocolInterface: 9DA34AE0-EAF9-4BBF-8EC3-FD60226C44BE 13C2300D8
InstallProtocolInterface: 695D7835-8D47-4C11-AB22-FA8ACCE7AE7A 13C230088
InstallProtocolInterface: 2DF6BA0B-7092-440D-BD04-FB091EC3F3C1 13C2300B0
Loading driver DE371F7C-DEC4-4D21-ADF1-593ABCC15882
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9ADB40
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ArmPkg/Drivers/ArmGic/ArmGicDxe/DEBUG/ArmGicDxe.dll 0x13F50C000
Loading driver at 0x0013F50B000 EntryPoint=0x0013F50CE94 ArmGicDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E9AD098
ProtectUefiImageCommon - 0x3E9ADB40
   - 0x000000013F50B000 - 0x0000000000008000
Found GIC v3 (re)distributor @ 0x8000000 (0x80A0000)
InstallProtocolInterface: 2890B3EA-053D-1643-AD0C-D64808DA3FF1 13F5110E8
InstallProtocolInterface: 32898322-2DA1-474A-BAAA-F3F7CF569470 13F511078
Loading driver A487A478-51EF-48AA-8794-7BEE2A0562F1
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E9AD140
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/ShellPkg/DynamicCommand/TftpDynamicCommand/TftpDynamicCommand/DEBUG/tftpDynamicCommand.dll 0x13F4FD000
Loading driver at 0x0013F4FC000 EntryPoint=0x0013F4FD6AC tftpDynamicCommand.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E56AD98
InstallProtocolInterface: 6A1EE763-D47A-43B4-AABE-EF1DE2AB56FC 13F508070
ProtectUefiImageCommon - 0x3E9AD140
   - 0x000000013F4FC000 - 0x000000000000F000
InstallProtocolInterface: 3C7200E9-005F-4EA4-87DE-A3DFAC8A27C3 13F5070A0
Loading driver EBF342FE-B1D3-4EF8-957C-8048606FF671
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E569140
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SetupBrowserDxe/SetupBrowserDxe/DEBUG/SetupBrowser.dll 0x13F4E1000
Loading driver at 0x0013F4E0000 EntryPoint=0x0013F4EEFB0 SetupBrowser.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E569B18
ProtectUefiImageCommon - 0x3E569140
   - 0x000000013F4E0000 - 0x000000000001C000
InstallProtocolInterface: B9D4C360-BCFB-4F9B-9298-53C136982258 13F4FA0C8
InstallProtocolInterface: A770C357-B693-4E6D-A6CF-D21C728E550B 13F4FA0F8
InstallProtocolInterface: 1F73B18D-4630-43C1-A1DE-6F80855D7DA4 13F4FA0D8
Loading driver F9D88642-0737-49BC-81B5-6889CD57D9EA
InstallProtocolInterface: 5B1B31A1-9562-11D2-8E3F-00A0C969723B 13E566BC0
add-symbol-file /home/lacos/src/upstream/qemu/roms/edk2/Build/ArmVirtQemu-AARCH64/DEBUG_GCC5/AARCH64/MdeModulePkg/Universal/SmbiosDxe/SmbiosDxe/DEBUG/SmbiosDxe.dll 0x13F4D7000
Loading driver at 0x0013F4D6000 EntryPoint=0x0013F4D9D74 SmbiosDxe.efi
InstallProtocolInterface: BC62157E-3E33-4FEC-9920-2D3B36D750DF 13E566F18
ProtectUefiImageCommon - 0x3E566BC0
   - 0x000000013F4D6000 - 0x000000000000A000
Found FwCfg @ 0x9020008/0x9020000
Found FwCfg DMA @ 0x9020010

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-03 11:48     ` Gavin Shan
@ 2020-09-03 12:16       ` Will Deacon
  2020-09-04  0:51         ` Gavin Shan
  0 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-03 12:16 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

On Thu, Sep 03, 2020 at 09:48:18PM +1000, Gavin Shan wrote:
> On 9/3/20 9:13 PM, Gavin Shan wrote:
> > On 9/3/20 5:34 PM, Gavin Shan wrote:
> > > On 8/25/20 7:39 PM, Will Deacon wrote:
> > > > Hello folks,
> > > > 
> > > > This is version three of the KVM page-table rework that I previously posted
> > > > here:
> > > > 
> > > >    v1: https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
> > > >    v2: https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org
> > > > 
> > > > Changes since v2 include:
> > > > 
> > > >    * Rebased onto -rc2, which includes the conflicting OOM blocking fixes
> > > >    * Dropped the patch trying to "fix" the memcache in kvm_phys_addr_ioremap()
> > > > 
> > > 
> > > It's really nice work, making the code unified/simplified greatly.
> > > However, it seems it doesn't work well with HugeTLBfs. Please refer
> > > to the following test result and see if you have quick idea, or I
> > > can debug it a bit :)

Nice testing matrix, and thanks for reporting the problem!

> > > Machine         Host                     Guest              Result
> > > ===============================================================
> > > ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
> > >               PAGE_SIZE: 64KB                    64KB     passed
> > >               THP:       disabled
> > >               HugeTLB:   disabled
> > > ---------------------------------------------------------------
> > > ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
> > >               PAGE_SIZE: 64KB                    64KB     passed
> > >               THP:       enabled
> > >               HugeTLB:   disabled
> > > ----------------------------------------------------------------
> > > ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Fail[1]
> > >               PAGE_SIZE: 64KB                    64KB     Fail[1]
> > >               THP:       disabled
> > >               HugeTLB:   enabled
> > > ---------------------------------------------------------------
> > > ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
> > >               PAGE_SIZE: 4KB                     64KB     Passed
> > >               THP:       disabled
> > >               HugeTLB:   disabled
> > > ---------------------------------------------------------------
> > > ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
> > >               PAGE_SIZE: 4KB                     64KB     Passed
> > >               THP:       enabled
> > >               HugeTLB:   disabled
> > > --------------------------------------------------------------
> > > ThunderX2    VA_BITS:   39           PAGE_SIZE: 4KB     Fail[2]
> > >               PAGE_SIZE: 4KB                    64KB     Fail[2]
> > >               THP:       disabled
> > >               HugeTLB:   enabled
> > > 
> > 
> > I debugged the code and found the issue is caused by the following
> > patch.
> > 
> > [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table

(I think this is just a symptom of the page-table being out of whack)

> Sorry that the guest could hang sometimes with the above changes. I have no idea
> what is going wrong yet; I'm going to debug it some more. I'm pasting the command
> used and the output from the guest.

Can you try the diff below, please? I think we can end up sticking down a
huge-page-sized mapping at an unaligned address, which causes us both to
overmap and also to fail to use the huge granule for a block mapping.
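
As a rough illustration of that failure mode (all addresses below are made up;
this is just a sketch of the arithmetic, not code from the series):

	/* Hypothetical hugetlb fault backed by a 2MiB page (PMD_SIZE = 1 << 21). */
	phys_addr_t fault_ipa = 0x40130000;			/* not 2MiB-aligned */
	phys_addr_t block_ipa = fault_ipa & ~(PMD_SIZE - 1);	/* 0x40000000 */

	/*
	 * Before the diff, only gfn was derived from block_ipa while the
	 * unaligned fault_ipa was still handed to kvm_pgtable_stage2_map()
	 * with a PMD_SIZE length, so the mapping covered
	 * 0x40130000..0x40330000: it spills past the 2MiB block (overmap)
	 * and, being unaligned, can never be installed as a block entry.
	 * With the diff, fault_ipa itself is masked first, so both gfn and
	 * the mapping start at 0x40000000 and a single block entry fits.
	 */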

Cheers,

Will

--->8

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f28e03dcb897..3bff942e5f33 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -737,11 +737,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        bool exec_fault;
        bool device = false;
        unsigned long mmu_seq;
-       gfn_t gfn = fault_ipa >> PAGE_SHIFT;
        struct kvm *kvm = vcpu->kvm;
        struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
        struct vm_area_struct *vma;
        short vma_shift;
+       gfn_t gfn;
        kvm_pfn_t pfn;
        bool logging_active = memslot_is_logging(memslot);
        unsigned long vma_pagesize;
@@ -780,7 +780,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        }
 
        if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
-               gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+               fault_ipa &= huge_page_mask(hstate_vma(vma));
+
+       gfn = fault_ipa >> PAGE_SHIFT;
        mmap_read_unlock(current->mm);
 
        if (fault_status != FSC_PERM) {



^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-09-03 11:18   ` Gavin Shan
@ 2020-09-03 12:30     ` Will Deacon
  2020-09-03 16:15       ` Will Deacon
  0 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-03 12:30 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

On Thu, Sep 03, 2020 at 09:18:27PM +1000, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
> > Add stage-2 map() and unmap() operations to the generic page-table code.
> > 
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >   arch/arm64/include/asm/kvm_pgtable.h |  39 ++++
> >   arch/arm64/kvm/hyp/pgtable.c         | 262 +++++++++++++++++++++++++++
> >   2 files changed, 301 insertions(+)

[...]

> > +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> > +				      kvm_pte_t *ptep,
> > +				      struct stage2_map_data *data)
> > +{
> > +	int ret = 0;
> > +
> > +	if (!data->anchor)
> > +		return 0;
> > +
> > +	free_page((unsigned long)kvm_pte_follow(*ptep));
> > +	put_page(virt_to_page(ptep));
> > +
> > +	if (data->anchor == ptep) {
> > +		data->anchor = NULL;
> > +		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> 
> As discussed in another thread, *ptep has been invalidated in stage2_map_walk_table_pre().
> It means *ptep has value of zero. The following call to free_page() is going to release
> the page frame corresponding to physical address 0x0. It's not correct. We might cache
> the original value of this page table entry so that it can be used here.

Ah, yes, I see what you mean. But it's odd that I haven't run into this
myself, so let me try to reproduce the issue first. Another solution is
to invalidate the table entry only by clearing the valid bit of the pte,
rather than zapping the entire thing to 0, which can be done later when we
clear the anchor.

Cheers,

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-09-03 12:30     ` Will Deacon
@ 2020-09-03 16:15       ` Will Deacon
  2020-09-04  0:47         ` Gavin Shan
  0 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-03 16:15 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

Hi Gavin,

On Thu, Sep 03, 2020 at 01:30:32PM +0100, Will Deacon wrote:
> On Thu, Sep 03, 2020 at 09:18:27PM +1000, Gavin Shan wrote:
> > On 8/25/20 7:39 PM, Will Deacon wrote:
> > > +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
> > > +				      kvm_pte_t *ptep,
> > > +				      struct stage2_map_data *data)
> > > +{
> > > +	int ret = 0;
> > > +
> > > +	if (!data->anchor)
> > > +		return 0;
> > > +
> > > +	free_page((unsigned long)kvm_pte_follow(*ptep));
> > > +	put_page(virt_to_page(ptep));
> > > +
> > > +	if (data->anchor == ptep) {
> > > +		data->anchor = NULL;
> > > +		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
> > > +	}
> > > +
> > > +	return ret;
> > > +}
> > > +
> > 
> > As discussed in another thread, *ptep has been invalidated in stage2_map_walk_table_pre().
> > It means *ptep has value of zero. The following call to free_page() is going to release
> > the page frame corresponding to physical address 0x0. It's not correct. We might cache
> > the original value of this page table entry so that it can be used here.
> 
> Ah, yes, I see what you mean. But it's odd that I haven't run into this
> myself, so let me try to reproduce the issue first. Another solution is
> to invalidate the table entry only by clearing the valid bit of the pte,
> rather than zapping the entire thing to 0, which can be done later when we
> clear the anchor.

Ok! There are a couple of issues here:

  1. As you point out, the kvm_pte_follow() above ends up chasing a zeroed
     pte.

  2. The reason I'm not seeing this in testing is because the dirty logging
     code isn't hitting the table -> block case as it should. This is
     because I'm not handling permission faults properly when a write
     hits a read-only block entry. In this case, we need to collapse the
     entry if logging is active.

Diff below seems to clear all of this up. I'll fold it in for v4.

Thanks for reporting the problem and helping to debug it.

Will

--->8

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index dc76fdf31be3..9328830e9464 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -155,8 +155,8 @@ static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
 
 static void kvm_set_invalid_pte(kvm_pte_t *ptep)
 {
-       kvm_pte_t pte = 0;
-       WRITE_ONCE(*ptep, pte);
+       kvm_pte_t pte = *ptep;
+       WRITE_ONCE(*ptep, pte & ~KVM_PTE_VALID);
 }
 
 static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index f28e03dcb897..10b73da6abb2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -737,11 +737,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        bool exec_fault;
        bool device = false;
        unsigned long mmu_seq;
-       gfn_t gfn = fault_ipa >> PAGE_SHIFT;
        struct kvm *kvm = vcpu->kvm;
        struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
        struct vm_area_struct *vma;
        short vma_shift;
+       gfn_t gfn;
        kvm_pfn_t pfn;
        bool logging_active = memslot_is_logging(memslot);
        unsigned long vma_pagesize;
@@ -780,10 +780,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        }
 
        if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
-               gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+               fault_ipa &= huge_page_mask(hstate_vma(vma));
+
+       gfn = fault_ipa >> PAGE_SHIFT;
        mmap_read_unlock(current->mm);
 
-       if (fault_status != FSC_PERM) {
+       /*
+        * Permission faults just need to update the existing leaf entry,
+        * and so normally don't require allocations from the memcache. The
+        * only exception to this is when dirty logging is enabled at runtime
+        * and a write fault needs to collapse a block entry into a table.
+        */
+       if (fault_status != FSC_PERM || (logging_active && write_fault)) {
                ret = kvm_mmu_topup_memory_cache(memcache,
                                                 kvm_mmu_cache_min_pages(kvm));
                if (ret)
@@ -854,7 +862,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
        else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
                prot |= KVM_PGTABLE_PROT_X;
 
-       if (fault_status == FSC_PERM) {
+       if (fault_status == FSC_PERM && !(logging_active && writable)) {
                ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
        } else {
                ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,



^ permalink raw reply related	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() to generic page-table API
  2020-09-02 15:37   ` Alexandru Elisei
@ 2020-09-03 16:37     ` Will Deacon
  0 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-03 16:37 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

On Wed, Sep 02, 2020 at 04:37:18PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 33146d3dc93a..704b471a48ce 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1911,28 +1911,27 @@ int kvm_unmap_hva_range(struct kvm *kvm,
> >  
> >  static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *data)
> >  {
> > -	pte_t *pte = (pte_t *)data;
> > +	kvm_pfn_t *pfn = (kvm_pfn_t *)data;
> >  
> >  	WARN_ON(size != PAGE_SIZE);
> > +
> >  	/*
> > -	 * We can always call stage2_set_pte with KVM_S2PTE_FLAG_LOGGING_ACTIVE
> > -	 * flag clear because MMU notifiers will have unmapped a huge PMD before
> > -	 * calling ->change_pte() (which in turn calls kvm_set_spte_hva()) and
> > -	 * therefore stage2_set_pte() never needs to clear out a huge PMD
> > -	 * through this calling path.
> > +	 * The MMU notifiers will have unmapped a huge PMD before calling
> > +	 * ->change_pte() (which in turn calls kvm_set_spte_hva()) and
> > +	 * therefore we never need to clear out a huge PMD through this
> > +	 * calling path and a memcache is not required.
> >  	 */
> > -	stage2_set_pte(&kvm->arch.mmu, NULL, gpa, pte, 0);
> > +	kvm_pgtable_stage2_map(kvm->arch.mmu.pgt, gpa, PAGE_SIZE,
> > +			       __pfn_to_phys(*pfn), KVM_PGTABLE_PROT_R, NULL);
> 
> I have to admit that I managed to confuse myself.
> 
> According to the comment, this is called after unmapping a huge PMD.
> __unmap_stage2_range() -> .. -> unmap_stage2_pmd() calls pmd_clear(), which means
> the PMD entry is now 0.
> 
> In __kvm_pgtable_visit(), kvm_pte_table() returns false, because the entry is
> invalid, and so we call stage2_map_walk_leaf(). Here, stage2_map_walker_try_leaf()
> will return false, because kvm_block_mapping_supported() returns false (PMD
> granule is larger than PAGE_SIZE), and then we end up allocating a table from the
> memcache, which will be NULL, and kvm_mmu_memory_cache_alloc() will
> dereference the NULL pointer.
> 
> I'm pretty sure there's something that I'm missing here, I would really appreciate
> someone pointing out where I'm making a mistake.

You're not missing anything, and this is actually a bug introduced by moving
to the generic mmu cache code. My old implementation (which you can still
see in the earlier patch) returns NULL if the cache is NULL, so I'll need to
reintroduce that check here. This then mimics the current behaviour of
ignoring map requests from the MMU if we need to allocate, and instead
handling them lazily when we take the fault.
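
Something along these lines in stage2_map_walk_leaf(), before the table page is
allocated (only a sketch of the check being described; the exact placement and
return value are assumptions, not code from this posting):

	if (!data->memcache)
		return -ENOMEM;	/* bail out; the fault path will map this lazily */

	childp = kvm_mmu_memory_cache_alloc(data->memcache);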

Well spotted!

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table
  2020-09-03  4:33   ` Gavin Shan
@ 2020-09-03 16:48     ` Will Deacon
  2020-09-04  1:01       ` Gavin Shan
  0 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-03 16:48 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

On Thu, Sep 03, 2020 at 02:33:22PM +1000, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
> > Add stage-2 mkyoung(), mkold() and is_young() operations to the generic
> > page-table code.
> > 
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >   arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++++
> >   arch/arm64/kvm/hyp/pgtable.c         | 86 ++++++++++++++++++++++++++++
> >   2 files changed, 124 insertions(+)

[...]

> > +static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
> > +				    u64 size, kvm_pte_t attr_set,
> > +				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte)
> > +{
> > +	int ret;
> > +	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
> > +	struct stage2_attr_data data = {
> > +		.attr_set	= attr_set & attr_mask,
> > +		.attr_clr	= attr_clr & attr_mask,
> > +	};
> > +	struct kvm_pgtable_walker walker = {
> > +		.cb		= stage2_attr_walker,
> > +		.arg		= &data,
> > +		.flags		= KVM_PGTABLE_WALK_LEAF,
> > +	};
> > +
> > +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (orig_pte)
> > +		*orig_pte = data.pte;
> > +	return 0;
> > +}
> > +
> 
> The @size is always 1 from the caller, which means the parameter
> can be dropped from stage2_update_leaf_attrs(). At the same time,
> we don't know whether the page is mapped by a PUD, PMD or PTE, so
> passing a fixed value ("1") looks meaningless.

I add extra callers later on, for example kvm_pgtable_stage2_wrprotect(),
which pass a size, so it's needed for that.
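
For example, write-protecting a range would boil down to something like the
sketch below, using the stage2_update_leaf_attrs() signature quoted above (the
attribute name is assumed to be the stage-2 AP write bit defined elsewhere in
the series; treat the body as illustrative rather than a copy of the patch):

	int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size)
	{
		return stage2_update_leaf_attrs(pgt, addr, size, 0,
						KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W,
						NULL);
	}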

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
  2020-09-03  5:07   ` Gavin Shan
@ 2020-09-03 16:50     ` Will Deacon
  2020-09-04  0:59       ` Gavin Shan
  0 siblings, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-03 16:50 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

On Thu, Sep 03, 2020 at 03:07:17PM +1000, Gavin Shan wrote:
> On 8/25/20 7:39 PM, Will Deacon wrote:
> > The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
> > 'struct kvm_s2_mmu', so remove the unused 'pgd' field.
> > 
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >   arch/arm64/include/asm/kvm_host.h | 1 -
> >   arch/arm64/kvm/mmu.c              | 2 --
> >   2 files changed, 3 deletions(-)
> > 
> 
> I think this might be folded into PATCH[18] as both patches are
> simple enough. I'm not sure the changes introduced in PATCH[19]
> prevent us doing this.
> 
> There is another question below.
> 
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 0b7c702b2151..41caf29bd93c 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -79,7 +79,6 @@ struct kvm_s2_mmu {
> >   	 * for vEL1/EL0 with vHCR_EL2.VM == 0.  In that case, we use the
> >   	 * canonical stage-2 page tables.
> >   	 */
> > -	pgd_t		*pgd;
> >   	phys_addr_t	pgd_phys;
> >   	struct kvm_pgtable *pgt;
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index ddeec0b03666..f28e03dcb897 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -384,7 +384,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
> >   	mmu->kvm = kvm;
> >   	mmu->pgt = pgt;
> >   	mmu->pgd_phys = __pa(pgt->pgd);
> > -	mmu->pgd = (void *)pgt->pgd;
> >   	mmu->vmid.vmid_gen = 0;
> >   	return 0;
> > @@ -470,7 +469,6 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
> >   	spin_lock(&kvm->mmu_lock);
> >   	pgt = mmu->pgt;
> >   	if (pgt) {
> > -		mmu->pgd = NULL;
> >   		mmu->pgd_phys = 0;
> >   		mmu->pgt = NULL;
> >   		free_percpu(mmu->last_vcpu_ran);
> > 
> 
> I guess mmu->pgd_phys might be removed either because kvm_get_vttbr()
> is the only consumer.

Hmm, but kvm_get_vttbr() is still used after these patches, so I think
the pgd_phys field needs to stick around.

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  2020-09-02 16:23   ` Alexandru Elisei
  2020-09-02 18:44     ` Alexandru Elisei
@ 2020-09-03 17:57     ` Will Deacon
  2020-09-08 13:07       ` Alexandru Elisei
  1 sibling, 1 reply; 86+ messages in thread
From: Will Deacon @ 2020-09-03 17:57 UTC (permalink / raw)
  To: Alexandru Elisei
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

On Wed, Sep 02, 2020 at 05:23:08PM +0100, Alexandru Elisei wrote:
> On 8/25/20 10:39 AM, Will Deacon wrote:
> > Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
> > of walking the page-table directly.
> >
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Quentin Perret <qperret@google.com>
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
> >  1 file changed, 32 insertions(+), 25 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 704b471a48ce..751ce2462765 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
> >  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
> >  }
> >  
> > +/*
> > + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> > + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
> > + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
> > + * long will also starve other vCPUs. We have to also make sure that the page
> > + * tables are not freed while we released the lock.
> > + */
> > +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
> > +({									\
> > +	int ret;							\
> > +	struct kvm *__kvm = (kvm);					\
> > +	bool __resched = (resched);					\
> > +	u64 next, __addr = (addr), __end = (end);			\
> > +	do {								\
> > +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
> > +		if (!pgt)						\
> > +			break;						\
> 
> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
> to me like the compiler might decide to optimize by reading pgt once at the start
> of the loop and stashing it in a register. Would you mind explaining what I am
> missing?

The load always happens with the mmu_lock held, so I think it's not a
problem because it means that the pointer is stable.
spin_lock()/spin_unlock() imply compiler barriers.

> > +		next = stage2_pgd_addr_end(__kvm, __addr, __end);	\
> > +		ret = fn(pgt, __addr, next - __addr);			\
> > +		if (ret)						\
> > +			break;						\
> > +		if (__resched && next != __end)				\
> > +			cond_resched_lock(&__kvm->mmu_lock);		\
> > +	} while (__addr = next, __addr != __end);			\
> > +	ret;								\
> > +})
> 
> This seems unusual to me. We have a non-trivial, multiline macro which calls
> cond_resched(), has 6 local variables, and is called from exactly one place.I am
> curious why we are not open coding the loop in __unmap_stage2_range() or using a
> function.

It's called from three places. That said, I think it's like this because in
an earlier life it was used as an iterator and therefore had to be a macro.
I can try moving it into a function instead.
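
Rewritten as a function it would look roughly like this (a sketch only; the
callback type is an assumption based on how the macro is invoked):

	static int stage2_apply_range(struct kvm *kvm, phys_addr_t addr,
				      phys_addr_t end,
				      int (*fn)(struct kvm_pgtable *, u64, u64),
				      bool resched)
	{
		u64 next;
		int ret;

		do {
			struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;

			if (!pgt)
				return 0;

			next = stage2_pgd_addr_end(kvm, addr, end);
			ret = fn(pgt, addr, next - addr);
			if (ret)
				break;

			if (resched && next != end)
				cond_resched_lock(&kvm->mmu_lock);
		} while (addr = next, addr != end);

		return ret;
	}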

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
                   ` (22 preceding siblings ...)
  2020-09-03  7:34 ` Gavin Shan
@ 2020-09-03 18:52 ` Will Deacon
  23 siblings, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-03 18:52 UTC (permalink / raw)
  To: kvmarm, gshan, alexandru.elisei
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, linux-arm-kernel

Alex, Gavin,

On Tue, Aug 25, 2020 at 10:39:32AM +0100, Will Deacon wrote:
> This is version three of the KVM page-table rework that I previously posted
> here:
> 
>   v1: https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
>   v2: https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org
> 
> Changes since v2 include:
> 
>   * Rebased onto -rc2, which includes the conflicting OOM blocking fixes
>   * Dropped the patch trying to "fix" the memcache in kvm_phys_addr_ioremap()

I'm away tomorrow, so I'll post a v4 next week. However, in the meantime,
I've pushed a branch which I think incorporates all of your comments here:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=kvm/pgtable

so if you want to kick the tyres, that's the one to use.

Cheers, and have a good weekend.

Will


^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
  2020-09-03 16:15       ` Will Deacon
@ 2020-09-04  0:47         ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-04  0:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

Hi Will,

On 9/4/20 2:15 AM, Will Deacon wrote:
> On Thu, Sep 03, 2020 at 01:30:32PM +0100, Will Deacon wrote:
>> On Thu, Sep 03, 2020 at 09:18:27PM +1000, Gavin Shan wrote:
>>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>>> +static int stage2_map_walk_table_post(u64 addr, u64 end, u32 level,
>>>> +				      kvm_pte_t *ptep,
>>>> +				      struct stage2_map_data *data)
>>>> +{
>>>> +	int ret = 0;
>>>> +
>>>> +	if (!data->anchor)
>>>> +		return 0;
>>>> +
>>>> +	free_page((unsigned long)kvm_pte_follow(*ptep));
>>>> +	put_page(virt_to_page(ptep));
>>>> +
>>>> +	if (data->anchor == ptep) {
>>>> +		data->anchor = NULL;
>>>> +		ret = stage2_map_walk_leaf(addr, end, level, ptep, data);
>>>> +	}
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>
>>> As discussed in another thread, *ptep has been invalidated in stage2_map_walk_table_pre().
>>> It means *ptep has value of zero. The following call to free_page() is going to release
>>> the page frame corresponding to physical address 0x0. It's not correct. We might cache
>>> the original value of this page table entry so that it can be used here.
>>
>> Ah, yes, I see what you mean. But it's odd that I haven't run into this
>> myself, so let me try to reproduce the issue first. Another solution is
>> to invalidate the table entry only by clearing the valid bit of the pte,
>> rather than zapping the entire thing to 0, which can be done later when we
>> clear the anchor.
> 
> Ok! There are a couple of issues here:
> 
>    1. As you point out, the kvm_pte_follow() above ends up chasing a zeroed
>       pte.
> 
>    2. The reason I'm not seeing this in testing is because the dirty logging
>       code isn't hitting the table -> block case as it should. This is
>       because I'm not handling permission faults properly when a write
>       hits a read-only block entry. In this case, we need to collapse the
>       entry if logging is active.
> 
> Diff below seems to clear all of this up. I'll fold it in for v4.
> 
> Thanks for reporting the problem and helping to debug it.
> 

I saw these changes have been folded into v4, so I tried v4 directly and hugetlbfs
works fine with the changes. Thanks for the fixes.

> --->8
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index dc76fdf31be3..9328830e9464 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -155,8 +155,8 @@ static kvm_pte_t *kvm_pte_follow(kvm_pte_t pte)
>   
>   static void kvm_set_invalid_pte(kvm_pte_t *ptep)
>   {
> -       kvm_pte_t pte = 0;
> -       WRITE_ONCE(*ptep, pte);
> +       kvm_pte_t pte = *ptep;
> +       WRITE_ONCE(*ptep, pte & ~KVM_PTE_VALID);
>   }
>   
>   static void kvm_set_table_pte(kvm_pte_t *ptep, kvm_pte_t *childp)
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index f28e03dcb897..10b73da6abb2 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -737,11 +737,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          bool exec_fault;
>          bool device = false;
>          unsigned long mmu_seq;
> -       gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>          struct kvm *kvm = vcpu->kvm;
>          struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>          struct vm_area_struct *vma;
>          short vma_shift;
> +       gfn_t gfn;
>          kvm_pfn_t pfn;
>          bool logging_active = memslot_is_logging(memslot);
>          unsigned long vma_pagesize;
> @@ -780,10 +780,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          }
>   
>          if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
> -               gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> +               fault_ipa &= huge_page_mask(hstate_vma(vma));
> +
> +       gfn = fault_ipa >> PAGE_SHIFT;
>          mmap_read_unlock(current->mm);
>   
> -       if (fault_status != FSC_PERM) {
> +       /*
> +        * Permission faults just need to update the existing leaf entry,
> +        * and so normally don't require allocations from the memcache. The
> +        * only exception to this is when dirty logging is enabled at runtime
> +        * and a write fault needs to collapse a block entry into a table.
> +        */
> +       if (fault_status != FSC_PERM || (logging_active && write_fault)) {
>                  ret = kvm_mmu_topup_memory_cache(memcache,
>                                                   kvm_mmu_cache_min_pages(kvm));
>                  if (ret)
> @@ -854,7 +862,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>                  prot |= KVM_PGTABLE_PROT_X;
>   
> -       if (fault_status == FSC_PERM) {
> +       if (fault_status == FSC_PERM && !(logging_active && writable)) {
>                  ret = kvm_pgtable_stage2_relax_perms(pgt, fault_ipa, prot);
>          } else {
>                  ret = kvm_pgtable_stage2_map(pgt, fault_ipa, vma_pagesize,
> 

Thanks,
Gavin



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-03 12:16       ` Will Deacon
@ 2020-09-04  0:51         ` Gavin Shan
  2020-09-04 10:07           ` Marc Zyngier
  2020-09-07  9:27           ` Will Deacon
  0 siblings, 2 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-04  0:51 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

Hi Will,

On 9/3/20 10:16 PM, Will Deacon wrote:
> On Thu, Sep 03, 2020 at 09:48:18PM +1000, Gavin Shan wrote:
>> On 9/3/20 9:13 PM, Gavin Shan wrote:
>>> On 9/3/20 5:34 PM, Gavin Shan wrote:
>>>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>>>> Hello folks,
>>>>>
>>>>> This is version three of the KVM page-table rework that I previously posted
>>>>> here:
>>>>>
>>>>>     v1: https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
>>>>>     v2: https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org
>>>>>
>>>>> Changes since v2 include:
>>>>>
>>>>>     * Rebased onto -rc2, which includes the conflicting OOM blocking fixes
>>>>>     * Dropped the patch trying to "fix" the memcache in kvm_phys_addr_ioremap()
>>>>>
>>>>
>>>> It's really nice work, making the code unified/simplified greatly.
>>>> However, it seems it doesn't work well with HugeTLBfs. Please refer
>>>> to the following test result and see if you have quick idea, or I
>>>> can debug it a bit :)
> 
> Nice testing matrix, and thanks for reporting the problem!
> 
>>>> Machine         Host                     Guest              Result
>>>> ===============================================================
>>>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>>>>                PAGE_SIZE: 64KB                    64KB     passed
>>>>                THP:       disabled
>>>>                HugeTLB:   disabled
>>>> ---------------------------------------------------------------
>>>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>>>>                PAGE_SIZE: 64KB                    64KB     passed
>>>>                THP:       enabled
>>>>                HugeTLB:   disabled
>>>> ----------------------------------------------------------------
>>>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Fail[1]
>>>>                PAGE_SIZE: 64KB                    64KB     Fail[1]
>>>>                THP:       disabled
>>>>                HugeTLB:   enabled
>>>> ---------------------------------------------------------------
>>>> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>>>>                PAGE_SIZE: 4KB                     64KB     Passed
>>>>                THP:       disabled
>>>>                HugeTLB:   disabled
>>>> ---------------------------------------------------------------
>>>> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>>>>                PAGE_SIZE: 4KB                     64KB     Passed
>>>>                THP:       enabled
>>>>                HugeTLB:   disabled
>>>> --------------------------------------------------------------
>>>> ThunderX2    VA_BITS:   39           PAGE_SIZE: 4KB     Fail[2]
>>>>                PAGE_SIZE: 4KB                    64KB     Fail[2]
>>>>                THP:       disabled
>>>>                HugeTLB:   enabled
>>>>
>>>
>>> I debugged the code and found the issue is caused by the following
>>> patch.
>>>
>>> [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table
> 
> (I think this is just a symptom of the page-table being out of whack)
> 
>> Sorry that the guest could hang sometimes with the above changes. I have no idea
>> what is going wrong yet; I'm going to debug it some more. I'm pasting the command
>> used and the output from the guest.
> 
> Can you try the diff below, please? I think we can end up sticking down a
> huge-page-sized mapping at an unaligned address, which causes us both to
> overmap and also to fail to use the huge granule for a block mapping.
> 

Since the following changes have been folded into v4, I reran the test cases
on v4 and everything works fine.

Thanks,
Gavin

> --->8
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index f28e03dcb897..3bff942e5f33 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -737,11 +737,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          bool exec_fault;
>          bool device = false;
>          unsigned long mmu_seq;
> -       gfn_t gfn = fault_ipa >> PAGE_SHIFT;
>          struct kvm *kvm = vcpu->kvm;
>          struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>          struct vm_area_struct *vma;
>          short vma_shift;
> +       gfn_t gfn;
>          kvm_pfn_t pfn;
>          bool logging_active = memslot_is_logging(memslot);
>          unsigned long vma_pagesize;
> @@ -780,7 +780,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>          }
>   
>          if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
> -               gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> +               fault_ipa &= huge_page_mask(hstate_vma(vma));
> +
> +       gfn = fault_ipa >> PAGE_SHIFT;
>          mmap_read_unlock(current->mm);
>   
>          if (fault_status != FSC_PERM) {
> 



^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
  2020-09-03 16:50     ` Will Deacon
@ 2020-09-04  0:59       ` Gavin Shan
  2020-09-04 10:02         ` Marc Zyngier
  0 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-04  0:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

Hi Will,

On 9/4/20 2:50 AM, Will Deacon wrote:
> On Thu, Sep 03, 2020 at 03:07:17PM +1000, Gavin Shan wrote:
>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>> The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
>>> 'struct kvm_s2_mmu', so remove the unused 'pgd' field.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Quentin Perret <qperret@google.com>
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>    arch/arm64/include/asm/kvm_host.h | 1 -
>>>    arch/arm64/kvm/mmu.c              | 2 --
>>>    2 files changed, 3 deletions(-)
>>>
>>
>> I think this might be folded into PATCH[18] as both patches are
>> simple enough. I'm not sure the changes introduced in PATCH[19]
>> prevent us doing this.
>>
>> There is another question below.
>>
>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 0b7c702b2151..41caf29bd93c 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -79,7 +79,6 @@ struct kvm_s2_mmu {
>>>    	 * for vEL1/EL0 with vHCR_EL2.VM == 0.  In that case, we use the
>>>    	 * canonical stage-2 page tables.
>>>    	 */
>>> -	pgd_t		*pgd;
>>>    	phys_addr_t	pgd_phys;
>>>    	struct kvm_pgtable *pgt;
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index ddeec0b03666..f28e03dcb897 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -384,7 +384,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu)
>>>    	mmu->kvm = kvm;
>>>    	mmu->pgt = pgt;
>>>    	mmu->pgd_phys = __pa(pgt->pgd);
>>> -	mmu->pgd = (void *)pgt->pgd;
>>>    	mmu->vmid.vmid_gen = 0;
>>>    	return 0;
>>> @@ -470,7 +469,6 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>>    	spin_lock(&kvm->mmu_lock);
>>>    	pgt = mmu->pgt;
>>>    	if (pgt) {
>>> -		mmu->pgd = NULL;
>>>    		mmu->pgd_phys = 0;
>>>    		mmu->pgt = NULL;
>>>    		free_percpu(mmu->last_vcpu_ran);
>>>
>>
>> I guess mmu->pgd_phys could also be removed, because kvm_get_vttbr()
>> is its only consumer.
> 
> Hmm, but kvm_get_vttbr() is still used after these patches, so I think
> the pgd_phys field needs to stick around.
> 

Yes, kvm_get_vttbr() is the only consumer. The corresponding physical
address can be computed in the function itself, so we don't need to cache
it in advance. However, it's not a big deal. I'll probably post a patch to
remove it after this series gets merged.

    baddr = __pa(mmu->pgt->pgd);
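In kvm_get_vttbr(), that would boil down to something like the lines below
(paraphrased as a sketch rather than quoted from the series, so treat the
surrounding code as an assumption):

	baddr      = __pa(mmu->pgt->pgd);	/* instead of the cached mmu->pgd_phys */
	vmid_field = (u64)vmid->vmid << VTTBR_VMID_SHIFT;
	return kvm_phys_to_vttbr(baddr) | vmid_field | cnp;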

Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table
  2020-09-03 16:48     ` Will Deacon
@ 2020-09-04  1:01       ` Gavin Shan
  0 siblings, 0 replies; 86+ messages in thread
From: Gavin Shan @ 2020-09-04  1:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

Hi Will,

On 9/4/20 2:48 AM, Will Deacon wrote:
> On Thu, Sep 03, 2020 at 02:33:22PM +1000, Gavin Shan wrote:
>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>> Add stage-2 mkyoung(), mkold() and is_young() operations to the generic
>>> page-table code.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Quentin Perret <qperret@google.com>
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>    arch/arm64/include/asm/kvm_pgtable.h | 38 ++++++++++++
>>>    arch/arm64/kvm/hyp/pgtable.c         | 86 ++++++++++++++++++++++++++++
>>>    2 files changed, 124 insertions(+)
> 
> [...]
> 
>>> +static int stage2_update_leaf_attrs(struct kvm_pgtable *pgt, u64 addr,
>>> +				    u64 size, kvm_pte_t attr_set,
>>> +				    kvm_pte_t attr_clr, kvm_pte_t *orig_pte)
>>> +{
>>> +	int ret;
>>> +	kvm_pte_t attr_mask = KVM_PTE_LEAF_ATTR_LO | KVM_PTE_LEAF_ATTR_HI;
>>> +	struct stage2_attr_data data = {
>>> +		.attr_set	= attr_set & attr_mask,
>>> +		.attr_clr	= attr_clr & attr_mask,
>>> +	};
>>> +	struct kvm_pgtable_walker walker = {
>>> +		.cb		= stage2_attr_walker,
>>> +		.arg		= &data,
>>> +		.flags		= KVM_PGTABLE_WALK_LEAF,
>>> +	};
>>> +
>>> +	ret = kvm_pgtable_walk(pgt, addr, size, &walker);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	if (orig_pte)
>>> +		*orig_pte = data.pte;
>>> +	return 0;
>>> +}
>>> +
>>
>> The @size is always 1 from the caller, which means the parameter
>> can be dropped from stage2_update_leaf_attrs(). Meanwhile, we don't
>> know whether the page is mapped by a PUD, PMD or PTE, so a fixed
>> value ("1") looks meaningless.
> 
> I add extra callers later on, for example kvm_pgtable_stage2_wrprotect(),
> which pass a size, so it's needed for that.
> 

Yes, we still need @size in the subsequent patches, so my suggestion
doesn't apply.
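For illustration, the two call shapes being discussed would look roughly like
the lines below; the attribute constants are assumptions based on the patch
descriptions rather than code quoted in this thread:

	/* single-entry update, e.g. marking a page young on an access fault */
	stage2_update_leaf_attrs(pgt, addr, 1, KVM_PTE_LEAF_ATTR_LO_S2_AF, 0, &pte);

	/* range update, e.g. write-protecting a whole memslot for dirty logging */
	stage2_update_leaf_attrs(pgt, addr, size, 0, KVM_PTE_LEAF_ATTR_LO_S2_S2AP_W, NULL);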

Thanks,
Gavin



_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu'
  2020-09-04  0:59       ` Gavin Shan
@ 2020-09-04 10:02         ` Marc Zyngier
  0 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2020-09-04 10:02 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kernel-team, Suzuki Poulose, Catalin Marinas, Quentin Perret,
	James Morse, Will Deacon, kvmarm, linux-arm-kernel

On 2020-09-04 01:59, Gavin Shan wrote:
> Hi Will,
> 
> On 9/4/20 2:50 AM, Will Deacon wrote:
>> On Thu, Sep 03, 2020 at 03:07:17PM +1000, Gavin Shan wrote:
>>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>>> The stage-2 page-tables are entirely encapsulated by the 'pgt' field of
>>>> 'struct kvm_s2_mmu', so remove the unused 'pgd' field.
>>>> 
>>>> Cc: Marc Zyngier <maz@kernel.org>
>>>> Cc: Quentin Perret <qperret@google.com>
>>>> Signed-off-by: Will Deacon <will@kernel.org>
>>>> ---
>>>>    arch/arm64/include/asm/kvm_host.h | 1 -
>>>>    arch/arm64/kvm/mmu.c              | 2 --
>>>>    2 files changed, 3 deletions(-)
>>>> 
>>> 
>>> I think this might be folded into PATCH[18] as both patches are
>>> simple enough. I'm not sure the changes introduced in PATCH[19]
>>> prevent us doing this.
>>> 
>>> There is another question below.
>>> 
>>> Reviewed-by: Gavin Shan <gshan@redhat.com>
>>> 
>>>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>>>> b/arch/arm64/include/asm/kvm_host.h
>>>> index 0b7c702b2151..41caf29bd93c 100644
>>>> --- a/arch/arm64/include/asm/kvm_host.h
>>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>>> @@ -79,7 +79,6 @@ struct kvm_s2_mmu {
>>>>    	 * for vEL1/EL0 with vHCR_EL2.VM == 0.  In that case, we use the
>>>>    	 * canonical stage-2 page tables.
>>>>    	 */
>>>> -	pgd_t		*pgd;
>>>>    	phys_addr_t	pgd_phys;
>>>>    	struct kvm_pgtable *pgt;
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index ddeec0b03666..f28e03dcb897 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -384,7 +384,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct 
>>>> kvm_s2_mmu *mmu)
>>>>    	mmu->kvm = kvm;
>>>>    	mmu->pgt = pgt;
>>>>    	mmu->pgd_phys = __pa(pgt->pgd);
>>>> -	mmu->pgd = (void *)pgt->pgd;
>>>>    	mmu->vmid.vmid_gen = 0;
>>>>    	return 0;
>>>> @@ -470,7 +469,6 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>>>>    	spin_lock(&kvm->mmu_lock);
>>>>    	pgt = mmu->pgt;
>>>>    	if (pgt) {
>>>> -		mmu->pgd = NULL;
>>>>    		mmu->pgd_phys = 0;
>>>>    		mmu->pgt = NULL;
>>>>    		free_percpu(mmu->last_vcpu_ran);
>>>> 
>>> 
>>> I guess mmu->pgd_phys could also be removed, because kvm_get_vttbr()
>>> is its only consumer.
>> 
>> Hmm, but kvm_get_vttbr() is still used after these patches, so I think
>> the pgd_phys field needs to stick around.
>> 
> 
> Yes, kvm_get_vttbr() is the only consumer. The corresponding physical
> address can be computed in the function itself, so we don't need to
> cache it in advance. However, it's not a big deal. I'll probably post
> a patch to remove it after this series gets merged.
> 
>    baddr = __pa(mmu->pgt->pgd);

I'd rather you didn't. The NV patches need it for the AT emulation code,
so it might as well stay in place.

Thanks,

          M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-04  0:51         ` Gavin Shan
@ 2020-09-04 10:07           ` Marc Zyngier
  2020-09-05  3:56             ` Gavin Shan
  2020-09-07  9:27           ` Will Deacon
  1 sibling, 1 reply; 86+ messages in thread
From: Marc Zyngier @ 2020-09-04 10:07 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kernel-team, Suzuki Poulose, Catalin Marinas, Quentin Perret,
	James Morse, Will Deacon, kvmarm, linux-arm-kernel

On 2020-09-04 01:51, Gavin Shan wrote:
> Hi Will,
> 
> On 9/3/20 10:16 PM, Will Deacon wrote:
>> On Thu, Sep 03, 2020 at 09:48:18PM +1000, Gavin Shan wrote:
>>> On 9/3/20 9:13 PM, Gavin Shan wrote:
>>>> On 9/3/20 5:34 PM, Gavin Shan wrote:
>>>>> On 8/25/20 7:39 PM, Will Deacon wrote:
>>>>>> Hello folks,
>>>>>> 
>>>>>> This is version three of the KVM page-table rework that I 
>>>>>> previously posted
>>>>>> here:
>>>>>> 
>>>>>>     v1: 
>>>>>> https://lore.kernel.org/r/20200730153406.25136-1-will@kernel.org
>>>>>>     v2: 
>>>>>> https://lore.kernel.org/r/20200818132818.16065-1-will@kernel.org
>>>>>> 
>>>>>> Changes since v2 include:
>>>>>> 
>>>>>>     * Rebased onto -rc2, which includes the conflicting OOM 
>>>>>> blocking fixes
>>>>>>     * Dropped the patch trying to "fix" the memcache in 
>>>>>> kvm_phys_addr_ioremap()
>>>>>> 
>>>>> 
>>>>> It's really nice work, making the code unified/simplified greatly.
>>>>> However, it seems it doesn't work well with HugeTLBfs. Please refer
>>>>> to the following test results and see if you have a quick idea, or I
>>>>> can debug it a bit :)
>> 
>> Nice testing matrix, and thanks for reporting the problem!
>> 
>>>>> Machine         Host                     Guest              Result
>>>>> ===============================================================
>>>>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>>>>>                PAGE_SIZE: 64KB                    64KB     passed
>>>>>                THP:       disabled
>>>>>                HugeTLB:   disabled
>>>>> ---------------------------------------------------------------
>>>>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Passed
>>>>>                PAGE_SIZE: 64KB                    64KB     passed
>>>>>                THP:       enabled
>>>>>                HugeTLB:   disabled
>>>>> ----------------------------------------------------------------
>>>>> ThunderX2    VA_BITS:   42           PAGE_SIZE:  4KB     Fail[1]
>>>>>                PAGE_SIZE: 64KB                    64KB     Fail[1]
>>>>>                THP:       disabled
>>>>>                HugeTLB:   enabled
>>>>> ---------------------------------------------------------------
>>>>> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>>>>>                PAGE_SIZE: 4KB                     64KB     Passed
>>>>>                THP:       disabled
>>>>>                HugeTLB:   disabled
>>>>> ---------------------------------------------------------------
>>>>> ThunderX2    VA_BITS:   39           PAGE_SIZE:  4KB     Passed
>>>>>                PAGE_SIZE: 4KB                     64KB     Passed
>>>>>                THP:       enabled
>>>>>                HugeTLB:   disabled
>>>>> --------------------------------------------------------------
>>>>> ThunderX2    VA_BITS:   39           PAGE_SIZE: 4KB     Fail[2]
>>>>>                PAGE_SIZE: 4KB                    64KB     Fail[2]
>>>>>                THP:       disabled
>>>>>                HugeTLB:   enabled
>>>>> 
>>>> 
>>>> I debugged the code and found the issue is caused by the following
>>>> patch.
>>>> 
>>>> [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() 
>>>> in generic page-table
>> 
>> (I think this is just a symptom of the page-table being out of whack)
>> 
>>> Sorry, the guest can sometimes hang with the above changes. I have no
>>> idea what is happening yet; I'll debug it further. I'm pasting the
>>> command I used and the output from the guest.
>> 
>> Can you try the diff below, please? I think we can end up sticking down
>> a huge-page-sized mapping at an unaligned address, which causes us both
>> to overmap and also to fail to use the huge granule for a block mapping.
>> 
> 
> Since the following changes have been folded into v4, I reran the test
> cases on v4 and everything works fine.

Thanks a lot for the great testing and reviewing effort!

<shameless ask>
Since you obviously have a test rig setup for this: does your TX2 support
16kB pages? If so, could you please do another run with this page size on
the host?
</shameless ask>

Thanks again,

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-04 10:07           ` Marc Zyngier
@ 2020-09-05  3:56             ` Gavin Shan
  2020-09-05  9:33               ` Marc Zyngier
  0 siblings, 1 reply; 86+ messages in thread
From: Gavin Shan @ 2020-09-05  3:56 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kernel-team, Suzuki Poulose, Catalin Marinas, Quentin Perret,
	James Morse, Will Deacon, kvmarm, linux-arm-kernel

Hi Marc,

On 9/4/20 8:07 PM, Marc Zyngier wrote:
> On 2020-09-04 01:51, Gavin Shan wrote:
>> On 9/3/20 10:16 PM, Will Deacon wrote:
>>> On Thu, Sep 03, 2020 at 09:48:18PM +1000, Gavin Shan wrote:
>>>> On 9/3/20 9:13 PM, Gavin Shan wrote:
>>>>> On 9/3/20 5:34 PM, Gavin Shan wrote:
>>>>>> On 8/25/20 7:39 PM, Will Deacon wrote:

[...]

>>
>> Since the following changes have been folded into v4, I reran the test cases
>> on v4 and everything works fine.
> 
> Thanks a lot for the great testing and reviewing effort!
> 
> <shameless ask>
> Since you obviously have a test rig setup for this: does your TX2 support 16kB
> pages? If so, could you please do another run with this page size on the host?
> </shameless ask>
> 

Unfortunately, my TX2 machine doesn't support a 16KB page size. The
following output was seen from the host when it was built with a 16KB
page size. Sorry about that.

    CONFIG_ARM64_PAGE_SHIFT=14
    CONFIG_ARM64_VA_BITS_47=y
    CONFIG_ARM64_VA_BITS=47

Output from console on host
===========================
EFI stub: ERROR: This 16 KB granular kernel is not supported by your CPU

   Failed to boot both default and fallback entries.

Press any key to continue...
    
Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-05  3:56             ` Gavin Shan
@ 2020-09-05  9:33               ` Marc Zyngier
  0 siblings, 0 replies; 86+ messages in thread
From: Marc Zyngier @ 2020-09-05  9:33 UTC (permalink / raw)
  To: Gavin Shan
  Cc: kernel-team, Suzuki Poulose, Catalin Marinas, Quentin Perret,
	James Morse, Will Deacon, kvmarm, linux-arm-kernel

On Sat, 05 Sep 2020 04:56:39 +0100,
Gavin Shan <gshan@redhat.com> wrote:
> 
> Hi Marc,
> 
> On 9/4/20 8:07 PM, Marc Zyngier wrote:
> > On 2020-09-04 01:51, Gavin Shan wrote:
> >> On 9/3/20 10:16 PM, Will Deacon wrote:
> >>> On Thu, Sep 03, 2020 at 09:48:18PM +1000, Gavin Shan wrote:
> >>>> On 9/3/20 9:13 PM, Gavin Shan wrote:
> >>>>> On 9/3/20 5:34 PM, Gavin Shan wrote:
> >>>>>> On 8/25/20 7:39 PM, Will Deacon wrote:
> 
> [...]
> 
> >> 
> >> Since the following changes have been folded into v4, I reran the test cases
> >> on v4 and everything works fine.
> > 
> > Thanks a lot for the great testing and reviewing effort!
> > 
> > <shameless ask>
> > Since you obviously have a test rig setup for this: does your TX2 support 16kB
> > pages? If so, could you please do another run with this page size on the host?
> > </shameless ask>
> > 
> 
> Unfortunately, my TX2 machine doesn't support a 16KB page size. The
> following output was seen from the host when it was built with a 16KB
> page size. Sorry about that.
> 
>    CONFIG_ARM64_PAGE_SHIFT=14
>    CONFIG_ARM64_VA_BITS_47=y
>    CONFIG_ARM64_VA_BITS=47
> 
> Output from console on host
> ===========================
> EFI stub: ERROR: This 16 KB granular kernel is not supported by your CPU
> 
>   Failed to boot both default and fallback entries.
> 
> Press any key to continue...

Ah, fair enough. It was worth trying. I guess I need to try that on
TX1, which does support 16kB pages. Too bad it is such a pain to use...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling
  2020-09-04  0:51         ` Gavin Shan
  2020-09-04 10:07           ` Marc Zyngier
@ 2020-09-07  9:27           ` Will Deacon
  1 sibling, 0 replies; 86+ messages in thread
From: Will Deacon @ 2020-09-07  9:27 UTC (permalink / raw)
  To: Gavin Shan
  Cc: Suzuki Poulose, Marc Zyngier, Quentin Perret, James Morse,
	Catalin Marinas, kernel-team, kvmarm, linux-arm-kernel

On Fri, Sep 04, 2020 at 10:51:58AM +1000, Gavin Shan wrote:
> On 9/3/20 10:16 PM, Will Deacon wrote:
> > Can you try the diff below, please? I think we can end up sticking down a
> > huge-page-sized mapping at an unaligned address, which causes us both to
> > overmap and also to fail to use the huge granule for a block mapping.
> > 
> 
> Since the following changes have been folded into v4, I reran the test cases
> on v4 and everything works fine.

That's great news, thanks! I'll post that lot later today, assuming I finish
reading email before it gets dark.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  2020-09-03 17:57     ` Will Deacon
@ 2020-09-08 13:07       ` Alexandru Elisei
  2020-09-09 10:57         ` Alexandru Elisei
  0 siblings, 1 reply; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-08 13:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

Hi Will,

On 9/3/20 6:57 PM, Will Deacon wrote:
> On Wed, Sep 02, 2020 at 05:23:08PM +0100, Alexandru Elisei wrote:
>> On 8/25/20 10:39 AM, Will Deacon wrote:
>>> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
>>> of walking the page-table directly.
>>>
>>> Cc: Marc Zyngier <maz@kernel.org>
>>> Cc: Quentin Perret <qperret@google.com>
>>> Signed-off-by: Will Deacon <will@kernel.org>
>>> ---
>>>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>>>  1 file changed, 32 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 704b471a48ce..751ce2462765 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>>>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>>>  }
>>>  
>>> +/*
>>> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
>>> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
>>> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
>>> + * long will also starve other vCPUs. We have to also make sure that the page
>>> + * tables are not freed while we released the lock.
>>> + */
>>> +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
>>> +({									\
>>> +	int ret;							\
>>> +	struct kvm *__kvm = (kvm);					\
>>> +	bool __resched = (resched);					\
>>> +	u64 next, __addr = (addr), __end = (end);			\
>>> +	do {								\
>>> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
>>> +		if (!pgt)						\
>>> +			break;						\
>> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
>> to me like the compiler might decide to optimize by reading pgt once at the start
>> of the loop and stashing it in a register. Would you mind explaining what I am
>> missing?
> The load always happens with the mmu_lock held, so I think it's not a
> problem because it means that the pointer is stable.
> spin_lock()/spin_unlock() imply compiler barriers.

I think you are correct, if this is supposed to always execute with kvm->mmu_lock
held, then pgt should not change between iterations. It didn't immediately occur
to me that that is the case because we check if pgt is NULL every iteration. If we
are relying on the lock being held, maybe we should move the pgt load + comparison
against NULL out of the loop? That should avoid any confusion and make the code
ever so slightly faster.

Also, I see that in __unmap_stage2_range() we check that the mmu_lock is held, but
we don't check that at all call sites (for example, in stage2_wp_range()). I
realize this is me bikeshedding, but that looks a bit asymmetrical. Should we move
the assert_spin_locked(&kvm->mmu_lock) statement into stage2_apply_range(), since
the function assumes the pgt will remain unchanged? What do you think?

Thanks,
Alex

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

* Re: [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() to generic page-table API
  2020-09-08 13:07       ` Alexandru Elisei
@ 2020-09-09 10:57         ` Alexandru Elisei
  0 siblings, 0 replies; 86+ messages in thread
From: Alexandru Elisei @ 2020-09-09 10:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, kernel-team, kvmarm, linux-arm-kernel, Catalin Marinas

Hi Will,

I'm answering my own question, again. See below.

On 9/8/20 2:07 PM, Alexandru Elisei wrote:
> Hi Will,
>
> On 9/3/20 6:57 PM, Will Deacon wrote:
>> On Wed, Sep 02, 2020 at 05:23:08PM +0100, Alexandru Elisei wrote:
>>> On 8/25/20 10:39 AM, Will Deacon wrote:
>>>> Convert unmap_stage2_range() to use kvm_pgtable_stage2_unmap() instead
>>>> of walking the page-table directly.
>>>>
>>>> Cc: Marc Zyngier <maz@kernel.org>
>>>> Cc: Quentin Perret <qperret@google.com>
>>>> Signed-off-by: Will Deacon <will@kernel.org>
>>>> ---
>>>>  arch/arm64/kvm/mmu.c | 57 +++++++++++++++++++++++++-------------------
>>>>  1 file changed, 32 insertions(+), 25 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>> index 704b471a48ce..751ce2462765 100644
>>>> --- a/arch/arm64/kvm/mmu.c
>>>> +++ b/arch/arm64/kvm/mmu.c
>>>> @@ -39,6 +39,33 @@ static bool is_iomap(unsigned long flags)
>>>>  	return flags & KVM_S2PTE_FLAG_IS_IOMAP;
>>>>  }
>>>>  
>>>> +/*
>>>> + * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
>>>> + * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
>>>> + * CONFIG_LOCKUP_DETECTOR, CONFIG_LOCKDEP. Additionally, holding the lock too
>>>> + * long will also starve other vCPUs. We have to also make sure that the page
>>>> + * tables are not freed while we released the lock.
>>>> + */
>>>> +#define stage2_apply_range(kvm, addr, end, fn, resched)			\
>>>> +({									\
>>>> +	int ret;							\
>>>> +	struct kvm *__kvm = (kvm);					\
>>>> +	bool __resched = (resched);					\
>>>> +	u64 next, __addr = (addr), __end = (end);			\
>>>> +	do {								\
>>>> +		struct kvm_pgtable *pgt = __kvm->arch.mmu.pgt;		\
>>>> +		if (!pgt)						\
>>>> +			break;						\
>>> I'm 100% sure there's a reason why we've dropped the READ_ONCE, but it still looks
>>> to me like the compiler might decide to optimize by reading pgt once at the start
>>> of the loop and stashing it in a register. Would you mind explaining what I am
>>> missing?
>> The load always happens with the mmu_lock held, so I think it's not a
>> problem because it means that the pointer is stable.
>> spin_lock()/spin_unlock() imply compiler barriers.
> I think you are correct, if this is supposed to always execute with kvm->mmu_lock
> held, then pgt should not change between iterations. It didn't immediately occur
> to me that that is the case because we check if pgt is NULL every iteration. If we
> are relying on the lock being held, maybe we should move the pgt load + comparison
> against NULL out of the loop? That should avoid any confusion and make the code
> ever so slightly faster.
>
> Also, I see that in __unmap_stage2_range() we check that the mmu_lock is held, but
> we don't check that at all call sites (for example, in stage2_wp_range()). I
> realize this is me bikeshedding, but that looks a bit asymmetrical. Should we move
> the assert_spin_locked(&kvm->mmu_lock) statement into stage2_apply_range(), since
> the function assumes the pgt will remain unchanged? What do you think?

What I wrote is wrong, because we can drop the lock in cond_resched_lock(). I
don't see the need for any changes.
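To make that concrete, the loop being discussed has roughly the shape below;
only its head is quoted above, so the remainder is paraphrased here as a
sketch rather than the literal patch text:

	do {
		/* Re-read on each pass: the lock may have been dropped below. */
		struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
		if (!pgt)
			break;
		next = stage2_pgd_addr_end(kvm, addr, end);
		ret = fn(pgt, addr, next - addr);
		if (ret)
			break;
		if (resched && next != end)
			cond_resched_lock(&kvm->mmu_lock);	/* may drop and re-take mmu_lock */
	} while (addr = next, addr != end);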

Thanks,
Alex

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 86+ messages in thread

end of thread, other threads:[~2020-09-09 10:58 UTC | newest]

Thread overview: 86+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25  9:39 [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Will Deacon
2020-08-25  9:39 ` [PATCH v3 01/21] KVM: arm64: Remove kvm_mmu_free_memory_caches() Will Deacon
2020-08-25  9:39 ` [PATCH v3 02/21] KVM: arm64: Add stand-alone page-table walker infrastructure Will Deacon
2020-08-27 16:27   ` Alexandru Elisei
2020-08-28 15:43     ` Alexandru Elisei
2020-09-02 10:36     ` Will Deacon
2020-08-28 15:51   ` Alexandru Elisei
2020-09-02 10:49     ` Will Deacon
2020-09-02  6:31   ` Gavin Shan
2020-09-02 11:02     ` Will Deacon
2020-09-03  1:11       ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 03/21] KVM: arm64: Add support for creating kernel-agnostic stage-1 page tables Will Deacon
2020-08-28 15:35   ` Alexandru Elisei
2020-09-02 10:06     ` Will Deacon
2020-08-25  9:39 ` [PATCH v3 04/21] KVM: arm64: Use generic allocator for hyp stage-1 page-tables Will Deacon
2020-08-28 16:32   ` Alexandru Elisei
2020-09-02 11:35     ` Will Deacon
2020-09-02 14:48       ` Alexandru Elisei
2020-08-25  9:39 ` [PATCH v3 05/21] KVM: arm64: Add support for creating kernel-agnostic stage-2 page tables Will Deacon
2020-09-02  6:40   ` Gavin Shan
2020-09-02 11:30     ` Will Deacon
2020-08-25  9:39 ` [PATCH v3 06/21] KVM: arm64: Add support for stage-2 map()/unmap() in generic page-table Will Deacon
2020-09-01 16:24   ` Alexandru Elisei
2020-09-02 11:46     ` Will Deacon
2020-09-03  2:57   ` Gavin Shan
2020-09-03  5:27     ` Gavin Shan
2020-09-03 11:18   ` Gavin Shan
2020-09-03 12:30     ` Will Deacon
2020-09-03 16:15       ` Will Deacon
2020-09-04  0:47         ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 07/21] KVM: arm64: Convert kvm_phys_addr_ioremap() to generic page-table API Will Deacon
2020-09-01 17:08   ` Alexandru Elisei
2020-09-02 11:48     ` Will Deacon
2020-09-03  3:57   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 08/21] KVM: arm64: Convert kvm_set_spte_hva() " Will Deacon
2020-09-02 15:37   ` Alexandru Elisei
2020-09-03 16:37     ` Will Deacon
2020-09-03  4:13   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 09/21] KVM: arm64: Convert unmap_stage2_range() " Will Deacon
2020-09-02 16:23   ` Alexandru Elisei
2020-09-02 18:44     ` Alexandru Elisei
2020-09-03 17:57     ` Will Deacon
2020-09-08 13:07       ` Alexandru Elisei
2020-09-09 10:57         ` Alexandru Elisei
2020-09-03  4:19   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 10/21] KVM: arm64: Add support for stage-2 page-aging in generic page-table Will Deacon
2020-09-03  4:33   ` Gavin Shan
2020-09-03 16:48     ` Will Deacon
2020-09-04  1:01       ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 11/21] KVM: arm64: Convert page-aging and access faults to generic page-table API Will Deacon
2020-09-03  4:37   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 12/21] KVM: arm64: Add support for stage-2 write-protect in generic page-table Will Deacon
2020-09-03  4:47   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 13/21] KVM: arm64: Convert write-protect operation to generic page-table API Will Deacon
2020-09-03  4:48   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 14/21] KVM: arm64: Add support for stage-2 cache flushing in generic page-table Will Deacon
2020-09-03  4:51   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 15/21] KVM: arm64: Convert memslot cache-flushing code to generic page-table API Will Deacon
2020-09-03  4:52   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 16/21] KVM: arm64: Add support for relaxing stage-2 perms in generic page-table code Will Deacon
2020-09-03  4:55   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 17/21] KVM: arm64: Convert user_mem_abort() to generic page-table API Will Deacon
2020-09-03  6:05   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 18/21] KVM: arm64: Check the pgt instead of the pgd when modifying page-table Will Deacon
2020-09-03  5:00   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 19/21] KVM: arm64: Remove unused page-table code Will Deacon
2020-09-03  6:02   ` Gavin Shan
2020-08-25  9:39 ` [PATCH v3 20/21] KVM: arm64: Remove unused 'pgd' field from 'struct kvm_s2_mmu' Will Deacon
2020-09-03  5:07   ` Gavin Shan
2020-09-03 16:50     ` Will Deacon
2020-09-04  0:59       ` Gavin Shan
2020-09-04 10:02         ` Marc Zyngier
2020-08-25  9:39 ` [PATCH v3 21/21] KVM: arm64: Don't constrain maximum IPA size based on host configuration Will Deacon
2020-09-03  5:09   ` Gavin Shan
2020-08-27 16:26 ` [PATCH v3 00/21] KVM: arm64: Rewrite page-table code and fault handling Alexandru Elisei
2020-09-01 16:15   ` Will Deacon
2020-09-03  7:34 ` Gavin Shan
2020-09-03 11:13   ` Gavin Shan
2020-09-03 11:48     ` Gavin Shan
2020-09-03 12:16       ` Will Deacon
2020-09-04  0:51         ` Gavin Shan
2020-09-04 10:07           ` Marc Zyngier
2020-09-05  3:56             ` Gavin Shan
2020-09-05  9:33               ` Marc Zyngier
2020-09-07  9:27           ` Will Deacon
2020-09-03 18:52 ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).