* [PATCH 0/3] Relax break-before-make use with FEAT_BBM
@ 2023-06-02 17:01 Colton Lewis
  2023-06-02 17:01 ` [PATCH 1/3] arm64: Add a capability for FEAT_BBM level 2 Colton Lewis
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Colton Lewis @ 2023-06-02 17:01 UTC (permalink / raw)
  To: kvm
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, kvmarm, Colton Lewis

Currently KVM follows the lengthy break-before-make process every time
the page size changes, which requires KVM to do a broadcast TLB
invalidation and data serialization for every affected page table
entry. This is expensive.
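
For reference, the procedure being avoided looks roughly like this (a
sketch of the architectural sequence; ptep, mmu, addr, level and new
are stand-ins rather than the exact KVM code):

	WRITE_ONCE(*ptep, 0);		/* 1. break: install an invalid entry */
	dsb(ishst);			/* 2. make the break visible to the table walker */
	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, mmu, addr, level);
					/* 3. broadcast TLB invalidation */
	smp_store_release(ptep, new);	/* 4. make: only now write the new entry */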

FEAT_BBM level 2 support removes the need to follow the whole
process when the page size is the only thing that changed. This
series detects that support and avoids the now-unnecessary expensive
operations, speeding up the stage-2 page table walkers.

Considerable time and effort have been spent trying to measure the
performance benefit, mainly using dirty_log_perf_test with huge pages,
but nothing stood out from the ordinary variation between runs. This
is puzzling, but getting the series reviewed anyway may spark some
ideas.

This is based on kvmarm-6.4 + Ricardo's eager page splitting series
[1] to cover the eager splitting case as well. Similar changes were
originally part of that series, but it was suggested that the
FEAT_BBM work should be its own series.

[1] https://lore.kernel.org/kvmarm/20230426172330.1439644-1-ricarkol@google.com/

Colton Lewis (2):
  KVM: arm64: Clear possible conflict aborts
  KVM: arm64: Skip break phase when we have FEAT_BBM level 2

Ricardo Koller (1):
  arm64: Add a capability for FEAT_BBM level 2

 arch/arm64/include/asm/esr.h   |  1 +
 arch/arm64/kernel/cpufeature.c | 11 +++++++
 arch/arm64/kvm/hyp/pgtable.c   | 58 ++++++++++++++++++++++++++++++----
 arch/arm64/kvm/mmu.c           |  6 ++++
 arch/arm64/tools/cpucaps       |  1 +
 5 files changed, 70 insertions(+), 7 deletions(-)

--
2.41.0.rc0.172.g3f132b7071-goog


* [PATCH 1/3] arm64: Add a capability for FEAT_BBM level 2
  2023-06-02 17:01 [PATCH 0/3] Relax break-before-make use with FEAT_BBM Colton Lewis
@ 2023-06-02 17:01 ` Colton Lewis
  2023-06-05 15:07   ` Robin Murphy
  2023-06-02 17:01 ` [PATCH 2/3] KVM: arm64: Clear possible conflict aborts Colton Lewis
  2023-06-02 17:01 ` [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2 Colton Lewis
  2 siblings, 1 reply; 10+ messages in thread
From: Colton Lewis @ 2023-06-02 17:01 UTC (permalink / raw)
  To: kvm
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, kvmarm, Ricardo Koller

From: Ricardo Koller <ricarkol@google.com>

Add a new capability to detect "Stage-2 Translation table
break-before-make" (FEAT_BBM) level 2.

Signed-off-by: Ricardo Koller <ricarkol@google.com>
---
 arch/arm64/kernel/cpufeature.c | 11 +++++++++++
 arch/arm64/tools/cpucaps       |  1 +
 2 files changed, 12 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index c331c49a7d19c..c538060f7f66b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2455,6 +2455,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.min_field_value = 1,
 		.matches = has_cpuid_feature,
 	},
+	{
+		.desc = "Stage-2 Translation table break-before-make level 2",
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.capability = ARM64_HAS_STAGE2_BBM2,
+		.sys_reg = SYS_ID_AA64MMFR2_EL1,
+		.sign = FTR_UNSIGNED,
+		.field_pos = ID_AA64MMFR2_EL1_BBM_SHIFT,
+		.field_width = 4,
+		.min_field_value = 2,
+		.matches = has_cpuid_feature,
+	},
 	{
 		.desc = "TLB range maintenance instructions",
 		.capability = ARM64_HAS_TLB_RANGE,
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 40ba95472594d..010aca1892642 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -41,6 +41,7 @@ HAS_PAN
 HAS_RAS_EXTN
 HAS_RNG
 HAS_SB
+HAS_STAGE2_BBM2
 HAS_STAGE2_FWB
 HAS_TIDCP1
 HAS_TLB_RANGE
--
2.41.0.rc0.172.g3f132b7071-goog


* [PATCH 2/3] KVM: arm64: Clear possible conflict aborts
  2023-06-02 17:01 [PATCH 0/3] Relax break-before-make use with FEAT_BBM Colton Lewis
  2023-06-02 17:01 ` [PATCH 1/3] arm64: Add a capability for FEAT_BBM level 2 Colton Lewis
@ 2023-06-02 17:01 ` Colton Lewis
  2023-06-09 15:44   ` Oliver Upton
  2023-06-02 17:01 ` [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2 Colton Lewis
  2 siblings, 1 reply; 10+ messages in thread
From: Colton Lewis @ 2023-06-02 17:01 UTC (permalink / raw)
  To: kvm
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, kvmarm, Colton Lewis

Clear possible conflict aborts with a TLB invalidation targeted at the
address that caused the abort.

Making use of FEAT_BBM level 2 creates the possibility of a conflict
abort when translating addresses: multiple entries may exist in the
TLB for a single input address.
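
For reference, the fault status code (FSC) in ESR_ELx reports a TLB
conflict abort as 0x30, which is what the define below adds. A
minimal detection sketch, assuming the standard ESR accessors:

	u64 esr = kvm_vcpu_get_esr(vcpu);
	u8 fsc = esr & ESR_ELx_FSC;		/* FSC is ISS bits [5:0] */

	if (fsc == ESR_ELx_FSC_CONFLICT)	/* 0x30: TLB conflict abort */
		/* resolve with a targeted TLB invalidation, as below */;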

Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/include/asm/esr.h | 1 +
 arch/arm64/kvm/mmu.c         | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/esr.h b/arch/arm64/include/asm/esr.h
index 8487aec9b6587..41336cfa19ff3 100644
--- a/arch/arm64/include/asm/esr.h
+++ b/arch/arm64/include/asm/esr.h
@@ -123,6 +123,7 @@
 #define ESR_ELx_FSC_SECC_TTW1	(0x1d)
 #define ESR_ELx_FSC_SECC_TTW2	(0x1e)
 #define ESR_ELx_FSC_SECC_TTW3	(0x1f)
+#define ESR_ELx_FSC_CONFLICT	(0x30)

 /* ISS field definitions for Data Aborts */
 #define ESR_ELx_ISV_SHIFT	(24)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7a68398517c95..96b950f20c8d0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1591,6 +1591,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
 		return 1;
 	}

+	if (fault_status == ESR_ELx_FSC_CONFLICT) {
+		/* We could be at any level. 0 covers all levels. */
+		__kvm_tlb_flush_vmid_ipa(vcpu->arch.hw_mmu, fault_ipa, 0);
+		return 1;
+	}
+
 	trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
 			      kvm_vcpu_get_hfar(vcpu), fault_ipa);

--
2.41.0.rc0.172.g3f132b7071-goog


* [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2
  2023-06-02 17:01 [PATCH 0/3] Relax break-before-make use with FEAT_BBM Colton Lewis
  2023-06-02 17:01 ` [PATCH 1/3] arm64: Add a capability for FEAT_BBM level 2 Colton Lewis
  2023-06-02 17:01 ` [PATCH 2/3] KVM: arm64: Clear possible conflict aborts Colton Lewis
@ 2023-06-02 17:01 ` Colton Lewis
  2023-06-04  8:23   ` Marc Zyngier
  2 siblings, 1 reply; 10+ messages in thread
From: Colton Lewis @ 2023-06-02 17:01 UTC (permalink / raw)
  To: kvm
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, kvmarm, Colton Lewis

Skip the break phase of break-before-make when the CPU has FEAT_BBM
level 2. This allows skipping some expensive invalidation and
serialization and should result in significant performance
improvements when changing block size.

The ARM manual section D5.10.1 specifically states under heading
"Support levels for changing block size" that FEAT_BBM Level 2 support
means changing block size does not break coherency, ordering
guarantees, or uniprocessor semantics.

Because a compare-and-exchange operation was used in the break phase
to serialize access to the PTE, an analogous compare-and-exchange is
introduced in the make phase so that serialization is preserved even
when the break phase is skipped. Proper handling is added to account
for the make function now having a way to fail.

Because the new pte may have different permissions than the old pte,
the minimum necessary TLB invalidations are used.

Signed-off-by: Colton Lewis <coltonlewis@google.com>
---
 arch/arm64/kvm/hyp/pgtable.c | 58 +++++++++++++++++++++++++++++++-----
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 8acab89080af9..6778e3df697f7 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -643,6 +643,11 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
 	return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
 }

+static bool stage2_has_bbm_level2(void)
+{
+	return cpus_have_const_cap(ARM64_HAS_STAGE2_BBM2);
+}
+
 #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))

 static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
@@ -730,7 +735,7 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
  * @ctx: context of the visited pte.
  * @mmu: stage-2 mmu
  *
- * Returns: true if the pte was successfully broken.
+ * Returns: true if the pte was successfully broken or there is no need.
  *
  * If the removed pte was valid, performs the necessary serialization and TLB
  * invalidation for the old value. For counted ptes, drops the reference count
@@ -750,6 +755,10 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
 		return false;
 	}

+	/* There is no need to break the pte. */
+	if (stage2_has_bbm_level2())
+		return true;
+
 	if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
 		return false;

@@ -771,16 +780,45 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
 	return true;
 }

-static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
+static bool stage2_pte_perms_equal(kvm_pte_t p1, kvm_pte_t p2)
+{
+	u64 perms1 = p1 & KVM_PGTABLE_PROT_RWX;
+	u64 perms2 = p2 & KVM_PGTABLE_PROT_RWX;
+
+	return perms1 == perms2;
+}
+
+/**
+ * stage2_try_make_pte() - Attempts to install a new pte.
+ *
+ * @ctx: context of the visited pte.
+ * @new: new pte to install
+ *
+ * Returns: true if the pte was successfully installed
+ *
+ * If the old pte had different permissions, perform appropriate TLB
+ * invalidation for the old value. For counted ptes, drops the
+ * reference count on the containing table page.
+ */
+static bool stage2_try_make_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu, kvm_pte_t new)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;

-	WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
+	if (!stage2_has_bbm_level2())
+		WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
+
+	if (!stage2_try_set_pte(ctx, new))
+		return false;
+
+	if (kvm_pte_table(ctx->old, ctx->level))
+		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
+	else if (kvm_pte_valid(ctx->old) && !stage2_pte_perms_equal(ctx->old, new))
+		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, mmu, ctx->addr, ctx->level);

 	if (stage2_pte_is_counted(new))
 		mm_ops->get_page(ctx->ptep);

-	smp_store_release(ctx->ptep, new);
+	return true;
 }

 static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
@@ -879,7 +917,8 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	    stage2_pte_executable(new))
 		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);

-	stage2_make_pte(ctx, new);
+	if (!stage2_try_make_pte(ctx, data->mmu, new))
+		return -EAGAIN;

 	return 0;
 }
@@ -934,7 +973,9 @@ static int stage2_map_walk_leaf(const struct kvm_pgtable_visit_ctx *ctx,
 	 * will be mapped lazily.
 	 */
 	new = kvm_init_table_pte(childp, mm_ops);
-	stage2_make_pte(ctx, new);
+
+	if (!stage2_try_make_pte(ctx, data->mmu, new))
+		return -EAGAIN;

 	return 0;
 }
@@ -1385,7 +1426,10 @@ static int stage2_split_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	 * writes the PTE using smp_store_release().
 	 */
 	new = kvm_init_table_pte(childp, mm_ops);
-	stage2_make_pte(ctx, new);
+
+	if (!stage2_try_make_pte(ctx, mmu, new))
+		return -EAGAIN;
+
 	dsb(ishst);
 	return 0;
 }
--
2.41.0.rc0.172.g3f132b7071-goog


* Re: [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2
  2023-06-02 17:01 ` [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2 Colton Lewis
@ 2023-06-04  8:23   ` Marc Zyngier
  2023-06-05 21:36     ` Oliver Upton
  0 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2023-06-04  8:23 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvm, Catalin Marinas, Will Deacon, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, linux-arm-kernel, linux-kernel,
	kvmarm

On Fri, 02 Jun 2023 18:01:47 +0100,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> Skip the break phase of break-before-make when the CPU has FEAT_BBM
> level 2. This allows skipping some expensive invalidation and
> serialization and should result in significant performance
> improvements when changing block size.
> 
> The ARM manual section D5.10.1 specifically states under heading
> "Support levels for changing block size" that FEAT_BBM Level 2 support
> means changing block size does not break coherency, ordering
> guarantees, or uniprocessor semantics.

I'd like to have that sort of reference in the code itself (spelling
out the revision of the ARM ARM this is taken from, as this section is
in D8.14.2 in DDI0487J.a). I'd also like it to point out that this
only applies when the *output addresses* are the same.

> 
> Because a compare-and-exchange operation was used in the break phase
> to serialize access to the PTE, an analogous compare-and-exchange is
> introduced in the make phase so that serialization is preserved even
> when the break phase is skipped. Proper handling is added to account
> for the make function now having a way to fail.
> 
> Because the new pte may have different permissions than the old pte,
> the minimum necessary TLB invalidations are used.
> 
> Signed-off-by: Colton Lewis <coltonlewis@google.com>
> ---
>  arch/arm64/kvm/hyp/pgtable.c | 58 +++++++++++++++++++++++++++++++-----
>  1 file changed, 51 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
> index 8acab89080af9..6778e3df697f7 100644
> --- a/arch/arm64/kvm/hyp/pgtable.c
> +++ b/arch/arm64/kvm/hyp/pgtable.c
> @@ -643,6 +643,11 @@ static bool stage2_has_fwb(struct kvm_pgtable *pgt)
>  	return !(pgt->flags & KVM_PGTABLE_S2_NOFWB);
>  }
> 
> +static bool stage2_has_bbm_level2(void)
> +{
> +	return cpus_have_const_cap(ARM64_HAS_STAGE2_BBM2);

By the time we look at unmapping things from S2, the capabilities
should be finalised, so this should read cpus_have_final_cap()
instead.

> +}
> +
>  #define KVM_S2_MEMATTR(pgt, attr) PAGE_S2_MEMATTR(attr, stage2_has_fwb(pgt))
> 
>  static int stage2_set_prot_attr(struct kvm_pgtable *pgt, enum kvm_pgtable_prot prot,
> @@ -730,7 +735,7 @@ static bool stage2_try_set_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_
>   * @ctx: context of the visited pte.
>   * @mmu: stage-2 mmu
>   *
> - * Returns: true if the pte was successfully broken.
> + * Returns: true if the pte was successfully broken or there is no need.

No need of what? Why? The rationale should be captured in the comments
below.

>   *
>   * If the removed pte was valid, performs the necessary serialization and TLB
>   * invalidation for the old value. For counted ptes, drops the reference count
> @@ -750,6 +755,10 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
>  		return false;
>  	}
> 
> +	/* There is no need to break the pte. */
> +	if (stage2_has_bbm_level2())
> +		return true;
> +
>  	if (!stage2_try_set_pte(ctx, KVM_INVALID_PTE_LOCKED))
>  		return false;
> 
> @@ -771,16 +780,45 @@ static bool stage2_try_break_pte(const struct kvm_pgtable_visit_ctx *ctx,
>  	return true;
>  }
> 
> -static void stage2_make_pte(const struct kvm_pgtable_visit_ctx *ctx, kvm_pte_t new)
> +static bool stage2_pte_perms_equal(kvm_pte_t p1, kvm_pte_t p2)
> +{
> +	u64 perms1 = p1 & KVM_PGTABLE_PROT_RWX;
> +	u64 perms2 = p2 & KVM_PGTABLE_PROT_RWX;

Huh? The KVM_PGTABLE_PROT_* constants are part of an *enum*, and do
*not* represent the bit layout of the PTE.

How did you test this code?

> +
> +	return perms1 == perms2;
> +}
> +
> +/**
> + * stage2_try_make_pte() - Attempts to install a new pte.
> + *
> + * @ctx: context of the visited pte.
> + * @new: new pte to install
> + *
> + * Returns: true if the pte was successfully installed
> + *
> + * If the old pte had different permissions, perform appropriate TLB
> + * invalidation for the old value. For counted ptes, drops the
> + * reference count on the containing table page.
> + */
> +static bool stage2_try_make_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu, kvm_pte_t new)
>  {
>  	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> 
> -	WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> +	if (!stage2_has_bbm_level2())
> +		WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> +
> +	if (!stage2_try_set_pte(ctx, new))
> +		return false;
> +
> +	if (kvm_pte_table(ctx->old, ctx->level))
> +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> +	else if (kvm_pte_valid(ctx->old) && !stage2_pte_perms_equal(ctx->old, new))
> +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, mmu, ctx->addr, ctx->level);

Why a non-shareable invalidation? Nothing in this code captures the
rationale for it. What if the permission change was a *restriction* of
the permission? It should absolutely be global, and not local.

>
>  	if (stage2_pte_is_counted(new))
>  		mm_ops->get_page(ctx->ptep);
> 
> -	smp_store_release(ctx->ptep, new);
> +	return true;
>  }
> 
>  static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> @@ -879,7 +917,8 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
>  	    stage2_pte_executable(new))
>  		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> 
> -	stage2_make_pte(ctx, new);
> +	if (!stage2_try_make_pte(ctx, data->mmu, new))
> +		return -EAGAIN;

So we don't have forward-progress guarantees anymore? I'm not sure
this is a change I'm overly fond of.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 1/3] arm64: Add a capability for FEAT_BBM level 2
  2023-06-02 17:01 ` [PATCH 1/3] arm64: Add a capability for FEAT_BBM level 2 Colton Lewis
@ 2023-06-05 15:07   ` Robin Murphy
  0 siblings, 0 replies; 10+ messages in thread
From: Robin Murphy @ 2023-06-05 15:07 UTC (permalink / raw)
  To: Colton Lewis, kvm
  Cc: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, kvmarm, Ricardo Koller

On 2023-06-02 18:01, Colton Lewis wrote:
> From: Ricardo Koller <ricarkol@google.com>
> 
> Add a new capability to detect "Stage-2 Translation table
> break-before-make" (FEAT_BBM) level 2.

Why does this patch invent spurious "stage 2" references everywhere? The 
full name of FEAT_BBM is "Translation table break-before-make levels", 
and it is not specific to one stage of translation.

Thanks,
Robin.

> Signed-off-by: Ricardo Koller <ricarkol@google.com>
> ---
>   arch/arm64/kernel/cpufeature.c | 11 +++++++++++
>   arch/arm64/tools/cpucaps       |  1 +
>   2 files changed, 12 insertions(+)
> 
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index c331c49a7d19c..c538060f7f66b 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2455,6 +2455,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>   		.min_field_value = 1,
>   		.matches = has_cpuid_feature,
>   	},
> +	{
> +		.desc = "Stage-2 Translation table break-before-make level 2",
> +		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
> +		.capability = ARM64_HAS_STAGE2_BBM2,
> +		.sys_reg = SYS_ID_AA64MMFR2_EL1,
> +		.sign = FTR_UNSIGNED,
> +		.field_pos = ID_AA64MMFR2_EL1_BBM_SHIFT,
> +		.field_width = 4,
> +		.min_field_value = 2,
> +		.matches = has_cpuid_feature,
> +	},
>   	{
>   		.desc = "TLB range maintenance instructions",
>   		.capability = ARM64_HAS_TLB_RANGE,
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 40ba95472594d..010aca1892642 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -41,6 +41,7 @@ HAS_PAN
>   HAS_RAS_EXTN
>   HAS_RNG
>   HAS_SB
> +HAS_STAGE2_BBM2
>   HAS_STAGE2_FWB
>   HAS_TIDCP1
>   HAS_TLB_RANGE
> --
> 2.41.0.rc0.172.g3f132b7071-goog
> 


* Re: [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2
  2023-06-04  8:23   ` Marc Zyngier
@ 2023-06-05 21:36     ` Oliver Upton
  2023-06-08 17:21       ` Will Deacon
  0 siblings, 1 reply; 10+ messages in thread
From: Oliver Upton @ 2023-06-05 21:36 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Colton Lewis, kvm, Catalin Marinas, Will Deacon, James Morse,
	Suzuki K Poulose, Zenghui Yu, linux-arm-kernel, linux-kernel,
	kvmarm

On Sun, Jun 04, 2023 at 09:23:39AM +0100, Marc Zyngier wrote:
> On Fri, 02 Jun 2023 18:01:47 +0100, Colton Lewis <coltonlewis@google.com> wrote:
> > +static bool stage2_try_make_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu, kvm_pte_t new)
> >  {
> >  	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> > 
> > -	WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> > +	if (!stage2_has_bbm_level2())
> > +		WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> > +
> > +	if (!stage2_try_set_pte(ctx, new))
> > +		return false;
> > +
> > +	if (kvm_pte_table(ctx->old, ctx->level))
> > +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> > +	else if (kvm_pte_valid(ctx->old) && !stage2_pte_perms_equal(ctx->old, new))
> > +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, mmu, ctx->addr, ctx->level);
> 
> Why a non-shareable invalidation? Nothing in this code captures the
> rationale for it. What if the permission change was a *restriction* of
> the permission? It should absolutely be global, and not local.

IIRC, Colton was testing largely with permission relaxation, and had
forward-progress issues because the stale TLB entry was never
invalidated in response to a permission fault.

Nonetheless, I very much agree with your suggestion. Non-Shareable
invalidations should only be applied after exhausting all other
invalidation requirements for a particular manipulation to the stage-2
tables.

> >
> >  	if (stage2_pte_is_counted(new))
> >  		mm_ops->get_page(ctx->ptep);
> > 
> > -	smp_store_release(ctx->ptep, new);
> > +	return true;
> >  }
> > 
> >  static void stage2_put_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu,
> > @@ -879,7 +917,8 @@ static int stage2_map_walker_try_leaf(const struct kvm_pgtable_visit_ctx *ctx,
> >  	    stage2_pte_executable(new))
> >  		mm_ops->icache_inval_pou(kvm_pte_follow(new, mm_ops), granule);
> > 
> > -	stage2_make_pte(ctx, new);
> > +	if (!stage2_try_make_pte(ctx, data->mmu, new))
> > +		return -EAGAIN;
> 
> So we don't have forward-progress guarantees anymore? I'm not sure
> this is a change I'm overly fond of.

I'll take the blame for the clunky wording here, though I do not believe
there are any real changes to our forward progress guarantees relative to
the existing code.

Previously, we did the CAS on the break side of things to have a fault
handler 'take ownership' of a PTE. The CAS now needs to move onto the
make end when doing a BBM=2 style manipulation.

Would you rather see something explicitly keyed on the BBM capability
here? Then we could use a helper that implies unconditional success for
BBM!=2 systems.
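
Something like the following, perhaps (hypothetical helper name, just
to illustrate the shape):

	static bool stage2_make_pte_bbm2(const struct kvm_pgtable_visit_ctx *ctx,
					 kvm_pte_t new)
	{
		/*
		 * With BBM level 2 the PTE was never broken, so a
		 * concurrent walker can race us and the CAS may
		 * legitimately fail.
		 */
		return stage2_try_set_pte(ctx, new);
	}

with the BBM!=2 path keeping the unconditional smp_store_release()
under the locked-PTE WARN_ON.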

--
Thanks,
Oliver


* Re: [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2
  2023-06-05 21:36     ` Oliver Upton
@ 2023-06-08 17:21       ` Will Deacon
  2023-06-09 14:59         ` Oliver Upton
  0 siblings, 1 reply; 10+ messages in thread
From: Will Deacon @ 2023-06-08 17:21 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, Colton Lewis, kvm, Catalin Marinas, James Morse,
	Suzuki K Poulose, Zenghui Yu, linux-arm-kernel, linux-kernel,
	kvmarm

On Mon, Jun 05, 2023 at 02:36:00PM -0700, Oliver Upton wrote:
> On Sun, Jun 04, 2023 at 09:23:39AM +0100, Marc Zyngier wrote:
> > On Fri, 02 Jun 2023 18:01:47 +0100, Colton Lewis <coltonlewis@google.com> wrote:
> > > +static bool stage2_try_make_pte(const struct kvm_pgtable_visit_ctx *ctx, struct kvm_s2_mmu *mmu, kvm_pte_t new)
> > >  {
> > >  	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
> > > 
> > > -	WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> > > +	if (!stage2_has_bbm_level2())
> > > +		WARN_ON(!stage2_pte_is_locked(*ctx->ptep));
> > > +
> > > +	if (!stage2_try_set_pte(ctx, new))
> > > +		return false;
> > > +
> > > +	if (kvm_pte_table(ctx->old, ctx->level))
> > > +		kvm_call_hyp(__kvm_tlb_flush_vmid, mmu);
> > > +	else if (kvm_pte_valid(ctx->old) && !stage2_pte_perms_equal(ctx->old, new))
> > > +		kvm_call_hyp(__kvm_tlb_flush_vmid_ipa_nsh, mmu, ctx->addr, ctx->level);
> > 
> > Why a non-shareable invalidation? Nothing in this code captures the
> > rationale for it. What if the permission change was a *restriction* of
> > the permission? It should absolutely be global, and not local.
> 
> IIRC, Colton was testing largely with permission relaxation, and had
> forward progress issues b.c. the stale TLB entry was never invalidated
> in response to a permission fault.

Would the series at:

https://lore.kernel.org/r/5d8e1f752051173d2d1b5c3e14b54eb3506ed3ef.1684892404.git-series.apopple@nvidia.com

help with that?

Will


* Re: [PATCH 3/3] KVM: arm64: Skip break phase when we have FEAT_BBM level 2
  2023-06-08 17:21       ` Will Deacon
@ 2023-06-09 14:59         ` Oliver Upton
  0 siblings, 0 replies; 10+ messages in thread
From: Oliver Upton @ 2023-06-09 14:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, Colton Lewis, kvm, Catalin Marinas, James Morse,
	Suzuki K Poulose, Zenghui Yu, linux-arm-kernel, linux-kernel,
	kvmarm

Hey Will,

On Thu, Jun 08, 2023 at 06:21:13PM +0100, Will Deacon wrote:
> > IIRC, Colton was testing largely with permission relaxation, and had
> > forward-progress issues because the stale TLB entry was never
> > invalidated in response to a permission fault.
> 
> Would the series at:
> 
> https://lore.kernel.org/r/5d8e1f752051173d2d1b5c3e14b54eb3506ed3ef.1684892404.git-series.apopple@nvidia.com
> 
> help with that?

Heh, that's a rather interesting patch :)

I don't think it is directly related to the problem Colton encounters,
though the symptoms are similar. This crops up when KVM uses a stricter
permission set than the primary MMU, like lazy X for deferred I$
maintenance and write-protection for dirty logging. KVM policy led to
the stale TLB entry, so KVM is the one that needs to initiate the
invalidation.

-- 
Thanks,
Oliver


* Re: [PATCH 2/3] KVM: arm64: Clear possible conflict aborts
  2023-06-02 17:01 ` [PATCH 2/3] KVM: arm64: Clear possible conflict aborts Colton Lewis
@ 2023-06-09 15:44   ` Oliver Upton
  0 siblings, 0 replies; 10+ messages in thread
From: Oliver Upton @ 2023-06-09 15:44 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvm, Catalin Marinas, Will Deacon, Marc Zyngier, James Morse,
	Suzuki K Poulose, Zenghui Yu, linux-arm-kernel, linux-kernel,
	kvmarm

On Fri, Jun 02, 2023 at 05:01:46PM +0000, Colton Lewis wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 7a68398517c95..96b950f20c8d0 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1591,6 +1591,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>  		return 1;
>  	}
> 
> +	if (fault_status == ESR_ELx_FSC_CONFLICT) {
> +		/* We could be at any level. 0 covers all levels. */
> +		__kvm_tlb_flush_vmid_ipa(vcpu->arch.hw_mmu, fault_ipa, 0);
> +		return 1;
> +	}
> +

This does not match the architecture. Please read DDI0487J D8.14.3
"TLB maintenance due to TLB conflict", which tells you exactly how to
resolve the conflict. TL;DR: TLBI by address is _not_ guaranteed to
invalidate duplicate TLB entries. vmalls12e1 is your friend.

The conflicting TLB entries are local to the CPU that took the abort, so
you don't need to do any broadcast.
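
A minimal sketch of that local invalidation (assuming the guest's
VMID is current on this CPU; real code would go through a hyp
helper):

	__tlbi(vmalls12e1);	/* all stage 1+2 entries for the VMID, this PE only */
	dsb(nsh);
	isb();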

--
Thanks,
Oliver

