All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
@ 2023-01-09 15:19 Gabriel Krisman Bertazi
  2023-01-11  8:34 ` Anshuman Khandual
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-09 15:19 UTC (permalink / raw)
  To: catalin.marinas, will; +Cc: linux-arm-kernel, broonie, Gabriel Krisman Bertazi

Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
in the hypervisor.  In fact, ARM documentation mentions some feature
registers are not supposed to be accessed frequently by the OS, and
therefore should be emulated for guests [1].

Commit 0388f9c74330 ("arm64: mm: Implement
arch_wants_old_prefaulted_pte()") introduced a read of this register in
the page fault path.  But, even when the feature of setting faultaround
pages with the old flag is disabled for a given cpu, we are still paying
the cost of checking the register on every pagefault. This results in an
explosion of vmexit events in KVM guests, which directly impacts the
performance of virtualized workloads.  For instance, running kernbench
yields a 15% increase in system time solely due to the increased vmexit
cycles.

This patch avoids the extra cost by using the sanitized cached value.
It should be safe to do so, since this register mustn't change for a
given cpu.

[1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>

---
Changes since v1:
  - Rely on read_sanitised_ftr_reg instead of static var (will)
---
 arch/arm64/include/asm/cpufeature.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 03d1c9d7af82..3d1a5677321e 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
 	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
 		return false;
 
-	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
+	/*
+	 * Use cached version to avoid emulated msr operation on KVM
+	 * guests.
+	 */
+	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
 	return cpuid_feature_extract_unsigned_field(mmfr1,
 						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
 }
-- 
2.35.3


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
@ 2023-01-11  8:34 ` Anshuman Khandual
  2023-01-11 13:31   ` Gabriel Krisman Bertazi
  2023-01-12  3:05 ` Anshuman Khandual
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Anshuman Khandual @ 2023-01-11  8:34 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi, catalin.marinas, will; +Cc: linux-arm-kernel, broonie



On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor.  In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].

I am just curious, is this the only system register access (AA64MMFR1_EL1)
causing such performance problems ?

> 
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path.  But, even when the feature of setting faultaround

Right, although cpu_has_hw_af() was added earlier via commit 47d7b15b88f9
("arm64: cpufeature: introduce helper cpu_has_hw_af()"), but above commit
did add this on regular page fault path via do_set_pte().


> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an

Right, accessing a system register on each page fault is not optimal.

> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads.  For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
> 
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
> 
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
> 
> ---
> Changes since v1:
>   - Rely on read_sanitised_ftr_reg instead of static var (will)
> ---
>  arch/arm64/include/asm/cpufeature.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
>  	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>  		return false;
>  
> -	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> +	/*
> +	 * Use cached version to avoid emulated msr operation on KVM
> +	 * guests.
> +	 */
> +	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
>  	return cpuid_feature_extract_unsigned_field(mmfr1,
>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>  }

LGTM but as mentioned earlier, are there not other similar instances or this
is just more problematic being on direct page fault path ? 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-11  8:34 ` Anshuman Khandual
@ 2023-01-11 13:31   ` Gabriel Krisman Bertazi
  2023-01-12  2:57     ` Anshuman Khandual
  0 siblings, 1 reply; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-11 13:31 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: catalin.marinas, will, linux-arm-kernel, broonie

Anshuman Khandual <anshuman.khandual@arm.com> writes:

> On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
>> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
>> in the hypervisor.  In fact, ARM documentation mentions some feature
>> registers are not supposed to be accessed frequently by the OS, and
>> therefore should be emulated for guests [1].
>
> I am just curious, is this the only system register access (AA64MMFR1_EL1)
> causing such performance problems ?

I haven't audited all the system registers.  For AA64MMFR1_EL1 this is
the only instance where the frequency of access affects performance in a
meaningful way for my workload.

I have a real-world bug report about it, and by profiling vm exit
events, I can also argue this is the only instance of any emulated msr
read/write that happens frequently enough to change the order of
magnitude of exit events measured by perf for my workload between, at
least, 5.4 (it was introduced in v5.12, but I have data back to 5.4) and
mainline.

>> Commit 0388f9c74330 ("arm64: mm: Implement
>> arch_wants_old_prefaulted_pte()") introduced a read of this register in
>> the page fault path.  But, even when the feature of setting faultaround
>
> Right, although cpu_has_hw_af() was added earlier via commit 47d7b15b88f9
> ("arm64: cpufeature: introduce helper cpu_has_hw_af()"), but above commit
> did add this on regular page fault path via do_set_pte().

Indeed.  The only other usage of this function is in wp_page_copy, and,
from what I can tell, it is in an unlikely() branch when COW is being
performed on a page that was recently unmapped.  It is not something
frequent enough that I saw in profiling.

>>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>>  }
>
> LGTM but as mentioned earlier, are there not other similar instances or this
> is just more problematic being on direct page fault path ?

I think a full audit of the emulated system registers in kvm will be
required to definitely answer it.  But this instance is, by far, the hottest
case in the codebase.

-- 
Gabriel Krisman Bertazi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-11 13:31   ` Gabriel Krisman Bertazi
@ 2023-01-12  2:57     ` Anshuman Khandual
  0 siblings, 0 replies; 12+ messages in thread
From: Anshuman Khandual @ 2023-01-12  2:57 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi, Anshuman Khandual
  Cc: catalin.marinas, will, linux-arm-kernel, broonie



On 1/11/23 19:01, Gabriel Krisman Bertazi wrote:
> Anshuman Khandual <anshuman.khandual@arm.com> writes:
> 
>> On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
>>> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
>>> in the hypervisor.  In fact, ARM documentation mentions some feature
>>> registers are not supposed to be accessed frequently by the OS, and
>>> therefore should be emulated for guests [1].
>>
>> I am just curious, is this the only system register access (AA64MMFR1_EL1)
>> causing such performance problems ?
> 
> I haven't audited all the system registers.  For AA64MMFR1_EL1 this is
> the only instance where the frequency of access affects performance in a
> meaningful way for my workload.
> 
> I have a real-world bug report about it, and by profiling vm exit
> events, I can also argue this is the only instance of any emulated msr
> read/write that happens frequently enough to change the order of
> magnitude of exit events measured by perf for my workload between, at
> least, 5.4 (it was introduced in v5.12, but I have data back to 5.4) and
> mainline.
> 
>>> Commit 0388f9c74330 ("arm64: mm: Implement
>>> arch_wants_old_prefaulted_pte()") introduced a read of this register in
>>> the page fault path.  But, even when the feature of setting faultaround
>>
>> Right, although cpu_has_hw_af() was added earlier via commit 47d7b15b88f9
>> ("arm64: cpufeature: introduce helper cpu_has_hw_af()"), but above commit
>> did add this on regular page fault path via do_set_pte().
> 
> Indeed.  The only other usage of this function is in wp_page_copy, and,
> from what I can tell, it is in an unlikely() branch when COW is being
> performed on a page that was recently unmapped.  It is not something
> frequent enough that I saw in profiling.
> 
>>>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>>>  }
>>
>> LGTM but as mentioned earlier, are there not other similar instances or this
>> is just more problematic being on direct page fault path ?
> 
> I think a full audit of the emulated system registers in kvm will be
> required to definitely answer it.  But this instance is, by far, the hottest
> case in the codebase.
> 

Thanks for the additional details.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
  2023-01-11  8:34 ` Anshuman Khandual
@ 2023-01-12  3:05 ` Anshuman Khandual
  2023-01-16 20:41 ` Gabriel Krisman Bertazi
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Anshuman Khandual @ 2023-01-12  3:05 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi, catalin.marinas, will; +Cc: linux-arm-kernel, broonie



On 1/9/23 20:49, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor.  In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
> 
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path.  But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads.  For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
> 
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
> 
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31

A small nit. This URL length causes a checkpatch.pl warning.

WARNING: Possible unwrapped commit description (prefer a maximum 75 chars per line)

Just wondering, if this URL can be replaced with a document identification
number or something similar for a more cleaner commit message, not sure if
such a thing is even feasible.

> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
> 
> ---
> Changes since v1:
>   - Rely on read_sanitised_ftr_reg instead of static var (will)
> ---
>  arch/arm64/include/asm/cpufeature.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
>  	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>  		return false;
>  
> -	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> +	/*
> +	 * Use cached version to avoid emulated msr operation on KVM
> +	 * guests.
> +	 */
> +	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
>  	return cpuid_feature_extract_unsigned_field(mmfr1,
>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>  }

FWIW

Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
  2023-01-11  8:34 ` Anshuman Khandual
  2023-01-12  3:05 ` Anshuman Khandual
@ 2023-01-16 20:41 ` Gabriel Krisman Bertazi
  2023-01-17 16:14   ` Will Deacon
  2023-01-18 15:44 ` Catalin Marinas
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-16 20:41 UTC (permalink / raw)
  To: catalin.marinas; +Cc: will, linux-arm-kernel, broonie

Gabriel Krisman Bertazi <krisman@suse.de> writes:

> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor.  In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
>
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path.  But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads.  For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
>
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
>
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>

Hi,

Considering the performance impact on kvm guests, unless someone
opposes, can we get this queued already for -rc5?

-- 
Gabriel Krisman Bertazi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-16 20:41 ` Gabriel Krisman Bertazi
@ 2023-01-17 16:14   ` Will Deacon
  2023-01-18 16:08     ` Gabriel Krisman Bertazi
  0 siblings, 1 reply; 12+ messages in thread
From: Will Deacon @ 2023-01-17 16:14 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: catalin.marinas, linux-arm-kernel, broonie

On Mon, Jan 16, 2023 at 05:41:35PM -0300, Gabriel Krisman Bertazi wrote:
> Gabriel Krisman Bertazi <krisman@suse.de> writes:
> 
> > Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> > in the hypervisor.  In fact, ARM documentation mentions some feature
> > registers are not supposed to be accessed frequently by the OS, and
> > therefore should be emulated for guests [1].
> >
> > Commit 0388f9c74330 ("arm64: mm: Implement
> > arch_wants_old_prefaulted_pte()") introduced a read of this register in
> > the page fault path.  But, even when the feature of setting faultaround
> > pages with the old flag is disabled for a given cpu, we are still paying
> > the cost of checking the register on every pagefault. This results in an
> > explosion of vmexit events in KVM guests, which directly impacts the
> > performance of virtualized workloads.  For instance, running kernbench
> > yields a 15% increase in system time solely due to the increased vmexit
> > cycles.
> >
> > This patch avoids the extra cost by using the sanitized cached value.
> > It should be safe to do so, since this register mustn't change for a
> > given cpu.
> >
> > [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
> >
> > Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
> 
> Hi,
> 
> Considering the performance impact on kvm guests, unless someone
> opposes, can we get this queued already for -rc5?

Given this has been the case since v5.12 afaict and it's not a correctness
issue, I was thinking we could queue this for 6.3?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
                   ` (2 preceding siblings ...)
  2023-01-16 20:41 ` Gabriel Krisman Bertazi
@ 2023-01-18 15:44 ` Catalin Marinas
  2023-01-18 16:06   ` Gabriel Krisman Bertazi
  2023-01-18 16:38 ` Will Deacon
  2023-01-20 16:59 ` Catalin Marinas
  5 siblings, 1 reply; 12+ messages in thread
From: Catalin Marinas @ 2023-01-18 15:44 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: will, linux-arm-kernel, broonie

On Mon, Jan 09, 2023 at 12:19:55PM -0300, Gabriel Krisman Bertazi wrote:
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
>  	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>  		return false;
>  
> -	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> +	/*
> +	 * Use cached version to avoid emulated msr operation on KVM
> +	 * guests.
> +	 */
> +	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
>  	return cpuid_feature_extract_unsigned_field(mmfr1,
>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>  }

You can probably make it slightly faster if you add a cpucap feature. We
have ARM64_HW_DBM but not ARM64_HW_AF. Not sure you'd notice a
difference in practice though.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-18 15:44 ` Catalin Marinas
@ 2023-01-18 16:06   ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-18 16:06 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: will, linux-arm-kernel, broonie

Catalin Marinas <catalin.marinas@arm.com> writes:

> On Mon, Jan 09, 2023 at 12:19:55PM -0300, Gabriel Krisman Bertazi wrote:
>> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
>> index 03d1c9d7af82..3d1a5677321e 100644
>> --- a/arch/arm64/include/asm/cpufeature.h
>> +++ b/arch/arm64/include/asm/cpufeature.h
>> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
>>  	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>>  		return false;
>>  
>> -	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
>> +	/*
>> +	 * Use cached version to avoid emulated msr operation on KVM
>> +	 * guests.
>> +	 */
>> +	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
>>  	return cpuid_feature_extract_unsigned_field(mmfr1,
>>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>>  }
>
> You can probably make it slightly faster if you add a cpucap feature. We
> have ARM64_HW_DBM but not ARM64_HW_AF. Not sure you'd notice a
> difference in practice though.

Hi Catalin,

I had a patch for that, but when I noticed the existence of
ARM64_HW_DBM, I dropped it. it is mentioned quickly in the v1.

From my measurements, when comparing this version to a static variable
caching the value (patch v1), the bsearch performance impact was not
significant.  I think it would be fine to keep this version.

but, of course, I can add the cpucap bit if it is preferred.  Please,
let me know.

Thanks,

-- 
Gabriel Krisman Bertazi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-17 16:14   ` Will Deacon
@ 2023-01-18 16:08     ` Gabriel Krisman Bertazi
  0 siblings, 0 replies; 12+ messages in thread
From: Gabriel Krisman Bertazi @ 2023-01-18 16:08 UTC (permalink / raw)
  To: Will Deacon; +Cc: catalin.marinas, linux-arm-kernel, broonie

Will Deacon <will@kernel.org> writes:

> On Mon, Jan 16, 2023 at 05:41:35PM -0300, Gabriel Krisman Bertazi wrote:
>> 
>> Considering the performance impact on kvm guests, unless someone
>> opposes, can we get this queued already for -rc5?
>
> Given this has been the case since v5.12 afaict and it's not a correctness
> issue, I was thinking we could queue this for 6.3?

hey Will,

On a second thought, I'm fine with having it only in 6.3. :)


-- 
Gabriel Krisman Bertazi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
                   ` (3 preceding siblings ...)
  2023-01-18 15:44 ` Catalin Marinas
@ 2023-01-18 16:38 ` Will Deacon
  2023-01-20 16:59 ` Catalin Marinas
  5 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2023-01-18 16:38 UTC (permalink / raw)
  To: Gabriel Krisman Bertazi; +Cc: catalin.marinas, linux-arm-kernel, broonie

On Mon, Jan 09, 2023 at 12:19:55PM -0300, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor.  In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
> 
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path.  But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads.  For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
> 
> This patch avoids the extra cost by using the sanitized cached value.
> It should be safe to do so, since this register mustn't change for a
> given cpu.
> 
> [1] https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Learn%20the%20Architecture/Armv8-A%20virtualization.pdf?revision=a765a7df-1a00-434d-b241-357bfda2dd31
> 
> Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
> 
> ---
> Changes since v1:
>   - Rely on read_sanitised_ftr_reg instead of static var (will)
> ---
>  arch/arm64/include/asm/cpufeature.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
> index 03d1c9d7af82..3d1a5677321e 100644
> --- a/arch/arm64/include/asm/cpufeature.h
> +++ b/arch/arm64/include/asm/cpufeature.h
> @@ -864,7 +864,11 @@ static inline bool cpu_has_hw_af(void)
>  	if (!IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
>  		return false;
>  
> -	mmfr1 = read_cpuid(ID_AA64MMFR1_EL1);
> +	/*
> +	 * Use cached version to avoid emulated msr operation on KVM
> +	 * guests.
> +	 */
> +	mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
>  	return cpuid_feature_extract_unsigned_field(mmfr1,
>  						ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
>  }

Acked-by: Will Deacon <will@kernel.org>

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
  2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
                   ` (4 preceding siblings ...)
  2023-01-18 16:38 ` Will Deacon
@ 2023-01-20 16:59 ` Catalin Marinas
  5 siblings, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2023-01-20 16:59 UTC (permalink / raw)
  To: will, Gabriel Krisman Bertazi; +Cc: linux-arm-kernel, broonie

On Mon, 09 Jan 2023 12:19:55 -0300, Gabriel Krisman Bertazi wrote:
> Accessing AA64MMFR1_EL1 is expensive in KVM guests, since it is emulated
> in the hypervisor.  In fact, ARM documentation mentions some feature
> registers are not supposed to be accessed frequently by the OS, and
> therefore should be emulated for guests [1].
> 
> Commit 0388f9c74330 ("arm64: mm: Implement
> arch_wants_old_prefaulted_pte()") introduced a read of this register in
> the page fault path.  But, even when the feature of setting faultaround
> pages with the old flag is disabled for a given cpu, we are still paying
> the cost of checking the register on every pagefault. This results in an
> explosion of vmexit events in KVM guests, which directly impacts the
> performance of virtualized workloads.  For instance, running kernbench
> yields a 15% increase in system time solely due to the increased vmexit
> cycles.
> 
> [...]

Applied to arm64 (for-next/misc), thanks!

[1/1] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
      https://git.kernel.org/arm64/c/a89c6bcdac22

-- 
Catalin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-01-20 17:00 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-09 15:19 [PATCH v2] arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path Gabriel Krisman Bertazi
2023-01-11  8:34 ` Anshuman Khandual
2023-01-11 13:31   ` Gabriel Krisman Bertazi
2023-01-12  2:57     ` Anshuman Khandual
2023-01-12  3:05 ` Anshuman Khandual
2023-01-16 20:41 ` Gabriel Krisman Bertazi
2023-01-17 16:14   ` Will Deacon
2023-01-18 16:08     ` Gabriel Krisman Bertazi
2023-01-18 15:44 ` Catalin Marinas
2023-01-18 16:06   ` Gabriel Krisman Bertazi
2023-01-18 16:38 ` Will Deacon
2023-01-20 16:59 ` Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.