linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1] arch: Enable function alignment for arm64
@ 2022-12-08  5:36 Mina Almasry
  2022-12-08  6:47 ` Mina Almasry
  2023-01-24 12:09 ` Will Deacon
  0 siblings, 2 replies; 7+ messages in thread
From: Mina Almasry @ 2022-12-08  5:36 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Mina Almasry, Peter Zijlstra, linux-arm-kernel, linux-kernel

We recently ran into a double-digit percentage hackbench regression
when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
before decrementing h->resv_huge_pages") to an older kernel. This was
surprising since hackbench does use hugetlb pages at all and the
modified code is not invoked. After some debugging we found that the
regression can be fixed by back-porting commit d49a0626216b ("arch:
Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
for arm64. I suggest enabling it by default for arm64 if possible.

Tested by examing function alignment on a compiled object file
before/after:

After this patch:

$ ~/is-aligned.sh mm/hugetlb.o 16
file=mm/hugetlb.o, alignment=16
total number of functions: 146
total number of unaligned: 0

Before this patch:

$ ~/is-aligned.sh mm/hugetlb.o 16
file=mm/hugetlb.o, alignment=16
total number of functions: 146
total number of unaligned: 94

Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cf6d1cd8b6dc..bcc9e1578937 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -235,6 +235,7 @@ config ARM64
 	select TRACE_IRQFLAGS_SUPPORT
 	select TRACE_IRQFLAGS_NMI_SUPPORT
 	select HAVE_SOFTIRQ_ON_OWN_STACK
+	select FUNCTION_ALIGNMENT_16B
 	help
 	  ARM 64-bit (AArch64) Linux support.

--
2.39.0.rc0.267.gcb52ba06e7-goog

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] arch: Enable function alignment for arm64
  2022-12-08  5:36 [PATCH v1] arch: Enable function alignment for arm64 Mina Almasry
@ 2022-12-08  6:47 ` Mina Almasry
  2023-01-24 12:09 ` Will Deacon
  1 sibling, 0 replies; 7+ messages in thread
From: Mina Almasry @ 2022-12-08  6:47 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon
  Cc: Peter Zijlstra, linux-arm-kernel, linux-kernel

On Wed, Dec 7, 2022 at 9:36 PM Mina Almasry <almasrymina@google.com> wrote:
>
> We recently ran into a double-digit percentage hackbench regression
> when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> before decrementing h->resv_huge_pages") to an older kernel. This was
> surprising since hackbench does use hugetlb pages at all and the
> modified code is not invoked. After some debugging we found that the
> regression can be fixed by back-porting commit d49a0626216b ("arch:
> Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> for arm64. I suggest enabling it by default for arm64 if possible.
>
> Tested by examing function alignment on a compiled object file
> before/after:
>
> After this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 0
>
> Before this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 94
>
> Cc: Peter Zijlstra <peterz@infradead.org>

I missed the Signed-off-by: Mina Almasry <almasrymina@google.com> tag
here. I can add with the next version.

> ---
>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cf6d1cd8b6dc..bcc9e1578937 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -235,6 +235,7 @@ config ARM64
>         select TRACE_IRQFLAGS_SUPPORT
>         select TRACE_IRQFLAGS_NMI_SUPPORT
>         select HAVE_SOFTIRQ_ON_OWN_STACK
> +       select FUNCTION_ALIGNMENT_16B
>         help
>           ARM 64-bit (AArch64) Linux support.
>
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] arch: Enable function alignment for arm64
  2022-12-08  5:36 [PATCH v1] arch: Enable function alignment for arm64 Mina Almasry
  2022-12-08  6:47 ` Mina Almasry
@ 2023-01-24 12:09 ` Will Deacon
  2023-01-25 11:16   ` David Laight
  2023-02-22 22:08   ` Mina Almasry
  1 sibling, 2 replies; 7+ messages in thread
From: Will Deacon @ 2023-01-24 12:09 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Catalin Marinas, Peter Zijlstra, linux-arm-kernel, linux-kernel

On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> We recently ran into a double-digit percentage hackbench regression
> when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> before decrementing h->resv_huge_pages") to an older kernel. This was
> surprising since hackbench does use hugetlb pages at all and the
> modified code is not invoked. After some debugging we found that the
> regression can be fixed by back-porting commit d49a0626216b ("arch:
> Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> for arm64. I suggest enabling it by default for arm64 if possible.
> 
> Tested by examing function alignment on a compiled object file
> before/after:
> 
> After this patch:
> 
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 0
> 
> Before this patch:
> 
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 94
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  arch/arm64/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cf6d1cd8b6dc..bcc9e1578937 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -235,6 +235,7 @@ config ARM64
>  	select TRACE_IRQFLAGS_SUPPORT
>  	select TRACE_IRQFLAGS_NMI_SUPPORT
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
> +	select FUNCTION_ALIGNMENT_16B
>  	help
>  	  ARM 64-bit (AArch64) Linux support.

This increases the size of .text for a defconfig build by ~2%, so I think it
would be nice to have some real numbers for the performance uplift. Are you
able to elaborate beyond "double-digit percentage hackbench regression"?

In general, however, I'm supportive of the patch (and it seems that x86
does the same thing) so:

Acked-by: Will Deacon <will@kernel.org>

Will

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH v1] arch: Enable function alignment for arm64
  2023-01-24 12:09 ` Will Deacon
@ 2023-01-25 11:16   ` David Laight
  2023-02-22 22:16     ` Mina Almasry
  2023-02-22 22:08   ` Mina Almasry
  1 sibling, 1 reply; 7+ messages in thread
From: David Laight @ 2023-01-25 11:16 UTC (permalink / raw)
  To: 'Will Deacon', Mina Almasry
  Cc: Catalin Marinas, Peter Zijlstra, linux-arm-kernel, linux-kernel

From: Will Deacon <will@kernel.org>
> Sent: 24 January 2023 12:09
> 
> On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > We recently ran into a double-digit percentage hackbench regression
> > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > before decrementing h->resv_huge_pages") to an older kernel. This was
> > surprising since hackbench does use hugetlb pages at all and the
> > modified code is not invoked. After some debugging we found that the
> > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > for arm64. I suggest enabling it by default for arm64 if possible.
> >
...
> 
> This increases the size of .text for a defconfig build by ~2%, so I think it
> would be nice to have some real numbers for the performance uplift. Are you
> able to elaborate beyond "double-digit percentage hackbench regression"?
> 
> In general, however, I'm supportive of the patch (and it seems that x86
> does the same thing) so:

I bet it just changes the alignment of the code so that more
functions are using different cache lines.

All sorts of other random changes are likely to have a similar effect.

Cache-line aligning the start of a function probably reduces the
number of cache lines the functions needs - but that isn't guaranteed.
It also slightly reduces the delay on a cache miss - but they are so
slow it probably makes almost no difference.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] arch: Enable function alignment for arm64
  2023-01-24 12:09 ` Will Deacon
  2023-01-25 11:16   ` David Laight
@ 2023-02-22 22:08   ` Mina Almasry
  1 sibling, 0 replies; 7+ messages in thread
From: Mina Almasry @ 2023-02-22 22:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, Peter Zijlstra, linux-arm-kernel, linux-kernel

On Tue, Jan 24, 2023 at 4:09 AM Will Deacon <will@kernel.org> wrote:
>
> On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > We recently ran into a double-digit percentage hackbench regression
> > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > before decrementing h->resv_huge_pages") to an older kernel. This was
> > surprising since hackbench does use hugetlb pages at all and the
> > modified code is not invoked. After some debugging we found that the
> > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > for arm64. I suggest enabling it by default for arm64 if possible.
> >
> > Tested by examing function alignment on a compiled object file
> > before/after:
> >
> > After this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 0
> >
> > Before this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 94
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> >  arch/arm64/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index cf6d1cd8b6dc..bcc9e1578937 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -235,6 +235,7 @@ config ARM64
> >       select TRACE_IRQFLAGS_SUPPORT
> >       select TRACE_IRQFLAGS_NMI_SUPPORT
> >       select HAVE_SOFTIRQ_ON_OWN_STACK
> > +     select FUNCTION_ALIGNMENT_16B
> >       help
> >         ARM 64-bit (AArch64) Linux support.
>
> This increases the size of .text for a defconfig build by ~2%, so I think it
> would be nice to have some real numbers for the performance uplift. Are you
> able to elaborate beyond "double-digit percentage hackbench regression"?
>

(Sorry for the late reply)

The full story is already in the commit message. The only details I
omitted are the exact regression numbers we saw:

-16% on hackbench_process_pipes_234 (which should be `hackbench -pipe
234 process 1000`)
-23% on hackbench_process_sockets_234 (which should be `hackbnech 234
process 1000`)

Like the commit message says it doesn't make much sense that
cherry-picking 12df140f0bdf ("mm,hugetlb: take hugetlb_lock before
decrementing h->resv_huge_pages") to our kernel causes such a huge
regression, because hackbench doesn't use hugetlb at all.

> In general, however, I'm supportive of the patch (and it seems that x86
> does the same thing) so:
>
> Acked-by: Will Deacon <will@kernel.org>
>
> Will

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1] arch: Enable function alignment for arm64
  2023-01-25 11:16   ` David Laight
@ 2023-02-22 22:16     ` Mina Almasry
  2023-02-22 22:42       ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Mina Almasry @ 2023-02-22 22:16 UTC (permalink / raw)
  To: David Laight
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, linux-arm-kernel,
	linux-kernel

On Wed, Jan 25, 2023 at 3:16 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Will Deacon <will@kernel.org>
> > Sent: 24 January 2023 12:09
> >
> > On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > > We recently ran into a double-digit percentage hackbench regression
> > > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > > before decrementing h->resv_huge_pages") to an older kernel. This was
> > > surprising since hackbench does use hugetlb pages at all and the
> > > modified code is not invoked. After some debugging we found that the
> > > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > > for arm64. I suggest enabling it by default for arm64 if possible.
> > >
> ...
> >
> > This increases the size of .text for a defconfig build by ~2%, so I think it
> > would be nice to have some real numbers for the performance uplift. Are you
> > able to elaborate beyond "double-digit percentage hackbench regression"?
> >
> > In general, however, I'm supportive of the patch (and it seems that x86
> > does the same thing) so:
>
> I bet it just changes the alignment of the code so that more
> functions are using different cache lines.
>
> All sorts of other random changes are likely to have a similar effect.
>
> Cache-line aligning the start of a function probably reduces the
> number of cache lines the functions needs - but that isn't guaranteed.
> It also slightly reduces the delay on a cache miss - but they are so
> slow it probably makes almost no difference.
>

David, my understanding is similar to yours. I.e. without explicit alignment:

1. Random changes to the code can cause critical path functions to
become aligned or unaligned which will cause perf
regressions/improvements.
2. Random changes to the code can cause critical path functions to be
placed near a cache line boundary, causing one more cache line to be
loaded when they are run, which will cause perf regressions.

So for these very reasons function alignment is a good thing.

>         David
>
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH v1] arch: Enable function alignment for arm64
  2023-02-22 22:16     ` Mina Almasry
@ 2023-02-22 22:42       ` David Laight
  0 siblings, 0 replies; 7+ messages in thread
From: David Laight @ 2023-02-22 22:42 UTC (permalink / raw)
  To: 'Mina Almasry'
  Cc: Will Deacon, Catalin Marinas, Peter Zijlstra, linux-arm-kernel,
	linux-kernel

From: Mina Almasry
> Sent: 22 February 2023 22:16
> 
> On Wed, Jan 25, 2023 at 3:16 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Will Deacon <will@kernel.org>
> > > Sent: 24 January 2023 12:09
> > >
> > > On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > > > We recently ran into a double-digit percentage hackbench regression
> > > > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > > > before decrementing h->resv_huge_pages") to an older kernel. This was
> > > > surprising since hackbench does use hugetlb pages at all and the
> > > > modified code is not invoked. After some debugging we found that the
> > > > regression can be fixed by back-porting commit d49a0626216b ("arch:
> > > > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > > > for arm64. I suggest enabling it by default for arm64 if possible.
> > > >
> > ...
> > >
> > > This increases the size of .text for a defconfig build by ~2%, so I think it
> > > would be nice to have some real numbers for the performance uplift. Are you
> > > able to elaborate beyond "double-digit percentage hackbench regression"?
> > >
> > > In general, however, I'm supportive of the patch (and it seems that x86
> > > does the same thing) so:
> >
> > I bet it just changes the alignment of the code so that more
> > functions are using different cache lines.
> >
> > All sorts of other random changes are likely to have a similar effect.
> >
> > Cache-line aligning the start of a function probably reduces the
> > number of cache lines the functions needs - but that isn't guaranteed.
> > It also slightly reduces the delay on a cache miss - but they are so
> > slow it probably makes almost no difference.
> >
> 
> David, my understanding is similar to yours. I.e. without explicit alignment:
> 
> 1. Random changes to the code can cause critical path functions to
> become aligned or unaligned which will cause perf
> regressions/improvements.
> 2. Random changes to the code can cause critical path functions to be
> placed near a cache line boundary, causing one more cache line to be
> loaded when they are run, which will cause perf regressions.
> 
> So for these very reasons function alignment is a good thing.

Except that aligning functions doesn't necessarily improve things.

Even within a function the alignment of the top of a loop (that
is executed a lot) might matter more than the alignment of the
function itself.

Any change will affect which code 'share' cache lines so can
stop the working set of some test (or code loop with deep
function calls) from fitting in the I-cache so making it
much slower.

Changing the size also affects where the TLB boundaries are
(especially if not using very large pages).
If the number of pages exceeds the number of TLB things
will slow down.

I think one version of gcc used to align most labels.
While the code might be slightly faster, the bloat actually
made it a net loss.

So aligning functions might help, but it might just
make things worse.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-02-22 22:42 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-12-08  5:36 [PATCH v1] arch: Enable function alignment for arm64 Mina Almasry
2022-12-08  6:47 ` Mina Almasry
2023-01-24 12:09 ` Will Deacon
2023-01-25 11:16   ` David Laight
2023-02-22 22:16     ` Mina Almasry
2023-02-22 22:42       ` David Laight
2023-02-22 22:08   ` Mina Almasry

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).