linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
From: Mark Rutland <mark.rutland@arm.com>
To: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: linux-arm-kernel@lists.infradead.org, catalin.marinas@arm.com,
	lenb@kernel.org, linux-acpi@vger.kernel.org,
	linux-kernel@vger.kernel.org, mhiramat@kernel.org,
	ndesaulniers@google.com, ojeda@kernel.org, peterz@infradead.org,
	rafael.j.wysocki@intel.com, revest@chromium.org,
	robert.moore@intel.com, rostedt@goodmis.org, will@kernel.org
Subject: Re: [PATCH 1/8] Compiler attributes: GCC function alignment workarounds
Date: Wed, 11 Jan 2023 18:27:53 +0000	[thread overview]
Message-ID: <Y77/qVgvaJidFpYt@FVFF77S0Q05N> (raw)
In-Reply-To: <CANiq72kgmFYEO_EB_NxAF=S7VOf45KM7W3uwxxvftVErwfWzjg@mail.gmail.com>

On Mon, Jan 09, 2023 at 03:43:16PM +0100, Miguel Ojeda wrote:
> On Mon, Jan 9, 2023 at 2:58 PM Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > As far as I can tell, GCC doesn't respect '-falign-functions=N':
> >
> > * When the __weak__ attribute is used
> >
> >   GCC seems to forget the alignment specified by '-falign-functions=N',
> >   but will respect the '__aligned__(N)' function attribute. Thus, we can
> >   work around this by explciitly setting the alignment for weak
> >   functions.
> >
> > * When the __cold__ attribute is used
> >
> >   GCC seems to forget the alignment specified by '-falign-functions=N',
> >   and also doesn't seem to respect the '__aligned__(N)' function
> >   attribute. The only way to work around this is to not use the __cold__
> >   attibute.
> 
> If you happen to have a reduced case, then it would be nice to link it
> in the commit. A bug report to GCC would also be nice.
> 
> I gave it a very quick try in Compiler Explorer, but I couldn't
> reproduce it, so I guess it depends on flags, non-trivial functions or
> something else.

So having spent today coming up with tests, it turns out it's not quite as I
described above, but in a sense worse. I'm posting a summary here for
posterity; I'll try to get this to compiler folk shortly.

GCC appears to not align cold functions to the alignment specified by
`-falign-functions=N` when compiling at `-O1` or above. Alignment *can* be
restored with explicit attributes on each function, but due to some
interprocedural analysis, callees can be implicitly marked as cold (losing
their default alignment), which means we don't have a reliable mechanism to
ensure functions are always aligned short of annotating *every* function
explicitly (and I suspect that's not sufficient due to some interprocedural optimizations).

I've tested with the 12.1.0 binary release from the kernel.org cross toolchains
page).

LLVM always seems to repsect `-falign-functions=N` at both `-O1` and `-O2` (I
tested the 10.0.0, 11.0.0, 11.0.1, 15.0.6 binary releases from llvm.org).

For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-cold.c                                           
| #define __cold \
|         __attribute__((cold))
| 
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
| 
| __cold
| void cold_func_a(void) { }
| 
| __cold
| void cold_func_b(void) { }
| 
| __cold
| void cold_func_c(void) { }
| 
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
| 
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
| 
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-cold.o                     
| 
| test-cold.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <static_cold_func_a>:
|    0:   d65f03c0        ret
| 
| 0000000000000004 <static_cold_func_b>:
|    4:   d65f03c0        ret
| 
| 0000000000000008 <static_cold_func_c>:
|    8:   d65f03c0        ret
| 
| 000000000000000c <cold_func_a>:
|    c:   d65f03c0        ret
| 
| 0000000000000010 <cold_func_b>:
|   10:   d65f03c0        ret
| 
| 0000000000000014 <cold_func_c>:
|   14:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-cold.o
| 
| test-cold.o:     file format elf64-littleaarch64
| 
| Sections:
| Idx Name          Size      VMA               LMA               File off  Algn
|   0 .text         00000018  0000000000000000  0000000000000000  00000040  2**2
|                   CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data         00000018  0000000000000000  0000000000000000  00000058  2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss          00000000  0000000000000000  0000000000000000  00000070  2**0
|                   ALLOC
|   3 .comment      00000013  0000000000000000  0000000000000000  00000070  2**0
|                   CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000083  2**0
|                   CONTENTS, READONLY
|   5 .eh_frame     00000090  0000000000000000  0000000000000000  00000088  2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

In simple cases, alignment *can* be restored if an explicit function attribute
is used. For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold.c                            
| #define __aligned(n) \
|         __attribute__((aligned(n)))
| 
| #define __cold \
|         __attribute__((cold)) __aligned(16)
| 
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
| 
| __cold
| void cold_func_a(void) { }
| 
| __cold
| void cold_func_b(void) { }
| 
| __cold
| void cold_func_c(void) { }
| 
| static __cold
| void static_cold_func_a(void) { }
| EXPORT_FUNC_PTR(static_cold_func_a);
| 
| static __cold
| void static_cold_func_b(void) { }
| EXPORT_FUNC_PTR(static_cold_func_b);
| 
| static __cold
| void static_cold_func_c(void) { }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold.o                     
| 
| test-aligned-cold.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <static_cold_func_a>:
|    0:   d65f03c0        ret
|    4:   d503201f        nop
|    8:   d503201f        nop
|    c:   d503201f        nop
| 
| 0000000000000010 <static_cold_func_b>:
|   10:   d65f03c0        ret
|   14:   d503201f        nop
|   18:   d503201f        nop
|   1c:   d503201f        nop
| 
| 0000000000000020 <static_cold_func_c>:
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
| 
| 0000000000000030 <cold_func_a>:
|   30:   d65f03c0        ret
|   34:   d503201f        nop
|   38:   d503201f        nop
|   3c:   d503201f        nop
| 
| 0000000000000040 <cold_func_b>:
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
| 
| 0000000000000050 <cold_func_c>:
|   50:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold.o
| 
| test-aligned-cold.o:     file format elf64-littleaarch64
| 
| Sections:
| Idx Name          Size      VMA               LMA               File off  Algn
|   0 .text         00000054  0000000000000000  0000000000000000  00000040  2**4
|                   CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data         00000018  0000000000000000  0000000000000000  00000098  2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss          00000000  0000000000000000  0000000000000000  000000b0  2**0
|                   ALLOC
|   3 .comment      00000013  0000000000000000  0000000000000000  000000b0  2**0
|                   CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  000000c3  2**0
|                   CONTENTS, READONLY
|   5 .eh_frame     00000090  0000000000000000  0000000000000000  000000c8  2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA


Unfortunately it appears that some interprocedural analysis determines that if
a callee is only called/referenced from cold callers, the callee is marked as
cold, and the alignment it would have got from the command line option is
dropped. If it's given an explicit alignment attribute, the alignment is
retained.

For example:

| [mark@lakrids:/mnt/data/tests/gcc-alignment]% cat test-aligned-cold-caller.c                                    
| #define noinline \
|         __attribute__((noinline))
| 
| #define __aligned(n) \
|         __attribute__((aligned(n)))
| 
| #define __cold \
|         __attribute__((cold)) __aligned(16)
| 
| #define EXPORT_FUNC_PTR(func) \
|         typeof((func)) *__ptr_##func = (func)
| 
| static noinline void callee_a(void)
| {
|         asm volatile("// callee_a\n" ::: "memory");
| }
| 
| static noinline void callee_b(void)
| {
|         asm volatile("// callee_b\n" ::: "memory");
| }
| 
| static noinline void callee_c(void)
| {
|         asm volatile("// callee_c\n" ::: "memory");
| }
| __cold
| void cold_func_a(void) { callee_a(); }
| 
| __cold
| void cold_func_b(void) { callee_b(); }
| 
| __cold
| void cold_func_c(void) { callee_c(); }
| 
| static __cold
| void static_cold_func_a(void) { callee_a(); }
| EXPORT_FUNC_PTR(static_cold_func_a);
| 
| static __cold
| void static_cold_func_b(void) { callee_b(); }
| EXPORT_FUNC_PTR(static_cold_func_b);
| 
| static __cold
| void static_cold_func_c(void) { callee_c(); }
| EXPORT_FUNC_PTR(static_cold_func_c);
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-gcc -falign-functions=16 -c test-aligned-cold-caller.c -O1
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -d test-aligned-cold-caller.o                     
| 
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
| 
| 
| Disassembly of section .text:
| 
| 0000000000000000 <callee_a>:
|    0:   d65f03c0        ret
| 
| 0000000000000004 <callee_b>:
|    4:   d65f03c0        ret
| 
| 0000000000000008 <callee_c>:
|    8:   d65f03c0        ret
|    c:   d503201f        nop
| 
| 0000000000000010 <static_cold_func_a>:
|   10:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   14:   910003fd        mov     x29, sp
|   18:   97fffffa        bl      0 <callee_a>
|   1c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   20:   d65f03c0        ret
|   24:   d503201f        nop
|   28:   d503201f        nop
|   2c:   d503201f        nop
| 
| 0000000000000030 <static_cold_func_b>:
|   30:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   34:   910003fd        mov     x29, sp
|   38:   97fffff3        bl      4 <callee_b>
|   3c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   40:   d65f03c0        ret
|   44:   d503201f        nop
|   48:   d503201f        nop
|   4c:   d503201f        nop
| 
| 0000000000000050 <static_cold_func_c>:
|   50:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   54:   910003fd        mov     x29, sp
|   58:   97ffffec        bl      8 <callee_c>
|   5c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   60:   d65f03c0        ret
|   64:   d503201f        nop
|   68:   d503201f        nop
|   6c:   d503201f        nop
| 
| 0000000000000070 <cold_func_a>:
|   70:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   74:   910003fd        mov     x29, sp
|   78:   97ffffe2        bl      0 <callee_a>
|   7c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   80:   d65f03c0        ret
|   84:   d503201f        nop
|   88:   d503201f        nop
|   8c:   d503201f        nop
| 
| 0000000000000090 <cold_func_b>:
|   90:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   94:   910003fd        mov     x29, sp
|   98:   97ffffdb        bl      4 <callee_b>
|   9c:   a8c17bfd        ldp     x29, x30, [sp], #16
|   a0:   d65f03c0        ret
|   a4:   d503201f        nop
|   a8:   d503201f        nop
|   ac:   d503201f        nop
| 
| 00000000000000b0 <cold_func_c>:
|   b0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
|   b4:   910003fd        mov     x29, sp
|   b8:   97ffffd4        bl      8 <callee_c>
|   bc:   a8c17bfd        ldp     x29, x30, [sp], #16
|   c0:   d65f03c0        ret
| [mark@lakrids:/mnt/data/tests/gcc-alignment]% usekorg 12.1.0 aarch64-linux-objdump -h test-aligned-cold-caller.o
| 
| test-aligned-cold-caller.o:     file format elf64-littleaarch64
| 
| Sections:
| Idx Name          Size      VMA               LMA               File off  Algn
|   0 .text         000000c4  0000000000000000  0000000000000000  00000040  2**4
|                   CONTENTS, ALLOC, LOAD, READONLY, CODE
|   1 .data         00000018  0000000000000000  0000000000000000  00000108  2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, DATA
|   2 .bss          00000000  0000000000000000  0000000000000000  00000120  2**0
|                   ALLOC
|   3 .comment      00000013  0000000000000000  0000000000000000  00000120  2**0
|                   CONTENTS, READONLY
|   4 .note.GNU-stack 00000000  0000000000000000  0000000000000000  00000133  2**0
|                   CONTENTS, READONLY
|   5 .eh_frame     00000110  0000000000000000  0000000000000000  00000138  2**3
|                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2023-01-11 18:29 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-09 13:58 [PATCH 0/8] arm64/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS Mark Rutland
2023-01-09 13:58 ` [PATCH 1/8] Compiler attributes: GCC function alignment workarounds Mark Rutland
2023-01-09 14:43   ` Miguel Ojeda
2023-01-09 17:06     ` Mark Rutland
2023-01-09 22:35       ` Miguel Ojeda
2023-01-11 18:27     ` Mark Rutland [this message]
2023-01-12 11:38       ` Mark Rutland
2023-01-13 12:49         ` Mark Rutland
2023-01-15 21:32           ` Miguel Ojeda
2023-01-09 13:58 ` [PATCH 2/8] ACPI: Don't build ACPICA with '-Os' Mark Rutland
2023-01-10 13:45   ` Rafael J. Wysocki
2023-01-09 13:58 ` [PATCH 3/8] arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT Mark Rutland
2023-01-10 20:35   ` Peter Zijlstra
2023-01-10 20:43     ` Will Deacon
2023-01-11 11:39       ` Mark Rutland
2023-01-11 11:36     ` Mark Rutland
2023-01-09 13:58 ` [PATCH 4/8] ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS Mark Rutland
2023-01-12  6:48   ` Li Huafei
2023-01-12 11:00     ` Mark Rutland
2023-01-13  1:15       ` Li Huafei
2023-01-09 13:58 ` [PATCH 5/8] arm64: insn: Add helpers for BTI Mark Rutland
2023-01-09 13:58 ` [PATCH 6/8] arm64: patching: Add aarch64_insn_write_literal_u64() Mark Rutland
2023-01-09 13:58 ` [PATCH 7/8] arm64: ftrace: Update stale comment Mark Rutland
2023-01-09 13:58 ` [PATCH 8/8] arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS Mark Rutland
2023-01-10  8:55 ` [PATCH 0/8] arm64/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS David Laight
2023-01-10 10:31   ` Mark Rutland

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y77/qVgvaJidFpYt@FVFF77S0Q05N \
    --to=mark.rutland@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhiramat@kernel.org \
    --cc=miguel.ojeda.sandonis@gmail.com \
    --cc=ndesaulniers@google.com \
    --cc=ojeda@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rafael.j.wysocki@intel.com \
    --cc=revest@chromium.org \
    --cc=robert.moore@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).