From: Giuliano Procida <gprocida@google.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: dwarves@vger.kernel.org, kernel-team@android.com,
	"Matthias Männich" <maennich@google.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>, bpf <bpf@vger.kernel.org>,
	"Kernel Team" <kernel-team@fb.com>
Subject: Re: [PATCH 3/3] btf_encoder: Set .BTF section alignment to 16
Date: Wed, 27 Jan 2021 18:06:27 +0000
Message-ID: <CAGvU0H=bNJ6QScpsxQWiijCqvqVhBoHctOhN8nZ8vt9CwpA6tQ@mail.gmail.com>
In-Reply-To: <CAGvU0HkuZ_AW_YTjsdsivWV+wF3kf49ugChzMdRjZnrYzwVB3A@mail.gmail.com>

Hi.

On Wed, 27 Jan 2021 at 15:08, Giuliano Procida <gprocida@google.com> wrote:
>
> Hi Andrii.
>
> On Thu, 21 Jan 2021 at 20:08, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Jan 21, 2021 at 3:07 AM Giuliano Procida <gprocida@google.com> wrote:
> > >
> > > Hi.
> > >
> > > On Thu, 21 Jan 2021 at 07:16, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> > >>
> > >> On Mon, Jan 18, 2021 at 8:01 AM Giuliano Procida <gprocida@google.com> wrote:
> > >> >
> > >> > This is to avoid misaligned access when memory-mapping ELF sections.
> > >> >
> > >> > Signed-off-by: Giuliano Procida <gprocida@google.com>
> > >> > ---
> > >> >  libbtf.c | 8 ++++++++
> > >> >  1 file changed, 8 insertions(+)
> > >> >
> > >> > diff --git a/libbtf.c b/libbtf.c
> > >> > index 7552d8e..2f12d53 100644
> > >> > --- a/libbtf.c
> > >> > +++ b/libbtf.c
> > >> > @@ -797,6 +797,14 @@ static int btf_elf__write(const char *filename, struct btf *btf)
> > >> >                         goto unlink;
> > >> >                 }
> > >> >
> > >> > +               snprintf(cmd, sizeof(cmd), "%s --set-section-alignment .BTF=16 %s",
> > >> > +                        llvm_objcopy, filename);
> > >>
> > >> does it align inside the ELF file to 16 bytes, or does it request the
> > >> linker to align it at 16 byte alignment in memory? Given .BTF section
> > >> is not loadable, trying to understand the implications.
> > >>
> > >
> > > We have a tool that loads BTF from ELF files. It uses mmap and "parses" the BTF as structs in memory. The ELF file is mapped with page alignment but the BTF section within it has no alignment at all. Using MSAN (IIRC) we get warnings about misaligned accesses. Everything within BTF itself is naturally aligned, so it makes sense to align the section within ELF as well. There are probably some architectures where this makes the difference between working and SIGBUS.
> > >
> >
> > Right, ok, thanks for explaining!
> >
> > > I did try to get objcopy to set alignment at the point the section is added. However, this didn't work.
> > >
> > >>
> > >>
> > >> > +               if (system(cmd)) {
> > >>
> > >> Also curious, if objcopy emits error (saying that
> > >> --set-section-alignment argument is not recognized), will that error
> > >> be shown in stdout? or system() consumes it without redirecting it to
> > >> stdout?
> > >>
> > >
> > > I believe it goes to stderr. I would need to check. system() will not consume this. I'm not keen to write stderr (or stdout) post-processing code in plain C.
> > >
> >
> > You can use popen() to capture/hide output, this is a better
> > alternative to system() in this case. We don't want "expected
> > warnings" in kernel build process.
> >
> > >>
> > >> > +                       /* non-fatal, this is a nice-to-have and it's only supported from LLVM 10 */
> > >> > +                       fprintf(stderr, "%s: warning: failed to align .BTF section in '%s': %d!\n",
> > >> > +                               __func__, filename, errno);
> > >>
> > >> Probably better to emit this warning only in verbose mode, otherwise
> > >> lots of people will start complaining that they get some new warnings
> > >> from pahole.
> > >>
> > >
> > > It may be better to just use POSIX and ELF APIs directly instead of objcopy. This way the section can be added with the right alignment directly. pahole is already linked against libelf and if we could get rid of the external dependency on objcopy it would be a win in more than one way.
> >
> > This would be great, yes. At some point I remember giving it a try,
> > but for some reason I couldn't make libelf flush data and update
> > section headers properly. Maybe you'll have better luck. Though I
> > think I was trying to mark section loadable, and eventually I probably
> > managed to do that, but still abandoned it (it's not enough to mark
> > section loadable, you have to assign it to ELF segment as well, which
> > libelf doesn't allow to do and you need linker support). Anyways, give
> > it a try, it should work.
> >
>
> I found 341dfcf8d78eaa3a2dc96dea06f0392eb2978364 ("btf: expose BTF
> info through sysfs") and I now see what you mean.
>
> Alignment of .BTF as produced by the linker script is currently not
> down to pahole at all. The kernel link script has to add .BTF in a
> rather roundabout way because it needs to be added as a loadable
> segment and pahole only adds it as a plain section.
>
> pahole does this using llvm-objcopy (which I spotted has some
> side-effects on our AOSP vmlinux). On vanilla kernels, while
> llvm-objcopy doesn't rewrite (or at least, resize) .strtab, it does
> renumber sections so that the offset order is monotonic.
>
> We're working with .BTF in userspace and haven't needed .BTF as a
> segment. If I managed to get pahole to make .BTF a loadable segment as
> well, then the linker scripts could be simplified. I'll see if I can
> do this part as well.

OK...

$ readelf -lW /tmp/vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
  Type           Offset    VirtAddr           PhysAddr           FileSiz   MemSiz    Flg Align
  LOAD           0x200000  0xffffffff81000000 0x0000000001000000 0x167be37 0x167be37 R E 0x200000
  LOAD           0x1a00000 0xffffffff82800000 0x0000000002800000 0x5a6000  0x5a6000  RW  0x200000
  LOAD           0x2000000 0x0000000000000000 0x0000000002da6000 0x02a258  0x02a258  RW  0x200000
  LOAD           0x21d1000 0xffffffff82dd1000 0x0000000002dd1000 0x104000  0x25b000  RWE 0x200000
  NOTE           0x152ac30 0xffffffff8232ac30 0x000000000232ac30 0x00003c  0x00003c      0x4

 Section to Segment mapping:
  Segment Sections...
   00     .text .rodata .pci_fixup .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings __param __modver __ex_table .notes .BTF
   01     .data __bug_table .orc_unwind_ip .orc_unwind .orc_lookup .vvar
   02     .data..percpu
   03     .init.text .altinstr_aux .init.data .x86_cpu_dev.init .altinstructions .altinstr_replacement .iommu_table .apicdrivers .exit.text .smp_locks .data_nosave .bss .brk
   04     .notes

This is the end result. The sausage factory (gen_btf / vmlinux_link -
which I've now read through) actually does:

* link a temporary vmlinux
* run pahole -J on this
* dump out the .BTF as a raw file (anything clever pahole does with
ELF is thrown away here)
* create an ELF file with arch and format to match vmlinux, containing
a single .BTF section
* link this ELF file with the other bits of the kernel.

As a DWARF to BTF converter, pahole's role is clear. At this point I'd
like to separate what's useful for the kernel and what's useful in
terms of generally packaging up .BTF as a kind of debug information
for general use.

Packaging up BTF as an ELF section or linking this into the kernel is
a lot of work to do properly in pahole and duplicates the role of the
linker. If I continue, I'll probably end up creating a disjoint R
segment just for .BTF and I don't know if that's OK. I'm not sure how
much more work is needed to get to the point where all the various
objcopy/objdump invocations can be eliminated, or whether this is a
worthwhile goal. Another way of getting rid of the objcopy/objdump
dependencies for the kernel would be to have pahole just emit an ELF
file containing the .BTF section only and let the linker do its thing.
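
To make the "ELF file containing the .BTF section only" idea concrete, here is a rough sketch of what writing such a file could look like. It is not what pahole does today. It uses the plain <elf.h> structs rather than libelf, and it assumes a little-endian host, an ELF64 relocatable object and x86-64 as the machine type. The function name write_btf_only_elf and the helpers are hypothetical, and the section flags are left at 0 (a file meant to end up in a loadable segment would presumably need at least SHF_ALLOC).

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Section name table: "" at 0, ".BTF" at 1, ".shstrtab" at 6 (16 bytes). */
static const char shstrtab[] = "\0.BTF\0.shstrtab";

/* Round v up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t v, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (v + a - 1) & ~(a - 1);
}

/* Write zero bytes until the file position reaches target. */
static int pad_to(FILE *f, uint64_t target)
{
	long pos = ftell(f);

	if (pos < 0)
		return -1;
	for (; (uint64_t)pos < target; pos++)
		if (fputc(0, f) == EOF)
			return -1;
	return 0;
}

/* Hypothetical helper: emit an ET_REL file holding only .BTF and .shstrtab. */
int write_btf_only_elf(const char *path, const void *btf, size_t btf_size,
		       uint64_t align)
{
	uint64_t btf_off = align_up(sizeof(Elf64_Ehdr), align);
	uint64_t str_off = btf_off + btf_size;
	uint64_t sh_off = align_up(str_off + sizeof(shstrtab), 8);
	Elf64_Ehdr eh;
	Elf64_Shdr sh[3];	/* index 0 is the mandatory null section */
	FILE *f;
	int ok;

	memset(&eh, 0, sizeof(eh));
	memcpy(eh.e_ident, ELFMAG, SELFMAG);
	eh.e_ident[EI_CLASS] = ELFCLASS64;
	eh.e_ident[EI_DATA] = ELFDATA2LSB;	/* assumption: little-endian */
	eh.e_ident[EI_VERSION] = EV_CURRENT;
	eh.e_type = ET_REL;
	eh.e_machine = EM_X86_64;		/* assumption: x86-64 target */
	eh.e_version = EV_CURRENT;
	eh.e_shoff = sh_off;
	eh.e_ehsize = sizeof(Elf64_Ehdr);
	eh.e_shentsize = sizeof(Elf64_Shdr);
	eh.e_shnum = 3;
	eh.e_shstrndx = 2;

	memset(sh, 0, sizeof(sh));
	sh[1].sh_name = 1;			/* ".BTF" */
	sh[1].sh_type = SHT_PROGBITS;
	sh[1].sh_offset = btf_off;		/* file offset honours align */
	sh[1].sh_size = btf_size;
	sh[1].sh_addralign = align;
	sh[2].sh_name = 6;			/* ".shstrtab" */
	sh[2].sh_type = SHT_STRTAB;
	sh[2].sh_offset = str_off;
	sh[2].sh_size = sizeof(shstrtab);
	sh[2].sh_addralign = 1;

	f = fopen(path, "wb");
	if (!f)
		return -1;
	ok = fwrite(&eh, sizeof(eh), 1, f) == 1 &&
	     pad_to(f, btf_off) == 0 &&
	     fwrite(btf, 1, btf_size, f) == btf_size &&
	     fwrite(shstrtab, sizeof(shstrtab), 1, f) == 1 &&
	     pad_to(f, sh_off) == 0 &&
	     fwrite(sh, sizeof(sh), 1, f) == 1;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Because the section is placed at a file offset that is already a multiple of its sh_addralign, a tool that mmaps the file sees the BTF data naturally aligned, and the linker gets the alignment request for free.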

For non-kernel use, I'm not sure of the implications of letting libelf
reorder all the sections, or if we'd ever want the .BTF to be in a
loadable segment. If anything, I'd advocate for having pahole just
generate raw BTF output. However, I know there's a big convenience
factor in having debug (type) info packaged into the ELF.

So I'm not sure if it's worth pursuing this line of work beyond
eliminating pahole's dependency on llvm-objcopy. In terms of my
follow-up series, this might mean dropping 3/4 (as preserving existing
ELF layout in the temporary vmlinux isn't needed) but keeping 4/4 (as
it's useful to us, even if it's currently useless for the kernel).

If we cannot get libelf to make the right kind of sausage, then I
agree that vmlinux .BTF alignment should probably follow your earlier
suggestion of `. = ALIGN(8)` in vmlinux.lds.h.
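
For reference, `. = ALIGN(8)` rounds the linker's location counter up to the next multiple of 8. The same arithmetic as a small C sketch, with hypothetical helper names (align_up, is_aligned) that are not taken from pahole or the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round offset up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t offset, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (offset + align - 1) & ~(align - 1);
}

/* True if offset is already a multiple of align (a power of two). */
static bool is_aligned(uint64_t offset, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (offset & (align - 1)) == 0;
}
```

As an illustrative value only, an offset of 0x152ac6c is not 8-byte aligned, and align_up(0x152ac6c, 8) gives 0x152ac70.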

And we haven't even got to discussing modules and merging .BTF info. :-)

Regards,
Giuliano.


> Regards,
> Giuliano.
>
> > >
> > >>
> > >>
> > >> > +               }
> > >> > +
> > >> >                 err = 0;
> > >> >         unlink:
> > >> >                 unlink(tmp_fn);
> > >> > --
> > >> > 2.30.0.284.gd98b1dd5eaa7-goog
> > >> >
> > >
> > >
> > > I'll see if I can spend a little time on this idea instead.
> > >
> > > Regards,
> > > Giuliano.
> >
