All of lore.kernel.org
 help / color / mirror / Atom feed
* Latest libbpf fails to load programs compiled with old LLVM
@ 2020-12-03 17:55 Toke Høiland-Jørgensen
  2020-12-04  1:42 ` Yonghong Song
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-03 17:55 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: bpf

Hi Andrii

I noticed that recent libbpf versions fail to load BPF files compiled
with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
get:

$ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
Loading 1 files on interface 'testns'.
libbpf: loading ../lib/testing/xdp_drop.o
libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
libbpf: sec 'prog': failed to find program symbol at offset 0
Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid

The 'failed to find program symbol' error seems to have been introduced
with commit c112239272c6 ("libbpf: Parse multi-function sections into
multiple BPF programs").

Looking at the object file in question, indeed it seems to not have any
function symbols defined:

$  llvm-objdump --syms ../lib/testing/xdp_drop.o

../lib/testing/xdp_drop.o:	file format elf64-bpf

SYMBOL TABLE:
0000000000000000 l       .debug_str	0000000000000000 
0000000000000037 l       .debug_str	0000000000000000 
0000000000000042 l       .debug_str	0000000000000000 
0000000000000068 l       .debug_str	0000000000000000 
0000000000000071 l       .debug_str	0000000000000000 
0000000000000076 l       .debug_str	0000000000000000 
000000000000008a l       .debug_str	0000000000000000 
0000000000000097 l       .debug_str	0000000000000000 
00000000000000a3 l       .debug_str	0000000000000000 
00000000000000ac l       .debug_str	0000000000000000 
00000000000000b5 l       .debug_str	0000000000000000 
00000000000000bc l       .debug_str	0000000000000000 
00000000000000c9 l       .debug_str	0000000000000000 
00000000000000d4 l       .debug_str	0000000000000000 
00000000000000dd l       .debug_str	0000000000000000 
00000000000000e1 l       .debug_str	0000000000000000 
00000000000000e5 l       .debug_str	0000000000000000 
00000000000000ea l       .debug_str	0000000000000000 
00000000000000f0 l       .debug_str	0000000000000000 
00000000000000f9 l       .debug_str	0000000000000000 
0000000000000103 l       .debug_str	0000000000000000 
0000000000000113 l       .debug_str	0000000000000000 
0000000000000122 l       .debug_str	0000000000000000 
0000000000000131 l       .debug_str	0000000000000000 
0000000000000000 l    d  prog	0000000000000000 prog
0000000000000000 l    d  .debug_abbrev	0000000000000000 .debug_abbrev
0000000000000000 l    d  .debug_info	0000000000000000 .debug_info
0000000000000000 l    d  .debug_frame	0000000000000000 .debug_frame
0000000000000000 l    d  .debug_line	0000000000000000 .debug_line
0000000000000000 g       license	0000000000000000 _license
0000000000000000 g       prog	0000000000000000 xdp_drop


I assume this is because old LLVM versions simply don't emit that symbol
information?

Anyhow, the patch series that introduced this restructures the program
parsing some, so I wanted to get your input to make sure I don't break
things when fixing this regression. So what's the best way to fix it?
Just assume that the whole section is one program if no symbols are
present, or is the some subtle reason why that would break any of the
other logic for BPF-to-BPF calls?

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-03 17:55 Latest libbpf fails to load programs compiled with old LLVM Toke Høiland-Jørgensen
@ 2020-12-04  1:42 ` Yonghong Song
  2020-12-04  9:34   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2020-12-04  1:42 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko; +Cc: bpf



On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
> Hi Andrii
> 
> I noticed that recent libbpf versions fail to load BPF files compiled
> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
> get:
> 
> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
> Loading 1 files on interface 'testns'.
> libbpf: loading ../lib/testing/xdp_drop.o
> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
> libbpf: sec 'prog': failed to find program symbol at offset 0
> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
> 
> The 'failed to find program symbol' error seems to have been introduced
> with commit c112239272c6 ("libbpf: Parse multi-function sections into
> multiple BPF programs").
> 
> Looking at the object file in question, indeed it seems to not have any
> function symbols defined:
> 
> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
> 
> ../lib/testing/xdp_drop.o:	file format elf64-bpf
> 
> SYMBOL TABLE:
> 0000000000000000 l       .debug_str	0000000000000000
> 0000000000000037 l       .debug_str	0000000000000000
> 0000000000000042 l       .debug_str	0000000000000000
> 0000000000000068 l       .debug_str	0000000000000000
> 0000000000000071 l       .debug_str	0000000000000000
> 0000000000000076 l       .debug_str	0000000000000000
> 000000000000008a l       .debug_str	0000000000000000
> 0000000000000097 l       .debug_str	0000000000000000
> 00000000000000a3 l       .debug_str	0000000000000000
> 00000000000000ac l       .debug_str	0000000000000000
> 00000000000000b5 l       .debug_str	0000000000000000
> 00000000000000bc l       .debug_str	0000000000000000
> 00000000000000c9 l       .debug_str	0000000000000000
> 00000000000000d4 l       .debug_str	0000000000000000
> 00000000000000dd l       .debug_str	0000000000000000
> 00000000000000e1 l       .debug_str	0000000000000000
> 00000000000000e5 l       .debug_str	0000000000000000
> 00000000000000ea l       .debug_str	0000000000000000
> 00000000000000f0 l       .debug_str	0000000000000000
> 00000000000000f9 l       .debug_str	0000000000000000
> 0000000000000103 l       .debug_str	0000000000000000
> 0000000000000113 l       .debug_str	0000000000000000
> 0000000000000122 l       .debug_str	0000000000000000
> 0000000000000131 l       .debug_str	0000000000000000
> 0000000000000000 l    d  prog	0000000000000000 prog
> 0000000000000000 l    d  .debug_abbrev	0000000000000000 .debug_abbrev
> 0000000000000000 l    d  .debug_info	0000000000000000 .debug_info
> 0000000000000000 l    d  .debug_frame	0000000000000000 .debug_frame
> 0000000000000000 l    d  .debug_line	0000000000000000 .debug_line
> 0000000000000000 g       license	0000000000000000 _license
> 0000000000000000 g       prog	0000000000000000 xdp_drop
> 
> 
> I assume this is because old LLVM versions simply don't emit that symbol
> information?

Could you share xdp_drop.c or other test which I can compile and check
to understand the issue?

> 
> Anyhow, the patch series that introduced this restructures the program
> parsing some, so I wanted to get your input to make sure I don't break
> things when fixing this regression. So what's the best way to fix it?
> Just assume that the whole section is one program if no symbols are
> present, or is the some subtle reason why that would break any of the
> other logic for BPF-to-BPF calls?
> 
> -Toke
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-04  1:42 ` Yonghong Song
@ 2020-12-04  9:34   ` Toke Høiland-Jørgensen
  2020-12-04 17:54     ` Yonghong Song
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-04  9:34 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko; +Cc: bpf

Yonghong Song <yhs@fb.com> writes:

> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
>> Hi Andrii
>> 
>> I noticed that recent libbpf versions fail to load BPF files compiled
>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
>> get:
>> 
>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
>> Loading 1 files on interface 'testns'.
>> libbpf: loading ../lib/testing/xdp_drop.o
>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
>> libbpf: sec 'prog': failed to find program symbol at offset 0
>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>> 
>> The 'failed to find program symbol' error seems to have been introduced
>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
>> multiple BPF programs").
>> 
>> Looking at the object file in question, indeed it seems to not have any
>> function symbols defined:
>> 
>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
>> 
>> ../lib/testing/xdp_drop.o:	file format elf64-bpf
>> 
>> SYMBOL TABLE:
>> 0000000000000000 l       .debug_str	0000000000000000
>> 0000000000000037 l       .debug_str	0000000000000000
>> 0000000000000042 l       .debug_str	0000000000000000
>> 0000000000000068 l       .debug_str	0000000000000000
>> 0000000000000071 l       .debug_str	0000000000000000
>> 0000000000000076 l       .debug_str	0000000000000000
>> 000000000000008a l       .debug_str	0000000000000000
>> 0000000000000097 l       .debug_str	0000000000000000
>> 00000000000000a3 l       .debug_str	0000000000000000
>> 00000000000000ac l       .debug_str	0000000000000000
>> 00000000000000b5 l       .debug_str	0000000000000000
>> 00000000000000bc l       .debug_str	0000000000000000
>> 00000000000000c9 l       .debug_str	0000000000000000
>> 00000000000000d4 l       .debug_str	0000000000000000
>> 00000000000000dd l       .debug_str	0000000000000000
>> 00000000000000e1 l       .debug_str	0000000000000000
>> 00000000000000e5 l       .debug_str	0000000000000000
>> 00000000000000ea l       .debug_str	0000000000000000
>> 00000000000000f0 l       .debug_str	0000000000000000
>> 00000000000000f9 l       .debug_str	0000000000000000
>> 0000000000000103 l       .debug_str	0000000000000000
>> 0000000000000113 l       .debug_str	0000000000000000
>> 0000000000000122 l       .debug_str	0000000000000000
>> 0000000000000131 l       .debug_str	0000000000000000
>> 0000000000000000 l    d  prog	0000000000000000 prog
>> 0000000000000000 l    d  .debug_abbrev	0000000000000000 .debug_abbrev
>> 0000000000000000 l    d  .debug_info	0000000000000000 .debug_info
>> 0000000000000000 l    d  .debug_frame	0000000000000000 .debug_frame
>> 0000000000000000 l    d  .debug_line	0000000000000000 .debug_line
>> 0000000000000000 g       license	0000000000000000 _license
>> 0000000000000000 g       prog	0000000000000000 xdp_drop
>> 
>> 
>> I assume this is because old LLVM versions simply don't emit that symbol
>> information?
>
> Could you share xdp_drop.c or other test which I can compile and check
> to understand the issue?

It's just an empty program returning XDP_DROP:

https://github.com/xdp-project/xdp-tools/blob/master/lib/testing/xdp_drop.c

I basically just did this on Debian buster:

$ sudo apt install gcc-multilib build-essential libpcap-dev libelf-dev git llc lld clang gcc-multilib pkt-config m4
$ git clone --recurse-submodules https://github.com/xdp-project/xdp-tools
$ cd xdp-tools
$ LLC=llc-7 ./configure
$ make -k
$ cd xdp-loader
$ sudo ip link add dev testns type veth
$ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv

(xdpdump will fail to build with llvm7, but the rest should work)

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-04  9:34   ` Toke Høiland-Jørgensen
@ 2020-12-04 17:54     ` Yonghong Song
  2020-12-04 19:23       ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: Yonghong Song @ 2020-12-04 17:54 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Andrii Nakryiko; +Cc: bpf



On 12/4/20 1:34 AM, Toke Høiland-Jørgensen wrote:
> Yonghong Song <yhs@fb.com> writes:
> 
>> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
>>> Hi Andrii
>>>
>>> I noticed that recent libbpf versions fail to load BPF files compiled
>>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
>>> get:
>>>
>>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
>>> Loading 1 files on interface 'testns'.
>>> libbpf: loading ../lib/testing/xdp_drop.o
>>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
>>> libbpf: sec 'prog': failed to find program symbol at offset 0
>>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>>>
>>> The 'failed to find program symbol' error seems to have been introduced
>>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
>>> multiple BPF programs").
>>>
>>> Looking at the object file in question, indeed it seems to not have any
>>> function symbols defined:
>>>
>>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
>>>
>>> ../lib/testing/xdp_drop.o:	file format elf64-bpf
>>>
>>> SYMBOL TABLE:
>>> 0000000000000000 l       .debug_str	0000000000000000
>>> 0000000000000037 l       .debug_str	0000000000000000
>>> 0000000000000042 l       .debug_str	0000000000000000
>>> 0000000000000068 l       .debug_str	0000000000000000
>>> 0000000000000071 l       .debug_str	0000000000000000
>>> 0000000000000076 l       .debug_str	0000000000000000
>>> 000000000000008a l       .debug_str	0000000000000000
>>> 0000000000000097 l       .debug_str	0000000000000000
>>> 00000000000000a3 l       .debug_str	0000000000000000
>>> 00000000000000ac l       .debug_str	0000000000000000
>>> 00000000000000b5 l       .debug_str	0000000000000000
>>> 00000000000000bc l       .debug_str	0000000000000000
>>> 00000000000000c9 l       .debug_str	0000000000000000
>>> 00000000000000d4 l       .debug_str	0000000000000000
>>> 00000000000000dd l       .debug_str	0000000000000000
>>> 00000000000000e1 l       .debug_str	0000000000000000
>>> 00000000000000e5 l       .debug_str	0000000000000000
>>> 00000000000000ea l       .debug_str	0000000000000000
>>> 00000000000000f0 l       .debug_str	0000000000000000
>>> 00000000000000f9 l       .debug_str	0000000000000000
>>> 0000000000000103 l       .debug_str	0000000000000000
>>> 0000000000000113 l       .debug_str	0000000000000000
>>> 0000000000000122 l       .debug_str	0000000000000000
>>> 0000000000000131 l       .debug_str	0000000000000000
>>> 0000000000000000 l    d  prog	0000000000000000 prog
>>> 0000000000000000 l    d  .debug_abbrev	0000000000000000 .debug_abbrev
>>> 0000000000000000 l    d  .debug_info	0000000000000000 .debug_info
>>> 0000000000000000 l    d  .debug_frame	0000000000000000 .debug_frame
>>> 0000000000000000 l    d  .debug_line	0000000000000000 .debug_line
>>> 0000000000000000 g       license	0000000000000000 _license
>>> 0000000000000000 g       prog	0000000000000000 xdp_drop
>>>
>>>
>>> I assume this is because old LLVM versions simply don't emit that symbol
>>> information?

Thanks for the below instruction and xdp_drop.c file. I can reproduce 
the issue now.

I added another function 'xdp_drop1' in the same thing. Below is the
symbol table with llvm7 vs. llvm12.

-bash-4.4$ llvm-readelf -symbols xdp-7.o | grep xdp_drop
     32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop
     33: 0000000000000010     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop1

   [ 3] prog              PROGBITS        0000000000000000 000040 000020 
00  AX  0   0  8

-bash-4.4$ llvm-readelf -symbols xdp-12.o | grep xdp_drop
     32: 0000000000000000    16 FUNC    GLOBAL DEFAULT     3 xdp_drop
     33: 0000000000000010    16 FUNC    GLOBAL DEFAULT     3 xdp_drop1
-bash-4.4$

   [ 3] prog              PROGBITS        0000000000000000 000040 000020 
00  AX  0   0  8


Yes, llvm7 does not encode type and size for FUNC's. I guess libbpf can
change to recognize NOTYPE and use the symbol value (representing the 
offset from the start of the section) and section size to
calculate the individual function size. This is more complicated than
elf file providing FUNC type and symbol size directly.

Maybe in this case, libbpf can do some sanity check. If there are more
than one functions in the 'prog' section and they are not marked at FUNC
type, simply recommend newer compiler and bail out saying this feature
not available with old llvm?

>>
>> Could you share xdp_drop.c or other test which I can compile and check
>> to understand the issue?
> 
> It's just an empty program returning XDP_DROP:
> 
> https://github.com/xdp-project/xdp-tools/blob/master/lib/testing/xdp_drop.c
> 
> I basically just did this on Debian buster:
> 
> $ sudo apt install gcc-multilib build-essential libpcap-dev libelf-dev git llc lld clang gcc-multilib pkt-config m4
> $ git clone --recurse-submodules https://github.com/xdp-project/xdp-tools
> $ cd xdp-tools
> $ LLC=llc-7 ./configure
> $ make -k
> $ cd xdp-loader
> $ sudo ip link add dev testns type veth
> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
> 
> (xdpdump will fail to build with llvm7, but the rest should work)
> 
> -Toke
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-04 17:54     ` Yonghong Song
@ 2020-12-04 19:23       ` Andrii Nakryiko
  2020-12-07 10:59         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2020-12-04 19:23 UTC (permalink / raw)
  To: Yonghong Song; +Cc: Toke Høiland-Jørgensen, bpf

On Fri, Dec 4, 2020 at 9:55 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 12/4/20 1:34 AM, Toke Høiland-Jørgensen wrote:
> > Yonghong Song <yhs@fb.com> writes:
> >
> >> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
> >>> Hi Andrii
> >>>
> >>> I noticed that recent libbpf versions fail to load BPF files compiled
> >>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
> >>> get:
> >>>
> >>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
> >>> Loading 1 files on interface 'testns'.
> >>> libbpf: loading ../lib/testing/xdp_drop.o
> >>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
> >>> libbpf: sec 'prog': failed to find program symbol at offset 0
> >>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
> >>>
> >>> The 'failed to find program symbol' error seems to have been introduced
> >>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
> >>> multiple BPF programs").
> >>>
> >>> Looking at the object file in question, indeed it seems to not have any
> >>> function symbols defined:
> >>>
> >>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
> >>>
> >>> ../lib/testing/xdp_drop.o:  file format elf64-bpf
> >>>
> >>> SYMBOL TABLE:
> >>> 0000000000000000 l       .debug_str 0000000000000000
> >>> 0000000000000037 l       .debug_str 0000000000000000
> >>> 0000000000000042 l       .debug_str 0000000000000000
> >>> 0000000000000068 l       .debug_str 0000000000000000
> >>> 0000000000000071 l       .debug_str 0000000000000000
> >>> 0000000000000076 l       .debug_str 0000000000000000
> >>> 000000000000008a l       .debug_str 0000000000000000
> >>> 0000000000000097 l       .debug_str 0000000000000000
> >>> 00000000000000a3 l       .debug_str 0000000000000000
> >>> 00000000000000ac l       .debug_str 0000000000000000
> >>> 00000000000000b5 l       .debug_str 0000000000000000
> >>> 00000000000000bc l       .debug_str 0000000000000000
> >>> 00000000000000c9 l       .debug_str 0000000000000000
> >>> 00000000000000d4 l       .debug_str 0000000000000000
> >>> 00000000000000dd l       .debug_str 0000000000000000
> >>> 00000000000000e1 l       .debug_str 0000000000000000
> >>> 00000000000000e5 l       .debug_str 0000000000000000
> >>> 00000000000000ea l       .debug_str 0000000000000000
> >>> 00000000000000f0 l       .debug_str 0000000000000000
> >>> 00000000000000f9 l       .debug_str 0000000000000000
> >>> 0000000000000103 l       .debug_str 0000000000000000
> >>> 0000000000000113 l       .debug_str 0000000000000000
> >>> 0000000000000122 l       .debug_str 0000000000000000
> >>> 0000000000000131 l       .debug_str 0000000000000000
> >>> 0000000000000000 l    d  prog       0000000000000000 prog
> >>> 0000000000000000 l    d  .debug_abbrev      0000000000000000 .debug_abbrev
> >>> 0000000000000000 l    d  .debug_info        0000000000000000 .debug_info
> >>> 0000000000000000 l    d  .debug_frame       0000000000000000 .debug_frame
> >>> 0000000000000000 l    d  .debug_line        0000000000000000 .debug_line
> >>> 0000000000000000 g       license    0000000000000000 _license
> >>> 0000000000000000 g       prog       0000000000000000 xdp_drop
> >>>
> >>>
> >>> I assume this is because old LLVM versions simply don't emit that symbol
> >>> information?
>
> Thanks for the below instruction and xdp_drop.c file. I can reproduce
> the issue now.
>
> I added another function 'xdp_drop1' in the same thing. Below is the
> symbol table with llvm7 vs. llvm12.
>
> -bash-4.4$ llvm-readelf -symbols xdp-7.o | grep xdp_drop
>      32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop
>      33: 0000000000000010     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop1
>
>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
> 00  AX  0   0  8
>
> -bash-4.4$ llvm-readelf -symbols xdp-12.o | grep xdp_drop
>      32: 0000000000000000    16 FUNC    GLOBAL DEFAULT     3 xdp_drop
>      33: 0000000000000010    16 FUNC    GLOBAL DEFAULT     3 xdp_drop1
> -bash-4.4$
>
>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
> 00  AX  0   0  8
>
>
> Yes, llvm7 does not encode type and size for FUNC's. I guess libbpf can
> change to recognize NOTYPE and use the symbol value (representing the
> offset from the start of the section) and section size to
> calculate the individual function size. This is more complicated than
> elf file providing FUNC type and symbol size directly.

I think we should just face the fact that LLVM7 is way too old to
produce a sensible BPF ELF file layout. We can extend:

libbpf: sec 'prog': failed to find program symbol at offset 0
Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid

with a suggestion to upgrade Clang/LLVM to something more recent, if
that would be helpful. But I don't want to add error-prone checks and
assumptions in the already quite complicated logic. Even the kernel
itself maintains that Clang 10+ needs to be used for its compilation.
BPF CO-RE is also not working with older than Clang10, so lots of
people have already upgraded way beyond that.

Speaking of legacy. Toke, can you please update all the samples in
your xdp-tools repo to not use arbitrary sections names. I see
SEC("prog"), where it should really be SEC("xdp"). It sets a bad
example for newcomers, IMO. I'm also going to emit warnings in libbpf
soon for section names that don't follow proper libbpf naming pattern,
so it would be good if you could get ahead of the curve.


>
> Maybe in this case, libbpf can do some sanity check. If there are more
> than one functions in the 'prog' section and they are not marked at FUNC
> type, simply recommend newer compiler and bail out saying this feature
> not available with old llvm?
>
> >>
> >> Could you share xdp_drop.c or other test which I can compile and check
> >> to understand the issue?
> >
> > It's just an empty program returning XDP_DROP:
> >
> > https://github.com/xdp-project/xdp-tools/blob/master/lib/testing/xdp_drop.c
> >
> > I basically just did this on Debian buster:
> >
> > $ sudo apt install gcc-multilib build-essential libpcap-dev libelf-dev git llc lld clang gcc-multilib pkt-config m4
> > $ git clone --recurse-submodules https://github.com/xdp-project/xdp-tools
> > $ cd xdp-tools
> > $ LLC=llc-7 ./configure
> > $ make -k
> > $ cd xdp-loader
> > $ sudo ip link add dev testns type veth
> > $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
> >
> > (xdpdump will fail to build with llvm7, but the rest should work)
> >
> > -Toke
> >

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-04 19:23       ` Andrii Nakryiko
@ 2020-12-07 10:59         ` Toke Høiland-Jørgensen
  2020-12-07 15:55           ` Alexei Starovoitov
  2020-12-08  2:47           ` Andrii Nakryiko
  0 siblings, 2 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-07 10:59 UTC (permalink / raw)
  To: Andrii Nakryiko, Yonghong Song; +Cc: bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Fri, Dec 4, 2020 at 9:55 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 12/4/20 1:34 AM, Toke Høiland-Jørgensen wrote:
>> > Yonghong Song <yhs@fb.com> writes:
>> >
>> >> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
>> >>> Hi Andrii
>> >>>
>> >>> I noticed that recent libbpf versions fail to load BPF files compiled
>> >>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
>> >>> get:
>> >>>
>> >>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
>> >>> Loading 1 files on interface 'testns'.
>> >>> libbpf: loading ../lib/testing/xdp_drop.o
>> >>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
>> >>> libbpf: sec 'prog': failed to find program symbol at offset 0
>> >>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>> >>>
>> >>> The 'failed to find program symbol' error seems to have been introduced
>> >>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
>> >>> multiple BPF programs").
>> >>>
>> >>> Looking at the object file in question, indeed it seems to not have any
>> >>> function symbols defined:
>> >>>
>> >>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
>> >>>
>> >>> ../lib/testing/xdp_drop.o:  file format elf64-bpf
>> >>>
>> >>> SYMBOL TABLE:
>> >>> 0000000000000000 l       .debug_str 0000000000000000
>> >>> 0000000000000037 l       .debug_str 0000000000000000
>> >>> 0000000000000042 l       .debug_str 0000000000000000
>> >>> 0000000000000068 l       .debug_str 0000000000000000
>> >>> 0000000000000071 l       .debug_str 0000000000000000
>> >>> 0000000000000076 l       .debug_str 0000000000000000
>> >>> 000000000000008a l       .debug_str 0000000000000000
>> >>> 0000000000000097 l       .debug_str 0000000000000000
>> >>> 00000000000000a3 l       .debug_str 0000000000000000
>> >>> 00000000000000ac l       .debug_str 0000000000000000
>> >>> 00000000000000b5 l       .debug_str 0000000000000000
>> >>> 00000000000000bc l       .debug_str 0000000000000000
>> >>> 00000000000000c9 l       .debug_str 0000000000000000
>> >>> 00000000000000d4 l       .debug_str 0000000000000000
>> >>> 00000000000000dd l       .debug_str 0000000000000000
>> >>> 00000000000000e1 l       .debug_str 0000000000000000
>> >>> 00000000000000e5 l       .debug_str 0000000000000000
>> >>> 00000000000000ea l       .debug_str 0000000000000000
>> >>> 00000000000000f0 l       .debug_str 0000000000000000
>> >>> 00000000000000f9 l       .debug_str 0000000000000000
>> >>> 0000000000000103 l       .debug_str 0000000000000000
>> >>> 0000000000000113 l       .debug_str 0000000000000000
>> >>> 0000000000000122 l       .debug_str 0000000000000000
>> >>> 0000000000000131 l       .debug_str 0000000000000000
>> >>> 0000000000000000 l    d  prog       0000000000000000 prog
>> >>> 0000000000000000 l    d  .debug_abbrev      0000000000000000 .debug_abbrev
>> >>> 0000000000000000 l    d  .debug_info        0000000000000000 .debug_info
>> >>> 0000000000000000 l    d  .debug_frame       0000000000000000 .debug_frame
>> >>> 0000000000000000 l    d  .debug_line        0000000000000000 .debug_line
>> >>> 0000000000000000 g       license    0000000000000000 _license
>> >>> 0000000000000000 g       prog       0000000000000000 xdp_drop
>> >>>
>> >>>
>> >>> I assume this is because old LLVM versions simply don't emit that symbol
>> >>> information?
>>
>> Thanks for the below instruction and xdp_drop.c file. I can reproduce
>> the issue now.
>>
>> I added another function 'xdp_drop1' in the same thing. Below is the
>> symbol table with llvm7 vs. llvm12.
>>
>> -bash-4.4$ llvm-readelf -symbols xdp-7.o | grep xdp_drop
>>      32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop
>>      33: 0000000000000010     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop1
>>
>>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
>> 00  AX  0   0  8
>>
>> -bash-4.4$ llvm-readelf -symbols xdp-12.o | grep xdp_drop
>>      32: 0000000000000000    16 FUNC    GLOBAL DEFAULT     3 xdp_drop
>>      33: 0000000000000010    16 FUNC    GLOBAL DEFAULT     3 xdp_drop1
>> -bash-4.4$
>>
>>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
>> 00  AX  0   0  8
>>
>>
>> Yes, llvm7 does not encode type and size for FUNC's. I guess libbpf can
>> change to recognize NOTYPE and use the symbol value (representing the
>> offset from the start of the section) and section size to
>> calculate the individual function size. This is more complicated than
>> elf file providing FUNC type and symbol size directly.
>
> I think we should just face the fact that LLVM7 is way too old to
> produce a sensible BPF ELF file layout. We can extend:
>
> libbpf: sec 'prog': failed to find program symbol at offset 0
> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>
> with a suggestion to upgrade Clang/LLVM to something more recent, if
> that would be helpful.
>
> But I don't want to add error-prone checks and assumptions in the
> already quite complicated logic. Even the kernel itself maintains that
> Clang 10+ needs to be used for its compilation. BPF CO-RE is also not
> working with older than Clang10, so lots of people have already
> upgraded way beyond that.

Wait, what? This is a regression that *breaks people's programs* on
compiler versions that are still very much in the wild! I mean, fine if
you don't want to support new features on such files, but then surely we
can at least revert back to the old behaviour?

> Speaking of legacy. Toke, can you please update all the samples in
> your xdp-tools repo to not use arbitrary sections names. I see
> SEC("prog"), where it should really be SEC("xdp"). It sets a bad
> example for newcomers, IMO.

I used "prog" because that's what iproute2 looks for if you don't supply
a section name, so it makes it convenient to load programs with 'ip'
without supplying the section name. However, I do realise this is not
the best of reasons, and I am not opposed to changing it. However...

> I'm also going to emit warnings in libbpf soon for section names that
> don't follow proper libbpf naming pattern, so it would be good if you
> could get ahead of the curve.

...this sounds like just another way to annoy users by breaking things
that were working before? :/

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 10:59         ` Toke Høiland-Jørgensen
@ 2020-12-07 15:55           ` Alexei Starovoitov
  2020-12-07 16:15             ` Toke Høiland-Jørgensen
  2020-12-08  2:47           ` Andrii Nakryiko
  1 sibling, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2020-12-07 15:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Andrii Nakryiko, Yonghong Song, bpf

On Mon, Dec 7, 2020 at 3:03 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Wait, what? This is a regression that *breaks people's programs* on
> compiler versions that are still very much in the wild! I mean, fine if
> you don't want to support new features on such files, but then surely we
> can at least revert back to the old behaviour?

Those folks that care about compiling with old llvm would have to stick
to whatever loader they have instead of using libbpf.
It's not a backward compatibility breakage.

> I used "prog" because that's what iproute2 looks for if you don't supply

Please don't use iproute2 as a reason to do anything in libbpf. It won't fly.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 15:55           ` Alexei Starovoitov
@ 2020-12-07 16:15             ` Toke Høiland-Jørgensen
  2020-12-07 16:20               ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-07 16:15 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Andrii Nakryiko, Yonghong Song, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Dec 7, 2020 at 3:03 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Wait, what? This is a regression that *breaks people's programs* on
>> compiler versions that are still very much in the wild! I mean, fine if
>> you don't want to support new features on such files, but then surely we
>> can at least revert back to the old behaviour?
>
> Those folks that care about compiling with old llvm would have to stick
> to whatever loader they have instead of using libbpf.
> It's not a backward compatibility breakage.

What? It's a change in libbpf that breaks loading of existing BPF object
files that were working (with libbpf) before. If that's not a backward
compatibility break then that term has lost all meaning.

>> I used "prog" because that's what iproute2 looks for if you don't supply
>
> Please don't use iproute2 as a reason to do anything in libbpf. It won't fly.

Eh? Did you even read the email you're replying to? This issue has
nothing to do with iproute2, that was an unrelated thing Andrii asked me
to change when he was looking at my example...

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 16:15             ` Toke Høiland-Jørgensen
@ 2020-12-07 16:20               ` Alexei Starovoitov
  2020-12-07 16:51                 ` Toke Høiland-Jørgensen
  2020-12-07 18:02                 ` David Ahern
  0 siblings, 2 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2020-12-07 16:20 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Andrii Nakryiko, Yonghong Song, bpf

On Mon, Dec 7, 2020 at 8:15 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Mon, Dec 7, 2020 at 3:03 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Wait, what? This is a regression that *breaks people's programs* on
> >> compiler versions that are still very much in the wild! I mean, fine if
> >> you don't want to support new features on such files, but then surely we
> >> can at least revert back to the old behaviour?
> >
> > Those folks that care about compiling with old llvm would have to stick
> > to whatever loader they have instead of using libbpf.
> > It's not a backward compatibility breakage.
>
> What? It's a change in libbpf that breaks loading of existing BPF object
> files that were working (with libbpf) before. If that's not a backward
> compatibility break then that term has lost all meaning.

The user space library is not a kernel.
The library will change its interface. It will remove functions, features, etc.
That's what .map is for.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 16:20               ` Alexei Starovoitov
@ 2020-12-07 16:51                 ` Toke Høiland-Jørgensen
  2020-12-07 17:16                   ` Alexei Starovoitov
  2020-12-07 18:02                 ` David Ahern
  1 sibling, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-07 16:51 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Andrii Nakryiko, Yonghong Song, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Dec 7, 2020 at 8:15 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Mon, Dec 7, 2020 at 3:03 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Wait, what? This is a regression that *breaks people's programs* on
>> >> compiler versions that are still very much in the wild! I mean, fine if
>> >> you don't want to support new features on such files, but then surely we
>> >> can at least revert back to the old behaviour?
>> >
>> > Those folks that care about compiling with old llvm would have to stick
>> > to whatever loader they have instead of using libbpf.
>> > It's not a backward compatibility breakage.
>>
>> What? It's a change in libbpf that breaks loading of existing BPF object
>> files that were working (with libbpf) before. If that's not a backward
>> compatibility break then that term has lost all meaning.
>
> The user space library is not a kernel.
> The library will change its interface. It will remove functions, features, etc.
> That's what .map is for.

Right, OK, so how do I use .map to get the old behaviour here? That's
all I'm asking for, really...

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 16:51                 ` Toke Høiland-Jørgensen
@ 2020-12-07 17:16                   ` Alexei Starovoitov
  2020-12-07 22:18                     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2020-12-07 17:16 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Andrii Nakryiko, Yonghong Song, bpf

On Mon, Dec 7, 2020 at 8:51 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Mon, Dec 7, 2020 at 8:15 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>
> >> > On Mon, Dec 7, 2020 at 3:03 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Wait, what? This is a regression that *breaks people's programs* on
> >> >> compiler versions that are still very much in the wild! I mean, fine if
> >> >> you don't want to support new features on such files, but then surely we
> >> >> can at least revert back to the old behaviour?
> >> >
> >> > Those folks that care about compiling with old llvm would have to stick
> >> > to whatever loader they have instead of using libbpf.
> >> > It's not a backward compatibility breakage.
> >>
> >> What? It's a change in libbpf that breaks loading of existing BPF object
> >> files that were working (with libbpf) before. If that's not a backward
> >> compatibility break then that term has lost all meaning.
> >
> > The user space library is not a kernel.
> > The library will change its interface. It will remove functions, features, etc.
> > That's what .map is for.
>
> Right, OK, so how do I use .map to get the old behaviour here? That's
> all I'm asking for, really...

Fix old llvm. The users would have to upgrade either from llvm 7.x to
7.x+1 or to llvm 10+.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 16:20               ` Alexei Starovoitov
  2020-12-07 16:51                 ` Toke Høiland-Jørgensen
@ 2020-12-07 18:02                 ` David Ahern
  2020-12-07 18:14                   ` Alexei Starovoitov
  1 sibling, 1 reply; 20+ messages in thread
From: David Ahern @ 2020-12-07 18:02 UTC (permalink / raw)
  To: Alexei Starovoitov, Toke Høiland-Jørgensen
  Cc: Andrii Nakryiko, Yonghong Song, bpf

On 12/7/20 9:20 AM, Alexei Starovoitov wrote:
> The user space library is not a kernel.
> The library will change its interface. It will remove functions, features, etc.
> That's what .map is for.

So any user/package wanting to leverage libbpf can not expect stability
or consistency with its APIs?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 18:02                 ` David Ahern
@ 2020-12-07 18:14                   ` Alexei Starovoitov
  2020-12-07 18:59                     ` David Ahern
  0 siblings, 1 reply; 20+ messages in thread
From: Alexei Starovoitov @ 2020-12-07 18:14 UTC (permalink / raw)
  To: David Ahern
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, Yonghong Song, bpf

On Mon, Dec 7, 2020 at 10:02 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 12/7/20 9:20 AM, Alexei Starovoitov wrote:
> > The user space library is not a kernel.
> > The library will change its interface. It will remove functions, features, etc.
> > That's what .map is for.
>
> So any user/package wanting to leverage libbpf can not expect stability
> or consistency with its APIs?

If you're talking about iproute2 and your own convoluted definition of
stability and consistency then certainly not.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 18:14                   ` Alexei Starovoitov
@ 2020-12-07 18:59                     ` David Ahern
  0 siblings, 0 replies; 20+ messages in thread
From: David Ahern @ 2020-12-07 18:59 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Toke Høiland-Jørgensen, Andrii Nakryiko, Yonghong Song, bpf

On 12/7/20 11:14 AM, Alexei Starovoitov wrote:
> On Mon, Dec 7, 2020 at 10:02 AM David Ahern <dsahern@gmail.com> wrote:
>>
>> On 12/7/20 9:20 AM, Alexei Starovoitov wrote:
>>> The user space library is not a kernel.
>>> The library will change its interface. It will remove functions, features, etc.
>>> That's what .map is for.
>>
>> So any user/package wanting to leverage libbpf can not expect stability
>> or consistency with its APIs?
> 
> If you're talking about iproute2 and your own convoluted definition of

your record player seems to be stuck. Could you give it a little bump so
it can move on to the next track? Thanks. I do work on other things
besides iproute2.

> stability and consistency then certainly not.
> 

Your statement has huge impacts on what users can expect from a library
and is inconsistent with other libraries I have used. Hence, my request
for clarification on your comment about 'changing interfaces, removing
functions and features.'

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 17:16                   ` Alexei Starovoitov
@ 2020-12-07 22:18                     ` Toke Høiland-Jørgensen
  2020-12-08  1:20                       ` Alexei Starovoitov
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-07 22:18 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: Andrii Nakryiko, Yonghong Song, bpf

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Mon, Dec 7, 2020 at 8:51 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Mon, Dec 7, 2020 at 8:15 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>> >>
>> >> > On Mon, Dec 7, 2020 at 3:03 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >> >>
>> >> >> Wait, what? This is a regression that *breaks people's programs* on
>> >> >> compiler versions that are still very much in the wild! I mean, fine if
>> >> >> you don't want to support new features on such files, but then surely we
>> >> >> can at least revert back to the old behaviour?
>> >> >
>> >> > Those folks that care about compiling with old llvm would have to stick
>> >> > to whatever loader they have instead of using libbpf.
>> >> > It's not a backward compatibility breakage.
>> >>
>> >> What? It's a change in libbpf that breaks loading of existing BPF object
>> >> files that were working (with libbpf) before. If that's not a backward
>> >> compatibility break then that term has lost all meaning.
>> >
>> > The user space library is not a kernel.
>> > The library will change its interface. It will remove functions, features, etc.
>> > That's what .map is for.
>>
>> Right, OK, so how do I use .map to get the old behaviour here? That's
>> all I'm asking for, really...
>
> Fix old llvm. The users would have to upgrade either from llvm 7.x to
> 7.x+1 or to llvm 10+.

Right, so by "we keep a stable interface" you mean "we expect you to
upgrade your entire toolchain every time you update the library". Gotcha!
I'll rectify my newspeak dictionary straight away - doubleplusgood!

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 22:18                     ` Toke Høiland-Jørgensen
@ 2020-12-08  1:20                       ` Alexei Starovoitov
  0 siblings, 0 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2020-12-08  1:20 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Andrii Nakryiko, Yonghong Song, bpf

On Mon, Dec 7, 2020 at 2:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>
> > On Mon, Dec 7, 2020 at 8:51 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >>
> >> > On Mon, Dec 7, 2020 at 8:15 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >>
> >> >> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> >> >>
> >> >> > On Mon, Dec 7, 2020 at 3:03 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >> >> >>
> >> >> >> Wait, what? This is a regression that *breaks people's programs* on
> >> >> >> compiler versions that are still very much in the wild! I mean, fine if
> >> >> >> you don't want to support new features on such files, but then surely we
> >> >> >> can at least revert back to the old behaviour?
> >> >> >
> >> >> > Those folks that care about compiling with old llvm would have to stick
> >> >> > to whatever loader they have instead of using libbpf.
> >> >> > It's not a backward compatibility breakage.
> >> >>
> >> >> What? It's a change in libbpf that breaks loading of existing BPF object
> >> >> files that were working (with libbpf) before. If that's not a backward
> >> >> compatibility break then that term has lost all meaning.
> >> >
> >> > The user space library is not a kernel.
> >> > The library will change its interface. It will remove functions, features, etc.
> >> > That's what .map is for.
> >>
> >> Right, OK, so how do I use .map to get the old behaviour here? That's
> >> all I'm asking for, really...
> >
> > Fix old llvm. The users would have to upgrade either from llvm 7.x to
> > 7.x+1 or to llvm 10+.
>
> Right, so by "we keep a stable interface" you mean "we expect you to
> upgrade your entire toolchain every time you update the library". Gotcha!

No. It means that libbpf is not going to have a workaround for every possible
llvm bug that was fixed over the years.
libbpf already does a ton of fixups for things that users can actually hit.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-07 10:59         ` Toke Høiland-Jørgensen
  2020-12-07 15:55           ` Alexei Starovoitov
@ 2020-12-08  2:47           ` Andrii Nakryiko
  2020-12-08 13:41             ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2020-12-08  2:47 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Yonghong Song, bpf

On Mon, Dec 7, 2020 at 3:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Fri, Dec 4, 2020 at 9:55 AM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 12/4/20 1:34 AM, Toke Høiland-Jørgensen wrote:
> >> > Yonghong Song <yhs@fb.com> writes:
> >> >
> >> >> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
> >> >>> Hi Andrii
> >> >>>
> >> >>> I noticed that recent libbpf versions fail to load BPF files compiled
> >> >>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
> >> >>> get:
> >> >>>
> >> >>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
> >> >>> Loading 1 files on interface 'testns'.
> >> >>> libbpf: loading ../lib/testing/xdp_drop.o
> >> >>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
> >> >>> libbpf: sec 'prog': failed to find program symbol at offset 0
> >> >>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
> >> >>>
> >> >>> The 'failed to find program symbol' error seems to have been introduced
> >> >>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
> >> >>> multiple BPF programs").
> >> >>>
> >> >>> Looking at the object file in question, indeed it seems to not have any
> >> >>> function symbols defined:
> >> >>>
> >> >>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
> >> >>>
> >> >>> ../lib/testing/xdp_drop.o:  file format elf64-bpf
> >> >>>
> >> >>> SYMBOL TABLE:
> >> >>> 0000000000000000 l       .debug_str 0000000000000000
> >> >>> 0000000000000037 l       .debug_str 0000000000000000
> >> >>> 0000000000000042 l       .debug_str 0000000000000000
> >> >>> 0000000000000068 l       .debug_str 0000000000000000
> >> >>> 0000000000000071 l       .debug_str 0000000000000000
> >> >>> 0000000000000076 l       .debug_str 0000000000000000
> >> >>> 000000000000008a l       .debug_str 0000000000000000
> >> >>> 0000000000000097 l       .debug_str 0000000000000000
> >> >>> 00000000000000a3 l       .debug_str 0000000000000000
> >> >>> 00000000000000ac l       .debug_str 0000000000000000
> >> >>> 00000000000000b5 l       .debug_str 0000000000000000
> >> >>> 00000000000000bc l       .debug_str 0000000000000000
> >> >>> 00000000000000c9 l       .debug_str 0000000000000000
> >> >>> 00000000000000d4 l       .debug_str 0000000000000000
> >> >>> 00000000000000dd l       .debug_str 0000000000000000
> >> >>> 00000000000000e1 l       .debug_str 0000000000000000
> >> >>> 00000000000000e5 l       .debug_str 0000000000000000
> >> >>> 00000000000000ea l       .debug_str 0000000000000000
> >> >>> 00000000000000f0 l       .debug_str 0000000000000000
> >> >>> 00000000000000f9 l       .debug_str 0000000000000000
> >> >>> 0000000000000103 l       .debug_str 0000000000000000
> >> >>> 0000000000000113 l       .debug_str 0000000000000000
> >> >>> 0000000000000122 l       .debug_str 0000000000000000
> >> >>> 0000000000000131 l       .debug_str 0000000000000000
> >> >>> 0000000000000000 l    d  prog       0000000000000000 prog
> >> >>> 0000000000000000 l    d  .debug_abbrev      0000000000000000 .debug_abbrev
> >> >>> 0000000000000000 l    d  .debug_info        0000000000000000 .debug_info
> >> >>> 0000000000000000 l    d  .debug_frame       0000000000000000 .debug_frame
> >> >>> 0000000000000000 l    d  .debug_line        0000000000000000 .debug_line
> >> >>> 0000000000000000 g       license    0000000000000000 _license
> >> >>> 0000000000000000 g       prog       0000000000000000 xdp_drop
> >> >>>
> >> >>>
> >> >>> I assume this is because old LLVM versions simply don't emit that symbol
> >> >>> information?
> >>
> >> Thanks for the below instruction and xdp_drop.c file. I can reproduce
> >> the issue now.
> >>
> >> I added another function 'xdp_drop1' in the same thing. Below is the
> >> symbol table with llvm7 vs. llvm12.
> >>
> >> -bash-4.4$ llvm-readelf -symbols xdp-7.o | grep xdp_drop
> >>      32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop
> >>      33: 0000000000000010     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop1
> >>
> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
> >> 00  AX  0   0  8
> >>
> >> -bash-4.4$ llvm-readelf -symbols xdp-12.o | grep xdp_drop
> >>      32: 0000000000000000    16 FUNC    GLOBAL DEFAULT     3 xdp_drop
> >>      33: 0000000000000010    16 FUNC    GLOBAL DEFAULT     3 xdp_drop1
> >> -bash-4.4$
> >>
> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
> >> 00  AX  0   0  8
> >>
> >>
> >> Yes, llvm7 does not encode type and size for FUNC's. I guess libbpf can
> >> change to recognize NOTYPE and use the symbol value (representing the
> >> offset from the start of the section) and section size to
> >> calculate the individual function size. This is more complicated than
> >> elf file providing FUNC type and symbol size directly.
> >
> > I think we should just face the fact that LLVM7 is way too old to
> > produce a sensible BPF ELF file layout. We can extend:
> >
> > libbpf: sec 'prog': failed to find program symbol at offset 0
> > Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
> >
> > with a suggestion to upgrade Clang/LLVM to something more recent, if
> > that would be helpful.
> >
> > But I don't want to add error-prone checks and assumptions in the
> > already quite complicated logic. Even the kernel itself maintains that
> > Clang 10+ needs to be used for its compilation. BPF CO-RE is also not
> > working with older than Clang10, so lots of people have already
> > upgraded way beyond that.
>
> Wait, what? This is a regression that *breaks people's programs* on
> compiler versions that are still very much in the wild! I mean, fine if
> you don't want to support new features on such files, but then surely we
> can at least revert back to the old behaviour?

This is clearly a bug in LLVM7, which didn't produce correct ELF
symbols, do we agree on that? libbpf used to handle such invalid ELF
files *by accident* until it changed its internal logic to be more
strict in v0.2. It became more strict and doesn't work with such
invalid ELF files anymore. Does it need to add extra quirks to support
such broken ELF? I don't think so.

Surely, users that can't upgrade LLVM7 to something less ancient, can
stick to libbpf v0.1, that was lenient enough to accept such invalid
ELF files. libbpf v0.2 was released more than a month ago, and so far
you are the only one who noticed this "regression". So hopefully it's
not super annoying to people and they would be accommodating enough to
use more up to date compiler (and save themselves lots of trouble
along the way).

>
> > Speaking of legacy. Toke, can you please update all the samples in
> > your xdp-tools repo to not use arbitrary sections names. I see
> > SEC("prog"), where it should really be SEC("xdp"). It sets a bad
> > example for newcomers, IMO.
>
> I used "prog" because that's what iproute2 looks for if you don't supply

Ok.

> a section name, so it makes it convenient to load programs with 'ip'
> without supplying the section name. However, I do realise this is not
> the best of reasons, and I am not opposed to changing it. However...
>
> > I'm also going to emit warnings in libbpf soon for section names that
> > don't follow proper libbpf naming pattern, so it would be good if you
> > could get ahead of the curve.
>
> ...this sounds like just another way to annoy users by breaking things
> that were working before? :/

It won't break, libbpf will emit a warning about the need to use
proper section name format, which will start to be enforced only with
major version bump. So that will give users plenty of time to make
sure their BPF programs are compatible with stricter libbpf.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-08  2:47           ` Andrii Nakryiko
@ 2020-12-08 13:41             ` Toke Høiland-Jørgensen
  2020-12-08 18:39               ` Andrii Nakryiko
  0 siblings, 1 reply; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-08 13:41 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Yonghong Song, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Mon, Dec 7, 2020 at 3:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Fri, Dec 4, 2020 at 9:55 AM Yonghong Song <yhs@fb.com> wrote:
>> >>
>> >>
>> >>
>> >> On 12/4/20 1:34 AM, Toke Høiland-Jørgensen wrote:
>> >> > Yonghong Song <yhs@fb.com> writes:
>> >> >
>> >> >> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
>> >> >>> Hi Andrii
>> >> >>>
>> >> >>> I noticed that recent libbpf versions fail to load BPF files compiled
>> >> >>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
>> >> >>> get:
>> >> >>>
>> >> >>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
>> >> >>> Loading 1 files on interface 'testns'.
>> >> >>> libbpf: loading ../lib/testing/xdp_drop.o
>> >> >>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
>> >> >>> libbpf: sec 'prog': failed to find program symbol at offset 0
>> >> >>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>> >> >>>
>> >> >>> The 'failed to find program symbol' error seems to have been introduced
>> >> >>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
>> >> >>> multiple BPF programs").
>> >> >>>
>> >> >>> Looking at the object file in question, indeed it seems to not have any
>> >> >>> function symbols defined:
>> >> >>>
>> >> >>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
>> >> >>>
>> >> >>> ../lib/testing/xdp_drop.o:  file format elf64-bpf
>> >> >>>
>> >> >>> SYMBOL TABLE:
>> >> >>> 0000000000000000 l       .debug_str 0000000000000000
>> >> >>> 0000000000000037 l       .debug_str 0000000000000000
>> >> >>> 0000000000000042 l       .debug_str 0000000000000000
>> >> >>> 0000000000000068 l       .debug_str 0000000000000000
>> >> >>> 0000000000000071 l       .debug_str 0000000000000000
>> >> >>> 0000000000000076 l       .debug_str 0000000000000000
>> >> >>> 000000000000008a l       .debug_str 0000000000000000
>> >> >>> 0000000000000097 l       .debug_str 0000000000000000
>> >> >>> 00000000000000a3 l       .debug_str 0000000000000000
>> >> >>> 00000000000000ac l       .debug_str 0000000000000000
>> >> >>> 00000000000000b5 l       .debug_str 0000000000000000
>> >> >>> 00000000000000bc l       .debug_str 0000000000000000
>> >> >>> 00000000000000c9 l       .debug_str 0000000000000000
>> >> >>> 00000000000000d4 l       .debug_str 0000000000000000
>> >> >>> 00000000000000dd l       .debug_str 0000000000000000
>> >> >>> 00000000000000e1 l       .debug_str 0000000000000000
>> >> >>> 00000000000000e5 l       .debug_str 0000000000000000
>> >> >>> 00000000000000ea l       .debug_str 0000000000000000
>> >> >>> 00000000000000f0 l       .debug_str 0000000000000000
>> >> >>> 00000000000000f9 l       .debug_str 0000000000000000
>> >> >>> 0000000000000103 l       .debug_str 0000000000000000
>> >> >>> 0000000000000113 l       .debug_str 0000000000000000
>> >> >>> 0000000000000122 l       .debug_str 0000000000000000
>> >> >>> 0000000000000131 l       .debug_str 0000000000000000
>> >> >>> 0000000000000000 l    d  prog       0000000000000000 prog
>> >> >>> 0000000000000000 l    d  .debug_abbrev      0000000000000000 .debug_abbrev
>> >> >>> 0000000000000000 l    d  .debug_info        0000000000000000 .debug_info
>> >> >>> 0000000000000000 l    d  .debug_frame       0000000000000000 .debug_frame
>> >> >>> 0000000000000000 l    d  .debug_line        0000000000000000 .debug_line
>> >> >>> 0000000000000000 g       license    0000000000000000 _license
>> >> >>> 0000000000000000 g       prog       0000000000000000 xdp_drop
>> >> >>>
>> >> >>>
>> >> >>> I assume this is because old LLVM versions simply don't emit that symbol
>> >> >>> information?
>> >>
>> >> Thanks for the below instruction and xdp_drop.c file. I can reproduce
>> >> the issue now.
>> >>
>> >> I added another function 'xdp_drop1' in the same thing. Below is the
>> >> symbol table with llvm7 vs. llvm12.
>> >>
>> >> -bash-4.4$ llvm-readelf -symbols xdp-7.o | grep xdp_drop
>> >>      32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop
>> >>      33: 0000000000000010     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop1
>> >>
>> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
>> >> 00  AX  0   0  8
>> >>
>> >> -bash-4.4$ llvm-readelf -symbols xdp-12.o | grep xdp_drop
>> >>      32: 0000000000000000    16 FUNC    GLOBAL DEFAULT     3 xdp_drop
>> >>      33: 0000000000000010    16 FUNC    GLOBAL DEFAULT     3 xdp_drop1
>> >> -bash-4.4$
>> >>
>> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
>> >> 00  AX  0   0  8
>> >>
>> >>
>> >> Yes, llvm7 does not encode type and size for FUNC's. I guess libbpf can
>> >> change to recognize NOTYPE and use the symbol value (representing the
>> >> offset from the start of the section) and section size to
>> >> calculate the individual function size. This is more complicated than
>> >> elf file providing FUNC type and symbol size directly.
>> >
>> > I think we should just face the fact that LLVM7 is way too old to
>> > produce a sensible BPF ELF file layout. We can extend:
>> >
>> > libbpf: sec 'prog': failed to find program symbol at offset 0
>> > Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>> >
>> > with a suggestion to upgrade Clang/LLVM to something more recent, if
>> > that would be helpful.
>> >
>> > But I don't want to add error-prone checks and assumptions in the
>> > already quite complicated logic. Even the kernel itself maintains that
>> > Clang 10+ needs to be used for its compilation. BPF CO-RE is also not
>> > working with older than Clang10, so lots of people have already
>> > upgraded way beyond that.
>>
>> Wait, what? This is a regression that *breaks people's programs* on
>> compiler versions that are still very much in the wild! I mean, fine if
>> you don't want to support new features on such files, but then surely we
>> can at least revert back to the old behaviour?
>
> This is clearly a bug in LLVM7, which didn't produce correct ELF
> symbols, do we agree on that? libbpf used to handle such invalid ELF
> files *by accident* until it changed its internal logic to be more
> strict in v0.2. It became more strict and doesn't work with such
> invalid ELF files anymore. Does it need to add extra quirks to support
> such broken ELF? I don't think so.

I don't know enough about the intricacies of the ELF format to say, but
I believe you when you say it's a bug. However, that doesn't change the
fact that from a user's PoV, something that was working before is now
broken, with the only change being a newer libbpf.

This is not a theoretical concern, BTW, I discovered this due to
feedback from a partner that we've been pushing to adopt libbpf. When
they finally tried it out, the first thing they noticed is that their
programs wouldn't load due to this issue.

Sure, I can tell them to just upgrade their toolchain (and I will), but
that still means we're back to "in order to use this library, you should
expect to keep chasing the latest version of the entire toolchain". And
this is a much harder sell than "this is a stable library and upstream
takes backwards compatibility very serious", which I *thought* was the
expectation.

> Surely, users that can't upgrade LLVM7 to something less ancient, can
> stick to libbpf v0.1, that was lenient enough to accept such invalid
> ELF files. libbpf v0.2 was released more than a month ago, and so far
> you are the only one who noticed this "regression". So hopefully it's
> not super annoying to people and they would be accommodating enough to
> use more up to date compiler (and save themselves lots of trouble
> along the way).

Oh, boy, do I envy your adoption rate for new versions! In my world I
would expect that by one month a few people who are very early adopters
have started noticing and maybe thinking about testing the new version :)

>> > Speaking of legacy. Toke, can you please update all the samples in
>> > your xdp-tools repo to not use arbitrary sections names. I see
>> > SEC("prog"), where it should really be SEC("xdp"). It sets a bad
>> > example for newcomers, IMO.
>>
>> I used "prog" because that's what iproute2 looks for if you don't supply
>
> Ok.

Fixed now, BTW:
https://github.com/xdp-project/xdp-tools/commit/83ab8aa1c29408aac842bebe704aa47ec5dc5bc3

>> a section name, so it makes it convenient to load programs with 'ip'
>> without supplying the section name. However, I do realise this is not
>> the best of reasons, and I am not opposed to changing it. However...
>>
>> > I'm also going to emit warnings in libbpf soon for section names that
>> > don't follow proper libbpf naming pattern, so it would be good if you
>> > could get ahead of the curve.
>>
>> ...this sounds like just another way to annoy users by breaking things
>> that were working before? :/
>
> It won't break, libbpf will emit a warning about the need to use
> proper section name format, which will start to be enforced only with
> major version bump. So that will give users plenty of time to make
> sure their BPF programs are compatible with stricter libbpf.

Well see above re: different expectations for "plenty of time". But OK,
maybe this isn't as bad as I figured at first glance :)

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-08 13:41             ` Toke Høiland-Jørgensen
@ 2020-12-08 18:39               ` Andrii Nakryiko
  2020-12-08 22:38                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 20+ messages in thread
From: Andrii Nakryiko @ 2020-12-08 18:39 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: Yonghong Song, bpf

On Tue, Dec 8, 2020 at 5:41 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>
> > On Mon, Dec 7, 2020 at 3:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
> >>
> >> > On Fri, Dec 4, 2020 at 9:55 AM Yonghong Song <yhs@fb.com> wrote:
> >> >>
> >> >>
> >> >>
> >> >> On 12/4/20 1:34 AM, Toke Høiland-Jørgensen wrote:
> >> >> > Yonghong Song <yhs@fb.com> writes:
> >> >> >
> >> >> >> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
> >> >> >>> Hi Andrii
> >> >> >>>
> >> >> >>> I noticed that recent libbpf versions fail to load BPF files compiled
> >> >> >>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
> >> >> >>> get:
> >> >> >>>
> >> >> >>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
> >> >> >>> Loading 1 files on interface 'testns'.
> >> >> >>> libbpf: loading ../lib/testing/xdp_drop.o
> >> >> >>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
> >> >> >>> libbpf: sec 'prog': failed to find program symbol at offset 0
> >> >> >>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
> >> >> >>>
> >> >> >>> The 'failed to find program symbol' error seems to have been introduced
> >> >> >>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
> >> >> >>> multiple BPF programs").
> >> >> >>>
> >> >> >>> Looking at the object file in question, indeed it seems to not have any
> >> >> >>> function symbols defined:
> >> >> >>>
> >> >> >>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
> >> >> >>>
> >> >> >>> ../lib/testing/xdp_drop.o:  file format elf64-bpf
> >> >> >>>
> >> >> >>> SYMBOL TABLE:
> >> >> >>> 0000000000000000 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000037 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000042 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000068 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000071 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000076 l       .debug_str 0000000000000000
> >> >> >>> 000000000000008a l       .debug_str 0000000000000000
> >> >> >>> 0000000000000097 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000a3 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000ac l       .debug_str 0000000000000000
> >> >> >>> 00000000000000b5 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000bc l       .debug_str 0000000000000000
> >> >> >>> 00000000000000c9 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000d4 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000dd l       .debug_str 0000000000000000
> >> >> >>> 00000000000000e1 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000e5 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000ea l       .debug_str 0000000000000000
> >> >> >>> 00000000000000f0 l       .debug_str 0000000000000000
> >> >> >>> 00000000000000f9 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000103 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000113 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000122 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000131 l       .debug_str 0000000000000000
> >> >> >>> 0000000000000000 l    d  prog       0000000000000000 prog
> >> >> >>> 0000000000000000 l    d  .debug_abbrev      0000000000000000 .debug_abbrev
> >> >> >>> 0000000000000000 l    d  .debug_info        0000000000000000 .debug_info
> >> >> >>> 0000000000000000 l    d  .debug_frame       0000000000000000 .debug_frame
> >> >> >>> 0000000000000000 l    d  .debug_line        0000000000000000 .debug_line
> >> >> >>> 0000000000000000 g       license    0000000000000000 _license
> >> >> >>> 0000000000000000 g       prog       0000000000000000 xdp_drop
> >> >> >>>
> >> >> >>>
> >> >> >>> I assume this is because old LLVM versions simply don't emit that symbol
> >> >> >>> information?
> >> >>
> >> >> Thanks for the below instruction and xdp_drop.c file. I can reproduce
> >> >> the issue now.
> >> >>
> >> >> I added another function 'xdp_drop1' in the same thing. Below is the
> >> >> symbol table with llvm7 vs. llvm12.
> >> >>
> >> >> -bash-4.4$ llvm-readelf -symbols xdp-7.o | grep xdp_drop
> >> >>      32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop
> >> >>      33: 0000000000000010     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop1
> >> >>
> >> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
> >> >> 00  AX  0   0  8
> >> >>
> >> >> -bash-4.4$ llvm-readelf -symbols xdp-12.o | grep xdp_drop
> >> >>      32: 0000000000000000    16 FUNC    GLOBAL DEFAULT     3 xdp_drop
> >> >>      33: 0000000000000010    16 FUNC    GLOBAL DEFAULT     3 xdp_drop1
> >> >> -bash-4.4$
> >> >>
> >> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
> >> >> 00  AX  0   0  8
> >> >>
> >> >>
> >> >> Yes, llvm7 does not encode type and size for FUNC's. I guess libbpf can
> >> >> change to recognize NOTYPE and use the symbol value (representing the
> >> >> offset from the start of the section) and section size to
> >> >> calculate the individual function size. This is more complicated than
> >> >> elf file providing FUNC type and symbol size directly.
> >> >
> >> > I think we should just face the fact that LLVM7 is way too old to
> >> > produce a sensible BPF ELF file layout. We can extend:
> >> >
> >> > libbpf: sec 'prog': failed to find program symbol at offset 0
> >> > Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
> >> >
> >> > with a suggestion to upgrade Clang/LLVM to something more recent, if
> >> > that would be helpful.
> >> >
> >> > But I don't want to add error-prone checks and assumptions in the
> >> > already quite complicated logic. Even the kernel itself maintains that
> >> > Clang 10+ needs to be used for its compilation. BPF CO-RE is also not
> >> > working with older than Clang10, so lots of people have already
> >> > upgraded way beyond that.
> >>
> >> Wait, what? This is a regression that *breaks people's programs* on
> >> compiler versions that are still very much in the wild! I mean, fine if
> >> you don't want to support new features on such files, but then surely we
> >> can at least revert back to the old behaviour?
> >
> > This is clearly a bug in LLVM7, which didn't produce correct ELF
> > symbols, do we agree on that? libbpf used to handle such invalid ELF
> > files *by accident* until it changed its internal logic to be more
> > strict in v0.2. It became more strict and doesn't work with such
> > invalid ELF files anymore. Does it need to add extra quirks to support
> > such broken ELF? I don't think so.
>
> I don't know enough about the intricacies of the ELF format to say, but
> I believe you when you say it's a bug. However, that doesn't change the
> fact that from a user's PoV, something that was working before is now
> broken, with the only change being a newer libbpf.
>
> This is not a theoretical concern, BTW, I discovered this due to
> feedback from a partner that we've been pushing to adopt libbpf. When
> they finally tried it out, the first thing they noticed is that their
> programs wouldn't load due to this issue.
>
> Sure, I can tell them to just upgrade their toolchain (and I will), but
> that still means we're back to "in order to use this library, you should
> expect to keep chasing the latest version of the entire toolchain". And

Migrating from LLVM7 to something like LLVM10 or LLVM11 (not asking
for not-yet-released LLVM12) hardly qualifies as "chasing the latest
version". LLVM8 or LLVM9 might work for their simple use case either,
I haven't checked. Just please don't use the extremely outdated
toolchain that is (now) known to be broken. That's all I'm asking.

> this is a much harder sell than "this is a stable library and upstream
> takes backwards compatibility very serious", which I *thought* was the
> expectation.

That's still true and I'd rather not go over the same discussion
again. But libbpf is also not a dumpster of work-arounds for all
possible bugs in the kernel and compiler. Libbpf does a lot of that
for backwards compatibility reasons, no need to deal with quirks of
buggy and very outdated compilers (and kernels, if there are obvious
bugs like this).

>
> > Surely, users that can't upgrade LLVM7 to something less ancient, can
> > stick to libbpf v0.1, that was lenient enough to accept such invalid
> > ELF files. libbpf v0.2 was released more than a month ago, and so far
> > you are the only one who noticed this "regression". So hopefully it's
> > not super annoying to people and they would be accommodating enough to
> > use more up to date compiler (and save themselves lots of trouble
> > along the way).
>
> Oh, boy, do I envy your adoption rate for new versions! In my world I
> would expect that by one month a few people who are very early adopters
> have started noticing and maybe thinking about testing the new version :)

In practice with the new libbpf releases I've been getting reports
about something broken within a few days. So yeah, I'm a lucky guy, I
suppose.

>
> >> > Speaking of legacy. Toke, can you please update all the samples in
> >> > your xdp-tools repo to not use arbitrary sections names. I see
> >> > SEC("prog"), where it should really be SEC("xdp"). It sets a bad
> >> > example for newcomers, IMO.
> >>
> >> I used "prog" because that's what iproute2 looks for if you don't supply
> >
> > Ok.
>
> Fixed now, BTW:
> https://github.com/xdp-project/xdp-tools/commit/83ab8aa1c29408aac842bebe704aa47ec5dc5bc3

Thanks!

>
> >> a section name, so it makes it convenient to load programs with 'ip'
> >> without supplying the section name. However, I do realise this is not
> >> the best of reasons, and I am not opposed to changing it. However...
> >>
> >> > I'm also going to emit warnings in libbpf soon for section names that
> >> > don't follow proper libbpf naming pattern, so it would be good if you
> >> > could get ahead of the curve.
> >>
> >> ...this sounds like just another way to annoy users by breaking things
> >> that were working before? :/
> >
> > It won't break, libbpf will emit a warning about the need to use
> > proper section name format, which will start to be enforced only with
> > major version bump. So that will give users plenty of time to make
> > sure their BPF programs are compatible with stricter libbpf.
>
> Well see above re: different expectations for "plenty of time". But OK,
> maybe this isn't as bad as I figured at first glance :)

So far libbpf releases were timed to Linux releases, so roughly one
every 2 months. libbpf 1.0 is unlikely to come sooner than 2 releases
out. So it's not like 2 weeks notice, right? Waiting for a year or
more seems excessive as well.

>
> -Toke
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Latest libbpf fails to load programs compiled with old LLVM
  2020-12-08 18:39               ` Andrii Nakryiko
@ 2020-12-08 22:38                 ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 20+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-12-08 22:38 UTC (permalink / raw)
  To: Andrii Nakryiko; +Cc: Yonghong Song, bpf

Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:

> On Tue, Dec 8, 2020 at 5:41 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>>
>> > On Mon, Dec 7, 2020 at 3:00 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>> >>
>> >> Andrii Nakryiko <andrii.nakryiko@gmail.com> writes:
>> >>
>> >> > On Fri, Dec 4, 2020 at 9:55 AM Yonghong Song <yhs@fb.com> wrote:
>> >> >>
>> >> >>
>> >> >>
>> >> >> On 12/4/20 1:34 AM, Toke Høiland-Jørgensen wrote:
>> >> >> > Yonghong Song <yhs@fb.com> writes:
>> >> >> >
>> >> >> >> On 12/3/20 9:55 AM, Toke Høiland-Jørgensen wrote:
>> >> >> >>> Hi Andrii
>> >> >> >>>
>> >> >> >>> I noticed that recent libbpf versions fail to load BPF files compiled
>> >> >> >>> with old versions of LLVM. E.g., if I compile xdp-tools with LLVM 7 I
>> >> >> >>> get:
>> >> >> >>>
>> >> >> >>> $ sudo ./xdp-loader load testns ../lib/testing/xdp_drop.o -vv
>> >> >> >>> Loading 1 files on interface 'testns'.
>> >> >> >>> libbpf: loading ../lib/testing/xdp_drop.o
>> >> >> >>> libbpf: elf: section(3) prog, size 16, link 0, flags 6, type=1
>> >> >> >>> libbpf: sec 'prog': failed to find program symbol at offset 0
>> >> >> >>> Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>> >> >> >>>
>> >> >> >>> The 'failed to find program symbol' error seems to have been introduced
>> >> >> >>> with commit c112239272c6 ("libbpf: Parse multi-function sections into
>> >> >> >>> multiple BPF programs").
>> >> >> >>>
>> >> >> >>> Looking at the object file in question, indeed it seems to not have any
>> >> >> >>> function symbols defined:
>> >> >> >>>
>> >> >> >>> $  llvm-objdump --syms ../lib/testing/xdp_drop.o
>> >> >> >>>
>> >> >> >>> ../lib/testing/xdp_drop.o:  file format elf64-bpf
>> >> >> >>>
>> >> >> >>> SYMBOL TABLE:
>> >> >> >>> 0000000000000000 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000037 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000042 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000068 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000071 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000076 l       .debug_str 0000000000000000
>> >> >> >>> 000000000000008a l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000097 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000a3 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000ac l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000b5 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000bc l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000c9 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000d4 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000dd l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000e1 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000e5 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000ea l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000f0 l       .debug_str 0000000000000000
>> >> >> >>> 00000000000000f9 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000103 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000113 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000122 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000131 l       .debug_str 0000000000000000
>> >> >> >>> 0000000000000000 l    d  prog       0000000000000000 prog
>> >> >> >>> 0000000000000000 l    d  .debug_abbrev      0000000000000000 .debug_abbrev
>> >> >> >>> 0000000000000000 l    d  .debug_info        0000000000000000 .debug_info
>> >> >> >>> 0000000000000000 l    d  .debug_frame       0000000000000000 .debug_frame
>> >> >> >>> 0000000000000000 l    d  .debug_line        0000000000000000 .debug_line
>> >> >> >>> 0000000000000000 g       license    0000000000000000 _license
>> >> >> >>> 0000000000000000 g       prog       0000000000000000 xdp_drop
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> I assume this is because old LLVM versions simply don't emit that symbol
>> >> >> >>> information?
>> >> >>
>> >> >> Thanks for the below instruction and xdp_drop.c file. I can reproduce
>> >> >> the issue now.
>> >> >>
>> >> >> I added another function 'xdp_drop1' in the same thing. Below is the
>> >> >> symbol table with llvm7 vs. llvm12.
>> >> >>
>> >> >> -bash-4.4$ llvm-readelf -symbols xdp-7.o | grep xdp_drop
>> >> >>      32: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop
>> >> >>      33: 0000000000000010     0 NOTYPE  GLOBAL DEFAULT     3 xdp_drop1
>> >> >>
>> >> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
>> >> >> 00  AX  0   0  8
>> >> >>
>> >> >> -bash-4.4$ llvm-readelf -symbols xdp-12.o | grep xdp_drop
>> >> >>      32: 0000000000000000    16 FUNC    GLOBAL DEFAULT     3 xdp_drop
>> >> >>      33: 0000000000000010    16 FUNC    GLOBAL DEFAULT     3 xdp_drop1
>> >> >> -bash-4.4$
>> >> >>
>> >> >>    [ 3] prog              PROGBITS        0000000000000000 000040 000020
>> >> >> 00  AX  0   0  8
>> >> >>
>> >> >>
>> >> >> Yes, llvm7 does not encode type and size for FUNC's. I guess libbpf can
>> >> >> change to recognize NOTYPE and use the symbol value (representing the
>> >> >> offset from the start of the section) and section size to
>> >> >> calculate the individual function size. This is more complicated than
>> >> >> elf file providing FUNC type and symbol size directly.
>> >> >
>> >> > I think we should just face the fact that LLVM7 is way too old to
>> >> > produce a sensible BPF ELF file layout. We can extend:
>> >> >
>> >> > libbpf: sec 'prog': failed to find program symbol at offset 0
>> >> > Couldn't open file '../lib/testing/xdp_drop.o': BPF object format invalid
>> >> >
>> >> > with a suggestion to upgrade Clang/LLVM to something more recent, if
>> >> > that would be helpful.
>> >> >
>> >> > But I don't want to add error-prone checks and assumptions in the
>> >> > already quite complicated logic. Even the kernel itself maintains that
>> >> > Clang 10+ needs to be used for its compilation. BPF CO-RE is also not
>> >> > working with older than Clang10, so lots of people have already
>> >> > upgraded way beyond that.
>> >>
>> >> Wait, what? This is a regression that *breaks people's programs* on
>> >> compiler versions that are still very much in the wild! I mean, fine if
>> >> you don't want to support new features on such files, but then surely we
>> >> can at least revert back to the old behaviour?
>> >
>> > This is clearly a bug in LLVM7, which didn't produce correct ELF
>> > symbols, do we agree on that? libbpf used to handle such invalid ELF
>> > files *by accident* until it changed its internal logic to be more
>> > strict in v0.2. It became more strict and doesn't work with such
>> > invalid ELF files anymore. Does it need to add extra quirks to support
>> > such broken ELF? I don't think so.
>>
>> I don't know enough about the intricacies of the ELF format to say, but
>> I believe you when you say it's a bug. However, that doesn't change the
>> fact that from a user's PoV, something that was working before is now
>> broken, with the only change being a newer libbpf.
>>
>> This is not a theoretical concern, BTW, I discovered this due to
>> feedback from a partner that we've been pushing to adopt libbpf. When
>> they finally tried it out, the first thing they noticed is that their
>> programs wouldn't load due to this issue.
>>
>> Sure, I can tell them to just upgrade their toolchain (and I will), but
>> that still means we're back to "in order to use this library, you should
>> expect to keep chasing the latest version of the entire toolchain". And
>
> Migrating from LLVM7 to something like LLVM10 or LLVM11 (not asking
> for not-yet-released LLVM12) hardly qualifies as "chasing the latest
> version". LLVM8 or LLVM9 might work for their simple use case either,
> I haven't checked. Just please don't use the extremely outdated
> toolchain that is (now) known to be broken. That's all I'm asking.

"Extremely outdated" is very much subjective: It's only two years old,
and still shipping as the current version in major Linux distributions.

But fine, "we won't promise not to break compilers older than a year" is
also a support statement, it's just not the one (I thought you were)
making before.

In any case, having a clearly stated expectation articulated somewhere
would be very helpful. Something concrete like "expect to run an LLVM
version no older than two stable releases"; just saying "it's stable"
seems to be a tad too subjective, as evidenced by this discussion (and
similar ones we've had before) :)

>> this is a much harder sell than "this is a stable library and upstream
>> takes backwards compatibility very serious", which I *thought* was the
>> expectation.
>
> That's still true and I'd rather not go over the same discussion
> again. But libbpf is also not a dumpster of work-arounds for all
> possible bugs in the kernel and compiler. Libbpf does a lot of that
> for backwards compatibility reasons, no need to deal with quirks of
> buggy and very outdated compilers (and kernels, if there are obvious
> bugs like this).

Not adding new workarounds for things that were broken from the start is
fine. But we're discussing a change that broke something that was
working before. Look at the kernel stability policy: If something was
working with an old kernel, we promise it will keep working on new ones,
regardless of whether what it was doing is technically outside of some
spec. *That's* a stability promise.

>> > Surely, users that can't upgrade LLVM7 to something less ancient, can
>> > stick to libbpf v0.1, that was lenient enough to accept such invalid
>> > ELF files. libbpf v0.2 was released more than a month ago, and so far
>> > you are the only one who noticed this "regression". So hopefully it's
>> > not super annoying to people and they would be accommodating enough to
>> > use more up to date compiler (and save themselves lots of trouble
>> > along the way).
>>
>> Oh, boy, do I envy your adoption rate for new versions! In my world I
>> would expect that by one month a few people who are very early adopters
>> have started noticing and maybe thinking about testing the new version :)
>
> In practice with the new libbpf releases I've been getting reports
> about something broken within a few days. So yeah, I'm a lucky guy, I
> suppose.

Don't mean to knock early testing, that's awesome. I'm just objecting to
the converse implication (i.e., "if it doesn't show up in early testing
it's not a real bug").

>> >> a section name, so it makes it convenient to load programs with 'ip'
>> >> without supplying the section name. However, I do realise this is not
>> >> the best of reasons, and I am not opposed to changing it. However...
>> >>
>> >> > I'm also going to emit warnings in libbpf soon for section names that
>> >> > don't follow proper libbpf naming pattern, so it would be good if you
>> >> > could get ahead of the curve.
>> >>
>> >> ...this sounds like just another way to annoy users by breaking things
>> >> that were working before? :/
>> >
>> > It won't break, libbpf will emit a warning about the need to use
>> > proper section name format, which will start to be enforced only with
>> > major version bump. So that will give users plenty of time to make
>> > sure their BPF programs are compatible with stricter libbpf.
>>
>> Well see above re: different expectations for "plenty of time". But OK,
>> maybe this isn't as bad as I figured at first glance :)
>
> So far libbpf releases were timed to Linux releases, so roughly one
> every 2 months. libbpf 1.0 is unlikely to come sooner than 2 releases
> out. So it's not like 2 weeks notice, right? Waiting for a year or
> more seems excessive as well.

Removing deprecated features on major release versions does seem
reasonable. But do keep in mind that for some projects, adding a warning
still constitutes "breaking the build" due to downstream policy.

-Toke


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-12-08 22:40 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-03 17:55 Latest libbpf fails to load programs compiled with old LLVM Toke Høiland-Jørgensen
2020-12-04  1:42 ` Yonghong Song
2020-12-04  9:34   ` Toke Høiland-Jørgensen
2020-12-04 17:54     ` Yonghong Song
2020-12-04 19:23       ` Andrii Nakryiko
2020-12-07 10:59         ` Toke Høiland-Jørgensen
2020-12-07 15:55           ` Alexei Starovoitov
2020-12-07 16:15             ` Toke Høiland-Jørgensen
2020-12-07 16:20               ` Alexei Starovoitov
2020-12-07 16:51                 ` Toke Høiland-Jørgensen
2020-12-07 17:16                   ` Alexei Starovoitov
2020-12-07 22:18                     ` Toke Høiland-Jørgensen
2020-12-08  1:20                       ` Alexei Starovoitov
2020-12-07 18:02                 ` David Ahern
2020-12-07 18:14                   ` Alexei Starovoitov
2020-12-07 18:59                     ` David Ahern
2020-12-08  2:47           ` Andrii Nakryiko
2020-12-08 13:41             ` Toke Høiland-Jørgensen
2020-12-08 18:39               ` Andrii Nakryiko
2020-12-08 22:38                 ` Toke Høiland-Jørgensen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.