* Help with the BPF verifier
@ 2018-11-01 18:52 Arnaldo Carvalho de Melo
2018-11-01 19:10 ` David Miller
2018-11-01 20:05 ` Edward Cree
0 siblings, 2 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-01 18:52 UTC (permalink / raw)
To: Yonghong Song
Cc: Daniel Borkmann, Jiri Olsa, Martin Lau, Alexei Starovoitov,
Linux Networking Development Mailing List
tl;dr: I seem to be trying to get past clang optimizations that get the
verifier to accept my proggie.
Hi,
So I'm moving to use raw_syscalls:sys_exit to collect pointer
contents, using maps to tell the bpf program what to copy, how many
bytes, filters, etc.
I'm at the start of it at this point I need to use an index to
get to the right syscall arg that is a filename, starting just with
"open" and "openat", that have the filename in different args, so to get
this first part working I'm doing it directly in the bpf restricted C
program, later this will be to maps, etc, so if I set the index as a
constant, just for testing, it works, look at the "open" and "openat"
calls below, later we'll see why openat is failing to augment its
"filename" arg while "open" works:
[root@seventh perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
? ( ): sleep/10152 ... [continued]: execve()) = 0
0.045 ( 0.004 ms): sleep/10152 brk() = 0x55ccff356000
0.074 ( 0.007 ms): sleep/10152 access(filename: , mode: R) = -1 ENOENT No such file or directory
0.089 ( 0.006 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
0.097 ( 0.003 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd283f0) = 0
0.103 ( 0.006 ms): sleep/10152 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ffee9c000
0.111 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
0.135 ( 0.007 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
0.144 ( 0.003 ms): sleep/10152 read(fd: 3, buf: 0x7ffecdd285b8, count: 832) = 832
0.150 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd28450) = 0
0.155 ( 0.005 ms): sleep/10152 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f8ffee9a000
0.166 ( 0.007 ms): sleep/10152 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7f8ffe8dc000
0.175 ( 0.010 ms): sleep/10152 mprotect(start: 0x7f8ffea89000, len: 2093056) = 0
0.188 ( 0.010 ms): sleep/10152 mmap(addr: 0x7f8ffec88000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7f8ffec88000
0.204 ( 0.005 ms): sleep/10152 mmap(addr: 0x7f8ffec8e000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f8ffec8e000
0.218 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
0.239 ( 0.002 ms): sleep/10152 arch_prctl(option: 4098, arg2: 140256433779968) = 0
0.312 ( 0.009 ms): sleep/10152 mprotect(start: 0x7f8ffec88000, len: 16384, prot: READ) = 0
0.343 ( 0.005 ms): sleep/10152 mprotect(start: 0x55ccff1c6000, len: 4096, prot: READ) = 0
0.354 ( 0.006 ms): sleep/10152 mprotect(start: 0x7f8ffeeb6000, len: 4096, prot: READ) = 0
0.362 ( 0.019 ms): sleep/10152 munmap(addr: 0x7f8ffee9c000, len: 103334) = 0
0.476 ( 0.002 ms): sleep/10152 brk() = 0x55ccff356000
0.480 ( 0.004 ms): sleep/10152 brk(brk: 0x55ccff377000) = 0x55ccff377000
0.487 ( 0.002 ms): sleep/10152 brk() = 0x55ccff377000
0.497 ( 0.008 ms): sleep/10152 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
0.507 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7f8ffec8daa0) = 0
0.511 ( 0.006 ms): sleep/10152 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ff7d0d000
0.524 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
0.574 (1000.140 ms): sleep/10152 nanosleep(rqtp: 0x7ffecdd29130) = 0
1000.753 ( 0.007 ms): sleep/10152 close(fd: 1) = 0
1000.767 ( 0.004 ms): sleep/10152 close(fd: 2) = 0
1000.781 ( ): sleep/10152 exit_group()
[root@seventh perf]#
1 // SPDX-License-Identifier: GPL-2.0
2 /*
3 * Augment the raw_syscalls tracepoints with the contents of the pointer arguments.
4 *
5 * Test it with:
6 *
7 * perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c cat /etc/passwd > /dev/null
8 *
9 * This exactly matches what is marshalled into the raw_syscall:sys_enter
10 * payload expected by the 'perf trace' beautifiers.
11 *
12 * For now it just uses the existing tracepoint augmentation code in 'perf
13 * trace', in the next csets we'll hook up these with the sys_enter/sys_exit
14 * code that will combine entry/exit in a strace like way.
15 */
16 #include <stdio.h>
17 #include <linux/socket.h>
18 /* bpf-output associated map */
19 struct bpf_map SEC("maps") __augmented_syscalls__ = {
20 .type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
21 .key_size = sizeof(int),
22 .value_size = sizeof(u32),
23 .max_entries = __NR_CPUS__,
24 };
25 struct syscall_enter_args {
26 unsigned long long common_tp_fields;
27 long syscall_nr;
28 unsigned long args[6];
29 };
30 struct syscall_exit_args {
31 unsigned long long common_tp_fields;
32 long syscall_nr;
33 long ret;
34 };
35 struct augmented_filename {
36 unsigned int size;
37 int reserved;
38 char value[256];
39 };
40 #define SYS_OPEN 2
41 #define SYS_OPENAT 257
42 SEC("raw_syscalls:sys_enter")
43 int sys_enter(struct syscall_enter_args *args)
44 {
45 struct {
46 struct syscall_enter_args args;
47 struct augmented_filename filename;
48 } augmented_args;
49 unsigned int len = sizeof(augmented_args);
50 unsigned int filename_arg = 6;
51 probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
52 switch (augmented_args.args.syscall_nr) {
53 case SYS_OPEN: filename_arg = 0; break;
54 case SYS_OPENAT: filename_arg = 1; break;
55 }
56 if (filename_arg <= 5) {
57 augmented_args.filename.reserved = 0;
58 augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
59 sizeof(augmented_args.filename.value),
60 (const void *)args->args[0]);
61 if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
62 len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
63 len &= sizeof(augmented_args.filename.value) - 1;
64 }
65 } else {
66 len = sizeof(augmented_args.args);
67 }
68 perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
69 return 0;
70 }
71 SEC("raw_syscalls:sys_exit")
72 int sys_exit(struct syscall_exit_args *args)
73 {
74 return 1; /* 0 as soon as we start copying data returned by the kernel, e.g. 'read' */
75 }
76 license(GPL);
In line #60 if I change that to 1, then "openat" works and "open"
doesn't, so what I wanted was to use filename_arg there as the index,
now it comes from that switch, but really it'll come from userspace,
that knows the syscall tables for each arch, etc.
But if I do that, i.e. apply this patch to that program:
--- /wb/augmented_raw_syscalls.c.old 2018-11-01 15:43:55.000394234 -0300
+++ /wb/augmented_raw_syscalls.c 2018-11-01 15:44:15.102367838 -0300
@@ -67,7 +67,7 @@
augmented_args.filename.reserved = 0;
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
sizeof(augmented_args.filename.value),
- (const void *)args->args[0]);
+ (const void *)args->args[filename_arg]);
if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
len &= sizeof(augmented_args.filename.value) - 1;
Then I end up with the verifier complying, I tried various ways to get
around the compiler about filename_arg being safe to use as an index,
but I couldn't find the right trick, ideas?
This is what I end up with when I apply that patch:
[root@seventh perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
\___ Kernel verifier blocks program loading
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
[root@seventh perf]#
Using -v, as suggested, I get:
[root@seventh perf]# trace -v -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O2 -o -
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 376, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=39
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 39
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (b7) r2 = 1
8: (79) r3 = *(u64 *)(r10 -320)
9: (15) if r3 == 0x101 goto pc+1
R0=inv(id=0) R2=inv1 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (b7) r2 = 6
11: (b7) r1 = 0
12: (15) if r3 == 0x2 goto pc+1
R0=inv(id=0) R1=inv0 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
13: (bf) r1 = r2
14: (25) if r1 > 0x5 goto pc+21
R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
15: (b7) r2 = 0
16: (63) *(u32 *)(r10 -260) = r2
17: (67) r1 <<= 32
18: (77) r1 >>= 32
19: (67) r1 <<= 3
20: (bf) r2 = r6
21: (0f) r2 += r1
22: (79) r3 = *(u64 *)(r2 +16)
R2 invalid mem access 'inv'
libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
\___ Kernel verifier blocks program loading
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
[root@seventh perf]#
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-01 18:52 Help with the BPF verifier Arnaldo Carvalho de Melo
@ 2018-11-01 19:10 ` David Miller
2018-11-01 19:13 ` Arnaldo Carvalho de Melo
2018-11-01 20:05 ` Edward Cree
1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2018-11-01 19:10 UTC (permalink / raw)
To: acme; +Cc: yhs, daniel, jolsa, kafai, alexei.starovoitov, netdev
From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: Thu, 1 Nov 2018 15:52:17 -0300
> 50 unsigned int filename_arg = 6;
...
> --- /wb/augmented_raw_syscalls.c.old 2018-11-01 15:43:55.000394234 -0300
> +++ /wb/augmented_raw_syscalls.c 2018-11-01 15:44:15.102367838 -0300
> @@ -67,7 +67,7 @@
> augmented_args.filename.reserved = 0;
> augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> sizeof(augmented_args.filename.value),
> - (const void *)args->args[0]);
> + (const void *)args->args[filename_arg]);
args[] is sized to '6', therefore the last valid index is '5', yet you're using '6' here which
is one entry past the end of the declared array.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-01 19:10 ` David Miller
@ 2018-11-01 19:13 ` Arnaldo Carvalho de Melo
2018-11-01 19:28 ` David Miller
0 siblings, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-01 19:13 UTC (permalink / raw)
To: David Miller; +Cc: yhs, daniel, jolsa, kafai, alexei.starovoitov, netdev
Em Thu, Nov 01, 2018 at 12:10:39PM -0700, David Miller escreveu:
> From: Arnaldo Carvalho de Melo <acme@kernel.org>
> Date: Thu, 1 Nov 2018 15:52:17 -0300
>
> > 50 unsigned int filename_arg = 6;
> ...
> > --- /wb/augmented_raw_syscalls.c.old 2018-11-01 15:43:55.000394234 -0300
> > +++ /wb/augmented_raw_syscalls.c 2018-11-01 15:44:15.102367838 -0300
> > @@ -67,7 +67,7 @@
> > augmented_args.filename.reserved = 0;
> > augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> > sizeof(augmented_args.filename.value),
> > - (const void *)args->args[0]);
> > + (const void *)args->args[filename_arg]);
>
> args[] is sized to '6', therefore the last valid index is '5', yet you're using '6' here which
> is one entry past the end of the declared array.
Nope... this is inside an if:
if (filename_arg <= 5) {
augmented_args.filename.reserved = 0;
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
sizeof(augmented_args.filename.value),
(const void *)args->args[filename_arg]);
if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
len &= sizeof(augmented_args.filename.value) - 1;
}
} else {
I use 6 to mean "hey, this syscall doesn't have any string argument, don't
bother with it".
- Arnaldo
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-01 19:13 ` Arnaldo Carvalho de Melo
@ 2018-11-01 19:28 ` David Miller
0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2018-11-01 19:28 UTC (permalink / raw)
To: acme; +Cc: yhs, daniel, jolsa, kafai, alexei.starovoitov, netdev
From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: Thu, 1 Nov 2018 16:13:10 -0300
> Nope... this is inside an if:
>
> if (filename_arg <= 5) {
> augmented_args.filename.reserved = 0;
> augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> sizeof(augmented_args.filename.value),
> (const void *)args->args[filename_arg]);
> if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
> len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
> len &= sizeof(augmented_args.filename.value) - 1;
> }
> } else {
>
> I use 6 to mean "hey, this syscall doesn't have any string argument, don't
> bother with it".
Really weird. And it's unsigned so I can't imagine it wants you to
check that it's >= 0...
Maybe Deniel or someone else can figure it out.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-01 18:52 Help with the BPF verifier Arnaldo Carvalho de Melo
2018-11-01 19:10 ` David Miller
@ 2018-11-01 20:05 ` Edward Cree
2018-11-02 15:02 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 11+ messages in thread
From: Edward Cree @ 2018-11-01 20:05 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Yonghong Song
Cc: Daniel Borkmann, Jiri Olsa, Martin Lau, Alexei Starovoitov,
Linux Networking Development Mailing List
On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 15: (b7) r2 = 0
> 16: (63) *(u32 *)(r10 -260) = r2
> 17: (67) r1 <<= 32
> 18: (77) r1 >>= 32
> 19: (67) r1 <<= 3
> 20: (bf) r2 = r6
> 21: (0f) r2 += r1
> 22: (79) r3 = *(u64 *)(r2 +16)
> R2 invalid mem access 'inv'
I wonder if you could run this with verifier log level 2? (I'm not sure how
you would go about plumbing that through the perf tooling.) It seems very
odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
during the shifts or whether the addition in insn 21 somehow produces the
unknown-ness. (I know we used to have a thing[1] where doing ptr += K and
then also having an offset in the LDX produced an error about
ptr+const+const, but that seems to have been fixed at some point.)
Note however that even if we get past this, R1 at this point holds 6, so it
looks like the verifier is walking the impossible path where we're inside the
'if' even though filename_arg = 6. This is a (slightly annoying) verifier
limitation, that it walks paths with impossible combinations of constraints
(we've previously had cases where assertions in the verifier would blow up
because of this, e.g. registers with max_val less than min_val). So if the
check_ctx_access() is going to worry about whether you're off the end of the
array (I'm not sure what your program type is and thus which is_valid_access
callback is involved), then it'll complain about this.
If filename_arg came from some external source you'd have a different
problem, because then it would have a totally unknown value, that on entering
the 'if' becomes "unknown but < 6", which is still too variable to have as
the offset of a ctx access. Those have to be at a known constant offset, so
that we can determine the type of the returned value.
As a way to fix this, how about [UNTESTED!]:
const void *filename_arg = NULL;
/* ... */
switch (augmented_args.args.syscall_nr) {
case SYS_OPEN: filename_arg = args->args[0]; break;
case SYS_OPENAT: filename_arg = args->args[1]; break;
}
/* ... */
if (filename_arg) {
/* stuff */
blah = probe_read_str(/* ... */, filename_arg);
} else {
/* the other stuff */
}
That way, you're only ever dealing in constant pointers (although judging by
an old thread I found[1] about ptr+const+const, the compiler might decide to
make some optimisations that end up looking like your existing code).
As for what you want to do with the index coming from userspace, the verifier
will not like that at all, as mentioned above, so I think you'll need to do
something like:
switch (filename_arg_from_userspace) {
case 0: filename_arg = args->args[0]; break;
case 1: filename_arg = args->args[1]; break;
/* etc */
default: filename_arg = NULL;
}
thus ensuring that you only ever have ctx pointers with constant offsets.
-Ed
[1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-01 20:05 ` Edward Cree
@ 2018-11-02 15:02 ` Arnaldo Carvalho de Melo
2018-11-02 15:42 ` Edward Cree
0 siblings, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-02 15:02 UTC (permalink / raw)
To: Edward Cree
Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
Alexei Starovoitov, Linux Networking Development Mailing List
Em Thu, Nov 01, 2018 at 08:05:07PM +0000, Edward Cree escreveu:
> On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> > R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 15: (b7) r2 = 0
> > 16: (63) *(u32 *)(r10 -260) = r2
> > 17: (67) r1 <<= 32
> > 18: (77) r1 >>= 32
> > 19: (67) r1 <<= 3
> > 20: (bf) r2 = r6
> > 21: (0f) r2 += r1
> > 22: (79) r3 = *(u64 *)(r2 +16)
> > R2 invalid mem access 'inv'
> I wonder if you could run this with verifier log level 2? (I'm not sure how
> you would go about plumbing that through the perf tooling.) It seems very
> odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
> during the shifts or whether the addition in insn 21 somehow produces the
> unknown-ness. (I know we used to have a thing[1] where doing ptr += K and
> then also having an offset in the LDX produced an error about
> ptr+const+const, but that seems to have been fixed at some point.)
>
> Note however that even if we get past this, R1 at this point holds 6, so it
> looks like the verifier is walking the impossible path where we're inside the
> 'if' even though filename_arg = 6. This is a (slightly annoying) verifier
> limitation, that it walks paths with impossible combinations of constraints
> (we've previously had cases where assertions in the verifier would blow up
> because of this, e.g. registers with max_val less than min_val). So if the
> check_ctx_access() is going to worry about whether you're off the end of the
> array (I'm not sure what your program type is and thus which is_valid_access
> callback is involved), then it'll complain about this.
> If filename_arg came from some external source you'd have a different
> problem, because then it would have a totally unknown value, that on entering
> the 'if' becomes "unknown but < 6", which is still too variable to have as
> the offset of a ctx access. Those have to be at a known constant offset, so
> that we can determine the type of the returned value.
>
> As a way to fix this, how about [UNTESTED!]:
> const void *filename_arg = NULL;
> /* ... */
> switch (augmented_args.args.syscall_nr) {
> case SYS_OPEN: filename_arg = args->args[0]; break;
> case SYS_OPENAT: filename_arg = args->args[1]; break;
> }
> /* ... */
> if (filename_arg) {
> /* stuff */
> blah = probe_read_str(/* ... */, filename_arg);
> } else {
> /* the other stuff */
> }
> That way, you're only ever dealing in constant pointers (although judging by
> an old thread I found[1] about ptr+const+const, the compiler might decide to
> make some optimisations that end up looking like your existing code).
Yeah, didn't work as well:
SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
struct {
struct syscall_enter_args args;
struct augmented_filename filename;
} augmented_args;
unsigned int len = sizeof(augmented_args);
const void *filename_arg = NULL;
probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
switch (augmented_args.args.syscall_nr) {
case SYS_OPEN: filename_arg = (const void *)args->args[0]; break;
case SYS_OPENAT: filename_arg = (const void *)args->args[1]; break;
}
if (filename_arg != NULL) {
augmented_args.filename.reserved = 0;
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
sizeof(augmented_args.filename.value),
filename_arg);
if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
len &= sizeof(augmented_args.filename.value) - 1;
}
} else {
len = sizeof(augmented_args.args);
}
perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
return 0;
}
And the -vv in 'perf trace' didn't seem to map to further details in the
output of the verifier debug:
# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O2 -o -
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000006000-fffffe0000007000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000032000-fffffe0000033000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000005e000-fffffe000005f000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000008a000-fffffe000008b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed
libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
\___ Kernel verifier blocks program loading
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
[root@seventh perf]#
I'll check how to plumb that, but its a holiday down here in Brazil,
kids at home...
> As for what you want to do with the index coming from userspace, the verifier
> will not like that at all, as mentioned above, so I think you'll need to do
> something like:
> switch (filename_arg_from_userspace) {
> case 0: filename_arg = args->args[0]; break;
> case 1: filename_arg = args->args[1]; break;
> /* etc */
> default: filename_arg = NULL;
> }
> thus ensuring that you only ever have ctx pointers with constant offsets.
>
> -Ed
>
> [1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-02 15:02 ` Arnaldo Carvalho de Melo
@ 2018-11-02 15:42 ` Edward Cree
2018-11-02 21:27 ` Yonghong Song
2018-11-03 11:29 ` Arnaldo Carvalho de Melo
0 siblings, 2 replies; 11+ messages in thread
From: Edward Cree @ 2018-11-02 15:42 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
Alexei Starovoitov, Linux Networking Development Mailing List
On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> Yeah, didn't work as well:
> And the -vv in 'perf trace' didn't seem to map to further details in the
> output of the verifier debug:
Yeah for log_level 2 you probably need to make source-level changes to either
perf or libbpf (I think the latter). It's annoying that essentially no tools
plumb through an option for that, someone should fix them ;-)
> libbpf: -- BEGIN DUMP LOG ---
> libbpf:
> 0: (bf) r6 = r1
> 1: (bf) r1 = r10
> 2: (07) r1 += -328
> 3: (b7) r7 = 64
> 4: (b7) r2 = 64
> 5: (bf) r3 = r6
> 6: (85) call bpf_probe_read#4
> 7: (79) r1 = *(u64 *)(r10 -320)
> 8: (15) if r1 == 0x101 goto pc+4
> R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 9: (55) if r1 != 0x2 goto pc+22
> R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 10: (bf) r1 = r6
> 11: (07) r1 += 16
> 12: (05) goto pc+2
> 15: (79) r3 = *(u64 *)(r1 +0)
> dereference of modified ctx ptr R1 off=16 disallowed
Aha, we at least got a different error message this time.
And indeed llvm has done that optimisation, rather than the more obvious
11: r3 = *(u64 *)(r1 +16)
because it wants to have lots of reads share a single insn. You may be able
to defeat that optimisation by adding compiler barriers, idk. Maybe someone
with llvm knowledge can figure out how to stop it (ideally, llvm would know
when it's generating for bpf backend and not do that). -O0? ¯\_(ツ)_/¯
Alternatively, your prog looks short enough that maybe you could kick the C
habit and write it directly in eBPF asm, that way no-one is optimising things
behind your back. (I realise this option won't appeal to everyone ;-)
The reason the verifier disallows this, iirc, is because it needs to be able
to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
underlying kernel struct doesn't match the layout of the ctx ABI. To do this
it needs the ctx offset to live entirely in the insn doing the access,
otherwise different paths could lead to the same insn accessing different ctx
offsets with different fixups needed — can't be done.
-Ed
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-02 15:42 ` Edward Cree
@ 2018-11-02 21:27 ` Yonghong Song
2018-11-05 12:33 ` Arnaldo Carvalho de Melo
2018-11-03 11:29 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 11+ messages in thread
From: Yonghong Song @ 2018-11-02 21:27 UTC (permalink / raw)
To: Edward Cree, Arnaldo Carvalho de Melo
Cc: Daniel Borkmann, Jiri Olsa, Martin Lau, Alexei Starovoitov,
Linux Networking Development Mailing List
On 11/2/18 8:42 AM, Edward Cree wrote:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
>> Yeah, didn't work as well:
>
>> And the -vv in 'perf trace' didn't seem to map to further details in the
>> output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
> perf or libbpf (I think the latter). It's annoying that essentially no tools
> plumb through an option for that, someone should fix them ;-)
>
>> libbpf: -- BEGIN DUMP LOG ---
>> libbpf:
>> 0: (bf) r6 = r1
>> 1: (bf) r1 = r10
>> 2: (07) r1 += -328
>> 3: (b7) r7 = 64
>> 4: (b7) r2 = 64
>> 5: (bf) r3 = r6
>> 6: (85) call bpf_probe_read#4
>> 7: (79) r1 = *(u64 *)(r10 -320)
>> 8: (15) if r1 == 0x101 goto pc+4
>> R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 9: (55) if r1 != 0x2 goto pc+22
>> R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 10: (bf) r1 = r6
>> 11: (07) r1 += 16
>> 12: (05) goto pc+2
>> 15: (79) r3 = *(u64 *)(r1 +0)
>> dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 11: r3 = *(u64 *)(r1 +16)
> because it wants to have lots of reads share a single insn. You may be able
> to defeat that optimisation by adding compiler barriers, idk. Maybe someone
> with llvm knowledge can figure out how to stop it (ideally, llvm would know
> when it's generating for bpf backend and not do that). -O0? ¯\_(ツ)_/¯
The optimization mostly likes below:
br1:
...
r1 += 16
goto merge
br2:
...
r1 += 20
goto merge
merge:
*(u64 *)(r1 + 0)
The compiler tries to merge common loads. There is no easy way to
stop this compiler optimization without turning off a lot of other
optimizations. The easiest way is to add barriers
__asm__ __volatile__("": : :"memory")
after the ctx memory access to prevent their down stream merging.
> Alternatively, your prog looks short enough that maybe you could kick the C
> habit and write it directly in eBPF asm, that way no-one is optimising things
> behind your back. (I realise this option won't appeal to everyone ;-)
The LLVM supports BPF inline assembly as well. Some examples here
https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/BPF/inline_asm.ll
You may try it for selective ctx access to work around some
compiler optimizations. I personally have not used it yet and actually
not sure whether it actually works or not :-)
> The reason the verifier disallows this, iirc, is because it needs to be able
> to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case > underlying kernel struct doesn't match the layout of the ctx ABI.
To do this
> it needs the ctx offset to live entirely in the insn doing the access,
> otherwise different paths could lead to the same insn accessing different ctx
> offsets with different fixups needed — can't be done.
>
> -Ed
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-02 15:42 ` Edward Cree
2018-11-02 21:27 ` Yonghong Song
@ 2018-11-03 11:29 ` Arnaldo Carvalho de Melo
2018-11-03 11:32 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-03 11:29 UTC (permalink / raw)
To: Edward Cree
Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
Alexei Starovoitov, Linux Networking Development Mailing List
Em Fri, Nov 02, 2018 at 03:42:49PM +0000, Edward Cree escreveu:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> > Yeah, didn't work as well:
>
> > And the -vv in 'perf trace' didn't seem to map to further details in the
> > output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
> perf or libbpf (I think the latter). It's annoying that essentially no tools
> plumb through an option for that, someone should fix them ;-)
>
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf:
> > 0: (bf) r6 = r1
> > 1: (bf) r1 = r10
> > 2: (07) r1 += -328
> > 3: (b7) r7 = 64
> > 4: (b7) r2 = 64
> > 5: (bf) r3 = r6
> > 6: (85) call bpf_probe_read#4
> > 7: (79) r1 = *(u64 *)(r10 -320)
> > 8: (15) if r1 == 0x101 goto pc+4
> > R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 9: (55) if r1 != 0x2 goto pc+22
> > R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 10: (bf) r1 = r6
> > 11: (07) r1 += 16
> > 12: (05) goto pc+2
> > 15: (79) r3 = *(u64 *)(r1 +0)
> > dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 11: r3 = *(u64 *)(r1 +16)
> because it wants to have lots of reads share a single insn. You may be able
> to defeat that optimisation by adding compiler barriers, idk. Maybe someone
> with llvm knowledge can figure out how to stop it (ideally, llvm would know
> when it's generating for bpf backend and not do that). -O0? ¯\_(ツ)_/¯
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O2 -o -
So it is using -O2, lets try with -O0...
So I added this to my ~/.perfconfig, i.e. the default clang command line
template replacing -O2 with -O0.
[root@seventh perf]# cat ~/.perfconfig
[llvm]
clang-bpf-cmd-template = "$CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O0 -o - $LLVM_OPTIONS_PIPE"
# dump-obj = true
[root@seventh perf]#
And got an explosion:
# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O0 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O0 -o -
clang-7: /home/acme/git/llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp:74: virtual void {anonymous}::BPFAsmBackend::applyFixup(const llvm::MCAssembler&, const llvm::MCFixup&, const llvm::MCValue&, llvm::MutableArrayRef<char>, uint64_t, bool, const llvm::MCSubtargetInfo*) const: Assertion `Value == 0' failed.
Stack dump:
0. Program arguments: /usr/local/bin/clang-7 -cc1 -triple bpf -emit-obj -mrelax-all -disable-free -main-file-name augmented_raw_syscalls.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/acme/git/perf/-.gcno -nostdsysteminc -nobuiltininc -resource-dir /usr/local/lib/clang/8.0.0 -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -include /home/acme/git/linux/include/linux/kconfig.h -D __KERNEL__ -D __NR_CPUS__=4 -D LINUX_VERSION_CODE=0x41300 -I /home/acme/lib/perf/include/bpf -I /home/acme/git/linux/arch/x86/include -I ./arch/x86/include/generated -I /home/acme/git/linux/include -I ./include -I /home/acme/git/linux/arch/x86/include/uapi -I ./arch/x86/include/generated/uapi -I /home/acme/git/linux/include/uapi -I ./include/generated/uapi -O0 -Wno-unused-value -Wno-pointer-sign -fdebug-compilation-dir /home/acme/git/perf -ferror-limit 19 -fmessage-length 203 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -o - -x c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -faddrsig
1. <eof> parser at end of file
2. Code generation
#0 0x000000000408ff98 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/local/bin/clang-7+0x408ff98)
#1 0x000000000409002b (/usr/local/bin/clang-7+0x409002b)
#2 0x000000000408e059 llvm::sys::RunSignalHandlers() (/usr/local/bin/clang-7+0x408e059)
#3 0x000000000408fa5b (/usr/local/bin/clang-7+0x408fa5b)
#4 0x00007fcd33ab81c0 __restore_rt (/lib64/libpthread.so.0+0x121c0)
#5 0x00007fcd32817750 __GI_raise (/lib64/libc.so.6+0x34750)
#6 0x00007fcd32818d31 __GI_abort (/lib64/libc.so.6+0x35d31)
#7 0x00007fcd3281005a __assert_fail_base (/lib64/libc.so.6+0x2d05a)
#8 0x00007fcd328100d2 (/lib64/libc.so.6+0x2d0d2)
#9 0x0000000002602d13 (/usr/local/bin/clang-7+0x2602d13)
#10 0x0000000003bf6da7 llvm::MCAssembler::layout(llvm::MCAsmLayout&) (/usr/local/bin/clang-7+0x3bf6da7)
#11 0x0000000003bf6e2c llvm::MCAssembler::Finish() (/usr/local/bin/clang-7+0x3bf6e2c)
#12 0x0000000003c524e8 llvm::MCObjectStreamer::FinishImpl() (/usr/local/bin/clang-7+0x3c524e8)
#13 0x0000000003c2db4b llvm::MCELFStreamer::FinishImpl() (/usr/local/bin/clang-7+0x3c2db4b)
#14 0x0000000003c5ba43 llvm::MCStreamer::Finish() (/usr/local/bin/clang-7+0x3c5ba43)
#15 0x0000000004c281b9 llvm::AsmPrinter::doFinalization(llvm::Module&) (/usr/local/bin/clang-7+0x4c281b9)
#16 0x00000000038d637b llvm::FPPassManager::doFinalization(llvm::Module&) (/usr/local/bin/clang-7+0x38d637b)
#17 0x00000000038d683d (/usr/local/bin/clang-7+0x38d683d)
#18 0x00000000038d6d6b llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/local/bin/clang-7+0x38d6d6b)
#19 0x00000000038d6f9b llvm::legacy::PassManager::run(llvm::Module&) (/usr/local/bin/clang-7+0x38d6f9b)
#20 0x000000000436b091 (/usr/local/bin/clang-7+0x436b091)
#21 0x000000000436df07 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/usr/local/bin/clang-7+0x436df07)
#22 0x0000000004f6ee11 (/usr/local/bin/clang-7+0x4f6ee11)
#23 0x00000000062af3a3 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/bin/clang-7+0x62af3a3)
#24 0x0000000004a73873 clang::ASTFrontendAction::ExecuteAction() (/usr/local/bin/clang-7+0x4a73873)
#25 0x0000000004f6ced8 clang::CodeGenAction::ExecuteAction() (/usr/local/bin/clang-7+0x4f6ced8)
#26 0x0000000004a7329d clang::FrontendAction::Execute() (/usr/local/bin/clang-7+0x4a7329d)
#27 0x0000000004a04ccf clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/bin/clang-7+0x4a04ccf)
#28 0x0000000004bc512e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/bin/clang-7+0x4bc512e)
#29 0x0000000001d88a6c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/bin/clang-7+0x1d88a6c)
#30 0x0000000001d7e31e (/usr/local/bin/clang-7+0x1d7e31e)
#31 0x0000000001d7ef94 main (/usr/local/bin/clang-7+0x1d7ef94)
#32 0x00007fcd32803fea __libc_start_main (/lib64/libc.so.6+0x20fea)
#33 0x0000000001d7beea _start (/usr/local/bin/clang-7+0x1d7beea)
clang-7: error: unable to execute command: Aborted (core dumped)
clang-7: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 8.0.0 (http://llvm.org/git/clang.git 8587270a739ee30c926a76d5657e65e85b560f6e) (http://llvm.org/git/llvm.git 0566eefef9c3777bd780ec4cbb9efa764633b76c)
Target: bpf
Thread model: posix
InstalledDir: /usr/local/bin
clang-7: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-7: note: diagnostic msg:
********************
PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.c
clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.sh
clang-7: note: diagnostic msg:
********************
ERROR: unable to compile tools/perf/examples/bpf/augmented_raw_syscalls.c
Hint: Check error message shown above.
Hint: You can also pre-compile it into .o using:
clang -target bpf -O2 -c tools/perf/examples/bpf/augmented_raw_syscalls.c
with proper -I and -D options.
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
\___ Failed to load tools/perf/examples/bpf/augmented_raw_syscalls.c from source: Error when compiling BPF scriptlet
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
[root@seventh perf]#
Trying with -O1...
> Alternatively, your prog looks short enough that maybe you could kick the C
> habit and write it directly in eBPF asm, that way no-one is optimising things
> behind your back. (I realise this option won't appeal to everyone ;-)
Nah, if that is what it takes... Would be better with simple looking C
code tho :-)
> The reason the verifier disallows this, iirc, is because it needs to be able
> to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
> underlying kernel struct doesn't match the layout of the ctx ABI. To do this
> it needs the ctx offset to live entirely in the insn doing the access,
> otherwise different paths could lead to the same insn accessing different ctx
> offsets with different fixups needed — can't be done.
>
> -Ed
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-03 11:29 ` Arnaldo Carvalho de Melo
@ 2018-11-03 11:32 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-03 11:32 UTC (permalink / raw)
To: Edward Cree
Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
Alexei Starovoitov, Linux Networking Development Mailing List
Em Sat, Nov 03, 2018 at 08:29:34AM -0300, Arnaldo Carvalho de Melo escreveu:
> PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> Preprocessed source(s) and associated run script(s) are located at:
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.c
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.sh
> clang-7: note: diagnostic msg:
>
> ********************
> ERROR: unable to compile tools/perf/examples/bpf/augmented_raw_syscalls.c
> Hint: Check error message shown above.
> Hint: You can also pre-compile it into .o using:
> clang -target bpf -O2 -c tools/perf/examples/bpf/augmented_raw_syscalls.c
> with proper -I and -D options.
> event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
> \___ Failed to load tools/perf/examples/bpf/augmented_raw_syscalls.c from source: Error when compiling BPF scriptlet
>
> (add -v to see detail)
> Run 'perf list' for a list of valid events
>
> Usage: perf trace [<options>] [<command>]
> or: perf trace [<options>] -- <command> [<options>]
> or: perf trace record [<options>] [<command>]
> or: perf trace record [<options>] -- <command> [<options>]
>
> -e, --event <event> event/syscall selector. use 'perf list' to list available events
> [root@seventh perf]#
>
> Trying with -O1...
-O1 doesn't get clang confused, its just the verifier that doesn't like
the result, i.e. we're back to that optimization, that isn't disabled
with -O1
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300 -I/home/acme/lib/perf/include/bpf -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf -O1 -o -
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000006000-fffffe0000007000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000032000-fffffe0000033000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000005e000-fffffe000005f000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000008a000-fffffe000008b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed
libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
\___ Kernel verifier blocks program loading
(add -v to see detail)
Run 'perf list' for a list of valid events
Usage: perf trace [<options>] [<command>]
or: perf trace [<options>] -- <command> [<options>]
or: perf trace record [<options>] [<command>]
or: perf trace record [<options>] -- <command> [<options>]
-e, --event <event> event/syscall selector. use 'perf list' to list available events
[root@seventh perf]#
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Help with the BPF verifier
2018-11-02 21:27 ` Yonghong Song
@ 2018-11-05 12:33 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-05 12:33 UTC (permalink / raw)
To: Yonghong Song
Cc: Edward Cree, Daniel Borkmann, Jiri Olsa, Martin Lau,
Alexei Starovoitov, Linux Networking Development Mailing List
Em Fri, Nov 02, 2018 at 09:27:52PM +0000, Yonghong Song escreveu:
>
>
> On 11/2/18 8:42 AM, Edward Cree wrote:
> > On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> >> Yeah, didn't work as well:
> >
> >> And the -vv in 'perf trace' didn't seem to map to further details in the
> >> output of the verifier debug:
> > Yeah for log_level 2 you probably need to make source-level changes to either
> > perf or libbpf (I think the latter). It's annoying that essentially no tools
> > plumb through an option for that, someone should fix them ;-)
> >
> >> libbpf: -- BEGIN DUMP LOG ---
> >> libbpf:
> >> 0: (bf) r6 = r1
> >> 1: (bf) r1 = r10
> >> 2: (07) r1 += -328
> >> 3: (b7) r7 = 64
> >> 4: (b7) r2 = 64
> >> 5: (bf) r3 = r6
> >> 6: (85) call bpf_probe_read#4
> >> 7: (79) r1 = *(u64 *)(r10 -320)
> >> 8: (15) if r1 == 0x101 goto pc+4
> >> R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 9: (55) if r1 != 0x2 goto pc+22
> >> R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 10: (bf) r1 = r6
> >> 11: (07) r1 += 16
> >> 12: (05) goto pc+2
> >> 15: (79) r3 = *(u64 *)(r1 +0)
> >> dereference of modified ctx ptr R1 off=16 disallowed
> > Aha, we at least got a different error message this time.
> > And indeed llvm has done that optimisation, rather than the more obvious
> > 11: r3 = *(u64 *)(r1 +16)
> > because it wants to have lots of reads share a single insn. You may be able
> > to defeat that optimisation by adding compiler barriers, idk. Maybe someone
> > with llvm knowledge can figure out how to stop it (ideally, llvm would know
> > when it's generating for bpf backend and not do that). -O0? ¯\_(ツ)_/¯
>
> The optimization mostly likes below:
> br1:
> ...
> r1 += 16
> goto merge
> br2:
> ...
> r1 += 20
> goto merge
> merge:
> *(u64 *)(r1 + 0)
>
> The compiler tries to merge common loads. There is no easy way to
> stop this compiler optimization without turning off a lot of other
> optimizations. The easiest way is to add barriers
> __asm__ __volatile__("": : :"memory")
> after the ctx memory access to prevent their down stream merging.
Great, this made it work:
cat /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#include <stdio.h>
#include <linux/socket.h>
/* bpf-output associated map */
struct bpf_map SEC("maps") __augmented_syscalls__ = {
.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
.key_size = sizeof(int),
.value_size = sizeof(u32),
.max_entries = __NR_CPUS__,
};
struct syscall_enter_args {
unsigned long long common_tp_fields;
long syscall_nr;
unsigned long args[6];
};
struct syscall_exit_args {
unsigned long long common_tp_fields;
long syscall_nr;
long ret;
};
struct augmented_filename {
unsigned int size;
int reserved;
char value[256];
};
#define SYS_OPEN 2
#define SYS_OPENAT 257
SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
struct {
struct syscall_enter_args args;
struct augmented_filename filename;
} augmented_args;
unsigned int len = sizeof(augmented_args);
const void *filename_arg = NULL;
probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
switch (augmented_args.args.syscall_nr) {
case SYS_OPEN: filename_arg = (const void *)args->args[0];
__asm__ __volatile__("": : :"memory");
break;
case SYS_OPENAT: filename_arg = (const void *)args->args[1];
break;
default:
return 0;
}
if (filename_arg != NULL) {
augmented_args.filename.reserved = 0;
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
sizeof(augmented_args.filename.value),
filename_arg);
if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
len &= sizeof(augmented_args.filename.value) - 1;
}
} else {
len = sizeof(augmented_args.args);
}
perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
return 0;
}
SEC("raw_syscalls:sys_exit")
int sys_exit(struct syscall_exit_args *args)
{
struct syscall_exit_args augmented_args;
probe_read(&augmented_args, sizeof(augmented_args), args);
return augmented_args.syscall_nr == SYS_OPEN || augmented_args.syscall_nr == SYS_OPENAT;
}
license(GPL);
Now I have both open and openat getting the right
[root@jouet perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
LLVM: dumping tools/perf/examples/bpf/augmented_raw_syscalls.o
0.000 ( 0.010 ms): sleep/15794 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.028 ( 0.005 ms): sleep/15794 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.241 ( 0.005 ms): sleep/15794 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
[root@jouet perf]#
[root@jouet perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.o cat /etc/passwd > /dev/null
0.000 ( 0.010 ms): cat/15918 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
0.028 ( 0.006 ms): cat/15918 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
0.241 ( 0.007 ms): cat/15918 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
0.289 ( 0.004 ms): cat/15918 openat(dfd: CWD, filename: /etc/passwd) = 3
[root@jouet perf]#
[root@jouet perf]# readelf -SW tools/perf/examples/bpf/augmented_raw_syscalls.o
There are 11 section headers, starting at offset 0x438:
Section Headers:
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000376 0000bd 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000000 00 AX 0 0 4
[ 3] raw_syscalls:sys_enter PROGBITS 0000000000000000 000040 000138 00 AX 0 0 8
[ 4] .relraw_syscalls:sys_enter REL 0000000000000000 000360 000010 10 10 3 8
[ 5] raw_syscalls:sys_exit PROGBITS 0000000000000000 000178 000070 00 AX 0 0 8
[ 6] maps PROGBITS 0000000000000000 0001e8 000038 00 WA 0 0 4
[ 7] license PROGBITS 0000000000000000 000220 000004 00 WA 0 0 1
[ 8] version PROGBITS 0000000000000000 000224 000004 00 WA 0 0 4
[ 9] .llvm_addrsig LOOS+0xfff4c03 0000000000000000 000370 000006 00 E 10 0 1
[10] .symtab SYMTAB 0000000000000000 000228 000138 18 1 7 8
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
p (processor specific)
[root@jouet perf]#
> > Alternatively, your prog looks short enough that maybe you could kick the C
> > habit and write it directly in eBPF asm, that way no-one is optimising things
> > behind your back. (I realise this option won't appeal to everyone ;-)
>
> The LLVM supports BPF inline assembly as well. Some examples here
> https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/BPF/inline_asm.ll
> You may try it for selective ctx access to work around some
> compiler optimizations. I personally have not used it yet and actually
> not sure whether it actually works or not :-)
I will take a look at it, but the simple barrier right after the first ctx
access did the trick for me, probably I'll cook some ctx accessor macros to
make this transparent.
Now to make -e open,openat to set up filters via a map, get that
augmented_raw_syscalls.c to be added automatically to the evlist, etc.
- Arnaldo
> > The reason the verifier disallows this, iirc, is because it needs to be able
> > to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case > underlying kernel struct doesn't match the layout of the ctx ABI.
> To do this
> > it needs the ctx offset to live entirely in the insn doing the access,
> > otherwise different paths could lead to the same insn accessing different ctx
> > offsets with different fixups needed — can't be done.
> >
> > -Ed
> >
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-11-05 21:53 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-01 18:52 Help with the BPF verifier Arnaldo Carvalho de Melo
2018-11-01 19:10 ` David Miller
2018-11-01 19:13 ` Arnaldo Carvalho de Melo
2018-11-01 19:28 ` David Miller
2018-11-01 20:05 ` Edward Cree
2018-11-02 15:02 ` Arnaldo Carvalho de Melo
2018-11-02 15:42 ` Edward Cree
2018-11-02 21:27 ` Yonghong Song
2018-11-05 12:33 ` Arnaldo Carvalho de Melo
2018-11-03 11:29 ` Arnaldo Carvalho de Melo
2018-11-03 11:32 ` Arnaldo Carvalho de Melo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).