All of lore.kernel.org
 help / color / mirror / Atom feed
* Help with the BPF verifier
@ 2018-11-01 18:52 Arnaldo Carvalho de Melo
  2018-11-01 19:10 ` David Miller
  2018-11-01 20:05 ` Edward Cree
  0 siblings, 2 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-01 18:52 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Daniel Borkmann, Jiri Olsa, Martin Lau, Alexei Starovoitov,
	Linux Networking Development Mailing List

tl;dr: I seem to be trying to get past clang optimizations that get the
       verifier to accept my proggie.

Hi,

	So I'm moving to use raw_syscalls:sys_exit to collect pointer
contents, using maps to tell the bpf program what to copy, how many
bytes, filters, etc.

	I'm at the start of it at this point I need to use an index to
get to the right syscall arg that is a filename, starting just with
"open" and "openat", that have the filename in different args, so to get
this first part working I'm doing it directly in the bpf restricted C
program, later this will be to maps, etc, so if I set the index as a
constant, just for testing, it works, look at the "open" and "openat"
calls below, later we'll see why openat is failing to augment its
"filename" arg while "open" works:

[root@seventh perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
         ? (         ): sleep/10152  ... [continued]: execve()) = 0
     0.045 ( 0.004 ms): sleep/10152 brk() = 0x55ccff356000
     0.074 ( 0.007 ms): sleep/10152 access(filename: , mode: R) = -1 ENOENT No such file or directory
     0.089 ( 0.006 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
     0.097 ( 0.003 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd283f0) = 0
     0.103 ( 0.006 ms): sleep/10152 mmap(len: 103334, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ffee9c000
     0.111 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
     0.135 ( 0.007 ms): sleep/10152 openat(dfd: CWD, filename: , flags: CLOEXEC) = 3
     0.144 ( 0.003 ms): sleep/10152 read(fd: 3, buf: 0x7ffecdd285b8, count: 832) = 832
     0.150 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7ffecdd28450) = 0
     0.155 ( 0.005 ms): sleep/10152 mmap(len: 8192, prot: READ|WRITE, flags: PRIVATE|ANONYMOUS) = 0x7f8ffee9a000
     0.166 ( 0.007 ms): sleep/10152 mmap(len: 3889792, prot: EXEC|READ, flags: PRIVATE|DENYWRITE, fd: 3) = 0x7f8ffe8dc000
     0.175 ( 0.010 ms): sleep/10152 mprotect(start: 0x7f8ffea89000, len: 2093056) = 0
     0.188 ( 0.010 ms): sleep/10152 mmap(addr: 0x7f8ffec88000, len: 24576, prot: READ|WRITE, flags: PRIVATE|FIXED|DENYWRITE, fd: 3, off: 1753088) = 0x7f8ffec88000
     0.204 ( 0.005 ms): sleep/10152 mmap(addr: 0x7f8ffec8e000, len: 14976, prot: READ|WRITE, flags: PRIVATE|FIXED|ANONYMOUS) = 0x7f8ffec8e000
     0.218 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
     0.239 ( 0.002 ms): sleep/10152 arch_prctl(option: 4098, arg2: 140256433779968) = 0
     0.312 ( 0.009 ms): sleep/10152 mprotect(start: 0x7f8ffec88000, len: 16384, prot: READ) = 0
     0.343 ( 0.005 ms): sleep/10152 mprotect(start: 0x55ccff1c6000, len: 4096, prot: READ) = 0
     0.354 ( 0.006 ms): sleep/10152 mprotect(start: 0x7f8ffeeb6000, len: 4096, prot: READ) = 0
     0.362 ( 0.019 ms): sleep/10152 munmap(addr: 0x7f8ffee9c000, len: 103334) = 0
     0.476 ( 0.002 ms): sleep/10152 brk() = 0x55ccff356000
     0.480 ( 0.004 ms): sleep/10152 brk(brk: 0x55ccff377000) = 0x55ccff377000
     0.487 ( 0.002 ms): sleep/10152 brk() = 0x55ccff377000
     0.497 ( 0.008 ms): sleep/10152 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
     0.507 ( 0.002 ms): sleep/10152 fstat(fd: 3, statbuf: 0x7f8ffec8daa0) = 0
     0.511 ( 0.006 ms): sleep/10152 mmap(len: 113045344, prot: READ, flags: PRIVATE, fd: 3) = 0x7f8ff7d0d000
     0.524 ( 0.002 ms): sleep/10152 close(fd: 3) = 0
     0.574 (1000.140 ms): sleep/10152 nanosleep(rqtp: 0x7ffecdd29130) = 0
  1000.753 ( 0.007 ms): sleep/10152 close(fd: 1) = 0
  1000.767 ( 0.004 ms): sleep/10152 close(fd: 2) = 0
  1000.781 (         ): sleep/10152 exit_group()
[root@seventh perf]# 

     1	// SPDX-License-Identifier: GPL-2.0
     2	/*
     3	 * Augment the raw_syscalls tracepoints with the contents of the pointer arguments.
     4	 *
     5	 * Test it with:
     6	 *
     7	 * perf trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c cat /etc/passwd > /dev/null
     8	 *
     9	 * This exactly matches what is marshalled into the raw_syscall:sys_enter
    10	 * payload expected by the 'perf trace' beautifiers.
    11	 *
    12	 * For now it just uses the existing tracepoint augmentation code in 'perf
    13	 * trace', in the next csets we'll hook up these with the sys_enter/sys_exit
    14	 * code that will combine entry/exit in a strace like way.
    15	 */
       
    16	#include <stdio.h>
    17	#include <linux/socket.h>
       
    18	/* bpf-output associated map */
    19	struct bpf_map SEC("maps") __augmented_syscalls__ = {
    20		.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
    21		.key_size = sizeof(int),
    22		.value_size = sizeof(u32),
    23		.max_entries = __NR_CPUS__,
    24	};
       
    25	struct syscall_enter_args {
    26		unsigned long long common_tp_fields;
    27		long		   syscall_nr;
    28		unsigned long	   args[6];
    29	};
       
    30	struct syscall_exit_args {
    31		unsigned long long common_tp_fields;
    32		long		   syscall_nr;
    33		long		   ret;
    34	};
       
    35	struct augmented_filename {
    36		unsigned int	size;
    37		int		reserved;
    38		char		value[256];
    39	};
       
    40	#define SYS_OPEN 2
    41	#define SYS_OPENAT 257
       
    42	SEC("raw_syscalls:sys_enter")
    43	int sys_enter(struct syscall_enter_args *args)
    44	{
    45		struct {
    46			struct syscall_enter_args args;
    47			struct augmented_filename filename;
    48		} augmented_args;
    49		unsigned int len = sizeof(augmented_args);
    50		unsigned int filename_arg = 6;
       
    51		probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
       
    52		switch (augmented_args.args.syscall_nr) {
    53		case SYS_OPEN:	 filename_arg = 0; break;
    54		case SYS_OPENAT: filename_arg = 1; break;
    55		}
       
    56		if (filename_arg <= 5) {
    57			augmented_args.filename.reserved = 0;
    58			augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
    59								      sizeof(augmented_args.filename.value),
    60								      (const void *)args->args[0]);
    61			if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
    62				len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
    63				len &= sizeof(augmented_args.filename.value) - 1;
    64			}
    65		} else {
    66			len = sizeof(augmented_args.args);
    67		}
       
    68		perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
    69		return 0;
    70	}
       
    71	SEC("raw_syscalls:sys_exit")
    72	int sys_exit(struct syscall_exit_args *args)
    73	{
    74		return 1; /* 0 as soon as we start copying data returned by the kernel, e.g. 'read' */
    75	}
       
    76	license(GPL);

In line #60 if I change that to 1, then "openat" works and "open"
doesn't, so what I wanted was to use filename_arg there as the index,
now it comes from that switch, but really it'll come from userspace,
that knows the syscall tables for each arch, etc.

But if I do that, i.e. apply this patch to that program:

--- /wb/augmented_raw_syscalls.c.old	2018-11-01 15:43:55.000394234 -0300
+++ /wb/augmented_raw_syscalls.c	2018-11-01 15:44:15.102367838 -0300
@@ -67,7 +67,7 @@
 		augmented_args.filename.reserved = 0;
 		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
 							      sizeof(augmented_args.filename.value),
-							      (const void *)args->args[0]);
+							      (const void *)args->args[filename_arg]);
 		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
 			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
 			len &= sizeof(augmented_args.filename.value) - 1;

Then I end up with the verifier complying, I tried various ways to get
around the compiler about filename_arg being safe to use as an index,
but I couldn't find the right trick, ideas?

This is what I end up with when I apply that patch:

[root@seventh perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                     \___ Kernel verifier blocks program loading

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
[root@seventh perf]# 

Using -v, as suggested, I get:

[root@seventh perf]# trace -v -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf  -O2 -o - 
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 376, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=39
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 39
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (b7) r2 = 1
8: (79) r3 = *(u64 *)(r10 -320)
9: (15) if r3 == 0x101 goto pc+1
 R0=inv(id=0) R2=inv1 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (b7) r2 = 6
11: (b7) r1 = 0
12: (15) if r3 == 0x2 goto pc+1
 R0=inv(id=0) R1=inv0 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
13: (bf) r1 = r2
14: (25) if r1 > 0x5 goto pc+21
 R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
15: (b7) r2 = 0
16: (63) *(u32 *)(r10 -260) = r2
17: (67) r1 <<= 32
18: (77) r1 >>= 32
19: (67) r1 <<= 3
20: (bf) r2 = r6
21: (0f) r2 += r1
22: (79) r3 = *(u64 *)(r2 +16)
R2 invalid mem access 'inv'

libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                     \___ Kernel verifier blocks program loading

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
[root@seventh perf]# 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-01 18:52 Help with the BPF verifier Arnaldo Carvalho de Melo
@ 2018-11-01 19:10 ` David Miller
  2018-11-01 19:13   ` Arnaldo Carvalho de Melo
  2018-11-01 20:05 ` Edward Cree
  1 sibling, 1 reply; 11+ messages in thread
From: David Miller @ 2018-11-01 19:10 UTC (permalink / raw)
  To: acme; +Cc: yhs, daniel, jolsa, kafai, alexei.starovoitov, netdev

From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: Thu, 1 Nov 2018 15:52:17 -0300

>     50		unsigned int filename_arg = 6;
 ...
> --- /wb/augmented_raw_syscalls.c.old	2018-11-01 15:43:55.000394234 -0300
> +++ /wb/augmented_raw_syscalls.c	2018-11-01 15:44:15.102367838 -0300
> @@ -67,7 +67,7 @@
>  		augmented_args.filename.reserved = 0;
>  		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
>  							      sizeof(augmented_args.filename.value),
> -							      (const void *)args->args[0]);
> +							      (const void *)args->args[filename_arg]);

args[] is sized to '6', therefore the last valid index is '5', yet you're using '6' here which
is one entry past the end of the declared array.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-01 19:10 ` David Miller
@ 2018-11-01 19:13   ` Arnaldo Carvalho de Melo
  2018-11-01 19:28     ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-01 19:13 UTC (permalink / raw)
  To: David Miller; +Cc: yhs, daniel, jolsa, kafai, alexei.starovoitov, netdev

Em Thu, Nov 01, 2018 at 12:10:39PM -0700, David Miller escreveu:
> From: Arnaldo Carvalho de Melo <acme@kernel.org>
> Date: Thu, 1 Nov 2018 15:52:17 -0300
> 
> >     50		unsigned int filename_arg = 6;
>  ...
> > --- /wb/augmented_raw_syscalls.c.old	2018-11-01 15:43:55.000394234 -0300
> > +++ /wb/augmented_raw_syscalls.c	2018-11-01 15:44:15.102367838 -0300
> > @@ -67,7 +67,7 @@
> >  		augmented_args.filename.reserved = 0;
> >  		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
> >  							      sizeof(augmented_args.filename.value),
> > -							      (const void *)args->args[0]);
> > +							      (const void *)args->args[filename_arg]);
> 
> args[] is sized to '6', therefore the last valid index is '5', yet you're using '6' here which
> is one entry past the end of the declared array.

Nope... this is inside an if:

        if (filename_arg <= 5) {
                augmented_args.filename.reserved = 0;
                augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
                                                              sizeof(augmented_args.filename.value),
                                                              (const void *)args->args[filename_arg]);
                if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
                        len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
                        len &= sizeof(augmented_args.filename.value) - 1;
                }
        } else {

I use 6 to mean "hey, this syscall doesn't have any string argument, don't
bother with it".

- Arnaldo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-01 19:13   ` Arnaldo Carvalho de Melo
@ 2018-11-01 19:28     ` David Miller
  0 siblings, 0 replies; 11+ messages in thread
From: David Miller @ 2018-11-01 19:28 UTC (permalink / raw)
  To: acme; +Cc: yhs, daniel, jolsa, kafai, alexei.starovoitov, netdev

From: Arnaldo Carvalho de Melo <acme@kernel.org>
Date: Thu, 1 Nov 2018 16:13:10 -0300

> Nope... this is inside an if:
> 
>         if (filename_arg <= 5) {
>                 augmented_args.filename.reserved = 0;
>                 augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
>                                                               sizeof(augmented_args.filename.value),
>                                                               (const void *)args->args[filename_arg]);
>                 if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
>                         len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
>                         len &= sizeof(augmented_args.filename.value) - 1;
>                 }
>         } else {
> 
> I use 6 to mean "hey, this syscall doesn't have any string argument, don't
> bother with it".

Really weird.  And it's unsigned so I can't imagine it wants you to
check that it's >= 0...

Maybe Deniel or someone else can figure it out.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-01 18:52 Help with the BPF verifier Arnaldo Carvalho de Melo
  2018-11-01 19:10 ` David Miller
@ 2018-11-01 20:05 ` Edward Cree
  2018-11-02 15:02   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 11+ messages in thread
From: Edward Cree @ 2018-11-01 20:05 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Yonghong Song
  Cc: Daniel Borkmann, Jiri Olsa, Martin Lau, Alexei Starovoitov,
	Linux Networking Development Mailing List

On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
>  R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 15: (b7) r2 = 0
> 16: (63) *(u32 *)(r10 -260) = r2
> 17: (67) r1 <<= 32
> 18: (77) r1 >>= 32
> 19: (67) r1 <<= 3
> 20: (bf) r2 = r6
> 21: (0f) r2 += r1
> 22: (79) r3 = *(u64 *)(r2 +16)
> R2 invalid mem access 'inv'
I wonder if you could run this with verifier log level 2?  (I'm not sure how
 you would go about plumbing that through the perf tooling.)  It seems very
 odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
 during the shifts or whether the addition in insn 21 somehow produces the
 unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
 then also having an offset in the LDX produced an error about
 ptr+const+const, but that seems to have been fixed at some point.)

Note however that even if we get past this, R1 at this point holds 6, so it
 looks like the verifier is walking the impossible path where we're inside the
 'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
 limitation, that it walks paths with impossible combinations of constraints
 (we've previously had cases where assertions in the verifier would blow up
 because of this, e.g. registers with max_val less than min_val).  So if the
 check_ctx_access() is going to worry about whether you're off the end of the
 array (I'm not sure what your program type is and thus which is_valid_access
 callback is involved), then it'll complain about this.
If filename_arg came from some external source you'd have a different
 problem, because then it would have a totally unknown value, that on entering
 the 'if' becomes "unknown but < 6", which is still too variable to have as
 the offset of a ctx access.  Those have to be at a known constant offset, so
 that we can determine the type of the returned value.

As a way to fix this, how about [UNTESTED!]:
    const void *filename_arg = NULL;
    /* ... */
    switch (augmented_args.args.syscall_nr) {
        case SYS_OPEN: filename_arg = args->args[0]; break;
        case SYS_OPENAT: filename_arg = args->args[1]; break;
    }
    /* ... */
    if (filename_arg) {
        /* stuff */
        blah = probe_read_str(/* ... */, filename_arg);
    } else {
        /* the other stuff */
    }
That way, you're only ever dealing in constant pointers (although judging by
 an old thread I found[1] about ptr+const+const, the compiler might decide to
 make some optimisations that end up looking like your existing code).

As for what you want to do with the index coming from userspace, the verifier
 will not like that at all, as mentioned above, so I think you'll need to do
 something like:
    switch (filename_arg_from_userspace) {
        case 0: filename_arg = args->args[0]; break;
        case 1: filename_arg = args->args[1]; break;
        /* etc */
        default: filename_arg = NULL;
    }
 thus ensuring that you only ever have ctx pointers with constant offsets.

-Ed

[1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-01 20:05 ` Edward Cree
@ 2018-11-02 15:02   ` Arnaldo Carvalho de Melo
  2018-11-02 15:42     ` Edward Cree
  0 siblings, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-02 15:02 UTC (permalink / raw)
  To: Edward Cree
  Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
	Alexei Starovoitov, Linux Networking Development Mailing List

Em Thu, Nov 01, 2018 at 08:05:07PM +0000, Edward Cree escreveu:
> On 01/11/18 18:52, Arnaldo Carvalho de Melo wrote:
> >  R0=inv(id=0) R1=inv6 R2=inv6 R3=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 15: (b7) r2 = 0
> > 16: (63) *(u32 *)(r10 -260) = r2
> > 17: (67) r1 <<= 32
> > 18: (77) r1 >>= 32
> > 19: (67) r1 <<= 3
> > 20: (bf) r2 = r6
> > 21: (0f) r2 += r1
> > 22: (79) r3 = *(u64 *)(r2 +16)
> > R2 invalid mem access 'inv'
> I wonder if you could run this with verifier log level 2?  (I'm not sure how
>  you would go about plumbing that through the perf tooling.)  It seems very
>  odd that it ends up with R2=inv, and I'm wondering whether R1 becomes unknown
>  during the shifts or whether the addition in insn 21 somehow produces the
>  unknown-ness.  (I know we used to have a thing[1] where doing ptr += K and
>  then also having an offset in the LDX produced an error about
>  ptr+const+const, but that seems to have been fixed at some point.)
> 
> Note however that even if we get past this, R1 at this point holds 6, so it
>  looks like the verifier is walking the impossible path where we're inside the
>  'if' even though filename_arg = 6.  This is a (slightly annoying) verifier
>  limitation, that it walks paths with impossible combinations of constraints
>  (we've previously had cases where assertions in the verifier would blow up
>  because of this, e.g. registers with max_val less than min_val).  So if the
>  check_ctx_access() is going to worry about whether you're off the end of the
>  array (I'm not sure what your program type is and thus which is_valid_access
>  callback is involved), then it'll complain about this.
> If filename_arg came from some external source you'd have a different
>  problem, because then it would have a totally unknown value, that on entering
>  the 'if' becomes "unknown but < 6", which is still too variable to have as
>  the offset of a ctx access.  Those have to be at a known constant offset, so
>  that we can determine the type of the returned value.
> 
> As a way to fix this, how about [UNTESTED!]:
>     const void *filename_arg = NULL;
>     /* ... */
>     switch (augmented_args.args.syscall_nr) {
>         case SYS_OPEN: filename_arg = args->args[0]; break;
>         case SYS_OPENAT: filename_arg = args->args[1]; break;
>     }
>     /* ... */
>     if (filename_arg) {
>         /* stuff */
>         blah = probe_read_str(/* ... */, filename_arg);
>     } else {
>         /* the other stuff */
>     }
> That way, you're only ever dealing in constant pointers (although judging by
>  an old thread I found[1] about ptr+const+const, the compiler might decide to
>  make some optimisations that end up looking like your existing code).

Yeah, didn't work as well:

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:	 filename_arg = (const void *)args->args[0]; break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1]; break;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}

And the -vv in 'perf trace' didn't seem to map to further details in the
output of the verifier debug:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf  -O2 -o - 
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000006000-fffffe0000007000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000032000-fffffe0000033000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000005e000-fffffe000005f000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000008a000-fffffe000008b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
 R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
 R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed

libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                     \___ Kernel verifier blocks program loading

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
[root@seventh perf]# 


I'll check how to plumb that, but its a holiday down here in Brazil,
kids at home...


 
> As for what you want to do with the index coming from userspace, the verifier
>  will not like that at all, as mentioned above, so I think you'll need to do
>  something like:
>     switch (filename_arg_from_userspace) {
>         case 0: filename_arg = args->args[0]; break;
>         case 1: filename_arg = args->args[1]; break;
>         /* etc */
>         default: filename_arg = NULL;
>     }
>  thus ensuring that you only ever have ctx pointers with constant offsets.
> 
> -Ed
> 
> [1]: https://lists.iovisor.org/g/iovisor-dev/topic/21386327#1302

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-02 15:02   ` Arnaldo Carvalho de Melo
@ 2018-11-02 15:42     ` Edward Cree
  2018-11-02 21:27       ` Yonghong Song
  2018-11-03 11:29       ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 11+ messages in thread
From: Edward Cree @ 2018-11-02 15:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
	Alexei Starovoitov, Linux Networking Development Mailing List

On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> Yeah, didn't work as well: 

> And the -vv in 'perf trace' didn't seem to map to further details in the
> output of the verifier debug:
Yeah for log_level 2 you probably need to make source-level changes to either
 perf or libbpf (I think the latter).  It's annoying that essentially no tools
 plumb through an option for that, someone should fix them ;-)

> libbpf: -- BEGIN DUMP LOG ---
> libbpf: 
> 0: (bf) r6 = r1
> 1: (bf) r1 = r10
> 2: (07) r1 += -328
> 3: (b7) r7 = 64
> 4: (b7) r2 = 64
> 5: (bf) r3 = r6
> 6: (85) call bpf_probe_read#4
> 7: (79) r1 = *(u64 *)(r10 -320)
> 8: (15) if r1 == 0x101 goto pc+4
>  R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 9: (55) if r1 != 0x2 goto pc+22
>  R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> 10: (bf) r1 = r6
> 11: (07) r1 += 16
> 12: (05) goto pc+2
> 15: (79) r3 = *(u64 *)(r1 +0)
> dereference of modified ctx ptr R1 off=16 disallowed
Aha, we at least got a different error message this time.
And indeed llvm has done that optimisation, rather than the more obvious
11: r3 = *(u64 *)(r1 +16)
 because it wants to have lots of reads share a single insn.  You may be able
 to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
 with llvm knowledge can figure out how to stop it (ideally, llvm would know
 when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
Alternatively, your prog looks short enough that maybe you could kick the C
 habit and write it directly in eBPF asm, that way no-one is optimising things
 behind your back.  (I realise this option won't appeal to everyone ;-)
The reason the verifier disallows this, iirc, is because it needs to be able
 to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
 underlying kernel struct doesn't match the layout of the ctx ABI.  To do this
 it needs the ctx offset to live entirely in the insn doing the access,
 otherwise different paths could lead to the same insn accessing different ctx
 offsets with different fixups needed — can't be done.

-Ed

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-02 15:42     ` Edward Cree
@ 2018-11-02 21:27       ` Yonghong Song
  2018-11-05 12:33         ` Arnaldo Carvalho de Melo
  2018-11-03 11:29       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 11+ messages in thread
From: Yonghong Song @ 2018-11-02 21:27 UTC (permalink / raw)
  To: Edward Cree, Arnaldo Carvalho de Melo
  Cc: Daniel Borkmann, Jiri Olsa, Martin Lau, Alexei Starovoitov,
	Linux Networking Development Mailing List



On 11/2/18 8:42 AM, Edward Cree wrote:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
>> Yeah, didn't work as well:
> 
>> And the -vv in 'perf trace' didn't seem to map to further details in the
>> output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
>   perf or libbpf (I think the latter).  It's annoying that essentially no tools
>   plumb through an option for that, someone should fix them ;-)
> 
>> libbpf: -- BEGIN DUMP LOG ---
>> libbpf:
>> 0: (bf) r6 = r1
>> 1: (bf) r1 = r10
>> 2: (07) r1 += -328
>> 3: (b7) r7 = 64
>> 4: (b7) r2 = 64
>> 5: (bf) r3 = r6
>> 6: (85) call bpf_probe_read#4
>> 7: (79) r1 = *(u64 *)(r10 -320)
>> 8: (15) if r1 == 0x101 goto pc+4
>>   R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 9: (55) if r1 != 0x2 goto pc+22
>>   R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
>> 10: (bf) r1 = r6
>> 11: (07) r1 += 16
>> 12: (05) goto pc+2
>> 15: (79) r3 = *(u64 *)(r1 +0)
>> dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 11: r3 = *(u64 *)(r1 +16)
>   because it wants to have lots of reads share a single insn.  You may be able
>   to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
>   with llvm knowledge can figure out how to stop it (ideally, llvm would know
>   when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯

The optimization mostly likes below:
    br1:
      ...
      r1 += 16
      goto merge
    br2:
      ...
      r1 += 20
      goto merge
    merge:
      *(u64 *)(r1 + 0)

The compiler tries to merge common loads. There is no easy way to
stop this compiler optimization without turning off a lot of other
optimizations. The easiest way is to add barriers
    __asm__ __volatile__("": : :"memory")
after the ctx memory access to prevent their down stream merging.

> Alternatively, your prog looks short enough that maybe you could kick the C
>   habit and write it directly in eBPF asm, that way no-one is optimising things
>   behind your back.  (I realise this option won't appeal to everyone ;-)

The LLVM supports BPF inline assembly as well. Some examples here
https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/BPF/inline_asm.ll
You may try it for selective ctx access to work around some
compiler optimizations. I personally have not used it yet and actually
not sure whether it actually works or not :-)

> The reason the verifier disallows this, iirc, is because it needs to be able
>   to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case >   underlying kernel struct doesn't match the layout of the ctx ABI.  
To do this
>   it needs the ctx offset to live entirely in the insn doing the access,
>   otherwise different paths could lead to the same insn accessing different ctx
>   offsets with different fixups needed — can't be done.
> 
> -Ed
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-02 15:42     ` Edward Cree
  2018-11-02 21:27       ` Yonghong Song
@ 2018-11-03 11:29       ` Arnaldo Carvalho de Melo
  2018-11-03 11:32         ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-03 11:29 UTC (permalink / raw)
  To: Edward Cree
  Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
	Alexei Starovoitov, Linux Networking Development Mailing List

Em Fri, Nov 02, 2018 at 03:42:49PM +0000, Edward Cree escreveu:
> On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> > Yeah, didn't work as well: 
> 
> > And the -vv in 'perf trace' didn't seem to map to further details in the
> > output of the verifier debug:
> Yeah for log_level 2 you probably need to make source-level changes to either
>  perf or libbpf (I think the latter).  It's annoying that essentially no tools
>  plumb through an option for that, someone should fix them ;-)
> 
> > libbpf: -- BEGIN DUMP LOG ---
> > libbpf: 
> > 0: (bf) r6 = r1
> > 1: (bf) r1 = r10
> > 2: (07) r1 += -328
> > 3: (b7) r7 = 64
> > 4: (b7) r2 = 64
> > 5: (bf) r3 = r6
> > 6: (85) call bpf_probe_read#4
> > 7: (79) r1 = *(u64 *)(r10 -320)
> > 8: (15) if r1 == 0x101 goto pc+4
> >  R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 9: (55) if r1 != 0x2 goto pc+22
> >  R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> > 10: (bf) r1 = r6
> > 11: (07) r1 += 16
> > 12: (05) goto pc+2
> > 15: (79) r3 = *(u64 *)(r1 +0)
> > dereference of modified ctx ptr R1 off=16 disallowed
> Aha, we at least got a different error message this time.
> And indeed llvm has done that optimisation, rather than the more obvious
> 11: r3 = *(u64 *)(r1 +16)
>  because it wants to have lots of reads share a single insn.  You may be able
>  to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
>  with llvm knowledge can figure out how to stop it (ideally, llvm would know
>  when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯

set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O2 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf  -O2 -o -

So it is using -O2, lets try with -O0...

So I added this to my ~/.perfconfig, i.e. the default clang command line
template replacing -O2 with -O0.

[root@seventh perf]# cat ~/.perfconfig 
[llvm]
	clang-bpf-cmd-template = "$CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c \"$CLANG_SOURCE\" -target bpf $CLANG_EMIT_LLVM -O0 -o - $LLVM_OPTIONS_PIPE"
	# dump-obj = true
[root@seventh perf]# 

And got an explosion:

# trace -vv -e tools/perf/examples/bpf/augmented_raw_syscalls.c sleep 1
bpf: builtin compilation failed: -95, try external compiler
Kernel build dir is set to /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: KBUILD_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
unset env: KBUILD_OPTS
include option is set to  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: NR_CPUS=4
set env: LINUX_VERSION_CODE=0x41300
set env: CLANG_EXEC=/usr/local/bin/clang
unset env: CLANG_OPTIONS
set env: KERNEL_INC_OPTIONS= -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h 
set env: PERF_BPF_INC_OPTIONS=-I/home/acme/lib/perf/include/bpf
set env: WORKING_DIR=/lib/modules/4.19.0-rc8-00014-gc0cff31be705/build
set env: CLANG_SOURCE=/home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
llvm compiling command template: $CLANG_EXEC -D__KERNEL__ -D__NR_CPUS__=$NR_CPUS -DLINUX_VERSION_CODE=$LINUX_VERSION_CODE $CLANG_OPTIONS $PERF_BPF_INC_OPTIONS $KERNEL_INC_OPTIONS -Wno-unused-value -Wno-pointer-sign -working-directory $WORKING_DIR -c "$CLANG_SOURCE" -target bpf $CLANG_EMIT_LLVM -O0 -o - $LLVM_OPTIONS_PIPE
llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf  -O0 -o - 
clang-7: /home/acme/git/llvm/lib/Target/BPF/MCTargetDesc/BPFAsmBackend.cpp:74: virtual void {anonymous}::BPFAsmBackend::applyFixup(const llvm::MCAssembler&, const llvm::MCFixup&, const llvm::MCValue&, llvm::MutableArrayRef<char>, uint64_t, bool, const llvm::MCSubtargetInfo*) const: Assertion `Value == 0' failed.
Stack dump:
0.	Program arguments: /usr/local/bin/clang-7 -cc1 -triple bpf -emit-obj -mrelax-all -disable-free -main-file-name augmented_raw_syscalls.c -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/acme/git/perf/-.gcno -nostdsysteminc -nobuiltininc -resource-dir /usr/local/lib/clang/8.0.0 -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -include /home/acme/git/linux/include/linux/kconfig.h -D __KERNEL__ -D __NR_CPUS__=4 -D LINUX_VERSION_CODE=0x41300 -I /home/acme/lib/perf/include/bpf -I /home/acme/git/linux/arch/x86/include -I ./arch/x86/include/generated -I /home/acme/git/linux/include -I ./include -I /home/acme/git/linux/arch/x86/include/uapi -I ./arch/x86/include/generated/uapi -I /home/acme/git/linux/include/uapi -I ./include/generated/uapi -O0 -Wno-unused-value -Wno-pointer-sign -fdebug-compilation-dir /home/acme/git/perf -ferror-limit 19 -fmessage-length 203 -fobjc-runtime=gcc -fdiagnostics-show-option -fcolor-diagnostics -o - -x c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -faddrsig 
1.	<eof> parser at end of file
2.	Code generation
#0 0x000000000408ff98 llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/local/bin/clang-7+0x408ff98)
#1 0x000000000409002b (/usr/local/bin/clang-7+0x409002b)
#2 0x000000000408e059 llvm::sys::RunSignalHandlers() (/usr/local/bin/clang-7+0x408e059)
#3 0x000000000408fa5b (/usr/local/bin/clang-7+0x408fa5b)
#4 0x00007fcd33ab81c0 __restore_rt (/lib64/libpthread.so.0+0x121c0)
#5 0x00007fcd32817750 __GI_raise (/lib64/libc.so.6+0x34750)
#6 0x00007fcd32818d31 __GI_abort (/lib64/libc.so.6+0x35d31)
#7 0x00007fcd3281005a __assert_fail_base (/lib64/libc.so.6+0x2d05a)
#8 0x00007fcd328100d2 (/lib64/libc.so.6+0x2d0d2)
#9 0x0000000002602d13 (/usr/local/bin/clang-7+0x2602d13)
#10 0x0000000003bf6da7 llvm::MCAssembler::layout(llvm::MCAsmLayout&) (/usr/local/bin/clang-7+0x3bf6da7)
#11 0x0000000003bf6e2c llvm::MCAssembler::Finish() (/usr/local/bin/clang-7+0x3bf6e2c)
#12 0x0000000003c524e8 llvm::MCObjectStreamer::FinishImpl() (/usr/local/bin/clang-7+0x3c524e8)
#13 0x0000000003c2db4b llvm::MCELFStreamer::FinishImpl() (/usr/local/bin/clang-7+0x3c2db4b)
#14 0x0000000003c5ba43 llvm::MCStreamer::Finish() (/usr/local/bin/clang-7+0x3c5ba43)
#15 0x0000000004c281b9 llvm::AsmPrinter::doFinalization(llvm::Module&) (/usr/local/bin/clang-7+0x4c281b9)
#16 0x00000000038d637b llvm::FPPassManager::doFinalization(llvm::Module&) (/usr/local/bin/clang-7+0x38d637b)
#17 0x00000000038d683d (/usr/local/bin/clang-7+0x38d683d)
#18 0x00000000038d6d6b llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/local/bin/clang-7+0x38d6d6b)
#19 0x00000000038d6f9b llvm::legacy::PassManager::run(llvm::Module&) (/usr/local/bin/clang-7+0x38d6f9b)
#20 0x000000000436b091 (/usr/local/bin/clang-7+0x436b091)
#21 0x000000000436df07 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (/usr/local/bin/clang-7+0x436df07)
#22 0x0000000004f6ee11 (/usr/local/bin/clang-7+0x4f6ee11)
#23 0x00000000062af3a3 clang::ParseAST(clang::Sema&, bool, bool) (/usr/local/bin/clang-7+0x62af3a3)
#24 0x0000000004a73873 clang::ASTFrontendAction::ExecuteAction() (/usr/local/bin/clang-7+0x4a73873)
#25 0x0000000004f6ced8 clang::CodeGenAction::ExecuteAction() (/usr/local/bin/clang-7+0x4f6ced8)
#26 0x0000000004a7329d clang::FrontendAction::Execute() (/usr/local/bin/clang-7+0x4a7329d)
#27 0x0000000004a04ccf clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/usr/local/bin/clang-7+0x4a04ccf)
#28 0x0000000004bc512e clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/usr/local/bin/clang-7+0x4bc512e)
#29 0x0000000001d88a6c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/local/bin/clang-7+0x1d88a6c)
#30 0x0000000001d7e31e (/usr/local/bin/clang-7+0x1d7e31e)
#31 0x0000000001d7ef94 main (/usr/local/bin/clang-7+0x1d7ef94)
#32 0x00007fcd32803fea __libc_start_main (/lib64/libc.so.6+0x20fea)
#33 0x0000000001d7beea _start (/usr/local/bin/clang-7+0x1d7beea)
clang-7: error: unable to execute command: Aborted (core dumped)
clang-7: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 8.0.0 (http://llvm.org/git/clang.git 8587270a739ee30c926a76d5657e65e85b560f6e) (http://llvm.org/git/llvm.git 0566eefef9c3777bd780ec4cbb9efa764633b76c)
Target: bpf
Thread model: posix
InstalledDir: /usr/local/bin
clang-7: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang-7: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.c
clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.sh
clang-7: note: diagnostic msg: 

********************
ERROR:	unable to compile tools/perf/examples/bpf/augmented_raw_syscalls.c
Hint:	Check error message shown above.
Hint:	You can also pre-compile it into .o using:
     		clang -target bpf -O2 -c tools/perf/examples/bpf/augmented_raw_syscalls.c
     	with proper -I and -D options.
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                     \___ Failed to load tools/perf/examples/bpf/augmented_raw_syscalls.c from source: Error when compiling BPF scriptlet

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
[root@seventh perf]# 

Trying with -O1...


> Alternatively, your prog looks short enough that maybe you could kick the C
>  habit and write it directly in eBPF asm, that way no-one is optimising things
>  behind your back.  (I realise this option won't appeal to everyone ;-)

Nah, if that is what it takes... Would be better with simple looking C
code tho :-)

> The reason the verifier disallows this, iirc, is because it needs to be able
>  to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case
>  underlying kernel struct doesn't match the layout of the ctx ABI.  To do this
>  it needs the ctx offset to live entirely in the insn doing the access,
>  otherwise different paths could lead to the same insn accessing different ctx
>  offsets with different fixups needed — can't be done.
> 
> -Ed

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-03 11:29       ` Arnaldo Carvalho de Melo
@ 2018-11-03 11:32         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-03 11:32 UTC (permalink / raw)
  To: Edward Cree
  Cc: Yonghong Song, Daniel Borkmann, Jiri Olsa, Martin Lau,
	Alexei Starovoitov, Linux Networking Development Mailing List

Em Sat, Nov 03, 2018 at 08:29:34AM -0300, Arnaldo Carvalho de Melo escreveu:
> PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> Preprocessed source(s) and associated run script(s) are located at:
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.c
> clang-7: note: diagnostic msg: /tmp/augmented_raw_syscalls-7444d9.sh
> clang-7: note: diagnostic msg: 
> 
> ********************
> ERROR:	unable to compile tools/perf/examples/bpf/augmented_raw_syscalls.c
> Hint:	Check error message shown above.
> Hint:	You can also pre-compile it into .o using:
>      		clang -target bpf -O2 -c tools/perf/examples/bpf/augmented_raw_syscalls.c
>      	with proper -I and -D options.
> event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
>                      \___ Failed to load tools/perf/examples/bpf/augmented_raw_syscalls.c from source: Error when compiling BPF scriptlet
> 
> (add -v to see detail)
> Run 'perf list' for a list of valid events
> 
>  Usage: perf trace [<options>] [<command>]
>     or: perf trace [<options>] -- <command> [<options>]
>     or: perf trace record [<options>] [<command>]
>     or: perf trace record [<options>] -- <command> [<options>]
> 
>     -e, --event <event>   event/syscall selector. use 'perf list' to list available events
> [root@seventh perf]# 
> 
> Trying with -O1...

-O1 doesn't get clang confused, its just the verifier that doesn't like
the result, i.e. we're back to that optimization, that isn't disabled
with -O1

llvm compiling command : /usr/local/bin/clang -D__KERNEL__ -D__NR_CPUS__=4 -DLINUX_VERSION_CODE=0x41300  -I/home/acme/lib/perf/include/bpf  -nostdinc -isystem /usr/lib/gcc/x86_64-redhat-linux/7/include -I/home/acme/git/linux/arch/x86/include -I./arch/x86/include/generated  -I/home/acme/git/linux/include -I./include -I/home/acme/git/linux/arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I/home/acme/git/linux/include/uapi -I./include/generated/uapi -include /home/acme/git/linux/include/linux/kconfig.h  -Wno-unused-value -Wno-pointer-sign -working-directory /lib/modules/4.19.0-rc8-00014-gc0cff31be705/build -c /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c -target bpf  -O1 -o - 
libbpf: loading object 'tools/perf/examples/bpf/augmented_raw_syscalls.c' from buffer
libbpf: section(1) .strtab, size 168, link 0, flags 0, type=3
libbpf: skip section(1) .strtab
libbpf: section(2) .text, size 0, link 0, flags 6, type=1
libbpf: skip section(2) .text
libbpf: section(3) raw_syscalls:sys_enter, size 344, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_enter
libbpf: section(4) .relraw_syscalls:sys_enter, size 16, link 10, flags 0, type=9
libbpf: section(5) raw_syscalls:sys_exit, size 16, link 0, flags 6, type=1
libbpf: found program raw_syscalls:sys_exit
libbpf: section(6) maps, size 56, link 0, flags 3, type=1
libbpf: section(7) license, size 4, link 0, flags 3, type=1
libbpf: license of tools/perf/examples/bpf/augmented_raw_syscalls.c is GPL
libbpf: section(8) version, size 4, link 0, flags 3, type=1
libbpf: kernel version of tools/perf/examples/bpf/augmented_raw_syscalls.c is 41300
libbpf: section(9) .llvm_addrsig, size 6, link 10, flags 80000000, type=1879002115
libbpf: skip section(9) .llvm_addrsig
libbpf: section(10) .symtab, size 240, link 1, flags 0, type=2
libbpf: maps in tools/perf/examples/bpf/augmented_raw_syscalls.c: 2 maps in 56 bytes
libbpf: map 0 is "__augmented_syscalls__"
libbpf: map 1 is "__bpf_stdout__"
libbpf: collecting relocating info for: 'raw_syscalls:sys_enter'
libbpf: relo for 4 value 28 name 124
libbpf: relocation: insn_idx=35
libbpf: relocation: find map 1 (__augmented_syscalls__) for insn 35
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000006000-fffffe0000007000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe0000032000-fffffe0000033000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000005e000-fffffe000005f000
Added extra kernel map __entry_SYSCALL_64_trampoline fffffe000008a000-fffffe000008b000
bpf: config program 'raw_syscalls:sys_enter'
bpf: config program 'raw_syscalls:sys_exit'
libbpf: create map __bpf_stdout__: fd=3
libbpf: create map __augmented_syscalls__: fd=4
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf: 
0: (bf) r6 = r1
1: (bf) r1 = r10
2: (07) r1 += -328
3: (b7) r7 = 64
4: (b7) r2 = 64
5: (bf) r3 = r6
6: (85) call bpf_probe_read#4
7: (79) r1 = *(u64 *)(r10 -320)
8: (15) if r1 == 0x101 goto pc+4
 R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
9: (55) if r1 != 0x2 goto pc+22
 R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
10: (bf) r1 = r6
11: (07) r1 += 16
12: (05) goto pc+2
15: (79) r3 = *(u64 *)(r1 +0)
dereference of modified ctx ptr R1 off=16 disallowed

libbpf: -- END LOG --
libbpf: failed to load program 'raw_syscalls:sys_enter'
libbpf: failed to load object 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
bpf: load objects failed: err=-4007: (Kernel verifier blocks program loading)
event syntax error: 'tools/perf/examples/bpf/augmented_raw_syscalls.c'
                     \___ Kernel verifier blocks program loading

(add -v to see detail)
Run 'perf list' for a list of valid events

 Usage: perf trace [<options>] [<command>]
    or: perf trace [<options>] -- <command> [<options>]
    or: perf trace record [<options>] [<command>]
    or: perf trace record [<options>] -- <command> [<options>]

    -e, --event <event>   event/syscall selector. use 'perf list' to list available events
[root@seventh perf]# 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Help with the BPF verifier
  2018-11-02 21:27       ` Yonghong Song
@ 2018-11-05 12:33         ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 11+ messages in thread
From: Arnaldo Carvalho de Melo @ 2018-11-05 12:33 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Edward Cree, Daniel Borkmann, Jiri Olsa, Martin Lau,
	Alexei Starovoitov, Linux Networking Development Mailing List

Em Fri, Nov 02, 2018 at 09:27:52PM +0000, Yonghong Song escreveu:
> 
> 
> On 11/2/18 8:42 AM, Edward Cree wrote:
> > On 02/11/18 15:02, Arnaldo Carvalho de Melo wrote:
> >> Yeah, didn't work as well:
> > 
> >> And the -vv in 'perf trace' didn't seem to map to further details in the
> >> output of the verifier debug:
> > Yeah for log_level 2 you probably need to make source-level changes to either
> >   perf or libbpf (I think the latter).  It's annoying that essentially no tools
> >   plumb through an option for that, someone should fix them ;-)
> > 
> >> libbpf: -- BEGIN DUMP LOG ---
> >> libbpf:
> >> 0: (bf) r6 = r1
> >> 1: (bf) r1 = r10
> >> 2: (07) r1 += -328
> >> 3: (b7) r7 = 64
> >> 4: (b7) r2 = 64
> >> 5: (bf) r3 = r6
> >> 6: (85) call bpf_probe_read#4
> >> 7: (79) r1 = *(u64 *)(r10 -320)
> >> 8: (15) if r1 == 0x101 goto pc+4
> >>   R0=inv(id=0) R1=inv(id=0) R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 9: (55) if r1 != 0x2 goto pc+22
> >>   R0=inv(id=0) R1=inv2 R6=ctx(id=0,off=0,imm=0) R7=inv64 R10=fp0,call_-1
> >> 10: (bf) r1 = r6
> >> 11: (07) r1 += 16
> >> 12: (05) goto pc+2
> >> 15: (79) r3 = *(u64 *)(r1 +0)
> >> dereference of modified ctx ptr R1 off=16 disallowed
> > Aha, we at least got a different error message this time.
> > And indeed llvm has done that optimisation, rather than the more obvious
> > 11: r3 = *(u64 *)(r1 +16)
> >   because it wants to have lots of reads share a single insn.  You may be able
> >   to defeat that optimisation by adding compiler barriers, idk.  Maybe someone
> >   with llvm knowledge can figure out how to stop it (ideally, llvm would know
> >   when it's generating for bpf backend and not do that).  -O0?  ¯\_(ツ)_/¯
> 
> The optimization mostly likes below:
>     br1:
>       ...
>       r1 += 16
>       goto merge
>     br2:
>       ...
>       r1 += 20
>       goto merge
>     merge:
>       *(u64 *)(r1 + 0)
> 
> The compiler tries to merge common loads. There is no easy way to
> stop this compiler optimization without turning off a lot of other
> optimizations. The easiest way is to add barriers
>     __asm__ __volatile__("": : :"memory")
> after the ctx memory access to prevent their down stream merging.

Great, this made it work:

cat /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#include <stdio.h>
#include <linux/socket.h>

/* bpf-output associated map */
struct bpf_map SEC("maps") __augmented_syscalls__ = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};

struct syscall_enter_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	unsigned long	   args[6];
};

struct syscall_exit_args {
	unsigned long long common_tp_fields;
	long		   syscall_nr;
	long		   ret;
};

struct augmented_filename {
	unsigned int	size;
	int		reserved;
	char		value[256];
};

#define SYS_OPEN 2
#define SYS_OPENAT 257

SEC("raw_syscalls:sys_enter")
int sys_enter(struct syscall_enter_args *args)
{
	struct {
		struct syscall_enter_args args;
		struct augmented_filename filename;
	} augmented_args;
	unsigned int len = sizeof(augmented_args);
	const void *filename_arg = NULL;

	probe_read(&augmented_args.args, sizeof(augmented_args.args), args);

	switch (augmented_args.args.syscall_nr) {
	case SYS_OPEN:	 filename_arg = (const void *)args->args[0];
			__asm__ __volatile__("": : :"memory");
			 break;
	case SYS_OPENAT: filename_arg = (const void *)args->args[1];
			 break;
	default:
			 return 0;
	}

	if (filename_arg != NULL) {
		augmented_args.filename.reserved = 0;
		augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
							      sizeof(augmented_args.filename.value),
							      filename_arg);
		if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
			len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
			len &= sizeof(augmented_args.filename.value) - 1;
		}
	} else {
		len = sizeof(augmented_args.args);
	}

	perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
	return 0;
}

SEC("raw_syscalls:sys_exit")
int sys_exit(struct syscall_exit_args *args)
{
	struct syscall_exit_args augmented_args;
	probe_read(&augmented_args, sizeof(augmented_args), args);
	return augmented_args.syscall_nr == SYS_OPEN || augmented_args.syscall_nr == SYS_OPENAT;
}

license(GPL);

Now I have both open and openat getting the right 

[root@jouet perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.c  sleep 1
LLVM: dumping tools/perf/examples/bpf/augmented_raw_syscalls.o
     0.000 ( 0.010 ms): sleep/15794 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
     0.028 ( 0.005 ms): sleep/15794 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
     0.241 ( 0.005 ms): sleep/15794 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
[root@jouet perf]# 

[root@jouet perf]# trace -e tools/perf/examples/bpf/augmented_raw_syscalls.o cat /etc/passwd > /dev/null
     0.000 ( 0.010 ms): cat/15918 openat(dfd: CWD, filename: /etc/ld.so.cache, flags: CLOEXEC) = 3
     0.028 ( 0.006 ms): cat/15918 openat(dfd: CWD, filename: /lib64/libc.so.6, flags: CLOEXEC) = 3
     0.241 ( 0.007 ms): cat/15918 open(filename: /usr/lib/locale/locale-archive, flags: CLOEXEC) = 3
     0.289 ( 0.004 ms): cat/15918 openat(dfd: CWD, filename: /etc/passwd) = 3
[root@jouet perf]#

[root@jouet perf]# readelf -SW tools/perf/examples/bpf/augmented_raw_syscalls.o
There are 11 section headers, starting at offset 0x438:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .strtab           STRTAB          0000000000000000 000376 0000bd 00      0   0  1
  [ 2] .text             PROGBITS        0000000000000000 000040 000000 00  AX  0   0  4
  [ 3] raw_syscalls:sys_enter PROGBITS        0000000000000000 000040 000138 00  AX  0   0  8
  [ 4] .relraw_syscalls:sys_enter REL             0000000000000000 000360 000010 10     10   3  8
  [ 5] raw_syscalls:sys_exit PROGBITS        0000000000000000 000178 000070 00  AX  0   0  8
  [ 6] maps              PROGBITS        0000000000000000 0001e8 000038 00  WA  0   0  4
  [ 7] license           PROGBITS        0000000000000000 000220 000004 00  WA  0   0  1
  [ 8] version           PROGBITS        0000000000000000 000224 000004 00  WA  0   0  4
  [ 9] .llvm_addrsig     LOOS+0xfff4c03  0000000000000000 000370 000006 00   E 10   0  1
  [10] .symtab           SYMTAB          0000000000000000 000228 000138 18      1   7  8
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  p (processor specific)
[root@jouet perf]# 

> > Alternatively, your prog looks short enough that maybe you could kick the C
> >   habit and write it directly in eBPF asm, that way no-one is optimising things
> >   behind your back.  (I realise this option won't appeal to everyone ;-)
> 
> The LLVM supports BPF inline assembly as well. Some examples here
> https://github.com/llvm-mirror/llvm/blob/master/test/CodeGen/BPF/inline_asm.ll
> You may try it for selective ctx access to work around some
> compiler optimizations. I personally have not used it yet and actually
> not sure whether it actually works or not :-)

I will take a look at it, but the simple barrier right after the first ctx
access did the trick for me, probably I'll cook some ctx accessor macros to
make this transparent.

Now to make -e open,openat to set up filters via a map, get that
augmented_raw_syscalls.c to be added automatically to the evlist, etc.

- Arnaldo
 
> > The reason the verifier disallows this, iirc, is because it needs to be able
> >   to rewrite the offsets on ctx accesses (see convert_ctx_accesses()) in case >   underlying kernel struct doesn't match the layout of the ctx ABI.  
> To do this
> >   it needs the ctx offset to live entirely in the insn doing the access,
> >   otherwise different paths could lead to the same insn accessing different ctx
> >   offsets with different fixups needed — can't be done.
> > 
> > -Ed
> > 

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-11-05 21:53 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-11-01 18:52 Help with the BPF verifier Arnaldo Carvalho de Melo
2018-11-01 19:10 ` David Miller
2018-11-01 19:13   ` Arnaldo Carvalho de Melo
2018-11-01 19:28     ` David Miller
2018-11-01 20:05 ` Edward Cree
2018-11-02 15:02   ` Arnaldo Carvalho de Melo
2018-11-02 15:42     ` Edward Cree
2018-11-02 21:27       ` Yonghong Song
2018-11-05 12:33         ` Arnaldo Carvalho de Melo
2018-11-03 11:29       ` Arnaldo Carvalho de Melo
2018-11-03 11:32         ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.